Code for the Recreational Exposure to Polluted Open Water and Infection (REPOWI) systematic review and meta-analysis (SRMA) protocol
R code used in 'Recreational exposure to polluted open water and infection: A systematic review and meta-analysis protocol'. The repository contains a single R script ('quantsynth_demo.R') that demonstrates the core quantitative synthesis methods (meta-analytical modelling and visualisation) intended for use in the final review. The script takes as its input the meta-analytical dataset from a similar previous systematic review by members of this review team (Leonard et al. 2018a, 2018b), which contains estimates of the association/effect based on estimated cases of infection in non-bathers versus bathers. In contrast, the meta-analytical dataset assembled for our new systematic review will contain estimates of the association/effect based on estimated cases of infection in those least exposed to pollution versus those more exposed to pollution.
To run:
- Open the 'REPS-SRMA_protocol.RProj' R Project file (which will open R in RStudio)
- Open 'quantsynth_demo.R' in the 'src' folder (which will then open the script within the R Project in RStudio)
- Run the code to populate the 'figures' folder with the figures
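As an alternative to stepping through the script in RStudio, the steps above can be sketched as a single call from an R console. This is a minimal sketch that assumes the working directory is the repository root (as it would be after opening the R Project) and that the script resolves its input and output paths relative to that root, which this README does not confirm.

```r
# Sketch: run the demo script from an R console started in the repository root.
# The relative path follows the folder layout described in the steps above.
script <- file.path("src", "quantsynth_demo.R")

if (file.exists(script)) {
  # echo = TRUE prints each expression as it is evaluated
  source(script, echo = TRUE)
  # List whatever has been written to the 'figures' folder
  print(list.files("figures"))
} else {
  message("Script not found; check that the working directory is the repository root.")
}
```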
The data used in this example code (data/metaanalysis dataset.csv) is from Leonard et al. (2018a), and was previously deposited as an Excel file at Open Research Exeter as Leonard et al. (2018b). Whilst this dataset was not generated by me (Matt Lloyd Jones, MLJ) and is not strictly relevant to the planned REPOWI review, it is helpful to briefly summarise here the meaning of each column within it. The following information is from a data dictionary provided by the original first author Anne Leonard, with extra information/interpretation provided by me in italics where deemed helpful, and bold items being those used in the 'quantsynth_demo.R' script:
- studyid: Study identifier - usually main peer-reviewed publication first author and year.
- country: Country in which the study was conducted.
- yearsofstudy: Year(s) in which the study was conducted.
- studydesign: Study design - randomised trial, prospective cohort, retrospective cohort, cross-sectional, case-control.
- studysize: Study size (total number of participants in study).
- studyquality: Study quality as determined by overall CASP assessment (classed as poor, moderate, or good).
- studypopulation: Brief narrative description of the study population.
- eligiblitycriteria [sic]: Brief narrative description of the eligibility criteria reported by the study.
- exposure: Exposure definition as described by the study/publication. Distinction between this and exposuredefinition is not always clear.
- exposuredefinition: Brief narrative description of the exposure, as described by the study/publication. Distinction between this and exposure is not always clear.
- exposureanalysis: Indication of which analysis the row of data should be used for, based upon exposure definition. Categories include:
  - any = main analysis (i.e. any contact): Most inclusive definition of exposure if multiple exposure definitions are reported by a study.
  - both = both main analysis and head-immersion analysis: Used if a study only has 1 exposure definition and head immersion is part of that definition.
  - head = head-immersion analysis: Used if a study has more than 1 exposure definition. This definition must involve head immersion or getting the face wet.
  Exposure (subgroup) analysis in which the exposure was included in Leonard et al. (2018a). The primary analysis included 'any' and 'both', whilst a follow-up analysis included 'head'. Therefore, in 'quantsynth_demo.R', 'head' exposures are removed to reproduce the primary analysis approach.
- comparatorgroup: Brief description of the comparator group as reported by the study/publication (for the meta-analysis dataset, these should all be some variation of 'non-bather', e.g. non-swimmer, beach-going non-bather, no water contact).
- healthoutcomecategory: Major outcome category - there are 3 categories: Any, Ear, Enteric. These are based on outcome and case definition reported by paper (e.g. if eye irritation symptoms were collected, these would be classed as Eye). 'Other' tends to be non-specific symptoms e.g. fever. 'Specific' is where a specific microbiological agent of infection is studied and confirmed. This level of outcome categorisation aligns with the infection type categorisation intended to be used for the REPOWI review, such that 'Ear' aligns with 'Ear' and 'Enteric' aligns with 'Gastrointestinal' (and is renamed as such in 'quantsynth_demo.R').
- outcomereportedbypaper: What the paper called the outcome (in case of wanting to go to original publication and check data).
- casedefinition: Case definition. Only relevant for outcomes based on self-reported symptoms - specifically, it is the case definition used in the study for estimating cases of infection where based on self-reported symptoms (e.g. ear ache, 3 or more loose or runny stools within 24h).
- casedefinitiontype: Type of case definition - there are 4 types: Can't tell, combination, Multi-symptom and Single. If a case definition is not provided in the paper, this is classed as 'can't tell'. If a case definition requires 2 or more symptoms to be present, this is classed as a 'combination' case definition. If a case definition allows any one of multiple symptoms to be present to be counted as a case, this is classed as 'Multi-symptom' (i.e. sensitive case definition). Single symptoms should be self-explanatory e.g. earache, vomiting. Only relevant for outcomes based on self-reported symptoms.
- outcomeassessment: How the illnesses were reported by participants - usually self-reported. Broadest category of outcome assessment, based on level of definitiveness (i.e. whether estimated cases of infection was based on self-reported symptoms, a clinician's diagnosis, or microbiological confirmation).
- methodoffollowuporrecall: How data on illness was collected. Method used for following-up with participants to assess the outcome (potential or confirmed infection).
- durationoffollowuporrecall: How long after exposure participants were followed up. When follow-up occurred (time range)/recall period. So '10 days' would mean that follow-up occurred 10 days after exposure, and therefore the recall period was within 10 days of exposure.
- incidenceorprevalence: Whether incidence or prevalence was measured (all should be incidence). All outcomes in the meta-analytical dataset are listed as incidence because Leonard et al. (2018a) focussed on incidence.
- numberexposed: Total number of bathers considered in the analysis. Total number of participants in the exposed (bather) group.
- numberofunexposed: Total number of non-bathers considered in the analysis. Total number of participants in the unexposed (non-bather) comparator group.
- numberofexposedcases: Total count of exposed cases (if available). Estimated number of cases of infection in the exposed (bather) group.
- numberofunexposedcaes [sic]: Total count of unexposed cases (if available). Estimated number of cases of infection in the unexposed (non-bather) comparator group. The typo in this column name is corrected in the script and so it becomes 'numberofunexposedcases' after pre-processing.
- numberofexposednoncases: Total count of exposed non-cases (if available). Estimated number of non-cases of infection in the exposed (bather) group.
- numberofunexposednoncases: Total count of unexposed non-cases (if available). Estimated number of non-cases of infection in the unexposed (non-bather) comparator group.
- or: Odds ratio for use in meta-analysis. Point estimate of the odds ratio used for the Leonard et al. (2018a) meta-analysis (adjusted odds ratios are preferred).
- lor: Lower 95% confidence interval for use in meta-analysis (if available). Lower estimate (95% CI) of the odds ratio used for the Leonard et al. (2018a) meta-analysis (adjusted odds ratios are preferred).
- uor: Upper 95% confidence interval for use in meta-analysis (if available). Upper estimate (95% CI) of the odds ratio used for the Leonard et al. (2018a) meta-analysis (adjusted odds ratios are preferred).
- cor: Calculated or reported crude odds ratio (if available).
- corlower95ci: Lower 95% confidence interval based on crude or calculated odds ratio (if available).
- corupper95ci: Upper 95% confidence interval based on crude or calculated odds ratio (if available).
- aor: Adjusted odds ratio (if available).
- aorlower95ci: Lower 95% confidence interval for adjusted odds ratio (if available).
- aorupper95ci: Upper 95% confidence interval for adjusted odds ratio (if available).
- symptom: Health outcome group for presentation in figures. These are the outcome categories used for meta-analysis in Leonard et al. (2018a; Figure 2), combining symptom/infection type and case definition elements. Used in the script for removing dysphagia from the enteric/gastrointestinal category, as per Leonard et al. (2018a).
- region: Geographic region in which the study was conducted (Europe, North America, or Oceania).
- rct: Whether the study is a randomised controlled trial or not (yes or no).
- logOR: Log of the odds ratio. Log of the point estimate of the odds ratio used in meta-analysis (i.e. the 'or' column, with adjusted preferred).
- logOR_L: Log of the lower 95% confidence interval of the odds ratio. Log of the lower estimate of the odds ratio used in meta-analysis (i.e. the 'lor' column, with adjusted preferred).
- logOR_U: Log of the upper 95% confidence interval of the odds ratio. Log of the upper estimate of the odds ratio used in meta-analysis (i.e. the 'uor' column, with adjusted preferred).
- logORse: Log of the OR standard error. Log of SE of the odds ratio used in meta-analysis (calculated from confidence intervals; see commented-out lines 96-98 in the script, which demonstrate calculation of the unlogged OR standard error).
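The pre-processing steps mentioned in the dictionary above (correcting the 'numberofunexposedcaes' column name, removing 'head' exposures, renaming 'Enteric' to 'Gastrointestinal') and the log-scale columns can be illustrated with a minimal base R sketch. This is not the code in 'quantsynth_demo.R': the rows below are invented toy values rather than data from Leonard et al. (2018b), 'se_logOR' is a hypothetical column name, and the back-calculation of a standard error from a 95% confidence interval is the common Wald-type approach, assumed here for illustration. It may differ from how the 'logORse' column was actually derived.

```r
# Toy rows standing in for 'data/metaanalysis dataset.csv' (values are invented).
dat <- data.frame(
  studyid               = c("A 2001", "B 2005", "C 2010"),
  exposureanalysis      = c("any", "head", "both"),
  healthoutcomecategory = c("Ear", "Enteric", "Enteric"),
  numberofunexposedcaes = c(10, 8, 12),
  or                    = c(2.0, 1.5, 1.2),
  lor                   = c(1.0, 0.9, 0.8),
  uor                   = c(4.0, 2.5, 1.8),
  stringsAsFactors      = FALSE
)

# Correct the misspelt column name.
names(dat)[names(dat) == "numberofunexposedcaes"] <- "numberofunexposedcases"

# Keep the primary-analysis rows ('any' and 'both'); drop 'head'.
dat <- dat[dat$exposureanalysis != "head", ]

# Rename 'Enteric' to match the REPOWI infection type label.
dat$healthoutcomecategory[dat$healthoutcomecategory == "Enteric"] <- "Gastrointestinal"

# Log-transform the point estimate and both confidence limits.
dat$logOR   <- log(dat$or)
dat$logOR_L <- log(dat$lor)
dat$logOR_U <- log(dat$uor)

# Assumed Wald-type back-calculation: a 95% CI on the log scale spans
# 2 * qnorm(0.975) standard errors, so SE = CI width / (2 * z).
z <- qnorm(0.975)
dat$se_logOR <- (dat$logOR_U - dat$logOR_L) / (2 * z)

print(dat[, c("studyid", "healthoutcomecategory", "logOR", "se_logOR")])
```

Working on the log scale keeps the confidence interval symmetric around the point estimate, which is why the standard error is recovered from the logged limits rather than from 'lor' and 'uor' directly.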
We greatly thank Ed Ivimey-Cook (@EIvimeyCook) for kindly checking the code, data and documentation in this repository.
Leonard, A.F.C., Singer, A., Ukoumunne, O.C., Gaze, W.H., Garside, R., 2018a. Is it safe to go back into the water? A systematic review and meta-analysis of the risk of acquiring infections from recreational exposure to seawater. Int J Epidemiol 47, 572–586. https://doi.org/10.1093/ije/dyx281
Leonard, A.F.C., Singer, A., Ukoumunne, O.C., Gaze, W.H., Garside, R., 2018b. Is it safe to go back into the water? A systematic review and meta-analysis of the risk of acquiring infections from recreational exposure to seawater (dataset). https://doi.org/10.24378/exe.123