The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach is widely implemented in systematic reviews, health technology assessment and guideline development organisations throughout the world. We have previously reported on the development of the Semi-Automated Quality Assessment Tool (SAQAT), which enables a semi-automated validity assessment based on GRADE criteria. The main advantage of our approach is its potential to improve inter-rater agreement of GRADE assessments, particularly when used by less experienced researchers, because such judgements can be complex and challenging to apply without training. This is the first study examining the inter-rater agreement of the SAQAT.
We conducted two studies to compare: a) the inter-rater agreement of two researchers using the SAQAT independently on 28 meta-analyses and b) the inter-rater agreement between a researcher using the SAQAT (who had no experience of using GRADE) and an experienced member of the GRADE working group conducting a standard GRADE assessment on 15 meta-analyses.
There was substantial agreement between independent researchers using the SAQAT for all domains (for example, overall GRADE rating: weighted kappa 0.79; 95% CI 0.65 to 0.93). Comparison between the SAQAT and a standard GRADE assessment suggested that inconsistency was parameterised too conservatively by the SAQAT. Therefore the tool was amended. Following amendment we found fair-to-moderate agreement between the standard GRADE assessment and the SAQAT (for example, overall GRADE rating: weighted kappa 0.35; 95% CI 0.09 to 0.87).
Despite a need for further research, the SAQAT may aid consistent application of GRADE, particularly by less experienced researchers.
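The weighted kappa statistics reported above can be reproduced from paired ratings. As a minimal sketch, a linearly weighted kappa over the four ordered GRADE levels can be computed in pure Python; the rating data here are invented for illustration:

```python
def weighted_kappa(r1, r2, categories):
    """Linearly weighted kappa for two raters over ordered categories."""
    k = len(categories)
    idx = {c: i for i, c in enumerate(categories)}
    n = len(r1)
    # observed joint proportions
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        obs[idx[a]][idx[b]] += 1.0 / n
    # marginal proportions for each rater
    p1 = [sum(row) for row in obs]
    p2 = [sum(obs[i][j] for i in range(k)) for j in range(k)]
    # linear disagreement weights: |i - j| / (k - 1)
    w = [[abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    d_obs = sum(w[i][j] * obs[i][j] for i in range(k) for j in range(k))
    d_exp = sum(w[i][j] * p1[i] * p2[j] for i in range(k) for j in range(k))
    return 1.0 - d_obs / d_exp

# hypothetical GRADE ratings from two raters on four meta-analyses
grades = ["high", "moderate", "low", "very low"]
rater1 = ["high", "high", "low", "low"]
rater2 = ["high", "moderate", "low", "low"]
kappa = weighted_kappa(rater1, rater2, grades)
```

With linear weights a one-level disagreement (e.g. "high" vs "moderate") is penalised a third as much as a three-level one, which is why a weighted kappa suits ordered GRADE ratings better than simple percentage agreement.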
Background: Risk-of-bias assessments are now a standard component of systematic reviews. At present, reviewers need to manually identify relevant parts of research articles for a set of methodological elements that affect the risk of bias, in order to make a risk-of-bias judgement for each of these elements. We investigate the use of text mining methods to automate risk-of-bias assessments in systematic reviews. We aim to identify relevant sentences within the text of included articles, to rank articles by risk of bias and to reduce the number of risk-of-bias assessments that the reviewers need to perform by hand.
Methods: We use supervised machine learning to train two types of models, for each of the three risk-of-bias properties of sequence generation, allocation concealment and blinding. The first model predicts whether a sentence in a research article contains relevant information. The second model predicts a risk-of-bias value for each research article. We use logistic regression, where each independent variable is the frequency of a word in a sentence or article, respectively.
Results: We found that sentences can be successfully ranked by relevance with area under the receiver operating characteristic (ROC) curve (AUC) > 0.98. Articles can be ranked by risk of bias with AUC > 0.72. We estimate that more than 33% of articles can be assessed by just one reviewer, where two reviewers are normally required.
Conclusions: We show that text mining can be used to assist risk-of-bias assessments.
Risk of bias; systematic review; text mining; machine learning
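The two model types described above can be sketched with a toy bag-of-words logistic regression. Everything below (sentences, labels, vocabulary, hyperparameters) is invented for illustration; the study itself used far larger training sets:

```python
import math
from collections import Counter

def featurize(texts, vocab):
    # bag-of-words: frequency of each vocabulary word in each text
    return [[Counter(t.lower().split())[w] for w in vocab] for t in texts]

def train_logreg(X, y, lr=0.5, epochs=300):
    # logistic regression fitted by plain stochastic gradient descent
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            g = p - yi  # gradient of the log-loss for this example
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def score(text, vocab, w, b):
    # predicted probability that a sentence is relevant
    x = featurize([text], vocab)[0]
    return 1 / (1 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))

# toy training data: is a sentence relevant to sequence generation?
sents = ["the random sequence was computer generated",
         "patients were recruited from two clinics",
         "a random number table generated the sequence",
         "baseline characteristics were similar"]
labels = [1, 0, 1, 0]
vocab = ["random", "sequence", "computer", "patients", "baseline", "table"]
w, b = train_logreg(featurize(sents, vocab), labels)
```

Ranking sentences by `score` and reading from the top is what drives the reported AUC: a reviewer only needs to check the highest-scoring sentences rather than the whole article.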
Network meta-analysis (multiple treatments meta-analysis, mixed treatment comparisons) attempts to make the best use of a set of studies comparing more than two treatments. However, it is important to assess whether a body of evidence is consistent or inconsistent. Previous work on models for network meta-analysis that allow for heterogeneity between studies has either been restricted to two-arm trials or followed a Bayesian framework. We propose two new frequentist ways to estimate consistency and inconsistency models by expressing them as multivariate random-effects meta-regressions, which can be implemented in some standard software packages. We illustrate the approach using the mvmeta package in Stata. Copyright © 2012 John Wiley & Sons, Ltd.
The assumption of consistency, defined as agreement between direct and indirect sources of evidence, underlies the increasingly popular method of network meta-analysis. This assumption is often evaluated by statistically testing for a difference between direct and indirect estimates within each loop of evidence. However, the test is believed to be underpowered. We aim to evaluate its properties when applied to a loop typically found in published networks.
In a simulation study we estimate type I error, power and coverage probability of the inconsistency test for dichotomous outcomes using realistic scenarios informed by previous empirical studies. We evaluate test properties in the presence or absence of heterogeneity, using different estimators of heterogeneity and by employing different methods for inference about pairwise summary effects (Knapp-Hartung and inverse variance methods).
As expected, power is positively associated with sample size and frequency of the outcome and negatively associated with the presence of heterogeneity. Type I error converges to the nominal level as the total number of individuals in the loop increases. Coverage is close to the nominal level in most cases. Different estimation methods for heterogeneity do not greatly impact on test performance, but different methods to derive the variances of the direct estimates impact on inconsistency inference. The Knapp-Hartung method is more powerful, especially in the absence of heterogeneity, but exhibits larger type I error. The power for a ‘typical’ loop (comprising 8 trials and about 2000 participants) to detect a 35% relative change between direct and indirect estimation of the odds ratio was 14% for inverse variance and 21% for Knapp-Hartung methods (with type I error 5% in the former and 11% in the latter).
The study gives insight into the conditions under which the statistical test can detect important inconsistency in a loop of evidence. Although different methods to estimate the uncertainty of the mean effect may improve the test performance, this study suggests that the test has low power for the ‘typical’ loop. Investigators should interpret results very carefully and always consider the comparability of the studies in terms of potential effect modifiers.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2288-14-106) contains supplementary material, which is available to authorized users.
Mixed treatment comparison; Multiple interventions; Coherence; Consistency; Simulation study; Bias
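The direct-versus-indirect comparison that this test formalises can be sketched for a single A–B–C loop (the Bucher-style z-test). The log odds ratios and standard errors below are invented for illustration:

```python
import math

def loop_inconsistency(lor_ab, se_ab, lor_ac, se_ac, lor_cb, se_cb):
    """z-test for inconsistency in one closed loop of evidence."""
    # indirect A-vs-B estimate routed through the common comparator C
    lor_ind = lor_ac + lor_cb
    se_ind = math.sqrt(se_ac ** 2 + se_cb ** 2)
    # inconsistency factor: direct minus indirect
    diff = lor_ab - lor_ind
    se_diff = math.sqrt(se_ab ** 2 + se_ind ** 2)
    z = diff / se_diff
    # two-sided p-value from the standard normal distribution
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return diff, z, p

# perfectly consistent toy loop: direct equals indirect (0.5 = 0.3 + 0.2)
diff, z, p = loop_inconsistency(0.5, 0.15, 0.3, 0.1, 0.2, 0.1)
```

Because `se_diff` combines three sampling variances (plus any heterogeneity in a random-effects version), the test statistic is diluted; this is the low-power problem the simulation study quantifies.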
Livestock and poultry operations that feed large numbers of animals are common. Facility capacity varies, but it is not uncommon for facilities to house 1,000 swine with multiple barns at a single site, feedlots to house 50,000 cattle, and poultry houses to house 250,000 hens. Primary research suggests that livestock facilities that confine animals indoors for feeding can represent a health hazard for surrounding communities. In this protocol, we describe a review about the association between proximity to animal-feeding operations (AFOs) and the health of individuals in nearby communities. A systematic review of the topic was published by some members of our group in 2010. The purpose of this review is to update that review.
The populations of interest are people living in communities near livestock production facilities. Outcomes of interest are any health outcome measured in humans such as respiratory disease, gastrointestinal disease, and mental health. Measures of antibiotic resistance in people from the communities compared to measures of resistance found in animals and the environment on animal-feeding operations will also be summarized. The exposure of interest will be exposure to livestock production using a variety of metrics such as distance from facilities, endotoxin levels, and measures of odor. Electronic searches will be conducted using MEDLINE and MEDLINE In-Process (via OvidSP), CAB Abstracts (via Web of Knowledge), and Science Citation Index (via Web of Knowledge). No language or date restriction will be applied. We will assess the risk of bias using a pilot version of a tool developed by the methods groups of the Cochrane Collaboration for non-randomized studies of interventions.
We propose to conduct a meta-analysis for each health metric (e.g., combining all respiratory disease outcomes, combining all gastrointestinal outcomes). A planned subgroup analysis will be based on the domains of the risk of bias.
This systematic review will provide synthesis of current evidence reporting the association between living near an animal-feeding operation and human health.
Systematic review registration
Systematic review; Animal-feeding operations; Human health
The Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach is widely implemented in health technology assessment and guideline development organisations throughout the world. GRADE provides a transparent approach to reaching judgements about the quality of evidence on the effects of a health care intervention, but is complex and therefore challenging to apply in a consistent manner.
We developed a checklist to guide the researcher to extract the data required to make a GRADE assessment. We applied the checklist to 29 meta-analyses of randomised controlled trials on the effectiveness of health care interventions. Two reviewers used the checklist for each paper and used these data to rate the quality of evidence for a particular outcome.
For most (70%) checklist items, there was good agreement between reviewers. The main problems were for items relating to indirectness where considerable judgement is required.
There was consistent agreement between reviewers on most items in the checklist. The use of this checklist may be an aid to improving the consistency and reproducibility of GRADE assessments, particularly for inexperienced users or in rapid reviews without the resources to conduct assessments by two researchers independently.
GRADE; Checklist; Quality assessment
Several aggregate data meta-analyses suggest that treatment guided by the serum concentration of natriuretic peptides (B-type natriuretic peptide (BNP) or its derivative N-terminal pro-B-type natriuretic peptide (NT-BNP)) reduces all-cause mortality compared with usual care in patients with heart failure (HF). We propose to conduct a meta-analysis using individual participant data (IPD) to estimate the effect of BNP-guided therapy on clinical outcomes, and estimate the extent of effect modification for clinically important subgroups.
We will use standard systematic review methods to identify relevant trials and assess study quality. We will include all randomized controlled trials (RCTs) of BNP-guided treatment for HF that report a clinical outcome. The primary outcome will be time to all-cause mortality. We will collate anonymized, individual patient data into a single database, and carry out appropriate data checks. We will use fixed-effects and random-effects meta-analysis methods to combine hazard ratios (HR) estimated within each RCT, across all RCTs. We will also include a meta-analysis and meta-regression analyses based on aggregate data, and combine IPD with aggregate data if we obtain IPD for a subset of trials.
The IPD meta-analysis will allow us to estimate how patient characteristics modify treatment benefit, and to identify relevant subgroups of patients who are likely to benefit most from BNP-guided therapy. This is important because aggregate meta-analyses have suggested that clinically relevant subgroup effects exist, but these analyses have been unable to quantify the effects reliably or precisely.
PROSPERO 2013: CRD42013005335
Heart failure; B-type natriuretic peptide; individual participant data meta-analysis
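The within-trial hazard ratios would be combined by standard inverse-variance weighting. A minimal fixed-effect sketch (the trial values are invented for illustration):

```python
import math

def pool_fixed_effect(log_hrs, ses):
    """Inverse-variance fixed-effect pooling of per-trial log hazard ratios."""
    weights = [1.0 / s ** 2 for s in ses]
    pooled = sum(w * x for w, x in zip(weights, log_hrs)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# two hypothetical trials: HR 0.80 (SE of log HR 0.10) and HR 0.90 (SE 0.20)
pooled, se = pool_fixed_effect([math.log(0.80), math.log(0.90)], [0.10, 0.20])
pooled_hr = math.exp(pooled)
ci = (math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se))
```

A random-effects version adds an estimate of the between-trial variance to each trial's variance before weighting; with IPD, the per-trial log hazard ratios would themselves come from Cox models fitted within each trial, which is what makes subgroup-specific estimates feasible.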
In 2008, the Cochrane Collaboration introduced a tool for assessing the risk of bias in clinical trials included in Cochrane reviews. The risk of bias (RoB) tool is based on narrative descriptions of evidence-based methodological features known to increase the risk of bias in trials.
To assess the usability of this tool, we conducted an evaluation by means of focus groups, online surveys and a face-to-face meeting. We obtained feedback from a range of stakeholders within The Cochrane Collaboration regarding their experiences with, and perceptions of, the RoB tool and associated guidance materials. We then assessed this feedback in a face-to-face meeting of experts and stakeholders and made recommendations for improvements and further developments of the RoB tool.
The survey attracted 380 responses. Respondents reported taking an average of between 10 and 60 minutes per study to complete their RoB assessments, which 83% deemed acceptable. Most respondents (87% of authors and 95% of editorial staff) thought RoB assessments were an improvement over past approaches to trial quality assessment. Most authors liked the standardized approach (81%) and the ability to provide quotes to support judgements (74%). A third of participants disliked the increased workload and found the wording describing RoB judgements confusing. The RoB domains reported to be the most difficult to assess were incomplete outcome data and selective reporting of outcomes. Authors expressed the need for more guidance on how to incorporate RoB assessments into meta-analyses and review conclusions. Based on this evaluation, recommendations were made for improvements to the RoB tool and the associated guidance. The implementation of these recommendations is currently underway.
Overall, respondents identified positive experiences and perceptions of the RoB tool. Revisions of the tool and associated guidance made in response to this evaluation, and improved provision of training, may improve implementation.
Survey; Focus groups; Bias assessment; Quality assessment; Systematic reviews
Cell-free fetal DNA (cffDNA) can be detected in maternal blood during pregnancy, opening the possibility of early non-invasive prenatal diagnosis for a variety of genetic conditions. Since 1997, many studies have examined the accuracy of prenatal fetal sex determination using cffDNA, particularly for pregnancies at risk of an X-linked condition. Here we report a review and meta-analysis of the published literature to evaluate the use of cffDNA for prenatal determination (diagnosis) of fetal sex. We applied a sensitive search of multiple bibliographic databases including PubMed (MEDLINE), EMBASE, the Cochrane library and Web of Science.
Ninety studies, incorporating 9,965 pregnancies and 10,587 fetal sex results, met our inclusion criteria. Overall mean sensitivity was 96.6% (95% credible interval 95.2% to 97.7%) and mean specificity was 98.9% (95% credible interval 98.1% to 99.4%). These results varied very little with trimester or week of testing, indicating that the performance of the test is reliably high.
Based on this review and meta-analysis we conclude that fetal sex can be determined with a high level of accuracy by analyzing cffDNA. Using cffDNA in prenatal diagnosis to replace or complement existing invasive methods can remove or reduce the risk of miscarriage. Future work should concentrate on the economic and ethical considerations of implementing an early non-invasive test for fetal sex.
Cell-free fetal DNA; Meta-analysis; Non-invasive prenatal diagnosis
Background Many meta-analyses contain only a small number of studies, which makes it difficult to estimate the extent of between-study heterogeneity. Bayesian meta-analysis allows incorporation of external evidence on heterogeneity, and offers advantages over conventional random-effects meta-analysis. To assist in this, we provide empirical evidence on the likely extent of heterogeneity in particular areas of health care.
Methods Our analyses included 14,886 meta-analyses from the Cochrane Database of Systematic Reviews. We classified each meta-analysis according to the type of outcome, type of intervention comparison and medical specialty. By modelling the study data from all meta-analyses simultaneously, using the log odds ratio scale, we investigated the impact of meta-analysis characteristics on the underlying between-study heterogeneity variance. Predictive distributions were obtained for the heterogeneity expected in future meta-analyses.
Results Between-study heterogeneity variances for meta-analyses in which the outcome was all-cause mortality were found to be on average 17% (95% CI 10–26) of variances for other outcomes. In meta-analyses comparing two active pharmacological interventions, heterogeneity was on average 75% (95% CI 58–95) of variances for non-pharmacological interventions. Meta-analysis size was found to have only a small effect on heterogeneity. Predictive distributions are presented for nine different settings, defined by type of outcome and type of intervention comparison. For example, for a planned meta-analysis comparing a pharmacological intervention against placebo or control with a subjectively measured outcome, the predictive distribution for heterogeneity is a log-normal(−2.13, 1.58²) distribution, which has a median value of 0.12. In an example meta-analysis of six studies, incorporating external evidence led to a smaller heterogeneity estimate and a narrower confidence interval for the combined intervention effect.
Conclusions Meta-analysis characteristics were strongly associated with the degree of between-study heterogeneity, and predictive distributions for heterogeneity differed substantially across settings. The informative priors provided will be very beneficial in future meta-analyses including few studies.
Meta-analysis; heterogeneity; intervention studies; Bayesian analysis
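The predictive distributions can be used directly as priors. For the setting quoted above, the log-normal(−2.13, 1.58²) distribution for the heterogeneity variance τ² can be explored with a quick sketch (the simulation here is ours, for illustration only):

```python
import math
import random

# predictive distribution reported for pharmacological-vs-placebo
# comparisons with a subjectively measured outcome:
# tau^2 ~ log-normal(mu = -2.13, sigma = 1.58)
MU, SIGMA = -2.13, 1.58

median_tau2 = math.exp(MU)  # median of a log-normal is exp(mu), about 0.12

def sample_tau2(rng):
    """Draw one value of the between-study variance from the prior."""
    return math.exp(rng.gauss(MU, SIGMA))

rng = random.Random(42)
draws = sorted(sample_tau2(rng) for _ in range(100_000))
central_95 = (draws[2_500], draws[97_500])  # central 95% prior interval
```

The interval is wide on the τ² scale, reflecting genuine prior uncertainty about heterogeneity; combined with the likelihood from even a handful of studies, it typically yields a narrower posterior than estimating τ² from those few studies alone.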
Cochrane systematic reviews collate and summarise studies of the effects of healthcare interventions. The characteristics of these reviews and the meta-analyses and individual studies they contain provide insights into the nature of healthcare research and important context for the development of relevant statistical and other methods.
We classified every meta-analysis with at least two studies in every review in the January 2008 issue of the Cochrane Database of Systematic Reviews (CDSR) according to the medical specialty, the types of interventions being compared and the type of outcome. We provide descriptive statistics for numbers of meta-analyses, numbers of component studies and sample sizes of component studies, broken down by these categories.
We included 2321 reviews containing 22,453 meta-analyses, which themselves consist of data from 112,600 individual studies (which may appear in more than one meta-analysis). Meta-analyses in the areas of gynaecology, pregnancy and childbirth (21%), mental health (13%) and respiratory diseases (13%) are well represented in the CDSR. Most meta-analyses address drugs, either with a control or placebo group (37%) or in a comparison with another drug (25%). The median number of meta-analyses per review is six (inter-quartile range 3 to 12). The median number of studies included in the meta-analyses with at least two studies is three (inter-quartile range 2 to 6). Sample sizes of individual studies range from 2 to 1,242,071, with a median of 91 participants.
It is clear that the numbers of studies eligible for meta-analyses are typically very small for all medical areas, outcomes and interventions covered by Cochrane reviews. This highlights the particular importance of suitable methods for the meta-analysis of small data sets. There was little variation in number of studies per meta-analysis across medical areas, across outcome data types or across types of interventions being compared.
Background: Missing outcome data from randomized trials lead to greater uncertainty and possible bias in estimating the effect of an experimental treatment. An intention-to-treat analysis should take account of all randomized participants even if they have missing observations.
Purpose: To review and develop imputation methods for missing outcome data in meta-analysis of clinical trials with binary outcomes.
Methods: We review some common strategies, such as simple imputation of positive or negative outcomes, and develop a general approach involving ‘informative missingness odds ratios’ (IMORs). We describe several choices for weighting studies in the meta-analysis, and illustrate the methods using a meta-analysis of trials of haloperidol.
Results: IMORs describe the relationship between the unknown risk among missing participants and the known risk among observed participants. They are allowed to differ between treatment groups and across trials. Application of IMORs and other methods to the haloperidol trials reveals the overall conclusion to be robust to different assumptions about the missing data.
Limitations: The methods are based on summary data from each trial (number of observed positive outcomes, number of observed negative outcomes and number of missing outcomes) for each intervention group. This limits the options for analysis, and greater flexibility would be available with individual participant data.
Conclusions: We propose that available reasons for missingness be used to determine appropriate IMORs. We also recommend a strategy for undertaking sensitivity analyses, in which the IMORs are varied over plausible ranges.
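The core IMOR adjustment can be sketched as follows. Given a trial arm's observed events and missing count, an assumed IMOR converts the observed odds into odds for the missing participants (the numbers in the tests are invented, and the sketch assumes the observed risk is below 1):

```python
def imor_adjusted_risk(events, observed, missing, imor):
    """Overall event risk in one arm under an assumed IMOR.

    imor = 1 assumes missing participants behave like observed ones;
    imor = 0 assumes no missing participant had the event.
    """
    p_obs = events / observed
    # IMOR links the odds among missing to the odds among observed:
    # odds_missing = imor * odds_observed
    odds_missing = imor * p_obs / (1.0 - p_obs)
    p_miss = odds_missing / (1.0 + odds_missing)
    # pool observed and (imputed) missing participants
    return (observed * p_obs + missing * p_miss) / (observed + missing)
```

A sensitivity analysis then recomputes the meta-analysis while varying the IMOR in each arm over a plausible range (for example 1/2 to 2) and checks whether the pooled conclusion is robust, which is the strategy the abstract recommends.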
With the advent of high throughput genotyping technology and the information available via projects such as the Human Genome Project and the HapMap project, more and more data relevant to the study of genetics and disease risk will be produced. Systematic reviews and meta-analyses of human genome epidemiology studies rely on the ability to identify relevant studies and to obtain suitable data from these studies. A first port of call for most such reviews is a search of MEDLINE. We examined whether this could be usefully supplemented by identifying databases on the World Wide Web that contain genetic epidemiological information.
We conducted a systematic search for online databases containing genetic epidemiological information on gene prevalence or gene-disease association. In those containing information on genetic association studies, we examined what additional information could be obtained to supplement a MEDLINE literature search.
We identified 111 databases containing prevalence data, 67 databases specific to a single gene and only 13 that contained information on gene-disease associations. Most of the latter 13 databases were linked to MEDLINE, although five contained information that may not be available from other sources.
There is no single resource of structured data from genetic association studies covering multiple diseases, and in relation to the number of studies being conducted there is very little information specific to gene-disease association studies currently available on the World Wide Web. Until comprehensive data repositories are created and utilized regularly, new data will remain largely inaccessible to many systematic review authors and meta-analysts.
Making sense of rapidly evolving evidence on genetic associations is crucial to making genuine advances in human genomics and the eventual integration of this information in the practice of medicine and public health. Assessment of the strengths and weaknesses of this evidence, and hence the ability to synthesize it, has been limited by inadequate reporting of results. The STrengthening the REporting of Genetic Association studies (STREGA) initiative builds on the STrengthening the Reporting of OBservational Studies in Epidemiology (STROBE) Statement and provides additions to 12 of the 22 items on the STROBE checklist. The additions concern population stratification, genotyping errors, modelling haplotype variation, Hardy–Weinberg equilibrium, replication, selection of participants, rationale for choice of genes and variants, treatment effects in studying quantitative traits, statistical methods, relatedness, reporting of descriptive and outcome data and the volume of data issues that are important to consider in genetic association studies. The STREGA recommendations do not prescribe or dictate how a genetic association study should be designed, but seek to enhance the transparency of its reporting, regardless of choices made during design, conduct or analysis.
Epidemiology; gene-disease associations; gene-environment interaction; genetics; genome-wide association; meta-analysis; reporting recommendations; systematic review
Systematic reviews often provide recommendations for further research. When meta-analyses are inconclusive, such recommendations typically argue for further studies to be conducted. However, the nature and amount of future research should depend on the nature and amount of the existing research. We propose a method based on conditional power to make these recommendations more specific. Assuming a random-effects meta-analysis model, we evaluate the influence of the number of additional studies, of their information sizes and of the heterogeneity anticipated among them on the ability of an updated meta-analysis to detect a prespecified effect size. The conditional powers of possible design alternatives can be summarized in a simple graph which can also be the basis for decision making. We use three examples from the Cochrane Database of Systematic Reviews to demonstrate our strategy. We demonstrate that if heterogeneity is anticipated, it might not be possible for a single study to reach the desirable power no matter how large it is. Copyright © 2012 John Wiley & Sons, Ltd.
meta-analysis; power; sample size; evidence-based medicine; random effects; cumulative meta-analysis
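The flavour of the calculation can be sketched as follows. This is our simplified random-effects approximation, not the authors' exact conditional-power formulation, and all numbers are illustrative:

```python
import math

def norm_cdf(x):
    # standard normal cumulative distribution function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def updated_power(current_var, new_vars, tau2, delta):
    """Approximate power of an updated random-effects meta-analysis.

    current_var: variance of the existing pooled estimate
    new_vars:    within-study variances of the planned studies
    tau2:        anticipated heterogeneity among the new studies
    delta:       effect size to detect (e.g. a log odds ratio)
    """
    # each planned study contributes weight 1 / (v_i + tau2)
    total_weight = 1.0 / current_var + sum(1.0 / (v + tau2) for v in new_vars)
    se_updated = math.sqrt(1.0 / total_weight)
    z_crit = 1.959963984540054  # two-sided alpha = 0.05
    return norm_cdf(delta / se_updated - z_crit)
```

Because a planned study's weight is capped at 1/τ², no single study, however large, can push the power past the limit set by anticipated heterogeneity; that is the paper's central warning, and comparing `updated_power` across design alternatives is the kind of summary its graphs provide.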
Multivariate meta-analysis allows the joint synthesis of effect estimates based on multiple outcomes from multiple studies, accounting for the potential correlations among them. However, standard methods for multivariate meta-analysis for multiple outcomes are restricted to problems where the within-study correlation is known or where individual participant data are available. This paper proposes an approach to approximating the within-study covariances based on information about likely correlations between underlying outcomes. We developed methods for both continuous and dichotomous data and for combinations of the two types. An application to a meta-analysis of treatments for stroke illustrates the use of the approximated covariance in multivariate meta-analysis with correlated outcomes. Copyright © 2012 John Wiley & Sons, Ltd.
multivariate meta-analysis; correlated outcomes; nested events; delta method; within-study correlation
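The basic device is simple: when a study reports only a standard error per outcome, an assumed correlation ρ between the underlying outcomes yields an approximate within-study covariance. A minimal sketch (the values and the helper name are ours, for illustration):

```python
def within_study_cov(se1, se2, rho):
    """Approximate 2x2 within-study covariance matrix for two outcome
    estimates, given their standard errors and an assumed correlation rho."""
    cov = rho * se1 * se2
    return [[se1 ** 2, cov], [cov, se2 ** 2]]

# e.g. two correlated outcome estimates reported by one trial
S = within_study_cov(0.10, 0.20, 0.5)
```

These per-study matrices are then supplied as the known within-study covariances in the multivariate random-effects model; the paper's contribution is principled ways to choose ρ for continuous, dichotomous and mixed outcome types rather than ignoring the correlation altogether.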
Network meta-analysis is becoming more popular as a way to analyse multiple treatments simultaneously and, in the right circumstances, rank treatments. A difficulty in practice is the possibility of ‘inconsistency’ or ‘incoherence’, where direct evidence and indirect evidence are not in agreement. Here, we develop a random-effects implementation of the recently proposed design-by-treatment interaction model, using these random effects to model inconsistency and estimate the parameters of primary interest. Our proposal is a generalisation of the model proposed by Lumley and allows trials with three or more arms to be included in the analysis. Our methods also facilitate the ranking of treatments under inconsistency. We derive R and I² statistics to quantify the impact of the between-study heterogeneity and the inconsistency. We apply our model to two examples. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
inconsistency; mixed treatment comparisons; multiple treatments meta-analysis; network meta-analysis; sensitivity analysis