A systematic review can be concluded qualitatively, by discussing, comparing and tabulating the results of the various studies, or quantitatively, by statistically analysing the results from independent studies, that is, by conducting a meta‐analysis. Meta‐analysis has been defined by Glass20 as “the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings”. By combining individual studies it is possible to provide a single and more precise estimate of the treatment effects.11,21
However, the quantitative synthesis of results from a series of studies is meaningful only if these studies have been identified and collected in a proper and systematic way. This is why the systematic review always precedes the meta‐analysis and why the two methodologies are commonly used together. Ideally, the combination of individual study results into a single summary estimate is appropriate when the selected studies are targeted to a common goal, have similar clinical populations, and share the same study design. When the studies are thought to be too different (statistically or clinically), some researchers prefer not to calculate summary estimates. Reasons for not presenting the summary estimates are usually related to aspects of study heterogeneity such as clinical diversity (e.g. different metrics or outcomes, participant characteristics, different settings, etc.), methodological diversity (different study designs) and statistical heterogeneity.22
Some methods, however, are available for dealing with these problems in order to combine the study results.22
Nevertheless, the source of heterogeneity should always be explored using, for example, sensitivity analyses. In these analyses the primary studies are classified into different groups based on methodological and/or clinical characteristics and subsequently compared. Even after this subgroup analysis the studies included in the groups may still be statistically heterogeneous and therefore the calculation of a single estimate may be questionable.11,19
Statistical heterogeneity can be assessed with different tests, but the most popular are Cochran's Q23 and I². Although the latter is thought to be more powerful, it has been shown that their performance is similar24 and that these tests generally have low power. Therefore, their confidence intervals should always be presented in meta‐analyses and taken into consideration when interpreting heterogeneity. Although heterogeneity can be seen as a “statistical” problem, it is also an opportunity for obtaining important clinical information about the influences of specific clinical differences.11
Sometimes, the goal of a meta‐analysis is to explore the source of diversity among studies.15
In this situation the inclusion criteria are purposely allowed to be broader.
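As an illustration of how statistical heterogeneity might be quantified, the following minimal sketch computes Cochran's Q and the I² statistic that is commonly derived from it. The function name and the input values are hypothetical and serve only as an example; inverse‐variance weights are assumed.

```python
def heterogeneity(effects, variances):
    """Return Cochran's Q, its degrees of freedom, and I^2 (in percent)."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Q: weighted sum of squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to heterogeneity, floored at 0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


# Hypothetical effect estimates and variances from four studies
q, df, i2 = heterogeneity([0.10, 0.30, 0.35, 0.65], [0.03, 0.03, 0.05, 0.01])
print(f"Q = {q:.2f} on {df} df, I^2 = {i2:.0f}%")
```

Q is usually compared with a chi‐squared distribution on k − 1 degrees of freedom (k being the number of studies); that step is omitted here to keep the sketch free of external libraries.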
Meta‐analyses of observational studies
Although meta‐analyses usually combine results from RCTs, meta‐analyses of epidemiological studies (case‐control, cross‐sectional or cohort studies) are increasing in the literature, and therefore guidelines for conducting this type of meta‐analysis have been proposed (e.g. Meta‐analysis Of Observational Studies in Epidemiology, MOOSE25). Although the RCT is the study design providing the highest level of evidence, observational studies are used in situations where RCTs are not possible, such as when investigating the potential causes of a rare disease, the prevalence of a condition, or other etiological hypotheses.3,4,11
The two designs, however, usually address different research questions (e.g. efficacy versus effectiveness) and therefore the inclusion of both RCTs and observational studies in meta‐analyses would not be appropriate.11,15
Major problems of observational studies are the lack of a control group, the difficulty in controlling for confounding variables, and the high risk of bias.26
Nevertheless, observational studies and therefore the meta‐analyses of observational studies can be useful and are an important step in examining the effectiveness of treatments in healthcare.3,4,11
For meta‐analyses of observational studies, exploring the source of heterogeneity through sensitivity analyses is often the main aim. Of note, meta‐analyses themselves can be considered “observational studies of the evidence”11 and, as a consequence, they may be influenced by known and unknown confounders in the same way as primary observational studies.
Meta‐analyses based on individual patient data
While “traditional” meta‐analyses combine aggregate data (averages across the study participants, such as mean treatment effects, mean age, etc.) for calculating a summary estimate, it is possible (if data are available) to perform meta‐analyses using the individual participant data from which the aggregate data are derived.27‐29
Meta‐analyses based on individual participant data are increasing.28
This kind of meta‐analysis is considered the most comprehensive and has been regarded as the gold standard for systematic reviews.29,30
Of course, it is not possible to simply pool together the participants of various studies as if they came from a single, large trial. The analysis must be stratified by study so that the clustering of patients within the studies is retained, preserving the effects of the randomization used in the primary investigations and avoiding artifacts such as Simpson's paradox, in which the direction of an association changes when the data are pooled.11,15,28,29
There are several potential advantages of this kind of meta‐analysis such as consistent data checking, consistent use of inclusion and exclusion criteria, better methods for dealing with missing data, the possibility of performing the same statistical analyses across studies, and a better examination of the effects of participant‐level covariates.15,31,32
Unfortunately, meta‐analyses of individual patient data are often difficult and time consuming to conduct, and it is often not easy to obtain the original data needed to perform such an analysis.
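The need to stratify by study when pooling individual participant data can be illustrated with a small sketch of Simpson's paradox. The counts below are invented for illustration only, and the stratified estimate assumes Mantel‐Haenszel‐type weights, n_t·n_c/(n_t + n_c), which is one of several possible choices.

```python
# Hypothetical (events, participants) per arm; a higher proportion is better
studies = [
    {"treated": (81, 87), "control": (234, 270)},
    {"treated": (192, 263), "control": (55, 80)},
]


def risk_difference(treated, control):
    """Difference in event proportions between two arms."""
    return treated[0] / treated[1] - control[0] / control[1]


def naive_pooled_difference(studies):
    """Ignore study membership and lump all participants together."""
    t_events = sum(s["treated"][0] for s in studies)
    t_total = sum(s["treated"][1] for s in studies)
    c_events = sum(s["control"][0] for s in studies)
    c_total = sum(s["control"][1] for s in studies)
    return risk_difference((t_events, t_total), (c_events, c_total))


def stratified_difference(studies):
    """Weighted average of the within-study differences."""
    numerator = denominator = 0.0
    for s in studies:
        n_t, n_c = s["treated"][1], s["control"][1]
        weight = n_t * n_c / (n_t + n_c)  # Mantel-Haenszel-type weight
        numerator += weight * risk_difference(s["treated"], s["control"])
        denominator += weight
    return numerator / denominator


for s in studies:
    print("within study:", round(risk_difference(s["treated"], s["control"]), 3))
print("naive pooling:", round(naive_pooled_difference(studies), 3))
print("stratified:", round(stratified_difference(studies), 3))
```

With these invented counts the treatment looks better within each study, yet worse when all participants are lumped together, because the arms are unevenly distributed across the two studies. The stratified estimate keeps the sign of the within‐study comparisons.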
Cumulative and Bayesian meta‐analyses
Another form of meta‐analysis is the so‐called “cumulative meta‐analysis”. Cumulative meta‐analyses recognize the cumulative nature of scientific evidence and knowledge.11
In cumulative meta‐analysis a new relevant study on a given topic is added whenever it becomes available. Therefore, a cumulative meta‐analysis shows the pattern of evidence over time and can identify the point when a treatment becomes clinically significant.11,15,33
Cumulative meta‐analyses are not updated meta‐analyses since there is not a single pooling but the results are summarized as each new study is added.33
As a consequence, in the forest plot, commonly used for displaying the effect estimates, the horizontal lines represent the treatment effect estimates as each study is added and not the results of the single studies. Cumulative meta‐analyses should be interpreted within the Bayesian framework even if they differ from the “pure” Bayesian approach to meta‐analysis.
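A cumulative meta‐analysis can be sketched as a loop that re‐pools the evidence each time a study is added in chronological order. The example below is only a sketch: it assumes a fixed‐effect inverse‐variance pooling and uses hypothetical trial data and function names.

```python
import math


def pool_fixed(effects, ses):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def cumulative_meta_analysis(studies, z=1.96):
    """studies: list of (year, effect, standard_error) tuples.

    Returns one row per added study: (year, k, estimate, lower, upper).
    """
    ordered = sorted(studies, key=lambda s: s[0])  # oldest study first
    rows = []
    for k in range(1, len(ordered) + 1):
        subset = ordered[:k]  # all studies available up to this point
        est, se = pool_fixed([s[1] for s in subset], [s[2] for s in subset])
        rows.append((subset[-1][0], k, est, est - z * se, est + z * se))
    return rows


# Hypothetical trials: (publication year, effect estimate, standard error)
trials = [(2005, 0.40, 0.30), (1998, 0.10, 0.40), (2010, 0.35, 0.15), (2001, 0.55, 0.35)]
for year, k, est, low, high in cumulative_meta_analysis(trials):
    print(f"{year} (k={k}): {est:.2f} [{low:.2f}, {high:.2f}]")
```

Each printed row corresponds to one horizontal line of a cumulative forest plot, that is, the pooled estimate after the study of that year has been added.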
The Bayesian approach differs from the classical, or frequentist, methods of meta‐analysis in that data and model parameters are considered to be random quantities and probability is interpreted as uncertainty rather than frequency.11,15,34
Compared to the frequentist methods, the Bayesian approach incorporates prior distributions, which can be specified based on a priori beliefs about the parameters (these being unknown random quantities), and the evidence coming from the study is described as a likelihood function.11,15,34
The combination of prior distribution and likelihood function gives the posterior probability density function.34
The uncertainty around the posterior effect estimate is defined as a credibility interval, which is the equivalent of the confidence interval in the frequentist approach.11,15,34
Although Bayesian meta‐analyses are increasing, they are still less common than traditional (frequentist) meta‐analyses.
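The combination of prior and likelihood can be illustrated with the simplest conjugate case. The sketch below assumes a normal prior on the treatment effect and a normal likelihood summarized by an estimate and its standard error; these distributional choices, the function names and the numbers are assumptions made for illustration.

```python
import math


def normal_posterior(prior_mean, prior_sd, estimate, se):
    """Posterior mean and SD for a normal prior and a normal likelihood."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    # Posterior mean: precision-weighted average of prior mean and estimate
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, math.sqrt(post_var)


def credibility_interval(mean, sd, z=1.96):
    """Central 95% interval of a normal posterior (z = 1.96)."""
    return mean - z * sd, mean + z * sd


# Hypothetical sceptical prior centred on no effect, and a pooled estimate
post_mean, post_sd = normal_posterior(prior_mean=0.0, prior_sd=0.5, estimate=0.4, se=0.2)
low, high = credibility_interval(post_mean, post_sd)
print(f"posterior mean {post_mean:.2f}, 95% interval [{low:.2f}, {high:.2f}]")
```

The posterior mean lies between the prior mean and the observed estimate, and moves closer to the estimate as the standard error decreases.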
Conducting a systematic review and meta‐analysis
As aforementioned, a systematic review must follow well‐defined and established methods. One reference source of practical guidelines for properly applying methodological principles when conducting systematic reviews and meta‐analyses is the Cochrane Handbook for Systematic Reviews of Interventions, which is available for free online.12
However, other guidelines and textbooks on systematic reviews and meta‐analysis are available.11,13,14,15
Similarly, authors of reviews should report the results in a transparent and complete way, and for this reason an international group of experts developed and published the QUOROM (Quality Of Reporting Of Meta‐analyses)16 and, more recently, the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta‐Analyses)17 guidelines, which address the reporting of systematic reviews and meta‐analyses of studies that evaluate healthcare interventions.17,18
In this section the authors briefly present the principal steps necessary for conducting a systematic review and meta‐analysis, derived from available reference guidelines and textbooks in which all the contents (and much more) of the following section can be found.11,12,14
A summary of the steps is presented in Figure 1. The methods are similar to those of any other study and start with a careful development of the review protocol, which includes the definition of the research question, the collection and analysis of data, and the interpretation of the results. The protocol defines the methods that will be used in the review and should be set out before starting the review in order to avoid bias; any deviation from it should be reported and justified in the manuscript.
Figure 1. Steps in conducting a systematic review. Modified from11,14
Step 1. Defining the review question and eligibility criteria
The authors should start by formulating a precise research question, which means they should clearly report the objectives of the review and what question they would like to address. If necessary, a broad research question may be divided into more specific questions. According to the PICOS framework,35,36 the question should define the Population, Intervention(s), Comparison(s), Outcome(s) and Study design(s). This information will also provide the rationale for the inclusion and exclusion criteria, for which a background section explaining the context and the key conceptual issues may also be needed. When using terms that may have different interpretations, operational definitions should be provided. An example is the term “neuromuscular control”, which can be interpreted in different ways by different researchers and practitioners. Furthermore, the inclusion criteria should be precise enough to allow the selection of all the studies relevant for answering the research question. In theory, only the best evidence available should be used for systematic reviews. Unfortunately, the use of an appropriate design (e.g. RCT) does not ensure that the study was well conducted. However, the use of cut‐offs in quality scores as inclusion criteria is not appropriate given their subjective nature, and a sensitivity analysis comparing all available studies based on some key methodological characteristics is preferable.
Step 2. Searching for studies
The search strategy must be clearly stated and should allow the identification of all the relevant studies. The search strategy is usually based on the PICOS elements and can be conducted using electronic databases, reading the reference lists of relevant studies, hand‐searching journals and conference proceedings, contacting authors, experts in the field and manufacturers, for example.
Currently, it is possible to easily search the literature using electronic databases. However, the use of only one database does not ensure that all the relevant studies will be found, and therefore various databases should be searched. The Physiotherapy Evidence Database (PEDro: http://www.pedro.org.au) provides free access to RCTs (about 18,000) and systematic reviews (almost 4000) on musculoskeletal and orthopaedic physiotherapy (sports being represented by more than 60%). Other available electronic databases are MEDLINE (through PubMed), EMBASE, SCOPUS, CINAHL, Web of Science (Thomson Reuters) and The Cochrane Controlled Trials Register. The necessity of using different databases is justified by the fact that, for example, 1800 journals indexed in MEDLINE are not indexed in EMBASE, and vice versa.
The creation and selection of appropriate keywords and search term lists is important for finding the relevant literature, ensuring that the search is highly sensitive without compromising precision. Developing the search strategy is therefore not easy and should be done carefully, taking into consideration the differences between databases and search interfaces. Although Boolean operators (e.g. AND, OR, NOT) and proximity operators (e.g. NEAR, NEXT) are usually available, every database interface has its own search syntax (e.g. different truncation and wildcards) and a different thesaurus for indexing (e.g. MeSH for MEDLINE and EMTREE for EMBASE). Filters already developed for specific topics are also available. For example, PEDro has filters included in search strategies (called SDIs) that are used regularly and automatically in some of the above‐mentioned databases for retrieving guidelines, RCTs, and systematic reviews.37
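The logic of a Boolean search string (synonyms joined with OR within each concept, concepts joined with AND) can be sketched as follows. The search terms are hypothetical examples and the output uses a generic syntax; as noted above, truncation, field tags and thesaurus terms differ between interfaces, so the string would need to be adapted for each database.

```python
def build_query(concepts):
    """Combine synonym lists into a generic Boolean search string.

    concepts: dict mapping a PICOS element to a list of search terms.
    """
    groups = []
    for terms in concepts.values():
        if not terms:
            continue  # skip PICOS elements with no search terms
        # Quote multi-word phrases so they are searched as a unit
        quoted = [f'"{t}"' if " " in t else t for t in terms]
        groups.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(groups)


# Hypothetical terms for a review on injury prevention
query = build_query({
    "population": ["athletes", "soccer players"],
    "intervention": ["neuromuscular training"],
    "outcome": ["injury", "injuries"],
})
print(query)
```

Adding synonyms within a group increases sensitivity, while adding a further AND‐linked concept increases precision at the cost of possibly missing relevant studies.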
After performing the literature search using electronic databases, however, other search strategies should be adopted, such as browsing the reference lists of primary and secondary literature and hand searching journals that are not indexed. Internet sources such as specialized websites can also be used for retrieving grey literature (e.g. unpublished papers, reports, conference proceedings, theses or any other publications produced by governments, institutions, associations, universities, etc.). Attempts may also be made to find unpublished studies, if any, in order to reduce the risk of publication bias (the tendency to publish positive results or results going in the same direction). Similarly, the selection of only English‐language studies may exacerbate the bias, since authors may tend to publish more positive findings in international journals and more negative results in local journals. Unpublished and non‐English studies generally have lower quality and their inclusion may also introduce a bias. There is no rule for deciding whether or not to include unpublished studies, or whether to restrict the review to English‐language studies. The authors are usually invited to think about the influence of these decisions on the findings and/or explore the effects of their inclusion with a sensitivity analysis.
Step 3. Selecting the studies
The selection of the studies should be conducted by more than one reviewer, as this process is quite subjective (the agreement between reviewers, quantified using the kappa statistic, should be reported together with the reasons for disagreements). Before selecting the studies, the results of the different searches are merged using reference management software and duplicates are deleted. After an initial screening of titles and abstracts, in which the obviously irrelevant studies are removed, the full papers of potentially relevant studies should be retrieved and selected based on the previously defined inclusion and exclusion criteria. In case of disagreement, a consensus should be reached by discussion or with the help of a third reviewer. Direct contact with the author(s) of the study may also help in clarifying a decision.
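The agreement between two reviewers can be quantified as in the following sketch, which assumes the unweighted Cohen's kappa for two raters. The screening decisions are invented for illustration.

```python
def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters judging the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must judge the same, non-empty set of items")
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of items on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if expected == 1.0:
        return 1.0  # both raters used a single, identical category
    return (observed - expected) / (1.0 - expected)


# Hypothetical screening decisions of two reviewers on eight abstracts
reviewer_1 = ["include", "include", "exclude", "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude", "exclude", "include", "exclude", "include"]
print(f"kappa = {cohens_kappa(reviewer_1, reviewer_2):.2f}")
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which matters in screening because most records are typically excluded by both reviewers.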
An important phase at this step is the assessment of quality. The use of quality scores for weighting the studies entered in the meta‐analysis is not recommended, nor is it recommended to include in a meta‐analysis only studies above a cut‐off quality score. However, the quality of the studies must be considered when interpreting the results of a meta‐analysis. This can be done qualitatively or quantitatively through subgroup and sensitivity analyses based on important methodological aspects, which can be assessed using checklists that are preferable to quality scores. If quality scores are nevertheless to be used for weighting, alternative statistical techniques have been proposed.38
The assessment of quality should be performed by two independent observers. The Cochrane handbook, however, makes a distinction between study quality and risk of bias (related for example to the method used to generate random allocation, concealment, blindness, etc.), focusing more on the latter. As for quality assessment, the risk of bias should be taken into consideration when interpreting the findings of the meta‐analysis. The quality of a study is generally assessed based on the information reported in the studies, thus linking the quality of reporting to the quality of the research itself, which is not necessarily true. Furthermore, a study conducted at the highest possible standard may still have high risk of bias. In both cases, however, it is important that the authors of primary studies appropriately report the results, and for this reason guidelines have been created for improving the quality of reporting, such as the CONSORT (Consolidated Standards of Reporting Trials39) and the STROBE (Strengthening the Reporting of Observational Studies in Epidemiology40).
Step 4. Data extraction
Data extraction must be accurate and unbiased and therefore, to reduce possible errors, it should be performed by at least two researchers. Standardized data extraction forms should be created, tested, and if necessary modified before implementation. The extraction forms should be designed taking into consideration the research question and the planned analyses. Information extracted can include general information (author, title, type of publication, country of origin, etc.), study characteristics (e.g. aims of the study, design, randomization techniques, etc.), participant characteristics (e.g. age, gender, etc.), intervention and setting, outcome data and results (e.g. statistical techniques, measurement tool, number of follow up, number of participants enrolled, allocated, and included in the analysis, results of the study such as odds ratio, risk ratio, mean difference and confidence intervals, etc.). Disagreements should be noted and resolved by discussing and reaching a consensus. If needed, a third researcher can be involved to resolve the disagreement.
Step 5. Analysis and presentation of the results (data synthesis)
Once the data are extracted, they are combined, analyzed, and presented. This data synthesis can be done quantitatively using statistical techniques (meta‐analysis), or qualitatively using a narrative approach when pooling is not believed to be appropriate. Irrespective of the approach (quantitative or qualitative), the synthesis should start with a descriptive summary (in tabular form) of the included studies. This table usually includes details on, for example, study type, interventions, sample sizes, participant characteristics, and outcomes. The quality assessment or the risk of bias should also be reported. For narrative reviews a comprehensive synthesis framework has been proposed.14,41
Standardization of outcomes
To allow comparison between studies, the results of the studies should be expressed in a standardized format such as effect sizes. The effect size chosen for standardizing the outcomes should be comparable across studies, calculable from the data available in the original articles, and interpretable. When the outcomes of the primary studies are reported as means and standard deviations, the effect size can be the raw (unstandardized) difference in means (D), the standardized difference in means (d or g) or the response ratio (R). If the results are reported in the studies as binary outcomes, the effect sizes can be the risk ratio (RR), the odds ratio (OR) or the risk difference (RD).15
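The following sketch shows how some of these effect sizes might be computed from summary data. It assumes the usual pooled‐standard‐deviation form of d and the common small‐sample correction factor 1 − 3/(4·df − 1) for g; the function names and input values are hypothetical.

```python
import math


def standardized_mean_difference(m1, sd1, n1, m2, sd2, n2):
    """Return (d, g): standardized mean difference and its corrected version."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    d = (m1 - m2) / pooled_sd
    g = d * (1.0 - 3.0 / (4.0 * df - 1.0))  # small-sample bias correction
    return d, g


def binary_effect_sizes(events_t, n_t, events_c, n_c):
    """Return (RR, OR, RD) for treated versus control from 2x2 counts."""
    risk_t, risk_c = events_t / n_t, events_c / n_c
    rr = risk_t / risk_c
    odds_ratio = (events_t / (n_t - events_t)) / (events_c / (n_c - events_c))
    rd = risk_t - risk_c
    return rr, odds_ratio, rd


# Hypothetical continuous outcome: mean, SD and n in each group
d, g = standardized_mean_difference(10.0, 2.0, 20, 8.0, 2.0, 20)
# Hypothetical binary outcome: 10/100 events (treated) vs 20/100 (control)
rr, odds_ratio, rd = binary_effect_sizes(10, 100, 20, 100)
print(f"d = {d:.2f}, g = {g:.2f}, RR = {rr:.2f}, OR = {odds_ratio:.2f}, RD = {rd:.2f}")
```

Ratio measures such as RR and OR are usually log‐transformed before pooling; this step and the corresponding standard errors are left out of the sketch.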
When a quantitative approach is chosen, meta‐analytical techniques are used. Textbooks and courses are available for learning statistical meta‐analytical techniques. Once a summary statistic is calculated for each study, a “pooled” effect estimate of the interventions is determined as the weighted average of individual study estimates, so that the larger studies have more “weight” than the small studies. This is necessary because small studies are more affected by the role of chance.11,15
The two main statistical models used for combining the results are the “fixed‐effect” and the “random‐effects” models. Under the fixed‐effect model, it is assumed that the variability between studies is due only to random variation because there is only one true (common) effect. In other words, it is assumed that all the studies estimate the same treatment effect and therefore the effects are part of the same distribution. A common method for weighting each study is the inverse‐variance method, where the weight is given by the inverse of the variance of each estimate. Therefore, the two essential data required for this calculation are the estimate of the effect and its standard error. On the other hand, the “random‐effects” model assumes a different underlying effect for each study (the true effect varies from study to study). Therefore, the study weight takes into account two sources of error: the between‐ and within‐studies variance. As in the fixed‐effect model, the weight is calculated using the inverse‐variance method, but in the random‐effects model the study‐specific standard errors are adjusted to incorporate both within‐ and between‐studies variance. For this reason, the confidence intervals obtained with random‐effects models are usually wider. In theory, the fixed‐effect model can be applied when the results are not heterogeneous, while the random‐effects model can be applied when the studies are heterogeneous. However, the statistical tests for examining heterogeneity lack power and, as aforementioned, the heterogeneity should be carefully scrutinized (e.g. interpreting the confidence intervals) before taking a decision. Sometimes, both fixed‐ and random‐effects models are used for examining the robustness of the analysis. Once the analyses are completed, results should be presented as point estimates with the corresponding confidence intervals and exact p‐values.
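The inverse‐variance weighting under the two models can be sketched as follows. The between‐studies variance is estimated here with the DerSimonian and Laird moment estimator, which is only one common choice and is an assumption of this example, as are the function names and the data.

```python
import math


def pool(effects, ses, tau2=0.0):
    """Inverse-variance pooled estimate and standard error.

    tau2 = 0 gives the fixed-effect model; tau2 > 0 adds the
    between-studies variance to each study's within-study variance.
    """
    weights = [1.0 / (se ** 2 + tau2) for se in ses]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))


def dersimonian_laird_tau2(effects, ses):
    """Moment estimate of the between-studies variance (floored at zero)."""
    weights = [1.0 / se ** 2 for se in ses]
    fixed, _ = pool(effects, ses)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    return max(0.0, (q - df) / c) if c > 0 else 0.0


# Hypothetical effect estimates and standard errors from five studies
effects = [0.10, 0.45, 0.30, 0.80, 0.25]
ses = [0.15, 0.20, 0.10, 0.25, 0.12]
tau2 = dersimonian_laird_tau2(effects, ses)
for label, (est, se) in [("fixed", pool(effects, ses)), ("random", pool(effects, ses, tau2))]:
    print(f"{label}: {est:.3f} [{est - 1.96 * se:.3f}, {est + 1.96 * se:.3f}]")
```

Because the between‐studies variance is added to every study's variance, the random‐effects weights are more similar to each other and the resulting confidence interval is at least as wide as the fixed‐effect one.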
Beyond the calculation of the individual study and summary estimates, other analyses are necessary. As mentioned several times, the exploration of possible sources of heterogeneity is important and can be performed using sensitivity, subgroup, or regression analyses. Using meta‐regression, it is also possible to examine the effects of differences in study characteristics on the treatment effect estimate. In meta‐regression, the larger studies have more influence than smaller studies; as with other analyses, its limitations should be taken into account before deciding to use it and when interpreting the results.
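A meta‐regression with a single study‐level covariate can be sketched as a weighted least squares fit in which each study is weighted by the inverse of its variance, so that larger (more precise) studies have more influence. This is a simplified fixed‐effect sketch with hypothetical names and data; it does not estimate residual heterogeneity or the standard errors of the coefficients.

```python
def weighted_meta_regression(covariate, effects, variances):
    """Weighted least squares fit of effect = intercept + slope * covariate."""
    weights = [1.0 / v for v in variances]  # more precise studies weigh more
    total = sum(weights)
    x_bar = sum(w * x for w, x in zip(weights, covariate)) / total
    y_bar = sum(w * y for w, y in zip(weights, effects)) / total
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, covariate, effects))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, covariate))
    if sxx == 0:
        raise ValueError("the covariate must vary across studies")
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical covariate: weeks of intervention in each study
weeks = [4, 6, 8, 12, 16]
effects = [0.15, 0.20, 0.35, 0.40, 0.60]
variances = [0.04, 0.02, 0.03, 0.01, 0.05]
intercept, slope = weighted_meta_regression(weeks, effects, variances)
print(f"effect = {intercept:.3f} + {slope:.3f} * weeks")
```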
The results of each trial are commonly displayed with their corresponding confidence intervals in the so‐called “forest plot” (Figure 3). In the forest plot each study is represented by a square and a horizontal line indicating the confidence interval, where the dimension of the square reflects the weight of the study. A solid vertical line usually corresponds to no effect of treatment. The summary point estimate is usually represented with a diamond at the bottom of the graph, with the horizontal extremities indicating the confidence interval. This graphic solution gives an immediate overview of the results.
Figure 3. Example of a forest plot: the squares represent the effect estimate of the individual studies and the horizontal lines indicate the confidence interval; the dimension of the square reflects the weight of each study. The diamond represents the summary point.
An alternative graphic solution, called a funnel plot, can be used for investigating the effects of small studies and for identifying publication bias. The funnel plot is a scatter‐plot of the effect estimates of individual studies against measures of study size or precision (commonly the standard error, although the use of sample size is still common). If there is no publication bias the funnel plot will be symmetrical. However, the funnel plot examination is subjective, based upon visual inspection, and therefore can be unreliable. In addition, other causes may influence the symmetry of the funnel plot, such as the measures used for estimating the effects and precision, and differences between small and large studies.14 Therefore, it should be used and interpreted with caution.
Example of symmetric (A) and asymmetric (B) funnel plots.
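The coordinates of a funnel plot can be prepared as in the sketch below, which pairs each effect estimate with its standard error and adds the pseudo 95% limits (pooled estimate ± 1.96 × standard error) that form the funnel. The fixed‐effect pooled estimate as the centre line, the function name and the data are assumptions made for illustration; the sketch does not test for asymmetry.

```python
def funnel_points(effects, ses, z=1.96):
    """Return the pooled estimate and one dict per study for plotting."""
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    points = []
    for y, se in zip(effects, ses):
        lower, upper = pooled - z * se, pooled + z * se  # funnel limits at this SE
        points.append({
            "effect": y,          # x-axis
            "se": se,             # y-axis (usually drawn inverted)
            "lower": lower,
            "upper": upper,
            "inside": lower <= y <= upper,
        })
    return pooled, points


# Hypothetical data: three precise studies and one small, extreme study
pooled, points = funnel_points([0.5, 0.5, 0.5, 1.5], [0.1, 0.1, 0.1, 0.3])
for p in points:
    print(p["effect"], p["se"], "inside funnel" if p["inside"] else "outside funnel")
```

A cluster of small studies falling on one side only of the centre line is the kind of asymmetry that visual inspection looks for, with the caveats on interpretation given above.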
Step 6. Interpretation of the results
The final part of the process pertains to the interpretation of the results. When interpreting or commenting on the findings, the limitations should be discussed and taken into account, such as the overall risk of bias and the specific biases of the studies included in the systematic review, and the strength of the evidence. Furthermore, the interpretation should not be based solely on P‐values, but rather on the uncertainty and the clinical/practical importance of the findings. Ideally, the interpretation should help the clinician in understanding how to apply the findings in practice, provide recommendations or implications for policies, and offer directions for further research.