PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptNIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
J Appl Stat. Author manuscript; available in PMC Jan 1, 2012.
Published in final edited form as:
J Appl Stat. Jan 1, 2011; 38(1): 15–27.
doi:  10.1080/02664760903008987
PMCID: PMC3017356
NIHMSID: NIHMS151352
A Unified Approach of Meta-Analysis: Application to an Antecedent Biomarker Study in Alzheimer’s Disease
Chengjie Xiong, PhD,1 Gerald van Belle, PhD,2 Kejun Zhu, PhD,3 J. Philip Miller,1 and John C. Morris, MD4,5
1 Division of Biostatistics, Washington University in St. Louis, St. Louis, MO 63110, U.S.A
4 Department of Neurology, Washington University in St. Louis, St. Louis, MO 63110, U.S.A
5 Department of Pathology and Immunology, Washington University in St. Louis, St. Louis, MO 63110, U.S.A
2 School of Public Health & Community Medicine, Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, WA 98195-4691, U.S.A
3 College of Management Science, China University of Geosciences, Wuhan, PR. China
Corresponding Author: Chengjie Xiong, Ph.D., Division of Biostatistics, Campus Box 8067, Washington University in St. Louis, St. Louis, MO 63110, U.S.A, Phone: (314)3623635, Fax: (314)362-2693, chengjie/at/wubios.wustl.edu
SAS code for implementing the proposed methodology can be requested to the corresponding author through email at chengjie/at/wubios.wustl.edu.
This article provides a unified methodology of meta-analysis that synthesizes medical evidence by using both available individual patient data (IPD) and published summary statistics within the framework of likelihood principle. Most up-to-date scientific evidence on medicine is crucial information not only to consumers but also to decision makers, and can only be obtained when existing evidence from the literature and the most recent individual patient data are optimally synthesized. We propose a general linear mixed effects model to conduct meta-analyses when individual patient data are only available for some of the studies and summary statistics have to be used for the rest of the studies. Our approach includes both the traditional meta-analyses in which only summary statistics are available for all studies and the other extreme case in which individual patient data are available for all studies as special examples. We implement the proposed model with statistical procedures from standard computing packages. We provide measures of heterogeneity based on the proposed model. Finally, we demonstrate the proposed methodology through a real life example studying the cerebrospinal fluid biomarkers to identify individuals with high risk of developing Alzheimer’s disease when they are still cognitively normal.
Keywords: confidence interval, general linear mixed effects model, heterogeneity index, individual patient data, maximum likelihood estimate (MLE), meta-analyses
Systematic reviews have become increasingly used as a means of assessing and interpreting the results from medical research. Such reviews aim to comprehensively identify and assess all studies relevant to a given scientific question, and meta-analysis has been the major statistical methodology for the quantitative synthesis of study results. Many methods for meta-analysis are available, and the most popularly applied in the medical research focus on the optimum combination of published summary statistics in some form of weighted averages.14 Usually, each study is given a weight according to the precision of its results on summary statistics. Studies with good precisions are weighted more heavily than studies with greater uncertainty. The variance for the overall estimate of the parameter under study in meta-analyses is in general from two different sources, one is associated with the individual studies (i.e., the within-study variance), and the other is associated with the possible difference between different studies (i.e., between-study variance). When the between-study variance is assumed to be 0, each study is simply weighted according to its own variance. This approach characterizes a fixed effects model which is exemplified by the Mantel-Haenszel method56 or the Peto method.7 When the between-study variance is not zero, the random effects models6 add the between-study variance to the within-study variance of each individual study when the overall mean of the random effects is estimated. Fixed effects and random effects model for continuous outcomes have also been described.8
Published summary statistics may come from different analytic strategies and may be reported in very inconsistent formats in the literature. Therefore, important scientific questions may not always be adequately addressed by meta-analyses based on summary statistics alone.9 Further, even if all summary statistics were adequately and consistently reported in studies used in a meta-analysis, the lack of individual patient data might result in a loss of information for estimating the model parameters based on the likelihood principle.10 This is partly due to the fact that the likelihood function from the summary statistics is in general not equivalent to the likelihood function from individual patient data if the summary statistics are no longer sufficient statistics of the individual patient data.11 Because of these, meta-analysis using individual patient data has been widely considered the gold standard of meta-analyses.12 This approach provides several advantages, including the ability to use the full likelihood function from the individual patient data for the statistical inferences, the potential to produce consistent analyses across different studies, the use of up-to-date data, the possibility to avoid biases associated with use of aggregate data in meta-regression,1315 and the potential to generate additional scientific hypotheses, particularly related to the individual patient characteristics data where the data would be typically lacking in published results.16 The statistical methodologies of meta-analyses of individual patient data have recently been studied by several authors for different scenarios, including a multilevel model framework for the meta-analyses of clinical trials when the outcome variables are binary,17 a general linear mixed model for the case of continuous outcome variables,9 and another general framework of meta-analyses for ordinal outcomes based on proportional odds model.18 Other approaches have also appeared in the literature.1923
Obtaining and analyzing individual patient data can be both costly and time consuming. In some cases, it is practically impossible to obtain individual patients data from all qualified studies in a meta-analysis.16 Therefore, researchers are very likely confronted with the situation in their meta-analyses that individual patient data are only available for some of the studies and summary statistics reported in the literature have to be used for the others. Whereas statistical methods for meta-analysis using either summary statistics or individual patient data across all studies have been well developed, little attention has been paid to the case when individual patient data are only available for some of the studies and summary statistics have to be used for the rest of the studies in the meta-analysis. This paper aims to address this very question. We propose a likelihood based approach for meta-analyses based on both individual patient data and summary statistics. We also implement the proposed methodology with standard computing packages. Finally, we demonstrate our proposed methodology by presenting an example to study cerebrospinal fluid (CSF) biomarkers that can be used to identify individuals with high risk of developing Alzheimer’s disease (AD) when they are still cognitively normal.
We focus on the situation when the outcome variable Y is continuous in cross-sectional studies. We assume that the continuous outcome variable can be modeled as a linear regression function of covariates within each study used in the meta-analyses. The choice of a linear regression model is flexible enough as it contains the meta-analyses not only for controlled clinical trials in which the treatment arms can be represented by dummy variables (i.e., covariates) but also for observational studies in which the association between the outcome variable and covariates is often the primary interest. We focus on the case when only one covariate X is used in the linear regression model, but point out that the generalization of our method to more than 1 covariate is straightforward. Therefore, our major scientific interest in the meta-analysis is to estimate the overall regression parameters (i.e., the slope and the intercept) by pooling all study specific regression parameters together.
Assume that a total of k studies are used in a meta-analysis. Let yij be the outcome of patient j in study i. Let xij be the covariate of patient j in study i. For study i, given equation M1 (t represents the matrix transpose),
equation M2
(1)
where uij is typically 1, equation M3 is the study specific vector of regression coefficients, and εij’s are assumed independently and identically distributed as equation M4 among individuals within the study. Notice that model (1) implies the conditional independence of yij’s within the study, given equation M5. In general, the major interest in study i is the estimate of β1i. In the case of controlled clinical trials assessing the effect of a novel treatment for a disease against a common control, if xij is coded as 0 for control and 1 for the treatment, then β1i represents the treatment effect (relative to the control), and β0i is the mean response from the control arm in trial i. In the case of observational studies between the outcome and the covariate, β1i represents the association between the outcome and the covariate in study i.
We further assume a random effect among studies by assuming that equation M6 follows a bivariate normal distribution with mean vector equation M7 and covariance matrix
equation M8
Notice that a correlation between the intercept and the slope is introduced in this assumption, which conceptually allows the possibility that the effect size of the treatment in clinical trials of two arms is associated with the level of the outcome measures in the control group across the trials. Further, this assumption also makes it necessary to estimate the intercept parameter even if the main interest of the meta-analysis is to estimate the slope in the model. If all individual patient data are available from all studies in the meta-analysis, the model becomes a standard general linear mixed effects model in which intercepts and slopes vary randomly across studies, i.e.,
equation M9
(2)
where equation M10 follows a bivariate normal distribution with mean vector 0 and covariance matrix A, and is assumed independent of εij.
We now assume that for the first s(1≤ sk) studies, only published summary statistics are available. Let equation M11 be the maximum likelihood estimates from study i(1≤ is) based on the conditional model (1), given equation M12. Assume that the sample size from study i is large enough such that, conditional on equation M13 follows approximately a normal distribution with mean equation M14 and covariance matrix Γi which is estimated by
equation M15
More specifically, equation M16 and equation M17 can be obtained by simply squaring the reported standard errors for [beta]0i and [beta]1i from study i, respectively. Therefore, the unconditional distribution of equation M18 is approximately normal with mean equation M19 and covariance matrix Σi = A + Γi. The likelihood function associated with equation M20 for study i is then
equation M21
where |Σi| is the determinant of Σi. It is possible that in some published studies, only summary statistics about the slope parameter β1i were reported. If this is the case for study i, then the associated likelihood function of [beta]1i is
equation M22
Suppose that for the next ks(1≤ sk) studies, individual patient data are available. Therefore, model (2) can be directly fitted to these data. More specifically, let Yi = (yi1, yi2, …, yini)t be the vector of outcomes from a total sample of size ni in study i (s < ik). Let Xi be the associated ni × 2 design matrix obtained by stacking the row vectors of covariates uij and xij from the entire sample. Yi follows a ni-dimensional normal distribution with mean equation M23 and covariance matrix equation M24, where Ii is the identity matrix of dimension ni. The likelihood function associated with the observed data in study i is
equation M25
Combining above likelihood functions Li(β0, β1, A) and equation M26 together, we obtain the full likelihood function for the meta-analysis from a total of k studies as
equation M27
(3)
The maximum likelihood estimates to model parameters are obtained by maximizing the likelihood function (3). We propose a method to use the existing computing packages to maximize the likelihood function and obtain the MLEs of the model parameters. The core step here is to treat the reported intercepts and slopes as observations for outcome variable Y for these studies from which only these summary statistics are available. This approach is justified based on the fact that the estimated intercepts and slopes are unbiased estimates of the relevant study-specific intercepts and slopes and asymptotically normally distributed. More specifically, for study i(1≤ is), if the only published summary statistic is the estimated slope [beta]1i with the estimated variance equation M28, then we treat the estimated slope as an observed outcome from an individual in study i and let y1i = [beta]1i. Given β1i, [beta]1i is unbiased estimate of β1i and asymptotically normal with estimated variance equation M29. Across the studies, β1i is assumed normal with mean β1 and variance equation M30. It follows that
equation M31
(4)
where ui1 = 0, xi1 = 1, e1i is distributed as equation M32, ε1i is distributed as equation M33, and e1i and εi1 are independent. Model (4) is exactly the same as model (2) when uij = 0 and xij= 1 for study i(1≤ is) as if there was only one observation from one individual for the outcome of the study.
If published summary statistics contain both estimates to the intercept and slope from study i(1≤ is), then both the slope estimate [beta]1i and a linear combination of equation M34 can be treated as two independent observations for the outcome variable from two different individuals in the study. To see this, we let
equation M35
This linear combination of equation M36 in yi2 is based on the well-known technique of Gram-Schmidt orthogonalization24 so that, conditional on equation M37, yi1 from (4) and yi2 are not only uncorrelated but also with the same variance equation M38. All these make yi1 and yi2 satisfy model (1) for study i. Across the studies, equation M39 follows a bivariate normal distribution with mean vector equation M40 and covariance matrix A. It follows that
equation M41
(5)
where
equation M42
and equation M43 follows a bivariate normal distribution with mean vector 0 and covariance matrix A, and is independent of εi2. Therefore, model (5) and (4) together are exactly the same as model (2) for study i(1≤ is) as if there were only two independent observations from two individuals for the outcome of the study. Mathematically, the information from the reported intercept and slope equation M44 is exactly the same as that from equation M45 because of the one-to-one linear transformation. Statistically, these two vectors are equivalent too because they provide exactly the same contribution to the likelihood function for the entire meta-analysis. This same reason also warrants the invariance of the resulting statistical inferences when different linear one-to-one transformations are used to transform data. The reason that equation M46 is used for study i is the fact that it conforms to model (2) with the rest of studies and therefore facilitates the use of standard computing packages to implement the meta-analysis.
When equation M47, i = 1,2,…, s, are assumed known, the maximization to the likelihood function (3) can be easily implemented in SAS (R) MIXED procedure25 through appropriate manipulation of the input SAS data set. In order for SAS (R) MIXED procedure25 to implement model (2) simultaneously for the studies with individual patient data and for the other studies that have only the summary statistics available, we need to first prepare an augmented SAS data set which contains all available data from both studies with individual patient data and those with only summary statistics. More specifically, for studies that individual patient data are available, this augmented data set (called AUGDATA) contains the study identification, the subject identification within each study, the response variable, and two covariates uij (i.e., = 1) and xij. For studies that only summary statistics are available, this augmented data set contains a single observation of these variables from a single individual with u = 0 and x = 1 as given in (4) for a study that reported only the treatment effect (i.e., the slope) and two observations from two individuals as given by (4) and (5) for a study that reported both the intercept and slope estimates. Similar data codes have also been used by Houwelingen et al.26 when SAS (R) MIXED procedure25 was used to perform the traditional meta-analyses when only summary statistics were available from all trials. Another SAS data set (called COVAR) is also needed for SAS (R) MIXED procedure25 to implement model (2) simultaneously for the studies with individual patient data and for the others that have only the summary statistics available. This data set contains either the initial estimates or the permanent estimates (i.e., for these studies with only summary statistics available) for the variance and covariance parameters involved in the random components of the model. More specifically, the first three observations of COVAR should be the initial estimates to the three parameters in the covariance matrix equation M48 of the random vector equation M49. Next observations of COVAR should be all within-study variances equation M50’s from the error term εijs in model (2). Notice that for these studies (i.e., the first s studies) with only summary statistics available, equation M51 is permanently estimated by (or treated as known as) equation M52 in this analysis, whereas for these studies with individual patient data, initial estimates for these variance parameters can be specified for the maximization program of SAS (R) MIXED procedure25 to proceed. With these data sets AUGDATA and COVAR, the following SAS codes can be used to implement the proposed meta-analysis (in italics):
Proc mixed data=AUGDATA method=; class study id;
Model y=u x/noint s cl ddfm=;
Random u x/sub=study type=un;
Repeated study/sub=id group=study;
Parms/parmswdata=COVAR eqcon=4 to 3+s;
Run;
In the above SAS code, the MODEL statement fits model (2) when both reported summary statistics and individual patient data are used in the analysis. Option METHOD= specifies either the MLE or the restricted maximum likelihood estimation (REML) in the maximization of function (3). Option NOINT makes sure that model (2) is fitted with covariates u and x and without an additional intercept. Option CL gives the 95% confidence interval for the expected slope and intercept across the studies. Option DDFM specifies the method for computing the denominator degrees of freedom for the test of fixed effects (e.g., SATTERTH, KR). The RANDOM statement specifies the fact that the study-specific vector of intercept and slope equation M53 follows a bivariate normal distribution across studies. The REPEATED statement specifies the subject-specific error term in model (2) which allows different variances across different studies. The PARMS statement specifies the initial values for the covariance and variance parameters in model (2) while treating the variances of the summary statistics for the studies with only summary statistics available as permanently given by their reported estimates. Alternatively, instead of providing initial estimates to variance/covariance estimates in COVAR, one can also estimate the fixed effects and the variance/covariance parameters with repeated calls to SAS (R) MIXED procedure until they converge. This iterative approach has also recently been used in meta-analyses combining entire survival curves over multiple studies27,28.
Addressing statistical heterogeneity of studies in meta-analyses is one of the most fundamental aspects of many systematic reviews. Because the heterogeneity may determine the extent to which the conclusions of a meta-analysis can be generalized, it is important to quantify and test the extent of heterogeneity among the studies used in the meta-analysis. We will address the statistical heterogeneity for both slope and intercept parameters based on model (2).
We begin with perhaps the most important parameter in the meta-analysis model (2)—the slope β1. A simple approach of assessing statistical heterogeneity of the slopes across studies in a meta-analysis can be based on individual estimates from these studies by applying well established statistical methods4,29. More specifically, let [beta]1i be the estimate from study i(1≤ ik) and equation M54 be the associated estimated variance. Let equation M55 denote the precision of the estimate. Let
equation M56
A test of homogeneity of the β1i’s is given by
equation M57
which has a Chi-squared distribution with k−1 degrees of freedom under the assumption of homogeneity with the fixed effect meta-analysis model when the sample sizes from all studies are large. It has been widely realized, however, that this test has poor power when the number of studies in a meta-analysis is small and excessive power to detect clinically insignificant heterogeneity when there are too many studies.30,31 Realizing the potential limitations of a statistical test to characterize the degree of heterogeneity in a meta-analysis, Higgins and Thompson30 proposed a new measure of heterogeneity in a meta-analysis that overcomes the shortcomings of existing measures. Their focus is on the impact of heterogeneity on the results of a meta-analysis, i.e., on the degree to which conclusions might be generalized to situations outside those investigated in the studies at hand. An application of Higgins and Thompson’s index30 to the slope parameter in model (2) gives the index of overall heterogeneity among studies:
equation M58
where equation M59 is the shared within-study variance for the estimated slope among individual studies, or when the studies have differing within-study variations, the ‘typical’ within-study variance in the term of Higgins and Thompson30, and equation M60 is the between-study variance of study specific slopes β1i. The estimation of overall heterogeneity among studies in a meta-analysis requires the estimate to both the between-study variation and the ‘typical’ within-study variance. For the latter, Higgins and Thompson30 used the following estimator
equation M61
(6)
Other estimates of equation M62 were also proposed by Xiong et al.32 Let equation M63 be the MLE of equation M64 through the mixed effects model (2). An estimate to the index of overall heterogeneity for the slope parameter is
equation M65
(7)
The similar statistical approach can be applied to obtain an estimate to the index of overall heterogeneity for the intercept parameter [beta]0i’s as
equation M66
(8)
where equation M67 be the MLE of equation M68 through the mixed effects model (2), and equation M69 is the estimate to the ‘typical’ within-study variance equation M70:
equation M71
and equation M72. Higgins and Thompson30 suggested certain cutoffs to indicate different degree of heterogeneity for which I2= 0%, 25%, 50%, and 75% represents no heterogeneity, low heterogeneity, moderate heterogeneity, and high heterogeneity, respectively.
In order to demonstrate our proposed methodology, we present an example to study cerebrospinal fluid (CSF) biomarkers that can be used to identify individuals with high risk of developing Alzheimer’s disease (AD) when they are still cognitively normal. Researchers in AD have identified Apolipoprotein E4 (ApoE4) alleles as a major genetic risk factor of AD33. Although the pathological hallmarks of AD are the neurofibrillary tangles and the senile plaques34,35, the diagnosis of AD in living patients is still largely a clinical judgment based on careful neurological and/or neuropsychological examination combined with results from other clinical tests. Therefore, the search for biomarkers that can be used to differentiate AD from normal aging has been one of the primary research activities in AD. Individuals with AD have been found to have decreased level of cerebrospinal fluid (CSF) β-amyloid42 as compared to normal nondemented controls36. Further, because AD is a progressive neurodegenerative disorder leading to the death of brain cells that can not be replaced once lost, it is important to assess the potential of these biomarkers to identify individuals that are at high risk of AD while they are still cognitively normal. The importance of such antecedent biomarkers of AD is further highlighted by the fact that no pharmaceutical treatments are effective for the disease’s late stages. We chose to study whether CSF β-amyloid42 is decreased among individuals of normal nondemented aging who are ApoE4 positive as compared to those who are ApoE4 negative. Although many published studies have compared CSF β-amyloid42 level between individuals with AD and normal nondemented controls, very few have reported CSF β-amyloid42 level as a function of ApoE4 status among individuals who are still cognitively normal. As a matter of fact, our comprehensive MEDLINE search identified a total of 6 published studies on CSF β-amyloid42 during the period of 1990 to 2007 which actually reported summary statistics as a function of ApoE4 status for individuals who were not demented3742. The summary statistics for these 6 published studies are presented in Table 1.
Table 1
Table 1
Reported Summary Statistics from Six Studies on CSF β-amyloid42 (in pg/mL) as a Function of ApoE4 Genotype
In a recent study for which the first author served as the statistical data analyst, Fagan et al.36 reported data on CSF biomarkers from a relatively large sample of individuals with normal aging and AD from Washington University Alzheimer’s Disease Research Center (WU ADRC). They compared various CSF biomarkers among normal aging and AD at baseline and studied their predictive power of the subsequent development of AD. From the entire sample reported36, we identified a subset of 139 individuals who were nondemented at baseline and whose ApoE4 genotypes were available. We therefore have the individual patient data on CSF β-amyloid42 from this subset sample as a function of ApoE4 genotype. Out of these 139 nondemented individuals, 89 are ApoE4 negative, and 50 are ApoE4 positive. In order to best address the question whether CSF β-amyloid42 is decreased among individuals of nondemented aging who are ApoE4 positive as compared to those who are ApoE4 negative, we combined all existing data including the published 6 studies from which only summary statistics were available and the latest individual patient data on the 139 individuals from the WU ADRC, and applied our proposed methodology of meta-analyses to these data (both intercepts and slopes of the published studies used in model (2)). The point estimate for the mean difference of CSF β-amyloid42 between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative is −111.07 pg/mL with an estimated standard error of 29.50 pg/mL. A 95% confidence interval estimate to the mean difference of CSF β-amyloid42 is from −183.26 pg/mL to −38.89 pg/mL. The observed significance level for the observed mean difference is 0.010, indicating a statistically significant difference on mean CSF β-amyloid42 between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative at the significance level of 5%. Instead of providing initial estimates to variance/covariance estimates in COVAR, we also implemented the iterative approach27,28 by estimating the fixed effects and the variance/covariance parameters with repeated calls to SAS (R) MIXED procedure until they converge. The new approach resulted in essentially the same parameter estimates as reported above. We also observe that if the individual patient data from 139 nondemented individuals of the WU ADRC were not included in the meta-analyses through our proposed methodology, a traditional random effect meta-analysis on the summary statistics of six published studies3742 would give a point estimate of −31.69 pg/mL to the mean difference of CSFβ-amyloid42 between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative with an asymptotic 95% confidence interval estimate from −128.93 pg/mL to 65.56 pg/mL as reported in Xiong et al.32. Even when the summary statistics of Fagan et al.36 were used along with the summary statistics of the six published studies in a traditional random effect meta-analysis, the point estimate to the mean difference of CSFβ-amyloid42 between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative is −52.90 pg/mL with an asymptotic 95% confidence interval estimate from −141.20 pg/mL to 35.3936 pg/mL. The fact that the traditional meta-analyses without incorporating the latest individual patient data failed to detect statistically significant difference on the mean CSF β-amyloid42 level between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative highlights the importance of combining the latest individual patient data with published summary statistics through our proposed meta-analyses methodology. Figure 1 presents a forest plot including the summary statistics and 95% confidence intervals for each individual study and the results of the meta-analysis based on our proposed methodology with the individual patient data of Fagan et al.36
Figure 1
Figure 1
A Forest Plot on the Difference of CSF Aβ42 (in pg/mL) between ApoE4 Positive abd Negative Individuals
An estimate to the Higgins-Thompson index of overall heterogeneity30 for the slope parameter (i.e., the difference on CSF β-amyloid42 between two ApoE4 genotypes) is equation M73, suggesting a moderate degree of heterogeneity among estimated differences on CSF β-amyloid42 between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative. An estimate to the Higgins-Thompson index of overall heterogeneity30 for the intercept parameter (i.e., the mean CSF β-amyloid42 level for the genotype of ApoE4 negative) is equation M74, indicating a very high degree of heterogeneity for the estimated mean CSF β-amyloid42 from individuals of normal aging who are ApoE4 negative among the studies used in this meta-analysis. Because age can be a potential confounding factor of CSF β-amyloid4243, we further compared the mean level of CSF β-amyloid42 between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative after adjusting for the effect of age, and found very similar estimates as reported above.
Our proposed methodology of meta-analyses provides a unified approach that synthesizes medical evidence by using both individual patient data and published summary statistics within the framework of likelihood principle. Most up-to-date scientific evidence on medicine is crucial information not only to consumers but also to decision makers. Such scientific evidence can only be obtained when all existing evidence from the literature and the most up-to-date individual patient data are jointly analyzed statistically. When used appropriately, our proposed methods can help the biomedical researchers to report the most up-to-date evidence on medicine by combining their latest individual patient data with at least summary statistics from already published studies. Further, among studies that only reported summary statistics, our proposed approach can easily deal with the situation when some studies reported only the slope parameter estimate (i.e., the treatment effect in a two-arm clinical trial) along with the relevant standard error and others reported both intercept and slope parameters as well as their associated covariance matrix, the latter of which is equivalent to when summary statistics were reported for both arms individually in two-arm clinical trials. Through an appropriate augmentation of all available data (including both individual patient data and summary statistics), we showed that the standard statistical procedures such as SAS (R) MIXED procedure can be used to implement the maximum likelihood estimates of the parameters of interest and obtain appropriate statistical inferences. Notice that the traditional meta-analyses of random effects models on summary statistics26 is a special case of our approach in which all studies have only summary statistics, and the approach proposed by Higgins et al.9 is also a special case of our approach in which individual patient data are available from every study used in the meta-analyses. We also proposed measures of heterogeneity for both slope and intercept parameters based on our proposed model of meta-analyses. These measures generalize the index of heterogeneity of Higgins and Thompson30 and have potentially important interpretative implications in the proposed meta-analyses. Finally, we demonstrated our proposed methodology through a real life application studying the cerebrospinal fluid (CSF) biomarkers that can be used to identify individuals with high risk of developing Alzheimer’s disease (AD) when they are still cognitively normal. This real life application not only offers the most up-to-date statistically significant evidence that the mean level of CSF β-amyloid42 was decreased in individuals of normal aging who are ApoE4 positive as compared to those who are ApoE4 negative, but also highlights the importance of our proposed meta-analyses methodology due to the fact that such statistically significant medical evidence would not be achieved if individual patient data of CSF β-amyloid42 from 139 individuals of the WU ADRC were not combined with the summary statistics of the six published studies3742 through our meta-analyses model.
Acknowledgments
The authors would like to thank Dr. David Holtzman and Dr. Anne Fagan, Department of Neurology, Washington University in St. Louis, for providing the biomarker data used in the paper. The authors also thank the Clinical and Psychometric and Genetic Cores of the Alzheimer’s Disease Research Center at Washington University for subject assessments. Dr. Xiong’s work was supported by grant K25 AG025189 from the National Institute on Aging and by the Alan A. and Edith Wolff Charitable Trust. Financial support for this study was also provided in part by National Institute on Aging grants AG003991, AG005681, and AG026276 for Chengjie Xiong, J. Philip Miller, and John C. Morris.
1. Egger M, Davey Smith G, Phillips AN. Meta-analysis; principles and procedures. BMJ. 1997;315:1533–1537. [PMC free article] [PubMed]
2. DerSimonian R, Laird NM. Meta-analysis in clinical trials. Controlled Clinical Trials. 1986;7:177–188. [PubMed]
3. DerSimonian R, Kacker R. Random-effects model for meta-analysis of clinical trials: an update. Contemporary Clinical Trials. 2007;28:105–114. [PubMed]
4. Whitehead A, Whitehead J. A general parametric approach to the meta-analysis of clinical trials. Stats Med. 1991;10:1665–1677. [PubMed]
5. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22:719–748. [PubMed]
6. Laird NM, Mosteller F. Some statistical methods for combining experimental results. Int J Technol Assess Health Care. 1990;6:5–30. [PubMed]
7. Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis. 1985;27:335–371. [PubMed]
8. Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. Academic Press; Orlando: 1985.
9. Higgins JPT, Whitehead A, Turner RM, Omar RZ, Thompson SG. Meta-analysis of continuous outcome data from individual patients. Stats Med. 2001;20:2219–2241. [PubMed]
10. Fisher RA. Statistical Methods for Researcher Workers. Oliver & Boyd; Edinburgh: 1925.
11. Lehmann EL, Casella G. Theory of Point Estimation. 2. Springer-Verlag; New York: 1998.
12. Steward LA, Tierney JF. To IPD or not to IPD? Eval Health Prof. 2002;25:76–97. [PubMed]
13. Schmid CH, Stark PC, Berlin JA, et al. Meta-regression detected associations between heterogeneous treatment effects and study-level, but not patient-level, factors. J Clin Epidemiol. 2004;57:683–687. [PubMed]
14. Lambert PC, Sutton AJ, Abrams KR, et al. A comparison of summary patient-level covariates in meta-regression with individual patient data meta-analysis. J Clin Epidemiol. 2002;55:86–94. [PubMed]
15. Berlin J, Santanna J, Schmid C, et al. Anti-Lymphocyte Antibody Induction Therapy Study Group. Individual patient versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Stats Med. 2002;21:371–387. [PubMed]
16. Stewart LA, Clarke MJ. Practical methodology of meta-analyses (overviews) using updated individual patient data. Stats Med. 1995;14:2057–2079. [PubMed]
17. Turner RM, Omar RZ, Yang M, et al. A multilevel model framework for meta-analysis of clinical trials with binary outcomes. Stat Med. 2000;19:3417–3432. [PubMed]
18. Whitehead A, Omar RZ, Higgins JPT, et al. Meta-analysis of ordinal outcomes using individual patient data. Stats Med. 2001;20:2243–2260. [PubMed]
19. Man-son HM, Wells G. Meta-analysis of efficacy of quinine for treatment of nocturnal leg cramps in elderly people. BMJ. 1995;310:13–17. [PMC free article] [PubMed]
20. Moore RA, McQuay HJ. Single-patient data meta-analysis of 3453 postoperative patients: Oral tramadol versus placebo, codeine and combination analgesics. Pain. 1997;69:287–294. [PubMed]
21. The International Study Group on Improving ORS. Impact of glycine-containing ORS solutions on stool output and duration of diarrhea: a meta-analysis of seven clinical trials. Bulletin of the World Health Organization. 1991;69:541–548. [PubMed]
22. Nicolucci A, Carinci F, Graepel JG, Hohman TC, Ferris F, Lachin JM. The efficacy of tolrestat in the treatment of diabetic peripheral neuropathy. A Meta-analysis of individual patient data. Diabetes Care. 1996;19:1091–1096. [PubMed]
23. Van Houwelingen HC, Zwinderman K, Stijnen T. A bivariate approach to meta-analysis. Stats Med. 1993;12:2272–2284.
24. Noble B, Daniel JW. Applied Linear Algebra. Prentice-Hall Inc; Englewood Cliffs NJ: 1977.
25. Little R, Milliken GA, Stroup W, Wolfinger R. SAS System for Mixed Models. SAS Institute Inc; Cary NC: 1996.
26. Van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stats Med. 2002;21:589–624. [PubMed]
27. Arends LR, Hunink MGM, Stijnen T. Meta-analysis of summary survival curve data. Statistics in Medicine. 2008;27:4381–4396. [PubMed]
28. Dear KBG. Iterative generalized least squares for mata-analysis of survival data at multiple times. Biometrics. 1994;50:989–1002. [PubMed]
29. Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10:101–129.
30. Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Statistics in Medicine. 2002;21:1539–1558. [PubMed]
31. Hardy RJ, Thompson SG. Detecting and describing heterogeneity in meta-analysis. Statistics in Medicine. 1998;17:841–856. [PubMed]
32. Xiong C, Gao F, Yan Y, et al. On Measuring Overall Heterogeneity in Meta-analyses-Application to CSF Biomarker Studies in Alzheimer’s Disease. Journal of Modern Applied Statistical Methods. 2008 in press.
33. Myers RH, Schaefer EJ, Wilson PWF, et al. Apolipoprotein E ε 4 association with dementia in a population-based study: The Framingham study. Neurology. 1996;46:673–677. [PubMed]
34. Braak H, Braak E. Neuropathologic staging of Alzheimer-related changes. ActaNeuropathologica. 1991;82:239–259. [PubMed]
35. McKeel DW, Jr, Price JL, Miller JP, Grant EA, Xiong C, Berg L, Morris JC. Neuropathologic criteria for diagnosing Alzheimer disease in persons with pure dementia of Alzheimer type. Journal of Neuropathology and Experimental Neuropathology. 2004;63 (10):1028–1037. [PubMed]
36. Fagan AM, Roe CM, Xiong C, et al. Cerebrospinal fluid tau/β-amyloid42 ratio as a prediction of cognitive decline in nondemented older adults. Arch Neurol. 2007;64:343–349. [PubMed]
37. Sunderland T, Mirza N, Putnam KT, et al. Cerebrospinal fluid β-amyloid1–42 and tau in control subjects at risk for Alzheimer’s disease: The effect of apoEε 4 allele. Biol psychiatry. 2004;56:670–676. [PubMed]
38. Jensen M, Schroder J, Blomberg M, et al. Cerebrospinal fluid 42 is increased early in sporadic Alzheimer’s disease and declines with disease progression. Ann Neurol. 1999;45:504–511. [PubMed]
39. Andreasen N, Hesse C, Davidsson P, et al. Cerebrospinal fluid β-amyloid(1–42) in Alzheimer’s disease: differences between early- and late-onset Alzheimer’s disease and stability during the course of disease. Arch Neurol. 1999;56:673–680. [PubMed]
40. Tapiola T, Pirttila T, Mehta PD, et al. Relationship between apoE genotype and CSF β-amyloid (1–42) and tau in patients with probable and definite Alzheimer’s disease. Neurobiology of Aging. 2000;21:735–740. [PubMed]
41. Riemenschneider M, Schmolke M, Lautenschlager N, et al. Cerebrospinal beta-amyloid(1–42) in early Alzheimer’s disease: association with apolipoprotein E genotype and cognitive decline. Neuroscience Letters. 2000;284:85–88. [PubMed]
42. Prince JA, Zetterberg H, Andreasen N, et al. APOE ε 4 allele is associated with reduced cerebrospinal fluid levels of Aβ42. Neurology. 2004;62:2116–2118. [PubMed]
43. Peskind ER, Li G, Shofer J, et al. Age and Apolipoprotein E*4 allele effects on cerebrospinal fluid β-amyloid 42 in adults with normal aging. Arch. Neurol. 2006;63:936–939. [PubMed]