

Article sections

- Abstract
- 1. Introduction
- 2. Meta-analysis models when individual patient data are partly available
- 3. Maximum likelihood estimates and implementations
- 4. Heterogeneity in meta-analysis when individual patient data are partly available
- 5. Application to a biomarker study in Alzheimer’s disease and normal aging
- 7. Discussion
- References


J Appl Stat. Author manuscript; available in PMC 2012 January 1.

Published in final edited form as:

J Appl Stat. 2011 January 1; 38(1): 15–27.

doi: 10.1080/02664760903008987

PMCID: PMC3017356

NIHMSID: NIHMS151352

Chengjie Xiong, PhD,^{1} Gerald van Belle, PhD,^{2} Kejun Zhu, PhD,^{3} J. Philip Miller,^{1} and John C. Morris, MD^{4,5}

Corresponding Author: Chengjie Xiong, Ph.D., Division of Biostatistics, Campus Box 8067, Washington University in St. Louis, St. Louis, MO 63110, U.S.A., Phone: (314) 362-3635, Fax: (314) 362-2693, Email: chengjie@wubios.wustl.edu

SAS code for implementing the proposed methodology can be requested from the corresponding author by email at chengjie@wubios.wustl.edu.

This article provides a unified methodology of meta-analysis that synthesizes medical evidence by using both available individual patient data (IPD) and published summary statistics within the framework of the likelihood principle. The most up-to-date scientific evidence on medicine is crucial information not only to consumers but also to decision makers, and it can only be obtained when existing evidence from the literature and the most recent individual patient data are optimally synthesized. We propose a general linear mixed effects model to conduct meta-analyses when individual patient data are available for only some of the studies and summary statistics have to be used for the rest. Our approach includes as special cases both the traditional meta-analysis, in which only summary statistics are available for all studies, and the other extreme, in which individual patient data are available for all studies. We implement the proposed model with statistical procedures from standard computing packages. We provide measures of heterogeneity based on the proposed model. Finally, we demonstrate the proposed methodology through a real life example studying cerebrospinal fluid biomarkers to identify individuals at high risk of developing Alzheimer’s disease when they are still cognitively normal.

Systematic reviews have become increasingly used as a means of assessing and interpreting the results from medical research. Such reviews aim to comprehensively identify and assess all studies relevant to a given scientific question, and meta-analysis has been the major statistical methodology for the quantitative synthesis of study results. Many methods for meta-analysis are available, and the most popularly applied in the medical research focus on the optimum combination of published summary statistics in some form of weighted averages.^{1}^{–}^{4} Usually, each study is given a weight according to the precision of its results on summary statistics. Studies with good precisions are weighted more heavily than studies with greater uncertainty. The variance for the overall estimate of the parameter under study in meta-analyses is in general from two different sources, one is associated with the individual studies (i.e., the within-study variance), and the other is associated with the possible difference between different studies (i.e., between-study variance). When the between-study variance is assumed to be 0, each study is simply weighted according to its own variance. This approach characterizes a fixed effects model which is exemplified by the Mantel-Haenszel method^{5}^{–}^{6} or the Peto method.^{7} When the between-study variance is not zero, the random effects models^{6} add the between-study variance to the within-study variance of each individual study when the overall mean of the random effects is estimated. Fixed effects and random effects model for continuous outcomes have also been described.^{8}
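The weighting scheme described above can be illustrated with a minimal sketch. It assumes each study reports a point estimate and its variance; the function name `pooled_estimate`, the parameter `tau2` (the between-study variance), and all numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def pooled_estimate(estimates, variances, tau2=0.0):
    """Inverse-variance weighted mean of study estimates.

    tau2 = 0 gives fixed effects weights 1 / within-study variance;
    tau2 > 0 adds the between-study variance to every study's variance,
    as in a random effects model.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / (variances + tau2)
    pooled = np.sum(weights * estimates) / np.sum(weights)
    pooled_var = 1.0 / np.sum(weights)
    return pooled, pooled_var

# Made-up estimates and within-study variances for three studies.
est = [1.0, 2.0, 4.0]
var = [1.0, 4.0, 0.25]

fixed, fixed_var = pooled_estimate(est, var)                # between-study variance 0
random_, random_var = pooled_estimate(est, var, tau2=2.0)   # assumed known tau2
```

With a positive between-study variance the weights become more equal across studies, so the pooled estimate moves toward the unweighted mean and its variance increases.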

Published summary statistics may come from different analytic strategies and may be reported in very inconsistent formats in the literature. Therefore, important scientific questions may not always be adequately addressed by meta-analyses based on summary statistics alone.^{9} Further, even if all summary statistics were adequately and consistently reported in the studies used in a meta-analysis, the lack of individual patient data might result in a loss of information for estimating the model parameters based on the likelihood principle.^{10} This is partly because the likelihood function from the summary statistics is in general not equivalent to the likelihood function from the individual patient data when the summary statistics are not sufficient statistics of the individual patient data.^{11} For these reasons, meta-analysis using individual patient data has been widely considered the gold standard of meta-analyses.^{12} This approach provides several advantages, including the ability to use the full likelihood function from the individual patient data for statistical inference, the potential to produce consistent analyses across different studies, the use of up-to-date data, the possibility of avoiding biases associated with the use of aggregate data in meta-regression,^{13–15} and the potential to generate additional scientific hypotheses, particularly those related to individual patient characteristics, for which data are typically lacking in published results.^{16} The statistical methodologies of meta-analyses of individual patient data have recently been studied by several authors for different scenarios, including a multilevel model framework for the meta-analyses of clinical trials when the outcome variables are binary,^{17} a general linear mixed model for the case of continuous outcome variables,^{9} and another general framework of meta-analyses for ordinal outcomes based on the proportional odds model.^{18} Other approaches have also appeared in the literature.^{19–23}

Obtaining and analyzing individual patient data can be both costly and time consuming. In some cases, it is practically impossible to obtain individual patient data from all qualified studies in a meta-analysis.^{16} Researchers are therefore very likely to face the situation in which individual patient data are available for only some of the studies and summary statistics reported in the literature have to be used for the others. Whereas statistical methods for meta-analysis using either summary statistics or individual patient data across all studies have been well developed, little attention has been paid to this mixed case. This paper aims to address this very question. We propose a likelihood based approach for meta-analyses based on both individual patient data and summary statistics. We also implement the proposed methodology with standard computing packages. Finally, we demonstrate our proposed methodology by presenting an example to study cerebrospinal fluid (CSF) biomarkers that can be used to identify individuals at high risk of developing Alzheimer’s disease (AD) when they are still cognitively normal.

We focus on the situation when the outcome variable *Y* is continuous in cross-sectional studies. We assume that the continuous outcome variable can be modeled as a linear regression function of covariates within each study used in the meta-analyses. The choice of a linear regression model is flexible enough as it contains the meta-analyses not only for controlled clinical trials in which the treatment arms can be represented by dummy variables (i.e., covariates) but also for observational studies in which the association between the outcome variable and covariates is often the primary interest. We focus on the case when only one covariate *X* is used in the linear regression model, but point out that the generalization of our method to more than 1 covariate is straightforward. Therefore, our major scientific interest in the meta-analysis is to estimate the overall regression parameters (i.e., the slope and the intercept) by pooling all study specific regression parameters together.

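Assume that a total of *k* studies are used in a meta-analysis. Let $y_{ij}$ be the outcome of patient *j* in study *i*, and let $x_{ij}$ be the corresponding covariate value ($i=1,\dots,k$; $j=1,\dots,n_i$, where $n_i$ is the number of patients in study *i*). Within study *i*, we assume

$${y}_{ij}={\beta}_{0i}+{\beta}_{1i}{x}_{ij}+{\epsilon}_{ij}={\beta}_{0i}{u}_{ij}+{\beta}_{1i}{x}_{ij}+{\epsilon}_{ij},$$

(1)

where $u_{ij}$ is typically 1, $(\beta_{0i}\;\beta_{1i})^{t}$ is the study specific vector of regression coefficients, and the $\epsilon_{ij}$ are independent normal errors with mean 0 and study specific variance $\sigma_i^2$.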

We further assume a random effect among studies by assuming that ${(\begin{array}{ll}{\beta}_{0i}\hfill & {\beta}_{1i}\hfill \end{array})}^{t}$ follows a bivariate normal distribution with mean vector ${(\begin{array}{ll}{\beta}_{0}\hfill & {\beta}_{1}\hfill \end{array})}^{t}$ and covariance matrix

$$A=\left(\begin{array}{cc}{\sigma}_{00}^{2}& \rho {\sigma}_{00}{\sigma}_{11}\\ \rho {\sigma}_{00}{\sigma}_{11}& {\sigma}_{11}^{2}\end{array}\right).$$

Notice that a correlation between the intercept and the slope is introduced in this assumption, which conceptually allows the possibility that the effect size of the treatment in clinical trials of two arms is associated with the level of the outcome measures in the control group across the trials. Further, this assumption also makes it necessary to estimate the intercept parameter even if the main interest of the meta-analysis is to estimate the slope in the model. If all individual patient data are available from all studies in the meta-analysis, the model becomes a standard general linear mixed effects model in which intercepts and slopes vary randomly across studies, i.e.,

$${y}_{ij}={\beta}_{0}{u}_{ij}+{\beta}_{1}{x}_{ij}+{e}_{0i}{u}_{ij}+{e}_{1i}{x}_{ij}+{\epsilon}_{ij},$$

(2)

where
${(\begin{array}{ll}{e}_{0i}\hfill & {e}_{1i}\hfill \end{array})}^{t}$ follows a bivariate normal distribution with mean vector 0 and covariance matrix *A*, and is assumed independent of *ε _{ij}*.

We now assume that for the first *s* (1 ≤ *s* ≤ *k*) studies, only published summary statistics are available. Let $(\widehat{\beta}_{0i}\;\widehat{\beta}_{1i})^{t}$ be the maximum likelihood estimates from study *i* (1 ≤ *i* ≤ *s*) based on the conditional model (1), given $(\beta_{0i}\;\beta_{1i})^{t}$. Assume that the sample size from study *i* is large enough such that, conditional on $(\beta_{0i}\;\beta_{1i})^{t}$, the estimate $(\widehat{\beta}_{0i}\;\widehat{\beta}_{1i})^{t}$ follows approximately a normal distribution with mean $(\beta_{0i}\;\beta_{1i})^{t}$ and covariance matrix $\Gamma_i$, which is estimated by

$${\widehat{\mathrm{\Gamma}}}_{i}=\left(\begin{array}{ll}{\widehat{\gamma}}_{00i}^{2}\hfill & {\widehat{\gamma}}_{01i}\hfill \\ {\widehat{\gamma}}_{01i}\hfill & {\widehat{\gamma}}_{11i}^{2}\hfill \end{array}\right).$$

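More specifically, $\widehat{\gamma}_{00i}^{2}$ and $\widehat{\gamma}_{11i}^{2}$ can be obtained by simply squaring the reported standard errors for $\widehat{\beta}_{0i}$ and $\widehat{\beta}_{1i}$, respectively. Combining this conditional distribution with the random effects distribution of $(\beta_{0i}\;\beta_{1i})^{t}$, the estimate $(\widehat{\beta}_{0i}\;\widehat{\beta}_{1i})^{t}$ marginally follows approximately a bivariate normal distribution with mean $(\beta_{0}\;\beta_{1})^{t}$ and covariance matrix $\Sigma_i = A + \Gamma_i$. The likelihood function from study *i* is therefore

$${L}_{i}({\beta}_{0},{\beta}_{1},A)=\frac{1}{2\pi \sqrt{|{\Sigma}_{i}|}}\exp\left[-0.5\times \left({\widehat{\beta}}_{0i}-{\beta}_{0}\;\;{\widehat{\beta}}_{1i}-{\beta}_{1}\right){\Sigma}_{i}^{-1}{\left({\widehat{\beta}}_{0i}-{\beta}_{0}\;\;{\widehat{\beta}}_{1i}-{\beta}_{1}\right)}^{t}\right],$$

where $|\Sigma_i|$ is the determinant of $\Sigma_i$. If only the slope estimate $\widehat{\beta}_{1i}$ and its variance are reported for study *i*, the likelihood function reduces to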

$${L}_{i}({\beta}_{0},{\beta}_{1},A)=\frac{1}{\sqrt{2\pi ({\sigma}_{11}^{2}+{\gamma}_{11i}^{2})}}exp\left[-\frac{{\left({\widehat{\beta}}_{1i}-{\beta}_{1}\right)}^{2}}{2({\sigma}_{11}^{2}+{\gamma}_{11i}^{2})}\right].$$
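As a hedged illustration of the two likelihood contributions above, the following sketch evaluates them on the log scale, taking the marginal covariance of a reported (intercept, slope) pair to be the sum of the between-study matrix *A* and the within-study matrix Γ_i, which is what the slope-only formula implies for its single component. The function names and all numbers are hypothetical.

```python
import numpy as np

def summary_loglik(beta_hat, gamma, beta, A):
    """Log-likelihood of a reported (intercept, slope) pair from one study.

    beta_hat : reported estimates (b0_hat, b1_hat)
    gamma    : 2x2 within-study covariance matrix of the estimates
    beta     : overall mean vector (beta0, beta1)
    A        : 2x2 between-study covariance matrix
    """
    sigma = np.asarray(A, float) + np.asarray(gamma, float)
    resid = np.asarray(beta_hat, float) - np.asarray(beta, float)
    _, logdet = np.linalg.slogdet(sigma)
    quad = resid @ np.linalg.solve(sigma, resid)
    return -np.log(2 * np.pi) - 0.5 * logdet - 0.5 * quad

def slope_only_loglik(b1_hat, gamma11_sq, beta1, sigma11_sq):
    """Log-likelihood when only the slope and its variance are reported."""
    v = sigma11_sq + gamma11_sq
    return -0.5 * np.log(2 * np.pi * v) - 0.5 * (b1_hat - beta1) ** 2 / v

# Made-up values with no correlations, so the joint term must factorize.
A = np.diag([4.0, 1.0])
gamma = np.diag([0.5, 0.25])
joint = summary_loglik([11.0, -1.5], gamma, [10.0, -2.0], A)
separate = (slope_only_loglik(11.0, 0.5, 10.0, 4.0)
            + slope_only_loglik(-1.5, 0.25, -2.0, 1.0))
```

When both *A* and Γ_i are diagonal, the bivariate term equals the sum of two univariate terms, which gives a quick consistency check between the two formulas.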

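Suppose that for the remaining *k* − *s* studies (*i* = *s*+1, …, *k*), individual patient data are available. Therefore, model (2) can be directly fitted to these data. More specifically, let $Y_i = (y_{i1}, \dots, y_{in_i})^{t}$ be the vector of outcomes from study *i*, and let $X_i$ be the $n_i \times 2$ design matrix whose *j*-th row is $(u_{ij}\;x_{ij})$. Under model (2), $Y_i$ follows a multivariate normal distribution with mean $X_i(\beta_0\;\beta_1)^{t}$ and covariance matrix $\Sigma_i = X_i A X_i^{t} + \sigma_i^2 I_{n_i}$ (the symbol $\Sigma_i$ is reused here for the covariance matrix of $Y_i$), so the likelihood function from study *i* is

$${L}_{i}\left({\beta}_{0},{\beta}_{1},A,{\sigma}_{i}^{2}\right)=\frac{1}{\sqrt{{(2\pi )}^{{n}_{i}}|{\Sigma}_{i}|}}\exp\left[-0.5\times {\left({Y}_{i}-{X}_{i}{({\beta}_{0}\;{\beta}_{1})}^{t}\right)}^{t}{\Sigma}_{i}^{-1}\left({Y}_{i}-{X}_{i}{({\beta}_{0}\;{\beta}_{1})}^{t}\right)\right].$$

Combining the above likelihood functions $L_i(\beta_0,\beta_1,A)$ for studies with only summary statistics and $L_i(\beta_0,\beta_1,A,\sigma_i^2)$ for studies with individual patient data, the likelihood function for all *k* studies is

$$L\left({\beta}_{0},{\beta}_{1},A,{\sigma}_{s+1}^{2},\dots ,{\sigma}_{k}^{2}\right)=\prod _{i=1}^{s}{L}_{i}({\beta}_{0},{\beta}_{1},A)\prod _{i=s+1}^{k}{L}_{i}({\beta}_{0},{\beta}_{1},A,{\sigma}_{i}^{2}).$$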

(3)
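The contribution of a study with individual patient data to (3) can be sketched as follows, assuming the marginal covariance of the outcome vector under model (2) is X A Xᵗ + σ²I with the first design column equal to 1. The function name `ipd_loglik` and the data are hypothetical; the logarithm of (3) is then the sum of such terms over the IPD studies plus the summary-statistics terms.

```python
import numpy as np

def ipd_loglik(y, x, beta, A, sigma_sq):
    """Log-likelihood of one study's individual patient data under model (2).

    y, x     : outcomes and covariate values for the n_i patients
    beta     : overall mean vector (beta0, beta1)
    A        : 2x2 between-study covariance of (intercept, slope)
    sigma_sq : within-study residual variance for this study
    """
    y = np.asarray(y, float)
    x = np.asarray(x, float)
    X = np.column_stack([np.ones_like(x), x])      # columns u_ij = 1 and x_ij
    cov = X @ np.asarray(A, float) @ X.T + sigma_sq * np.eye(len(y))
    resid = y - X @ np.asarray(beta, float)
    _, logdet = np.linalg.slogdet(cov)
    quad = resid @ np.linalg.solve(cov, resid)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + quad)

# Made-up data: three patients, binary covariate.
y = [1.0, 2.0, 3.5]
x = [0.0, 1.0, 1.0]
beta = [1.0, 1.5]
A = np.array([[4.0, 0.5], [0.5, 1.0]])
value = ipd_loglik(y, x, beta, A, sigma_sq=2.0)
```

Setting *A* to the zero matrix removes the random effects, in which case the value equals the sum of independent normal log-densities; this is a convenient sanity check.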

The maximum likelihood estimates of the model parameters are obtained by maximizing the likelihood function (3). We propose a method that uses existing computing packages to maximize the likelihood function and obtain the MLEs of the model parameters. The core step is to treat the reported intercepts and slopes as observations of the outcome variable *Y* for those studies from which only these summary statistics are available. This approach is justified by the fact that the estimated intercepts and slopes are unbiased estimates of the relevant study-specific intercepts and slopes and are asymptotically normally distributed. More specifically, for study *i* (1 ≤ *i* ≤ *s*), if the only published summary statistic is the estimated slope $\widehat{\beta}_{1i}$ with the estimated variance $\widehat{\gamma}_{11i}^{2}$, then we treat the estimated slope as an observed outcome from an individual in study *i*, i.e.,

$${y}_{i1}={\widehat{\beta}}_{1i}={\beta}_{0}{u}_{i1}+{\beta}_{1}{x}_{i1}+{e}_{0i}{u}_{i1}+{e}_{1i}{x}_{i1}+{\epsilon}_{i1},$$

(4)

where $u_{i1}=0$, $x_{i1}=1$, $(e_{0i}\;e_{1i})^{t}$ is as in model (2), and $\epsilon_{i1}$ is approximately normally distributed with mean 0 and variance $\gamma_{11i}^{2}$, so that (4) reduces to $\widehat{\beta}_{1i}=\beta_{1}+e_{1i}+\epsilon_{i1}$.

If the published summary statistics contain estimates of both the intercept and the slope from study *i* (1 ≤ *i* ≤ *s*), then both the slope estimate $\widehat{\beta}_{1i}$ and a linear combination of $(\widehat{\beta}_{0i}\;\widehat{\beta}_{1i})^{t}$ can be treated as two independent observations of the outcome variable from two different individuals in the study. To see this, we let

$${y}_{i2}=\frac{{\gamma}_{11i}^{2}}{\sqrt{{\gamma}_{00i}^{2}{\gamma}_{11i}^{2}-{\gamma}_{01i}^{2}}}{\widehat{\beta}}_{0i}-\frac{{\gamma}_{01i}}{\sqrt{{\gamma}_{00i}^{2}{\gamma}_{11i}^{2}-{\gamma}_{01i}^{2}}}{\widehat{\beta}}_{1i}.$$

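This linear combination of $(\widehat{\beta}_{0i}\;\widehat{\beta}_{1i})^{t}$ in $y_{i2}$ is constructed so that, conditional on $(\beta_{0i}\;\beta_{1i})^{t}$, it is uncorrelated with $\widehat{\beta}_{1i}$ and therefore, under the normal approximation, independent of it. It can be written as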

$${y}_{i2}={\beta}_{0}{u}_{i2}+{\beta}_{1}{x}_{i2}+{e}_{0i}{u}_{i2}+{e}_{1i}{x}_{i2}+{\epsilon}_{i2},$$

(5)

where

$$\begin{array}{l}{u}_{i2}=\frac{{\gamma}_{11i}^{2}}{\sqrt{{\gamma}_{00i}^{2}{\gamma}_{11i}^{2}-{\gamma}_{01i}^{2}}}\\ {x}_{i2}=-\frac{{\gamma}_{01i}}{\sqrt{{\gamma}_{00i}^{2}{\gamma}_{11i}^{2}-{\gamma}_{01i}^{2}}},\end{array}$$

and $(e_{0i}\;e_{1i})^{t}$ follows a bivariate normal distribution with mean vector 0 and covariance matrix *A*, and is independent of $\epsilon_{i2}$, which is approximately normally distributed with mean 0 and variance $\gamma_{11i}^{2}$.
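The role of the coefficients in the linear combination can be checked numerically. The sketch below assumes a hypothetical within-study covariance matrix for the reported intercept and slope and confirms that the transformed observation has conditional variance equal to the slope variance and zero conditional covariance with the slope estimate. The function name and all numbers are illustrative only.

```python
import numpy as np

def second_pseudo_observation(b0_hat, b1_hat, gamma):
    """Return (y_i2, u_i2, x_i2) for a study reporting intercept and slope.

    gamma is the 2x2 within-study covariance matrix with entries
    [[gamma00^2, gamma01], [gamma01, gamma11^2]].
    """
    g00_sq, g01, g11_sq = gamma[0, 0], gamma[0, 1], gamma[1, 1]
    root_det = np.sqrt(g00_sq * g11_sq - g01 ** 2)
    u2 = g11_sq / root_det
    x2 = -g01 / root_det
    return u2 * b0_hat + x2 * b1_hat, u2, x2

# Hypothetical reported estimates and covariance matrix.
gamma = np.array([[4.0, -1.2],
                  [-1.2, 0.9]])
y2, u2, x2 = second_pseudo_observation(50.0, -3.0, gamma)

coef = np.array([u2, x2])
var_y2 = coef @ gamma @ coef                         # conditional variance of y_i2
cov_y2_slope = coef @ gamma @ np.array([0.0, 1.0])   # covariance with slope estimate
```

Because both pseudo-observations then share the same error variance and are uncorrelated, they can be entered as two ordinary rows of the same study in a mixed model with a single residual variance per study.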

When $\Gamma_i=\left(\begin{array}{ll}\gamma_{00i}^{2} & \gamma_{01i}\\ \gamma_{01i} & \gamma_{11i}^{2}\end{array}\right)$, *i* = 1, 2, …, *s*, are assumed known, the maximization of the likelihood function (3) can be easily implemented in the SAS® MIXED procedure^{25} through appropriate manipulation of the input SAS data set. In order for the MIXED procedure to fit model (2) simultaneously for the studies with individual patient data and for the studies that have only summary statistics available, we first need to prepare an augmented SAS data set that contains all available data from both types of studies. More specifically, for studies with individual patient data, this augmented data set (called AUGDATA) contains the study identification, the subject identification within each study, the response variable, and the two covariates $u_{ij}$ (i.e., = 1) and $x_{ij}$. For studies with only summary statistics, it contains the pseudo-observations defined in (4) and (5) together with their covariate values. The model can then be fitted with code of the following form:

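```sas
proc mixed data=AUGDATA method=;       /* METHOD= takes ML or REML */
  class study id;
  model y = u x / noint s cl ddfm=;    /* DDFM= takes e.g. SATTERTH or KR */
  random u x / sub=study type=un;
  repeated study / sub=id group=study;
  /* hold parameters 4 to 3+s at the values supplied in COVAR;
     s is the number of studies with summary statistics only */
  parms / parmsdata=COVAR eqcons=4 to 3+s;
run;
```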

In the above SAS code, the MODEL statement fits model (2) when both reported summary statistics and individual patient data are used in the analysis. The METHOD= option specifies either maximum likelihood (ML) or restricted maximum likelihood (REML) estimation in the maximization of function (3). The NOINT option ensures that model (2) is fitted with covariates *u* and *x* and without an additional intercept. The CL option gives the 95% confidence intervals for the expected intercept and slope across the studies. The DDFM= option specifies the method for computing the denominator degrees of freedom for the tests of fixed effects (e.g., SATTERTH, KR). The RANDOM statement specifies that the study-specific vector of intercept and slope $(\beta_{0i}\;\beta_{1i})^{t}$ follows a bivariate normal distribution across studies. The REPEATED statement specifies the subject-specific error term in model (2), allowing different variances across studies. The PARMS statement specifies the initial values for the covariance and variance parameters in model (2), while holding the variances of the summary statistics, for the studies with only summary statistics available, fixed at their reported estimates. Alternatively, instead of providing initial values for the variance/covariance parameters in COVAR, one can estimate the fixed effects and the variance/covariance parameters with repeated calls to the SAS® MIXED procedure until they converge. This iterative approach has also recently been used in meta-analyses combining entire survival curves over multiple studies.^{27,28}
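To make the layout of the augmented data set more concrete, the following sketch assembles its rows outside SAS. It is only an illustration under stated assumptions: the column names mirror the variables used in the SAS code (`study`, `id`, `y`, `u`, `x`), the helper names and numbers are hypothetical, and the slope pseudo-observation is given covariates u = 0 and x = 1, which is what equation (4) requires for its right-hand side to have mean β₁.

```python
import math

def ipd_rows(study, ys, xs):
    """One row per patient: the observed outcome with u = 1 and covariate x."""
    return [{"study": study, "id": f"{study}-{j}", "y": y, "u": 1.0, "x": x}
            for j, (y, x) in enumerate(zip(ys, xs), start=1)]

def summary_rows(study, b0_hat, b1_hat, g00_sq, g01, g11_sq):
    """Two pseudo-observations for a study reporting intercept and slope.

    g00_sq and g11_sq are squared standard errors; g01 is the covariance.
    """
    root_det = math.sqrt(g00_sq * g11_sq - g01 ** 2)
    u2 = g11_sq / root_det
    x2 = -g01 / root_det
    return [
        # Slope estimate treated as an outcome, as in equation (4).
        {"study": study, "id": f"{study}-1", "y": b1_hat, "u": 0.0, "x": 1.0},
        # Linear combination of intercept and slope, as in equation (5).
        {"study": study, "id": f"{study}-2",
         "y": u2 * b0_hat + x2 * b1_hat, "u": u2, "x": x2},
    ]

# Hypothetical input: one summary-only study and one small IPD study.
augdata = (summary_rows("S1", 50.0, -3.0, 4.0, -1.2, 0.9)
           + ipd_rows("S7", [1.0, 2.0], [0.0, 1.0]))
```

In this layout the residual variance of each summary-only study would be the reported slope variance, which is the quantity the PARMS statement holds fixed.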

Addressing statistical heterogeneity of studies in meta-analyses is one of the most fundamental aspects of many systematic reviews. Because the heterogeneity may determine the extent to which the conclusions of a meta-analysis can be generalized, it is important to quantify and test the extent of heterogeneity among the studies used in the meta-analysis. We will address the statistical heterogeneity for both slope and intercept parameters based on model (2).

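We begin with perhaps the most important parameter in the meta-analysis model (2), the slope *β*_{1}. A simple approach to assessing statistical heterogeneity of the slopes across studies in a meta-analysis can be based on the individual estimates from these studies by applying well-established statistical methods.^{4,29} More specifically, let $\widehat{\beta}_{1i}$ be the estimate from study *i* with estimated variance $\widehat{\gamma}_{11i}^{2}$, and let $w_i = 1/\widehat{\gamma}_{11i}^{2}$ be the corresponding weight, *i* = 1, 2, …, *k*. The weighted (fixed effect) estimate of the common slope is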

$${\widehat{\beta}}_{1}=\frac{{\displaystyle \sum _{i=1}^{k}}{w}_{i}{\widehat{\beta}}_{1i}}{{\displaystyle \sum _{i=1}^{k}}{w}_{i}}.$$

A test of homogeneity of the *β*_{1}* _{i}*’s is given by

$$Q=\sum _{i=1}^{k}{w}_{i}{({\widehat{\beta}}_{1i}-{\widehat{\beta}}_{1})}^{2},$$

which has a Chi-squared distribution with *k*−1 degrees of freedom under the assumption of homogeneity with the fixed effect meta-analysis model when the sample sizes from all studies are large. It has been widely realized, however, that this test has poor power when the number of studies in a meta-analysis is small and excessive power to detect clinically insignificant heterogeneity when there are too many studies.^{30}^{,}^{31} Realizing the potential limitations of a statistical test to characterize the degree of heterogeneity in a meta-analysis, Higgins and Thompson^{30} proposed a new measure of heterogeneity in a meta-analysis that overcomes the shortcomings of existing measures. Their focus is on the impact of heterogeneity on the results of a meta-analysis, i.e., on the degree to which conclusions might be generalized to situations outside those investigated in the studies at hand. An application of Higgins and Thompson’s index^{30} to the slope parameter in model (2) gives the index of overall heterogeneity among studies:

$${I}_{\mathit{slope}}^{2}=\frac{{\sigma}_{11}^{2}}{{\gamma}_{11}^{2}+{\sigma}_{11}^{2}},$$

where $\gamma_{11}^{2}$ is the shared within-study variance for the estimated slope among individual studies or, when the studies have differing within-study variances, the ‘typical’ within-study variance in the terminology of Higgins and Thompson,^{30} and $\sigma_{11}^{2}$ is the between-study variance of the study specific slopes $\beta_{1i}$. The estimation of overall heterogeneity among studies in a meta-analysis requires estimates of both the between-study variance and the ‘typical’ within-study variance. For the latter, Higgins and Thompson^{30} proposed the estimate

$${\widehat{\gamma}}_{11}^{2}=\frac{(k-1){\displaystyle \sum _{i=1}^{k}}{w}_{i}}{{({\displaystyle \sum _{i=1}^{k}}{w}_{i})}^{2}-{\displaystyle \sum _{i=1}^{k}}{w}_{i}^{2}}.$$

(6)

Other estimates of
${\gamma}_{11}^{2}$ were also proposed by Xiong et al.^{32} Let
${\widehat{\sigma}}_{11}^{2}$ be the MLE of
${\sigma}_{11}^{2}$ through the mixed effects model (2). An estimate to the index of overall heterogeneity for the slope parameter is

$${I}_{\mathit{slope}}^{2}=\frac{{\widehat{\sigma}}_{11}^{2}}{{\widehat{\sigma}}_{11}^{2}+{\widehat{\gamma}}_{11}^{2}}.$$

(7)
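The quantities in this section can be computed in a few lines. The sketch below assumes the study slopes, their squared standard errors, and an estimate of the between-study variance (for example, from the mixed model fit) are already available; the function name and the numbers are hypothetical.

```python
import numpy as np

def slope_heterogeneity(slopes, variances, sigma11_sq_hat):
    """Pooled slope, Q statistic, 'typical' within-study variance (6), and I^2 (7)."""
    slopes = np.asarray(slopes, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)    # w_i = 1 / gamma_hat_11i^2
    k = len(slopes)
    pooled = np.sum(w * slopes) / np.sum(w)
    q = np.sum(w * (slopes - pooled) ** 2)          # chi-squared, k - 1 df under homogeneity
    typical = (k - 1) * np.sum(w) / (np.sum(w) ** 2 - np.sum(w ** 2))
    i_sq = sigma11_sq_hat / (sigma11_sq_hat + typical)
    return pooled, q, typical, i_sq

# Made-up slopes with a common within-study variance of 4 and
# an assumed between-study variance estimate of 12.
pooled, q, typical, i_sq = slope_heterogeneity(
    [-10.0, -30.0, -50.0], [4.0, 4.0, 4.0], sigma11_sq_hat=12.0)
```

When all studies share the same within-study variance, formula (6) returns exactly that variance, which is the sense in which it defines a typical value when the variances differ.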

A similar statistical approach can be applied to obtain an estimate of the index of overall heterogeneity for the intercept parameters $\beta_{0i}$ as

$${I}_{int\mathit{ercept}}^{2}=\frac{{\widehat{\sigma}}_{00}^{2}}{{\widehat{\sigma}}_{00}^{2}+{\widehat{\gamma}}_{00}^{2}},$$

(8)

where ${\widehat{\sigma}}_{00}^{2}$ is the MLE of ${\sigma}_{00}^{2}$ through the mixed effects model (2), and ${\widehat{\gamma}}_{00}^{2}$ is the estimate of the ‘typical’ within-study variance ${\gamma}_{00}^{2}$:

$${\widehat{\gamma}}_{00}^{2}=\frac{(k-1){\displaystyle \sum _{i=1}^{k}}{w}_{i0}}{{({\displaystyle \sum _{i=1}^{k}}{w}_{i0})}^{2}-{\displaystyle \sum _{i=1}^{k}}{w}_{i0}^{2}},$$

and
${w}_{i0}=1/{\widehat{\gamma}}_{00i}^{2}$. Higgins and Thompson^{30} suggested certain cutoffs to indicate different degree of heterogeneity for which *I*^{2}= 0%, 25%, 50%, and 75% represents no heterogeneity, low heterogeneity, moderate heterogeneity, and high heterogeneity, respectively.

In order to demonstrate our proposed methodology, we present an example to study cerebrospinal fluid (CSF) biomarkers that can be used to identify individuals with high risk of developing Alzheimer’s disease (AD) when they are still cognitively normal. Researchers in AD have identified Apolipoprotein E4 (ApoE4) alleles as a major genetic risk factor of AD^{33}. Although the pathological hallmarks of AD are the neurofibrillary tangles and the senile plaques^{34}^{,}^{35}, the diagnosis of AD in living patients is still largely a clinical judgment based on careful neurological and/or neuropsychological examination combined with results from other clinical tests. Therefore, the search for biomarkers that can be used to differentiate AD from normal aging has been one of the primary research activities in AD. Individuals with AD have been found to have decreased level of cerebrospinal fluid (CSF) *β*-amyloid_{42} as compared to normal nondemented controls^{36}. Further, because AD is a progressive neurodegenerative disorder leading to the death of brain cells that can not be replaced once lost, it is important to assess the potential of these biomarkers to identify individuals that are at high risk of AD while they are still cognitively normal. The importance of such antecedent biomarkers of AD is further highlighted by the fact that no pharmaceutical treatments are effective for the disease’s late stages. We chose to study whether CSF *β*-amyloid_{42} is decreased among individuals of normal nondemented aging who are ApoE4 positive as compared to those who are ApoE4 negative. Although many published studies have compared CSF *β*-amyloid_{42} level between individuals with AD and normal nondemented controls, very few have reported CSF *β*-amyloid_{42} level as a function of ApoE4 status among individuals who are still cognitively normal. 
As a matter of fact, our comprehensive MEDLINE search identified a total of 6 published studies on CSF *β*-amyloid_{42} during the period of 1990 to 2007 which actually reported summary statistics as a function of ApoE4 status for individuals who were not demented^{37}^{–}^{42}. The summary statistics for these 6 published studies are presented in Table 1.

Table 1. Reported Summary Statistics from Six Studies on CSF *β*-amyloid_{42} (in pg/mL) as a Function of ApoE4 Genotype

In a recent study for which the first author served as the statistical data analyst, Fagan et al.^{36} reported data on CSF biomarkers from a relatively large sample of individuals with normal aging and AD from Washington University Alzheimer’s Disease Research Center (WU ADRC). They compared various CSF biomarkers among normal aging and AD at baseline and studied their predictive power of the subsequent development of AD. From the entire sample reported^{36}, we identified a subset of 139 individuals who were nondemented at baseline and whose ApoE4 genotypes were available. We therefore have the individual patient data on CSF *β*-amyloid_{42} from this subset sample as a function of ApoE4 genotype. Out of these 139 nondemented individuals, 89 are ApoE4 negative, and 50 are ApoE4 positive. In order to best address the question whether CSF *β*-amyloid_{42} is decreased among individuals of nondemented aging who are ApoE4 positive as compared to those who are ApoE4 negative, we combined all existing data including the published 6 studies from which only summary statistics were available and the latest individual patient data on the 139 individuals from the WU ADRC, and applied our proposed methodology of meta-analyses to these data (both intercepts and slopes of the published studies used in model (2)). The point estimate for the mean difference of CSF β-amyloid_{42} between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative is −111.07 pg/mL with an estimated standard error of 29.50 pg/mL. A 95% confidence interval estimate to the mean difference of CSF *β*-amyloid_{42} is from −183.26 pg/mL to −38.89 pg/mL. The observed significance level for the observed mean difference is 0.010, indicating a statistically significant difference on mean CSF *β*-amyloid_{42} between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative at the significance level of 5%. 
Instead of providing initial estimates to variance/covariance estimates in COVAR, we also implemented the iterative approach^{27}^{,}^{28} by estimating the fixed effects and the variance/covariance parameters with repeated calls to SAS (R) MIXED procedure until they converge. The new approach resulted in essentially the same parameter estimates as reported above. We also observe that if the individual patient data from 139 nondemented individuals of the WU ADRC were not included in the meta-analyses through our proposed methodology, a traditional random effect meta-analysis on the summary statistics of six published studies^{37}^{–}^{42} would give a point estimate of −31.69 pg/mL to the mean difference of CSF*β*-amyloid_{42} between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative with an asymptotic 95% confidence interval estimate from −128.93 pg/mL to 65.56 pg/mL as reported in Xiong et al.^{32}. Even when the summary statistics of Fagan et al.^{36} were used along with the summary statistics of the six published studies in a traditional random effect meta-analysis, the point estimate to the mean difference of CSF*β*-amyloid_{42} between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative is −52.90 pg/mL with an asymptotic 95% confidence interval estimate from −141.20 pg/mL to 35.3936 pg/mL. The fact that the traditional meta-analyses without incorporating the latest individual patient data failed to detect statistically significant difference on the mean CSF *β*-amyloid_{42} level between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative highlights the importance of combining the latest individual patient data with published summary statistics through our proposed meta-analyses methodology. 
Figure 1 presents a forest plot including the summary statistics and 95% confidence intervals for each individual study and the results of the meta-analysis based on our proposed methodology with the individual patient data of Fagan et al.^{36}

Figure 1. A Forest Plot on the Difference of CSF Aβ42 (in pg/mL) between ApoE4 Positive and Negative Individuals

An estimate to the Higgins-Thompson index of overall heterogeneity^{30} for the slope parameter (i.e., the difference on CSF *β*-amyloid_{42} between two ApoE4 genotypes) is
${I}_{\mathit{slope}}^{2}=70.6\%$, suggesting a moderate degree of heterogeneity among estimated differences on CSF *β*-amyloid_{42} between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative. An estimate to the Higgins-Thompson index of overall heterogeneity^{30} for the intercept parameter (i.e., the mean CSF *β*-amyloid_{42} level for the genotype of ApoE4 negative) is
${I}_{int\mathit{ercept}}^{2}=99.6\%$, indicating a very high degree of heterogeneity for the estimated mean CSF *β*-amyloid_{42} from individuals of normal aging who are ApoE4 negative among the studies used in this meta-analysis. Because age can be a potential confounding factor of CSF *β*-amyloid_{42}^{43}, we further compared the mean level of CSF *β*-amyloid_{42} between individuals of normal aging who are ApoE4 positive and those who are ApoE4 negative after adjusting for the effect of age, and found very similar estimates as reported above.

Our proposed methodology of meta-analyses provides a unified approach that synthesizes medical evidence by using both individual patient data and published summary statistics within the framework of likelihood principle. Most up-to-date scientific evidence on medicine is crucial information not only to consumers but also to decision makers. Such scientific evidence can only be obtained when all existing evidence from the literature and the most up-to-date individual patient data are jointly analyzed statistically. When used appropriately, our proposed methods can help the biomedical researchers to report the most up-to-date evidence on medicine by combining their latest individual patient data with at least summary statistics from already published studies. Further, among studies that only reported summary statistics, our proposed approach can easily deal with the situation when some studies reported only the slope parameter estimate (i.e., the treatment effect in a two-arm clinical trial) along with the relevant standard error and others reported both intercept and slope parameters as well as their associated covariance matrix, the latter of which is equivalent to when summary statistics were reported for both arms individually in two-arm clinical trials. Through an appropriate augmentation of all available data (including both individual patient data and summary statistics), we showed that the standard statistical procedures such as SAS (R) MIXED procedure can be used to implement the maximum likelihood estimates of the parameters of interest and obtain appropriate statistical inferences. Notice that the traditional meta-analyses of random effects models on summary statistics^{26} is a special case of our approach in which all studies have only summary statistics, and the approach proposed by Higgins et al.^{9} is also a special case of our approach in which individual patient data are available from every study used in the meta-analyses. 
We also proposed measures of heterogeneity for both the slope and intercept parameters based on our proposed meta-analysis model. These measures generalize the index of heterogeneity of Higgins and Thompson^{30} and have potentially important interpretative implications in the proposed meta-analyses. Finally, we demonstrated our proposed methodology through a real-life application studying the cerebrospinal fluid (CSF) biomarkers that can be used to identify individuals at high risk of developing Alzheimer’s disease (AD) while they are still cognitively normal. This application offers the most up-to-date statistically significant evidence that the mean level of CSF *β*-amyloid_{42} was decreased in individuals of normal aging who are ApoE4 positive as compared to those who are ApoE4 negative. It also highlights the importance of our proposed methodology, because such statistically significant evidence would not have been achieved had the individual patient data on CSF *β*-amyloid_{42} from 139 individuals of the WU ADRC not been combined with the summary statistics of the six published studies^{37}^{–}^{42} through our meta-analysis model.
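To make the summary-statistics special case discussed above concrete, the following Python sketch pools one study with individual patient data and several studies with published summary statistics. It is a deliberately simplified two-stage illustration, not the one-stage maximum likelihood approach proposed in this paper: the individual patient data are first reduced to a mean and standard error, and the study-level estimates are then combined with the DerSimonian and Laird moment estimator^{2}. The function names, the data values, and the use of $I^{2}=(Q-\mathit{df})/Q$ as the heterogeneity index are all assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def summarize_ipd(values):
    """Reduce one study's individual patient data to (mean, standard error)."""
    n = len(values)
    return mean(values), stdev(values) / math.sqrt(n)


def dersimonian_laird(estimates, std_errors):
    """Pool study-level estimates with the DerSimonian-Laird moment estimator."""
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)

    # Cochran's Q and the moment estimate of the between-study variance.
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * y for wi, y in zip(w_star, estimates)) / sum(w_star)
    pooled_se = math.sqrt(1.0 / sum(w_star))

    # Higgins-Thompson style index, truncated at zero.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"pooled": pooled, "se": pooled_se, "tau2": tau2, "Q": q, "I2": i2}


# Hypothetical data: one study with IPD, three with published (mean, SE).
ipd_study = [10.2, 9.8, 11.1, 10.5, 9.9, 10.7]
published = [(9.5, 0.4), (11.0, 0.5), (10.1, 0.3)]

est, se = summarize_ipd(ipd_study)
estimates = [est] + [m for m, _ in published]
std_errors = [se] + [s for _, s in published]

result = dersimonian_laird(estimates, std_errors)
print(result)
```

A two-stage reduction of this kind discards the patient-level information that the one-stage likelihood approach retains, which is why it should be read only as an illustration of the summary-statistics special case.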

The authors would like to thank Dr. David Holtzman and Dr. Anne Fagan, Department of Neurology, Washington University in St. Louis, for providing the biomarker data used in this paper. The authors also thank the Clinical, Psychometric, and Genetic Cores of the Alzheimer’s Disease Research Center at Washington University for subject assessments. Dr. Xiong’s work was supported by grant K25 AG025189 from the National Institute on Aging and by the Alan A. and Edith Wolff Charitable Trust. Financial support for this study was also provided in part by National Institute on Aging grants AG003991, AG005681, and AG026276 for Chengjie Xiong, J. Philip Miller, and John C. Morris.

1. Egger M, Davey Smith G, Phillips AN. Meta-analysis: principles and procedures. BMJ. 1997;315:1533–1537.

2. DerSimonian R, Laird NM. Meta-analysis in clinical trials. Controlled Clinical Trials. 1986;7:177–188.

3. DerSimonian R, Kacker R. Random-effects model for meta-analysis of clinical trials: an update. Contemporary Clinical Trials. 2007;28:105–114.

4. Whitehead A, Whitehead J. A general parametric approach to the meta-analysis of clinical trials. Stat Med. 1991;10:1665–1677.

5. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22:719–748.

6. Laird NM, Mosteller F. Some statistical methods for combining experimental results. Int J Technol Assess Health Care. 1990;6:5–30.

7. Yusuf S, Peto R, Lewis J, Collins R, Sleight P. Beta blockade during and after myocardial infarction: an overview of the randomized trials. Prog Cardiovasc Dis. 1985;27:335–371.

8. Hedges LV, Olkin I. Statistical Methods for Meta-Analysis. Academic Press; Orlando: 1985.

9. Higgins JPT, Whitehead A, Turner RM, Omar RZ, Thompson SG. Meta-analysis of continuous outcome data from individual patients. Stat Med. 2001;20:2219–2241.

10. Fisher RA. Statistical Methods for Research Workers. Oliver & Boyd; Edinburgh: 1925.

11. Lehmann EL, Casella G. Theory of Point Estimation. 2nd ed. Springer-Verlag; New York: 1998.

12. Stewart LA, Tierney JF. To IPD or not to IPD? Eval Health Prof. 2002;25:76–97.

13. Schmid CH, Stark PC, Berlin JA, et al. Meta-regression detected associations between heterogeneous treatment effects and study-level, but not patient-level, factors. J Clin Epidemiol. 2004;57:683–687.

14. Lambert PC, Sutton AJ, Abrams KR, et al. A comparison of summary patient-level covariates in meta-regression with individual patient data meta-analysis. J Clin Epidemiol. 2002;55:86–94.

15. Berlin J, Santanna J, Schmid C, et al. Anti-Lymphocyte Antibody Induction Therapy Study Group. Individual patient versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Stat Med. 2002;21:371–387.

16. Stewart LA, Clarke MJ. Practical methodology of meta-analyses (overviews) using updated individual patient data. Stat Med. 1995;14:2057–2079.

17. Turner RM, Omar RZ, Yang M, et al. A multilevel model framework for meta-analysis of clinical trials with binary outcomes. Stat Med. 2000;19:3417–3432.

18. Whitehead A, Omar RZ, Higgins JPT, et al. Meta-analysis of ordinal outcomes using individual patient data. Stat Med. 2001;20:2243–2260.

19. Man-Son-Hing M, Wells G. Meta-analysis of efficacy of quinine for treatment of nocturnal leg cramps in elderly people. BMJ. 1995;310:13–17.

20. Moore RA, McQuay HJ. Single-patient data meta-analysis of 3453 postoperative patients: Oral tramadol versus placebo, codeine and combination analgesics. Pain. 1997;69:287–294.

21. The International Study Group on Improving ORS. Impact of glycine-containing ORS solutions on stool output and duration of diarrhea: a meta-analysis of seven clinical trials. Bulletin of the World Health Organization. 1991;69:541–548.

22. Nicolucci A, Carinci F, Graepel JG, Hohman TC, Ferris F, Lachin JM. The efficacy of tolrestat in the treatment of diabetic peripheral neuropathy. A meta-analysis of individual patient data. Diabetes Care. 1996;19:1091–1096.

23. Van Houwelingen HC, Zwinderman K, Stijnen T. A bivariate approach to meta-analysis. Stat Med. 1993;12:2272–2284.

24. Noble B, Daniel JW. Applied Linear Algebra. Prentice-Hall Inc; Englewood Cliffs NJ: 1977.

25. Littell RC, Milliken GA, Stroup WW, Wolfinger RD. SAS System for Mixed Models. SAS Institute Inc; Cary NC: 1996.

26. Van Houwelingen HC, Arends LR, Stijnen T. Advanced methods in meta-analysis: multivariate approach and meta-regression. Stat Med. 2002;21:589–624.

27. Arends LR, Hunink MGM, Stijnen T. Meta-analysis of summary survival curve data. Stat Med. 2008;27:4381–4396.

28. Dear KBG. Iterative generalized least squares for meta-analysis of survival data at multiple times. Biometrics. 1994;50:989–1002.

29. Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10:101–129.

30. Higgins JPT, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med. 2002;21:1539–1558.

31. Hardy RJ, Thompson SG. Detecting and describing heterogeneity in meta-analysis. Stat Med. 1998;17:841–856.

32. Xiong C, Gao F, Yan Y, et al. On Measuring Overall Heterogeneity in Meta-analyses-Application to CSF Biomarker Studies in Alzheimer’s Disease. Journal of Modern Applied Statistical Methods. 2008 in press.

33. Myers RH, Schaefer EJ, Wilson PWF, et al. Apolipoprotein E *ε* 4 association with dementia in a population-based study: The Framingham study. Neurology. 1996;46:673–677.

34. Braak H, Braak E. Neuropathologic staging of Alzheimer-related changes. Acta Neuropathologica. 1991;82:239–259.

35. McKeel DW, Jr, Price JL, Miller JP, Grant EA, Xiong C, Berg L, Morris JC. Neuropathologic criteria for diagnosing Alzheimer disease in persons with pure dementia of Alzheimer type. Journal of Neuropathology and Experimental Neurology. 2004;63(10):1028–1037.

36. Fagan AM, Roe CM, Xiong C, et al. Cerebrospinal fluid tau/*β*-amyloid_{42} ratio as a prediction of cognitive decline in nondemented older adults. Arch Neurol. 2007;64:343–349.

37. Sunderland T, Mirza N, Putnam KT, et al. Cerebrospinal fluid *β*-amyloid_{1–42} and tau in control subjects at risk for Alzheimer’s disease: The effect of apoE*ε* 4 allele. Biol Psychiatry. 2004;56:670–676.

38. Jensen M, Schroder J, Blomberg M, et al. Cerebrospinal fluid *Aβ*42 is increased early in sporadic Alzheimer’s disease and declines with disease progression. Ann Neurol. 1999;45:504–511.

39. Andreasen N, Hesse C, Davidsson P, et al. Cerebrospinal fluid *β*-amyloid_{(1–42)} in Alzheimer’s disease: differences between early- and late-onset Alzheimer’s disease and stability during the course of disease. Arch Neurol. 1999;56:673–680.

40. Tapiola T, Pirttila T, Mehta PD, et al. Relationship between apoE genotype and CSF *β*-amyloid (1–42) and tau in patients with probable and definite Alzheimer’s disease. Neurobiology of Aging. 2000;21:735–740.

41. Riemenschneider M, Schmolke M, Lautenschlager N, et al. Cerebrospinal beta-amyloid_{(1–42)} in early Alzheimer’s disease: association with apolipoprotein E genotype and cognitive decline. Neuroscience Letters. 2000;284:85–88.

42. Prince JA, Zetterberg H, Andreasen N, et al. APOE ε 4 allele is associated with reduced cerebrospinal fluid levels of Aβ42. Neurology. 2004;62:2116–2118.

43. Peskind ER, Li G, Shofer J, et al. Age and Apolipoprotein E*4 allele effects on cerebrospinal fluid *β*-amyloid 42 in adults with normal aging. Arch Neurol. 2006;63:936–939.
