Current large-scale studies focus on screening multiple biomarkers to improve risk assessment, prediction and management of disease. Such efforts require assaying stored biospecimens from several thousand subjects to measure levels of numerous biomarkers. Most, if not all, of these biomarkers turn out to be unassociated with the disease,16
resulting in loss of bio-resources that may be irreplaceable. In such situations, specimen pooling offers a practical strategy that not only reduces overall assay cost but also conserves specimens for future analyses of additional biomarkers yet to be identified as informative.
A pooling strategy was proposed earlier to estimate the individual-level odds ratio associated with a continuous exposure in unmatched or frequency-matched case-control studies.10
We extended that work and showed that unbiased estimation of the individual-level odds ratio parameter can be based on pooled exposure measurements for a fine-matched case-control study. If pooling sets are formed in a way that maintains the matching, a conditional logistic model applies, and ORs can be estimated using standard software. Unlike the method of Weinberg and Umbach,10
no offset adjustment is needed. Confounder effects can be estimated similarly and, in the presence of a potential effect modifier, only pairs with concordant values of the effect modifier should be pooled.
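As a rough illustration of the pooled conditional analysis, the following Python sketch simulates 1:1 matched pairs, pools g pairs at a time while keeping cases and controls separate, and fits the conditional logistic model to the pooled sums. The simulation design (normal exposures with unit variance, a stratum-specific mean, the value of beta) and all function names are assumptions made for this example, not part of the original analysis. The pooled sum is taken as g times the measured pooled average.

```python
import math
import random


def simulate_pairs(n_pairs, beta, rng):
    """Simulate 1:1 matched pairs as (case_exposure, control_exposure).

    Assumption: within each matched stratum, control exposure is normal with
    unit variance around a stratum-specific mean; under a logistic model the
    case exposure is then normal with the mean shifted by beta.
    """
    pairs = []
    for _ in range(n_pairs):
        stratum_mean = rng.gauss(0.0, 1.0)  # nuisance level shared by the pair
        control = rng.gauss(stratum_mean, 1.0)
        case = rng.gauss(stratum_mean + beta, 1.0)
        pairs.append((case, control))
    return pairs


def pool_pairs(pairs, g):
    """Pool g matched pairs: sum case and control exposures separately.

    Pairs are simulated in random order, so consecutive chunks act as
    randomly formed pooling sets; incomplete trailing chunks are dropped.
    """
    pools = []
    for start in range(0, len(pairs) - g + 1, g):
        chunk = pairs[start:start + g]
        v1 = sum(case for case, _ in chunk)
        v0 = sum(control for _, control in chunk)
        pools.append((v1, v0))
    return pools


def fit_conditional_logistic(sets, n_iter=50, tol=1e-10):
    """Newton-Raphson for beta, where each set contributes
    1 / (1 + exp(-beta * (v1 - v0))) to the conditional likelihood."""
    beta = 0.0
    for _ in range(n_iter):
        score = 0.0
        info = 0.0
        for v1, v0 in sets:
            d = v1 - v0
            p = 1.0 / (1.0 + math.exp(-beta * d))
            score += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


rng = random.Random(2024)
pairs = simulate_pairs(3000, beta=0.5, rng=rng)
beta_individual = fit_conditional_logistic(pairs)
beta_pooled = fit_conditional_logistic(pool_pairs(pairs, g=3))
print(f"individual-level estimate: {beta_individual:.3f}")
print(f"pooled (g = 3) estimate:   {beta_pooled:.3f}")
```

Under these assumptions both estimates should lie near the simulated value of 0.5, with the pooled fit using one third as many assays. The same fit could be obtained from standard conditional logistic software by supplying the pooled sums as the exposure variable.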
Conditional logistic regression involves likelihood-based inference, and the unbiasedness of estimates and the validity of tests and confidence intervals require a large number of observations in the analysis. With small numbers, bias can be severe.17
We observed some bias, generally away from the null, which increased as the pool size increased. This bias is attributable to the decreasing number of pooling sets, which reduces the effective sample size. Increasing the number of strata in our simulations by ten-fold eliminated detectable bias (data not shown). Nevertheless, when employing pooled exposure assessment, one should consider this issue and may need to adopt alternative estimators to cope with small numbers.17
Pooled analysis may also be applicable to exposures measured externally in the environment rather than in blood or serum specimens. Environmental exposures such as pesticides, arsenic, or lead can be measured in household dust, tap water, or ambient air. Depending on the medium (dust or water, say) samples from different sources could be combined on a weight or volume basis to form a pooled sample, and exposure assessed in the pooled sample would be used for estimation of odds ratios as outlined here.
The approach we have described might be useful for case-control studies with alternative matching schemes, e.g., randomized recruitment.18–19
In such designs, pools should additionally be stratified by first-stage or screening variables. The implications of pooling in such designs remain to be explored.
Pooling presents both advantages and limitations and may not be the best approach for every study. As mentioned before, when a small fraction of specimens falls under the LOD, then pooling may further reduce this fraction, because a mean of g sample values has the same mean but smaller variance than an individual value. For example, if the exposure is distributed as a standard log-normal random variable and the LOD is 0.431, then 20% of the individually-measured exposures fall below the LOD. If 3 random subjects were pooled, then only about 3.4% of the pooled observations would fall below the LOD. In contrast, when a large proportion of specimens fall under the LOD (for example, if the population mean is actually below the LOD for the assay), then pooling can even exacerbate the problem.
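The log-normal example can be checked numerically. The sketch below computes the individual-level fraction below the LOD exactly and approximates the pooled fraction by Monte Carlo. It assumes an idealized, error-free assay in which a pooled measurement equals the arithmetic mean of its constituents; the function name, seed, and number of replicates are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist

LOD = 0.431  # limit of detection used in the example above


def fraction_below_lod(pool_size, n_pools, rng, lod=LOD):
    """Monte Carlo share of pooled measurements that fall below the LOD.

    Each pooled measurement is the arithmetic mean of pool_size independent
    standard log-normal exposures (assumption: no assay error).
    """
    below = 0
    for _ in range(n_pools):
        total = sum(rng.lognormvariate(0.0, 1.0) for _ in range(pool_size))
        if total / pool_size < lod:
            below += 1
    return below / n_pools


# Exact individual-level probability: P(exp(Z) < LOD) = Phi(log(LOD)).
exact_individual = NormalDist().cdf(math.log(LOD))

rng = random.Random(12345)
simulated_individual = fraction_below_lod(1, 100_000, rng)
simulated_pooled = fraction_below_lod(3, 100_000, rng)

print(f"exact individual fraction below LOD:     {exact_individual:.3f}")
print(f"simulated individual fraction below LOD: {simulated_individual:.3f}")
print(f"simulated pooled (g = 3) fraction:       {simulated_pooled:.3f}")
```

The individual-level fractions should come out near 20%, and the pooled fraction should be several-fold smaller, in line with the figures quoted above. Raising the LOD above the population mean in this sketch reverses the comparison, which illustrates the second scenario.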
Another limitation worth noting is that transformations are difficult to accommodate in a set-based analysis. The argument in the eAppendix (http://links.lww.com) remains correct if, in the underlying logistic model (1), u is replaced by some non-linear function of u such as log(u), or, generically, h(u). Then, the variable v1 in the likelihood contribution (3) becomes Σh(u1i), with a similar redefinition for v0. The resulting likelihood contribution for pools is still correct. However, a difficulty arises because what is actually measured in a pooled specimen, namely, the average of the g constituent individual concentrations, allows one to measure Σu1i but not Σh(u1i). Naïvely substituting h(Σu1i) for Σh(u1i) leads to bias because the two differ. Hence, one cannot employ pooled analysis to estimate coefficients for a non-linear transformation of exposure. Under the null hypothesis of no exposure effect (β = 0), however, the likelihood contribution of (3) is 0.5 even for the naïve substitution. Thus, a pooled analysis can still provide a valid test of the null hypothesis for a non-linear transformation of exposure, even though it cannot provide an unbiased estimate of the effect. A related feature is that one can test departures from the assumed linear relationship between log-odds of disease and exposure by testing whether additional polynomial terms improve fit; however, if fit improves, the additional terms lack meaningful interpretations.
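A small numerical sketch may make the transformation problem concrete. It takes h to be the natural logarithm and uses made-up concentrations for one pooling set of three matched pairs; the values and the function name are hypothetical. It compares the sum of transformed individual values, which the model requires, with the transform of the pooled total, which is all a pooled assay can supply, and evaluates the 1:1 matched likelihood contribution under both.

```python
import math


def pooled_contribution(beta, v1, v0):
    """Conditional-likelihood contribution of one pooled 1:1 matched set."""
    return 1.0 / (1.0 + math.exp(-beta * (v1 - v0)))


# Hypothetical concentrations for g = 3 cases and their matched controls.
case_u = [0.8, 2.5, 6.0]
control_u = [0.5, 1.2, 3.1]

h = math.log  # the non-linear transformation of exposure

# What the model with h(u) requires: sums of transformed individual values.
needed_v1 = sum(h(u) for u in case_u)
needed_v0 = sum(h(u) for u in control_u)

# What a pooled assay can offer: the transform of the pooled total.
naive_v1 = h(sum(case_u))
naive_v0 = h(sum(control_u))

print(f"sum of logs (cases, controls): {needed_v1:.3f}, {needed_v0:.3f}")
print(f"log of sums (cases, controls): {naive_v1:.3f}, {naive_v0:.3f}")

for beta in (0.0, 1.0):
    correct = pooled_contribution(beta, needed_v1, needed_v0)
    naive = pooled_contribution(beta, naive_v1, naive_v0)
    print(f"beta = {beta}: correct = {correct:.3f}, naive = {naive:.3f}")
```

At beta = 0 both versions give a contribution of 0.5, which is why the test of the null hypothesis remains valid, whereas for non-zero beta the two contributions differ.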
Another limitation is that effect modifiers must be identified and accommodated a priori in the formation of pooling sets. Post-hoc analysis of effect modification between the exposure and another variable would require that the pooling be designed afresh. Continuous effect modifiers, even when known in advance, would require categorization, leading to coarsening and possibly loss of power. Multiple effect modifiers require that pooling be done within strata defined by cross-classified levels of effect modifiers. For example, if both age (young vs. old) and sex (male vs. female) are considered potential effect modifiers, then pooling sets are formed separately within the four age-sex groups. On the other hand, if additional covariates are confounders but not potential effect modifiers, the pooling sets are formed at random, without regard to the age and sex variables, and the value of the confounder for a pool is assigned by summing over values measured on individuals in the pool.
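The formation of pooling sets described above can be sketched in a few lines of Python. The data layout (one dictionary per matched pair, with the effect modifiers stored at the pair level), the confounder name "bmi", and the function names are hypothetical and chosen only for illustration. Pairs are grouped by the cross-classified levels of the effect modifiers, shuffled within each group, and cut into sets of size g; confounder values are then summed within each pool.

```python
import random
from collections import defaultdict


def form_pooling_sets(pairs, g, modifier_keys, rng):
    """Group matched pairs into pooling sets of size g within modifier strata.

    pairs: list of dicts, one per matched case-control pair (hypothetical layout).
    modifier_keys: names of categorical effect modifiers shared by the pair.
    Pairs that do not fill a complete set are returned separately as leftovers.
    """
    strata = defaultdict(list)
    for pair in pairs:
        strata[tuple(pair[k] for k in modifier_keys)].append(pair)

    pooling_sets, leftovers = [], []
    for level, members in strata.items():
        rng.shuffle(members)  # random assignment within the stratum
        n_full = len(members) // g * g
        for start in range(0, n_full, g):
            pooling_sets.append((level, members[start:start + g]))
        leftovers.extend(members[n_full:])
    return pooling_sets, leftovers


def pooled_confounder(pool, key):
    """Pool-level confounder values: sums over the cases and over the controls."""
    case_sum = sum(pair["case"][key] for pair in pool)
    control_sum = sum(pair["control"][key] for pair in pool)
    return case_sum, control_sum


rng = random.Random(7)
pairs = [
    {
        "age": rng.choice(["young", "old"]),
        "sex": rng.choice(["male", "female"]),
        "case": {"bmi": rng.gauss(27.0, 4.0)},
        "control": {"bmi": rng.gauss(26.0, 4.0)},
    }
    for _ in range(50)
]
pooling_sets, leftovers = form_pooling_sets(
    pairs, g=3, modifier_keys=("age", "sex"), rng=rng
)
for level, pool in pooling_sets[:3]:
    print(level, pooled_confounder(pool, "bmi"))
```

How to handle the leftover pairs (for example, analyzing them individually or in smaller pools) is a design decision that this sketch leaves open.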
These limitations notwithstanding, pooling strategies can greatly reduce the cost of a matched study, both in money and in depletion of irreplaceable specimens, while incurring minimal loss of statistical power or precision.