NIH Public Access Author Manuscript; accepted for publication in a peer-reviewed journal.
 
Epidemiology. Author manuscript; available in PMC Apr 11, 2011.
Published in final edited form as:
PMCID: PMC3073304
NIHMSID: NIHMS281777
Use of Multiple Assays Subject to Detection Limits With Regression Modeling in Assessing the Relationship Between Exposure and Outcome
Paul S. Albert (a), Ofer Harel (b), Neil Perkins (c), and Richard Browne (d)
(a) Biostatistics and Bioinformatics Branch, Division of Epidemiology, Statistics, and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Rockville, MD
(b) Department of Statistics, University of Connecticut, Storrs, CT
(c) Epidemiology Branch, Division of Epidemiology, Statistics, and Prevention, Eunice Kennedy Shriver National Institute of Child Health, Rockville, MD
(d) Department of Clinical Laboratory Sciences, University at Buffalo, State University of New York, Buffalo, NY.
Correspondence: Paul S. Albert, Biostatistics and Bioinformatics Branch, Division of Epidemiology, Statistics, and Prevention Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, 6100 Executive Blvd Room 7B05F, Bethesda, MD 20906. albertp/at/mail.nih.gov.
Background
The goal of many studies in environmental epidemiology is to assess the relationship between chemical exposure and disease outcome. Often various assays can be used to measure a particular environmental exposure, with some assays being more invasive or expensive than others.
Methods
We consider the situation in which 2 assays can be used to measure an environmental exposure. The first assay has measurement error and is subject to a lower detection limit (LOD), and the second assay has less measurement error and is not subject to a lower LOD. In this situation, the first assay is less invasive or less expensive and is measured in all study participants, whereas the second assay is more invasive or more expensive and is only measured in a subset of individuals. We develop a flexible class of regression models that incorporates both sets of assay measurements and allows for continuous or binary outcomes. We explore different design strategies for selecting the subset of patients in whom to measure the second assay. One design strategy is to measure the second more invasive or expensive assay only when the first assay is below LOD. We compare these designs with a simple design in which the second assay is measured in a random subset of patients without regard to the results of the first assay.
Results
We develop estimation approaches for these regression models. We demonstrate through simulations that there are efficiency advantages of measuring the second assay in at least a fraction of cases in which the first assay is above LOD. We illustrate the methodology by using data from a study examining the effect of environmental polychlorinated biphenyl exposure on the risk of endometriosis.
Conclusion
The proposed methodology has good statistical properties and will be a useful methodological technique for studying the effect of exposure on outcome when exposure assays are subject to LOD.
Biomarkers are commonly used as outcomes or measures of exposure. These biomarkers are obtained with assays that are known to have measurement error and known limits of detection (LOD). For some biomarkers, there are multiple assays measuring the same exposure, with one assay potentially being more invasive or more expensive than another. For example, in a recent study [1,2] examining the relationship between polychlorinated biphenyl (PCB) environmental agents and the risk of endometriosis, serum measurements of numerous PCB congeners were obtained in all study participants before laparoscopic surgery to diagnose endometriosis. In addition, in about 20% of participants, PCB exposure was measured from adipose tissue that was sampled during laparoscopic surgery. Assay measurements of PCB exposure obtained from adipose tissue are considered to have less measurement error than serum-based assays. A complicating factor is that the less invasive serum assays are subject to LODs, whereas assays measured from more invasive adipose tissue samples are not. The focus of this article is to propose new statistical models that incorporate exposure data measured from assays on the 2 different sample sources.
In the PCB exposure/endometriosis study, adipose tissue was sampled in the first 15 patients (of 79 total study participants). Aside from possible accrual effects, this can be viewed as a completely random sample. Alternative designs are possible in which the probability of measuring the tissue sample depends on the value of the serum-based assay. In this paper, we will compare designs in which we observe the tissue sample (1) only when the serum-based assay is outside detectable limits, (2) when the serum-based assay is outside detectable limits and with a fixed probability in other cases, and (3) completely at random. All these designs focus on the situation for which the sampling mechanism of the tissue sample is fixed by design. Namely, we could obtain the tissue sample in all patients, but choose not to, due to the fact that we wish to limit the proportion of patients who will undergo an invasive procedure.
The proposed methodology is closely related to the classic measurement error in covariates problem, where covariates are measured with error and repeated measurements of surrogates are available [3]; the proposed model is a structural model in which a parametric model is assumed for the true covariate values (compared with a functional model in which the true covariates are assumed fixed or are assumed random with only minimal assumptions made about the distribution of the covariate). The methodology extends this framework by (1) incorporating LODs in the assays and (2) modeling systematic differences between the multiple assays.
The outline for the paper is as follows. In the second section, entitled “Modeling Framework,” we propose a regression modeling approach that includes multiple linear regression and logistic regression for continuous and binary outcome variables, respectively. We analyze data from the PCB/endometriosis study in “Example.” Simulations evaluating the efficiency of different designs are presented in the fourth section, “Simulations: Evaluation of Designs.” A discussion follows.
Modeling Framework
We will assume that the measurements from 2 assays, denoted as $X_{i1}$ and $X_{i2}$, have measurement errors $\delta_{1i}$ and $\delta_{2i}$, where the measurement errors have normal distributions with mean 0 and variances $\sigma_{\delta_1}^2$ and $\sigma_{\delta_2}^2$, respectively. The value $X_{i1}$ will be from an inexpensive or noninvasive assay (the serum-based assay in the PCB/endometriosis study), whereas the value $X_{i2}$ will be from an expensive or invasive assay (the tissue-based assay). For simplicity, these 2 assays will be described as the inexpensive and expensive assay, respectively. We assume that the means of $X_{i1}$ and $X_{i2}$ are $\Delta(X_i)$ and $X_i$, respectively, where $X_i$ is the “true” measurement without measurement error, and $\Delta(X)$ is a function of $X$ that characterizes the calibration between the 2 assay values. We assume that $X_i$ is normally distributed with mean $\mu_x$ and variance $\sigma_x^2$. When $\Delta(X) = X + \Delta$, the 2 assays differ only by an additive constant; in this case, both assay measurements are parallel to each other. Another function we consider is $\Delta(X) = \Delta_0 + \Delta_1 X$, where the functional relationship between the 2 assays has both an additive and a multiplicative component.
In this paper, we focus on the situation in which the initial assay is subject to a lower LOD, and the more refined expensive or invasive assay is not subject to an LOD. We assume that $X_{i1} = \Delta(X_i) + \delta_{1i}$ is observed when $\Delta(X_i) + \delta_{1i} > C$ and that $X_{i1}$ is left censored at $C$ when $\Delta(X_i) + \delta_{1i} \le C$, where $C$ is the lower LOD for the inexpensive assay. Although we refer to these limits as LOD, others have called these limits quantification limits.
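The measurement model above can be illustrated with a short simulation. This is a minimal sketch, not the authors' code: the function name `simulate_assays`, the parameter values, and the choice to encode a censored first-assay value as `None` are all assumptions made for illustration, and the calibration function is taken to be the linear form Δ(X) = Δ0 + Δ1X.

```python
import random

def simulate_assays(n, mu_x, sigma_x, delta0, delta1,
                    sigma_d1, sigma_d2, lod, seed=0):
    """Draw (x1, x2) pairs; x1 is None when the first assay is at or below the LOD."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)                           # true exposure X_i
        raw1 = delta0 + delta1 * x + rng.gauss(0.0, sigma_d1)  # Delta(X_i) + delta_1i
        x2 = x + rng.gauss(0.0, sigma_d2)                      # second assay, no LOD
        x1 = raw1 if raw1 > lod else None                      # left censored at C = lod
        rows.append((x1, x2))
    return rows

# Hypothetical parameter values: the LOD sits at the mean of the first assay,
# so about one-half of the first-assay values fall below it.
rows = simulate_assays(n=2000, mu_x=0.0, sigma_x=1.0, delta0=0.0, delta1=1.0,
                       sigma_d1=1.0, sigma_d2=0.5, lod=0.0)
frac_below = sum(x1 is None for x1, _ in rows) / len(rows)
```

Placing the LOD at the mean of the first assay mirrors the simulation setting in which one-half of the initial assay values are below LOD.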
Simple Linear Regression
The relationship between the underlying true biologic measurement Xi and a continuous outcome variable Yi can be examined using the following simple linear regression model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{1}$$
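where $\varepsilon_i$ is a residual error term that is assumed to be normally distributed with mean zero and variance $\sigma^2$. We assume that $\varepsilon_i$, $\delta_{1i}$, $\delta_{2i}$, and $X_i$ are all independent of each other.
When neither of the 2 assays is subject to LOD, the likelihood can be formulated by noting that $(Y_i, X_{i1}, X_{i2})$, the outcome together with the 2 assay measurements, is multivariate normal with mean $\mu$ and variance $\Sigma$, where $\mu = (\beta_0 + \beta_1\mu_x, E[\Delta(X)], \mu_x)'$ and where
$$\Sigma = \begin{pmatrix} \beta_1^2\sigma_x^2 + \sigma^2 & \beta_1\,\mathrm{Cov}[X, \Delta(X)] & \beta_1\sigma_x^2 \\ \beta_1\,\mathrm{Cov}[X, \Delta(X)] & \mathrm{Var}[\Delta(X)] + \sigma_{\delta_1}^2 & \mathrm{Cov}[X, \Delta(X)] \\ \beta_1\sigma_x^2 & \mathrm{Cov}[X, \Delta(X)] & \sigma_x^2 + \sigma_{\delta_2}^2 \end{pmatrix}$$
where $E[\Delta(X)] = \Delta_0 + \Delta_1\mu_x$, $\mathrm{Var}[\Delta(X)] = \Delta_1^2\sigma_x^2$, and $\mathrm{Cov}[X, \Delta(X)] = \Delta_1\sigma_x^2$ when $\Delta(X) = \Delta_0 + \Delta_1 X$.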
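As a worked instance of these moments, the sketch below builds the mean vector and covariance matrix of the outcome and the 2 assay measurements from the model parameters, assuming the linear calibration Δ(X) = Δ0 + Δ1X and the independence assumptions stated above. The function name `model_moments`, the argument names, and the numeric values are hypothetical.

```python
def model_moments(beta0, beta1, mu_x, var_x, delta0, delta1,
                  var_eps, var_d1, var_d2):
    """Mean vector and covariance matrix of (Y, X1, X2) when no LOD applies."""
    cov_x_dx = delta1 * var_x        # Cov[X, Delta(X)]
    var_dx = delta1 ** 2 * var_x     # Var[Delta(X)]
    mu = [beta0 + beta1 * mu_x, delta0 + delta1 * mu_x, mu_x]
    sigma = [
        [beta1 ** 2 * var_x + var_eps, beta1 * cov_x_dx, beta1 * var_x],
        [beta1 * cov_x_dx, var_dx + var_d1, cov_x_dx],
        [beta1 * var_x, cov_x_dx, var_x + var_d2],
    ]
    return mu, sigma

# Hypothetical values chosen so the entries are easy to check by hand.
mu, sigma = model_moments(beta0=1.0, beta1=2.0, mu_x=0.5, var_x=1.0,
                          delta0=0.0, delta1=1.0, var_eps=1.0,
                          var_d1=4.0, var_d2=1.0)
```

Each off-diagonal entry comes only from the shared true exposure, because the residual and the 2 measurement errors are mutually independent.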
The likelihood that we need to maximize to obtain maximum likelihood estimates of the model parameters is $L = \prod_{i=1}^{I} L_i$, where the individual contribution to the likelihood is $L_i = f(Y_i, X_{i1}, X_{i2})$, the trivariate normal density with mean $\mu$ and variance $\Sigma$. This likelihood was maximized using the quasi-Newton–Raphson procedure in GAUSS [4].
When the first assay measurement $X_{i1}$ is subject to an LOD, the appropriate likelihood to maximize is $L = \prod_{i=1}^{I} f(Y_i, X_{i1}, X_{i2})^{1\{X_{i1} > C\}} \left[ f(Y_i, X_{i2})\, P(X_{i1} \le C \mid Y_i, X_{i2}) \right]^{1\{X_{i1} \le C\}}$, where $f(Y_i, X_{i2})$ is a bivariate normal density with mean given by the first and third elements of $\mu$ and variance given by the first and third rows and columns of $\Sigma$, and where $P(X_{i1} \le C \mid Y_i, X_{i2})$ is a univariate Gaussian cumulative distribution function, obtained by noting that the conditional distribution of one component of a jointly normal random vector given the others is normal.
This likelihood assumes that the second assay is performed in all participants. In the PCB/endometriosis example, the second assay is more invasive and is only performed in a subset of patients. We consider the situation in which the initial assay is performed in all participants and the second assay is performed in only a subset of patients. Define Zi as an indicator of whether the second assay is performed in the ith patient (ie, the second assay is performed when Zi = 1 and is not performed when Zi = 0). This article considers the case in which the second assay could be observed in all patients, but we only perform the second assay in a subset of these patients due to cost or difficulty in performing this assay. For this type of “missing by design,” the missing data mechanism is either missing completely at random (ie, we observe the second assay without regard to the results of the first assay) or missing at random (ie, the probability of measuring the second assay depends on the value of the first assay). Under a missing data mechanism in which the second assay is observed either completely at random or at random, the individual contribution to the likelihood can be written as
equation M25
(2)
The likelihood corresponding to this design can be written as
equation M26
(3)
where equation M27, and where equation M28. The cumulative probability $P(X_{i1} \le C \mid Y_i)$ is a cumulative normal because $(Y_i, X_{i1})$ is bivariate normal, and therefore, $X_{i1}$ given $Y_i$ has a normal distribution.
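As a hedged sketch, the individual likelihood contribution under this design can be organized into four cases, according to whether the second assay was measured ($Z_i$) and whether the first assay is above the LOD $C$. The notation is assumed for illustration and may differ from the typeset form of equations (2) and (3): $X_{i1}$ and $X_{i2}$ denote the first and second assay measurements, and $f(\cdot)$ denotes the joint normal density of its arguments. Because the second assay is missing by design (completely at random or at random), the selection probabilities do not involve the model parameters and are omitted.

```latex
\[
L_i =
\begin{cases}
f(Y_i, X_{i1}, X_{i2}) & \text{if } Z_i = 1,\ X_{i1} > C, \\[4pt]
f(Y_i, X_{i2})\, P(X_{i1} \le C \mid Y_i, X_{i2}) & \text{if } Z_i = 1,\ X_{i1} \le C, \\[4pt]
f(Y_i, X_{i1}) & \text{if } Z_i = 0,\ X_{i1} > C, \\[4pt]
f(Y_i)\, P(X_{i1} \le C \mid Y_i) & \text{if } Z_i = 0,\ X_{i1} \le C.
\end{cases}
\]
```

Under the simple linear regression model every density and conditional probability in this display is normal, so each case can be evaluated in closed form from the corresponding sub-blocks of the mean vector and covariance matrix.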
Polynomial Regression
The linear relationship between exposure and outcome may not be appropriate in certain applications. Equation (1) can be generalized to allow for a more flexible relationship between exposure and outcome,
$$Y_i = \beta_0 + g(X_i) + \varepsilon_i \tag{4}$$
where $g$ may be a linear ($g(x) = \beta_1 x$), quadratic ($g(x) = \beta_1 x + \beta_2 x^2$), cubic ($g(x) = \beta_1 x + \beta_2 x^2 + \beta_3 x^3$), or cubic spline function ($g(x) = \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{k=1}^{K} \beta_{3+k} (x - c_k)_+^3$, where $K$ is the number of knot points and $c_k$, $k = 1, 2, \ldots, K$, are the positions of the knot points). Unfortunately, unlike for the simple linear regression model presented previously, the joint distribution of the outcome and the 2 assay measurements, $(Y_i, X_{i1}, X_{i2})$, is not multivariate normal when $g$ is nonlinear, even when the 2 assay measurements are not subject to LODs. However, the model parameters can be estimated by maximum likelihood by using (3) where
equation M34
(5)
equation M35
(6)
equation M36
(7)
and
equation M37
(8)
where $P(\ldots \mid X)$ denotes $P(\ldots \mid X_i = X)$; $P(X)$, denoting $P(X_i = X)$, is a normal density with mean $\mu_x$ and variance $\sigma_x^2$; $P(Y_i \mid X)$ is a normal density with mean $\beta_0 + g(X)$ and variance $\sigma^2$; $P(X_{i2} \mid X)$, for the second assay measurement $X_{i2}$, is a normal density with mean $X$ and variance $\sigma_{\delta_2}^2$; and $P(X_{i1} \le C \mid X)$, for the first assay measurement $X_{i1}$, is a cumulative normal distribution function with mean $\Delta(X)$ and variance $\sigma_{\delta_1}^2$. We evaluated these integrals numerically by using a simple trapezoidal rule with 400 function values between –5 and 5. This approach worked well in simulations.
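The numerical integration described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' GAUSS implementation: it assumes each likelihood term is the integral over the true exposure X of the product of the component densities listed above, it assumes the linear calibration Δ(X) = Δ0 + Δ1X, and the names `trapezoid`, `contribution`, and the keys of `params` are hypothetical. A first-assay value at or below the LOD is passed as `None`, as is an unmeasured second assay.

```python
from statistics import NormalDist

def trapezoid(f, lo=-5.0, hi=5.0, n=400):
    """Trapezoidal rule using n equally spaced function values on [lo, hi]."""
    h = (hi - lo) / (n - 1)
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n - 1):
        total += f(lo + k * h)
    return total * h

def contribution(y, x1, x2, p, g):
    """Likelihood contribution of one participant, integrating out the true X.

    x1 is None when the first assay is at or below the LOD p["C"];
    x2 is None when the second assay was not measured.
    """
    def integrand(x):
        val = NormalDist(p["mu_x"], p["sd_x"]).pdf(x)            # P(X)
        val *= NormalDist(p["b0"] + g(x), p["sd_eps"]).pdf(y)    # P(Y | X)
        assay1 = NormalDist(p["d0"] + p["d1"] * x, p["sd_d1"])   # first assay given X
        val *= assay1.cdf(p["C"]) if x1 is None else assay1.pdf(x1)
        if x2 is not None:
            val *= NormalDist(x, p["sd_d2"]).pdf(x2)             # P(X2 | X)
        return val
    return trapezoid(integrand)

# Hypothetical parameter values on a standardized exposure scale.
params = {"mu_x": 0.0, "sd_x": 1.0, "b0": 0.0, "sd_eps": 1.0,
          "d0": 0.0, "d1": 1.0, "sd_d1": 1.0, "sd_d2": 0.5, "C": 0.0}
linear_g = lambda x: 1.0 * x   # g(x) = beta1 * x with beta1 = 1
value = contribution(0.3, None, 0.1, params, linear_g)
```

The fixed grid on [–5, 5] presumes that the true exposure is on a roughly standardized scale; a wider or shifted range would be needed otherwise.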
Logistic Regression
The model can be generalized to allow for a nonnormal outcome. We focus our attention on the case where Yi is a binary random variable. We consider the logistic regression model,
$$\mathrm{logit}\, P(Y_i = 1 \mid X_i) = \beta_0 + \beta_1 X_i \tag{9}$$
As before, $X_i$ is measured using 2 assays ($X_{i1}$ and $X_{i2}$), both subject to measurement error and the first subject to a lower LOD. Maximum likelihood estimation follows using (3) and (5)–(8), with $P(Y_i \mid X)$ being the probability of either a positive or negative outcome, where $P(Y_i = 1 \mid X) = \exp(\beta_0 + \beta_1 X)/\{1 + \exp(\beta_0 + \beta_1 X)\}$.
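For a binary outcome, the only change to the likelihood terms is the outcome component. A minimal sketch of that component is shown below; the function name `outcome_prob` is hypothetical, and the function would take the place of the normal outcome density inside each integral.

```python
import math

def outcome_prob(y, x, beta0, beta1):
    """P(Y = y | X = x) for binary y under the logistic regression model."""
    p1 = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
    return p1 if y == 1 else 1.0 - p1

p_pos = outcome_prob(1, 0.0, beta0=0.0, beta1=1.0)
```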
Example
Louis et al [1] have recently associated PCB exposure (as measured by 62 individual PCB congeners) with the risk of endometriosis in 79 patients. Specifically, they showed a positive relationship between total PCB exposure measured by serum assays and the risk of endometriosis by using logistic regression. Regardless of whether a serum measurement was above or below the LOD, all measurements were used in the analysis. In addition to serum measurements used in their analysis, assays determined from adipose tissue were also available in a subset of 15 of the 79 patients [2]. Adipose tissue was collected in the first 15 patients during laparoscopic surgery. They also measured total lipids (using gravitational techniques) that, along with the presence or absence of endometriosis, will be used as outcome variables in our analyses. By using methodology proposed in this paper, we will incorporate assays from both serum and adipose tissue in this analysis.
For illustrative purposes, we focus on the PCB congener 153, which has 44% (35 of 79) of measurements below LODs (lower LOD = 0.142). Figure 1 shows the distribution of all 79 log-transformed serum values (including values that are below the lower LODs, which is graphically demonstrated with a dark bar) for congener 153; the plot suggests that the data are approximately normal on the log-scale. Figure 1 also shows that the distribution of total lipids is approximately normally distributed on the log-scale. Figure 2 shows the relationship between log-transformed serum and adipose measurements for the 15 patients in whom both assays were measured. The figure demonstrates a strong positive relationship between the serum and adipose tissue assay values for this small group of participants (r = 0.68, P = .005).
FIGURE 1
Histograms of log-transformed serum PCB congener 153 and log-transformed total lipids. The solid bar denotes the lower LOD for serum PCB congener 153.
FIGURE 2
Relationship between log-transformed PCB congener 153 measurements from adipose tissue and from serum. The line represents a least-squares fit through the data.
Although the serum assay values below the LOD were available, we did not incorporate this information into the analysis (ie, the models discussed previously assume that values below LOD are not available for analysis). Table 1 shows the maximum likelihood estimates for model (1) where the dependent variable Yi is log-transformed total lipids, the independent variable Xi is log-transformed PCB congener 153, and the relationship between the 2 assays is described by Δ(X) = Δ0 + Δ1X. Due to the small sample sizes in this study (recall that only 15 adipose tissue assays were performed in 79 study participants), a parametric bootstrap [5] with 800 bootstrap samples was performed for assessing the variability of the parameter estimates in this study. For many of the model parameters, histograms of the bootstrap estimates were not close to being normally distributed, so we present 95% confidence intervals (CIs) for each of the parameter estimates using the percentile method. The results in Table 1 show that the estimated trend is positive equation M46 with the 95% CI being very close to excluding 0. Thus, there appears to be some evidence for an increase in log-transformed total lipids with increasing log-transformed PCB congener 153. The β1 coefficient can be interpreted as the average change in log-transformed total lipids per unit change in log-transformed PCB congener 153. A scale-invariant quantity is the change in mean log-transformed total lipids from the first to the third quartile of log-transformed PCB congener 153 true values. This quantity is 2(0.675)β1 × σx = 2(0.675)(0.726)(0.449) = 0.44 with the 95% CI nearly excluding 0 (95% CI = –0.02 to 0.92). The results in Table 1 also show that the measurement error for the initial serum assay is larger than that for the second assay that was performed with samples from adipose tissue.
Further, the estimates of Δ0 and Δ1 suggest that the 2 assays can be calibrated by a constant shift (ie, Δ1 is not significantly different from 1). We also fit a model in which the measurements below the lower LOD were used in the analysis (data not shown). The results showed similar estimates with slightly narrower 95% CIs than those presented in Table 1. For example, equation M47 (95% CI = 0.00 to 1.74) and the estimated change in the mean log-transformed total lipids from the first to the third quartile was 0.46 (95% CI = 0.00 to 0.85). Thus, there appears to be a small increase in precision in incorporating the actual values of the serum assay when they are below LOD. In the next section, we will explore this in more detail with simulations.
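The interquartile contrast quoted above follows from the fact that the first and third quartiles of a normal distribution lie about 0.675 standard deviations on either side of the mean. The sketch below reproduces the arithmetic; the function name `interquartile_effect` is hypothetical, and reading 0.726 as the slope estimate and 0.449 as the estimate of σx is an assumption based on the order of the factors in the text.

```python
from statistics import NormalDist

def interquartile_effect(beta1, sigma_x):
    """Change in mean outcome from the first to the third quartile of a normal X."""
    z75 = NormalDist().inv_cdf(0.75)   # about 0.675
    return 2.0 * z75 * beta1 * sigma_x

# Values taken from the worked arithmetic in the text (assignment assumed).
effect = interquartile_effect(beta1=0.726, sigma_x=0.449)
```

Because the contrast scales the slope by the spread of the true exposure, it does not depend on the units in which the exposure is expressed.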
TABLE 1
Maximum Likelihood Estimates of the Model, Yi = β0 + β1Xi + εi
Table 2 shows the maximum likelihood estimates for a simple logistic regression model (model [9]) with the dependent variable Yi representing the occurrence of endometriosis during laparoscopic surgery and Xi representing the log-transformed PCB congener 153 value. The results in Table 2 show a positive effect equation M48 of log-transformed PCB congener 153 on the probability of endometriosis. However, this effect was not statistically different from zero (ie, the 95% CI contained zero). The change in the log odds of endometriosis from the first to the third quartile of values of PCB congener 153 is 0.63 (95% CI = –0.26 to 2.08). As with the continuous total lipids outcome (Table 1), the measurement error is substantially smaller for the adipose tissue assay compared with the serum assay. Also, estimates of Δ0 and Δ1 suggest that there is a near constant shift between the 2 assays.
TABLE 2
Maximum Likelihood Estimates of the Model, logit P(Yi = 1) = β0 + β1Xi
In this study, the adipose tissue assays were performed in the first 15 patients in the study. An alternative design would have been to sample the adipose tissue with a probability depending on the value of the serum assay. For example, a design in which the adipose tissue assay is performed only when the initial serum assay is below LOD could be considered. We explore various designs with simulations in the next section.
Simulations: Evaluation of Designs
We examine designs through simulations. Specifically, we compare designs in which (1) the second more precise assay is measured only when the first assay is below LOD, (2) the second more precise assay is measured when the first assay is below LOD and with probability P when the first assay is above LOD, and (3) when the second assay is measured completely at random (ie, without regard to the value of the initial assay). In the following subsections, we compare these designs for the different models previously discussed. All simulations presented were conducted with I = 1000 to mimic asymptotic results for comparing different designs. However, comparisons with smaller sample sizes of I = 100 gave similar results for most parameter estimates.
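The 3 designs can be expressed as a rule for deciding whether the second assay is measured in a given participant. The sketch below is illustrative only: the function names and the labels "A" and "B" as string arguments are assumptions. Design A with P = 0 corresponds to design (1), Design A with P > 0 to design (2), and Design B to design (3).

```python
import random

def measure_second_assay(first_below_lod, design, p, rng):
    """Return True if the second assay is measured for this participant."""
    if design == "A" and first_below_lod:
        return True               # Design A: always measure when below LOD
    return rng.random() < p       # otherwise measure with probability p

def overall_probability(design, p, frac_below):
    """Expected fraction of participants in whom the second assay is measured."""
    if design == "A":
        return frac_below + p * (1.0 - frac_below)
    return p

# With one-half of first-assay values below LOD:
prob_a0 = overall_probability("A", 0.0, 0.5)
prob_a50 = overall_probability("A", 0.5, 0.5)
prob_b50 = overall_probability("B", 0.5, 0.5)
```

Matching designs on the overall probability of measuring the second assay is what makes an efficiency comparison between them fair.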
Simple Linear Regression
Table 3 shows simulations under a simple linear model (model given by [1]). The parameters were chosen such that (1) one-half the values of the initial assay were below LOD, (2) the measurement error for the first assay was 4 times that of the second assay, and (3) the calibration between the first assay and the second is specified by the linear function Δ0 + Δ1X. The columns corresponding to Design A show the mean (standard deviation [SD]) of the parameter estimates for a design for which the second more precise assay is measured when the initial less precise assay is below LOD and with probability P when the initial assay is above LOD. The columns corresponding to Design B show the mean (SD) parameter estimates for a design in which the second assay is measured completely at random (without regard to the value of the initial assay) with probability P. A comparison of these 2 designs shows the efficiency advantage of Design B with P = .50 compared with Design A with P = 0 (note that both of these designs have the same overall 50% probability of measuring the second assay). For example, the relative efficiency for estimating $\beta_1$ for Design B with P = .50 compared with Design A with P = 0 is 14.27 ($= [0.136/0.036]^2$). Thus, designs in which we only observe the second more expensive assay when the first assay is below LOD are highly inefficient relative to sampling the second assay completely at random in a comparable proportion of participants. Table 3 also shows the large efficiency advantages of taking a small proportion of second-assay measurements when the first assay is above LOD compared with a design that only measures the second assay when the initial assay is below LOD. For example, there is a large efficiency gain in using Design A with P = .05 compared with using Design A with P = 0 (in this case, the relative efficiency for estimating $\beta_1$ is $[0.136/0.052]^2 = 6.84$).
Although Design B is more efficient than Design A when the second assay is measured in a relatively small fraction of patients, both designs have similar efficiency when a larger proportion of second assays are sampled. For example, the relative efficiencies when using Design A with P = .50 and Design B with P = .75 are nearly identical (both these designs have the same overall 75% probability of measuring the second assay).
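The relative efficiencies quoted above are ratios of sampling variances, computed as the squared ratio of the simulation SDs. A small sketch that reproduces the 2 figures for the slope (the function name is hypothetical; the SDs are those quoted in the text):

```python
def relative_efficiency(sd_reference, sd_design):
    """Variance ratio: how many times more efficient the design is than the reference."""
    return (sd_reference / sd_design) ** 2

# Reference is Design A with P = 0 (SD 0.136 for the slope estimate).
re_b50_vs_a0 = relative_efficiency(0.136, 0.036)   # Design B, P = .50
re_a05_vs_a0 = relative_efficiency(0.136, 0.052)   # Design A, P = .05
```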
TABLE 3
Simulation Under the Model, Yi = β0 + β1Xi + εi
The assay measurements below the LOD are not incorporated in the approaches taken in this paper. We assume that those values are left censored at the lower LOD (in many cases, the values below LOD are not provided by the laboratory). To assess what is lost, we ran a simulation in which the actual values below the LOD were used, with the same true parameters and sample size as those described in Table 3 and with the second assay measured completely at random in 50% of patients, and we estimated the mean (SD) of the estimates for each of the 9 parameters (data not shown). There was only a moderate efficiency gain in incorporating these values. For example, the efficiency gain for estimating $\beta_1$ was 27% ($[0.036/0.032]^2 = 1.27$) when, as in Table 3, 50% of initial values were below LOD. The efficiency gain will be substantially smaller when there are fewer initial values that are below LOD.
The proposed methodology is completely parametric, and inferences may be sensitive to these parametric assumptions. We examined the robustness of the methodology to the normality assumptions for Xi and εi with 2 simulations for Design A with P = .50. First, we generated data as in Table 3, with Xi being generated from a t distribution with 3 degrees of freedom instead of the assumed normal distribution. Estimates of β0 and β1 were nearly unbiased with this moderate departure from normality (data not shown). Second, we generated data as in Table 3, with εi being generated from a t distribution with 3 degrees of freedom. Estimates of β0 and β1 were also nearly unbiased (data not shown). These results suggest that the methodology is robust to moderate departures from normality in the distribution of the true covariate and in the distribution of the residual error in the simple linear regression.
Polynomial Regression
Table 4 shows simulations under a cubic model as described in the second section. As in Table 3, we present simulation results for a design in which the second assay is measured when the first assay is below LOD and with probability P when the first assay is above LOD (Design A) and for a design in which the second assay is measured completely at random with probability P (Design B). In addition to presenting results for each model parameter, the table shows the mean change in Y from the first to the third quartile in X. The results demonstrate nearly unbiased estimation with a sample size of I = 1000. Results are similar to those presented for the linear regression model in Table 3. The simulation results demonstrate the advantage of performing at least a small fraction of second assays when the first assay is above LOD. For Design A with P = .05 compared with P = 0, the relative efficiency for estimating the change in the mean response from the first to the third quartile in X is 1.68 ($[0.048/0.037]^2$). There is substantially less efficiency gain in performing a large fraction of second assays above LOD. Further, completely random verification is more efficient than only measuring the second assay when the initial assay is below LOD (eg, for estimating the change in the mean response from the first to the third quartile, the efficiency of Design B with P = .50 compared with Design A with P = 0 is $[0.048/0.028]^2 = 2.94$). As for the simple linear regression model (Table 3), Designs A and B provide more similar efficiency when a large proportion of second assays are sampled (eg, the efficiency of Design B with P = .75 compared with Design A with P = .50 is $[0.025/0.028]^2 = 0.80$).
TABLE 4
Simulation Under the Model, $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \varepsilon_i$
Logistic Regression
This section examines the statistical properties of model (9) under different designs. Table 5 shows simulations for the logistic regression model under both Designs A and B with varying P. With the exception of $\sigma_{\delta_2}$, which is slightly upwardly biased, the estimates are nearly unbiased with a sample size of I = 1000. The bias in estimating $\sigma_{\delta_2}$ is reduced with larger sample sizes (data not shown). As in the other simulations, the results show the efficiency advantages of measuring the second more expensive assay on at least a subset of measurements when the initial assay is above LOD. Also, completely random verification is more efficient than only sampling when the initial assay is below LOD (compare Design B with P = .50 to Design A with P = 0).
TABLE 5
Simulation Under the Logistic Regression Model, logit P(Yi = 1) = β0 + β1Xi
Discussion
This paper proposed methodology for estimating the regression relationship between exposure and outcome when the exposure variable is assessed by multiple assays that are potentially subject to detection limits. We developed models and evaluated designs for both continuous and binary outcome variables and considered the case in which the relationship between outcome and exposure followed a polynomial or smoothing spline relationship. The PCB/endometriosis study measured a serum assay in all study participants and the more definitive “gold standard” assay on only a subset of the total patients. In this study, the second gold standard assay was measured from adipose tissue sampled during laparoscopic surgery and can be considered to be sampled completely at random. We explored the alternative design in which the second assay is performed when the first assay is below LOD and with probability P when the initial assay is above LOD. We found that the efficiency of a design in which the second assay is measured only when the initial assay is below LOD is substantially improved when even a small percentage of second assays are measured when the initial assay is above LOD. Additionally, designs in which we only observe the second assay when the first assay is below LOD are highly inefficient relative to sampling the second assay completely at random in a comparable proportion of patients.
In the PCB/endometriosis study, the second assay was an invasive, tissue-based assay, whereas the initial assay was a serum-based assay. In other settings, the second assay may be more expensive than the initial assay, and designs may be compared based on minimizing study costs. In these cases, rather than comparing relative efficiencies as was done in this paper, we can compare designs based on minimizing cost functions. Such an approach would require specifying the relative cost of the 2 assays, and could easily be implemented within the framework discussed in this paper.
For the PCB/endometriosis study, the serum assays were subject to LODs, whereas the gold standard adipose tissue assay was not subject to LOD. Thus, the model development and design considerations were for the situation in which the inexpensive/noninvasive assay measured in all individuals was subject to a lower LOD, whereas the gold standard expensive/invasive assay measured on a subset of individuals was not subject to LOD. The model and simulations could be easily altered to allow for LOD on the second assay. Furthermore, the model could also be altered to allow for an upper LOD for the initial assay. Our investigation is focused on the situation where there are 2 assays. The methodology can easily be extended to incorporate more than 2 assays measuring a single exposure. We expect that the design results presented in this paper will be similar for these alternative formulations.
Most of the models presented in this article assume that values below LOD are not available and treat these values as being left censored. In many instances, values outside LODs are not reported by the laboratory or, for a particular assay, values below a certain limit cannot be quantified. However, as a general principle, these values should be collected when possible, and depending on the type of assay, should be used in the analysis [6]. For the PCB/endometriosis study, where the values of serum assay measurements below LOD were recorded, we were able to examine the advantages of using the actual measurements below the LOD rather than treating these values as left censored. We found small efficiency gains in incorporating these values in the analysis. Further, simulation studies confirmed these efficiency gains in situations in which the proportion of initial assays below LOD was sizable. In the analyses and simulations, we assumed that measurement error was constant and did not depend on the actual true measurement of the exposure variable. More complex models that allow for a measurement error process that depends on the true value may be more appropriate when we are analyzing actual measurements below LODs [6]. The development of such models is an area of future research.
We proposed models for cross-sectional data such as the PCB/endometriosis study in which single exposure and outcome measurements were obtained. These models could be extended to the longitudinal setting in which both exposure and outcome are measured longitudinally in time. Albert [7] proposed such a model when the outcome in a longitudinal study is measured with multiple assays. A similar model could be developed in which the exposure variable is measured with multiple assays. This is an area of future research.
This article focuses on the situation in which the second assay is missing by design. Namely, an investigator could measure the second assay in all study participants, but because of cost or feasibility issues only performs the second assay in a subset of study participants. In other applications, investigators may attempt to measure the second assay in all participants, but may have missing data. For example, if the second assay requires a tissue specimen, then it may be missing when the tissue sample cannot be obtained due to the participant's refusal to undergo the procedure or due to the surgeon's failure to obtain the sample. If the missing data mechanism is missing at random or missing completely at random [8], then the proposed methodology is appropriate. However, if the probability of missing the second assay depends on the value of the second assay had it been observed (this type of missing data mechanism has been referred to as nonignorable missingness), then the methodology presented in this article may result in biased inference. Extensions of the models to nonignorable missingness are an area of future research.
ACKNOWLEDGMENTS
We thank the 2 reviewers for constructive comments that led to a significant improvement of this article. We thank Enrique Schisterman, Brian Whitcomb, and Germaine Louis for providing the PCB/endometriosis study data and for helpful discussions. The research used the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (http://biowulf.nih.gov).
Supported partially by the Long Range Initiative of the American Chemistry Council and funding from the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development; National Institutes of Health.
1. Louis GM, Weiner JM, Whitcomb BW, et al. Environmental PCB exposure and risk of endometriosis. Hum Reprod. 2005;20:279–285.
2. Whitcomb BW, Schisterman EF, Buck G, Weiner JM, Greizerstein H, Kostyniak PJ. Relative concentrations of organochlorines in adipose tissue and serum among reproductive age women. Environ Toxicol Pharmacol. 2005;19:203–213.
3. Carroll RJ, Ruppert D, Stefanski LA. Measurement Error in Nonlinear Models. Chapman & Hall; New York: 1995.
4. Aptech Systems Gauss Systems, Version 3.0. Aptech Systems; Ravensdale, WA: 1992.
5. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman and Hall; New York: 1993.
6. Guo Y, Harel O, Little RJ. How Well Quantified is the Limit of Quantification. University of Connecticut Department of Statistics; Storrs, CT: 2008. Technical Report 20.
7. Albert PS. Modeling longitudinal biomarker data from multiple assays that have different known detection limits. Biometrics. 2008;64:527–537.
8. Little RJ, Rubin DB. Statistical Analysis with Missing Data. John Wiley; New York: 1987.