
Int J Epidemiol. 2009 December; 38(6): 1674–1680.

Published online 2009 August 10. doi: 10.1093/ije/dyp269

PMCID: PMC2786252

* Corresponding author. McGavran-Greenberg Hall, Campus Box 7435, Chapel Hill, NC 27599-7435, USA. E-mail: cole@unc.edu

Accepted 2009 July 6.

Published by Oxford University Press on behalf of the International Epidemiological Association. © The Author 2009; all rights reserved.


**Background** In epidemiologic research, little emphasis has been placed on methods to account for left-hand censoring of ‘exposures’ due to a limit of detection (LOD).

**Methods** We calculate the odds of anti-HIV therapy naiveté in 45 HIV-infected men as a function of measured log_{10} plasma HIV RNA viral load using five approaches including *ad hoc* methods as well as a maximum likelihood estimate (MLE). We also generated simulations of a binary outcome with 10% incidence and a 1.5-fold increased odds per log increase in a log-normally distributed exposure with 25, 50 and 75% of exposure data below LOD. Simulated data were analysed using the same five methods, as well as the full data.

**Results** In the example, the estimated odds ratio (OR) varied by 1.22-fold across methods, from 1.45 to 1.77 per log_{10} copies of viral load and the standard error for the log OR varied by 1.52-fold across methods, from 0.31 to 0.47. In the simulations, use of full data or the MLE was unbiased with appropriate confidence interval (CI) coverage. However, as the proportion of exposure below LOD increased, substituting LOD, LOD/√2 or LOD/2 was increasingly biased with increasingly inappropriate CI coverage. Finally, exclusion of values below LOD was unbiased but imprecise.

**Conclusions** In this example and the settings explored by simulation, and among methods readily available to investigators (i.e. sans full data), the MLE provided an unbiased and appropriately precise estimate of the exposure–outcome OR.

In epidemiologic research, emphasis is rightly placed on appropriately accounting for incomplete outcome data due to censoring.^{1} For example, methods are commonly employed to account for right-hand censoring of times-to-event due to drop-out or study completion. In addition, analyses of biomarker *outcomes* subject to a limit of detection (LOD) commonly employ methods to account for left-hand censoring at the LOD (and for possible mixtures of true zeros).^{2–10} However, with few exceptions,^{11–15} little emphasis has been placed on principled methods to account for left-hand censoring of biomarker *exposures* due to an LOD.

The LOD is the lowest quantity of a substance that can be distinguished from the absence of the substance. Most analytic instruments produce a signal even when a matrix without the analyte (i.e. a blank) is analysed. The LOD is often estimated as the mean of the blank plus a multiple of the standard deviation (SD) of the blank, where the multiplier is chosen to correspond to a stated confidence level (e.g. 99%). Values below the LOD may indeed be useful (albeit perhaps subject to more measurement error than values above the LOD), but here we assume that such values are not reported by the laboratory, as is common practice.
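As a rough illustration of the blank-based rule described above, the following Python sketch computes an LOD from replicate blank readings. The blank values, the function name `estimate_lod` and the choice of a one-sided standard normal quantile as the multiplier are illustrative assumptions; laboratories may use other multipliers.

```python
from statistics import NormalDist, mean, stdev

def estimate_lod(blank_signals, confidence=0.99):
    """Mean of the blanks plus a normal-quantile multiple of their SD."""
    # Assumption: the multiplier is the one-sided standard normal quantile
    # for the chosen confidence level (about 2.33 for 99%).
    k = NormalDist().inv_cdf(confidence)
    return mean(blank_signals) + k * stdev(blank_signals)

# Hypothetical blank readings in arbitrary signal units.
blanks = [1.9, 2.1, 2.0, 2.2, 1.8]
lod = estimate_lod(blanks)
print(f"Estimated LOD: {lod:.3f}")
```

Measurements falling below `lod` would then be reported as undetectable, which is the situation the methods compared here are meant to handle.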

We compare the use of several *ad hoc* methods to account for an exposure with an LOD when estimating the odds ratio (OR). In particular, we describe an example of estimating the odds of anti-HIV therapy naiveté in HIV-infected men as a function of the plasma HIV RNA viral load. In addition, we describe moderate- and large-sample simulation experiments and provide a summary of simulation results to guide future research.

As an example, we calculate the odds of anti-HIV therapy naiveté in 45 HIV-infected heterosexual men as a function of the log_{10} plasma HIV RNA viral load. The 45 HIV-infected men have been previously described.^{16} Briefly, viral load was ascertained from plasma samples using RNA amplification yielding an LOD of 400 copies/ml of plasma. History of anti-HIV therapy use was obtained by self-report. Identification of particular agents or therapy regimens by self-report may be subject to substantial misclassification, but classification of having ever used any anti-HIV therapy by self-report is likely subject to substantially less misclassification.

Due to the established effectiveness of anti-HIV therapies,^{2,17} one would surmise *a priori* that a high level of viral load would suggest that a patient with established HIV is therapy naïve, although patients whose anti-HIV therapies fail or patients who are not adherent may also demonstrate high levels of viral load.

Data were analysed with logistic regression for the binary outcome therapy naiveté with the log_{10} HIV RNA viral load exposure as the sole covariate using five approaches. First, records with values of exposure below the LOD were excluded from analysis. Secondly, the LOD was substituted for exposure values below the LOD. Thirdly, the LOD/√2 was substituted for exposure values below the LOD. Fourthly, the LOD/2 was substituted for exposure values below the LOD. Lastly, a maximum likelihood estimate (MLE) was obtained as described in the following paragraph.
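The four *ad hoc* approaches listed above differ only in how exposure values below the LOD are handled before the logistic regression is fitted. A minimal Python sketch of that preprocessing step follows; the function name `prepare_exposure`, the method labels and the use of `None` to mark undetectable results are hypothetical conventions chosen for illustration, and the viral load values are made up.

```python
import math

def prepare_exposure(values, lod, method):
    """Return log10 exposures under one ad hoc approach.

    `values` holds measured concentrations, with None marking results
    reported as below the LOD. `method` is one of 'exclude', 'lod',
    'lod_sqrt2' or 'lod_2'.
    """
    substitutes = {
        "lod": lod,
        "lod_sqrt2": lod / math.sqrt(2),
        "lod_2": lod / 2,
    }
    if method != "exclude" and method not in substitutes:
        raise ValueError(f"unknown method: {method}")
    out = []
    for v in values:
        if v is not None:
            out.append(math.log10(v))
        elif method != "exclude":
            out.append(math.log10(substitutes[method]))
        # under 'exclude', records below the LOD are dropped
    return out

# Hypothetical viral loads (copies/ml) with an LOD of 400 copies/ml.
viral_load = [52000, None, 4800, None]
for method in ("exclude", "lod", "lod_sqrt2", "lod_2"):
    print(method, [round(v, 2) for v in prepare_exposure(viral_load, 400, method)])
```

Under exclusion, the outcome vector must be subset in the same way before fitting, which this sketch leaves to the caller.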

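One can find the MLE accounting for an LOD in the exposure. Although the likelihood function has no closed form, it can be expressed through a one-dimensional integral. As is standard with maximum likelihood,^{18} given the correct data generating mechanism, the estimates are asymptotically unbiased and efficient. The log likelihood is

$$\log L = \sum_{i=1}^{N} \Big[ (1-\delta_i)\big\{ y_i \log p_i + (1-y_i)\log(1-p_i) + \log f(x_i;\mu,\sigma) \big\} + \delta_i \log q_i \Big],$$

where *y* is an indicator of the event of interest, $p_i = 1/[1+\exp\{-(\beta_0+\beta_1\log x_i)\}]$, δ is an indicator of exposure *x* ≤ *LOD*,

$$q_i = \int_0^{LOD} \frac{\exp\{y_i(\beta_0+\beta_1\log k)\}}{1+\exp(\beta_0+\beta_1\log k)}\, f(k;\mu,\sigma)\, dk,$$

*f* is the log-normal density of exposure with unknown mean *µ* and SD *σ* on the log scale, and $\beta_0$ and $\beta_1$ are the intercept and the log OR per unit of log exposure. This integral must be evaluated numerically. We do so by a Riemann sum, which can approximate *q* to a chosen level of accuracy. SAS code is provided in Appendix 1, using the NLMIXED procedure to maximize this log likelihood. R code is provided in Appendix 2, using the optim() function to maximize this log likelihood.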
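To make the structure of this log likelihood concrete, the following Python sketch evaluates it for given parameter values: observed exposures contribute the logistic term plus the log-normal density, and exposures below the LOD contribute the log of a numerically integrated term. It is an illustrative translation under stated assumptions, not the authors' implementation: the function names are hypothetical, and a midpoint Riemann sum with 10 000 steps stands in for the right-endpoint sum used in the SAS appendix.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def lognormal_pdf(x, mu, sigma):
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def censored_term(y, b0, b1, mu, sigma, lod, steps=10_000):
    """Midpoint Riemann approximation to the integral over (0, LOD)."""
    width = lod / steps
    total = 0.0
    for j in range(steps):
        k = (j + 0.5) * width
        p = expit(b0 + b1 * math.log(k))
        # exp(y*eta) / (1 + exp(eta)) equals p when y = 1 and 1 - p when y = 0
        total += (p if y == 1 else 1.0 - p) * lognormal_pdf(k, mu, sigma)
    return total * width

def log_likelihood(params, x, y, delta, lod):
    """Log likelihood; x[i] is ignored whenever delta[i] == 1."""
    b0, b1, mu, sigma = params
    ll = 0.0
    for xi, yi, di in zip(x, y, delta):
        if di == 1:
            ll += math.log(censored_term(yi, b0, b1, mu, sigma, lod))
        else:
            p = expit(b0 + b1 * math.log(xi))
            ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
            ll += math.log(lognormal_pdf(xi, mu, sigma))
    return ll
```

In practice the negative of `log_likelihood` would be handed to a general-purpose optimizer, as the appendices do with NLMIXED and optim(), with standard errors taken from the inverse Hessian.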

Six scenarios were studied varying the sample size as moderate (*N* = 500) or large (*N* = 2000) and the percent of exposure below the LOD as 25, 50 or 75%. Undetectable percentages as large as 50% may be seen in epidemiologic research on novel biomarkers before great strides are made in the refinement of the assay or measurement. For example, in the mid-1990s, when effective therapies for HIV debuted, assays for HIV RNA viral load were still in early stages of refinement and often provided large percentages of undetectable measurements for successfully treated individuals. Whereas we would rarely expect to see biomarkers with 75% below the LOD, this scenario is studied to provide a more complete picture. For each scenario, 5000 simulations were generated.

For each simulated subject, a log-normally distributed exposure, *X*, was generated to have a median of 1 and first and third quartiles of 0.5 and 2.0, respectively. Limits of detection were set at 0.5, 1 and 2 to achieve the desired percents below the LOD. To reflect typical epidemiologic studies, a binary outcome was generated with a marginal incidence of ~10%, conditional on the log-normal exposure with an OR of 1.5, which yielded ~75 and >99% statistical power in the moderate and large sample settings, respectively.
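The data-generating design described above can be sketched in Python as follows. Several details are assumptions rather than facts from the text: the OR of 1.5 is taken to apply per unit of natural-log exposure, the intercept (not reported) is solved numerically so that the marginal incidence is 10%, and the seed and function names are arbitrary.

```python
import math
import random
from statistics import NormalDist

Z = NormalDist()
MU = 0.0                                   # log of the median (median = 1)
SIGMA = math.log(2.0) / Z.inv_cdf(0.75)    # puts the quartiles at 0.5 and 2.0
B1 = math.log(1.5)                         # assumed: log OR per natural-log unit

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def marginal_incidence(b0, steps=4000):
    """Outcome risk averaged over the exposure distribution (midpoint rule)."""
    lo, hi = -8.0, 8.0
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        z = lo + (j + 0.5) * width
        total += expit(b0 + B1 * (MU + SIGMA * z)) * Z.pdf(z)
    return total * width

def solve_intercept(target=0.10):
    """Bisection for the intercept that yields the target marginal incidence."""
    lo, hi = -10.0, 0.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if marginal_incidence(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

B0 = solve_intercept()

def simulate(n, lod, rng):
    """Return (x, y, delta) for one simulated data set."""
    x = [rng.lognormvariate(MU, SIGMA) for _ in range(n)]
    y = [1 if rng.random() < expit(B0 + B1 * math.log(xi)) else 0 for xi in x]
    delta = [1 if xi <= lod else 0 for xi in x]
    return x, y, delta

x, y, delta = simulate(2000, lod=1.0, rng=random.Random(2009))
print(f"sigma={SIGMA:.4f}, intercept={B0:.3f}, below LOD={sum(delta) / len(delta):.1%}")
```

With these parameters the LODs of 0.5, 1 and 2 fall exactly at the first quartile, median and third quartile of the exposure distribution, which is why they give 25, 50 and 75% below the LOD.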

Simulated data were analysed with logistic regression for the binary outcome with the log-normal exposure as the sole covariate using the same five approaches described for the example as well as the full data. Full data were analysed using true values for exposure below the LOD. This full data approach is not typically available to investigators but is conducted here to provide a reference.

Simulation results are tallied as: percent bias, defined as $100 \times (\bar{\beta} - \beta)/\beta$, where $\beta = \log(1.5)$ is the true log OR and $\bar{\beta}$ is the average (over simulations) estimated log OR, or $\sum_{s} \hat{\beta}_s/5000$; Monte Carlo standard error for the log OR, defined as the SD of the 5000 estimates $\hat{\beta}_s$ about $\bar{\beta}$; statistical power, defined as the percent of simulations that produce a statistically significant result given a true OR of 1.5, with a two-sided *α* of 0.05; and confidence interval (CI) coverage, defined as the percent of simulations where the 95% CI traps the true OR. A separate simulation study was conducted to assess the validity of the approaches (i.e. type 1 error) using the large sample size and 50% of exposure below the LOD. The simulation standard error for CI coverage or the type 1 error (under the null) was ±0.3%.
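The summary measures above can be computed from the per-simulation estimates as in the following Python sketch. The function names are hypothetical, Wald-type 95% intervals are assumed, and the Monte Carlo standard error is taken as the sample SD of the estimates (the choice of divisor is an assumption).

```python
import math
from statistics import mean, stdev

def summarize(estimates, std_errors, true_log_or=math.log(1.5)):
    """Summaries over simulations of estimated log ORs and their SEs."""
    avg = mean(estimates)
    covered = significant = 0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - 1.96 * se, est + 1.96 * se
        covered += lower <= true_log_or <= upper      # CI traps the true log OR
        significant += not (lower <= 0.0 <= upper)    # CI excludes the null
    n = len(estimates)
    return {
        "percent_bias": 100.0 * (avg - true_log_or) / true_log_or,
        "mc_se": stdev(estimates),
        "power": 100.0 * significant / n,
        "coverage": 100.0 * covered / n,
    }

def simulation_se(proportion, n_sims):
    """Binomial standard error of a proportion estimated from n_sims runs."""
    return math.sqrt(proportion * (1.0 - proportion) / n_sims)

# With 5000 simulations and a nominal 5% error rate (95% coverage):
print(f"simulation SE = {100 * simulation_se(0.05, 5000):.2f}%")
```

The last line reproduces the stated simulation standard error of about 0.3% for CI coverage and type 1 error.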

Of 45 men, 23 were non-White and the median age (inter-quartile range, IQR) was 40 (37–46) years. Of 45 men, 10 had infection duration >6 years and 1/3 had an existing AIDS diagnosis. Of 45, 10 had a viral load below the LOD (of 400 copies/ml); among the 35 of 45 with a quantified viral load, the median number of log_{10} copies (IQR) was 4.57 (3.68–5.04). The mean log_{10} viral load was 4.41 and the skew was –0.36, suggesting (given the limited sample size) a relatively log-normal distribution. Of 45 men, 12 were therapy naïve by self-reports. Figure 1 provides the viral load values by anti-HIV therapy naiveté status.

Figure 1. Distribution of HIV RNA viral load by anti-HIV therapy naiveté in 45 HIV-infected men. Overlapping values are stacked so that all observations are visible.

As shown in Table 1, the estimated OR of being therapy naïve for each log_{10} difference in viral load ranged from 1.45 with the MLE to 1.77 with substitution of the LOD. Exclusion and substitution of LOD/2 yielded similar ORs of 1.54 (95% CI 0.61–3.90) and 1.56 (95% CI 0.90–2.69), respectively. Substitution of LOD and LOD/√2 yielded larger ORs of 1.77 (95% CI 0.88–3.56) and 1.64 (95% CI 0.90–2.99), respectively. As expected given the simulation results (see below), the *ad hoc* substitutions provided estimates of the OR that were 8–22% larger than the MLE, and therefore likely biased away from the null in this setting. Also, the *ad hoc* substitutions provided standard errors that were 18–36% smaller than that of the MLE. These smaller standard errors were likely overly optimistic given that single deterministic substitutions do not account for the uncertainty in the missing values below the LOD.
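The arithmetic behind these comparisons can be checked with a short Python sketch. It assumes the reported intervals are Wald-type intervals on the log OR scale, so that the standard error can be recovered from the CI limits; the helper name `se_from_ci` is hypothetical.

```python
import math

def se_from_ci(lower, upper, z=1.96):
    """Standard error of the log OR implied by a 95% Wald CI for the OR."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)

or_mle, or_lod = 1.45, 1.77
fold_range = or_lod / or_mle             # spread of the OR across methods
se_exclusion = se_from_ci(0.61, 3.90)    # exclusion of values below the LOD
se_lod = se_from_ci(0.88, 3.56)          # substitution of the LOD

print(f"OR fold range: {fold_range:.2f}")
print(f"SE (exclusion): {se_exclusion:.2f}, SE (LOD substitution): {se_lod:.2f}")
```

The fold range of about 1.22 and the exclusion standard error of about 0.47 agree with the values reported in the abstract.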

As expected, across the six scenarios studied, the use of full data provided an approximately unbiased estimate of the OR with appropriate CI coverage (Table 2). In addition, the MLE provided an unbiased estimate with appropriate CI coverage in all scenarios. The MLE, which accounts for the uncertainty in the missing data, appropriately provided slightly less precision than the full data even when the proportion below the LOD was only 25%. However, when the proportion below the LOD was 75%, the MLE was >30% less efficient than the estimate obtained from the full data. Exclusion of exposures below the LOD was relatively unbiased, but sacrificed precision. For instance, with 50% of exposure below the LOD, exclusion resulted in standard errors approximately twice those of the full data and was therefore associated with a 75% efficiency loss. Substitution of LOD, LOD/√2 or LOD/2 provided estimates of the OR that were increasingly biased as the percent of exposure below the LOD increased. Also, substitution of LOD, LOD/√2 or LOD/2 provided inadequate CI coverage, which was more apparent in the large sample (*N* = 2000) scenarios. For instance, with 75% of exposure below the LOD and *N* = 2000, substitution of LOD, LOD/√2 or LOD/2 provided subpar 54, 69 and 88% CI coverage, respectively. Under the null, the following type 1 errors were obtained for full data, excluding exposure below the LOD, substituting LOD, LOD/√2, LOD/2, and the MLE: 5.0, 4.9, 5.0, 4.9, 4.6 and 4.7%, respectively. All estimated type 1 errors were within two simulation standard errors of the expected 5%.
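Two of the statements above follow from simple arithmetic, sketched here in Python. Efficiency is taken as the ratio of variances, so a doubling of the standard error corresponds to a 75% loss; the type 1 errors are those listed in the text, and the variable names are illustrative.

```python
def efficiency_loss(se_ratio):
    """Fractional loss in efficiency when the SE is `se_ratio` times larger."""
    return 1.0 - 1.0 / se_ratio ** 2

loss_exclusion = efficiency_loss(2.0)    # SE twice that of the full data

# Type 1 errors (%) reported under the null, in the order given in the text.
type1 = {"full": 5.0, "exclude": 4.9, "LOD": 5.0,
         "LOD/sqrt(2)": 4.9, "LOD/2": 4.6, "MLE": 4.7}
sim_se = 0.3                             # simulation standard error (%)
within_two_se = {m: abs(v - 5.0) <= 2 * sim_se for m, v in type1.items()}

print(f"Efficiency loss with exclusion: {loss_exclusion:.0%}")
print(within_two_se)
```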

We demonstrated that many common *ad hoc* methods to account for an LOD in an exposure variable are flawed. Use of the MLE appeared unbiased, appropriately precise (relative to the full data) and provided appropriate CI coverage and type 1 error. While exclusion of exposures below the LOD was unbiased, it was only unbiased in these settings because the effect of exposure on outcome was constant across the range of exposure.^{19} Moreover, a great loss in precision may accompany exclusion of values below the LOD: a portion of this information may be recovered using the MLE. The example analysis supported simulation results, but is based on a small study of 45 HIV-infected men.

Existing work on left- or interval-censored exposures is scarce. Lynn^{11} compared several *ad hoc* substitutions, a maximum likelihood approach and several multiple-imputation approaches using the example of left-censored HIV viral load as an exposure for incident AIDS as well as simulations. Lynn found the maximum likelihood approach operated best (i.e. smallest mean squared error). Gomez *et al.*^{13} derived MLEs for a discrete-valued *interval-censored* baseline characteristic in a randomized trial, and demonstrated by simulation that midpoint substitution is flawed. Richardson and Ciampi^{12} discussed the setting where a left-censored exposure is subject to measurement error; they suggested that substitution of the expectation below the LOD will provide unbiased estimates, and they described simulations demonstrating that standard results for non-differential, independent measurement error may not apply in the presence of a left-censored exposure. Schisterman *et al.*^{14} developed a substitution method for handling left-censored exposures in linear and logistic regression with a nutritional biomarker example and simulations: the authors concluded that replacing values below the LOD with the expectation ‘above’ the LOD provided unbiased estimates while overcoming the distributional assumptions of other methods. Nie *et al.*^{15} provided a likelihood-based framework that incorporates the approaches of Richardson and Ciampi^{12} and Schisterman *et al.*^{14} for linear regression, as well as simulations and two worked examples: the authors concluded that the likelihood-based approach is optimal, albeit subject to distributional assumptions.

In addition to accounting for an LOD by the use of methods for left-censored observations, one may consider a proportion of the values below the LOD to be true zeros. Methods that allow the exposure to be a mixture of true zeros and continuous values, possibly left-censored by an LOD, may be more appropriate in such settings. Approaches have been developed for outcomes,^{6,10} but to our knowledge not for exposures.

Alternatively, as proposed by Schisterman *et al.* in the receiver operating characteristic curve context,^{20,21} one could fit an analogue of a two-part model, such as a binary indicator for being below the LOD and a continuous regressor for values above the LOD. In such extensions, as in the approach described here, the choice of parametric distribution for exposure values (or values above 0 or the LOD in the mixture or two-part model, respectively) is crucial. Therefore, thorough exploratory data analysis and information from subject-matter specialists are essential.

One can envision the *ad hoc* solutions studied, e.g. substitution of LOD/2, LOD/√2 and the LOD itself, as points along a continuum of possible values for the left-censored exposures. Therefore, it is reasonable that the bias is a function of the substituted value. Indeed, if one were able to plug in the expectation of exposure below the LOD for values below LOD, the resultant estimate would be approximately unbiased.^{12} However, in practice, one will not typically know the expectation of exposure below the LOD and therefore not be able to simply calibrate the substitution value to remove bias (although once a parametric form is specified for the likelihood, one could calculate this expectation in principle). Even if one were able to simply calibrate such a substitution, the resultant interval estimate would be overly precise due to not accounting for the uncertainty in the calibration.

However, it should be noted that substitution of the LOD/2 worked fairly well in simulations with ≤50% exposure data below the LOD. This is the case here because the expectation of exposure below the LOD (SD) was ~0.3 (0.12), 0.5 (0.25) and 0.8 (0.5) for 25, 50 and 75% of exposure below the LOD, respectively. Therefore, at an LOD of 0.5, and of the substitutions explored, LOD/√2 = 0.35 will best mimic the expectation below the LOD; at an LOD of 1, substitution of LOD/2 = 0.5 will best mimic the expectation below the LOD; and at an LOD of 2, substitution of LOD/2.5 = 0.8 (not explored) would have best mimicked the expectation below the LOD.
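The expectations and SDs quoted above can be reproduced from the closed-form moments of a truncated log-normal distribution. The Python sketch below assumes the simulation parameters implied by the design (median 1 and quartiles 0.5 and 2.0, i.e. a log-scale mean of 0 and a log-scale SD of about 1.03); the function name `moments_below_lod` is hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()
MU = 0.0                                   # log-scale mean (median of 1)
SIGMA = math.log(2.0) / Z.inv_cdf(0.75)    # quartiles at 0.5 and 2.0

def moments_below_lod(lod, mu=MU, sigma=SIGMA):
    """Mean and SD of a log-normal X conditional on X < LOD."""
    a = (math.log(lod) - mu) / sigma
    first = math.exp(mu + sigma ** 2 / 2.0) * Z.cdf(a - sigma) / Z.cdf(a)
    second = math.exp(2.0 * mu + 2.0 * sigma ** 2) * Z.cdf(a - 2.0 * sigma) / Z.cdf(a)
    return first, math.sqrt(second - first ** 2)

for lod in (0.5, 1.0, 2.0):
    m, s = moments_below_lod(lod)
    print(f"LOD={lod}: mean below LOD={m:.2f} (SD {s:.2f}); "
          f"LOD/2={lod / 2:.2f}, LOD/sqrt(2)={lod / math.sqrt(2):.2f}")
```

Comparing each conditional mean with the candidate substitution values shows how far a given fixed substitution sits from the expectation it is implicitly standing in for.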

Whereas these simulations typified data seen in epidemiologic research (albeit enriching the percent below the LOD), more empirical examples are warranted. A limitation of maximum likelihood is its possible sensitivity to incorrect specification of the model for the data: this in particular could be explored in future work. In addition to maximum likelihood, multiple-imputation^{22} and Bayesian^{23} methods should also be explored in more detail. Indeed, multiple-imputation provides MLEs when data are missing at random and the imputation model form is correct.^{22} However, it should be noted that simply multiply imputing values from a uniform distribution below the LOD will only provide unbiased results if the uniform distribution correctly reflects the distribution of biomarker values below the LOD, which is unlikely in epidemiologic settings.

In conclusion, in the settings explored and among methods typically available to investigators (i.e. sans full data or substitution of expectation of exposure below the LOD), many *ad hoc* approaches to handling exposures subject to an LOD are flawed. However, maximum likelihood provides an unbiased and appropriately precise estimate of the exposure–outcome association.

American Chemistry Council and the Intramural Research Program of the *Eunice Kennedy Shriver* National Institute of Child Health and Human Development; NIH grants R03-AI-071763, R01-AA-017594 and P30-AI-50410 (to S.R.C.); Lineberger Cancer Center Core Grant CA16086 from the National Cancer Institute and P30-AI-50410 from the National Institutes of Health (to H.C.); Intramural Research Program of the *Eunice Kennedy Shriver* National Institute of Child Health and Human Development, National Institutes of Health and the American Chemistry Council (to E.F.S.).

The authors would like to thank Drs David Richardson and Kate Buchacz, as well as the members of the Detection Limit Working Group, for their expert advice. Views expressed in this paper do not necessarily represent the official positions of the US FDA.

*Conflict of interest:* None declared.

- Little emphasis has been placed on principled methods to account for left-hand censoring of biomarker exposures due to a limit of detection (LOD).
- Ad hoc methods to account for an exposure LOD are flawed.
- Maximum likelihood methods are unbiased and appropriately precise.

Appendix 1: SAS code for the MLE

`proc nlmixed;`

`parms b0=0 b1=0 c0=1 c1=1;`

`p=1/(1 + exp(-(b0+b1*log(x))));`

`q=0;`

`do k=0+(lod/1e4) to lod by (lod/1e4);`

`q=q+exp(y*(b0+b1*log(k)))/(1+exp(b0+b1*log(k)))*pdf("lognormal",k,c0,c1);`

`end;`

`q=q*(lod/1e4);`

`logL = y*log(p)*(1-delta)+`

`(1-y)*log(1-p)*(1-delta)+`

`log(pdf("lognormal",x,c0,c1))*(1-delta)+`

`log(q)*delta;`

`model y~general(logL);`

`run;`

Appendix 2: R code for the MLE

`int.p1<-function(xx) {`

`1/(1+exp(-b0-b1*log(xx)))*dlnorm(xx, c0, c1)`

`}`

`int.p0<-function(xx) {`

`1/(1+exp(b0+b1*log(xx)))*dlnorm(xx, c0, c1)`

`}`

`logitln <- function(para){ `

`b0 <<- para[1]`

`b1 <<- para[2]`

`c0 <<- para[3]`

`c1 <<- para[4]`

`px <- 1/(1+exp(-b0-b1*log(x)))`

`logL.1<-sum((1-delta)*(y*log(px)+(1-y)*log(1-px)+log(dlnorm(x,c0,c1))))`

`logL.2<-sum(delta*y*log(integrate(int.p1,lower=0,upper=LD)$value)+delta*(1-y)*log(integrate(int.p0,lower=0,upper=LD)$value))`

`# return the negative log likelihood, since optim() minimizes`

`-(logL.1+logL.2)`

`}`

`fit <- optim(par=c(-1, 1, 0, 1), fn=logitln, hessian = T)`

`rbind(est=fit$par, se=sqrt(diag(solve (fit$hessian))))`

1. Rothman KJ, Greenland S, Lash T. Modern Epidemiology. 3rd edn. New York: Lippincott–Raven; 2008.

2. Hammer SM, Squires KE, Hughes MD, et al. A controlled trial of two nucleoside analogues plus indinavir in persons with human immunodeficiency virus infection and CD4 cell counts of 200 per cubic millimeter or less. AIDS Clinical Trials Group 320 Study Team. N Engl J Med. 1997;337:725–33.

3. Marschner IC, Betensky RA, DeGruttola V, Hammer SM, Kuritzkes DR. Clinical trials using HIV-1 RNA-based primary endpoints: statistical analysis and potential biases. J Acquir Immune Defic Syndr Hum Retrovirol. 1999;20:220–27.

4. Hughes JP. Mixed effects models with censored data with application to HIV RNA levels. Biometrics. 1999;55:625–29.

5. Jacqmin-Gadda H, Thiebaut R, Chene G, Commenges D. Analysis of left-censored longitudinal data with application to viral load in HIV infection. Biostatistics. 2000;1:355–68.

6. Berk KN, Lachenbruch PA. Repeated measures with zeros. Stat Methods Med Res. 2002;11:303–16.

7. Thiebaut R, Jacqmin-Gadda H. Mixed models for longitudinal left-censored repeated measures. Comput Methods Programs Biomed. 2004;74:255–60.

8. Lubin JH, Colt JS, Camann D, et al. Epidemiologic evaluation of measurement data in the presence of detection limits. Environ Health Perspect. 2004;112:1691–6.

9. Cole SR, Hernán MA, Anastos K, Jamieson BD, Robins JM. Determining the effect of highly active antiretroviral therapy on changes in human immunodeficiency virus type 1 RNA viral load using a marginal structural left-censored mean model. Am J Epidemiol. 2007;166:219–27.

10. Chu H, Gange SJ, Li X, et al. Effect of HAART on HIV RNA trajectory among treatment naive men and women: a segmented Bernoulli/lognormal random effects model with left censoring. Epidemiology. 2009 (in press).

11. Lynn HS. Maximum likelihood inference for left-censored HIV RNA data. Stat Med. 2001;20:33–45.

12. Richardson DB, Ciampi A. Effects of exposure measurement error when an exposure variable is constrained by a lower limit. Am J Epidemiol. 2003;157:355–63.

13. Gomez G, Espinal A, Lagakos S. Inference for a linear regression model with an interval-censored covariate. Stat Med. 2003;22:409–25.

14. Schisterman EF, Vexler A, Whitcomb BW, Liu A. The limitations due to exposure detection limits for regression models. Am J Epidemiol. 2006;163:374–83.

15. Nie L, Chu H, Liu C, Cole SR, Vexler A, Schisterman EF. Linear regression with an independent variable subject to a detection limit. Epidemiology. 2009 (in press).

16. Buchacz K, Pan CY, van der Straten A, Hanson CV, Padian N. HIV viral load and viral cultures in sexually active heterosexual men. J Acquir Immune Defic Syndr. 2000;23:98–9.

17. Sterne JA, Hernán MA, Ledergerber B, et al. Long-term effectiveness of potent antiretroviral therapy in preventing AIDS and death: a prospective cohort study. Lancet. 2005;366:378–84.

18. Pawitan Y. In All Likelihood: Statistical Modeling and Inference using Likelihood. New York: Oxford University Press; 2001.

19. Little RJA. Regression with missing Xs: a review. JASA. 1992;87:1227–37.

20. Schisterman EF, Reiser B, Faraggi D. ROC analysis for markers with mass at zero. Stat Med. 2006;25:623–38.

21. Schisterman EF, Faraggi D, Reiser B, Hu J. Youden Index and the optimal threshold for markers with mass at zero. Stat Med. 2008;27:297–315.

22. Little RJA, Rubin DB. Statistical Analysis with Missing Data. 2nd edn. New York: Wiley; 2002.

23. Greenland S. Bayesian perspectives for epidemiological research: I. Foundations and basic methods. Int J Epidemiol. 2006;35:765–75.

Articles from International Journal of Epidemiology are provided here courtesy of **Oxford University Press**
