Linear regression with a left-censored independent variable X due to limit of detection (LOD) was recently considered by 2 groups of researchers: Richardson and Ciampi, and Schisterman and colleagues.
Both groups obtained consistent estimators for the regression slopes by replacing left-censored X with a constant, that is, the expectation of X given X below LOD, E(X | X < LOD)
Schisterman and colleagues argued that their approach would be a better choice because the sample mean of X given X above LOD is available, whereas E(X | X < LOD) is not.
Recommendations are given based on theoretical and simulation results. These recommendations are illustrated with 1 case study.
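The substitution idea described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the method of either group: X is taken to be normal with known mean and standard deviation (so that E(X | X < LOD) has the closed truncated-normal form), and the names `mean_below_lod`, `ols_slope` and `simulate_slope` are hypothetical helpers introduced only for this illustration.

```python
import random
from statistics import NormalDist


def mean_below_lod(mu, sigma, lod):
    """E(X | X < LOD) for X ~ Normal(mu, sigma^2): the truncated-normal mean."""
    std = NormalDist()
    a = (lod - mu) / sigma
    return mu - sigma * std.pdf(a) / std.cdf(a)


def ols_slope(x, y):
    """Ordinary least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate_slope(n=20000, beta=2.0, mu=0.0, sigma=1.0, lod=-0.5, seed=1):
    """Fit Y on X after replacing every X below LOD with E(X | X < LOD).

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    fill = mean_below_lod(mu, sigma, lod)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(mu, sigma)
        y = 1.0 + beta * x + rng.gauss(0.0, 1.0)
        xs.append(x if x >= lod else fill)  # censored values get the constant
        ys.append(y)
    return ols_slope(xs, ys)


if __name__ == "__main__":
    print("true slope: 2.0, estimated slope:", simulate_slope())
```

In this setup the estimated slope should land close to the true value of 2.0, because the conditional mean of Y given the substituted covariate remains linear with the same slope. The sketch treats the mean and standard deviation of X as known purely for illustration.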
Motivation: The Illumina BeadArray is a popular platform for profiling DNA methylation, an important epigenetic event associated with gene silencing and chromosomal instability. However, current approaches rely on an arbitrary detection P-value cutoff for excluding probes and samples from subsequent analysis as a quality control step, which results in missing observations and information loss. It is desirable to have an approach that incorporates the whole data, but accounts for the different quality of individual observations.
Results: We first investigate and propose a statistical framework for removing the source of biases in Illumina Methylation BeadArray based on several positive control samples. We then introduce a weighted model-based clustering called LumiWCluster for Illumina BeadArray that weights each observation according to the detection P-values systematically and avoids discarding subsets of the data. LumiWCluster allows for discovery of distinct methylation patterns and automatic selection of informative CpG loci. We demonstrate the advantages of LumiWCluster on two publicly available Illumina GoldenGate Methylation datasets (ovarian cancer and hepatocellular carcinoma).
Availability: R package LumiWCluster can be downloaded from http://www.unc.edu/~pfkuan/LumiWCluster
Supplementary information: Supplementary data are available at Bioinformatics online.
Treatment effect is traditionally assessed through either superiority or non-inferiority clinical trials. Investigators may find that because of safety concerns and/or wide variability across strata of the superiority margin of active controls over placebo, neither a superiority nor a non-inferiority trial design is ethical or practical in some disease populations. Prior knowledge may allow and drive study designers to consider more sophisticated designs for a clinical trial.
In this paper, the authors propose hybrid designs which may combine a superiority design in one subgroup with a non-inferiority design in another subgroup or combine designs with different control regimens in different subgroups in one trial when a uniform design is unethical or impractical. The authors show how the hybrid design can be planned and how inferences can be made. Through two examples, the authors illustrate the scenarios where hybrid designs are useful while the conventional designs are not preferable.
The hybrid design is a useful alternative to current superiority and non-inferiority designs.
We propose hybrid designs for trials in which neither a superiority nor a non-inferiority design is ethical or practical.
The hybrid design is practical, flexible and feasible.
We expect it to become a major alternative to the superiority and non-inferiority designs.
Strengths and limitations of this study
The hybrid design provides a powerful and relatively simple solution to the difficult problem of active controls with varying efficacy and/or safety concerns. The problem is becoming more common as more drugs become available.
The design and analysis are moderately complex compared with the superiority and non-inferiority designs.
Bivariate random effect models are currently one of the main methods recommended to synthesize diagnostic test accuracy studies. However, only the logit-transformation on sensitivity and specificity has been previously considered in the literature. In this paper, we consider a bivariate generalized linear mixed model to jointly model the sensitivities and specificities, and discuss the estimation of the summary receiver operating characteristic curve (ROC) and the area under the ROC curve (AUC). As the special cases of this model, we discuss the commonly used logit, probit and complementary log-log transformations. To evaluate the impact of misspecification of the link functions on the estimation, we present two case studies and a set of simulation studies. Our study suggests that point estimation of the median sensitivity and specificity, and AUC is relatively robust to the misspecification of the link functions. However, the misspecification of link functions has a noticeable impact on the standard error estimation and the 95% confidence interval coverage, which emphasizes the importance of choosing an appropriate link function to make statistical inference.
meta-analysis; bivariate random effect models; sensitivity; specificity; receiver operating characteristic curve; area under the ROC curve
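The three link functions named in the abstract above can be written out explicitly. The following is a minimal sketch of the transformations and their inverses only; it does not fit the bivariate mixed model, and the names `LINKS`, `to_linear_scale` and `to_probability` are hypothetical.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used by the probit link

# Each entry maps a link name to (link, inverse link).
LINKS = {
    "logit": (
        lambda p: math.log(p / (1.0 - p)),
        lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    ),
    "probit": (_STD.inv_cdf, _STD.cdf),
    "cloglog": (
        lambda p: math.log(-math.log(1.0 - p)),
        lambda eta: 1.0 - math.exp(-math.exp(eta)),
    ),
}


def to_linear_scale(p, link):
    """Transform a sensitivity or specificity in (0, 1) to the linear predictor scale."""
    return LINKS[link][0](p)


def to_probability(eta, link):
    """Map a linear predictor value back to a probability."""
    return LINKS[link][1](eta)


if __name__ == "__main__":
    for name in LINKS:
        print(name, to_linear_scale(0.8, name))
```

The same sensitivity of 0.8 maps to a different value on each scale, which is one way to see why a random-effects distribution assumed on one scale is not equivalent to the same assumption on another.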
NF-κB is an antiapoptotic transcription factor that has been shown to be a mediator of treatment resistance. Bcl-3 is a regulator of NF-κB that may play a role in oncogenesis. The goal of this study was to correlate the activation status of NF-κB and Bcl-3 with clinical outcome in a group of patients with metastatic colorectal cancer (CRC).
This was a retrospective study of 23 patients who underwent surgical resection of CRC at the University of North Carolina (UNC). Activation of NF-κB was defined by nuclear expression of select components of NF-κB (p50, p52, p65) and Bcl-3. Tissue microarrays were created from cores of normal mucosa, primary tumor, lymph node metastases and liver metastases in triplicate from disparate areas of the blocks, and an intensity score was generated by multiplying intensity (0–3+) by percent of positive tumor cells. Generalized estimating equations were used to assess differences in intensity scores between normal mucosa and nonnormal tissues. Cox regression models were fit to assess whether scores were significantly associated with overall survival.
p65 nuclear expression was significantly higher in primary tumor and liver metastases than in normal mucosa (both p < 0.01). p50 nuclear expression was significantly higher for all tumor sites than for normal mucosa (primary tumor and lymph node metastases p < 0.0001, liver metastases p < 0.01). Bcl-3 nuclear expression did not differ significantly between normal mucosa and tumor; however, nuclear expression in primary tumor for each of these components was strongly associated with survival: the increase in hazard for each 50-point increase in nuclear expression was 91% for Bcl-3, 66% for p65, and 52% for p50 (all p < 0.05).
Activation of canonical NF-κB subunits p50 and p65, as measured by nuclear expression, is strongly associated with survival, suggesting NF-κB as a prognostic factor in this disease. Primary tumor nuclear expression appears to be as good as, or better than, metastatic sites at predicting prognosis. Bcl-3 nuclear expression is also negatively associated with survival and deserves further study in CRC.
NF-κB; P65; P50; Colorectal carcinoma
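The scoring rule and the per-50-point hazard increases reported above lend themselves to a short worked example. This sketch assumes the intensity grade is coded 0 to 3 and the percentage 0 to 100, and it assumes the usual log-linear form of a Cox model when rescaling a hazard ratio to a different step size; the function names are hypothetical.

```python
def intensity_score(intensity, percent_positive):
    """Score = staining intensity (0 to 3) times percent of positive tumor cells (0 to 100)."""
    if not 0 <= intensity <= 3:
        raise ValueError("intensity must be between 0 and 3")
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    return intensity * percent_positive


def hazard_ratio_for_increase(hr_reported, reported_step, new_step):
    """Rescale a hazard ratio reported per `reported_step` points to `new_step` points.

    Assumes the log hazard is linear in the score, as in a standard Cox model.
    """
    return hr_reported ** (new_step / reported_step)


if __name__ == "__main__":
    # A 91% increase in hazard per 50 points corresponds to a hazard ratio of 1.91.
    print(intensity_score(2, 50))
    print(hazard_ratio_for_increase(1.91, 50, 100))
```

Under these assumptions a 100-point difference in Bcl-3 score would correspond to a hazard ratio of 1.91 squared, not to a doubling of the 91% figure.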
To evaluate the probabilities of a disease state, ideally all subjects in a study should be diagnosed by a definitive diagnostic or gold standard test. However, since definitive diagnostic tests are often invasive and expensive, it is generally unethical to apply them to subjects whose screening tests are negative. In this article, we consider latent class models for screening studies with two imperfect binary diagnostic tests and a definitive categorical disease status measured only for those with at least one positive screening test. Specifically, we discuss one conditional independence model and three homogeneous conditional dependence latent class models and assess the impact of misspecification of the dependence structure on the estimation of disease category probabilities using frequentist and Bayesian approaches. Interestingly, the three homogeneous dependent models can provide identical goodness-of-fit but substantively different estimates for a given study. However, the parametric form of the assumed dependence structure itself is not “testable” from the data, and thus the dependence structure modeling considered here can only be viewed as a sensitivity analysis concerning a more complicated non-identifiable model potentially involving heterogeneous dependence structure. Furthermore, we discuss Bayesian model averaging together with its limitations as an alternative way to partially address this particularly challenging problem. The methods are applied to two cancer screening studies, and simulations are conducted to evaluate the performance of these methods. In summary, further research is needed to reduce the impact of model misspecification on the estimation of disease prevalence in such settings.
maximum likelihood; Bayesian inference; diagnostic test; dependence; screening; latent class models
That conditioning on a common effect of exposure and outcome may cause selection, or collider-stratification, bias is not intuitive. We provide two hypothetical examples to convey concepts underlying bias due to conditioning on a collider. In the first example, fever is a common effect of influenza and consumption of a tainted egg-salad sandwich. In the second example, case-status is a common effect of a genotype and an environmental factor. In both examples, conditioning on the common effect imparts an association between two otherwise independent variables; we call this selection bias.
Bias; selection; methods; epidemiologic
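The first hypothetical example above (fever as a common effect of influenza and a tainted sandwich) can be reproduced by simulation. This is a sketch only: the prevalences and fever probabilities below are invented for illustration and do not come from the paper, and the function names are hypothetical.

```python
import random


def odds_ratio(pairs):
    """Odds ratio between the two binary variables in a list of (flu, sandwich) pairs."""
    both = flu_only = sandwich_only = neither = 0
    for flu, sandwich in pairs:
        if flu and sandwich:
            both += 1
        elif flu:
            flu_only += 1
        elif sandwich:
            sandwich_only += 1
        else:
            neither += 1
    return (both * neither) / (flu_only * sandwich_only)


def simulate(n=200000, seed=7):
    """Return (odds ratio in everyone, odds ratio among people with fever)."""
    rng = random.Random(seed)
    everyone, febrile = [], []
    for _ in range(n):
        flu = rng.random() < 0.10       # assumed prevalence of influenza
        sandwich = rng.random() < 0.10  # assumed prevalence of eating the sandwich
        # Fever is a common effect of both causes (assumed probabilities).
        if flu and sandwich:
            p_fever = 0.95
        elif flu or sandwich:
            p_fever = 0.80
        else:
            p_fever = 0.05
        everyone.append((flu, sandwich))
        if rng.random() < p_fever:
            febrile.append((flu, sandwich))  # conditioning on the collider
    return odds_ratio(everyone), odds_ratio(febrile)


if __name__ == "__main__":
    overall, among_febrile = simulate()
    print("OR in everyone:", overall)
    print("OR among febrile:", among_febrile)
```

In the full population the two causes are generated independently, so the odds ratio is near 1. Restricting to people with fever induces a strong inverse association, since a febrile person who did not eat the sandwich is more likely to have influenza.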
In the survival analysis context, when an intervention either reduces a harmful exposure or introduces a beneficial treatment, it seems useful to quantify the gain in survival attributable to the intervention as an alternative to the reduction in risk. To accomplish this we introduce two new concepts, the attributable survival and attributable survival time, and study their properties. Our analysis includes comparison with the attributable risk function as well as hazard-based alternatives. We also extend the setting to the case where the intervention takes place at discrete points in time, and may either eliminate exposure or introduce a beneficial treatment in only a proportion of the available group. This generalization accommodates the more realistic situation where the treatment or exposure is dynamic. We apply these methods to assess the effect of introducing highly active antiretroviral therapy for the treatment of clinical AIDS at the population level.
attributable risk function; survival analysis; parametric models; generalized gamma distribution; product limit estimate
Neighborhood socioeconomic environment may be a determinant of injection drug use cessation. The authors used data from a prospective cohort study of Baltimore City, Maryland, injection drug users assessed between 1990 and 2006. The study examined the relation between living in a poorer neighborhood and the probability of injection cessation among active injectors, independent of individual characteristics and while respecting the temporality of potential confounders, exposure, and outcome. Participants’ residences were geocoded, and the crude, adjusted, and inverse probability of exposure weighted associations between neighborhood poverty and injection drug use cessation were estimated. Weighted models showed a strong association between neighborhood poverty and injection drug use cessation; living in a neighborhood with fewer than 10%, compared with more than 30%, of residents in poverty was associated with a 44% increased odds of not injecting in the prior 6 months (odds ratio = 1.44, 95% confidence interval: 1.14, 1.82). Results show that neighborhood environment may be an important determinant of drug injection behavior independent of individual-level characteristics.
drug users; epidemiologic methods; heroin; poverty; residence characteristics; social environment; substance-related disorders
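The inverse probability of exposure weights mentioned above follow a simple formula once an exposure model has supplied fitted probabilities. The sketch below shows only the weight calculation, with optional stabilization; the probabilities are hypothetical inputs, not values from the study, and the function names are introduced here for illustration.

```python
def ip_exposure_weight(exposed, p_exposed_given_covariates, p_exposed_marginal=None):
    """Inverse probability of exposure weight for one observation.

    exposed: whether the observation was actually exposed.
    p_exposed_given_covariates: fitted probability of exposure given covariates
        (assumed to come from a separately fitted exposure model).
    p_exposed_marginal: if given, the weight is stabilized by the marginal
        probability of the exposure level actually received.
    """
    p = p_exposed_given_covariates
    p_observed = p if exposed else 1.0 - p
    if p_exposed_marginal is None:
        return 1.0 / p_observed
    numerator = p_exposed_marginal if exposed else 1.0 - p_exposed_marginal
    return numerator / p_observed


def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percent change in odds."""
    return (odds_ratio - 1.0) * 100.0


if __name__ == "__main__":
    print(ip_exposure_weight(True, 0.25))        # unstabilized
    print(ip_exposure_weight(True, 0.25, 0.5))   # stabilized
    print(round(percent_change_in_odds(1.44)))   # the 44% figure quoted above
```

Observations whose exposure was unlikely given their covariates receive larger weights, which is what balances measured confounders across exposure groups in the weighted sample.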
Background In epidemiologic research, little emphasis has been placed on methods to account for left-hand censoring of ‘exposures’ due to a limit of detection (LOD).
Methods We calculate the odds of anti-HIV therapy naiveté in 45 HIV-infected men as a function of measured log10 plasma HIV RNA viral load using five approaches including ad hoc methods as well as a maximum likelihood estimate (MLE). We also generated simulations of a binary outcome with 10% incidence and a 1.5-fold increased odds per log increase in a log-normally distributed exposure with 25, 50 and 75% of exposure data below LOD. Simulated data were analysed using the same five methods, as well as the full data.
Results In the example, the estimated odds ratio (OR) varied by 1.22-fold across methods, from 1.45 to 1.77 per log10 copies of viral load and the standard error for the log OR varied by 1.52-fold across methods, from 0.31 to 0.47. In the simulations, use of full data or the MLE was unbiased with appropriate confidence interval (CI) coverage. However, as the proportion of exposure below LOD increased, substituting LOD, LOD/√2 or LOD/2 was increasingly biased with increasingly inappropriate CI coverage. Finally, exclusion of values below LOD was unbiased but imprecise.
Conclusions In this example and the settings explored by simulation, and among methods readily available to investigators (i.e. sans full data), the MLE provided an unbiased and appropriately precise estimate of the exposure–outcome OR.
Biomarkers; epidemiologic methods; limit of detection; statistical method
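The ad hoc approaches compared above (substituting LOD, LOD/√2 or LOD/2, or excluding values below LOD) are simple enough to sketch. The code below assumes that unmeasurable values are recorded as `None`, and it leaves open whether substitution is applied before or after any log transformation, since the abstract does not say; `handle_below_lod` is a hypothetical helper.

```python
import math


def handle_below_lod(values, lod, method):
    """Apply one ad hoc rule to exposure values with a limit of detection.

    values: measurements, with None marking a value below the LOD (an assumption
        about how the data are coded).
    method: "lod", "lod/sqrt2", "lod/2" or "exclude".
    """
    if method == "exclude":
        return [v for v in values if v is not None]
    fills = {
        "lod": lod,
        "lod/sqrt2": lod / math.sqrt(2.0),
        "lod/2": lod / 2.0,
    }
    if method not in fills:
        raise ValueError(f"unknown method: {method}")
    fill = fills[method]
    return [fill if v is None else v for v in values]


if __name__ == "__main__":
    measured = [None, 3.0, None, 4.5]
    for rule in ("lod", "lod/sqrt2", "lod/2", "exclude"):
        print(rule, handle_below_lod(measured, 2.0, rule))
```

Each substitution rule places every censored observation at a single point, which is consistent with the finding above that bias grows as the proportion below LOD increases. The maximum likelihood approach is not shown here.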
In occupational case–control studies, work-related exposure assessments are often fallible measures of the true underlying exposure. In lieu of a gold standard, often more than 2 imperfect measurements (e.g. triads) are used to assess exposure. While methods exist to assess the diagnostic accuracy in the absence of a gold standard, these methods are infrequently used to correct for measurement error in exposure–disease associations in occupational case–control studies. Here, we present a likelihood-based approach that (a) provides evidence regarding whether the misclassification of tests is differential or nondifferential; (b) provides evidence whether the misclassification of tests is independent or dependent conditional on latent exposure status, and (c) estimates the measurement error–corrected exposure–disease association. These approaches use information from all imperfect assessments simultaneously in a unified manner, which in turn can provide a more accurate estimate of exposure–disease association than that based on individual assessments. The performance of this method is investigated through simulation studies and applied to the National Occupational Hazard Survey, a case–control study assessing the association between asbestos exposure and mesothelioma.
Case–control study; Gold standard; Missing data; Occupational exposure assessment
To estimate the association of rear seat safety belt use with death in a traffic crash.
Matched cohort study.
The US during 2000 through 2004.
Drivers (10 427) and rear seat passengers (15 922) in passenger vehicles that crashed and had at least one driver or rear passenger death. Data from the Fatality Analysis Reporting System.
Main outcome measures
The adjusted relative risk (aRR) of death for a belted rear seat passenger compared with an otherwise similar unbelted rear passenger.
Safety belt use was associated with a reduced risk of death for rear car occupants: outboard rear seat aRR 0.42 (95% CI 0.38 to 0.46), and center rear seat aRR 0.30 (95% CI 0.20 to 0.44). For rear occupants of light trucks, vans, and utility vehicles, the estimates were: outboard aRR 0.25 (95% CI 0.21 to 0.29), center aRR 0.34 (95% CI 0.24 to 0.48).
If the authors' estimates are causal, traffic crash mortality can be reduced for rear occupants by approximately 55–75% if they use safety belts.
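The relation between the adjusted relative risks and the stated mortality reduction is one line of arithmetic: the percent reduction is (1 − aRR) × 100. The sketch below applies it to the four point estimates reported above; the dictionary keys and function names are labels introduced only for this illustration.

```python
def percent_risk_reduction(adjusted_rr):
    """Percent reduction in risk implied by an adjusted relative risk."""
    return (1.0 - adjusted_rr) * 100.0


# Point estimates quoted in the results above.
ESTIMATES = {
    ("car", "outboard"): 0.42,
    ("car", "center"): 0.30,
    ("light truck/van/utility", "outboard"): 0.25,
    ("light truck/van/utility", "center"): 0.34,
}


def reductions():
    """Rounded percent reductions for each vehicle type and seat position."""
    return {key: round(percent_risk_reduction(rr)) for key, rr in ESTIMATES.items()}


if __name__ == "__main__":
    for key, value in reductions().items():
        print(key, f"{value}%")
```

The point estimates imply reductions of about 58% to 75%, in line with the approximate range given in the conclusion.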
Using validation sets for outcomes can greatly improve the estimation of vaccine efficacy (VE) in the field (Halloran and Longini, 2001; Halloran and others, 2003). Most statistical methods for using validation sets rely on the assumption that outcomes on those with no cultures are missing at random (MAR). However, often the validation sets will not be chosen at random. For example, confirmational cultures are often done on people with influenza-like illness as part of routine influenza surveillance. VE estimates based on such non-MAR validation sets could be biased. Here we propose frequentist and Bayesian approaches for estimating VE in the presence of validation bias. Our work builds on the ideas of Rotnitzky and others (1998, 2001), Scharfstein and others (1999, 2003), and Robins and others (2000). Our methods require expert opinion about the nature of the validation selection bias. In a re-analysis of an influenza vaccine study, we found, using the beliefs of a flu expert, that within any plausible range of selection bias the VE estimate based on the validation sets is much higher than the point estimate using just the non-specific case definition. Our approach is generally applicable to studies with missing binary outcomes with categorical covariates.
Bayesian; Expert opinion; Identifiability; Influenza; Missing data; Selection model; Vaccine efficacy
To assess whether HIV RNA levels (log10 scale) in a population treated with highly active antiretroviral therapy (HAART) have a bimodal distribution, suggesting optimal or suboptimal response to HAART.
The study population from two ongoing cohort studies comprised 564 men (4785 person visits) and 1173 women (8675 person visits) with known dates of HAART initiation and with HIV RNA measurements before and after initiation. Values below detection limit of assays were treated in the analysis as left censored. Maximum likelihood methods were used to estimate parameters and to determine possible bimodality of HIV RNA distributions.
A two-component mixture model fitted HIV RNA levels significantly better than did a single-component distribution at different years from HAART initiation in both therapy-experienced and therapy-naive patients. In the fifth year after HAART initiation, 32% of men and 44% of women had HIV RNA in the higher component with medians of 5247 and 9253 copies/ml, respectively, suggesting suboptimal virological response to HAART, which was associated with poor adherence and lower frequency of CCR5 heterozygous genotype.
The bimodal distribution of HIV RNA persisted during the years after HAART initiation. The high occurrence of suboptimal virological response at the fifth year after HAART initiation underscores the need for careful monitoring and patient education about the importance of treatment adherence. This data analysis overcomes the limitations of measurement techniques that yield observations below detection limits and serves to characterise the dynamics of the virological response to therapies.
mixture model; left censoring; HIV RNA; HAART
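The likelihood behind a two-component mixture with left censoring can be written down compactly. The sketch below assumes normal components on the log10 scale, which the abstract does not specify, and it only evaluates the log-likelihood (no optimizer is included); `censored_mixture_loglik` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def censored_mixture_loglik(observations, weight_low, low, high):
    """Log-likelihood of a two-component mixture with left-censored values.

    observations: list of (log10_value, is_censored); for a censored record the
        value is the assay's detection limit on the log10 scale.
    weight_low: mixing proportion of the lower component.
    low, high: NormalDist objects for the two components (an assumed form).
    """
    total = 0.0
    for value, is_censored in observations:
        if is_censored:
            # A censored value contributes the probability of falling below the limit.
            contribution = weight_low * low.cdf(value) + (1.0 - weight_low) * high.cdf(value)
        else:
            # An observed value contributes the mixture density at that value.
            contribution = weight_low * low.pdf(value) + (1.0 - weight_low) * high.pdf(value)
        total += math.log(contribution)
    return total


if __name__ == "__main__":
    limit = math.log10(50)  # illustrative detection limit of 50 copies/ml
    data = [(limit, True), (limit, True), (3.7, False), (4.1, False)]
    print(censored_mixture_loglik(data, 0.6, NormalDist(1.0, 0.5), NormalDist(4.0, 0.6)))
```

Maximizing this function over the mixing proportion and the component parameters, and comparing it with the single-component fit, is the kind of comparison the abstract describes. Treating censored values through the distribution function avoids substituting an arbitrary number for them.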