The predictive capacity of a marker in a population can be described using the population distribution of risk (Huang et al. 2007; Pepe et al. 2008a; Stern 2008). Virtually all standard statistical summaries of predictability and discrimination can be derived from it (Gail and Pfeiffer 2005). The goal of this paper is to develop methods for making inference about risk prediction markers using summary measures derived from the risk distribution. We describe some new clinically motivated summary measures and give new interpretations to some existing statistical measures. Methods for estimating these summary measures are described along with distribution theory that facilitates construction of confidence intervals from data. We show how markers and, more generally, how risk prediction models, can be compared using clinically relevant measures of predictability. The methods are illustrated by application to markers of lung function and nutritional status for predicting subsequent onset of major pulmonary infection in children suffering from cystic fibrosis. Simulation studies show that methods for inference are valid for use in practice.
doi:10.2202/1557-4679.1188
PMCID: PMC2827895
PMID: 20224632
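As a concrete illustration of working with the population distribution of risk, the sketch below simulates risks under an assumed logistic model (the marker distribution, coefficients, and threshold are invented, not from the paper) and derives two simple summaries: the mean risk and the fraction of the population above a risk threshold.

```python
import math
import random

random.seed(0)

# Hypothetical risks from an assumed logistic model of a single marker;
# the intercept, slope, and marker distribution are illustrative only.
risks = [1 / (1 + math.exp(-(-2.0 + 1.5 * random.gauss(0, 1))))
         for _ in range(10_000)]

def high_risk_fraction(risks, threshold):
    """Fraction of the population whose predicted risk exceeds the threshold."""
    return sum(r > threshold for r in risks) / len(risks)

# Under a calibrated model the mean risk equals the disease prevalence.
mean_risk = sum(risks) / len(risks)
```

Many standard summaries of predictability (e.g., the proportion of the population flagged as high risk at a clinically chosen threshold) are functionals of this distribution, which is the paper's starting point.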
Non-specific responses to treatment (commonly known as the placebo response) are pervasive when treating mental illness. Subjects treated with an active drug may respond in part due to non-specific aspects of the treatment, i.e., those not related to the chemical effect of the drug. To determine the extent to which a subject responds due to the chemical effect of a drug, one must disentangle the specific drug effect from the non-specific placebo effect. This paper presents a statistical model that allows for the separate prediction of the specific effect and non-specific effects in drug-treated subjects. Data from a clinical trial comparing fluoxetine to a placebo for treating depression are used to illustrate this methodology.
doi:10.2202/1557-4679.1152
PMCID: PMC3085382
PMID: 21556319
longitudinal outcome; linear mixed effects models; BLUP; non-specific treatment effect; specific drug effect; allometric extension; principal components
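As a deliberately crude sketch of the decomposition idea (the paper's actual machinery involves linear mixed effects models and BLUPs), one can estimate the non-specific component from the placebo arm and subtract it from each drug-treated subject's observed change; all numbers below are invented.

```python
# Illustrative decomposition, not the paper's mixed-model/BLUP method:
# the placebo-arm mean serves as a crude estimate of the non-specific
# (placebo) response, and the remainder in each drug-treated subject's
# change is attributed to the specific drug effect.
placebo_changes = [4.1, 5.0, 3.2, 4.8, 5.5]  # made-up symptom-score improvements
drug_changes = [7.9, 6.5, 8.2, 7.0, 9.1]

nonspecific = sum(placebo_changes) / len(placebo_changes)
specific = [d - nonspecific for d in drug_changes]
```

A mixed-model approach refines this by predicting a subject-specific non-specific response rather than a single group mean.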
In the simultaneous estimation of a large number of related quantities, multilevel models provide a formal mechanism for efficiently making use of the ensemble of information for deriving individual estimates. In this article we investigate the ability of the likelihood to identify the relationship between signal and noise in multilevel linear mixed models. Specifically, we consider the ability of the likelihood to diagnose conjugacy or independence between the signals and noises. Our work was motivated by the analysis of data from high-throughput experiments in genomics. The proposed model leads to a more flexible family of models. However, we further demonstrate that adequately capitalizing on the benefits of a well-fitting, fully specified likelihood in terms of gene ranking is difficult.
doi:10.2202/1557-4679.1129
PMCID: PMC2827886
PMID: 20224629
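A toy sketch of multilevel pooling in this spirit: a method-of-moments split of total variance into signal and noise drives shrinkage of each gene-level estimate toward the grand mean. The normal-normal setup and all numbers are illustrative, not the paper's model.

```python
import random
import statistics

random.seed(1)

# Simulated gene-level "signals" observed with known noise (illustrative).
true_effects = [random.gauss(0, 1) for _ in range(500)]
noise_sd = 0.8
observed = [t + random.gauss(0, noise_sd) for t in true_effects]

grand_mean = statistics.mean(observed)
total_var = statistics.variance(observed)
# Method-of-moments decomposition: total variance = signal + noise variance.
signal_var = max(total_var - noise_sd ** 2, 0.0)
shrink = signal_var / (signal_var + noise_sd ** 2)  # shrinkage factor in [0, 1)

# Each estimate is pulled toward the grand mean by the same factor.
posterior = [grand_mean + shrink * (y - grand_mean) for y in observed]
```

The paper's point is subtler: even when a fully specified likelihood fits well, translating that fit into better gene rankings is difficult.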
Dynamic allocation of participants to treatments in a clinical trial has been an alternative to randomization for nearly 35 years. Design-adaptive allocation is a particularly flexible kind of dynamic allocation. Every investigation of dynamic allocation methods has shown that they improve balance of prognostic factors across treatment groups, but there have been lingering doubts about their influence on the validity of statistical inferences. Here we report the results of a simulation study focused on this and similar issues. Overall, it is found that there are no statistical reasons, in the situations studied, to prefer randomization to design-adaptive allocation. Specifically, there is no evidence of bias, the number of participants wasted by randomization in small studies is not trivial, and when the aim is to place bounds on the prediction of population benefits, randomization is quite substantially less efficient than design-adaptive allocation. A new, adjusted permutation estimate of the standard deviation of the regression estimator under design-adaptive allocation is shown to be an unbiased estimate of the true sampling standard deviation, resolving a long-standing problem with dynamic allocations. These results are shown in situations with varying numbers of balancing factors, different treatment and covariate effects, different covariate distributions, and in the presence of a small number of outliers.
doi:10.2202/1557-4679.1144
PMCID: PMC2827888
PMID: 20224630
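Design-adaptive allocation comes in several flavours; the sketch below implements a simple minimization-style rule with invented prognostic factors (not the paper's exact scheme): each new participant goes to the arm where their factor levels are currently under-represented, with ties broken at random.

```python
import random

random.seed(2)

# arm -> factor level -> count of already-allocated participants with that level
counts = {"A": {}, "B": {}}

def imbalance(arm, levels):
    """How many prior participants in this arm share the new participant's levels."""
    return sum(counts[arm].get(lv, 0) for lv in levels)

def allocate(levels):
    """Assign to the arm that best balances the factor counts; random tie-break."""
    i_a, i_b = imbalance("A", levels), imbalance("B", levels)
    arm = "A" if i_a < i_b else "B" if i_b < i_a else random.choice("AB")
    for lv in levels:
        counts[arm][lv] = counts[arm].get(lv, 0) + 1
    return arm

participants = [("male", "age<50"), ("female", "age>=50"),
                ("male", "age>=50"), ("female", "age<50")]
arms = [allocate(p) for p in participants]
```

Unlike simple randomization, the assignment depends on the accumulating covariate history, which is why variance estimation (the adjusted permutation estimate above) needs special care.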
Propensity-score matching is frequently used in the medical literature to reduce or eliminate the effect of treatment selection bias when estimating the effect of treatments or exposures on outcomes using observational data. In propensity-score matching, pairs of treated and untreated subjects with similar propensity scores are formed. Recent systematic reviews of the use of propensity-score matching found that the large majority of researchers ignore the matched nature of the propensity-score matched sample when estimating the statistical significance of the treatment effect. We conducted a series of Monte Carlo simulations to examine the impact of ignoring the matched nature of the propensity-score matched sample on Type I error rates, coverage of confidence intervals, and variance estimation of the treatment effect. We examined estimation of differences in means, relative risks, odds ratios, rate ratios from Poisson models, and hazard ratios from Cox regression models. We demonstrated that accounting for the matched nature of the propensity-score matched sample tended to result in Type I error rates that were closer to the nominal level than when matching was not incorporated into the analyses. Similarly, accounting for the matched nature of the sample tended to result in confidence intervals with coverage rates closer to the nominal level than when matching was not taken into account. Finally, accounting for the matched nature of the sample resulted in estimates of the standard error that more closely reflected the sampling variability of the treatment effect than when matching was not taken into account.
doi:10.2202/1557-4679.1146
PMCID: PMC2949360
PMID: 20949126
propensity score; matching; propensity-score matching; variance estimation; coverage; simulations; type I error; observational studies
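A minimal sketch of the matched analysis the abstract recommends, for a continuous outcome: once treated/control pairs are formed on the propensity score, estimate the effect and its standard error from within-pair differences rather than treating the two groups as independent samples. The outcome values below are invented.

```python
import math
import statistics

# Hypothetical matched pairs: (treated outcome, matched control outcome).
pairs = [(12.1, 10.3), (9.8, 9.9), (14.0, 11.5), (11.2, 10.8), (13.3, 12.0)]

diffs = [t - c for t, c in pairs]
effect = statistics.mean(diffs)
# Paired standard error: respects the within-pair correlation that an
# independent-samples analysis would ignore.
se_paired = statistics.stdev(diffs) / math.sqrt(len(diffs))
```

For binary or survival outcomes the same principle applies through matched-pair methods (e.g., robust or pair-clustered variance estimators) rather than this simple paired difference.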
Granger causality (GC) and its extensions have been used widely to infer causal relationships from multivariate time series generated from biological systems. GC is ideally suited for causal inference in bivariate vector autoregressive (VAR) processes. A zero magnitude of the upper or lower off-diagonal element(s) in a bivariate VAR is indicative of a lack of causal relationship in that direction, resulting in true acyclic structures. However, in experimental settings, statistical tests such as the F-test, which rely on the ratio of the mean-squared forecast errors, are used to infer significant GC relationships. The present study investigates acyclic approximations within the context of bi-directional two-gene network motifs modeled as bivariate VARs. The fine interplay between the model parameters in the bivariate VAR, namely (i) transcriptional noise variance, (ii) autoregulatory feedback, and (iii) transcriptional coupling strength, that can give rise to discrepancies in the ratio of the mean-squared forecast errors is investigated. Subsequently, their impact on statistical power is investigated using Monte Carlo simulations. More importantly, it is shown that one can arrive at acyclic approximations even for bi-directional networks for a suitable choice of process parameters, significance level, and sample size. While the results are discussed within the framework of transcriptional networks, the analytical treatment provided is generic and likely to have significant impact across distinct paradigms.
doi:10.2202/1557-4679.1119
PMCID: PMC2827889
PMID: 20224631
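A sketch of the setting with illustrative parameter values (not the paper's): simulate a bivariate VAR(1) for two genes, then compare in-sample mean-squared one-step forecast errors for x with and without y's history; this ratio is the quantity underlying the F-test the abstract discusses.

```python
import random

random.seed(3)

a11, a12 = 0.5, 0.4  # x's autoregulatory feedback and the coupling from y to x
a21, a22 = 0.0, 0.5  # a21 = 0: x does not Granger-cause y (acyclic in that direction)
sd = 1.0             # transcriptional noise standard deviation

xs, ys = [0.0], [0.0]
for _ in range(2000):
    xs.append(a11 * xs[-1] + a12 * ys[-1] + random.gauss(0, sd))
    ys.append(a21 * xs[-2] + a22 * ys[-1] + random.gauss(0, sd))

def mse_restricted(x):
    """In-sample MSE predicting x[t] from x[t-1] alone (simple OLS)."""
    xp, xc = x[:-1], x[1:]
    n = len(xp)
    mx, mc = sum(xp) / n, sum(xc) / n
    b = (sum((a - mx) * (c - mc) for a, c in zip(xp, xc))
         / sum((a - mx) ** 2 for a in xp))
    a0 = mc - b * mx
    return sum((c - (a0 + b * a)) ** 2 for a, c in zip(xp, xc)) / n

def mse_unrestricted(x, y):
    """In-sample MSE predicting x[t] from x[t-1] and y[t-1] (2x2 normal equations)."""
    xp, yp, z = x[:-1], y[:-1], x[1:]
    n = len(z)
    mx, my, mz = sum(xp) / n, sum(yp) / n, sum(z) / n
    xc = [a - mx for a in xp]
    yc = [b - my for b in yp]
    zc = [c - mz for c in z]
    sxx = sum(a * a for a in xc)
    syy = sum(b * b for b in yc)
    sxy = sum(a * b for a, b in zip(xc, yc))
    sxz = sum(a * c for a, c in zip(xc, zc))
    syz = sum(b * c for b, c in zip(yc, zc))
    det = sxx * syy - sxy * sxy
    b1 = (sxz * syy - syz * sxy) / det
    b2 = (syz * sxx - sxz * sxy) / det
    return sum((c - b1 * a - b2 * b) ** 2
               for a, b, c in zip(xc, yc, zc)) / n

ratio = mse_restricted(xs) / mse_unrestricted(xs, ys)  # > 1 suggests y Granger-causes x
```

Because the unrestricted model nests the restricted one, the in-sample ratio is always at least 1; the statistical question is whether it exceeds 1 by more than chance, which is where noise variance, feedback, and coupling strength interact with power.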
CpG islands are genome subsequences with an unexpectedly high number of CG di-nucleotides. They are typically identified using filtering criteria (e.g., G+C%, expected vs. observed CpG ratio, and length) and are computed using sliding window methods. Most such studies implicitly assume that an exhaustive search for CpG islands is achieved on the genome sequence of interest. We devise a Lexis diagram and explicitly show that filtering criteria-based definitions of CpG islands are mathematically incomplete and non-operational. These facts imply that sliding window methods frequently fail to identify a large percentage of subsequences that meet the filtering criteria. We also demonstrate that an exhaustive search is computationally expensive. We develop the Hierarchical Factor Segmentation (HFS) algorithm, a pattern recognition technique with an adaptive model selection device, to overcome the incompleteness and non-operational drawbacks and to achieve effective computation for identifying CpG islands. The concept of a CpG island “core” is introduced and computed using the HFS algorithm, independent of any specific filtering criteria. Upon such a CpG island “core,” a CpG island is constructed using a Lexis diagram. This two-step computational approach provides a nearly exhaustive search for CpG islands that can be practically implemented on whole chromosomes. In a simulation study realistically mimicking CpG-island dynamics through a Hidden Markov Model, we demonstrate that this approach retains very high sensitivity and specificity, that is, very low rates of false positives and false negatives. Finally, we apply the HFS algorithm to identify CpG island cores on human chromosome 21.
doi:10.2202/1557-4679.1158
PMCID: PMC2818740
PMID: 20148132
AIC and BIC model selection criteria; non-parametric decoding; filtering criteria; hierarchical factor segmentation; human chromosome 21; mathematical incompleteness; methylation
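For concreteness, a sketch of the kind of filtering criteria the abstract refers to (conventional Gardiner-Garden-style thresholds; the exact cutoffs are illustrative, not the paper's): a subsequence qualifies if it is long enough, GC-rich, and has a high observed/expected CpG ratio.

```python
def meets_criteria(seq, min_len=200, min_gc=0.5, min_oe=0.6):
    """Check one subsequence against length, G+C%, and obs/exp CpG thresholds."""
    n = len(seq)
    if n < min_len:
        return False
    g, c = seq.count("G"), seq.count("C")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    gc_frac = (g + c) / n
    expected = g * c / n  # expected CpG count under independence of C and G
    return gc_frac >= min_gc and expected > 0 and cpg / expected >= min_oe
```

The abstract's point is that such a predicate does not by itself define a unique, exhaustively computable set of islands: many overlapping subsequences satisfy it, which is why sliding windows miss qualifying subsequences and why the core-plus-Lexis-diagram construction is needed.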
For both clinical and research purposes, biopsies are used to classify liver damage known as fibrosis on an ordinal multi-state scale ranging from no damage to cirrhosis. Misclassification can arise from reading error (misreading of a specimen) or sampling error (the specimen does not accurately represent the liver). Studies of biopsy accuracy have not attempted to synthesize these two sources of error or to estimate actual misclassification rates from either source. Using data from two studies of reading error and two of sampling error, we find surprisingly large possible misclassification rates, including a greater than 50% chance of misclassification for one intermediate stage of fibrosis. We find that some readers tend to misclassify consistently low or consistently high, and some specimens tend to be misclassified low while others tend to be misclassified high. Non-invasive measures of liver fibrosis have generally been evaluated by comparison to simultaneous biopsy results, but biopsy appears to be too unreliable to be considered a gold standard. Non-invasive measures may therefore be more useful than such comparisons suggest. Both stochastic uncertainty and uncertainty about our model assumptions appear to be substantial. Improved studies of biopsy accuracy would include large numbers of both readers and specimens, greater effort to reduce or eliminate reading error in studies of sampling error, and careful estimation of misclassification rates rather than less useful quantities such as kappa statistics.
doi:10.2202/1557-4679.1139
PMCID: PMC2810974
PMID: 20104258
fibrosis; hepatitis C; kappa statistic; latent variables; misclassification
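A sketch of how the two error sources compose: if S[i][j] is the probability that sampling turns true stage i into specimen stage j, and R[j][k] is the probability a reader reports stage k for specimen stage j, the observed misclassification matrix is the matrix product of S and R. The matrices below are invented for illustration (over three collapsed stages), not estimates from the paper's data.

```python
# Sampling-error matrix: true stage -> specimen stage (rows sum to 1).
S = [[0.8, 0.2, 0.0],
     [0.2, 0.6, 0.2],
     [0.0, 0.2, 0.8]]
# Reading-error matrix: specimen stage -> reported stage (rows sum to 1).
R = [[0.9, 0.1, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.1, 0.9]]

# Observed misclassification: sampling error followed by reading error.
observed = [[sum(S[i][j] * R[j][k] for j in range(3)) for k in range(3)]
            for i in range(3)]
correct = [observed[i][i] for i in range(3)]  # P(correct classification | true stage)
```

Even with these modest per-source error rates, the intermediate stage ends up correctly classified only about half the time, mirroring the abstract's finding that intermediate stages fare worst once both error sources are combined.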
The validity of standard confidence intervals constructed in survey sampling is based on the central limit theorem. For small sample sizes, the central limit theorem may give a poor approximation, resulting in confidence intervals that are misleading. We discuss this issue and propose methods for constructing confidence intervals for the population mean tailored to small sample sizes.
We present a simple approach for constructing confidence intervals for the population mean based on tail bounds for the sample mean that are correct for all sample sizes. Bernstein's inequality provides one such tail bound. The resulting confidence intervals have guaranteed coverage probability under much weaker assumptions than are required for standard methods. A drawback of this approach, as we show, is that these confidence intervals are often quite wide. In response to this, we present a method for constructing much narrower confidence intervals, which are better suited for practical applications, and that are still more robust than confidence intervals based on standard methods, when dealing with small sample sizes. We show how to extend our approaches to much more general estimation problems than estimating the sample mean. We describe how these methods can be used to obtain more reliable confidence intervals in survey sampling. As a concrete example, we construct confidence intervals using our methods for the number of violent deaths between March 2003 and July 2006 in Iraq, based on data from the study “Mortality after the 2003 invasion of Iraq: A cross sectional cluster sample survey,” by Burnham et al. (2006).
doi:10.2202/1557-4679.1118
PMCID: PMC2827893
PMID: 20231867
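A sketch of the tail-bound approach for observations known to lie in [0, M], using one common form of Bernstein's inequality with the worst-case variance bound M²/4: the half-width t solves 2·exp(−n t² / (2σ² + 2Mt/3)) = α, so coverage of at least 1 − α holds at every sample size, with no appeal to the central limit theorem. The data below are invented.

```python
import math

def bernstein_ci(data, M, alpha=0.05):
    """Finite-sample CI for the mean of values in [0, M] via Bernstein's inequality.

    Solves n*t^2 = log(2/alpha) * (2*var_bound + 2*M*t/3) for t (a quadratic),
    using the worst-case variance bound M^2/4; the interval is clipped to [0, M].
    """
    n = len(data)
    mean = sum(data) / n
    lg = math.log(2 / alpha)
    var_bound = M * M / 4
    b = 2 * M * lg / 3
    t = (b + math.sqrt(b * b + 8 * n * var_bound * lg)) / (2 * n)
    return max(mean - t, 0.0), min(mean + t, M)

lo, hi = bernstein_ci([0.2, 0.4, 0.1, 0.3, 0.5, 0.2, 0.4, 0.3], M=1.0)
```

At n = 8 the interval is very wide, which is exactly the drawback the abstract notes and the motivation for its narrower, still-robust construction.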
We examined the behavior of alternative smoothing methods for modeling environmental epidemiology data. Model fit can only be examined when the true exposure-response curve is known, so we used simulation studies to examine the performance of penalized splines (P-splines), restricted cubic splines (RCS), natural splines (NS), and fractional polynomials (FP). Survival data were generated under six plausible exposure-response scenarios with a right-skewed exposure distribution, typical of environmental exposures. Cox models with each spline or FP were fit to the simulated datasets. The best models (e.g., degrees of freedom) were selected using each method's default criteria. The root mean-square error (rMSE) and area difference were computed to assess model fit and bias (the difference between the fitted and true curves). The test for linearity was a measure of sensitivity, and the test of the null was an assessment of statistical power. No one method performed best according to all four measures of performance; however, all methods performed reasonably well. Model fit was best for P-splines for almost all true-positive scenarios, although fractional polynomials and RCS were least biased, on average.
doi:10.2202/1557-4679.1104
PMCID: PMC2827890
PMID: 20231865
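A sketch of the two fit criteria named in the abstract, evaluated on a grid: root mean-square error and the area between the fitted and true exposure-response curves (a Riemann-sum approximation). The true curve shape and the "fitted" values are invented for illustration.

```python
import math

grid = [i / 10 for i in range(11)]                    # exposure grid on [0, 1]
true_curve = [math.log(1 + x) for x in grid]          # one plausible true shape
fitted = [0.95 * math.log(1 + x) + 0.01 for x in grid]  # hypothetical spline fit

# Root mean-square error between fitted and true curves over the grid.
rmse = math.sqrt(sum((f - t) ** 2 for f, t in zip(fitted, true_curve)) / len(grid))
# Area between the curves (Riemann sum) as the bias measure.
area = sum(abs(f - t) for f, t in zip(fitted, true_curve)) * (grid[1] - grid[0])
```

In the simulations these quantities are averaged over replicate datasets per scenario, alongside the linearity test (sensitivity) and the test of the null (power).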
Matched case-control study designs are commonly implemented in the field of public health. While matching is intended to eliminate confounding, the main potential benefit of matching in case-control studies is a gain in efficiency. Methods for analyzing matched case-control studies have focused on utilizing conditional logistic regression models that provide conditional and not causal estimates of the odds ratio. This article investigates the use of case-control weighted targeted maximum likelihood estimation to obtain marginal causal effects in matched case-control study designs. We compare the use of case-control weighted targeted maximum likelihood estimation in matched and unmatched designs in an effort to explore which design yields the most information about the marginal causal effect. The procedures require knowledge of certain prevalence probabilities and were previously described by van der Laan (2008). In many practical situations where a causal effect is the parameter of interest, researchers may be better served using an unmatched design.
doi:10.2202/1557-4679.1127
PMCID: PMC2827892
PMID: 20231866
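A sketch of the case-control weighting idea attributed to van der Laan (2008): cases receive weight q0 (the known prevalence probability) and each of the J controls per case receives weight (1 − q0)/J, so that weighted analyses target marginal (population) quantities. The data, prevalence, and the simple weighted summary below are illustrative only, not the targeted maximum likelihood procedure itself.

```python
q0 = 0.03  # assumed known disease prevalence (illustrative)
J = 1      # controls sampled per case (1:1 design)

# Hypothetical subjects with a binary exposure indicator A.
cases = [dict(A=1), dict(A=0), dict(A=1)]
controls = [dict(A=0), dict(A=0), dict(A=1)]

# Case-control weights: cases get q0, each control gets (1 - q0) / J.
weighted = [(r, q0) for r in cases] + [(r, (1 - q0) / J) for r in controls]
total_w = sum(w for _, w in weighted)
exposed_w = sum(w for r, w in weighted if r["A"] == 1)
weighted_exposure_prev = exposed_w / total_w  # population-scale exposure prevalence
```

The same weights are what turn the targeted maximum likelihood estimator, which would otherwise condition on case-control sampling, into an estimator of the marginal causal effect.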
Epidemiologic research focuses on estimating exposure-disease associations. In some applications the exposure may be dichotomized, for instance when threshold levels of the exposure are of primary public health interest (e.g., consuming 5 or more fruits and vegetables per day may reduce cancer risk). Errors in exposure variables are known to yield biased regression coefficients in exposure-disease models. Methods for bias-correction with continuous mismeasured exposures have been extensively discussed, and are often based on validation substudies, where the “true” and imprecise exposures are observed on a small subsample. In this paper, we focus on biases associated with dichotomization of a mismeasured continuous exposure. The amount of bias, in relation to measurement error in the imprecise continuous predictor, and choice of dichotomization cut point are discussed. Measurement error correction via regression calibration is developed for this scenario, and compared to naïvely using the dichotomized mismeasured predictor in linear exposure-disease models. Properties of the measurement error correction method (i.e., bias, mean-squared error) are assessed via simulations.
doi:10.2202/1557-4679.1143
PMCID: PMC2743435
PMID: 20046953
measurement error correction; dichotomizing covariates; regression calibration
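A sketch of regression calibration before dichotomizing: replace the mismeasured exposure W by an estimate of E[X | W] fit in a validation substudy, then apply the cut point to the calibrated value rather than to W itself. The calibration fit here is a simple least-squares line on made-up validation data.

```python
# Validation substudy: imprecise exposure W and "true" exposure X (invented).
valid_W = [1.0, 2.0, 3.0, 4.0, 5.0]
valid_X = [1.2, 1.9, 3.3, 3.8, 5.1]

# Least-squares calibration line: E[X | W] ~ intercept + slope * W.
n = len(valid_W)
mw = sum(valid_W) / n
mx = sum(valid_X) / n
slope = (sum((w - mw) * (x - mx) for w, x in zip(valid_W, valid_X))
         / sum((w - mw) ** 2 for w in valid_W))
intercept = mx - slope * mw

def calibrated_indicator(w, cut):
    """Dichotomize the calibrated exposure E[X | W] at the chosen cut point."""
    return int(intercept + slope * w >= cut)
```

Dichotomizing the calibrated value, rather than the raw mismeasured W, is what reduces the bias in the exposure-disease coefficient that the simulations assess.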