Epidemiologic and clinical studies routinely collect longitudinal measures of multiple outcomes, including biomarker measures, cognitive functions, and clinical symptoms. These longitudinal outcomes can be used to establish the temporal order of relevant biological processes and their association with the onset of clinical symptoms. Univariate change point models have been used to model various clinical endpoints, such as CD4 count in studying the progression of HIV infection and cognitive function in the elderly. We propose to use bivariate change point models for two longitudinal outcomes with a focus on the correlation between the two change points. We consider three types of change point models in the bivariate model setting: the broken-stick model, the Bacon–Watts model, and the smooth polynomial model. We adopt a Bayesian approach using a Markov chain Monte Carlo sampling method for parameter estimation and inference. We assess the proposed methods in simulation studies and demonstrate the methodology using data from a longitudinal study of dementia.
random change point model; longitudinal bivariate outcomes; Bayesian method
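To make the change point structures concrete, here is a minimal Python sketch of two of the three mean trajectories named above (broken-stick and Bacon–Watts); it illustrates the model forms only, not the authors' Bayesian estimation code, and the parameter values are hypothetical.

```python
import numpy as np

def broken_stick(t, b0, b1, b2, tau):
    """Broken-stick trajectory: slope b1 before the change point tau,
    slope b1 + b2 after it (kink at tau)."""
    return b0 + b1 * t + b2 * np.maximum(t - tau, 0.0)

def bacon_watts(t, a0, a1, a2, tau, gamma=1.0):
    """Bacon-Watts trajectory: a smooth transition between two slopes,
    with gamma controlling how abrupt the change at tau is."""
    return a0 + a1 * (t - tau) + a2 * (t - tau) * np.tanh((t - tau) / gamma)

# Hypothetical cognitive trajectory declining faster after age 70.
ages = np.linspace(60, 80, 5)
print(broken_stick(ages, b0=28.0, b1=-0.1, b2=-0.8, tau=70.0))
print(bacon_watts(ages, a0=27.0, a1=-0.45, a2=-0.35, tau=70.0, gamma=2.0))
```

In the bivariate model, each subject carries a random change point for each outcome, and interest centers on the correlation between the two.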
The field of psychiatric genetics is hampered by the lack of a clear taxonomy for disorders. Building on the work of Houseman and colleagues (Feature-specific penalized latent class analysis for genomic data. Harvard University Biostatistics Working Paper Series, Working Paper 22, 2005), we describe a penalized latent class regression aimed at allowing additional scientific information to influence the estimation of the measurement model, while retaining the standard assumption of non-differential measurement.
In simulation studies, ridge and LASSO penalty functions improved the precision of estimates and, in some cases of differential measurement, also reduced bias. Class-specific penalization enhanced separation of latent classes with respect to covariates, but only in scenarios where a true separation existed. Penalization was also far less computationally intensive than an analogous Bayesian analysis, by a factor of 37.
This methodology was then applied to data from normal elderly subjects from the Cache County Study on Memory and Aging. Addition of APO-E genotype and a number of baseline clinical covariates improved the dementia prediction utility of the latent classes; application of class-specific penalization improved precision while retaining that prediction utility. This methodology may be useful in scenarios with large numbers of collinear covariates or in certain cases where latent class model assumptions are violated. Investigation of novel penalty functions may prove fruitful in further refining psychiatric phenotypes.
latent class analysis; latent variable models; measurement models; penalization
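As a rough illustration of where the penalty enters, the sketch below writes the latent class membership model as a multinomial logit in covariates and adds a ridge or LASSO term to the negative log-likelihood; names and shapes are hypothetical, and the fitting loop (EM or otherwise) is omitted.

```python
import numpy as np

def class_probs(X, Gamma):
    """Prior class-membership probabilities from a multinomial logit.
    X: (n, p) covariates; Gamma: (p, K-1) coefficients, class K as reference."""
    eta = np.hstack([X @ Gamma, np.zeros((X.shape[0], 1))])
    eta -= eta.max(axis=1, keepdims=True)   # numerical stability
    w = np.exp(eta)
    return w / w.sum(axis=1, keepdims=True)

def penalized_negloglik(negloglik, Gamma, lam, kind="ridge"):
    """Objective to minimize: fit term plus a shrinkage term on the
    class-membership regression coefficients."""
    pen = (Gamma**2).sum() if kind == "ridge" else np.abs(Gamma).sum()
    return negloglik + lam * pen
```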
The time-to-event continual reassessment method (TITE-CRM) was proposed to handle the problem of long trial duration in Phase 1 trials as a result of late-onset toxicities. Here, we implement the TITE-CRM in dose-finding trials of combinations of agents. When studying multiple agents, monotonicity of the dose-toxicity curve is not clearly defined. Therefore, the toxicity probabilities follow a partial order, meaning that there are pairs of treatments for which the ordering of the toxicity probabilities is not known at the start of the trial. A CRM design for partially ordered trials (PO-CRM) was recently proposed. Simulation studies show that extending the TITE-CRM to the partial order setting produces results similar to those of the PO-CRM in terms of maximum tolerated dose recommendation yet reduces the duration of the trial.
continual reassessment method; dose finding; Phase 1 trials; drug combination; partial order; time-to-event
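The essential TITE-CRM device is the weighted likelihood, in which a partially followed, toxicity-free patient contributes only a fraction of a full observation. A minimal sketch under the common one-parameter empiric ("power") model follows; the partial-order extension runs this same calculation under each candidate ordering, which is not shown here.

```python
import numpy as np

def tite_crm_loglik(a, skeleton, y, w):
    """Weighted CRM log-likelihood with p_d = skeleton_d ** exp(a).
    skeleton[i]: prior toxicity guess at patient i's dose; y[i]: 1 if
    toxicity observed; w[i]: fraction of the observation window completed
    (taken as 1 for patients with toxicity or full follow-up)."""
    p = skeleton ** np.exp(a)
    wp = np.clip(w * p, 1e-12, 1 - 1e-12)
    return np.sum(y * np.log(wp) + (1 - y) * np.log(1 - wp))
```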
Net reclassification and integrated discrimination improvements have been proposed as alternatives to the increase in the area under the ROC curve (AUC) for evaluating improvement in the performance of risk assessment algorithms introduced by the addition of new phenotypic or genetic markers. In this paper, we demonstrate that in the setting of linear discriminant analysis, under the assumptions of multivariate normality, all three measures can be presented as functions of the squared Mahalanobis distance. This relationship affords an interpretation of the magnitude of these measures in the familiar language of effect size for uncorrelated variables. Furthermore, it allows us to conclude that net reclassification improvement can be viewed as a universal measure of effect size. Our theoretical developments are illustrated with an example based on the Framingham Heart Study risk assessment model for high-risk men in primary prevention of cardiovascular disease.
AUC; biomarker; c statistic; model performance; risk prediction; ROC
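For the two-group multivariate normal model with common covariance that the abstract describes, the familiar special case is that the AUC of the linear discriminant score equals Φ(Δ/√2), with Δ the Mahalanobis distance between the group means; the paper's corresponding expressions for the reclassification measures are not reproduced here. The sketch below checks the AUC identity by simulation.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(0)
mu0, mu1 = np.array([0.0, 0.0]), np.array([1.0, 0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

d2 = (mu1 - mu0) @ np.linalg.solve(Sigma, mu1 - mu0)  # squared Mahalanobis
print("closed-form AUC:", norm.cdf(np.sqrt(d2 / 2)))

# Monte Carlo check: rank-based (Mann-Whitney) AUC of the LDA score.
w = np.linalg.solve(Sigma, mu1 - mu0)
s0 = rng.multivariate_normal(mu0, Sigma, 50_000) @ w
s1 = rng.multivariate_normal(mu1, Sigma, 50_000) @ w
r = rankdata(np.concatenate([s0, s1]))
auc = (r[s0.size:].sum() - s1.size * (s1.size + 1) / 2) / (s0.size * s1.size)
print("simulated AUC:  ", auc)
```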
Response fatigue can cause measurement error and misclassification problems in survey research. Questions asked later in a long survey are often prone to more measurement error or misclassification. The response given is a function of both the true response and participant response fatigue. We investigate the identifiability of survey order effects and their impact on estimators of treatment effects. The focus is on fatigue that affects a given answer to a question rather than fatigue that causes non-response and missing data. We consider linear, Gamma, and logistic models of response that incorporate both the true underlying response and the effect of question order. For continuous data, survey order effects have no impact on study power under a Gamma model. However, under a linear model that allows for convergence of responses to a common mean, the impact of fatigue on power will depend on how fatigue affects both the rate of mean convergence and the variance of responses. For binary data and for less than a 50% chance of a positive response, order effects cause study power to increase under a linear probability (risk difference) model, but decrease under a logistic model. The results suggest that measures designed to reduce survey order effects might have unintended consequences. We present a data example that demonstrates the problem of survey order effects.
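A toy simulation of the binary case illustrates the logistic-model finding: with a baseline response rate below 50%, an order (fatigue) effect on the logit scale shrinks the risk difference a fixed-size study must detect. All coefficients below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_survey(n, beta_trt, beta_fatigue, position, base_logit=-1.0):
    """Binary response under a logistic model with a question-order effect:
    logit P(y=1) = base + beta_trt*treatment + beta_fatigue*position."""
    trt = rng.integers(0, 2, n)
    logit = base_logit + beta_trt * trt + beta_fatigue * position
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return trt, y

# The same question asked early (position 0) versus late (position 40):
for pos in (0, 40):
    trt, y = simulate_survey(200_000, beta_trt=0.5, beta_fatigue=-0.03,
                             position=pos)
    print(pos, round(y[trt == 1].mean() - y[trt == 0].mean(), 3))
```

The observed between-arm risk difference roughly halves when the question is asked late, which is the mechanism behind the loss of power.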
Comparative Effectiveness Research has been given a broad and ambitious mandate. Will it be able to deliver the multifaceted and granular comparative information it has been tasked with developing? After a discussion of the general conditions for the feasibility of CER, we focus our attention on one of the most challenging areas: the evaluation of diagnostic tests and biomarkers.
The analysis of data subject to detection limits is becoming increasingly necessary in many environmental and laboratory studies. Covariates subject to detection limits are often left censored because of a measurement device having a minimal lower limit of detection. In this paper, we propose a Monte Carlo version of the expectation–maximization (EM) algorithm to handle a large number of covariates subject to detection limits in generalized linear models. We model the covariate distribution via a sequence of one-dimensional conditional distributions and sample the covariate values using an adaptive rejection Metropolis algorithm. Parameter estimates are obtained by maximization via the Monte Carlo M-step. This procedure is applied to a real dataset from the National Health and Nutrition Examination Survey, in which values of urinary heavy metals are subject to a limit of detection. Through simulation studies, we show that the proposed approach can lead to a significant reduction in variance for parameter estimates in these models, improving the power of such studies.
EM algorithm; Gibbs sampling; logistic regression; maximum likelihood estimation; Monte Carlo EM; NHANES
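A stripped-down version of the Monte Carlo EM idea, for a single normal covariate with a known detection limit: the E-step draws censored values from the implied truncated normal (standing in for the paper's adaptive rejection Metropolis sampler, which handles general full conditionals), and the M-step updates the parameters from the Monte Carlo sufficient statistics.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(2)
lod = 0.5
x = rng.normal(1.0, 1.0, 300)
observed = x[x >= lod]            # values above the detection limit
n_cens = np.sum(x < lod)          # only the count is known in practice

mu, sigma = observed.mean(), observed.std()   # crude starting values
for _ in range(100):                          # MCEM iterations
    # Monte Carlo E-step: draw each censored value from N(mu, sigma^2)
    # truncated above at the detection limit.
    b = (lod - mu) / sigma
    draws = truncnorm.rvs(-np.inf, b, loc=mu, scale=sigma,
                          size=(n_cens, 50), random_state=rng)
    # Monte Carlo M-step: complete-data normal MLEs, averaging the
    # sufficient statistics over the draws for each censored record.
    s1 = observed.sum() + draws.mean(axis=1).sum()
    s2 = (observed**2).sum() + (draws**2).mean(axis=1).sum()
    n = observed.size + n_cens
    mu, sigma = s1 / n, np.sqrt(s2 / n - (s1 / n) ** 2)

print(mu, sigma)   # approaches (1.0, 1.0) despite roughly 30% censoring
```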
The need for resource-intensive laboratory assays to assess exposures in many epidemiologic studies provides ample motivation to consider study designs that incorporate pooled samples. In this paper, we consider the case in which specimens are combined for the purpose of determining the presence or absence of a pool-wise exposure, in lieu of assessing the actual binary exposure status for each member of the pool. We presume a primary logistic regression model for an observed binary outcome, together with a secondary regression model for exposure. We facilitate maximum likelihood analysis by complete enumeration of the possible implications of a positive pool, and we discuss the applicability of this approach under both cross-sectional and case-control sampling. We also provide a maximum likelihood approach for longitudinal or repeated measures studies where the binary outcome and exposure are assessed on multiple occasions and within-subject pooling is conducted for exposure assessment. Simulation studies illustrate the performance of the proposed approaches along with their computational feasibility using widely available software. We apply the methods to investigate gene–disease association in a population-based case-control study of colorectal cancer.
cross-sectional study; case-control study; efficiency; logistic regression; pooling; repeated measures; single nucleotide polymorphism
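The complete-enumeration device is easy to state in code: a positive pool of size k is compatible with the 2^k − 1 exposure configurations having at least one exposed member, and the likelihood sums the appropriate model terms over these. A minimal sketch (the member-specific exposure probabilities are hypothetical inputs):

```python
from itertools import product
import numpy as np

def positive_pool_configs(k):
    """All exposure vectors consistent with a positive pool of size k:
    at least one member exposed (2^k - 1 configurations)."""
    return [c for c in product((0, 1), repeat=k) if any(c)]

def pool_positive_prob(p):
    """P(pool tests positive) by complete enumeration, where p[j] is the
    exposure probability of member j under the secondary exposure model."""
    p = np.asarray(p, dtype=float)
    total = 0.0
    for c in positive_pool_configs(len(p)):
        c = np.asarray(c)
        total += np.prod(np.where(c == 1, p, 1 - p))
    return total   # equals 1 - prod(1 - p); the same enumeration weights
                   # outcome-model terms by each configuration in the likelihood

print(pool_positive_prob([0.1, 0.2, 0.3]))   # 0.496
```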
In clinical trials with time-to-event endpoints, it is not uncommon to see a significant proportion of patients cured (or surviving long term), as in trials for non-Hodgkin lymphoma. The widely used sample size formula derived under the proportional hazards (PH) model may not be appropriate for designing a survival trial with a cure fraction, since the PH model assumption may be violated. To account for a cure fraction, the PH cure model is widely used in practice, where a PH model is specified for the survival times of uncured patients and a logistic regression model for the probability of being cured. In this paper, we develop a sample size formula based on the PH cure model by investigating the asymptotic distributions of the standard weighted log-rank statistics under the null and local alternative hypotheses. The derived sample size formula under the PH cure model is more flexible because it can be used to test differences in short-term survival and/or in the cure fraction. Furthermore, the impacts of accrual methods and of the durations of the accrual and follow-up periods on sample size calculation are investigated as numerical examples. The results show that ignoring the cure fraction in sample size calculation can lead to either underpowered or overpowered studies. The performance of the proposed formula is evaluated by simulation studies, and an example using data from a melanoma trial illustrates its application.
Clinical trial; Proportional hazards cure model; Power; Sample size; Weighted log-rank test
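A simulation companion to the sample-size question, assuming the Python lifelines package for the log-rank test: survival in each arm follows a mixture cure model (cured patients never fail; uncured failure times are exponential), and empirical power is tallied over replicates. This is a generic power check, not the paper's closed-form formula, and all design values are hypothetical.

```python
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(3)

def simulate_arm(n, cure_prob, hazard, censor_time=5.0):
    """Mixture cure model with administrative censoring at censor_time."""
    t = rng.exponential(1 / hazard, n)
    t[rng.random(n) < cure_prob] = np.inf      # cured patients never fail
    event = t <= censor_time
    return np.minimum(t, censor_time), event.astype(int)

def power(n_per_arm, cure0, cure1, haz0, haz1, nsim=200, alpha=0.05):
    hits = 0
    for _ in range(nsim):
        t0, e0 = simulate_arm(n_per_arm, cure0, haz0)
        t1, e1 = simulate_arm(n_per_arm, cure1, haz1)
        res = logrank_test(t0, t1, event_observed_A=e0, event_observed_B=e1)
        hits += res.p_value < alpha
    return hits / nsim

# Equal 30% cure fractions, hazard ratio 0.7 among the uncured:
print(power(150, cure0=0.30, cure1=0.30, haz0=0.5, haz1=0.35))
```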
The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice.
Integral Approximation; Linearization; GLIMMIX; lme4; NLMIXED; R; SAS; ZELIG
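Benchmarks of this kind start from data simulated under the model all the procedures claim to fit. A minimal generator for a random-intercept logistic GLMM (parameter values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_glmm(n_subjects=200, n_visits=5, beta=(-1.0, 0.8), sd_u=1.5):
    """Correlated binary responses from a random-intercept logistic model:
    logit P(y_ij = 1) = beta0 + beta1 * x_ij + u_i, with u_i ~ N(0, sd_u^2)."""
    ids = np.repeat(np.arange(n_subjects), n_visits)
    u = rng.normal(0.0, sd_u, n_subjects)[ids]
    x = rng.random(ids.size)
    eta = beta[0] + beta[1] * x + u
    y = (rng.random(ids.size) < 1 / (1 + np.exp(-eta))).astype(int)
    return ids, x, y

ids, x, y = simulate_glmm()
print(y.mean())
```

Fitting such data with quadrature-based routines (NLMIXED, lme4's glmer) and with linearization-based ones (the default GLIMMIX methods) and comparing the recovered fixed effects with the generating values is the style of check the report describes.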
Although microarray gene-profiling studies in cancer research have been successful in identifying genetic variants predisposing to the development and progression of cancer, markers identified from the analysis of single datasets often suffer from low reproducibility. Among multiple possible causes, the most important is the small sample size, and hence the lack of power, of single studies. Integrative analysis jointly considers multiple heterogeneous studies, has a significantly larger sample size, and can improve reproducibility. In this article, we focus on cancer prognosis studies, where the response variables are progression-free, overall, or other types of survival. A group minimax concave penalty (GMCP) penalized integrative analysis approach is proposed for analyzing multiple heterogeneous cancer prognosis studies with microarray gene expression measurements. An efficient group coordinate descent algorithm is developed. The GMCP can automatically accommodate the heterogeneity across multiple datasets, and the identified markers have consistent effects across multiple studies. Simulation studies show that the GMCP provides significantly improved selection results compared with existing meta-analysis approaches, intensity approaches, and group Lasso penalized integrative analysis. We apply the GMCP to four microarray studies and identify genes associated with the prognosis of breast cancer.
integrative analysis; cancer prognosis; microarray; penalized selection
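For reference, the minimax concave penalty and its group version are simple to write down; the sketch below evaluates the penalty only, with the group coordinate descent updates omitted.

```python
import numpy as np

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty at |t|: lam*|t| - t^2/(2*gamma) until it
    flattens out at |t| = gamma*lam, where it stays at gamma*lam^2/2."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    0.5 * gamma * lam**2)

def group_mcp(beta_groups, lam, gamma=3.0):
    """GMCP: the MCP applied to the L2 norm of each gene's coefficient
    group (one coefficient per dataset), so a gene is selected in all
    datasets or in none."""
    return sum(mcp(np.linalg.norm(b), lam, gamma) for b in beta_groups)
```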
Issues surrounding the choice of time scale in Cox proportional hazards regression models have received limited attention in the literature. Although the choice between the time-on-study and ‘attained age’ time scales has been examined, the calendar time scale may be of interest when modeling health effects of environmental exposures with noteworthy secular trends, such as ambient particulate matter air pollution, in large epidemiological cohort studies. The authors use simulation studies to examine the performance (bias, mean squared error, coverage probabilities, and power) of models using all three time scales when the primary exposure of interest depends on calendar time. Results show that the performance of models fit on the calendar time scale varies inversely with the strength of the linear association between the time-varying primary exposure and calendar time. Although models fit on attained age and time on study that do not adjust for calendar time were relatively robust, the authors conclude that care should be exercised when using time scales that are highly correlated with exposures of interest.
Timescale; Ambient particulate matter; Cox proportional hazards model; Model misspecification; Time dependent covariate; Time-varying covariate
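Operationally, the 'choice of time scale' is just the choice of the (start, stop) interval handed to a counting-process Cox fit. A small sketch (variable names hypothetical):

```python
def time_scale_intervals(entry_age, entry_year, followup):
    """(start, stop) risk intervals for the same subject under each
    candidate time scale; these are what a counting-process Cox fit
    consumes."""
    return {
        "time_on_study": (0.0, followup),
        "attained_age": (entry_age, entry_age + followup),
        "calendar_time": (entry_year, entry_year + followup),
    }

print(time_scale_intervals(entry_age=65.0, entry_year=1990.0, followup=8.5))
```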
The use of correlation coefficients to measure the association between two continuous variables is common, but standard methods of calculating correlations have not been extended to the clustered data framework. For clustered data, in which observations within a cluster may be correlated, standard inferential procedures for calculating the marginal association between two variables can be biased. This is particularly true for data in which the number of observations in a given cluster is informative for the association being measured. In this paper, we apply the principle of inverse cluster size reweighting to develop estimators of marginal correlation that remain valid in the clustered data framework when cluster size is informative for the correlation being measured. These correlations are derived as analogs of the standard correlation estimators for continuous, independent data, namely Pearson’s ρ and Kendall’s τ. We present the results of a simple simulation study demonstrating the appropriateness of our proposed estimators and the inherent bias of other inferential procedures for clustered data. We illustrate their use through an application to data from patients with incomplete spinal cord injury in the USA.
measures of correlation; marginal analysis; clustered observations; informative cluster size
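A sketch of the inverse-cluster-size idea for the Pearson analog: weight each observation by the reciprocal of its cluster's size, so that every cluster, rather than every observation, contributes equally. (The Kendall analog reweights concordance indicators similarly and is omitted here.)

```python
import numpy as np

def weighted_pearson(x, y, cluster_ids):
    """Pearson correlation with inverse-cluster-size weights: each cluster
    contributes equally, so informative cluster sizes do not bias the
    marginal correlation."""
    ids, counts = np.unique(cluster_ids, return_counts=True)
    w = 1.0 / counts[np.searchsorted(ids, cluster_ids)]
    w /= w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    return cov / np.sqrt(np.sum(w * (x - mx)**2) * np.sum(w * (y - my)**2))
```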
The receiver operating characteristic (ROC) curve is a widely used tool for evaluating the discriminative and diagnostic power of a biomarker. When the biomarker value is missing for some observations, ROC analysis based solely on the complete cases loses efficiency because of the reduced sample size and, more importantly, is subject to potential bias. In this paper, we investigate nonparametric multiple imputation methods for ROC analysis when some biomarker values are missing at random (MAR) and there are auxiliary variables that are fully observed and predictive of the biomarker values and/or of the missingness of the biomarker values. While a direct application of standard nonparametric imputation is robust to model misspecification, its finite-sample performance suffers from the curse of dimensionality as the number of auxiliary variables increases. To address this problem, we propose new nonparametric imputation methods that achieve dimension reduction through the use of one or two working models, namely models for prediction and for propensity scores. The proposed imputation methods provide a platform for a full range of ROC analyses and hence are more flexible than existing methods that focus primarily on estimating the area under the ROC curve (AUC). We conduct simulation studies to evaluate the finite-sample performance of the proposed methods and find that they are robust to various types of model misspecification and outperform the standard nonparametric approach even when the number of auxiliary variables is moderate. We further illustrate the proposed methods using an observational study of maternal depression during pregnancy.
Area Under Curve; Bootstrap Methods; Dimension Reduction; Multiple Imputation; Nearest Neighbor Methods; Nonparametric Imputation; Receiver Operating Characteristics Curve
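One way to picture the working-model device: reduce the auxiliary variables to one or two scalar scores, then impute nonparametrically by matching on those scores. The sketch below performs a single nearest-neighbor hot-deck pass on a generic score; it illustrates the idea, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(5)

def nn_hotdeck_impute(marker, score, k=5):
    """One imputation pass: fill each missing biomarker value with a random
    donor among the k complete cases closest on a scalar working-model
    score (a fitted prediction and/or propensity score), achieving the
    dimension reduction described above."""
    miss = np.isnan(marker)
    donor_scores, donor_vals = score[~miss], marker[~miss]
    out = marker.copy()
    for i in np.where(miss)[0]:
        nn = np.argsort(np.abs(donor_scores - score[i]))[:k]
        out[i] = donor_vals[rng.choice(nn)]
    return out

# Repeat M times, run the ROC analysis on each completed dataset, and
# combine results across the imputations.
```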
We discuss extensions of model-based designs, such as the continual reassessment method, for use in dose-finding studies. Rather than work with a single model to carry out the design and analysis of a dose-finding study, we indicate how the use of several models can greatly increase flexibility. We can appeal to established results on Bayesian model choice, and this device makes the inferential problem essentially straightforward. The greater flexibility enables us to accommodate many different kinds of added complexity; examples include extended models to deal with subject heterogeneity, extended models to take account of different treatment schedules, and extended models to tackle the problem of partial ordering.
clinical trials; Bayesian model choice; continual reassessment method; dose escalation; dose-finding studies; extended models; phase 1; safety; toxicity
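The Bayesian model choice step reduces to computing posterior model probabilities from the candidate models' marginal likelihoods, as in this generic sketch:

```python
import numpy as np

def posterior_model_probs(log_marglik, prior=None):
    """Posterior probability of each candidate dose-toxicity model:
    proportional to its prior weight times the marginal likelihood of
    the accumulated trial data under that model."""
    log_marglik = np.asarray(log_marglik, dtype=float)
    if prior is None:
        prior = np.full(log_marglik.size, 1.0 / log_marglik.size)
    logw = np.log(prior) + log_marglik
    logw -= logw.max()                     # guard against underflow
    p = np.exp(logw)
    return p / p.sum()

print(posterior_model_probs([-12.1, -10.4, -11.0]))
```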
Because randomization of participants is often not feasible in community-based health interventions, non-randomized designs are commonly employed. Non-randomized designs may have experimental units that are spatial in nature, such as zip codes that are characterized by aggregate statistics from sources like the U.S. census and the Centers for Medicare and Medicaid Services. A perennial concern with non-randomized designs is that even after careful balancing of influential covariates, bias may arise from unmeasured factors. In addition to facilitating the analysis of interventional designs based on spatial units, Bayesian hierarchical modeling can quantify unmeasured variability with spatially correlated residual terms. Graphical analysis of these spatial residuals demonstrates whether variability from unmeasured covariates is likely to bias the estimates of interventional effect.
The Connecticut Collaboration for Fall Prevention is the first large-scale longitudinal trial of a community-wide healthcare intervention designed to prevent injurious falls in older adults. Over a two-year evaluation phase, this trial demonstrated a rate of fall-related utilization at hospitals and emergency departments by persons 70 years and older in the intervention area that was 11 per cent less than that of the usual care area, and a 9 per cent lower rate of utilization from serious injuries. We describe the Bayesian hierarchical analysis of this non-randomized intervention with emphasis on its spatial and longitudinal characteristics. We also compare several models, using posterior predictive simulations and maps of spatial residuals.
Bayesian hierarchical model; posterior predictive simulation; spatial residuals; non-randomized trial; longitudinal study; fall prevention
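The abstracts' diagnostic is graphical, but the same question, whether spatial structure remains in the residuals, has a standard numerical companion in Moran's I, sketched here for an adjacency matrix over the spatial units (this statistic is our illustration, not necessarily the authors' choice):

```python
import numpy as np

def morans_i(resid, W):
    """Moran's I for spatial residuals: values near 0 suggest the model
    has absorbed the spatial structure; large values flag spatially
    correlated unmeasured confounding. W: symmetric adjacency matrix."""
    z = resid - resid.mean()
    return (len(z) / W.sum()) * (z @ W @ z) / (z @ z)
```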
Increasing demands for evidence-based medicine and for the translation of biomedical research into individual and public health benefit have been accompanied by the proliferation of special units that offer expertise in biostatistics, epidemiology, and research design (BERD) within academic health centers. Objective metrics that can be used to evaluate, track, and improve the performance of these BERD units are critical to their successful establishment and sustainable future. To develop a set of reliable but versatile metrics that can be adapted easily to different environments and evolving needs, we consulted with members of BERD units from the consortium of academic health centers funded by the Clinical and Translational Science Award Program of the National Institutes of Health. Through a systematic process of consensus building and document drafting, we formulated metrics that covered the three identified domains of BERD practices: the development and maintenance of collaborations with clinical and translational science investigators, the application of BERD-related methods to clinical and translational research, and the discovery of novel BERD-related methodologies. In this article, we describe the set of metrics and advocate their use for evaluating BERD practices. The routine application, comparison of findings across diverse BERD units, and ongoing refinement of the metrics will identify trends, facilitate meaningful changes, and ultimately enhance the contribution of BERD activities to biomedical research.
metrics; biomedical research; collaboration; clinical and translational science; biostatistics; epidemiology; research design
Identifying brain regions with high differential response under multiple experimental conditions is a fundamental goal of functional imaging. In many studies, regions of interest (ROIs) are not determined a priori but are instead discovered from the data, a process that requires care because of the great potential for false discovery. An additional challenge is that magnetoencephalography/electroencephalography sensor signals are very noisy, and brain source images are usually produced by averaging sensor signals across trials. As a consequence, for a given subject, there is only one source data vector for each condition, making it impossible to apply testing methods such as analysis of variance. We solve these problems in several steps: (1) to obtain within-condition uncertainty, we apply the bootstrap across trials, producing many bootstrap source images; (2) to discover ‘hot spots’ in space and time that could become ROIs, we find source locations where likelihood ratio statistics take unusually large values; (3) because isolated brain locations where a test statistic happens to be large are not of interest, we apply a clustering algorithm to identify sources, contiguous in space and time, where the test statistic takes an ‘excursion’ above some threshold; and (4) having identified possible spatiotemporal ROIs, we evaluate their global statistical significance using a permutation test. After these steps, we check performance via simulation and then illustrate the application of the methods in a magnetoencephalography study of four-direction center-out wrist movement, showing that this approach identifies statistically significant spatiotemporal ROIs in the motor and visual cortices of individual subjects.
ROI; global statistical significance; spatiotemporal clustering; bootstrap; MEG/EEG; source localization
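Step (3) is conceptually just connected-component labeling of a thresholded statistic array. A grid-based sketch using scipy (a real cortical source space would need a mesh-defined neighborhood rather than grid adjacency):

```python
import numpy as np
from scipy import ndimage

def excursion_clusters(stat, threshold):
    """Label contiguous space-time regions where the statistic exceeds the
    threshold. stat: (n_sources, n_times) array; contiguity here means
    adjacency in this grid."""
    mask = stat > threshold
    labels, n = ndimage.label(mask)
    masses = ndimage.sum(np.where(mask, stat, 0.0), labels,
                         index=np.arange(1, n + 1))
    return labels, masses   # cluster masses feed the step-(4) permutation test
```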
The Lorenz curve is a graphical tool that is widely used to characterize the concentration of a measure in a population, such as wealth. It is frequently the case that the measure of interest used to rank experimental units when estimating the empirical Lorenz curve, and the corresponding Gini coefficient, is subject to random error. This error can result in an incorrect ranking of experimental units which inevitably leads to a curve that exaggerates the degree of concentration (variation) in the population. We consider a specific data configuration with a hierarchical structure where multiple observations are aggregated within experimental units to form the outcome whose distribution is of interest. Within this context, we explore this bias and discuss several widely available statistical methods that have the potential to reduce or remove the bias in the empirical Lorenz curve. The properties of these methods are examined and compared in a simulation study. This work is motivated by a health outcomes application that seeks to assess the concentration of black patient visits among primary care physicians. The methods are illustrated on data from this study.
concentration; distribution; inequality; hierarchical data
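For orientation, the empirical Lorenz curve and Gini coefficient that the paper takes as its starting point can be computed in a few lines; ranking units on an error-prone outcome biases exactly this estimate toward excess concentration.

```python
import numpy as np

def lorenz_gini(x):
    """Empirical Lorenz curve (p, L) and Gini coefficient."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    p = np.arange(n + 1) / n                             # cumulative unit share
    L = np.concatenate([[0.0], np.cumsum(x)]) / x.sum()  # cumulative outcome share
    gini = 2 * np.sum(np.arange(1, n + 1) * x) / (n * x.sum()) - (n + 1) / n
    return p, L, gini

# Lognormal(0, 1) sample: the true Gini is 2*Phi(1/sqrt(2)) - 1, about 0.52.
_, _, g = lorenz_gini(np.random.default_rng(8).lognormal(0.0, 1.0, 5000))
print(round(g, 3))
```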
For pathogens that must be treated with combinations of antibiotics and that acquire resistance through genetic mutation, knowledge of the order in which drug-resistance mutations occur may be important for determining treatment policies. Diagnostic specimens collected from patients are often available; this makes it possible to determine the presence of individual resistance-conferring mutations and of combinations of these mutations. In most cases, these specimens are available from a patient at only a single point in time; it is very rare to have access to multiple specimens from a single patient collected over time as resistance to multiple drugs accumulates. Statistical methods that use branching trees have been successfully applied to such cross-sectional data to make inferences about the ordering of events that occurred prior to sampling. Here we propose a Bayesian approach to fitting branching tree models that has several advantages, including the ability to accommodate prior information regarding measurement error or cross-resistance and the natural way it permits the characterization of uncertainty. Our methods are applied to a dataset on drug-resistant tuberculosis in Peru; the goal of the analysis is to determine the order in which patients develop resistance to the drugs commonly used for treating TB in this setting.
Bayesian Networks; Branching; TB drug resistance; Tree Inference
Outcome-dependent sampling (ODS) study designs are commonly implemented with rare diseases or when prospective studies are infeasible. In longitudinal data settings, when a repeatedly measured binary response is rare, an ODS design can be highly efficient for maximizing statistical information subject to resource limitations that prohibit covariate ascertainment of all observations. This manuscript details an ODS design where individual observations are sampled with probabilities determined by an inexpensive, time-varying auxiliary variable that is related but is not equal to the response. With the goal of validly estimating marginal model parameters based on the resulting biased sample, we propose a semi-parametric, sequential offsetted logistic regressions (SOLR) approach. The SOLR strategy first estimates the relationship between the auxiliary variable and the response and covariate data by using an offsetted logistic regression analysis where the offset is used to adjust for the biased design. Results from the auxiliary variable model are then combined with the known or estimated sampling probabilities to formulate a second offset that is used to correct for the biased design in the ultimate target model relating the longitudinal binary response to covariates. Because the target model offset is estimated with SOLR, we detail asymptotic standard error estimates that account for uncertainty associated with the auxiliary variable model. Motivated by an analysis of the BioCycle Study (Gaskins et al., Effect of daily fiber intake on reproductive function: the BioCycle Study. American Journal of Clinical Nutrition 2009; 90(4): 1061–1069) that aims to describe the relationship between reproductive health (determined by luteinizing hormone levels) and fiber consumption, we examine properties of SOLR estimators and compare them with other common approaches.
outcome-dependent sampling; biased sampling; study design; generalized estimating equations; longitudinal data analysis; binary data
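The core building block of SOLR is an ordinary logistic regression fit with a fixed offset that absorbs the sampling bias. A minimal sketch of one such fit, assuming statsmodels (SOLR chains two of these, with the second offset built from the first fit and the sampling probabilities):

```python
import numpy as np
import statsmodels.api as sm

def offsetted_logit(y, X, prob_sampled_if_1, prob_sampled_if_0):
    """Logistic regression with per-observation offset log(p1/p0): because
    sampling depends on the (auxiliary) outcome, the offset corrects the
    model so the fitted coefficients target the population relationship.
    prob_sampled_if_1/0: arrays of sampling probabilities given response
    1 or 0."""
    offset = np.log(prob_sampled_if_1 / prob_sampled_if_0)
    model = sm.GLM(y, sm.add_constant(X),
                   family=sm.families.Binomial(), offset=offset)
    return model.fit()
```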
It is common practice to conduct medical trials comparing a new therapy with a standard of care based on paired data consisting of pre- and post-treatment measurements. In such cases, great interest often lies in identifying treatment effects within each therapy group and in detecting a between-group difference. In this article, we propose exact nonparametric tests for composite hypotheses related to treatment effects to provide efficient tools for comparing study groups using paired data. When correctly specified, parametric likelihood ratios can be applied, in an optimal manner, to detect a difference between the distributions of two samples based on paired data. The recent statistical literature introduces density-based empirical likelihood methods to derive efficient nonparametric tests that approximate most powerful Neyman–Pearson decision rules. We adapt and extend these methods to deal with the various testing scenarios involved in two-sample comparisons based on paired data. We show that the proposed procedures outperform classical approaches. An extensive Monte Carlo study confirms that the proposed approach is powerful and can easily be applied to a variety of testing problems in practice. The proposed technique is applied to compare two therapy strategies for treating children’s attention deficit/hyperactivity disorder and severe mood dysregulation.
empirical likelihood; exact tests; likelihood ratio; nonparametric test; paired data; paired t-test; two-sample problem; Wilcoxon signed rank test
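For context, one of the classical paired-data competitors is the exact sign-flipping (randomization) test, sketched here in its Monte Carlo form; the density-based empirical likelihood procedures proposed in the paper are considerably more involved and are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(6)

def signflip_pvalue(pre, post, n_perm=10_000):
    """Monte Carlo sign-flipping test for a within-group treatment effect
    on paired data, one of the classical benchmarks against which the
    density-based empirical likelihood tests are compared."""
    d = post - pre
    obs = np.abs(d.mean())
    flips = rng.choice([-1, 1], size=(n_perm, d.size))
    null = np.abs((flips * d).mean(axis=1))
    return (np.sum(null >= obs) + 1) / (n_perm + 1)
```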
The current practice of analyzing data from anti-cancer drug screening by xenograft experiments lacks statistical methods that account for experimental noise, and a sound inference procedure is needed. A novel confidence bound and interval procedure for estimating quantile ratios, developed in this paper, fills that void. Justified by rigorous large-sample theory and a simulation study of small-sample performance, the proposed method performs well in a wide range of scenarios involving right-skewed distributions. By providing rigorous inference and far more interpretable statistics that account for experimental noise, the proposed method improves the current practice of analyzing drug activity data in xenograft experiments. The proposed method is fully nonparametric, simple to compute, performs as well as or better than known nonparametric methods, and is applicable to any statistical inference on a “fold change” that can be formulated as a quantile ratio.
Quantile; Median; Quartile; Ratio; Fold change; Confidence interval; Confidence bound; Xenograft experiment
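The target quantity is a fold change expressed as a ratio of quantiles, for example a ratio of median tumor volumes between arms. The generic bootstrap sketch below only illustrates that estimand; the paper's own intervals come from large-sample theory, not resampling.

```python
import numpy as np

rng = np.random.default_rng(7)

def median_ratio_ci(treated, control, n_boot=5_000, level=0.95):
    """Bootstrap percentile CI for the fold change expressed as a ratio
    of medians; a generic stand-in for the paper's asymptotic bounds."""
    ratios = np.empty(n_boot)
    for b in range(n_boot):
        t = rng.choice(treated, treated.size)    # resample each arm
        c = rng.choice(control, control.size)
        ratios[b] = np.median(t) / np.median(c)
    alpha = (1 - level) / 2
    return np.quantile(ratios, [alpha, 1 - alpha])
```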
Screening mammography is a widely used method for breast cancer detection. For each mammogram, we propose a performance model based on the ordering of outcomes: an initial assessment, a follow-up assessment if the initial one is positive, and, eventually, a determination of whether cancer was present. A model can be built at each stage reflecting effects due to patient characteristics, to the facility where the mammogram was performed, and to the radiologist reading the mammogram. Since assessment is not perfectly associated with outcome, familiar rates of agreement and disagreement are of interest; these rates can be investigated at various levels of the risk factors of interest. The approach is illustrated with screening mammography data from the Group Health Cooperative in Seattle, WA. A Bayesian framework is adopted for inference, and an analysis of the dataset is presented.
breast cancer; false positive/negative; logistic regression; model selection; sensitivity; specificity
We present a global test for disease clustering with power to identify disturbances from the null population distribution that accounts for the lag time between the date of exposure and the date of diagnosis. Location at diagnosis is often used as a surrogate for the location of exposure; however, the causative exposure could have occurred at a previous address in a case’s residential history. We incorporate models for the incubation distribution of a disease to weight each address in the residential history by the corresponding probability of the exposure having occurred at that address. We then introduce a test statistic that uses these incubation-weighted addresses to test for a difference between the spatial distribution of the cases and the spatial distribution of the controls, or the background population. We follow the construction of the M statistic to evaluate the significance of these new distance distributions. Our results show that the gains in detection power when residential history is accounted for can make the qualitative difference between spatial clustering being detected or not, making a strong argument for the inclusion of residential history in the analysis of such data.
Residential history; Interpoint distances; U-statistics; incubation period distributions; public health surveillance
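The weighting step can be made concrete: a residence occupied between a and b years before diagnosis receives the incubation-period probability mass on (a, b). A sketch with a hypothetical gamma incubation distribution:

```python
import numpy as np
from scipy.stats import gamma

def address_weights(lag_intervals, incubation_cdf):
    """Weight each address by the probability that the causative exposure
    occurred there: the incubation-period mass over the lag interval
    (years before diagnosis) spanned by that residence."""
    w = np.array([incubation_cdf(b) - incubation_cdf(a)
                  for a, b in lag_intervals])
    return w / w.sum()

# Three addresses, most recent first; a hypothetical gamma incubation
# distribution (mean 12 years) shifts weight toward the earlier homes.
cdf = gamma(a=4.0, scale=3.0).cdf
print(address_weights([(0, 4), (4, 12), (12, 30)], cdf))
```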