Related Articles

1.  Varying-coefficient models for longitudinal processes with continuous-time informative dropout 
Biostatistics (Oxford, England)  2009;11(1):93-110.
Dropout is a common occurrence in longitudinal studies. Building upon the pattern-mixture modeling approach within the Bayesian paradigm, we propose a general framework of varying-coefficient models for longitudinal data with informative dropout, where measurement times can be irregular and dropout can occur at any point in continuous time (not just at observation times) together with administrative censoring. Specifically, we assume that the longitudinal outcome process depends on the dropout process through its model parameters. The unconditional distribution of the repeated measures is a mixture over the dropout (administrative censoring) time distribution, and the continuous dropout time distribution with administrative censoring is left completely unspecified. We use Markov chain Monte Carlo to sample from the posterior distribution of the repeated measures given the dropout (administrative censoring) times; Bayesian bootstrapping on the observed dropout (administrative censoring) times is carried out to obtain marginal covariate effects. We illustrate the proposed framework using data from a longitudinal study of depression in HIV-infected women; the strategy for sensitivity analysis on unverifiable assumptions is also demonstrated.
PMCID: PMC2800163  PMID: 19837655
HIV/AIDS; Missing data; Nonparametric regression; Penalized splines
2.  A varying-coefficient method for analyzing longitudinal clinical trials data with nonignorable dropout 
Contemporary clinical trials  2011;33(2):378-385.
Dropout is common in longitudinal clinical trials and when the probability of dropout depends on unobserved outcomes even after conditioning on available data, it is considered missing not at random and therefore nonignorable. To address this problem, mixture models can be used to account for the relationship between a longitudinal outcome and dropout. We propose a Natural Spline Varying-coefficient mixture model (NSV), which is a straightforward extension of the parametric Conditional Linear Model (CLM). We assume that the outcome follows a varying-coefficient model conditional on a continuous dropout distribution. Natural cubic B-splines are used to allow the regression coefficients to semiparametrically depend on dropout and inference is therefore more robust. Additionally, this method is computationally stable and relatively simple to implement. We conduct simulation studies to evaluate performance and compare methodologies in settings where the longitudinal trajectories are linear and dropout time is observed for all individuals. Performance is assessed under conditions where model assumptions are both met and violated. In addition, we compare the NSV to the CLM and a standard random-effects model using an HIV/AIDS clinical trial with probable nonignorable dropout. The simulation studies suggest that the NSV is an improvement over the CLM when dropout has a nonlinear dependence on the outcome.
PMCID: PMC3414213  PMID: 22101223
Dropout; Nonignorable Missing Data; Longitudinal data; Varying-coefficient model; B-spline; HIV/AIDS
3.  Bayesian Latent-class Mixed-effect Hybrid Models for Dyadic Longitudinal Data with Non-ignorable Dropouts 
Biometrics  2013;69(4):914-924.
The analysis of longitudinal dyadic data is challenging due to the complicated correlations within and between dyads, as well as possibly non-ignorable dropouts. Based on a mixed-effects hybrid model, we propose an approach to analyze longitudinal dyadic data with non-ignorable dropouts. We factorize the joint distribution of the measurement and dropout processes into three components: the marginal distribution of random effects, the conditional distribution of the dropout process given the random effects, and the conditional distribution of the measurement process given the random effects and missing data patterns. We model the conditional dropout process using a discrete survival model, and the conditional measurement process using a latent-class pattern-mixture model. These models account for the dyadic interdependence using the “actor” and “partner” effects and dyad-specific random effects. We use the latent-dropout-class approach to address the problem of a large number of missing data patterns caused by the dyadic data structure. We evaluate the performance of the proposed method using a simulation study, and apply our method to a longitudinal dyadic data set that arose from a prostate cancer trial.
PMCID: PMC3970927  PMID: 24328715
dyadic; non-ignorable missingness; mixed-effect; longitudinal; latent class
4.  A marginalized conditional linear model for longitudinal binary data when informative dropout occurs in continuous time 
Biostatistics (Oxford, England)  2011;13(2):355-368.
Within the pattern-mixture modeling framework for informative dropout, conditional linear models (CLMs) are a useful approach to deal with dropout that can occur at any point in continuous time (not just at observation times). However, in contrast with selection models, inferences about marginal covariate effects in CLMs are not readily available if nonidentity links are used in the mean structures. In this article, we propose a CLM for long series of longitudinal binary data with marginal covariate effects directly specified. The association between the binary responses and the dropout time is taken into account by modeling the conditional mean of the binary response as well as the dependence between the binary responses given the dropout time. Specifically, parameters in both the conditional mean and dependence models are assumed to be linear or quadratic functions of the dropout time; and the continuous dropout time distribution is left completely unspecified. Inference is fully Bayesian. We illustrate the proposed model using data from a longitudinal study of depression in HIV-infected women, where the strategy of sensitivity analysis based on the extrapolation method is also demonstrated.
PMCID: PMC3297830  PMID: 22133756
Bayesian analysis; HIV/AIDS; Marginal model; Missing data; Sensitivity analysis
5.  Growth Modeling with Non-Ignorable Dropout: Alternative Analyses of the STAR*D Antidepressant Trial 
Psychological methods  2011;16(1):17-33.
This paper uses a general latent variable framework to study a series of models for non-ignorable missingness due to dropout. Non-ignorable missing data modeling acknowledges that missingness may depend on not only covariates and observed outcomes at previous time points as with the standard missing at random (MAR) assumption, but also on latent variables such as values that would have been observed (missing outcomes), developmental trends (growth factors), and qualitatively different types of development (latent trajectory classes). These alternative predictors of missing data can be explored in a general latent variable framework using the Mplus program. A flexible new model uses an extended pattern-mixture approach where missingness is a function of latent dropout classes in combination with growth mixture modeling using latent trajectory classes. A new selection model allows not only an influence of the outcomes on missingness, but allows this influence to vary across latent trajectory classes. Recommendations are given for choosing models. The missing data models are applied to longitudinal data from STAR*D, the largest antidepressant clinical trial in the U.S. to date. Despite the importance of this trial, STAR*D growth model analyses using non-ignorable missing data techniques have not been explored until now. The STAR*D data are shown to feature distinct trajectory classes, including a low class corresponding to substantial improvement in depression, a minority class with a U-shaped curve corresponding to transient improvement, and a high class corresponding to no improvement. The analyses provide a new way to assess drug efficiency in the presence of dropout.
PMCID: PMC3060937  PMID: 21381817
Latent trajectory classes; random effects; survival analysis; not missing at random
6.  Bayesian Inference for Growth Mixture Models with Latent Class Dependent Missing Data 
Multivariate behavioral research  2011;46(4):567-597.
Growth mixture models (GMMs) with nonignorable missing data have drawn increasing attention in research communities but have not been fully studied. The goal of this article is to propose and to evaluate a Bayesian method to estimate the GMMs with latent class dependent missing data. An extended GMM is first presented in which class probabilities depend on some observed explanatory variables and data missingness depends on both the explanatory variables and a latent class variable. A full Bayesian method is then proposed to estimate the model. Through the data augmentation method, conditional posterior distributions for all model parameters and missing data are obtained. A Gibbs sampling procedure is then used to generate Markov chains of model parameters for statistical inference. The application of the model and the method is first demonstrated through the analysis of mathematical ability growth data from the National Longitudinal Survey of Youth 1997 (Bureau of Labor Statistics, U.S. Department of Labor, 1997). A simulation study considering 3 main factors (the sample size, the class probability, and the missing data mechanism) is then conducted and the results show that the proposed Bayesian estimation approach performs very well under the studied conditions. Finally, some implications of this study, including the misspecified missingness mechanism, the sample size, the sensitivity of the model, the number of latent classes, the model comparison, and the future directions of the approach, are discussed.
PMCID: PMC4002129  PMID: 24790248
7.  A comparison of parametric models for the investigation of the shape of cognitive change in the older population 
BMC Neurology  2008;8:16.
Cognitive decline is a major threat to well being in later life. Change scores and regression based models have often been used for its investigation. Most methods used to describe cognitive decline assume individuals lose their cognitive abilities at a constant rate with time. The investigation of the parametric curve that best describes the process has been prevented by restrictions imposed by study design limitations and methodological considerations. We propose a comparison of parametric shapes that could be considered to describe the process of cognitive decline in late life.
Attrition plays a key role in the generation of missing observations in longitudinal studies of older persons. As ignoring missing observations will produce biased results and previous studies point to the important effect of the last observed cognitive score on the probability of dropout, we propose modelling both mechanisms jointly to account for these two considerations in the model likelihood.
Data from four interview waves of a population based longitudinal study of the older population, the Cambridge City over 75 Cohort Study were used. Within a selection model process, latent growth models combined with a logistic regression model for the missing data mechanism were fitted. To illustrate advantages of the model proposed, a sensitivity analysis of the missing data assumptions was conducted.
Results showed that a quadratic curve describes cognitive decline best. Significant heterogeneity between individuals about mean curve parameters was identified. At all interviews, MMSE scores of individuals who subsequently dropped out were significantly lower than those of individuals who remained in the study. Individuals with good functional ability were found to be less likely to drop out, as were women and younger persons in later stages of the study.
The combination of a latent growth model with a model for the missing data has made it possible to use all available data and to quantify the effect of significant predictors of dropout on the dropout and observational processes. Cognitive decline over time in older persons is often modelled as a linear process, though we have presented other parametric curves that may be considered.
PMCID: PMC2412911  PMID: 18485192
8.  Mixtures of Varying Coefficient Models for Longitudinal Data with Discrete or Continuous Nonignorable Dropout 
Biometrics  2004;60(4):854-864.
The analysis of longitudinal repeated measures data is frequently complicated by missing data due to informative dropout. We describe a mixture model for joint distribution for longitudinal repeated measures, where the dropout distribution may be continuous and the dependence between response and dropout is semiparametric. Specifically, we assume that responses follow a varying coefficient random effects model conditional on dropout time, where the regression coefficients depend on dropout time through unspecified nonparametric functions that are estimated using step functions when dropout time is discrete (e.g., for panel data) and using smoothing splines when dropout time is continuous. Inference under the proposed semiparametric model is hence more robust than the parametric conditional linear model. The unconditional distribution of the repeated measures is a mixture over the dropout distribution. We show that estimation in the semiparametric varying coefficient mixture model can proceed by fitting a parametric mixed effects model and can be carried out on standard software platforms such as SAS. The model is used to analyze data from a recent AIDS clinical trial and its performance is evaluated using simulations.
PMCID: PMC2677904  PMID: 15606405
Clinical trials; Equivalence trial; Linear mixed model; Missing data; Nonignorable dropout; Pattern-mixture model; Pediatric AIDS; Selection bias; Smoothing splines
The annals of applied statistics  2012;6(2):753-771.
Dyadic data are common in the social and behavioral sciences, in which members of dyads are correlated due to the interdependence structure within dyads. The analysis of longitudinal dyadic data becomes complex when nonignorable dropouts occur. We propose a fully Bayesian selection-model-based approach to analyze longitudinal dyadic data with nonignorable dropouts. We model repeated measures on subjects by a transition model and account for within-dyad correlations by random effects. In the model, we allow a subject’s outcome to depend on his/her own characteristics and measurement history, as well as those of the other member in the dyad. We further account for the nonignorable missing data mechanism using a selection model in which the probability of dropout depends on the missing outcome. We propose a Gibbs sampler algorithm to fit the model. Simulation studies show that the proposed method effectively addresses the problem of nonignorable dropouts. We illustrate our methodology using a longitudinal breast cancer study.
PMCID: PMC3693094  PMID: 23814631
Dyadic Data; Missing Data; Nonignorable Dropout; Selection Model
10.  Identification of Multivariate Responders/Non-Responders Using Bayesian Growth Curve Latent Class Models 
In this paper, we propose a multivariate growth curve mixture model that groups subjects based on multiple symptoms measured repeatedly over time. Our model synthesizes features of two models. First, we follow Roy and Lin (2000) in relating the multiple symptoms at each time point to a single latent variable. Second, we use the growth mixture model of Muthén and Shedden (1999) to group subjects based on distinctive longitudinal profiles of this latent variable. The mean growth curve for the latent variable in each class defines that class’s features. For example, a class of “responders” would have a decline in the latent symptom summary variable over time. A Bayesian approach to estimation is employed where the methods of Elliott et al (2005) are extended to simultaneously estimate the posterior distributions of the parameters from the latent variable and growth curve mixture portions of the model. We apply our model to data from a randomized clinical trial evaluating the efficacy of Bacillus Calmette-Guerin (BCG) in treating symptoms of Interstitial Cystitis. In contrast to conventional approaches using a single subjective Global Response Assessment, we use the multivariate symptom data to identify a class of subjects where treatment demonstrates effectiveness. Simulations are used to confirm identifiability results and evaluate the performance of our algorithm. The definitive version of this paper is available at
PMCID: PMC3104279  PMID: 21637724
11.  Joint modeling of multivariate longitudinal data and the dropout process in a competing risk setting: application to ICU data 
Joint modeling of longitudinal and survival data has been increasingly considered in clinical trials, notably in cancer and AIDS. In critically ill patients admitted to an intensive care unit (ICU), such models also appear to be of interest in the investigation of the effect of treatment on severity scores due to the likely association between the longitudinal score and the dropout process, either caused by deaths or live discharges from the ICU. However, in this competing risk setting, only cause-specific hazard sub-models for the multiple failure types data have been used.
We propose a joint model that consists of a linear mixed effects submodel for the longitudinal outcome, and a proportional subdistribution hazards submodel for the competing risks survival data, linked together by latent random effects. We use the Markov chain Monte Carlo technique of Gibbs sampling to estimate the joint posterior distribution of the unknown parameters of the model. The proposed method is studied and compared to a joint model with a cause-specific hazards submodel in simulations and applied to a data set that consisted of repeated measurements of severity score and time of discharge and death for 1,401 ICU patients.
A time by treatment interaction was observed on the evolution of the mean SOFA score when ignoring potentially informative dropouts due to ICU deaths and live discharges from the ICU. In contrast, this was no longer significant when modeling the cause-specific hazards of informative dropouts. Such a time by treatment interaction persisted, together with evidence of a treatment effect on the hazard of death, when modeling dropout processes through the use of the Fine and Gray model for sub-distribution hazards.
In the joint modeling of competing risks with longitudinal response, differences in the handling of competing risk outcomes appear to translate into the estimated difference in treatment effect on the longitudinal outcome. Such a modeling strategy should be carefully defined prior to analysis.
PMCID: PMC2923158  PMID: 20670425
12.  Developmental Profiles of Eczema, Wheeze, and Rhinitis: Two Population-Based Birth Cohort Studies 
PLoS Medicine  2014;11(10):e1001748.
Using data from two population-based birth cohorts, Danielle Belgrave and colleagues examine the evidence for atopic march in developmental profiles for allergic disorders.
Please see later in the article for the Editors' Summary
The term “atopic march” has been used to imply a natural progression of a cascade of symptoms from eczema to asthma and rhinitis through childhood. We hypothesize that this expression does not adequately describe the natural history of eczema, wheeze, and rhinitis during childhood. We propose that this paradigm arose from cross-sectional analyses of longitudinal studies, and may reflect a population pattern that may not predominate at the individual level.
Methods and Findings
Data from 9,801 children in two population-based birth cohorts were used to determine individual profiles of eczema, wheeze, and rhinitis and whether the manifestations of these symptoms followed an atopic march pattern. Children were assessed at ages 1, 3, 5, 8, and 11 y. We used Bayesian machine learning methods to identify distinct latent classes based on individual profiles of eczema, wheeze, and rhinitis. This approach allowed us to identify groups of children with similar patterns of eczema, wheeze, and rhinitis over time.
Using a latent disease profile model, the data were best described by eight latent classes: no disease (51.3%), atopic march (3.1%), persistent eczema and wheeze (2.7%), persistent eczema with later-onset rhinitis (4.7%), persistent wheeze with later-onset rhinitis (5.7%), transient wheeze (7.7%), eczema only (15.3%), and rhinitis only (9.6%). When latent variable modelling was carried out separately for the two cohorts, similar results were obtained. Highly concordant patterns of sensitisation were associated with different profiles of eczema, rhinitis, and wheeze. The main limitation of this study was the difference in wording of the questions used to ascertain the presence of eczema, wheeze, and rhinitis in the two cohorts.
The developmental profiles of eczema, wheeze, and rhinitis are heterogeneous; only a small proportion of children (∼7% of those with symptoms) follow trajectory profiles resembling the atopic march.
Editors' Summary
Our immune system protects us from viruses, bacteria, and other pathogens by recognizing specific molecules on the invader's surface and initiating a sequence of events that culminates in the death of the pathogen. Sometimes, however, our immune system responds to harmless materials (allergens such as pollen) and triggers allergic, or atopic, symptoms. Common atopic symptoms include eczema (transient dry itchy patches on the skin), wheeze (high pitched whistling in the chest, a symptom of asthma), and rhinitis (sneezing or a runny nose in the absence of a cold or influenza). All these symptoms are very common during childhood, but recent epidemiological studies (examinations of the patterns and causes of diseases in a population) have revealed age-related changes in the proportions of children affected by each symptom. So, for example, eczema is more common in infants than in school-age children. These findings have led to the idea of “atopic march,” a natural progression of symptoms within individual children that starts with eczema, then progresses to wheeze and finally rhinitis.
Why Was This Study Done?
The concept of atopic march has led to the initiation of studies that aim to prevent the development of asthma in children who are thought to be at risk of asthma because they have eczema. Moreover, some guidelines recommend that clinicians tell parents that children with eczema may later develop asthma or rhinitis. However, because of the design of the epidemiological studies that support the concept of atopic march, children with eczema who later develop wheeze and rhinitis may actually belong to a distinct subgroup of children, rather than representing the typical progression of atopic diseases. It is important to know whether atopic march adequately describes the natural history of atopic diseases during childhood to avoid the imposition of unnecessary strategies on children with eczema to prevent asthma. Here, the researchers use machine learning techniques to model the developmental profiles of eczema, wheeze, and rhinitis during childhood in two large population-based birth cohorts by taking into account time-related (longitudinal) changes in symptoms within individuals. Machine learning is a data-driven approach that identifies structure within the data (for example, a typical progression of symptoms) using unsupervised learning of latent variables (variables that are not directly measured but are inferred from other observable characteristics).
What Did the Researchers Do and Find?
The researchers used data from two UK birth cohorts—the Avon Longitudinal Study of Parents and Children (ALSPAC) and the Manchester Asthma and Allergy Study (MAAS)—for their study (9,801 children in total). Both studies enrolled children at birth and monitored their subsequent health at regular review clinics. At each review clinic, information about eczema, wheeze, and rhinitis was collected from the parents using validated questionnaires. The researchers then used these data and machine learning methods to identify groups of children with similar patterns of onset of eczema, wheeze, and rhinitis over the first 11 years of life. Using a type of statistical model called a latent disease profile model, the researchers found that the data were best described by eight latent classes—no disease (51.3% of the children), atopic march (3.1%), persistent eczema and wheeze (2.7%), persistent eczema with later-onset rhinitis (4.7%), persistent wheeze with later-onset rhinitis (5.7%), transient wheeze (7.7%), eczema only (15.3%), and rhinitis only (9.6%).
What Do These Findings Mean?
These findings show that, in two large UK birth cohorts, the developmental profiles of eczema, wheeze, and rhinitis were heterogeneous. Most notably, the progression of symptoms fitted the profile of atopic march in fewer than 7% of children with symptoms. The researchers acknowledge that their study has some limitations. For example, small differences in the wording of the questions used to gather information from parents about their children's symptoms in the two cohorts may have slightly affected the findings. However, based on their findings, the researchers propose that, because eczema, wheeze, and rhinitis are common, these symptoms often coexist in individuals, but as independent entities rather than as a linked progression of symptoms. Thus, using eczema as an indicator of subsequent asthma risk and assigning “preventative” measures to children with eczema is flawed. Importantly, clinicians need to understand the heterogeneity of patterns of atopic diseases in children and to communicate this variability to parents when advising them about the development and resolution of atopic symptoms in their children.
Additional Information
Please access these websites via the online version of this summary at
The UK National Health Service Choices website provides information about eczema (including personal stories), asthma (including personal stories), and rhinitis
The US National Institute of Allergy and Infectious Diseases provides information about atopic diseases
The UK not-for-profit organization Allergy UK provides information about atopic diseases and a description of the atopic march
MedlinePlus encyclopedia has pages on eczema, wheezing, and rhinitis (in English and Spanish)
MedlinePlus provides links to further resources about allergies, eczema, and asthma (in English and Spanish)
Information about ALSPAC and MAAS is available
Wikipedia has pages on machine learning and latent disease profile models (note that Wikipedia is a free online encyclopedia that anyone can edit; available in several languages)
PMCID: PMC4204810  PMID: 25335105
13.  Bayesian Inference in Semiparametric Mixed Models for Longitudinal Data 
Biometrics  2009;66(1):70-78.
We consider Bayesian inference in semiparametric mixed models (SPMMs) for longitudinal data. SPMMs are a class of models that use a nonparametric function to model a time effect, a parametric function to model other covariate effects, and parametric or nonparametric random effects to account for the within-subject correlation. We model the nonparametric function using a Bayesian formulation of a cubic smoothing spline, and the random effect distribution using a normal distribution and alternatively a nonparametric Dirichlet process (DP) prior. When the random effect distribution is assumed to be normal, we propose a uniform shrinkage prior (USP) for the variance components and the smoothing parameter. When the random effect distribution is modeled nonparametrically, we use a DP prior with a normal base measure and propose a USP for the hyperparameters of the DP base measure. We argue that the commonly assumed DP prior implies a nonzero mean of the random effect distribution, even when a base measure with mean zero is specified. This implies weak identifiability for the fixed effects, and can therefore lead to biased estimators and poor inference for the regression coefficients and the spline estimator of the nonparametric function. We propose an adjustment using a postprocessing technique. We show that under mild conditions the posterior is proper under the proposed USP, a flat prior for the fixed effect parameters, and an improper prior for the residual variance. We illustrate the proposed approach using a longitudinal hormone dataset, and carry out extensive simulation studies to compare its finite sample performance with existing methods.
PMCID: PMC3081790  PMID: 19432777
Dirichlet process prior; Identifiability; Postprocessing; Random effects; Smoothing spline; Uniform shrinkage prior; Variance components
14.  A Marginal Mixture Model for Selecting Differentially Expressed Genes across Two Types of Tissue Samples 
Bayesian hierarchical models that characterize the distributions of (transformed) gene profiles have been proven very useful and flexible in selecting differentially expressed genes across different types of tissue samples (e.g. Lo and Gottardo, 2007). However, the marginal mean and variance of these models are assumed to be the same for different gene clusters and for different tissue types. Moreover, it is not easy to determine which of the many competing Bayesian hierarchical models provides the best fit for a specific microarray data set. To address these two issues, we propose a marginal mixture model that directly models the marginal distribution of transformed gene profiles. Specifically, we approximate the marginal distributions of transformed gene profiles via a mixture of three-component multivariate Normal distributions, each component of which has the same structures of marginal mean vector and covariance matrix as those for Bayesian hierarchical models, but the values can differ. Based on the proposed model, a method is derived to select genes differentially expressed across two types of tissue samples. The derived gene selection method performs well on a real microarray data set and consistently has the best performance (based on class agreement indices) compared with several other gene selection methods on simulated microarray data sets generated from three different mixture models.
PMCID: PMC2835454  PMID: 20231912
15.  An exploration of fixed and random effects selection for longitudinal binary outcomes in the presence of nonignorable dropout 
Biometrical journal. Biometrische Zeitschrift  2012;55(1):10.1002/bimj.201100107.
We explore a Bayesian approach to selection of variables that represent fixed and random effects in modeling of longitudinal binary outcomes with missing data caused by dropouts. We show via analytic results for a simple example that nonignorable missing data lead to biased parameter estimates. This bias results in selection of wrong effects asymptotically, which we can confirm via simulations for more complex settings. By jointly modeling the longitudinal binary data with the dropout process that possibly leads to nonignorable missing data, we are able to correct the bias in estimation and selection. Mixture priors with a point mass at zero are used to facilitate variable selection. We illustrate the proposed approach using a clinical trial for acute ischemic stroke.
PMCID: PMC3855104  PMID: 23124889
Bayesian variable selection; Bias; Dropout; Missing data; Model selection
16.  Estimating the cumulative risk of false positive cancer screenings 
When evaluating cancer screening it is important to estimate the cumulative risk of false positives from periodic screening. Because the data typically come from studies in which the number of screenings varies by subject, estimation must take into account dropouts. A previous approach to estimate the probability of at least one false positive in n screenings unrealistically assumed that the probability of dropout does not depend on prior false positives.
By redefining the random variables, we obviate the unrealistic dropout assumption. We also propose a relatively simple logistic regression and extend estimation to the expected number of false positives in n screenings.
We illustrate our methodology using data from women ages 40 to 64 who received up to four annual breast cancer screenings in the Health Insurance Program of Greater New York study, which began in 1963. Covariates were age, time since previous screening, screening number, and whether or not a previous false positive occurred. Defining a false positive as an unnecessary biopsy, the only statistically significant covariate was whether or not a previous false positive occurred. Because the effect of screening number was not statistically significant, extrapolation beyond 4 screenings was reasonable. The estimated mean number of unnecessary biopsies in 10 years per woman screened is .11 with 95% confidence interval of (.10, .12). Defining a false positive as an unnecessary work-up, all the covariates were statistically significant and the estimated mean number of unnecessary work-ups in 4 years per woman screened is .34 with 95% confidence interval (.32, .36).
Using data from multiple cancer screenings with dropouts, and allowing dropout to depend on previous history of false positives, we propose a logistic regression model to estimate both the probability of at least one false positive and the expected number of false positives associated with n cancer screenings. The methodology can be used for informed decision making at the individual level as well as for planning of health services.
PMCID: PMC166156  PMID: 12841854
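The two quantities named in entry 16 (the probability of at least one false positive in n screenings, and the expected number of false positives) can be illustrated with a minimal sketch. It is not the authors' logistic regression: it assumes, purely for illustration, a constant per-screening false-positive probability for subjects with no prior false positive (`p_no_prior`) and another for subjects with a prior one (`p_prior`), and it ignores dropout and other covariates. The function name and both parameters are hypothetical.

```python
def cumulative_false_positive_risk(p_no_prior, p_prior, n):
    """Return (P(at least one false positive), expected number of false positives)
    over n screenings, under the simplifying assumptions stated above."""
    prob_none_so_far = 1.0  # probability of no false positive before this screening
    expected_count = 0.0
    for _ in range(n):
        prob_prior = 1.0 - prob_none_so_far
        # Marginal false-positive probability at this screening, mixing over
        # whether a previous false positive has already occurred.
        p_fp = prob_none_so_far * p_no_prior + prob_prior * p_prior
        expected_count += p_fp  # linearity of expectation
        prob_none_so_far *= 1.0 - p_no_prior
    return 1.0 - prob_none_so_far, expected_count


# Hypothetical inputs, not estimates from the study.
risk, expected = cumulative_false_positive_risk(0.1, 0.1, 2)
```

When the two probabilities are equal, the expected count reduces to n times the per-screening probability; a larger `p_prior` raises the expected count while leaving the probability of at least one false positive unchanged.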
17.  Bayesian modeling of ChIP-chip data using latent variables 
BMC Bioinformatics  2009;10:352.
The ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing the ChIP-chip data, such as the sliding window methods, the hidden Markov model-based methods, and Bayesian methods. Although, due to the integrated consideration of uncertainty of the models and model parameters, Bayesian methods can potentially work better than the other two classes of methods, the existing Bayesian methods do not perform satisfactorily. They usually require multiple replicates or some extra experimental information to parametrize the model, and they need long CPU times because they rely on MCMC simulations.
In this paper, we propose a Bayesian latent model for the ChIP-chip data. The new model mainly differs from the existing Bayesian models, such as the joint deconvolution model, the hierarchical gamma mixture model, and the Bayesian hierarchical model, in two respects. Firstly, it works on the difference between the averaged treatment and control samples. This enables the use of a simple model for the data, which avoids the probe-specific effect and the sample (control/treatment) effect. As a consequence, this enables an efficient MCMC simulation of the posterior distribution of the model, and also makes the model more robust to the outliers. Secondly, it models the neighboring dependence of probes by introducing a latent indicator vector. A truncated Poisson prior distribution is assumed for the latent indicator variable, with the rationale being justified at length.
The Bayesian latent method is successfully applied to real and ten simulated datasets, with comparisons with some of the existing Bayesian methods, hidden Markov model methods, and sliding window methods. The numerical results indicate that the Bayesian latent method can outperform other methods, especially when the data contain outliers.
PMCID: PMC2779819  PMID: 19857265
18.  A Bayesian model for time-to-event data with informative censoring 
Biostatistics (Oxford, England)  2012;13(2):341-354.
Randomized trials with dropouts or censored data and discrete time-to-event type outcomes are frequently analyzed using the Kaplan–Meier or product limit (PL) estimation method. However, the PL method assumes that the censoring mechanism is noninformative and when this assumption is violated, the inferences may not be valid. We propose an expanded PL method using a Bayesian framework to incorporate an informative censoring mechanism and perform sensitivity analysis on estimates of the cumulative incidence curves. The expanded method uses a model, which can be viewed as a pattern mixture model, where odds for having an event during the follow-up interval (t_{k-1}, t_k], conditional on being at risk at t_{k-1}, differ across the patterns of missing data. The sensitivity parameters relate the odds of an event for subjects in a missing-data pattern to the odds for the observed subjects in each interval. The large number of the sensitivity parameters is reduced by considering them as random and assumed to follow a log-normal distribution with prespecified mean and variance. Then we vary the mean and variance to explore sensitivity of inferences. The missing at random (MAR) mechanism is a special case of the expanded model, thus allowing exploration of the sensitivity of inferences to departures from the MAR assumption. The proposed approach is applied to data from the TRial Of Preventing HYpertension.
PMCID: PMC3297827  PMID: 22223746
Clinical trials; Hypertension; Ignorability index; Missing data; Pattern-mixture model; TROPHY trial
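As a rough illustration of the ingredients in entry 18, the sketch below computes a standard product-limit survival curve from per-interval counts and shows how an odds-ratio sensitivity parameter would shift an interval's event probability. It is a simplification under stated assumptions: it uses a single fixed odds ratio rather than log-normally distributed sensitivity parameters, and it does not mix over missing-data patterns. The function names `product_limit` and `shift_hazard` and all inputs are hypothetical.

```python
def product_limit(at_risk, events):
    """Product-limit survival estimates, one per follow-up interval.

    at_risk[k] is the number at risk at the start of interval k and
    events[k] is the number of events observed during it.
    """
    survival = 1.0
    curve = []
    for n, d in zip(at_risk, events):
        survival *= 1.0 - d / n  # conditional probability of surviving interval k
        curve.append(survival)
    return curve


def shift_hazard(hazard, odds_ratio):
    """Event probability whose odds equal odds_ratio times the odds of `hazard`.

    Assumes 0 <= hazard < 1. An odds ratio of 1 leaves the hazard unchanged,
    which corresponds to the noninformative (MAR-like) special case.
    """
    odds = hazard / (1.0 - hazard) * odds_ratio
    return odds / (1.0 + odds)


# Hypothetical counts, not data from the TROPHY trial.
curve = product_limit([10, 8], [2, 2])
shifted = shift_hazard(0.2, 2.0)
```

Varying `odds_ratio` away from 1 shows how strongly the assumed event probability for unobserved subjects depends on the departure from noninformative censoring.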
19.  Flexible marginalized models for bivariate longitudinal ordinal data 
Biostatistics (Oxford, England)  2013;14(3):462-476.
Random effects models are commonly used to analyze longitudinal categorical data. Marginalized random effects models are a class of models that permit direct estimation of marginal mean parameters and characterize serial correlation for longitudinal categorical data via random effects (Heagerty, 1999, Marginally specified logistic-normal models for longitudinal binary data, Biometrics 55, 688–698; Lee and Daniels, 2008, Marginalized models for longitudinal ordinal data with application to quality of life studies, Statistics in Medicine 27, 4359–4380). In this paper, we propose a Kronecker product (KP) covariance structure to capture the correlation between processes at a given time and the correlation within a process over time (serial correlation) for bivariate longitudinal ordinal data. For the latter, we consider a more general class of models than standard (first-order) autoregressive correlation models, by re-parameterizing the correlation matrix using partial autocorrelations (Daniels and Pourahmadi, 2009, Modeling covariance matrices via partial autocorrelations, Journal of Multivariate Analysis 100, 2352–2363). We assess the reasonableness of the KP structure with a score test. A maximum marginal likelihood estimation method is proposed utilizing a quasi-Newton algorithm with quasi-Monte Carlo integration of the random effects. We examine the effects of demographic factors on metabolic syndrome and C-reactive protein using the proposed models.
PMCID: PMC3677737  PMID: 23365416
Kronecker product; Metabolic syndrome; Partial autocorrelation
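The Kronecker product structure described in entry 19 can be sketched in a few lines. The example assumes a 2 x 2 between-process correlation matrix and, as the simplest special case, a first-order autoregressive serial correlation matrix; the paper itself considers a more general class parameterized by partial autocorrelations, which is not reproduced here. The helper names and numeric values are hypothetical.

```python
def ar1_correlation(rho, n_times):
    """First-order autoregressive correlation matrix: entry (i, j) is rho**|i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]


# Hypothetical correlation between the two processes at a given time.
between_process = [[1.0, 0.5], [0.5, 1.0]]
# Hypothetical serial correlation within a process over three time points.
serial = ar1_correlation(0.3, 3)

# 6 x 6 correlation matrix for (process, time) pairs, ordered process-major.
kp_correlation = kronecker(between_process, serial)
```

Under this structure, the correlation between process 1 at time i and process 2 at time j is the product of the between-process correlation and the serial correlation at lag |i - j|, which is the separability restriction the score test in the abstract is meant to check.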
20.  A full Bayesian hierarchical mixture model for the variance of gene differential expression 
BMC Bioinformatics  2007;8:124.
In many laboratory-based high throughput microarray experiments, there are very few replicates of gene expression levels. Thus, estimates of gene variances are inaccurate. Visual inspection of graphical summaries of these data usually reveals that heteroscedasticity is present, and the standard approach to address this is to take a log2 transformation. In such circumstances, it is then common to assume that gene variability is constant when an analysis of these data is undertaken. However, this is perhaps too stringent an assumption. More careful inspection reveals that the simple log2 transformation does not remove the problem of heteroscedasticity. An alternative strategy is to assume independent gene-specific variances; although again this is problematic as variance estimates based on few replications are highly unstable. More meaningful and reliable comparisons of gene expression might be achieved, for different conditions or different tissue samples, where the test statistics are based on accurate estimates of gene variability; a crucial step in the identification of differentially expressed genes.
We propose a Bayesian mixture model, which classifies genes according to similarity in their variance. The result is that genes in the same latent class share a similar variance, estimated from a larger number of replicates than purely those per gene, i.e. the total of all replicates of all genes in the same latent class. An example dataset, consisting of 9216 genes with four replicates per condition, resulted in four latent classes based on their similarity of the variance.
The mixture variance model provides a realistic and flexible estimate for the variance of gene expression data under limited replicates. We believe that in using the latent class variances, estimated from a larger number of genes in each derived latent group, the p-values obtained are more robust than those obtained using either a constant or a gene-specific variance estimate.
PMCID: PMC1876253  PMID: 17439644
21.  A Bayesian Non-Parametric Potts Model with Application to Pre-Surgical FMRI Data 
Statistical methods in medical research  2012;22(4):10.1177/0962280212448970.
The Potts model has enjoyed much success as a prior model for image segmentation. Given the individual classes in the model, the data are typically modeled as Gaussian random variates or as random variates from some other parametric distribution. In this manuscript we present a non-parametric Potts model and apply it to an FMRI study for the pre-surgical assessment of peritumoral brain activation. In our model we assume that the Z-score image from a patient can be segmented into activated, deactivated and null classes, or states. Conditional on the class, or state, the Z-scores are assumed to come from some generic distribution which we model non-parametrically using a mixture of Dirichlet process priors within the Bayesian framework. The posterior distribution of the model parameters is estimated with a Markov chain Monte Carlo algorithm and Bayesian decision theory is used to make the final classifications. Our Potts prior model includes two parameters, the standard spatial regularization parameter and a parameter that can be interpreted as the a priori probability that each voxel belongs to the null, or background state, conditional on the lack of spatial regularization. We assume that both of these parameters are unknown, and jointly estimate them along with other model parameters. We show through simulation studies that our model performs on par, in terms of posterior expected loss, with parametric Potts models when the parametric model is correctly specified, and outperforms parametric models when the parametric model is misspecified.
PMCID: PMC3843604  PMID: 22627277
Decision Theory; Dirichlet process; FMRI; hidden Markov random field; Non-parametric Bayes; Potts model
22.  A Two-Latent-Class Model for Smoking Cessation Data with Informative Dropouts 
Nonignorable missing data are a common problem in longitudinal studies. Latent class models are attractive for simplifying the modeling of missing data when the data are subject to either a monotone or intermittent missing data pattern. In our study, we propose a new two-latent-class model for categorical data with informative dropouts, dividing the observed data into two latent classes; one class in which the outcomes are deterministic and a second one in which the outcomes can be modeled using logistic regression. In the model, the latent classes connect the longitudinal responses and the missingness process under the assumption of conditional independence. Parameters are estimated by the method of maximum likelihood estimation based on the above assumptions and the tetrachoric correlation between responses within the same subject. We compare the proposed method with the shared parameter model and the weighted GEE model using the areas under the ROC curves in the simulations and the application to the smoking cessation data set. The simulation results indicate that the proposed two-latent-class model performs well under different missing procedures. The application results show that our proposed method is better than the shared parameter model and the weighted GEE model.
PMCID: PMC2879593  PMID: 20523912
Area under ROC curve; Informative dropout; Latent class; Tetrachoric correlation
23.  A Semiparametric Marginalized Model for Longitudinal Data with Informative Dropout 
Journal of probability and statistics  2012;2012(2012):734341.
We propose a marginalized joint-modeling approach for marginal inference on the association between longitudinal responses and covariates when longitudinal measurements are subject to informative dropouts. The proposed model is motivated by the idea of linking longitudinal responses and dropout times by latent variables while focusing on marginal inferences. We develop a simple inference procedure based on a series of estimating equations, and the resulting estimators are consistent and asymptotically normal with a sandwich-type covariance matrix ready to be estimated by the usual plug-in rule. The performance of our approach is evaluated through simulations and illustrated with a renal disease data application.
PMCID: PMC3261622  PMID: 22267962
24.  Bayesian multivariate growth curve latent class models for mixed outcomes 
Statistics in medicine  2012;10.1002/sim.5596.
In many clinical studies, the disease of interest is multi-faceted, and multiple outcomes are needed to adequately capture information about the characteristics of the disease or its severity. In analysis of such diseases, it is often difficult to determine what constitutes improvement due to the multivariate nature of the outcome. Furthermore, when the disease of interest has an unknown etiology and/or is primarily a symptom-defined syndrome, there is potential for the disease population to have distinct subgroups. Identification of population subgroups is of interest as it may assist clinicians in providing appropriate treatment or in developing accurate prognoses. We propose multivariate growth curve latent class models that group subjects based on multiple symptoms measured repeatedly over time. These groups or latent classes are defined by distinctive longitudinal profiles of a latent variable which is used to summarize the multivariate outcomes at each point in time. The mean growth curve for the latent variable in each class defines the features of the class. We develop this model for any combination of continuous, binary, ordinal or count outcomes within a Bayesian hierarchical framework. Simulation studies are used to validate the estimation procedures. We apply our model to data from a randomized clinical trial evaluating the efficacy of Bacillus Calmette-Guerin in treating symptoms of Interstitial Cystitis where we are able to identify a class of subjects for whom treatment is effective.
PMCID: PMC3676449  PMID: 22961883
25.  Imputation-based strategies for clinical trial longitudinal data with nonignorable missing values 
Statistics in medicine  2008;27(15):2826-2849.
Biomedical research is plagued with problems of missing data, especially in clinical trials of medical and behavioral therapies adopting a longitudinal design. After a literature review on modeling incomplete longitudinal data based on full-likelihood functions, this paper proposes a set of imputation-based strategies for implementing selection, pattern-mixture, and shared-parameter models for handling intermittent missing values and dropouts that are potentially nonignorable according to various criteria. Within the framework of multiple partial imputation, intermittent missing values are first imputed several times; then, each partially imputed data set is analyzed to deal with dropouts with or without further imputation. Depending on the choice of imputation model or measurement model, there exist various strategies that can be jointly applied to the same set of data to study the effect of treatment or intervention from multi-faceted perspectives. For illustration, the strategies were applied to a data set with continuous repeated measures from a smoking cessation clinical trial.
PMCID: PMC3032542  PMID: 18205247
multiple partial imputation; selection model; pattern-mixture model; Markov transition model; nonignorable dropout; intermittent missing values