A significant source of missing data in longitudinal epidemiologic studies on elderly individuals is death. It is generally believed that these missing data due to death are non-ignorable for likelihood-based inference. Inference based only on data from surviving participants in the study may lead to biased results. In this paper we model both the probability of disease and the probability of death using shared random effect parameters. We also propose to use the Laplace approximation to obtain an approximate likelihood function so that high-dimensional integration over the distributions of the random effect parameters is not necessary. Parameter estimates can be obtained by maximizing the approximate log-likelihood function. Data from a longitudinal dementia study are used to illustrate the approach. A small simulation is conducted to compare parameter estimates from the proposed method with those from the ‘naive’ method, in which missing data are assumed to be missing at random.
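As a rough illustration of why a Laplace approximation removes the need for numerical integration over random effects, the sketch below approximates the marginal log-likelihood of one subject's binary responses under a single normal random intercept. This is a deliberately simplified stand-in, not the shared-parameter model for disease and death described above; the function names, the fixed intercept `beta0`, and the toy data are all assumptions made for the example. A brute-force midpoint rule is included only to check the approximation.

```python
import math

def log_joint(b, ys, beta0, sigma):
    """Log of the integrand: Bernoulli log-likelihood given the random
    intercept b, plus the N(0, sigma^2) log-density of b."""
    eta = beta0 + b
    loglik = sum(y * eta - math.log1p(math.exp(eta)) for y in ys)
    logprior = -0.5 * math.log(2 * math.pi * sigma ** 2) - b * b / (2 * sigma ** 2)
    return loglik + logprior

def laplace_log_marginal(ys, beta0, sigma, iters=50):
    """Laplace approximation to log of the integral of exp(log_joint) over b."""
    n, s = len(ys), sum(ys)
    b = 0.0
    for _ in range(iters):
        # Newton steps toward the mode of log_joint (a concave function of b).
        p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
        grad = s - n * p - b / sigma ** 2
        hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2
        b -= grad / hess
    p = 1.0 / (1.0 + math.exp(-(beta0 + b)))
    hess = -n * p * (1.0 - p) - 1.0 / sigma ** 2
    # Integrand at the mode times the Gaussian volume factor sqrt(2*pi / -hess).
    return log_joint(b, ys, beta0, sigma) + 0.5 * math.log(2 * math.pi / -hess)

def quadrature_log_marginal(ys, beta0, sigma, lo=-10.0, hi=10.0, steps=20000):
    """Midpoint-rule reference value, used here only as a check."""
    width = (hi - lo) / steps
    total = sum(
        math.exp(log_joint(lo + (i + 0.5) * width, ys, beta0, sigma))
        for i in range(steps)
    )
    return math.log(total * width)

if __name__ == "__main__":
    toy_ys = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical responses for one subject
    print(laplace_log_marginal(toy_ys, -0.5, 1.0))
    print(quadrature_log_marginal(toy_ys, -0.5, 1.0))
```

With several random effects per subject the quadrature check becomes impractical, which is the situation where replacing the integral with a mode search and a curvature term pays off.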
In designing a longitudinal cluster randomized clinical trial (cluster-RCT), the interventions are randomly assigned to clusters such as clinics. Subjects within the same clinic will receive the identical intervention. Each will be assessed repeatedly over the course of the study. A mixed-effects linear regression model can be applied in a cluster-RCT with three level data to test the hypothesis that the intervention groups differ in the course of outcome over time. Using a test statistic based on maximum likelihood estimates, we derived closed form formulae for statistical power to detect the intervention by time interaction and the sample size requirements for each level. Importantly, the sample size does not depend on correlations among second level data units and the statistical power function depends on the number of second and third level data units through their product. A simulation study confirmed that theoretical power estimates based on the derived formulae are nearly identical to empirical estimates.
longitudinal cluster RCT; three level data; power; sample size; intervention by time interaction; effect size
Despite the importance of such traits in biology and biomedicine, genetic mapping of binary traits that change over time has not been well explored. In this article, we develop a statistical model for mapping quantitative trait loci (QTLs) that govern longitudinal responses of binary traits. The model is constructed within the maximum likelihood framework, in which the association between binary responses is modeled in terms of conditional log odds-ratios. With this parameterization, the maximum likelihood estimates (MLEs) of marginal mean parameters are robust to the misspecification of time dependence. We implement an iterative procedure to obtain the MLEs of QTL genotype-specific parameters that define longitudinal binary responses. The usefulness of the model was validated by analyzing a real example in rice. Simulation studies were performed to investigate the statistical properties of the model, showing that the model has power to identify and map specific QTLs responsible for the temporal pattern of binary traits.
binary trait; dynamic trait; functional mapping; maximum likelihood estimate
Estimation of longitudinal data covariance structure poses significant challenges because the data are usually collected at irregular time points. A viable semiparametric model for covariance matrices was proposed in Fan, Huang and Li (2007) that allows one to estimate the variance function nonparametrically and to estimate the correlation function parametrically via aggregating information from irregular and sparse data points within each subject. However, the asymptotic properties of their quasi-maximum likelihood estimator (QMLE) of parameters in the covariance model are largely unknown. In the current work, we address this problem in the context of more general models for the conditional mean function including parametric, nonparametric, or semi-parametric. We also consider the possibility of rough mean regression function and introduce the difference-based method to reduce biases in the context of varying-coefficient partially linear mean regression models. This provides a more robust estimator of the covariance function under a wider range of situations. Under some technical conditions, consistency and asymptotic normality are obtained for the QMLE of the parameters in the correlation function. Simulation studies and a real data example are used to illustrate the proposed approach.
Correlation structure; difference-based estimation; quasi-maximum likelihood; varying-coefficient partially linear model
The analysis of longitudinal data to study changes in variables measured repeatedly over time has received considerable attention in many fields. This paper proposes a two-level structural equation model for analyzing multivariate longitudinal responses that are mixed continuous and ordered categorical variables. The first-level model is defined for measures taken at each time point nested within individuals for investigating their characteristics that are changed with time. The second level is defined for individuals to assess their characteristics that are invariant with time. The proposed model accommodates fixed covariates, nonlinear terms of the latent variables, and missing data. A maximum likelihood (ML) approach is developed for the estimation of parameters and model comparison. Results of a simulation study indicate that the performance of the ML estimation is satisfactory. The proposed methodology is applied to a longitudinal study concerning cocaine use.
latent variables; longitudinal study on cocaine use; maximum likelihood; MCEM algorithm; model comparison; ordered categorical variables
The current study described patterns of yoga practice and examined differences in physical activity over time between individuals with or at risk for type 2 diabetes who completed an 8-week yoga intervention compared with controls.
A longitudinal comparative design measured the effect of a yoga intervention on yoga practice and physical activity, using data at baseline and postintervention months 3, 6, and 15.
Disparate patterns of yoga practice occurred between intervention and control participants over time, but the subjective definition of yoga practice limits interpretation. Multilevel model estimates indicated that treatment group did not have a significant influence in the rate of change in physical activity over the study period. While age and education were not significant individual predictors, the inclusion of these variables in the model did improve fit.
Findings indicate that an 8-week yoga intervention had little effect on physical activity over time. Further research is necessary to explore the influence of yoga on behavioral health outcomes among individuals with or at risk for type 2 diabetes.
mind-body; longitudinal; health; health behavior
In the analysis of longitudinal data, it is not uncommon for the observation times of repeated measurements to be subject-specific and correlated with the underlying longitudinal outcomes. Taking account of the dependence between observation times and longitudinal outcomes is critical in these situations to ensure the validity of statistical inference. In this article, we propose a flexible joint model for longitudinal data analysis in the presence of informative observation times. In particular, the new procedure considers the shared random-effect model and assumes a time-varying coefficient for the latent variable, allowing a flexible way of modeling longitudinal outcomes while adjusting for their association with observation times. Estimating equations are developed for parameter estimation. We show that the resulting estimators are consistent and asymptotically normal, with a variance-covariance matrix that has a closed form and can be consistently estimated by the usual plug-in method. One additional advantage of the procedure is that it provides a unified framework to test whether the effect of the latent variable is zero, constant, or time-varying. Simulation studies show that the proposed approach is appropriate for practical use. An application to a bladder cancer data set is also given to illustrate the methodology.
Estimating equation method; Informative observation times; Longitudinal data analysis; Time-varying effect
Improving efficiency for regression coefficients and predicting trajectories of individuals are two important aspects in analysis of longitudinal data. Both involve estimation of the covariance function. Yet, challenges arise in estimating the covariance function of longitudinal data collected at irregular time points. A class of semiparametric models for the covariance function is proposed by imposing a parametric correlation structure while allowing a nonparametric variance function. A kernel estimator is developed for the estimation of the nonparametric variance function. Two methods, a quasi-likelihood approach and a minimum generalized variance method, are proposed for estimating parameters in the correlation structure. We introduce a semiparametric varying coefficient partially linear model for longitudinal data and propose an estimation procedure for model coefficients by using a profile weighted least squares approach. Sampling properties of the proposed estimation procedures are studied and asymptotic normality of the resulting estimators is established. Finite sample performance of the proposed procedures is assessed by Monte Carlo simulation studies. The proposed methodology is illustrated by an analysis of a real data example.
Kernel regression; local linear regression; profile weighted least squares; semiparametric varying coefficient model
Score limitation at the top of a scale is commonly termed “ceiling effect.” Ceiling effects can lead to serious artifactual parameter estimates in most data analysis. This study examines the consequences of ceiling effects in longitudinal data analysis and investigates several methods of dealing with ceiling effects through Monte Carlo simulations and empirical data analyses. Data were simulated based on a latent growth curve model with T = 5 occasions. The proportion of the ceiling data [10%–40%] was manipulated by using different thresholds, and estimated parameters were examined for R = 500 replications. The results showed that ceiling effects led to incorrect model selection and biased parameter estimation (shape of the curve and magnitude of the changes) when regular growth curve models were applied. The Tobit growth curve model, instead, performed very well in dealing with ceiling effects in longitudinal data analysis. The Tobit growth curve model was then applied in an empirical cognitive aging study and the results were discussed.
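To make the ceiling problem concrete, the sketch below simulates a single measurement occasion with scores capped at a ceiling and compares the naive sample mean with a Tobit-style estimate that treats capped scores as right-censored. It is a cross-sectional simplification, not the latent growth curve model studied above, and the sample size, true mean, ceiling, seed, the known `sigma`, and the grid search are all assumptions chosen for illustration.

```python
import math
import random

def tobit_loglik(mu, sigma, ys, ceiling):
    """Log-likelihood for normal data right-censored at `ceiling`."""
    ll = 0.0
    for y in ys:
        if y >= ceiling:
            # Censored score: contributes P(Y* >= ceiling), the upper normal tail.
            z = (ceiling - mu) / sigma
            ll += math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
        else:
            # Fully observed score: contributes the normal log-density.
            z = (y - mu) / sigma
            ll += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return ll

def simulate_capped(n, mu, sigma, ceiling, seed):
    """Draw latent normal scores and cap them at the ceiling."""
    rng = random.Random(seed)
    return [min(rng.gauss(mu, sigma), ceiling) for _ in range(n)]

# Hypothetical setup: true mean 1.0, SD 1.0, ceiling 1.5 (roughly 30% at ceiling).
TRUE_MU, SIGMA, CEILING = 1.0, 1.0, 1.5
ys = simulate_capped(2000, TRUE_MU, SIGMA, CEILING, seed=1)

# Naive estimate: treats capped scores as if they were the true values.
naive_mean = sum(ys) / len(ys)

# Tobit estimate of the mean by grid search, with sigma assumed known.
grid = [i / 100 for i in range(0, 201)]
tobit_mu = max(grid, key=lambda m: tobit_loglik(m, SIGMA, ys, CEILING))

if __name__ == "__main__":
    print("share at ceiling:", sum(y >= CEILING for y in ys) / len(ys))
    print("naive mean:", naive_mean)
    print("Tobit mean:", tobit_mu)
```

The naive mean is pulled below the true value because every capped score understates the latent score, while the censored likelihood uses only the fact that the latent score lies at or above the ceiling.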
Objective To determine the feasibility and acceptability of an interdisciplinary intervention for mothers of children newly diagnosed with cancer and to estimate effect sizes for the intervention in reducing distress. Management of illness uncertainty was a key framework for the intervention. Methods Mothers (N = 52) were randomly assigned to the intervention or a treatment as usual group, completing measures at baseline and follow-up time points. Results Mothers’ satisfaction ratings were consistently high, and intervention implementation appeared feasible. Significant mean effects or trends in favor of the intervention group were found for pre-to-post change on measures of distress. Evidence of a preventative effect was also observed; mothers in the intervention group tended to improve or remain stable in their adjustment, whereas many parents in the treatment as usual group showed worsening outcomes. Conclusions An interdisciplinary intervention targeting maternal illness uncertainty has clinical value within this sample.
clinical intervention; psychosocial outcomes; uncertainty
Procedures for estimating the parameters of the general class of semiparametric models for recurrent events proposed by Peña and Hollander (2004) are developed. This class of models incorporates an effective age function encoding the effect of changes after each event occurrence, such as the impact of an intervention; it models the impact of accumulating event occurrences on the unit; it admits a link function in which the effect of possibly time-dependent covariates is incorporated; and it allows the incorporation of unobservable frailty components which induce dependencies among the inter-event times for each unit. The estimation procedures are semiparametric in that a baseline hazard function is nonparametrically specified. The sampling distribution properties of the estimators are examined through a simulation study, and the consequences of mis-specifying the model are analyzed. The results indicate that the flexibility of this general class of models provides a safeguard for analyzing recurrent event data, even data possibly arising from a frailtyless mechanism. The estimation procedures are applied to real data sets arising in biomedical and public health settings, as well as in reliability and engineering situations. In particular, the procedures are applied to a data set pertaining to times to recurrence of bladder cancer, and the results of the analysis are compared to those obtained using three methods of analyzing recurrent event data.
Correlated inter-event times; counting process; effective age process; EM algorithm; frailty; intensity models; model mis-specification; sum-quota accrual scheme
Population health attributes (such as disease incidence and prevalence) are often estimated using sentinel hospital records, which are subject to multiple sources of uncertainty. When applied to these health attributes, commonly used biased estimation techniques can lead to false conclusions and ineffective disease intervention and control. Although some estimators can account for measurement error (in the form of white noise, usually after de-trending), most mainstream health statistics techniques cannot generate unbiased and minimum error variance estimates when the available data are biased.
Methods and Findings
A new technique, called the Biased Sample Hospital-based Area Disease Estimation (B-SHADE), is introduced that generates space-time population disease estimates using biased hospital records. The effectiveness of the technique is empirically evaluated in terms of hospital records of disease incidence (for hand-foot-mouth disease and fever syndrome cases) in Shanghai (China) during a two-year period. The B-SHADE technique uses a weighted summation of sentinel hospital records to derive unbiased and minimum error variance estimates of area incidence. The calculation of these weights is the outcome of a process that combines the available space-time information with a rigorous assessment of both the horizontal relationships between hospital records and the vertical links between each hospital's records and the overall disease situation in the region. In this way, the representativeness of the sentinel hospital records was improved, the possible biases of these records were corrected, and the generated area incidence estimates were best linear unbiased estimates (BLUE). Using the same hospital records, the performance of the B-SHADE technique was compared against two mainstream estimators.
The B-SHADE technique involves a hospital network-based model that blends the optimal estimation features of the Block Kriging method and the sample bias correction efficiency of the ratio estimator method. In this way, B-SHADE can overcome the limitations of both methods: Block Kriging's inadequacy concerning the correction of sample bias and spatial clustering, and the ratio estimator's limitation as regards error minimization. The generality of the B-SHADE technique is further demonstrated by the fact that it reduces to Block Kriging in the case of unbiased samples, to the ratio estimator if there is no correlation between hospitals, and to a simple statistic if the hospital records are neither biased nor space-time correlated. In addition to the theoretical advantages of the B-SHADE technique over the two other methods above, two real-world case studies (hand-foot-mouth disease and fever syndrome cases) demonstrated its empirical superiority as well.
Intervention effects estimated from non-randomized intervention studies are plagued by biases, yet social or structural intervention studies are rarely randomized. There are underutilized statistical methods available to mitigate biases due to self-selection, missing data, and confounding in longitudinal, observational data permitting estimation of causal effects. We demonstrate the use of Inverse Probability Weighting (IPW) to evaluate the effect of participating in a combined clinical and social STI/HIV prevention intervention on reduction of incident chlamydia and gonorrhea infections among sex workers in Brazil.
We demonstrate the step-by-step use of IPW, including presentation of the theoretical background, data set up, model selection for weighting, application of weights, estimation of effects using varied modeling procedures, and discussion of assumptions for use of IPW.
A total of 420 sex workers contributed data on 840 incident chlamydia and gonorrhea infections. Participators were compared to non-participators following application of inverse probability weights to correct for differences in covariate patterns between exposed and unexposed participants and between those who remained in the intervention and those who were lost to follow-up. Estimators using four model selection procedures provided estimates of the intervention effect ranging from an odds ratio (OR) of 0.43 (95% CI: 0.22-0.85) to 0.53 (95% CI: 0.26-1.1).
After correcting for selection bias, loss-to-follow-up, and confounding, our analysis suggests a protective effect of participating in the Encontros intervention. Evaluations of behavioral, social, and multi-level interventions to prevent STI can benefit by introduction of weighting methods such as IPW.
Intervention Evaluation; Inverse Probability Weights; STI/HIV Prevention; Statistical Methods
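The weighting step described above can be sketched on a toy data set. The example below uses one binary confounder and entirely hypothetical counts (none of the numbers come from the study); it estimates the probability of participation within each confounder stratum, weights each person by the inverse of the probability of the exposure they actually received, and compares crude and weighted risks. Weights for loss to follow-up, which the study also applies, are omitted to keep the sketch short.

```python
from collections import defaultdict

def make_records():
    """Expand hypothetical cell counts into (stratum, exposed, outcome) rows."""
    # (stratum, exposed, number of events, number of people) - invented counts
    cells = [(0, 1, 2, 20), (0, 0, 16, 80), (1, 1, 24, 80), (1, 0, 12, 20)]
    records = []
    for stratum, exposed, events, n in cells:
        records += [(stratum, exposed, 1)] * events
        records += [(stratum, exposed, 0)] * (n - events)
    return records

def propensity_by_stratum(records):
    """Estimate P(exposed = 1 | stratum) as a simple within-stratum proportion."""
    exposed_count = defaultdict(int)
    total = defaultdict(int)
    for stratum, exposed, _ in records:
        exposed_count[stratum] += exposed
        total[stratum] += 1
    return {s: exposed_count[s] / total[s] for s in total}

def crude_risk(records, arm):
    """Unweighted outcome risk among people in the given exposure arm."""
    outcomes = [y for _, exposed, y in records if exposed == arm]
    return sum(outcomes) / len(outcomes)

def ipw_risk(records, arm):
    """Weighted risk: each person counts 1 / P(own exposure | stratum)."""
    ps = propensity_by_stratum(records)
    num = den = 0.0
    for stratum, exposed, y in records:
        if exposed != arm:
            continue
        p_own = ps[stratum] if arm == 1 else 1.0 - ps[stratum]
        weight = 1.0 / p_own
        num += weight * y
        den += weight
    return num / den

if __name__ == "__main__":
    data = make_records()
    print("crude risks (exposed, unexposed):", crude_risk(data, 1), crude_risk(data, 0))
    print("IPW risks (exposed, unexposed):", ipw_risk(data, 1), ipw_risk(data, 0))
```

In this toy data the crude risks are nearly equal (0.26 versus 0.28) because participation is concentrated in the higher-risk stratum, whereas the weighted risks (0.20 versus 0.40) recover the within-stratum contrast. In practice the propensity would come from a fitted model with many covariates rather than a stratum proportion.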
Analysis of longitudinal ordered categorical efficacy or safety data in clinical trials using mixed models is increasingly performed. However, algorithms available for maximum likelihood estimation using an approximation of the likelihood integral, including the LAPLACE approach, may give rise to biased parameter estimates. The SAEM algorithm is an efficient and powerful tool in the analysis of continuous/count mixed models. The aim of this study was to implement and investigate the performance of the SAEM algorithm for longitudinal categorical data. The SAEM algorithm is extended for parameter estimation in ordered categorical mixed models together with an estimation of the Fisher information matrix and the likelihood. We ran Monte Carlo simulations using previously published scenarios evaluated with NONMEM. Accuracy and precision in parameter estimation and standard error estimates were assessed in terms of relative bias and root mean square error. This algorithm was illustrated on the simultaneous analysis of pharmacokinetic and discretized efficacy data obtained after a single dose of warfarin in healthy volunteers. The new SAEM algorithm is implemented in MONOLIX 3.1 for discrete mixed models. The analyses show that for parameter estimation, the relative bias is low for both fixed effects and variance components in all models studied. Estimated and empirical standard errors are similar. The warfarin example illustrates how simple and rapid it is to analyze continuous and discrete data simultaneously with MONOLIX 3.1. The SAEM algorithm is extended for analysis of longitudinal categorical data. It provides accurate estimates of parameters and standard errors. The estimation is fast and stable.
categorical data; mixed models; MONOLIX; proportional odds model; SAEM
Random effects are often used in generalized linear models to explain the serial dependence for longitudinal categorical data. Marginalized random effects models (MREMs) for the analysis of longitudinal binary data have been proposed to permit likelihood-based estimation of marginal regression parameters. In this paper, we introduce an extension of the MREM to accommodate longitudinal ordinal data. Maximum marginal likelihood estimation is implemented utilizing quasi-Newton algorithms with Monte Carlo integration of the random effects. Our approach is applied to analyze the quality of life data from a recent colorectal cancer clinical trial. Dropout occurs at a high rate and is often due to tumor progression or death. To deal with progression/death, we use a mixture model for the joint distribution of longitudinal measures and progression/death times and principal stratification to draw causal inferences about survivors.
marginalized likelihood-based models; ordinal data models; dropout
For longitudinal binary data with non-monotone non-ignorably missing outcomes over time, a full likelihood approach is complicated algebraically, and with many follow-up times, maximum likelihood estimation can be computationally prohibitive. As alternatives, two pseudo-likelihood approaches have been proposed that use minimal parametric assumptions. One formulation requires specification of the marginal distributions of the outcome and missing data mechanism at each time point, but uses an “independence working assumption,” i.e., an assumption that observations are independent over time. Another method avoids having to estimate the missing data mechanism by formulating a “protective estimator.” In simulations, these two estimators can be very inefficient, both for estimating time trends in the first case and for estimating both time-varying and time-stationary effects in the second. In this paper, we propose use of the optimal weighted combination of these two estimators, and in simulations we show that the optimal weighted combination can be much more efficient than either estimator alone. Finally, the proposed method is used to analyze data from two longitudinal clinical trials of HIV-infected patients.
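For intuition about what an optimal weighted combination looks like, consider the scalar case with two unbiased estimators of the same parameter. The notation below ($\hat\theta_1$, $\hat\theta_2$, variances $v_1$, $v_2$, covariance $c$) is introduced here for illustration and is not taken from the paper, whose estimators are vector-valued and would use matrix weights.

```latex
\hat\theta_w = w\,\hat\theta_1 + (1-w)\,\hat\theta_2,
\qquad
\operatorname{Var}(\hat\theta_w) = w^2 v_1 + (1-w)^2 v_2 + 2w(1-w)\,c .

% Setting the derivative with respect to w to zero gives
w^{*} = \frac{v_2 - c}{v_1 + v_2 - 2c},
\qquad
\operatorname{Var}(\hat\theta_{w^{*}}) = \frac{v_1 v_2 - c^2}{v_1 + v_2 - 2c}
\;\le\; \min(v_1, v_2).
```

The inequality holds whenever $v_1 + v_2 - 2c > 0$, which is why the combination can be no less efficient than either estimator alone once $v_1$, $v_2$, and $c$ are known; in practice they are estimated, so the gain is approximate.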
Dietary intervention trials aim to change dietary patterns of individuals. Participating in such trials could impact dietary self-report in divergent ways: Dietary counseling and training on portion-size estimation could improve self-report accuracy; participant burden could increase systematic error. Such intervention-associated biases could complicate interpretation of trial results. The authors investigated intervention-associated biases in reported total carotenoid intake using data on 3,088 breast cancer survivors recruited between 1995 and 2000 and followed through 2006 in the Women's Healthy Eating and Living Study, a randomized intervention trial. Longitudinal data from 2 self-report methods (24-hour recalls and food frequency questionnaires) and a plasma carotenoid biomarker were collected. A flexible measurement error model was postulated. Parameters were estimated in a Bayesian framework by using Markov chain Monte Carlo methods. Results indicated that the validity (i.e., correlation with “true” intake) of both self-report methods was significantly higher during follow-up for intervention versus nonintervention participants (4-year validity estimates: intervention = 0.57 for food frequency questionnaires and 0.58 for 24-hour recalls; nonintervention = 0.42 for food frequency questionnaires and 0.48 for 24-hour recalls). However, within- and between-instrument error correlations during follow-up were higher among intervention participants, indicating an increase in systematic error. Diet interventions can impact measurement errors of dietary self-report. Appropriate statistical methods should be applied to examine intervention-associated biases when interpreting results of diet trials.
bias (epidemiology); diet; intervention studies; Markov chain Monte Carlo; measurement error; nutrition assessment; reproducibility of results; validity
Infants in foster care need sensitive, responsive caregivers to promote their healthy outcomes. The current study examined the effectiveness of the Attachment and Biobehavioral Catch-up Intervention, a short-term, targeted, attachment-based intervention program designed to promote sensitive caregiving behavior among foster mothers. Ninety-six foster mother–infant dyads participated in this study; 44 dyads were assigned to the Attachment and Biobehavioral Catch-up Intervention, and 52 dyads were assigned to a control intervention. Results of hierarchical linear modeling indicated that foster mothers who were assigned to the Attachment and Biobehavioral Catch-up Intervention showed greater improvements in their sensitivity from pre- to postintervention assessment time points when compared with foster mothers who were assigned to the control intervention. We conclude that a short-term, targeted, attachment-based intervention is effective in changing foster mothers’ responsiveness to their foster infants, which is critical for foster infants’ healthy socioemotional adjustment.
Biomedical research is plagued with problems of missing data, especially in clinical trials of medical and behavioral therapies adopting longitudinal design. After a literature review on modeling incomplete longitudinal data based on full-likelihood functions, this paper proposes a set of imputation-based strategies for implementing selection, pattern-mixture, and shared-parameter models for handling intermittent missing values and dropouts that are potentially nonignorable according to various criteria. Within the framework of multiple partial imputation, intermittent missing values are first imputed several times; then, each partially imputed data set is analyzed to deal with dropouts with or without further imputation. Depending on the choice of imputation model or measurement model, there exist various strategies that can be jointly applied to the same set of data to study the effect of treatment or intervention from multi-faceted perspectives. For illustration, the strategies were applied to a data set with continuous repeated measures from a smoking cessation clinical trial.
multiple partial imputation; selection model; pattern-mixture model; Markov transition model; nonignorable dropout; intermittent missing values
A significant source of missing data in longitudinal epidemiological studies on elderly individuals is death. Subjects in large scale community-based longitudinal dementia studies are usually evaluated for disease status in study waves, not under continuous surveillance as in traditional cohort studies. Therefore, for the deceased subjects, disease status prior to death cannot be ascertained. Statistical methods assuming deceased subjects to be missing at random may not be realistic in dementia studies and may lead to biased results. We propose a stochastic model approach to simultaneously estimate disease incidence and mortality rates. We set up a Markov chain model consisting of three states, non-diseased, diseased and dead, and estimate the transition hazard parameters using the maximum likelihood approach. Simulation results are presented indicating adequate performance of the proposed approach.
longitudinal data; stochastic model; informative missing; dementia studies
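The three-state structure can be illustrated with a discrete-time simplification. The sketch below uses hypothetical per-step transition probabilities (the paper estimates transition hazards by maximum likelihood, which is not reproduced here) and assumes that study waves occur every two steps, so that a subject can become diseased and die between waves without the disease ever being observed.

```python
STATES = ("non-diseased", "diseased", "dead")

# Hypothetical one-step transition probabilities; rows are the current state.
P = [
    [0.85, 0.10, 0.05],  # non-diseased -> non-diseased / diseased / dead
    [0.00, 0.80, 0.20],  # diseased does not revert in this sketch
    [0.00, 0.00, 1.00],  # dead is absorbing
]

def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]

def n_step(p, n):
    """Return the n-step transition matrix p^n for n >= 1."""
    result = p
    for _ in range(n - 1):
        result = matmul(result, p)
    return result

# Transition probabilities between consecutive waves (two steps apart).
between_waves = n_step(P, 2)

# Probability that a subject non-diseased at one wave is dead by the next.
p_dead_by_next_wave = between_waves[0][2]

# Part of that probability passes through the diseased state unobserved.
p_dead_via_disease = P[0][1] * P[1][2]

if __name__ == "__main__":
    print("dead by next wave:", p_dead_by_next_wave)
    print("of which via unobserved disease:", p_dead_via_disease)
```

With these invented numbers, 0.02 of the 0.1125 probability of death between waves passes through disease, which shows how treating the deceased as missing at random would overlook incident cases.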
Longitudinal imaging studies are essential to understanding the neural development of neuropsychiatric disorders, substance use disorders, and the normal brain. The main objective of this paper is to develop a two-stage adjusted exponentially tilted empirical likelihood (TAETEL) for the spatial analysis of neuroimaging data from longitudinal studies. The TAETEL method allows us to efficiently analyze longitudinal data without correctly modeling temporal correlation and to classify different time-dependent covariate types. To account for spatial dependence, the TAETEL method developed here specifically combines all the data in the neighborhood of each voxel (or pixel) on a 3-dimensional (3D) volume (or 2D surface) with appropriate weights to calculate adaptive parameter estimates and adaptive test statistics. Simulation studies are used to examine the finite sample performance of the adjusted exponentially tilted likelihood ratio statistic and TAETEL. We demonstrate the application of our statistical methods to the detection of the difference in the morphological changes of the hippocampus across time between schizophrenia patients and healthy subjects in a longitudinal schizophrenia study.
Hippocampus shape; longitudinal data; time-dependent covariate; two-stage adjusted exponentially tilted empirical likelihood
The aim of this paper is to develop a semiparametric model for describing the variability of the medial representation of subcortical structures, which belongs to a Riemannian manifold, and establishing its association with covariates of interest, such as diagnostic status, age and gender. We develop a two-stage estimation procedure to calculate the parameter estimates. The first stage is to calculate an intrinsic least squares estimator of the parameter vector using the annealing evolutionary stochastic approximation Monte Carlo algorithm and then the second stage is to construct a set of estimating equations to obtain a more efficient estimate with the intrinsic least squares estimate as the starting point. We use Wald statistics to test linear hypotheses of unknown parameters and establish their limiting distributions. Simulation studies are used to evaluate the accuracy of our parameter estimates and the finite sample performance of the Wald statistics. We apply our methods to the detection of the difference in the morphological changes of the left and right hippocampi between schizophrenia patients and healthy controls using medial shape description.
Intrinsic least squares estimator; Medial representation; Semiparametric model; Wald statistic
Existing methods for joint modeling of longitudinal measurements and survival data can be highly influenced by outliers in the longitudinal outcome. We propose a joint model for analysis of longitudinal measurements and competing risks failure time data which is robust in the presence of outlying longitudinal observations during follow-up. Our model consists of a linear mixed effects sub-model for the longitudinal outcome and a proportional cause-specific hazards frailty sub-model for the competing risks data, linked together by latent random effects. Instead of the usual normality assumption for measurement errors in the linear mixed effects sub-model, we adopt a t-distribution which has a longer tail and thus is more robust to outliers. We derive an EM algorithm for the maximum likelihood estimates of the parameters and estimate their standard errors using a profile likelihood method. The proposed method is evaluated by simulation studies and is applied to a scleroderma lung study.
Cause-specific hazard; EM algorithm; Joint modeling; Longitudinal data; Non-ignorable missing data; Robust inference
A recurrent statistical problem in cell biology is to draw inference about cell kinetics from observations collected at discrete time points. We investigate this problem when multiple cell clones are observed longitudinally over time. The theory of age-dependent branching processes provides an appealing framework for the quantitative analysis of such data. Likelihood inference being difficult in this context, we propose an alternative composite likelihood approach, where the estimating function is defined from the marginal or conditional distributions of the number of cells of each observable cell type. These distributions generally have no closed-form expressions, but they can be approximated using simulations. We construct a bias-corrected version of the estimating function, which also offers computational advantages. Two algorithms are discussed to compute parameter estimates. Large sample properties of the estimator are presented. The performance of the proposed method in finite samples is investigated in simulation studies. An application to the analysis of the generation of oligodendrocytes from oligodendrocyte type-2 astrocyte progenitor cells cultured in vitro reveals the effect of neurotrophin-3 on these cells. Our work also demonstrates that the proposed approach outperforms the existing ones.
Bias correction; Cell differentiation; Composite likelihood; Discrete data; Monte Carlo; Neurotrophin-3; Oligodendrocytes; Precursor cell; Stochastic model
We explore a Bayesian approach to selection of variables that represent fixed and random effects in modeling of longitudinal binary outcomes with missing data caused by dropouts. We show via analytic results for a simple example that nonignorable missing data lead to biased parameter estimates. This bias results in selection of wrong effects asymptotically, which we can confirm via simulations for more complex settings. By jointly modeling the longitudinal binary data with the dropout process that possibly leads to nonignorable missing data, we are able to correct the bias in estimation and selection. Mixture priors with a point mass at zero are used to facilitate variable selection. We illustrate the proposed approach using a clinical trial for acute ischemic stroke.
Bayesian variable selection; Bias; Dropout; Missing data; Model selection
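A mixture prior with a point mass at zero is often written in the following generic form. The symbols ($\beta_j$ for a candidate coefficient, $\pi_j$ for its prior inclusion probability, $\delta_0$ for the point mass at zero, and a normal slab with variance $\tau^2$) are assumptions made for illustration; the paper's actual slab distribution and its treatment of random-effect variances may differ.

```latex
\beta_j \mid \pi_j \;\sim\; (1 - \pi_j)\,\delta_0 \;+\; \pi_j\,\mathcal{N}(0, \tau^2),
\qquad j = 1, \dots, p .

% With an inclusion indicator \gamma_j \sim \mathrm{Bernoulli}(\pi_j):
\beta_j \mid \gamma_j = 0 \;\equiv\; 0,
\qquad
\beta_j \mid \gamma_j = 1 \;\sim\; \mathcal{N}(0, \tau^2).
```

Under this form, the posterior probability $\Pr(\gamma_j = 1 \mid \text{data})$ serves as the evidence for including effect $j$, which is how such priors facilitate variable selection.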