We consider the problem of identifying a subgroup of patients who may have an enhanced treatment effect in a randomized clinical trial; it is desirable that the subgroup be defined by a limited number of covariates. For this problem, the development of a standard, pre-determined strategy may help to avoid the well-known dangers of subgroup analysis. We present a method, referred to as “Virtual Twins”, developed to find subgroups of enhanced treatment effect. It involves predicting response probabilities for treatment and control “twins” for each subject. The difference in these probabilities is then used as the outcome in a classification or regression tree, which can potentially include any set of the covariates. We define a measure Q(Â) to be the difference between the treatment effect in the estimated subgroup Â and the marginal treatment effect. We present several methods for estimating Q(Â): using estimated probabilities in the original data, using estimated probabilities in newly simulated data, two cross-validation-based approaches, and a bootstrap-based bias-corrected approach. Results of a simulation study indicate that the Virtual Twins method noticeably outperforms logistic regression with forward selection when a true subgroup of enhanced treatment effect exists. Generally, large sample sizes or strong enhanced treatment effects are needed for subgroup estimation. As an illustration, we apply the proposed methods to data from a randomized clinical trial.
randomized clinical trials; subgroups; random forests; regression trees; tailored therapeutics
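The two-stage procedure described above (random forest predictions for each subject's treatment and control "twins", then a regression tree on their difference) can be sketched in Python. The simulated data, forest settings, and tree depth below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
Z = rng.integers(0, 2, size=n)  # randomized treatment indicator
# illustrative data-generating model: enhanced effect when X[:, 0] > 0
logit = -0.5 + Z * (0.3 + 1.5 * (X[:, 0] > 0))
Y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Step 1: model P(Y = 1 | X, Z) with a random forest fit on (X, Z)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(np.column_stack([X, Z]), Y)

# Step 2: predict response probabilities for each subject's
# treatment twin (Z = 1) and control twin (Z = 0)
p1 = rf.predict_proba(np.column_stack([X, np.ones(n)]))[:, 1]
p0 = rf.predict_proba(np.column_stack([X, np.zeros(n)]))[:, 1]

# Step 3: regress the twin difference on X with a shallow tree;
# leaves with a large predicted difference define the candidate subgroup
tree = DecisionTreeRegressor(max_depth=2, min_samples_leaf=50, random_state=0)
tree.fit(X, p1 - p0)
subgroup = tree.predict(X) > np.mean(p1 - p0)
```

The shallow tree keeps the subgroup definition to a small number of covariates, in line with the stated goal.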
With advancement in genomic technologies, it is common that two high-dimensional datasets are available, both measuring the same underlying biological phenomenon with different techniques. We consider predicting a continuous outcome Y using X, a set of p markers which is the best available measure of the underlying biological process. This same biological process may also be measured by W, coming from a prior technology but correlated with X. On a moderately sized sample, we have (Y,X,W), and on a larger sample we have (Y,W). We utilize the data on W to boost the prediction of Y by X. When p is large and the subsample containing X is small, this is a p>n situation. When p is small, this is akin to the classical measurement error problem; however, ours is not the typical goal of calibrating W for use in future studies. We propose to shrink the regression coefficients β of Y on X toward different targets that use information derived from W in the larger dataset. We compare these proposals with the classical ridge regression of Y on X, which does not use W. We also unify all of these methods as targeted ridge estimators. Finally, we propose a hybrid estimator which is a linear combination of multiple estimators of β. With an optimal choice of weights, the hybrid estimator balances efficiency and robustness in a data-adaptive way to theoretically yield a smaller prediction error than any of its constituents. The methods, including a fully Bayesian alternative, are evaluated via simulation studies. We also apply them to a gene-expression dataset. mRNA expression measured via quantitative real-time polymerase chain reaction is used to predict survival time in lung cancer patients, with auxiliary information from microarray technology available on a larger sample.
Cross-validation; Generalized ridge; Mean squared prediction error; Measurement error
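The targeted ridge idea, shrinking the coefficients of Y on X toward a target built from the auxiliary W sample rather than toward zero, can be sketched as follows. The penalty value and the simple choice of target (coefficients of Y on W in the larger sample) are illustrative assumptions, not the paper's specific proposals:

```python
import numpy as np

rng = np.random.default_rng(1)

def targeted_ridge(X, y, beta_target, lam):
    """Minimize ||y - X b||^2 + lam * ||b - beta_target||^2;
    beta_target = 0 recovers classical ridge regression."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p),
                           X.T @ y + lam * beta_target)

# moderate sample with (Y, X); larger auxiliary sample with (Y, W)
n, n_aux, p = 40, 400, 10
beta = rng.normal(size=p)
X = rng.normal(size=(n, p))
y = X @ beta + rng.normal(size=n)
W = rng.normal(size=(n_aux, p))                    # prior-technology markers
y_aux = W @ beta + rng.normal(scale=1.5, size=n_aux)

# one simple target: coefficients of Y on W from the auxiliary sample
gamma = np.linalg.lstsq(W, y_aux, rcond=None)[0]

b_classical = targeted_ridge(X, y, np.zeros(p), lam=5.0)  # shrink toward 0
b_targeted = targeted_ridge(X, y, gamma, lam=5.0)         # shrink toward W-based target
```

As the penalty grows, the targeted estimator moves from the least-squares fit of Y on X toward the W-derived target, which is how the auxiliary sample enters the estimate.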
Health behaviors have been shown to be associated with recurrence risk and survival rates in cancer patients and are also associated with Interleukin-6 levels, but few epidemiologic studies have investigated the relationship between health behaviors and Interleukin-6 in cancer populations. The purpose of this study was to examine the relationship between pretreatment Interleukin-6 levels and five health behaviors in persons with head and neck cancer: smoking, alcohol problems, body mass index (a marker of nutritional status), physical activity, and sleep.
Patients (N=409) were recruited in otolaryngology clinic waiting rooms and invited to complete written surveys. A medical record audit was also conducted. Descriptive statistics and multivariate analyses were conducted to determine which health behaviors were associated with higher Interleukin-6 levels among newly diagnosed head and neck cancer patients, controlling for demographic and clinical variables.
While smoking, alcohol problems, body mass index, physical activity, and sleep were associated with Interleukin-6 levels in bivariate analysis, only smoking (current and former) and decreased sleep were independent predictors of higher Interleukin-6 levels in multivariate regression analysis. Covariates associated with higher Interleukin-6 levels were age and higher tumor stage, while comorbidities were marginally significant.
Health behaviors, particularly smoking and sleep disturbances, are associated with higher Interleukin-6 levels among head and neck cancer patients.
Treating health behavior problems, especially smoking and sleep disturbances, may help decrease Interleukin-6 levels, which in turn could have a beneficial effect on overall cancer treatment outcomes.
head and neck/oral cancers; tobacco; cytokines; diet, alcohol, smoking, and other lifestyle risk factors; molecular markers in prevention research
The increasing availability and use of predictive models to facilitate informed decision making highlights the need for careful assessment of the validity of these models. In particular, models involving biomarkers require careful validation for two reasons: issues with overfitting when complex models involve a large number of biomarkers, and inter-laboratory variation in assays used to measure biomarkers. In this paper we distinguish between internal and external statistical validation. Internal validation, involving training-testing splits of the available data or cross-validation, is a necessary component of the model building process and can provide valid assessments of model performance. External validation consists of assessing model performance on one or more datasets collected by different investigators from different institutions. External validation is a more rigorous procedure necessary for evaluating whether the predictive model will generalize to populations other than the one on which it was developed. We stress the need for an external dataset to be truly external, that is, to play no role in model development and ideally be completely unavailable to the researchers building the model. In addition to reviewing different types of validation, we describe different types and features of predictive models and strategies for model building, as well as measures appropriate for assessing their performance in the context of validation. No single measure can characterize the different components of the prediction, and the use of multiple summary measures is recommended.
To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node (SLN) biopsy (SLNB). Several statistical models based on patient and tumor characteristics have been proposed, including logistic regression, classification trees, random forests, and support vector machines. We sought to validate recently published models intended to predict sentinel node status.
We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based on four published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR).
Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however, the model’s specificity was not high enough to significantly reduce the rate of biopsies (SLN biopsy reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and biopsy-reduction rates that were lower (87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software.
Published models intended to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset or could not be validated. Differences in selection criteria and histopathologic interpretation likely account for the underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and, ultimately, clinical utility.
This study was designed to (1) describe the demographics and (2) determine the efficacy of a head and neck cancer screening program in order to optimize future programs.
After IRB approval, we conducted a retrospective cohort study to review a single institution’s 14-year experience (1996–2009) conducting a free annual head and neck cancer screening clinic. Available demographic and clinical data, as well as clinical outcomes, were analyzed for all participants (n=761). The primary outcome was the presence of a finding suspicious for head and neck cancer on screening evaluation.
Five percent of participants had findings suspicious for head and neck cancer on screening evaluation, and malignant or pre-malignant lesions were confirmed in one percent of participants. Lack of insurance (p=.05), tobacco use (p<.001), male gender (p=.03), separated marital status (p=.03), and younger age (p=.04) were the significant demographic predictors of a lesion suspicious for malignancy. Patients complaining of a neck mass (p<.001) or oral pain (p<.001) were significantly more likely to have findings suspicious of malignancy. A high percentage of participants (40%) were diagnosed with benign otolaryngologic pathologies on screening evaluation.
A minority of patients presenting to a head and neck cancer screening clinic will have a suspicious lesion identified. Given these findings, in order to achieve maximal potential benefit, future head and neck cancer screening clinics should target patients with identifiable risk factors and take full advantage of opportunities for education and prevention.
This study is designed to (1) determine the perceived quality of care received by patients with head and neck cancer at the end of their lives, in order to (2) better anticipate and improve upon the experiences of future patients.
Single-institution, academic tertiary care medical center.
Subjects and Methods
A validated survey instrument, the Family Assessment of Treatment at the End of life (FATE), was administered to families of patients who died of head and neck cancer (n=58). The primary outcome was the overall FATE score. Independent variables included clinical characteristics, treatments received and the care provided at the time of death.
Overall FATE scores and the domains assessing management of symptoms and care at the time of death did not vary by disease status (locoregional vs. distant metastasis) at the end of life (p=.989). The location of death in the home or in hospice (vs. hospital) significantly improved scores in all three categories (p=.023). Involvement of a palliative care team improved the care at the time of death (p<.001), and palliative treatments (radiation and/or chemotherapy) improved scores in management of symptoms and care at the time of death (p=.011, p=.017).
The FATE survey is a useful measure of the end of life experience of head and neck cancer patients. Palliative treatments of head and neck cancer, death outside of the hospital and palliative care team involvement all improve the end of life experience in this population.
Head and neck cancer; quality of life; end of life care
In this paper, we consider estimation of survivor functions from groups of observations with right-censored data when the groups are subject to a stochastic ordering constraint. Many methods and algorithms have been proposed to estimate distribution functions under such restrictions, but none have completely satisfactory properties when the observations are censored. We propose a pointwise constrained nonparametric maximum likelihood estimator, which is defined at each time t by the estimates of the survivor functions subject to constraints applied at time t only. We also propose an efficient method to obtain the estimator. The estimator of each constrained survivor function is shown to be nonincreasing in t, and its consistency and asymptotic distribution are established. A simulation study suggests better small and large sample properties than for alternative estimators. An example using prostate cancer data illustrates the method.
Censored data; Constrained nonparametric maximum likelihood estimator; Kaplan–Meier estimator; Maximum likelihood estimator; Order restriction
Patients who were previously treated for prostate cancer with radiation therapy are monitored at regular intervals using a laboratory test called Prostate Specific Antigen (PSA). If the value of the PSA test starts to rise, this is an indication that the prostate cancer is more likely to recur, and the patient may wish to initiate new treatments. Such patients could be helped in making medical decisions by an accurate estimate of the probability of recurrence of the cancer in the next few years. In this paper, we describe the methodology for giving the probability of recurrence for a new patient, as implemented on a web-based calculator. The methods use a joint longitudinal survival model. The model is developed on a training dataset of 2,386 patients and tested on a dataset of 846 patients. Bayesian estimation methods are used with one Markov chain Monte Carlo (MCMC) algorithm developed for estimation of the parameters from the training dataset and a second quick MCMC developed for prediction of the risk of recurrence that uses the longitudinal PSA measures from a new patient.
Joint longitudinal-survival model; Online calculator; Predicted probability; Prostate cancer; PSA
Randomized trials with dropouts or censored data and discrete time-to-event outcomes are frequently analyzed using the Kaplan–Meier or product limit (PL) estimation method. However, the PL method assumes that the censoring mechanism is noninformative, and when this assumption is violated, the inferences may not be valid. We propose an expanded PL method using a Bayesian framework to incorporate an informative censoring mechanism and perform sensitivity analysis on estimates of the cumulative incidence curves. The expanded method uses a model, which can be viewed as a pattern mixture model, where the odds of having an event during the follow-up interval (tk−1,tk], conditional on being at risk at tk−1, differ across the patterns of missing data. The sensitivity parameters relate the odds of an event between subjects from a missing-data pattern and the observed subjects for each interval. The large number of sensitivity parameters is reduced by treating them as random and assuming they follow a log-normal distribution with prespecified mean and variance. We then vary the mean and variance to explore the sensitivity of inferences. The missing at random (MAR) mechanism is a special case of the expanded model, thus allowing exploration of the sensitivity of inferences to departures from the MAR assumption. The proposed approach is applied to data from the TRial Of Preventing HYpertension.
Clinical trials; Hypertension; Ignorability index; Missing data; Pattern-mixture model; TROPHY trial
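One way the interval-by-interval odds idea could be operationalized is sketched below. This is a simplified, hypothetical version with a single fixed odds ratio applied to all intervals, rather than the paper's random, log-normally distributed sensitivity parameters:

```python
def expanded_pl(d, n_obs, n_miss, odds_ratio):
    """Discrete-time cumulative incidence with a dropout sensitivity parameter.
    d[k]       events among observed subjects at risk in interval k
    n_obs[k]   observed subjects at risk in interval k
    n_miss[k]  dropped-out subjects assumed still at risk in interval k
    odds_ratio assumed ratio of event odds, dropouts vs. observed subjects
               (odds_ratio = 1 recovers a MAR-type analysis)
    Assumes d[k] < n_obs[k], i.e. the observed hazard is below 1."""
    surv = 1.0
    for dk, nk, mk in zip(d, n_obs, n_miss):
        h_obs = dk / nk                               # observed discrete hazard
        odds = odds_ratio * h_obs / (1.0 - h_obs)     # shifted odds for dropouts
        h_miss = odds / (1.0 + odds)                  # implied dropout hazard
        h = (nk * h_obs + mk * h_miss) / (nk + mk)    # pooled interval hazard
        surv *= 1.0 - h
    return 1.0 - surv                                 # cumulative incidence
```

Varying `odds_ratio` away from 1 then traces out how the estimated cumulative incidence changes as the informativeness of dropout increases, which is the spirit of the sensitivity analysis.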
Motivation: RNA sequencing (RNA-Seq) is a powerful new technology for mapping and quantifying transcriptomes using ultra-high-throughput next-generation sequencing. Using deep sequencing, gene expression levels of all transcripts, including novel ones, can be quantified digitally. Although extremely promising, RNA-Seq poses challenges for data analysis: the massive amounts of data generated, substantial biases, and uncertainty in short-read alignment. In particular, large base-specific variation and between-base dependence make simple approaches, such as those that use averaging to normalize RNA-Seq data and quantify gene expression, ineffective.
Results: In this study, we propose a Poisson mixed-effects (POME) model to characterize base-level read coverage within each transcript. The underlying expression level is included as a key parameter in this model. Since the proposed model is capable of incorporating base-specific variation as well as between-base dependence that affect read coverage profile throughout the transcript, it can lead to improved quantification of the true underlying expression level.
Availability and implementation: POME can be freely downloaded at http://www.stat.purdue.edu/~yuzhu/pome.html.
Contact: email@example.com; firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
Genetic anticipation, described by earlier age of onset (AOO) and more aggressive symptoms in successive generations, is a phenomenon noted in certain hereditary diseases. Its extent may vary between families and/or between mutation sub-types known to be associated with the disease phenotype. In this paper, we posit a Bayesian approach to infer genetic anticipation under flexible random effects models for censored data that capture the effect of successive generations on AOO. Primary interest lies in the random effects. Misspecifying the distribution of random effects may result in incorrect inferential conclusions. We compare the fit of four candidate random effects distributions via Bayesian model fit diagnostics. A related statistical issue here is isolating the confounding effect of changes in secular trends, screening and medical practices that may affect time to disease detection across birth cohorts. Using historic cancer registry data, we borrow from relative survival analysis methods to adjust for changes in age-specific incidence across birth cohorts. Our motivating case-study comes from a Danish cancer register of 124 families with mutations in mismatch repair genes known to cause hereditary non-polyposis colorectal cancer, also called Lynch syndrome. We find evidence for a decrease in AOO between generations in this study. Our model predicts family level anticipation effects which are potentially useful in genetic counseling clinics for high risk families.
Birth-death process; Brier score; Conditional predictive ordinate; Deviance information criterion; Dirichlet Process; Hereditary non-polyposis colorectal cancer; Prediction of random effects; Relative survival analysis
In longitudinal biomedical studies, there is often interest in the rate functions, which describe the functional rates of change of biomarker profiles. This paper proposes a semiparametric approach to model these functions as the realizations of stochastic processes defined by stochastic differential equations. These processes are dependent on the covariates of interest and vary around a specified parametric function. An efficient Markov chain Monte Carlo algorithm is developed for inference. The proposed method is compared with several existing methods in terms of goodness-of-fit and more importantly the ability to forecast future functional data in a simulation study. The proposed methodology is applied to prostate-specific antigen profiles for illustration. Supplementary materials for this paper are available online.
Euler approximation; Functional data analysis; Gaussian process; Rate function; Stochastic differential equation; Semiparametric stochastic velocity model
In clinical trials, a biomarker (S) that is measured after randomization and is strongly associated with the true endpoint (T) can often provide information about T and hence the effect of a treatment (Z) on T. A useful biomarker can be measured earlier than T and cost less than T. In this paper we consider the use of S as an auxiliary variable and examine the information recovery from using S for estimating the treatment effect on T, when S is completely observed and T is partially observed. In an ideal but often unrealistic setting, when S satisfies Prentice’s definition for perfect surrogacy, there is the potential for substantial gain in precision by using data from S to estimate the treatment effect on T. When S is not close to a perfect surrogate, it can provide substantial information only under particular circumstances. We propose to use a targeted shrinkage regression approach that data-adaptively takes advantage of the potential efficiency gain yet avoids the need to make a strong surrogacy assumption. Simulations show that this approach strikes a balance between bias and efficiency gain. Compared with competing methods, it has better mean squared error properties and can achieve substantial efficiency gain, particularly in a common practical setting when S captures much but not all of the treatment effect and the sample size is relatively small. We apply the proposed method to a glaucoma data example.
Auxiliary Variable; Biomarker; Randomized Trials; Ridge Regression; Missing Data
This paper presents a new modeling strategy in functional data analysis. We consider the problem of estimating an unknown smooth function given functional data with noise. The unknown function is treated as the realization of a stochastic process, which is incorporated into a diffusion model. The method of smoothing spline estimation is connected to a special case of this approach. The resulting models offer great flexibility to capture the dynamic features of functional data, and allow straightforward and meaningful interpretation. The likelihood of the models is derived with Euler approximation and data augmentation. A unified Bayesian inference method is carried out via a Markov Chain Monte Carlo algorithm including a simulation smoother. The proposed models and methods are illustrated on some prostate specific antigen data, where we also show how the models can be used for forecasting.
Diffusion model; Euler approximation; Nonparametric regression; Simulation smoother; Stochastic differential equation; Stochastic velocity model
Intermediate outcome variables can often be used as auxiliary variables for the true outcome of interest in randomized clinical trials. For many cancers, time to recurrence is an informative marker in predicting a patient’s overall survival outcome, and could provide auxiliary information for the analysis of survival times.
To investigate whether models linking recurrence and death combined with a multiple imputation procedure for censored observations can result in efficiency gains in the estimation of treatment effects, and be used to shorten trial lengths.
Recurrence and death times are modeled using data from 12 trials in colorectal cancer. Multiple imputation is used as a strategy for handling missing values arising from censoring. The imputation procedure uses a cure model for time to recurrence and a time-dependent Weibull proportional hazards model for time to death. Recurrence times are imputed, and then death times are imputed conditionally on recurrence times. To illustrate these methods, trials are artificially censored 2 years after the last accrual, the imputation procedure is implemented, and a log-rank test and Cox model are used to analyze and compare these new data with the original data.
The results show modest but consistent gains in efficiency in the analysis from using the auxiliary information in recurrence times. Comparison of analyses shows the treatment effect estimates and log-rank test results from the 2-year censored imputed data to be in between the estimates from the original data and the artificially censored data, indicating that the procedure was able to recover some of the information lost to censoring.
The models used are all fully parametric, requiring distributional assumptions of the data.
The proposed models may be useful for improving the efficiency of treatment effect estimation in cancer trials and for shortening trial length.
Auxiliary Variables; Colon Cancer; Cure Models; Multiple Imputation; Surrogate Endpoints
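The core step of imputing a censored time from a parametric survival model, drawing from a Weibull distribution conditional on exceeding the observed censoring time, can be sketched as follows. The Weibull parameters here are illustrative; the paper's procedure additionally involves a cure model for recurrence and a covariate-dependent hazard for death:

```python
import numpy as np

rng = np.random.default_rng(2)

def draw_weibull_tail(shape, scale, c, rng):
    """Draw T ~ Weibull(shape, scale) conditional on T > c, by inverting
    the conditional survivor function S(t)/S(c),
    where S(t) = exp(-(t / scale) ** shape)."""
    u = 1.0 - rng.uniform()                 # uniform on (0, 1]
    # solve S(t) = u * S(c) for t; the log-scale form avoids underflow
    return scale * ((c / scale) ** shape - np.log(u)) ** (1.0 / shape)

# impute censored recurrence times; death times would then be drawn
# conditionally on the imputed recurrence times (not shown)
times = np.array([1.2, 3.4, 0.8, 2.5])
event = np.array([True, False, True, False])   # False = censored
imputed = times.copy()
for i in np.where(~event)[0]:
    imputed[i] = draw_weibull_tail(1.5, 3.0, times[i], rng)
```

Repeating the draw several times, and analyzing each completed dataset, is what makes this a multiple imputation procedure.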
Diet is associated with cancer prognosis, including in head and neck cancer (HNC), and has been hypothesized to influence epigenetic state by determining the availability of functional groups involved in the modification of DNA and histone proteins. The goal of this study was to describe the association between pretreatment diet and HNC tumor DNA methylation. Information on usual pretreatment food and nutrient intake was estimated via food frequency questionnaire (FFQ) for 49 HNC cases. Tumor DNA methylation patterns were assessed using the Illumina GoldenGate Methylation Cancer Panel. First, a methylation score, the sum of individual hypermethylated tumor suppressor-associated CpG sites, was calculated and associated with dietary intake of micronutrients involved in one-carbon metabolism and antioxidant activity, and with food groups abundant in these nutrients. Second, gene-specific analyses using linear modeling with empirical Bayesian variance estimation were conducted to identify whether methylation at individual CpG sites was associated with diet. All models were controlled for age, sex, smoking, alcohol, and HPV status. Individuals reporting in the highest quartile of folate, vitamin B12, and vitamin A intake, compared with those in the lowest quartile, showed significantly less tumor suppressor gene methylation, as did patients reporting the highest cruciferous vegetable intake. Gene-specific analyses identified differential associations between DNA methylation and vitamin B12 and vitamin A intake when stratifying by HPV status. These preliminary results suggest that intake of folate, vitamin A, and vitamin B12 may be associated with the tumor DNA methylation profile in HNC and enhance tumor suppression.
DNA methylation; diet; tumor suppressor; folate; vitamin B12
When the true end points (T) are difficult or costly to measure, surrogate markers (S) are often collected in clinical trials to help predict the effect of the treatment (Z). There is great interest in understanding the relationship among S, T, and Z. A principal stratification (PS) framework has been proposed by Frangakis and Rubin (2002) to study their causal associations. In this paper, we extend the framework to a multiple trial setting and propose a Bayesian hierarchical PS model to assess surrogacy. We apply the method to data from a large collection of colon cancer trials in which S and T are binary. We obtain the trial-specific causal measures among S, T, and Z, as well as their overall population-level counterparts that are invariant across trials. The method allows for information sharing across trials and reduces the nonidentifiability problem. We examine the frequentist properties of our model estimates and the impact of the monotonicity assumption using simulations. We also illustrate the challenges in evaluating surrogacy in the counterfactual framework that result from nonidentifiability.
Bayesian estimation; Counterfactual model; Identifiability; Multiple trials; Principal stratification; Surrogate marker
A surrogate marker (S) is a variable that can be measured earlier and often easier than the true endpoint (T) in a clinical trial. Most previous research has been devoted to developing surrogacy measures to quantify how well S can replace T or examining the use of S in predicting the effect of a treatment (Z). However, the research often requires one to fit models for the distribution of T given S and Z. It is well known that such models do not have causal interpretations because the models condition on a post-randomization variable S. In this paper, we directly model the relationship among T, S and Z using a potential outcomes framework introduced by Frangakis and Rubin (2002). We propose a Bayesian estimation method to evaluate the causal probabilities associated with the cross-classification of the potential outcomes of S and T when S and T are both binary. We use a log-linear model to directly model the association between the potential outcomes of S and T through the odds ratios. The quantities derived from this approach always have causal interpretations. However, this causal model is not identifiable from the data without additional assumptions. To reduce the non-identifiability problem and increase the precision of statistical inferences, we assume monotonicity and incorporate prior belief that is plausible in the surrogate context by using prior distributions. We also explore the relationship among the surrogacy measures based on traditional models and this counterfactual model. The method is applied to the data from a glaucoma treatment study.
Bayesian Estimation; Counterfactual Model; Randomized Trial; Surrogate Marker
Purpose. To screen for depression, sleep-related disturbances, and anxiety in patients diagnosed with adenocarcinoma of the pancreas. Materials and Methods. Patients were evaluated at initial consultation and subsequent visits at the multidisciplinary pancreatic cancer clinic at our University Cancer Center. Cross-sectional and longitudinal psychosocial distress was assessed using the Personal Health Questionnaire 9 (PHQ9) to screen for depression and monitor symptoms, the Penn State Worry Questionnaire (PSWQ) for generalized anxiety, and the University of Michigan Sleep Questionnaire to monitor sleep symptoms. Results. Twenty-two patients diagnosed with pancreatic cancer participated during the 6-month pilot study, with longitudinal follow-up for thirteen patients. In this study, mild-to-moderate depressive symptoms, anxiety, and potential sleep problems were common. The main finding was that 23% of the patients in this pilot project screened positive for moderately severe major depressive symptoms, a likely anxiety disorder, or a potential sleep disorder during the study. One patient screened positive for moderately severe depressive symptoms in longitudinal follow-up. Conclusions. Depression, anxiety, and sleep problems are evident in patients with pancreatic cancer. Prospective, longitudinal studies with larger groups of patients are needed to determine whether these comorbid symptoms affect outcome and clinical course.
There has been substantive interest in the assessment of surrogate endpoints in medical research. These are measures which could potentially replace “true” endpoints in clinical trials and lead to studies that require less follow-up. Recent research in the area has focused on assessments using causal inference frameworks. Beginning with a simple model for associating the surrogate and true endpoints in the population, we approach the problem as one of endogenous covariates. An instrumental variables estimator and general two-stage algorithm is proposed. Existing surrogacy frameworks are then evaluated in the context of the model. In addition, we define an extended relative effect estimator as well as a sensitivity analysis for assessing what we term the treatment instrumentality assumption. A numerical example is used to illustrate the methodology.
Clinical Trial; Counterfactual; Nonlinear response; Prentice Criterion; Structural equations model
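A generic linear two-stage (instrumental variables) estimator of the kind described above can be sketched on simulated data, with randomized treatment Z serving as the instrument for the endogenous surrogate S. The data-generating coefficients are illustrative assumptions, not results from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
Z = rng.integers(0, 2, size=n).astype(float)   # randomized treatment (the instrument)
U = rng.normal(size=n)                         # unmeasured confounder of S and T
S = 1.0 * Z + U + rng.normal(size=n)           # surrogate, endogenous because of U
T = 2.0 * S + U + rng.normal(size=n)           # true endpoint; causal effect of S is 2

# naive regression of T on S is biased by the shared confounder U
D_naive = np.column_stack([np.ones(n), S])
naive = np.linalg.lstsq(D_naive, T, rcond=None)[0][1]

# Stage 1: project the surrogate on the instrument
D1 = np.column_stack([np.ones(n), Z])
S_hat = D1 @ np.linalg.lstsq(D1, S, rcond=None)[0]

# Stage 2: regress the true endpoint on the fitted surrogate
D2 = np.column_stack([np.ones(n), S_hat])
beta = np.linalg.lstsq(D2, T, rcond=None)[0]   # beta[1] estimates the causal effect
```

Because randomization makes Z independent of the confounder, the two-stage estimate of the S-to-T coefficient is consistent where the naive regression is not; this is the sense in which the surrogate is handled as an endogenous covariate.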
We consider using observational data to estimate the effect of a treatment on disease recurrence, when the decision to initiate treatment is based on longitudinal factors associated with the risk of recurrence. The effect of salvage androgen deprivation therapy (SADT) on the risk of recurrence of prostate cancer is inadequately described by existing literature. Furthermore, standard Cox regression yields biased estimates of the effect of SADT, since it is necessary to adjust for prostate-specific antigen (PSA), which is a time-dependent confounder and an intermediate variable. In this paper, we describe and compare two methods which appropriately adjust for PSA in estimating the effect of SADT. The first method is a two-stage method which jointly estimates the effect of SADT and the hazard of recurrence in the absence of treatment by SADT. In the first stage, PSA is predicted in the absence of SADT, and in the second stage, a time-dependent Cox model is used to estimate the benefit of SADT, adjusting for PSA. The second method, called sequential stratification, reorganizes the data to resemble a sequence of experiments in which treatment is conditionally randomized given the time-dependent covariates. Strata are formed, each consisting of a patient undergoing SADT and a set of appropriately matched controls, and analysis proceeds via stratified Cox regression. Both methods are applied to data from patients initially treated with radiation therapy for prostate cancer and give similar SADT effect estimates.
treatment by indication; time-dependent confounder; proportional hazards model; causal effect; prostate cancer
Prostate-specific antigen (PSA) is a biomarker routinely and repeatedly measured on prostate cancer patients treated by radiation therapy (RT). It was shown recently that its whole pattern over time, rather than just its current level, was strongly associated with prostate cancer recurrence. To more accurately guide clinical decision making, monitoring of PSA after RT would be aided by dynamic, powerful prognostic tools that incorporate the complete posttreatment PSA evolution. In this work, we propose a dynamic prognostic tool derived from a joint latent class model and provide a measure of variability obtained from the asymptotic distribution of the parameters. To validate this prognostic tool, we consider predictive accuracy measures and provide an empirical estimate of their variability. We also show how to use them in the longitudinal context to compare the dynamic prognostic tool we developed with a proportional hazards model including either baseline covariates or baseline covariates and the expected level of PSA at the time of prediction in a landmark model. Using data from three large cohorts of patients treated after the diagnosis of prostate cancer, we show that the dynamic prognostic tool based on the joint model reduces the error of prediction and offers a powerful tool for individual prediction.
Error of prediction; Joint latent class model; Mixed model; Posterior probability; Predictive accuracy; Prostate cancer prognosis
The treatment effect in a colorectal polyp prevention trial is often evaluated from the colorectal adenoma recurrence status at the end of the trial. However, early colonoscopy in some participants complicates estimation of the recurrence rate at the end of the study. The early colonoscopy may be informative of recurrence status and introduce informative differential follow-up into the data. In this paper we use mid-point imputation to handle interval-censored observations. We then apply a weighted Kaplan-Meier method to the imputed data to adjust for potential informative differential follow-up while estimating the recurrence rate at the end of the trial. In addition, we modify the weighted Kaplan-Meier method to handle multiple prognostic covariates by deriving a risk score of recurrence from a working logistic regression model and then using the risk score to define risk groups for weighted Kaplan-Meier estimation. We argue that mid-point imputation produces an unbiased estimate of the recurrence rate at the end of the trial under the assumption that censoring depends only on the status of early colonoscopy. The method described here is illustrated with an example from a colon polyp prevention study.
current status data; mid-point imputation; weighted Kaplan-Meier estimator
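A minimal sketch of mid-point imputation followed by a (possibly weighted) Kaplan-Meier estimate of the end-of-trial recurrence rate is given below. The `weights` argument is only a hypothetical hook for the risk-group weighting described, not the paper's exact estimator:

```python
import numpy as np

def midpoint_impute(left, right):
    """Impute an interval-censored event time at the interval mid-point.
    right = inf marks right-censoring; those subjects keep time `left`
    and are flagged as censored (event = False)."""
    left, right = np.asarray(left, float), np.asarray(right, float)
    event = np.isfinite(right)
    time = np.where(event, (left + right) / 2.0, left)
    return time, event

def km_recurrence_rate(time, event, t_end, weights=None):
    """Kaplan-Meier estimate of P(recurrence by t_end). Subjects are
    processed one at a time in time order, so tied event times are
    handled sequentially (a per-subject approximation to grouped KM)."""
    if weights is None:
        weights = np.ones_like(time)
    order = np.argsort(time)
    time, event, weights = time[order], event[order], weights[order]
    surv, at_risk = 1.0, weights.sum()
    for t, e, w in zip(time, event, weights):
        if t > t_end:
            break
        if e:
            surv *= 1.0 - w / at_risk   # hazard contribution of this event
        at_risk -= w                    # leave the risk set after time t
    return 1.0 - surv
```

In the risk-group version described in the abstract, the weights would be derived from a working logistic regression risk score rather than set to one per subject.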