1.  Quantile Regression for Recurrent Gap Time Data 
Biometrics  2013;69(2):375-385.
Summary
Evaluating covariate effects on gap times between successive recurrent events is of interest in many medical and public health studies. While most existing methods for recurrent gap time analysis focus on modeling the hazard function of gap times, a direct interpretation of the covariate effects on the gap times is not available through these methods. In this article, we consider quantile regression that can provide direct assessment of covariate effects on the quantiles of the gap time distribution. Following the spirit of the weighted risk-set method by Luo and Huang (2011, Statistics in Medicine 30, 301–311), we extend the martingale-based estimating equation method considered by Peng and Huang (2008, Journal of the American Statistical Association 103, 637–649) for univariate survival data to analyze recurrent gap time data. The proposed estimation procedure can be easily implemented in existing software for univariate censored quantile regression. Uniform consistency and weak convergence of the proposed estimators are established. Monte Carlo studies demonstrate the effectiveness of the proposed method. An application to data from the Danish Psychiatric Central Register is presented to illustrate the methods developed in this article.
doi:10.1111/biom.12010
PMCID: PMC4123128  PMID: 23489055
Clustered survival data; Data perturbation; Gap times; Quantile regression; Recurrent events; Within-cluster resampling
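The estimation procedure above reduces to repeated calls to software for univariate censored quantile regression. As an illustration of that univariate building block only (not the authors' weighted risk-set extension to recurrent gaps), the sketch below fits the Peng-Huang estimator with the quantreg package on simulated data; the variable names and data-generating model are hypothetical.

```r
## Univariate censored quantile regression (Peng-Huang estimator) via quantreg::crq
library(quantreg)
library(survival)

set.seed(1)
n <- 300
x <- rbinom(n, 1, 0.5)                  # hypothetical binary covariate
t <- rexp(n, rate = exp(-0.5 * x))      # latent gap times depend on x
c <- rexp(n, rate = 0.2)                # independent censoring times
y <- pmin(t, c)                         # observed time
d <- as.numeric(t <= c)                 # event indicator

fit <- crq(Surv(y, d) ~ x, method = "PengHuang")
coef(fit, taus = c(0.25, 0.5, 0.75))    # covariate effects on the time quantiles
```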
2.  Statistical Methods for Analyzing Right-censored Length-biased Data under Cox Model 
Biometrics  2009;66(2):382-392.
Summary
Length-biased time-to-event data are commonly encountered in applications ranging from epidemiologic cohort studies or cancer prevention trials to studies of labor economy. A longstanding statistical problem is how to assess the association of risk factors with survival in the target population given the observed length-biased data. In this paper, we demonstrate how to estimate these effects under the semiparametric Cox proportional hazards model. The structure of the Cox model is changed under length-biased sampling in general. Although the existing partial likelihood approach for left-truncated data can be used to estimate covariate effects, it may not be efficient for analyzing length-biased data. We propose two estimating equation approaches for estimating the covariate coefficients under the Cox model. We use the modern stochastic process and martingale theory to develop the asymptotic properties of the estimators. We evaluate the empirical performance and efficiency of the two methods through extensive simulation studies. We use data from a dementia study to illustrate the proposed methodology, and demonstrate the computational algorithms for point estimates, which can be directly linked to the existing functions in S-PLUS or R.
doi:10.1111/j.1541-0420.2009.01287.x
PMCID: PMC3035941  PMID: 19522872
Cox model; Dependent censoring; Estimating equation; Length-biased
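The abstract notes that the standard partial-likelihood approach for left-truncated data can be applied to length-biased samples, even if it is not fully efficient. A minimal R sketch of that benchmark analysis follows, on simulated prevalent-cohort data rather than the dementia study's data; the paper's more efficient estimating-equation estimators are not implemented here.

```r
## Delayed-entry (left-truncated) Cox fit as a benchmark for length-biased data
library(survival)

set.seed(2)
N <- 2000
age <- rnorm(N, 75, 6)
onset_to_fail <- rexp(N, rate = 0.08 * exp(0.04 * (age - 75)))  # failure time from onset
entry <- runif(N, 0, 15)                                         # onset-to-sampling time
keep  <- onset_to_fail > entry                                   # only prevalent cases are sampled
d <- data.frame(age = age[keep], entry = entry[keep], fail = onset_to_fail[keep])
d$cens   <- d$entry + rexp(nrow(d), 0.05)                        # residual censoring after entry
d$status <- as.numeric(d$fail <= d$cens)
d$time   <- pmin(d$fail, d$cens)

# truncation time enters as the "start" of the counting-process interval
fit <- coxph(Surv(entry, time, status) ~ age, data = d)
summary(fit)
```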
3.  Empirical comparison of methods for analyzing multiple time-to-event outcomes in a non-inferiority trial: a breast cancer study 
Background
Subjects with breast cancer enrolled in trials may experience multiple events such as local recurrence, distant recurrence or death. These events are not independent; the occurrence of one may increase the risk of another, or prevent another from occurring. The most commonly used Cox proportional hazards (Cox-PH) model ignores the relationships between events, resulting in a potential impact on the treatment effect and conclusions. The use of statistical methods to analyze multiple time-to-event outcomes has mainly been focused on superiority trials. However, their application to non-inferiority trials is limited. We evaluate four statistical methods for multiple time-to-event endpoints in the context of a non-inferiority trial.
Methods
Three methods for analyzing multiple events data, namely, i) the competing risks (CR) model, ii) the marginal model, and iii) the frailty model were compared with the Cox-PH model using data from a previously-reported non-inferiority trial comparing hypofractionated radiotherapy with conventional radiotherapy for the prevention of local recurrence in patients with early stage breast cancer who had undergone breast conserving surgery. These methods were also compared using two simulated examples, scenario A, where the hazards for distant recurrence and death were higher in the control group, and scenario B, where the hazards of distant recurrence and death were higher in the experimental group. Both scenarios were designed to have a non-inferiority margin of 1.50.
Results
In the breast cancer trial, the methods produced primary outcome results similar to those using the Cox-PH model: namely, a local recurrence hazard ratio (HR) of 0.95 and a 95% confidence interval (CI) of 0.62 to 1.46. In Scenario A, non-inferiority was observed with the Cox-PH model (HR = 1.04; CI of 0.80 to 1.35), but not with the CR model (HR = 1.37; CI of 1.06 to 1.79), and the average marginal and frailty model showed a positive effect of the experimental treatment. The results in Scenario A contrasted with Scenario B with non-inferiority being observed with the CR model (HR = 1.10; CI of 0.87 to 1.39), but not with the Cox-PH model (HR = 1.46; CI of 1.15 to 1.85), and the marginal and frailty model showed a negative effect of the experimental treatment.
Conclusion
When subjects are at risk for multiple events in non-inferiority trials, researchers need to consider using the CR, marginal and frailty models in addition to the Cox-PH model in order to provide additional information in describing the disease process and to assess the robustness of the results. In the presence of competing risks, the Cox-PH model is appropriate for investigating the biologic effect of treatment, whereas the CR model yields the actual effect of treatment in the study.
doi:10.1186/1471-2288-13-44
PMCID: PMC3610213  PMID: 23517401
Non-inferiority; Cox model; Correlation; Marginal model; Frailty model; Competing risks
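For readers who want to reproduce the comparison in spirit, the sketch below fits the four classes of models named in the abstract (cause-specific Cox-PH, a Fine-Gray competing-risks model, and marginal and shared-frailty Cox models) in R on simulated data. The data layout, covariates, and hazards are hypothetical, and the toy simulation ignores that death would preclude later recurrences.

```r
library(survival)
library(cmprsk)

set.seed(3)
n   <- 400
trt <- rbinom(n, 1, 0.5)
t_local   <- rexp(n, 0.05 * exp(-0.05 * trt))   # local recurrence
t_distant <- rexp(n, 0.04 * exp(0.30 * trt))    # distant recurrence or death
cens      <- runif(n, 5, 15)

time  <- pmin(t_local, t_distant, cens)
ftype <- ifelse(time == cens, 0, ifelse(time == t_local, 1, 2))  # 0 = censored

# 1) standard Cox-PH for local recurrence, treating other events as censoring
cox_ph <- coxph(Surv(time, ftype == 1) ~ trt)

# 2) Fine-Gray competing-risks regression for local recurrence
fg <- crr(ftime = time, fstatus = ftype, cov1 = cbind(trt), failcode = 1)

# 3) marginal and 4) shared-frailty Cox models on a long layout,
#    one row per subject and event type
long <- data.frame(
  id     = rep(seq_len(n), 2),
  trt    = rep(trt, 2),
  type   = rep(c("local", "distant"), each = n),
  time   = c(pmin(t_local, cens), pmin(t_distant, cens)),
  status = c(as.numeric(t_local <= cens), as.numeric(t_distant <= cens))
)
marginal <- coxph(Surv(time, status) ~ trt + strata(type) + cluster(id), data = long)
frail    <- coxph(Surv(time, status) ~ trt + strata(type) + frailty(id), data = long)

summary(cox_ph); summary(fg); summary(marginal); summary(frail)
```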
4.  Model Checking Techniques for Assessing Functional Form Specifications in Censored Linear Regression Models 
Statistica Sinica  2012;22(2):509-530.
In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of “robust” residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as “robust” censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reflects model misspecification or natural variation. We illustrate the methods with a well-known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power for detecting misspecification while at the same time controlling the size of the test.
doi:10.5705/ss.2010.109
PMCID: PMC3697158  PMID: 23825917
Censored linear regression; Goodness-of-fit; Partial linear model; Partial residual; Quantile regression; Resampling method; Rank estimation
5.  Statistical modelling for recurrent events: an application to sports injuries 
British Journal of Sports Medicine  2012;48(17):1287-1293.
Background
Injuries are often recurrent, with subsequent injuries influenced by previous occurrences, and hence correlation between events needs to be taken into account when analysing such data.
Objective
This paper compares five different survival models (Cox proportional hazards (CoxPH) model and the following generalisations to recurrent event data: Andersen-Gill (A-G), frailty, Wei-Lin-Weissfeld total time (WLW-TT) marginal, Prentice-Williams-Peterson gap time (PWP-GT) conditional models) for the analysis of recurrent injury data.
Methods
Empirical evaluation and comparison of different models were performed using model selection criteria and goodness-of-fit statistics. Simulation studies assessed the size and power of each model fit.
Results
The modelling approach is demonstrated through direct application to Australian National Rugby League recurrent injury data collected over the 2008 playing season. Of the 35 players analysed, 14 (40%) players had more than 1 injury and 47 contact injuries were sustained over 29 matches. The CoxPH model provided the poorest fit to the recurrent sports injury data. The fit was improved with the A-G and frailty models, compared to WLW-TT and PWP-GT models.
Conclusions
Despite little difference in model fit between the A-G and frailty models, in the interest of fewer statistical assumptions it is recommended that, where relevant, future studies involving modelling of recurrent sports injury data use the frailty model in preference to the CoxPH model or its other generalisations. The paper provides a rationale for future statistical modelling approaches for recurrent sports injury.
doi:10.1136/bjsports-2011-090803
PMCID: PMC4145455  PMID: 22872683
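The five formulations compared above map onto standard coxph() calls in R. A sketch using the bladder-cancer recurrence data shipped with the survival package (the rugby-league injury data are not public) is given below.

```r
library(survival)

# time to first recurrence only: standard Cox PH
first <- subset(bladder2, enum == 1)
coxph(Surv(stop, event) ~ rx + size + number, data = first)

# Andersen-Gill: counting-process time scale, common baseline, robust SEs
coxph(Surv(start, stop, event) ~ rx + size + number + cluster(id), data = bladder2)

# shared gamma frailty on the same counting-process data
coxph(Surv(start, stop, event) ~ rx + size + number + frailty(id), data = bladder2)

# WLW total-time marginal model: one stratum per event number, total time scale
coxph(Surv(stop, event) ~ rx + size + number + strata(enum) + cluster(id),
      data = bladder)

# PWP gap-time conditional model: stratified by event number, gap-time scale
coxph(Surv(stop - start, event) ~ rx + size + number + strata(enum) + cluster(id),
      data = bladder2)
```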
6.  A basis approach to goodness-of-fit testing in recurrent event models 
A class of tests for the hypothesis that the baseline hazard function in Cox’s proportional hazards model and for a general recurrent event model belongs to a parametric family C ≡ {λ0(·; ξ): ξ ∈ Ξ} is proposed. Finite-sample properties of the tests are examined via simulations, while asymptotic properties of the tests under a contiguous sequence of local alternatives are studied theoretically. An application of the tests to the general recurrent event model, which is an extended minimal repair model admitting covariates, is demonstrated. In addition, two real data sets are used to illustrate the applicability of the proposed tests.
doi:10.1016/j.jspi.2004.03.022
PMCID: PMC1563443  PMID: 16967104
Counting process; Goodness-of-fit test; Minimal repair model; Neyman’s test; Nonhomogeneous Poisson process; Repairable system; Score test
7.  Network-based Survival Analysis Reveals Subnetwork Signatures for Predicting Outcomes of Ovarian Cancer Treatment 
PLoS Computational Biology  2013;9(3):e1002975.
Cox regression is commonly used to predict the outcome by the time to an event of interest and, in addition, to identify relevant features for survival analysis in cancer genomics. Due to the high-dimensionality of high-throughput genomic data, existing Cox models trained on any particular dataset usually generalize poorly to other independent datasets. In this paper, we propose a network-based Cox regression model called Net-Cox and apply it to a large-scale survival analysis across multiple ovarian cancer datasets. Net-Cox integrates gene network information into Cox's proportional hazards model to explore the co-expression or functional relations among high-dimensional gene expression features in the gene network. Net-Cox was applied to analyze three independent gene expression datasets including the TCGA ovarian cancer dataset and two other public ovarian cancer datasets. Net-Cox with the network information from gene co-expression or functional relations identified highly consistent signature genes across the three datasets, and because of the better generalization across the datasets, Net-Cox also consistently improved the accuracy of survival prediction over the Cox models regularized by the L1 or L2 norm. This study focused on analyzing the death and recurrence outcomes in the treatment of ovarian carcinoma to identify signature genes that can more reliably predict the events. The signature genes comprise dense protein-protein interaction subnetworks, enriched by extracellular matrix receptors and modulators or by nuclear signaling components downstream of extracellular signal-regulated kinases. In the laboratory validation of the signature genes, a tumor array experiment by protein staining on an independent patient cohort from Mayo Clinic showed that the protein expression of the signature gene FBN1 is a biomarker significantly associated with early recurrence after 12 months of treatment in ovarian cancer patients who are initially sensitive to chemotherapy. The Net-Cox toolbox is available at http://compbio.cs.umn.edu/Net-Cox/.
Author Summary
Network-based computational models are attracting increasing attention in studying cancer genomics because molecular networks provide valuable information on the functional organizations of molecules in cells. Survival analysis mostly with the Cox proportional hazard model is widely used to predict or correlate gene expressions with time to an event of interest (outcome) in cancer genomics. Surprisingly, network-based survival analysis has not received enough attention. In this paper, we studied resistance to chemotherapy in ovarian cancer with a network-based Cox model, called Net-Cox. The experiments confirm that networks representing gene co-expression or functional relations can be used to improve the accuracy and the robustness of survival prediction of outcome in ovarian cancer treatment. The study also revealed subnetwork signatures that are enriched by extracellular matrix receptors and modulators and the downstream nuclear signaling components of extracellular signal-regulators, respectively. In particular, FBN1, which was detected as a signature gene of high confidence by Net-Cox with network information, was validated as a biomarker for predicting early recurrence in platinum-sensitive ovarian cancer patients in laboratory.
doi:10.1371/journal.pcbi.1002975
PMCID: PMC3605061  PMID: 23555212
8.  An Estimating Function Approach to the Analysis of Recurrent and Terminal Events 
Biometrics  2013;69(2):366-374.
Summary
In clinical and observational studies, the event of interest can often recur on the same subject. In a more complicated situation, there exists a terminal event (e.g. death) which stops the recurrent event process. In many such instances, the terminal event is strongly correlated with the recurrent event process. We consider the recurrent/terminal event setting and model the dependence through a shared gamma frailty that is included in both the recurrent event rate and terminal event hazard functions. Conditional on the frailty, a model is specified only for the marginal recurrent event process, hence avoiding the strong Poisson-type assumptions traditionally used. Analysis is based on estimating functions that allow for estimation of covariate effects on the recurrent event rate and terminal event hazard. The method also permits estimation of the degree of association between the two processes. Closed-form asymptotic variance estimators are proposed. The proposed method is evaluated through simulations to assess the applicability of the asymptotic results in finite samples and the sensitivity of the method to its underlying assumptions. The methods can be extended in straightforward ways to accommodate multiple types of recurrent and terminal events. Finally, the methods are illustrated in an analysis of hospitalization data for patients in an international multi-center study of outcomes among dialysis patients.
doi:10.1111/biom.12025
PMCID: PMC3692576  PMID: 23651362
Cox model; Frailty; Marginal rate function; Multivariate survival; Relative risk; Semiparametric methods
9.  Mark-specific proportional hazards model with multivariate continuous marks and its application to HIV vaccine efficacy trials 
For time-to-event data with finitely many competing risks, the proportional hazards model has been a popular tool for relating the cause-specific outcomes to covariates (Prentice and others, 1978. The analysis of failure time in the presence of competing risks. Biometrics 34, 541–554). Inspired by previous research in HIV vaccine efficacy trials, the cause of failure is replaced by a continuous mark observed only in subjects who fail. This article studies an extension of this approach to allow a multivariate continuum of competing risks, to better account for the fact that the candidate HIV vaccines tested in efficacy trials have contained multiple HIV sequences, with a purpose to elicit multiple types of immune response that recognize and block different types of HIV viruses. We develop inference for the proportional hazards model in which the regression parameters depend parametrically on the marks, to avoid the curse of dimensionality, and the baseline hazard depends nonparametrically on both time and marks. Goodness-of-fit tests are constructed based on generalized weighted martingale residuals. The finite-sample performance of the proposed methods is examined through extensive simulations. The methods are applied to a vaccine efficacy trial to examine whether and how certain antigens represented inside the vaccine are relevant for protection or anti-protection against the exposing HIVs.
doi:10.1093/biostatistics/kxs022
PMCID: PMC3520499  PMID: 22764174
Competing risks; Failure time data; Goodness-of-fit test; HIV vaccine trial; Hypothesis testing; Mark-specific relative risk; Multivariate data; Partial likelihood estimation; Semiparametric model; STEP trial
10.  Recurrent events and the exploding Cox model 
Lifetime data analysis  2010;16(4):525-546.
Counting process models have played an important role in survival and event history analysis for more than 30 years. Nevertheless, almost all models that are being used have a very simple structure. Analyzing recurrent events invites the application of more complex models with dynamic covariates. We discuss how to define valid models in such a setting. One has to check carefully that a suggested model is well defined as a stochastic process. We give conditions for this to hold. Some detailed discussion is presented in relation to a Cox-type model, where the exponential structure combined with feedback leads to an exploding model. In general, counting process models with dynamic covariates can be formulated to avoid explosions. In particular, models with a linear feedback structure do not explode, making them useful tools in general modeling of recurrent events.
doi:10.1007/s10985-010-9180-y
PMCID: PMC4066394  PMID: 20625827
Recurrent events; Cox regression; Explosion; Honest process; Birth process; Aalen regression; Stochastic differential equation; Lipschitz condition; Feller criterion; Martingale problem
11.  Dynamic regression hazards models for relative survival 
Statistics in medicine  2008;27(18):3563-3584.
SUMMARY
A natural way of modelling relative survival through regression analysis is to assume an additive form between the expected population hazard and the excess hazard due to the presence of an additional cause of mortality. Within this context, the existing approaches in the parametric, semiparametric and non-parametric settings are compared and discussed. We study the additive excess hazards models, where the excess hazard is of additive form. This makes it possible to assess the importance of time-varying effects for regression models in the relative survival framework. We show how recent developments can be used to make inferential statements about the non-parametric version of the model. This makes it possible to test the key hypothesis that an excess risk effect is time varying in contrast to being constant over time. In case some covariate effects are constant, we show how the semiparametric additive risk model can be considered in the excess risk setting, providing a better and more useful summary of the data. Estimators have explicit form and inference based on a resampling scheme is presented for both the non-parametric and semiparametric models. We also describe a new suggestion for assessing goodness of fit of relative survival models, which consists of statistical and graphical tests based on cumulative martingale residuals. This is illustrated on the semiparametric model with proportional excess hazards. We analyze data from the TRACE study using different approaches and show the need for more flexible models in relative survival.
doi:10.1002/sim.3242
PMCID: PMC2737139  PMID: 18338318
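The non-parametric additive hazards machinery referred to above is related to Aalen's additive model, which can be fitted in R with aareg() from the survival package. The sketch below shows only that standard (non-excess) additive fit with time-varying cumulative effects, not the relative-survival excess-hazard extension or the TRACE analysis.

```r
library(survival)

# Aalen additive hazards fit; the cumulative regression functions show
# whether a covariate effect drifts over time rather than staying constant
fit <- aareg(Surv(time, status) ~ age + sex + ph.ecog, data = lung)
summary(fit)   # tests of whether each cumulative effect is zero
plot(fit)      # cumulative regression functions against time
```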
12.  Impact of Cyclooxygenase Inhibitors in the Women's Health Initiative Hormone Trials: Secondary Analysis of a Randomized Trial 
PLoS Clinical Trials  2006;1(5):e26.
Objectives:
We evaluated the hypothesis that cyclooxygenase (COX) inhibitor use might have counteracted a beneficial effect of postmenopausal hormone therapy, and account for the absence of cardioprotection in the Women's Health Initiative hormone trials. Estrogen increases COX expression, and inhibitors of COX such as nonsteroidal anti-inflammatory agents appear to increase coronary risk, raising the possibility of a clinically important interaction in the trials.
Design:
The hormone trials were randomized, double-blind, and placebo-controlled. Use of nonsteroidal anti-inflammatory drugs was assessed at baseline and at years 1, 3, and 6.
Setting:
The Women's Health Initiative hormone trials were conducted at 40 clinical sites in the United States.
Participants:
The trials enrolled 27,347 postmenopausal women, aged 50–79 y.
Interventions:
We randomized 16,608 women with intact uterus to conjugated estrogens 0.625 mg with medroxyprogesterone acetate 2.5 mg daily or to placebo, and 10,739 women with prior hysterectomy to conjugated estrogens 0.625 mg daily or placebo.
Outcome Measures:
Myocardial infarction, coronary death, and coronary revascularization were ascertained during 5.6 y of follow-up in the estrogen plus progestin trial and 6.8 y of follow-up in the estrogen alone trial.
Results:
Hazard ratios with 95% confidence intervals were calculated from Cox proportional hazard models stratified by COX inhibitor use. The hazard ratio for myocardial infarction/coronary death with estrogen plus progestin was 1.13 (95% confidence interval 0.68–1.89) among non-users of COX inhibitors, and 1.35 (95% confidence interval 0.86–2.10) among continuous users. The hazard ratio with estrogen alone was 0.92 (95% confidence interval 0.57–1.48) among non-users of COX inhibitors, and 1.08 (95% confidence interval 0.69–1.70) among continuous users. In a second analytic approach, hazard ratios were calculated from Cox models that included hormone trial assignment as well as a time-dependent covariate for medication use, and an interaction term. No significant interaction was identified.
Conclusions:
Use of COX inhibitors did not significantly affect the Women's Health Initiative hormone trial results.
Editorial Commentary
Background: As part of a set of studies known as the Women's Health Initiative trials, investigators aimed to find out whether providing postmenopausal hormone therapy (estrogen in the case of women who had had a hysterectomy, and estrogen plus progestin for women who had not had a hysterectomy) reduced cardiovascular risk as compared to placebo. Earlier observational studies had suggested this might be the case. The trials found that postmenopausal hormone therapy did not reduce cardiovascular risk in the groups studied. However, there was a concern that medication use outside the trial with nonsteroidal anti-inflammatory drugs (NSAIDs), and specifically the type of NSAID known as COX-2 inhibitors, could have affected the findings. This concern arose because it is known that COX-2 inhibition lowers levels of prostacyclin, a molecule thought to be beneficial to cardiovascular health, whereas estrogen increases prostacyclin levels. Evidence from randomized trials and observational studies has also shown that patients treated with some COX-2 inhibitors are at increased risk of heart attacks and strokes; the cardiovascular safety of other NSAIDs is also the focus of great attention. Therefore, the authors of this paper aimed to do a statistical exploration of the data from the Women's Health Initiative hormone trials, to find out whether NSAID use by participants in the trials could have affected the trials' main findings.
What this trial shows: In this reanalysis of the original data from the trials, the investigators found that the effects of hormone therapy on cardiovascular outcomes were similar among users and non-users of NSAIDs, confirming that use of these drugs did not significantly affect the results from the Women's Health Initiative hormone trials.
Strengths and limitations: The original hormone trials were large, appropriately randomized studies that enrolled a diverse cohort of participants. Therefore, a large number of cardiovascular events occurred in the groups being compared, allowing this subsequent analysis to be done. One limitation is that use of COX-2 inhibitors in the trial was low; therefore, the investigators were not able to specifically test whether COX-2 inhibitor use (as opposed to NSAID use generally) might have affected their findings.
Contribution to the evidence: The investigators did not set out specifically to evaluate the cardiovascular safety of particular medications in this study. Rather, they wanted to see if these NSAIDs could have modified the effects of the hormone therapy. The secondary analysis done here shows that the main findings from the Women's Health Initiative hormone trials were not significantly affected by use of NSAIDs outside the trial.
doi:10.1371/journal.pctr.0010026
PMCID: PMC1584256  PMID: 17016543
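The two analytic approaches described in the Results (hazard ratios within strata of COX inhibitor use, and a single Cox model with a use-by-treatment interaction) can be sketched in R as below. The data frame, variable names, and event rates are hypothetical, and the simplified sketch treats NSAID use as fixed at baseline rather than as the time-dependent covariate used in the trial, which would require (tstart, tstop] counting-process rows.

```r
library(survival)

set.seed(6)
n     <- 2000
ht    <- rbinom(n, 1, 0.5)                 # randomized hormone-therapy assignment
nsaid <- rbinom(n, 1, 0.4)                 # NSAID / COX-inhibitor use (simplified: fixed)
time  <- rexp(n, 0.01 * exp(0.10 * ht))    # years to MI or coronary death
event <- as.numeric(time < 7)
time  <- pmin(time, 7)
whi   <- data.frame(ht, nsaid, time, event)

# (i) hazard ratios within strata of NSAID use
by_use <- lapply(split(whi, whi$nsaid),
                 function(d) coxph(Surv(time, event) ~ ht, data = d))
lapply(by_use, function(f) summary(f)$conf.int)

# (ii) single model with an interaction term; a non-significant ht:nsaid term
#      corresponds to "no significant interaction was identified"
fit_int <- coxph(Surv(time, event) ~ ht * nsaid, data = whi)
summary(fit_int)
```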
13.  Testing the proportional hazards assumption in case-cohort analysis 
Background
Case-cohort studies have become common in epidemiological studies of rare disease, with Cox regression models the principal method used in their analysis. However, no appropriate procedures to assess the assumption of proportional hazards of case-cohort Cox models have been proposed.
Methods
We extended the correlation test based on Schoenfeld residuals, an approach used to evaluate the proportionality of hazards in standard Cox models. Specifically, pseudolikelihood functions were used to define “case-cohort Schoenfeld residuals”, and then the correlation of these residuals with each of three functions of event time (i.e., the event time itself, rank order, Kaplan-Meier estimates) was determined. The performances of the proposed tests were examined using simulation studies. We then applied these methods to data from a previously published case-cohort investigation of the insulin/IGF-axis and colorectal cancer.
Results
Simulation studies showed that each of the three correlation tests accurately detected non-proportionality. Application of the proposed tests to the example case-cohort investigation dataset showed that the Cox proportional hazards assumption was not satisfied for certain exposure variables in that study, an issue we addressed through use of available, alternative analytical approaches.
Conclusions
The proposed correlation tests provide a simple and accurate approach for testing the proportional hazards assumption of Cox models in case-cohort analysis. Evaluation of the proportional hazards assumption is essential since its violation raises questions regarding the validity of Cox model results which, if unrecognized, could result in the publication of erroneous scientific findings.
doi:10.1186/1471-2288-13-88
PMCID: PMC3710085  PMID: 23834739
Proportional hazards; Schoenfeld residuals; Case-cohort studies; Cox models
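For standard (non-case-cohort) Cox models, the three correlation tests described above correspond to cox.zph() in R with different time transforms; a short sketch on the lung data from the survival package follows. The paper's pseudolikelihood-based case-cohort residuals are not implemented here.

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex + ph.ecog, data = lung)

cox.zph(fit, transform = "identity")  # correlate Schoenfeld residuals with event time
cox.zph(fit, transform = "rank")      # ... with the rank order of event times
cox.zph(fit, transform = "km")        # ... with the Kaplan-Meier transform of time
```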
14.  Accounting for individual differences and timing of events: estimating the effect of treatment on criminal convictions in heroin users 
Background
The reduction of crime is an important outcome of opioid maintenance treatment (OMT). Criminal intensity and treatment regimes vary among OMT patients, but this is rarely adjusted for in statistical analyses, which tend to focus on cohort incidence rates and rate ratios. The purpose of this work was to estimate the relationship between treatment and criminal convictions among OMT patients, adjusting for individual covariate information and timing of events, fitting time-to-event regression models of increasing complexity.
Methods
National criminal records were cross-linked with treatment data on 3221 patients starting OMT in Norway in 1997–2003. In addition to calculating cohort incidence rates, criminal convictions were modelled as a recurrent-event outcome and treatment as a time-dependent covariate in Cox proportional hazards, Aalen’s additive hazards, and semi-parametric additive hazards regression models. Both fixed and dynamic covariates were included.
Results
During OMT, the number of days with criminal convictions for the cohort as a whole was 61% lower than when not in treatment. OMT was associated with reduced number of days with criminal convictions in all time-to-event regression models, but the hazard ratio (95% CI) was strongly attenuated when adjusting for covariates; from 0.40 (0.35, 0.45) in a univariate model to 0.79 (0.72, 0.87) in a fully adjusted model. The hazard was lower for females and decreasing with older age, while increasing with high numbers of criminal convictions prior to application to OMT (all p < 0.001). The strongest predictors were level of criminal activity prior to entering into OMT, and having a recent criminal conviction (both p < 0.001). The effect of several predictors was significantly time-varying with their effects diminishing over time.
Conclusions
Analyzing complex observational data with regard to fixed factors only overlooks important temporal information, and naïve cohort-level incidence rates might result in biased estimates of the effect of interventions. Applying time-to-event regression models, properly adjusting for individual covariate information and timing of various events, allows for more precise and reliable effect estimates, as well as painting a more nuanced picture that can aid health care professionals and policy makers.
doi:10.1186/1471-2288-14-68
PMCID: PMC4040473  PMID: 24886472
Maintenance treatment; Criminal activity; Time-to-event; Recurring event; Time-dependent covariate; Dynamic covariate
15.  Gene–gene interaction analysis for the survival phenotype based on the Cox model 
Bioinformatics  2012;28(18):i582-i588.
Motivation: For the past few decades, many statistical methods in genome-wide association studies (GWAS) have been developed to identify SNP–SNP interactions for case-control studies. However, there has been less work for prospective cohort studies involving survival time. Recently, Gui et al. (2011) proposed a novel method, called Surv-MDR, for detecting gene–gene interactions associated with survival time. Surv-MDR is an extension of the multifactor dimensionality reduction (MDR) method to the survival phenotype by using the log-rank test for defining a binary attribute. However, the Surv-MDR method has some drawbacks in the sense that it needs more intensive computations and does not allow for covariate adjustment. In this article, we propose a new approach, called Cox-MDR, which is an extension of the generalized multifactor dimensionality reduction (GMDR) to the survival phenotype by using a martingale residual as a score to classify multi-level genotypes as high- and low-risk groups. The advantages of Cox-MDR over Surv-MDR are that it allows for the effects of discrete and quantitative covariates in the framework of the Cox regression model and that it requires less computation than Surv-MDR.
Results: Through simulation studies, we compared the power of Cox-MDR with those of Surv-MDR and the Cox regression model for various heritability and minor allele frequency combinations, without and with adjusting for a covariate. We found that Cox-MDR and the Cox regression model perform better than Surv-MDR for a low minor allele frequency of 0.2, but Surv-MDR has high power for a minor allele frequency of 0.4. However, when the effect of the covariate is adjusted for, Cox-MDR and the Cox regression model perform much better than Surv-MDR. We also compared the performance of Cox-MDR and Surv-MDR on real data from leukemia patients to detect gene–gene interactions with the survival time.
Contact: leesy@sejong.ac.kr; tspark@snu.ac.kr
doi:10.1093/bioinformatics/bts415
PMCID: PMC3436842  PMID: 22962485
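The covariate-adjusted score that Cox-MDR classifies on is the martingale residual from a Cox model containing only the adjusting covariates, which is easy to obtain in R. The sketch below stops at the residuals; the genotype data and the MDR-style cross-validation search are not shown, and all variables are simulated.

```r
library(survival)

set.seed(15)
n      <- 500
age    <- rnorm(n, 50, 10)
sex    <- rbinom(n, 1, 0.5)
time   <- rexp(n, 0.02 * exp(0.02 * (age - 50)))
status <- as.numeric(time < 30)
time   <- pmin(time, 30)

base  <- coxph(Surv(time, status) ~ age + sex)   # covariate-only Cox model
score <- residuals(base, type = "martingale")    # one adjusted score per subject

# Cox-MDR would label a multi-locus genotype cell "high risk" when the summed
# score in that cell is positive, e.g. for a hypothetical cell indicator `cell`:
# high_risk <- tapply(score, cell, sum) > 0
```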
16.  Automatic validation of computational models using pseudo-3D spatio-temporal model checking 
BMC Systems Biology  2014;8(1):124.
Background
Computational models play an increasingly important role in systems biology for generating predictions and in synthetic biology as executable prototypes/designs. For real life (clinical) applications there is a need to scale up and build more complex spatio-temporal multiscale models; these could enable investigating how changes at small scales are reflected at large scales and vice versa. Results generated by computational models can be applied to real life applications only if the models have been validated first. Traditional in silico model checking techniques only capture how non-dimensional properties (e.g. concentrations) evolve over time and are suitable for small scale systems (e.g. metabolic pathways). The validation of larger scale systems (e.g. multicellular populations) additionally requires capturing how spatial patterns and their properties change over time, something that is not considered by traditional non-spatial approaches.
Results
We developed and implemented a methodology for the automatic validation of computational models with respect to both their spatial and temporal properties. Stochastic biological systems are represented by abstract models which assume a linear structure of time and a pseudo-3D representation of space (2D space plus a density measure). Time series data generated by such models is provided as input to parameterised image processing modules which automatically detect and analyse spatial patterns (e.g. cell) and clusters of such patterns (e.g. cellular population). For capturing how spatial and numeric properties change over time the Probabilistic Bounded Linear Spatial Temporal Logic is introduced. Given a collection of time series data and a formal spatio-temporal specification the model checker Mudi (http://mudi.modelchecking.org) determines probabilistically if the formal specification holds for the computational model or not. Mudi is an approximate probabilistic model checking platform which enables users to choose between frequentist and Bayesian, estimate and statistical hypothesis testing based validation approaches. We illustrate the expressivity and efficiency of our approach based on two biological case studies namely phase variation patterning in bacterial colony growth and the chemotactic aggregation of cells.
Conclusions
The formal methodology implemented in Mudi enables the validation of computational models against spatio-temporal logic properties and is a precursor to the development and validation of more complex multidimensional and multiscale models.
Electronic supplementary material
The online version of this article (doi:10.1186/s12918-014-0124-0) contains supplementary material, which is available to authorized users.
doi:10.1186/s12918-014-0124-0
PMCID: PMC4272535  PMID: 25440773
Stochastic spatial discrete event system (SSpDES); Probabilistic bounded linear spatial temporal logic (PBLSTL); Spatio-temporal; Multidimensional; Model checking; Mudi; Computational model; Model validation; Systems biology; Synthetic biology
17.  Application of Smoothing Methods for Determining of the Effecting Factors on the Survival Rate of Gastric Cancer Patients 
Background
Smoothing methods are widely used to analyze epidemiologic data, particularly in the area of environmental health where non-linear relationships are not uncommon. This study focused on three different smoothing methods in Cox models: penalized splines, restricted cubic splines and fractional polynomials.
Objectives
The aim of this study was to assess the effects of prognostic factors on survival of patients with gastric cancer using the smoothing methods in Cox model and Cox proportional hazards. Also, all models were compared to each other in order to find the best one.
Materials and Methods
We retrospectively studied 216 patients with gastric cancer who were registered in one referral cancer registry center in Tehran, Iran. Age at diagnosis, sex, presence of metastasis, tumor size, histology type, lymph node metastasis, and pathologic stage were entered into the analysis using the Cox proportional hazards model and the smoothing methods in the Cox model. SPSS version 18.0 and R version 2.14.1 were used for data analysis. The models were compared using the Akaike information criterion.
Results
In this study, the 5-year survival rate was 30%. The Cox proportional hazards, penalized spline and fractional polynomial models led to similar results, and the Akaike information criterion showed a better performance for these three models compared to the restricted cubic spline. Also, the P-value of the likelihood ratio test was greater for the restricted cubic spline model than for the other models. Note that the best model is indicated by the lowest Akaike information criterion.
Conclusions
The use of smoothing methods helps us to eliminate non-linear effects, but it is more appropriate to use the Cox proportional hazards model in medical data because of its ease of interpretation and capability of modeling both continuous and discrete covariates. Also, the Cox proportional hazards model and the smoothing-method analyses identified age at diagnosis and tumor size as independent prognostic factors for the survival of patients with gastric cancer (P < 0.05). According to these results, the early detection of patients at a younger age and in primary stages may be important to increase survival.
doi:10.5812/ircmj.8649
PMCID: PMC3652506  PMID: 23682331
Proportional Hazards Models; Survival; Stomach Neoplasms
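A hedged illustration of the model comparison described above: Cox fits with a linear term, a penalized spline, and a natural (restricted) cubic spline for age, compared by AIC. The gastric cancer registry data are not public, so the lung data from the survival package stand in; fractional polynomials (e.g. via the mfp package) are omitted.

```r
library(survival)
library(splines)

linear   <- coxph(Surv(time, status) ~ age + sex, data = lung)
penal    <- coxph(Surv(time, status) ~ pspline(age, df = 4) + sex, data = lung)  # penalized spline
rcspline <- coxph(Surv(time, status) ~ ns(age, df = 3) + sex, data = lung)       # natural cubic spline

AIC(linear, penal, rcspline)            # lowest AIC indicates the preferred model
termplot(penal, term = 1, se = TRUE)    # shape of the smoothed age effect
```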
18.  Efficient Semiparametric Estimation of Short-term and Long-term Hazard Ratios with Right-Censored Data 
Biometrics  2013;69(4):10.1111/biom.12097.
Summary
The proportional hazards assumption in the commonly used Cox model for censored failure time data is often violated in scientific studies. Yang and Prentice (2005) proposed a novel semiparametric two-sample model that includes the proportional hazards model and the proportional odds model as sub-models, and accommodates crossing survival curves. The model leaves the baseline hazard unspecified and the two model parameters can be interpreted as the short-term and long-term hazard ratios. Inference procedures were developed based on a pseudo score approach. Although extension to accommodate covariates was mentioned, no formal procedures have been provided or proved. Furthermore, the pseudo score approach may not be asymptotically efficient. We study the extension of the short-term and long-term hazard ratio model of Yang and Prentice (2005) to accommodate potentially time-dependent covariates. We develop efficient likelihood-based estimation and inference procedures. The nonparametric maximum likelihood estimators are shown to be consistent, asymptotically normal, and asymptotically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical settings. The proposed method successfully captured the phenomenon of crossing hazards in a cancer clinical trial and identified a genetic marker with significant long-term effect missed by using the proportional hazards model on age-at-onset of alcoholism in a genetic study.
doi:10.1111/biom.12097
PMCID: PMC3868993  PMID: 24328712
Semiparametric hazards rate model; Non-parametric likelihood; Proportional hazards model; Proportional odds model; Semiparametric efficiency
19.  Modeling genome-wide replication kinetics reveals a mechanism for regulation of replication timing 
We developed analytical models of DNA replication that include probabilistic initiation of origins, fork progression, passive replication, and asynchrony. We fit the model to budding yeast genome-wide microarray data probing the replication fraction and found that initiation times correlate with the precision of timing. We extracted intrinsic origin properties, such as potential origin efficiency and firing-time distribution, which cannot be done using phenomenological approaches. We propose that origin timing is controlled by stochastically activated initiators bound to origin sites rather than explicit time-measuring mechanisms.
The kinetics of DNA replication must be controlled for cells to develop properly. Although the biochemical mechanisms of origin initiations are increasingly well understood, the organization of initiation timing as a genome-wide program is still a mystery. With the advance of technology, researchers have been able to generate large amounts of data revealing aspects of replication kinetics. In particular, the use of microarrays to probe the replication fraction of budding yeast genome wide has been a successful first step towards unraveling the details of the replication program (Raghuraman et al, 2001; Alvino et al, 2007; McCune et al, 2008). On the surface, the microarray data shows apparent patterns of early and late replicating regions and seems to support the prevailing picture of eukaryotic replication—origins are positioned at defined sites and initiated at defined, preprogrammed times (Donaldson, 2005). Molecular combing, a single-molecule technique, however, showed that the initiation of origins is stochastic (Czajkowsky et al, 2008). Motivated by these conflicting viewpoints, we developed a model that is flexible enough to describe both deterministic and stochastic initiation.
We modeled origin initiation as probabilistic events. We first propose a model where each origin is allowed to have its distinct ‘firing-time distribution.' Origins that have well-determined initiation times have narrow distributions, whereas more stochastic origins have wider distributions. Similar models based on simulations have previously been proposed (Lygeros et al, 2008; Blow and Ge, 2009; de Moura et al, 2010); however, our model is novel in that it is analytic. It is much faster than simulations and allowed us, for the first time, to fit genome-wide microarray data and extract parameters that describe the replication program in unprecedented detail (Figure 2).
Our main result is this: origins that fire early, on average, have precisely defined initiation times, whereas origins that fire late, on average, do not have a well-defined initiation time and initiate throughout S phase. What kind of global controlling mechanism can account for this trend? We propose a second model where an origin is composed of multiple initiators, each of which fires independently and identically. A good candidate for the initiator is the minichromosome maintenance (MCM) complex, as it is found to be associated with origin firing and loaded in abundance (Hyrien et al, 2003). We show that the aforementioned relationship can be explained quantitatively if the earlier-firing origins have more MCM complexes. This model offers a new view of replication: controlled origin timing can emerge from stochastic firing and does not need an explicit time-measuring mechanism, a ‘clock.' This model provides a new, detailed, plausible, and testable mechanism for replication timing control.
Our models also capture the effects of passive replication, which is often neglected in phenomenological approaches (Eshaghi et al, 2007). There are two ways an origin site can be replicated. The site can be replicated by the origin binding to it but can also be passively replicated by neighboring origins. This complication makes it difficult to extract the intrinsic properties of origins. By modeling passive replication, we can separate the contribution from each origin and extract the potential efficiency of origins, i.e., the efficiency of the origin given that there is no passive replication. We found that while most origins are potentially highly efficient, their observed efficiency varies greatly. This implies that many origins, though capable of initiating, are often passively replicated and appear dormant. Such a design makes the replication process robust against replication stress such as fork stalling (Blow and Ge, 2009). If two approaching forks stall, normally dormant origins in the region, not being passively replicated, will initiate to replicate the gap.
With the advance of the microarray and molecular-combing technology, experiments have been done to probe many different types of cells, and large amounts of replication fraction data have been generated. Our model can be applied to spatiotemporally resolved replication fraction data for any organism, as the model is flexible enough to capture a wide range of replication kinetics. The analytical model is also much faster than simulation-based models. For these reasons, we believe that the model is a powerful tool for analyzing these large datasets. This work opens the possibility for understanding the replication program across species in more rigor and detail (Goldar et al, 2009).
Microarrays are powerful tools to probe genome-wide replication kinetics. The rich data sets that result contain more information than has been extracted by current methods of analysis. In this paper, we present an analytical model that incorporates probabilistic initiation of origins and passive replication. Using the model, we performed least-squares fits to a set of recently published time course microarray data on Saccharomyces cerevisiae. We extracted the distribution of firing times for each origin and found that the later an origin fires on average, the greater the variation in firing times. To explain this trend, we propose a model where earlier-firing origins have more initiator complexes loaded and a more accessible chromatin environment. The model demonstrates how initiation can be stochastic and yet occur at defined times during S phase, without an explicit timing program. Furthermore, we hypothesize that the initiators in this model correspond to loaded minichromosome maintenance complexes. This model is the first to suggest a detailed, testable, biochemically plausible mechanism for the regulation of replication timing in eukaryotes.
doi:10.1038/msb.2010.61
PMCID: PMC2950085  PMID: 20739926
DNA replication program; genome-wide analysis; microarray data; replication-origin efficiency; stochastic modeling
20.  Kernelized partial least squares for feature reduction and classification of gene microarray data 
BMC Systems Biology  2011;5(Suppl 3):S13.
Background
The primary objectives of this paper are: 1.) to apply Statistical Learning Theory (SLT), specifically Partial Least Squares (PLS) and Kernelized PLS (K-PLS), to the universal "feature-rich/case-poor" (also known as "large p small n", or "high-dimension, low-sample size") microarray problem by eliminating those features (or probes) that do not contribute to the "best" chromosome bio-markers for lung cancer, and 2.) to quantitatively measure and verify (by an independent means) the efficacy of this PLS process. A secondary objective is to integrate these significant improvements in diagnostic and prognostic biomedical applications into the clinical research arena. That is, to devise a framework for converting SLT results into direct, useful clinical information for patient care or pharmaceutical research. We therefore propose, and preliminarily evaluate, a process whereby PLS, K-PLS, and Support Vector Machines (SVM) may be integrated with the accepted and well understood traditional biostatistical "gold standard", Cox Proportional Hazard model and Kaplan-Meier survival analysis methods. Specifically, this new combination will be illustrated with both PLS and Kaplan-Meier followed by PLS and Cox Hazard Ratios (CHR) and can be easily extended for both the K-PLS and SVM paradigms. Finally, these previously described processes are contained in the Fine Feature Selection (FFS) component of our overall feature reduction/evaluation process, which consists of the following components: 1.) coarse feature reduction, 2.) fine feature selection and 3.) classification (as described in this paper) and prediction.
Results
Our results for PLS and K-PLS showed that these techniques, as part of our overall feature reduction process, performed well on noisy microarray data. The best performance was a good 0.794 Area Under a Receiver Operating Characteristic (ROC) Curve (AUC) for classification of recurrence prior to or after 36 months and a strong 0.869 AUC for classification of recurrence prior to or after 60 months. Kaplan-Meier curves for the classification groups were clearly separated, with p-values below 4.5e-12 for both 36 and 60 months. CHRs were also good, with ratios of 2.846341 (36 months) and 3.996732 (60 months).
Conclusions
SLT techniques such as PLS and K-PLS can effectively address difficult problems with analyzing biomedical data such as microarrays. The combinations with established biostatistical techniques demonstrated in this paper allow these methods to move from academic research and into clinical practice.
doi:10.1186/1752-0509-5-S3-S13
PMCID: PMC3287568  PMID: 22784619
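The "traditional gold standard" validation step that the abstract couples with PLS/K-PLS (Kaplan-Meier curves, a log-rank p-value, and a Cox hazard ratio for the predicted groups) can be sketched in R as follows. The binary group variable stands in for the classifier output; data are simulated.

```r
library(survival)

set.seed(20)
n      <- 200
group  <- rbinom(n, 1, 0.5)                       # hypothetical PLS-derived class label
time   <- rexp(n, 0.02 * exp(0.9 * group)) * 12   # months to recurrence
status <- as.numeric(time < 72)
time   <- pmin(time, 72)

km <- survfit(Surv(time, status) ~ group)
plot(km, lty = 1:2, xlab = "Months", ylab = "Recurrence-free proportion")
survdiff(Surv(time, status) ~ group)              # log-rank test p-value
exp(coef(coxph(Surv(time, status) ~ group)))      # Cox hazard ratio (CHR)
```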
21.  Survival Analysis of Irish Amyotrophic Lateral Sclerosis Patients Diagnosed from 1995–2010 
PLoS ONE  2013;8(9):e74733.
Introduction
The Irish ALS register is a valuable resource for examining survival factors in Irish ALS patients. Cox regression has become the default tool for survival analysis, but recently new classes of flexible parametric survival analysis tools known as Royston-Parmar models have become available.
Methods
We employed Cox proportional hazards and Royston-Parmar flexible parametric modeling to examine factors affecting survival in Irish ALS patients. We further examined the effect of choice of timescale on Cox models and the proportional hazards assumption, and extended both Cox and Royston-Parmar models with time varying components.
Results
On comparison of models we chose a Royston-Parmar proportional hazards model without time-varying covariates as the best fit. Using this model we confirmed the association of known survival markers in ALS including age at diagnosis (hazard ratio (HR) 1.34 per 10-year increase; 95% CI 1.26–1.42), diagnostic delay (HR 0.96 per 12 weeks delay; 95% CI 0.94–0.97), Definite ALS (HR 1.47; 95% CI 1.17–1.84), bulbar onset disease (HR 1.58; 95% CI 1.33–1.87), riluzole use (HR 0.72; 95% CI 0.61–0.85) and attendance at an ALS clinic (HR 0.74; 95% CI 0.64–0.86).
Discussion
Our analysis explored the strengths and weaknesses of Cox proportional hazard and Royston-Parmar flexible parametric methods. By including time varying components we were able to gain deeper understanding of the dataset. Variation in survival between time periods appears to be due to missing data in the first time period. The use of age as timescale to account for confounding by age resolved breaches of the proportional hazards assumption, but in doing so may have obscured deficiencies in the data. Our study demonstrates the need to test for, and fully explore, breaches of the Cox proportional hazards assumption. Royston-Parmar flexible parametric modeling proved a powerful method for achieving this.
doi:10.1371/journal.pone.0074733
PMCID: PMC3786977  PMID: 24098664
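A Royston-Parmar flexible parametric proportional-hazards model of the kind selected above can be fitted with flexsurvspline() from the flexsurv package. The sketch below uses the breast-cancer data shipped with flexsurv, since the Irish ALS register data are not public, and compares the fit with a standard Cox model.

```r
library(survival)
library(flexsurv)

data(bc, package = "flexsurv")   # breast-cancer survival data bundled with flexsurv

# Royston-Parmar proportional-hazards spline model with 2 internal knots
rp  <- flexsurvspline(Surv(rectime, censrec) ~ group, data = bc, k = 2, scale = "hazard")
# Cox proportional hazards fit on the same data
cox <- coxph(Surv(rectime, censrec) ~ group, data = bc)

AIC(rp)           # spline fits with different k can be compared by AIC
exp(coef(cox))    # Cox hazard ratios for the prognostic groups
rp                # printed coefficients include the group log hazard ratios
```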
22.  Inflammatory Biomarkers, Death, and Recurrent Nonfatal Coronary Events After an Acute Coronary Syndrome in the MIRACL Study 
Background
In acute coronary syndromes, C‐reactive protein (CRP) strongly relates to subsequent death, but surprisingly not to recurrent myocardial infarction. Other biomarkers may reflect different processes related to these outcomes. We assessed 8 inflammatory and vascular biomarkers and the risk of death and recurrent nonfatal cardiovascular events in the 16 weeks after an acute coronary syndrome.
Methods and Results
We measured blood concentrations of CRP, serum amyloid A (SAA), interleukin-6 (IL-6), soluble intercellular adhesion molecule (ICAM), soluble vascular cell adhesion molecule (VCAM), E-selectin, P-selectin, and tissue plasminogen activator antigen (tPA) 24 to 96 hours after presentation with acute coronary syndrome in 2925 subjects participating in a multicenter study. Biomarkers were related to the risk of death and of recurrent nonfatal acute coronary syndromes (myocardial infarction or unstable angina) over 16 weeks using Cox proportional hazard models. On univariate analyses, baseline CRP (P=0.006), SAA (P=0.012), and IL-6 (P<0.001) were related to death, but not to recurrent nonfatal acute coronary syndromes. VCAM and tPA were related to the risk of death (P<0.001, P=0.021, respectively) and to nonfatal acute coronary syndromes (P=0.021, P=0.049, respectively). Adjusting for significant covariates reduced the strength of the associations; however, CRP and SAA continued to relate to death.
Conclusions
In acute coronary syndromes, the CRP inflammatory axis relates to the risk of death and may reflect myocardial injury. VCAM and tPA may have greater specificity for processes reflecting inflammation and thrombosis in the epicardial arteries, which determine recurrent coronary events.
doi:10.1161/JAHA.112.003103
PMCID: PMC3603244  PMID: 23525424
acute coronary syndromes; biomarkers; CRP; death; nonfatal events; risk
23.  Changes in Drug Utilization during a Gap in Insurance Coverage: An Examination of the Medicare Part D Coverage Gap 
PLoS Medicine  2011;8(8):e1001075.
Jennifer Polinski and colleagues estimated the effect of the "coverage gap" during which US Medicare beneficiaries are fully responsible for drug costs and found that the gap was associated with a doubling in discontinuing essential medications.
Background
Nations are struggling to expand access to essential medications while curbing rising health and drug spending. While the US government's Medicare Part D drug insurance benefit expanded elderly citizens' access to drugs, it also includes a controversial period called the “coverage gap” during which beneficiaries are fully responsible for drug costs. We examined the impact of entering the coverage gap on drug discontinuation, switching to another drug for the same indication, and drug adherence. While increased discontinuation of, and reduced adherence to, essential medications are regrettable responses, increased switching to less expensive but therapeutically interchangeable medications is a positive response to minimize costs.
Methods and Findings
We followed 663,850 Medicare beneficiaries enrolled in Part D or retiree drug plans with prescription and health claims in 2006 and/or 2007 to determine who reached the gap spending threshold, n = 217,131 (33%). In multivariate Cox proportional hazards models, we compared drug discontinuation and switching rates in selected drug classes after reaching the threshold between all 1,993 who had no financial assistance during the coverage gap (exposed) versus 9,965 multivariate propensity score-matched comparators with financial assistance (unexposed). Multivariate logistic regressions compared drug adherence (≤80% versus >80% of days covered). Beneficiaries reached the gap spending threshold after an average of 222 ± 79 days. At the drug level, exposed beneficiaries were twice as likely to discontinue (hazard ratio [HR] = 2.00, 95% confidence interval [CI] 1.64–2.43) but less likely to switch a drug (HR = 0.60, 0.46–0.78) after reaching the threshold. Gap-exposed beneficiaries were slightly more likely to have reduced adherence (OR = 1.07, 0.98–1.18).
Conclusions
A lack of financial assistance after reaching the gap spending threshold was associated with a doubling in discontinuing essential medications but not switching drugs in 2006 and 2007. Blunt cost-containment features such as the coverage gap have an adverse impact on drug utilization that may conceivably affect health outcomes.
Please see later in the article for the Editors' Summary
Editors' Summary
Background
Every year, more effective drugs for more diseases become available. But the availability of so many drugs poses a problem. How can governments provide their citizens with access to essential medications but control drug costs? Many different approaches have been tried, among them the “coverage gap” or “donut hole” approach that the US government has incorporated into its Medicare program. Medicare is the US government's health insurance program for people aged 65 or older and for younger people with specific conditions. Nearly 50 million US citizens are enrolled in Medicare. In 2006, the government introduced a prescription drug insurance benefit called Medicare Part D to help patients pay for their drugs. Until recently, beneficiaries of this scheme had to pay all their drug costs after their drug spending reached an initial threshold in any calendar year ($2,830 in 2010). Beneficiaries remained in this coverage gap (although people on low incomes received subsidies to help them pay for their drugs) until their out-of-pocket spending reached a catastrophic coverage spending threshold ($4,550 in 2010) or a new year started, after which the Part D benefit paid for most drug costs. Importantly, the 2010 US health reforms have mandated a gradual reduction in the amount that Medicare Part D enrollees have to pay for their prescriptions when they reach the coverage gap.
Why Was This Study Done?
Three to four million Medicare Part D beneficiaries reach the coverage gap every year (nearly 15% of all Part D beneficiaries). Supporters of the coverage gap concept argue that withdrawal of benefits increases beneficiaries' awareness of medication costs and encourages switching to cost-effective therapeutic options. However, critics argue that the coverage gap is likely to lead to decreased drug utilization, increased use of health services, and adverse outcomes. In this study, the researchers examine the impact of entering the coverage gap on drug discontinuation, switching to another drug for the same indication, and drug adherence (whether patients take their prescribed drugs regularly).
What Did the Researchers Do and Find?
The researchers studied 663,850 Medicare beneficiaries enrolled in Part D or in retiree drug plans (which provide coverage under an employer's group health plan after retirement; the retiree drug plans included in this study did not have coverage gaps) who made prescription claims in 2006 and/or 2007. A third of these individuals reached the gap spending threshold. The researchers used detailed statistical analyses to compare the drug discontinuation, switching, and adherence rates of 1,993 beneficiaries who had no financial assistance during the coverage gap (exposed beneficiaries) with those of 9,965 matched beneficiaries who had financial assistance during the coverage gap (unexposed beneficiaries). On average, beneficiaries reached the gap spending threshold 222 days into the year (mid-August). In a drug-level analysis, exposed beneficiaries were twice as likely to discontinue a drug and slightly more likely to have reduced drug adherence than unexposed beneficiaries, but 40% less likely to switch a drug after reaching the threshold. Similar results were obtained in a beneficiary-level analysis in which discontinuation, switching, and adherence rates were considered in terms of the complete drug regimen of individual beneficiaries.
What Do These Findings Mean?
These findings show that, among the Medicare beneficiaries investigated, a lack of financial assistance to pay for drugs after reaching the coverage gap spending threshold led to a doubling in the rate of drug discontinuation and a slight reduction in drug adherence. Surprisingly, lack of financial assistance resulted in a decrease in drug switching, even though the Centers for Medicare and Medicaid Services advise patients to consider switching to generic or low-cost drugs. Importantly, the researchers estimate that, for the whole Medicare population, the lack of financial assistance to pay for drugs could result in an additional 18,000 patients discontinuing one or more prescription drugs per year. Although this study did not directly investigate the effect of the coverage gap on patient outcomes, these findings suggest that this and other blunt cost-containment approaches could adversely affect health outcomes through their effects on drug utilization. Thus, insurance strategies that specifically promote the use of drugs with high benefit but low cost might be a better approach for governments seeking to improve the health of their citizens while reining in drug costs.
Additional Information
Please access these websites via the online version of this summary at http://dx.doi.org/10.1371/journal.pmed.1001075.
The US Department of Health and Human Services Centers for Medicare and Medicaid Services provides information on all aspects of Medicare, including general advice on bridging the coverage gap and an information sheet on bridging the coverage gap in 2011
Medicare.gov, the official US government website for Medicare, provides information on all aspects of Medicare (in English and Spanish), including a description of Part D prescription drug coverage
An information sheet from the Kaiser Family Foundation explains the key changes to the Medicare Part D drug benefit coverage gap that were introduced in the 2010 health care reforms
MedlinePlus provides links to further information about Medicare (in English and Spanish)
doi:10.1371/journal.pmed.1001075
PMCID: PMC3156689  PMID: 21857811
24.  REGULARIZATION FOR COX’S PROPORTIONAL HAZARDS MODEL WITH NP-DIMENSIONALITY* 
Annals of Statistics  2011;39(6):3092-3120.
High-throughput genetic sequencing arrays with thousands of measurements per sample, together with a large amount of related censored clinical data, have increased the need for better measurement-specific model selection. In this paper we establish strong oracle properties of non-concave penalized methods for non-polynomial (NP) dimensional data with censoring in the framework of Cox’s proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCAD are discussed specifically. We clarify under which dimensionality and correlation restrictions an oracle estimator can be constructed. It is demonstrated that non-concave penalties lead to a significant relaxation of the “irrepresentable condition” needed for LASSO model selection consistency. A large deviation result for martingales, of interest in its own right, is developed to characterize the strong oracle property. Moreover, the non-concave penalized estimator is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated data and a gene association study example.
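For readers who want a concrete starting point, the Python sketch below fits an L1 (LASSO)-penalized Cox model over a grid of penalty values with the lifelines package. It is only a rough analogue of the paper's approach: it uses a plain LASSO penalty rather than the folded-concave (e.g., SCAD) penalties analyzed in the paper, and the data are synthetic and purely illustrative.
```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Synthetic high-dimensional survival data (purely illustrative)
rng = np.random.default_rng(1)
n, p = 200, 50
X = pd.DataFrame(rng.normal(size=(n, p)), columns=[f"x{j}" for j in range(p)])
risk = 0.8 * X["x0"] - 0.6 * X["x1"]                  # only two truly active covariates
df = X.assign(t=rng.exponential(np.exp(-risk)), d=rng.integers(0, 2, n))

penalizers = np.logspace(-3, 0, 10)                    # grid of penalty strengths
coef_path = {}
for lam in penalizers:
    # l1_ratio=1.0 gives a pure L1 (LASSO) penalty on the Cox partial likelihood
    cph = CoxPHFitter(penalizer=lam, l1_ratio=1.0)
    cph.fit(df, duration_col="t", event_col="d")
    coef_path[lam] = cph.params_                       # coefficient vector at this penalty

# The penalty would then be chosen (e.g., by cross-validated partial likelihood);
# nonzero coefficients at the chosen penalty define the selected model.
```
For genuinely NP-dimensional problems, specialized coordinate-descent solvers (such as the Coxnet implementation in scikit-survival, or glmnet in R) are better suited, and SCAD-type folded-concave penalties require dedicated implementations.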
doi:10.1214/11-AOS911
PMCID: PMC3468162  PMID: 23066171
Hazard rate; LASSO; SCAD; Large deviation; Oracle
25.  Predictive value of lactate in unselected critically ill patients: an analysis using fractional polynomials 
Journal of Thoracic Disease  2014;6(7):995-1003.
Background and objectives
Hyperlactatemia has long been associated with poor clinical outcomes in a variety of intensive care unit (ICU) patients. However, the impact of temporal changes in lactate has not been well established, and previous studies have some shortcomings in model building. The present study aims to investigate the association of initial lactate and normalization time with the death hazard by using a fractional polynomial Cox proportional hazards model.
Methods
A large clinical database, the Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II) database, was employed for the analysis. Demographics, comorbidities, and laboratory findings were extracted and compared between survivors and non-survivors using univariable analysis. A Cox proportional hazards model was built by purposeful selection of covariates, with initial lactate (L0) and normalization time (T) retained in the model. The best-fitting model was selected using the deviance difference test, and fractional polynomial regression models of different degrees were compared using a closed test procedure.
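As a rough illustration of this modeling step, the Python sketch below constructs the fractional polynomial terms for one candidate pair of powers and fits a Cox model with lifelines. The data frame and column names (t, d, L0, T_norm) are hypothetical synthetic stand-ins for the MIMIC-II extract, and the paper's full purposeful covariate selection and the closed test across all candidate power pairs are not reproduced.
```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

# Synthetic stand-in for the MIMIC-II extract (column names assumed)
rng = np.random.default_rng(2)
n = 1000
df = pd.DataFrame({
    "L0": rng.uniform(1.0, 10.0, n),        # initial lactate (mmol/L)
    "T_norm": rng.uniform(1.0, 150.0, n),   # lactate normalization time (hours)
})
df["t"] = rng.exponential(30, n)            # follow-up time
df["d"] = rng.integers(0, 2, n)             # death indicator

# Fractional polynomial terms for one candidate pair of powers (the pair
# ultimately reported by the authors: (-2, -1) for L0 and (0.5, 3) for T)
df = df.assign(
    L0_m2=df["L0"] ** -2,
    L0_m1=df["L0"] ** -1,
    T_p05=df["T_norm"] ** 0.5,
    T_p3=df["T_norm"] ** 3,
)

cph = CoxPHFitter()
cph.fit(df[["t", "d", "L0_m2", "L0_m1", "T_p05", "T_p3"]],
        duration_col="t", event_col="d")
cph.print_summary()

# Deviance (-2 * log partial likelihood), the quantity compared across candidate
# power pairs in the paper's deviance difference / closed test procedure
deviance = -2 * cph.log_likelihood_
print(deviance)
```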
Main results
A total of 6,291 ICU patients were identified as eligible for the present study, including 1,675 non-survivors and 4,616 survivors (mortality rate: 26.6%). Patients with lactate normalization had a significantly reduced hazard rate compared with those without normalization (log-rank test: P<0.05). The best powers of L0 in the model were −2 and −1, with a deviance of 19,944.51, and the best powers of T were 0.5 and 3, with a deviance of 7,965.63. The adjusted hazard ratios for the terms L0^−2 and L0^−1 were 1.13 (95% CI: 1.09–1.18) and 0.43 (95% CI: 0.34–0.54), and the adjusted hazard ratios for the terms T^0.5 and T^3 were 7.42 (95% CI: 2.85–19.36) and 3.06×10^−6 (95% CI: 3.01×10^−11–0.31).
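Taken at face value, the reported terms correspond to a proportional hazards model of roughly the following form (adjustment covariates omitted; this rewriting is an interpretation of the abstract, not a formula quoted from the paper):
```latex
h(t \mid L_0, T) = h_0(t)\,
  \exp\!\bigl(\beta_1 L_0^{-2} + \beta_2 L_0^{-1} + \gamma_1 T^{0.5} + \gamma_2 T^{3}\bigr),
\qquad
e^{\beta_1} = 1.13,\quad e^{\beta_2} = 0.43,\quad
e^{\gamma_1} = 7.42,\quad e^{\gamma_2} = 3.06\times 10^{-6}.
```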
Conclusions
Initial lactate on ICU admission is associated with the death hazard, and the relationship follows a fractional polynomial pattern with powers −2 and −1. Delayed normalization of lactate is predictive of a high risk of death when lactate is measured within 150 hours after ICU admission.
doi:10.3978/j.issn.2072-1439.2014.07.01
PMCID: PMC4120171  PMID: 25093098
Fractional polynomial; lactate normalization; intensive care unit (ICU); mortality; critically ill
