To test induction chemotherapy (IC) followed by concurrent chemoradiotherapy (CRT) or surgery/radiotherapy (RT) for advanced oropharyngeal cancer and to assess the effect of human papillomavirus (HPV) on response and outcome.
Patients and Methods
Sixty-six patients (51 male; 15 female) with stage III to IV squamous cell carcinoma of the oropharynx (SCCOP) were treated with one cycle of cisplatin (100 mg/m²) or carboplatin (AUC 6) plus fluorouracil (1,000 mg/m²/d for 5 days) to select candidates for CRT. Those achieving a greater than 50% response at the primary tumor received CRT (70 Gy in 35 fractions) with concurrent cisplatin (100 mg/m²) or carboplatin (AUC 6) every 21 days for three cycles. Adjuvant paclitaxel was given to patients who were complete histologic responders. Patients with a response of 50% or less underwent definitive surgery and postoperative radiation. Pretreatment biopsies from 42 patients were tested for high-risk HPV.
Fifty-four of 66 patients (82%) had a greater than 50% response after IC. Of these, 53 (98%) received CRT, and 49 (92%) obtained complete histologic response, with a 73.4% (47 of 64) rate of organ preservation. The 4-year overall survival (OS) was 70.4%, and the disease-specific survival (DSS) was 75.8% (median follow-up, 64.1 months). HPV16, found in 27 of 42 (64.3%) biopsies, was associated with younger age (median, 55 v 63 years; P = .016), sex (22 of 30 males [73.3%] and five of 12 females [41.7%]; P = .08), and nonsmoking status (P = .037). HPV titer was significantly associated with IC response (P = .001), CRT response (P = .005), OS (P = .007), and DSS (P = .008).
Although the numbers in this study are small, IC followed by CRT is an effective treatment for SCCOP, especially in patients with HPV-positive tumors; however, for patients who do not respond to treatment, alternative treatments must be developed.
We consider the problem of identifying a subgroup of patients who may have an enhanced treatment effect in a randomized clinical trial, and it is desirable that the subgroup be defined by a limited number of covariates. For this problem, the development of a standard, pre-determined strategy may help to avoid the well-known dangers of subgroup analysis. We present a method developed to find subgroups of enhanced treatment effect. This method, referred to as “Virtual Twins”, involves predicting response probabilities for treatment and control “twins” for each subject. The difference in these probabilities is then used as the outcome in a classification or regression tree, which can potentially include any set of the covariates. We define a measure Q(Â) to be the difference between the treatment effect in estimated subgroup Â and the marginal treatment effect. We present several methods developed to obtain an estimate of Q(Â), including estimation of Q(Â) using estimated probabilities in the original data, using estimated probabilities in newly simulated data, two cross-validation-based approaches, and a bootstrap-based bias-corrected approach. Results of a simulation study indicate that the Virtual Twins method noticeably outperforms logistic regression with forward selection when a true subgroup of enhanced treatment effect exists. Generally, large sample sizes or strong enhanced treatment effects are needed for subgroup estimation. As an illustration, we apply the proposed methods to data from a randomized clinical trial.
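To make the two Virtual Twins steps concrete, here is a minimal sketch. It assumes a binary response and a single categorical covariate, so the per-arm response probabilities can be estimated by cell means; this is a hypothetical stand-in for the random forests and regression tree described above, and all function names are illustrative. It computes each subject's twin difference z_i, takes Â to be the subjects with z_i above a threshold, and evaluates the naive plug-in estimate of Q(Â) on the original data.

```python
from collections import defaultdict


def cell_means(rows):
    """Estimate P(Y=1 | x, arm) as the observed response rate in each (x, arm) cell.

    Hypothetical stand-in for the random-forest step of Virtual Twins.
    rows: list of (x, treated, y) with x hashable, treated in {0, 1}, y in {0, 1}.
    """
    counts = defaultdict(lambda: [0, 0])  # (x, arm) -> [responders, subjects]
    for x, treated, y in rows:
        counts[(x, treated)][0] += y
        counts[(x, treated)][1] += 1
    return {key: r / n for key, (r, n) in counts.items()}


def twin_differences(rows):
    """z_i = P(Y=1 | x_i, treated) - P(Y=1 | x_i, control); each x needs both arms observed."""
    p = cell_means(rows)
    return [p[(x, 1)] - p[(x, 0)] for x, _, _ in rows]


def effect(rows):
    """Observed response rate in the treated arm minus that in the control arm."""
    trt = [y for _, t, y in rows if t == 1]
    ctl = [y for _, t, y in rows if t == 0]
    return sum(trt) / len(trt) - sum(ctl) / len(ctl)


def naive_q(rows, threshold):
    """Plug-in Q(A_hat): effect in A_hat = {i : z_i > threshold} minus the marginal effect."""
    z = twin_differences(rows)
    subgroup = [row for row, zi in zip(rows, z) if zi > threshold]
    return effect(subgroup) - effect(rows)


# Hypothetical toy data: covariate level "a" carries the enhanced effect.
rows = ([("a", 1, 1)] * 3 + [("a", 1, 0)] + [("a", 0, 1)] + [("a", 0, 0)] * 3
        + [("b", 1, 1)] * 2 + [("b", 1, 0)] * 2 + [("b", 0, 1)] * 2 + [("b", 0, 0)] * 2)
print(naive_q(rows, threshold=0.1))  # prints 0.25
```

Because the same data are used both to select Â and to evaluate it, this plug-in estimate tends to be optimistic, which is the motivation for the cross-validation and bootstrap corrections listed in the abstract.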
randomized clinical trials; subgroups; random forests; regression trees; tailored therapeutics
Intermediate outcome variables can often be used as auxiliary variables for the true outcome of interest in randomized clinical trials. For many cancers, time to recurrence is an informative marker in predicting a patient’s overall survival outcome, and could provide auxiliary information for the analysis of survival times.
To investigate whether models linking recurrence and death combined with a multiple imputation procedure for censored observations can result in efficiency gains in the estimation of treatment effects, and be used to shorten trial lengths.
Recurrence and death times are modeled using data from 12 trials in colorectal cancer. Multiple imputation is used as a strategy for handling missing values arising from censoring. The imputation procedure uses a cure model for time to recurrence and a time-dependent Weibull proportional hazards model for time to death. Recurrence times are imputed first, and death times are then imputed conditionally on recurrence times. To illustrate these methods, trials are artificially censored 2 years after the last accrual, the imputation procedure is implemented, and a log-rank test and a Cox model are used to analyze these new data and compare them with the original data.
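The recurrence-imputation step can be sketched as follows. This assumes a mixture cure model with a Weibull latency distribution for susceptible patients and treats its parameters as fixed; a full multiple-imputation procedure would redraw them (for example, from their posterior) for each imputed data set, and the subsequent death-time step conditional on recurrence is not shown. Function names and parameter values are hypothetical.

```python
import math
import random


def weibull_survival(t, shape, scale):
    """S(t) = exp(-(t/scale)^shape) for patients susceptible to recurrence."""
    return math.exp(-((t / scale) ** shape))


def prob_cured_given_censored(c, p_susceptible, shape, scale):
    """P(cured | no recurrence observed by time c) under a mixture cure model.

    By Bayes' rule: cured patients never recur, susceptible ones are still
    recurrence-free at c with probability S(c).
    """
    cured = 1.0 - p_susceptible
    return cured / (cured + p_susceptible * weibull_survival(c, shape, scale))


def impute_recurrence(c, p_susceptible, shape, scale, rng):
    """One imputation draw for a recurrence time censored at c.

    Returns math.inf for a patient imputed as cured; otherwise draws T > c by
    inverting S(t | T > c) = exp(-((t/scale)^shape - (c/scale)^shape)).
    """
    if rng.random() < prob_cured_given_censored(c, p_susceptible, shape, scale):
        return math.inf
    u = 1.0 - rng.random()  # u in (0, 1], so log(u) is finite
    return scale * ((c / scale) ** shape - math.log(u)) ** (1.0 / shape)


rng = random.Random(2024)
draws = [impute_recurrence(2.0, 0.7, 1.5, 3.0, rng) for _ in range(5)]
print(draws)
```

Inverting the conditional survival function guarantees every finite draw is at least the censoring time, so imputed values never contradict what was observed.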
The results show modest but consistent gains in efficiency from using the auxiliary information in recurrence times. Comparison of the analyses shows that the treatment effect estimates and log-rank test results from the 2-year censored imputed data lie between those from the original data and those from the artificially censored data, indicating that the procedure was able to recover some of the information lost to censoring.
The models used are all fully parametric, requiring distributional assumptions of the data.
The proposed models may be useful for improving the efficiency of treatment effect estimation in cancer trials and for shortening trial length.
Auxiliary Variables; Colon Cancer; Cure Models; Multiple Imputation; Surrogate Endpoints
A common side effect experienced by head and neck cancer patients after radiotherapy (RT) is impairment of the parotid glands’ ability to produce saliva. Our purpose is to investigate the relationship between radiation dose and saliva changes in the two years following treatment.
Methods and Materials
The study population included 142 patients treated with conformal or intensity modulated radiotherapy. Saliva flow rates from 266 parotid glands were measured before and 1, 3, 6, 12, 18, and 24 months after treatment. Measurements were collected separately from each gland under both stimulated and unstimulated conditions. Bayesian nonlinear hierarchical models were developed and fit to the data.
Parotid glands receiving higher radiation doses produce less saliva. The largest reduction occurs 1–3 months after RT, followed by gradual recovery. When mean doses are lower (e.g., <25 Gy), the model-predicted average stimulated saliva recovers to pre-treatment levels at 12 months and exceeds them at 18 and 24 months. For higher doses (e.g., >30 Gy), stimulated saliva does not return to original levels after two years. Without stimulation, the predicted saliva at 24 months is 86% of pre-treatment levels for 25 Gy and <31% for >40 Gy. We find no evidence that the over-production of stimulated saliva at 18 and 24 months after a low dose to one parotid gland is due to low saliva production from the other parotid gland.
Saliva production is significantly impacted by radiation, but with doses <25–30 Gy, recovery is substantial and saliva production returns to pre-treatment levels two years after RT.
Head and neck cancer; Intensity modulated radiation therapy; Parotid salivary glands; Radiation dose; Bayesian analysis
In clinical trials, a surrogate outcome variable (S) can be measured before the outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Using the principal surrogacy framework introduced by Frangakis and Rubin (2002. Principal stratification in causal inference. Biometrics 58, 21–29), we consider an approach that has a causal interpretation and develop a Bayesian estimation strategy for surrogate validation when the joint distribution of potential surrogate and outcome measures is multivariate normal. From the joint conditional distribution of the potential outcomes of T given the potential outcomes of S, we derive surrogacy validation measures. Because the model is not fully identifiable from the data, we propose reasonable prior distributions and assumptions that can be placed on weakly identified parameters to aid in estimation. We explore the relationship between our surrogacy measures and those proposed by Prentice (1989. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431–440). The method is applied to data from a macular degeneration study and an ovarian cancer study.
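For intuition about measures derived from the multivariate normal model, the sketch below computes the intercept and slope of E[T(1) - T(0) | S(1) - S(0) = s] from a mean vector and covariance matrix for (S(0), S(1), T(0), T(1)). Using this regression of the causal effect on T on the causal effect on S as the summary of interest is an assumption made for illustration. The cross-world covariances, such as Cov(S(0), S(1)), are exactly the weakly identified quantities that need priors, and the numbers below are arbitrary.

```python
def contrast_moments(mean, cov, a, b):
    """Means of a'X and b'X, Var(a'X), and Cov(a'X, b'X) for X with the given mean and covariance."""
    n = len(mean)
    mean_a = sum(a[i] * mean[i] for i in range(n))
    mean_b = sum(b[i] * mean[i] for i in range(n))
    var_a = sum(a[i] * cov[i][j] * a[j] for i in range(n) for j in range(n))
    cov_ab = sum(a[i] * cov[i][j] * b[j] for i in range(n) for j in range(n))
    return mean_a, mean_b, var_a, cov_ab


def causal_effect_regression(mean, cov):
    """Intercept and slope of E[T1 - T0 | S1 - S0 = s] for (S0, S1, T0, T1) multivariate normal."""
    delta_s = [-1.0, 1.0, 0.0, 0.0]  # S1 - S0
    delta_t = [0.0, 0.0, -1.0, 1.0]  # T1 - T0
    mean_s, mean_t, var_s, cov_st = contrast_moments(mean, cov, delta_s, delta_t)
    slope = cov_st / var_s
    intercept = mean_t - slope * mean_s  # expected effect on T when Z does not move S
    return intercept, slope


# Arbitrary illustrative parameters; in practice several entries are weakly identified.
mean = [0.0, 1.0, 0.0, 2.0]
cov = [[1.0, 0.5, 0.5, 0.25],
       [0.5, 1.0, 0.25, 0.5],
       [0.5, 0.25, 1.0, 0.5],
       [0.25, 0.5, 0.5, 1.0]]
print(causal_effect_regression(mean, cov))
```

An intercept near zero would indicate little treatment effect on T among subjects whose S is unaffected by treatment, and a positive slope would indicate that larger effects on S accompany larger effects on T.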
Bayesian estimation; Principal stratification; Surrogate endpoints
It has been postulated that gastroesophageal reflux plays a role in the etiology of head and neck squamous cell carcinomas (HNSCC) and contributes to complications after surgery or during radiotherapy. Antacid medications are commonly used in HNSCC patients for the management of acid reflux; however, their relationship with outcomes has not been well studied.
Associations between use of histamine receptor-2 antagonists (H2RAs) and proton pump inhibitors (PPIs) and treatment outcomes were determined in 596 previously untreated HNSCC patients enrolled in our SPORE epidemiology program from 2003 to 2008 (median follow-up, 55 months). Comprehensive clinical information was entered prospectively in our database. Risk strata were created based on possible confounding prognostic variables (age, demographics, socioeconomics, tumor stage, primary site, smoking status, HPV-16 status, and treatment modality); correlations within risk strata were analyzed in a multivariable model.
Patients taking antacid medications had significantly better overall survival (PPI alone, p<0.001; H2RA alone, p=0.0479; both PPI+H2RA, p=0.0133). In multivariable Cox models adjusting for significant prognostic covariates, both PPI and H2RA use were significant prognostic factors for overall survival, but only H2RA use was a significant factor for recurrence-free survival in HPV16-positive oropharyngeal patients. We found significant associations between use of H2RAs and PPIs, alone or in combination, and various clinical characteristics.
The findings in this large cohort study indicate that routine use of antacid medications may have significant therapeutic benefit in HNSCC patients. The reasons for this association remain an active area of investigation and could lead to identification of new treatment and prevention approaches with agents that have minimal toxicities.
Head and neck squamous cell carcinoma (HNSCC); antacid medications; proton pump inhibitors (PPI); histamine receptor 2 antagonists (H2RA); clinical outcome; survival
Selection of dose for cancer patients treated with radiation therapy (RT) must balance the increased efficacy against the increased toxicity associated with higher dose. Historically, a single dose has been selected for a population of patients (e.g., all stage III non-small cell lung cancer). However, the availability of new biologic markers for toxicity and efficacy makes it possible to select a more personalized dose. We consider the use of statistical models for toxicity and efficacy as functions of RT dose and biomarkers to select an optimal dose for an individual patient, defined as the dose that maximizes the probability of efficacy minus the sum of weighted toxicity probabilities. This function can be shown to equal the expected utility derived from a particular family of bivariate outcome utility matrices. We show that if dose is linearly related to the probabilities of toxicity and efficacy, then any marker that acts only additively with dose cannot improve efficacy without also increasing toxicity. Using a dataset of lung cancer patients treated with RT, we illustrate this approach and compare it with non-marker-based dose selection. Because typical metrics used in evaluating new markers (e.g., area under the ROC curve) do not directly address the ability of a marker to improve efficacy at a fixed probability of toxicity, we use a simulation study to assess the effects of marker-based dose selection on toxicity and efficacy outcomes.
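The dose-selection criterion (probability of efficacy minus weighted toxicity probabilities) can be illustrated with a grid search. The logistic model form, the dose scaling (units of 10 Gy), the dose-by-marker interaction, and every coefficient below are hypothetical; the sketch only shows how a marker that modifies the toxicity dose-response can shift the selected dose.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def utility(dose, marker, eff_coef, tox_coefs, weights):
    """P(efficacy) minus the weighted sum of toxicity probabilities.

    Each coefficient tuple is (intercept, dose, marker, dose*marker) on the
    logit scale; this model form is an assumption for illustration.
    """
    def prob(coef):
        b0, b_dose, b_marker, b_inter = coef
        return expit(b0 + b_dose * dose + b_marker * marker + b_inter * dose * marker)

    return prob(eff_coef) - sum(w * prob(c) for w, c in zip(weights, tox_coefs))


def best_dose(marker, doses, eff_coef, tox_coefs, weights):
    """Dose on the grid that maximizes the utility for this marker value."""
    return max(doses, key=lambda d: utility(d, marker, eff_coef, tox_coefs, weights))


doses = [5.0, 6.0, 7.0, 8.0]        # hypothetical grid, in units of 10 Gy
eff = (-4.0, 0.8, 0.0, 0.0)         # efficacy does not depend on the marker here
tox = [(-9.0, 1.0, 0.0, 0.5)]       # marker steepens the toxicity dose-response
for m in (0, 1):
    print(m, best_dose(m, doses, eff, tox, weights=[1.0]))
```

With these made-up values, marker-negative patients are assigned a higher dose than marker-positive patients, because the marker raises toxicity faster as dose increases.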
dose finding; phase I; biomarkers; radiation therapy; utilities
In this paper, we consider the problem of constructing confidence intervals (CIs) for G independent normal population means subject to linear ordering constraints. For this problem, CIs based on asymptotic distributions, likelihood ratio tests, and the bootstrap do not have good properties, particularly when some of the population means are close to each other. We propose a new method based on defining intermediate random variables that are related to the original observations and using the CIs of the means of these intermediate random variables to restrict the original CIs from the separate groups. The coverage rates of the intervals are shown to exceed, but be close to, the nominal level for two groups when the ratio of the variances is assumed known. Simulation studies show that the proposed CIs have coverage rates close to nominal levels with reduced average widths. Data on half-lives of an antibiotic are analyzed to illustrate the method.
Key words and phrases: Convex combination; Elliptical unimodal distribution; Linear ordering; Normal distribution; Restricted confidence interval
With challenges in data harmonization and covariate heterogeneity across data sources, meta-analysis of gene-environment interaction studies often involves subtle statistical issues. In this paper, we study the effect of environmental covariate heterogeneity (within and between cohorts) on two approaches to fixed-effect meta-analysis: the standard inverse-variance weighted meta-analysis and a meta-regression approach. Akin to the results in Simmonds and Higgins (2007), we obtain analytic efficiency results for both methods under the assumption of gene-environment independence. The relative efficiency of the two methods depends on the ratio of within- versus between-cohort variability of the environmental covariate. We propose an adaptively weighted estimator (AWE), which combines meta-analysis and meta-regression, for the interaction parameter. The AWE retains the full efficiency of the joint analysis using individual-level data under certain natural assumptions. Lin and Zeng (2010a, b) showed that a multivariate inverse-variance weighted estimator is also asymptotically as efficient as the joint analysis using individual-level data, provided the estimates with full covariance matrices for all common parameters are pooled across all studies. We show that our results are consistent with those of Lin and Zeng (2010a, b). Without sacrificing much efficiency, the AWE uses only univariate summary statistics from each study and bypasses issues with sharing individual-level data or full covariance matrices across studies. We compare the performance of the methods both analytically and numerically. The methods are illustrated through a meta-analysis of the interaction between single nucleotide polymorphisms in the FTO gene and body mass index on high-density lipoprotein cholesterol, using data from a set of eight studies of type 2 diabetes.
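As a reference point for the estimators compared above, this sketch computes the standard fixed-effect inverse-variance weighted estimate and, separately, a generic minimum-variance linear combination (weights summing to one) of two unbiased estimators with known variances and covariance. The second function only illustrates the idea of adaptive weighting; it is an assumption and not the AWE weighting derived in the paper. The study estimates are hypothetical.

```python
import math


def inverse_variance_pool(estimates, std_errors):
    """Fixed-effect inverse-variance weighted estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def min_variance_combination(est1, var1, est2, var2, cov12=0.0):
    """Combine two unbiased estimators as w*est1 + (1-w)*est2 with variance-minimizing w."""
    w = (var2 - cov12) / (var1 + var2 - 2.0 * cov12)
    combined = w * est1 + (1.0 - w) * est2
    variance = w * w * var1 + (1.0 - w) ** 2 * var2 + 2.0 * w * (1.0 - w) * cov12
    return combined, variance


# Hypothetical interaction estimates from two studies.
print(inverse_variance_pool([0.2, 0.4], [0.1, 0.2]))
print(min_variance_combination(0.2, 0.01, 0.4, 0.04))
```

With zero covariance the two functions agree, since the variance-minimizing weight then reduces to the inverse-variance weight.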
ADAPTIVELY WEIGHTED ESTIMATOR; COVARIATE HETEROGENEITY; GENE-ENVIRONMENT INTERACTION; INDIVIDUAL PATIENT DATA; META-ANALYSIS; META-REGRESSION; POWER CALCULATION
The US National Cancer Institute (NCI), in collaboration with scientists representing multiple areas of expertise relevant to ‘omics’-based test development, has developed a checklist of criteria that can be used to determine the readiness of omics-based tests for guiding patient care in clinical trials. The checklist criteria cover issues relating to specimens, assays, mathematical modelling, clinical trial design, and ethical, legal and regulatory aspects. Funding bodies and journals are encouraged to consider the checklist, which they may find useful for assessing study quality and evidence strength. The checklist will be used to evaluate proposals for NCI-sponsored clinical trials in which omics tests will be used to guide therapy.
We consider the problem of variable selection for monotone single-index models. A single-index model assumes that the expectation of the outcome is an unknown function of a linear combination of covariates. Assuming monotonicity of the unknown function is often reasonable and allows for more straightforward inference. We present an adaptive LASSO penalized least squares approach to estimating the index parameter and the unknown function in these models for a continuous outcome. Monotone function estimates are achieved using the pooled adjacent violators algorithm, followed by kernel regression. In the iterative estimation process, a linear approximation to the unknown function is used, reducing the problem to linear regression and allowing the use of standard LASSO algorithms, such as coordinate descent. Results of a simulation study indicate that the proposed methods perform well under a variety of circumstances, and that an assumption of monotonicity, when appropriate, noticeably improves performance. The proposed methods are applied to data from a randomized clinical trial for the treatment of a critical illness in the intensive care unit.
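The monotone-function step (pooled adjacent violators followed by kernel regression) might look like the sketch below. It assumes the outcomes are already sorted by the current index values x_i'β, uses a Gaussian Nadaraya-Watson smoother with an arbitrary bandwidth, and omits the adaptive LASSO update of β; all names are illustrative.

```python
import math


def pava(y, w=None):
    """Pool adjacent violators: weighted least-squares nondecreasing fit to y (sorted by index)."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def kernel_smooth(x_grid, x, fitted, bandwidth):
    """Nadaraya-Watson smoothing of the isotonic fit with a Gaussian kernel."""
    out = []
    for x0 in x_grid:
        k = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in x]
        out.append(sum(ki * fi for ki, fi in zip(k, fitted)) / sum(k))
    return out


index = [0.1, 0.4, 0.5, 0.9]   # hypothetical sorted index values x_i'beta
y = [1.0, 3.0, 2.0, 4.0]
iso = pava(y)                  # [1.0, 2.5, 2.5, 4.0]
print(kernel_smooth([0.0, 0.5, 1.0], index, iso, bandwidth=0.2))
```

PAVA yields a step function; the kernel step turns it into a smooth curve, which is convenient when a derivative of the link is needed for the linear approximation mentioned above.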
Adaptive LASSO; Isotonic regression; Kernel estimator; Single-index models; Variable selection
In this paper, we develop a Bayesian approach to estimate a Cox proportional hazards model that allows a threshold in the regression coefficient based on a threshold in a covariate, when some fraction of subjects are not susceptible to the event of interest. A data augmentation scheme with latent binary cure indicators is adopted to simplify the Markov chain Monte Carlo implementation. Given the binary cure indicators, the Cox cure model reduces to a standard Cox model and a logistic regression model. Furthermore, the threshold detection problem reverts to a threshold problem in a regular Cox model. The baseline cumulative hazard for the Cox model is formulated non-parametrically using counting processes with a gamma process prior. Simulation studies demonstrate that the method provides accurate point and interval estimates. Application to a data set of oropharynx cancer patients suggests a significant threshold in age at diagnosis such that the effect of gender on disease-specific survival changes after the threshold.
threshold; Cox model; cure model; mixture model; Markov chain Monte Carlo
In this paper, we consider two-stage designs with failure-time endpoints in single-arm phase II trials. We propose designs in which stopping rules are constructed by comparing the Bayes risk of stopping at stage one with the expected Bayes risk of continuing to stage two, using both the observed data in stage one and the predicted survival data in stage two. Terminal decision rules are constructed by comparing the posterior expected loss of a rejection decision with that of an acceptance decision. Simple threshold loss functions are applied to time-to-event data modelled either parametrically or non-parametrically, and the cost parameters in the loss structure are calibrated to obtain the desired type I error and power. We ran simulation studies to evaluate design properties, including type I and type II error rates, probability of early stopping, expected sample size, and expected trial duration, and compared them with Simon's two-stage designs and with an extension of Simon's designs to time-to-event endpoints. An example based on a recently conducted phase II sarcoma trial illustrates the method.
Proinflammatory cytokine levels may be associated with cancer stage, recurrence, and survival. A study was undertaken to determine if cytokine levels were associated with dietary patterns and fat-soluble micronutrients in previously untreated head and neck squamous cell carcinoma (HNSCC) patients.
This was a cross-sectional study of 160 newly diagnosed HNSCC patients who completed pretreatment food frequency questionnaires (FFQ) and health surveys. Dietary patterns were derived from FFQs using principal component analysis. Pretreatment serum levels of the proinflammatory cytokines IL-6, TNF-α, and IFN-γ were measured by ELISA and serum carotenoid and tocopherol levels by HPLC. Multivariable ordinal logistic regression models examined associations between cytokines and quartiles of reported and serum dietary variables.
Three dietary patterns emerged: whole foods, Western, and convenience foods. In multivariable analyses, higher whole foods pattern scores were significantly associated with lower levels of IL-6, TNF-α, and IFN-γ (P < 0.001, P = 0.008, and P = 0.03, respectively). Significant inverse associations were observed between IL-6, TNF-α, and IFN-γ levels and quartiles of total reported carotenoid intake (P = 0.006, P = 0.04, and P = 0.04, respectively). There was an inverse association between IFN-γ levels and serum α-tocopherol levels (P = 0.03).
Consuming a pretreatment diet rich in vegetables, fruit, fish, poultry and whole grains may be associated with lower proinflammatory cytokine levels in patients with HNSCC.
dietary patterns; carotenoids; cytokines; head and neck cancer
In this commentary we discuss several challenges of current relevance to the design of clinical trials in oncology. We argue that the compartmentalization of trials into the three standard phases, with non-overlapping aims, is not necessary and may in fact slow the clinical development of agents. Combined phase I/II trials and/or phase I trials that at minimum collect efficacy data, and more optimally include a preliminary measure of efficacy in dosing determination, should be more widely used. Similarly, we posit that randomized phase II trials should be used more frequently, as opposed to the traditional single-arm phase II trial compared against historical data, which usually does not have a valid comparison group. The use of non-binary endpoints is a simple modification that can improve the efficiency of early-phase trials. The heterogeneity in scientific goals and contexts in early-phase oncology trials is considerable, and the potential to improve designs to match these goals is great. Our overall premise is that the oncology clinical trial community stands to benefit greatly from moving away from the one-size-fits-all paradigm of trial design, and that more flexible and efficient designs tailored to the goals of each study are currently available and being used successfully.
Efficient designs; flexible adaptive designs; summarization of information; randomized phase II; phase I/II
We propose a Phase I/II trial design in which subjects with dose-limiting toxicity are not followed for response, leading to three possible outcomes for each subject: dose-limiting toxicity, absence of therapeutic response without dose-limiting toxicity, and presence of therapeutic response without dose-limiting toxicity. We define the latter outcome as a ‘success,’ and the goal of the trial is to identify the dose with the largest probability of success. This dose is commonly referred to as the most successful dose. We propose a design that accumulates information on subjects with regard to both dose-limiting toxicity and response conditional on no dose-limiting toxicity. Bayesian methods are used to update the estimates of dose-limiting toxicity and response probabilities when each subject is enrolled, and we use these methods to determine the dose level assigned to each subject. Due to the need to explore doses more fully, each subject is not necessarily assigned the current estimate of the most successful dose; our algorithm may instead assign a dose that is in a neighborhood of the current most successful dose. We examine the ability of our design to correctly identify the most successful dose in a variety of settings via simulation and compare the performance of our design to that of competing approaches.
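The most-successful-dose criterion can be sketched with independent Beta(1, 1) priors at each dose, an assumption chosen for brevity rather than the dose-response model used in the design. The probability of success is estimated as P(no DLT) × P(response | no DLT) from posterior means; the neighborhood-based exploration rule is not shown, and the counts are hypothetical.

```python
def posterior_mean(successes, trials, a=1.0, b=1.0):
    """Posterior mean of a binomial probability under a Beta(a, b) prior."""
    return (a + successes) / (a + b + trials)


def most_successful_dose(dlt, n, resp):
    """Index of the dose maximizing estimated P(success) = P(no DLT) * P(response | no DLT).

    dlt[d]: subjects with a DLT at dose d; n[d]: subjects treated at dose d;
    resp[d]: responders among the n[d] - dlt[d] subjects without a DLT.
    """
    scores = []
    for d in range(len(n)):
        p_tox = posterior_mean(dlt[d], n[d])
        p_resp = posterior_mean(resp[d], n[d] - dlt[d])  # only DLT-free subjects are followed
        scores.append((1.0 - p_tox) * p_resp)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


best, scores = most_successful_dose(dlt=[0, 1, 4], n=[6, 6, 6], resp=[1, 4, 2])
print(best, [round(s, 3) for s in scores])
```

Because responses are observed only in DLT-free subjects, the response model is updated with n[d] - dlt[d] trials rather than n[d], mirroring the three-outcome structure described in the abstract.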
Dose-finding studies; early-phase clinical trials; most successful dose; adaptive design
When an outcome of interest in a clinical trial is late-occurring or difficult to obtain, surrogate markers can extract information about the effect of the treatment on the outcome of interest. Understanding associations between the causal effect of treatment on the outcome and the causal effect of treatment on the surrogate is critical to understanding the value of a surrogate from a clinical perspective.
Traditional regression approaches to determining the proportion of the treatment effect explained by surrogate markers suffer from several shortcomings: they can be unstable and can lie outside the 0–1 range. Further, they do not account for the fact that surrogate measures are obtained post-randomization, so the surrogate-outcome relationship may be subject to unmeasured confounding. Methods that avoid these problems are of key importance.
Frangakis and Rubin (Frangakis C, Rubin DB. Principal stratification in causal inference. Biometrics 2002; 58:21–9) suggested assessing the causal effect of treatment within pre-randomization “principal strata” defined by the counterfactual joint distribution of the surrogate marker under the different treatment arms, with the proportion of the overall outcome causal effect attributable to subjects for whom the treatment affects the proposed surrogate as the key measure of interest. Li, Taylor, and Elliott (Li Y, Taylor JMG, Elliott MR. Bayesian approach to surrogacy assessment using principal stratification in clinical trials. Biometrics 2010; 66:523–31) developed this “principal surrogacy” approach for dichotomous markers and outcomes, using Bayesian methods that accommodated non-identifiability in the model parameters. Because the surrogate marker is typically observed early, outcome data are often missing. Here we extend the approach of Li, Taylor, and Elliott to accommodate missing data in the observable final outcome under ignorable and non-ignorable settings. We also allow for the possibility that missingness has a counterfactual component, a feature that previous literature has not addressed.
We apply the proposed methods to a trial of glaucoma control comparing surgery versus medication, in which intraocular pressure (IOP) control at 12 months is a surrogate for IOP control at 96 months. We also conduct a series of simulations to consider the impact of non-ignorability, sensitivity to priors, and the ability of the Deviance Information Criterion to choose the correct model when parameters are not fully identified.
Because model parameters cannot be fully identified from the data, informative priors can introduce non-trivial bias in moderate sample size settings, while less informative priors can yield wide credible intervals.
Assessing the linkage between causal effects of treatment on a surrogate marker and causal effects of a treatment on an outcome is important to understanding the value of a marker. These causal effects are not fully identifiable: hence we explore the sensitivity and identifiability aspects of these models and show that relatively weak assumptions can still yield meaningful results.
Causal Inference; Surrogate Marker; Bayesian Analysis; Identifiability; Non-response; Counterfactual
Use of the Continual Reassessment Method (CRM) and other model-based approaches to Phase I trial design has increased because the CRM identifies the maximum tolerated dose (MTD) better than the 3+3 method. However, the CRM can be sensitive to the variance selected for the prior distribution of the model parameter, especially when few patients are enrolled. While methods have been proposed to adaptively select skeletons and to calibrate the prior variance at the beginning of a trial, no approach has been developed to adaptively calibrate the prior variance throughout a trial. We propose three systematic approaches to adaptively calibrate the prior variance during a trial and compare them via simulation with methods that calibrate the variance at the beginning of a trial.
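To show where the prior variance enters, here is a sketch of a CRM posterior update under the commonly used one-parameter power model p_d = skeleton_d^exp(a) with a ~ N(0, σ²), evaluated by brute-force grid integration. The skeleton, target, and σ values are hypothetical, and none of the adaptive calibration approaches proposed above is implemented.

```python
import math


def crm_posterior_probs(skeleton, doses_given, dlts, prior_sd, grid_size=2001, half_width=6.0):
    """Posterior mean DLT probability per dose under p_d = skeleton[d] ** exp(a), a ~ N(0, prior_sd**2).

    doses_given: dose index for each treated patient; dlts: 1 if that patient had a DLT, else 0.
    The posterior of a is evaluated on an evenly spaced grid over +/- half_width prior SDs.
    """
    lo = -half_width * prior_sd
    step = 2.0 * half_width * prior_sd / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    log_post = []
    for a in grid:
        lp = -0.5 * (a / prior_sd) ** 2  # log prior, up to a constant
        for d, y in zip(doses_given, dlts):
            log_p = math.exp(a) * math.log(skeleton[d])  # log(skeleton[d] ** exp(a))
            lp += log_p if y else math.log(-math.expm1(log_p))  # stable log(1 - p)
        log_post.append(lp)
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]  # unnormalized posterior on the grid
    total = sum(weights)
    return [sum(w * s ** math.exp(a) for w, a in zip(weights, grid)) / total for s in skeleton]


def next_dose(post_probs, target):
    """Index of the dose whose posterior DLT probability is closest to the target."""
    return min(range(len(post_probs)), key=lambda d: abs(post_probs[d] - target))


skeleton = [0.05, 0.10, 0.20, 0.30, 0.45]   # hypothetical prior guesses of DLT rates
for sd in (0.5, 2.0):                       # same data, different prior SDs
    post = crm_posterior_probs(skeleton, [0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 1], sd)
    print(sd, next_dose(post, target=0.25), [round(p, 3) for p in post])
```

Running the same small data set under two prior SDs is a quick way to see the sensitivity that motivates calibrating the variance, since early in a trial the prior can dominate the recommended dose.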
adaptive design; Bayes factor; dose-finding study; dose-escalation study
With advancement in genomic technologies, it is common that two high-dimensional datasets are available, both measuring the same underlying biological phenomenon with different techniques. We consider predicting a continuous outcome Y using X, a set of p markers which is the best available measure of the underlying biological process. This same biological process may also be measured by W, coming from a prior technology but correlated with X. On a moderately sized sample, we have (Y,X,W), and on a larger sample we have (Y,W). We utilize the data on W to boost the prediction of Y by X. When p is large and the subsample containing X is small, this is a p>n situation. When p is small, this is akin to the classical measurement error problem; however, ours is not the typical goal of calibrating W for use in future studies. We propose to shrink the regression coefficients β of Y on X toward different targets that use information derived from W in the larger dataset. We compare these proposals with the classical ridge regression of Y on X, which does not use W. We also unify all of these methods as targeted ridge estimators. Finally, we propose a hybrid estimator which is a linear combination of multiple estimators of β. With an optimal choice of weights, the hybrid estimator balances efficiency and robustness in a data-adaptive way to theoretically yield a smaller prediction error than any of its constituents. The methods, including a fully Bayesian alternative, are evaluated via simulation studies. We also apply them to a gene-expression dataset. mRNA expression measured via quantitative real-time polymerase chain reaction is used to predict survival time in lung cancer patients, with auxiliary information from microarray technology available on a larger sample.
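The targeted ridge idea can be sketched as the minimizer of ||y - Xb||² + λ||b - target||², whose closed form is b = (XᵀX + λI)⁻¹(Xᵀy + λ·target); setting target = 0 recovers classical ridge regression. How the targets are built from W in the larger sample is not shown, so `target` is a hypothetical input, and the small Gaussian-elimination solver is included only to keep the example dependency-free.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def targeted_ridge(X, y, lam, target):
    """argmin_b ||y - X b||^2 + lam * ||b - target||^2 via the normal equations."""
    p = len(X[0])
    lhs = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) + lam * target[i] for i in range(p)]
    return solve(lhs, rhs)


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -1.0, 1.0]
print(targeted_ridge(X, y, 1.0, [0.0, 0.0]))  # classical ridge: shrink toward zero
print(targeted_ridge(X, y, 1.0, [1.5, 0.0]))  # shrink toward a hypothetical W-derived target
```

The penalty λ controls how strongly the estimate is pulled toward the target, which is the lever a hybrid or data-adaptive combination of such estimators would tune.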
Cross-validation; Generalized ridge; Mean squared prediction error; Measurement error
Health behaviors have been shown to be associated with recurrence risk and survival rates in cancer patients and are also associated with Interleukin-6 levels, but few epidemiologic studies have investigated the relationship between health behaviors and Interleukin-6 in cancer populations. The purpose of this study is to examine the relationship between five health behaviors (smoking, alcohol problems, body mass index [a marker of nutritional status], physical activity, and sleep) and pretreatment Interleukin-6 levels in persons with head and neck cancer.
Patients (N=409) were recruited in otolaryngology clinic waiting rooms and invited to complete written surveys. A medical record audit was also conducted. Descriptive statistics and multivariate analyses were conducted to determine which health behaviors were associated with higher Interleukin-6 levels controlling for demographic and clinical variables among newly diagnosed head and neck cancer patients.
While smoking, alcohol problems, body mass index, physical activity, and sleep were associated with Interleukin-6 levels in bivariate analysis, only smoking (current and former) and decreased sleep were independent predictors of higher Interleukin-6 levels in multivariate regression analysis. Covariates associated with higher Interleukin-6 levels were age and higher tumor stage, while comorbidities were marginally significant.
Health behaviors, particularly smoking and sleep disturbances, are associated with higher Interleukin-6 levels among head and neck cancer patients.
Treating health behavior problems, especially smoking and sleep disturbances, may help decrease Interleukin-6 levels, which could have a beneficial effect on overall cancer treatment outcomes.
head and neck/oral cancers; tobacco; cytokines; diet, alcohol, smoking, and other lifestyle risk factors; molecular markers in prevention research
The increasing availability and use of predictive models to facilitate informed decision making highlights the need for careful assessment of the validity of these models. In particular, models involving biomarkers require careful validation for two reasons: issues with overfitting when complex models involve a large number of biomarkers, and inter-laboratory variation in assays used to measure biomarkers. In this paper we distinguish between internal and external statistical validation. Internal validation, involving training-testing splits of the available data or cross-validation, is a necessary component of the model building process and can provide valid assessments of model performance. External validation consists of assessing model performance on one or more datasets collected by different investigators from different institutions. External validation is a more rigorous procedure necessary for evaluating whether the predictive model will generalize to populations other than the one on which it was developed. We stress the need for an external dataset to be truly external, that is, to play no role in model development and ideally be completely unavailable to the researchers building the model. In addition to reviewing different types of validation, we describe different types and features of predictive models and strategies for model building, as well as measures appropriate for assessing their performance in the context of validation. No single measure can characterize the different components of the prediction, and the use of multiple summary measures is recommended.
To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node (SLN) biopsy (SLNB). Several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests and support vector machines. We sought to validate recently published models meant to predict sentinel node status.
We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Predictions were generated from four published models, and we calculated the same metrics that the original authors reported: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR).
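The three metrics can be computed from confusion-matrix counts. Here a "negative" prediction means the model predicts a negative sentinel node, so that patient would be spared SLNB.

This Python sketch assumes the standard definitions:

- NPV = TN/(TN+FN)
- RNP = (TN+FN)/N
- FNR = FN/(FN+TP)

The published models may have used slightly different conventions. The class and function names are hypothetical, and the counts in the usage lines are invented for illustration, not study data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    tp: int  # predicted SLN-positive, truly SLN-positive
    fp: int  # predicted SLN-positive, truly SLN-negative
    tn: int  # predicted SLN-negative, truly SLN-negative
    fn: int  # predicted SLN-negative, truly SLN-positive (missed metastasis)


def from_predictions(predicted: Sequence[bool], actual: Sequence[bool]) -> Confusion:
    """Tally counts; True means SLN-positive (hypothetical helper)."""
    pairs = list(zip(predicted, actual))
    return Confusion(
        tp=sum(p and a for p, a in pairs),
        fp=sum(p and not a for p, a in pairs),
        tn=sum(not p and not a for p, a in pairs),
        fn=sum(not p and a for p, a in pairs),
    )


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")  # undefined when the denominator is zero


def npv(c: Confusion) -> float:
    return _ratio(c.tn, c.tn + c.fn)  # share of negative predictions that are correct


def rnp(c: Confusion) -> float:
    return _ratio(c.tn + c.fn, c.tp + c.fp + c.tn + c.fn)  # share of biopsies avoided


def fnr(c: Confusion) -> float:
    return _ratio(c.fn, c.fn + c.tp)  # share of SLN-positive patients the model misses


if __name__ == "__main__":
    c = Confusion(tp=10, fp=40, tn=45, fn=5)  # hypothetical counts
    print(npv(c), rnp(c), fnr(c))  # 0.9 0.5 0.3333333333333333
```

RNP plays the role of the biopsy reduction rate: a model can achieve a high NPV while predicting very few negatives, which is why NPV alone does not show whether a model would spare patients a biopsy.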
Logistic regression performed comparably on our data in terms of NPV (89.4% vs. 93.6%); however, the model's specificity was not high enough to meaningfully reduce the rate of biopsies (SLNB reduction rate of 2.9%). When applied to our data, the classification tree produced a lower NPV (87.7% vs. 94.1%) and a lower biopsy reduction rate (14.3% vs. 29.8%). Two published models could not be applied to our data because of model complexity and the use of proprietary software.
Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset or could not be validated. Differences in selection criteria and histopathologic interpretation likely explain the underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow both validation and, ultimately, clinical utility.
This study was designed to (1) describe the demographics of participants in, and (2) determine the efficacy of, a head and neck cancer screening program in order to optimize future programs.
After IRB approval, we conducted a retrospective cohort study to review a single institution's 14-year experience (1996–2009) conducting a free annual head and neck cancer screening clinic. Available demographic and clinical data, as well as clinical outcomes, were analyzed for all participants (n=761). The primary outcome was the presence of a finding suspicious for head and neck cancer on screening evaluation.
Five percent of participants had findings suspicious for head and neck cancer on screening evaluation, and malignant or premalignant lesions were confirmed in one percent of participants. Lack of insurance (p=.05), tobacco use (p<.001), male gender (p=.03), separated marital status (p=.03), and younger age (p=.04) were the significant demographic predictors of a lesion suspicious for malignancy. Patients complaining of a neck mass (p<.001) or oral pain (p<.001) were significantly more likely to have findings suspicious for malignancy. A substantial proportion of participants (40%) were diagnosed with benign otolaryngologic conditions on screening evaluation.
A minority of patients presenting to a head and neck cancer screening clinic will have a suspicious lesion identified. Given these findings, in order to achieve maximal potential benefit, future head and neck cancer screening clinics should target patients with identifiable risk factors and take full advantage of opportunities for education and prevention.
This study was designed to (1) determine the perceived quality of care received by patients with head and neck cancer at the end of their lives, in order to (2) better anticipate and improve upon the experiences of future patients.
Single-institution, academic tertiary care medical center.
Subjects and Methods
A validated survey instrument, the Family Assessment of Treatment at the End of life (FATE), was administered to families of patients who died of head and neck cancer (n=58). The primary outcome was the overall FATE score. Independent variables included clinical characteristics, treatments received and the care provided at the time of death.
Overall FATE scores, and scores in the domains assessing management of symptoms and care at the time of death, did not vary by disease status at the end of life (locoregional vs. distant metastasis; p=.989). Death at home or in hospice (vs. in the hospital) significantly improved scores in all three categories (p=.023). Involvement of a palliative care team improved care at the time of death (p<.001), and palliative treatments (radiation and/or chemotherapy) improved scores in management of symptoms and care at the time of death (p=.011 and p=.017, respectively).
The FATE survey is a useful measure of the end-of-life experience of patients with head and neck cancer. Palliative treatment, death outside the hospital, and palliative care team involvement all improve the end-of-life experience in this population.
Head and neck cancer; quality of life; end of life care
In this paper, we consider estimation of survivor functions from groups of observations with right-censored data when the groups are subject to a stochastic ordering constraint. Many methods and algorithms have been proposed to estimate distribution functions under such restrictions, but none have completely satisfactory properties when the observations are censored. We propose a pointwise constrained nonparametric maximum likelihood estimator, which is defined at each time t by the estimates of the survivor functions subject to constraints applied at time t only. We also propose an efficient method to obtain the estimator. The estimator of each constrained survivor function is shown to be nonincreasing in t, and its consistency and asymptotic distribution are established. A simulation study suggests better small- and large-sample properties than those of alternative estimators. An example using prostate cancer data illustrates the method.
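The unconstrained building block in this setting is the Kaplan–Meier estimator, which is listed among the keywords. The Python sketch below computes it for each group and then flags the event times at which the unconstrained estimates violate an assumed ordering S1(t) ≥ S2(t).

It does not implement the proposed pointwise constrained NPMLE. That estimator would replace the estimates at each such time t with the maximizers of the likelihood subject to the constraint at t only. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Curve = List[Tuple[float, float]]  # (event time, estimated survival just after it)


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> Curve:
    """Kaplan-Meier estimate; events[i] = 1 for an observed death, 0 for right-censoring."""
    if len(times) != len(events):
        raise ValueError("times and events must have equal length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve: Curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Deaths and censorings tied at t are all counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def survival_at(curve: Curve, t: float) -> float:
    """Evaluate the right-continuous step function at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def ordering_violations(curve1: Curve, curve2: Curve) -> List[float]:
    """Event times where the unconstrained estimates have S1(t) < S2(t)."""
    # Both curves only change at event times, so checking those times suffices.
    grid = sorted({t for t, _ in curve1} | {t for t, _ in curve2})
    return [t for t in grid if survival_at(curve1, t) < survival_at(curve2, t)]


if __name__ == "__main__":
    s1 = kaplan_meier([1, 3, 5, 7], [1, 1, 1, 1])
    s2 = kaplan_meier([2, 4, 6, 8], [1, 1, 1, 1])
    print(ordering_violations(s1, s2))  # times where the assumed ordering fails
```

Because the Kaplan–Meier curves are estimated separately, they can cross even when the true survivor functions are ordered. The pointwise constraint is meant to remove exactly such crossings.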
Censored data; Constrained nonparametric maximum likelihood estimator; Kaplan–Meier estimator; Maximum likelihood estimator; Order restriction