A critical challenge in the development of new molecularly targeted anticancer drugs is the identification of predictive biomarkers and the concurrent development of diagnostics for these biomarkers. Developing matched diagnostics and therapeutics will require new clinical trial designs and methods of data analysis. The use of adaptive design in phase III trials may offer new opportunities for matched diagnosis and treatment because the size of the trial can allow for subpopulation analysis. We present an adaptive phase III trial design that can identify a suitable target population during the early course of the trial, enabling the efficacy of an experimental therapeutic to be evaluated within the target population as a later part of the same trial. The use of such an adaptive approach to clinical trial design has the potential to greatly improve the field of oncology and facilitate the development of personalized medicine.
It is highly challenging to develop reliable diagnostic tests to predict patients’ responsiveness to anticancer treatments on clinical endpoints before commencing the definitive phase III randomized trial. Development and validation of genomic signatures in the randomized trial can be a promising solution. Such signatures are required to predict quantitatively the underlying heterogeneity in the magnitude of treatment effects.
We propose a framework for developing and validating genomic signatures in randomized trials. Codevelopment of predictive and prognostic signatures can allow prediction of patient-level survival curves as basic diagnostic tools for treating individual patients.
We applied our framework to gene-expression microarray data from a large-scale randomized trial to determine whether the addition of thalidomide improves survival for patients with multiple myeloma. The results indicated that approximately half of the patients were responsive to thalidomide, and the average improvement in survival for the responsive patients was statistically significant. Cross-validated patient-level survival curves were developed to predict survival distributions of individual future patients as a function of whether or not they are treated with thalidomide and with regard to their baseline prognostic and predictive signature indices.
The proposed framework represents an important step toward reliable predictive medicine. It provides an internally validated mechanism for using randomized clinical trials to assess treatment efficacy for a patient population in a manner that takes into consideration the heterogeneity in patients’ responsiveness to treatment. It also provides cross-validated patient-level survival curves that can be used for selecting treatments for future patients.
High-throughput “omics” technologies that generate molecular profiles for biospecimens have been extensively used in preclinical studies to reveal molecular subtypes and elucidate the biological mechanisms of disease, and in retrospective studies on clinical specimens to develop mathematical models to predict clinical endpoints. Nevertheless, the translation of these technologies into clinical tests that are useful for guiding management decisions for patients has been relatively slow. It can be difficult to determine when the body of evidence for an omics-based test is sufficiently comprehensive and reliable to support claims that it is ready for clinical use, or even that it is ready for definitive evaluation in a clinical trial in which it may be used to direct patient therapy. Reasons for this difficulty include the exploratory and retrospective nature of many of these studies, the complexity of these assays and their application to clinical specimens, and the many potential pitfalls inherent in the development of mathematical predictor models from the very high-dimensional data generated by these omics technologies. Here we present a checklist of criteria to consider when evaluating the body of evidence supporting the clinical use of a predictor to guide patient therapy. Included are issues pertaining to specimen and assay requirements, the soundness of the process for developing predictor models, expectations regarding clinical study design and conduct, and attention to regulatory, ethical, and legal issues. The proposed checklist should serve as a useful guide to investigators preparing proposals for studies involving the use of omics-based tests. The US National Cancer Institute plans to refer to these guidelines for review of proposals for studies involving omics tests, and it is hoped that other sponsors will adopt the checklist as well.
Analytical validation; Biomarker; Diagnostic test; Genomic classifier; Model validation; Molecular profile; Omics; Personalized medicine; Precision Medicine; Treatment selection
Fibrotic disorders of the lung are associated with perturbations in the plasminogen activation system. Specifically, plasminogen activator inhibitor-1 (PAI-1) expression is increased relative to the plasminogen activators. A direct role for this imbalance in modulating the severity of lung scarring following injury has been substantiated in the bleomycin model of pulmonary fibrosis. However, it remains unclear whether derangements in the plasminogen activation system contribute more generally to the pathogenesis of lung fibrosis beyond bleomycin injury. To answer this question, we employed an alternative model of lung scarring, in which type II alveolar epithelial cells (AECs) are specifically injured by administering diphtheria toxin (DT) to mice genetically engineered to express the human DT receptor (DTR) off the surfactant protein C promoter. This targeted AEC injury results in the diffuse accumulation of interstitial collagen. In the present study, we found that this targeted type II cell insult also increases PAI-1 expression in the alveolar compartment. We identified AECs and lung macrophages to be sources of PAI-1 production. To determine whether this elevated PAI-1 concentration was directly related to the severity of fibrosis, DTR+ mice were crossed into a PAI-1-deficient background (DTR+: PAI-1−/−). DT administration to DTR+: PAI-1−/− animals caused significantly less fibrosis than was measured in DTR+ mice with intact PAI-1 production. PAI-1 deficiency also abrogated the accumulation of CD11b+ exudate macrophages that were found to express PAI-1 and type-1 collagen. These observations substantiate the critical function of PAI-1 in pulmonary fibrosis pathogenesis and provide new insight into a potential mechanism by which this pro-fibrotic molecule influences collagen accumulation.
PAI-1; lung; fibrosis; macrophage
Identification of genes that are synthetic lethal to p53 is an important strategy for anticancer therapy as p53 mutations have been reported to occur in more than half of all human cancer cases. Although genome-wide RNAi screening is an effective approach to finding synthetic lethal genes, it is costly and labor-intensive.
To illustrate this approach, we identified potentially druggable genes synthetically lethal for p53 using three microarray datasets for gene expression profiles of the NCI-60 cancer cell lines, one next-generation sequencing (RNA-Seq) dataset from the Cancer Genome Atlas (TCGA) project, and one gene expression dataset from the Cancer Cell Line Encyclopedia (CCLE) project. As candidate druggable synthetic lethal genes for p53, we selected the genes that encoded kinases and had significantly higher expression in the tumors with functional p53 mutations (somatic mutations) than in the tumors without functional p53 mutations. We identified important regulatory networks and functional categories pertinent to these genes, and performed an extensive survey of the literature to find experimental evidence that supports the synthetic lethality relationships between the genes identified and p53. We also examined the difference in drug sensitivity between NCI-60 cell lines with functional p53 mutations and NCI-60 cell lines without functional p53 mutations for the compounds that target the kinases encoded by the genes identified.
Our results indicated that some of the candidate genes we identified had been experimentally verified to be synthetic lethal for p53 and promising targets for anticancer therapy while some other genes were putative targets for development of cancer therapeutic agents.
Our study indicated that pre-screening of potential synthetic lethal genes using gene expression profiles is a promising approach for improving the efficiency of synthetic lethal RNAi screening.
Cancer; p53 mutations; Synthetic lethal genes; Gene expression profiles; Computational biology
In the context of national calls for reorganizing cancer clinical trials, the National Cancer Institute (NCI) sponsored a two day workshop to examine the challenges and opportunities for optimizing radiotherapy quality assurance (QA) in clinical trial design.
Participants reviewed the current processes of clinical trial QA and noted the QA challenges presented by advanced technologies. Lessons learned from the radiotherapy QA programs of recent trials were discussed in detail. Four potential opportunities for optimizing radiotherapy QA were explored, including the use of normal tissue toxicity and tumor control metrics, biomarkers of radiation toxicity, new radiotherapy modalities like proton beam therapy, and the international harmonization of clinical trial QA.
Four recommendations were made: 1) Develop a tiered (and more efficient) system for radiotherapy QA and tailor intensity of QA to clinical trial objectives. Tiers include (i) general credentialing, (ii) trial specific credentialing, and (iii) individual case review; 2) Establish a case QA repository; 3) Develop an evidence base for clinical trial QA and introduce innovative prospective trial designs to evaluate radiotherapy QA in clinical trials; and 4) Explore the feasibility of consolidating clinical trial QA in the United States.
Radiotherapy QA may impact clinical trial accrual, cost, outcomes and generalizability. To achieve maximum benefit, QA programs must become more efficient and evidence-based.
clinical trial design; credentialing; radiotherapy; quality assurance
Current educational interventions and training courses in microsurgery are often predicated on theories of skill acquisition and development that follow a 'practice makes perfect' model. Given the changing landscape of surgical training and advances in educational theories related to skill development, research is needed to assess current training tools in microsurgery education and devise alternative methods that would enhance training. Simulation is an increasingly important tool for educators because, whilst facilitating improved technical proficiency, it provides a way to reduce risks to both trainees and patients. The International Microsurgery Simulation Society was founded in 2012 in order to consolidate the global effort in promoting excellence in microsurgical training. The society's aim of achieving standardisation of microsurgical training worldwide could be realised through the development of evidence-based educational interventions and the sharing of best practices.
Curriculum; Education; Microsurgery; Teaching
Over the past decade, driven by advances in educational theory and pressures for efficiency in the clinical environment, there has been a shift in surgical education and training towards enhanced simulation training. Microsurgery is a technical skill with a steep competency learning curve on which the clinical outcome greatly depends. This paper investigates the evidence for educational and training interventions of traditional microsurgical skills courses in order to establish the best evidence practice in education and training and curriculum design. A systematic review of MEDLINE, EMBASE, and PubMed databases was performed to identify randomized controlled trials looking at educational and training interventions that objectively improved microsurgical skill acquisition, and these were critically appraised using the BestBETs group methodology. The database searches yielded 1,148, 1,460, and 2,277 citations, respectively. These were then further limited to randomized controlled trials, from which abstract reviews reduced the number to 5 relevant randomised controlled clinical trials. The best evidence supported a laboratory based low fidelity model microsurgical skills curriculum. There was strong evidence that technical skills acquired on low fidelity models transfer to improved performance on higher fidelity human cadaver models and that self directed practice leads to improved technical performance. Although there is a significant paucity of literature supporting current microsurgical education and training practices, simulated training on low fidelity models in microsurgery is an effective intervention that leads to acquisition of transferable skills and improved technical performance. Further research to identify educational interventions associated with accelerated skill acquisition is required.
Microsurgery; Clinical competence; Education; Curriculum
Motivation: Tumors are thought to develop and evolve through a sequence of genetic and epigenetic somatic alterations to progenitor cells. Early stages of human tumorigenesis are hidden from view. Here, we develop a method for inferring some aspects of the order of mutational events during tumorigenesis based on genome sequencing data for a set of tumors. This method does not assume that the sequence of driver alterations is the same for each tumor, but enables the degree of similarity or difference in the sequence to be evaluated.
Results: To evaluate the new method, we applied it to colon cancer tumor sequencing data and the results are consistent with the multi-step tumorigenesis model previously developed based on comparing stages of cancer. We then applied the new method to DNA sequencing data for a set of lung cancers. The model may be a useful tool for better understanding the process of tumorigenesis.
Availability: The software is available at: http://linus.nci.nih.gov/Data/YounA/OrderMutation.zip
Supplementary data are available at Bioinformatics online.
Justicia insularis T. Anders (Acanthaceae) is a medicinal plant whose leaves are mixed with those of three other plants to prepare a concoction used to improve fertility and to reduce labour pains in women of the Western Region of Cameroon. Previous studies have demonstrated the inducing potential on ovarian folliculogenesis and steroidogenesis of the aqueous extract of the leaf mixture (ADHJ) of four medicinal plants (Aloe buettneri, Dicliptera verticillata, Hibiscus macranthus and Justicia insularis), among which the latter represented the highest proportion. This study was aimed at evaluating the ovarian inducing potential of J. insularis in immature female rats. Various doses of the aqueous extract of J. insularis were given orally every day, for 20 days, to immature female rats distributed into four experimental groups of twenty animals each. At the end of the experimental period some biochemical and physiological parameters of ovarian function were assayed. The administration of the aqueous extract of Justicia insularis significantly induced an early vaginal opening in all treated groups (P < 0.001) as well as an increase (at doses of 50 or 100 mg/kg) in the number of hemorrhagic points, corpora lutea and implantation sites, and in ovarian weight and uterine and ovarian proteins. Ovarian cholesterol level decreased significantly (P < 0.05) in animals treated with the lowest dose (12.5 mg/kg). The evaluation of the toxicological effects of the extract on pregnancy showed that it significantly increased pre- and post-implantation losses and the resorption index, and decreased the rate of nidation as well as litter weight. These results suggest that the aqueous extract of Justicia insularis induces ovarian folliculogenesis, thus justifying its high proportion in the leaf mixture of ADHJ.
Justicia insularis; vaginal opening; ovary; fertility; gestation; resorption index
Interferon regulatory factor (IRF)-5 is a transcription factor involved in type I interferon signaling whose germ line variants have been associated with autoimmune pathogenesis. Since relationships have been observed between development of autoimmunity and responsiveness of melanoma to several types of immunotherapy, we tested whether polymorphisms of IRF5 are associated with responsiveness of melanoma to adoptive therapy with tumor infiltrating lymphocytes (TILs).
140 TILs were genotyped for four single nucleotide polymorphisms (rs10954213, rs11770589, rs6953165, rs2004640) and one insertion-deletion in the IRF5 gene by sequencing. Gene-expression profile of the TILs, 112 parental melanoma metastases (MM) and 9 cell lines derived from some metastases were assessed by Affymetrix Human Gene ST 1.0 array.
Lack of the A allele in rs10954213 (G > A) was associated with non-response (p < 0.005). Other polymorphisms in strong linkage disequilibrium with rs10954213 demonstrated similar trends. Genes differentially expressed in vitro between cell lines with and without the A allele could be applied to the transcriptional profile of 112 melanoma metastases to predict their responsiveness to therapy, suggesting that IRF5 genotype may influence immune responsiveness by affecting the intrinsic biology of melanoma.
This study is the first to analyze associations between melanoma immune responsiveness and IRF5 polymorphism. The results support a common genetic basis which may underlie the development of autoimmunity and melanoma immune responsiveness.
We demonstrate that clinical trials using response adaptive randomized treatment assignment rules are subject to substantial bias if there are time trends in unknown prognostic factors and standard methods of analysis are used. We develop a general class of randomization tests based on generating the null distribution of a general test statistic by repeating the adaptive randomized treatment assignment rule holding fixed the sequence of outcome values and covariate vectors actually observed in the trial. We develop broad conditions on the adaptive randomization method and the stochastic mechanism by which outcomes and covariate vectors are sampled that ensure that the type I error is controlled at the level of the randomization test. These conditions ensure that the use of the randomization test protects the type I error against time trends that are independent of the treatment assignments. Under some conditions in which the prognosis of future patients is determined by knowledge of the current randomization weights, the type I error is not strictly protected. We show that response-adaptive randomization can result in substantial reduction in statistical power when the type I error is preserved. Our results also ensure that type I error is controlled at the level of the randomization test for adaptive stratification designs used for balancing covariates.
Response adaptive randomization; adaptive stratification; clinical trials
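The randomization test described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the assignment rule `adaptive_assign` (a simple success-rate-weighted rule), the test statistic `diff_in_rates`, and all function names are hypothetical choices made for the example. The key step from the text is preserved: the observed outcome sequence is held fixed while the adaptive assignment rule is re-run to generate the null distribution.

```python
import random


def adaptive_assign(outcomes, rng, prior=1.0):
    """Re-run a hypothetical response-adaptive rule over a FIXED outcome sequence.

    Each patient is assigned to arm 1 with probability proportional to the
    current smoothed success rate of arm 1 (an assumed rule, for illustration).
    """
    succ = [prior, prior]
    fail = [prior, prior]
    arms = []
    for y in outcomes:
        rate0 = succ[0] / (succ[0] + fail[0])
        rate1 = succ[1] / (succ[1] + fail[1])
        arm = 1 if rng.random() < rate1 / (rate0 + rate1) else 0
        arms.append(arm)
        # Under the null hypothesis the outcome does not depend on the arm,
        # so the observed outcome is credited to whichever arm was drawn.
        if y:
            succ[arm] += 1
        else:
            fail[arm] += 1
    return arms


def diff_in_rates(arms, outcomes):
    """Test statistic: success rate in arm 1 minus success rate in arm 0."""
    n1 = sum(arms)
    n0 = len(arms) - n1
    if n0 == 0 or n1 == 0:
        return 0.0  # statistic is undefined when an arm is empty; treat as 0
    s1 = sum(y for a, y in zip(arms, outcomes) if a == 1)
    s0 = sum(y for a, y in zip(arms, outcomes) if a == 0)
    return s1 / n1 - s0 / n0


def randomization_test(arms, outcomes, n_rep=2000, seed=0):
    """Two-sided Monte Carlo randomization p-value."""
    rng = random.Random(seed)
    observed = abs(diff_in_rates(arms, outcomes))
    hits = 0
    for _ in range(n_rep):
        simulated = adaptive_assign(outcomes, rng)
        if abs(diff_in_rates(simulated, outcomes)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_rep + 1)
```

Because the outcomes stay in their observed order, a time trend in prognosis is present in every simulated replicate as well as in the observed data, which is the intuition behind the protection of the type I error described in the abstract.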
For medical classification problems, it is often desirable to have a probability associated with each class. Probabilistic classifiers have received relatively little attention for small n large p classification problems despite their importance in medical decision making. In this paper, we introduce 2 criteria for assessment of probabilistic classifiers, well-calibratedness and refinement, and develop corresponding evaluation measures. We evaluated several published high-dimensional probabilistic classifiers and developed 2 extensions of the Bayesian compound covariate classifier. Based on simulation studies and analysis of gene expression microarray data, we found that proper probabilistic classification is more difficult than deterministic classification. It is important to ensure that a probabilistic classifier is well calibrated or at least not “anticonservative” using the methods developed here. We provide this evaluation for several probabilistic classifiers and also evaluate their refinement as a function of sample size under weak and strong signal conditions. We also present a cross-validation method for evaluating the calibration and refinement of any probabilistic classifier on any data set.
Gene expression analysis; High-dimensional data; Microarray; Probabilistic classification
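The abstract above does not spell out its evaluation measures, so the following is only a generic illustration of what checking calibration can look like: predicted probabilities are grouped into bins, and the mean predicted probability in each bin is compared with the observed class frequency. The function names and the binning scheme are assumptions for this sketch, not the measures developed in the paper.

```python
def calibration_table(probs, labels, n_bins=5):
    """Group predictions into equal-width probability bins.

    Returns one (count, mean predicted probability, observed frequency)
    tuple per non-empty bin. `labels` are 0/1 class indicators.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(y for _, y in members) / len(members)
            rows.append((len(members), mean_p, freq))
    return rows


def calibration_error(probs, labels, n_bins=5):
    """Count-weighted mean absolute gap between predicted and observed rates."""
    rows = calibration_table(probs, labels, n_bins)
    return sum(n * abs(mean_p - freq) for n, mean_p, freq in rows) / len(probs)
```

For an honest assessment the probabilities passed in should be predictions for samples that were not used to fit the classifier, for example predictions collected from the held-out folds of a cross-validation.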
In 2009, an outbreak of raccoon rabies in Central Park in New York City, New York, USA, infected 133 raccoons. Five persons and 2 dogs were exposed but did not become infected. A trap-vaccinate-release program vaccinated ≈500 raccoons and contributed to the end of the epizootic.
rabies; raccoon; vaccination; epizootic; urban; New York; TVR; trap-vaccinate-release; viruses
Developments in whole genome biotechnology have stimulated statistical focus on prediction methods. We review here methodology for classifying patients into survival risk groups and for using cross-validation to evaluate such classifications. Measures of discrimination for survival risk models include separation of survival curves, time-dependent ROC curves and Harrell’s concordance index. For high-dimensional data applications, however, computing these measures as re-substitution statistics on the same data used for model development results in highly biased estimates. Most developments in methodology for survival risk modeling with high-dimensional data have utilized separate test data sets for model evaluation. Cross-validation has sometimes been used for optimization of tuning parameters. In many applications, however, the data available are too limited for effective division into training and test sets and consequently authors have often either reported re-substitution statistics or analyzed their data using binary classification methods in order to utilize familiar cross-validation. In this article we have tried to indicate how to utilize cross-validation for the evaluation of survival risk models; specifically how to compute cross-validated estimates of survival distributions for predicted risk groups and how to compute cross-validated time-dependent ROC curves. We have also discussed evaluation of the statistical significance of a survival risk model and evaluation of whether high-dimensional genomic data adds predictive accuracy to a model based on standard covariates alone.
predictive medicine; survival risk classification; cross-validation; gene expression
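As a concrete illustration of the point about re-substitution bias, the sketch below computes Harrell's concordance index from risk scores that were each produced by a model fitted without the patient being scored. It is a simplified sketch under stated assumptions: `cross_validated_risks`, its interleaved fold layout, and the `fit` callback interface are hypothetical, and pooling continuous scores across folds is only one simple option rather than the procedure recommended in the article.

```python
def harrell_c(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the shorter observed time is an event.
    It is concordant when the patient with the shorter time has the
    higher risk score; ties in risk count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored time cannot be the "earlier failure"
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def cross_validated_risks(rows, fit, k=5):
    """Score every patient with a model that never saw that patient.

    rows: list of (covariates, time, event) tuples.
    fit:  user-supplied function taking training rows and returning a
          function that maps covariates to a risk score (hypothetical API).
    """
    n = len(rows)
    risks = [None] * n
    for fold in range(k):
        train = [rows[i] for i in range(n) if i % k != fold]
        score = fit(train)  # all model building happens inside the fold
        for i in range(fold, n, k):
            risks[i] = score(rows[i][0])
    return risks
```

Any gene selection or tuning must happen inside `fit`, so that it is repeated on each training fold; selecting features on the full data set first would reintroduce the bias that cross-validation is meant to remove.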
Cell type heterogeneity may have a substantial effect on gene expression profiling of human tissue. Several in silico methods for deconvoluting a gene expression profile into cell-type-specific subprofiles have been published but not widely used. Here, we consider recent methods and the experimental validations available for them. Shen-Orr et al. recently developed an approach called cell-type-specific significance analysis of microarray for deconvoluting gene expression. This method requires the measurement of the proportion of each cell type in each sample and the expression profiles of the heterogeneous samples. It determines how gene expression varies among pre-defined phenotypes for each cell type. Gene expression can vary substantially among cell types and sample heterogeneity can mask the identification of biologically important phenotypic correlations. Consequently, the deconvolution approach can be useful in the analysis of mixtures of cell populations in clinical samples.
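The deconvolution idea can be illustrated with a small regression. The sketch below assumes a linear mixing model, in which the measured expression of a gene in a sample is the proportion-weighted sum of its cell-type-specific expression levels, and estimates those levels by ordinary least squares. It is a minimal sketch, not the published cell-type-specific significance analysis method: the function names are hypothetical and the permutation-based significance step is omitted.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def cell_type_expression(proportions, mixed):
    """Least-squares estimate of per-cell-type expression for one gene.

    proportions: one list of cell-type fractions per sample.
    mixed:       measured expression of the gene in each heterogeneous sample.
    Solves the normal equations (P^T P) x = P^T m.
    """
    k = len(proportions[0])
    ptp = [[sum(p[i] * p[j] for p in proportions) for j in range(k)]
           for i in range(k)]
    ptm = [sum(p[i] * m for p, m in zip(proportions, mixed)) for i in range(k)]
    return solve_linear(ptp, ptm)
```

The estimate needs at least as many samples as cell types and enough variation in the proportions across samples; otherwise the normal equations are singular and the division in `solve_linear` fails. Running the fit separately within each phenotype group gives cell-type-specific levels that can then be compared between groups.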
Although numerous methods of using microarray data analysis for cancer classification have been proposed, most utilize many genes to achieve accurate classification. This can hamper interpretability of the models and ease of translation to other assay platforms. We explored the use of single genes to construct classification models. We first identified the genes with the most powerful univariate class discrimination ability and then constructed simple classification rules for class prediction using the single genes.
We applied our model development algorithm to eleven cancer gene expression datasets and compared classification accuracy to that for standard methods including Diagonal Linear Discriminant Analysis, k-Nearest Neighbor, Support Vector Machine and Random Forest. The single gene classifiers provided classification accuracy comparable to or better than those obtained by existing methods in most cases. We analyzed the factors that determined when simple single gene classification is effective and when more complex modeling is warranted.
For most of the datasets examined, the single-gene classification methods appear to work as well as more standard methods, suggesting that simple models could perform well in microarray-based cancer prediction.
biomarkers; early detection; genomics; personalized medicine; translational research
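The two-step procedure described above (rank genes by univariate discrimination, then build a rule on one gene) can be sketched as follows. The specific choices here are assumptions for illustration, since the abstract does not state them: genes are ranked by the absolute Welch t-statistic, and the rule is a cutpoint at the midpoint of the two class means. All function names are hypothetical.

```python
from math import copysign, sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t-statistic for two groups (each needs at least 2 values)."""
    diff = mean(a) - mean(b)
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else copysign(float("inf"), diff)
    return diff / se


def fit_single_gene(X, y):
    """Pick the most discriminating gene and a cutpoint for it.

    X: list of samples, each a list of expression values (one per gene).
    y: 0/1 class labels.
    Returns (gene index, cutpoint, True if class 1 has higher expression).
    """
    best = None
    for g in range(len(X[0])):
        ones = [x[g] for x, lab in zip(X, y) if lab == 1]
        zeros = [x[g] for x, lab in zip(X, y) if lab == 0]
        t = welch_t(ones, zeros)
        if best is None or abs(t) > abs(best[1]):
            best = (g, t, (mean(ones) + mean(zeros)) / 2)
    gene, t, cut = best
    return gene, cut, t > 0


def predict(model, x):
    """Classify one sample with the single-gene rule."""
    gene, cut, high_is_one = model
    return int((x[gene] > cut) == high_is_one)
```

When accuracy is estimated by cross-validation, `fit_single_gene` should be called on each training fold rather than once on the full data set, so that gene selection is part of what is being validated.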
DAPfinder and DAPview are novel BRB-ArrayTools plug-ins to construct gene coexpression networks and identify significant differences in pairwise gene-gene coexpression between two phenotypes.
Each significant difference in gene-gene association represents a Differentially Associated Pair (DAP). Our tools include several choices of filtering methods, gene-gene association metrics, statistical testing methods and multiple comparison adjustments. Network results are easily displayed in Cytoscape. Analyses of glioma experiments and microarray simulations demonstrate the utility of these tools.
DAPfinder is a new user-friendly tool for reconstruction and comparison of biological networks.
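To make the notion of a Differentially Associated Pair concrete, the sketch below tests whether the correlation between two genes differs between two phenotypes. It uses Pearson correlation and a Fisher z-transformation test, which is only one assumed combination among the association metrics and testing methods the tools are said to offer; it is not the plug-in's code, and the function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def differential_association(x1, y1, x2, y2):
    """Compare the gene-gene correlation in phenotype 1 versus phenotype 2.

    x1, y1: expression of gene X and gene Y across phenotype-1 samples.
    x2, y2: the same two genes across phenotype-2 samples.
    Returns (r1, r2, two-sided p-value). Each group needs more than 3
    samples, and a correlation of exactly +1 or -1 is not supported
    because the Fisher transform is undefined there.
    """
    r1, r2 = pearson(x1, y1), pearson(x2, y2)
    se = sqrt(1 / (len(x1) - 3) + 1 / (len(x2) - 3))
    z = (atanh(r1) - atanh(r2)) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r1, r2, p
```

Applied to every gene pair that passes filtering, this produces one p-value per pair, so a multiple comparison adjustment of the kind mentioned above is needed before any pair is declared a DAP.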
We consider the problem of designing a study to develop a predictive classifier from high dimensional data. A common study design is to split the sample into a training set and an independent test set, where the former is used to develop the classifier and the latter to evaluate its performance. In this paper we address the question of what proportion of the samples should be devoted to the training set. How does this proportion impact the mean squared error (MSE) of the prediction accuracy estimate?
We develop a non-parametric algorithm for determining an optimal splitting proportion that can be applied with a specific dataset and classifier algorithm. We also perform a broad simulation study for the purpose of better understanding the factors that determine the best split proportions and to evaluate commonly used splitting strategies (1/2 training or 2/3 training) under a wide variety of conditions. These methods are based on a decomposition of the MSE into three intuitive component parts.
By applying these approaches to a number of synthetic and real microarray datasets we show that for linear classifiers the optimal proportion depends on the overall number of samples available and the degree of differential expression between the classes. The optimal proportion was found to depend on the full dataset size (n) and classification accuracy, with higher accuracy and smaller n resulting in more samples being assigned to the training set. The commonly used strategy of allocating two-thirds of cases for training was close to optimal for reasonably sized datasets (n ≥ 100) with strong signals (i.e. 85% or greater full dataset accuracy). In general, we recommend use of our nonparametric resampling approach for determining the optimal split. This approach can be applied to any dataset, using any predictor development method, to determine the best split.
A substantial number of studies have reported the development of gene expression–based prognostic signatures for lung cancer. The ultimate aim of such studies should be the development of well-validated clinically useful prognostic signatures that improve therapeutic decision making beyond current practice standards. We critically reviewed published studies reporting the development of gene expression–based prognostic signatures for non–small cell lung cancer to assess the progress made toward this objective. Studies published between January 1, 2002, and February 28, 2009, were identified through a PubMed search. Following hand-screening of abstracts of the identified articles, 16 were selected as relevant. Those publications were evaluated in detail for appropriateness of the study design, statistical validation of the prognostic signature on independent datasets, presentation of results in an unbiased manner, and demonstration of medical utility for the new signature beyond that obtained using existing treatment guidelines. Based on this review, we found little evidence that any of the reported gene expression signatures are ready for clinical application. We also found serious problems in the design and analysis of many of the studies. We suggest a set of guidelines to aid the design, analysis, and evaluation of prognostic signature studies. These guidelines emphasize the importance of focused study planning to address specific medically important questions and the use of unbiased analysis methods to evaluate whether the resulting signatures provide evidence of medical utility beyond standard of care–based prognostic factors.
Rationale: Ineffective repair of a damaged alveolar epithelium has been postulated to cause pulmonary fibrosis. In support of this theory, epithelial cell abnormalities, including hyperplasia, apoptosis, and persistent denudation of the alveolar basement membrane, are found in the lungs of humans with idiopathic pulmonary fibrosis and in animal models of fibrotic lung disease. Furthermore, mutations in genes that affect regenerative capacity or that cause injury/apoptosis of type II alveolar epithelial cells have been identified in familial forms of pulmonary fibrosis. Although these findings are compelling, there are no studies that demonstrate a direct role for the alveolar epithelium or, more specifically, type II cells in the scarring process.
Objectives: To determine if a targeted injury to type II cells would result in pulmonary fibrosis.
Methods: A transgenic mouse was generated to express the human diphtheria toxin receptor on type II alveolar epithelial cells. Diphtheria toxin was administered to these animals to specifically target the type II epithelium for injury. Lung fibrosis was assessed by histology and hydroxyproline measurement.
Measurements and Main Results: Transgenic mice treated with diphtheria toxin developed an approximately twofold increase in their lung hydroxyproline content on Days 21 and 28 after diphtheria toxin treatment. The fibrosis developed in conjunction with type II cell injury. Histological evaluation revealed diffuse collagen deposition with patchy areas of more confluent scarring and associated alveolar contraction.
Conclusions: The development of lung fibrosis in the setting of type II cell injury in our model provides evidence for a causal link between the epithelial defects seen in idiopathic pulmonary fibrosis and the corresponding areas of scarring.
diphtheria toxin; lung; collagen; scarring
The development of tumor biomarkers ready for clinical use is complex. We propose a refined system for biomarker study design, conduct, analysis, and evaluation that incorporates a hierarchical level of evidence scale for tumor marker studies, including those using archived specimens. Although fully prospective randomized clinical trials to evaluate the medical utility of a prognostic or predictive biomarker are the gold standard, such trials are costly, so we discuss more efficient indirect “prospective–retrospective” designs using archived specimens. In particular, we propose new guidelines that stipulate that 1) adequate amounts of archived tissue must be available from enough patients from a prospective trial (which for predictive factors should generally be a randomized design) for analyses to have adequate statistical power and for the patients included in the evaluation to be clearly representative of the patients in the trial; 2) the test should be analytically and preanalytically validated for use with archived tissue; 3) the plan for biomarker evaluation should be completely specified in writing before the performance of biomarker assays on archived tissue and should be focused on evaluation of a single completely defined classifier; and 4) the results from archived specimens should be validated using specimens from one or more similar, but separate, studies.
Physicians need improved tools for selecting treatments for individual patients. Many diagnostic entities that were traditionally viewed as individual diseases are heterogeneous in their molecular pathogenesis and treatment responsiveness. This results in the treatment of many patients with ineffective drugs, the incurring of substantial medical costs for the treatment of patients who do not benefit, and the conducting of large clinical trials to identify small, average treatment benefits for heterogeneous groups of patients. In oncology, new genomic technologies provide powerful tools for the selection of patients who require systemic treatment and are most (or least) likely to benefit from a molecularly targeted therapeutic. In the large amount of literature on biomarkers, there is considerable uncertainty and confusion regarding the specifics involved in the development and evaluation of prognostic and predictive biomarker diagnostics. There is a lack of appreciation that the development of drugs with companion diagnostics increases the complexity of clinical development. Adapting to the fundamental importance of tumor heterogeneity and achieving the benefits of personalized oncology for patients and healthcare costs will require paradigm changes for clinical and statistical investigators in academia, industry and regulatory agencies. In this review, I attempt to address some of these issues and provide guidance on the design of clinical trials for evaluating the clinical utility and robustness of prognostic and predictive biomarkers.
adaptive design; biomarker; clinical trial design; predictive; prognostic; validation
The traditional oncology drug development paradigm of single arm phase II studies followed by a randomized phase III study has limitations for modern oncology drug development. Interpretation of single arm phase II study results is difficult when a new drug is used in combination with other agents or when progression free survival is used as the endpoint rather than tumor shrinkage. Randomized phase II studies are more informative for these objectives but increase both the number of patients and time required to determine the value of a new experimental agent. In this paper, we compare different phase II study strategies to determine the most efficient drug development path in terms of number of patients and length of time to conclusion of drug efficacy on overall survival.