*A Report of the Response Assessment in Neuro-Oncology Working Group.
To review the strengths and weaknesses of primary and auxiliary end points for clinical trials among patients with high-grade glioma (HGG). Recent advances in outcome for patients with newly diagnosed and recurrent HGG, coupled with the development of multiple promising therapeutics with myriad antitumor actions, have led to significant growth in the number of clinical trials for patients with HGG. Appropriate clinical trial design and the incorporation of optimal end points are imperative to efficiently and effectively evaluate such agents and continue to advance outcome. Growing recognition of limitations weakening the reliability of traditional clinical trial primary end points has generated increasing uncertainty of how best to evaluate promising therapeutics for patients with HGG. The phenomena of pseudoprogression and pseudoresponse have made imaging-based end points, including overall radiographic response and progression-free survival, problematic. Although overall survival is considered the “gold-standard” end point, recently identified active salvage therapies such as bevacizumab may diminish the association between presalvage therapy and overall survival. Finally, advances in imaging as well as the assessment of patient function and well being have strengthened interest in auxiliary end points assessing these aspects of patient care and outcome. Better appreciation of the strengths and limitations of primary end points will lead to more effective clinical trial strategies. Technical advances in imaging as well as improved survival for patients with HGG support the further development of auxiliary end points evaluating novel imaging approaches as well as measures of patient function and well being.
The identification of optimal end points for high-grade glioma (HGG) clinical trials has become an increasingly complex challenge, in part because of advances in diagnostic techniques, assessment of sequential imaging, and therapy. Traditional radiologic response guidelines defined 20 years ago that use 2-dimensional measurement of abnormal enhancement as a surrogate of tumor size and activity1 have become less applicable and reliable in the current therapeutic era, in which temozolomide chemoradiotherapy is routinely used for newly diagnosed patients and angiogenic inhibitors, such as bevacizumab, are administered at recurrence. Greater uncertainty in accurately defining progression due to pseudoprogression and pseudoresponse has, in turn, made the use of progression-free end points more problematic. The recently defined Response Assessment for Neuro-Oncology (RANO) criteria2 and technological advances in neuroimaging are expected to address some of these challenges. Although overall survival has traditionally been “the gold standard” for HGG clinical trials, active salvage therapies, such as bevacizumab, may impact survival and thus weaken the association between presalvage therapy and survival. As overall survival is improving, metrics of patient function and well being are increasingly valued by patients, caregivers, clinicians, and regulatory agencies and thus warrant consideration as end points of importance.
The RANO working group, consisting of leaders from European and North American neuro-oncology organizations and cooperative groups, includes neuro-oncologists, neurosurgeons, radiation oncologists, neuroradiologists, neuropsychologists, and biostatisticians, who are updating guidelines for evolving and complex clinical scenarios affecting patients with brain tumors. In addition to the criteria for response assessment in HGG, updates for response assessment for low-grade gliomas and for neurosurgical interventions are in preparation. The current article provides a review of the literature as well as consensus recommendations from the RANO working group for commonly employed end points in HGG clinical trials, and it discusses their applicability in the design of clinical trials for newly diagnosed and recurrent patients. We also highlight auxiliary end points that may play a growing role in the design and interpretation of future clinical trials. End points are ultimately relevant in the context of their application to specific clinical trial designs. Faulty study architecture may yield misleading or uninterpretable study results, regardless of selected study end points. Therefore, a separate manuscript dedicated to reviewing key controversies and considerations of clinical trial design for patients with malignant glioma is warranted and planned.
As with other solid tumors, the assessment of overall radiographic response (ORR) has been an important end point for HGG clinical trials, especially those that use cytotoxic chemotherapy. Reduction in tumor size may correlate with symptomatic improvement and longer survival and therefore may provide a surrogate for clinical benefit. Twenty years ago, Macdonald et al3 proposed criteria for ORR in patients with supratentorial HGG based on 2-dimensional measurement of contrast enhancement, which were derived in part from imaging-based responses of oligodendrogliomas to chemotherapy.4 Similar to the brain tumor response criteria proposed by Levin et al,5 Macdonald and colleagues' criteria integrated clinical status into the definitions of both response and progression and incorporated corticosteroid use into the definition of response. The latter was based on the observation that corticosteroid use can substantially decrease tumor contrast enhancement on both CT6 and MRI scans.7 Macdonald and colleagues' criteria have subsequently remained the de facto standard for response assessment in HGG clinical trials. Although a number of studies suggest that 1-dimensional tumor measurements using the Response Evaluation Criteria in Solid Tumors (RECIST)8 may be equally effective, these criteria have not been widely adopted for patients with brain tumors.9,10 Volumetric imaging may potentially be more accurate but is currently neither widely available nor standardized.11,12 Finally, change in tumor size as a continuous variable has been advocated as a measure of activity,13 but this approach also requires validation for patients with HGG.
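As an illustration only, the size-based component of the 2-dimensional and 1-dimensional approaches described above can be sketched in a few lines of Python. The numeric cutoffs used here (a 50% or greater decrease or a 25% or greater increase in the product of perpendicular diameters for the 2-dimensional approach; a 30% or greater decrease or a 20% or greater increase in the longest diameter for the 1-dimensional approach) are the commonly cited thresholds and are not stated in this article. The function names and the lesion measurements are hypothetical. The sketch deliberately ignores clinical status, corticosteroid dose, new lesions, and confirmatory scans, all of which the full criteria require.

```python
def percent_change(reference, followup):
    """Percent change relative to the reference value (negative = shrinkage)."""
    return 100.0 * (followup - reference) / reference


def classify_2d(reference_diams_mm, followup_diams_mm):
    """Size-only 2-dimensional call from two perpendicular diameters (mm).

    Assumed cutoffs: >=50% decrease = partial response,
    >=25% increase = progression. Not a full response assessment.
    """
    ref = reference_diams_mm[0] * reference_diams_mm[1]
    fol = followup_diams_mm[0] * followup_diams_mm[1]
    if fol == 0:
        return "complete response"
    change = percent_change(ref, fol)
    if change <= -50:
        return "partial response"
    if change >= 25:
        return "progression"
    return "stable disease"


def classify_1d(reference_longest_mm, followup_longest_mm):
    """Size-only 1-dimensional call from the longest diameter (mm).

    Assumed cutoffs: >=30% decrease = partial response,
    >=20% increase = progression. Simplified for illustration.
    """
    if followup_longest_mm == 0:
        return "complete response"
    change = percent_change(reference_longest_mm, followup_longest_mm)
    if change <= -30:
        return "partial response"
    if change >= 20:
        return "progression"
    return "stable disease"


# Hypothetical lesion that shrinks fairly evenly: the two methods agree.
even_2d = classify_2d((40, 30), (26, 20))
even_1d = classify_1d(40, 26)

# Hypothetical lesion that shrinks mostly along its short axis: they disagree.
uneven_2d = classify_2d((40, 30), (30, 18))
uneven_1d = classify_1d(40, 30)

print(even_2d, "|", even_1d)
print(uneven_2d, "|", uneven_1d)
```

In the second hypothetical lesion the product of diameters falls by 55% while the longest diameter falls by only 25%, which suggests how irregular changes in shape could produce discordant calls even when the methods usually agree.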
The use of ORR as a primary end point has several advantages. ORR is a direct measure of therapeutic effect and is not affected by disease natural history, which may affect duration-based end points. It can also be determined relatively rapidly. In addition, there are extensive historical ORR data for a variety of salvage therapies, including cytotoxic chemotherapy14,15 and bevacizumab-based regimens,16,17 that can serve as comparators to the activity of planned study therapies. Furthermore, ORR is not affected by subsequent salvage therapies. For these reasons, some have argued that ORR is the only reliable end point for single-arm studies. Finally, regulatory agencies such as the US Food and Drug Administration (FDA) continue to place emphasis on ORR as an end point.18 In fact, durable ORR was the primary criterion cited by the FDA in their recent accelerated approval of bevacizumab for recurrent GBM.
However, the utility of ORR as a primary end point has potential limitations. First, the ability of ORR to consistently predict meaningful clinical benefit is variable.10–12,19 One possible explanation is that extravasation of contrast material across leaky intratumoral blood vessels, the surrogate used to determine tumor burden and thus ORR, can be influenced by several factors, including use of corticosteroids, inflammation, seizures, surgery, ischemia, and radiation effect on tumor, normal brain, and vascular permeability.20–22 Variations in radiological techniques can also influence contrast uptake.11 Furthermore, numeric cutoffs to define ORR were originally derived from the World Health Organization's solid tumor criteria, which may or may not be valid for CNS tumors. Importantly, ORR does not credit prolonged disease stabilization. Many new therapies, including targeted molecular agents, are predominantly cytostatic and are likely to produce stable disease rather than tumor regression. Furthermore, most classical cytotoxic agents do not yield significant ORR rates, yet they may show efficacy in terms of progression-free survival (PFS) or overall survival (OS) when given to patients with newly diagnosed conditions (eg, temozolomide). Such tumor control generally represents meaningful clinical benefit among patients with HGG. For cytostatic agents, other end points, such as PFS at 6 months (PFS-6), may better reflect antitumor activity. Finally, radiographic assessment can be challenging because of difficulties in measuring irregularly shaped tumors, interobserver variability, lack of guidelines for assessment of multifocal disease, and the inability to measure nonenhancing tumor.11 A strategy to offset some of these issues is the use of blinded central imaging review to reduce bias and subjectivity in tumor measurement, thereby improving reliability. However, centralized review may not be practical for many phase II studies and may neglect clinical deterioration.
The utility of ORR based on contrast assessment is further limited by 2 clinically relevant scenarios, particularly in the current era. Although it has been noted after radiotherapy alone,23 an increase in contrast enhancement and edema, with or without accompanying neurologic findings, referred to as pseudoprogression, has been reported in up to 30% of patients with newly diagnosed HGG following completion of temozolomide chemoradiotherapy.24–26 In addition, pseudoprogression has been observed with other therapies, including biodegradable chemotherapy polymers,27 immunotoxins administered by convection-enhanced delivery,28 viral gene therapy,29 focal irradiation with brachytherapy or stereotactic radiosurgery,30 and immunotherapies.31 Although the underlying mechanism is not fully understood, these changes are felt to reflect increased permeability and/or vascular damage. Unfortunately, no currently available imaging modality reliably distinguishes pseudoprogression from true progression with sufficient sensitivity and specificity for routine use. In such situations, progression can only be reliably defined if it occurs at a distant site, is confirmed histopathologically, or worsens on sequential imaging.
Paradoxically, the second scenario limiting current radiographic assessment is the phenomenon of pseudoresponse. Pseudoresponse refers to a decrease in contrast enhancement due to diminished vascular permeability that does not necessarily reflect a true underlying antitumor effect. For example, rapid and marked reduction in contrast enhancement has been noted within 1 to 2 days after anti-vascular endothelial growth factor (anti-VEGF) therapy.32 In other cases, despite reduced enhancement, progressive, infiltrative nonenhancing tumor as assessed by FLAIR/T2 sequences has been noted.33
The recently announced RANO criteria34 build on the foundation established by the criteria of Levin et al5 and Macdonald et al1 and were developed to address the radiographic findings associated with pseudoprogression and pseudoresponse. Nevertheless, pending experience with and evaluation of the RANO criteria, concerns remain regarding the interpretation of ORR as an end point for trials investigating therapeutic approaches that are associated with either pseudoprogression or pseudoresponse. Future studies that evaluate the potential correlation of ORR, preferably assessed using the RANO criteria, with OS will be particularly helpful in better defining the value of ORR as an end point in the modern era.
OS is considered the definitive primary end point for patients with both newly diagnosed and recurrent HGG, because it reflects a clinically meaningful benefit that can be objectively and unequivocally assessed. Furthermore, the natural history of HGG does not typically include long periods of disease quiescence, as may occur in low-grade gliomas. Therefore, improvement in OS is believed to directly reflect therapeutic efficacy.
However, the use of OS as a primary end point can have drawbacks. Factors unrelated to current treatment (eg, factors related to underlying disease biology) may affect survival time more than PFS or ORR and may therefore introduce greater variability in outcome. Studies based on OS also inherently require longer assessment time than do other measures, such as PFS or ORR. In addition, the impact of study therapy on OS may be less pronounced than its impact on PFS or ORR, resulting in the need for larger sample sizes. These factors may increase study duration and sample size and, hence, expense. Finally, start times may also vary across studies, which can affect time-to-event measures including OS. This potential variable highlights the need for studies to clearly specify start times and for clinicians to appropriately take study start time into account when comparing outcome across studies and control groups.
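The relationship between the size of the treatment effect and the required study size can be illustrated with a standard approximation that is not taken from this article: Schoenfeld's formula for the number of events needed by a two-sided log-rank test with 1:1 randomization. The following Python sketch assumes that formula, and the hazard ratios shown are hypothetical values chosen only to contrast a smaller effect (as might be seen on OS) with a larger one (as might be seen on PFS).

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Approximate number of events for a two-sided log-rank test, 1:1 allocation.

    Schoenfeld approximation: d = 4 * (z_{1-alpha/2} + z_{power})^2 / ln(HR)^2.
    This is a planning sketch, not a substitute for a formal statistical design.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    events = 4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2
    return math.ceil(events)


# Hypothetical effect sizes for illustration only.
events_small_effect = required_events(0.75)  # modest effect
events_large_effect = required_events(0.60)  # larger effect

print(events_small_effect, events_large_effect)
```

Under these assumptions the modest effect requires roughly three times as many events as the larger effect, which is consistent with the point that end points on which therapy has a less pronounced impact demand larger and longer studies.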
Another important potential drawback of OS as a primary end point is the impact of effective salvage therapies administered at recurrence, which may increase subsequent OS and lead to an apparent disconnect between the presalvage study therapy and OS improvement. Specifically, given the rapidly expanding use of bevacizumab for patients with HGG and its accelerated approval by the FDA in May 2009 for recurrent GBM, the interpretation of ongoing and future GBM studies will likely need to take into account the potential effect of such salvage therapies on OS. Future studies should consider documenting salvage therapy administered after study failure to better assess its potential impact on study OS. A strategy to help deal with the potentially confounding effect of salvage therapy, particularly for definitive or registration studies, is to include a statistical design that is sufficiently powered to evaluate a second end point that is not affected by salvage therapy.
For single-arm phase II studies that include historical controls, differences between the study and control groups, such as eligibility criteria, start times, or salvage therapy use, can impact end point comparisons including but not limited to OS. In addition, OS may improve over time due to general medical advances, such as earlier diagnosis, improved care of tumor symptoms and therapy complications, and better neurosurgical and radiation approaches. These factors coupled with the impact of random biases, such as patient selection, argue strongly for the incorporation of a contemporaneous and comparable randomized control group, particularly for advanced phase II studies.35,36 Nonetheless, there remains a need for study designs for early, small phase II trials to rapidly evaluate potentially advantageous treatments. For such studies, PFS at a defined time point may be a preferable end point. Although such studies may provide a signal of anti-tumor activity, definitive conclusions, including the decision to proceed to a pivotal phase III study, will be more appropriately based on a randomized study design.
For exploratory phase II trials designed to determine whether proceeding to a randomized phase III study is appropriate, OS rate measured at a defined time point for which there are established historical data, such as OS at 12 months (OS-12), can shorten trial length while still providing a measure of treatment efficacy. However, required accrual may increase for OS at a specified time point compared to overall median OS. Alternatively, clinical trial designs that incorporate a “run-in” phase II component to evaluate a more rapidly measured event (such as OS-12 or PFS-6) can be performed before transitioning into a randomized phase III trial that incorporates a median OS primary end point. Statistical models exist to permit a seamless shift from phase II to phase III.37
Several metrics have been used to gauge the durability of treatment antitumor activity. PFS measures time from treatment initiation to either progression or death from any cause. PFS end points have several potential strengths. First, the duration of time without tumor progression is usually clinically meaningful and, in general, reliably reflects treatment effect, because HGGs do not typically exhibit prolonged disease inactivity (in contrast to low-grade glioma). Second, PFS end points are not affected by subsequent therapies. Third, PFS-based studies are typically completed in a shorter time frame than OS-based studies because the outcome occurs sooner. They may also require smaller sample sizes because treatment effect is often greater on PFS than OS. Quicker, smaller studies are more feasible, economical, and rational in view of the large number of drugs under development.
Another value of PFS end points among patients with recurrent HGG is that they were recently shown to correlate with OS in 2 independent meta-analyses of cooperative group phase II trials.38,39 Similarly, recent retrospective data suggest that PFS-6 also correlates with OS among patients with newly diagnosed conditions.40 However, all of these data were generated prior to the routine use of bevacizumab salvage therapy and defined PD based on enhancing tumor measurement according to the criteria of Macdonald et al.1 The association of PFS-6 and OS for both patients with newly diagnosed GBM and those with recurrent GBM will require reevaluation among patients treated with anti-VEGF therapy and should include parameters to define PD that assess nonenhancing as well as enhancing disease such as the RANO criteria.34
There are potential disadvantages of PFS end points.18 First and foremost, determination of PFS is critically dependent on a reliable, standardized, objective means to define tumor progression. Such a definition must be quantifiable and able to be confirmed by external review and auditing. Current radiographic methods to define progression are problematic due to confounding factors including pseudoprogression and pseudoresponse, as discussed above. Improved guidelines, such as those defined by the RANO criteria, are expected to lessen the impact of these issues.19,34 The use of blinded independent central review of PFS can lessen the potential bias involved in determining PFS, but it is also not practical in all trials.41,42
Another potential drawback of PFS is evaluation time bias due to variably timed planned and unplanned patient assessments within and across clinical trials. One strategy to circumvent this issue is to assess PFS at a fixed time point. PFS-6 is usually preferred for patients with recurrent HGG, because 6 months is believed to represent a clinically meaningful period. In addition, meta-analysis of PFS-6 across comparable trials can provide a useful historical benchmark.38,39,43
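To make the fixed time point approach concrete, the following Python sketch estimates PFS-6 as the Kaplan-Meier probability of remaining alive and progression free at 6 months. The Kaplan-Meier estimator is a standard method that this article does not itself specify, and the function name and the eight-patient data set are hypothetical. Patients who are censored before 6 months contribute to the risk set only up to their last assessment.

```python
def km_event_free_probability(times, events, t):
    """Kaplan-Meier estimate of the probability of being event free at time t.

    times  : follow-up time for each patient (months)
    events : True if progression or death was observed at that time,
             False if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    i = 0
    while i < len(data) and data[i][0] <= t:
        current_time = data[i][0]
        observed = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == current_time:
            observed += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if observed:
            survival *= 1 - observed / at_risk
        at_risk -= leaving
    return survival


# Hypothetical cohort: follow-up in months and whether an event was observed.
months = [2, 3, 4, 5, 7, 8, 9, 12]
event_observed = [True, True, False, True, True, False, True, False]

pfs6 = km_event_free_probability(months, event_observed, 6)
print(round(pfs6, 3))
```

In this hypothetical cohort one patient is censored at 4 months, so the estimate differs from the simple proportion of patients known to be progression free beyond 6 months; comparisons with historical PFS-6 benchmarks are meaningful only if both are computed in the same way.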
This section briefly examines alternative imaging assessments and measures of patient function and well being that may be considered as auxiliary or secondary end points in future clinical trials. Unfortunately, an extensive review of such measures, as well as the potential association of tumor biomarkers and outcome, is beyond the scope of this manuscript.
Ideally, alternative imaging approaches should provide either an early predictive marker of an established clinical benefit (ORR, PFS, or OS) or a more accurate measure of tumor burden or extent compared to established techniques. However, before implementation, new techniques must be standardized, validated, and made widely available in terms of both access and cost.
Studies comparing 1-, 2- and 3-dimensional measurements and volumetric assessment suggest that there is good concordance between methods in determining radiologic response in adults with newly diagnosed and recurrent HGGs.9,10 However, given the geometric complexity of both contrast enhancement and T2/FLAIR changes associated with most glial tumors, volumetric evaluations may improve accuracy, whereas computerization of such measures may lessen interobserver variability.11
MRI can be used to define surrogates of vascular effect, such as cerebral blood volume and flow, as well as vascular permeability using techniques such as dynamic susceptibility contrast imaging, arterial spin labeling, and dynamic contrast-enhanced MRI.32,44 A recent analysis suggests that multivariate algorithms of such indices may provide an accurate strategy to predict outcome following antiangiogenic therapy.45
Evaluating major tissue metabolites such as choline, creatine, N-acetylaspartate, lipids, and lactate using magnetic resonance spectroscopy can help define the complex biology of HGGs.46 Although these data have yet to be successfully integrated into HGG trials, the ability to noninvasively interrogate tissue metabolites over time is attractive as a method to monitor treatment effects.
Diffusion-weighted imaging measures the diffusivity of water within brain tissue and can provide information on cell density and characteristics of tissue architecture through either the creation of apparent diffusion coefficient (ADC) maps47,48 or white matter tract morphology via fractional anisotropy maps.32 Preliminary analyses suggest that ADC values correlate with response to therapy.48–50 A recent report also suggests that diffusion MRI may be able to detect nonenhancing tumor progression in patients receiving anti-VEGF therapy.51
PET52 can monitor glioma therapy using radiolabeled, biologically relevant ligands such as fluorodeoxyglucose, amino acids and fluoro-L-thymidine.53,54 Each ligand is associated with potential merits and limitations and such approaches offer promise for a wide array of therapeutics.53,54 For example, in a recent study of patients with recurrent HGG treated with bevacizumab, a ≥25% reduction in tumor fluoro-L-thymidine uptake was predictive of PFS and OS.55
Importantly, “established” outcome parameters such as ORR, PFS, or OS are truly of clinical benefit only if they stabilize or improve patient function via tumor control and do not worsen patient function or well being due to treatment toxicity. Patient function and well being are particularly meaningful because they directly reflect the patient perspective. Objective assessment of patient function and well being is generally preferred as it reduces self-report biases and other confounders. For example, neurocognitive testing provides an objective approach to evaluating patient function that reflects neurologic integrity.56 In contrast, subjective self-reporting of neurocognitive status by patients is not well correlated to objective neurocognitive function and is more often associated with symptoms of depression or fatigue.57 However, some patient experiences (eg, symptoms such as pain, health related quality of life [HRQOL]) require use of a patient reported outcome (PRO) scale.
Several practical concerns warrant consideration as metrics of patient function and well being gain interest as potential clinical trial end points. Effective assessment tools must take into account the heterogeneity and complexity of neurologic manifestations caused by brain tumors and must also strive to minimize subjectivity, bias, and interobserver variability. Ideally, assessments could distinguish the effects of comorbid events (eg, strokes, seizures, and infections) or concurrent use of medications from those of underlying tumor. Just as imaging end points are often aided by the incorporation of clinical data (eg, information regarding comorbid events, such as strokes and seizures, or use of concurrent medications), inclusion of such data in the interpretation of patient assessments can be valuable.58 Furthermore, because long-term data regarding these measures among survivors are sparse, it may be difficult to distinguish a treatment effect from that of the natural disease course in uncontrolled, noncomparative studies. Other challenges of such end points include the relatively short period to progression in many patients and the practical limitations posed by assessment frequency. Compliance and missing data can also be critical issues and may limit data analysis.59 Finally, because better-functioning patients are more likely to participate in such evaluations, attempts should be made to apply assessments broadly across patients with good and poor performance status.
Objective assessment of these dimensions may help to establish differences in antitumor activity, treatment-related toxicity, and time to neurological deterioration, the latter being of clinical importance, even in the absence of an objective antitumor response. The value of these alternative end points in HGG trials is partly dependent on the phase of the trial and the mechanism of action of the tested agent. For example, interpreting the impact of therapy on neurocognitive function or quality of life in uncontrolled phase II studies of novel agents with unproven activity may be impractical. However, once evidence of antitumor activity has been established, and a randomized phase II or III trial is developed, data from auxiliary end points assessing objective neurocognitive function and subjective patient reported outcomes become increasingly important.
In patients with HGG, corticosteroids are commonly used to decrease vascular permeability, mitigate peritumoral edema, and improve neurologic function. However, long-term corticosteroid use can contribute substantially to morbidity due to impairment of immune function, diminished wound healing, and the development of proximal myopathy, osteoporosis, mood and sleep disorders, hypertension, weight gain and diabetes. Given the negative effects of corticosteroids and the association of their use with tumor burden, it is reasonable to consider a decrease in corticosteroid dose as a clinical benefit. Furthermore, ~70% of patients with newly diagnosed GBM60 and 50% of those with recurrent GBM16 are receiving corticosteroids. These relatively high percentages make it feasible to power a study incorporating the reduction of corticosteroid dose as an end point. However, there is currently no uniform approach to initiating and adjusting corticosteroid dosing among patients with HGG, making the use of corticosteroid adjustment as a reliably measurable end point problematic. Nevertheless, reduction in steroid requirements was a positive effect noted in the recent FDA accelerated approval of bevacizumab for recurrent GBM.16
Even though the neurologic examination is modularized and neurologic scales exist for a number of neurologic disease categories, there is no clear method to define significant neurologic decline for brain tumor trials. Pretreatment patient performance, a widely evaluated metric across many oncology trials, provides a consistent and statistically significant prognostic factor for OS in brain cancer trials.61,62 Using change in performance status to indicate clinical progression is attractive given the ease and pertinence of its evaluation. However, HGG trials have yet to identify the performance status scale of choice for consistent use across studies. In addition, changes in performance status due to comorbid events or concurrent use of medications must be distinguished from those attributed to underlying tumor progression.
Neurocognitive dysfunction is ubiquitous among patients with HGG, is associated with diminished independence in activities of daily living,63 and can impact HRQOL along with several other factors.63,64 Furthermore, neurocognitive function can predict prognosis58,65,66 and may decline in advance of imaging evidence of progression.66–68 Previous research has demonstrated that assessment of more specific signs (eg, cognitive function) and symptoms (eg, fatigue) in patients with brain tumors is more prognostic than assessment of global constructs such as overall HRQOL.69 The assessment of neurocognitive function in clinical trials involves a compromise between the desire for sensitive and extensive testing and practical constraints posed by available resources and patient compliance. The absence of neuropsychological expertise at each center limits the extent to which cognition can be thoroughly tested. Therefore, a practical neurocognitive assessment in clinical trials should be brief, sensitive, and repeatable; have known psychometric properties, such as test-retest reliability and published norms; and be able to be administered by trained nonneuropsychologists. Neurocognitive assessments have demonstrated value in a large multicenter trial.56
Clinical trials may consider implementing a small core set of neurocognitive tests for all participating patients or centers and incorporate more extensive testing at select centers with appropriate neuropsychological expertise. A core neurocognitive test battery that is inexpensive and poses low patient burden developed at the MD Anderson Cancer Center has been widely used in several Radiation Therapy Oncology Group trials.58 The tests have been translated into a limited number of languages for use in multinational trials.
Although assessments of mental status, such as the Mini-Mental Status Examination (MMSE), can be administered quickly and easily, these measures are most useful for evaluating delirium or dementia and are insensitive measures of neurocognitive function in patients with brain tumors.70 Nonetheless, MMSE assessment was found to be prognostic for OS in some HGG trials71 and may provide an early cue of tumor progression.66 Therefore, baseline MMSE can be considered in future trials as a possible stratification factor; however, the longitudinal evaluation of neurocognitive function is more sensitively achieved by formal testing.
Any assessment of a patient's perceived level of function, symptom experience, or HRQOL is a PRO, which is inherently subjective in nature. Although HRQOL and self-reported symptom questionnaires are both subjective measures of an individual's health status that come directly from the patient, they are conceptually distinct forms of PRO. Among PRO measures, symptom questionnaires have been described as preferable to HRQOL questionnaires for determining the impact of disease and treatment in a clinical trial, because symptoms are thought to be closer than HRQOL to the biologic and physiologic factors affecting the patient. Specific symptom questionnaires (eg, Brief Fatigue Inventory52) as well as brain tumor-specific multisymptom questionnaires (eg, M.D. Anderson Symptom Inventory-Brain Tumor Module72) have been developed.
HRQOL has been defined as “the physical, psychological, and social domains of health, seen as distinct areas that are influenced by a person's experiences, beliefs, expectations, and perceptions.”73 Beliefs and expectations regarding health and the ability to cope with limitations and disability can greatly affect a person's perception of health, satisfaction with life and reporting on HRQOL questionnaires. Susceptibility to response shift over time and other biases further limit the use of HRQOL questionnaires to assess clinical benefit in clinical trials. Nonetheless, validated HRQOL questionnaires for patients with brain tumors can assess the physical, psychological, and social impact of the disease and its treatment.74 The general Functional Assessment of Cancer Therapy questionnaire combined with the Brain Tumor module (FACT-BR) addresses physical, social, emotional, and functional well being to provide a broad view of HRQOL issues with robust psychometric properties. Similarly, the QLQ-C30 developed by the EORTC as a general oncology questionnaire, which interrogates 5 main functional areas and provides symptom scales for fatigue, nausea/vomiting, and pain can be combined with a brain-tumor-specific questionnaire (BN20), which provides additional queries of neurologic and treatment parameters.
Recent clinical trials have identified therapeutic agents and strategies that meaningfully improve outcome for patients with both newly diagnosed and recurrent HGG. On the basis of laboratory advances that have significantly enhanced our understanding of cancer biology, a large number of therapeutic agents with diverse mechanisms of action are in clinical development. It is therefore now more important than ever to recognize the strengths and potential pitfalls associated with various clinical trial end points to evaluate these agents as efficiently as possible. Although OS remains the gold standard for phase III trials and most trials involving newly diagnosed patients, some of these trials may be enhanced by an additional end point that is not affected by subsequent salvage therapy, such as PFS at a defined interval. For most phase II studies, particularly those involving patients with recurrent disease, median OS and PFS at a defined time interval represent appropriate primary end points. Imaging response remains of value in selected situations but is potentially confounded by several factors, including pseudoprogression and pseudoresponse. Application of the RANO criteria2 may lessen the impact of these factors and improve the assessment of ORR as well as the definition of progression. Given the limitations of commonly used imaging assessment techniques, the development of alternative imaging approaches as secondary end points for clinical trials is encouraged. Finally, validated metrics of patient function and well being are available and increasingly valued as secondary end points.
1. End points of therapeutic clinical trials should be clearly defined and based on a rational statistical hypothesis.
2. Although ORR is an important end point, its reliability is problematic in studies evaluating agents or regimens with cytostatic potential or that are associated with either pseudoprogression or pseudoresponse, particularly anti-VEGF therapies.
3. Metrics of clinical and radiographic evaluation should be standardized and should incorporate the recently defined RANO criteria.
4. Overall survival remains the gold standard; however, if it is potentially compromised by salvage therapy, an additional end point that is independent of salvage therapy, such as ORR or PFS, should be considered.
5. PFS, including assessment at a fixed time point, is an appropriate end point for most phase II studies, especially among patients with recurrent disease. Correlation of PFS end points with OS should be further explored in the current era and ideally validated with the RANO criteria. PFS end points can enhance studies with an OS primary end point if subsequent salvage therapy may affect OS.
6. To optimize study end point value, appropriate comparison groups should be used. Although concurrent controls are preferable in general, a historical comparison group may be considered but should reflect patients with comparable eligibility, chronological era of care, study start times, and subsequent salvage therapy.
7. Objective tests of neurocognitive function and subjective patient-reported outcome measures are highly valued and should be practical, reliable, and validated in controlled study designs.
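As an illustration of the time-to-event end points named in these recommendations, the following is a minimal sketch of how PFS at a fixed time point and median PFS could be estimated with the Kaplan-Meier product-limit method. It is not taken from this report: the function names and the follow-up data are hypothetical and chosen only for illustration, and an actual trial analysis would use validated statistical software and report confidence intervals.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    times  : months from enrollment to progression/death or to censoring
    events : 1 if progression or death was observed, 0 if censored
    Returns (time, estimated survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_censored = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            else:
                n_censored += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        at_risk -= n_events + n_censored
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Estimated probability of remaining event-free through time t."""
    estimate = 1.0
    for time, surv in curve:
        if time > t:
            break
        estimate = surv
    return estimate


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which the estimate falls to 0.5 or below, if reached."""
    for time, surv in curve:
        if surv <= 0.5:
            return time
    return None


# Hypothetical follow-up data for 10 patients (months; 1 = progressed, 0 = censored).
months = [2, 3, 4, 5, 6, 7, 8, 10, 12, 14]
progressed = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

pfs_curve = kaplan_meier(months, progressed)
print("PFS at 6 months:", round(survival_at(pfs_curve, 6), 3))
print("Median PFS (months):", median_survival(pfs_curve))
```

With these made-up data, the estimated PFS at 6 months is 4/7 (about 0.57) and the median PFS is 8 months. Censored patients contribute to the risk set until they leave follow-up, which is why the estimate differs from the crude proportion of patients known to be progression free at 6 months.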
We gratefully acknowledge Wendy Gentry for the preparation of this manuscript.
Conflict of interest statement. D.A.R. served on advisory boards for Genentech, Merck/S.P., Bayer, AstraZeneca, and Genzyme and consulted for EMD Serono and Genentech. T.F.C. consulted for Genentech, AstraZeneca, Exelixis, Imclone, and Roche. A.B.L. consulted for Bristol-Myers Squibb, Campus Bio, Cephalon, Eisai, Genentech, Imclone, and Merck/S.P. and served on the speaker's bureau for Merck/S.P. M.R.G. consulted for Genentech and Merck/S.P. J.H.S. consulted for BRAINlab. M.C.C. served as an advisor for Genentech, Merck/S.P., and Exelixis and consulted for Genentech and Merck/S.P. M.P.M. has stock in Colby, Pharmacyclics, Procertus, Stemina, and Tomotherapy; served as an advisor for Colby, Stemina, and Procertus; served on the board of directors for Pharmacyclics; served on the data safety monitoring board for Apogenix; consulted for Adnexus, Bayer, Merck, Roche, and Tomotherapy; and served as a speaker for Merck. M.A.V. consulted for AstraZeneca, Merck/S.P., BRAINlab, and Genentech and served on the Editorial Board for Neuro-Oncology and as Secretary/Treasurer for the Society for Neuro-Oncology. All other authors: none declared.