The Response Evaluation Criteria in Solid Tumors, or RECIST criteria (one-dimensional [1D] measurement), are widely used to measure response in tumors, but there are few studies evaluating these criteria in brain tumors. We compared linear and volumetric measurements in adult high-grade supratentorial enhancing gliomas to determine the agreement among the measurements themselves, in defining response, and in their subsequent relation to survival. We hypothesized that the 1D RECIST criteria may be suitable for response assessment in adult high-grade gliomas. Tumor size on MRI scans in 104 patients with high-grade enhancing gliomas treated on clinical trial protocols was measured by using 1D (greatest length), 2D (two-dimensional: product of the two longest perpendicular diameters), 3D (three-dimensional: product of the longest perpendicular diameters in one plane and the longest orthogonal diameter to that plane), enhancing volume (EV), and total volume (TV). A total of 388 T1 postgadolinium MRI scans (104 baseline and 284 follow-up scans) were evaluated. Volumetric analysis (EV and TV) was performed with commercially available software. Intraobserver and interobserver correlations (ρ) were high for all modalities (ρ > 0.92 and ρ > 0.71, respectively). Correlation was excellent (ρ > 0.9) among all modalities except for 3D (ρ < 0.6). Patient response rates ranged from 12% to 26%. Median progression-free survival (mPFS) and six-month progression-free survival (6mPFS) were not significantly different among the methods (range, 5.3 months to 5.9 months and 42% to 48%, respectively). Landmark analyses of response at two months using linear methods predicted overall survival with hazard ratios of 0.19 to 0.29 (P < 0.005). These results suggest high concordance among 1D, 2D, TV, and EV, but not 3D, methods in assessing enhancing tumor progression and in estimating mPFS and 6mPFS in adult brain tumor patients.
The tumor response at two months assessed by linear methods correlated better with overall survival. Thus, linear methods are comparable to volumetric methods, but simpler to implement for routine clinical use and for designing clinical trials of brain tumors.
The past few decades have witnessed the emergence and implementation of various radiographic criteria in assessing tumor response and in guiding therapy. The WHO criteria, first introduced in 1979 (Miller et al., 1981; WHO, 1979), represented an early attempt to define objective response to an anticancer agent based on the change in the size of the lesion as measured by determining the product of the largest diameter and its perpendicular length for each measurable lesion (two-dimensional, 2D) and summing the products. Responses were characterized as follows (Table 1): complete response (CR), complete disappearance of tumor; partial response (PR), at least 50% decrease in the products of the two largest perpendicular diameters; progressive disease (PD), at least 25% increase in the products of the two largest perpendicular diameters; and stable disease (SD), neither PR nor PD. In 1990, Macdonald et al. (1990) suggested that the 2D WHO criteria be applied to brain tumors. They proposed specifically that these criteria would be more useful in patients who were not on steroids and also did not experience clinical deterioration other than that attributable to progressive tumor burden (e.g., systemic or metabolic disturbances). These Macdonald criteria have become the standard for many brain tumor trials.
The Response Evaluation Criteria in Solid Tumors (RECIST) were introduced in 2000 (Therasse et al., 2000) in an attempt to simplify and standardize the objective response criteria to cancer therapy in solid tumors. These criteria used a unidimensional (1D) measurement (the sums of the longest diameters of tumors) instead of two-dimensional measurements (2D, the sum of products of perpendicular diameters) to determine response. With the RECIST criteria (Table 1), responses were redefined: CR was defined as complete disappearance of the lesion, PR as at least 30% decrease in the sums of the longest diameter of tumors, PD as at least 20% increase in the sums of the longest diameter of tumors, and SD as neither PR nor PD. There is excellent concordance between the RECIST criteria and the WHO criteria in determining response in extracranial tumors (Shi et al., 1998; Therasse et al., 2000), but there have been few studies validating these criteria in brain tumors. Furthermore, a fifth category, minor response (MR), is commonly used to capture smaller responses and is defined as a decrease between 12% and 25% in 1D, between 25% and 50% in 2D, and between 40% and 65% in three-dimensional (3D) and volume methods.
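The thresholds quoted above can be written as a small classification rule. The following is a minimal sketch, not the authors' implementation: it uses only the PR and PD cutoffs stated in the text for the 1D (RECIST) and 2D (WHO/Macdonald) criteria, folds minor response into SD, and treats a measurement of zero as CR. The names `CRITERIA` and `classify_response` are hypothetical.

```python
# Thresholds as stated in the text: fractional decrease required for PR,
# fractional increase required for PD. Names here are illustrative only.
CRITERIA = {
    "1D": {"pr_decrease": 0.30, "pd_increase": 0.20},  # RECIST
    "2D": {"pr_decrease": 0.50, "pd_increase": 0.25},  # WHO / Macdonald
}


def classify_response(reference, current, method):
    """Classify one follow-up measurement against a reference measurement.

    reference: tumor size on the comparison scan (must be > 0)
    current:   tumor size on the follow-up scan (0 means no visible tumor)
    method:    "1D" or "2D"
    """
    if reference <= 0:
        raise ValueError("reference measurement must be positive")
    if current < 0:
        raise ValueError("current measurement cannot be negative")
    if current == 0:
        return "CR"  # complete disappearance of the lesion
    thresholds = CRITERIA[method]
    change = (current - reference) / reference
    if change <= -thresholds["pr_decrease"]:
        return "PR"
    if change >= thresholds["pd_increase"]:
        return "PD"
    return "SD"  # assumption: minor response is not separated from SD here
```

For example, a lesion whose longest diameter shrinks from 10 cm to 6.5 cm is a PR under the 1D rule, whereas a product of diameters that falls by the same fraction would still be SD under the 2D rule, which requires a 50% decrease.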
Advances in MRI technology have led to the ability to measure tumor volumes by automatic and/or manual outlining of tumor boundaries (Aoyama et al., 2001; Joe et al., 1999; Mazzara et al., 2004; Moonis et al., 2002; Shi et al., 1998; Sorensen et al., 2001; Weltens et al., 2001). Warren et al. (2001) evaluated 130 MRI scans in 32 children with brain tumors and found good concordance between the 1D, 2D, and enhancing volume (EV) methods in determining PR. However, there was fairly marked variation between the different methods in determining time to disease progression. Galanis et al. (2003) evaluated 614 MRI scans from 73 newly diagnosed adult brain tumor patients and found no difference in survival among patients characterized as responders or nonresponders on the basis of 1D versus 2D measurements. To the best of our knowledge, there have been no other studies evaluating the 1D criteria in adults with up-front and recurrent brain tumors, the setting of the majority of clinical trials in brain tumors.
Currently, the method most commonly selected by neuro-oncologists to measure objective response is use of the 2D Macdonald criteria (Prados et al., 2004) with 3D or volumetric methods (Batchelor et al., 2004). Additionally, there are variations in the criteria (Table 1) among groups that limit direct comparisons of response rates in clinical trials. Thus, there exists a need to simplify and standardize the methodology of these measurements. In this study, we evaluated contrast-enhanced MRI scans of 104 adults with high-grade supratentorial enhancing gliomas and compared the 1D, 2D, 3D, computer-assessed total tumor volume (TV), and EV measurements with one another and with clinical neurological status in determining tumor response and progression. We hypothesized that, as for other solid tumors, the 1D (RECIST) criteria are equivalent to other linear and volumetric criteria in assessing response and are thus applicable for response assessment in adult high-grade gliomas.
Adult patients with supratentorial high-grade gliomas on treatment protocols between the years 2001 and 2004 at the Dana-Farber Cancer Institute, Brigham and Women’s Hospital, and Massachusetts General Hospital in Boston, Massachusetts, were eligible for this study. Only patients with measurable enhancing tumor by the methods discussed below, two or more available sequential MRI scans, and clear documentation of status of clinical progression were selected. In all cases, the baseline MRI was the same as the baseline MRI used for the given protocol. MRI scans were generally spaced apart by eight weeks according to protocol requirements, and in cases where MRIs were performed for nonprotocol reasons, no two scans used in this study were less than two weeks apart. The 2D Macdonald criteria were used to identify PD (>25% increase) and thus to remove patients from protocols. All the scans used for this study were performed while patients were on protocol, although some patients underwent more than one protocol.
Of note, all patients had undergone standard external beam radiation treatment, but none had stereotactic radiosurgery; therefore, most, though not all, of the observed tumor progression could be interpreted as authentic progression and not radiation-related necrosis (see discussion). Furthermore, all patients were on stable doses of steroids for at least five days prior to each MRI scan. The Institutional Review Board overseeing all of these institutions approved this study (Dana-Farber Cancer Institute protocol 03–260).
MRIs of the brain were performed on a 1.5-tesla scanner with a standard quadrature head coil. Patients received Magnevist (Berlex Biosciences, Richmond, Calif.) according to weight (0.1 cm3/lb) given as a bolus injection through a peripheral vein infusion. The brain was scanned immediately after injection, and all scans were completed within 30 min. Among others, axial and coronal T1 pre-gadolinium and postgadolinium images were obtained and used for this study. The digitized images were transferred to an Impax Diagnostic Workspace system (Agfa-Gevaert Group, Mortsel, Belgium), where they were available for review by the authors. For volumetric analysis, volume-calculating software (VITREA 2, 3.3.2, 1997–2002 [Vital Images, Inc., Minnetonka, Minn.]) was used to obtain the measurements.
Two investigators (G.D.S. and S.K.) performed all 1D, 2D, and 3D analyses of 388 MRI scans using digital calipers on an Impax workstation, while volumetric analyses and calculations were performed on a VITREA 2 workstation. On a subset of 50 randomly chosen scans, measurements were repeated by both of the aforementioned investigators to determine intraobserver and interobserver variability. Most patients (72%) had only one measurable lesion. The 1D measurement was the sum of the longest diameters in any plane (axial, sagittal, or coronal) of all visible contrast-enhancing lesions on the postgadolinium T1-weighted images. The 2D measurements were the sum of the products of the largest diameters and their maximum perpendicular diameters in the same plane. The 3D measurements were the sum of the products of the 2D measurements and the longest vertical diameters on the view perpendicular to the 1D and 2D planes (i.e., usually the coronal plane unless it contained the longest 1D diameters of any plane, in which case the axial plane was used for third measurement). In cases where there was a gradient of enhancement, a point was selected to begin the measurement at which there was a clearly visible transition from nonenhancement to enhancement. In cases where a nonenhancing cystic cavity was present within a surrounding area of enhancement, we maintained the same rigid criteria of measurement—the longest diameters of enhancing tumor, regardless of where the cystic cavity was located.
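The three linear measurements defined above reduce to sums over lesions. The sketch below assumes each lesion has already been measured by hand with calipers; the `Lesion` class and the function names are hypothetical and are introduced only to make the definitions concrete.

```python
from dataclasses import dataclass


@dataclass
class Lesion:
    """Caliper measurements (in cm) for one contrast-enhancing lesion."""
    longest: float        # longest diameter in any plane
    perpendicular: float  # maximum perpendicular diameter in the same plane
    orthogonal: float     # longest diameter on the perpendicular view


def measure_1d(lesions):
    """1D: sum of the longest diameters (cm)."""
    return sum(lesion.longest for lesion in lesions)


def measure_2d(lesions):
    """2D: sum of the products of the two perpendicular diameters (cm^2)."""
    return sum(lesion.longest * lesion.perpendicular for lesion in lesions)


def measure_3d(lesions):
    """3D: sum of each 2D product times the orthogonal diameter (cm^3)."""
    return sum(
        lesion.longest * lesion.perpendicular * lesion.orthogonal
        for lesion in lesions
    )
```

With two lesions measuring 4 × 3 × 2 cm and 2 × 1 × 1 cm, these functions give 6 cm, 14 cm2, and 26 cm3, respectively.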
For volumetric analysis, the tumor was outlined in all axial planes with a computer-assisted, free-outline technique on postgadolinium T1 images, and EV or TV was calculated by using the VITREA 2 workstation. These raw measurements were compared to the baseline or best overall response scan for each patient, defined as the scan with the smallest size of tumor, which could have been either at the start of the study or on subsequent scans if the tumor responded to treatment. Tumor response was coded as CR, PR, MR, SD, or PD relative to the best overall response scan, according to each criterion defined in Table 1. Clinical progression was defined as the appearance of a new neurological deficit or the permanent progression of an existing neurological deficit, that is, a deficit that did not resolve with an increase in steroids. Median progression-free survival (mPFS) and six-month progression-free survival (6mPFS) were calculated separately for each measurement method and were defined from the date of protocol registration to the date of progression according to that method. Overall survival was defined as time from registration to death.
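The comparison against the baseline or best overall response scan can be illustrated as a running-minimum calculation. This is one reading of the description above and is an assumption: each follow-up scan is compared with the smallest measurement seen on any earlier scan, including baseline. The function name `percent_changes_from_nadir` is hypothetical.

```python
import math


def percent_changes_from_nadir(sizes):
    """Percent change of each follow-up scan relative to the smallest
    earlier measurement (baseline or best response so far).

    sizes: measurements in scan order, baseline first.
    Returns one value per follow-up scan.
    """
    if not sizes:
        return []
    changes = []
    nadir = sizes[0]
    for size in sizes[1:]:
        if nadir == 0:
            # No measurable tumor on the reference scan: any regrowth is
            # an unbounded relative increase; continued absence is no change.
            change = 0.0 if size == 0 else math.inf
        else:
            change = 100.0 * (size - nadir) / nadir
        changes.append(change)
        nadir = min(nadir, size)  # update the best-response reference
    return changes
```

For a patient measured at 10, 5, and 6 cm, the second scan is a 50% decrease from baseline and the third is a 20% increase from the 5 cm nadir, so regrowth after a response is judged against the best response rather than against baseline.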
Response rates and progression-free survival (PFS) were determined by each of the measurement methods described above. PFS curves were plotted by using the Kaplan–Meier method. Intraobserver and interobserver correlations were calculated with raw measurement data by using Pearson correlation coefficients. Analyses of correlation between different methods were performed comparing the percent change from baseline or best response scan with 95% CIs. MR and SD were combined to compare the concordance between the different methods in classifying response. Scan response rate was calculated by adding all the scans labeled as CR and PR and dividing by the total number of follow-up scans (284) in each method. Patient response rate was calculated by adding the number of responders (CR and PR) and dividing by the total number of patients (104) for each method. All analyses were performed with SAS statistical software version 8.0 (SAS Institute, Inc., Cary, N.C.). Median and six-month PFS rates were estimated by the Kaplan–Meier method. Landmark analysis was performed at two-month and six-month time points to determine whether response predicted overall survival. The landmark analysis at a fixed time point looks at all patients who are alive and not censored by that time and compares the subsequent overall survival between the two categories: those who have progressed by that time and those who have not (Anderson et al., 1983).
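For readers unfamiliar with the Kaplan–Meier estimate used for mPFS and 6mPFS, the following is a minimal pure-Python sketch of the product-limit calculation. The study itself used SAS; this code is illustrative only, and the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  time to progression or to censoring for each patient
    events: True/1 if progression was observed, False/0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients who share the same time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                n_events += 1
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve


def median_time(curve):
    """First event time at which estimated survival is 50% or less."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


def survival_at(curve, t):
    """Estimated survival probability at time t (e.g., t = 6 for 6mPFS)."""
    probability = 1.0
    for event_time, s in curve:
        if event_time > t:
            break
        probability = s
    return probability
```

Because each method defines the date of progression differently, the same patients would be passed through this calculation once per measurement method, each time with that method's progression times.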
A total of 388 MRI scans were analyzed from 104 adult patients with measurable, enhancing high-grade gliomas. Ages ranged from 28 to 80 years, with a median age of 54 years (Table 2). There were 54 men and 50 women. Of these 104 patients, nine had anaplastic astrocytomas (AAs), seven had anaplastic oligodendrogliomas (AOs), four had mixed oligoastrocytomas, and 84 had glioblastoma multiforme (GBM). The median age was 55.5 years for GBM, 39 years for AA, 48 years for AO, and 39 years for mixed oligoastrocytomas. A total of 104 baseline scans and 284 follow-up scans were used for this analysis. The median number of scans per patient was three (two scans, 29%; three scans, 22%; four scans, 17%; five scans, 13%; and six or more scans, 18%). For 74 of the 104 patients, the tumors being evaluated were recurrent, whereas for the other 30 patients, the tumors were either up-front or postradiation treatment. Mean follow-up from initial scan was 7.4 months (median, 6.2 months), and 63% of the patients were alive at the time of analysis. Median tumor measurements were as follows: 1D, 5.06 cm; 2D, 15.61 cm2; 3D, 56.6 cm3; total volumes, 21.6 cm3; and enhancing volumes, 16.0 cm3.
To determine variability in measurements, intraobserver and interobserver correlations were calculated for a randomly selected sample of 50 MRI scans taken from the full set of 388 (Table 3). Intraobserver Pearson correlation coefficient rho (ρ) values using the raw measurements were all greater than 0.92 with narrow 95% CIs. Interobserver ρ values were all greater than 0.71 with wider 95% CIs.
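The Pearson coefficient reported here can be sketched as follows for two sets of repeated measurements of the same scans. This is a generic textbook formula rather than the SAS procedure used in the study, and the function name `pearson` is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

Here `x` and `y` would hold, for example, the first and second readings by one observer (intraobserver) or the readings by two observers (interobserver) of the same 50 scans. Note that a high ρ indicates a linear relationship, not necessarily identical values.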
Pearson correlation coefficients of percent changes were calculated for all pairwise combinations of methods (1D vs. 2D, 1D vs. 3D, etc.) to determine the concordance between each pair of modalities. High concordance indicates close agreement in the percent change measured for each scan by the two modalities in a pair. The 1D, 2D, TV, and EV measurements all correlated extremely well with one another, with ρ values greater than 0.91 and with narrow CIs (Table 4). The 3D measurements correlated poorly with the other methods.
We determined the degree of fit among all the methods in categorizing treatment response (Tables 5 and 6). All of the follow-up scans were classified into each response category according to the percentage change from baseline or best overall response as defined by the various measurement methods (Table 5). The CR rate was 1% in all methods, with more variance for the other responses due to differing criteria for response: PR, 5% to 17%; SD, 36% to 46%; and PD, 37% to 47%. The linear methods tended to be in closer agreement than the volume methods, with 1D and 2D being in closest agreement, and EV^ and TV being in least agreement. There was excellent scanwise agreement among all methods for detecting CRs (100%), PDs (93%), and SDs (88%) and less agreement for PRs (65%) when using 1D as the reference. EV^ and 3D^ methods, in which response was defined as a 50% reduction (rather than >65% reduction) in tumor volume, were associated with a higher overall response rate as compared to the other methods (13%–18% vs. 7%–12%, respectively). We also categorized each patient’s best overall response as measured by each modality and calculated frequency of agreement between each modality with regard to response categories (Table 6). This shows that patient best overall response concordance rate is less than scan concordance rate (Table 5), with linear and volumetric methods agreeing better internally (e.g., 1D vs. 2D) than with each other (e.g., 1D vs. EV).
At the time of this analysis, 65 (63%) patients were still alive. We compared the mPFS and 6mPFS as measured by these methods as well as by clinical status (Table 7). The mPFS and 6mPFS for each of the modalities were comparable, with mPFS ranging from 5.3 to 5.9 months (Fig. 1) and 6mPFS ranging from 42% to 48%. The volume methods, in general, had a higher PFS than did the linear methods. A subset analysis of the 74 patients with recurrent tumors reveals a shorter mPFS (range, 4.3–5.5 months) and lower 6mPFS (30%–40%) compared to mPFS and 6mPFS for all patients; however, again the mPFS and 6mPFS were comparable across modalities. A subset analysis of GBMs showed an mPFS of 5.5 to 6.1 months, which was little different from that of the total population (5.3–5.9 months) because of the predominance of GBMs (84 of the 104 cases) in this study and low numbers of other tumor types. Median PFS based on clinical criteria was similar across the tumor subtypes (log-rank test P = 0.7) as was median overall survival (log-rank test P = 0.6). The lack of difference in overall survival among the tumor subtypes was attributed to the fact that the majority of the anaplastic gliomas were multiply recurrent. A subset analysis of the 75 patients (72% total) with only one measurable lesion revealed no significant difference from the overall patient population in mPFS (5.2–5.7 months) or in 6mPFS (42%–48%).
To correlate overall survival with PFS as defined by each of the modalities, we used landmark analysis. There was a significant correlation (P < 0.05, hazard ratios < 0.3) between nonprogression at two months (n = 99 patients alive at 2 months) and overall survival by the linear measurement methods (1D, 2D, 3D, 2D^, and 3D^) and one volumetric method (EV), but not by the other two volumetric methods (TV and EV^) (P > 0.05) (Table 8). However, there was no significant correlation between nonprogression at six months (53 patients alive at six months) and overall survival, with the exception of that observed by the 3D^ method. Although 65 patients were still alive at time of this analysis, only 53 patients reached the six-month follow-up to be included in the six-month PFS landmark analysis. There were only 17 subsequent deaths in this group. Among these patients, the maximum follow-up time was 26.8 months from entry to study, and the estimated median follow-up time (after adjusting for censoring by deaths) was 11.1 months from study entry. Clinical progression was better than all of the imaging modalities at both two months and six months in predicting overall survival. Landmark analysis of only the patients with recurrent disease (71 patients for two-month and 34 patients for six-month analyses) showed a similar trend but with lower significance levels.
A particularly problematic issue in measuring gliomas is that they grow in three dimensions and infiltrate extensively and asymmetrically in several directions. Furthermore, tumor size may be confounded by the presence of cystic areas that may or may not contain actual tumor. Consequently, we found that it was sometimes difficult to determine exactly where to begin and end a measurement, especially when a cystic cavity was present or when a gradient of enhancement was difficult to differentiate from normal brain tissue. However, this ambiguity is an inherent and accepted limitation of current radiographic criteria for measurement of brain tumors that is most effectively addressed by employing the same rigorous methodology described above in each measurement. Because we dealt with patients who had measurable enhancing tumor, this factor was less of a concern than dealing with patients in routine neuro-oncologic practice, where patients may be more likely to have non-enhancing or nonmeasurable lesions.
Accurate assessment of tumor response is also difficult in clinical trials of brain tumors because different methods are used among clinical trial consortia. Thus, there is not a clear consensus that enables comparison among different trials. In this study, we show that linear measurements (except 3D) of enhancing lesions correlated well with volumetric measurements and in determining mPFS, but that response assessment by linear methods differed from that by volume methods. However, response categorization at two months with linear methods was superior at predicting overall survival.
One concern of using objective, quantitative radiologic criteria to determine changes in tumor size is that of reproducibility across observers (interobserver correlation) and across different sessions for the same observer (intraobserver correlation) (Chisholm et al., 1989; Erasmus et al., 2003; James et al., 1999; Lavin and Flowerdew, 1980; Sorensen et al., 2001; Vos et al., 2003). We demonstrate that the same observer is likely to interpret a lesion quite closely over the course of two sessions. Our two observers interpreted the same lesions in close but less conforming ways. This is an unavoidable limitation of all the methods. The interobserver variations were least for the linear methods and higher for the volumetric methods (Table 3).
As shown in Table 5, there was good concordance among the linear methods in classifying response. CRs were seen by all radiologic criteria in three total follow-up scans representing three patients. The 1D method has the potential to underestimate progression as compared to the other methods. The reason for this is that progression is defined in the 1D method as a 20% increase in the diameter of the lesion. A 20% increase in the diameter of a sphere is equivalent to a 44% increase in area and a 73% increase in volume (Gehan and Tefft, 2000; James et al., 1999; Warren et al., 2001). However, the 2D criteria use a 25% increase in area, whereas 3D and volumetric criteria use either a 65% or 50% increase in volume to define progression.
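The sphere arithmetic above follows from area scaling with the square, and volume with the cube, of the diameter. The short sketch below reproduces it and also inverts it, under the idealized assumption of a uniformly growing spherical lesion; the function names are hypothetical.

```python
def sphere_equivalents(diameter_increase):
    """Fractional increase in cross-sectional area and in volume implied
    by a fractional increase in the diameter of a sphere."""
    scale = 1.0 + diameter_increase
    return scale ** 2 - 1.0, scale ** 3 - 1.0


def diameter_equivalent(increase, dimensions):
    """Fractional diameter increase implied by a fractional increase in
    area (dimensions=2) or volume (dimensions=3) of a sphere."""
    return (1.0 + increase) ** (1.0 / dimensions) - 1.0


area, volume = sphere_equivalents(0.20)
print(f"20% diameter -> {area:.0%} area, {volume:.0%} volume")
print(f"25% area     -> {diameter_equivalent(0.25, 2):.1%} diameter")
print(f"50% volume   -> {diameter_equivalent(0.50, 3):.1%} diameter")
```

Read in the other direction, a 25% increase in area corresponds to roughly a 12% increase in diameter and a 50% increase in volume to roughly a 14% increase, both smaller than the 20% diameter increase the 1D criteria require, which is why the 1D method can flag progression later than the others.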
In this study, the number of scans determined by the 1D, 2D, and 3D linear methods to show stable disease was higher than that by volumetric methods EV and TV. Not unexpectedly, the EV method using a 50% decrease in tumor volume as a measure of response (EV^) produced a higher number of partial responders, although the mPFS and 6mPFS results were not significantly different from those determined by the other methods (Table 7). In general, volumetric methods gave higher PRs and lower PDs, or, conversely, linear methods gave higher PDs and lower PRs. The selection of a particular method involves balancing a potential discarding of active treatments with acceptance of slightly less effective treatments. However, since there is good correlation between all methods in determining mPFS and 6mPFS, which are increasingly the end points for clinical trials in malignant gliomas, these small differences in response rates are probably not significant.
By landmark analysis, the tumor response at two months assessed by the linear methods correlated better with overall survival than that by volume methods (Table 8). However, tumor response assessed by both linear and volumetric methods at six months did not correlate with overall survival, in part reflecting the short duration of response of current therapies and in part because of our limited data set, since of the 53 patients included in the analysis only 17 had subsequently died at the time of analysis. Thus, the analysis may not be adequately powered. It is important to emphasize that a primary end point of phase 2 trials conducted in the North American Brain Tumor Consortium is 6mPFS; this study raises the important question of whether this end point is useful. Recurrent malignant gliomas tend to progress and the patients die relatively quickly, which provides a partial explanation for why a six-month assessment would not be helpful at predicting overall survival, since the disease of most patients will have progressed by six months. Furthermore, since the number of patients alive at that time is small, it may not have been adequately powered to detect a difference. Certainly, a larger prospective study will help to resolve this issue.
For predicting overall survival, the strongest criterion at two months, and the only criterion with significant correlation at six months, was clinical progression. Thus, it may be that despite continuing attempts to define and refine radiographic criteria in an attempt to help guide therapy, the best method to do so might still be the clinician’s professional assessment based on history and examination.
An important concern when generalizing these results to the total brain tumor population is that the patients we studied were mostly recurrent glioma cases with enhancing tumors. Extrapolation to newly diagnosed tumors or nonenhancing tumor may or may not be appropriate. Galanis et al. (2003) have attempted to address this issue in a study of 614 MRI scans in 73 patients with both high-grade and low-grade, newly diagnosed gliomas. They found that 1D measurements (RECIST criteria) or 2D measurements in gadolinium-enhanced images were both associated with survival improvement, and there was no evidence that either method was superior. However, 1D and 2D T2-based tumor measurements correlated less well with survival. For nonenhancing tumors, volumetric measurement may potentially be more useful. A second issue is that none of our patients had undergone stereotactic radiosurgery or placement of carmustine wafers. For patients who receive these treatments, radiation necrosis and tumor progression might easily be confused (Guerin et al., 2004; Kleinberg et al., 2004; Shinoda et al., 2002; St. Clair and Given, 2003). As a result, the various measurements of enhancement alone will not capture the true tumor burden and may not be of prognostic significance. It is possible that a small minority of patients in this study who received only standard radiotherapy may have developed radiation necrosis. Whenever radiation necrosis is suspected, additional imaging with positron emission tomography, magnetic resonance spectroscopy, single photon emission computerized tomography, or surgical biopsy is performed. Nonetheless, we cannot exclude the possibility that radiation necrosis was mistaken for tumor progression in a small minority of patients in this study.
Most of the patients in this study (about 80%) received steroids at some point during their treatment. Steroids are known to affect the enhancement and edema of the tumor (Ostergaard et al., 1999) and must be taken into account when assessing response. To control for this issue, all patients had to be on a stable dose of steroids for at least five days prior to the imaging studies. The exceptions were a small number of patients with significant clinical deterioration who had to be treated immediately with high doses of steroids. Despite the increased steroid doses, these patients all had PD on their imaging studies.
It is well known that gadolinium enhancement can be affected by a variety of factors, including the rate and amount of gadolinium injected and the timing of image acquisition. In our study, patients received standard weight-based doses of gadolinium, and all scans were completed within 30 min of injection, which should have reduced the variability due to technique. We do not believe that these factors contributed significantly to measurement error across scans for each patient.
To summarize, our results show that 1D measurements were comparable to higher dimensional modalities in assessment of tumor growth and in correlation with disease progression and survival. We propose simplifying measurements in brain tumor trials so that there is more comparability across consortia. Thus, as prior studies have done for solid tumors (James et al., 1999; Park et al., 2003), this article validates the RECIST criteria as objective measures of tumor response in glioma patients. As treatment responses improve, one of these measurement methods may turn out to be superior in assessing response. Computerized measurement of enhancing tumor volume potentially provides the most accurate measure of tumor burden and merits further study, as this may ultimately be superior in assessing true response. Other methods under study to assess tumor response are likely to be of utility in the near future, including functional MRI, methionine-PET scans, single photon emission computerized tomography, magnetic resonance spectroscopy, and biomarkers (Fountas and Karampelas, 2004; Mazzara et al., 2004).
Given that response assessment differed among the methods, standard criteria should be established for all brain tumor trials to enable comparability across treatments in different consortia. We have shown that the RECIST criteria are useful in determining objective response of enhancing brain tumors to therapy; these criteria can easily be used in all centers. These findings are likely to be of importance in the design of future clinical trials of brain tumors.
We gratefully acknowledge the support of the Martin and Pauline Elkin, Neil Harrington, and Cynthia Martin Brain Tumor Clinical Research Funds.
Abbreviations used are as follows: 1D, one-dimensional; 2D, two-dimensional; 3D, three-dimensional; 6mPFS, six-month progression-free survival; AA, anaplastic astrocytoma; AO, anaplastic oligodendroglioma; CR, complete response; EV, enhancing volume; GBM, glioblastoma multiforme; mPFS, median progression-free survival; MR, minor response; PD, progressive disease; PFS, progression-free survival; PR, partial response; RECIST, Response Evaluation Criteria in Solid Tumors; SD, stable disease; TV, total tumor volume.