To evaluate the prognostic value of metabolic tumor volume measured on 18F-fluorodeoxyglucose positron emission tomography (FDG-PET) imaging and other clinical factors in patients treated for locally advanced head and neck cancer at a single institution.
From March 2003 to August 2007, 85 patients received PET/CT-guided chemoradiotherapy for HNC. Metabolically active tumor regions were delineated on pretreatment PET scans semi-automatically using custom software. We evaluated the relationship of FDG-PET maximum standardized uptake value (SUV) and total metabolic tumor volume (MTV) with disease-free survival (DFS) and overall survival (OS).
Mean follow-up for surviving patients was 20.4 months. The estimated 2-year locoregional control, DFS, and OS for the group were 88.0%, 69.5% and 78.4%, respectively. The median time to first failure was 9.8 months among the 16 patients with relapse. An increase in MTV of 17.4 mL (difference between the 75th and 25th percentiles) was significantly associated with an increased hazard of first event (recurrence or death) (1.9-fold, p<0.001), even after controlling for Karnofsky performance status (KPS) (1.8-fold, p=0.001), and of death (2.1-fold, p<0.001). We did not find a significant relationship of maximum SUV, stage, or other clinical factors with DFS or OS.
Metabolic tumor volume is an adverse prognostic factor for disease recurrence and death in HNC. MTV retained significance after controlling for KPS, the only other significant adverse prognostic factor found in this cohort. MTV is a direct measure of tumor burden and is a potentially valuable tool for risk stratification and guiding treatment in future studies.
Positron emission tomography (PET) imaging using the tracer 18F-fluorodeoxyglucose (FDG) combines metabolic tumor function with anatomic localization when integrated with computed tomography (CT) imaging. PET/CT imaging has become an increasingly important component of staging, tumor localization for radiation therapy (RT) treatment planning, and assessment of treatment response in many malignancies, including head and neck cancer (HNC) (1,2). FDG-PET in HNC provides a better assessment of the local extent of the tumor as well as of regional and distant metastatic spread. Many previous studies have established the high sensitivity of PET validated against histologic tissue sampling in various HNC sites. The sensitivity ranges from 82% to 90% and is generally equivalent or superior to that of standard CT or magnetic resonance imaging (MRI) in identifying malignancy, particularly for detecting regional nodal involvement (3,4). In several recent studies, the degree of tumor uptake of FDG on PET as assessed by the standardized uptake value (SUV) has been shown to be an independent prognostic factor in various HNC subsites (5-9). Differential tumor uptake of FDG compared to normal tissues correlates with biological factors such as cell viability and proliferative activity (10-12). As a result, FDG-PET has emerged as a potentially valuable tool in the functional and biological evaluation of head and neck tumors.
FDG-PET allows us to systematically measure tumor burden, which may more directly predict locoregional control and survival as compared to previously identified clinical and pathologic prognostic factors including stage, anatomic subsite, and Karnofsky performance status (KPS) (13,14). Of these, stage in particular is intended to represent the tumor burden, which has been difficult to quantify in terms of tumor volume except by labor intensive manual methods using conventional imaging. The high tumor to background intensity ratio in FDG-PET facilitates rapid computer assisted measurement of total body metabolic tumor volume (MTV), and we have developed tools to do so. Thus, the objectives of this study are (1) to test the hypothesis that tumor burden as characterized by MTV is a prognostic factor that can predict for failure and death in HNC and (2) to evaluate clinical outcomes including overall survival, disease-free survival, and locoregional control in this cohort who received FDG-PET/CT-guided radiation therapy.
We conducted a retrospective review of the medical records of all patients with HNC who underwent FDG-PET/CT imaging either for staging or radiation therapy planning and were treated at Stanford Hospital and Clinics. This study was conducted under the review and approval of the Stanford institutional review board. Between February 2003 and August 2007, 152 patients with HNC underwent PET/CT imaging prior to radiation therapy. Of these, 85 were treated with definitive chemoradiotherapy with curative intent and formed the cohort of this study. Exclusion criteria were: non-carcinoma histology, previous radiation therapy, previous chemotherapy, previous definitive surgery, palliative intent, and evidence of distant metastatic disease at diagnosis. Patients with salivary gland, paranasal sinus, thyroid, and skin primary sites were also excluded.
All patients were positioned supine with the arms by the sides and shoulders displaced caudally. A custom molded foam cushion (AccuForm, Medtec, Orange City, IA) was used to support the head and neck, a thermoplastic mask (Aquaplast, WFR/Aquaplast Corp., Wyckoff, NJ) was molded to the face, and posts indexed to the treatment couch were used as hand holds to reproduce the shoulder position. All scans were performed on a GE Discovery LS PET/CT scanner (GE Medical Systems, Milwaukee, WI). Each patient fasted for at least 8 hours before imaging. After ensuring that blood glucose levels were <180 mg/dL, patients were injected with 10 to 18 mCi of FDG. Patients then underwent PET/CT imaging after a tracer uptake time of 45 to 60 minutes. Frontal and lateral x-ray projection images were acquired as localizers to select the field of view, and CT data were collected in helical acquisition mode. PET data covering the same field of view were acquired in two-dimensional (2D) mode, for 3 to 5 min of acquisition time per bed position. The PET data were then reconstructed with an ordered set expectation maximization (OSEM) algorithm, using the CT images for attenuation correction. In patients who did not have previous staging PET scans, the axial field of view included the top of the head to mid-thighs, spanning 6-7 bed positions, for the purpose of whole body staging. Patients who had previous staging PET scans documenting lack of distant metastases had limited field of view scans including the top of the head to the mid to lower thorax, spanning 2-4 bed positions, for the purpose of aiding radiation treatment planning. At the conclusion of the examination, all reconstructed image data were transferred to a radiation treatment planning workstation, and also to a research workstation for tumor volume analysis. The complete PET/CT examination requires approximately 90 min, including patient setup, tracer uptake, and CT and PET image acquisition.
Computer-aided metabolic tumor volume and SUV measurements were performed using RT_Image, a software application developed at our institution using the Interactive Data Language (IDL; ITT Visual Information Solutions, Boulder, CO) to analyze functional imaging data for radiation therapy applications (15). The FDG-PET data were read into the program in DICOM format and intensity values were automatically converted to SUVs. The images were displayed as maximum intensity projections (MIP) and hypermetabolic lesions were identified by radiation oncologists experienced in PET/CT-based treatment planning (TL and EF). Diagnostic nuclear medicine reports and final radiation treatment planning volumes were used as references when identifying the lesions. The users then selected each hypermetabolic lesion interactively by clicking on its projection using a graphical user interface. This is the only step in the segmentation process requiring user interaction.
Each tumor thus identified by the user was then segmented automatically in three dimensions by the software using the following procedure. First, the voxel of maximum intensity along the selected projection line is used as the starting point for a region growing procedure. The algorithm then finds the voxel of local maximum intensity within a specified radius (default value of 1 cm) of the starting voxel. The region growing algorithm then defines the segmented volume as all voxels connected to the local maximum intensity voxel that have an intensity greater than a specified fraction of the maximum intensity. The threshold intensity value used in this study was 50% of the local maximum intensity, which has been identified as a reasonable choice in phantom studies (16).
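The segmentation procedure described above can be illustrated with a minimal sketch. This is not the RT_Image implementation, which was written in IDL. It is a plain-Python approximation under stated assumptions: the volume is a nested list indexed as volume[z][y][x], the search for the local maximum uses a cube of `radius_vox` voxels rather than a 1 cm sphere, and connectivity is taken to be 6-neighbor (the connectivity rule is not specified in the text). The function names are hypothetical.

```python
from collections import deque
from itertools import product

# Face-adjacent neighbors (6-connectivity); an assumption, not stated in the text.
NEIGHBORS_6 = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def _shape(volume):
    return len(volume), len(volume[0]), len(volume[0][0])


def _inside(idx, shape):
    return all(0 <= i < n for i, n in zip(idx, shape))


def local_maximum(volume, seed, radius_vox):
    """Return the index of the brightest voxel in a cube centered on the seed."""
    shape = _shape(volume)
    best = seed
    for offset in product(range(-radius_vox, radius_vox + 1), repeat=3):
        idx = tuple(s + o for s, o in zip(seed, offset))
        if _inside(idx, shape):
            if volume[idx[0]][idx[1]][idx[2]] > volume[best[0]][best[1]][best[2]]:
                best = idx
    return best


def grow_region(volume, seed, fraction=0.5, radius_vox=2):
    """Segment all voxels connected to the local maximum whose intensity
    exceeds `fraction` times the local maximum intensity."""
    shape = _shape(volume)
    peak = local_maximum(volume, seed, radius_vox)
    threshold = fraction * volume[peak[0]][peak[1]][peak[2]]
    region = {peak}
    queue = deque([peak])
    while queue:
        z, y, x = queue.popleft()
        for dz, dy, dx in NEIGHBORS_6:
            idx = (z + dz, y + dy, x + dx)
            if idx in region or not _inside(idx, shape):
                continue
            if volume[idx[0]][idx[1]][idx[2]] > threshold:
                region.add(idx)
                queue.append(idx)
    return peak, region
```

Because only connected voxels are added, a second bright focus that is separated from the seed lesion by sub-threshold voxels is not included and would need its own seed, which matches the per-lesion selection step described earlier.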
Once all of the hypermetabolic tumor foci are segmented, the software calculates the metabolic tumor volume (MTV), defined as the total volume of all tumors in the body in milliliters (mL), as well as the maximum and average SUV within the MTV. The integrated SUV, defined as the product of the average SUV and MTV, is also calculated automatically. Figure 1 shows cross sectional FDG-PET images with overlays of segmented MTV for two cases of oropharyngeal cancer with a small MTV (Figure 1a) and a large MTV (Figure 1b).
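As an illustration of how these summary quantities relate to one another, the following sketch computes them from lists of segmented voxel SUVs. The function name, the input layout (one list of SUVs per lesion), and the voxel volume argument are assumptions made for the example and are not taken from RT_Image.

```python
def tumor_metrics(lesion_suvs, voxel_volume_ml):
    """Summarize segmented lesions.

    lesion_suvs: list of lists, one inner list of voxel SUVs per lesion.
    voxel_volume_ml: volume of a single voxel in milliliters.
    """
    voxels = [suv for lesion in lesion_suvs for suv in lesion]
    if not voxels:
        raise ValueError("no segmented voxels")
    mtv_ml = len(voxels) * voxel_volume_ml      # total volume of all lesions
    max_suv = max(voxels)                       # maximum SUV within the MTV
    mean_suv = sum(voxels) / len(voxels)        # average SUV within the MTV
    return {
        "mtv_ml": mtv_ml,
        "max_suv": max_suv,
        "mean_suv": mean_suv,
        "integrated_suv": mean_suv * mtv_ml,    # units: mL*SUV
    }
```

Note that the integrated SUV is, by construction, proportional to MTV for a fixed average SUV, so the two measures are expected to be strongly related.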
Statistical analysis was performed using the free software environment R (version 2.5.1) with the “survival” package (17). Survival curves were estimated using the Kaplan-Meier method. Time to event was calculated as the time interval from the date of diagnosis to the date of death or of the first finding on clinical or imaging exam that suggested local, regional, or distant disease recurrence and led to additional confirmatory testing (e.g., biopsy or additional imaging) or change in clinical management. When indicated, the statistical analysis was also performed from date of initiation of RT. The Cox proportional hazards (CPH) model was used to evaluate prognostic variables in our study for univariate and multivariate prediction of disease free survival (DFS, with event defined as relapse at any site or death) and overall survival (OS, with event defined as any death); tests were based on the likelihood-ratio (LR) statistic. Prognostic factors analyzed included PET MTV, integrated SUV, maximum SUV, stage, sex, Karnofsky performance status (KPS), radiotherapy dose, and chemotherapy type. We analyzed KPS, MTV, integrated SUV, and maximum SUV as continuous variables, whereas we analyzed stage, sex, radiotherapy dose, and chemotherapy type as categorical variables in the CPH model. There were four categories for RT dose and seven categories of chemotherapy type as detailed in Table 2. The proportional hazards assumption was tested with the “cox.zph” method and was not rejected (18).
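For readers unfamiliar with the Kaplan-Meier method, a minimal product-limit estimator is sketched below. The analysis in this study was performed in R with the "survival" package; the Python function here is only an illustration with a hypothetical name, and it assumes the usual convention that events at a given time are counted before censorings at that same time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times:  follow-up duration for each patient.
    events: 1 if the event (e.g. relapse or death) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve
```

Censored patients lower the number at risk for later time points without producing a step in the curve, which is why the estimate differs from a simple proportion of patients alive.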
Total gross tumor volume (GTV) was outlined by the treating physicians (QTL and BL) on each patient’s treatment planning CT scan. PET and MRI imaging were fused with the treatment planning CT scan and helped guide GTV delineation when available. A quantitative measure of the total GTV was determined for each patient and was correlated to the MTV. The MTV-GTV correlation was analyzed using the Pearson correlation coefficient and a two-sided t-test was used to determine the significance of the correlation.
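The correlation test can be sketched as follows. The function names are hypothetical, and the worked value at the end assumes that all 78 patients with an available GTV contributed a pair of measurements, which the text does not state explicitly.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for the null hypothesis of zero correlation."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Assumed inputs: r = 0.73 as reported, n = 78 paired measurements (assumption).
t_stat = correlation_t_statistic(0.73, 78)
```

Under these assumptions the t statistic is a little above 9 on 76 degrees of freedom, which is consistent with the reported p<0.001.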
Clinical characteristics including gender, age, primary site, histology, American Joint Committee on Cancer (AJCC) stage (TNM), and KPS of the 85 patients are summarized in Table 1. The majority had cancer of the oropharynx (53%) and almost all had locally advanced HNC, as illustrated by the distribution of stages (72% stage IVA-B and 98% stage III-IVB). The treatment characteristics (radiation dose, radiation technique, and chemotherapy type) are summarized in Table 2. All patients underwent definitive chemoradiotherapy and the majority (89%) received a platinum-based chemotherapy regimen. All patients were treated with 3-dimensional conformal radiation therapy or intensity-modulated radiation therapy. Radiation therapy doses ranged from 50-70 Gy (although only one patient received less than 66 Gy) and the majority of patients with nasopharyngeal carcinoma received a stereotactic radiosurgery boost (7-10 Gy) to the primary tumor site following fractionated RT treatment. To determine clinical outcomes, medical records were also reviewed to determine the date of pathologic diagnosis, dates of treatment, date of pre-treatment PET/CT scan, date of locoregional failure, date of distant failure, and date of last follow-up or death. Median time from date of diagnosis to date of PET/CT scan was 22 days and 94% of patients underwent their PET/CT scan within 12 weeks of diagnosis. Mean follow-up from time of diagnosis for surviving patients was 20.4 months (range, 5.6-58.8 months).
Among 85 patients, there were 7 locoregional failures (LRF), 13 distant failures (DF) of which 4 were in patients who also had LRF, and 4 deaths from intercurrent illness. Upon review of the imaging and treatment plans in the patients with LRF, all six local failures occurred within the original MTV and consequently, within the high-dose RT region. The one regional failure occurred in a patient with stage IVA (T3N2b) SCC of the pyriform sinus. The regional failure occurred in the thyroid gland, which was outside the original MTV but within the low-dose RT region (50 Gy), and in the cavernous sinus, also outside the original MTV. The four deaths from intercurrent illness were in: a 39-year old woman with a nasopharyngeal cancer who died of disseminated intravascular coagulation unrelated to either cancer or treatment, a 57-year old man with oropharyngeal cancer who died of presumed aspiration pneumonia 3 months after completion of treatment, a 76-year old woman with hypopharyngeal cancer who died from post-operative complications related to a later-diagnosed pancreatic cancer, and an 87-year old man with hypopharyngeal cancer who died of atherosclerotic heart disease. All four patients had no evidence of disease at last follow-up. Among the 16 patients with LRF or DF, median time from end of radiotherapy to first failure was 9.8 months.
The estimated (Kaplan-Meier) 2-year locoregional control, DFS, and OS for the entire cohort were 88.0%, 69.5%, and 78.4%, respectively. The estimated (Kaplan-Meier) median OS for the cohort was 48.2 months. Figure 2 shows the disease-free survival (DFS) and overall survival (OS) curves of the entire cohort, with 95% confidence bands. The median MTV and integrated SUV for the cohort were 11.2 mL (range, 0.8-88.9 mL) and 100.6 mL*SUV (range, 4.1-1022 mL*SUV), respectively. The median value for maximum SUV was 14.8 (range, 3.6-57.0).
On univariate analysis, MTV had a significant relationship with DFS (LR = 13.6, p < 0.001) and with OS (LR = 11.7, p < 0.001). When calculated from the time of initiation of RT, the relationship between MTV and DFS (LR= 14.5, p<0.001) and OS (LR=11.9, p<0.001) remained significant. An increase in MTV of 17.4 ml (the difference between the 75th and 25th percentiles) was associated with a 1.9-fold increase (95% CI 1.4-2.6) in hazard of first event and 2.1-fold increase (95% CI 1.4-3.1) in hazard of death. Conversely, maximum SUV did not have a significant relationship with DFS (LR = 1.6, p = 0.20) or OS (LR = 1.5, p = 0.22). Figure 3 shows DFS by tertiles of MTV. Figure 4 shows OS by tertiles of MTV.
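Because MTV was entered in the Cox model as a continuous variable, the reported hazard ratio per 17.4 mL can be converted to other increments, as in the sketch below. The helper names are hypothetical, and the reported hazard ratios are rounded, so any derived figure is approximate. The conversion also assumes the log-linear relationship between MTV and hazard that the model itself imposes.

```python
import math


def log_hazard_per_unit(hazard_ratio, increment):
    """Cox coefficient per unit of the covariate, given the HR for an increment."""
    return math.log(hazard_ratio) / increment


def hazard_ratio_for(beta, delta):
    """Hazard ratio implied by a covariate change of `delta` units."""
    return math.exp(beta * delta)


# Reported: 1.9-fold hazard of first event per 17.4 mL increase in MTV.
beta_dfs = log_hazard_per_unit(1.9, 17.4)      # per mL, approximate
hr_per_10_ml = hazard_ratio_for(beta_dfs, 10)  # illustrative 10 mL increment
```

With the rounded inputs above, a 10 mL increase in MTV corresponds to a hazard ratio of roughly 1.45 for first event. The interquartile increment used in the text has the advantage of describing a contrast that is typical for this cohort.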
Initially, our primary hypothesis was that integrated SUV was related to DFS (LR = 17.6, p < 0.001). However, integrated SUV is the product of MTV and average SUV. Once it was determined that MTV was also significant while maximum SUV was not, we hypothesized that MTV accounted for the significance of integrated SUV. We confirmed that neither maximum nor average SUV, which were strongly correlated with each other, had a significant effect on outcome. Therefore, MTV was the driving factor behind the significance of integrated SUV.
On univariate analysis, lower KPS was predictive of shorter DFS (p < 0.001) and OS (p = 0.001). Stage, sex, radiotherapy dose, and chemotherapy type were not significantly related to DFS or OS. As a result, MTV was also tested against DFS controlling for KPS and its effect remained significant on multivariate analysis (LR = 10.8, p = 0.001).
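The likelihood-ratio statistics reported for the Cox models can be related to their p-values with a short calculation. The sketch below assumes that each test has one degree of freedom (a single continuous covariate added to the model) and relies on the usual asymptotic chi-square approximation; neither is stated explicitly in the text, and the function name is hypothetical.

```python
import math


def lr_p_value_1df(lr_stat):
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom.

    For 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(lr_stat / 2.0))


p_mtv_adjusted = lr_p_value_1df(10.8)  # MTV for DFS, controlling for KPS
p_max_suv_dfs = lr_p_value_1df(1.6)    # maximum SUV for DFS (univariate)
```

Under these assumptions, LR = 10.8 gives a p-value of about 0.001 and LR = 1.6 gives about 0.21, in line with the reported values of 0.001 and 0.20.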
We note that the number of patients with an event (recurrence or death) is small relative to the number of univariate models analyzed for DFS and OS. Our purpose for determining the significance of these factors was to ensure that the prognostic effect of MTV is not accounted for by a confounding variable. As a result, we controlled for KPS as a potentially significant factor.
GTV measurements from treatment planning CT scans were available in 92% (78/85) of the patients with a range of 17.4 to 441.9 mL. The GTV was consistently larger than the MTV because it encompassed the entirety of the visualized disease whereas we assigned a 50% threshold of maximum SUV for the semi-automated delineation of the MTV. GTV included both the primary tumor and involved lymph nodes and these values were highly correlated with MTV. The correlation coefficient was 0.73 (p<0.001).
Treatment outcomes in HNC remain heterogeneous; therefore, substantial research efforts have focused on the identification of novel biological parameters to further stratify risk groups with the goal of developing individualized treatment strategies for these patients. PET/CT is an increasingly popular imaging modality that incorporates both anatomical localization and functional information and has the potential to be a valuable tool in risk stratification of HNC patients. Previously reported studies discuss the feasibility of PET/CT imaging as part of staging, radiation therapy planning, and follow-up of treatment response for HNC. PET has been shown to be superior to CT alone for identifying nodal disease in the neck (4). In addition, PET/CT imaging has been shown to identify the primary site of malignancy in 24-29% of patients with carcinoma of unknown primary in the head and neck after a complete conventional work-up (19, 20).
FDG-PET imaging has also been increasingly used for radiation therapy treatment planning due to the improved tumor localization (8, 16). In a study by Soto et al, the anatomic site of local failure following definitive RT was compared to the pretreatment FDG PET-biologic target volume (BTV) (21). There were 9 locoregional failures in the cohort of 61 patients (15%) and 8 of the 9 failures (89%) were within the PET-BTV. Similarly in our study, all 6 local failures occurred within the pretreatment MTV and only the 1 regional failure occurred outside the pretreatment MTV. This demonstrates the value of PET imaging in the identification of high-risk tumor volumes for RT planning and potential future use in dose-intensification protocols.
Several studies have evaluated SUV as a predictor of outcome, and two studies by Allal et al have shown high SUV to predict for worse local control (LC) and DFS, but our results did not confirm this (6, 7). This could be due to differences between the patient populations evaluated. Nearly all (98%) of our patients had stage III-IV tumors and all underwent definitive chemoradiation therapy, whereas the patients in the Allal et al study were more heterogeneous, with 20% early stage patients. In addition, the statistical methods differed. We analyzed maximum SUV as a continuous variable, not as a categorical variable as in the Allal et al study. Another single-institution study by Vernon et al also failed to confirm maximum SUV as a predictor of outcome (22).
On the other hand, our study suggests that tumor burden is a more important predictor of treatment outcome. Knegjens et al also found that in advanced HNC larger tumor volume, measured by MRI, correlates with inferior local control and overall survival at 5 years (23). In a multivariable analysis, tumor volume was the most significant independent factor for local control and overall survival whereas T-stage was not a significant factor.
Instead of measuring tumor volume based on MRI or CT imaging, we used FDG-PET imaging to determine MTV, which may be a more direct and reliable method of quantifying tumor burden because it incorporates functional criteria. The high tumor to background intensity ratio in FDG-PET facilitates computer-aided measurements. The custom software used in our study, RT_Image, allows rapid identification and automated segmentation of hypermetabolic tumor volumes with greater consistency and less interobserver variability than CT- or MR-based segmentation, which generally requires labor-intensive manual contouring on each axial image. The choice of the threshold intensity value used for PET segmentation can affect the absolute value of the volume measurements, but not the consistency of the measurements, as long as the same threshold is used throughout and lies within a reasonable range. At this time, it is unclear where the extremes of this meaningful range lie; this would be an interesting topic for a future study. In a previous feasibility study evaluating the use of PET/CT in radiation therapy planning, a threshold of 50% of maximum SUV was found to be a reliable correlate to CT tumor volume in a phantom study (16). This choice of threshold was subsequently used to evaluate MTV in lung cancer and lymphoma, and MTV correlated with outcome in both of these tumor types (24, 25). As a result, we hypothesized that the same 50% threshold would be reliable and that the resultant MTV would be an independent prognostic factor in head and neck cancer.
In order to further assess the reliability of our MTV measurements, we performed a correlation between MTV and CT-based GTV measurements in this cohort of patients. GTV and MTV were found to be highly correlated with a correlation coefficient of 0.73 (p<0.001). This suggests that MTV measurements are reliable, although MTV is consistently less than GTV in each patient. This is likely due to the 50% threshold established in our study for measuring MTV whereas GTV encompasses the entire grossly visible tumor, metabolically active or otherwise.
Of note, stage, which is the most important conventional measure of tumor burden, was not prognostic in our study population. This is likely because of the narrow range of stage (predominated by stage IV) in this group of patients with almost exclusively locally advanced HNC. In addition, there is little variability in the treatment with the vast majority of our patients receiving 66 Gy or higher (99%) and a platinum-based chemotherapy regimen (89%). As a result, it was not surprising that radiation dose and chemotherapy type were not prognostic. Overall, MTV was the most significant prognostic factor we analyzed, indicating the importance of a quantitative assessment of tumor burden.
Limitations of our study include the heterogeneity of primary tumor sites, non-uniform treatment regimens, and the retrospective design. Despite these limitations, we obtained highly significant results demonstrating a correlation between high MTV and disease recurrence and patient death. Similar results have been reported in lung cancer and lymphoma, where MTV was shown to be highly prognostic for disease progression and death, independent of other established prognostic factors (24, 25). We believe that previously described prognostic factors, namely stage, may simply be surrogates for the underlying and more prognostically significant metabolically active tumor burden as measured on FDG-PET. Prospective trials to confirm the reliability of MTV in predicting outcome are necessary.
Clinical outcomes seen in this cohort of patients are comparable to those of other modern single-institution studies, such as the Cleveland Clinic series, which reported a 5-year locoregional control of 86.7% and overall survival of 65.7% in locoregionally advanced SCC of the head and neck treated with concurrent chemoradiotherapy (26). Our results are slightly improved compared to historical controls from large randomized multi-institutional studies in which FDG-PET was not routinely utilized in treatment planning, with a 3-year locoregional control of 47% and overall survival of 37% reported by Adelstein et al, and a 55% 3-year overall survival reported by Bonner et al (27, 28).
In summary, we have found that MTV is an independent predictor of survival in our patient group. These results will need to be prospectively validated in an independent large cohort of patients with HNC treated uniformly. We are also evaluating the role of MTV on post-treatment FDG-PET/CT imaging in the same patient cohort to determine whether it can help to predict pathologic response as well as long term outcomes in these patients.
Supported by 1 R01 CA118582-03 (QTL & EEG) and PO1-CA67166 (QTL & EEG)
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Conflicts of Interest Notification
No actual or potential conflicts of interest exist.