To explore the relationship between pathologic tumor volume and volume estimated from different tumor segmentation techniques on 18F-fluorodeoxyglucose (FDG) positron emission tomography (PET) in oral cavity cancer.
Twenty-three patients with squamous cell carcinoma of the oral tongue had PET-CT scans before definitive surgery. Pathologic tumor volume was estimated from surgical specimens. Metabolic tumor volume (MTV) was defined from PET-CT scans as the volume of tumor above a given SUV threshold. Multiple SUV thresholds were explored including absolute SUV thresholds, relative SUV thresholds, and gradient-based techniques.
Multiple MTVs were associated with pathologic tumor volume; however, the correlation was poor (R2 range 0.29–0.58). The ideal SUV threshold, defined as the SUV that generates an MTV equal to pathologic tumor volume, was independently associated with maximum SUV (p=0.0005) and tumor grade (p=0.024). MTV defined as a function of maximum SUV and tumor grade improved the prediction of pathologic tumor volume (R2 = 0.63).
Common SUV thresholds fail to predict pathologic tumor volume in head and neck cancer. The optimal technique that allows for integration of PET-CT with radiation treatment planning remains to be defined. Future investigation should incorporate biomarkers such as tumor grade into definitions of MTV.
18F-fluorodeoxyglucose (FDG) positron emission tomography combined with computed tomography (PET-CT) has emerged as a prominent tool for staging [20,26] and has shown potential as an independent prognostic factor [1,2] in head and neck cancer. Research at our institution has shown that metabolic tumor volume (MTV) – defined on PET-CT as the volume of hypermetabolic tumor above a given SUV threshold – predicts disease progression and death in head and neck cancer [19,21].
In addition to its role in staging and as a prognostic factor, PET-CT has proven useful in radiation treatment planning [7,10,11,13,14,16,25,28]. Technical advances in radiation oncology allow for the delivery of radiation with continually increasing precision, which permits increased radiation dose to tumors and decreased dose to adjacent normal tissue. The ability to construct radiation treatment plans with steep dose gradients increases the therapeutic ratio of radiotherapy; however, this technology also increases the importance of accurate target delineation. Tools such as PET-CT, which can visualize the metabolic and anatomic components of cancer, have the capacity to improve the distinction between normal tissue and cancer.
Despite the promise of PET-CT in target definition, the optimal method of incorporating this imaging modality into radiation treatment planning remains unclear. The ideal tumor delineation technique with PET-CT would employ an SUV threshold or other segmentation method that distinguishes between pathologic tumor and normal tissue. Several SUV threshold techniques and different tumor segmentation methods have been reported; however, authors typically compare PET-CT volumes to phantoms [3,12] or other imaging modalities [10,17,22,23] that may not adequately reflect the pathologic tumor volume [7,11]. Few authors have compared PET-CT segmentation techniques to pathologic tumor specimens in head and neck cancer [7,11,13], and the small sample sizes in these studies preclude definitive conclusions. The purpose of this study was to explore different PET-CT tumor delineation techniques and compare these PET-CT volumes to pathologic tumor volumes in a group of patients with head and neck cancer, specifically oral tongue cancer. We chose to focus on oral tongue cancer in this study since these tumors are often removed as a single specimen, thus facilitating accurate three-dimensional pathologic tumor measurement. This study serves as a foundation to identify the optimal metabolic tumor volume for segmentation, which then can be validated against other head and neck tumor types in the future.
After Institutional Review Board approval, we reviewed the medical records of all patients with cancer of the oral tongue who underwent surgery or had a PET-CT scan at Stanford University between April 2003 and July 2010. Patients were included if they had histologically confirmed squamous cell cancer of the oral tongue and a PET-CT scan within 6 weeks of definitive surgery. Patients were excluded if they had recurrent disease, received chemotherapy or radiotherapy prior to surgery, or underwent surgery with palliative intent. Thirty patients met the above criteria; however, the resection specimen was not available in two patients, and five patients had very small T1 primary tumors that were not identifiable on PET-CT. The remaining 23 patients made up the cohort analyzed in this study. Patient and treatment characteristics are provided in Table 1.
All patients underwent PET-CT scans for staging purposes prior to surgery. Patients fasted for at least 6 hours, and plasma glucose was confirmed to be less than 200 mg/dL before injection with the prescribed dose of 15 mCi of FDG (range: 10 to 18 mCi of FDG). PET and CT images were acquired approximately 60 minutes after FDG administration. CT images were acquired first for attenuation correction and anatomical localization of FDG activity. Two-dimensional (2-D) PET imaging was obtained over 3–5 minutes of acquisition time per bed position. The 2-D PET data were reconstructed with an ordered subset expectation maximization (OSEM) algorithm, and reviewed on a dedicated workstation.
All patients received a partial- (13%) or hemi-glossectomy (87%) as surgical treatment for their primary oral tongue tumor. Specimens from the operating room were submitted to pathology either fresh (n = 10) or in 10% buffered formalin (n = 13). Those that were submitted fresh from the operating room had three-dimensional gross measurements taken from the tumor before being placed into formalin. Otherwise, three-dimensional measurements of the tumor were obtained after the tumor had been in formalin for up to 24 hours. To assess the degree of shrinkage from formalin fixation, tumor measurements from the fresh specimens were compared to measurements taken from formalin fixed histology slides. The tumor type (squamous cell carcinoma), grade, keratinization score, presence or absence of perineural invasion, margins, and lymph node status were also determined from the histology slides.
Metabolic tumor volume (MTVx) was defined as the volume of hypermetabolic tissue within the region of the gross tumor with an SUV greater than a defined threshold x (Figure 1). While MTV depends explicitly on an SUV threshold, the optimal SUV threshold remains to be defined. Therefore, this study evaluated multiple MTVs with different SUV thresholds, including absolute SUV thresholds, relative SUV thresholds expressed as a percentage of the maximum SUV (SUVmax), and a gradient-based technique. The absolute thresholds we explored ranged from an SUV of 2.0 to 6.0 in increments of 1.0 (MTV2.0 – MTV6.0). For example, MTV2.0 was the volume of tumor with an SUV greater than 2.0. The relative SUV thresholds ranged from 30% to 70% of the maximum SUV in increments of 10% (MTV30% – MTV70%). For example, MTV30% was the volume of tumor with an SUV greater than 30% of SUVmax. In prior studies [19,21], the MTV included the primary tumor and involved nodal groups; in this study, however, MTV included only the primary oral tongue tumor, since this was compared to the primary tumor pathology specimen. The MTV was trimmed to exclude normal tissues such as the mandible (not FDG avid) and the sublingual glands (which often have an increased baseline level of physiologic FDG uptake [5,29]). The absolute and relative MTVs were determined with the MIM® Software Suite along with the MIMfusion® and MIMcontouring® packages (MIMvista Corporation, Cleveland, OH).
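The absolute and relative threshold definitions above can be illustrated with a minimal sketch. It is not the MIM implementation used in the study: it assumes the SUV values inside an already-trimmed tumor region are available as a NumPy array, and the function names, the toy SUV values, and the voxel volume are hypothetical.

```python
import numpy as np

def metabolic_tumor_volume(suv, voxel_volume_cm3, threshold):
    """Volume (cm^3) of voxels in the region whose SUV exceeds `threshold`."""
    suv = np.asarray(suv, dtype=float)
    return float(np.count_nonzero(suv > threshold) * voxel_volume_cm3)

def mtv_absolute(suv, voxel_volume_cm3, cutoff):
    """MTV with an absolute SUV cutoff, e.g. cutoff=2.0 for MTV2.0."""
    return metabolic_tumor_volume(suv, voxel_volume_cm3, cutoff)

def mtv_relative(suv, voxel_volume_cm3, fraction):
    """MTV with a cutoff at a fraction of SUVmax, e.g. fraction=0.30 for MTV30%."""
    suv = np.asarray(suv, dtype=float)
    return metabolic_tumor_volume(suv, voxel_volume_cm3, fraction * suv.max())

# Hypothetical SUV values for eight voxels in a trimmed tumor region.
suv_roi = np.array([1.2, 2.5, 3.1, 4.8, 6.0, 9.9, 10.0, 0.8])
voxel_cm3 = 0.064  # hypothetical 4 mm isotropic voxel

for cutoff in (2.0, 3.0, 4.0, 5.0, 6.0):
    print(f"MTV{cutoff:.1f} = {mtv_absolute(suv_roi, voxel_cm3, cutoff):.3f} cm^3")
for fraction in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(f"MTV{fraction:.0%} = {mtv_relative(suv_roi, voxel_cm3, fraction):.3f} cm^3")
```

Both definitions reduce to the same voxel count above a cutoff; they differ only in whether the cutoff is fixed or scaled to each patient's SUVmax.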
The gradient-based technique (MTVgradient) defined the boundary of MTV from the gradient between the high SUV in tumor cells and the lower SUV in adjacent normal tissues. MTVgradient was determined with RT Image (version 0.7β), which is an open source software package designed at Stanford to analyze functional imaging data. The following settings for the gradient ROI tool in RT Image were used: initial threshold relative; limit minimum; search maximum; range 20 mm; tolerance 0.5; alpha 0.001; iterations 500; unsample 4; downsample 4; and smoothing 3.
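The pathologic volume of the oral tongue tumor (Vpath) was estimated from the volume of an ellipsoid:
$$V_{\mathrm{path}} = \frac{4}{3}\,\pi \cdot \frac{x_{\mathrm{path}}}{2} \cdot \frac{y_{\mathrm{path}}}{2} \cdot \frac{z_{\mathrm{path}}}{2}$$
where xpath, ypath, and zpath were the three orthogonal diameters obtained from the resected tumor specimen.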
In addition to directly comparing the pathologic tumor volume to MTV, a more clinically relevant measure is the margin of expansion or contraction between the two volumes. The circumferential marginal expansion or contraction (d) was determined by solving the following equation for d:
$$\mathrm{MTV} = \frac{4}{3}\,\pi \left(\frac{x_{\mathrm{path}}}{2} + d\right)\left(\frac{y_{\mathrm{path}}}{2} + d\right)\left(\frac{z_{\mathrm{path}}}{2} + d\right)$$
For example, if d were 2 mm, then MTV would overestimate the pathologic tumor volume by an approximate 2 mm circumferential margin.
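As a rough illustration of the margin calculation, the sketch below solves for d numerically. It assumes that the margin is added uniformly to each semi-axis of the pathologic ellipsoid, which is one reading of the definition above and not a confirmed reproduction of the authors' method. The function names are hypothetical, and all lengths are taken to be in cm.

```python
import math

def ellipsoid_volume(x, y, z):
    """Volume of an ellipsoid given its three orthogonal diameters."""
    return math.pi / 6.0 * x * y * z

def circumferential_margin(mtv, x, y, z, tol=1e-9):
    """Find d such that an ellipsoid with diameters (x+2d, y+2d, z+2d) has volume mtv.

    Positive d means MTV overestimates the pathologic volume; negative d means
    it underestimates. Solved by bisection on a monotone function of d.
    """
    if mtv < 0:
        raise ValueError("mtv must be non-negative")

    def expanded(d):
        return ellipsoid_volume(x + 2 * d, y + 2 * d, z + 2 * d)

    lo = -min(x, y, z) / 2.0  # shortest semi-axis shrinks to zero, volume is 0
    hi = 1.0
    while expanded(hi) < mtv:  # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expanded(mid) < mtv:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical tumor with 2 cm diameters and an MTV of 7.0 cm^3.
v_path = ellipsoid_volume(2.0, 2.0, 2.0)
d = circumferential_margin(7.0, 2.0, 2.0, 2.0)
print(f"V_path = {v_path:.2f} cm^3, margin d = {10 * d:.1f} mm")
```

Bisection is used here only because the expanded volume increases monotonically with d over the bracketed interval; a closed-form cubic solution would work equally well.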
In addition to using pre-defined SUV thresholds with the multiple MTV endpoints defined above, we also determined ideal SUV thresholds. The ideal SUV threshold was defined as the SUV value for each patient that yielded an MTV equal to the pathologic tumor volume.
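One way to picture the ideal SUV threshold is as the cutoff that leaves exactly enough voxels above it to match the pathologic volume. The sketch below is an assumption about how such a search could be done on voxel data, not the procedure used in the study; the function name, toy SUV values, and voxel volume are hypothetical.

```python
import numpy as np

def ideal_suv_threshold(suv, voxel_volume_cm3, pathologic_volume_cm3):
    """SUV cutoff whose MTV (voxels strictly above it) best matches the pathologic volume.

    Assumes no tied SUV values near the cutoff. Raises if the pathologic volume
    is at least as large as the whole region, since no cutoff in the data can then match it.
    """
    values = np.sort(np.asarray(suv, dtype=float).ravel())[::-1]  # descending
    n_target = int(round(pathologic_volume_cm3 / voxel_volume_cm3))
    if n_target >= values.size:
        raise ValueError("pathologic volume is not smaller than the region volume")
    # Exactly n_target voxels have an SUV greater than values[n_target].
    return float(values[n_target])

suv_roi = np.array([1.2, 2.5, 3.1, 4.8, 6.0, 9.9, 10.0, 0.8])  # hypothetical
threshold = ideal_suv_threshold(suv_roi, 0.064, 0.192)
print(f"ideal SUV threshold = {threshold}")
```

Because MTV falls as the cutoff rises, a tumor whose PET signal extends well beyond the pathologic extent yields a high ideal threshold, and vice versa.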
Our initial definitions of the threshold for MTV depended solely on SUVmax. With this study we sought to determine whether accounting for potential confounding factors could improve the definition of MTV and thus improve the ability of PET to predict pathologic tumor volume. This was accomplished by searching for significant predictors of the ideal SUV threshold, because if one can predict the ideal SUV threshold, then one can define an MTV equal to the pathologic tumor volume. First, we noted that SUVmax only partially predicted the ideal SUV threshold, leaving unexplained variation that could be attributable to confounding factors. Next, we used multivariate linear regression models to examine the role of potential confounders along with SUVmax, to determine whether accounting for the confounders could improve the correlation with the ideal SUV threshold. Given the small number of patients in this study, each multivariate linear regression model was limited to two predictors: the first was always SUVmax, and the second was the potential confounder. The individual potential confounders we examined included tumor grade, tumor keratinization score, tumor stage, nodal stage, perineural invasion status, and the elapsed time between PET-CT and surgery. Of all the confounders we examined, only tumor grade was a significant predictor of the ideal SUV threshold in addition to SUVmax. A multivariate linear regression model including SUVmax and tumor grade was constructed to generate a prediction of the ideal SUV threshold (SUVpredicted threshold). Finally, we incorporated the SUVpredicted threshold into the definition of MTV by introducing MTVtumor grade, which was defined as the volume of hypermetabolic tissue above each patient's SUVpredicted threshold.
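The two-predictor modeling step can be sketched with ordinary least squares. The study used SAS; the NumPy version below is only an illustration, the function names are hypothetical, and the data and coefficients are synthetic values chosen for the example, not the study's patients or its fitted model.

```python
import numpy as np

def fit_two_predictor_model(suv_max, confounder, ideal_threshold):
    """Least-squares fit of: threshold = b0 + b1 * SUVmax + b2 * confounder."""
    design = np.column_stack([
        np.ones(len(suv_max)),
        np.asarray(suv_max, dtype=float),
        np.asarray(confounder, dtype=float),
    ])
    coef, *_ = np.linalg.lstsq(design, np.asarray(ideal_threshold, dtype=float), rcond=None)
    return coef  # array [b0, b1, b2]

def predict_threshold(coef, suv_max, confounder):
    """Predicted SUV threshold for given SUVmax and confounder values."""
    return coef[0] + coef[1] * np.asarray(suv_max, dtype=float) \
        + coef[2] * np.asarray(confounder, dtype=float)

# Synthetic, noise-free data generated from made-up coefficients (2.0, 0.5, -1.0).
suv_max = [5.0, 8.0, 12.0, 15.0, 20.0, 9.0]
grade = [1, 2, 3, 1, 2, 3]  # 1 = well, 2 = moderately, 3 = poorly differentiated
ideal = [2.0 + 0.5 * s - 1.0 * g for s, g in zip(suv_max, grade)]

coef = fit_two_predictor_model(suv_max, grade, ideal)
print("fitted coefficients:", np.round(coef, 3))
print("predicted threshold for SUVmax=10, grade=2:", float(predict_threshold(coef, 10.0, 2)))
```

The predicted threshold would then be used in place of a fixed cutoff when computing each patient's MTV, which is the idea behind MTVtumor grade.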
Statistical analysis was done with SAS version 9.2 (SAS Institute Inc., Cary, NC).
The median time between each patient's PET-CT scan and surgery was 10 days (range 3–36 days). The median estimated pathologic tumor volume was 3.1 cm3 (range 0.013–38 cm3). MTVs with absolute SUV thresholds (MTV2.0 – MTV6.0), relative SUV thresholds (MTV30% – MTV70%), and MTV with the gradient-based technique (MTVgradient) were all significantly associated with pathologic tumor volume (p<0.05), as shown in Figures 2A–2K. However, the strength of the correlation between MTV and pathologic tumor volume was relatively poor (R2 range 0.29–0.58). The diagonal gray bands in Figure 2 represent the region where MTV was within +/−50% of the pathologic tumor volume. With each MTV endpoint, multiple patients fell outside of the bands, suggesting a substantial over- or underestimation of pathologic tumor volume in several patients. One patient, seen on the left side of the plots in Figure 2, had a very small pathologic tumor volume and a larger tumor on PET. Excluding this outlier from the analysis did not significantly change the correlation between PET and pathologic volumes.
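The two summary measures used above, R2 and the +/−50% band, can be computed as in the following sketch. It assumes a simple linear regression on untransformed volumes, which may differ from how the figures were actually produced; the function names and the volumes are hypothetical.

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    r = np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1]
    return float(r ** 2)

def within_band(mtv, v_path, fraction=0.5):
    """True where MTV lies within +/- fraction of the pathologic volume."""
    mtv = np.asarray(mtv, dtype=float)
    v_path = np.asarray(v_path, dtype=float)
    return np.abs(mtv - v_path) <= fraction * v_path

# Hypothetical volumes in cm^3 for four patients.
v_path = [1.0, 2.0, 4.0, 8.0]
mtv = [1.4, 3.5, 4.0, 3.0]

inside = within_band(mtv, v_path)
print(f"R^2 = {r_squared(v_path, mtv):.2f}")
print(f"{int(inside.sum())} of {inside.size} patients within +/-50% of pathologic volume")
```

Reporting both measures is useful because a statistically significant correlation can coexist with large per-patient errors, which is the pattern described in this section.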
Table 2 illustrates the difference between pathologic tumor volume and MTV when the relationship is expressed as the circumferential marginal difference. Similar to the direct volumetric comparison, MTV often under- or overestimated the pathologic volume by large margins.
The ideal SUV threshold was defined as the SUV cutoff that generated a metabolic tumor volume equal to the gross pathology tumor volume. The median ideal SUV threshold was 5.2 (range 2.1–12.7). Figure 3 demonstrates the relationship between the ideal SUV threshold and SUVmax. The ideal SUV threshold increases with SUVmax (p = 0.002), suggesting that the definition of MTV should in part depend on SUVmax. Despite this significant relationship, the overall correlation between the ideal threshold and SUVmax was relatively poor (R2 = 0.37), which suggests that other factors may confound this relationship. Indeed, the multivariate linear regression found that both SUVmax (p = 0.0005) and tumor grade (p = 0.024) independently predict the ideal SUV threshold. This analysis yielded the following regression model for the ideal SUV threshold:
where the tumor grade equals 1, 2 or 3 for well-, moderately- or poorly-differentiated squamous cell carcinoma, respectively. This multivariate regression analysis suggests that for a given SUVmax, the ideal SUV threshold for a moderately differentiated tumor would be 1.9 units lower than that for a well differentiated tumor. Likewise, the ideal SUV threshold for a poorly differentiated tumor would be 1.9 units lower than that for a moderately differentiated tumor. Because a lower threshold yields a larger MTV, this indicates that higher tumor grades require larger margins on PET-CT to adequately estimate the pathologic tumor volume. Neither T-stage (p = 0.53), N-stage (p = 0.25), degree of tumor keratinization (p = 0.95), perineural invasion (p = 0.95), nor elapsed time between PET and surgery (p = 0.11) was associated with the ideal SUV threshold.
Finally, MTVtumor grade (Figure 2L) slightly improved MTV's ability to predict pathologic tumor volume (R2 = 0.63). This improvement was also seen when the relationship between MTVtumor grade and pathologic tumor volume was expressed as the circumferential marginal difference (bottom of Table 2). MTVtumor grade was within 2 mm of the pathologic tumor volume in 16 patients (70%), and no patient had an MTVtumor grade that was more than 5 mm away from the pathologic tumor volume.
To assess the degree of shrinkage from formalin fixation, the available tumor measurements from the fresh specimens were compared to measurements taken from the associated formalin fixed histology slides. After tumor fixation, the linear tumor dimension decreased an average of 0.22 cm (range 0.1–0.3 cm), which corresponds to an average 14% (range 5–23%) relative decrease in tumor dimension length.
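To put the linear shrinkage in volumetric terms, the sketch below converts a relative decrease in linear dimension into the corresponding change in volume. It assumes isotropic shrinkage, meaning the same relative decrease along all three diameters, which the measurements above do not establish; the function name is hypothetical.

```python
def volume_fraction_after_shrinkage(linear_shrinkage):
    """Fraction of the original volume that remains after isotropic linear shrinkage."""
    return (1.0 - linear_shrinkage) ** 3

# Average and range of relative linear shrinkage reported above.
for shrinkage in (0.05, 0.14, 0.23):
    remaining = volume_fraction_after_shrinkage(shrinkage)
    print(f"{shrinkage:.0%} linear shrinkage: about {1.0 - remaining:.0%} volume decrease")
```

Under that assumption, the average 14% linear decrease corresponds to a volume decrease of roughly 36%, so a modest change in each dimension compounds into a considerably larger change in estimated volume.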
The key finding in this study relates to the observation that no single SUV threshold gives a metabolic tumor volume that adequately captures pathologic tumor volume in patients with cancer of the oral tongue. The ability of PET-CT to distinguish between tumor and normal tissue makes it an attractive tool in tumor delineation with radiation treatment planning. Unfortunately, the optimal method to combine this technology with tumor segmentation remains unclear.
When contouring a tumor on a PET-CT, the most frequently used method involves the radiation oncologist visually inspecting PET images and manually distinguishing tumor from normal tissue. This method depends substantially on the windowing of the PET and CT scans, as well as the judgment and experience of the treating physician. Ciernik et al. noted that PET-CT tumor volumes differ by an average of 9.1 cm3 between independent expert observers. Because of the bias involved with gross visual interpretation, several investigators have studied more objective approaches to tumor delineation.
Burri et al. concluded that a threshold of 40% of the maximum SUV (MTV40%) provided the best compromise between accuracy and avoiding underestimation of the pathologic tumor volume. However, from the raw data reported in the Burri et al. study, only four of the twelve patients with surgery specimens had MTV40% values that were within +/−50% of the pathologic tumor volume. Daisne et al. found that an SUV threshold defined as a function of the tumor-to-background ratio most accurately predicted pathologic tumor volume compared to CT or MRI. However, the raw data presented in the Daisne et al. investigation revealed that five of the nine patients with tumor specimens had PET volumes more than 50% greater than the pathologic volume. These findings mirror the observations of the current study in that PET-CT volumes fail to accurately predict pathologic tumor volume.
Further work by Geets et al. suggested that gradient-based segmentation methods outperform the tumor-to-background threshold technique. Indeed, all seven analyzed patients in the Geets et al. study had gradient-based PET volumes within +/−50% of the pathologic tumor volume. These findings differ from our results. Although our gradient-based MTV performed relatively better than other MTV volumes, it still underestimated the pathology volume by greater than 5 mm in one patient. Potential explanations for these discordant results could relate to the technical differences in measuring pathologic tumor volume, or the differing techniques for determining the gradient-based MTV. While the gradient-based method in the Geets et al. study appears superior, one limitation relates to the requirement that each PET-CT scanner undergo individual calibration prior to implementing their gradient-based technique.
Even though this current study failed to identify a tight relationship between MTV and pathologic tumor volume, an interesting finding relates to the observation that the MTV prediction of pathologic tumor volume improved when it integrated both SUVmax and histologic grade. This indicates that the underlying tumor biology, as reflected by tumor grade in the present study, can influence FDG uptake and may be an important factor in defining the optimal SUV threshold. In oral cavity cancer, higher grade tumors have been associated with deeper and more invasive fronts, which may not be large enough to be adequately visualized on FDG PET-CT. These invasive fronts may explain why high grade tumors require larger MTV margins. Further research into this hypothesis is warranted.
Although combining the PET parameters with tumor grade makes sense and may improve PET volume delineation, a major drawback of this model relates to the potential variability of tumor cell differentiation within the entire tumor specimen. For example, a single tumor may have both well differentiated and poorly differentiated regions, and a biopsy from the well differentiated region would conclude the tumor was low grade. Since the standard three-tiered grading system has not consistently correlated with prognosis in oral SCC, proposals have been made for grading the invasive front which is presumed to represent the most aggressive portion of the tumor [6,8]. Another factor that requires consideration relates to the fact that grade is subject to significant inter-observer and intra-observer variation. Prior studies have shown poor interobserver agreement for the grading of oral squamous cell carcinomas [6,24]. For our study, grade concordance was reached between two independent pathologists, who graded the tumor based on the highest grade present within the entire tumor specimen. However, analysis of the entire tumor specimen is not applicable in head and neck cancer cases treated with definitive radiation, which is the most common scenario where PET-CT is used for treatment planning. In addition to sampling variability, tumor grade is often not available in cases when nodal fine needle aspiration has been used to establish diagnosis. Finally, the influence of grade in this model, which focuses on oral cavity cancer, may not translate to other tumor subsites in the head and neck.
Other limitations of this current study are also worth mentioning. First, our pathologic tumor size estimation assumed an ellipsoid tumor shape. The pathologic tumor volume of an oblong or stellate shaped tumor could vary significantly from our estimated pathologic volume, which could significantly bias our conclusions. On the other hand, one reason we chose oral tongue cancer in this analysis relates to the hypothesis that oral tongue tumors should grow relatively unimpeded by hard anatomic structures, and therefore their shape would theoretically approximate an ellipsoid. Regardless, we accept the drawbacks of volumetric estimation with ellipsoid approximation, and therefore in future prospective studies we plan to use slice-by-slice measurement of fresh tumor to validate these initial findings. Second, irregularly shaped or heterogeneous tumors could suffer from inaccuracies related to the relatively low resolution of PET images, which could affect our PET volume estimation. Third, the non-standardized retrospective nature of the pathology specimen handling and measurement process could add bias to our tumor volume measurements. Fourth, this study included both fresh and formalin fixed tumor specimens. While we determined that the overall magnitude of formalin shrinkage was relatively small, this could lead to an underestimation of the pathologic tumor volume. Fifth, the methods used in this study did not allow us to evaluate the spatial location of the PET volume compared to the actual in vivo tumor volume. Therefore, the possibility of a geographic miss remains unaddressed. Finally, the presence of dental fillings in the oral cavity can lead to streak artifacts on CT, which would alter the attenuation corrected PET and could unpredictably disturb our MTV estimation.
Despite these limitations, this study demonstrates that several commonly used tumor segmentation techniques fail to accurately predict pathologic tumor volume. The optimal method of integrating PET-CT into radiotherapy treatment planning has yet to be defined, and needs further study.
Supported in part by: R01 CA118582-04 (QTL, CK, EEG) & P01 CA67166 (QTL, EEG)
Conflict of Interest Statement: no conflicts of interest exist