18F-fluorodeoxyglucose (FDG) positron emission tomography computed tomography (PET/CT) provides information about metabolic and morphologic status of malignancies. Tumor size and standardized uptake value (SUV) measurements are crucial for cancer treatment monitoring.
The purpose of our study was to assess the variability of these measurements performed by observers evaluating lung tumors.
Retrospective cross-sectional study.
FDG PET/CT images of 97 patients with pulmonary tumors were independently evaluated by two experienced nuclear medicine physicians. Primary tumor size (UDCT), maximum SUV (SUVmax), mean SUV (SUVmean) and maximum SUV normalized to liver mean SUV (SUVnliv max) were measured by each observer at two different times with an interval of at least 2 weeks. Interobserver and intraobserver variabilities of measurements were evaluated through statistical methods.
Size of the lesions varied from 0.81 to 13.6 cm (mean 4.29±2.24 cm). Very good agreement was shown with correlation, Bland-Altman and regression analysis for all measured PET/CT parameters. In the interobserver and intraobserver variability analysis, the Pearson correlation coefficients were greater than 0.96 and 0.98, respectively.
Semi-quantitative measurements of pulmonary tumors were highly reproducible when determined by experienced physicians with clinically available software for routine FDG PET/CT evaluation. Consistency may be improved if the same observer performs serial measurements for any one patient.
The integrated positron emission tomography/computed tomography (PET/CT) allows the precise localization of abnormal isotope uptake. PET/CT imaging using the glucose analogue 18F-fluorodeoxyglucose (FDG) provides valuable information for the differential diagnosis, staging and treatment response assessment of malignant tumors (1). Many malignant neoplasms and their metastases are characterized by enhanced glucose utilization and therefore increased FDG uptake. Besides a qualitative evaluation, different quantitative measurements of FDG uptake can be obtained from PET/CT scans. The standardized uptake value (SUV), which is a measurement of activity per unit volume of tissue (MBq/mL) adjusted for administered activity per unit of body weight (MBq/g), is the preferred index used as a semi-quantitative measurement of glucose uptake.
Various approaches for SUV determination have been used (2–13). Among them, the maximum SUV within the slice with the highest radioactivity concentration is commonly used (2–6). Quantitation requires delineation of the tumor tissue by regions of interest (ROIs). ROI definition is not fully automated in most of the approaches; observers manually select the region to be measured.
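The SUV definition above can be illustrated with a minimal Python sketch. The function name, parameter names and example numbers are hypothetical and are not taken from the study; the sketch also assumes that both activities are decay-corrected to the same time point and that tissue density is about 1 g/mL, so that the result is effectively dimensionless.

```python
def suv_body_weight(tissue_activity_mbq_per_ml, injected_activity_mbq, body_weight_g):
    """Body-weight-normalized SUV.

    SUV = tissue activity concentration (MBq/mL)
          / (administered activity (MBq) / body weight (g))

    Assumption: both activities are decay-corrected to the same time point.
    """
    if injected_activity_mbq <= 0 or body_weight_g <= 0:
        raise ValueError("injected activity and body weight must be positive")
    return tissue_activity_mbq_per_ml / (injected_activity_mbq / body_weight_g)


# Hypothetical example: 0.02 MBq/mL in the lesion, 370 MBq injected, 70 kg patient.
example_suv = suv_body_weight(0.02, 370.0, 70_000.0)
print(round(example_suv, 2))
```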
SUV has been found helpful for differentiation between benign and malignant pulmonary lesions. An SUV above 2.5 usually has a positive predictive value (PPV) of approximately 80% for malignancy, and an SUV above 4.0 has a PPV of about 90% (14). SUV is also preferable to visual assessment when evaluating the effects of therapy in lung cancer (13,15). Response of lung cancers to treatment is determined by serial size and SUV measurements of the tumor on PET/CT scans. The percentage of change in the measurements between a baseline scan and a second scan obtained during treatment or after the end of treatment is used to monitor response. Therefore, variability among measurements must be known. In the present study, our purpose was to assess the interobserver and intraobserver variability of size and SUV measurements of primary lung tumors with clinically available software used in the evaluation of routine FDG PET/CT.
We retrospectively analyzed FDG PET/CT scans, obtained from the database of our institute, of 97 consecutive patients who had pulmonary tumors and were referred for diagnosis or initial staging between January 2011 and December 2012. Only lesions which were visually identified in both FDG PET and CT images were included in the study. If multiple lesions were observed, the dominant pulmonary lesion was selected for measurements. We obtained informed consent from the patients. This study was approved by the institutional ethics committee.
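The percentage change used for response monitoring can be written as a short Python sketch. The function name and the example values are hypothetical; no response threshold is implied, since none is given here.

```python
def percent_change(baseline, follow_up):
    """Percentage change of a measurement (SUV or size) relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (follow_up - baseline) / baseline


# Hypothetical example: SUVmax falls from 8.0 at baseline to 5.0 on the second scan.
print(percent_change(8.0, 5.0))  # -37.5
```

Because the change is a ratio of two measured values, any interobserver or intraobserver difference in either scan propagates directly into the reported percentage, which is why the variability of the underlying measurements needs to be known.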
FDG PET/CT was performed using an integrated PET/CT scanner which consisted of a full-ring high-resolution LSO PET and a six-slice CT (Siemens Biograph 6; Knoxville, USA). All patients fasted for at least 6 hours before undergoing PET/CT. Serum glucose levels were measured to ensure that the results were <200 mg/dl. Whole-body images were acquired 60 minutes after intravenous injection of FDG, and images were obtained from the level of vertex to that of the proximal thigh region.
Two nuclear medicine physicians who have faculty experience in reading PET/CT (Reader 1 has 4 years of experience and had read at least 5000 scans; Reader 2 has 8 years of experience and had read at least 10,000 scans) independently evaluated all PET/CT images on an E-soft workstation. Images were analyzed semi-quantitatively using the SUV as an index of FDG uptake. Each observer measured primary tumor size on the axial slice that showed the largest tumor dimension on the CT component of the PET/CT (UDCT). Maximum (SUVmax) and mean (SUVmean) SUVs were calculated by manually drawing ROIs over the primary tumor. The ROI was placed around the most intense slice of the tumor, which was identified by defining ROIs over every axial image plane of the whole tumor. To measure the mean standardized uptake value of the liver, a standard ROI with a diameter of 30 mm was drawn on the right lobe of the liver. Then, the ratio of primary tumor SUVmax to the liver SUVmean was calculated to obtain SUVmax normalized to liver SUVmean (SUVnliv max) (Figures 1 and 2). Each observer measured each PET/CT parameter at two different times. There was an interval of at least 2 weeks between the first and second image analysis. The observers were blinded to the previous measurements.
Interobserver and intraobserver variabilities between 2 observers were determined with different statistical analyses evaluating different aspects of agreement. Hence, correlation, regression and Bland-Altman methods were used. The 8 pairs of measurements by the 2 observers were evaluated in this way. Interobserver variability of the 4 parameters was defined with the use of an intraclass correlation coefficient (ICC). When interobserver agreement for the parameters measured is perfect, the ICC approaches 1. Landis and Koch (16) classified the interpretation of ICC as follows: 0.0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect agreement. The interobserver and intraobserver correlations of the measurements were calculated by Pearson correlation analysis. The null hypothesis for the analysis of correlation was that the correlation was 0; excellent agreement would correspond to a correlation close to 1. In the analysis of regression for intraobserver variability, the SUVs at time 1 were subjected to regression on the SUVs at time 2, and for interobserver agreement, the SUVs for observer 1 (the dependent variable) were subjected to regression on those of observer 2 (the independent variable). Good agreement was assumed when the regression line went through the origin with a slope of 1. Coefficient of variation (COV) was used to analyze the interobserver variability. COV was obtained for every parameter for each patient by dividing the SD by the mean, both calculated from the two readings. To obtain the overall interobserver agreement for each of the 4 parameters, the root-mean-square value of the 97 COVs was calculated. All of the measured parameters were expressed as mean±standard deviation (SD). To carry out statistical analysis, SPSS version 20 (SPSS Inc., Chicago, IL, USA) was used. Statistical significance was defined as p<0.05.
The study group was composed of 97 patients (77 men, 20 women; average age, 58.2±9.8 years; range 28–81 years) who had pulmonary tumors (59 malignant and 38 benign). The mean±SD and COV of the SUVmax, SUVmean, SUVnliv max and the UDCT measurements of the pulmonary tumors obtained by the 2 observers from the 97 patients’ scans at 2 different times are presented in Table 1. The size of the primary tumor varied from 0.81 to 13.6 cm (mean 4.29±2.24 cm). The root-mean-square value of these COVs was lowest for UDCT (7.23%) and highest for SUVmax (9.11%).
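The three uptake parameters described above can be summarized in a minimal Python sketch. It assumes, for illustration only, that the voxel SUVs inside the manually drawn tumor ROI and inside the 30 mm liver ROI have already been extracted as flat lists; the function name, dictionary keys and example values are hypothetical and do not reflect the workstation software.

```python
from statistics import fmean


def roi_metrics(tumor_voxel_suvs, liver_voxel_suvs):
    """Return SUVmax, SUVmean and liver-normalized SUVmax for one lesion.

    tumor_voxel_suvs: SUVs of the voxels inside the tumor ROI (assumed input)
    liver_voxel_suvs: SUVs of the voxels inside the liver reference ROI
    """
    suv_max = max(tumor_voxel_suvs)
    suv_mean = fmean(tumor_voxel_suvs)
    liver_mean = fmean(liver_voxel_suvs)
    return {
        "SUVmax": suv_max,
        "SUVmean": suv_mean,
        # SUVnliv max: tumor SUVmax divided by liver SUVmean
        "SUVnliv_max": suv_max / liver_mean,
    }


# Hypothetical voxel values, chosen only to make the arithmetic easy to follow.
print(roi_metrics([2.0, 4.0, 6.0], [2.0, 2.0, 2.0]))
```

The sketch also makes one property visible: SUVmax depends on a single voxel, whereas SUVmean changes whenever the ROI boundary includes or excludes voxels, so it is more sensitive to how each observer draws the region.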
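The COV and root-mean-square steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the SPSS procedure used in the study: the sample SD (n-1 denominator) is assumed because the text does not say which SD was used, and the function names and example readings are hypothetical.

```python
import math
from statistics import mean, stdev


def cov_percent(reading_1, reading_2):
    """Per-patient COV (%) from two readings: SD divided by mean.

    Assumption: sample SD (n-1 denominator); the source does not specify.
    """
    pair = [reading_1, reading_2]
    return 100.0 * stdev(pair) / mean(pair)


def root_mean_square(values):
    """Root-mean-square of a list of per-patient COVs."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical SUVmax readings by observer 1 and observer 2 for three patients.
observer_1 = [9.0, 5.2, 12.4]
observer_2 = [11.0, 5.0, 12.1]
covs = [cov_percent(a, b) for a, b in zip(observer_1, observer_2)]
print([round(c, 2) for c in covs], round(root_mean_square(covs), 2))
```

With only two readings per patient, the sample SD reduces to the absolute difference divided by the square root of 2, so each COV is simply a scaled relative difference between the two observers.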
Correlation analysis was performed to assess intraobserver variability. Very high correlation was found between 2 reading times for all the parameters. With p values set at <0.05, the correlation was significantly different from 0 for all cases. The Pearson correlation coefficients varied from 0.985 to 0.998 for observer 1 and from 0.979 to 0.999 for observer 2 (Table 2).
Regression analysis also indicated very good intraobserver reproducibility for each observer at the 2 reading times, with an intercept of 0 and a slope of 1 for almost all parameters. Intercepts were 0 for all measurements (the confidence interval [CI] of each intercept included 0); only for SUVmean and for UDCT was the slope not 1 (the 95% CI of the slope did not include 1). The null hypothesis states that the slope is equal to zero, and the alternative hypothesis states that the slope is not equal to zero. The p values were >0.05. The alternative hypothesis for the intercept was that the intercept is 0, and slope is 1. The fact that the null hypotheses could be rejected for both intercept and slope indicated good intraobserver reproducibility for both observers.
Bland-Altman analysis was also carried out to determine intraobserver agreement. The difference (the dependent variable) between the two reading times was subjected to regression on the sum (the independent variable) of the readings at times 1 and 2. No association was found between the differences and the sums for either observer or for any parameter, showing very good intraobserver reproducibility.
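The regression of differences on sums described above can be sketched in Python as follows. The bias and 95% limits of agreement are added because they are the conventional Bland-Altman summary; they are not reported in this passage. All function names and example readings are hypothetical.

```python
from statistics import mean, stdev


def ols_fit(x, y):
    """Ordinary least-squares slope and intercept of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def bland_altman(first, second):
    """Bland-Altman summary for paired readings (e.g. time 1 vs time 2)."""
    diffs = [a - b for a, b in zip(first, second)]
    sums = [a + b for a, b in zip(first, second)]
    bias = mean(diffs)
    sd = stdev(diffs)
    # Difference (dependent) regressed on sum (independent), as in the text.
    slope, intercept = ols_fit(sums, diffs)
    return {
        "bias": bias,
        "loa_low": bias - 1.96 * sd,   # conventional 95% limits of agreement
        "loa_high": bias + 1.96 * sd,
        "slope": slope,
        "intercept": intercept,
    }


# Hypothetical SUVmax readings by one observer at two reading times.
time_1 = [3.1, 5.4, 8.0, 12.2]
time_2 = [3.0, 5.6, 7.9, 12.3]
print(bland_altman(time_1, time_2))
```

A slope near 0 means the disagreement does not grow or shrink with the magnitude of the measurement, which is the "no association" finding reported above. Testing whether the slope differs significantly from 0 would additionally require its standard error, which this sketch omits.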
Table 3 shows the results for the interobserver agreement at time 1 and at time 2. Concerning interobserver agreement, the correlation between observer 1 and observer 2 was very high. Utilizing the Fisher test and with p values set at <0.05, the correlation was significantly different from 0, in all cases. The Pearson correlation coefficients between observers ranged from 0.965 to 0.999 at time 1 and from 0.959 to 0.999 at time 2, respectively, for all parameters. The ICC between the two observers was found to be over 0.98 for all parameters at both times, indicating a very good agreement between observers (ICC approaches 1.0 if there is an excellent agreement between the observers for the measured parameters).
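The difference between the Pearson coefficient and the ICC can be illustrated with a short Python sketch. The ICC form is an assumption: the two-way random-effects, absolute-agreement, single-measure form (often written ICC(2,1)) is used here because the text does not state which form was computed in SPSS. Function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_2_1(ratings):
    """ICC(2,1) from a list of rows, one row per patient, one column per observer.

    Assumed form: two-way random effects, absolute agreement, single measure.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical case: observer 2 reads every lesion exactly 1.0 higher than observer 1.
obs_1 = [1.0, 2.0, 3.0]
obs_2 = [2.0, 3.0, 4.0]
print(pearson_r(obs_1, obs_2), icc_2_1([list(p) for p in zip(obs_1, obs_2)]))
```

In this constructed case the Pearson coefficient is 1 even though the observers never agree, whereas the absolute-agreement ICC is penalized by the systematic offset. This is one reason correlation, ICC, regression and Bland-Altman analyses are used together rather than relying on correlation alone.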
Agreement between the observers was also evaluated by regressing the measurements of the first observer on those of the second observer for all parameters. Overall, there was very good agreement between observers, with an intercept of 0 and a slope of 1. The exception was between observers 1 and 2 at time 2, for SUVnliv max and for UDCT, where the slope was not 1 (the 95% CI of the slope did not include 1). However, intercepts were 0 for all measurements (the CI of the intercept included 0), which indicated that the null hypothesis could be rejected for both intercept and slope and also indicated good agreement between observers at both times.
We also analyzed the results according to tumor size and pathology. All pulmonary tumors were classified into two groups according to their size (≤30 mm, >30 mm) and pathology (malignant or benign lesions). The correlation coefficients are given in Table 4. Among the measured SUVs, interobserver and intraobserver agreements were highest for SUVmax and lowest for SUVmean in all of the classified groups. According to our findings, agreement in SUVmean was slightly lower in the larger and malignant tumor groups than in the smaller and benign ones. However, intraobserver and interobserver correlations for all measured PET/CT parameters were still very high in all classified groups. Figures 3 and 4 show the Bland-Altman plots for SUVmax and SUVmean according to tumor size.
FDG PET/CT functional imaging is widely used in the assessment of pulmonary nodules and masses, either to categorize a lesion as malignant or benign or to stage and monitor lung cancer (14). For the quantification of tumor glucose uptake, SUVs are commonly employed as a semi-quantitative index. For repeated measurements of tumors, SUV must be highly reproducible. This is particularly important for therapy assessment when small changes in tumor metabolism are being evaluated. Metabolic response on PET is manifested by a decrease in the glycolytic activity of the tumor (17,18). A lack of metabolic progression is associated with improved outcome. Hence, the importance of reliable quantification of FDG uptake has been accentuated in many studies. Differences in measurements may lead to unnecessary operations or erroneous changes in therapy, such as the discontinuation of prior curative therapies and the introduction of new chemotherapeutic drugs. Therefore, the interobserver and intraobserver variabilities of SUV analysis must be known. In the study by Marom et al. (3), to assess interobserver and intraobserver variabilities of SUVmax measurements, 5 observers determined the SUVmax in 20 patients with lung cancer. SUVmax was determined using 2 different methods: by manually shifting a fixed-size (1-cm) circular ROI around the lesion until SUVmax was detected, and by creating a freehand drawing placed over the lesion within the slice in which the lesion was visualized with the highest FDG concentration. The SUVmax measurements demonstrated good interobserver and intraobserver agreement with various statistical analyses (correlation, Bland-Altman, regression, and ANOVA). In the intraobserver and interobserver variability analysis, Pearson correlation coefficients were greater than 0.94 and 0.95, respectively. In that study, the SUV measurements obtained were also compared with the calculations reported in the initial clinical documents.
In that case, agreement was poor. SUVmax measurements of the study for the same tumor showed a difference greater than 25% from the measurements of initial clinical documents in 45% of the tumors. In the study by Minn et al. (19), which evaluated 10 patients with lung cancer, and the study by Benz et al. (6), which evaluated 33 patients with high-grade sarcomas, there was 100% agreement in SUVmax measurements of the tumors determined by two observers.
The results of interobserver and intraobserver agreement in tumor size measurements in previous studies are discordant. In the literature, several studies have reported good interobserver and intraobserver agreement of tumor size measurements (20,21). However, some other studies have demonstrated substantial variation (22,23). Huang et al. (5) evaluated 43 pulmonary nodules on PET/CT and assessed interobserver variability of SUVmax, SUVmean and tumor size measurements. They found no interobserver variation in SUVmax measurements and low variability in SUVmean or size measurements. Jackson et al. (4) evaluated the interobserver agreement of the SUVmax, SUVnliv max and size measurements in primary tumors between observers with disparate practice in PET/CT. They found that in the setting of different reading experiences, FDG metabolic parameters have higher interobserver agreement than size measurements. They reported that size measurements were more manual and subjective than the SUV measurements as marginally altered angles in the measurements could produce greater variability.
Since there is no gold standard method for measuring SUV, we used 3 commonly applied approaches (SUVmax, SUVmean and SUVnliv max) for SUV calculation in our study. Among the measured SUVs, intraobserver and interobserver agreements were highest for SUVmax and lowest for SUVmean in all subgroups, which were classified according to tumor size and malignancy status. Our findings showed that variability of SUVmean was slightly higher in the larger and malignant tumor groups than in the smaller and benign ones. In general, the low activity in the background of the thorax might have facilitated tumor delineation in FDG PET/CT scans. However, in some of the necrotic regions of large tumors, tumor delineation was more difficult. In such cases, necrotic regions affected the SUVmean much more than SUVmax. The strong dependency of the SUVmean on the dimension and shape of the ROIs may explain the slightly higher variability of SUVmean in our study. However, intraobserver and interobserver correlations for all measured PET/CT parameters were still very high in all classified groups. This very good interobserver and intraobserver agreement for all parameters in our study may be related to the training and experience of the observers, as mentioned in previous studies (4,24). The other factor resulting in the lower variability in our study was probably the study environment: closer attention of the readers to technique might have affected the results. The present study had limitations. It was a single-center study with a relatively small patient population (n=97). There were only 2 readers and the exams were limited to the lungs. The low level of background activity in the thorax might have facilitated tumor delineation in FDG PET/CT studies. The higher and more variable level of background activity in other parts of the body may lead to a higher variability of FDG PET/CT measurements.
Larger, multi-institutional prospective studies are needed to expand our present findings.
This study shows that if PET/CT is evaluated by experienced observers with careful attention to technique, SUV and size measurements on FDG PET/CT are highly reproducible. However, despite the high interobserver and intraobserver reproducibility, the fact that some variability exists must be considered when evaluating tumor response. Consistency may be improved if the same observer performs serial measurements for any one patient. If a difference in SUV and size measurements would lead to a change in treatment and there is a discrepancy between the clinical and imaging findings, it would be helpful to have the same observer repeat the current and prior FDG uptake measurements to improve reproducibility.
Ethics Committee Approval: Ethics committee approval was received for this study from the Ethics Committee of Çukurova University.
Informed Consent: Informed consent was obtained from the patients who participated in this study.
Peer-review: Externally peer-reviewed.
Author contributions: Concept - G.B.; Design - G.B.; Supervision - G.B., G.ş.; Resource - G.B., M.G.; Materials - G.B., M.G.; Data Collection and/or Processing - G.B., M.G.; Analysis and/or Interpretation - G.B., M.G., G.ş.; Literature Search - G.B.; Writing - G.B.; Critical Reviews - G.B.
Conflict of Interest: No conflict of interest was declared by the authors.
Financial Disclosure: The authors declared that this study has received no financial support.