This study represents the first time that the variability of CT-based tumor measurement has been fully quantified with the use of repeat CT imaging and conventional side-by-side measurement. The variability we found is directly applicable to measurements of lung tumors and is likely to be observed in tumor measurements in other organs. We believe our data have important relevance to the clinical care of patients with parenchymal metastases and to the interpretation of clinical trial results that rely heavily on radiographic response end points.
Our data demonstrate that CT measurement of lung lesions has variability of a clinically meaningful magnitude, with a standard deviation of 2.4 mm of change between two CT scans of a tumor performed within a short period of time. This measurement variation is multifactorial in nature, partly because of differences in the appearance of the tumor or surrounding tissue (Appendix Fig A1, online only) and partly because of operator-dependent differences in the placement of a measurement. In total, the multistep process of repeat imaging and measurement of a tumor will lead to differences of ≥ 4 mm approximately 10% of the time (). Put another way, for a lesion that in fact measures 4 cm, the variability of CT imaging can lead to measurements ranging from 3.5 to 4.5 cm ().
These data can assist clinical oncologists as they determine when a patient has developed disease progression. Although disease progression in clinical trials is defined by RECIST as an increase in summed tumor diameter of ≥ 20%,
13 the criteria state that, “it is not intended that these RECIST guidelines play a role in [clinical] decision making, except if determined appropriate by the treating oncologist.” In clinical practice, any clear evidence of tumor growth could be judged a treatment failure, supporting the discontinuation of a therapy and possible change to another. Some oncologists may interpret a diameter increase of 1 or 2 mm as evidence of clear tumor growth; however, we found that reimaging led to measurement differences exceeding 2 mm 33% of the time, with half of these (17%) showing an increase of more than 2 mm in tumor diameter. Therefore, we believe the variability inherent in CT imaging requires that clinicians consider other factors, such as changes in size of other lesions or patient toxicity, when using CT measurements to identify tumor progression.
Our findings indicate that the inherent variability of conventional unidimensional CT measurement can at times lead to the appearance of RECIST progression (≥ 20% diameter increase), considering that 3% of measurement changes calculated from the repeat CTs met this criterion. This was more common with lesions measuring between 1 and 3 cm; in 6% of these lesions, the measurement change from reimaging resulted in an appearance of a ≥ 20% increase in diameter. One strategy for avoiding cases of variability being misclassified as progression was adopted by RECIST 1.1, which now dictates that a ≥ 20% increase only qualifies as disease progression if there is “an absolute increase of at least 5 mm” in summed diameter measurements,
13 and our data support this concept of a minimal change requirement.
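The minimal-change requirement described above can be sketched in a few lines of code. This is only an illustration of the two thresholds quoted in the text (a ≥ 20% relative increase and an absolute increase of at least 5 mm in summed diameter); it is not an implementation of the full RECIST 1.1 criteria. The function name, the parameter names, and the choice of which earlier sum serves as the reference are assumptions made for this example.

```python
def meets_progression_thresholds(reference_sum_mm, current_sum_mm,
                                 rel_threshold=0.20, abs_threshold_mm=5.0):
    """Return True if the summed diameter rose by at least rel_threshold
    (as a fraction) AND by at least abs_threshold_mm.

    Hypothetical helper: reference_sum_mm is whichever earlier summed
    diameter the reader compares against (an assumption of this sketch).
    """
    if reference_sum_mm <= 0:
        raise ValueError("reference_sum_mm must be positive")
    absolute_increase = current_sum_mm - reference_sum_mm
    relative_increase = absolute_increase / reference_sum_mm
    return (relative_increase >= rel_threshold
            and absolute_increase >= abs_threshold_mm)


# A 10-mm sum measured as 12.5 mm is a 25% increase but only 2.5 mm,
# so it does not satisfy the absolute requirement.
small_lesion = meets_progression_thresholds(10.0, 12.5)

# A 30-mm sum measured as 37 mm is a 23% increase and 7 mm, so both
# thresholds are met.
larger_sum = meets_progression_thresholds(30.0, 37.0)
```

Requiring both conditions means that a small lesion whose measurement shifts by a few millimeters from variability alone is not flagged solely because the shift is large in percentage terms.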
This work also may have important implications for the interpretation of tumor response, particularly when measurement change is considered as a continuous variable, a technique increasingly considered as a way of better expressing a therapy's antitumor activity.
1–2,14 One analysis often used is the waterfall plot, which displays the magnitude of each patient's best response as a percent measurement change (B). To gauge its prevalence in the literature, we searched PubMed for phase I and II trials treating the major CT-measured carcinomas (lung, colorectal, pancreas, and renal) that were published in the
Journal of Clinical Oncology in 2009. The search found 41 articles; nine articles focusing on radiotherapy or toxicity of therapy were excluded. Of the remaining 32 clinical trials, 15 (47%) included data addressing response as a continuous variable; 14 showed waterfall plots, and nine quantified the number of patients with a reduction in measurements. Although these statistics are used frequently, they can only be meaningfully interpreted (and meaningfully reported) if the expected variability of CT imaging and measurement is known. In the present study, we found a median decrease of 4.2% with the reimaging of a tumor after a short interval of time, indicating that half of the measurement decreases caused by variability alone will exceed 4.2%. As shown in the waterfall plot of our data (B), 84% of percent measurement differences fell between −10% and +10%. Tumor size changes of small magnitudes are commonly reported in clinical trial results, yet the implication of these changes remains unclear because such differences may result solely from imaging variability.
As an example, we present a waterfall plot of data from a recently completed phase II trial of targeted therapy in NSCLC (published separately).
15 No standard statistical method exists for comparing waterfall plots, but many authors describe the proportion of patients with tumor shrinkage, including tumors with any evidence of diameter decrease. However, our data suggest that many measurement decreases, particularly those less than 10%, may be indistinguishable from variability-related changes. Rather than calculating a tumor shrinkage rate, we would recommend that tumors with diameter decreases between 0% and 10% be considered relatively unchanged; for this reason, we have added a gray zone to the plot to minimize the significance of changes with a magnitude less than 10%. Diameter decreases between 10% and 30% are less likely to be a result of variability and could potentially represent true antitumor effect, although the clinical implications of such a minor response would need to be investigated further. It is worth remembering that the historical roots of the RECIST response criteria date back to a variability study performed in 1976
16,17; perhaps the improved accuracy of modern measurement could in part be a basis for reconsidering what qualifies as a tumor response. Interestingly, several groups have found that a 10% decrease in tumor diameter may be correlated with better outcomes in some cancers.
18–21
Because this study measured only a single lesion for each patient, we are not able to quantify how summed measurement of multiple lesions might affect variability, although we can estimate this effect using the standard deviation of measurement variability from earlier. Although variance increases in proportion to the number (n) of independent summed measurements with the same standard deviation (σ), the standard deviation (the square root of variance) increases only in proportion to the square root of n (√n × σ). This means the relative magnitude of the standard deviation can decrease with summed measurement, particularly if one assumes no correlation between the measurement error for two tumors on an individual CT scan. To illustrate this, we can consider a patient with multiple 15-mm lung tumors. Measurement of a single 15-mm tumor has a standard deviation of approximately 2.0 mm (), meaning that 95% of tumor measurements can be expected to lie within 15 ± 4.0 mm; as a percentage of tumor size, this is equal to 95% limits of agreement of −27% and +27%. Yet when four 15-mm tumors are measured, the standard deviation increases only by a factor of two, to equal 4.0 mm; therefore, 95% of summed measurements will lie within 60 ± 8.0 mm, equal to 95% limits of agreement of −13% and +13% (Appendix Table A1, online only). This demonstrates how one can increase the relative accuracy of summed measurements by measuring a greater number of similarly sized lesions.
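The arithmetic in this paragraph can be reproduced with a short sketch. It assumes, as the text does, that measurement errors for different lesions are independent and share the same standard deviation, and it uses ± 2 standard deviations for the 95% limits to match the worked figures above. The function name and parameter names are illustrative and do not come from the study.

```python
import math


def summed_limits_of_agreement(n_lesions, lesion_mm, sd_mm, z=2.0):
    """Approximate 95% limits of agreement for a sum of n_lesions
    equally sized lesions with independent measurement errors.

    Returns (summed diameter in mm, SD of the sum in mm,
             half-width of the limits in mm, half-width as % of the sum).
    z=2.0 mirrors the +/- 2 SD convention used in the text (assumption).
    """
    total_mm = n_lesions * lesion_mm
    # Variances add for independent errors, so the SD grows with sqrt(n).
    sd_sum_mm = math.sqrt(n_lesions) * sd_mm
    half_width_mm = z * sd_sum_mm
    half_width_pct = 100.0 * half_width_mm / total_mm
    return total_mm, sd_sum_mm, half_width_mm, half_width_pct


# One 15-mm lesion with SD 2.0 mm: 15 +/- 4.0 mm, about 27% of the total.
single = summed_limits_of_agreement(1, 15.0, 2.0)

# Four 15-mm lesions: 60 +/- 8.0 mm, about 13% of the total.
four = summed_limits_of_agreement(4, 15.0, 2.0)
```

The relative half-width falls by a factor of √n, so quadrupling the number of similarly sized lesions halves the percentage limits. If measurement errors within a scan were positively correlated, the improvement would be smaller than this sketch suggests.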
In conclusion, this rescan study of lung lesions in patients with advanced NSCLC found a clinically important magnitude of measurement variability inherent in repeat CT imaging. This variability is greatest in the measurement of small tumors and has important implications for accurate determination of disease progression. Apparent changes in tumor diameter exceeding 1 to 2 mm are common on reimaging and alone may not be indicative of progression. Relative changes less than 10% may be indistinguishable from changes caused by variability alone and are unproven as a marker of efficacy in clinical trials.