Imperfect detection on screening tests can lead to erroneous conclusions about the natural history of thyroid nodules following radiation exposure. Our objective was to assess in a repeatedly screened I-131-exposed population the frequency with which a thyroid nodule could be retrospectively identified on ultrasonography studies preceding the one on which it was initially detected.
A cohort of over 13,000 young people exposed to fallout from Chornobyl underwent ultrasonography screening at 2-year intervals from 1998 to 2007. The study group consisted of screening examinations on which a thyroid nodule was detected following one or more prior negative examinations. In the study group there were 48 cancers and 92 benign nodules. For each of these 140 index studies a comparison set was created containing all available prior studies plus (to test for bias) negative studies from control subjects. While viewing the index study, three independent reviewers scored the comparison studies for the presence and size of a preexisting nodule. Detection rates were compared for true priors versus controls, for cancer versus benign, and for histologic subtypes of papillary carcinoma.
A preexisting nodule was identified by at least one reviewer in 24.0% of the true priors versus 8.3% of the controls and by all three reviewers in 11% versus 1% (Fisher's exact test, p<0.0001). There was no significant difference in detection rates between cancers and benign nodules (22.4% vs. 24.7%, p=0.411). There was no correlation between time from prior to index study and change in nodule size for either malignant or benign nodules (r=0.01, NS). There were no differences in detection rates or size among papillary cancer subtypes. Reviewers could not distinguish between true priors and controls.
These findings, showing significant rates of undetected benign and malignant nodules and no evidence for rapid growth, suggest that conclusions drawn from screening studies about the frequency of late-developing, rapidly growing thyroid nodules following radiation exposure should be interpreted with caution.
The April 1986 catastrophe at the Chornobyl (Chernobyl) nuclear power plant has provided a unique opportunity to assess the effects on the thyroid of environmental exposure to radioiodines, mostly I-131. The most significant health consequence reported to date has been an increase in papillary thyroid cancer, principally among those who were exposed during childhood or adolescence (1,2). Excess thyroid cancers appear to continue occurring today (3,4).
Given the unique nature of the disaster at Chornobyl, it is critical that any conclusions about the effects should be as accurate as possible. In particular, any conclusions about the latency and growth velocity of subsequent cancers should take into account the sensitivity of the means used to detect them. Imperfect detection has the potential to lead to the conclusion that there is a larger subset of late-developing, rapidly growing tumors than is actually the case.
Since 1998, the Ukrainian–American cohort study of thyroid cancer and other thyroid diseases following the Chornobyl accident (UkrAm study) has followed a cohort of over 13,000 young people who were living in areas affected by the fallout from Chornobyl. Those in the study cohort received on average four comprehensive evaluations at 2-year intervals, including an ultrasonography evaluation of the thyroid.
The goal of our study was to refine our understanding of the natural history of thyroid nodules in a radiation-exposed cohort. To this end we reviewed prior ultrasonography studies in conjunction with the study on which a nodule was first detected to evaluate the frequency with which the nodule could be identified in retrospect on earlier studies.
The UkrAm study was reviewed and approved by the Institutional Review Boards of the U.S. National Cancer Institute and the Institute of Endocrinology and Metabolism of the Academy of Medical Sciences of Ukraine, and all participants (or their legal guardians for those under 16 years at the time of screening) signed an informed consent form.
Details of the design of the UkrAm study have been published previously (5). Briefly, the cohort includes over 13,000 subjects who were younger than 18 years on April 26, 1986, had direct thyroid radioactivity measurements made in May or June 1986, and lived in the most heavily contaminated regions of Ukraine in 1998.
The cohort was screened four times at 2-year intervals from 1998 to 2007 either by mobile teams visiting regional hospitals or at the Institute of Endocrinology and Metabolism in Kyiv. Examinations included thyroid palpation and ultrasonography examination. From 1998 to 2002, screening was done using 7.5MHz probes, either an electronic linear transducer (Hitachi Medical Systems, Tokyo, Japan; GE Logiq a100, General Electric Company, Milwaukee, WI) or a mechanical sector probe with water bag kit (Tosbee SSA 240s with 7.5MHz SM-708A probes; Toshiba Corporation, Tokyo, Japan). In 2002, this equipment was replaced with a laptop-based mobile system that used a 10MHz linear probe (Terason Ultrasound, Burlington, MA). Detailed information about the location and characteristics of thyroid nodules and regional lymph nodes, and about thyroid size and echostructure were recorded on a standardized ultrasound form. Standard longitudinal and transverse static images were stored for all patients. For each lobe, one longitudinal image and up to three transverse images were recorded. Additional focused images were obtained in patients with nodules. For all studies done on the Terason system, ~3-second video sweeps of each lobe were stored in addition to the static images. Images obtained on the earlier systems were recorded on thermal paper or a Camtronics magneto-optical disk (Camtronics Medical Systems, Birmingham, AL) and later were scanned into a central database as part of the patient record. The Terason images and video loops were stored initially on a hard drive and transferred directly to the central image database.
Ultrasound-guided fine-needle aspiration (FNA) was performed on all palpable and ultrasound-detected nodules that were >10mm in largest dimension and on all nodules 5–10mm in largest dimension that had one or more of the following features: hypoechogenicity, microcalcifications, an irregular contour, extension through the thyroid capsule, interval growth, or suspicious lymph nodes. Cytological interpretations were made by two study pathologists and then reviewed by a third. Patients were referred to surgery if their FNA cytology was interpreted as suspicious for, or diagnostic of, malignancy or follicular neoplasm.
All surgical specimens were first examined by an experienced pathologist and classified according to the World Health Organization histological system (6–8) and later reviewed by an International Pathology Panel established by the Chernobyl Tissue Bank Project (9). Papillary cancers were subcategorized into papillary, follicular, mixed, and solid histologic subtypes.
The index group consisted of ultrasonography studies from individuals who had a thyroid nodule detected for the first time in the second through the fourth screening cycle and had at least one prior negative ultrasonography examination. Studies from both the index examination and prior negative examination had to be available for review. The index group contained 140 ultrasonography studies. These consisted of 48 studies of thyroid cancer (44 papillary cancers, 3 follicular, and 1 medullary) and 92 studies of benign thyroid nodules (19 surgically proved benign nodules in the cohort plus a random sample of 73 nodules determined on FNA to be benign). There were 17 additional cases of incident thyroid cancer diagnosed in the cohort for which prior studies were not available, and therefore these were not included in the index group. In all cases, nodules had been measured at the time the study was done.
The comparison group consisted of 560 studies on which no nodule had been identified at the time of examination. It contained all prior studies available for the index group (true prior studies, n=247) plus studies from patients for whom no nodule was ever identified on ultrasonography (control studies, n=313). We included studies from individuals without thyroid nodules in the comparison group to minimize the potential bias related to reviewers' awareness of nodule presence on index studies. For the control studies, patients without nodules were randomly selected from within the prespecified strata of the UkrAm cohort to match patients with nodules on sex, age, and examination cycle. One true prior study had a date inconsistency and was dropped, leaving 559 studies for analysis.
Each index study was presented for review along with a set of four comparison studies. In each comparison set, the number of true prior studies varied from 1 to 3, and control studies made up the remainder. The order of the control studies and true prior studies was randomized. All identifying information was removed from the images. The studies were reviewed using two projectors set side by side in a dark room: one showed images from the index study in which the nodule was identified (only the images showing the nodule were loaded; the images showed measurement calipers). The location of the nodule was annotated on the images and could also be verified visually since both long- and short-axis views of the affected lobe were available. The other projected all available images from the comparison studies. Readers compared the images from each comparison study one by one with the images from the index study being projected simultaneously on the other screen to assess for presence of a nodule in the same location. No manipulation of original image magnification was done. Three expert readers scored the studies independently (P.O.K., R.J.M., and E.S.). All of them were present at the same session, but they did not communicate their judgments with each other. Readers did not have a formal time limit for viewing each image and could re-examine individual images within each study as often as they wished.
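The assembly of a comparison set described above can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the function name `build_comparison_set`, the study identifiers, and the use of Python's `random` module are assumptions, and the sketch draws controls at random without reproducing the matching on sex, age, and examination cycle.

```python
import random

def build_comparison_set(true_priors, control_pool, set_size=4, rng=None):
    """Return a shuffled list of (kind, study_id) pairs for one index study.

    true_priors  : list of prior study ids for the index patient (1 to 3)
    control_pool : list of candidate control study ids (hypothetical)
    """
    rng = rng or random.Random()
    if not 1 <= len(true_priors) < set_size:
        raise ValueError("expected between 1 and set_size - 1 true priors")
    # Controls make up the remainder of the set.
    n_controls = set_size - len(true_priors)
    controls = rng.sample(control_pool, n_controls)
    items = [("true_prior", s) for s in true_priors]
    items += [("control", s) for s in controls]
    # Randomize presentation order so position gives no hint of origin.
    rng.shuffle(items)
    return items

# Hypothetical identifiers, for illustration only.
example_set = build_comparison_set(
    ["prior_1", "prior_2"],
    ["ctrl_1", "ctrl_2", "ctrl_3", "ctrl_4", "ctrl_5"],
    rng=random.Random(0),
)
```

With 140 index studies and four comparison studies each, this layout yields the 560 comparison studies reported above.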
For each index study, reviewers scored the four comparison studies in three ways. First, they judged whether they believed the comparison study was from the same patient as the index study. Then, they recorded whether they believed that a nodule was present on the comparison study in the region where the nodule was detected on the index study, using a simple dichotomous score (present/absent). Where they believed a nodule was present, they estimated its maximum diameter in millimeters by comparing it with an image from the index study that contained measurement data.
Our major objective was to compare the rate of nodule detection on true prior studies and control studies. To ensure that reviewers were indeed blind as to whether the studies were true priors or controls, we also calculated a same-patient judgment rate and compared it for the two study groups. If the same-patient rates did not vary between true prior studies and control studies, this would suggest that observer bias was unlikely. Nodule detection rates and same-patient rates could be calculated for each of the three readers separately, but to conduct significance tests properly, we first tested for differences in rates among the readers and then reduced the number of judgments (1677) to the number of studies (559). To this end, we combined the reader judgments for each comparison study and then dichotomized the four possible outcomes (three negative judgments, two negative and one positive judgment, one negative and two positive judgments, and three positive judgments). Throughout the article, we emphasize rates based on a positive judgment by at least one reader for each comparison study, since our goal was to identify as many undetected nodules as possible. Once the judgments were reduced to studies, we used Fisher's exact tests to compare the nodule detection rates between true prior studies and control studies. Repeated measures analyses of variance (ANOVAs) were conducted to examine the possibility of statistical nonindependence among the four trials for each index case and to test for a possible effect of machine type on miss rate. In addition, for true prior studies, we compared the nodule detection rates between benign and malignant tumors and among different histologic subtypes of papillary cancers. Using an ANOVA with Student-Newman-Keuls (SNK) multiple comparison tests, we further evaluated whether the size of nodules on index studies differed statistically between benign and malignant tumors and between different histological subtypes of papillary cancer.
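The reduction from reader judgments to study-level outcomes can be illustrated with a short sketch. The function names, the dictionary layout, and the toy data below are assumptions made for illustration; only the "at least one reader" and "all three readers" dichotomies come from the text.

```python
def combine_readers(judgments):
    """Collapse per-reader calls into study-level flags.

    judgments: dict mapping a study id to a tuple of three booleans,
    one per reader (True = nodule judged present).
    """
    summary = {}
    for study_id, calls in judgments.items():
        positives = sum(calls)  # 0, 1, 2, or 3 positive judgments
        summary[study_id] = {
            "n_positive": positives,
            "any_reader": positives >= 1,
            "all_readers": positives == len(calls),
        }
    return summary

def detection_rate(summary, key):
    """Fraction of studies for which the chosen dichotomy is positive."""
    flags = [s[key] for s in summary.values()]
    return sum(flags) / len(flags)

# Hypothetical toy data covering the four possible outcomes.
toy = {
    "study_A": (False, False, False),
    "study_B": (True, False, False),
    "study_C": (True, True, False),
    "study_D": (True, True, True),
}
toy_summary = combine_readers(toy)
any_rate = detection_rate(toy_summary, "any_reader")   # 3 of 4 studies
all_rate = detection_rate(toy_summary, "all_readers")  # 1 of 4 studies
```

Applied to the real data, three judgments for each of 559 studies (1677 in total) would collapse to 559 study-level outcomes, which is the unit used in the significance tests.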
Because by definition the nodules were not initially perceived on the prior studies, the images from those studies show at best only a random slice through the nodule, not a slice chosen to show the true nodule size. Thus, we could not directly obtain a growth rate by comparing nodule size on the index and the comparison studies; therefore, we attempted to evaluate the growth dynamics of thyroid nodules by testing whether the nodules detected on prior studies tended to differ more in size from the index nodule size (percent change) with increasing time between the index study and prior study (days). The relationship between the percent change in size and the number of days was evaluated using a scatterplot, calculating means (standard deviations), and computing a Pearson product-moment correlation. Finally, we compared the sizes of benign and malignant nodules on prior studies using t-tests.
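A minimal sketch of this analysis is given below. The records are invented for illustration, and the definition of percent change used here (relative to the estimated prior size) is an assumption, since the text does not spell out the formula.

```python
from math import sqrt

def percent_change(index_mm, prior_mm):
    """Percent change in diameter from prior to index study.

    Assumes the change is expressed relative to the prior size.
    """
    return 100.0 * (index_mm - prior_mm) / prior_mm

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical records: (days between studies, prior mm, index mm).
records = [
    (700, 5.0, 6.0),
    (730, 4.0, 6.0),
    (1400, 6.0, 7.0),
    (1460, 5.0, 7.5),
    (2100, 4.0, 5.0),
]
days = [rec[0] for rec in records]
changes = [percent_change(rec[2], rec[1]) for rec in records]
r = pearson_r(days, changes)
```

A value of `r` near zero would indicate that the size difference does not grow with the interval between studies, which is the pattern the analysis was designed to test for.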
The reviewers were unable to distinguish between true prior studies and controls. At least one reviewer judged the comparison study to be from the same patient as the index study in 69.9% (172/246) of the true prior studies, compared with 63.3% (198/313) for the control studies, a difference that was not significant (Fisher's exact test, p=0.34).
There were no significant differences among the three reviewers in the rate of nodule detection in either true prior studies or control studies (15.9%, 19.1%, and 16.3% for true prior studies, and 2.9%, 4.8%, and 4.8% for control studies, respectively); pair-wise comparisons of rates by McNemar's Q-tests were all nonsignificant, with p values ranging from p<0.06 to p<1.0.
In 24.0% (59/246) of the true prior studies, at least one reviewer saw a nodule in the same location as the index nodule, compared with 8.3% (26/313) in the control studies, a significant difference (Fisher's exact test, p<0.0001) (Table 1). All three reviewers identified a prior nodule in 11.0% (27/246) of the true prior studies, compared with 1% (3/313) of the controls (p<0.0001).
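The comparison can be checked from the reported counts with a small implementation of the two-sided Fisher's exact test. This is a sketch using only the Python standard library, not the software the authors used; the function name and the tie tolerance are assumptions.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

# At least one reviewer: 59 of 246 true priors vs. 26 of 313 controls.
p_any = fisher_exact_two_sided(59, 246 - 59, 26, 313 - 26)

# All three reviewers: 27 of 246 true priors vs. 3 of 313 controls.
p_all = fisher_exact_two_sided(27, 246 - 27, 3, 313 - 3)
```

Both `p_any` and `p_all` fall below 0.0001, consistent with the significance levels reported in the text.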
A repeated measures analysis of variance was used to examine whether there was nonindependence among the four trials for each index case. True prior/control was a significant effect (p<0.0001), whereas index case cluster was not (p<0.4786), and the variance (mean square error) associated with index case cluster was very small, about 3% of total variance, suggesting that statistical nonindependence is negligible. The distribution of machine types between true priors and controls was roughly similar: Terason, 12.8% versus 6.1%; Toshiba, 21.4% versus 16.6%; Hitachi, 61% versus 75.3%; GE, 4.8% versus 2.0%. A repeated measures ANOVA again showed a true prior/control effect (p<0.0004), but no effect of machine (p<0.33) or machine–group interaction (p<0.92).
There was no difference in the frequency with which benign and malignant nodules were identified as being present in the true prior studies. One or more reviewers saw a prior nodule for 22.4% (17/76) of cancers compared with 24.7% (42/170) of benign nodules, a difference that was not statistically significant (Fisher's exact test, p=0.411) (Table 2). There were no significant differences in detection rate among the papillary cancer subtypes.
Table 3 presents the number, mean diameters, and standard deviations for benign and malignant nodules at the time of diagnosis on the index studies. There were no significant differences in nodule size between benign and malignant nodules or among histologic subtypes of papillary cancer at the time of diagnosis.
The linear correlation between the percent change in nodule diameter and the time interval was not significantly different from zero (r=0.01, NS). That is, the change in nodule diameter did not vary with the length of the interval between the prior study and the index study (Fig. 1). There was no significant difference in estimated size between benign and malignant nodules on the prior studies.
We found that in a large-scale thyroid-screening program of people exposed to I-131 following the Chornobyl nuclear accident, an initially undetected nodule was identified in retrospect by at least one reader in 24.0% of studies.
We found no evidence for rapid growth in the undetected nodules. Although this study does not allow a direct assessment of growth rate, we found no trend for the percent change between the index size and the prior size to increase with increasing time between index and prior studies, suggesting that the undetected nodules had an indolent growth pattern. This is in accordance with previous studies of the natural history of thyroid nodules in other irradiated and nonirradiated populations (10–13). We found no evidence to suggest a difference in growth rate between benign and malignant nodules, which is also in accordance with previous studies (12–14). In a previous UkrAm study we found the histologically more aggressive solid subtype of papillary cancer to be more conspicuous on ultrasonography than the other subtypes (15). In this study there were no cases in which a solid subtype could be identified in retrospect on prior studies. However, the number of cases (n=2) was exceedingly small and the statistical power was too low to find statistical significance.
Ultrasonography is the de facto gold standard for nodule detection in the thyroid (16–18). Although there have been many studies correlating sonographic findings with subsequent cytologic or histologic findings (19–23), we were unable to find any studies that address the overall sensitivity of thyroid sonography for nodule detection in relation to pathologic examination, and we could find no study combining sequential ultrasonography examinations and retrospective review in the fashion we did here.
It should be emphasized that the goal of the study was to improve our understanding of the natural history of thyroid nodules in a radiation-exposed cohort subject to repeated screenings, not to evaluate what might be practically attainable in a screening program. Because the study used subsequent information to increase sensitivity, the results are an indication of what the screening studies overlooked, but are not necessarily an indication of what screening could reasonably have detected at the time the studies were done. It is very possible that the degree of sensitivity we attained in this study could be achieved in a screening setting only at the cost of unacceptably low specificity.
Strengths of the study include the fact that the original screening examinations were done in a large population using a strictly defined protocol and that the retrospective review was designed to maximize sensitivity while testing for and minimizing the possibility of observer bias.
The major limitation of the current study stems from the limited number of images available from previous studies. It is likely that in many cases the comparison images did not show exactly the same portion of the gland as the index images, and since most of the lesions were small, the true incidence of undetected nodules is likely to be considerably higher than the rates we report here. A second limitation is that a number of different ultrasound machines were used in the screening, with potentially different sensitivities for nodule detection. Further, the image quality available at the time the studies were done was lower than that available with current state-of-the-art equipment, and therefore the results may not be directly applicable to future screening studies.
In summary, this retrospective review study found that in almost a quarter of those found to have an incident thyroid nodule following repeated screening examinations, the nodule was seen on a prior study by at least one reader. Thus, conclusions about the natural history of thyroid nodules following I-131 exposure based on screening studies must be interpreted with caution.
This research was supported by the Intramural Research Program of the U.S. National Cancer Institute, NIH, DHHS. The Department of Energy and the U.S. Nuclear Regulatory Commission have also contributed funds. The study team is grateful to the Louise Hamilton Kyiv Data Management Center of the University of Illinois at Chicago, supported in part by the U.S. NIH Fogarty International Center, and its head Oleksandr Zvinchuk, for database management.
The authors declare that no competing financial interests exist.