There is wide variation in the management of thyroid nodules identified on ultrasound imaging.
To quantify the risk of thyroid cancer associated with thyroid nodules based on their ultrasound characteristics.
Retrospective case-control study of patients who underwent thyroid ultrasound between January 1st, 2000 and March 30th, 2005. Thyroid cancers were identified through linkage with the California Cancer Registry.
8 806 patients underwent 11 618 thyroid ultrasound examinations during the study period, including 105 subsequently diagnosed with thyroid cancer. Thyroid nodules were common in patients diagnosed with cancer (97%) and patients not diagnosed with thyroid cancer (56%). Three ultrasound nodule characteristics, micro-calcifications (odds ratio [OR] 8.1 [95% CI 3.8, 17.3]), size greater than 2 cm (OR 3.6 [95% CI 1.7, 7.6]) and an entirely solid composition (OR 4.0 [95% CI 1.7, 9.2]), were the only findings associated with the risk of thyroid cancer. If a single characteristic is used as an indication for biopsy, most patients with thyroid cancer would be detected (sensitivity 0.88 [95% CI 0.80, 0.94]) with a high false positive rate (0.44 [95% CI 0.43, 0.45]) and a low likelihood ratio positive (2.0 [95% CI 1.8, 2.2]), and 56 biopsies would be performed per cancer diagnosed. If two characteristics were required for biopsy, the sensitivity and false positive rates would be lower (sensitivity 0.52 [95% CI 0.42, 0.62]; false positive rate 0.07 [95% CI 0.07, 0.08]), the likelihood ratio positive would be higher (7.1 [95% CI 6.2, 8.2]), and only 16 biopsies would be performed per cancer diagnosed. In comparison to performing biopsy of all thyroid nodules greater than 5 mm, adoption of this more stringent rule requiring two abnormal nodule characteristics to prompt biopsy would reduce unnecessary biopsies by 90%, while maintaining a low risk of cancer, 5 per 1000 patients, among those for whom biopsy is deferred.
Thyroid ultrasound could be used to identify patients who have a low risk of cancer for whom biopsy could be deferred. These findings should be validated in a large prospective cohort.
Ultrasound has replaced nuclear medicine as the most frequently used imaging test of the thyroid.1 The growth in the use of thyroid ultrasound by radiologists, endocrinologists and head and neck surgeons has led to the discovery of large numbers of asymptomatic thyroid nodules, which may occur in 50% or more of adults,2,3 as well as a rapid rise in the diagnosis of thyroid cancer.4 In contrast, clinically apparent thyroid cancer is rare, affecting 1/10,000 people annually, and fewer than 1% of individuals over the course of their lives.4–6 Because of the high prevalence of nodules and the rarity of symptomatic cancer, only a minority of thyroid nodules are malignant. Uncertainty about which nodules may harbor cancer and a lack of evidence-based management guidelines have resulted in a myriad of conflicting recommendations regarding which nodules warrant biopsy,6–17,18,19–21 frequent thyroid biopsies, and the over-diagnosis of thyroid cancers that would otherwise likely have remained asymptomatic in the absence of detection.4,22,23
While many studies have analyzed the association between the ultrasound characteristics of thyroid nodules and the risk of thyroid cancer, most studies are small and all limited their analysis to patients who underwent biopsy, where the decision to biopsy was influenced by the ultrasound result.6–17,18,19–21 This ascertainment bias will overestimate the risk of cancer associated with thyroid biopsy and the accuracy of ultrasound.24–26 The information most important to patients and providers is the risk of cancer associated with a nodule with a particular imaging characteristic, and no prior publication can accurately provide this information. This has hindered the development of an evidence-based strategy for determining which nodules should be biopsied because of an elevated cancer risk. The purpose of this study was to determine the ultrasound characteristics that are associated with cancer, and to use this information to create a standardized system for interpreting thyroid ultrasound.
We conducted a retrospective, case-control study at the University of California San Francisco (UCSF), including consecutive patients who underwent thyroid ultrasound between January 1st, 2000 and March 30th, 2005. A waiver of patient informed consent was obtained. Patients were excluded if they had a prior unilateral or bilateral thyroidectomy for benign or malignant disease.
Cancers in the cohort were identified through linkage with the California Cancer Registry (CCR), a population-based cancer registry collecting cancer incidence and mortality data for all of California.27 The Registry is a collaboration between the Cancer Surveillance Section of the California Department of Public Health, the Public Health Institute, and eight regional cancer registries that, by legislative mandate, have collected cancer incidence data from hospitals and other facilities across the state since 1988. The registry is certified by the North American Association of Central Cancer Registries (NAACCR) as meeting their highest standard for completeness of cancer ascertainment, reflecting capture of over 97% of cancers diagnosed in the state.28 We included thyroid cancers diagnosed through March 30, 2007, allowing a minimum 2 years of follow-up after the last enrolled patient’s ultrasound, during which a cancer could be diagnosed, and at least 2 years further follow-up to ensure reporting to the registry.27 Patients diagnosed with non-thyroid malignant neoplasms (other than skin cancer) were excluded to prevent the inclusion of the rare, but theoretically possible, metastatic cancer to the thyroid, as these metastatic cancers would not be captured by the cancer registry. All patients diagnosed with cancer (cases, Table 1) and a sample of patients not diagnosed with cancer (controls), matched four to one to the cancer patients on age, sex and year of the ultrasound exam, were selected for detailed review of the sonogram.
We retrieved and reviewed the ultrasound examinations on 96 (91%) of the cancer patients on the Radiology PACS (Picture Archiving and Communication System) and 369 controls. Each ultrasound examination was reviewed independently by two board-certified radiologists blinded to cancer status. Disagreement was resolved by consensus. For each patient, each reviewer independently recorded the number, size and characteristics of all nodules >5 mm. There was good to outstanding agreement (kappas 0.73 to 1.0) in the categorization of the specific ultrasound characteristics.
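As an illustration of how agreement between two reviewers can be summarized, the following is a minimal sketch of Cohen's kappa for a single binary characteristic. It assumes the reported kappas are Cohen's kappa for two raters, which the text does not state explicitly; the function name and the ten example readings are hypothetical and are not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical readings of ten nodules
# (1 = micro-calcifications recorded, 0 = not recorded).
reader_1 = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
reader_2 = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
print(round(cohens_kappa(reader_1, reader_2), 2))
```

In this made-up example the readers agree on 9 of 10 nodules, yet kappa is lower than 0.9 because part of that agreement is expected by chance.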
In patients selected as controls, all nodules were considered benign. In 43 (44.8%) of cancer patients, a single nodule was identified and it was considered malignant. In 50 cancer patients (52.1%), multiple thyroid nodules were identified. To ensure attribution of cancer to the correct nodule, one of the authors was un-blinded and patient records (radiology, pathology, surgery) were reviewed to determine which nodules were malignant. In the small number of cases in which we were unable to determine which nodule harbored cancer, all nodules were considered malignant. Nodules in patients never diagnosed with thyroid cancer (n=428) and benign nodules in cancer patients (n=87) were combined to create our final control group of benign nodules (n=515). Note that 3 cancer patients (3.1%) did not have any nodules > 5 mm identified on their ultrasound imaging.
We compared mean age, age group, sex and year of study between patients diagnosed with cancer and controls. We used the Chi-square test to determine whether the number of nodules varied by age group. We performed single predictor modeling to assess the association between specific ultrasound characteristics and cancer status using Generalized Estimating Equations (GEE), with a compound symmetry (exchangeable) correlation structure to account for the correlated outcomes between multiple nodules within a patient. For variables that were statistically significant in a single predictor model, we calculated diagnostic accuracy statistics (sensitivity, specificity, likelihood ratios, predictive values).
To build the GEE models, we added variables that were statistically significant in single predictor models one at a time in the order of the effect size. Variables were retained if the associated p-value after inclusion was < 0.10 for that variable. The ultrasound characteristics that were retained in the final multiple-predictor model (micro-calcifications, size ≥ 2 cm and solid composition, Figure 1) were combined in various ways, via logical “and/or” criteria, to define an “abnormal ultrasound interpretation.” The risk of cancer (predictive values) associated with each definition of an abnormal ultrasound was calculated, accounting for the sampling strategy in the entire cohort. The positive predictive value (PPV) is the risk of cancer for a patient who is found to have an abnormal ultrasound interpretation, and the negative predictive value (NPV) is the probability of being cancer free if the ultrasound is negative. For each definition of an abnormal ultrasound, we calculated the number of cancers missed per 1000 ultrasounds performed. The number of patients needed to undergo a biopsy (NNTB) to detect a single cancer was defined as the inverse of the PPV. We performed several sensitivity analyses to determine whether implicit assumptions in the primary analysis were reasonable. More details on the analysis are provided in the online Appendix.
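The definition of NNTB as the inverse of the PPV can be checked with a short calculation. The sketch below uses the rounded PPVs reported in the Results (1.8% when any one characteristic prompts biopsy, 6.2% when two are required); the function name is hypothetical, and because the inputs are rounded the output is only an approximation of the published figures.

```python
def biopsies_per_cancer(ppv):
    """Number needed to biopsy (NNTB): the inverse of the PPV."""
    if not 0 < ppv <= 1:
        raise ValueError("ppv must be a proportion in (0, 1]")
    return 1 / ppv


# Rounded PPVs taken from the Results section of this article.
reported_ppv = {
    "any one characteristic": 0.018,
    "two characteristics": 0.062,
}

for rule, ppv in reported_ppv.items():
    print(f"{rule}: about {round(biopsies_per_cancer(ppv))} biopsies per cancer")
```

With these inputs the calculation gives roughly 56 and 16 biopsies per cancer diagnosed, in line with the reported values.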
8 806 patients underwent 11 618 thyroid ultrasounds during the study period including 105 diagnosed with thyroid cancer (incidence 0.9 cancers per 100 ultrasound examinations). The cancers were diagnosed 1 day to 6.1 years following ultrasound imaging, and among control patients, there was a mean follow up of 4.2 years (range 2.0 – 10.9). There were no significant differences in the matching variables between cases and controls.
Thyroid nodules were common among patients diagnosed with thyroid cancer (96.9%) as well as patients not diagnosed with thyroid cancer (56.4%), Table 3. Among the 96 cases, 102 malignant nodules and 87 benign nodules were identified, with an increase in the number of nodules seen with advancing age. Among the 372 controls, 428 benign nodules were identified and the number of nodules did not vary with age.
Several ultrasound findings were significantly associated with the odds of a nodule harboring cancer, Table 4. Micro-calcifications had the strongest association with cancer; 38% of cancer nodules vs. 5% of benign nodules had micro-calcifications, reflecting approximately a 7 fold increase in the likelihood of cancer if micro-calcifications were seen (likelihood ratio positive 7.0 [95% CI 6.0, 8.2]) and a 30% reduction in the likelihood of cancer if micro-calcifications were not seen (likelihood ratio negative 0.65 [95% CI 0.56, 0.76]). The corresponding odds ratio was 11.6 (95% CI 6.5, 20). Coarse calcifications, nodule composition, nodule echogenicity, central vascularity, margins and shape were also each significantly associated with cancer, but the magnitude of association was smaller, with odds ratios ranging from 1.6 to 2.8. Rim calcifications, comet tail artifacts, peripheral vascularity and the presence of a halo were not associated with the likelihood of cancer. The odds of cancer increased with nodule size, and the largest nodules had the greatest odds of cancer (likelihood ratio 1.8 [95% CI 1.5, 2.1] and OR 3.1 [95% CI 1.8, 5.2] for nodules > 2 cm compared with nodules under 1 cm). Simple cysts never reflected cancer.
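The likelihood ratios quoted for micro-calcifications follow from the proportions of malignant and benign nodules showing the finding. The sketch below applies the standard definitions to the rounded percentages in the text (38% and 5%); the function name is hypothetical. Because the inputs are rounded, the positive likelihood ratio comes out near 7.6 rather than the published 7.0, which was presumably computed from unrounded data.

```python
def likelihood_ratios(sensitivity, false_positive_rate):
    """Return (LR+, LR-) for a binary finding.

    sensitivity: proportion of malignant nodules with the finding
    false_positive_rate: proportion of benign nodules with the finding
    """
    lr_positive = sensitivity / false_positive_rate
    lr_negative = (1 - sensitivity) / (1 - false_positive_rate)
    return lr_positive, lr_negative


# Rounded proportions for micro-calcifications quoted in the text.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.38, false_positive_rate=0.05)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

The negative likelihood ratio from these rounded inputs is about 0.65, matching the reported value.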
Only three nodule characteristics were significantly associated with the risk of cancer in the multiple predictor modeling: micro-calcifications (OR 8.1 [95% CI 3.8, 17.3]), size greater than 2 cm (OR 3.6 [95% CI 1.7, 7.6]), and an entirely solid composition (OR 4.0 [95% CI 1.7, 9.2]), Table 5. The remaining nodule characteristics were not significantly associated with the risk of cancer, and including them in the definition of an abnormal nodule added less than 2% cancer detection.
The accuracy of the several definitions of an abnormal ultrasound is provided in Table 6. If any one of the three characteristics is used to prompt biopsy, most patients with thyroid cancer would be detected (sensitivity 0.88 [95% CI 0.80, 0.94]) at a false positive rate of 0.44 (95% CI 0.43, 0.45). The high false positive rate of this approach is reflected in a low PPV (i.e., risk of cancer) of 1.8% (95% CI 1.5%, 2.2%) when a single characteristic is used to prompt biopsy, and 56 biopsies would be required per cancer diagnosed. If two abnormal ultrasound characteristics were required to prompt biopsy, the sensitivity and false positive rates would be lower (sensitivity 0.52 [95% CI 0.42, 0.62]; false positive rate 0.07 [95% CI 0.07, 0.08]), the risk of cancer in those with a suspicious ultrasound would be higher (PPV 6.2% [95% CI 4.7%, 8.7%]), and fewer biopsies, 16, would be required per cancer diagnosed. In comparison to existing guidelines that suggest biopsy of all thyroid nodules greater than 5 mm,7,8 adoption of this more stringent rule requiring two abnormal characteristics to prompt biopsy would reduce unnecessary biopsies by 90%, while maintaining a low risk of cancer in patients in whom biopsy is deferred (i.e., 5 cancers per 1000 ultrasound examinations, 0.5%).
The most specific definition of an abnormal ultrasound is one requiring all three abnormal characteristics to prompt biopsy. This definition would detect only a small proportion of cancers (sensitivity 0.07 [95% CI 0.03, 0.14]), but would have a high likelihood ratio positive of 28 (95% CI 23, 34).
The tradeoff between the different definitions of an abnormal ultrasound and test accuracy is shown in Figure 2. As the number of criteria required to prompt biopsy increases, the number of missed cancers (false negatives) increases, and the number of patients who will be biopsied in order to detect a cancer will decrease. For example, if two criteria instead of one are required to prompt biopsy, the rate of missed cancers among patients who do not undergo biopsy increases from 2 to 5 per 1000 ultrasound examinations, while the number of biopsies needed to detect a cancer decreases from 56 to 16.
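The tradeoff described above can be reproduced approximately from the reported sensitivity, false positive rate and cancer incidence. The following is a minimal sketch that applies Bayes' rule, assuming a prevalence of 105 cancers per 11 618 ultrasound examinations (the incidence reported in the Results). It is not the authors' analysis, which accounted for the case-control sampling directly, and the function name is hypothetical.

```python
def predictive_values(sensitivity, false_positive_rate, prevalence):
    """Return (PPV, risk of cancer after a negative exam) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = (1 - false_positive_rate) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    missed = false_neg / (false_neg + true_neg)  # equals 1 - NPV
    return ppv, missed


# Assumption: prevalence per ultrasound examination, from the reported counts.
PREVALENCE = 105 / 11618

# Rounded sensitivity and false positive rate for each biopsy rule.
rules = {
    "one characteristic": (0.88, 0.44),
    "two characteristics": (0.52, 0.07),
}

for rule, (sens, fpr) in rules.items():
    ppv, missed = predictive_values(sens, fpr, PREVALENCE)
    print(f"{rule}: PPV {ppv:.1%}, missed cancers {missed * 1000:.1f} per 1000")
```

With rounded inputs, the missed-cancer rates come out close to the 2 and 5 per 1000 quoted above, and the PPVs are close to the reported 1.8% and 6.2%; small differences reflect rounding of the published sensitivity and false positive rate.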
The risk of cancer based on the ultrasound appearance of the thyroid is shown in Table 7. The risk of cancer is low among patients with a homogeneous thyroid, where no nodules were identified (0.60 cancers per 1000). The risk of cancer is also low in patients where the only ultrasound characteristic is a simple cyst (0.32 cancers per 1000).
If the presence of a single abnormal characteristic is used to define an abnormal ultrasound, patients with a normal examination will have a risk of cancer of 2 per 1000, whereas patients with an abnormal exam will have a risk of cancer of 18 per 1000. If two or more characteristics are required to define an ultrasound exam as abnormal, patients with a negative exam will have a risk of cancer of 5 per 1000, and patients with an abnormal exam will have a risk of cancer of 62 per 1000, putting them in a moderate risk category. Micro-calcifications are the most predictive characteristic and are associated with a cancer risk of 82 per 1000. If an abnormal ultrasound is defined as one where micro-calcifications or a solid mass greater than 2 cm is seen, 58 cancers will be diagnosed per 1000 patients. When a solid mass > 2 cm with micro-calcifications is seen, nearly all of these nodules harbor cancer (960 per 1000).
The results were robust across all of the sensitivity analyses, and changed little when we varied our primary assumptions in the analysis.
Thyroid nodules are extremely common. Even among patients selected as controls in our study, 56% had thyroid nodules greater than 5 mm, and nearly a third had multiple nodules. In contrast to previous reports that have suggested a prevalence of cancer in thyroid nodules as high as 23%, we found that only 1.6% of patients who had one or more thyroid nodules 5 mm or greater harbored cancer. Thus while thyroid nodules are common, the vast majority, 98.5%, are benign, highlighting the importance of being prudent in deciding which nodules should be sampled to reduce unnecessary biopsies.22 Unnecessary tissue sampling is not only invasive and costly, but leads to repeated sampling and unnecessary open surgical procedures, as up to one third of fine needle aspiration biopsies may be non-diagnostic, requiring open surgical biopsy for diagnosis.8,9,29,30 We found that only three ultrasound characteristics (micro-calcifications, size ≥2 cm, and entirely solid composition) were statistically significantly associated with the risk of cancer, and that when used in combination, these three characteristics could be used to help determine which nodules should be sampled. Simple cysts are essentially never malignant and should not be sampled.31
There are many ways to characterize the accuracy of ultrasound. We believe the risk of cancer, PPV, is the most relevant to patients and physicians, and ours is the first study that permits estimating this risk. A patient’s risk of harboring cancer is 2 per 1000 among patients whose thyroid ultrasound has none of the three characteristics identified; 18 per 1000 if a patient has a nodule with a single characteristic; 62 per 1000 if a patient has a nodule with two abnormal characteristics; and 960 per 1000 if a patient has a nodule with all three characteristics. While there is growing concern regarding over-diagnosis and over-treatment across all areas of medicine,22,32–34 there are no well-established guidelines on what risk is low enough that an imaging finding can be ignored. In other areas of diagnostic testing, for example when assessing patients at risk for acute coronary syndrome or breast cancer (diseases with higher morbidity and mortality than thyroid cancer), a risk of less than 1% or 0.5% is often considered sufficiently low that further evaluation is deemed unnecessary. If a thyroid cancer risk < 0.5% is considered acceptable for those in whom biopsy is deferred, using micro-calcifications or the combined observations of a large (≥2 cm) solid nodule as the only features to prompt biopsy is a good choice. In comparison with various guidelines that recommend biopsies in a larger number of patients,13 limiting biopsy to nodules that fulfill this definition would reduce the number of biopsies by as much as 90%, while maintaining a low cancer rate of 5/1000 among individuals who do not undergo thyroid sampling.
Most thyroid cancers have a favorable prognosis, with a 20-year survival greater than 97% seen even among patients who do not receive immediate treatment.10,23,34,35 Thus, given the favorable prognosis of most thyroid cancer even without treatment, a risk of cancer of 0.5% among those with a negative examination seems to strike a balance between detection and unnecessary tissue sampling. Ongoing ultrasound surveillance of patients with nodules that do not meet the criteria for biopsy is unlikely to prove beneficial, given that our results ascribe these patients a low risk of cancer for as long as 10 years following imaging.
Our study was designed to determine how to reduce unnecessary and excessive thyroid surveillance and biopsy. Our study does not provide evidence as to whether the detection of thyroid cancers will lead to improved patient outcomes. There has been a recent rise in the observed incidence of small and micro thyroid cancer 4,5,35 without a corresponding change in thyroid cancer mortality rate, raising the question as to whether there is benefit to the earlier diagnosis or treatment of incidental thyroid cancer.22,23,36,37
A large number of previous studies have assessed the risk of cancer associated with the ultrasound appearance of the thyroid.6,7,8,10,11–17,18,19–21,38–43 All previous studies will have inflated the association between nodule characteristics and cancer risk because they limited their analysis to nodules that underwent biopsy. For example, Ahn et al. compared various existing guidelines for prompting fine needle aspiration in a sample of 1398 patients who had undergone biopsy.13 In this sample, 20% of the included patients had cancer, contrasting with the 1.5% cancer rate in our study. They report that the PPV for cancer if a patient has micro-calcifications is 85.1%, whereas using our population-based approach without ascertainment bias, we found a PPV of 5.8%. We considered a large number of nodule characteristics endorsed by other authors,5,7–26 but when put into the multiple predictor models, most of the characteristics were not significantly associated with cancer risk.
It is widely reported that the number of benign thyroid nodules increases with age. We observed this relationship among patients diagnosed with cancer, but not among patients without cancer.
The main strength of our study is the large sample size and the linkage of the cohort with data from a comprehensive cancer registry, which allows accurate assessment of the true underlying prevalence of cancer. The analysis has several limitations. We did not have accurate information about why patients underwent imaging, and the risk of cancer may vary by the reason patients were sent for sonograms. We did not stratify the results by the histological type of cancer, although the majority of included cancers were papillary cancer, as is the case for thyroid cancer in general. There are several ultrasound features that we did not assess, such as extra-capsular growth or abnormal lymph nodes, but these are rare.11 We did not include the theoretical metastatic cancer to the thyroid, as these would not be captured in the cancer registry data. However, we also linked to the local pathology database, and no cases of metastatic cancer were identified in the thyroid biopsies included.
The increased utilization1 and improved technical quality of ultrasound have given rise to the detection of multiple morphologic characteristics, without clear criteria for what needs further evaluation,22 resulting in greater tissue sampling and excessive treatment.23,35 In mammography, the adoption of uniform interpretation standards through BI-RADS (Breast Imaging Reporting and Data System) has been useful in allowing comparative effectiveness work in breast imaging, and efforts to standardize the interpretation of mammograms. Similar adoption of uniform standards for interpreting thyroid sonograms would be a first step toward standardizing the diagnosis and treatment of thyroid cancer, and limiting unnecessary diagnostic testing and treatment.
Funding/Support: This study was supported by the National Cancer Institute R21CA131698 and K24 CA125036, and a University of California San Francisco Department of Radiology and Biomedical Engineering SEED grant. The content is solely the responsibility of the authors and does not represent the official views of the National Cancer Institute or the National Institutes of Health. The funding organizations had no role in the design and conduct of the study, data collection, management, analysis, or interpretation of the data; or preparation, review, and approval of the manuscript.
Role of the Sponsor: None
Conflict of Interest Disclosure: All authors have completed and submitted the ICMJE Form for Disclosure of Potential Conflicts of Interest and none were reported.
Author Contributions: Smith-Bindman had full access to all the data in this study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Smith-Bindman, Goldstein, Feldstein
Acquisition of data: Smith-Bindman
Analysis and interpretation of data: Smith-Bindman, Lebda, Feldstein, Sellami, Goldstein, Brasic, Jin, Kornak
Drafting of the manuscript: Smith-Bindman
Critical revision of the manuscript for important intellectual content: Smith-Bindman, Lebda, Feldstein, Sellami, Goldstein, Brasic, Kornak
Statistical analysis: Smith-Bindman, Jin, Kornak
Obtained funding: Smith-Bindman, Sellami
Administrative, technical, or material support: Smith-Bindman, Lebda, Sellami, Brasic, Feldstein, Goldstein, Jin, Kornak
Study supervision: Smith-Bindman.
Additional Contributions: We thank the following people for their valuable assistance in gathering data for this study: Phillip Chu, and the Northern California Cancer Registry. The collection of cancer incidence data used in this study was supported by the California Department of Public Health as part of the statewide cancer-reporting program.