The purpose of our study was to examine the accuracy of short-interval follow-up mammograms and evaluate patient and radiologist characteristics associated with accuracy.
We evaluated 45,007 initial short-interval follow-up mammograms from the Breast Cancer Surveillance Consortium interpreted 3–9 months after a probably benign assessment on a screening or diagnostic examination between 1994 and 2004. We linked these mammograms with patient characteristics and breast cancer diagnoses within 12 months. A subset of short-interval follow-up examinations (n = 13,907) was merged with radiologist characteristics collected from survey data from 130 interpreting radiologists. Using logistic regression, we fit generalized estimating equations to model sensitivity and specificity of short-interval follow-up mammograms by patient and radiologist characteristics.
For every 1,000 women, 8.0 women (0.8%) were diagnosed with breast cancer within 6 months and 11.3 (1.1%) within 12 months. Sensitivity was 83.3% (95% CI, 79.4–87.3%) for cancers diagnosed within 6 months and 60.5% (56.2–64.7%) for those diagnosed within 12 months. Specificity was 97.2% (96.9–97.6%) at 6 months and 97.3% (96.9–97.6%) at 12 months. Sensitivity at 12 months increased among women with unilateral short-interval follow-up mammograms (odds ratio, 1.56 [95% CI, 1.06–2.29]) and when the interpreting radiologist spent more than 10 hours a week in breast imaging (odds ratio, 3.25 [1.00–10.52]).
Initial short-interval follow-up mammography examinations had a lower sensitivity for detecting breast cancer within 12 months than other diagnostic mammograms (61% for short-interval follow-up vs 80% for diagnostic mammograms reported in the literature). However, sensitivity within the 6-month interval that is usually recommended for subsequent follow-up was 83%. Accuracy of short-interval follow-up mammograms was influenced by few patient and radiologist characteristics.
Women with probably benign abnormalities noted on screening or diagnostic mammograms often receive recommendations to obtain a diagnostic mammogram in 6 months. These diagnostic mammograms, or short-interval follow-up examinations, are surprisingly common. Short-interval follow-up examinations are most often recommended for women with a BI-RADS category 3, probably benign, mammogram assessment after a full diagnostic workup [2, 3]. Because probably benign lesions have a low probability of malignancy (< 2%) [1–8], they are recommended for periodic short-interval follow-up surveillance rather than immediate biopsy to avoid unnecessary invasive procedures, patient anxiety, and medical costs [9–12]. Numerous previous studies have shown that periodic short-interval follow-up surveillance appropriately detects visible changes in probably benign lesions that represent small malignant tumors at an early stage and have a favorable prognosis [1, 2, 5–8].
Despite the frequency of short-interval follow-up examinations in clinical practice, few data, such as sensitivity and specificity, are available on the performance characteristics of these examinations. Our study was stimulated by data available on the Website of the National Cancer Institute provided by the Breast Cancer Surveillance Consortium (BCSC). One table on this Website shows performance characteristics of diagnostic mammography examinations in the United States. The reported sensitivity of short-interval follow-up examinations was substantially lower than that of other diagnostic examinations (55.8% vs 79.8%) [1, 13]. We found this lower sensitivity of concern because it suggests that nearly one half of breast cancers are missed on these short-interval follow-up examinations.
The purpose of this study was to examine the accuracy of short-interval follow-up examinations and factors that might affect interpretive performance relative to other diagnostic examinations. We used detailed patient information from the BCSC linked to radiologist information from a self-administered survey to describe the sensitivity and specificity of short-interval follow-up mammograms. We also evaluated the association between the performance of short-interval follow-up mammograms and patient and radiologist characteristics in clinical practice. Patient characteristics, such as age and breast density, have been associated with the interpretive performance of both screening and diagnostic mammography examinations [14–16]. We do not know whether patient characteristics are associated with the accuracy of short-interval follow-up examinations, and this information could be used to determine which women would be best suited for periodic short-interval follow-up surveillance. Previous studies have also shown that radiologist characteristics, such as number of years in practice, workload, and type of training, are associated with interpretive performance of screening examinations [17–20]. Therefore, we explored whether any radiologist factors predict better performance when interpreting short-interval follow-up examinations.
This study used data from seven mammography registries that are part of the BCSC: Group Health Cooperative, Western Washington; Colorado Mammography Project, Colorado; New Hampshire Mammography Network, New Hampshire; Carolina Mammography Registry, North Carolina; New Mexico Mammography Project, New Mexico; San Francisco Mammography Registry, California; and Vermont Breast Cancer Surveillance System, Vermont. We also collected radiologist information via a survey of three of the registries: Group Health Cooperative, New Hampshire Mammography Network, and Colorado Mammography Project. A statistical coordinating center pooled data for analysis. Each registry and the statistical coordinating center received institutional review board approval to enroll participants, link data, and perform analytic studies. All procedures were HIPAA-compliant, and all registries and the statistical coordinating center have received a federal certificate of confidentiality that protects the identities of research subjects.
We included mammograms from 1994 to 2004 in this study if the interpreting radiologist indicated that the examination was a short-interval follow-up. We included each woman’s first unilateral or bilateral diagnostic mammography examination that was classified as short-interval follow-up (n = 110,942 women). We excluded women with a history of breast cancer (n = 7,271) because the recommendation for and interpretation of short-interval follow-up mammograms may be different for these women. We made several additional exclusions to ensure that our analysis included only initial short-interval follow-up mammograms. BI-RADS guidelines recommend short-interval follow-up mammograms only after a probably benign assessment; therefore, we excluded women with examinations without a previous assessment of probably benign in the BCSC database (n = 25,490). Because initial short-interval follow-up examinations normally occur about 6 months after the previous mammogram or sonogram, we also excluded women who had an examination less than 3 months or more than 9 months before (n = 31,487). Finally, we excluded women with short-interval follow-up examinations that were missing a final assessment (n = 445) and those with unknown laterality of the mammogram (n = 1,242), for a total sample size of 45,007 short-interval follow-up mammograms.
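The exclusion cascade described above can be checked with simple arithmetic. The following is a minimal sketch that subtracts each reported exclusion from the starting cohort; the counts are those given in the text, whereas the variable names and short labels are illustrative only.

```python
# Counts are taken from the text; names and labels are illustrative.
initial_cohort = 110_942  # first short-interval follow-up examination per woman

exclusions = [
    ("history of breast cancer", 7_271),
    ("no previous probably benign assessment in the BCSC database", 25_490),
    ("previous examination < 3 or > 9 months earlier", 31_487),
    ("missing final assessment", 445),
    ("unknown laterality", 1_242),
]

remaining = initial_cohort
for reason, n_excluded in exclusions:
    remaining -= n_excluded
    print(f"after excluding {reason}: {remaining:,}")

final_sample = remaining
print(f"final sample: {final_sample:,}")
```

The running total ends at the 45,007 examinations reported as the analytic sample.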
We linked each short-interval follow-up mammogram to the radiologist’s final BI-RADS assessment and recommendation within 180 days after the completion of all imaging workup. We also linked each mammogram to a Surveillance Epidemiology and End Results (SEER) registry, state cancer registry, or local benign and malignant pathology database to determine whether a cancer diagnosis had been made within 365 days of the short-interval follow-up examination.
We obtained patient information by linking to standardized questionnaires completed by women at the time of the short-interval follow-up examination. The questionnaires collected information on demographics (e.g., age and race), breast cancer risk factors (e.g., family history of breast cancer, breast symptoms, menopausal status, and current use of hormone therapy), and clinical history of previous breast procedures (e.g., biopsy).
Radiologists were eligible to complete a self-administered survey if they interpreted mammograms in the year 2001 at one of three participating sites: Group Health Cooperative, New Hampshire Mammography Network, or Colorado Mammography Project. Of the 181 surveys mailed in 2002, 139 surveys were returned, for a response rate of 77%; 130 of these radiologists interpreted one or more short-interval follow-up mammograms from 1994 to 2004, making them eligible for the present study. A detailed description of the radiologist survey has been published elsewhere. Briefly, we collected demographic and clinical characteristics of radiologists, including age, sex, academic affiliation, number of years spent working in breast imaging, workload, percentage of time spent in diagnostic imaging, number of breast procedures conducted, and malpractice history. We double-entered all survey data at each site and transferred the data file via secure file transfer protocol to the statistical coordinating center. We linked these survey responses to a subgroup of the short-interval follow-up mammograms described previously that were interpreted by radiologists with survey data (n = 13,907). This subanalysis did not link to all 45,007 short-interval follow-up mammograms because we included mammograms in the main analyses that were interpreted by radiologists who did not complete the survey.
We evaluated rates of abnormal interpretation and breast cancer diagnosis among all short-interval follow-up examinations and for each category of patient and radiologist characteristics. We then evaluated the sensitivity and specificity among all examinations and for each category of patient and radiologist characteristics using the standard 12-month follow-up interval for detecting breast cancer. Because guidelines for short-interval follow-up examinations suggest that women should return for a second short-interval follow-up examination after 6 months, we also calculated sensitivity and specificity using a 6-month outcome interval. Positive examinations were defined as those with a final BI-RADS assessment of 4, 5, or 0 with a recommendation for biopsy after the completion of all diagnostic workup. Negative examinations were defined as those with a final BI-RADS assessment of 1, 2, 3, or 0 without recommendation for biopsy. We determined final BI-RADS assessments by looking for the first non-0 assessment from additional imaging within 180 days of the short-interval follow-up mammogram; however, some 0 assessments remained unresolved. We defined the abnormal interpretation rate as the number of positive examinations divided by the total number of examinations. The definitions we used for sensitivity and specificity are as follows:
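Written out under the standard audit definitions, and consistent with the positive and negative examination definitions given above, sensitivity and specificity for a follow-up interval t (6 or 12 months) take the form below. The symbols TP, FN, TN, and FP are shorthand introduced here for illustration and are an assumption about notation rather than the original typeset equations.

```latex
\[
\text{Sensitivity}_t = \frac{TP_t}{TP_t + FN_t},
\qquad
\text{Specificity}_t = \frac{TN_t}{TN_t + FP_t}
\]
% TP_t: positive examinations with a breast cancer diagnosis within t
% FN_t: negative examinations with a breast cancer diagnosis within t
% TN_t: negative examinations with no breast cancer diagnosis within t
% FP_t: positive examinations with no breast cancer diagnosis within t
```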
The 6- and 12-month definitions were not mutually exclusive (i.e., cancers included in the 6-month definition were also included in the 12-month definition). We matched cancer laterality among unilateral short-interval follow-up mammograms so that cancers diagnosed only in the same breast as the unilateral examination were counted as a cancer diagnosis. If a cancer was diagnosed in the opposite breast, it was not counted as a cancer diagnosis for the calculation of sensitivity, and instead was included as a negative examination in the calculation of specificity. We calculated 95% CIs for sensitivity and specificity using the robust variance estimates from generalized estimating equations (GEE) with an exchangeable correlation structure to account for correlation within radiologists who interpreted more than one short-interval follow-up mammogram.
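The classification and laterality-matching rules can be sketched in a few lines of code. The example below is a minimal illustration, not the analysis code used for the study: the `Exam` record, its field names, and the toy data are hypothetical, and each cancer is assumed to be recorded on a single side.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exam:
    """Hypothetical record for one short-interval follow-up examination."""
    final_birads: int           # final assessment category, 0-5
    biopsy_recommended: bool
    exam_side: str              # "left", "right", or "bilateral"
    cancer_side: Optional[str]  # side of a cancer diagnosed within the interval, else None


def is_positive(exam: Exam) -> bool:
    # Positive: BI-RADS 4 or 5, or an unresolved 0 with a biopsy recommendation.
    if exam.final_birads in (4, 5):
        return True
    return exam.final_birads == 0 and exam.biopsy_recommended


def has_matched_cancer(exam: Exam) -> bool:
    # A cancer counts only if it is in a breast that was actually examined.
    if exam.cancer_side is None:
        return False
    if exam.exam_side == "bilateral":
        return True
    return exam.cancer_side == exam.exam_side


def sensitivity_specificity(exams):
    tp = fn = tn = fp = 0
    for exam in exams:
        positive, cancer = is_positive(exam), has_matched_cancer(exam)
        if cancer and positive:
            tp += 1
        elif cancer:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    # Assumes at least one examination with and one without a matched cancer.
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration only.
exams = [
    Exam(4, True, "left", "left"),         # true positive
    Exam(3, False, "left", "right"),       # opposite-breast cancer: counted as no cancer
    Exam(2, False, "bilateral", "right"),  # false negative
    Exam(0, True, "right", None),          # false positive
    Exam(1, False, "bilateral", None),     # true negative
]
sens, spec = sensitivity_specificity(exams)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

In the second toy record, a cancer in the opposite breast leaves the unilateral examination in the specificity denominator, which mirrors the rule stated above.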
Using logistic regression fit with GEE to account for correlation within radiologists, we modeled the odds of a positive examination given a cancer diagnosis (sensitivity) and the odds of a negative examination given no cancer diagnosis (specificity). We conducted univariate analyses to determine which covariates should be included in the multivariable models; those that were statistically significant (p < 0.05) in one or more models (either sensitivity or specificity) were included. The final models for the patient characteristics were adjusted for patient age (continuous), menopausal status and hormone therapy use (premenopausal, post-menopausal with no hormone therapy, post-menopausal with hormone therapy), mammogram laterality (bilateral vs unilateral), and breast density (categorized as BI-RADS density categories 3 and 4 [dense breasts] versus BI-RADS density categories 1 and 2 [fatty breasts]). The final models for the radiologist characteristics were adjusted for all patient characteristics just described and for self-reported radiologist characteristics, including age group (35–44, 45–54, ≥ 55 years old), academic affiliation (primary appointment vs affiliate or no appointment), hours per week spent in breast imaging (< 10 vs ≥ 10), number of mammograms interpreted in 2001 (≤ 1,000, 1,001–2,000, ≥ 2,001), percentage of mammograms interpreted that were diagnostic (≤ 25% vs 25–100%), and work full-time (yes vs no). We conducted all statistical analyses using Stata (StataCorp).
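In a logistic model, each coefficient is a log odds ratio, so an odds ratio and its 95% CI are obtained by exponentiating the coefficient and its Wald limits. The sketch below illustrates that relationship by back-calculating an approximate coefficient and standard error from the odds ratio reported for unilateral versus bilateral examinations (1.56 [1.06–2.29]). It assumes a Wald-type interval that is symmetric on the log scale; the robust standard errors themselves are not reported, so the recovered values are approximations, and the function names are illustrative.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_scale_estimate(odds_ratio, ci_low, ci_high):
    """Back-calculate the coefficient and standard error from an OR and its 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return beta, se


def odds_ratio_with_ci(beta, se):
    """Exponentiate a log odds ratio and its Wald limits."""
    return (
        math.exp(beta),
        math.exp(beta - Z_95 * se),
        math.exp(beta + Z_95 * se),
    )


beta, se = log_scale_estimate(1.56, 1.06, 2.29)
or_hat, low, high = odds_ratio_with_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"OR = {or_hat:.2f} (95% CI, {low:.2f}-{high:.2f})")
```

The rebuilt interval agrees with the reported one to two decimal places, which suggests the reported limits are close to symmetric on the log scale.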
The characteristics of the women who underwent 45,007 short-interval follow-up examinations from 1994 to 2004 and associated performance characteristics are shown in Table 1. Most short-interval follow-up mammograms occurred among women 40–59 years old (61%) and among women with no breast symptoms (76%). In addition, most women were postmenopausal, not taking hormone therapy, and had no family history of breast cancer. Approximately 63% of examinations were unilateral short-interval follow-up mammograms. The examinations were ordered to follow up on abnormalities noted on a previous screening mammogram (42%) or as follow-up after a diagnostic evaluation was completed (50% of the previous diagnostic evaluations were after screening examinations and 8% were diagnostic examinations to evaluate a breast problem). The average abnormal interpretation rate of short-interval follow-up examinations was 3.4%. Breast cancer (matched on laterality to the breast that underwent the short-interval follow-up examination) was diagnosed in 11.3 per 1,000 women (1.1%) within 12 months of their short-interval follow-up examinations; more than half of these cases (8.0 per 1,000 examinations, 0.8%) were diagnosed within 6 months.
The 12-month sensitivity of all short-interval follow-up examinations was 60.5% (95% CI, 56.2–64.7%); at 6 months this increased to 83.3% (79.4–87.3%). We noted trends in accuracy of the short-interval follow-up examinations by patient characteristics; however, none was statistically significant in unadjusted models. Crude sensitivity increased with patient age and among post-menopausal women not receiving hormone therapy. Sensitivity was also higher among unilateral short-interval follow-up examinations compared with bilateral examinations. The sensitivity was lower for women with extremely dense breasts and for women whose examination before the short-interval follow-up mammogram was obtained to evaluate a specific breast problem. The average specificity was 97.3% (97.0–97.6%) at 12 months and was unchanged when calculated at 6 months; specificity increased only among women with almost entirely fatty breasts (98.4%; 95% CI, 97.8–99.0%).
Adjusted odds ratios (ORs) for sensitivity and specificity by patient characteristics are shown in Table 2. Because trends in unadjusted sensitivity and specificity at 6 months were similar to those at 12 months, and the number of cancers at 6 months was small, we presented adjusted rates for 12 months only. Although sensitivity increased among postmenopausal women who did not use hormone therapy (compared with premenopausal women), confidence intervals were wide. Women who had unilateral mammograms had significantly increased sensitivity compared with those with bilateral examinations (OR, 1.56 [95% CI, 1.06–2.29]). Postmenopausal women (regardless of hormone therapy use) and women with breast symptoms had slightly lower specificity (and thus more false-positive examinations) than premenopausal women and women without symptoms, respectively. Women with dense breasts had significantly lower specificity than women with fatty breasts (OR, 0.78 [0.69–0.97]).
We present interpretive performance by radiologists’ characteristics in Table 3. The overall rates of abnormal interpretation, cancer diagnoses, sensitivity, and specificity for this subgroup of short-interval follow-up examinations were similar to those in the larger patient population obtaining these examinations described in Table 1. We noted trends in which unadjusted sensitivities increased among radiologists who had less than 10 years of experience interpreting mammograms (compared with ≥ 10), spent 10 or more hours per week in breast imaging (compared with < 10 hours), or were female (compared with male).
A few radiologist characteristics were associated with sensitivity and specificity (Table 4). Radiologists who spent 10 or more hours per week in breast imaging had increased sensitivity and specificity compared with those who spent less than 10 hours per week (sensitivity OR, 3.25 [1.00–10.52]; specificity OR, 1.35 [0.88–2.07]). Radiologists with an affiliate or no academic appointment had increased specificity compared with those with a primary academic appointment (OR, 2.51 [1.20–5.22]).
Diagnostic mammograms obtained as short-interval follow-up examinations to follow up a probably benign abnormality have a low sensitivity for detecting cancers diagnosed within the following 12 months (61%) compared with what previous studies have shown for other types of diagnostic mammograms in the same population (usually ≈ 80%) [1, 14]. Few patient or radiologist characteristics were statistically significantly associated with sensitivity.
To our knowledge, this is the first article to describe the accuracy of initial short-interval follow-up mammograms and to evaluate the accuracy by patient and radiologist characteristics. Sickles et al. [1, 13] showed a similarly low sensitivity at 12 months (for all short-interval follow-up examinations) in results that were posted on the public BCSC Website; these results were not published or discussed in the original articles. Our study population was also from the BCSC, but included one new study site and additional years of data beyond those included in Sickles’ articles. We also examined the influence of both patient and radiologist characteristics on sensitivity and specificity. These differences, together with our matching of the laterality of cancer diagnoses and examinations, might have accounted for the slightly higher sensitivity noted in our results (60.5% vs 55.8% in Sickles’ earlier article [1, 13]). Other previous studies have evaluated the accuracy of the examination producing the initial short-interval follow-up recommendation (i.e., the examination given the probably benign assessment), but these studies did not evaluate the sensitivity or specificity of the initial short-interval follow-up examinations themselves [4, 7].
The reason for the low 12-month sensitivity of initial short-interval follow-up examinations is unclear, but there are several possible explanations. First, cancers assessed as probably benign (BI-RADS category 3) may not grow as rapidly as cancers that appear more suspicious for malignancy (BI-RADS category 4 or 5). Therefore, it may be more difficult to identify interval change (hence, recommend biopsy) at initial short-interval follow-up examinations because these examinations usually are performed 6 months rather than 1 year after the index mammogram that prompted the short-interval follow-up. The rationale for recommending an initial short-interval follow-up examination is to identify 6 months earlier those poorer-prognosis “probably benign” cancers that do grow sufficiently rapidly to be detected early [10–12].
A second possible reason for the low sensitivity of initial short-interval follow-up examinations is that radiologists interpreting them might be reassured by the previous radiologist’s probably benign interpretation and thus have a higher threshold for calling the initial examination suspicious compared with other diagnostic examinations. This would be false reassurance if it results in low sensitivity for diagnosing breast cancer. It would be interesting to evaluate short-interval follow-up sensitivity among radiologists who interpreted both the examination that resulted in a probably benign assessment and the initial short-interval follow-up examination; however, we were unable to do this in our study.
A third possible reason for the low sensitivity of initial short-interval follow-up examinations may be that the standard BI-RADS definition, which uses a 12-month follow-up period for evaluating sensitivity and specificity, does not match the 6-month follow-up period recommended after an initial short-interval follow-up examination. Our data showed that using a 6-month follow-up interval for the definition of sensitivity increased the unadjusted sensitivity of short-interval follow-up examinations to 83%, which is similar to the 12-month sensitivity for other diagnostic examinations (≈ 80%). However, a trade-off in defining the follow-up interval for sensitivity always exists: if the follow-up interval is shortened, sensitivity increases because fewer false-negative examinations appear as interval cancers during the shorter time. It has been recommended that the follow-up interval for defining sensitivity and specificity of a screening or diagnostic test should match the follow-up interval recommended for that test, so long as that is what occurs in clinical practice. Future research should evaluate the specific follow-up intervals that radiologists are actually recommending after initial short-interval follow-up examinations and whether women are complying with those recommendations.
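The effect of the follow-up window on sensitivity can be illustrated with a small calculation. The sketch below uses invented records, not study data, and assumes that cancers after a positive examination are diagnosed promptly whereas cancers missed at the examination surface later; the 183-day cutoff for 6 months is likewise an assumption.

```python
# Hypothetical cancers: (examination was positive, days from examination to diagnosis).
cancers = [
    (True, 20), (True, 35), (True, 60), (True, 90), (True, 150),
    (False, 170), (False, 200), (False, 250), (False, 300), (False, 340),
]


def sensitivity_within(records, max_days):
    """Sensitivity counting only cancers diagnosed within max_days of the examination."""
    in_window = [positive for positive, days in records if days <= max_days]
    return sum(in_window) / len(in_window)


sens_6_months = sensitivity_within(cancers, 183)   # assumed 6-month cutoff
sens_12_months = sensitivity_within(cancers, 365)  # 12-month cutoff
print(f"6-month sensitivity:  {sens_6_months:.1%}")
print(f"12-month sensitivity: {sens_12_months:.1%}")
```

The true-positive count is the same in both windows; only the number of late-appearing false negatives in the denominator changes, which is why the shorter window yields the higher estimate.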
Our study had several limitations. We were unable to retrospectively evaluate whether cancers did or did not show interval progression on short-interval follow-up mammograms, which might have helped to determine more effective thresholds for recommending biopsy rather than continued surveillance. In addition, we were unable to evaluate performance of initial short-interval follow-up examinations by the types of lesions requiring follow-up (e.g., mass, focal asymmetry, calcifications) or by the size and stage of the cancers that were diagnosed. These analyses were beyond the scope of our project.
Although our sample included 130 radiologists from three geographic areas in the United States, we had limited ability to evaluate the importance of individual radiologist characteristics because of a small sample size in some categories. However, all sensitivity and specificity analyses were based on multiple short-interval follow-up mammograms interpreted by the radiologists, increasing the statistical power of these calculations. Despite the large size of our cohort, we were also limited by the small number of women with breast cancer used to evaluate sensitivity, especially within the 6-month follow-up period. Overall, the rate of cancer diagnoses in this study was lower than has been reported for other short-interval follow-up studies. This may be because a large proportion of short-interval follow-up examinations in our study directly followed screening mammograms, which have a lower cancer rate compared with other diagnostic examinations. Given the low cancer rate and differences between this and previous studies, we caution the reader in interpreting our results.
Our study had several unique strengths. We were able to evaluate the interpretive performance of short-interval follow-up examinations in a large, geographically diverse population. The large population size allowed us to make several exclusions (such as women with a history of breast cancer or women without outcome information) and to analyze a population of women eligible for short-interval follow-up mammograms. Had we not made these exclusions, we likely would have increased the variability in our sensitivity and specificity estimates, thus decreasing the ease of interpretation of our results. Our population included unique detailed information on patient and radiologist characteristics available from linking several study databases. We also had the ability to link to cancer registry data, which enabled us to calculate sensitivity and specificity.
In conclusion, the sensitivity of diagnostic mammograms obtained as initial short-interval follow-up examinations is low when using the standard 12-month auditing definition for follow-up period. The reasons for this low sensitivity should be elucidated. We noted increases in sensitivity among women who underwent unilateral short-interval follow-up examinations and among radiologists who spent 10 or more hours per week in breast imaging; but overall, few patient or radiologist characteristics were associated with accuracy. The value of using a 6-month (rather than a 12-month) follow-up period for defining sensitivity also should be examined in future studies.
Supported by the Agency for Healthcare Research and Quality (HS-10591) and the National Cancer Institute (1R01 CA 107623 and K05 CA 104699; Breast Cancer Surveillance Consortium: U01 CA63731, U01 CA86082, U01 CA63736, U01 CA70013, U01 CA63740, U01 CA70040, U01 CA69976, and U01 CA86076; and K05 CA104699).
We thank the BCSC investigators, participating mammography facilities, and radiologists for the data they provided for this study. A list of the BCSC investigators and procedures for requesting BCSC data for research purposes are provided at http://breastscreening.cancer.gov/.