Breast cancer missed on diagnostic mammography may contribute to delayed diagnoses, while false-positive results may lead to unnecessary invasive procedures. Whether accuracy of diagnostic mammography at facilities serving vulnerable women differs from other facilities is unknown.
To compare the interpretive performance of diagnostic mammography at facilities serving vulnerable women with that at facilities serving non-vulnerable women.
We examined 168,251 diagnostic mammograms performed at Breast Cancer Surveillance Consortium (BCSC) facilities from 1999 to 2005 and used hierarchical logistic regression to compare sensitivity, false positive rates, and cancer detection rates.
Women ages 40–80 years undergoing diagnostic mammography to evaluate an abnormal screening mammogram or breast problem.
Facilities were assigned vulnerability indices according to the populations served based on the proportion of mammograms performed on women with lower educational attainment, racial/ethnic minority status, limited household income, or rural residences.
After adjustment for patient-level characteristics, sensitivity of diagnostic mammography did not vary significantly across vulnerability indices. However, false-positive rates for diagnostic mammography to evaluate a breast problem were higher at facilities serving women with lower educational attainment (odds ratio (OR) 1.39; 95% confidence interval (CI) 1.08, 1.79), limited income (OR 1.34; 95% CI 1.08, 1.66), or rural residence (OR 1.55; 95% CI 1.27, 1.88), and were of borderline significance at facilities serving racial/ethnic minorities (OR 1.32; 95% CI 0.98, 1.76).
Diagnostic mammography to evaluate a breast problem has higher false positive rates at facilities serving vulnerable women than at facilities serving non-vulnerable women. This may reflect concerns that vulnerable populations may be less likely to follow up after abnormal diagnostic mammography, or that such populations have higher cancer prevalence.
Diagnostic mammography is the principal imaging tool used to diagnose breast cancer. The accuracy of diagnostic mammography interpretations varies across facilities nationally,(1) in part because of differences in radiologists’ experience, equipment, practice patterns, and the patient populations undergoing diagnostic mammography. Previous research has focused primarily on woman- and radiologist-level factors that contribute to accuracy in diagnostic mammography.(2–5) Disparities in cancer care and outcomes for women with lower educational attainment, racial or ethnic minorities, women with limited income, and those who live in rural areas are well established.(6–8) Whether the interpretive performance of diagnostic mammography at facilities serving a large proportion of women with these characteristics is similar to performance at other facilities is less well understood.
Vulnerable women, such as those with lower educational attainment, racial or ethnic minorities, women with limited income, and those living in rural areas, are at higher risk for poor breast cancer outcomes.(6–8) Previous research indicates that screening mammography performed at facilities serving high proportions of vulnerable women had higher specificity and similar sensitivity compared with screening at facilities serving non-vulnerable women.(9) In other words, women without cancer who were screened at facilities serving more vulnerable women were more likely to have a normal screening examination (i.e., higher specificity) than women without cancer screened at facilities serving non-vulnerable women, while the proportion of cancers detected did not differ. Findings for screening mammography may not apply to diagnostic mammography, however, because the two may require different interpretive skills.(10) Although both screening and diagnostic mammography aim to detect breast cancer, the two types of examinations involve different patient populations, very different cancer incidence, different numbers of images, possibly different machines and/or technologists, different interpretation protocols (e.g., batch vs. online reading), and different management recommendations for abnormal assessments.
Because the accuracy of diagnostic mammography determines whether a woman will undergo the invasive biopsy necessary to diagnose cancer, it is important to determine whether diagnostic performance differs between facilities serving different populations. Diagnostic mammography must also have high sensitivity to avoid diagnostic delay. In this study, we compare the accuracy of diagnostic mammography interpretations at facilities serving vulnerable women with the accuracy at facilities serving non-vulnerable women in the Breast Cancer Surveillance Consortium (BCSC)(1) to understand whether differences in accuracy could partially account for disparities in breast cancer severity and mortality in vulnerable populations.
Data were pooled from mammography registries in seven states participating in the National Cancer Institute-funded BCSC and include mammography interpretations linked at the patient level to pathology and tumor registry data. The consortium was formed to evaluate the quality of mammography nationally. The BCSC population has been shown to be representative of U.S. women, with age, ethnicity, and urban or rural residence distributions similar to national demographics.(1)
The mammography registries prospectively collect women’s self-reported demographic information and breast cancer risk factor data at each mammography examination, together with radiologists’ reports on screening and diagnostic mammography. The BCSC links to 2000 U.S. Census Bureau data based on women’s zip codes to obtain population-level socio-demographic information. Registries ascertain cancer outcomes through linkage with state tumor registries or regional Surveillance, Epidemiology, and End Results (SEER) programs, as well as linkage to pathology databases at five of the seven mammography registries. Each registry and the Statistical Coordinating Center have received IRB approval for either active or passive consenting processes or a waiver of consent to enroll participants, link data, and perform analytic studies. All procedures are Health Insurance Portability and Accountability Act (HIPAA) compliant, and all registries and the Statistical Coordinating Center have received a Federal Certificate of Confidentiality and other protections for the identities of women, physicians, and facilities who are subjects of this research.
We included women ages 40–80 years who underwent at least one diagnostic mammography examination between January 1999 and December 2005 that the radiologist identified as performed either for the evaluation of a recent abnormal screening result (termed ‘additional evaluation of a recent abnormal screening result’) or for the ‘evaluation of a symptomatic breast problem.’ Of the 384,063 diagnostic mammography interpretations initially identified from women ages 40–80 during 1999–2005, we excluded examinations from women with a history of breast cancer based on self-report or linkage with the cancer registry or pathology databases [N=57,786, 15%]; examinations from women who reported breast implants at the time of examination [N=6,155, 1.6%](11); examinations missing a final mammographic assessment [N=1,504, 0.4%]; and examinations with unknown time since last mammography [N=34,575, 9.0%]. In addition, some BCSC facilities do not collect Breast Imaging Reporting and Data System (BI-RADS) breast density, because neither the Mammography Quality Standards Act nor American College of Radiology accreditation requires it. Because breast density is a potential confounder related to both interpretive performance and patient characteristics,(5, 12) we excluded mammography interpretations without reported breast density [N=113,280, 29%].
We used standard definitions for diagnostic mammography interpretations based on the final assessment at the end of the imaging work-up, which could be up to 90 days after the initial diagnostic examination.(2, 13) BI-RADS is the standard lexicon used for interpreting mammography.(13) Final assessments of BI-RADS 4 (suspicious abnormality) or 5 (highly suggestive of malignancy), and assessments of BI-RADS 0 (incomplete) or 3 (probably benign finding) with a recommendation for biopsy, fine needle aspiration (FNA), or surgical consult, were classified as positive interpretations. A negative diagnostic mammography examination was defined as one with a final assessment of BI-RADS 1 (negative) or 2 (benign finding), or of BI-RADS 0 or 3 without a recommendation for biopsy/FNA/surgical consult. If the final assessment within 90 days in the BCSC database was BI-RADS 0 with a recommendation for additional imaging, non-specified work-up, or a missing recommendation, we considered the assessment to be missing and excluded the mammogram from the analysis (0.4%). We considered women to have a diagnosis of breast cancer if the state tumor registries, regional SEER programs, or pathology databases showed any invasive carcinoma or ductal carcinoma in situ within 12 months after the diagnostic exam. We also included cancers diagnosed within 30 days before diagnostic mammography, because some cancer registries define the date of diagnosis as the first evidence of breast cancer rather than the date of biopsy confirmation, and this evidence could come from a recent abnormal screening result or clinical exam that led to the diagnostic evaluation.(1) Sarcomas, lymphomas, and lobular carcinoma in situ were not considered breast cancer.
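The outcome definitions above reduce to two small rules: one classifying each final assessment and one deciding whether a cancer is linked to the exam. The Python sketch below is illustrative only: the recommendation strings (`"biopsy"`, `"fna"`, `"surgical consult"`, `"additional imaging"`, `"workup"`) and the function names are hypothetical stand-ins rather than BCSC data codes, and days are assumed to be counted from the index diagnostic exam. The `followup_days` parameter defaults to the 365-day window of the main analysis.

```python
from typing import Optional

# Hypothetical recommendation codes; real registry data use their own coding scheme.
BIOPSY_LIKE = {"biopsy", "fna", "surgical consult"}
MISSING_FOR_BIRADS_0 = {"", "additional imaging", "workup"}


def classify_final_assessment(birads: int, recommendation: Optional[str]) -> Optional[bool]:
    """Classify a final assessment as positive (True), negative (False),
    or missing (None, i.e. excluded from the analysis)."""
    rec = (recommendation or "").strip().lower()
    if birads in (4, 5):
        return True  # suspicious or highly suggestive of malignancy
    if birads in (1, 2):
        return False  # negative or benign finding
    if birads in (0, 3):
        if rec in BIOPSY_LIKE:
            return True  # incomplete/probably benign, but biopsy, FNA, or surgery recommended
        if birads == 0 and rec in MISSING_FOR_BIRADS_0:
            return None  # work-up unresolved: treated as a missing assessment
        return False  # BI-RADS 0 or 3 without a biopsy-type recommendation
    raise ValueError(f"unexpected BI-RADS category: {birads}")


def cancer_linked(days_from_exam_to_diagnosis: Optional[int], followup_days: int = 365) -> bool:
    """True if a cancer was diagnosed from 30 days before the exam through
    the end of the follow-up window."""
    if days_from_exam_to_diagnosis is None:
        return False  # no linked invasive carcinoma or DCIS
    return -30 <= days_from_exam_to_diagnosis <= followup_days


print(classify_final_assessment(3, "biopsy"))  # True
print(cancer_linked(500), cancer_linked(500, followup_days=730))  # False True
```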
To evaluate the accuracy of diagnostic mammography, we calculated sensitivity, false positive rates, and cancer detection rates.(1, 14) Sensitivity was calculated as the number of true-positive examinations (positive final assessments with breast cancer) divided by the number of examinations with breast cancer, and the false positive rate as the number of false-positive examinations (positive final assessments without breast cancer) divided by the number of examinations without breast cancer. The cancer detection rate was defined as the number of cancers detected (true positives) per 1,000 diagnostic mammography interpretations.
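The three measures can be computed from a 2×2 table of final assessment by cancer status. This is a minimal sketch, assuming each examination has already been classified as positive or negative and linked (or not) to a cancer; the `Counts` class and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """2x2 counts of final assessment (positive/negative) by cancer status."""
    true_pos: int   # positive assessment, cancer diagnosed in follow-up window
    false_pos: int  # positive assessment, no cancer
    false_neg: int  # negative assessment, cancer diagnosed
    true_neg: int   # negative assessment, no cancer

    @property
    def exams(self) -> int:
        return self.true_pos + self.false_pos + self.false_neg + self.true_neg


def sensitivity(c: Counts) -> float:
    return c.true_pos / (c.true_pos + c.false_neg)  # among exams with cancer


def false_positive_rate(c: Counts) -> float:
    return c.false_pos / (c.false_pos + c.true_neg)  # among exams without cancer


def cancer_detection_rate_per_1000(c: Counts) -> float:
    return 1000 * c.true_pos / c.exams  # true positives per 1,000 exams


# Hypothetical counts, for illustration only.
example = Counts(true_pos=36, false_pos=90, false_neg=4, true_neg=870)
print(sensitivity(example))                     # 0.9
print(false_positive_rate(example))             # 0.09375
print(cancer_detection_rate_per_1000(example))  # 36.0
```

Note that the denominators differ: sensitivity uses only exams with cancer, the false positive rate only exams without cancer, and the cancer detection rate all exams.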
For this analysis, we used the methodology and definitions employed in a prior study that analyzed the accuracy of screening mammography at facilities serving vulnerable populations.(9) Vulnerability was based on four socio-demographic characteristics: educational attainment, race/ethnicity, household income, and rural or urban residence. We first measured these characteristics for all women. Self-reported information provided at the time of mammography was used to determine a woman’s educational attainment and race/ethnicity. For income and rural/urban status, geocoded linkages between 2000 Census data and self-reported residential zip code at the time of mammography were used to assign each woman an income measure corresponding to the median household income in the zip code and a rural/urban score corresponding to the percentage of rural residences in the zip code.
To describe the vulnerability of the population served by each mammography facility, we calculated a continuous facility-level vulnerability index by aggregating individual woman-level characteristics for the four vulnerability measures across all mammography examinations (both screening and diagnostic) performed at a given facility during the 1999–2005 study period. The continuous index measures were (1) the percentage of the population with a high school education or less, (2) the percentage of the population composed of minorities (self-reported African-American race, or Hispanic/Pacific-Islander/Hawaiian/Native American ethnicity), (3) the average median household income, and (4) the average percentage of rural residents. We did not include non-Pacific Islander Asian Americans as a vulnerable minority because their breast cancer mortality rates are lower than those of Caucasians and other minority groups.(15) The continuous index measures were then dichotomized to provide a binary facility-level vulnerable/not vulnerable classification for the population served by each facility, with cutoffs set at one standard deviation from the mean of each continuous measure in our study population. Specifically, we classified facilities as serving a vulnerable population if: 1) ≥17% of mammography interpretations were from women who had not completed high school (lower educational attainment); 2) the percentage minority was >30% (racial/ethnic minority); 3) the average median income was <$45,000 (limited income); or 4) the average percentage of rural residences was >52% (rural residence). We also created a composite facility-level vulnerability score by adding 1 for each binary vulnerability index met; because each component was weighted equally, the score ranged from 0 to 4.
For descriptive purposes in this paper (Table 1), we refer to a composite score of 0 as ‘non-vulnerable’, 1 or 2 as ‘moderately vulnerable’, and 3 or 4 as ‘highly vulnerable’; however, the main analysis used all five measures (the four binary indices and the composite score). Finally, each facility’s four binary vulnerability indices and its composite index were assigned to every diagnostic mammography examination performed at that facility. Thus, the vulnerability score is a characteristic of the population served by the facility where the mammogram was performed, rather than a characteristic of the woman herself.
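The facility-level classification described in the two paragraphs above can be sketched as follows. This assumes exam-level records have already been reduced to two binary flags (lower educational attainment, vulnerable minority) and two zip-code measures (median household income, percentage rural); the record layout, field names, and example facility are hypothetical. The cutoffs are the ones stated in the text, and the descriptive grouping is the one used only for Table 1.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class FacilityProfile:
    pct_low_education: float  # % of exams from women with lower educational attainment
    pct_minority: float       # % of exams from women in the vulnerable minority groups
    avg_median_income: float  # mean zip-code median household income (USD)
    avg_pct_rural: float      # mean zip-code percentage of rural residences


def profile_from_exams(exams: Iterable[Tuple[bool, bool, float, float]]) -> FacilityProfile:
    """Aggregate exam-level records (screening and diagnostic) for one facility.
    Each record: (lower_education, minority, zip_median_income, zip_pct_rural)."""
    rows = list(exams)
    n = len(rows)
    return FacilityProfile(
        pct_low_education=100 * sum(r[0] for r in rows) / n,
        pct_minority=100 * sum(r[1] for r in rows) / n,
        avg_median_income=sum(r[2] for r in rows) / n,
        avg_pct_rural=sum(r[3] for r in rows) / n,
    )


def binary_indices(f: FacilityProfile) -> Dict[str, bool]:
    """Dichotomize each continuous measure using the reported cutoffs."""
    return {
        "lower_education": f.pct_low_education >= 17,
        "minority": f.pct_minority > 30,
        "limited_income": f.avg_median_income < 45_000,
        "rural": f.avg_pct_rural > 52,
    }


def composite_score(indices: Dict[str, bool]) -> int:
    return sum(indices.values())  # equal weights, range 0..4


def descriptive_category(score: int) -> str:
    """Grouping used only for descriptive tables."""
    if score == 0:
        return "non-vulnerable"
    return "moderately vulnerable" if score <= 2 else "highly vulnerable"


# Hypothetical facility with four exams.
exams = [
    (True, False, 38_000.0, 70.0),
    (False, False, 42_000.0, 60.0),
    (False, True, 40_000.0, 55.0),
    (False, False, 44_000.0, 65.0),
]
idx = binary_indices(profile_from_exams(exams))
score = composite_score(idx)
print(idx, score, descriptive_category(score))
```

In the analysis, the resulting indices and score would then be attached to every diagnostic exam at that facility, so every exam at a facility shares the same values.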
To ensure interpretability and stability of the facility vulnerability categorizations, we excluded facilities if any of their four vulnerability classifications (serving women with lower educational attainment, racial/ethnic minorities, women with limited income, or rural residents) were missing (N = 2) or changed more than twice during the 7-year study period (N = 2).
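One way to implement this stability rule is sketched below. It assumes each facility's four binary classifications were recomputed for each calendar year from 1999 to 2005; the text does not say how often they were evaluated, so the yearly series is an assumption. Function names and inputs are hypothetical.

```python
from typing import Dict, Optional, Sequence


def count_changes(yearly: Sequence[Optional[bool]]) -> int:
    """Number of year-to-year switches in a binary classification."""
    return sum(1 for prev, cur in zip(yearly, yearly[1:]) if prev != cur)


def keep_facility(yearly_by_index: Dict[str, Sequence[Optional[bool]]], max_changes: int = 2) -> bool:
    """Apply the missing/instability exclusion rule to every vulnerability classification."""
    for series in yearly_by_index.values():
        if any(v is None for v in series):
            return False  # a classification is missing
        if count_changes(series) > max_changes:
            return False  # classification changed more than twice
    return True


stable = [False] * 7  # hypothetical series for 1999..2005
flipping = [True, False, True, False, True, False, True]
print(keep_facility({"rural": stable, "limited_income": stable}))    # True
print(keep_facility({"rural": flipping, "limited_income": stable}))  # False
```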
For all analyses, we analyzed the two types of diagnostic mammography examinations (those indicated as follow-up of a recent abnormal screening result and those indicated for evaluation of a breast problem) separately, because interpretive performance is known to vary significantly by diagnostic indication.(1) We described the total number of diagnostic mammography interpretations by age group, BI-RADS breast density, time since last mammography, and BI-RADS final assessment, and stratified results by facility vulnerability category. We then calculated the unadjusted sensitivity, false positive rate, and cancer detection rate of diagnostic mammography, stratified by each of the four binary vulnerability indices and the composite index.
Adjusted associations (odds ratios and 95% confidence intervals) between diagnostic accuracy and the facility-level vulnerability measures were estimated using logistic-normal mixed-effects models,(16) with adjusted sensitivity, false positive rates, and cancer detection rates estimated from these models using marginal standardization (also known as predictive margins).(17, 18) Each model was specified at the level of the diagnostic mammography examination, with a facility-level random effect to account for clustering of examinations within facilities and with additional mammography-level covariates to adjust for other factors that may influence mammography performance, including registry site, a woman’s age, time since last mammography, and BI-RADS breast density. Woman-level random effects (to account for multiple mammography interpretations from the same woman) were not included because of computational constraints arising from the large number of woman-level clusters and the potential for risk factor covariates to change across a woman’s mammography interpretations; prior experience shows that this adjustment does not change inferences, given the large number of women and the small number of mammography interpretations per woman.(19) We performed all analyses at the level of the diagnostic mammography examination, which allowed associations to be interpreted in terms of the impact that the vulnerability of the population served has on mammography accuracy at the level of the examination. Statistical modeling was done using PROC NLMIXED in SAS 9.1 (SAS Institute, Cary, NC).
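Marginal standardization (predictive margins) averages model-predicted probabilities over the observed covariate distribution after setting the exposure to each level in turn. The sketch below is a simplified illustration, not the authors' NLMIXED code: it takes a fitted logistic model's intercept and coefficients as given (all values here are hypothetical), sets the facility random effect to zero rather than integrating over its distribution, and contrasts the average predicted false positive rate with the vulnerability flag set to 0 versus 1. An adjusted sensitivity would presumably use the same computation on a model fitted to exams with cancer.

```python
import math
from typing import Dict, List


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predictive_margin(
    intercept: float,
    coefs: Dict[str, float],
    rows: List[Dict[str, float]],
    exposure: str,
    value: float,
) -> float:
    """Mean predicted probability over all exams with `exposure` fixed at `value`.

    The facility random effect is set to 0 here (a simplification; a fuller
    approach would average over its estimated distribution).
    """
    total = 0.0
    for row in rows:
        x = dict(row)
        x[exposure] = value  # counterfactually set the facility vulnerability flag
        eta = intercept + sum(beta * x[name] for name, beta in coefs.items())
        total += expit(eta)
    return total / len(rows)


# Hypothetical false-positive model fitted among exams without cancer.
coefs = {"vulnerable": math.log(1.5), "dense_breasts": 0.3, "age_50_plus": -0.2}
rows = [
    {"vulnerable": 0, "dense_breasts": 1, "age_50_plus": 0},
    {"vulnerable": 1, "dense_breasts": 0, "age_50_plus": 1},
    {"vulnerable": 0, "dense_breasts": 0, "age_50_plus": 1},
]
p0 = predictive_margin(-2.5, coefs, rows, "vulnerable", 0)
p1 = predictive_margin(-2.5, coefs, rows, "vulnerable", 1)
print(f"adjusted FPR: {p0:.3f} vs {p1:.3f}; OR = {math.exp(coefs['vulnerable']):.2f}")
```

The odds ratio is a conditional contrast on the log-odds scale, whereas the two margins are population-averaged rates on the probability scale, which is why papers often report both.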
In addition to the main analyses, we conducted a sensitivity analysis in which we ascertained cancers for 2 years (730 days) after the index diagnostic mammogram instead of 1 year (365 days), to address the concern that cancer diagnoses in vulnerable women might be delayed by reduced access to care or delayed follow-up.(20, 21) To understand the potential implications of excluding mammograms without reported breast density, we compared the interpretive performance of facilities that did and did not report breast density. As a post-hoc analysis, we also calculated unadjusted cancer rates, defined as all cancers ascertained in the cancer registries and pathology databases within 12 months of the diagnostic mammography examination, whether or not they were detected on the diagnostic mammogram, to compare cancer prevalence at facilities serving vulnerable and non-vulnerable women.
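The post-hoc cancer rate differs from the cancer detection rate only in its numerator: every linked cancer counts, whether or not the diagnostic assessment was positive. A minimal sketch, using hypothetical exam records and the same 30-days-before linkage rule as the main analysis; passing `followup_days=730` mimics the two-year sensitivity analysis.

```python
from typing import Iterable, Optional, Tuple


def cancer_rates_per_1000(
    exams: Iterable[Tuple[bool, Optional[int]]], followup_days: int = 365
) -> Tuple[float, float]:
    """Return (cancer rate, cancer detection rate) per 1,000 exams.

    Each exam is (positive_final_assessment, days_from_exam_to_cancer or None).
    """
    n = cancers = detected = 0
    for positive, days in exams:
        n += 1
        if days is not None and -30 <= days <= followup_days:
            cancers += 1  # any linked cancer counts toward prevalence
            if positive:
                detected += 1  # only positive assessments count as detected
    return 1000 * cancers / n, 1000 * detected / n


# Hypothetical records: (positive assessment?, days to cancer diagnosis or None)
exams = [(True, 10), (False, 100), (False, None), (True, None), (False, 500)]
print(cancer_rates_per_1000(exams))                     # (400.0, 200.0)
print(cancer_rates_per_1000(exams, followup_days=730))  # (600.0, 200.0)
```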
Our study sample included 168,251 diagnostic mammography examinations: 83,464 examinations performed at 153 facilities to evaluate a recent abnormal screening result among 76,199 women, and 84,787 examinations performed at 176 facilities to evaluate a breast problem among 74,785 women (Table 1). About 60% of all mammography interpretations (both indications) occurred at facilities serving a non-vulnerable population, around 30% at facilities serving a moderately vulnerable population, and 10% at facilities serving a highly vulnerable population. This overall distribution was similar within each of the two diagnostic indications. Facilities serving more vulnerable populations were more likely to serve older women, women with less dense breasts, and women who were screened less frequently; they were also more likely to recommend a biopsy and less likely to recommend short-interval follow-up for BI-RADS assessments of 0 or 3 (‘needs additional imaging’ or ‘probably benign finding’).
In unadjusted analyses for both diagnostic mammography indications, facilities serving vulnerable women tended to have lower sensitivity and higher false positive rates across most measures of vulnerability (Table 2). Associations between vulnerability and cancer detection rates tended to differ, however, between the two indications. For diagnostic mammography indicated as an additional evaluation of an abnormal screening result, facilities that served predominantly rural residents and racial/ethnic minorities had lower cancer detection rates. In contrast, for diagnostic mammography indicated to evaluate a breast problem, facilities serving vulnerable populations (racial/ethnic minorities, limited income, and rural residence) tended to have higher cancer detection rates (Table 2).
For diagnostic mammography indicated to evaluate an abnormal screening result, most of the observed differences in raw performance measures were no longer significant after adjustment for time since previous mammography, BI-RADS breast density, registry site, and age (Table 3a). After adjustment, only facilities serving women with limited income had significantly higher false positive rates than facilities not serving limited-income populations (OR 1.39; 95% CI 1.13, 1.70). Adjusted sensitivity and cancer detection rates did not differ significantly, although both tended to be lower at facilities serving more vulnerable populations. In contrast, for diagnostic mammography to evaluate a breast problem, the differences in false positive rates persisted after adjustment (Table 3b). Adjusted false positive rates were significantly higher in 3 of the 4 vulnerability categories: lower educational attainment (OR 1.39; 95% CI 1.08, 1.79), limited income (OR 1.34; 95% CI 1.08, 1.66), and rural residence (OR 1.55; 95% CI 1.27, 1.88); the difference bordered on statistical significance for the fourth category (race/ethnicity: OR 1.32; 95% CI 0.98, 1.76). Furthermore, there was a dose-response relationship, with higher composite vulnerability scores associated with higher false positive rates (test for trend: p < 0.01). After adjustment, the cancer detection rate remained significantly higher at facilities serving predominantly rural populations (OR 1.26; 95% CI 1.05, 1.50) and was of borderline significance for race/ethnicity and income (Table 3b). There were no significant differences in the adjusted sensitivity of diagnostic mammography interpretations between facilities.
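To see why the race/ethnicity estimate is described as borderline, an approximate p-value can be back-calculated from each reported odds ratio and 95% CI, assuming a Wald-type interval that is symmetric on the log-odds scale. This is a rough check using the rounded published values, not the test the authors performed, and it does not reproduce the composite-score trend test.

```python
import math


def p_from_or_ci(odds_ratio: float, lower: float, upper: float, z_crit: float = 1.959964) -> float:
    """Approximate two-sided p-value from an odds ratio and its 95% CI.

    Assumes the CI was computed symmetrically on the log-odds scale (Wald-type).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # standard error of log(OR)
    z = math.log(odds_ratio) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


# Adjusted ORs for false positive rates, breast problem indication (values from the text).
reported = {
    "lower educational attainment": (1.39, 1.08, 1.79),
    "racial/ethnic minorities": (1.32, 0.98, 1.76),
    "limited income": (1.34, 1.08, 1.66),
    "rural residence": (1.55, 1.27, 1.88),
}
for label, (or_, low, high) in reported.items():
    print(f"{label}: approx p = {p_from_or_ci(or_, low, high):.3f}")
```

Under these assumptions the race/ethnicity estimate lands at roughly p ≈ 0.06, just above the conventional 0.05 threshold, while the other three fall below it.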
As a sensitivity analysis, we recalculated all measures of diagnostic mammography accuracy using cancers linked within a two-year follow-up window after mammography; our conclusions for sensitivity, false positive rates, and cancer detection rates did not change substantially compared with the one-year window (data not shown). We did not find any substantial differences in sensitivity or false positive rates between facilities that did and did not report breast density. The only difference was that, among mammograms taken to evaluate a breast problem, cancer detection rates were lower at the excluded facilities that did not report breast density (37.6 per 1,000 exams at included facilities vs. 21.6 per 1,000 at excluded facilities). We also calculated cancer rates as a measure of cancer prevalence and found that, for diagnostic mammography to evaluate breast symptoms, unadjusted cancer rates were higher at facilities serving vulnerable women. Cancer rates per 1,000 exams at facilities serving non-vulnerable vs. vulnerable women were 43.8 vs. 44.7 for lower educational attainment, 43.7 vs. 45.6 for racial/ethnic minorities, 41.6 vs. 51.3 for limited income, and 42.8 vs. 48.1 for rural residence, and cancer rates increased with higher composite scores (data not shown).
For diagnostic mammography performed to evaluate an abnormal screening result, facilities serving vulnerable women had interpretive performance similar to that of facilities serving non-vulnerable women. Only facilities serving women with limited income, one of the four categories of vulnerable women, had significantly higher false positive rates than facilities not serving limited-income populations. In contrast, for diagnostic mammography to evaluate a symptomatic breast problem, facilities serving a greater proportion of vulnerable women were more likely to recommend a biopsy or surgical consultation for women not subsequently diagnosed with breast cancer than facilities that did not serve vulnerable women. We did not find corresponding differences between facilities in sensitivity or cancer detection rates. The lack of difference in the sensitivity of diagnostic mammography to evaluate an abnormal screening result or breast problem between vulnerable and non-vulnerable populations is reassuring and indicates that the characteristics of the facility a woman attends do not appear to influence cancer detection among women with cancer. However, the higher false positive rates at facilities serving vulnerable women suggest that these women may be more likely to undergo breast biopsy when they do not have cancer.
Our findings for diagnostic mammography differ from those of our prior study of screening mammography.(9) That study, which also used BCSC data (1998–2004), found that radiologists at facilities serving women with lower educational attainment, racial/ethnic minorities, women with limited income, and rural residents tended to have lower false positive rates (higher specificity). These contrasting results suggest that the factors driving interpretive performance may differ between screening and diagnostic mammography. For one, differences in cancer prevalence between women undergoing screening and diagnostic mammography may influence radiologists’ perception of cancer risk and therefore the likelihood that they recommend further testing. In settings where diagnostic imaging is less available and cancer prevalence is low (i.e., a low-risk screening population), radiologists may be less likely to recall women for diagnostic mammography. In contrast, in settings where cancer prevalence is higher, as with diagnostic mammography, radiologists may be concerned that women will not return for follow-up evaluation and may therefore be more likely to recommend a biopsy rather than short-interval follow-up, additional diagnostic imaging, or clinical follow-up. Although radiologists are unlikely to know a given woman’s likelihood of follow-up or her cancer risk, their practice patterns are likely to be influenced by the overall follow-up rates and cancer prevalence of the population evaluated at the facility. Follow-up rates after screening mammography are lower for women with lower educational attainment, racial/ethnic minorities, and women with limited income than for other women.(22) Similar concerns may apply to diagnostic mammography.
In addition, because unadjusted cancer rates (a proxy for cancer prevalence) are higher at facilities serving vulnerable women, radiologists may be more concerned that these women have cancer and may therefore recommend more biopsies or surgical follow-up for symptomatic women attending these facilities. This greater tendency to recommend biopsy or surgical follow-up could explain the higher false positive rates at these facilities.
The availability of screening and diagnostic mammography may differ across facilities, and radiologists at different facilities may have different experience interpreting diagnostic and screening mammography. Facilities serving vulnerable populations may perform proportionately more screening than diagnostic mammography and, therefore, have lower false positive rates for screening mammography and higher false positive rates for diagnostic mammography.(23) Facilities serving women with limited income were the only facilities serving vulnerable women that demonstrated higher false positive rates for diagnostic mammography to evaluate an abnormal screening result. These facilities may have specific resource limitations, such as lack of breast ultrasound;(24, 25) we did not have these data available for this analysis.
The study has several important strengths and limitations. We used a diverse cohort of many facilities across seven sites in the United States, representative of community practice, and evaluated the impact of the vulnerability of the population a facility serves on the accuracy of diagnostic interpretations using multiple characterizations of vulnerable women. Although the higher false positive rates at facilities serving vulnerable women may lead to more biopsies in women who are not ultimately diagnosed with cancer, our study could not determine this directly, because detailed utilization data on whether a referral for biopsy actually resulted in a biopsy were not available for all facilities. Lower rates of completing recommended biopsies could delay cancer diagnoses and, in effect, artificially raise false positive rates in vulnerable women. Similarly, vulnerable women may be less likely to receive follow-up, which could artificially increase false positive rates at facilities serving them if some women lost to follow-up truly had cancer. To address this concern, we conducted a sensitivity analysis that extended the follow-up time for diagnosis from 1 to 2 years; our findings did not change.
We selected several measures of vulnerability to help identify facilities serving vulnerable women; however other definitions could be considered with different thresholds. Our study was limited to mammograms with BI-RADS breast density reported because breast density is a known confounder of interpretive performance. We found no substantial differences in sensitivity or false positive rates between included facilities and those excluded due to missing breast density values; however, the cancer detection rate was somewhat lower among excluded facilities. We note, though, that a number of these excluded facilities were from large urban centers which, consistent with our main analysis results, would be estimated to have lower cancer detection rates than facilities serving more rural populations.
Finally, this analysis evaluated the impact of facility-level characteristics on diagnostic performance at the level of a woman’s mammography examination. We did not control for radiologists’ experience, equipment, or practice patterns, which can contribute to diagnostic mammography interpretive performance.(2) Our analysis, however, is meaningful from the perspective of a woman choosing where to undergo mammography. Although a woman may be able to select where her mammogram is performed, she cannot select who will interpret it at that facility. The collective experience of the radiologists and the equipment and practice patterns at a facility are unmodifiable facility characteristics from the woman’s perspective; therefore, we did not adjust for them.
In conclusion, for diagnostic mammography indicated for evaluation of a symptomatic breast problem, facilities serving vulnerable populations in general had higher rates of biopsy or surgical consultation recommendations in women who did not have a subsequent diagnosis of cancer than did facilities serving fewer vulnerable women; however, significant differences in sensitivity were not observed between such facilities. Facilities serving women with limited income who underwent diagnostic mammography to evaluate an abnormal screening result also demonstrated higher rates of biopsy and surgical consultation referral among women who did not have a subsequent diagnosis of cancer than did facilities not serving limited-income women. Research should be conducted to determine appropriate thresholds for referring women to biopsy in different clinical situations to optimize cancer yield per biopsy. Because accuracy may differ between screening and diagnostic mammography, both should be assessed when evaluating the quality of mammography at facilities. Future research should consider evaluating facility characteristics such as the availability of ultrasound and other diagnostic resources to better understand potential modifiers of diagnostic accuracy.
This work was supported by the Agency for Healthcare Research and Quality, Grant #1 K08 HS018090-01, California Breast Cancer Research Project, Grant #14IB-0062, and the National Cancer Institute Breast Cancer Surveillance Consortium (U01CA63740, U01CA86076, U01CA86082, U01CA63736, U01CA70013, U01CA69976, U01CA63731, U01CA70040). The collection of cancer data used in this study was supported in part by several state public health departments and cancer registries throughout the U.S. For a full description of these sources, please see: http://breastscreening.cancer.gov/work/acknowledgement.html. The authors take full responsibility for the design of the study, the collection of the data, the analysis and interpretation of the data, the decision to submit the manuscript for publication, and the writing of the manuscript. We thank the participating women, mammography facilities, and radiologists for the data they have provided for this study. A list of the BCSC investigators and procedures for requesting BCSC data for research purposes are provided at: http://breastscreening.cancer.gov/.