Assessing the volume of mammographic density might more accurately reflect the amount of breast volume at risk of malignant transformation and provide a stronger indication of risk of breast cancer than methods based on qualitative scores or dense breast area.
We prospectively collected mammograms from women undergoing screening mammography. We identified 275 women subsequently diagnosed with invasive breast cancer or ductal carcinoma in situ (DCIS) (cases), selected 825 controls matched for age, ethnicity, and mammography system, and assessed three measures of breast density: percent dense area, fibroglandular volume, and percent fibroglandular volume.
After adjustment for family history of breast cancer, body mass index, history of breast biopsy, and age at first live birth, the odds ratios for breast cancer risk in the highest versus lowest measurement quintiles were 2.5 (95% CI: 1.5, 4.3) for percent dense area, 4.1 (95% CI: 2.3, 7.2) for fibroglandular volume, and 2.9 (95% CI: 1.7, 4.9) for percent fibroglandular volume. Net reclassification indexes for density measures plus risk factors versus risk factors alone were 9.6% (P=0.07) for percent dense area, 21.1% (P=0.0001) for fibroglandular volume, and 14.8% (P=0.004) for percent fibroglandular volume. Adding fibroglandular volume to risk factors improved the categorical risk classification of about 1 in 5 women, among both women with and women without breast cancer.
Volumetric measures of breast density are more accurate predictors of breast cancer risk than risk factors alone and than percent dense area.
Risk models including dense fibroglandular volume may more accurately predict breast cancer risk than current risk models.
Breast density has been recognized as a risk factor for breast cancer for over 30 years, but there is neither a single accepted definition of breast density nor a standard quantitative measurement method in clinical practice. Wolfe’s classification and the BI-RADS breast density score are both qualitative, categorical estimates. Ideally, the best definition of breast density, and the best method for measuring it, would have the strongest association with risk of breast cancer. Risk estimates from different methods for assessing breast density vary substantially depending on the study population, technical features, reproducibility of the method, and reporting method. Wolfe first reported, over a 5-year period in a population of 5,284 women, a 22-fold difference in risk between the lowest-risk category, N1 (normal for age; breast composed almost entirely of fat with a few fibrous connective tissue strands), and the highest-risk category, DY (extremely dense parenchyma, which usually denotes connective tissue hyperplasia) (1). The Breast Imaging Reporting and Data System (BI-RADS) provides a 4-category ranking of the amount of mammographically dense tissue (white in the image) relative to the total projected breast area; comparing the highest with the lowest category, it is associated with a 3- to 4-fold risk of subsequent breast cancer (2). Percent dense area, defined as the projected area of dense breast tissue divided by the area of the entire breast, is measured with a computerized method that has been used extensively in clinical studies of digitized film mammograms. The method also provides an absolute measure of dense area, which has been shown to be a strong predictor of breast cancer risk (3-8). In a prospective study, Boyd et al. reported a 4-fold two-year relative risk of breast cancer between women in the highest and lowest of six categories of percent dense area (9).
In a meta-analysis of 42 studies with aggregate data on more than 14,000 women with breast cancer (cases) and 26,000 women without breast cancer (controls), McCormack et al. (10) showed that percent dense area was more strongly associated with breast cancer than BI-RADS categories. However, BI-RADS is more widely used in clinical practice because it is easier to determine and report.
Both BI-RADS and percent dense area have several limitations. They are two-dimensional estimates that impose distinct borders between dense and non-dense (fat) regions. They may vary with the compression force used during mammography, and they rely on skilled readers to determine dense and non-dense areas manually. In addition, two-dimensional methods are not sensitive to breast thickness and may underestimate the true amount of fibroglandular tissue in large-breasted women.
If the increased risk of breast cancer associated with high ‘mammographic density’ reflects a greater amount of cellular and interstitial tissue, then measures of dense mammographic volume should classify risk more accurately than measures of dense mammographic area. We prospectively collected film mammograms imaged with calibration phantoms, which allowed us to assess percent dense area and to estimate accurately the volumes of dense tissue and of the whole breast, and we examined the association of these measures with incident breast cancer. The purpose of our study was to determine whether percent dense area, absolute fibroglandular volume, or percent fibroglandular volume is the most accurate predictor of breast cancer risk.
We included women aged 18 years or older undergoing screening mammography between January 7, 2004, and April 8, 2006, at the California Pacific Medical Center Breast Health Center, a participant in the San Francisco Mammography Registry (SFMR) (11). As part of routine clinical care, film mammograms were digitized with the ImageChecker computer-aided diagnostic tool (R2, Inc., Sunnyvale, CA). A calibration phantom, described elsewhere (12), was imaged in an unused corner of every screening film mammogram. We excluded women whose film mammogram did not include the phantom, was not digitized, or showed poor placement of the phantom. We also excluded women with a prior diagnosis of breast cancer or a history of breast augmentation, reduction, reconstruction, or mastectomy. The study was approved by the University of California, San Francisco Institutional Review Board for consenting processes, enrollment of participants, and ongoing data linkages for research purposes, and it has a Federal Certificate of Confidentiality that protects the identities of research subjects.
Cases were women with invasive breast cancer or ductal carcinoma in situ who had a screening examination one year or more before their breast cancer diagnosis. Three women without breast cancer (controls) were randomly selected for each case and matched by age, ethnicity, and mammography system (defined below). Control examinations were selected to have occurred within one year of the matched case’s screening examination. Breast cancers were identified by annually linking women in the SFMR to the California Cancer Registry, which comprises the Northern and Southern California Surveillance, Epidemiology, and End Results (SEER) programs that collect population-based cancer data.
Demographic and breast health history information was obtained from a self-administered questionnaire completed at the time of screening mammography, which included questions about race/ethnicity, family history of breast cancer, menopausal status, age at first live birth, current postmenopausal hormone therapy (HT) use, height and weight, and history of breast biopsy (13).
We categorized women as non-Hispanic White, non-Hispanic Black, Hispanic, Asian/Native Hawaiian/Pacific Islander, Native American/Native Alaskan, or other/mixed race. Women were considered to have a family history of breast cancer if they reported at least one first-degree relative (mother, sister, or daughter) with breast cancer. “Current HT users” were those who reported using prescription HT at the time of a screening examination. Women were considered postmenopausal if they had both ovaries removed, reported that their periods had stopped naturally, were currently using postmenopausal HT, or were aged 55 or older. Self-reported height and weight were used to calculate body mass index (BMI) as weight in kilograms divided by the square of height in meters (kg/m²). Age at first live birth was dichotomized as first live birth before age 30 versus first live birth at age 30 or older or nulliparity. This is based on previous observations that a woman who has her first child around age 30 has approximately the same lifetime risk of developing breast cancer as a woman who has never given birth (14). History of breast biopsy was dichotomized as any versus no previous breast biopsy.
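The covariate definitions above translate directly into a few small functions. This is a minimal Python sketch; the record layout and all names (`Questionnaire`, `weight_kg`, `first_birth_30_or_later_or_nulliparous`, and so on) are hypothetical and are not taken from the study's questionnaire or data files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Questionnaire:
    # Hypothetical record layout; field names are illustrative only.
    weight_kg: float
    height_m: float
    age: int
    both_ovaries_removed: bool
    periods_stopped_naturally: bool
    current_ht_user: bool
    age_first_live_birth: Optional[int]  # None means nulliparous


def bmi(q: Questionnaire) -> float:
    """Body mass index in kg/m^2 from self-reported height and weight."""
    return q.weight_kg / q.height_m ** 2


def is_postmenopausal(q: Questionnaire) -> bool:
    """Any one of the four criteria classifies a woman as postmenopausal."""
    return (q.both_ovaries_removed
            or q.periods_stopped_naturally
            or q.current_ht_user
            or q.age >= 55)


def first_birth_30_or_later_or_nulliparous(q: Questionnaire) -> bool:
    """Dichotomized age at first live birth: True for >= 30 years or nulliparous."""
    return q.age_first_live_birth is None or q.age_first_live_birth >= 30
```

For example, a 50-year-old nulliparous woman weighing 70 kg at 1.75 m has a BMI of about 22.9 kg/m², is classified as premenopausal (absent the other criteria), and falls in the "30 or older or nulliparous" group.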
Screening film mammograms were acquired on seven film mammography systems, either GE Healthcare DMR (GE Healthcare, Madison, WI) or Lorad M-IV (Hologic, Inc., Bedford, MA), using a Kodak MIN-R 2000 film and MIN-R 2190 screen combination. As is typical for screening mammography, the X-ray technique was set to “autocontrast” and each breast was compressed to the maximum pressure the patient could tolerate. Craniocaudal (CC) film mammograms were used to calculate all density measures after being digitized at 16-bit resolution with 169-μm pixels on R2 ImageChecker digitizers (Hologic, Inc., Sunnyvale, CA). All density measures reported here are averages of the left and right breasts.
Percent dense area was quantified using custom software comparable to that described by Byng (15). The software automatically detects and delineates the skin edge to define the total breast area. An expert reader then defines the dense tissue area by moving a histogram threshold slider so that all pixels with values above the threshold are labeled as dense. Percent dense area is the ratio of dense pixels to total breast pixels, multiplied by 100. A single expert reader read all films. The reader was trained against a local radiologist (16) and then received additional training at the University of Toronto from Drs. Yaffe and Boyd using Cumulus, a software tool similar to our custom software for measuring percent dense area. To assess agreement, the reader reread 100 films that had previously been read with the Cumulus software (9) by a similarly trained reader at the Mayo Clinic; the inter-reader intraclass correlation was 0.94.
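The thresholding step described above can be illustrated with a short Python sketch. It assumes that the digitized image is a 2D grid of grey values in which denser tissue has higher values, that a boolean breast mask is available from the skin-edge detection, and that the threshold is the value chosen by the reader. The function name is hypothetical; this is not the study's custom software or Cumulus.

```python
PIXEL_SIZE_CM = 0.0169  # 169-um digitizer pixels


def percent_dense_area(image, breast_mask, threshold):
    """Return (percent dense area, absolute dense area in cm^2).

    image:       2D list of grey values (higher = denser, an assumption)
    breast_mask: 2D list of booleans, True inside the detected skin edge
    threshold:   reader-selected slider value; pixels above it count as dense
    """
    # Keep only pixels inside the breast outline.
    breast = [value
              for row, mask_row in zip(image, breast_mask)
              for value, inside in zip(row, mask_row)
              if inside]
    if not breast:
        raise ValueError("breast mask selects no pixels")
    n_dense = sum(1 for value in breast if value > threshold)
    percent = 100.0 * n_dense / len(breast)
    dense_area_cm2 = n_dense * PIXEL_SIZE_CM ** 2  # absolute dense area
    return percent, dense_area_cm2
```

The only subjective input is `threshold`, which is why this measure, unlike a fully automated one, depends on a trained reader.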
Fibroglandular volume and percent fibroglandular volume were measured using a fully automated technique called single X-ray absorptiometry (SXA). A complete description of the method is given elsewhere (12). Briefly, the breast is modeled as a continuous mixture of two materials, fat and fibroglandular tissue. Percent fibroglandular volume is found for each pixel by interpolating the pixel’s grey-scale value between the phantom grey-scale values corresponding to 100% adipose and 80% fibroglandular tissue at the same tissue thickness. Because the phantom is compressed along with the breast, compressed breast thickness could be determined accurately even when the compression surfaces were tilted. In the raw, unprocessed images, the breast area is identified with an automated multistep algorithm that first separates breast from non-breast regions using an automatically chosen threshold and then models the breast edge with a polynomial fit. Each image in the case-control set was manually reviewed to remove potential sources of error, such as failures of the automated breast/non-breast segmentation and image artifacts. The reproducibility of percent fibroglandular volume measured with SXA is better than 2.5%. The SXA method was used on all clinical screening mammograms during the study period, was easy to install, and did not require modifications to the clinical screening protocol.
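A simplified Python sketch of the per-pixel SXA calculation follows. It assumes a single compressed thickness for the whole breast and a single pair of phantom reference grey values (100% adipose and 80% fibroglandular) at that thickness, and it interpolates linearly on the grey scale as described above. The published method (12) accounts for thickness variation and paddle tilt, which this sketch ignores; the function names are hypothetical.

```python
PIXEL_SIZE_CM = 0.0169  # 169-um digitizer pixels


def fibroglandular_fraction(value, ref_adipose, ref_fg80):
    """Fibroglandular fraction (0..1) of one pixel.

    ref_adipose: phantom grey value for 100% adipose (0% fibroglandular)
    ref_fg80:    phantom grey value for 80% fibroglandular
    Both references are assumed to be measured at the pixel's thickness.
    """
    if ref_fg80 == ref_adipose:
        raise ValueError("phantom reference values must differ")
    # Linear interpolation on the grey scale; above 80% this extrapolates.
    frac = 0.8 * (value - ref_adipose) / (ref_fg80 - ref_adipose)
    return min(max(frac, 0.0), 1.0)  # clip to the physical range


def sxa_volumes(breast_pixels, ref_adipose, ref_fg80, thickness_cm):
    """Return (fibroglandular volume in cm^3, percent fibroglandular volume)."""
    if not breast_pixels:
        raise ValueError("no breast pixels")
    voxel_cm3 = PIXEL_SIZE_CM ** 2 * thickness_cm  # uniform thickness assumed
    fractions = [fibroglandular_fraction(v, ref_adipose, ref_fg80)
                 for v in breast_pixels]
    fg_volume = sum(fractions) * voxel_cm3
    total_volume = len(breast_pixels) * voxel_cm3
    return fg_volume, 100.0 * fg_volume / total_volume
```

Because both references come from the same image, changes in film or tube output shift the pixel values and the references together, which is the motivation for an in-image phantom.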
An additional quality control phantom was imaged weekly on each mammography system to monitor calibration stability. This custom-designed phantom consisted of seven 4-cm-thick regions with percent fibroglandular volumes ranging from 0 to 100%. No calibration changes were observed during the study period.
All statistical analyses were performed using SAS Version 9.2 (SAS Institute, Cary, North Carolina) and Stata Version 11 (StataCorp, College Station, Texas). Normalized histograms of the case and control groups were generated to help visualize the distributions of percent dense area, percent fibroglandular volume, and fibroglandular volume. Baseline clinical and demographic characteristics of cases and controls were compared by t-test or Mann-Whitney U test for continuous variables and by chi-square test for categorical variables. Conditional logistic regression was used to estimate the association between breast density and risk of breast cancer, to account for the matched study design. Breast density measurements were categorized into quintiles based on their distributions among controls. Crude odds ratios (conditional logistic model without adjustment for confounding variables) and adjusted odds ratios (adjusted for family history of breast cancer, BMI, history of breast biopsy, and age at first live birth) were calculated. To test for trends, models were also fit using quintile medians. All statistical tests were two-sided, and P<0.05 was considered statistically significant. Density measurements were also modeled in several other forms (as continuous variables, log-transformed, in quartiles, and as cubic splines) with adjustment for the other risk factors, and the models were compared using the concordance index (c-index) (17). The c-index measures the ability of a model to separate women who developed breast cancer from those who did not. It ranges from 0.5 (random chance) to 1 (women with breast cancer consistently had higher risk scores than those who remained disease free).
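Categorizing a density measure by control-based quintiles, and deriving the quintile medians used as scores in the trend test, can be sketched in Python as below. The use of `statistics.quantiles` with `method="inclusive"`, the rule that a value equal to a cutpoint falls in the upper quintile, and taking medians among controls are all assumptions; the text does not state how the SAS or Stata analyses handled these details. Fitting the conditional logistic model itself is not shown.

```python
import statistics
from bisect import bisect_right


def control_cutpoints(control_values):
    """Four cutpoints that split the control distribution into fifths."""
    return statistics.quantiles(control_values, n=5, method="inclusive")


def quintile(value, cutpoints):
    """Quintile 1..5 for any subject (case or control); ties go to the upper quintile."""
    return bisect_right(cutpoints, value) + 1


def quintile_medians(control_values, cutpoints):
    """Median control value within each quintile, used as the trend-test score."""
    groups = {q: [] for q in range(1, 6)}
    for v in control_values:
        groups[quintile(v, cutpoints)].append(v)
    return {q: statistics.median(vals) for q, vals in groups.items() if vals}
```

Cases are assigned to quintiles with the same control-derived cutpoints, so an excess of cases in quintile 5 translates directly into elevated odds ratios for that category.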
Although the c-index is considered the standard metric for comparing models, new classification methods have been developed in the past 15 years. To determine whether adding breast density measures to risk factors alone improves risk classification for both cases and controls, we calculated Pencina’s net reclassification improvement (NRI) (18). NRI assesses the degree to which the more inclusive model reclassifies cases into higher risk categories and controls into lower risk categories. The risk categories are those used to guide decisions about clinical care. For breast cancer risk assessment, NRI reflects how effectively a new risk marker might help clinicians focus more intensive surveillance and primary prevention on higher-risk women while avoiding unnecessarily frequent screening for those at lowest risk. In this study we used mutually exclusive categories of 5-year breast cancer risk centered on the 1.67% risk threshold used with the Gail model: low (<1%), average (1% to 1.66%), intermediate (1.67% to 2.49%), high (2.5% to 3.99%), and very high (≥4%) (19).
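The categorical NRI with the five risk categories listed above can be computed as in the Python sketch below. It follows the usual categorical definition: the net proportion of cases moving up at least one category plus the net proportion of controls moving down. It takes predicted 5-year risks (in percent) from the two models as given, ignores the matched design, and omits confidence intervals; the function names are hypothetical.

```python
from bisect import bisect_right

# Lower bounds (% 5-year risk) of the average, intermediate, high, very high categories.
CATEGORY_BOUNDS = (1.0, 1.67, 2.5, 4.0)


def risk_category(risk_pct):
    """0 = low (<1%), 1 = average, 2 = intermediate, 3 = high, 4 = very high (>=4%)."""
    return bisect_right(CATEGORY_BOUNDS, risk_pct)


def categorical_nri(old_risk, new_risk, is_case):
    """Return (NRI among cases, NRI among controls, total NRI) as proportions."""
    up = {True: 0, False: 0}
    down = {True: 0, False: 0}
    n = {True: 0, False: 0}
    for old, new, case in zip(old_risk, new_risk, is_case):
        case = bool(case)
        move = risk_category(new) - risk_category(old)
        n[case] += 1
        if move > 0:
            up[case] += 1
        elif move < 0:
            down[case] += 1
    if n[True] == 0 or n[False] == 0:
        raise ValueError("need at least one case and one control")
    nri_cases = (up[True] - down[True]) / n[True]       # cases should move up
    nri_controls = (down[False] - up[False]) / n[False]  # controls should move down
    return nri_cases, nri_controls, nri_cases + nri_controls
```

Reporting the case and control components separately, as this function does, is what allows a model to be judged on whether it helps women who develop cancer, women who do not, or both.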
There were 48,477 women with screening visits during the study period. Of these, 45,120 were eligible and 3,357 (6.9%) were excluded because they had opted out of the SFMR or had a history of mastectomy, breast augmentation, reconstruction, reduction, or breast cancer. Of the 45,120 eligible women, 26,787 had digitized film mammograms available. Mammograms were unavailable for several reasons, including digitizer downtime, communication errors between the digitizer and the study workstation, and poor digitization quality. Of the 26,787 women with digitized film mammograms, 16,105 had valid SXA measures on one craniocaudal examination. Reasons for not having a valid SXA measure included a missing SXA phantom in the image and compressed breast thickness out of range (above 6.7 cm or below 2 cm). To ensure that study exclusions did not introduce bias, given the known associations of age and race/ethnicity with cancer risk, we compared cases with a valid SXA measure to cases without a valid SXA measure or without a digitized mammogram. Our major concern was whether cases with available digitized mammograms differed from cases without them, since breast cancer occurs more frequently in certain age and racial/ethnic groups. The study and non-study cases did not differ in their demographic and clinical characteristics (P>0.10). After confirming that the cases were representative, we compared controls with the whole study population (regardless of the availability of SXA images) throughout the matching process and found that they did not differ from it.
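The participant flow can be checked with simple arithmetic. The Python sketch below uses only the counts reported in this paragraph; the function name is hypothetical. It confirms that 3,357 of 48,477 women is about 6.9% and that roughly 40% of women with digitized mammograms lacked a valid SXA measure.

```python
def flow_summary(screened=48_477, excluded=3_357, digitized=26_787, valid_sxa=16_105):
    """Derived proportions (in %) for the participant flow; counts from the text."""
    eligible = screened - excluded
    return {
        "eligible": eligible,
        "pct_excluded": 100 * excluded / screened,
        "pct_eligible_digitized": 100 * digitized / eligible,
        "pct_digitized_without_valid_sxa": 100 * (digitized - valid_sxa) / digitized,
    }
```

With the default counts, `eligible` is 45,120, about 59.4% of eligible women had digitized mammograms, and about 39.9% of those lacked a valid SXA measure.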
Baseline and mammographic characteristics of participants are presented in Table 1. BMI was 0.9 kg/m² higher in cases than in controls. Percent dense area and percent fibroglandular volume were negatively correlated with BMI (R = -0.35 and -0.27, respectively), whereas fibroglandular volume was positively correlated with BMI (R = 0.44). Both volumetric measures and percent dense area were significantly higher in cases than in controls. The largest fractional difference between cases and controls was for fibroglandular volume: mean fibroglandular volume was 242 cm³ for cases and 194 cm³ for controls, or 25% higher for cases. In Figure 1, percent dense area and the volumetric measures are plotted against each other. Percent dense area had a strong linear relationship with percent fibroglandular volume (R² = 0.59), whereas fibroglandular volume had a weak linear relationship with percent dense area (R² = 0.01). Relative frequency histograms of fibroglandular volume, percent fibroglandular volume, and percent dense area for cases and controls are shown in Figure 2. The histograms show better separation of cases from controls for fibroglandular volume than for percent dense area.
Table 1 also shows the univariate associations of known breast cancer risk factors. Older age at first live birth or nulliparity, and having a first-degree relative with breast cancer, were more common in cases than in controls (P < 0.05).
All three breast density measurements were significantly and positively associated with risk of breast cancer (P for trend <0.0001 for all three). Quintile odds ratios are shown in Table 2. In adjusted models, the odds ratios for breast cancer in the highest versus lowest quintiles were 2.5 (1.5-4.3), 4.1 (2.3-7.2), and 2.9 (1.7-4.9) for percent dense area, fibroglandular volume, and percent fibroglandular volume, respectively. Adjustment for BMI strengthened the associations of percent dense area and percent fibroglandular volume with risk of breast cancer but weakened the association of fibroglandular volume with risk of breast cancer.
The c-index for risk factors alone (no breast density measure) was 0.609. The modeling schemes listed in the statistics section were each used to add breast density. Every such model improved the c-index from 0.609 to between 0.638 (percent fibroglandular volume as a continuous variable) and 0.652 (fibroglandular volume as cubic splines). Although all models that included some form of density were significantly different from the model using risk factors alone (P-values > 0.01), none of the c-index values differed significantly from one another. The c-indexes for the fully adjusted models shown in Table 2 are 0.642 (percent dense area), 0.646 (fibroglandular volume), and 0.645 (percent fibroglandular volume). Interestingly, adding log percent dense area (as a continuous variable) to the fully adjusted model with log fibroglandular volume (as a continuous variable) modestly but significantly increased the c-index from 0.651 to 0.667 (P = 0.031).
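For a binary outcome, the c-index equals the proportion of case-control pairs in which the case has the higher model score, with ties counted as one half. A brute-force Python sketch is shown below. It pools all case-control pairs, whereas an analysis respecting the matched design might restrict comparisons to matched sets, and it does not reproduce the significance tests comparing c-indexes; the function name is hypothetical.

```python
def c_index(scores, outcomes):
    """Concordance index for a binary outcome (1 = developed breast cancer)."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0   # case ranked above control
            elif case_score == control_score:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(controls))
```

With 275 cases and 825 controls this loop compares about 227,000 pairs, which is fast enough that no sorting-based shortcut is needed.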
Net reclassification tables comparing risk classifications based on risk factors alone with classifications that also include percent dense area (Table 3), log fibroglandular volume (Table 4), and percent fibroglandular volume (Table 5) are shown. The natural log of fibroglandular volume was used to normalize its distribution. In these comparisons, NRIs were significant and high for log fibroglandular volume (21.1%, 95% CI: 10.6%, 31.6%, P < 0.0001) and percent fibroglandular volume (14.8%, 95% CI: 4.8%, 24.7%, P = 0.0036). Interestingly, the model including risk factors and fibroglandular volume produced significant reclassification over risk factors alone for both cases and controls, whereas the model including percent fibroglandular volume reached significance for cases only. The reclassification indexes for cases and for controls, and the overall NRI, for risk factors plus percent dense area all failed to reach statistical significance (NRI 9.6%, 95% CI: -0.7%, 19.9%, P = 0.062). There were also no statistically significant reclassifications when risk factors plus log fibroglandular volume were compared with risk factors plus log fibroglandular volume plus percent dense area (NRI 3.4%, 95% CI: -2.7%, 9.5%, P = 0.27).
Volumetric breast density has long been thought to hold the promise of stronger breast cancer risk prediction because it more accurately quantifies the true cellular composition of the breast. We found that volumetric breast density, measured as either fibroglandular volume or percent fibroglandular volume, was strongly associated with risk of breast cancer. Both statistical methods used to compare models that add breast density to clinical risk factors (c-index and NRI) showed that either fibroglandular volume measure significantly improved the accuracy of risk assessment over risk factors alone. However, the c-index did not differ significantly among models that included any measure of breast density.
Our study used a novel method for accurately measuring volumetric breast density, called SXA, which derives fibroglandular volume and percent fibroglandular volume from a calibration phantom and screening images acquired at a single X-ray tube voltage (12). Including a reference sample in each mammogram minimized the assumptions needed to account for changes in X-ray tube output and film sensitivity when calculating breast density, and it allowed automated, accurate, and precise measurement of mammographically dense volume. The in-image phantom also allows physically calibrated comparisons over time and across different makes and models of mammography systems. Longitudinal cross-calibration is particularly important in studies that monitor changes in breast density resulting from treatment. In addition, all SXA measures in this study were made in a fully automated mode, whereas the percent dense area measures were made manually by a skilled reader. Thus, the SXA method has strong advantages over percent dense area in training and labor costs and in the ability to pool data.
The odds ratios in our study are comparable to those of other studies using similar measures and categories. Barlow et al. reported BI-RADS 4 versus BI-RADS 1 odds ratios of 3.93 for premenopausal and 3.15 for postmenopausal women (2). In that study, BI-RADS 1 contained 4.3% and 10.2% of the pre- and postmenopausal populations, and BI-RADS 4 contained 14.4% and 5.3%, respectively. Because these extreme categories hold fewer women than the 20% in each quintile, the BI-RADS comparison contrasts more extreme densities than a quintile analysis does. Boyd et al. (9) reported an odds ratio of 6.1 between the lowest category (“no density”) and the highest (≥75% dense area) using a semiquantitative method (radiologists’ manual readings). As with BI-RADS, these categories contained fewer women at more extreme densities than quintiles do, with only 7% and 18% of women, respectively, in the two extreme categories. Vachon et al. (20) reported an odds ratio of 2.7 between the 4th and 1st quartiles of percent dense area, with a median of 3 years before diagnosis. Many other studies report 3- to 6-fold increases in risk between the highest and lowest density categories, varying with the method and the median time before diagnosis. A meta-analysis of available studies (10) showed that odds ratios were much stronger for percent dense area than for Wolfe grade or BI-RADS classification. Our odds ratios for percent dense area are similar to those reported by others who have examined breast density as a risk factor.
The NRI method of comparing models is relatively new but is useful in showing how models that include fibroglandular volume measures may improve risk prediction. First, risk estimates for women who did not develop breast cancer during the 5 years of the study were more accurate when absolute fibroglandular volume was included in the risk model. More accurate identification of women at low risk of breast cancer could allow them to undergo less frequent screening mammography. Second, among women who developed breast cancer, models that included either absolute fibroglandular volume or percent fibroglandular volume classified risk significantly more accurately than risk factors alone. Women at high breast cancer risk may consider, and benefit from, more frequent breast imaging, genetic counseling, or chemoprevention. Thus, more accurate risk models that include a volumetric breast density measure could lead to more women being offered, and undergoing, appropriate prevention measures.
Other methods have been proposed for measuring percent fibroglandular volume. The standard mammographic form (SMF version 2.2β, Siemens Molecular Imaging Limited) is a method used with film mammograms to quantify the volume of “interesting” dense tissue (21-22). In this method, density associated with the skin and adipose tissue is estimated and subtracted from the percent fibroglandular volume. Aitken et al. (23) compared percent dense area with the SMF measure and found a significant association for percent dense area, with an interquartile relative risk of 2.19 (95% CI 1.28-3.72), but no significant association with breast cancer for SMF. Hologic (Hologic, Inc., Bedford, MA) has marketed an adaptation of the SMF method for digital mammography called Quantra. However, it has not yet been reported whether Quantra differs substantially from SMF in its risk association. Pawluczyk et al. (24) reported a method for measuring volumetric breast density that relies on a precise description of the digital mammography system’s geometry, breast thickness, and X-ray characteristics. No associations with risk have been reported for the Pawluczyk method or for other volumetric methods using mammography (24-25), magnetic resonance imaging (26-27), tomosynthesis (28), or ultrasound tomography (29). It is therefore important to note that the results of this study may not generalize to all measures of fibroglandular volume, since these measures vary in how the dense volume is modeled (with or without skin, with or without total water), in their quality control methods (absolute phantom standards or relative standards), and in their calibration standards (adipose tissue or fatty acid for 0% fibroglandular volume; parenchymal tissue or water for 100% fibroglandular volume).
Fibroglandular volume and percent dense area may capture different variations in breast morphology related to cancer risk. Fibroglandular volume should directly capture risk related to increasing volumes of stromal tissue and would therefore be expected to be positively associated with BMI. Percent dense area is only weakly related to fibroglandular volume and may capture additional risk related to the distribution of stromal tissue in the breast. Whereas percent dense area assumes that dark areas of a mammogram are composed of lipid, SXA directly measures the density of areas that appear to be only fat. Adipose tissue contains water, which contributes to density in the SXA calculations. As shown in Table 1, percent fibroglandular volume was higher on average than percent dense area (44.5% versus 27.1% for controls). This is likely because SXA density estimates include all lean tissue, including water in adipose tissue, whereas percent dense area does not. As shown in Figure 2, the lower limit is zero for percent dense area but approximately 18% for percent fibroglandular volume, because even a fatty breast contains approximately 18% water.
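The floor of about 18% can be illustrated with an idealized two-compartment mixing model. Assume, as a simplification and not as a description of the SXA calibration, that a fraction w of adipose tissue is water that SXA registers as fibroglandular-equivalent, and let f be the true volume fraction of fibroglandular tissue. Then:

```latex
\[
\%\mathrm{FGV}_{\text{measured}} \approx 100\,\bigl[\, f + w\,(1 - f) \,\bigr],
\qquad
f = 0 \;\Rightarrow\; \%\mathrm{FGV}_{\text{measured}} \approx 100\,w \approx 18\%
\quad \text{for } w \approx 0.18 .
\]
```

Under this simplification the measured scale runs from about 100w percent to 100% rather than from 0 to 100%, which corresponds to the reduced dynamic range noted for this approach.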
Although including adipose water in the fibroglandular compartment may limit dynamic range to some degree, the approach is less noisy than more complicated models that try to extract only parenchymal tissue, since no approximations of skin thickness or fat hydration are used in the density calculation. Free water in adipose tissue does vary among individuals, so a single constant value cannot be assumed in general. In magnetic resonance imaging studies comparing total breast water content with either delineated fibroglandular volume or percent dense area, the water content of adipose tissue (at zero percent dense area) was found to be 14% by Graham et al. (30) and 18% by Boyd et al. (31).
Our study has limitations. First, about 40% of women with digitized mammograms were excluded for lack of valid phantom calibration, which arose from technical issues with the position and construction of the phantom. These issues have since been resolved in later-generation phantoms by placing the phantom on top of, rather than between, the compression paddles and by eliminating moving parts (32). Second, our results were based on digitized film data. Most systems in the US are now full-field digital mammography systems, which have improved linearity and dynamic range. We therefore expect these results to be reproducible on newer mammography systems with similar, if not better, accuracy; studies using SXA on full-field digital mammography systems are ongoing. Finally, BI-RADS density scores were not available, so comparisons with this standard clinical measure could not be made.
We conclude that fibroglandular volume and percent fibroglandular volume, estimated with the SXA method, are strong predictors of breast cancer risk and are more accurate predictors of that risk than percent dense area.
The authors are grateful to the radiology technologists at the California Pacific Medical Center for their assistance in performing the protocol, and to Alice LaRocca for managing the study. Funding for this study came from the California Pacific Medical Center, the DaCosta Foundation for Prevention of Breast Cancer, Lilly Pharmaceuticals, Roche, and National Cancer Institute Grant no. U01 CA63740.