To assess the agreement between Quantra™ (Hologic Inc., Bedford, MA) and Breast Imaging Reporting and Data Systems (BI-RADS®) and the performance of Quantra at reproducing BI-RADS mammographic breast density (MBD) assessment.
MBD assessment was performed using Quantra and BI-RADS. BI-RADS assessment was performed in two phases (1314 and 292 cases, respectively). Kappa was used to assess the interreader agreement and the agreement between Quantra and BI-RADS, and receiver-operating characteristics analysis was used to assess the performance of Quantra at reproducing BI-RADS rating.
Agreement (weighted kappa) between BI-RADS and Quantra in Phase 1 was 0.75 [95% confidence interval (CI): 0.73–0.78] and 0.85 (95% CI: 0.80–0.90) on four- and two-grade scales, respectively. The corresponding agreement in Phase 2 was 0.79 (95% CI: 0.75–0.84) and 0.84 (95% CI: 0.79–0.87) using the majority report. In Phase 1, Quantra demonstrated 93.2% sensitivity and 86.1% specificity for BI-RADS on a two-grade scale (1–2 vs 3–4). In Phase 2, it demonstrated 91.3% sensitivity and 83.6% specificity on a two-grade scale.
Quantra is limited in reproducing BI-RADS rating on a four-grade scale; however, it highly reproduces BI-RADS assessment on a two-grade scale.
Quantra (v. 2.0) is a poor predictor of BI-RADS assessment on a four-grade scale, but well reproduces BI-RADS rating on a two-grade scale. Therefore, it should be considered a tool for two-grade scale MBD classification.
Breast density is associated with breast cancer risk, interval breast cancer and mammographic sensitivity.1,2 It has been shown that 30% reduction in breast cancer mortality is partly attributable to improved knowledge of its risk factors and advances in early detection and treatment strategies.3,4 Breast density refers to the proportion of the breast that is composed of fibroglandular tissue5 and is related to established breast cancer risk factors such as genetics, hormonal agents and lifestyle characteristics.6,7 Most importantly, breast density reduction has been shown to be a prognostic marker of response to adjuvant tamoxifen therapy in post-menopausal females.8,9 Therefore, adding breast density information to existing breast cancer risk-prediction models may improve the stratification of breast cancer risk,10 which may lead to earlier adoption of prevention and control measures. Breast density information can also be used for tailoring clinical decisions such as screening intervals for females who are asymptomatic and the selection of more appropriate imaging pathways such as ultrasound,11 MRI12 or digital breast tomosynthesis (DBT)13 to enhance the visualization and evaluation of the dense tissue for features of cancer.14
Currently, breast density notification laws are effective in 22 US states, and the most economically viable and common method of obtaining such information is through mammographic breast density (MBD) assessment.5 Qualitative, semi-automated and fully automated approaches are available for acquiring MBD information from mammograms.15 Qualitative approaches such as Breast Imaging Reporting and Data Systems (BI-RADS®) subjectively assign MBD grades based on the perceived relative proportions of fatty and dense tissue and ductal distribution. Semi-automated techniques involve the user outlining the dense area using computer assistance to calculate percentage MBD. Fully automated area-based methods such as MedDensity and Autodensity measure percentage MBD, whilst others such as Quantra™ (Hologic Inc., Bedford, MA) and Volpara™ (Volpara Solutions, Mãkatina Company, Wellington, New Zealand) perform volumetric breast density measurement.15 All approaches have shown that breast density is a significant risk factor for breast cancer.16,17 However, for breast density information to be used consistently for assessing risk and tailoring screening pathways, it is essential that methods of assessment are reliable and reproducible, as inaccurate calculation could adversely affect the patient outcome and clinical management. 
The BI-RADS approach commonly used in the clinical practice is subjective and inconsistent.18,19 Semi-automated methods require user input and also suffer from subjective variability and add labour-intensive software into the already complex radiological reporting ecosystem.20 There are also concerns that measurement of MBD as percentage area breast density by area-based semi-automated and automated methods is affected by breast size and treatment of the breast as a two-dimensional structure and therefore may not be representative of the quantity of the dense tissue in the breast.21 To avoid variability and provide a more consistent measure of the quantity of the dense tissue in the breast, automated volumetric tools such as Quantra have been developed.
Quantra is designed to estimate volumetric breast density and generate pseudo-BI-RADS grades. For this reason, it is important when using Quantra for MBD assessment in clinical practice to measure to what degree its MBD measures compare with radiologist BI-RADS grading. Only two peer-reviewed articles have reported the performance of Quantra in reproducing radiologists’ fourth edition BI-RADS assessment and one on the fifth edition of BI-RADS rating scale.22 Each of these studies was laboratory based, employed a relatively small sample size and reported different sensitivity, specificity and cut-off values for Quantra in reproducing BI-RADS categories on a two-grade scale (BI-RADS 1–2 vs 3–4), but none reported Quantra performance on a four-grade scale (BI-RADS 1–4).23–25 These studies have recommended further work to assess the performance of Quantra in order to ensure that it is fit for the purpose of clinical MBD assessment. Furthermore, these studies employed older versions of Quantra (v. 1). A recalibrated and upgraded newer version (v. 2.0) has been developed; however, it is unknown how well this new version will perform in MBD assessment. The recommendations for further studies and the deficiencies highlighted above underscore the need for further evaluation of Quantra. Also, BI-RADS is observer dependent,18,19 and extraneous variables such as artificiality of the study environment and extent of scrutiny of one's performance influence reader behaviour in a laboratory-type experimental setting (laboratory effect).26,27 Since these variables cause discrepancy between results obtained in a laboratory-type environment and real-life clinical situations,26,27 it is imperative to assess how Quantra assessment compares with radiologist-assigned BI-RADS categories in a real-life clinical environment.
In addition, the association of high-grade breast densities (BI-RADS 3–4) with cancer risk and mammographic masking effect emphasizes the need for further assessment of the cut-off value for the correct differentiation between high- and low-grade BI-RADS categories using Quantra. Therefore, this study aimed to assess the agreement between Quantra and BI-RADS. It also aimed to assess the performance of Quantra in reproducing BI-RADS assessment and the cut-off value for Quantra that best differentiates high- and low-grade BI-RADS categories.
This study involved 1314 females who underwent screening mammography between March and July 2014. All females gave both oral and written consent for use of their mammograms for research purposes. Mammograms were acquired using the Hologic® Selenia® Dimensions® equipment (Hologic Inc.). The equipment comprised a tungsten (W) anode and rhodium (Rh) and silver (Ag) filters. All mammograms were acquired in standard craniocaudal and mediolateral oblique projections with automatic image-acquisition parameters [tube potential (kVp), tube current (mAs) and filter type and thickness].
Two methods of MBD measurement were used: automated breast density assessment and qualitative assessment by radiologists. Automated breast density assessment from the raw projection images was performed using Quantra v. 2.0 (Hologic Inc.). Quantra employs physical modelling of mammographic systems to calculate volumetric breast density. Volumetric breast density measurement is based on the physical composition of the breast and compressed breast thickness, as well as the information related to the X-ray spectra such as tube potential (kVp), tube current (mAs) and filter type and thickness.28,29 Quantra estimates the thickness of the dense tissue above each pixel in each mammographic image and combines these pixel values to compute the total volume of the dense tissue in the breast [Vfg (cm3)]. It examines the entire imaged breast silhouette to estimate the entire volume of the breast [Vbd (cm3)] and calculates percentage volumetric breast density (Vbd %) as the ratio of the estimated dense tissue volume to the total breast volume, expressed as a percentage. The software also calculates the area percentage breast density (Abd %) as the ratio of the area of the fibroglandular tissue to the total area of the breast. Quantra segments the estimated volumetric breast density to generate fractional quantized breast density (q_abd) values for each mammographic view. These q_abd estimates from all views are averaged to produce a total q_abd value for each patient, which is then used to generate a total quantized breast density (Q_abd) value for that patient. Quantra estimates the gross breast density make-up of a patient (total Q_abd) by rounding off the total q_abd value for that patient to the nearest integer. Briefly, total q_abd values are rounded off as follows: ≤1.44 represents a total Q_abd value of 1; 1.45–2.44 represents a total Q_abd value of 2; 2.45–3.44 represents a total Q_abd value of 3; and ≥3.45 represents a total Q_abd value of 4.
Quantra's Q_abd values range from 1 to 4 and are used to map BI-RADS 1–4 categories, respectively. For this study, the total Q_abd value estimated by Quantra for each patient was matched to radiologists' BI-RADS assessment for that patient and used to assess the ability of Quantra to reproduce BI-RADS assessment.
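The quantization described above can be illustrated with a short Python sketch. It is not Quantra's implementation: the function names, the sample per-view values and the choice to apply the bands at two decimal places are assumptions made for illustration, and only the cut points (1.44, 2.44, 3.44) and the averaging of per-view q_abd values are taken from the description in the text.

```python
from statistics import mean

# Upper bounds of the total q_abd bands quoted in the text, paired with the
# total Q_abd grade they map to. Names are illustrative, not Quantra's own.
Q_ABD_UPPER_BOUNDS = ((1.44, 1), (2.44, 2), (3.44, 3))


def percent_volumetric_density(v_fg_cm3: float, v_breast_cm3: float) -> float:
    """Dense tissue volume expressed as a percentage of total breast volume."""
    if v_breast_cm3 <= 0:
        raise ValueError("breast volume must be positive")
    return 100.0 * v_fg_cm3 / v_breast_cm3


def total_q_abd(per_view_q_abd) -> float:
    """Average the per-view q_abd values to obtain the patient's total q_abd."""
    return mean(per_view_q_abd)


def quantize(total: float) -> int:
    """Map a total q_abd value to the total Q_abd grade (1-4)."""
    value = round(total, 2)  # assumption: bands are applied at two decimals
    for upper, grade in Q_ABD_UPPER_BOUNDS:
        if value <= upper:
            return grade
    return 4  # 3.45 and above


views = [2.1, 2.3, 2.6, 2.4]  # made-up per-view q_abd values for one patient
print(quantize(total_q_abd(views)))  # prints 2
```

Because the bands are narrow, a patient whose total q_abd sits just either side of a cut point (for example 2.44 vs 2.45) receives a different pseudo-BI-RADS grade despite an almost identical underlying measurement.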
Qualitative MBD assessment was performed by radiologists using the fourth edition BI-RADS scheme on a discrete scale (1–4). For this study, BI-RADS breast density assessment was performed in two phases. In the first phase, 1314 cases had their MBD assigned in a single reading during clinical mammography interpretation and were stored in a database; each case was read by 1 radiologist, and in total, 6 breast radiologists were involved across all these cases. All readers were Royal Australian and New Zealand College of Radiology-certified breast radiologists. Readers were working in the same practice and in the New South Wales breast-screening programme (BreastScreen, NSW, Australia). The characteristics of the readers and the number of cases each reader assessed in the present study are shown in Table 1. These BI-RADS MBD assessment reports were retrieved from the database and matched with the breast density grades assigned by Quantra.
In Phase 2, a subsample of 300 mammograms was selected from the 1314 cases and the MBD grades assigned to these cases in Phase 1 recorded. This subsample consisted of the first consecutive 60, 80, 90 and 70 cases rated BI-RADS 1, 2, 3 and 4, respectively, in Phase 1. The selection was to ensure that there were a substantial number of cases in each category and to fairly represent their prevalence in the original data set in Phase 1. This subsample was assessed by three radiologists (MR, DD and JK) blinded to the results of Quantra assessment and a majority report was generated (consensus of at least two radiologists); two of the radiologists (MR and JK) participated in Phase 1. A repeat assessment of the first consecutive 100 cases in the data set was performed by each radiologist 5 months after the initial assessment.
The agreement between each radiologist and the majority report was assessed by comparing their first readings with the majority report, and interreader agreement was assessed by comparing the first assessments of the radiologists in pairs. Intrareader agreement was assessed by comparing the first and second readings of each radiologist. Data of each reader were grouped into two categories (1–2 and 3–4) to enable the assessment of reader agreement on a two-category scale (1–2 vs 3–4). Also, the BI-RADS assessment in Phase 1 and the majority report in Phase 2 were grouped into a two-grade scale (1–2 and 3–4) to assess the performance of Quantra in reproducing BI-RADS categories on a two-grade scale.
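The two data-preparation steps described above, forming a majority report from three readers and collapsing the four BI-RADS grades into two, might be sketched as follows. The function names are hypothetical, and returning `None` when all three readers disagree is an assumption; the text does not say how such cases were handled.

```python
from collections import Counter


def majority_report(grades):
    """Return the grade given by at least two of the three readers.

    Returns None when all three readers disagree (an assumption; the
    handling of such cases is not described in the text).
    """
    grade, votes = Counter(grades).most_common(1)[0]
    return grade if votes >= 2 else None


def two_grade(birads: int) -> str:
    """Collapse BI-RADS 1-4 into the two-grade scale (1-2 vs 3-4)."""
    if birads not in (1, 2, 3, 4):
        raise ValueError("BI-RADS density grade must be 1, 2, 3 or 4")
    return "low" if birads <= 2 else "high"


# Made-up readings from three readers for two cases.
print(majority_report((2, 3, 3)))             # prints 3
print(two_grade(majority_report((2, 3, 3))))  # prints high
```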
The Statistical Package for Social Sciences v. 21 (IBM Corp., New York, NY; formerly SPSS Inc., Chicago, IL) was used for data analysis. Weighted kappa (κw) was also used to calculate the agreement between each radiologist and the majority report in Phase 2 as well as intrareader and interreader agreement. The agreement between Quantra and each reader as well as their majority report was calculated using both simple and weighted kappa. Simple kappa was calculated to show the percentage agreement between Quantra and BI-RADS, whilst weighted kappa was calculated to account for agreement beyond chance expectation. The agreement (κ) between the majority report generated in Phase 2 and the initial BI-RADS reports provided for these cases in Phase 1 was also calculated. Agreement was assessed on a four-grade scale (BI-RADS 1–4) and on a two-grade scale (BI-RADS 1–2 vs 3–4). Kappa was interpreted as designated by Landis and Koch:30 0.00–0.20 (slight), 0.21–0.40 (fair), 0.41–0.60 (moderate), 0.61–0.80 (substantial) and 0.81–1.00 (almost perfect) agreements. Receiver-operating characteristics (ROC) analysis was also used to assess the performance of Quantra in reproducing BI-RADS assessments in both phases. To assess the performance of Quantra in reproducing BI-RADS on a two-grade scale, BI-RADS 1–2 were grouped together, as were 3–4. This enabled assessment of the sensitivity, specificity and cut-off value for the correct differentiation of high- and low-grade BI-RADS categories using Quantra. The area under the curve (AUC) was used as a figure of merit to assess the overall performance of Quantra in reproducing BI-RADS. AUC was interpreted as described by Fan et al:31 1 (perfect); 0.9–0.99 (excellent); 0.8–0.89 (good); 0.7–0.79 (fair); 0.51–0.69 (poor); and 0.5 (worthless). The University of Sydney institutional review board approved the study (IRB: 2014/481).
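For readers who wish to reproduce this type of agreement analysis outside SPSS, a minimal pure-Python sketch of Cohen's kappa is given here. It is an illustration rather than the analysis code used in this study. The weighting scheme is not stated in the text, so linear weights are assumed, and the function names and sample ratings are made up. In the standard formulation, both simple and weighted kappa correct for chance agreement; the weighted form additionally gives partial credit when two ratings fall in neighbouring categories.

```python
def cohen_kappa(rater_a, rater_b, categories=(1, 2, 3, 4), weighted=False):
    """Cohen's kappa for two raters; linear weights when weighted=True."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Joint distribution of the two raters' grades, as proportions.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        if weighted:
            return 1.0 - abs(i - j) / (k - 1)  # assumed linear weighting
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    p_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    return (p_obs - p_exp) / (1.0 - p_exp)


def landis_koch(kappa):
    """Verbal label for kappa using the Landis and Koch bands quoted above."""
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


birads = [1, 2, 3, 4]   # made-up radiologist grades
quantra = [2, 2, 3, 4]  # made-up Quantra grades; one adjacent-category miss
print(round(cohen_kappa(birads, quantra), 2))                 # prints 0.67
print(round(cohen_kappa(birads, quantra, weighted=True), 2))  # prints 0.78
```

In this toy example, the single disagreement is between adjacent grades, so the weighted statistic is higher than the simple one. The same pattern would be expected whenever most disagreements are one category apart.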
The 1314 cases used in Phase 1 comprised 13.8% (n=181), 30.1% (n=396), 37.8% (n=497) and 18.3% (n=240) for BI-RADS 1, 2, 3 and 4, respectively. MBD assessment readings were not provided for 8 cases in Phase 2, leaving a total of 292 cases that were assessed by all three radiologists. The majority report of the three radiologists in this phase consisted of 9.6% (n=28), 35.3% (n=103), 27.1% (n=79) and 28.1% (n=82) for BI-RADS 1, 2, 3 and 4, respectively. The number of cases classified as BI-RADS 1, 2, 3 and 4 ranged from 24 to 60 (mean=39.7), 80–105 (mean=90.7), 81–85 (mean=82.3) and 60–82 (mean=79.3), respectively. Weighted kappa analysis in Phase 2 demonstrated an almost perfect intrareader agreement (κ), as shown in Table 2.
A substantial to almost perfect agreement was observed between the individual radiologist and the majority report on a four-grade scale from 0.80 (95% CI: 0.76–0.83) to 0.89 (95% CI: 0.84–0.93) and almost perfect agreement on a two-grade scale from 0.82 (95% CI: 0.77–0.87) to 0.90 (95% CI: 0.85–0.94). There was substantial interreader agreement on a four-grade scale from 0.66 (95% CI: 0.62–0.71) to 0.75 (95% CI: 0.70–0.81) and substantial 0.77 (95% CI: 0.73–0.82) to almost perfect 0.89 (95% CI: 0.84–0.93) interreader agreement on a two-grade scale (Table 2).
The agreement (κw) between the Phase 1 BI-RADS assessment and the majority report in Phase 2 was 0.78 (95% CI: 0.73–0.85) on a four-grade scale and 0.85 (95% CI: 0.79–0.88) on a two-grade scale.
Quantra percentage volumetric density estimates are converted to a pseudo-BI-RADS four-point scale based on preset percentages; for this study, the corresponding Quantra percentage volumetric density ranges were 1–7%, 6–22%, 13–29% and 25–44% for BI-RADS 1, 2, 3 and 4, respectively. The distribution of Quantra classification of the Phase 1 and Phase 2 BI-RADS assessment is shown in Figure 1a,b, respectively.
Agreement (κw) between BI-RADS and Quantra in Phase 1 was 0.75 (95% CI: 0.73–0.78) and 0.85 (95% CI: 0.80–0.90) on four- and two-grade scales, respectively. The corresponding simple kappa was 0.53 (95% CI: 0.50–0.57) and 0.59 (95%CI: 0.55–0.68). The agreement between BI-RADS majority report and Quantra in Phase 2 is shown in Table 3. The agreement (κw) between readers and Quantra ranged from 0.77 (95% CI: 0.72–0.81) to 0.81 (95% CI: 0.76–0.84) on a four-grade scale and from 0.81 (95% CI: 0.74–0.86) to 0.83 (95% CI: 0.79–0.87) on a two-grade scale (Table 3).
Considering the BI-RADS assessment in Phase 1 as the assumed truth, ROC analyses show that on a four-grade scale (BI-RADS 1–4), Quantra demonstrated a sensitivity of 25.4%, 81.1%, 90.7% and 56.7% for BI-RADS 1, 2, 3 and 4, respectively. On a two-grade scale (1–2 vs 3–4), Quantra demonstrated 93.2% sensitivity and 86.1% specificity (Table 4), and a cut-off value of 20% allowed differentiation between high- and low-grade BI-RADS categories.
Restricting the analyses to the majority report in Phase 2, Quantra demonstrated a sensitivity of 35.7%, 91.2%, 88.6% and 50.3% for BI-RADS 1, 2, 3 and 4, respectively. On a two-grade scale (1–2 vs 3–4), it demonstrated 91.3% sensitivity and 83.6% specificity (Table 5), and a cut-off value of 20.5% allowed differentiation between high- and low-grade BI-RADS categories.
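The two-grade sensitivity and specificity reported for both phases can be understood as the result of applying a single cut-off to the Quantra density value and comparing the outcome with the BI-RADS grouping (1–2 vs 3–4). The sketch below illustrates that calculation on made-up data. It assumes that the cut-off is applied to the percentage volumetric density and that values equal to the cut-off count as high density; neither detail is stated in the text, and the function name is hypothetical.

```python
def sensitivity_specificity(density_percent, birads, cutoff=20.0):
    """Sensitivity and specificity of a density cut-off for BI-RADS 3-4.

    Assumption: a case is called high density when its value is at or
    above the cut-off. The reference standard is BI-RADS 3-4 (high)
    versus BI-RADS 1-2 (low).
    """
    tp = fn = tn = fp = 0
    for value, grade in zip(density_percent, birads):
        predicted_high = value >= cutoff
        actual_high = grade >= 3
        if actual_high and predicted_high:
            tp += 1
        elif actual_high:
            fn += 1
        elif predicted_high:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Made-up density values (%) and BI-RADS grades for eight cases.
density = [5, 12, 19, 25, 30, 18, 22, 40]
grades = [1, 2, 2, 3, 4, 3, 2, 4]
print(sensitivity_specificity(density, grades))  # prints (0.75, 0.75)
```

Repeating this calculation over a range of candidate cut-offs traces out the ROC curve, from which a cut-off balancing sensitivity and specificity can be chosen.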
Since BI-RADS depends on individual radiologist perception of MBD,18,19,23 it is intuitive that an automated method such as Quantra, which is modelled with input from radiologists and selectively measures the volume of the dense tissue, may demonstrate greater reproducibility. However, this alone is not sufficient to warrant clinical implementation; automated measures must reflect the clinical opinion of the experienced breast radiologist consistently. In consideration of this, the present work examines how well the automated measures reflect BI-RADS assessment and treats BI-RADS as the “truth”. Simple kappa (a measure of percentage agreement) demonstrated moderate agreement between Quantra and BI-RADS. Weighted kappa, which provides a measure of agreement beyond chance expectation, demonstrated a good level of agreement between these two methodologies on a two-grade scale (1–2 vs 3–4). Similarly, ROC analysis showed that Quantra is limited in reproducing BI-RADS rating on a four-grade scale, but very well reproduces BI-RADS on a two-grade scale (Table 4). The area under the curve (AUC=0.917; 95% CI: 0.901–0.933) indicates that Quantra is a promising tool for MBD assessment on a two-grade scale in the clinical setting.
On a four-grade scale, Quantra demonstrated low sensitivity but high specificity for BI-RADS 1 and 4; it showed considerably high sensitivity for BI-RADS 2 and 3 in both phases of the assessment (Tables 4 and 5). Clinically, the differentiation between BI-RADS 2 and 3 is the most important, as it determines clinical decisions made from MBD assessment such as identification of females at elevated risk of breast cancer.10 It also guides clinicians in selecting females who may require additional imaging with ultrasound or MRI, as well as the next screening round for females who are asymptomatic.14 In both phases of assessment, Quantra correctly classified these categories, which form the midpoint between high and low densities, with a few cases of BI-RADS 2 graded as 3 and vice versa (Figure 1a,b).
Although there was an excellent distinction between low-grade and high-grade densities, Quantra graded most of the BI-RADS 1 categories as BI-RADS 2 in both phases (Figure 1a,b). Restricting the analysis to the Phase 2 majority report, we observed a 10.3 percentage point increase in sensitivity (from 25.4% to 35.7%) and a 4% decrease in specificity for BI-RADS 1 (Table 5). The result also shows that Quantra classified 42% and 47% of BI-RADS 4 as 3 in Phases 1 and 2, respectively. It is possible that the discrepancy between BI-RADS and Quantra for BI-RADS 3 and 4 may be due to radiologist assessment being influenced by perceived masking effect (concealment of cancer due to dense tissue superimposition), which usually warrants further imaging,14 a phenomenon not considered by Quantra. A further observation is that a small difference in Quantra's total fractional quantized breast density (q_abd) values contributed to the discrepancy between BI-RADS and Quantra rating. This effect was more pronounced between BI-RADS 3 and 4 (Figure 2d) than between BI-RADS 2 and 3 (Figure 2c). Therefore, it can be argued that Quantra has a narrow threshold for BI-RADS 1 and a wider threshold for BI-RADS 3 and may need to be recalibrated to improve the sensitivity for BI-RADS 1 and 4.
Whilst it is desirable for Quantra to absolutely reproduce BI-RADS, it should be remembered that Quantra measures volumetric breast density, which may be more related to the quantity of the dense tissue that is the major determinant of breast cancer risk.32–34 The association between breast density and cancer risk emphasizes the need to consider dense tissue volume when estimating breast density, particularly for the purpose of risk assessment. This is because even with the same size of the projected dense area, the thickness and volume of the dense tissue may not necessarily be the same in different females35 and is perhaps the reason why volumetric MBD assessment methods are better risk predictors from MBD calculations than visual and area-based approaches.16
The result of this study is comparable with that of previous studies assessing the performance of Quantra in reproducing BI-RADS. Unlike previous studies that focused on the differentiation of high- and low-grade densities, the present work has also shown the performance of Quantra on a four-grade scale (reproducibility of each BI-RADS category). An important observation in the present study is that Quantra is a poor predictor of BI-RADS on a four-grade scale, but a small difference in the total fractional quantized breast density (q_abd) value may be a significant cause of this reduced reproducibility. Therefore, Quantra should be considered a tool for two-grade scale MBD classification.
Previous studies have reported sensitivity and specificity on a two-grade scale ranging from 78.0% to 89.8% and 84.6–88.9%, respectively, for Quantra. The reported cut-off values for Quantra in differentiating high-grade from low-grade BI-RADS categories ranged from 13% to 22%.23–25,36 The variation in sensitivity, specificity and cut-off values in the different studies may be due to the varying sample sizes, test sets and radiologists. For example, considering the majority report in Phase 2, a cut-off value of 20.5% was observed. Even though Quantra studies have shown variable cut-off values, the differences in the majority of the studies are negligible (19.5–22%);23,24,36 only one study has reported a lower cut-off value (13%).25 Ciatto et al23 have questioned this lower cut-off value, implicating differences in the radiologist panels as the cause for the lower value. Furthermore, the 13% cut-off reported by Rafferty et al25 has not yet been peer reviewed.
The first thorough independent evaluation of Quantra was performed by Ciatto et al23 using a test set of 481 mammograms and 11 radiologists and reported correct classification of 88.6% of D1–2 (almost fatty to scattered fibroglandular tissue) and 89.8% of D3–4 (heterogeneously dense to extremely dense breasts) BI-RADS categories by Quantra. The authors assessed the performance of Quantra in reproducing BI-RADS assessment on a two-category scale, neglecting the four-category scale used by radiologists clinically. In addition, their study was based on the first version of Quantra (v. 1.0) and on a test-set reading, which may differ from the actual clinical scenario. Technology is constantly evolving and newer versions of Quantra have been developed, including recalibration and upgrade in technology to improve performance. Without further studies, it is impossible to establish whether a new version of Quantra is fit for the purpose of MBD assessment or whether it demonstrates improvement over older versions. The present study differs from the only peer-reviewed studies by Ciatto et al23 and Regini et al24 in that it is based on a new version of Quantra (v. 2.0) and evaluated its performance in reproducing BI-RADS rating on a four-grade scale commonly used in the clinical setting. It also provides evidence for the performance of Quantra in reproducing BI-RADS categories assigned clinically and in a test-set reading. Thus, the present study has provided a new insight into the clinical performance of a newer version of Quantra (v2.0), highlighting its limitation on a four-grade scale and improved performance on a two-grade scale. Quantra has been shown to be a reproducible tool for volumetric breast density assessment.37–39 Findings suggest that Quantra may be a useful supplementary tool to BI-RADS for MBD assessment. 
Automated MBD assessment may also eliminate variability in MBD classification, meaning a more consistent use of breast density information in clinical decision-making.
Limitations of this study include that six radiologists from the same practice were involved across the BI-RADS breast density reports in Phase 1. Variability in MBD assessment has been reported in the literature with interreader agreement (κ) ranging from 0.19 to 0.91.15,18,19 This interreader variability is also evident in Phase 2 of the present study (Table 2). The high level of BI-RADS interreader agreement observed in the present study may be due to all readers working in the same practice. It is possible that they would demonstrate considerable interreader variability with readers from different practices, limiting the generalizability of the results. The majority report of several radiologists is a better reference standard for assessing the performance of Quantra than a single reading. Therefore, had we used the majority report of the six radiologists in Phase 1, some of the BI-RADS categories might have been different. However, we have also assessed the performance of Quantra using the majority report in Phase 2. It is also possible that the increased sensitivity of Quantra for BI-RADS 1 and 2 in Phase 2 may be due to the small sample size compared with the sample in Phase 1 and the laboratory effect.27
Conversely, a strength of the study is that BI-RADS categories were assigned by breast radiologists who use the BI-RADS scheme on a daily basis. Also, the high level of intrareader and interreader agreement, the agreement between each radiologist and Quantra and the agreement between the majority report in Phase 2 and the report in Phase 1 show that the latter can be relied on to assess the performance of Quantra in reproducing BI-RADS categories. A further strength of this study is the large sample size compared with previous studies. Also, since extraneous variables influence radiologist behaviour in a laboratory environment,26 the fact that we assessed the performance of Quantra in reproducing BI-RADS with categories assigned clinically and in a laboratory setting adds to the strengths of this work.
The result demonstrates that Quantra is very good at reproducing BI-RADS classification on a two-grade scale. A cut-off value of 20% allowed better differentiation between high-grade and low-grade categories using Quantra. Although discrepancies are evident between BI-RADS 1 and 2, and between BI-RADS 3 and 4, Quantra is a promising adjunct to BI-RADS for MBD assessment in clinical practice.