Millions of women receive clinical breast examination (CBE) each year, as either a breast cancer screening test or a diagnostic test for breast symptoms. While screening CBE had moderately high specificity (~94%) in clinical trials, community clinicians may be comparatively inexperienced and may conduct relatively brief examinations, resulting in even higher specificity but lower sensitivity.
To estimate the specificity of screening and diagnostic CBE in clinical practice and identify patient factors associated with specificity.
Retrospective cohort study.
Breast-cancer-free female health plan enrollees in 5 states (WA, OR, CA, MA, and MN) who received CBE (N=1,484).
Medical charts were abstracted to ascertain breast cancer risk factors, examination purpose (screening vs diagnostic), and results (true-negative vs false-positive). Women were considered “average-risk” if they had neither a family history of breast cancer nor a prior breast biopsy and “increased-risk” otherwise.
Among average- and increased-risk women, respectively, the specificity (true-negative proportion) of screening CBE was 99.4% [95% confidence interval (CI): 98.8–99.7%] and 97.1% (95% CI: 95.7–98.0%), and the specificity of diagnostic CBE was 68.7% (95% CI: 59.7–76.5%) and 57.1% (95% CI: 51.1–63.0%). The odds of a true-negative screening CBE (specificity) were significantly lower among women at increased risk of breast cancer (adjusted odds ratio 0.21; 95% CI: 0.10–0.46).
Screening CBE likely has higher specificity among community clinicians compared to examiners in clinical trials of breast cancer screening, even among women at increased breast cancer risk. Highly specific examinations, however, may have relatively low sensitivity for breast cancer. Diagnostic CBE, meanwhile, is relatively nonspecific.
The American Cancer Society recommends that asymptomatic women over the age of 20 years receive regular clinical breast examinations (CBEs) as screening tests for breast cancer.1 The U.S. Preventive Services Task Force recommends screening mammography with or without CBE for women over 40 years old.2 Although national CBE rates have declined slightly with increasing mammography use,3 CBE screening is still performed on millions of U.S. women per year.4 Many others receive diagnostic CBE to evaluate breast symptoms or mammographic abnormalities. With limited access to mammography in most developing countries, CBE may be the only practical means of population-based breast cancer screening in many parts of the world.5
An accurate estimate of CBE specificity is critical to judging the likely effectiveness of CBE in community practice. Relatively high specificity may be associated with low sensitivity, compromising cancer detection, whereas relatively low specificity may lead to excessive diagnostic testing and unnecessary patient anxiety. In clinical trials of breast cancer screening, the sensitivity and specificity of screening CBE were 54 and 94%, respectively,6 but clinical trial results may reflect CBE performance under idealized circumstances that differ substantially from actual clinical practice. Some community-based studies suggest that CBE sensitivity may be substantially lower in actual practice than in clinical trials.7–10 Because sensitivity is typically inversely related to specificity, relatively lower sensitivity of CBE in the community would suggest that specificity may be higher. However, community-based studies of screening CBE specificity have been limited by sampling only low-income women,7 women from single U.S. health plans,11,12 or women who were examined by specially trained nurses12 or a single radiologist.8
We estimated the specificity of screening and diagnostic CBE among women enrolled in 6 large U.S. health plans and determined patient and examination characteristics associated with specificity. We hypothesized that screening CBE specificity would be higher in community settings than has been reported in clinical trials because examination duration may be relatively brief in community practice,13 leading to a reduced ability to detect abnormalities.14 Conversely, we hypothesized that the specificity of diagnostic CBE would be relatively low because of patient and clinician concerns about missed breast cancer. We also hypothesized that specificity of screening CBE would be lower among women with clinical factors associated with increased breast cancer risk (e.g., family history of breast cancer) because clinicians may interpret subtle breast abnormalities more cautiously among these women.
This study was conducted within the Cancer Research Network, a National Cancer Institute-supported consortium of nonprofit health maintenance organizations developed to increase the effectiveness of preventive, curative, and supportive interventions that span the natural history of major cancers among diverse populations and health systems through a program of collaborative research.15 The subjects were female enrollees from 6 large health plans in 5 U.S. states (WA, OR, CA, MN, and MA) for whom receipt of CBE was ascertained for a matched case-control study assessing the effectiveness of breast cancer screening.16 In the present study, we analyzed CBE results among the cancer-free control subjects to estimate CBE specificity. Our study included all CBEs performed on control subjects more than 1 year before the date on which they were ascertained to be free of breast cancer. Thus, the reference standard was the absence of a cancer diagnosis within 1 year of CBE. CBE outcomes among the breast cancer cases have been reported separately.10
Control subjects were matched by age, breast cancer risk, and health plan enrollment period to cases from the same health plan who were aged 40–65 years when diagnosed with breast cancer in 1983–1993 and who subsequently died of breast cancer. Controls had not been diagnosed with breast cancer prior to an index date when breast cancer was first suspected in their matched case. Women were considered to be at “increased breast cancer risk” if they had a family history of breast cancer or a personal history of breast biopsy and “average risk” otherwise.
Receipt of breast cancer testing was ascertained by medical chart review for 3 years prior to each subject’s index date. For some cases, the index date (when breast cancer was first suspected) preceded the breast cancer diagnosis date by up to 2 years, so our sample includes CBEs performed on some controls when they were younger than 40 years old. We classified CBEs as “screening” when performed on asymptomatic subjects not receiving CBE to evaluate a previous positive test (e.g., mammography). CBEs performed on women reporting breast symptoms or to evaluate previous positive tests were considered “diagnostic.”
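The risk-group and examination-purpose definitions above amount to two simple decision rules. The following is a minimal sketch in Python, not the study's abstraction software; the field names (`family_history`, `prior_biopsy`, `has_symptoms`, `follows_positive_test`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Exam:
    # Hypothetical field names; the study's abstraction forms are not reproduced here.
    family_history: bool         # family history of breast cancer
    prior_biopsy: bool           # personal history of breast biopsy
    has_symptoms: bool           # woman reported breast symptoms at the visit
    follows_positive_test: bool  # CBE performed to evaluate a previous positive test


def risk_group(exam: Exam) -> str:
    """Return 'increased' if either risk factor is present, otherwise 'average'."""
    if exam.family_history or exam.prior_biopsy:
        return "increased"
    return "average"


def exam_purpose(exam: Exam) -> str:
    """Return 'diagnostic' for symptomatic women or follow-up of a positive test,
    otherwise 'screening'."""
    if exam.has_symptoms or exam.follows_positive_test:
        return "diagnostic"
    return "screening"


example = Exam(family_history=True, prior_biopsy=False,
               has_symptoms=False, follows_positive_test=False)
print(risk_group(example), exam_purpose(example))
```

Under these rules a single woman can contribute both screening and diagnostic examinations, because purpose is assigned per examination rather than per woman.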
Our study included 1,484 breast-cancer-free women who received either screening or diagnostic CBE from 1979–1992. Of these, 1,427 women underwent 2,206 screening examinations, and 177 women received 381 diagnostic examinations. Of the 1,427 women who received screening CBE, 120 (8.4%) also received diagnostic CBE. During the study period, each plan recommended approximately annual CBE screening for women aged 40 years or older. Current plan recommendations either regard CBE as an optional accompaniment to screening mammography (like the U.S. Preventive Services Task Force) or make no specific recommendation regarding CBE screening.
The methods of chart abstraction and data quality monitoring have been described previously.17 In brief, all abstractors completed an 8-step training protocol that included the study of training manuals and mock abstraction of standardized chart examples that were based on common, potentially ambiguous clinical situations (e.g., distinguishing screening from diagnostic tests). Abstractors coded examination results into 1 of 4 categories based on clinicians’ recorded impressions and follow-up recommendations: (1) normal, (2) abnormal benign (e.g., fibrocystic changes not requiring further evaluation), (3) indeterminate (e.g., new abnormality requiring diagnostic testing or follow-up), or (4) suspicious for cancer.
To monitor data quality, a second reviewer who was blinded to subjects’ cancer status randomly reabstracted selected charts, and interrater reliability of screening and diagnostic mammography results was excellent (Kappa ranged from 0.76 to 0.91).17 In addition, 3 clinicians (MBB, SWF, JGE), blinded to cancer status, reviewed the records of women for whom coded events might be questionable (e.g., a screening CBE within 9 months of a prior indeterminate/suspicious screening CBE) and resolved ambiguities by consensus. Coding remained unchanged for 93% of those labeled as screening CBEs and 92% of those labeled as diagnostic CBEs.
Outcome Measures We defined CBE results as positive if coded “indeterminate” or “suspicious for cancer” because these interpretations would prompt follow-up testing or surveillance. CBE results of “normal” or “abnormal benign” were considered negative. Among average- and increased-risk women, respectively, we estimated the specificities of screening and diagnostic CBE as the proportion of these examinations that were interpreted as negative.
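Because every woman in the sample was cancer-free, each negative examination is a true negative and each positive examination is a false positive, so specificity reduces to the proportion of examinations coded negative. The sketch below illustrates that definition; the string codes (`"normal"`, `"abnormal_benign"`, `"indeterminate"`, `"suspicious"`) are assumed labels for the 4 abstraction categories, not the study's actual coding scheme.

```python
from typing import Iterable

# Assumed labels for the 4 result categories described in the text.
POSITIVE_CODES = {"indeterminate", "suspicious"}
NEGATIVE_CODES = {"normal", "abnormal_benign"}


def is_positive(code: str) -> bool:
    """Map an abstracted result code to a positive (True) or negative (False) CBE."""
    if code in POSITIVE_CODES:
        return True
    if code in NEGATIVE_CODES:
        return False
    raise ValueError(f"unknown result code: {code!r}")


def specificity(codes: Iterable[str]) -> float:
    """Proportion of examinations coded negative among cancer-free women."""
    results = [is_positive(c) for c in codes]
    if not results:
        raise ValueError("no examinations supplied")
    negatives = sum(1 for positive in results if not positive)
    return negatives / len(results)


# Toy input: 3 negative examinations and 1 positive examination.
print(specificity(["normal", "normal", "abnormal_benign", "indeterminate"]))
```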
Independent Variables The following were determined on the examination date: age, use of estrogen therapy, and receipt of Pap smear. We classified a woman as an “estrogen user” if she received CBE (1) during a period defined by 2 separate chart notes signifying continuing estrogen use, (2) more than 30 days after starting estrogen therapy and continuation or discontinuation of estrogen was subsequently noted, or (3) within 90 days of an isolated note signifying estrogen continuation. Women were otherwise classified as “nonusers or uncertain.” Additional covariates were determined as of each subject’s index date: family history of breast cancer, including chart documentation of breast cancer in a first- or second-degree relative; number of breast biopsies; history of natural or surgical menopause; and the Charlson Comorbidity Index18 modified to include an additional point for hypertension. We included measures of Pap smear receipt and hypertension because we hypothesized that time allotted for Pap tests and hypertension care could influence CBE duration. A woman was considered “peri- or postmenopausal” if (1) menopausal status was specifically stated in the chart, (2) the woman had undergone surgical oophorectomy, or (3) symptoms of menopause (e.g., vasomotor instability, irregular menses) were recorded in the chart without a specific diagnosis of menopause. Race was also determined whenever possible from the medical record.
We used logistic regression with generalized estimating equations (GEE) to estimate 95% confidence intervals (CI) for specificity estimates, which accounted for dependent outcomes among women who received more than 1 CBE during the study period.19 To identify patient and examination characteristics associated with specificity, we used logistic regression with GEE to model the probability of a true-negative CBE (specificity) as a function of individual covariates while adjusting for health plan. Thus, odds ratios (ORs) greater than 1 signify that a covariate is associated with greater specificity. Independent variables were included in the models as indicator (dummy) variables. When modeling outcomes of diagnostic examinations, we included year of examination first as an indicator variable then as a grouped linear parameter to judge whether the specificity of diagnostic CBE changed linearly over time. Because several covariates were significantly associated with specificity of diagnostic CBE (P<0.05), we repeated logistic regression while including all significant covariates simultaneously. Hypothesis tests were 2-sided with an alpha of 0.05. The study methods were approved by the Institutional Review Boards of each of the 6 health plans.
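To illustrate how repeated examinations within a woman affect the confidence interval, the sketch below computes a specificity estimate with a cluster-robust (sandwich) interval on the logit scale. It corresponds to an intercept-only logistic GEE with an independence working correlation, which is an assumption: the working correlation structure used in the study is not stated here. The function name and the toy data are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def clustered_specificity_ci(
    exams_by_woman: Dict[str, List[int]], z: float = 1.96
) -> Tuple[float, float, float]:
    """Specificity with a cluster-robust CI computed on the logit scale.

    exams_by_woman maps a woman's ID to her outcomes (1 = negative CBE, 0 = positive).
    Assumes an intercept-only model with independence working correlation.
    """
    n = sum(len(v) for v in exams_by_woman.values())
    p = sum(sum(v) for v in exams_by_woman.values()) / n
    if p <= 0.0 or p >= 1.0:
        raise ValueError("logit is undefined when specificity is 0 or 1")
    # "Meat" of the sandwich: squared within-woman sums of residuals.
    meat = sum(sum(y - p for y in v) ** 2 for v in exams_by_woman.values())
    # "Bread": model-based information for the intercept.
    bread = n * p * (1.0 - p)
    se_logit = math.sqrt(meat) / bread
    logit = math.log(p / (1.0 - p))

    def expit(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))

    return p, expit(logit - z * se_logit), expit(logit + z * se_logit)


# Toy data (hypothetical): 4 women, 10 examinations, 8 of them negative.
toy = {"w1": [1, 1, 1], "w2": [1, 0], "w3": [1], "w4": [0, 1, 1, 1]}
print(clustered_specificity_ci(toy))
```

When every woman contributes exactly one examination, the interval reduces to the ordinary logit (Wald) interval for a proportion; clustering changes the width only when women have more than one examination.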
The subjects ranged in age from 35 to 65 years, and approximately one-fifth were nonwhite (Table 1). Over one-third (34.8%) of women who received screening CBE had either a family history of breast cancer or a personal history of breast biopsy and were classified as increased-risk for breast cancer. Among women who received diagnostic CBE, two-thirds (65.5%) were considered increased-risk by these criteria. Most of the 381 diagnostic examinations (75%) were performed to evaluate breast symptoms. Among women who received screening CBE, the mean number of examinations per woman was 1.55 (SD=0.78; range 1–6). The mean number of diagnostic examinations per woman was 2.15 (SD=1.64; range 1–11).
Among 930 average-risk women who received 1,387 screening CBEs, 9 (0.7%) were interpreted as indeterminate and none were interpreted as suspicious for cancer (Table 2). Among 497 increased-risk women who received 819 screening CBEs, 23 (2.8%) were indeterminate and 1 (0.1%) was suspicious for cancer. Thus, the specificities of screening CBE among average- and increased-risk women, respectively, were 99.4% (95% CI: 98.8–99.7%) and 97.1% (95% CI: 95.7–98.0%).
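The point estimates above follow directly from the reported counts, as the short check below shows. The intervals in the text came from GEE models; the Wilson score interval used here is a simpler, swapped-in method that ignores repeated examinations within women, so it is only a rough cross-check and need not match the published intervals.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (ignores clustering)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Counts reported in the text for screening CBE.
avg_total, avg_positive = 1387, 9        # 9 indeterminate, 0 suspicious
inc_total, inc_positive = 819, 23 + 1    # 23 indeterminate, 1 suspicious

avg_spec = (avg_total - avg_positive) / avg_total
inc_spec = (inc_total - inc_positive) / inc_total

print(f"average-risk:   {avg_spec:.1%}", wilson_interval(avg_total - avg_positive, avg_total))
print(f"increased-risk: {inc_spec:.1%}", wilson_interval(inc_total - inc_positive, inc_total))
```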
Among 61 average-risk women who received 115 diagnostic CBEs, 36 (31.3%) were indeterminate and none were suspicious for cancer. Among 116 increased-risk women who received 266 diagnostic CBEs, 100 (37.6%) were indeterminate and 14 (5.3%) were suspicious for cancer. The specificities of diagnostic CBE were 68.7% (95% CI: 59.7–76.5%) among average-risk women and 57.1% (95% CI: 51.1–63.0%) among increased-risk women.
Table 3 shows the associations between individual patient and examination characteristics and CBE specificity after adjustment for health plan. Among screening examinations, both a family history of breast cancer and a history of prior breast biopsy were associated with significantly lower specificity. Thus, compared to women at average breast cancer risk, women at increased risk had significantly reduced odds of true-negative screening CBE (OR 0.21; 95% CI: 0.10–0.46). Patient age, race, menopausal status, estrogen use, chronic disease comorbidity, Pap smear receipt, and examination year were not significantly associated with screening CBE specificity.
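An odds ratio on the odds of a true-negative examination can be translated back into an implied specificity. The sketch below applies the reported OR of 0.21 to the average-risk screening specificity of 99.4%. This is only an approximation: the OR was adjusted for health plan, whereas the baseline specificity is a crude estimate, and the helper name `apply_odds_ratio` is hypothetical.

```python
def apply_odds_ratio(baseline_specificity: float, odds_ratio: float) -> float:
    """Specificity implied by multiplying the baseline odds of a true negative by an OR."""
    if not 0.0 < baseline_specificity < 1.0:
        raise ValueError("baseline specificity must lie strictly between 0 and 1")
    baseline_odds = baseline_specificity / (1.0 - baseline_specificity)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Reported values: average-risk specificity 99.4%, OR 0.21 for increased risk.
implied = apply_odds_ratio(0.994, 0.21)
print(f"implied increased-risk specificity: {implied:.1%}")
```

The implied value lands close to the observed increased-risk specificity, which shows that an odds ratio far below 1 can still correspond to a modest absolute difference when baseline specificity is very high.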
Among women who received diagnostic examination, those with 2 or more breast biopsies had significantly lower specificity relative to women without prior breast biopsy (OR 0.53; 95% CI: 0.32–0.89). Reduced specificity of diagnostic CBE was also associated with concurrent estrogen use (OR 0.32; 95% CI: 0.16–0.61) and later examination year (P for trend=0.03). However, examination year was no longer significantly associated with specificity (P for trend=0.27) in a multivariate model that simultaneously controlled for examination year, concurrent estrogen use, number of breast biopsies, and health plan (data not shown).
We found that screening CBE had very high specificity (>99%) among female health plan enrollees at average risk of breast cancer. Although specificity of screening CBE was lower among women with increased breast cancer risk, it was still high (>97%) compared to clinical trials of breast cancer screening in which screening CBE specificity was approximately 94%. Diagnostic CBE was relatively nonspecific regardless of breast cancer risk.
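The practical meaning of these differences is easier to see as false-positive examinations per 1,000 cancer-free women. The sketch below is simple arithmetic on the specificities quoted in this article (94% in clinical trials; 97.1% and 99.4% in this study); the function name is hypothetical.

```python
def false_positives_per_1000(specificity: float) -> float:
    """Expected false-positive examinations per 1,000 examinations of cancer-free women."""
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must lie between 0 and 1")
    return round((1.0 - specificity) * 1000, 1)


for label, spec in [
    ("clinical trials", 0.94),
    ("community, increased-risk", 0.971),
    ("community, average-risk", 0.994),
]:
    print(f"{label}: {false_positives_per_1000(spec)} per 1,000")
```

At these values the trial setting yields roughly ten times as many false-positive examinations as community screening of average-risk women.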
Although screening CBE is recommended by some organizations and is commonly performed, few studies have reported screening CBE specificity in community practice. Specificity of screening CBE was 96.3% among women receiving screening CBE within a New England health plan that participated in our study.11 In contrast, higher specificities were observed among nurse examiners within a unique screening program within another participating plan12 and among screening CBEs performed by a single radiologist.8 In our study, screening CBE had similarly high specificity among a geographically diverse sample of women enrolled in 6 regional health plans who received CBE from diverse examiners.
Among a large sample of low-income women receiving examinations in the National Breast and Cervical Cancer Early Detection Program (NBCCEDP),7 the specificity of screening CBE was lower (96.2%) than among average-risk (99.4%) and increased-risk women (97.1%) in our study. Many women eligible to enroll in the NBCCEDP, however, may not have had recent previous examinations. Because of higher cancer prevalence among previously unscreened women, prevalence screens may have greater sensitivity but lower specificity than later rounds of screening. Thus, CBE performance among NBCCEDP enrollees may not generalize to populations receiving more regular breast cancer screening. In addition, data quality assessment of NBCCEDP claims may be limited. If a substantial fraction of enrollees with breast symptoms are misclassified as asymptomatic, estimates of screening CBE specificity based on NBCCEDP claims may be spuriously reduced.
The discrepancy in the specificity of screening CBE in the community and in clinical trials suggests that CBE conduct in actual practice may differ substantially from its conduct in experimental settings. The American Cancer Society recommends regular CBE screening based in part on clinical trial evidence that CBE can detect some cancers that are missed by mammography.20,21 In the trial achieving the highest CBE sensitivity (69%), trained examiners performed CBE in a systematic fashion with a usual duration of 5 to 10 minutes.22 The typical screening CBE in the community is probably less systematic and briefer,13 which may compromise sensitivity while boosting specificity. Indeed, the high specificity observed in our study is consistent with recent studies suggesting low sensitivity of screening CBE in community practice.7–10 Recent calls to standardize CBE conduct and reporting may be well-justified if screening CBE performance in community practice indeed falls short of performance in clinical trials.23
Specificity of screening CBE was significantly lower among women at increased risk for breast cancer. Patient history and perceived risk influence the interpretation of screening mammography.24 Clinical history may similarly lead clinicians to interpret subtle breast abnormalities more cautiously among women with risk factors for breast cancer. Clinicians might also conduct the examination more deliberately among these women, thereby increasing the likelihood of detection and subsequent evaluation of small breast lumps. Even so, the observed specificity of screening CBE among increased-risk women (97.1%) was still higher than among a general population within clinical trials of breast cancer screening (94%).6
Breast cancer is relatively common among women with certain breast symptoms,25 and cancer may be impossible to exclude based on physical examination alone. In this respect, the low specificity of diagnostic CBE may reflect good practice. Diagnostic CBE was significantly less specific among current estrogen users. This association could arise from estrogen effects on breast tissue or other characteristics of estrogen users that prompt further evaluation after diagnostic CBE.
The examinations in our study were performed from 1979 to 1992, and one might posit that current CBEs are of higher quality. However, we found no evidence of temporal trends in screening CBE performance and doubt that U.S. clinicians currently conduct screening CBE differently. In addition, we studied a population of insured women aged 35–65 years with stable health plan enrollment, and our findings may not be generalizable to women outside this age range or with different insurance status. Although our sample is smaller than those of previous studies, precise estimates of screening CBE specificity do not require large samples because false-positive screens occur uncommonly. We found no significant association between menopausal status and specificity, yet breast density (which we could not measure directly) may affect CBE specificity independently of menopause.12 In addition, our study did not measure body mass index, which has been associated with lower sensitivity of screening CBE.9 Lastly, our study lacked detailed data on the reasons for diagnostic examination beyond the presence of symptoms or a previously positive test and included relatively few diagnostic examinations among average-risk women.
Our study has several important strengths, including a geographically diverse sample of women from 5 U.S. states who received CBE from a range of clinical examiners. In addition, careful ascertainment of examination purpose and breast cancer risk factors allowed us to estimate the specificity of both screening and diagnostic CBE and among both average- and increased-risk women. We studied a population receiving regular screening,11 rather than prevalence screening as in the NBCCEDP,7 which may provide more accurate estimates of CBE specificity among a general population of women receiving regular CBE screening.
Our study suggests that screening CBE in community practice has substantially higher specificity than in clinical trials of breast cancer screening. Diagnostic CBE, meanwhile, is relatively nonspecific. Our findings are consistent with other reports of high CBE specificity in unique settings8,12 and recent reports of low sensitivity in community practice.7–10 Discrepant performance of CBE in the community and clinical trials may reflect clinically important differences in examination conduct, duration, and interpretation among community clinicians compared to highly experienced examiners in experimental settings. While high specificity implies a lower risk of false-positive CBE, this benefit may come at the cost of lower sensitivity for breast cancer. Clinicians should try to perform CBE in a deliberate fashion like examiners in clinical trials, and women should be informed that a high-quality CBE may require minutes rather than seconds.
The Cancer Research Network (CRN) consists of the research programs, enrollee populations, and databases of 11 health maintenance organizations (HMOs) that are members of the HMO Research Network. The health care delivery systems participating in the CRN are Group Health Cooperative, Harvard Pilgrim Health Care, Henry Ford Health System, HealthPartners Research Foundation, the Meyers Primary Care Institute of the Fallon Healthcare System/University of Massachusetts, and Kaiser Permanente in 6 regions: Colorado, Georgia, Hawaii, Northwest (Oregon and Washington), northern California, and southern California. The authors thank Sarah Greene, Kevin Beverly, Gene Hart, and the data abstractors for their efforts on this project.
Financial Support Supported by the American Cancer Society (Grant #MRSGT-05-214-01-CPPB to Dr. Fenton) and the National Cancer Institute (Grants U19CA79689 to Dr. Edward H. Wagner and 1 K05 CA104699-02 to Dr. Elmore). During early phases of the project, Dr. Geiger was with the Southern California Kaiser Permanente, Research and Evaluation Department.
Potential Financial Conflicts of Interest None disclosed.