Various interventions to promote repeat use of mammography have been evaluated, but the efficacy of such interventions is not well understood.
We searched electronic databases through August 15, 2009, and extracted data to calculate unadjusted effect estimates (odds ratios [ORs] and 95% confidence intervals [CIs]). Eligible studies were those that reported estimates of repeat screening for intervention and control groups. We tested homogeneity and computed summary odds ratios. To explore possible causes of heterogeneity, we performed stratified analyses, examined meta-regression models for 15 a priori explanatory variables, and conducted influence analyses. We used funnel plots and asymmetry tests to assess publication bias. Statistical tests were two-sided.
The 25 eligible studies (27 effect estimates) were statistically significantly heterogeneous (Q = 69.5, I2 = 63%, P < .001). Although there were homogeneous subgroups in some categories of the 15 explanatory variables, heterogeneity persisted after stratification. For all but one explanatory variable, subgroup summary odds ratios were similar with overlapping confidence intervals. The summary odds ratio for the eight heterogeneous reminder-only studies was the largest observed (OR = 1.79, 95% CI = 1.41 to 2.29) and was statistically significantly greater than the summary odds ratio (Pdiff = .008) for the homogeneous group of 17 studies that used the more intensive strategies of education/motivation or counseling (OR = 1.27, 95% CI = 1.17 to 1.37). However, reminder-only studies remained statistically significantly heterogeneous, whereas the studies classified as education/motivation or counseling were homogeneous. Similarly, in meta-regression modeling, the only statistically significant predictor of the intervention effect size was intervention strategy (reminder-only vs the other two combined as the referent). Publication bias was not apparent.
The observed heterogeneity precludes a summary effect estimate. We also cannot conclude that reminder-only intervention strategies are more effective than alternate strategies. Additional studies are needed to identify methods or strategies that could increase repeat mammography.
Regular mammography screening has been shown to reduce mortality from breast cancer in women aged 50–74 years, but it is not known whether interventions to promote regular screening, for example, reminders, educational outreach, and counseling, are effective.
The effectiveness of various intervention strategies was examined in a systematic review and meta-analysis of studies that reported estimates of repeat screening for intervention and control groups.
The 25 analyzed studies were heterogeneous overall, as were most subgroups in a stratified analysis. The intervention effect of a group of reminder-only studies was statistically significantly greater than that of a group of studies that used more intensive strategies such as education/motivation or counseling.
Heterogeneity prevents firm conclusions about the effectiveness of more intensive vs less intensive strategies. More studies with consistent designs and well-defined intervention categories are needed.
Not all studies compared participants with dropouts or reported differential attrition by study group, which may result in overestimated effect sizes. Most of the studies were conducted more than 10 years ago, and most were of non-Hispanic white women, so the results may not be applicable to the present day or to other ethnic groups.
From the Editors
Breast cancer is the second leading cause of cancer deaths in women in the United States (1). Regular screening with mammography has been shown to reduce mortality from breast cancer in women aged 50–74 years by approximately 23% (2). To maximize the population benefit related to mortality reduction, the US Preventive Services Task Force recommended in 2002 that women aged 40 years and older be screened with mammography every 1–2 years. Although the Task Force has since raised the minimum age for biennial screening to 50 years, it still suggests that younger women discuss mammography with their doctors to make an informed decision based on their family histories, personal values, and general health (http://www.ahrq.gov/clinic/uspstf09/breastcancer/brcanrs.htm) (3). Breast cancer screening with mammography has increased substantially since 1990. Surveillance data from the National Health Interview Survey show that the prevalence of self-reported recent use (within the past 2 years) in women 40 years of age and older increased from 30% in 1987 to 70% in 2000 (4); however, data from the 2005 National Health Interview Survey show a decline to 66% (5). The prevalence of regular or repeat mammography use, that is, consecutive, on-schedule mammograms, is lower than the prevalence of recent use, that is, one mammogram within the past 2 years. A review of 37 regional studies of repeat mammography conducted through 2001 found that the overall weighted average prevalence was 46.1% (95% confidence interval [CI] = 39.4 to 52.8) (6). Summary estimates also showed that repeat use increased from 26.5% (95% CI = 12.9 to 40.0) in studies conducted before 1991 to 53.2% (95% CI = 44.7 to 61.8) in those conducted between 1995 and 2001 (6).
Systematic reviews and meta-analyses have shown that many intervention strategies are effective at motivating women to have one mammogram during the study period (7–19). Most meta-analyses consistently show that minimal interventions such as reminders directed at patients (7,8,10–12,14,17,19) or providers (13–15,18) and delivered through a variety of communication channels are effective in increasing one-time mammography screening compared with a no-intervention control group. Systematic reviews and meta-analyses also demonstrate that more intensive patient-directed interventions produce larger intervention effects than a no-intervention control group or minimal interventions such as mail or telephone reminders (7,11,12,16,19). These more intensive interventions include those using multiple strategies (eg, letter plus telephone call, letter plus voucher), those tailored to an individual’s beliefs or characteristics (eg, a personalized message that addresses a woman's concerns such as fear of finding cancer), and those based on health behavior theories (a list of theories commonly used in health promotion research is available at http://dccps.cancer.gov/brp/constructs). Reducing barriers to access such as cost or transportation also is associated with increased mammography use (8,17).
In contrast to interventions to promote one-time screening, the efficacy of interventions designed to promote regular mammography screening is not well understood (20). To our knowledge, there is no systematic review or meta-analysis of interventions that promote repeat mammography screening, perhaps because fewer interventions have reported repeat mammography outcomes compared with one-time use. Developing approaches to encourage women to maintain a regular schedule of mammography screening is needed if we are to realize a reduction in breast cancer mortality.
We conducted a systematic review of repeat mammography intervention studies, in which we evaluated sampling, methods, and intervention characteristics. We examined consistency of effect across studies and assessed completeness of reporting on selected study characteristics related to internal and external validity. We also identified gaps in the literature and make recommendations for further research.
We conducted electronic database searches in consultation with a medical librarian who was trained in systematic review literature searches to identify articles reporting the effects of behavioral interventions on repeat mammography. We concluded the database searches on August 15, 2009. First, we searched MEDLINE (OVID) from 1966. Then, we adapted the search for CINAHL (OVID) from 1982, PsycInfo (OVID) from 1967, and Academic Search Premier (EBSCO) from 1990. We repeated the search terms used in the Clark et al. review (6), which paired “mammogra$” and seven keywords: “regular,” “repeat,” “adherence,” “compliance,” “annual,” “rescreen,” and “maintenance” (“maint$”); we added “biennial,” “on schedule,” and “guideline$.” We used several search terms to identify controlled trials or interventions, as recommended by the Cochrane Collaboration (21). We combined search results from the two previous steps and then limited the results to human studies published in English that were not editorials, commentaries, letters to the editor, reviews, or meta-analyses. Using Medical Subject Headings terms (http://scientific.thomson.com/support/faq/wok3new/medline/#MeSH), we further excluded studies that focused on diagnostic techniques and procedures, biopsy, and drug therapy. Reference lists from eligible articles and observational studies of repeat mammography were hand searched for additional reports. We also hand searched the Cochrane databases (http://www.thecochranelibrary.com/view/0/index.html) for eligible studies, as well as literature reviews of mammography interventions.
To be included, studies had to report an estimate of repeat mammography use for at least one intervention group and one concurrent comparison group. We used the Clark et al. (6) definition of repeat screening, that is, at least two consecutive, on-schedule mammograms during a given period (approximately 1–2 years apart); a certain number of mammograms during a given period (at least two within the past 5 years); or at least two mammograms on an age-appropriate schedule (eg, biennially for women in their forties). Like Clark et al. (6), we excluded studies defining repeat mammography as more than one lifetime mammogram without regard to period, receipt of a single mammogram with an intention to obtain future screening, or behaviors of health-care providers such as a recommendation to get screened. The intervention had to include women at average risk for breast cancer, but it also could address other health-related behaviors in addition to mammography, such as other cancer screening behaviors, smoking, or physical activity.
Two authors independently reviewed titles and abstracts to identify relevant articles according to the inclusion and exclusion criteria. Full-text articles were reviewed for eligibility when more information was needed. A hierarchical categorization scheme was applied to each study to determine eligibility. Studies were first classified as including an intervention or no intervention. Intervention studies were considered for inclusion and further classified as having repeat mammography as an outcome. Disagreements about study eligibility were discussed with all coauthors until consensus was reached.
We used a standardized data extraction form to record descriptive information from eligible studies. In studies of women with diverse mammography histories, we included only the subgroup of women who were eligible to complete at least one repeat mammogram. Specifically, we identified two types of study designs used in intervention studies of repeat mammography. In the most common type of design, study participants with a recent (ie, not overdue) mammogram before the intervention were followed long enough to receive one postintervention mammogram on schedule (hereafter referred to as design 1). In the other design, study subjects were followed long enough to receive two postintervention mammograms on schedule regardless of mammography history (hereafter referred to as design 2).
We extracted information on the following study characteristics: year(s) the study was conducted, age range of participants, race/ethnicity (percent white was used rather than a racial or ethnic breakdown because most of the studies reporting racial/ethnic composition of the study population had studied white women), study setting (health care or community), percent of the sample with a recent mammogram at study baseline, screening interval used to measure adherence (1 year, 2 years, or age dependent [biennially for women in their forties and annually or biennially for women in their fifties]), study design type (1 or 2), data source for mammography status (medical records, administrative or program data, or self-report), intervention strategy, mode of intervention delivery (mail only, telephone only, mail plus telephone, mail plus in person, or community education plus other modes), number of study groups, type of control group (no contact, survey only, or active [alternative intervention of equal or lower intensity]), theoretic framework [eg, health belief model (22)], and theoretic constructs (eg, barriers such as cost and stage of change or readiness to be screened [http://dccps.cancer.gov/brp/constructs]).
For studies that included more than one intervention group, we abstracted information on the intervention strategy of greatest intensity (eg, personalized vs generic messages or multiple vs single strategies) or based on the author's hypothesis. For example, Finney and Iannotti (23) tested messages framed in terms of what would be gained or lost by getting a mammogram and hypothesized that messages emphasizing what would be lost by not being screened would be more effective than messages emphasizing what would be gained. We also abstracted information on delivery mode and on theoretic frameworks and constructs only for the group that received the most intensive intervention. Where relevant, we identified the theoretic constructs used to deliver tailored messages. For example, Clark et al. (6) created letters using each woman's interview responses to questions about her stage of change or readiness to get a mammogram and her perceived benefits of and barriers to getting screened.
A variety of approaches have been used to describe and classify intervention strategies, and no one approach is considered the gold standard (24). Because there is no agreed-upon classification, we classified intervention strategies in three ways. Our primary classification was informed by health behavior theory (http://dccps.cancer.gov/brp/constructs) and consisted of three categories: reminder, education/motivation, and counseling. We based this classification on the work of Kreuter et al. (25,26) who describe intervention strategies and messages in terms of their personal relevance to the recipient. In general, generic messages such as mailed reminders are considered to be the least intensive, whereas communication based on an assessment of a person's beliefs and attitudes and delivered through interpersonal communication channels is considered to be the most intensive. Reminders, whether generic or personalized, consisted of minimal print or telephone messages that served as a cue to action or prompt by letting women know that they were due for screening. Reminders could contain minimal information such as a statement that a woman was due for screening and should call to schedule an appointment, or they could contain brief motivational messages based on health behavior theory, along with the reminder. Educational/motivational strategies consisted of print messages to increase knowledge, facilitate attitude change, and motivate women to be screened. Messages may or may not be personalized on the basis of information obtained from personal assessments (eg, surveys or interviews) or from other data sources such as medical records. 
Counseling strategies, typically delivered over the telephone or in person, are considered to be the most personalized form of communication because they engage women in a dialogue in an attempt to change attitudes, address barriers such as perceived risk of developing breast cancer or fear of pain from the mammogram, and motivate women to be screened. If an intervention used multiple strategies, we classified it based on the most intensive strategy.
Our second approach to classifying intervention strategies was to contrast the subset of studies that explicitly stated that they used barriers-specific telephone counseling with those that did not. Barriers-specific telephone counseling is an intervention approach used by the National Cancer Institute's Breast Cancer Screening Consortium (27), in which a counselor uses a standardized protocol to identify a person's barriers to performing a health behavior and provides information to address and overcome the barriers. In the third approach, we classified intervention strategies based on whether the study used a single intervention strategy (eg, reminder only) or multiple strategies (eg, reminder plus another strategy). We also classified intervention delivery in two ways: by delivery channel (eg, mail only, telephone only) and by whether a single mode or multiple modes were used.
Each eligible study was assigned to one of three coauthors (S. W. Vernon, A. McQueen, J. A. Tiro) for review. To assess reliability, one coauthor (D. J. del Junco) reviewed a randomly selected 10% sample of the studies; disagreements were discussed by all authors until consensus was reached. Data to calculate effect size estimates were independently extracted by two coauthors, and disagreements were discussed by all authors until consensus was reached. When available, we reviewed other published reports of the same study; however, we gained additional relevant information in only one instance. In that case, companion articles reported on different aspects of a single study, and we treated these reports as one study (28,29). We did not contact authors to obtain unpublished data or missing information because such information is not readily available to readers and because we wanted to evaluate the extent to which selected study characteristics were reported in the published literature.
Every study produced one or more comparisons of an intervention group with a control group. When there was more than one intervention condition, we contrasted the most intensive intervention condition with the control condition. Odds ratio (OR) effect sizes and 95% confidence intervals were calculated using cell frequencies or proportions from 2 × 2 contingency tables (30). We recalculated unadjusted odds ratios and confidence intervals for all studies; however, for two studies (31,32), insufficient data were available, and we used adjusted estimates reported by the authors. Statistical tests were two-sided.
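As an illustration of the calculation described above, the following minimal sketch computes an unadjusted odds ratio and 95% confidence interval from the four cells of a 2 × 2 table. It assumes the standard log-odds (Woolf) standard error; the source does not state which interval method was used, and the function name and the example counts are hypothetical.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a Woolf-type confidence interval.

    a: intervention group, repeat mammogram obtained
    b: intervention group, no repeat mammogram
    c: control group, repeat mammogram obtained
    d: control group, no repeat mammogram
    (Cell layout and names are illustrative assumptions.)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this simple formula")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper, se_log_or

# Hypothetical counts: 60 of 100 intervention women and 50 of 100 control
# women obtained a repeat mammogram.
example = odds_ratio_ci(60, 40, 50, 50)
```

When a study reports only proportions, the cell counts can be recovered by multiplying each proportion by its group size before calling the function.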
We tested for heterogeneity using the Q statistic, with P < .05 as the criterion, and the I2 statistic, with values of 50% or greater indicating substantial heterogeneity (33). A statistically significant Q indicates a heterogeneous distribution of study effect sizes, which may then warrant additional subgroup analyses (30). The I2 statistic describes the percentage of the variability in effect estimates that is due to heterogeneity rather than to sampling error or chance alone (33). Heterogeneity tests were not performed when there were fewer than five observations in a category of a variable. Heterogeneity tests were performed using STATA 10.0 (34).
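The two heterogeneity statistics can be sketched as follows, assuming the usual inverse-variance form of Q and the definition I2 = 100% × (Q − df)/Q, truncated at zero. The function names are hypothetical, and the P value for Q (from a chi-square distribution with k − 1 degrees of freedom) is omitted because it needs a statistics library.

```python
def cochran_q(log_ors, ses):
    """Cochran's Q for k study-level log odds ratios and their standard errors.

    Returns (Q, degrees of freedom).
    """
    weights = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    return q, len(log_ors) - 1

def i_squared(q, df):
    """I2 as a percentage: share of variability beyond what chance would explain."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# With the overall values reported in this review (Q = 69.5 across 27 effect
# estimates, so df = 26), this gives about 62.6%, consistent with the
# reported I2 of 63%.
overall_i2 = i_squared(69.5, 26)
```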
Variance-weighted summary effect sizes were computed. We used a fixed-effects model to summarize homogeneous distributions. For heterogeneous distributions, we report summarizations based on random-effects models (35). We also performed random-effects meta-regression analyses using STATA on potential explanatory variables of intervention effects (35,36). Meta-regressions were performed first with each variable univariately and then with a forward variable selection procedure that included variables with a univariate P less than .25 and eliminated variables with a multivariable-adjusted P greater than .05.
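The difference between the two pooling models can be sketched as below. The random-effects version assumes the DerSimonian-Laird moment estimator of the between-study variance (tau squared); the source cites a reference for its random-effects method but does not name the estimator here, so that choice, like the function names, is an assumption.

```python
import math

def pool(log_ors, ses, tau2=0.0):
    """Inverse-variance pooled log odds ratio and its standard error.

    tau2 = 0 gives the fixed-effects summary; tau2 > 0 adds between-study
    variance to every study's weight (random-effects summary).
    """
    weights = [1.0 / (se ** 2 + tau2) for se in ses]
    estimate = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    return estimate, math.sqrt(1.0 / sum(weights))

def dersimonian_laird_tau2(log_ors, ses):
    """Moment estimate of between-study variance (assumed estimator)."""
    weights = [1.0 / se ** 2 for se in ses]
    sum_w = sum(weights)
    fixed, _ = pool(log_ors, ses)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    scale = sum_w - sum(w ** 2 for w in weights) / sum_w
    return max(0.0, (q - df) / scale)

# Hypothetical two-study example with discordant effects
ys, ss = [0.0, 0.4], [0.1, 0.1]
tau2 = dersimonian_laird_tau2(ys, ss)
fixed_est, fixed_se = pool(ys, ss)
random_est, random_se = pool(ys, ss, tau2)
summary_or = math.exp(random_est)   # back-transform to the odds ratio scale
```

Because tau squared is added to every study's variance, the random-effects weights are more nearly equal and the summary interval is wider whenever the studies disagree.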
For the heterogeneity and meta-regression analyses, we based our a priori choice of potential explanatory variables on prior systematic reviews of studies that measured completion of one mammogram during the study period (8,16) and on factors associated with repeat screening examined by Clark et al. (6). We examined 15 covariates (categorization of these variables is described earlier): age, study setting, screening interval, study design type, data source for repeat mammography outcome, intervention strategy (classified in three ways, as described earlier), mode of intervention delivery, number of delivery modes, control group type, use of a theoretic framework, two theoretic constructs (barriers and stage of change), and use of tailoring (personalizing the message).
Most studies used multiple theoretic frameworks, and there were not enough studies using a given framework to form reliable groupings. For example, only three studies used the transtheoretical model (37) alone and only one study used the health belief model (22) alone, the two most frequently cited models. Therefore, we classified studies in terms of whether a theoretic framework was used (yes or no). Likewise, with only a few exceptions, very few studies measured the same theoretic constructs. Two exceptions were the barriers construct from the health belief model (22), which was measured in 13 studies, and the stage of change construct from the transtheoretical model (37), which was measured in nine studies. Those were the only two theoretic constructs included in our analyses. We also identified studies that tailored messages on the basis of one or more constructs, for example, knowledge and stage of change. We created a variable called use of tailoring and classified studies as yes or no on that variable.
To examine the contribution of individual studies to the overall summary effect estimate, we conducted an influence analysis (omitting one study at a time) (38). The influence analysis produces a graph that shows how the overall summary odds ratio changes when each study's effect estimate is removed in turn, so that the influence of any single study can be assessed visually. To assess the potential for publication bias, we performed funnel plot asymmetry tests (39,40). The Begg test (39) is directly analogous to a visual assessment of funnel plot symmetry (ie, the dispersion of all point estimates from all studies on a graph to form a symmetrical funnel shape), and it tests whether the Begg rank correlation between effect size and its SE is zero. Pseudo confidence intervals are the points connected by the diagonal lines forming the “funnel” on the funnel plot; they are the expected 95% confidence intervals for a given SE (depicted as increasing along the x-axis). The Egger test (40) is a regression of the standardized effect size (eg, log OR/SE of log OR) against its precision (eg, 1/SE of log OR). If the intercept of the Egger regression line differs statistically significantly from zero, publication bias may be present.
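The two procedures described above can be sketched in a few lines. For brevity, the leave-one-out loop uses a fixed-effect summary (the review reports random-effects summaries for heterogeneous sets), and the Egger function returns only the intercept and its standard error from an unweighted least-squares fit; the P value, which needs a t distribution with k − 2 degrees of freedom, is left out. All names and example values are hypothetical.

```python
import math

def fixed_effect(log_ors, ses):
    """Inverse-variance pooled log odds ratio (simplifying assumption)."""
    weights = [1.0 / se ** 2 for se in ses]
    return sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)

def leave_one_out(log_ors, ses):
    """Summary odds ratio recomputed with each study omitted in turn."""
    log_ors, ses = list(log_ors), list(ses)
    summaries = []
    for i in range(len(log_ors)):
        ys = log_ors[:i] + log_ors[i + 1:]
        ss = ses[:i] + ses[i + 1:]
        summaries.append(math.exp(fixed_effect(ys, ss)))
    return summaries

def egger_intercept(log_ors, ses):
    """Regress log OR / SE on 1 / SE; return (intercept, SE of intercept)."""
    z = [y / se for y, se in zip(log_ors, ses)]       # standardized effects
    x = [1.0 / se for se in ses]                      # precisions
    n = len(z)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    sse = sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))
    resid_var = sse / (n - 2)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, se_intercept

# Hypothetical data in which smaller (less precise) studies show larger effects
intercept, se_int = egger_intercept([0.1, 0.2, 0.4, 0.8], [0.05, 0.1, 0.2, 0.4])
```

An intercept near zero is what a symmetrical funnel plot would produce; in the hypothetical data above the effect grows in proportion to the SE, so the intercept is far from zero.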
The importance of systematically assessing aspects of internal validity in health promotion trials has been recognized for some time (41). One aspect of internal validity that has recently been emphasized is the design and analysis of group- or cluster-randomized trials in cancer prevention and control (42). Recent attention also has been directed to the importance of assessing external validity in health promotion research, and several frameworks for assessing it have been proposed (43,44). We assessed studies for completeness of reporting on aspects of internal and external validity related to representativeness of the study population and to design and analysis issues. To assess representativeness, we recorded whether or not authors provided information on the response rate at baseline, comparison of respondents and nonrespondents at baseline, equivalence of study groups at baseline, response rate at follow-up, comparison of the final sample to dropouts, and whether there was differential attrition by study group. To assess design and analysis issues, we recorded whether the authors provided information on how the sample size was determined (ie, a priori statistical power analysis), whether an intent-to-treat analysis was performed, the unit of randomization, and, for group-randomized trials, whether the outcome analysis was adjusted for nested data.
We identified 319 unique articles that matched our search criteria. Of the 319 articles reviewed for eligibility, 165 studies were excluded because they were not intervention studies (Figure 1). Of the 153 intervention studies, 39 reported repeat mammography outcomes. Thirteen of the 39 were excluded because the study design (45–49), analysis or reporting (50–55), or study sample (56,57) could not be directly compared with the other repeat mammography studies. In one study that stratified results by family history of breast cancer (23), we extracted data only for the group of women without a family history because our focus was on women at average risk. We reached consensus that 25 studies (26 articles) were eligible for review (Figure 1).
Study population and setting. Most studies were conducted in the mid-to-late 1990s or later; four did not report when the study was conducted (Table 1). Nine studies included women with a minimum age of 40 years, 14 with a minimum age of 50–52 years, and two with a minimum age of 65 years. Fifteen studies imposed an upper age limit, whereas the rest did not.
Of the 19 studies reporting racial/ethnic composition of the study population, most studied non-Hispanic white women. Seventeen of the 25 studies recruited women through health-care settings (health maintenance organizations, primary care practices, outpatient clinics, or mammography facilities). Of the eight studies that recruited community residents, five used population-based sampling strategies (29,58–61), two recruited through the state's Breast and Cervical Cancer Early Detection Program (62,63), and one through churches (64).
Most studies recruited women with diverse mammography histories, but 11 (23,59,62,63,65–71) recruited only women who had an up-to-date mammogram and would become due for another one during the study period. In studies of women with diverse mammography histories, the percentage of women with an up-to-date mammogram at study baseline (and therefore eligible to be included in our effect size estimates) ranged from 4% (31) to 72% [(32), Table 1].
Measurement of repeat mammography. Mammography adherence was assessed as annual use in 17 studies, biennial use in six studies, and age dependent (ie, biennial for women <50 years of age and annual for women ≥50 years of age) in two studies (Table 1). Eighteen studies used design 1, that is, they measured one pre- and one postintervention mammogram. Five studies used design 2, that is, they measured two on-schedule postintervention mammograms, and two studies reported data in a way that permitted calculation of repeat mammography estimates for both design types. Thus, a total of 27 individual effect size estimates were available for analysis. Thirteen of the 25 studies used some type of objective record data to measure the outcome of mammography completion, nine used self-report, and three used a combination of records and self-report (Table 1).
Description of the interventions. All but two studies were randomized controlled trials. Eaker et al. (60) selected eight counties and designated four as intervention and four as control. Quinley et al. (71) identified mammography facilities that used reminders and compared them with facilities that did not. Eight studies used only patient reminders as the primary intervention strategy (23,31,62,65–67,69,71), six used educational or motivational strategies (29,59,60,63,70,72), and 11 (32,58,61,64,68,73–78) involved in-person or telephone counseling (Table 1). The one study (73) that used lay health advisors or navigators was grouped with the counseling interventions for subsequent analysis because their role was to educate women about breast cancer and screening and to assist women in scheduling and completing a mammogram. Variability existed within intervention strategy. For example, reminders to complete a mammogram varied by source (ie, facility or personal physician), tailoring or framing of the message, and timing. About half of the 25 studies used multiple intervention strategies, and eight (32,58,61,68,74–77) of the 11 counseling studies used some variation of barriers-specific telephone counseling. Intervention delivery mode also varied across studies (Table 1); nine involved only mail, three used only telephone, nine used both mail and telephone, two involved mail and in-person contact, and two used a variety of community education strategies in addition to mail or telephone.
The number of study groups varied, with some studies using multiple intervention and comparison groups (Table 1). Eleven studies used a survey-only control group, whereas 12 had an active control or comparison intervention group. Only two studies included a no-contact control group.
Most studies (17 of 25) used a theoretic framework to guide the study intervention, and most studies that used a framework used more than one (Table 1). The most commonly used frameworks were the transtheoretical model (37) and the health belief model (22). The constructs most frequently used in interventions were barriers from the health belief model and stage of change from the transtheoretical model. Twelve studies tailored intervention messages on one or more theoretic constructs (29,32,58,61,64,68,72,74–78). Of the eight studies that did not explicitly identify a theoretic framework (31,60,62,63,66,69,71,73), all but one (73) used patient reminders as at least one of the intervention strategies, an approach consistent with the cue to action construct from the health belief model.
Across all 25 eligible studies (27 effect estimates), the test for heterogeneity was statistically significant (Q = 69.5, I2 = 63%, P < .001; Table 2 and Figure 2). When subgroups of studies were classified under each of our 15 a priori categorical covariates, nine homogeneous subgroups were identified but only for certain categories of a covariate (Table 2): community study setting, design 2 (ie, two postintervention mammograms), self-report data for mammography completion, the three ways of classifying intervention strategies (use of education/motivation or counseling; use of barriers-specific telephone counseling; and use of multiple intervention strategies), use of the stage of change construct in the intervention, and use of tailoring. Statistically significant heterogeneity remained in all categories of the other seven covariates. With one exception, confidence intervals overlapped when comparing odds ratios across categories within each covariate, indicating similar subgroup effect sizes (ORs). The exception was the reminder-only intervention strategy (OR = 1.79, 95% CI = 1.41 to 2.29, P < .001) compared with the education/motivation strategy (OR = 1.25, 95% CI = 1.14 to 1.38, P = .868) (Table 2). The summary odds ratio for the eight heterogeneous reminder-only studies was the largest observed (OR = 1.79, 95% CI = 1.41 to 2.29) and was statistically significantly greater than the summary odds ratio (Pdiff = .008) for the homogeneous group of 17 studies that used the more intensive strategies of education/motivation or counseling (OR = 1.27, 95% CI = 1.17 to 1.37) (Table 2).
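The text does not say how Pdiff was computed. One common approach, sketched here as an assumption rather than as the authors' method, is a two-sided z test on the difference between two independent log summary odds ratios, with each standard error recovered from the width of its 95% confidence interval. Applied to the two summary estimates reported above, this approach gives a P value of about .008.

```python
import math

Z95 = 1.959964  # normal quantile for a two-sided 95% interval

def se_from_ci(lower, upper):
    """Standard error of a log odds ratio recovered from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)

def p_difference(or1, lo1, hi1, or2, lo2, hi2):
    """Two-sided z test comparing two independent summary odds ratios."""
    diff = math.log(or1) - math.log(or2)
    se_diff = math.hypot(se_from_ci(lo1, hi1), se_from_ci(lo2, hi2))
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return z, p

# Reminder-only studies vs education/motivation or counseling studies,
# using the summary estimates reported in the text
z_stat, p_value = p_difference(1.79, 1.41, 2.29, 1.27, 1.17, 1.37)
```

The test treats the two subgroup summaries as independent, which holds here because each study contributes to only one of the two groups.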
Variables with more than two categories were dichotomized for the meta-regression. For screening interval, the category age dependent, representing two studies (60,73), was combined with the category 1 year because the majority of women were aged 50 years or older, and 1 year is the commonly recommended interval. For data source, the three studies (69,70,73) that used a combination of self-report and medical records were combined with studies using medical records. For control group type, the two studies (66,71) that used a no-contact control group were combined with the survey-only category. For intervention strategy, the categories education/motivation and counseling were combined because their odds ratios were the same. Delivery mode was dichotomized as mail only vs other modes. In meta-regression modeling that included all 27 estimates, the only statistically significant predictor of the magnitude of the odds ratios was the intervention strategy of reminder only vs education/motivation and counseling combined as the referent (OR = 1.35, 95% CI = 1.08 to 1.68, P = .011). However, the overall I2 value for this model remained 60.6%, indicating substantial residual heterogeneity and confirming the results of the heterogeneity analyses in Table 2 for the studies using reminder-only interventions.
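A meta-regression with one dichotomized covariate can be sketched as a weighted least-squares fit of the log odds ratios on a 0/1 indicator. In this sketch the between-study variance tau2 is passed in by the caller, whereas the STATA routine used in the review estimates it from the data; the function name and the example values are hypothetical. The exponentiated slope is the ratio of the summary odds ratio in the group coded 1 to that in the referent group, which is how the estimate of 1.35 reported above can be read.

```python
import math

def meta_regression_binary(log_ors, ses, x, tau2=0.0):
    """Weighted least squares of log OR on a single covariate x.

    Weights are 1 / (se^2 + tau2). Returns (intercept, slope, se_slope).
    tau2 is supplied by the caller (an assumption of this sketch).
    """
    w = [1.0 / (se ** 2 + tau2) for se in ses]
    s_w = sum(w)
    s_wx = sum(wi * xi for wi, xi in zip(w, x))
    s_wy = sum(wi * yi for wi, yi in zip(w, log_ors))
    s_wxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    s_wxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, log_ors))
    det = s_w * s_wxx - s_wx ** 2          # determinant of X'WX
    slope = (s_w * s_wxy - s_wx * s_wy) / det
    intercept = (s_wy - slope * s_wx) / s_w
    se_slope = math.sqrt(s_w / det)        # from the inverse of X'WX
    return intercept, slope, se_slope

# Hypothetical data: x = 1 for reminder-only studies, 0 for the referent group
b0, b1, se_b1 = meta_regression_binary(
    [0.2, 0.3, 0.6, 0.5], [0.1, 0.1, 0.2, 0.2], [0, 0, 1, 1]
)
ratio_of_ors = math.exp(b1)
ci = (math.exp(b1 - 1.959964 * se_b1), math.exp(b1 + 1.959964 * se_b1))
```

With a single 0/1 covariate, the slope equals the difference between the two groups' inverse-variance pooled log odds ratios, so the model reproduces a subgroup comparison while allowing further covariates to be added in a multivariable fit.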
In the influence analysis that included all 27 estimates, omitting the Mayer et al. (67) reminder-only study had the most pronounced effect (I2 changed from 62.2% to 57.7%) but only slightly decreased the summary odds ratio estimate (from 1.39 to 1.35). In a separate influence analysis restricted to the eight reminder-only intervention studies that remained heterogeneous (23,31,62,65–67,69,71), omitting Quinley et al. (71) substantially increased the summary odds ratio (from 1.79 to 1.93). Neither the funnel plot pattern (Figure 3) nor the results from the asymmetry hypothesis tests (Begg test: P = .17; Egger test: P = .54) suggested evidence of publication bias.
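An influence analysis of this kind can be sketched as a leave-one-out loop: each estimate is omitted in turn and the summary odds ratio and I2 are recomputed. The Python sketch below assumes a DerSimonian-Laird random-effects estimator, which the text does not specify, and uses hypothetical study labels and values rather than the data from the reviewed studies.

```python
import math


def random_effects_summary(log_ors, ses):
    """DerSimonian-Laird pooled OR and I2 for a set of log odds ratios."""
    w = [1 / s ** 2 for s in ses]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_ors)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_re, log_ors)) / sum(w_re)
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return math.exp(pooled), i2


def leave_one_out(labels, log_ors, ses):
    """Recompute the summary with each study omitted in turn."""
    results = {}
    for i, label in enumerate(labels):
        results[label] = random_effects_summary(
            log_ors[:i] + log_ors[i + 1:], ses[:i] + ses[i + 1:]
        )
    return results


# Hypothetical data: study E is an outlier with a much larger effect.
labels = ["A", "B", "C", "D", "E"]
log_ors = [0.20, 0.25, 0.30, 0.22, 0.90]
ses = [0.10, 0.10, 0.10, 0.10, 0.10]

full_or, full_i2 = random_effects_summary(log_ors, ses)
loo = leave_one_out(labels, log_ors, ses)
for label, (pooled_or, i2) in loo.items():
    print(f"omit {label}: OR = {pooled_or:.2f}, I2 = {i2:.1%}")
```

In this toy data set, omitting the outlying study removes the heterogeneity and lowers the summary odds ratio; omitting any other study changes little.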
Nonresponse at baseline was not an issue for the nine studies (23,31,59,62,63,65,66,69,71) that used records to identify and track the study sample for outcome measurement (Table 3). For the other 16 studies, all but one (64) reported the response rate at baseline; however, of these 16 studies, only two (29,74) compared characteristics of respondents and nonrespondents at baseline. Nineteen of the 25 studies tested for equivalence of study groups at baseline on selected variables (Table 3). Fifteen of the 16 studies that actively recruited participants (as opposed to tracking them passively through medical records or administrative databases) reported response rates at follow-up; however, only four (28,58,75,76) compared the final sample with dropouts. Only eight studies that should have tested for differential attrition across study groups did so (Table 3).
Only six studies reported how the sample size for the intervention trial was determined, and only 11 studies conducted intent-to-treat analyses (Table 3), defined as including everyone who was randomized to an intervention or control group (79). All four group-level, randomized trials (61,64,74,75) described analyses intended to adjust for the design effect; however, neither of the two group-level, nonrandomized trials (60,71) adjusted for the effect of nested samples. Two of the studies (67,76) that identified individuals as the unit of randomization and analysis could have reported, but did not report, the intraclass correlation or the magnitude of the design effect to describe the possible influence of patients nested within providers or facilities.
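For readers unfamiliar with the design effect mentioned above, the following Python sketch applies the standard approximation, design effect = 1 + (m - 1) x ICC, where m is the average cluster size and ICC is the intraclass correlation. The sample size, cluster size, and ICC used here are hypothetical and are not taken from any of the reviewed studies.

```python
def design_effect(mean_cluster_size, icc):
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_sample_size(n, mean_cluster_size, icc):
    """Sample size after discounting for within-cluster correlation."""
    return n / design_effect(mean_cluster_size, icc)


# Hypothetical trial: 1000 women, about 50 per provider, ICC = 0.02.
deff = design_effect(50, 0.02)
n_eff = effective_sample_size(1000, 50, 0.02)
print(f"design effect = {deff:.2f}, effective n = {n_eff:.0f}")
```

Even a small intraclass correlation can roughly halve the effective sample size when clusters are large, which is why reporting it matters when patients are nested within providers or facilities.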
We observed statistically significant heterogeneity (P < .001) among the effect size estimates (ORs) in the 25 studies (27 estimates), indicating variability in effect estimates of repeat mammography rates. Of the 15 categorical covariates identified a priori as study characteristics that may influence effect size, no single covariate resolved the heterogeneity in univariate analyses. In multivariable meta-regression, only the intervention strategy of reminders vs the combined categories of education/motivation and counseling remained a statistically significant predictor of the magnitude of the intervention effect as measured by the odds ratio; however, substantial residual heterogeneity persisted in the model. The summary odds ratio for the eight heterogeneous studies using reminders was the largest observed (OR = 1.79, 95% CI = 1.41 to 2.29, computed under a random-effects model) and was statistically significantly (Pdiff = .008) greater than the summary odds ratio for the homogeneous group of 17 studies that used the more intensive strategies of education/motivation or counseling (OR = 1.27, 95% CI = 1.17 to 1.37, whether computed under a fixed- or a random-effects model). It is important to note that the eight studies using reminders showed statistically significant heterogeneity despite the notoriously low statistical power of homogeneity testing (80). Moreover, all eight studies were alike in using medical records or administrative data to ascertain mammography status, in using design 1 (one pre- and one postintervention mammogram), and, except for one study, in being conducted within a health-care setting. The results of the influence analyses confirmed that the observed heterogeneity was mostly attributable to one or two of the studies using reminders (65,67).
Because of this heterogeneity, we cannot conclude that the use of a reminder intervention strategy within a health-care setting is more effective than alternate intervention strategies in the same or different study settings. Therefore, additional studies are needed to help resolve the remaining heterogeneity in this subgroup by identifying the explanatory study characteristics or research methodologies that are the key factors in increasing repeat mammography.
The 17 studies that used education/motivation or counseling were remarkably homogeneous in their effect sizes with a narrow confidence interval, suggesting a high degree of consistency among the studies and that the true intervention effect of these strategies may be, at best, moderate, that is, odds ratios between 1.18 and 1.36. The results of the meta-regression modeling further suggest that, among these homogeneous studies, there was no detectable advantage or disadvantage in the different study designs, methods, settings, populations, intervention strategies, delivery modes, outcome measurements, screening intervals, or use of theory. This finding raises a question as to whether substantial increases in regular mammography screening can be expected from education/motivation or counseling interventions, regardless of how intensive, rigorous, innovative, or expensive the approach. In other words, changes in regular mammography screening behavior may not be particularly sensitive to variations in education/motivation or counseling interventions. In the current US environment, substantial increases in regular cancer screening behavior may depend more on factors at the systems level (eg, regulations relating to health-care access such as insurance coverage and standards of preventive care) than on factors at the individual level such as perceived risk of breast cancer.
Our finding of a relatively modest intervention effect for the subgroup of more intensive intervention studies is consistent with the finding in a meta-analysis (16) that was restricted to tailored interventions and that focused on one-time mammography screening. Sohl and Moyer (16) found that interventions promoting repeat mammography had a smaller effect size (OR = 1.17) compared with those promoting one-time use (OR = 1.53). As discussed previously, however, we cannot conclude that lower intensity interventions such as reminders are a better strategy without additional research to determine whether a particular type of reminder strategy is effective across different study settings, populations, and study methodologies.
An unexpected finding was that two types of study design were used in repeat mammography interventions. In design 1, women who had had a recent preintervention mammogram at study baseline were followed long enough to complete one on-schedule mammogram during the study period. In design 2, women with diverse mammography histories at study baseline were followed long enough to complete two on-schedule mammograms during the study period. In design 2 studies, women who were overdue or had never been screened may have been more resistant to attempts to get them to complete screening. Although the confidence intervals overlapped, the odds ratio for design 2 studies was smaller than that for design 1 studies, suggesting that women who were not overdue at study baseline were more likely to complete another mammogram on schedule than were a group of women that included some who were overdue. Future studies should consider how mammography history may affect receptivity to different types of interventions (81). For example, if a woman experiences the procedure as painful, she may be unwilling to return for her next mammogram when it is due and may disregard reminders or messages promoting mammography.
There is no consensus about how to classify types of intervention strategies. Systematic reviews and meta-analyses of one-time mammography screening have used a number of different classifications (7–19). The lack of consistency may result, in part, from lack of consensus about how to operationalize our theoretic frameworks and constructs (24). In addition, many of the interventions reviewed here were complex, multicomponent interventions and, therefore, difficult to classify. For these reasons, we explored three approaches to classifying intervention strategies. None of these approaches yielded homogeneous subgroups across all categories of a variable, and, with the possible exception of reminders, the effect sizes were generally similar.
In our assessment of the quality of reporting for eight characteristics of internal validity, most studies tested for equivalence of study groups at baseline and reported the response rate at follow-up (Table 3). Fewer studies reported whether they compared the characteristics of participants who remained in the study with those who dropped out, and even fewer reported whether there was differential attrition by study group (Table 3). Only 11 studies conducted an intention-to-treat analysis, so it is likely that the effect sizes were overestimated because data on other cancer screening behaviors suggest that dropouts are less likely to complete screening tests (82–84). There was far less attention to reporting study characteristics that affect external validity such as the representativeness of participants and settings. As noted by Steckler and McElroy (85), “Systematic reviews and meta-analyses are limited in the conclusions that can be drawn when external validity data are not reported.” This limitation needs to be addressed if we are to successfully disseminate effective interventions. Application of frameworks to address internal (41,86) and external (44,87) validity in the implementation and evaluation of interventions will enable us to learn from our successes as well as our failures.
A limitation of our systematic review is that several of the studies reviewed here were not explicitly designed to promote repeat mammography, although they provided data that allowed us to calculate effect estimates. It may be that had their interventions been designed to address repeat mammography in addition to one-time mammography, the effect estimates in those studies would have been different. Although there is an extensive body of intervention research on one-time mammography screening, the number of intervention studies of repeat mammography is comparatively small, and estimates in some categories of the predictor variables were unstable. In addition, most of the studies were conducted more than 10 years ago, and most were of non-Hispanic white women, thus limiting our ability to generalize the findings to the present day and to other ethnic groups.
If we are to reap the benefits of mortality reduction from mammography screening, we need a better understanding of the determinants of repeat screening behavior so that we can develop more effective interventions. This review called attention to a number of characteristics in the studies to date, which, if attended to in future studies, will increase our understanding of how to develop and implement interventions to increase regular mammography screening. Because reporting standards are increasingly being adopted by journal editors, it will be easier to synthesize the literature and draw conclusions about what works, under what circumstances, and for what reasons.
National Cancer Institute (NCI)/National Institutes of Health (RO1-CA-76330 to S.W.V., D.J.d.J., and J.A.T.); NCI (RO3-CA-103512 to D.J.d.J. and S.W.V.); UL1 (RR024148—CTSA to D.J.d.J.); NCI (R25-CA-57712 to A.M.); American Cancer Society Mentored Research Scholar Grant (CPPB-113777 to A.M.).
The funders did not have any involvement in the design of the study; the collection, analysis, and interpretation of the data; the writing of the manuscript; or the decision to submit the manuscript for publication.
The authors thank Sharon Coan for her assistance with data analysis.