To our knowledge, this is the first cancer screening intervention trial in the United States that used a well-defined, nationally representative study population and that provided evidence of both internal and external validity (28). In addition, few intervention trials have conducted ITT analyses or compared intervention effects using different analytic approaches. An exception is a recent study of a substance abuse program (67) that analyzed data using both ITT and PP approaches and found that positive findings from the PP analysis were not replicated using ITT.
In none of our primary analyses did the tailored and targeted intervention (group 1) result in higher mammography rates than the targeted-only intervention (group 2), and there was limited support for either intervention being more effective than the baseline survey alone (group 3). Our study had adequate statistical power to detect absolute between-group differences of 7%–12% in mammography coverage and compliance, assuming rates in the control group of 35%–65% for coverage and 15%–45% for compliance. Mammography rates in the control group were within these ranges for all primary analyses except coverage in the MITT and PP analyses, for which the cumulative incidence estimates were around 80%. With such high rates, it may be difficult for any intervention to increase coverage (ie, a ceiling effect). Follow-up time was adequate for women to complete two postintervention mammograms before the end of the study, even assuming that they did not receive their first postintervention mammograms until the end of the first follow-up period. However, some women may not follow an annual mammography schedule or may be on a biennial schedule at the recommendation of their physician. Studies measuring multiple mammograms after an intervention need to allow adequate time to assess these contingencies.
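The kind of sample-size calculation behind such design targets can be sketched with the standard normal approximation for comparing two proportions. The specific inputs below (a 50% control rate and a 7% absolute difference, the smallest difference targeted) are illustrative values drawn from the ranges stated above, not the trial's actual design parameters:

```python
from statistics import NormalDist

def n_per_group(p_control, p_intervention, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided test of two independent
    proportions, using the standard normal approximation."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)   # critical value for two-sided alpha
    z_b = z.inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_intervention) / 2
    numerator = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_b * (p_control * (1 - p_control)
                          + p_intervention * (1 - p_intervention)) ** 0.5) ** 2
    return numerator / (p_control - p_intervention) ** 2

# A 7% absolute difference at a 50% control rate needs roughly 800 women
# per group; the larger 12% difference needs far fewer.
print(round(n_per_group(0.50, 0.57)))
print(round(n_per_group(0.50, 0.62)))
```

The calculation also makes the ceiling-effect point concrete: as the control rate climbs toward 80%, the achievable absolute difference shrinks, and with it the detectable effect.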
We also found that the PP effect estimates from logistic regression were of greater magnitude than the corresponding effect estimates from Cox modeling. This finding is consistent with data from an occupational cohort and simulation study that compared mortality risk estimates using proportional hazards, Poisson, and logistic regression. In that study, Callas et al. (63) found that logistic regression overestimates mortality risk associated with the exposure and is less precise than estimates based on proportional hazards models, which are considered the “gold standard” for prospective studies. Further, the extensive simulations of Callas et al. (63) revealed that risk estimates differed most when the outcome (eg, mammography use) was common and when the relative risk was less than 2.0, circumstances that pertained to our data.
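The divergence between the odds ratio (the quantity logistic regression estimates) and the risk ratio when the outcome is common can be verified with simple arithmetic. The screening rates below are invented for illustration and are not from the trial:

```python
def risk_ratio(p1, p0):
    """Ratio of outcome probabilities in exposed vs. unexposed groups."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Ratio of odds p/(1-p) -- what logistic regression estimates."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

# Common outcome: 60% screened in the control arm, 75% with intervention.
print(risk_ratio(0.75, 0.60))   # about 1.25 -- a modest relative effect
print(odds_ratio(0.75, 0.60))   # about 2.0  -- the odds ratio inflates it

# Rare outcome: 2.0% vs 2.5% screened -- the two measures nearly agree.
print(risk_ratio(0.025, 0.020))
print(odds_ratio(0.025, 0.020))
```

This is exactly the regime Callas et al. describe: with a common outcome and a relative risk below 2.0, interpreting an odds ratio as a risk ratio overstates the effect.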
Our estimates based on Cox regression were also lower than those from the five other studies (21) that evaluated tailored or personalized approaches and measured two postintervention mammograms, all of which used logistic regression to estimate intervention effects. However, when we analyzed our data using logistic regression and imposed similar restrictions on the study sample, our effect sizes increased and were within the range of estimates reported in those studies. Collectively, these findings raise a question about the appropriate analytic method for outcome data in cancer screening intervention trials. Trials that use logistic regression may overestimate favorable intervention effects in the target population. If so, our findings have important implications for the dissemination of cancer screening interventions that appear favorable based on the results of efficacy trials using analyses that ignore losses to follow-up.
Our study has several limitations. Because our study was designed to assess the effectiveness of the interventions under “real-world” conditions, we retained women in our ITT analyses for whom we did not have data on mammography status. Our decision to code women who did not respond, who refused to participate, or who were missing self-reported mammography data as having “no mammogram” is a “worst-case” imputation strategy (61). However, the hazard rate ratios based on decreasingly conservative analyses were very similar to one another. Moreover, despite the artificially low rates of coverage and compliance produced by imputing zeros for missing mammography dates in the ITT analyses based on self-report, the ITT analyses based on VA records (in which mammography was ascertained independent of study participation) were corroborative.
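The logic of worst-case imputation can be sketched as follows. The data here are invented for illustration only; `None` stands in for a woman whose mammography status is unknown (nonresponse, refusal, or missing self-report):

```python
# Self-reported mammography status: True/False, or None when unknown.
reported = [True, None, False, True, None, True, False, None, True, True]

# Worst-case ("no mammogram") imputation: every unknown becomes False,
# so the resulting coverage estimate is a lower bound.
worst_case = [status is True for status in reported]

# Complete-case (PP-style) alternative: drop the unknowns entirely.
complete_case = [status for status in reported if status is not None]

coverage_worst = sum(worst_case) / len(worst_case)           # 5/10 = 0.50
coverage_complete = sum(complete_case) / len(complete_case)  # 5/7 ~ 0.71
print(coverage_worst, coverage_complete)
```

Because the worst-case estimate can only understate coverage, agreement between these conservative ITT results and analyses with independently ascertained outcomes (here, VA records) is reassuring rather than coincidental.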
Another limitation relates to the use of self-report to measure mammography status. There is some evidence that survey respondents may overreport socially desirable behaviors such as mammography screening when self-reports are compared with more objective data sources, such as medical records (68). In a recent study by Paskett et al. (71), women receiving the intervention were more likely than those in the control group to report mammograms that were not documented in their medical records. In our analyses of VA users, in which we compared medical record data with self-report data, there was no consistent pattern suggestive of such bias in the hazard rate ratio estimates. In addition, volunteers willing to participate in health promotion and prevention trials may be a self-selected subgroup that is more likely than the target population to engage in health-related behaviors. For example, people who complete study questionnaires are more likely to undergo colorectal cancer screening than those who do not (72). Such a phenomenon could explain the slight increases in hazard rate ratios we observed when comparing results from the ITT analyses with the less conservative MITT and PP analyses based on VA records.
At the time our trial was initiated, there were no published studies that evaluated the effect of a tailored intervention on completion of two postintervention mammograms or that compared the effects of tailored and targeted approaches. Our hypothesis regarding mammography compliance (ie, the completion of two postintervention mammograms 6–15 months apart) was based on theory (36) and on some empiric evidence from trials of interventions to promote one postintervention mammogram (57). We expected that a tailored and targeted intervention would be more effective than one that was targeted only and that both interventions would be more effective than a survey-only or no-contact condition. After our trial began, findings were published from two other studies that used a three-group design, delivered two rounds of intervention that varied the extent of personalization, and measured completion of two postintervention mammograms (23). Our findings are generally consistent with those studies. Rimer et al. (24) found no statistically significant difference for either intervention condition (standard care plus a mailed tailored print booklet or standard care plus telephone counseling to address barriers) compared with standard care (patient reminder and physician prompt), whereas Lipkus et al. (23) reported greater mammography use after telephone counseling than after standard care (multiple mailed reminders), but only in the first year of the trial. Collectively, these findings and ours provide little support for an additional benefit of tailored interventions, whether mailed or delivered by telephone, for increasing regular mammography screening.
Our study did not have adequate power to establish whether the modest intervention effects we observed were statistically significant. Crude coverage and compliance estimates showed favorable absolute differences between the control group and the two intervention groups of 1%–3% for the ITT analysis, 1%–5% for the MITT analysis, and 2%–6% for the PP analysis. The absolute differences that we observed were larger than those reported in a population-based mammography intervention trial that tested a direct mail strategy using a commercial mailing list to reach low-income women eligible for free screening (80). That study reported statistically significant differences for the mail-only (1.06%) and mail plus incentive (1.58%) groups compared with the control group (0.83%). Although most efficacy trials of cancer screening interventions aim to detect absolute between-group differences of 10% or greater, there is no consensus about what constitutes an important difference from a public health perspective when an intervention is delivered at the population level.
Our findings suggest that a targeted-only intervention may be as effective as one that is both targeted and tailored, although there was limited support for either intervention being more effective than the baseline survey alone. We also found that using an analytic approach that adjusts for variable follow-up time produced more conservative (less favorable) intervention effect estimates.