HHS Author Manuscripts
J Med Screen. Author manuscript; available in PMC 2008 November 25.


PMCID: PMC2586667

NIHMSID: NIHMS78849

Stuart G Baker, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD, USA.

Correspondence to: Stuart G Baker, Sc. D., National Cancer Institute, EPN 3131, 6130 Executive Blvd MSC 7354, Bethesda, MD 20892-7354, USA; Email: vog.hin@i61bs



Many cancer screening trials involve a screening programme of one or more screenings with follow-up after the last screening. Usually a maximum follow-up time is selected in advance. However, during the follow-up period there is an opportunity to report the results of the trial sooner than planned. Early reporting of results from a randomized screening trial is important because obtaining a valid result sooner translates into health benefits reaching the general population sooner. The health benefits are reduction in cancer deaths if screening is found to be beneficial and more screening is recommended, or avoidance of unnecessary biopsies, work-ups and morbidity if screening is not found to be beneficial and the rate of screening drops.

Our proposed method for deciding if results from a cancer screening trial should be reported earlier in the follow-up period is based on considerations involving postscreening noise. Postscreening noise (sometimes called dilution) refers to cancer deaths in the follow-up period that could not have been prevented by screening: (1) cancer deaths in the screened group that occurred after the last screening in subjects whose cancers were not detected during the screening programme and (2) cancer deaths in the control group that occurred after the time of the last screening and whose cancers would not have been detected during the screening programme had they been randomized to screening (the number of which is unobserved). Because postscreening noise increases with follow-up after the last screening, we propose early reporting at the time during the follow-up period when postscreening noise first starts to overwhelm the estimated effect of screening as measured by a *z*-statistic. This leads to a confidence interval, adjusted for postscreening noise, that would not change substantially with additional follow-up. Details of the early reporting rule were refined by simulation, which also accounts for multiple looks.

For the re-analysis of the Health Insurance Plan trial for breast cancer screening and the Mayo Lung Project for lung cancer screening, estimates and confidence intervals for the effect of screening on cancer mortality were similar on early reporting and later.

The proposed early reporting rule for a cancer screening trial with post-screening follow-up is a promising method for making results from the trial available sooner, which translates into health benefits (reduction in cancer deaths or avoidance of unnecessary morbidity) reaching the population sooner.

The goal of cancer screening trials is to evaluate the effect of early detection of cancer coupled with early intervention on deaths from the cancer targeted by screening (which we call cancer deaths or cancer mortality). Cancer screening trials typically involve randomization of subjects to screening vs. no screening or usual care, with a long period of follow-up after the last screening. Particularly in recent years there has been interest in monitoring cancer screening trials for early reporting of results during the follow-up period.

If results from a randomized trial of cancer screening were reported early in the follow-up period and screening were found to have a benefit, a substantial number of lives could be saved prior to the originally scheduled end of follow-up. For example suppose the follow-up period were scheduled for 10 years and results were reported after only 6 years of follow-up when cancer screening was found to be beneficial. After 6 years of follow-up, persons in the general population could be influenced to start cancer screening. Otherwise an additional four years of follow-up would pass before persons in the general population could be influenced to start screening, and four years would be lost when screening could have had widespread benefits.

If results from a randomized trial of cancer screening were reported early and the upper bound of a 95% confidence interval for the difference in mortality rates indicated that any possible benefit was probably small relative to harms (false positives, unnecessary biopsies and overdiagnoses of non-life-threatening cancers), and the lower bound was less than zero, the current level of screening could be reduced before the originally scheduled end of follow-up, which would be a net benefit. For example, prostate-specific antigen (PSA) is widely used in some countries to screen for prostate cancer. Suppose that in reporting the results of a randomized trial of PSA screening, the upper bound of the 95% confidence interval indicated a small reduction in prostate cancer mortality relative to harms, and the lower bound indicated an increase in prostate cancer mortality. Many persons would then be influenced to stop PSA screening or not to start PSA screening. The earlier this unfavourable confidence interval is reported, the sooner (1) persons currently undergoing PSA screening could stop receiving it and (2) persons contemplating PSA screening could avoid it. The key idea of early reporting during the follow-up period is that there is a time during the follow-up period after which the coverage of the confidence interval for the estimated difference in cancer mortality rates between randomization groups would not change substantially with further follow-up (discussed later). We developed an early reporting rule and refined it using a simulation that accounts for multiple looks. We then investigated the performance of the rule in randomized trials of breast and lung cancer screening in which there was no early reporting in the original designs and analyses.

As a prerequisite to the discussion of early reporting in cancer trials with follow-up after the last screening, we first discuss the analysis without early reporting. We follow the framework of Baker *et al.*^{1} but with an extension to allow randomization in different years. Let *t* = 1, 2, … *T* denote year since randomization, where *T* is the largest year corresponding to the end of follow-up. Let *g* denote randomization group, with *g* = 0 = assigned to no screening and *g* = 1 = assigned to screening. Let $x_{g}(t;T)$ denote the number of cancer deaths in year *t* in group *g* with follow-up through year *T*, let $r_{g}(t;T)$ denote the corresponding number of subjects at risk, and let $S(t)$ denote the probability of overall survival to year *t*. The estimated difference in cumulative cancer mortality rates between the randomization groups is

$$d(t)\approx \sum _{i=1}^{t}S(i){h}_{0}(i)-\sum _{i=1}^{t}S(i){h}_{1}(i),\qquad {h}_{g}(t)=\frac{{x}_{g}(t;T)}{{r}_{g}(t;T)}.$$

(1)

A modification is needed to adjust for non-compliance (when some subjects randomized to screening refuse screening) and contamination (when some subjects randomized to no screening receive screening).^{1} Based on the standard model for potential outcomes with all-or-none compliance,^{2}^{–}^{5} if non-compliance and contamination occur soon after randomization, the estimated effect of screening among those subjects who would only receive screening if offered (sometimes called the causal effect) is

$${d}_{\text{causal}}(t)=\frac{d(t)}{{f}_{1}-{f}_{0}},$$

(2)

where $f_{g}$ is the fraction who receive screening in group *g*.
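As a small worked example of Equation (2) (the function name and the illustrative numbers below are ours, not from the paper):

```python
def causal_effect(d, f1, f0):
    """Equation (2): scale the intention-to-treat difference d(t) by the
    difference in the fractions receiving screening in each group."""
    return d / (f1 - f0)

# Example: a 20-per-10,000 intention-to-treat reduction with two-thirds
# compliance in the screened group and no contamination in the control
# group gives a causal effect of roughly 30 per 10,000.
print(causal_effect(0.0020, 2/3, 0.0))
```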

Another modification is needed to adjust for postscreening noise, which is sometimes called dilution. Postscreening noise refers to cancer deaths in the follow-up period that could not have been prevented by screening: (1) cancer deaths in the screened group that occurred after the last screening in subjects whose cancers were not detected during the screening program, and (2) cancer deaths in the control group that occurred after the time of the last screening and in subjects whose cancers would not have been detected during the screening program had they been randomized to screening. The number in (2) is not directly observed, but, in keeping with the theoretical underpinnings of randomization, the number in (1) would equal, on average, the number in (2). Failure to adjust for postscreening noise can yield incorrect confidence intervals for the estimated effect of screening.

Baker *et al.*^{1} proposed estimating the effect of screening while adjusting for postscreening noise by analysing the data in the year since randomization when postscreening noise first starts to mask the effect of screening. The key is to note that after some point in the follow-up period, the difference between cancer mortality rates should remain fairly constant as there is little residual effect of screening, and the cancer mortality rates will be increasing due to postscreening noise (which is also related to age). The year when postscreening noise first starts to mask the effect of screening is identified using a *z*-statistic, the estimated difference in cumulative cancer mortality rates divided by its standard error,

$$z(t)=\frac{d(t)}{\sqrt{\operatorname{var}d(t)}},\qquad \operatorname{var}d(t)=\sum _{g=0}^{1}\sum _{i=1}^{t}\frac{{\{S(i)\}}^{2}{h}_{g}(i)}{{r}_{g}(i)}.$$

(3)

The estimated variance in the denominator of *z*(*t*) is computed under the assumption that cancer death rates follow a Poisson distribution. At some year after the last screening the *z*-statistic typically decreases because (1) postscreening noise increases the denominator of the *z*-statistic and (2) any effect of screening diminishes after the last screening, keeping the numerator relatively constant.
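To make the construction concrete, a minimal Python sketch (our own illustration, not the authors' software; the default $S(t)=1$ matches the approximation used later in the re-analyses) computes $d(t)$, $h_g(t)$ and $z(t)$ from yearly counts:

```python
import numpy as np

def z_statistic(x0, x1, r0, r1, S=None):
    """Compute d(t) and z(t) (Equations 1 and 3) from yearly counts.

    x0, x1 -- yearly cancer deaths in the control / screened group, t = 1..T
    r0, r1 -- yearly numbers of subjects at risk in each group
    S      -- probability of overall survival to each year (defaults to 1)
    """
    x0, x1, r0, r1 = (np.asarray(a, dtype=float) for a in (x0, x1, r0, r1))
    S = np.ones_like(x0) if S is None else np.asarray(S, dtype=float)
    h0, h1 = x0 / r0, x1 / r1                         # yearly hazards h_g(t)
    d = np.cumsum(S * h0) - np.cumsum(S * h1)         # cumulative difference d(t)
    var = np.cumsum(S**2 * h0 / r0 + S**2 * h1 / r1)  # Poisson variance, Eq. (3)
    return d, d / np.sqrt(var)
```

With a persistent screening effect the numerator grows faster than the standard error, so z increases; once postscreening noise dominates, the variance keeps growing while d(t) stays flat and z declines.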

More formally, to mitigate the effect of postscreening noise, Baker *et al.*^{1} proposed estimating the effect of screening (with slightly different notation here) by $d_{\text{causal}}({t}_{0}^{\ast};{X}_{T0})$, where ${t}_{0}^{\ast}$ denotes the year after randomization when the *z*-statistic reaches a maximum and ${X}_{T0}$ denotes the observed data with follow-up through year *T*.

To avoid the fallacy of cutpoint optimization,^{6} confidence intervals are computed by generating the data under a Poisson distribution *j* = 1, … *J* times. For the *j*th random data generation, let $x_{gj}(t;T)$ denote the generated number of cancer deaths in year *t* for group *g*, and let ${t}_{j}^{\ast}$ denote the year when the corresponding *z*-statistic reaches a maximum.

We extend these ideas to early reporting of results during the follow-up period. Suppose a cancer screening trial is monitored in successive years during the follow-up period after screening has ended. Let *m* denote the monitoring year since the start of the study corresponding to the initial enrolment. Importantly, *m* is also the maximum follow-up time since randomization for the current year of monitoring; therefore *m* ≤ *T*. The value of *m* is directly related to the chronological year of monitoring. For example, if the first enrolment began in chronological year 1965 and the chronological year of monitoring is 1970, which includes the data from 1965 to 1969, then *m* = 5, which also means the maximum follow-up is five years.

In this setting ${t}_{j}^{\ast}$ is called the year of analysis. When we discussed the analysis without early reporting, ${t}_{j}^{\ast}$ represented the year after randomization when the *z*-statistic reached a maximum (with ties going to the latest maximum). To be more conservative in accounting for chance fluctuations due to yearly monitoring, we also considered ${t}_{j}^{\ast}$ equal to the year the *z*-statistic reached a maximum (with ties going to the latest maximum) plus one. This latter modification would likely increase the comfort level of a Data and Safety Monitoring Committee considering whether to recommend early reporting of results. As shown later in the simulation, this conservative choice of year of analysis gives approximately correct confidence intervals.
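The two candidate definitions of the year of analysis can be sketched as follows (our illustration; `conservative=True` corresponds to adding one year to the latest maximum):

```python
import numpy as np

def year_of_analysis(z, conservative=True):
    """Year of analysis t*: the 1-based year at which the z-statistic is
    maximal, with ties going to the latest maximum; the conservative
    variant adds one year."""
    z = np.asarray(z, dtype=float)
    t_star = int(np.flatnonzero(z == z.max())[-1]) + 1  # latest argmax, 1-based
    return t_star + 1 if conservative else t_star
```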

The key idea behind early reporting in the follow-up period is that once postscreening noise starts to mask the effect of screening on cancer mortality, the year of analysis should not change with increased follow-up and hence estimates and confidence intervals should not change substantially with further follow-up.

For each monitoring year *m*, random data are generated according to a Poisson distribution with means equal to the observed counts. Let ${X}_{mj}$ denote the *j*th random generation of data at monitoring year *m*, and let $I({t}_{j}^{\ast},m;{X}_{mj})$ be an indicator that equals 1 if the year of analysis ${t}_{j}^{\ast}$ occurs before the latest follow-up year *m* and 0 otherwise. The percentage of random data generations in which the year of analysis occurs before the latest follow-up year is

$$F(m)=100\sum _{j=1}^{J}\frac{I({t}_{j}^{\ast},m;{X}_{mj})}{J}.$$

(4)

Our early reporting rule is to report results if *F*(*m*) > *F*_{target}, where *F*_{target} is a target value determined by simulation (discussed later).
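A sketch of the monitoring statistic of Equation (4), under our reading of $I(\cdot)$ as flagging regenerations whose year of analysis falls strictly before the latest follow-up year (the function name, the seed handling, and the $S(t)=1$ simplification are ours):

```python
import numpy as np

def F_statistic(x0, x1, r0, r1, J=20, seed=0):
    """Equation (4): percentage of Poisson data regenerations in which the
    (conservative) year of analysis falls before the latest follow-up year m.
    Assumes nonzero counts in year 1 so the z-statistic is well defined."""
    rng = np.random.default_rng(seed)
    x0, x1, r0, r1 = (np.asarray(a, dtype=float) for a in (x0, x1, r0, r1))
    m = len(x0)
    hits = 0
    for _ in range(J):
        h0 = rng.poisson(x0) / r0          # regenerated yearly rates, with
        h1 = rng.poisson(x1) / r1          # Poisson means = observed counts
        d = np.cumsum(h0 - h1)             # cumulative difference, S(t) = 1
        z = d / np.sqrt(np.cumsum(h0 / r0 + h1 / r1))
        t_star = int(np.flatnonzero(z == z.max())[-1]) + 2  # latest argmax + 1, 1-based
        hits += t_star < m
    return 100 * hits / J
```

In a scenario where the screening effect is confined to the early years and the remaining follow-up is pure postscreening noise, the z-maximum occurs early and F(m) approaches 100%.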

The following bootstrap approach is used to obtain estimates and 95% confidence intervals for the effect of screening on cancer mortality at the year of early reporting. Let $d_{\text{causal}}({t}_{j}^{\ast},m;{X}_{mj})$ denote the estimated causal effect of screening in monitoring year *m* since the trial began for the *j*th random generation of data. If results are reported at monitoring year *m*, the effect of screening on the reduction in cancer mortality rate is estimated by

$$\mathit{dif}(m)=\sum _{j=1}^{J}\frac{{d}_{\text{causal}}({t}_{j}^{\ast},m;{X}_{mj})}{J},$$

(5)

with estimated standard error,

$$\operatorname{se}\mathit{dif}(m)=\sqrt{\operatorname{var}\mathit{dif}(m)},\qquad \operatorname{var}\mathit{dif}(m)=\sum _{j=1}^{J}\frac{{\{{d}_{\text{causal}}({t}_{j}^{\ast},m;{X}_{mj})-\mathit{dif}(m)\}}^{2}}{J}.$$

(6)

A bootstrap 95% confidence interval is

$$(\mathit{dif}(m)-1.96\operatorname{se}\mathit{dif}(m),\;\mathit{dif}(m)+1.96\operatorname{se}\mathit{dif}(m)).$$

(7)

To minimize the impact of chance fluctuations in the observed data, the average over random data generations, *dif*(*m*), is used in Equation (7) instead of the observed estimate, $d_{\text{causal}}({t}_{0}^{\ast},m;{X}_{m0})$. Note that the same random generations of data, ${X}_{mj}$, are used to compute *F*(*m*) and *dif*(*m*). We also report the average year of analysis,

$${t}_{\text{avg}}^{\ast}=\sum _{j=1}^{J}\frac{{t}_{j}^{\ast}}{J},$$

(8)

which is the average over the random generations of data.
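Putting the pieces together, Equations (5)–(8) can be sketched as one bootstrap routine (our illustration; capping the year of analysis at the available follow-up and setting $S(t)=1$ are our assumptions):

```python
import numpy as np

def bootstrap_report(x0, x1, r0, r1, f1, f0, J=20, seed=0):
    """Bootstrap estimate, 95% CI (Equations 5-7) and average year of
    analysis (Equation 8) for the causal effect of screening, using the
    conservative year of analysis (latest z-maximum plus one)."""
    rng = np.random.default_rng(seed)
    x0, x1, r0, r1 = (np.asarray(a, dtype=float) for a in (x0, x1, r0, r1))
    m = len(x0)
    ests, t_stars = [], []
    for _ in range(J):
        h0 = rng.poisson(x0) / r0
        h1 = rng.poisson(x1) / r1
        d = np.cumsum(h0 - h1)
        z = d / np.sqrt(np.cumsum(h0 / r0 + h1 / r1))
        t = min(int(np.flatnonzero(z == z.max())[-1]) + 2, m)  # argmax + 1, capped at m
        ests.append(d[t - 1] / (f1 - f0))   # causal effect at year of analysis, Eq. (2)
        t_stars.append(t)
    dif = float(np.mean(ests))
    se = float(np.std(ests))                # Eq. (6) divides by J, not J - 1
    return dif, (dif - 1.96 * se, dif + 1.96 * se), float(np.mean(t_stars))
```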

We considered a wide range of scenarios, depicted in Figure 1, in which screening ended at year 3 after randomization. (The number of years of screening is not relevant if the total number of cancer deaths during the screening period is specified.) In all the scenarios, the yearly number of cancer deaths reached a constant to reflect postscreening noise. The large-effect scenario corresponded to a roughly 33% reduction in cancer deaths during the follow-up period before the period of constant postscreening noise (which started at different times). The moderate-effect scenario corresponded to a roughly 22% reduction in cancer mortality in the same period. We also considered scenarios in which the number of deaths in the follow-up period was doubled for the moderate- and large-effect scenarios.

Figure 1 Plots of yearly deaths since year of randomization (screening ends at year 3) for various simulation scenarios (solid line is controls; dashed line is screened group)

From each set of possible ‘true’ counts (namely the counts of cancer deaths in the two groups over the planned follow-up period) plotted in Figure 1, data for 1000 simulated trials were randomly generated according to a Poisson distribution with means equal to the true counts. For each simulated trial, we computed *J* = 20 random generations of data, from which we computed *F*(*m*) using Equation (4). If *F*(*m*) was greater than or equal to *F*_{target}, we selected *m* as the year to report results and computed the bootstrap 95% confidence interval using Equation (7). Otherwise, if *F*(*m*) was less than *F*_{target}, we incremented the year of monitoring by one and repeated the same type of calculation but with an additional year’s data, thus mimicking the process of multiple looks.

We computed the coverage, namely the fraction of simulated trials in which the 95% confidence interval enclosed the true difference, where the true difference was computed as $d_{\text{causal}}({t}_{0}^{\ast},T;{X}_{T0})$, with ${X}_{T0}$ denoting the set of true counts over the full planned follow-up period *T*.

For year of analysis equal to the year of the maximum *z*-statistic, we found that the coverages of 95% confidence intervals ranged from 85–93%, 72–92% and 53–94% for *F*_{target} = 90%, 60% and 30%, respectively. These coverages were too low to recommend this choice of year of analysis.

In contrast, for year of analysis equal to the year of the maximum *z*-statistic plus one, we found that the coverages of 95% confidence intervals ranged from 91–96%, 90–94% and 84–95% for *F*_{target} = 90%, 60% and 30%, respectively. Among these values, the smallest *F*_{target} that yielded reasonable coverage was 60%. Thus we concluded that a reasonable rule for early reporting has *F*_{target} = 60% when the year of analysis equals the year of the maximum *z*-statistic plus one.

Sometimes there is interest in estimating the relative risk (RR) in addition to the risk difference. The causal estimate for RR^{5} requires additional data. Let *c* denote the compliance status soon after randomization, with *c* = 1 denoting receipt of screening and *c* = 0 denoting refusal of screening. Let $w_{gc}(t)$ denote the number of cancer deaths in year *t* among subjects in randomization group *g* with compliance status *c*. The estimated causal RR is

$${\text{RR}}_{\text{causal}}(t)=\frac{{p}_{11}(t)-{p}_{01}(t)}{{p}_{00}(t)-{p}_{10}(t)},\qquad {p}_{gc}(t)=\sum _{i=1}^{t}S(i)\frac{{w}_{gc}(i)}{{r}_{g}(i)}.$$

(9)

Let ${W}_{mj}$ denote the *j*th random generation of these data at monitoring year *m*; estimates and 95% confidence intervals for RR_{causal} are computed by the same bootstrap approach used for the risk difference.
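Equation (9) can be sketched as follows (our illustration, with $S(i)=1$; the argument layout is ours):

```python
import numpy as np

def rr_causal(w00, w01, w10, w11, r0, r1, t):
    """Equation (9): causal relative risk at year t. w_gc gives yearly cancer
    deaths in randomization group g with compliance status c; S(i) = 1 assumed."""
    def p(w, r):  # cumulative mortality rate p_gc(t)
        return np.cumsum(np.asarray(w, dtype=float) / np.asarray(r, dtype=float))[t - 1]
    return (p(w11, r1) - p(w01, r0)) / (p(w00, r0) - p(w10, r1))
```

With full compliance and no contamination (w01 = w10 = 0), the formula reduces to the ratio of cumulative mortality rates in the two groups.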

We applied the early reporting rule (*F*_{target} = 60% and year of analysis equal to the year of the maximum *z*-statistic plus one) to data from two randomized cancer screening trials in which there was no early reporting and one arm was offered screening while the control arm received no planned screening per protocol. Our goal was to compare two estimates of the causal effect of screening and their 95% confidence intervals, one based on early reporting and one based on the last year of follow-up.

For all the calculations, we used *J* = 20 random data generations to compute bootstrap confidence intervals and approximated the probability of overall survival during the course of the trial by *S*(*t*) = 1. For reporting the results we omit the *m* in writing *F*(*m*) and *dif*(*m*), with the understanding that *m* corresponds to the difference between the chronological year and year of first enrolment.

In the Health Insurance Plan (HIP) study of breast cancer screening, about 60,000 women were assigned on alternate days to either an offer of four annual breast cancer screenings (with 2/3 accepting the offer) or to a control group that received no screening.^{7} In the years from 1964 to 1966 the numbers entering the study were 22,036, 27,742 and 10,918, respectively, approximately equally allocated to the two randomization groups. We analysed the data in successive years of follow-up from 1969 to 1976 (Table 1, Figure 2). Early reporting was recommended in 1971 (*F* = 70%), and the estimated reduction in breast cancer mortality due to screening was *dif* = 19 per 10,000 with a 95% confidence interval of (9, 29) per 10,000. In comparison, in our last year of follow-up data, 1976, the estimated reduction in cancer mortality due to screening was similar, *dif* = 22 per 10,000 with a 95% confidence interval of (9, 34) per 10,000. Based on a separate set of randomly generated data (because additional random generations were needed), in 1971 (*F* = 100%) the estimated RR was 0.71 with a 95% confidence interval of (0.57, 0.85), and in the last year, 1976, the estimated RR was 0.72 with a 95% confidence interval of (0.56, 0.89). The reason for the similar results in 1971 and 1976 is that the average year of analysis was similar: 6.3 for 1971 and 7.0 for 1976. (A naïve analysis in 1976 that did not adjust for postscreening noise would have given different and incorrect results.)

Figure 2 Early reporting for the Health Insurance Plan breast cancer screening trial. Vertical dashed line indicates year of last screening; ‘dif’ refers to the average estimated difference in cancer mortality between groups at year of analysis with …

In the Mayo Lung Project, about 9200 male heavy smokers who tested negative on an initial screening were randomized to either radiological and sputum cytology screening examinations every four months for six years (with only 7% not screened) or a control group that received only an initial recommendation for annual chest X-rays.^{8} In the years from 1972 to 1976 the numbers entering the study were 1603, 1586, 2733, 2154 and 1135, respectively, approximately equally allocated to the two randomization groups. We analysed the data for successive years of follow-up from 1979 to 1984 (Table 2, Figure 3). Early reporting was recommended in 1982 (*F* = 85%), with an estimated reduction in lung cancer mortality due to screening of *dif* = −39 per 10,000 with a 95% confidence interval of (−110, 32) per 10,000. (A negative number indicates more lung cancer deaths in the screened arm than in the control arm.) In comparison, in our last year of follow-up data, 1984, the estimated reduction in lung cancer mortality due to screening was *dif* = −35 per 10,000 with a 95% confidence interval of (−136, 67) per 10,000, which also indicates no effect of screening on cancer mortality. The fact that *F* was smaller for 1983 and 1984 than for 1982 probably reflects the variability in the maximum of the *z*-statistic when it is near zero. Based on a separate set of randomly generated data (because additional random generations were needed), in 1982 (*F* = 70%) the estimated RR was 1.03 with a 95% confidence interval of (0.99, 1.07), and in the last year, 1984, the estimated RR was 1.03 with a 95% confidence interval of (0.97, 1.09). The reason for the similar results in 1982 and 1984 is that the average year of analysis was similar: 9.1 for 1982 and 10.0 for 1984.

Figure 3 Early reporting for the Mayo Lung Project trial involving lung cancer screening. Vertical dashed line indicates year of last screening; ‘dif’ refers to the average estimated difference in cancer mortality between groups at year of analysis …

Because the variability associated with postscreening noise can mask the estimated effect of screening, an early reporting rule based on postscreening noise is sensible. Postscreening noise was measured via a *z*-statistic. To relate the *z*-statistic to early reporting, a simulation was conducted that also incorporated multiple looks at the data. The simulation let us select the early reporting rules that yielded appropriate coverages of the confidence intervals for the estimated effect of screening on cancer mortality. Because of the time needed for the computer to do the calculations, it was not possible to perform a large number of simulations or random generations of data within each simulation.

Risk difference is a useful statistic for reporting results of cancer screening studies because it easily allows weighing of costs and benefits. However, RR is often reported as well. Both the risk difference and RR have the desirable property (not shared by the odds ratio) that when applying the estimates to a different population there is no bias from an unobserved covariate that does not interact with treatment in its effect on outcome.^{9} We used simple formulas for the estimated risk difference or RR for subjects who would only receive screening if offered. These formulas can be viewed as approximations to formulas that incorporate estimated mortality rates from competing risks.^{10}

In summary, the proposed early reporting rule is designed to give correct estimates and approximately correct confidence intervals for the effect of screening on cancer mortality once postscreening noise starts to have a large impact on estimation. The method is advantageous from a public policy standpoint because it can shorten the time until reporting of results, which translates into health benefits reaching the population sooner. It has been validated using data from the HIP study of breast cancer screening and the Mayo Lung Project.

We thank Diane Erwin for pre-processing the data.

Recall that *m* is the monitoring year since the start of the study corresponding to the initial enrolment. Let $n_{gi}$ denote the number of subjects in randomized group *g* who entered the study in the *i*th year of enrolment, for *i* = 1, 2, …, *k* years of enrolment. The number at risk at year *t* since randomization, based on the data available at monitoring year *m*, is

$${r}_{g}(t;m)=\begin{cases}{n}_{g1}+{n}_{g2}+{n}_{g3}+\dots +{n}_{gk}, & \text{if } t\le m-k+1,\\ \quad\vdots & \\ {n}_{g1}+{n}_{g2}+{n}_{g3}, & \text{if } t=m-2,\\ {n}_{g1}+{n}_{g2}, & \text{if } t=m-1,\\ {n}_{g1}, & \text{if } t=m.\end{cases}$$

For example in the HIP study, in the years from 1964 to 1966 the numbers entering the study were 22,036, 27,742 and 10,918, approximately equally allocated to each group. Therefore at monitoring year *m* = 5, the number at risk in each group was (22,036 + 27,742 + 10,918)/2 for *t* = 1, 2, 3; (22,036 + 27,742)/2 for *t* = 4; 22,036/2 for *t* = 5.
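The appendix formula can be sketched as follows (our illustration; the cohort enrolled in year *i* contributes *m* − *i* + 1 years of follow-up, and per-group numbers in the HIP example are half of the combined totals):

```python
def at_risk(n, m, t):
    """Number at risk at year t since randomization, given yearly enrolment
    counts n = [n_1, ..., n_k] and monitoring year m (appendix formula).
    The cohort enrolled in year i is still at risk at year t when
    t <= m - i + 1."""
    return sum(n_i for i, n_i in enumerate(n, start=1) if t <= m - i + 1)
```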

Stuart G Baker, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD, USA.

Barnett S Kramer, Office of Disease Prevention, National Institutes of Health, Bethesda, MD, USA.

Philip C Prorok, Division of Cancer Prevention, National Cancer Institute, Bethesda, MD, USA.

1. Baker SG, Kramer BS, Prorok PC. Statistical issues in randomized trials of cancer screening. BMC Med Res Methodol. 2002;2:11.

2. Baker SG, Lindeman KS. The paired availability design: a proposal for evaluating epidural analgesia during labor. Stat Med. 1994;13:2269–78.

3. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;92:444–55.

4. Cuzick J, Edwards R, Segnan N. Adjusting for non-compliance and contamination in randomized clinical trials. Stat Med. 1997;16:1017–29.

5. Baker SG, Kramer BS. Simple maximum likelihood estimates of efficacy in randomized trials and before-and-after studies, with implications for meta-analysis. Stat Methods Med Res. 2005;14:1–19. Correction: Stat Methods Med Res. 2005;14:349.

6. Altman DG, Lausen B, Sauerbrei W, Schumacher M. Dangers of using ‘optimal’ cutpoints in the evaluation of prognostic factors. J Natl Cancer Inst. 1994;86:829–35.

7. Shapiro S, Venet W, Strax P, Venet L. Periodic Screening for Breast Cancer: The Health Insurance Plan Project and Its Sequelae, 1963–1986. Baltimore: Johns Hopkins University Press; 1988.

8. Marcus PM, Bergstralh EJ, Fagerstrom RM, Williams DE, Fontana R, Taylor WF, Prorok PC. Lung cancer mortality in the Mayo Lung Project: impact of extended follow-up. J Natl Cancer Inst. 2000;92:1308–16.

9. Baker SG, Kramer BS. Randomized trials, generalizability, and meta-analysis: graphical insights for binary outcomes. BMC Med Res Methodol. 2003;3:10.

10. Baker SG. Analysis of survival data from a randomized trial with all-or-none compliance: estimating the cost-effectiveness of a cancer screening program. J Am Stat Assoc. 1998;93:929–34.
