

HHS Public Access. Author manuscript; accepted for publication in a peer-reviewed journal.
Epidemiology. Author manuscript; available in PMC 2013 August 1.
Published in final edited form as:
PMCID: PMC3731075

Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease



The Women’s Health Initiative randomized trial found greater coronary heart disease (CHD) risk in women assigned to estrogen/progestin therapy than in those assigned to placebo. Observational studies had previously suggested reduced CHD risk in hormone users.


Using data from the observational Nurses’ Health Study, we emulated the design and intention-to-treat (ITT) analysis of the randomized trial. The observational study was conceptualized as a sequence of “trials” in which eligible women were classified as initiators or noninitiators of estrogen/progestin therapy.


The ITT hazard ratios (95% confidence intervals) of CHD for initiators versus noninitiators were 1.42 (0.92 – 2.20) for the first 2 years, and 0.96 (0.78 – 1.18) for the entire follow-up. The ITT hazard ratios were 0.84 (0.61 – 1.14) in women within 10 years of menopause, and 1.12 (0.84 – 1.48) in the others (P value for interaction = 0.08). These ITT estimates are similar to those from the Women’s Health Initiative. Because the ITT approach causes severe treatment misclassification, we also estimated adherence-adjusted effects by inverse probability weighting. The hazard ratios were 1.61 (0.97 – 2.66) for the first 2 years, and 0.98 (0.66 – 1.49) for the entire follow-up. The hazard ratios were 0.54 (0.19 – 1.51) in women within 10 years after menopause, and 1.20 (0.78 – 1.84) in others (P value for interaction = 0.01). Finally, we also present comparisons between these estimates and previously reported NHS estimates.


Our findings suggest that the discrepancies between the Women’s Health Initiative and Nurses’ Health Study ITT estimates could be largely explained by differences in the distribution of time since menopause and length of follow-up.

Causal inferences are drawn from both randomized experiments and observational studies. When estimates from both types of studies are available, it is reassuring to find that they are often similar.1–3 On the other hand, when randomized and observational estimates disagree, it is tempting to attribute the differences to the lack of random treatment assignment in observational studies.

This lack of randomization makes observational effect estimates vulnerable to confounding bias, because individuals in different treatment groups may differ in prognosis. The potential for confounding may overshadow other desirable features of observational studies compared with randomized experiments: greater timeliness, less restrictive eligibility criteria, longer follow-up, and lower cost. However, even though randomization is the defining difference between randomized experiments and observational studies, further differences in both design and analysis are commonplace. As a consequence, observational-randomized discrepancies cannot be automatically attributed to randomization itself.

In this paper we assess the extent to which differences other than randomization contribute to discrepant observational versus randomized effect estimates in the well-known example of postmenopausal estrogen plus progestin therapy and the risk of coronary heart disease (CHD). Specifically, we explore discrepancies attributable to different distributions of time since menopause, length of follow-up, and analytic approach.

The published findings on this topic can be briefly summarized as follows. Large observational studies suggested a reduced risk of CHD among postmenopausal hormone users. Two of the largest observational studies were based on the Nurses’ Health Study (NHS)4, 5 in the United States and on the General Practice Research Database6 in the United Kingdom. More recently, the Women’s Health Initiative (WHI) randomized trial7 found a greater incidence of coronary heart disease among postmenopausal women in the estrogen plus progestin arm than in the placebo arm (68% greater in the first two years after initiation, 24% greater after an average of 5.6 years).8, 9

The present paper does not address the complex clinical and public health issues related to hormone therapy, including risk-benefit considerations. Rather, we focus on methodologic issues in the analysis of observational cohort studies. Specifically, we reanalyze the NHS observational data to yield effect estimates of hormone therapy that are directly comparable with those of the randomized WHI trial except for the fact that hormone therapy was not randomly assigned in the NHS. We do this by mimicking the design of the randomized trial as closely as possible in the NHS. As explained below, our approach requires conceptualizing the observational NHS cohort as if it were a sequence of nonrandomized “trials.” Because the randomized trial data were analyzed under the intention-to-treat (ITT) principle, we analyze our NHS “trials” using an observational analog of ITT (see below).

A recent reanalysis of the General Practice Research Database using this strategy could not adjust for lifestyle factors and yielded wide confidence intervals.10 Further, the estrogen used by women in that study was not the conjugated equine estrogen used by the women in the NHS and WHI studies. Our analysis of the NHS data incorporates lifestyle factors and includes women using the same type of estrogen as in the WHI randomized trial.


The observational cohort as a nonrandomized “trial”

The NHS cohort was established in 1976 and comprised 121,700 female registered nurses from 11 U.S. states, aged 30 to 55 years. Participants have received biennial questionnaires to update information on use, duration (1–4, 5–9, 10–14, 15–19, 20–24 months), and type of hormone therapy during the two-year interval. Common use of oral estrogen plus progestin therapy among NHS participants began in the period between the 1982 and the 1984 questionnaires. The questionnaires also record information on potential risk factors for and occurrence of major medical events, including CHD (nonfatal myocardial infarction or fatal coronary disease). The process for confirming CHD endpoints has been described in detail elsewhere.4

We mimicked the WHI trial by restricting the study population to postmenopausal women who in the 1982 questionnaire had reported no use of any hormone therapy during the prior 2-year period (“washout” period), and in the 1984 questionnaire reported either use of oral estrogen plus progestin therapy (“initiators”) or no use of any hormone therapy (“noninitiators”) during the prior 2-year period. Thus, as in the WHI, the “initiator” group includes both first-time users of hormone therapy and re-initiators (who stopped hormone therapy in 1980 or earlier and then re-initiated use in the period 1982–1984).

Women were followed from the start of follow-up to diagnosis of CHD, death, loss to follow-up, or June 2000, whichever occurred first. Unlike in the randomized WHI and the observational General Practice Research Database, the time of therapy initiation (and thus the start of follow-up for initiators) was not known precisely in the NHS, so we had to estimate it. For women who reported hormone therapy initiation during the 2-year period before the 1984 questionnaire and were still using it when they completed this questionnaire, the start of follow-up was estimated as the month of return of the baseline questionnaire minus the duration of hormone therapy use. (Duration is reported as an interval, e.g., 20–24 months; we used the upper limit of the interval, e.g., 24 months.) For women who reported starting hormone therapy during the same two-year period but had stopped using it by the time they returned the 1984 questionnaire, the start of follow-up was estimated as the first month of the two-year period (the earliest possible month of initiation). The start of follow-up for noninitiators was estimated as the average month of start of follow-up among initiators (stratified by age and past use of hormone therapy). Alternative methods to estimate the start of follow-up had little effect on our estimates (see appendix A1).
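The rules for assigning the start of follow-up can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the `Woman` record, its field names, and the use of integer month indexes are hypothetical, and falling back to the overall initiator mean when a stratum has no initiators is an added assumption that the text does not address.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional

# Upper limits (in months) of the reported duration intervals.
DURATION_UPPER = {"1-4": 4, "5-9": 9, "10-14": 14, "15-19": 19, "20-24": 24}


@dataclass
class Woman:
    """Hypothetical baseline record; months are integer indexes."""
    initiator: bool
    still_using: bool          # still on therapy at questionnaire return
    return_month: int          # month the baseline questionnaire was returned
    period_start_month: int    # first month of the preceding two-year period
    duration: Optional[str]    # reported duration interval, e.g. "20-24"
    age_group: int             # 5-year age stratum
    past_use: bool             # history of hormone therapy use


def initiator_start(w: Woman) -> int:
    if w.still_using:
        # Return month minus the upper limit of the reported duration.
        return w.return_month - DURATION_UPPER[w.duration]
    # Stopped before returning the questionnaire: earliest possible month.
    return w.period_start_month


def assign_start_months(women: list[Woman]) -> dict[int, int]:
    starts = {}
    by_stratum = defaultdict(list)
    for i, w in enumerate(women):
        if w.initiator:
            starts[i] = initiator_start(w)
            by_stratum[(w.age_group, w.past_use)].append(starts[i])
    all_initiator_starts = [s for v in by_stratum.values() for s in v]
    for i, w in enumerate(women):
        if not w.initiator:
            # Mean initiator start month in the same (age, past use) stratum;
            # assumed fallback: the overall initiator mean.
            pool = by_stratum.get((w.age_group, w.past_use)) or all_initiator_starts
            starts[i] = round(mean(pool))
    return starts


cohort = [
    Woman(True, True, 100, 76, "5-9", 55, False),   # start = 100 - 9 = 91
    Woman(True, False, 101, 77, None, 55, False),   # stopped: start = 77
    Woman(False, False, 100, 76, None, 55, False),  # stratum mean of 91 and 77
    Woman(False, False, 100, 76, None, 60, True),   # no initiators in stratum
]
print(assign_start_months(cohort))  # {0: 91, 1: 77, 2: 84, 3: 84}
```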

To further mimic the WHI, we restricted the study population to women who, before the start of follow-up, had a uterus, no past diagnosis of cancer (except nonmelanoma skin cancer) or acute myocardial infarction, and no diagnosis of stroke since the return of the previous questionnaire. To enable adjustment for dietary factors, we restricted the population to women who had reported plausible energy intakes (2,510–14,640 kJ/d) and had left fewer than 10 of 61 food items blank on the most recent food frequency questionnaire before the 1984 questionnaire.
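Together with the postmenopausal requirement described earlier, the baseline restrictions in this paragraph amount to a simple predicate. The sketch below assumes a hypothetical `BaselineRecord` with one field per criterion and treats both energy-intake limits as inclusive, which the text does not specify.

```python
from dataclasses import dataclass, replace


@dataclass
class BaselineRecord:
    """Hypothetical per-"trial" baseline record for one woman."""
    postmenopausal: bool
    has_uterus: bool
    prior_cancer: bool          # excluding nonmelanoma skin cancer
    prior_acute_mi: bool
    stroke_since_last_q: bool   # stroke since the previous questionnaire
    energy_kj_per_day: float    # from the most recent food frequency questionnaire
    blank_food_items: int       # number of the 61 food items left blank


def meets_baseline_criteria(r: BaselineRecord) -> bool:
    return (
        r.postmenopausal
        and r.has_uterus
        and not r.prior_cancer
        and not r.prior_acute_mi
        and not r.stroke_since_last_q
        and 2510 <= r.energy_kj_per_day <= 14640  # plausible energy intake
        and r.blank_food_items < 10               # fewer than 10 of 61 blank
    )


ok = BaselineRecord(True, True, False, False, False, 7000.0, 3)
too_low = replace(ok, energy_kj_per_day=2000.0)
too_many_blanks = replace(ok, blank_food_items=10)
print(meets_baseline_criteria(ok), meets_baseline_criteria(too_low),
      meets_baseline_criteria(too_many_blanks))  # True False False
```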

The NHS cohort study can now be viewed as a nonrandomized, nonblinded “trial” that mimics the eligibility criteria, definition of start of follow-up, and treatment arms (initiators vs. noninitiators) of the WHI randomized trial, but with a different distribution of baseline risk factors (e.g., lower age and shorter time since menopause in the NHS compared with the WHI). We analyzed the NHS nonrandomized “trial” by comparing the CHD risk of initiators and noninitiators regardless of whether these women subsequently stopped or initiated therapy. Thus our analytic approach is the observational equivalent of the ITT principle that guided the main analysis of the WHI trial. Specifically, we estimated the average hazard (rate) ratio (HR) of CHD in initiators versus noninitiators, and its 95% confidence interval (CI), by fitting a Cox proportional hazards model with time since the start of follow-up as the time scale and a time-fixed (baseline) indicator for hormone therapy initiation. The Cox model was stratified on age (in 5-year intervals) and history of use of hormone therapy (yes, no).
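To make the estimator concrete, here is a minimal pure-Python sketch of a Cox model stratified on a stratum label (standing in for age group and past use) with a single time-fixed initiation indicator, fitted by Newton-Raphson on the Breslow partial likelihood. It omits the confounders listed in the next paragraph and the robust variance, and the record layout `(time, event, initiator, stratum)` is hypothetical; a real analysis would use a survival package.

```python
import math
from collections import defaultdict


def fit_stratified_cox(records, max_iter=25, tol=1e-10):
    """Newton-Raphson fit of a stratified Cox model with one covariate.

    records: iterable of (time, event, x, stratum), where x is the baseline
    initiation indicator (1 = initiator, 0 = noninitiator). Ties use the
    Breslow approximation. Returns (beta, model-based standard error).
    """
    strata = defaultdict(list)
    for t, d, x, s in records:
        strata[s].append((t, d, x))
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for rows in strata.values():          # risk sets never cross strata
            for t_i, d_i, x_i in rows:
                if not d_i:
                    continue
                risk = [x for t, _, x in rows if t >= t_i]
                w = [math.exp(beta * x) for x in risk]
                s0 = sum(w)
                s1 = sum(x * wx for x, wx in zip(risk, w))
                s2 = sum(x * x * wx for x, wx in zip(risk, w))
                score += x_i - s1 / s0                  # first derivative
                info += s2 / s0 - (s1 / s0) ** 2        # minus second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)


records = [
    # (months, CHD event, initiator, stratum)
    (1, 1, 1, "A"), (2, 1, 1, "A"), (3, 1, 0, "A"),
    (4, 1, 1, "A"), (5, 1, 0, "A"), (6, 0, 0, "A"),
    (1, 1, 0, "B"), (2, 1, 1, "B"), (3, 0, 1, "B"), (3, 0, 0, "B"),
]
beta, se = fit_stratified_cox(records)
lo, hi = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
print(f"HR = {math.exp(beta):.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because the partial likelihood only compares women within the same stratum, the baseline hazard is allowed to differ freely across strata, which is what stratifying on age and past use buys over adjusting for them.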

To obtain valid effect estimates in a nonrandomized trial, all baseline confounders have to be appropriately measured and adjusted for in the analysis. We proceeded as if this condition were at least approximately true in the NHS nonrandomized “trial” once we added the following covariates to the Cox model: parental history of myocardial infarction before age 60 (yes, no), education (graduate degree: yes, no), husband’s education (less than high school, high school graduate, college, graduate school), ethnicity (non-Hispanic white, other), age at menopause (<50, 50–53, >53), calendar time, high cholesterol (yes, no), high blood pressure (yes, no), diabetes (yes, no), angina (yes, no), stroke (yes, no), coronary revascularization (yes, no), osteoporosis (yes, no), body mass index (<23, 23–<25, 25–<30, ≥30), cigarette smoking (never, past, current 1–14 cigarettes/d, current 15–24 cigarettes/d, current 25+ cigarettes/d), aspirin use (nonuse, 1–4 yrs, 5–10 yrs, >10 yrs), alcohol intake (0, >0–<5, 5–<10, 10–<15, ≥ 15 g/d), physical activity (6 categories), diet score (quintiles),11 multivitamin use (yes, no), and fruit and vegetable intake (<3, 3–<5, 5–<10, ≥ 10 servings/day). When available, we simultaneously adjusted for the reported value of each variable on both the 1982 and 1980 questionnaires.

The observational cohort as a sequence of nonrandomized “nested trials”

The approach described above would produce very imprecise ITT estimates because few women initiated therapy during the 1982–1984 period. However, our choice of this period was arbitrary. The same approach can produce an additional NHS nonrandomized “trial” when applied to each of the eight two-year periods between 1982–1984 and 1996–1998. Thus, to increase the efficiency of our ITT estimate, we conducted 7 additional nonrandomized “trials” starting at each subsequent questionnaire (1986, 1988, …, 1998), and pooled all 8 “trials” into a single analysis. Because some women participated in more than one of these NHS “trials” (up to a maximum of 8), we used the robust variance estimator to account for within-person correlation. We assessed the potential heterogeneity of the ITT effect estimates across “trials” by two Wald tests: first, we estimated a separate parameter for therapy initiation in each “trial” and tested for heterogeneity of the parameters (chi-square; 6 df); second, we added a product term between “trial” and therapy initiation and tested whether its coefficient differed from zero (chi-square; 1 df).
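The expansion of the cohort into a sequence of “trials” can be sketched as follows. The input format (a dict from questionnaire year to reported use in the prior two years, coded `"ep"` for oral estrogen plus progestin, `"none"`, or `"other"`) and the function name are hypothetical, and only the washout, initiation, and prior-CHD criteria are shown; the remaining eligibility criteria would be checked per “trial” in the same loop. Because a woman can contribute rows to several “trials,” a model fitted to these rows would cluster on the woman identifier for the robust variance.

```python
from collections import Counter

TRIAL_YEARS = range(1984, 2000, 2)  # 1984, 1986, ..., 1998: eight "trials"


def expand_to_trials(women):
    """Return one (woman_id, trial_year, initiator) row per eligible person-trial.

    women: dict woman_id -> {"use": {year: "ep" | "none" | "other"},
                             "chd_year": int or None (optional)}
    """
    rows = []
    for wid, w in women.items():
        use, chd_year = w["use"], w.get("chd_year")
        for year in TRIAL_YEARS:
            if chd_year is not None and chd_year < year:
                continue                           # CHD before this baseline
            washout = use.get(year - 2) == "none"  # no therapy in prior period
            current = use.get(year)
            if washout and current in ("ep", "none"):
                rows.append((wid, year, current == "ep"))
    return rows


women = {
    1: {"use": {y: ("none" if y < 1986 else "ep") for y in range(1982, 2000, 2)}},
    2: {"use": {y: "none" for y in range(1982, 2000, 2)}},
    3: {"use": {1982: "other", 1984: "none", 1986: "none"}, "chd_year": 1987},
}
rows = expand_to_trials(women)
print(rows[:2])  # woman 1: noninitiator in 1984, initiator in 1986
print(Counter(wid for wid, _, _ in rows))  # number of "trials" per woman
```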

In each “trial,” we used the corresponding questionnaire information to apply the eligibility criteria at the start of follow-up, and to define initiators and noninitiators. We then estimated the CHD average hazard ratio in initiators versus noninitiators (adjusted for the values of covariates reported in the two previous questionnaires), regardless of whether these women subsequently stopped or initiated therapy. To allow for the possibility that the hazard ratio varied with time since baseline, we added product terms between time of follow-up (linear and quadratic terms) and initiation status to a pooled logistic model that approximated our previous Cox model. We then used the fitted model to estimate CHD-free survival curves for initiators and noninitiators.
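The step from a pooled logistic model to CHD-free survival curves can be sketched as follows: the fitted monthly hazard is the inverse logit of the linear predictor, and survival through month K is the product of (1 − hazard) over months 0 to K − 1. The coefficient values are hypothetical (chosen only so that the hazard ratio declines over follow-up, not estimated from the NHS), and the baseline covariates that the actual model contained are omitted.

```python
import math


def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def survival_curve(coef: dict, initiator: bool, months: int) -> list[float]:
    """CHD-free survival at the end of months 1..months from a pooled logistic
    model with linear and quadratic time terms and their products with
    initiation status (k = months since baseline)."""
    a = 1.0 if initiator else 0.0
    surv, curve = 1.0, []
    for k in range(months):
        z = (coef["intercept"] + coef["a"] * a
             + coef["t"] * k + coef["t2"] * k ** 2
             + coef["a_t"] * a * k + coef["a_t2"] * a * k ** 2)
        surv *= 1.0 - expit(z)        # discrete-time hazard in month k
        curve.append(surv)
    return curve


# Hypothetical coefficients: initiation raises the early hazard,
# and the excess declines linearly with time since baseline.
coef = {"intercept": -7.5, "a": 0.4, "t": 0.002, "t2": 0.0,
        "a_t": -0.01, "a_t2": 0.0}
init = survival_curve(coef, True, 120)
non = survival_curve(coef, False, 120)
for month in (24, 60, 96, 120):
    print(month, round(init[month - 1], 4), round(non[month - 1], 4))
```

With these made-up values the initiator curve lies below the noninitiator curve early on and above it by month 120, which is the kind of crossing that a single average hazard ratio cannot describe.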

At each trial’s start of follow-up, women with a CHD diagnosis within the last two years are not eligible for that “trial.” Thus, the subset of women considered for eligibility in each “trial” is approximately nested in the subset of women who were considered for eligibility in the prior “trial.” Our conceptualization of an observational study with a time-varying treatment as a sequence of nested trials, each with non-time-varying treatment, is a special case of g-estimation of nested structural models.12

Several lines of evidence suggest modification of the effect of hormone therapy by time of initiation.13 We therefore conducted analyses stratified by time since menopause (<10, ≥ 10 years) and age (<60, ≥ 60 yrs).

Adherence-adjusted effect estimates

Because the primary analysis of the WHI randomized trial was conducted under the ITT principle, we analyzed our NHS “trials” using an observational analog of ITT in order to compare the NHS with the WHI estimates. However, ITT estimates are problematic because the magnitude of the ITT effect varies with the proportion of subjects who adhere to the assigned treatment, and thus ITT comparisons can underestimate the effect that would have been observed if everyone had adhered to the assigned treatment. Thus ITT effect estimates may be unsatisfactory when studying the efficacy, and inappropriate when studying the safety, of an active treatment compared with no treatment. An alternative to the ITT effect is the effect that would have been observed if everyone had remained on her initial treatment throughout the follow-up, which we refer to as an adherence-adjusted effect. Adherence-adjusted effect estimates can be obtained in both randomized experiments and observational studies by using g-estimation14, 15 or inverse probability weighting.

We used inverse probability weighting to estimate the adherence-adjusted hazard ratio of CHD. In each NHS “trial” we censored women when they discontinued their baseline treatment (either hormone therapy or no hormone therapy), and then weighted the uncensored woman-months by the inverse of their estimated probability of remaining uncensored until that month.16 To estimate trial-specific probabilities for each woman, we fit a pooled logistic model for the probability of remaining on the baseline treatment through a given month. The model included the baseline covariates used in the trial-specific Cox models described above, and the most recent post-baseline values of the same covariates. Inclusion of time-dependent covariates is necessary to adjust for any dependence between noncompliance and CHD within levels of baseline covariates. We fit separate models for initiators and noninitiators. In each “trial,” each woman contributed as many observations to the model as the number of months she was on her baseline therapy.
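The artificial censoring and the unstabilized weights described above can be sketched as below. In the analysis, the monthly probabilities of remaining on the baseline treatment come from the fitted pooled logistic model; here they are supplied directly as hypothetical numbers, and the function names are illustrative. Using the probabilities through month k for the weight of month k is one common convention, assumed here.

```python
def months_adherent(baseline_arm, monthly_treatment):
    """Number of leading months in which treatment equals the baseline arm;
    the woman is artificially censored at her first deviation."""
    n = 0
    for on_ht in monthly_treatment:
        if on_ht != baseline_arm:
            break
        n += 1
    return n


def ip_weights(p_remain):
    """Unstabilized weights 1 / P(uncensored through month k), from the
    predicted monthly probabilities of staying on the baseline treatment."""
    weights, cum = [], 1.0
    for p in p_remain:
        cum *= p
        weights.append(1.0 / cum)
    return weights


treatment = [1, 1, 1, 0, 1]        # 1 = on estrogen/progestin in that month
n = months_adherent(1, treatment)  # an initiator who stops in month 4
p_remain = [0.9, 0.8, 0.5][:n]     # hypothetical model predictions
print(n, [round(w, 3) for w in ip_weights(p_remain)])  # 3 [1.111, 1.389, 2.778]
```

Women who stay on treatment despite a low predicted probability of doing so receive large weights, so that they stand in for similar women who were censored.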

To stabilize the inverse probability weights, we multiplied the weights by the probability of remaining uncensored given the “trial”-specific baseline values of the covariates. Weight stabilization improves precision by reducing the variability of the weights. If the true adherence-adjusted hazard ratio is constant over time, this method produces valid estimates provided that discontinuing the baseline treatment is unrelated to unmeasured risk factors for CHD incidence within levels of the covariates, and that the logistic model used to estimate the inverse probability weights is correctly specified. When the adherence-adjusted hazard ratio changes with time since baseline, this method estimates a weighted average adherence-adjusted hazard ratio with time-specific weights proportional to the number of uncensored CHD events occurring at each time. Thus, with heavy censoring due to lack of adherence, the early years of follow-up receive relatively more weight than they would without censoring. To better accommodate a time-varying hazard ratio, we also fit an inverse probability weighted Cox model (approximated through a weighted pooled logistic model) that included product terms between time of follow-up (linear and quadratic terms) and initiation status. We then used the weighted model to estimate adherence-adjusted CHD-free survival curves for initiators and noninitiators.
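Stabilization replaces the numerator 1 of each weight by the probability of remaining uncensored given only the baseline covariates. A minimal sketch, with hypothetical monthly probabilities from a numerator model (baseline covariates) and a denominator model (baseline plus most recent post-baseline covariates):

```python
def stabilized_weights(p_num, p_den):
    """Cumulative stabilized weights for successive uncensored months.

    p_num[k]: P(remain on baseline treatment in month k | baseline covariates)
    p_den[k]: same probability, additionally given the most recent
              time-varying covariates
    """
    weights, num, den = [], 1.0, 1.0
    for pn, pd in zip(p_num, p_den):
        num *= pn
        den *= pd
        weights.append(num / den)
    return weights


p_num = [0.95, 0.90, 0.90]   # hypothetical numerator-model predictions
p_den = [0.95, 0.60, 0.99]   # hypothetical denominator-model predictions
print([round(w, 3) for w in stabilized_weights(p_num, p_den)])  # [1.0, 1.5, 1.364]
```

When the time-varying covariates carry no extra information about adherence, the two models agree and every weight equals 1, which is why well-behaved stabilized weights average close to 1.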

We also present additional subsidiary analyses to explain the relation between our estimates and previously reported NHS estimates, which can be regarded as estimates of the adherence-adjusted hazard ratio using an alternative to our inverse probability weighting approach.


The NHS nonrandomized “trials”

Of the 101,819 NHS participants alive and without a history of cancer, heart disease, or stroke in 1984, 81,073 had diet information and, of these, 77,794 were postmenopausal at some time during the follow-up. We excluded 14,764 women who received a form of hormone therapy other than oral estrogen plus progestin in all of the NHS “trials,” or did not provide information on the type of hormone therapy in any of the “trials.” Of the remaining 63,030 women, we excluded 17,146 who received hormone therapy in the two years before the baseline of all the “trials.” Of the remaining 45,884 women, we excluded 11,309 who did not have an intact uterus in 1984. Thus 34,575 women met our eligibility criteria for at least one NHS “trial.” Of these women, 1,035 had a CHD event, 2,596 died from other causes or were lost to follow-up, and 30,944 reached June 2000 free of CHD. Figure 1 shows the distribution of women by number of “trials” in which they participated. Table 1 shows the number of participants, initiators, and CHD events per “trial.” Table 2 shows the distribution of baseline characteristics in initiators and noninitiators.

Figure 1
Distribution of eligible women by number of Nurses’ Health Study “trials” of hormone therapy initiation in which they participated
Table 1
Number of participants, therapy initiators, and CHD events in each NHS “trial” to estimate the intention-to-treat effect of initiation of estrogen/progestin therapy
Table 2
Baseline characteristics of initiators and noninitiators of estrogen/progestin therapy in the NHS “trials”

ITT estimates of the effect of hormone therapy on CHD

The estimated average hazard ratio of CHD for initiators versus noninitiators was 0.96 (95% CI = 0.78–1.18) when the entire follow-up time was included in the analysis (Table 3). The HR was 1.83 (1.05–3.17) when the analysis was restricted to the first year of follow-up, 1.42 (0.92–2.20) for the first two years, 1.11 (0.84–1.47) for the first five years, and 1.00 (0.78–1.28) for the first eight years. Estimated separately by interval, the HR was 0.96 (0.66–1.39) during years 2–5, 0.81 (0.51–1.28) during years 5–8, and 0.87 (0.58–1.30) after year 8. We did not find a strong indication of heterogeneity across “trials” (Wald test P values of 0.24 and 0.15 for the overall HR). Figure 2A shows that the estimated proportion of women free of CHD during the first five years of follow-up was lower in initiators of estrogen plus progestin therapy than in noninitiators of hormone therapy. By year eight, however, this proportion was greater in initiators.

Figure 2
Proportion of women free of CHD by baseline treatment group in the Nurses’ Health Study “trials”
Table 3
Estimates of the intention-to-treat effect of initiation of estrogen/progestin therapy on the incidence of CHD events in the NHS “trials”

We next examined effect modification, stratifying our ITT estimates by age, time since menopause, and period of follow-up (Table 3). The HR was 0.84 (CI = 0.61–1.14) in women within 10 years of menopause at baseline, and 1.12 (0.84–1.48) in the others (86% of initiators in this latter group initiated therapy 10 to 20 years after menopause). Similarly, the HRs were 0.86 (0.65–1.14) in women under age 60 at baseline, and 1.15 (0.85–1.57) in the others. Figures 2B and 2C show the estimated proportion of women free of CHD by initiator status and time since menopause. The p-value from a log-rank test for the equality of the survival curves was 0.70 for the entire population, 0.27 for women within 10 years of menopause, and 0.43 for the others.

When we repeated the analyses with no past use of hormone therapy as an additional eligibility criterion (26,797 eligible women, 767 CHD events), the HR was 0.79 (CI = 0.60–1.03) for the entire follow-up and 1.49 (0.88–2.54) in the first two years (Table 4). The HR was 0.66 (0.44–0.98) in women within 10 years of menopause at baseline, and 1.02 (0.70–1.50) in the others. The appendix includes additional analyses showing that the results were generally insensitive to the assignment of the month of therapy initiation (A1), the inclusion of women under age 50 (A2), the exclusion of women who died between the start of follow-up and the return of the baseline questionnaire (A3), the adjustment for confounding by covariates in the proportional hazards model rather than by propensity score methods (A4), and the allowance for possible unmeasured confounding of therapy discontinuation (A5).

Table 4
Estimates of the intention-to-treat effect of initiation of estrogen/progestin therapy on the incidence of CHD events among women with no history of hormone use in the NHS “trials”

Adherence-adjusted effect estimates

Figure 3 shows adherence to the baseline treatment in initiators and noninitiators. The estimated inverse probability weights had mean 1.02 (range = 0.02–30.7) in initiators, and 1.00 (0.17–19.3) in noninitiators. The inverse probability weighted HRs were 0.98 (CI = 0.66–1.49) for the entire follow-up, 1.53 (0.80–2.95) for the first year, 1.61 (0.97–2.66) for the first two years, 1.14 (0.74–1.76) for the first five years, and 0.99 (0.66–1.50) for the first eight years. The HR was 0.65 (0.30–1.38) during years 2–5, 0.47 (0.14–1.58) during years 5–8, and 0.85 (0.22–3.19) after year 8. The large and increasing standard errors reflect the fact that few women continued on hormone therapy for long periods. We also examined effect modification by age and time since menopause (Table 5). Figure 4 shows the estimated adherence-adjusted proportions of women free of CHD. The p-value from a log-rank test for the equality of the survival curves was 0.91 for the entire population, 0.24 for women within 10 years after menopause, and 0.40 for the others.

Figure 3
Proportion of women who adhered to their baseline treatment in the Nurses’ Health Study “trials”
Figure 4
Proportion of women free of CHD under full adherence with the baseline treatment in the Nurses’ Health Study “trials”
Table 5
Estimates of the (adherence-adjusted) effect of continuous estrogen/progestin therapy versus no hormone therapy on the incidence of CHD events in the NHS “trials”

Comparison of ITT estimates with previous NHS estimates

The HR estimate of 0.96 from our ITT analysis is not directly comparable with the HR estimate of 0.68 (0.55 – 0.83) for current users versus never users of estrogen plus progestin reported in the most recent NHS publication.17 The 0.68 estimate can be interpreted as an adherence-adjusted effect estimate, in which incomplete adherence has been adjusted not by inverse probability weighting but by a comparison of current versus never users. This approach is used in many large observational cohorts, including the NHS (see Discussion for details). Table 6 shows the cumulative steps that link our estimates in Table 3 with the previously reported NHS estimate. These steps involve changes in the start of follow-up, the definition of the exposed and unexposed group, the covariates used for adjustment, and eligibility criteria.

Table 6
Comparison of several alternative hazard ratio estimates with the previously reported estimate from the NHS (column vi, row 0–24 months). See main text for a description of each estimate.

First, column (i) of Table 6 shows the estimates when (as in previous NHS analyses) the start of follow-up of each “trial” was defined as the date of return of the questionnaire. When the start of follow-up is defined in this way, the selected group of initiators differs from the initiator group in Table 3 because it does not include women who, during the 2-year interval before the baseline questionnaire, either initiated and stopped hormone therapy or survived a CHD event occurring after initiation. As in Table 3, we provide separate HR estimates for the entire follow-up (0.84), the first two years of follow-up (0.98), and the period after the first two years (0.80).

Second, we varied the definition of the user and nonuser groups in three steps as shown in the next three columns of Table 6. In column (ii) we eliminated our “trial”-specific criterion of no therapy in the two years before baseline for initiators; that is, we compared current users with noninitiators. In column (iii) we eliminated our “trial”-specific criterion of no therapy in the two years before baseline for all women; that is, we compared current users with current nonusers. In column (iv) we used as the comparison group the subset of nonusers with no history of hormone therapy use; that is, we compared current users with never users as in previous NHS analyses. The HR estimates for columns (ii), (iii), (iv) were, respectively, 0.84, 0.86, 0.85 for the entire follow-up, 0.77, 0.77, 0.74 for 0–24 months, and 0.87, 0.90, 0.90 for >24 months.

To explain why the number of exposed cases (n = 319) in columns (ii)–(iv) far exceeds the number (n = 66) in column (i), consider a woman who is continuously on hormone therapy from 1982–1984 until she dies of CHD just before the end of follow-up in 2000. In the analysis of column (i), this woman participates as an exposed CHD case in the first (1984) “trial” only. In contrast, in the analyses of columns (ii)–(iv), the same woman participates as an exposed CHD case in each of the 8 “trials” 1984, …, 1998. Furthermore, in the analysis of column (i) the woman would contribute 0 to the 0–24 months exposed case stratum and 1 to the >24 months exposed case stratum. In contrast, the same woman in the analyses of columns (ii)–(iv) would contribute 1 to the 0–24 months exposed case stratum (corresponding to the 1998 “trial”) and 7 to the >24 months exposed case stratum (corresponding to each of the other 7 “trials”).
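The bookkeeping in this example can be checked with a short calculation. It assumes, hypothetically, that each questionnaire was returned in mid-year (start of follow-up at 1984.5, 1986.5, and so on up to 1998.5, in decimal years) and that the CHD death occurred at 2000.4, just before the end of follow-up.

```python
def case_contributions(trial_starts, event_time):
    """Count how often one exposed CHD case falls in the 0-24 and >24 month
    strata of time since start of follow-up, over the trials she joins."""
    counts = {"0-24": 0, ">24": 0}
    for start in trial_starts:
        months = (event_time - start) * 12
        counts["0-24" if months <= 24 else ">24"] += 1
    return counts


event = 2000.4
column_i = [1984.5]                                 # initiator in the 1984 "trial" only
columns_ii_iv = [1984.5 + 2 * k for k in range(8)]  # current user in all 8 "trials"
print(case_contributions(column_i, event))       # {'0-24': 0, '>24': 1}
print(case_contributions(columns_ii_iv, event))  # {'0-24': 1, '>24': 7}
```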

Third, we repeated the analysis in column (iv) after adjusting for the set of covariate values used in the most recent NHS publication. Thus, the estimates in column (v) (0.81 for the entire follow-up, 0.71 for 0–24 months, and 0.85 for >24 months) were adjusted for the most recent values available at the time of return of the baseline questionnaire, rather than for the values reported on the two previous questionnaires.

Fourth, we repeated the analysis in column (v) after dropping the requirement of an intact uterus, which was not used in previous NHS analyses. The estimates in column (vi) were 0.82 for the entire follow-up, 0.67 for 0–24 months, and 0.87 for >24 months. The estimate 0.67 in the row 0–24 months corresponds almost exactly to the analytic approach used in the most recent NHS publication,17 which estimated the HR over the two-year period after the reclassification (i.e., updating) of treatment status at the return of each questionnaire.


We used the NHS observational data to emulate the design and analysis of the WHI randomized trial. The ITT (intention-to-treat) hazard ratios of CHD for therapy initiation were 1.42 (CI = 0.92–2.20) in the NHS versus 1.68 (95% CI = 1.15–2.45) in the WHI9 during the first two years, and 1.00 (0.78–1.28) in the NHS versus approximately 1.24 (0.97–1.60) in the WHI8 during the first 8 years. However, much of the apparent WHI-NHS difference disappeared after stratification by time since menopause at hormone therapy initiation. The ITT hazard ratios were 0.84 (0.61–1.14) in the NHS versus 0.88 (0.54–1.43) in the WHI8, 18 for women within 10 years after menopause, and approximately 1.12 (0.84–1.48) in the NHS versus 1.23 (0.85–1.77) in the WHI8, 18 for women between 10 and 20 years after menopause.

These findings provide additional support for the hypothesis that hormone therapy may increase the long-term CHD risk only in women who were 10 or more years after menopause at initiation,17, 19 and for the rationale of an ongoing randomized clinical trial to determine the effect of estrogen plus progestin on coronary calcification in younger women.20 When the analyses were limited to women with no past history of hormone use, the ITT hazard ratio was 0.79 (0.60–1.03) for the entire follow-up and 0.66 (0.44–0.98) for women who initiated hormone use within ten years of menopause.

We computed average ITT hazard ratios in the NHS for comparison with the main result of the WHI. Our ITT estimates suggest that any remaining differences between NHS and WHI estimates are not explained by unmeasured joint risk factors for CHD and therapy discontinuation. However, the average ITT hazard ratio is not the ideal effect measure because the survival curves crossed during the follow-up in both the WHI trial and the NHS “trials,” and also because ITT estimates like the ones shown here are generally attenuated towards the null due to misclassification of actual treatment. We addressed these problems by estimating survival curves to first CHD event, and by estimating these curves under full adherence (via inverse probability weighting). Therefore the adherence-adjusted survival curves of Figure 4 provide the most appropriate summary of our results. It will be of interest to compare these results with adherence-adjusted curves (via inverse probability weighting) from the WHI when they become available. The curves suggest that continuous hormone therapy causes a net reduction in CHD among women starting therapy within 10 years of menopause, and a net increase among those starting later. However, either of these effects (as well as the qualitative interaction between therapy and time since menopause) could be due to sampling variability.

Previously published NHS estimates17 compared the hazards of current versus never users over the two-year period after the updating of treatment status at the return of each questionnaire, and could thus be viewed as a form of adherence adjustment. In Table 6 we described the steps from our two-year ITT estimate to the previously published adherence-adjusted estimate. Below we discuss the two key steps: the change of start of follow-up (time of therapy initiation vs. time of questionnaire return), and the change of the exposed group (selected initiators vs. current users).

The hazard ratio estimate changed from 1.42 (Table 3) to 0.98 (Table 6, column i) for the first two years, and from 0.96 (Table 3) to 0.84 (Table 6, column i) for the entire follow-up, when the start of follow-up was redefined from the estimated time of therapy initiation to the time of return of the next questionnaire. (The latter definition is commonly used in observational studies that collect treatment information at regular intervals.) This latter definition excludes women who initiated treatment and then suffered a nonfatal MI during the interval between treatment initiation and treatment ascertainment (up to two years in the NHS). If hormone therapy increases the short-term risk of CHD, this exclusion will result in an underestimate of the early increase in risk and may result in selection bias,16 which may explain part of the change from 1.42 to 0.98. The impact of this exclusion bias, however, will be diluted over the entire follow-up, as previously suggested in a sensitivity analysis.17 This may explain the smaller change from 0.96 to 0.84. This exclusion bias may be quantified through simulations,21 reduced by stratifying the analysis on duration of therapy at baseline,21 and eliminated by making the start of follow-up coincide with the time of treatment initiation, as discussed by Robins22, 23 and Ray.24 The approach we present here and elsewhere10, 25 generalizes Ray’s “new-users design” to the case of time-varying treatments.
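The exclusion mechanism in this paragraph can be illustrated with a deterministic calculation. The sketch below assumes hypothetical monthly hazards: a constant hazard in noninitiators, a hazard ratio of 1.5 during the first 24 months of therapy and 0.9 thereafter, and a delay between initiation and questionnaire return spread evenly over 0 to 23 months. It compares person-time rate ratios (a rough stand-in for hazard ratios) when follow-up starts at initiation and when it starts at questionnaire return. In the latter case, initiators who have an event before the return never enter the cohort. The numbers are illustrative only, the constants and function names are hypothetical, and this is not the simulation approach of reference 21.

```python
BASE = 0.0005      # hypothetical monthly CHD hazard in noninitiators
EARLY_HR = 1.5     # hypothetical hazard ratio during the first 24 months of therapy
LATE_HR = 0.9      # hypothetical hazard ratio after 24 months of therapy
MAX_DELAY = 24     # delay (months) from initiation to questionnaire return: 0..23


def hazard(month_since_initiation, initiator):
    if not initiator:
        return BASE
    return BASE * (EARLY_HR if month_since_initiation < 24 else LATE_HR)


def survival_to(month, initiator):
    """P(event-free at the start of `month`, counted from initiation)."""
    s = 1.0
    for t in range(month):
        s *= 1.0 - hazard(t, initiator)
    return s


def events_and_time(start, length, initiator):
    """Expected events and person-months over [start, start + length),
    among women event-free at `start`."""
    surv, events, person_time = 1.0, 0.0, 0.0
    for t in range(start, start + length):
        h = hazard(t, initiator)
        events += surv * h
        person_time += surv
        surv *= 1.0 - h
    return events, person_time


def rate_ratio(length, start_at_return):
    """Initiator vs noninitiator event-rate ratio over `length` months of follow-up."""
    delays = range(MAX_DELAY) if start_at_return else [0]
    rates = {}
    for initiator in (True, False):
        events = person_time = 0.0
        for d in delays:
            # Women with an event before questionnaire return never enter the cohort.
            w = survival_to(d, initiator)
            e, p = events_and_time(d, length, initiator)
            events += w * e
            person_time += w * p
        rates[initiator] = events / person_time
    return rates[True] / rates[False]


if __name__ == "__main__":
    for label, at_return in (("start at initiation", False),
                             ("start at questionnaire return", True)):
        print(label, round(rate_ratio(24, at_return), 2),
              round(rate_ratio(120, at_return), 2))
```

Under these assumptions, starting at questionnaire return pulls the two-year ratio toward the later hazard ratio much more than it changes the ratio over a 120-month window. This is the same qualitative pattern as the larger change from 1.42 to 0.98 compared with 0.96 to 0.84.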

The point estimate changed further, from 0.98 (Table 6, column i) to 0.77 (column ii), when the definition of exposure changed from selected initiators to current users. These are estimates for different contrasts. The estimate in column (i) is based on the exposed person-time during the two-year period immediately after the return of the questionnaire in which therapy initiation was reported, and thus can be viewed as a flawed attempt to estimate the early effect of therapy initiation (see previous paragraph). The estimate in column (ii), however, is based on the exposed person-time pooled over all two-year periods after the return of any questionnaire, and thus can be interpreted as an attempt to estimate the effect of therapy use during any two-year period (that excludes the interval between therapy initiation and return of the next questionnaire, as discussed in the previous paragraph). More specifically, the approach in column (ii) can be understood as an attempt to estimate adherence-adjusted effects by entering the current value of exposure and the joint predictors of adherence and CHD as time-varying covariates in the model for CHD risk. Unlike inverse probability weighting, this approach to adherence adjustment requires that the time-dependent covariates not be strongly affected by prior treatment, which may be a reasonable assumption in the NHS. Thus the estimate in column (ii) may be more usefully compared with a weighted average of our interval-specific adherence-adjusted estimates of 1.61 (0–2 years), 0.65 (2–5 years), 0.47 (5–8 years), and 0.85 (>8 years) than with the estimate in column (i).
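The comparison suggested above requires collapsing the four interval-specific estimates into a single number. The text does not say which weights to use; one plausible, assumed choice is the share of exposed person-time or of CHD events in each interval. The sketch below averages on the log hazard ratio scale, which is itself an assumption, and uses placeholder weights. The function name is hypothetical.

```python
import math

# Interval-specific adherence-adjusted hazard ratios reported in the text.
INTERVAL_HR = {"0-2 y": 1.61, "2-5 y": 0.65, "5-8 y": 0.47, ">8 y": 0.85}


def weighted_average_hr(hrs, weights):
    """Weighted geometric mean of hazard ratios, i.e. exp(weighted mean of log HRs).
    `weights` is a hypothetical mapping from interval label to a nonnegative weight."""
    total = sum(weights[k] for k in hrs)
    log_mean = sum(weights[k] * math.log(hrs[k]) for k in hrs) / total
    return math.exp(log_mean)


if __name__ == "__main__":
    equal_weights = {k: 1.0 for k in INTERVAL_HR}  # placeholder weights only
    print(round(weighted_average_hr(INTERVAL_HR, equal_weights), 2))
```

With equal weights the result is about 0.80. Because 1.61 is the largest interval-specific estimate, moving weight from the first interval to any later one (with the total weight held fixed) lowers the average.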

In summary, our findings suggest that the discrepancies between the WHI and NHS ITT estimates could be largely explained by differences in the distribution of time since menopause and length of follow-up. Residual confounding for the effect of therapy initiation in the NHS seems to play little role.

Appendix Figure 1
Sensitivity analysis for lack of adjustment for treatment arm in the inverse probability weighted analysis that adjusts for selection bias due to death between the start of follow-up and the return of questionnaire in the Nurses’ Health Study ...
Appendix Figure 2
Distribution of eligible women by number of NHS “trials” of hormone therapy discontinuation in which they participated.

Supplementary Material


We thank Murray Mittleman, Javier Nieto, Meir Stampfer, and Alexander Walker for their comments on an earlier version of the manuscript.

Financial support: This work was supported by NIH grants HL080644 and CA87969.


1. Ioannidis JP, Haidich AB, Pappa M, et al. Comparison of evidence of treatment effects in randomized and nonrandomized studies. JAMA. 2001;286(7):821–30.
2. Benson K, Hartz AJ. A comparison of observational studies and randomized, controlled trials. N Engl J Med. 2000;342(25):1878–86.
3. Concato J, Shah N, Horwitz RI. Randomized, controlled trials, observational studies, and the hierarchy of research designs. N Engl J Med. 2000;342(25):1887–92.
4. Grodstein F, Stampfer M, Manson J, et al. Postmenopausal estrogen and progestin use and the risk of cardiovascular disease. New England Journal of Medicine. 1996;335(7):453–61. Erratum in: N Engl J Med. 1996;335:1406.
5. Grodstein F, Manson JE, Colditz GA, Willett WC, Speizer FE, Stampfer MJ. A prospective, observational study of postmenopausal hormone therapy and primary prevention of cardiovascular disease. Annals of Internal Medicine. 2000;133(12):933–41.
6. Varas-Lorenzo C, García-Rodríguez LA, Pérez-Gutthann S, Duque-Oliart A. Hormone replacement therapy and incidence of acute myocardial infarction. Circulation. 2000;101:2572–8.
7. The Women’s Health Initiative Study Group. Design of the Women’s Health Initiative Clinical Trial and Observational Study. Controlled Clinical Trials. 1998;19:61–109.
8. Manson JE, Hsia J, Johnson KC, et al. Estrogen plus progestin and the risk of coronary heart disease. New England Journal of Medicine. 2003;349(6):523–34.
9. Prentice RL, Pettinger M, Anderson GL. Statistical issues arising in the Women’s Health Initiative. Biometrics. 2005;61:899–941.
10. Hernán MA, Robins JM, García Rodríguez LA. Discussion of “Statistical issues arising in the Women’s Health Initiative” by RL Prentice, M Pettinger, and GL Anderson. Biometrics. 2005;61:922–41.
11. Stampfer MJ, Hu FB, Manson JE, Rimm EB, Willett WC. Primary prevention of coronary heart disease in women through diet and lifestyle. New England Journal of Medicine. 2000;343:16–22.
12. Robins JM. The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. In: Sechrest L, Freeman H, Mulley A, editors. Health Services Research Methodology: A Focus on AIDS. NCHRS, U.S. Public Health Service; 1989. pp. 113–59.
13. Mendelsohn ME, Karas RH. Hormone replacement therapy and the young at heart. New England Journal of Medicine. 356:2639–41.
14. Mark SD, Robins JM. A method for the analysis of randomized trials with compliance information: an application to the Multiple Risk Factor Intervention Trial. Controlled Clinical Trials. 1993;14:79–97.
15. Cole SR, Chu H. Effect of acyclovir on herpetic ocular recurrence using a structural nested model. Contemp Clin Trials. 2005;26:300–10.
16. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology. 2004;15:615–25.
17. Grodstein F, Manson JE, Stampfer MJ. Hormone therapy and coronary heart disease: The role of time since menopause and age at hormone initiation. Journal of Women’s Health. 2006;15(1):35–44.
18. Manson JE, Bassuk SS. Invited commentary: hormone therapy and risk of coronary heart disease: why renew the focus on the early years of menopause? Am J Epidemiol. 2007;166(5):511–7.
19. Grodstein F, Clarkson TB, Manson JE. Understanding the divergent data on postmenopausal hormone therapy. New England Journal of Medicine. 2003;348(7):645–50.
20. Harman SM, Brinton EA, Cedars M, et al. KEEPS: The Kronos Early Estrogen Prevention Study. Climacteric. 2005;8(1):3–12.
21. Prentice RL, Langer RD, Stefanick ML, et al. Combined postmenopausal hormone therapy and cardiovascular disease: Toward resolving the discrepancy between observational studies and the Women’s Health Initiative clinical trial. American Journal of Epidemiology. 2005;162:404–14.
22. Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period: application to the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–512. Published errata appear in Mathl Modelling 1987;14:917–21.
23. Robins JM. Addendum to “A new approach to causal inference in mortality studies with a sustained exposure period: application to the healthy worker survivor effect.” Computers and Mathematics with Applications. 1987;14:923–45. Published errata appear in Computers Math Applic 1989;18:477.
24. Ray WA. Evaluating medication effects outside of clinical trials: New-user designs. American Journal of Epidemiology. 2003;158(9):915–20.
25. Alonso A, García Rodríguez LA, Logroscino G, Hernán MA. Gout and risk of Parkinson’s disease: a prospective study. Neurology. 2007;69:1696–700.
26. Connelly M, Richardson M, Platt R. Prevalence and duration of postmenopausal hormone replacement therapy use in a managed care organization. Journal of General Internal Medicine. 2000;15:542–50.
27. Robins JM, Rotnitzky A, Vansteelandt S. Discussion of “Principal stratification designs to estimate input data missing due to death” by Frangakis CE, Rubin DB, An M, MacKenzie E. Biometrics. 2007;63:650–4.
28. Robins JM. Structural nested failure time models. In: Andersen PK, Keiding N, Armitage P, Colton C, editors. The Encyclopedia of Biostatistics (Survival Analysis). Chichester, UK: John Wiley and Sons; 1998. pp. 4372–89.
29. Hernán MA, Cole SR, Margolick J, Cohen M, Robins JM. Structural accelerated failure time models for survival analysis in studies with time-varying treatments. Pharmacoepidemiology and Drug Safety. 2005;14:477–91.
30. Robins JM, Blevins D, Ritter G, Wulfsohn M. G-estimation of the effect of prophylaxis therapy for Pneumocystis carinii pneumonia on the survival of AIDS patients. Epidemiology. 1992;3:319–36. Published errata appear in Epidemiology 1993;4:189.