The observational cohort as a nonrandomized “trial”

The NHS cohort was established in 1976 and comprised 121,700 female registered nurses from 11 U.S. states, aged 30 to 55 years. Participants have received biennial questionnaires to update information on use, duration (1–4, 5–9, 10–14, 15–19, 20–24 months), and type of hormone therapy during the preceding two-year interval. Common use of oral estrogen plus progestin therapy among NHS participants began in the period between the 1982 and 1984 questionnaires. The questionnaires also record information on potential risk factors for, and occurrence of, major medical events, including CHD (nonfatal myocardial infarction or fatal coronary disease). The process for confirming CHD endpoints has been described in detail elsewhere.^{4}

We mimicked the WHI trial by restricting the study population to postmenopausal women who in the 1982 questionnaire had reported no use of any hormone therapy during the prior 2-year period (“washout” period), and in the 1984 questionnaire reported either use of oral estrogen plus progestin therapy (“initiators”) or no use of any hormone therapy (“noninitiators”) during the prior 2-year period. Thus, as in the WHI, the “initiator” group includes both first-time users of hormone therapy and re-initiators (who stopped hormone therapy in 1980 or earlier and then re-initiated use in the period 1982–1984).

Women were followed from the start of follow-up to diagnosis of CHD, death, loss to follow-up, or June 2000, whichever occurred first. Unlike in the randomized WHI and the observational General Practice Research Database, the time of therapy initiation (and thus the start of follow-up for initiators) was not known precisely in the NHS, so we needed to estimate it. For women who reported hormone therapy initiation during the 2-year period before the 1984 questionnaire and were still using it when they completed this questionnaire, the start of follow-up was estimated as the month of return of the baseline questionnaire minus the duration of hormone therapy use. (Duration is reported as an interval, e.g., 20–24 months; we used the upper limit of the interval, e.g., 24 months.) For women who reported starting hormone therapy during the same two-year period but had stopped using it by the time they returned the 1984 questionnaire, the start of follow-up was estimated as the first month of the two-year period (the earliest possible month of initiation). The start of follow-up for noninitiators was estimated as the average month of start of follow-up among initiators (stratified by age and past use of hormone therapy). Alternative methods to estimate the start of follow-up had little effect on our estimates (see appendix A1).
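The start-of-follow-up rules above can be sketched in Python. This is a minimal illustration under stated assumptions: calendar months are encoded as a running integer count, and the function names, argument names, and stratum keys are hypothetical, not taken from the NHS analysis code.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Upper limit (in months) of each reported duration interval; the text uses
# the upper limit of the interval as the duration estimate.
DURATION_UPPER: Dict[str, int] = {
    "1-4": 4, "5-9": 9, "10-14": 14, "15-19": 19, "20-24": 24,
}

def month_index(year: int, month: int) -> int:
    """Encode a calendar month as a running count (hypothetical convention)."""
    return year * 12 + (month - 1)

def initiator_start(return_month: int, duration: str, still_using: bool,
                    period_first_month: int) -> int:
    """Estimated start of follow-up for an initiator.

    Current users: month of questionnaire return minus the upper limit of the
    reported duration interval. Women who had already stopped: the first month
    of the two-year period (the earliest possible month of initiation).
    """
    if still_using:
        return return_month - DURATION_UPPER[duration]
    return period_first_month

def noninitiator_start(starts_by_stratum: Dict[Tuple[str, bool], List[int]],
                       stratum: Tuple[str, bool]) -> int:
    """Average initiator start month within the same (age group, past use) stratum."""
    return round(mean(starts_by_stratum[stratum]))

# Usage: a current user who returned the 1984 questionnaire in June 1984
# after 20-24 months of use is assigned a start of follow-up in June 1982.
june_1984 = month_index(1984, 6)
start = initiator_start(june_1984, "20-24", True, month_index(1982, 6))
print(start == month_index(1982, 6))  # True
```

Rounding the stratum mean to a whole month is an assumption made here so that noninitiators share the same monthly time grid as initiators.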

To further mimic the WHI, we restricted the study population to women who, before the start of follow-up, had a uterus, no past diagnosis of cancer (except nonmelanoma skin cancer) or acute myocardial infarction, and no diagnosis of stroke since the return of the previous questionnaire. To enable adjustment for dietary factors, we restricted the population to women who had reported plausible energy intakes (2,510–14,640 kJ/d) and had left fewer than 10 of 61 food items blank on the most recent food frequency questionnaire before the 1984 questionnaire.
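The clinical and dietary restrictions in the preceding paragraph amount to a per-woman filter, sketched below. The record type and its field names are hypothetical, and treating the energy-intake bounds as inclusive is an assumption.

```python
from dataclasses import dataclass

@dataclass
class Baseline:
    """Hypothetical per-woman baseline record; field names are illustrative."""
    has_uterus: bool
    past_cancer_except_nmsc: bool        # any cancer other than nonmelanoma skin cancer
    past_acute_mi: bool
    stroke_since_last_questionnaire: bool
    energy_kj_per_day: float
    blank_food_items: int                # out of 61 food frequency items

def eligible(w: Baseline) -> bool:
    """Apply the WHI-mimicking and dietary restrictions described in the text."""
    clinical_ok = (
        w.has_uterus
        and not w.past_cancer_except_nmsc
        and not w.past_acute_mi
        and not w.stroke_since_last_questionnaire
    )
    # Plausible energy intake (bounds assumed inclusive) and fewer than 10 blanks.
    diet_ok = 2510 <= w.energy_kj_per_day <= 14640 and w.blank_food_items < 10
    return clinical_ok and diet_ok

print(eligible(Baseline(True, False, False, False, 7000.0, 3)))   # True
print(eligible(Baseline(True, False, False, False, 15000.0, 3)))  # False
```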

The NHS cohort study can now be viewed as a nonrandomized, nonblinded “trial” that mimics the eligibility criteria, definition of start of follow-up, and treatment arms (initiators vs. noninitiators) of the WHI randomized trial, but with a different distribution of baseline risk factors (e.g., lower age and shorter time since menopause in the NHS than in the WHI). We analyzed the NHS nonrandomized “trial” by comparing the CHD risk of initiators and noninitiators regardless of whether these women subsequently stopped or initiated therapy. Thus our analytic approach is the observational equivalent of the ITT principle that guided the main analysis of the WHI trial. Specifically, we estimated the average hazard (rate) ratio (HR) of CHD in initiators versus noninitiators, and its 95% confidence interval (CI), by fitting a Cox proportional hazards model with time since the start of follow-up as the time scale and a time-fixed indicator for hormone therapy initiation. The Cox model was stratified on age (in 5-year intervals) and history of hormone therapy use (yes, no).

To obtain valid effect estimates in a nonrandomized trial, all baseline confounders have to be appropriately measured and adjusted for in the analysis. We proceeded as if this condition were at least approximately true in the NHS nonrandomized “trial” once we added the following covariates to the Cox model: parental history of myocardial infarction before age 60 (yes, no), education (graduate degree: yes, no), husband’s education (less than high school, high school graduate, college, graduate school), ethnicity (non-Hispanic white, other), age at menopause (<50, 50–53, >53), calendar time, high cholesterol (yes, no), high blood pressure (yes, no), diabetes (yes, no), angina (yes, no), stroke (yes, no), coronary revascularization (yes, no), osteoporosis (yes, no), body mass index (<23, 23–<25, 25–<30, ≥30), cigarette smoking (never, past, current 1–14 cigarettes/d, current 15–24 cigarettes/d, current 25+ cigarettes/d), aspirin use (nonuse, 1–4 yrs, 5–10 yrs, >10 yrs), alcohol intake (0, >0–<5, 5–<10, 10–<15, ≥15 g/d), physical activity (6 categories), diet score (quintiles),^{11} multivitamin use (yes, no), and fruit and vegetable intake (<3, 3–<5, 5–<10, ≥10 servings/day). When available, we simultaneously adjusted for the reported value of each variable on both the 1982 and 1980 questionnaires.

The observational cohort as a sequence of nonrandomized “nested trials”

Because few women initiated therapy during the 1982–1984 period, the approach described above yields very imprecise ITT estimates. However, our choice of this period was arbitrary: the same approach can produce an additional NHS nonrandomized “trial” when applied to each of the eight two-year periods between 1982–1984 and 1996–1998. Thus, to increase the efficiency of our ITT estimate, we conducted 7 additional nonrandomized “trials” starting at each subsequent questionnaire (1986, 1988, …, 1998), and pooled all 8 “trials” into a single analysis. Because some women participated in more than one of these NHS “trials” (up to a maximum of 8), we used a robust variance estimator to account for within-person correlation. We assessed potential heterogeneity of the ITT effect estimates across “trials” with two Wald tests: first, we estimated a separate parameter for therapy initiation in each “trial” and tested for heterogeneity of these parameters (chi-square; 7 df); second, we added a single product term between “trial” and therapy initiation and tested whether its coefficient differed from zero (chi-square; 1 df).
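The construction of the sequence of “trials” can be sketched as a data expansion: each woman contributes one record to every “trial” for which she meets the washout and arm definitions. This is a simplified sketch; the status codes (`"none"`, `"ep"`, `"other"`), the dictionary layout, and the function name are hypothetical, and the remaining eligibility criteria are deliberately omitted.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Questionnaire years at which a "trial" starts: 1984, 1986, ..., 1998 (8 trials).
TRIAL_YEARS = list(range(1984, 2000, 2))

def expand_trials(history: Dict[str, Dict[int, str]]) -> List[Tuple[str, int, int]]:
    """Build one (woman_id, trial_year, initiator) row per eligible woman-trial.

    history[woman_id][year] is the hormone therapy use reported on that year's
    questionnaire for the prior two years: "none", "ep" (oral estrogen plus
    progestin), or "other". Only the washout and arm definitions are applied.
    """
    rows = []
    for woman_id, status in history.items():
        for year in TRIAL_YEARS:
            prior, current = status.get(year - 2), status.get(year)
            # Washout: no use in the prior period; arms: "ep" (1) vs "none" (0).
            if prior == "none" and current in ("none", "ep"):
                rows.append((woman_id, year, int(current == "ep")))
    return rows

history = {
    "A": {1982: "none", 1984: "none", 1986: "none", 1988: "ep"},
    "B": {1982: "ep", 1984: "ep", 1986: "none", 1988: "none"},
}
rows = expand_trials(history)
print(rows)
# [('A', 1984, 0), ('A', 1986, 0), ('A', 1988, 1), ('B', 1988, 0)]
print(Counter(woman for woman, _, _ in rows))
# Counter({'A': 3, 'B': 1})
```

Woman A is a noninitiator in the 1984 and 1986 “trials” and an initiator in the 1988 “trial”, so she contributes three correlated records. This is why the pooled analysis needs a variance estimator clustered on the woman identifier.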

In each “trial,” we used the corresponding questionnaire information to apply the eligibility criteria at the start of follow-up, and to define initiators and noninitiators. We then estimated the CHD average hazard ratio in initiators versus noninitiators (adjusted for the values of covariates reported in the two previous questionnaires), regardless of whether these women subsequently stopped or initiated therapy. To allow for the possibility that the hazard ratio varied with time since baseline, we added product terms between time of follow-up (linear and quadratic terms) and initiation status to a pooled logistic model that approximated our previous Cox model. We then used the fitted model to estimate CHD-free survival curves for initiators and noninitiators.
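The survival curves described above follow from the fitted pooled logistic model: each month's predicted CHD hazard is the inverse logit of the linear predictor, and CHD-free survival is the running product of one minus those hazards. The sketch below assumes a specific parameterization (month of follow-up `t` starting at 0, linear and quadratic time terms, and their products with the initiation indicator). The coefficient values in the usage lines are hypothetical, not NHS estimates, and covariate terms are omitted for brevity.

```python
import math
from typing import List, Sequence

def expit(x: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))

def chd_free_survival(beta: Sequence[float], initiator: int, months: int) -> List[float]:
    """CHD-free survival implied by a pooled logistic model of monthly hazard.

    Assumed model: logit h(t) = b0 + b1*t + b2*t^2 + A*(b3 + b4*t + b5*t^2),
    with t the month of follow-up and A the initiation indicator (0 or 1).
    Survival through month t is the product of (1 - h(k)) for k = 0..t.
    """
    b0, b1, b2, b3, b4, b5 = beta
    surv, s = [], 1.0
    for t in range(months):
        lp = b0 + b1 * t + b2 * t * t + initiator * (b3 + b4 * t + b5 * t * t)
        s *= 1.0 - expit(lp)
        surv.append(s)
    return surv

# Hypothetical coefficients: higher hazard in initiators early, declining over time.
beta = (-7.0, 0.002, 0.0, 0.3, -0.004, 0.0)
curve_noninitiators = chd_free_survival(beta, 0, 120)
curve_initiators = chd_free_survival(beta, 1, 120)
```

In a full analysis the curves would be averaged over the covariate distribution of the pooled population; that standardization step is not shown here.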

At each trial’s start of follow-up, women with a CHD diagnosis within the last two years are not eligible for that “trial.” Thus, the subset of women considered for eligibility in each “trial” is approximately nested in the subset of women who were considered for eligibility in the prior “trial.” Our conceptualization of an observational study with a time-varying treatment as a sequence of nested trials, each with non-time-varying treatment, is a special case of g-estimation of nested structural models.^{12}

Several lines of evidence suggest that the effect of hormone therapy is modified by the time of initiation.^{13} We therefore conducted analyses stratified by time since menopause (<10, ≥10 years) and age (<60, ≥60 years).

Adherence-adjusted effect estimates

Because the primary analysis of the WHI randomized trial was conducted under the ITT principle, we analyzed our NHS “trials” using an observational analog of ITT in order to compare the NHS with the WHI estimates. However, ITT estimates are problematic because the magnitude of the ITT effect varies with the proportion of subjects who adhere to the assigned treatment, and thus ITT comparisons can underestimate the effect that would have been observed if everyone had adhered to the assigned treatment. Thus ITT effect estimates may be unsatisfactory when studying the efficacy, and inappropriate when studying the safety, of an active treatment compared with no treatment. An alternative to the ITT effect is the effect that would have been observed if everyone had remained on her initial treatment throughout the follow-up, which we refer to as an adherence-adjusted effect. Adherence-adjusted effect estimates can be obtained in both randomized experiments and observational studies by using g-estimation^{14, 15} or inverse probability weighting.

We used inverse probability weighting to estimate the adherence-adjusted hazard ratio of CHD. In each NHS “trial” we censored women when they discontinued their baseline treatment (either hormone therapy or no hormone therapy), and then weighted the uncensored woman-months by the inverse of their estimated probability of remaining uncensored until that month.^{16} To estimate these trial-specific probabilities for each woman, we fit a pooled logistic model for the probability of remaining on the baseline treatment in a given month, given that she had remained on it through the previous month. The model included the baseline covariates used in the trial-specific Cox models described above, and the most recent post-baseline values of the same covariates. Inclusion of time-dependent covariates is necessary to adjust for any dependence between noncompliance and CHD within levels of baseline covariates. We fit separate models for initiators and noninitiators. In each “trial,” each woman contributed as many observations to the model as the number of months she was on her baseline therapy.
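The censoring and weighting steps can be sketched for a single woman in a single “trial”. The monthly probabilities would come from the fitted pooled logistic model; here they are supplied directly. The function names and the convention that the month of discontinuation is itself censored are assumptions of this sketch.

```python
from itertools import accumulate
from operator import mul
from typing import List, Sequence

def uncensored_months(on_baseline: Sequence[bool]) -> int:
    """Number of months before the first discontinuation of baseline treatment.

    Assumed convention: a woman is censored in the month she discontinues,
    so that month and all later months are dropped.
    """
    n = 0
    for flag in on_baseline:
        if not flag:
            break
        n += 1
    return n

def ip_weights(on_baseline: Sequence[bool], p_remain: Sequence[float]) -> List[float]:
    """Unstabilized inverse probability weights for one woman's uncensored months.

    p_remain[k] is the predicted probability of remaining on baseline treatment
    in month k, given she remained through month k-1 and given her baseline and
    most recent time-varying covariates. Weight for month m = 1 / prod_{k<=m} p_remain[k].
    """
    n = uncensored_months(on_baseline)
    cumulative = accumulate(p_remain[:n], mul)
    return [1.0 / c for c in cumulative]

print(ip_weights([True, True], [0.5, 0.5]))  # [2.0, 4.0]
```

Because the weights are cumulative products of inverses, women whose covariate history made continued adherence unlikely receive large weights; stabilizing the weights addresses the resulting variability.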

To stabilize the inverse probability weights, we multiplied them by the probability of remaining uncensored given only the “trial”-specific baseline values of the covariates. Stabilization reduces the variability of the weights and thereby improves precision. If the true adherence-adjusted hazard ratio is constant over time, this method produces valid estimates provided that discontinuing the baseline treatment is unrelated to unmeasured risk factors for CHD within levels of the covariates, and that the logistic model used to estimate the inverse probability weights is correctly specified. When the adherence-adjusted hazard ratio changes with time since baseline, this method estimates a weighted average of the time-specific adherence-adjusted hazard ratios, with weights proportional to the number of uncensored CHD events occurring at each time. Thus, with heavy censoring due to nonadherence, the early years of follow-up receive relatively more weight than they would without censoring. To better account for a time-varying hazard ratio, we also fit an inverse probability weighted Cox model (approximated through a weighted pooled logistic model) that included product terms between time of follow-up (linear and quadratic terms) and initiation status. We then used the weighted model to estimate adherence-adjusted CHD-free survival curves for initiators and noninitiators.
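Stabilized weights replace the numerator of 1 with the cumulative probability of remaining uncensored predicted from baseline covariates alone. The sketch below takes both sets of monthly probabilities as given (in practice they come from two fitted pooled logistic models); the function name and argument layout are hypothetical.

```python
from itertools import accumulate
from operator import mul
from typing import List, Sequence

def stabilized_weights(p_num: Sequence[float], p_den: Sequence[float]) -> List[float]:
    """Stabilized inverse probability weights for one woman's uncensored months.

    p_num[k]: predicted probability of remaining on baseline treatment in month k
              from a model with "trial"-specific baseline covariates only.
    p_den[k]: the same probability from a model that also includes the most
              recent time-varying covariates.
    Weight for month m = prod_{k<=m} p_num[k] / prod_{k<=m} p_den[k].
    """
    if len(p_num) != len(p_den):
        raise ValueError("numerator and denominator must cover the same months")
    num = accumulate(p_num, mul)
    den = accumulate(p_den, mul)
    return [n / d for n, d in zip(num, den)]

# If time-varying covariates add no information, every stabilized weight is 1.
print(stabilized_weights([0.7, 0.6], [0.7, 0.6]))  # [1.0, 1.0]
```

The usage line illustrates the rationale for stabilization: the weights depart from 1 only to the extent that post-baseline covariates predict adherence beyond what baseline covariates already capture, which keeps them closer to 1 than unstabilized weights.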

We also present additional subsidiary analyses to explain the relation between our estimates and previously reported NHS estimates, which can be regarded as estimates of the adherence-adjusted hazard ratio using an alternative to our inverse probability weighting approach.