Let us illustrate the proposed methods with data from the WHI randomized controlled trial of combined (estrogen plus progestin) postmenopausal hormone therapy, which reported an elevated coronary heart disease risk and an overall unfavorable balance of health benefits versus risks over a 5.6-year study period (Writing Group for the Women's Health Initiative Investigators, 2002; Manson and others, 2003). Few research reports have stimulated as much public response, since the preceding observational research literature suggested a 40–50% reduction in coronary heart disease incidence among women taking postmenopausal hormone therapy. Analysis of the WHI observational study shows a similar discrepancy with the WHI clinical trial for each of coronary heart disease, stroke, and venous thromboembolism. The discrepancy is partially explained by confounding in the observational study. A remaining source of discrepancy between the clinical trial and the observational study is elucidated by recognizing a dependence of the hazard ratio on the therapy duration (e.g. Prentice and others, 2005). Here, we look at the time to coronary heart disease in the WHI clinical trial, which included 16 608 postmenopausal women with a uterus, initially in the age range of 50–79 ($n_1 = 8102$ in the control group). There were 188 and 147 events observed in the treatment and control groups, respectively, implying about 98% censoring, primarily by the trial stopping time. Fitting model (2.1) to this data set, we get
$\hat{\beta} = (0.65, -3.63)^{T}$. Due to heavy censoring, the value 0.03 of $\exp(\hat{\beta}_2)$ cannot be interpreted as the estimated long-term hazard ratio in the range of study follow-up times. The estimated hazard ratio function is needed for a more complete and accurate assessment of the treatment effect.
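The arithmetic behind these statements can be checked with a short sketch. The censoring fraction and the value of $\exp(\hat{\beta}_2)$ use only the numbers reported above. The function `hazard_ratio` is an assumption: it supposes that model (2.1) is the short-term and long-term hazard ratio model, in which the hazard ratio equals $1/\{e^{-\beta_1} S_C(t) + e^{-\beta_2} F_C(t)\}$, with $S_C = 1 - F_C$ the control-group survivor function. The grid of $F_C$ values is hypothetical and chosen only to span a range compatible with about 98% censoring.

```python
import math

# Values reported in the text.
beta1_hat, beta2_hat = 0.65, -3.63
events_treatment, events_control, n_total = 188, 147, 16608

# Share of subjects without an observed event.
censoring_fraction = 1 - (events_treatment + events_control) / n_total
exp_beta2 = math.exp(beta2_hat)


def hazard_ratio(f_control, b1=beta1_hat, b2=beta2_hat):
    """Hazard ratio as a function of the control-group event probability F_C.

    ASSUMPTION: model (2.1) has the form 1 / (e^{-b1} * S_C + e^{-b2} * F_C).
    This form is not stated in this section and is used for illustration only.
    """
    s_control = 1.0 - f_control
    return 1.0 / (math.exp(-b1) * s_control + math.exp(-b2) * f_control)


print(f"censoring fraction: {censoring_fraction:.3f}")
print(f"exp(beta2_hat):     {exp_beta2:.3f}")

# Hypothetical grid of F_C values (not taken from the trial data).
for f_control in (0.0, 0.005, 0.01, 0.02):
    print(f"F_C = {f_control:.3f} -> hazard ratio {hazard_ratio(f_control):.2f}")
```

Under the assumed form, the hazard ratio approaches $\exp(\hat{\beta}_2)$ only as $F_C$ approaches 1. With so few events, $F_C$ stays close to 0 throughout follow-up, which is one way to see why the value 0.03 does not describe the hazard ratio within the study period.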
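To examine model adequacy, we can use a residual plot similar to the one used for the Cox regression model (Cox and Snell, 1968). Let $\Lambda_C$ and $\Lambda_T$ be the cumulative hazard functions of the 2 groups, respectively. Then $\Lambda_C(T_i)$, $i \le n_1$, and $\Lambda_T(T_i)$, $i > n_1$, are i.i.d. from the standard exponential distribution. Let $\hat{\Lambda}_C$ and $\hat{\Lambda}_T$ be the model-based estimators of $\Lambda_C$ and $\Lambda_T$, respectively, and define the residuals as $\hat{\Lambda}_C$ and $\hat{\Lambda}_T$ evaluated at the observed, possibly censored, follow-up times of the subjects with $i \le n_1$ and $i > n_1$, respectively.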
If model (2.1) is correct, the residuals should behave like a censored sample from the standard exponential distribution. Thus, the Aalen–Nelson cumulative hazard estimator based on them should be close to the identity function. If there is noticeable deviation, then model (2.1) is questionable. Similarly, the residual plot can be obtained for the piecewise constant hazards ratio model used in Prentice and others (2005). Both residual plots, not shown here, suggest that the 2 models fit the data adequately, with similar residual behaviors.
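The logic of this residual check can be sketched in a few lines of Python. The sketch does not use the WHI data or the fitted model: it simulates hypothetical residuals that are exactly standard exponential, censors them at a fixed value, and confirms that the Aalen–Nelson estimate stays close to the identity function. The function names, sample size, censoring point, and seed are all illustrative assumptions.

```python
import random


def nelson_aalen(values, events):
    """Aalen-Nelson cumulative hazard estimate for right-censored data.

    values: observed residuals; events: True where the residual is uncensored.
    Returns (times, cumhaz) at the uncensored values, in increasing order.
    Assumes no ties among uncensored values (true for continuous data).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(values)
    times, cumhaz, total = [], [], 0.0
    for rank, i in enumerate(order):
        at_risk = n - rank  # subjects with a residual at least this large
        if events[i]:
            total += 1.0 / at_risk
            times.append(values[i])
            cumhaz.append(total)
    return times, cumhaz


def simulate_residuals(n, censor_at, seed=0):
    """Hypothetical residuals: standard exponential, censored at censor_at."""
    rng = random.Random(seed)
    values, events = [], []
    for _ in range(n):
        e = rng.expovariate(1.0)
        values.append(min(e, censor_at))
        events.append(e <= censor_at)
    return values, events


values, events = simulate_residuals(n=5000, censor_at=1.0)
times, cumhaz = nelson_aalen(values, events)

# Under a correct model the estimate should track the identity function.
max_dev = max(abs(h - t) for t, h in zip(times, cumhaz))
print(f"largest deviation from the identity: {max_dev:.3f}")
```

In an actual analysis, `values` would hold the estimated cumulative hazards at the observed follow-up times and `events` the event indicators. A plot of `cumhaz` against `times` that departs systematically from the 45-degree line would then call the model into question.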
The 95% pointwise confidence intervals and simultaneous confidence bands for the hazard ratio function are given in . For comparison, the 95% confidence intervals for 0–2, 2–5, and > 5 years from Prentice and others (2005) are included, each placed at the median of the uncensored data in its time interval. Unlike the intervals from the piecewise constant hazards ratio model, the confidence bands do not depend on a partitioning of the data range, and they provide a more continuously changing display of the treatment effect. The confidence bands are generally in agreement with the results from Prentice and others (2005). The UW band is wider than the other 2 bands most of the time. The HW band is the narrowest in the middle section but is quite wide at the beginning. Both the EP band and the HW band give narrower intervals for the middle portion of the data range than the piecewise Cox model. Near the end of the data range, all 3 bands have about the same width as the confidence interval from Prentice and others (2005). Overall, the EP band matches most closely with the results for the piecewise constant hazards ratio model. The width of the EP band is less than or equal to the width of the piecewise model-based confidence intervals for most of the data range, except at the beginning.

Note that the constant function 1 is not excluded in the HW and UW bands. In comparison, the EP band stays above 1 for about the first 600 days. In Prentice and others (2005), the confidence interval for 0–2 years excludes 1, indicating an elevation in coronary heart disease risk for the treatment early on. For this data set, the standard error of the estimated hazard ratio begins at 0.43, quickly comes down to below 0.20 at 600 days, and stays below 0.20 for the rest of the data range. Since the UW band does not take the variance into account and the HW band emphasizes the middle range, the elevated standard error at early follow-up times likely explains the discrepancy among the results. Compared with the original analysis that showed an overall difference between the 2 groups, the results here and those from Prentice and others (2005) give a more detailed analysis of the dependence of the hazard ratio on time and help explain the discrepancy between the results of the WHI clinical trial and preceding observational research, much of which involved cohorts where women could be enrolled some years after initiating hormone therapy.
For the average hazard ratio function, the estimator and the 95% simultaneous confidence band are given in . The standard error of the estimated average hazard ratio varies more mildly over time, and both the estimated average hazard ratio and the confidence band change much more smoothly than the results for the hazard ratio in . Note that the confidence band stays above 1 for $t < 700$ days. This is in agreement with the results of Prentice and others (2005).
To compare with the nonparametric approach, gives the estimated hazard ratio, the 95% pointwise confidence intervals, and the simultaneous confidence band of Gilbert and others (2002), based on the R programs from the author's site. The same scale as that in is used for comparison and results in truncation of some portion of the plot. The estimated hazard ratio suggests that the hazard ratio is reasonably monotonic. The nonparametric hazard ratio estimate is somewhat lower than the hazard ratio estimates in under either model (2.1) or the piecewise constant hazards ratio model. The confidence band is wider than those in for the beginning and later parts of the data range, reflecting the difficulty in making nonparametric inference on the hazard functions, especially with heavy censoring and in the tail region.
From the results here and from additional numerical studies and real data applications, we find that for the hazard ratio, the EP bands are preferable if the interest is in the largest possible data range; if the interest is in part of the middle portion, then the HW bands are usually better. For the average hazard ratio, the simple confidence band proposed here works adequately, although it could possibly be improved if more elaborate weights are used.