In Simulation 1, the absolute value of the difference between the regression risk analysis ARR and the M–H ARR across the 15 data sets averaged 0.00015 (range 0.000032–0.00036). The largest absolute difference in risk ratio was <0.025 percent of the M–H risk ratio. ARDs had a mean absolute difference of 9.76 × 10⁻⁶. We conclude that for simple models with limited confounding, regression risk analysis gives the same answer (within numerical precision) as M–H, a practical standard for comparison.
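As a point of reference, the M–H ARR used as the comparison standard can be computed directly from stratified counts. The function and counts below are an illustrative sketch, not the simulation code; the stratum values are hypothetical.

```python
# Mantel–Haenszel adjusted risk ratio pooled over strata.
# Each stratum is (exposed cases, exposed total, unexposed cases, unexposed total).

def mh_risk_ratio(strata):
    """M-H risk ratio: weighted sums of stratum-specific case counts."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

# With a single stratum, the M-H estimate reduces to the crude risk ratio:
# (10/100) / (5/100) = 2.0
single = mh_risk_ratio([(10, 100, 5, 100)])
```

Pooling over strata in this way removes confounding by the stratification variable, which is why M–H serves as the standard when covariates are categorical.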
In data sets with categorical covariates and substantial confounding, regression risk analysis, Poisson regression, and log-binomial regression all produced ratio estimates virtually identical to the M–H estimate. The AOR (Hosmer and Lemeshow 1989) and the Zhang and Yu equation (Zhang and Yu 1998) were biased, as is well known (McNutt et al. 2003). Regression risk analysis estimates of the ARD closely approximated the standard.
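The Zhang and Yu conversion referred to above transforms an odds ratio into an approximate risk ratio using the outcome risk in the unexposed, P0. A minimal sketch of the published formula (applying it to a covariate-adjusted OR is what produces the bias noted by McNutt et al.):

```python
# Zhang and Yu (1998) conversion: RR ~= OR / ((1 - P0) + P0 * OR),
# where p0 is the outcome risk in the unexposed group.

def zhang_yu_rr(odds_ratio, p0):
    """Approximate risk ratio from an odds ratio via the Zhang-Yu equation."""
    return odds_ratio / ((1.0 - p0) + p0 * odds_ratio)

# When the outcome is rare (p0 near 0), the RR approaches the OR;
# as p0 grows, the correction pulls the RR toward 1.
example = zhang_yu_rr(2.0, 0.1)
```

The equation is exact only for a crude OR; substituting an adjusted OR from a multivariable logistic model does not, in general, recover the adjusted RR.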
When confounders were continuous and confounding exceeded 25 percent, a common situation in which M–H cannot be estimated, regression risk analysis retained a high degree of accuracy. Omitted from the table are a similar number of simulations for which log-binomial regression failed to converge. Only the regression risk analysis ARR retained its accuracy as baseline risk and effect size increased, and it never had convergence problems. Regression risk analysis estimates of the ARD were also highly accurate. Simulation 4 has no established standard for the ARR because the nominal risk ratio is limited by ceiling effects at the upper end of the distribution: probability theory bounds the product of the baseline risk and the risk ratio (which equals the exposed risk) at unity, establishing an upper limit for the ARR estimates. Although not shown in a table, of the methods discussed, only the regression risk analysis ARR was always plausible. For example, when the baseline risk was 0.33, the maximum plausible ARR is 3.03: the regression risk analysis estimate was 1.89, while log-binomial regression gave 3.57, Poisson regression 3.55, and the Zhang and Yu equation 3.72.
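The adjusted risk ratio from a fitted logistic model can be sketched as model-based standardization: average the predicted risks with exposure set to 1 for everyone, then to 0, and take the ratio. The coefficients, single covariate, and covariate values below are hypothetical illustration values, not the simulation parameters.

```python
import math

def risk(b0, bx, bz, x, z):
    """Predicted probability from a logistic model: logit(p) = b0 + bx*x + bz*z."""
    return 1.0 / (1.0 + math.exp(-(b0 + bx * x + bz * z)))

def adjusted_risk_ratio(b0, bx, bz, zs):
    """Ratio of average predicted risks with exposure toggled on vs. off."""
    p1 = sum(risk(b0, bx, bz, 1, z) for z in zs) / len(zs)
    p0 = sum(risk(b0, bx, bz, 0, z) for z in zs) / len(zs)
    return p1 / p0
```

Because every predicted risk is bounded by 1, an ARR built this way can never exceed the reciprocal of the average unexposed risk, which is the ceiling-effect bound discussed above.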
The confidence intervals demonstrate the precision of regression risk analysis. Their widths (based on 1,000 bootstrap replications) are similar to those from Poisson regression. Regression risk analysis appears to be sufficiently precise to produce meaningful estimates when the sample size is adequate for logistic regression (Concato et al. 1995).
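A percentile bootstrap of the kind used for those intervals can be sketched as follows; the data, statistic, and replication count here are hypothetical stand-ins (a crude risk ratio on (exposure, outcome) pairs), not the study's procedure.

```python
import random

def bootstrap_ci(stat, data, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)])  # resample with replacement
        for _ in range(reps)
    )
    lo = estimates[int(reps * alpha / 2)]
    hi = estimates[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

def crude_rr(pairs):
    """Crude risk ratio from (exposure, outcome) pairs with 0/1 coding."""
    exposed = [y for x, y in pairs if x == 1]
    unexposed = [y for x, y in pairs if x == 0]
    return (sum(exposed) / len(exposed)) / (sum(unexposed) / len(unexposed))
```

Resampling whole observations preserves the exposure–covariate structure, so the same recipe applies to a model-based ARR by replacing `crude_rr` with a function that refits the model on each replicate.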
We demonstrate the effect of including an interaction term in the logistic model by analyzing two data sets that were identical except for the adjusted risk in the unexposed (0.07 and 0.26, respectively). Regression risk analysis ARRs were estimated in each data set (N≈45,000) from two logistic models, one including interactions and one not. Each data set was then divided into 13 smaller data sets on the basis of covariate values, and separate logistic regressions were run on each subset to observe the distribution of AORs across sections of the data. As expected, there was less variation in the AOR when risk was 0.07 (range 2.9–4.0, coefficient of variation [CV]=8.4) than when it was 0.26 (range 5.1–16.4, CV=39.9). In the first data set, the ARRs with and without interactions were almost identical (2.972 versus 2.971, respectively), suggesting that noninteracted models may be adequately parsimonious when outcomes are uncommon. Even with the greater variation in the second data set, the difference between the ARRs from the interacted and noninteracted models was modest (3.01 versus 2.84).
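The dispersion of the subset AORs above is summarized with the coefficient of variation; a minimal sketch of that summary, applied to a hypothetical list of subset AORs:

```python
import statistics

def coefficient_of_variation(values):
    """CV as a percentage: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical subset AORs: a tight cluster yields a small CV,
# signaling that a single (noninteracted) OR summarizes the subsets well.
tight_cv = coefficient_of_variation([2.9, 3.1, 3.3, 3.5, 4.0])
```

A large CV across subsets is the diagnostic that motivated the interacted model in the second data set.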