We examined the performance of procedures/packages for fitting generalized linear mixed-effects models (GLMM) to correlated binary responses using the popular SAS and R statistical software packages. Unlike the special case of a linear mixed-effects model, maximum likelihood estimates (MLE) are difficult to compute, and approximate estimates have been proposed and implemented in most statistical software packages, including SAS and R. As some of these estimates have unknown large-sample properties, it is important to evaluate their performance to determine whether they can provide reasonably good estimates and inference for practical purposes.
We reviewed two popular approaches for computing such approximate estimates as implemented in SAS and R, and discussed the pros and cons of each. Although the linearization approach is flexible and can accommodate complex random-effects structures, it may not provide reliable estimates, since it is difficult to characterize the theoretical properties of such estimates. In particular, it is unclear whether such estimates are even consistent. In comparison, the integral approximation approach yields estimates that approximate the MLE, thereby providing not only consistent, but also asymptotically normal estimates. The superior performance of the latter approach, as demonstrated by SAS NLMIXED in our simulation study, confirms the theoretical behavior of the estimates from this approach. We are somewhat surprised by the performance of the R lme4 and glmmML packages, as neither seems to yield inference comparable to its SAS NLMIXED counterpart, given that both implement the same integral approximation approach, albeit with different algorithms.
To further investigate the robustness of inference against non-normal random effects, we also changed the bivariate normal random effects b_i to a shifted bivariate Gamma with the same variance structure, i.e., b_i ~ SBG(μ, Φ), where SBG(μ, Φ) denotes a shifted bivariate Gamma distribution with mean μ and variance Φ.
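As an illustration of how such mean-zero but skewed random effects can be generated, here is a small Python sketch for a single component (the shape parameter is an arbitrary illustrative choice, not a value used in the paper):

```python
import numpy as np

def shifted_gamma_effects(n, var, shape=4.0, rng=None):
    """Draw mean-zero random effects from a shifted Gamma distribution.

    A Gamma(shape, scale) draw has mean shape*scale and variance
    shape*scale**2; subtracting the mean yields a skewed, mean-zero
    effect whose variance is matched to `var` by choosing the scale."""
    rng = np.random.default_rng() if rng is None else rng
    scale = np.sqrt(var / shape)          # solves shape * scale^2 = var
    g = rng.gamma(shape, scale, size=n)
    return g - shape * scale              # shift to mean zero
```

The smaller the shape parameter, the more skewed the distribution, so the further the random effects depart from normality at a fixed variance.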
Shown in the table below are the simulation results from this revised model for τ = 0.5 and 2, based on a sample size n = 100 under 1,000 MC replications. Again, we used 3 quadrature points for the R procedures to save computing time. Although parameter estimates and type I errors were less affected for τ = 0.5, large biases were evident in both when τ = 2, even for NLMIXED. Along with the findings for normal random effects, these results seem to indicate that GLMM for binary outcomes is not robust against misspecification of the random effects.
Estimates of parameters, standard errors, and type I errors from the revised simulation model with random effects following a shifted bivariate Gamma and n = 100, using different procedures/packages.
Judging from the results of our simulation study, we conclude that the SAS NLMIXED procedure provides the most accurate parameter estimates and inference (type I error) under correct model assumptions. Among the remaining procedures, estimates are generally biased, especially the type I errors, with the level of accuracy depending not so much on the number of quadrature points (for the procedures based on the integral approximation), but rather on the magnitude of the parameter values. For example, we varied the number of quadrature points between 3 and 20 for the R procedures, but failed to observe any improvement in type I errors when increasing the quadrature points from 3 to 20; the only notable effect was a significant increase in computing time, roughly 15 times longer with 20 quadrature points than with 3. Thus, for all practical purposes, it seems that 3 or 4 quadrature points are sufficient when using these procedures. However, as noted in Section 2.2, we should limit NLMIXED to models with a relatively small number of random effects to ensure accurate integral approximation to the log-likelihood.
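The diminishing returns from extra quadrature points can be seen on a toy logistic-normal integral; this Python sketch (illustrative only, not tied to any of the procedures above) compares 3 and 20 Gauss-Hermite nodes:

```python
import numpy as np

def gh_expectation(f, sigma, n_quad):
    """E[f(b)] for b ~ N(0, sigma^2) via Gauss-Hermite quadrature."""
    t, w = np.polynomial.hermite.hermgauss(n_quad)
    # Change of variables maps the N(0, sigma^2) integral to the nodes.
    return w @ f(np.sqrt(2.0) * sigma * t) / np.sqrt(np.pi)

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

# Marginal success probability E[expit(1 + b)] with b ~ N(0, 1)
e3 = gh_expectation(lambda b: expit(1.0 + b), 1.0, 3)
e20 = gh_expectation(lambda b: expit(1.0 + b), 1.0, 20)
```

For this smooth integrand, the 3-node value already agrees with the 20-node value to about two decimal places, consistent with the finding that a handful of quadrature points suffices in practice.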
Unexpectedly, though quite interestingly, the levels of accuracy of these remaining procedures seem to depend on the magnitude of the parameter values, with larger (absolute) values yielding larger biases. Further investigation is needed to determine the sources of this relationship so that these procedures may be improved. If these procedures have to be used in a given application, for reasons such as those discussed in Section 3, we recommend the use of bootstrapped standard errors to improve the reliability of inference [26].
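A minimal sketch of the nonparametric bootstrap standard error mentioned above, in Python with hypothetical names (for correlated binary data, one would resample whole subjects rather than individual observations to preserve the within-subject correlation):

```python
import numpy as np

def bootstrap_se(data, estimator, n_boot=2000, rng=None):
    """Nonparametric bootstrap standard error: resample the units with
    replacement, recompute the estimator, and take the SD across resamples."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(data)
    stats = np.array([estimator(data[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    return stats.std(ddof=1)
```

Unlike model-based standard errors, this estimate does not rely on the asymptotic approximations whose accuracy was in question for these procedures, at the cost of refitting the model `n_boot` times.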