Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3175267

Formats

Article sections

- Summary
- 1 Introduction
- 2 Generalized Linear Mixed-effects Models
- 3 Available Software for GLMM with Binary Response
- 4 Illustration
- 5 Discussion
- References

Authors

Related links

Stat Med. Author manuscript; available in PMC 2012 December 10.

Published in final edited form as:

Stat Med. 2011 September 10; 30(20): 2562–2572.

Published online 2011 June 10. doi: 10.1002/sim.4265PMCID: PMC3175267

NIHMSID: NIHMS285563

Hui Zhang,^{1} Naiji Lu,^{2,}^{3} Changyong Feng,^{2} Sally W. Thurston,^{2} Yinglin Xia,^{2,}^{3} and Xin M. Tu^{2,}^{3,}^{4}

See other articles in PMC that cite the published article.

The generalized linear mixed-effects model (GLMM) is a popular paradigm to extend models for cross-sectional data to a longitudinal setting. When applied to modeling binary responses, different software packages and even different procedures within a package may give quite different results. In this report, we describe the statistical approaches that underlie these different procedures and discuss their strengths and weaknesses when applied to fit correlated binary responses. We then illustrate these considerations by applying these procedures implemented in some popular software packages to simulated and real study data. Our simulation results indicate a lack of reliability for most of the procedures considered, which carries significant implications for applying such popular software packages in practice.

Longitudinal study designs have become increasingly popular in clinical trials and observational studies across all disciplines, as they capture both between-individual differences and within-subject dynamics. They also offer the opportunity to study more complex biological, psychological and behavioral hypotheses, especially those involving changes over time such as causal treatment effects. Since longitudinal study designs create serial, or within-subject, correlations over repeated assessments, statistical methods for cross-sectional data analysis must be generalized to account for such correlated observations.

One popular paradigm to handle this extension is the generalized linear mixed-effects model (GLMM). This approach explicitly models the within-subject correlation using random effects, or latent random variates [1]. Since the likelihood function arising from GLMM is generally quite complex, it is difficult to directly maximize this function except for the special linear models case for continuous outcomes. As a result, different estimates have been proposed and implemented by major statistical packages to provide inference about model parameters. For example, in SAS [2], which is arguably the most popular software package for data analysis, there are two major procedures that provide support for fitting GLMM. In R [3], another popular package for data analysis, especially among academic investigators, there are at least three packages for inference from GLMM. Thus, for many practitioners, it is quite confusing to select an appropriate procedure to use for their data at hand. A more important question is whether these procedures all provide valid inference.

In this paper, we discuss the major algorithms that drive the different procedures within as well as across different packages and investigate the performance of these procedures when applied to binary responses. We chose to examine the binary response since our experiences indicate this response type seems to mostly accentuate the differences between the different procedures, although our considerations apply equally well to other response types such as count responses. In Section 2, we give a brief review of GLMM for modeling binary responses. In Section 3, we discuss major software packages for fitting GLMM. In Section 4, we use simulated data and real data to examine the performance of these procedures/packages. In Section 5, we give our concluding remarks.

We start with a brief review of the generalized linear mixed-effects model (GLMM) for modeling longitudinal binary outcomes and then discuss two popular approaches for inference from GLMM.

Consider a sample of *n* subjects and let *y _{i}* denote a binary response and

$${y}_{i}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{\mathbf{x}}_{i}\phantom{\rule{thinmathspace}{0ex}}\stackrel{i.d.}{~}\text{Bernoulli}({\mu}_{i}),\text{}{\mu}_{i}=E\phantom{\rule{thinmathspace}{0ex}}({y}_{i}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{\mathbf{x}}_{i}),\text{logit}({\mu}_{i})={\mathbf{x}}_{i}^{\top}\mathit{\beta},\text{}i=1,2,\dots n,$$

(1)

where *i.d.* denotes *independently distributed*, Bernoulli(μ_{i}) the *Bernoulli* distribution with mean μ_{i}, and **β** the vector of model parameters. Inference about **β** using either maximum likelihood or estimating equations has been extensively discussed in the literature [e.g., 3, Chapters 4 and 9; 4, Chapter 2] and thus is not repeated for space consideration.

For a longitudinal study, each subject will be repeatedly assessed over time. For ease of exposition, we assume a design with *m* fixed assessment points and let *y _{it}* and

Given such complexity, a popular approach to extend the model in (1) to account for dependence across repeated *y _{it}* is to use random effects [5,10,11,12,13,14]. More precisely, consider the following extension of (1):

$$\begin{array}{c}{y}_{\mathit{\text{it}}}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{\mathbf{x}}_{\mathit{\text{it}}},{\mathbf{z}}_{\mathit{\text{it}}},{\mathbf{b}}_{i}\phantom{\rule{thinmathspace}{0ex}}\stackrel{i.d.}{~}\text{Bernoulli}({\mu}_{\mathit{\text{it}}}),\text{logit}({\mu}_{\mathit{\text{it}}})={\mathbf{x}}_{\mathit{\text{it}}}^{\top}\mathit{\beta}+{\mathbf{z}}_{\mathit{\text{it}}}^{\top}{\mathbf{b}}_{i},\hfill \\ {\mathbf{b}}_{i}\phantom{\rule{thinmathspace}{0ex}}~\phantom{\rule{thinmathspace}{0ex}}N\phantom{\rule{thinmathspace}{0ex}}(\mathbf{0},{\mathrm{\Sigma}}_{b}),\text{}i=1,2,\dots n,\text{}t=1,2,\dots M,\hfill \end{array}$$

(2)

where *N* (μ, Σ) denotes a multivariate normal with mean **μ** and variance Σ, and **z**_{it} a vector of predictors/covariates. Although **b**_{i} is typically assumed to follow a multivariate normal as in (2), other types of distributions may also be considered [11]. In addition, even conditioning on **x**_{it}, **z**_{it}, and **b**_{i}, the *y _{it}*’s may still be correlated. However, we focus on the normal random-effects and independent

For the standard logistic model in (1), the log-likelihood is readily evaluated and an objective function for estimation of the parameters is simple to construct based on the independence of the observations. For the GLMM-based extension in (2), the log-likelihood is also easily derived based on the independent *y _{it}*’s upon conditioning on

$$l={\displaystyle \sum _{i=1}^{n}}\text{log}\phantom{\rule{thinmathspace}{0ex}}\left[{\displaystyle {\int}_{{\mathbf{b}}_{i}}}{\mu}_{\mathit{\text{it}}}^{{y}_{\mathit{\text{it}}}}{(1-{\mu}_{\mathit{\text{it}}})}^{1-{y}_{\mathit{\text{it}}}}\varphi \phantom{\rule{thinmathspace}{0ex}}({\mathbf{b}}_{i}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}\mathbf{0},{\mathrm{\Sigma}}_{b})d{\mathbf{b}}_{i}\right],$$

(3)

where ϕ (· | **0**, Σ_{b}) denotes the probability density function of a multivariate normal with mean **0** and variance Σ_{b}. Although the log-likelihood seems simple in appearance, it is not a straightforward task to numerically compute the maximum likelihood estimate (MLE) because of the potentially high-dimensional integration when a large number of random effects (i.e., high dimension of **b**_{i}) is used.

The two most popular approaches for estimation in GLMM as implemented in major software packages such as SAS and R are (1) approximating the log-likelihood function and (2) approximating the model. Algorithms in the second category are developed using linearization of or Laplace approximation to the model. They employ some type of expansions such as the Taylor series expansion to approximate the model by one based on pseudo-data with fewer nonlinear components. The process of computing the linear approximation must be repeated several times until some criteria indicate lack of further progress [15,16]. The fitting methods based on linearization typically involve two levels of iterations. The GLMM is first approximated by a linear mixed-effects model based on current values of the covariance parameter estimates, and the resulting linear mixed-effects model is then fit, which is itself an iterative process. On convergence, the new parameter estimates are used to update the linearization, which results in a new linear mixed-effects model. The iterative process terminates when the difference in parameter estimates between successive linear mixed model fits falls within a specified tolerance level. Note that the Laplace approximation is a special case of the Gauss-Hermite quadrature discussed below, when there exists only one quadrature point.

The advantages of the linearization approach include a relatively simple form of the linearized model that typically can be fit based only on the mean and variance in the linearized form. This approach can fit models with a large number of random effects, crossed random effects, multiple types of subjects, and even correlated *y _{it}*’s after conditioning on

In contrast, the integral approximation approach aims to directly approximate the log-likelihood in (3) and maximize the approximated function [13,14]. Various techniques have been proposed to compute the approximation including Newton and Gauss-Hermite quadratures, Monte Carlo integration, and Markov Chain Monte Carlo. The advantage of this alternative is to provide an actual objective function for optimization, albeit it is an approximated log-likelihood. Nonetheless, this enables likelihood ratio tests and likelihood-based fit statistics. Further, unlike the linearization approach whose approximation accuracy and thus the asymptotic properties of the estimates are limited by the type of models fit (e.g., linear, logistic, etc.), the approximation to the log-likelihood under this approach can be improved to any degree by increasing the precision of numerical integration, at least in principle (see Section 4.1 for such a dependent relationship in some procedures from simulated data). Thus, algorithms based on the integral approximation are expected to provide better estimates than their linearization-based alternatives. In particular, since they can get arbitrarily close to the MLE as the precision of numerical integration increases, estimates obtained under this approach have similar nice large sample properties such as consistency and asymptotic normality and efficiency. One drawback of the integral approximation is the difficulty in getting a good approximation when there are many random effects and/or nested random effects in the model. In Section 4, we compare the performance of the different approaches as implemented in the two most popular software, SAS and R, using simulated data.

In this section, we discuss major procedures in SAS and R for inference about the GLMM model in (2).

Two major procedures in SAS for fitting GLMM are GLIMMIX and NLMIXED. The primary difference between the two lies in the estimation approaches employed. While GLIMMIX implements the linearization approach based on the Laplace approximation, NLMIXED employs the integral approximation to the log-likelihood to provide inference. NLMIXED approximates the log-likelihood by integral approximation through an adaptive Gaussian quadrature when carrying out the dual quasi-Newton optimization. The objective function for the NLMIXED procedure is the marginal log likelihood in (2) obtained by integrating out the random effects from the joint distribution of responses and random effects using quadrature techniques. As noted earlier, this approach is expected to yield estimates that can be made arbitrarily close to the MLE as the approximation precision to the log-likelihood improves. The NLMIXED procedure not only generates true log-likelihood fit statistics that can be used to compare nested models, but also permits greater flexibility to accommodate user-defined likelihood functions to implement other statistical models. A major limitation of NLMIXED is that the *y _{it}*’s in (2) cannot be correlated after controlling for

GLIMMIX, on the other hand, fits GLMM based on the approach of linearization [18]. This allows multiple random effects, nested and crossed random effects, multiple cluster types, and within our context, a random structure to accommodate correlated *y _{it}*’s in (2) even after controlling for

Note that initial parameter values are specified differently between the two procedures. Initial values must be manually set for NLMIXED prior to running the procedure. Within our context of binary outcomes, such values may be generated by fitting the data first using generalized estimating equations [4,9] via PROC GENMOD. In contrast, GLIMMIX expects no such user input and uses a double iteration scheme to generate initial values from an iteratively derived approximated linear mixed-effects model.

There are also multiple procedures (or packages as they are called in R) available for fitting GLMM in R. One popular package is lme4 (version 0.99875-8 was used in the simulation study), which implements the Gauss-Hermite quadrature to approximate the log-likelihood using numerical integration. It defaults to the Laplace approximation when only one quadrature point is used (the default option of lme4) [22]. Another available package is ZELIG, which was proposed to serve as a common framework for statistical analysis and software development built around the R language [23]. For binary responses, this function uses the same computing engine as in lme4 to provide estimates, and thus yields quite similar results as the latter package. A third option is the glmmML package [18]. Like lme4 and ZELIG, glmmML offers integral approximations to the log-likelihood using the Gauss-Hermite quadrature, in addition to the Laplace approximation. Thus, like its SAS counterpart NLMIXED, lme4, ZELIG and glmmML can generate true log-likelihood fit statistics to provide improved inference about model parameters. For small samples, the Wald statistic based on the MLE may not provide reliable inference as the sampling distribution is likely to be poorly approximated by a (multivariate) normal distribution. The likelihood statistic, however, may still provide good inference since it is computed based on the height rather than the entire distribution of the log-likelihood statistic. As in the case of NLMIXED, a primary limitation of the integral approximation approach is that the *y _{it}*’s in (2) must be uncorrelated after controlling for

We illustrate our considerations with simulated as well as real study data. We investigate the performance of each procedure/package within R and SAS so we can compare the accuracy of estimates and inference across the different procedures/packages. All the codes for the simulation and real study data are available from the authors upon request. In all the examples, we set the type I error level at α = 0.05. We start with simulated data.

We simulated data from a GLMM model with a single explanatory variable within a longitudinal data setting containing 3 assessments. To evaluate the performance of the procedures, we repeated the analysis for a range of sample size, *n* = 50, 100, 500, representing small, moderate and large samples. To obtain reliable estimates of model parameters and type I errors, we perform 1, 000 Monte Carlo (MC) replications for each simulation run.

We simulated the explanatory variable *x _{i}* and the response

$$\begin{array}{c}\hfill {y}_{\mathit{\text{it}}}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{x}_{i},{\mathbf{b}}_{i}\phantom{\rule{thinmathspace}{0ex}}~\text{Bernoulli}({\mu}_{\mathit{\text{it}}}),\text{logit}({\mu}_{\mathit{\text{it}}})={\beta}_{0}+{b}_{i0}+{x}_{i}\phantom{\rule{thinmathspace}{0ex}}({\beta}_{1}+{b}_{i1}),\hfill \\ \hfill {\mathbf{b}}_{i}=\left(\begin{array}{c}\hfill {b}_{i0}\hfill \\ \hfill {b}_{i1}\hfill \end{array}\right)\phantom{\rule{thinmathspace}{0ex}}~\phantom{\rule{thinmathspace}{0ex}}N\phantom{\rule{thinmathspace}{0ex}}\left(\left(\begin{array}{c}\hfill 0\hfill \\ \hfill 0\hfill \end{array}\right),{\tau}^{2}\phantom{\rule{thinmathspace}{0ex}}\left(\begin{array}{cc}\hfill 1\hfill & \hfill 0.25\hfill \\ \hfill 0.25\hfill & \hfill 1\hfill \end{array}\right)\right),\text{}{x}_{\mathit{\text{it}}}\phantom{\rule{thinmathspace}{0ex}}~\phantom{\rule{thinmathspace}{0ex}}N\phantom{\rule{thinmathspace}{0ex}}(0,1),\text{}t=1,2,3,\hfill \end{array}$$

(4)

where β_{0} = β_{1} = 1 and τ = 0.001, 0.5 and 2. For τ = 0.001, the within-subject correlation was very small and thus negligible, making the *y _{it}* conditional on

We considered the hypothesis, *H*_{0} : β_{1} = 1 vs. *H _{a}* : β

Shown in Table 1, ,2,2, and and33 are the estimates of model parameters **β** and associated standard errors, averaged over 1,000 MC replications, and the type I error rates based on the different procedures (packages) in SAS (R) for the three sample sizes considered. For SAS GLIMMIX, both maximum likelihood (ML) and restricted maximum likelihood (REML) estimates were reported. For R, results were reported for both the linearization (labeled as "Laplace") and integral approximation (labeled as "Gauss-Hermite") approaches, with the latter results based on 20 quadrature points.

Estimates of parameters, standard errors, and type I errors for the simulation model with *n* = 50 using different prodedures/packages.

Estimates of parameters, standard errors, and type I errors for the simulation model with *n* = 100 using different prodedures/packages.

Estimates of parameters, standard errors, and type I errors for the simulation model with *n* = 500 using different prodedures/packages.

As expected, for small τ = 0.001, all the procedures and packages gave similar estimates of **β** and standard errors across the sample sizes. As τ increased to 0.5 and 2, results for the estimates and standard errors become quite discrepant across the different procedures (packages). Within SAS, there was downward bias in both the estimates of **β** and standard errors by GLIMMIX; the amount of bias reached almost 50% at τ = 2. In both cases of τ, NLMIXED provided good estimates.

As in the case of SAS, estimates of **β** and standard errors based on R packages began to differ as τ increased. Results from lme4 and ZELIG were closer to each other than those from glmmML, which is not surprising, given that lme4 and ZELIG were based on the same computing algorithms as noted earlier. Estimates of **β** and standard errors from glmmML were downwardly biased, though the patterns of bias were more difficult to decipher for lme4 and ZELIG. The integral approach based on the Gauss-Hermite approximation seemed to reduce the amount of bias in the estimated **β**’s and the standard errors, especially for lme4 and ZELIG. It is interesting to note that although the Gauss-Hermite approximation in these packages employs the same integral approximation approach to the log-likelihood as its SAS NLMIXED counterpart, it does not seem to provide more accurate estimates (than its Laplace counterpart) for these packages.

Estimates of type I error rates were quite variable across the different procedures and functions as well. For τ = 0.001, all type I error rates were closer to the nominal value 0.05, especially for *n* = 500, though those from SAS GLIMMIX and R glmmML has small downward bias, while the ones from lme4 and ZELIG showed some upward bias. For τ = 0.5 and 2, all type I error rates increased substantially, except for those from NLMIXED that remained close to 0.05. It is quite clear that with the exception of NLMIXED, none of the procedures would provide acceptable type I error levels for inference about the null *H*_{0} : β_{1} = 1 when the within-subjects correlation is 0.2 or greater.

Note that during the review of this manuscript, one reviewer suggested that we reran the simulation model within the same setting, except for increasing the magnitude of the β’s, to see if larger (absolute) values of β’s would play any role in the bias. We found that the biases and type I errors did seem to depend on the magnitude of β’s. For example, shown in Tables 4, ,55 and and66 are the estimates, standard errors and type I errors for three different set values of the β’s, β_{0} = β_{1} = 0, 1.5 and 2. As seen, the bias in the type I error does increase, as the values of β’s move further away from 0. The bias in the type I error showed a similar pattern of increase when β’s were shifted away from 0 to the left.

Estimates of parameters, standard errors, and type I errors for the revised simulation model with β_{0} = β_{1} = 0 and *n* = 500 using different prodedures/packages.

Estimates of parameters, standard errors, and type I errors for the revised simulation model with β_{0} = β_{1} = 1.5 and *n* = 500 using different prodedures/packages.

Estimates of parameters, standard errors, and type I errors for the revised simulation model with β_{0} = β_{1} = 2 and *n* = 500 using different prodedures/packages.

Note also that the standard error estimates are generally smaller for the SAS GLIMMIX and the R procedures compared to the SAS NLMIXED, especially for GLIMMIX. However, smaller standard error estimates may not be a good thing when parameter estimates are highly biased. As estimates from some procedures such as those based on the Laplace approximation have unknown asymptotic properties, it is more important to examine the bias in the parameter estimates and in the type I error. Highly biased estimates with smaller standard estimates are the worst combination since it not only leads to wrong conclusions, but does so with greater statistical power.

We also applied the different procedures/packages to a multi-center randomized clinical trial comparing two treatments for a respiratory illness [24]. There were 111 subjects randomized to active (54) or placebo (57) treatment across two sites. The primary outcome was respiratory status (0=poor, 1=good), which was assessed at 4 visits. The study data was analyzed by Davis using distribution-free (or semi-parametric) longitudinal methods [24]. In addition to the treatment conditions, he also included baseline respiratory status, study site, sex and age as covariates. There was no missing data in the outcome, the predictor (treatment conditions), and the covariates. Our reanalysis using GLMM included the same predictor and covariates as the fixed effects, and a random intercept.

Shown in Table 7 are the estimates, standard errors and p-values from fitting the GLMM using the different procedures/packages. The results from the R packages lme4 and glmmML were based on the Gauss-Hermite approach for integral approximation with 3 quadrature points, since there is no significant improvement using more quadrature points for these procedures (see Discussion for more details). Since ZELIG yielded almost identical results as lme4, we only reported the results for the latter package. Likewise, we only reported the results from the REML option of SAS GLIMMIX, since they showed slightly less bias than those from the ML method as observed in the simulation study.

Estimates of parameters, standard errors, and type I errors for the the Respiratory Clinical Data using different prodedures/packages.

The estimates of **β** and standard errors showed similar patterns as those from the simulation study. In particular, the standard errors from SAS GLIMMIX and R glmmML all had almost 50% downward bias (with NLMIXED as a reference standard), consistent with the amount of bias observed for these procedures in the simulation study. The standard errors from lme4 also seemed to have some downward bias. The differences in the p-values between NLMIXED and the two R packages are larger than those between NLMIXED and GLIMMIX. The downward bias in both the estimates and standard errors by GLIMMIX may have helped reduce the amount of upward bias in the p-values yielded by this SAS procedure.

We examined the performances of procedures/packages for fitting generalized linear mixed-effects models (GLMM) for correlated binary responses using the popular SAS and R statistical software packages. Unlike the special case of a linear mixed-effects model, maximum likelihood estimates (MLE) are difficult to compute and approximate estimates have been proposed and implemented in most statistical software packages such as SAS and R. As some of these estimates have unknown large sample properties, it is important to evaluate their performances to determine if they can provide reasonably good estimates and inference for practical purposes.

We reviewed two popular approaches for providing such approximate estimates as implemented in SAS and R, and discussed the pros and cons between the two approaches. Although the linearization approach is flexible, and can accommodate a complex structure of random effects, it may not provide reliable estimates, since it is difficult to theoretically characterize the properties of such estimates. In particular, it is unclear whether such estimates are even consistent. In comparison, the integral-based approximation approach yields estimates that approximate the MLE, thereby providing not only consistent, but asymptotically normal estimates. The superior performance of the latter approach as demonstrated by the SAS NLMIXED in our simulation study confirms the theoretical behavior of the estimates from this approach. We are a bit surprised by the performance of the R lme4 and glmmML packages, as neither seems to yield comparable inference as its SAS NLMIXED counterpart given that it implements the same integral approximation approach, albeit using different algorithms.

To further investigate the robustness of inference against non-normal random effects, we also changed the bivariate normal random effects **b**_{i} to a shifted bivariate-Gamma with the same variance structure, i.e.,

$$\begin{array}{c}{y}_{\mathit{\text{it}}}\phantom{\rule{thinmathspace}{0ex}}|\phantom{\rule{thinmathspace}{0ex}}{x}_{i},{\mathbf{b}}_{i}\phantom{\rule{thinmathspace}{0ex}}~\text{Bernoulli}({\mu}_{\mathit{\text{it}}}),\text{logit}({\mu}_{\mathit{\text{it}}})={\beta}_{0}+{b}_{i0}+{x}_{i}\phantom{\rule{thinmathspace}{0ex}}({\beta}_{1}+{b}_{i1}),\hfill \\ {\mathbf{b}}_{i}=\left(\begin{array}{c}\hfill {b}_{i0}\hfill \\ \hfill {b}_{i1}\hfill \end{array}\right)\phantom{\rule{thinmathspace}{0ex}}~\phantom{\rule{thinmathspace}{0ex}}\mathit{\text{SG}}\phantom{\rule{thinmathspace}{0ex}}\left(\left(\begin{array}{c}\hfill 0\hfill \\ \hfill 0\hfill \end{array}\right),{\tau}^{2}\phantom{\rule{thinmathspace}{0ex}}\left(\begin{array}{cc}\hfill 1\hfill & \hfill 0.25\hfill \\ \hfill 0.25\hfill & \hfill 1\hfill \end{array}\right)\right),\text{}t=1,2,3,\hfill \end{array}$$

(5)

where *SG*(**μ**, **Φ**) denotes a Shifted bivariate Gamma with mean **μ** and variance **Φ** [25].

Shown in Table 8 are the simulation results from this revised model for τ = 0.5 and 2 based on a sample size *n* = 100 under 1,000 MC replications. Again, we used 3 quadrature points for the R procedures to save computing time. Although parameter estimates and type I errors were less affected for τ = 0.5, a large bias was obvious for both when τ = 2, even for NLMIXED. Along with the findings for normal random effects, these results seem to indicate that GLMM for binary outcomes is not robust against misspecification of random effects.

Estimates of parameters, standard errors, and type I errors from the revised simulation model with random effects following a shifted bivariate Gamma and *n* = 100 using different prodedures/packages.

Judging from the results of our simulation study, we conclude that the SAS NLMIXED procedure provides the most accurate parameter estimates and inference (type I error) under correct model assumptions. Among the remaining procedures, estimates are generally biased, especially the type I errors, with the level of accuracy depending not so much on the number of quadrature points (for the procedures based on the integral approximation), but rather on the magnitude of the values of the parameters. For example, we varied the number of for the latter in the range of 3 and 20 for the R procedures, but failed to observe any improvement in type I errors when increasing the quadrature points from 3 to 20, except for a significant increase in computing time, with the latter increased by 15 times when using 20 over 3 quadrature points. Thus, for all practical purposes, it seems that 3 or 4 quadrature points is sufficient when using these procedures. However, as noted in Section 2.2, we should limit NLMIXED to models with a relatively small number of random effects to ensure accurate intergral approximation to the log-likelihood.

Unexpectedly, though quite interestingly as well, the levels of accuracy of these remaining procedures seem to depend on the magnitude of the values of the parameters, with larger (absolute) values yielding higher biases. Further investigations are needed to determine the sources of such a relationship so improvement may be made to these procedures. If these procedures have to be used in a given application for reasons such as those discussed in Section 3, we recommend the use of bootstrapped standard errors to improve inference reliability [26].

This research is supported in part by NIH grant U54 RR023480 and UL1-RR024160.

**Conflict of Interests Statement**

The authors have declared no conflict of interest.

1. Breslow NE, Clayton DC. Approximate inference in generalized linear mixed models. Journal of the American Statistical Association. 1993;88:9–25.

2. SAS Institute Inc. Enhancements in SAS/STAT 9.2 Software. Cary, NC: SAS Institute Inc.; 2008.

3. R Development Core Team. Vienna, Austria: R Foundation for Statistical Computing; 2010. R: A language and environment for statistical computing. ISBN 3-900051-07-0, URL http://www.R-project.org.

4. McCullagh P, Nelder JA. Generalized Linear Models. 2nd edition. London: Chapman and Hall; 1989.

5. Kowalski J, Tu XM. Modern Applied U Statistics. New York: Wiley; 2008.

6. Gilliland D, Schabenberger O. Limits on Pairwise Association for Equi-Correlated Binary Variables. Journal of Applied Statistical Sciences. 2001;10:279–285.

7. Bahadur RR. Studies in Item Analysis and Prediction. Stanford, CA: Stanford University Press; 1961. “A Representation of the Joint Distribution of Responses to n Dichotomous Items”

8. Prentice RL. Correlated binary regression with covariates specific to each binary observation. Biometrics. 1988;44:321–327. [PubMed]

9. Shults J, Sun W, Tu XM, Kim H, Amsterdam J, Hilbe JM, Ten-Have T. A comparison of several approaches for choosing between working correlation structures in generalized estimating equation analysis of longitudinal binary data. Statistics in Medicine. 2009;28:2338–2355. [PubMed]

10. Diggle PJ, Heagerty PJ, Liang KY, Zeger SL. Analysis of Longitudinal Data. 2nd. edition. New York: Oxford University Press; 2002.

11. Davidian M, Gallant RA. The nonlinear mixed effects model with a smooth random effects density. Biometrika. 1993;80:475–488.

12. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis. New York: Wiley; 2004.

13. Demidenko E. Mixed Models: Theory and Applications. Hoboken, New Jersey: Wiley; 2004.

14. Hedeker D, Gibbons RD. Longitudinal Data Analysis. New York: Wiley; 2006.

15. Pinheiro JC, Bates DM. Mixed-effects models in S and S-PLUS. New York: Springer; 2000.

16. Schabenberger O, Gregoire TG. Population-averaged and subject-specific approaches for clustered categorical data. Journal of Statistical Computation and Simulation. 1996;54:231–253.

17. Lindstrom MJ, Bates DM. Newton–Raphson and EM algorithms for linear mixed-effects models for repeated-measures data. Journal of the American Statistical Association. 1988;83:1014–1022.

18. Broström G. Technical Reports. Umeå, Sweden: Department of Statistics, Umeå University; 2003. Generalized linear models with random intercepts. SE-90187.

19. Wolfinger R, O’Connell M. Generalized linear mixed models: a pseudolikelihood approach. Journal of Statistical Compututation and Simulation. 1993;48:233–243.

20. Flom PL, McMahon JM, Pouget ER. Using PROC NLMIXED and PROC GLMMIX to analyze dyadic data with binary outcomes. NESUG. 2006 URL http://www.nesug.org/proceedings/nesug06/an/da08.pdf.

21. Lu N, Tang W, He H, Yu Q, Crits-Christoph P, Zhang H, Tu XM. On the Impact of Parametric Assumptions and Robust Alternatives for Longitudinal Data Analysis. Biometrical Journal. 2009;51:627–643. [PMC free article] [PubMed]

22. Bates D. Fitting mixed-effects models using the lme4 packages in R. 2007 URL http://www.stat.wisc.edu/~bates/IMPS2008/lme4D.pdf.

23. Imai K, King G, Lau O. Toward a common framework for statistical analysis and development. Journal of Computational and Graphical Statistics. 2008;17:892–913.

24. Davis CS. Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials. Statistics in Medicine. 1991;10:1959–1980. [PubMed]

25. Moran PAP. Statistical inference with bivariate gamma distributions. Biometrika. 1969;56:627–634.

26. Field CA, Welsh AH. Bootstrapping clustered data. Journal of the Royal Statistical Society: Series B. 2007;69:369–390.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |