Comput Stat Data Anal. 2010 March 1; 54(3): 790–801. doi:10.1016/j.csda.2009.01.016. PMCID: PMC2880516

Impact of non-normal random effects on inference by multiple imputation: A simulation assessment


Abstract

Multivariate extensions of well-known linear mixed-effects models have been increasingly utilized in inference by multiple imputation in the analysis of multilevel incomplete data. The normality assumption for the underlying error terms and random effects plays a crucial role in simulating the posterior predictive distribution from which the multiple imputations are drawn. The plausibility of this normality assumption on the subject-specific random effects is assessed. Specifically, the performance of multiple imputation under a multivariate linear mixed-effects model is investigated on a diverse set of incomplete data sets simulated under varying distributional characteristics. Under moderate amounts of missing data, the simulation study confirms that the underlying model leads to a well-calibrated procedure with negligible biases and actual coverage rates close to nominal rates in estimates of the regression coefficients. The estimation quality of the random-effect variance and of association measures, however, is negatively affected by both the misspecification of the random-effect distribution and the number of incompletely observed variables. Some of the adverse impacts include lower coverage rates and increased biases.

1. Introduction

The existence of missing data is inevitable in many settings. Among them are multilevel applications where missing values occur arbitrarily in observational units nested within naturally occurring groups, such as patients nested within hospitals. Because most statistical methods for analyzing data under such circumstances are not designed to handle arbitrary missingness (some model-fitting techniques can handle longitudinal data with drop-out under missing at random), missing values are generally recognized as a complicating factor in the analyses. Indeed, identifying appropriate analytical tools to draw the desired inferences in the presence of missing data is often a daunting task because such tools need to account for the reasons for missingness (i.e. the missingness mechanism) as well as the key features of the multilevel design, such as the levels of variation. Ignoring these essential characteristics may result in misleading results such as biased parameter estimates (e.g. underestimated variances) and reduced statistical power (Rubin, 1976, 1987; Schafer, 1997).

One of the most popular methods in conducting inference from incomplete data applications is multiple imputation (MI) (Rubin, 1987). MI is a simulation-based inferential tool operating on M > 1 “completed” data sets, where the missing values are replaced by random draws from their respective predictive distributions (e.g. posterior predictive distribution of missing data). These M versions of completed data are then analyzed by standard complete-data methods and the results are combined into a single inferential statement using rules to yield estimates, standard errors and p-values that formally incorporate the missing data uncertainty into the modeling process (Rubin, 1987). The key ideas and the advantages of MI are given by Rubin (1987) and Schafer (1997).

The crucial step in MI is filling in the missing data by drawing from the posterior predictive distribution of the missing data (i.e. the conditional distribution of the missing data, given the observed data and the unknown parameters). This typically involves positing a parametric model for the data and using it to derive this conditional distribution. In multilevel data applications, multivariate extensions of mixed-effects models based on normality have often been perceived as a natural choice because (1) they reflect the design features; and (2) the conditional distribution of the missing data given the observed data is easily tractable (Schafer and Yucel, 2002). There are two software products implementing MI-based inference in multilevel data applications. The first is freeware available as an R (R Development Core Team, 2007) package called pan (Schafer and Yucel, 2002), and the second is a macro called mimacro (Carpenter and Goldstein, 2004) which operates within the commercially available software package MLwiN (Rasbash et al., 2006). Because these are extensively used, it is important to examine how well they perform when the underlying distributional assumption on the random effects is not met. To the best of our knowledge, despite their growing use in the analyses of multilevel incomplete data, the plausibility and impact of the normality assumption on the random effects have not been thoroughly evaluated. An evaluation using simulated examples that represent deviations from the standard normality assumption, such as skewness, excess kurtosis, heavy tails, and flatness of the density, all of which clearly violate the characteristics of normality, is provided in the current study. The performance of MI under varying distributional assumptions and sample sizes is also examined. Throughout the computations, the R package pan (Schafer and Yucel, 2002) was used.

Section 2 gives a brief overview of the model used to impute the missing values for continuous or nearly-continuous variables and provides a discussion of the crucial assumptions of MI inference. Section 3 then summarizes how multiple imputations are created under a multivariate generalization of mixed-effects models. Section 4 outlines and discusses the findings of the simulations. Finally, Section 5 concludes the manuscript with a discussion of the implications of these findings.

2. Models used in MI inference

2.1. Imputation model

Below, a brief review is given of the imputation model commonly used to jointly impute missing values for a set of continuous random variables under a multivariate linear mixed-effects model. This model assumes two levels in the data collection, e.g. repeated measurements from individuals, or secondary sampling units (e.g. US residents) nested within primary sampling units (e.g. US Census blocks). Allowing higher levels of nesting is possible by incorporating the corresponding random effects in the imputation model (1) below, which has been implemented in MLwiN (Browne, 2003, ch. 17; Carpenter and Goldstein, 2004). We focus on the two-level model because it simplifies the assessment of the impact of departures from normality and it is the underlying model of the R package pan.

Let yi denote an ni × r matrix of multivariate responses for sample unit i, i = 1, 2, …, m, where each row of yi is a joint realization of a set of r variables subject to missing values. In longitudinal studies, for example, the rows correspond to the measurements of an individual at each point in time, and in clustered data, the rows correspond to the individuals within a cluster. From the perspective of imputing missing values, the variables subject to missingness are included in yi, regardless of whether they are classified as predictors (covariates) or responses. Often there exists auxiliary information that should be included either for post-imputation analyses or to better capture the reasons for missingness; we denote this auxiliary information by Xi for the ith unit or cluster. A multivariate extension of a linear mixed-effects model with conventional Gaussian error structures on the residuals and random effects is used as the imputation model for the missing values in yi:

yi = Xiβ + Zibi + εi,   i = 1, …, m,   (1)
where Xi (ni × p) and Zi (ni × q) are known covariate matrices, β (p × r) is a matrix of regression coefficients (fixed effects) common to all units, and bi (q × r) is a matrix of random coefficients representing the deviations of cluster i from the overall mean structure. In a fully parametric mixed-effects model framework, a normal probability distribution is often imposed on these deviations: vec(bi) ~ N(0, Ψ) (here the vec operator vectorizes a matrix by stacking its columns). Model (1) assumes that the rows of the residual matrix εi are independently distributed as N(0, Σ), or alternatively, vec(εi) ~ N(0, Σ ⊗ Ini).
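To make the notation concrete, a minimal R sketch of simulating one cluster's responses under model (1) is given below; all dimensions and parameter values are illustrative assumptions, not values from the paper.

    # Minimal sketch: simulate one cluster's responses under model (1).
    # All dimensions and parameter values below are illustrative.
    library(mvtnorm)   # for rmvnorm()

    set.seed(1)
    ni <- 5; r <- 2; p <- 2; q <- 1                # rows, responses, fixed/random covariates
    Xi <- cbind(1, rnorm(ni))                      # ni x p fixed-effects design
    Zi <- matrix(1, ni, q)                         # ni x q random-effects design (random intercepts)
    beta  <- matrix(c(1, 0.5, -1, 0.2), p, r)      # p x r fixed effects
    Psi   <- diag(c(3, 1.5))                       # covariance of vec(b_i), a (q*r) x (q*r) matrix
    Sigma <- matrix(c(0.9, 0.3, 0.3, 0.4), r, r)   # residual covariance across the r responses

    bi  <- matrix(rmvnorm(1, sigma = Psi), q, r)   # vec(b_i) ~ N(0, Psi), reshaped to q x r
    # vec(eps_i) ~ N(0, Sigma (x) I_ni): equivalently, rows of eps_i are i.i.d. N(0, Sigma)
    eps <- rmvnorm(ni, sigma = Sigma)              # ni x r residual matrix
    yi  <- Xi %*% beta + Zi %*% bi + eps           # ni x r response matrix for cluster i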

Following the standard notation in the missing data literature, the complete data Y is comprised of the observed data Yobs and Ymis: Y = (Yobs, Ymis). Note that Yobs = (y1(obs), y2(obs), …, ym(obs)) and Ymis = (y1(mis), y2(mis), …, ym(mis)) denote all the observed and missing responses. Further, θ includes all the unknown parameters β, Σ, Ψ.

2.2. Models for missingness mechanism

Explicit or implicit assumptions are made about the missing-data mechanisms in all missing-data methods. To set the notation for the discussion of missingness mechanisms, let R denote the set of missing-value indicators; note that R has the same dimension as Y, and it is always observed. Each element of R takes the value of 0 or 1 depending on whether the corresponding element in Y is missing or observed, respectively. Similar to Y, R can be seen as a random variable; and the conditional distribution of R given Y depends on a set of parameters, say γ. The small letters r and yobs denote the realized values of R and Yobs.

Most tools available to the practitioners of MI (via software packages such as SAS PROC MIXED and the R packages norm, cat, mix and pan) assume missing at random (MAR) as the missingness mechanism. This assumption simply means that the missingness probability may depend on the observed data but not on the missing data over the conditional distribution of R given Yobs. More formally, P(R|Yobs, Ymis, γ) = P(R|Yobs, γ). Some misconceptions among practitioners exist due to the name MAR. The missing values do not occur at random under MAR; when they do, the mechanism is, in fact, missing completely at random (MCAR). Under MCAR, the missingness probabilities are independent of both Yobs and Ymis: P(R|Yobs, Ymis, γ) = P(R|γ). When the missingness probabilities depend on Ymis, the missingness mechanism is called missing not at random (MNAR). Under MNAR, one must posit a model for the complete data as well as for R. The resulting inferences are usually very sensitive to these model assumptions (see the detailed discussion by Schafer and Graham (2002) for more information). Another important concept is the “ignorability” of the missingness mechanism, which is often seen as an implied condition once MAR is assumed. Specifically, ignorability holds when the mechanism is MAR and the parameters γ and θ are distinct: f(Yobs, R|θ, γ) = f(Yobs|θ)f(R|γ) (see Little and Rubin (2002) and Schafer (1997) for more details).

3. Multiple imputation under a multivariate linear mixed-effects model (MLMM)

A brief overview is given of MI and how it is conducted under a multivariate linear mixed-effects model (MLMM) for two-level incomplete data (see Schafer and Yucel (2002) for computational details). MI is a Monte Carlo simulation-based technique in which the missing values are replaced by a moderate number (M > 1 and usually less than 10) of simulated values. How these simulated values are drawn relates to the distributional assumptions and the missingness mechanism (Schafer and Yucel, 2002; Browne, 2003, ch. 17; Carpenter and Goldstein, 2004).

The principal idea of MI is that it treats the missing data as an explicit source of random variability which is incorporated into the computation of standard errors. The process of creating imputations, analyzing the imputed data sets and combining the results is a Monte Carlo version of averaging the statistical results over the predictive distribution of the missing data, ∫ P(θ|Yobs, Ymis)P(Ymis|Yobs)dYmis. In applications with a moderate level of missingness, reasonably well-calibrated results can be obtained with M ≤ 10. Once the imputations have been created, the M completed data sets may be analyzed with complete-data techniques, resulting in M sets of estimates and associated standard errors, which are then combined by the rules described by Rubin (1987).
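As a concrete illustration of the combining step, a generic R sketch of Rubin's (1987) rules for a scalar estimand follows; the function name and the example numbers are hypothetical.

    # Sketch of Rubin's (1987) combining rules for a scalar estimand, given M
    # completed-data estimates `qhat` and their squared standard errors `u`.
    pool_mi <- function(qhat, u) {
      M    <- length(qhat)
      qbar <- mean(qhat)                                 # combined point estimate
      ubar <- mean(u)                                    # within-imputation variance
      b    <- var(qhat)                                  # between-imputation variance
      t    <- ubar + (1 + 1/M) * b                       # total variance
      df   <- (M - 1) * (1 + ubar / ((1 + 1/M) * b))^2   # Rubin's degrees of freedom
      c(estimate = qbar, se = sqrt(t), df = df)
    }

    # Example with hypothetical estimates from M = 10 imputed data sets:
    pool_mi(qhat = c(2.6, 2.8, 2.7, 2.9, 2.65, 2.75, 2.7, 2.8, 2.72, 2.68),
            u    = rep(0.04, 10))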

The role of the imputation model in multilevel data, such as those arising in longitudinal or clustered designs, is to preserve the relationships among variables (whether subject to missingness or not) and the correlations among observations from the same subject or cluster. Often the meaning or interpretation of its parameters is not crucial. The key importance of the imputation model lies in its ability to predict and simulate the missing observations and to capture, as much as possible, the reasons for missingness (so as to make MAR plausible). For these reasons, the MLMM given in (1) has found common use among imputers. While other models of a similar nature are also possible, we focus on model (1) as it is the only one implemented in the software products noted above (the R package pan and MLwiN).

Multiple imputations under model (1) are M independent draws Ymis(1), …, Ymis(M) from a posterior predictive distribution for the missing data, P(Ymis|Yobs) = ∫ P(Ymis|Yobs, θ) P(θ|Yobs)dθ, where P(θ|Yobs) is the observed-data posterior density, which is proportional to the product of a prior density π(θ) and the observed-data likelihood function L(θ|Yobs) = ∫ L(θ|Y)dYmis. As in most missing-data problems with arbitrary missingness patterns, the posterior distribution P(Ymis|Yobs) cannot be simulated directly. Markov chain Monte Carlo (MCMC) techniques facilitate the creation of random draws of Ymis from P(Ymis|Yobs). In MCMC, one generates a sequence of dependent random variates whose distribution converges to the desired target, in this case the posterior predictive distribution of the missing data.

An MCMC procedure called the Gibbs sampler is the underlying iterative simulation algorithm used to draw the missing values from P(Ymis|Yobs). Specifically, the current versions of the unknown parameters θ(t) = (β(t), Σ(t), Ψ(t)) and missing data Ymis(t) are updated in three steps: (1) draw bi(t+1) ~ P(bi | Yobs, Ymis(t), θ(t)), independently for i = 1, …, m; (2) draw θ(t+1) ~ P(θ | Yobs, Ymis(t), B(t+1)); and finally, (3) draw yi(mis)(t+1) ~ P(yi(mis) | Yobs, B(t+1), θ(t+1)), for i = 1, …, m, where B = (b1, …, bm). Given the starting values θ(0) and Ymis(0), these steps define one cycle of an MCMC procedure, the Gibbs sampler. Executing the cycle repeatedly creates sequences {θ(1), θ(2), …} and {Ymis(1), Ymis(2), …} whose limiting distributions are P(θ|Yobs) and P(Ymis|Yobs), respectively.

To reflect the uncertainty in the unknown parameters θ, the Bayesian paradigm has often been used in the preceding MCMC to create multiply imputed data sets. Under the model given in (1), independent inverted Wishart priors are used on the variance components Σ−1 and Ψ−1: Σ−1 ~ W(ν1, Λ1) and Ψ−1 ~ W(ν2, Λ2), where W(ν, Λ) denotes a Wishart variate with ν > 0 degrees of freedom and mean νΛ > 0. It is helpful to regard ν1−1Λ1−1 and ν2−1Λ2−1 as prior guesses for Σ and Ψ with confidence equivalent to ν1 and ν2 degrees of freedom, respectively. For β, an improper uniform prior ‘density’ over R^(pr) is often employed. In practice, determination of these hyperparameters can be difficult. Here, we used the R package mlmmm (Yucel, 2007), which computes maximum likelihood estimates of the model (1) parameters, to obtain the prior guesses for Ψ and Σ and the starting values β(0), Σ(0) and Ψ(0) for the Gibbs sampler.

Computational details of the algorithm derived under these priors, the Gaussian assumptions on bi and εi, and model (1) are given by Schafer and Yucel (2002). An important question is how to monitor convergence of the underlying Gibbs sampler. As noted by many, including Gelman et al. (2003) and Schafer (1997), there are two general issues with convergence: (1) whether the simulations are representative of the predictive distribution of missing data P(Ymis|Yobs), and (2) whether the successive draws are significantly correlated. These two issues, individually or collectively, can seriously damage the overall quality of inferences in terms of accuracy and representativeness of the target distribution. Simple investigation of time-series plots and autocorrelation function plots is often indicative of the interdependency of successive draws. These plots, along with mode-finding algorithms such as EM, can be used to assess convergence behavior and to determine the number of iterations needed to achieve approximately independent samples from the desired distributions P(Ymis|Yobs) and P(θ|Yobs). Running multiple independent chains from common starting values and checking whether they converge to the same P(Ymis|Yobs) is also a powerful and practical tool to monitor convergence. More specific details are provided in Section 4.1.5.
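For instance, if the saved parameter series from the Gibbs sampler are collected in a matrix with one row per cycle and one column per scalar parameter (the object `draws` below is a hypothetical placeholder), trace and autocorrelation plots can be produced with base R:

    # Sketch: convergence diagnostics for saved Gibbs draws.
    # `draws` is a placeholder (cycles x parameters) matrix of scalar series,
    # e.g. elements of beta, Sigma and Psi saved from the sampler.
    draws <- matrix(rnorm(10000 * 3), ncol = 3,
                    dimnames = list(NULL, c("beta11", "sigma11", "psi11")))

    par(mfrow = c(3, 2))                         # one row per parameter: trace plot and ACF
    for (j in colnames(draws)) {
      plot.ts(draws[, j], main = paste("Trace:", j), ylab = j)  # time-series (trace) plot
      acf(draws[, j], main = paste("ACF:", j))                  # sample autocorrelations
    }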

Studies in the multiple imputation literature indicate that inference by MI is quite robust to departures from the model assumptions used to derive the posterior predictive distribution of missing data P(Ymis|Yobs) (Schafer, 1997; Demirtas et al., 2008). Under Gaussian assumptions on the random effects, bi, and the error terms, εi, MLMM-based MI uses simple conjugacy to derive the conditional posterior distributions of θ, the random effects and the missing data, which are all either normal or Wishart-based distributions (Schafer and Yucel, 2002). These conditionals constitute the steps of the Gibbs sampler given above, leading to the posterior predictive distribution of missing data. How sensitive is inference by multiple imputation to the Gaussian assumption on the random effects? This question raises several sub-questions: what are the consequences of imputing under the MLMM when the true distribution has heavier tails than normal, or when it is not symmetric, and so on. Real data often do not conform to normality, and many procedures of statistical practice use normality as an approximation. Tailoring the analytical solutions and computational algorithms to every such departure in the underlying missing-data problem is rarely feasible. Therefore, many of the consumers of multiple imputation rely on the normality assumption when imputing continuous multilevel data.

The next section assesses the viability of the Gaussian assumption in extensive simulation scenarios representing a diverse set of distributional violations of the assumed normality of the random effects. We devise a simulation study that generates the conditions under which the incomplete data were realized while the imputation approach relies on the incompatible normality assumption. This strategy provides a feasible way of evaluating the performance of the MLMM imputation with normal random effects (Demirtas, 2004, 2005).

4. A simulation study

The characteristics of MLMM-based MI when the true data-generating mechanism employs non-normal random effects are evaluated via a simulation study. The simulation design consists of the following steps: data generation, imposing a missingness mechanism, MI inference under a hypothetical analyst’s model, parameter estimation, and evaluation of this estimation.

4.1. Simulation design

4.1.1. Data generation

A bivariate linear mixed-effects model was used to generate the complete data:


where i = 1, 2, …, m, and j = 1, 2, …, ni. The number of clusters (m) and the number of observations per cluster (ni) were varied so that both large- and small-sample behavior could be observed. We allowed m to take values of 100, 50 and 20, and ni to take values of 5 and 10 (the ni are assumed to be the same across clusters).

The error terms εij are assumed to be distributed as N(0, Σ), and our simulations used the values σ11 = 0.9, σ12 = 0.3 and σ22 = 0.4. Subject-specific random effects bi = [bi1 bi2]T were drawn from varying distributions with mean zero to investigate the performance of MI-based inferences under the MLMM with non-normal random effects. Results pertaining to the varying distributions of the random effects are summarized in Section 4.2. Similar results are also seen with non-normal distributions on εij and are available upon request from the authors.

  • Multivariate normal distribution: This is the default assumption under MLMM imputation, and was primarily used for comparative reasons. Random variates, say b, were drawn from N2(0, Σb), with the elements of Σb set as σ²b1 = 3, σ²b2 = 1.5 and σb1,b2 = 0.25. These values were also used in the following distributions.
  • Multivariate t distribution: Multivariate t distribution with ν = 4 was used to observe the behavior of MI under MLMM when the underlying distribution of the random-effects has heavier tails than normal (see Fig. 1). Here the random variates follow the density function of the form:
    Fig. 1
    Bivariate normal and t (with scale parameter 1 and 10 degrees of freedom) distributions.
  • Multivariate Laplace distribution: Multivariate Laplace distribution or double exponential distribution was considered to examine the behavior of MI inferences under distributions with heavier tails than normal (see Fig. 2). The density function is given by,
    Fig. 2
    Bivariate Laplace and uniform distributions.
  • Multivariate uniform distribution: Random bivariate variates from Ud(0, 1) were generated with covariance structure Σb to assess the behavior under a flat density (one way of drawing variates from the four families considered here is sketched after this list).
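A minimal R sketch of one way to draw bivariate random effects under these four scenarios is given below; the Laplace and uniform constructions are illustrative choices, since the paper's exact generators are not reproduced here, and the uniform sketch matches only the marginal variances.

    # Sketch: drawing m bivariate random effects under the four distributional scenarios.
    library(mvtnorm)                              # rmvnorm(), rmvt()

    m       <- 100
    Sigma_b <- matrix(c(3, 0.25, 0.25, 1.5), 2, 2)

    b_norm <- rmvnorm(m, sigma = Sigma_b)                      # multivariate normal
    b_t    <- rmvt(m, sigma = Sigma_b, df = 4)                 # multivariate t with 4 df (Sigma_b as scale)
    # Multivariate Laplace via a standard scale-mixture construction: sqrt(W) * Z, W ~ Exp(1)
    w      <- rexp(m)
    b_lap  <- sqrt(w) * rmvnorm(m, sigma = Sigma_b)
    # Flat density: independent uniforms, centred and rescaled to the marginal variances
    # (the cross-covariance 0.25 is not reproduced in this simple sketch)
    b_unif <- sweep(matrix(runif(2 * m) - 0.5, m, 2), 2, sqrt(12 * diag(Sigma_b)), "*")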

4.1.2. Imposing missingness mechanism

Throughout the simulation study, we considered two general cases: (1) missingness is confined to covariates; and (2) arbitrary missingness is present in both response and covariates of the analyst’s model which is described in Section 4.1.3. To study the first case, we imposed missing values on the covariate Y2 under the missingness mechanism MAR defined by


where rY2 is the missingness indicator for Y2. Conditioning on completely observed y1 makes the MAR assumption (Little and Rubin, 2002), as assumed by MLMM imputation, more plausible. The model stated above results in approximately 45% missingness in the Y2 values. Tables 1 and 2 in the results section pertain to case (1).

Table 1
Summary of the results under random-effects following a multivariate normal and t4. Missingness rate is (approximately) 45% on y2. For each parameter of interest (PI), MI results and complete-case only analysis are given in the first and second rows, ...
Table 2
Summary of the results under random-effects following a multivariate Laplace and Uniform. Missingness rate is (approximately) 45% on y2. Format is the same as Table 1.

Missing values were imposed under two independent intercept-only regression models, leading to MCAR mechanisms (a stronger assumption than MAR), to study arbitrary missingness in both covariates and responses:

log[P(rY1 = 1)/(1 − P(rY1 = 1))] = 0.2,   log[P(rY2 = 1)/(1 − P(rY2 = 1))] = −0.2.

Under these specifications, the rates of missingness in Y1 and Y2 are, on average, 45% and 55%, respectively. Tables 3 and 4 in the results section pertain to case (2).
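A minimal sketch of imposing these mechanisms on a long-format data set is given below; the MAR coefficients are hypothetical placeholders (the paper's MAR equation is not reproduced above), while the MCAR intercepts follow the logit values stated in the text.

    # Sketch: imposing the two missingness mechanisms on a data frame with columns y1, y2.
    # r = 1 means observed; the MAR coefficients a0, a1 are illustrative only.
    expit <- function(x) 1 / (1 + exp(-x))

    impose_mar_y2 <- function(dat, a0 = 0.5, a1 = -1) {   # case (1): MAR, depends on observed y1
      p_obs <- expit(a0 + a1 * dat$y1)
      dat$y2[runif(nrow(dat)) > p_obs] <- NA
      dat
    }

    impose_mcar <- function(dat) {                        # case (2): MCAR on both variables
      dat$y1[runif(nrow(dat)) > expit(0.2)]  <- NA        # about 45% missing in y1
      dat$y2[runif(nrow(dat)) > expit(-0.2)] <- NA        # about 55% missing in y2
      dat
    }

    # Example with a placeholder complete data set:
    set.seed(5)
    dat <- data.frame(y1 = rnorm(500), y2 = rnorm(500))
    colMeans(is.na(impose_mcar(dat)))                     # approximate missingness rates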

Table 3
Summary of the results under random-effects following a multivariate normal and t4. Missingness rates are around 45% and 55% on y1 and y2, respectively. Format is the same as Table 1.
Table 4
Summary of the results under random-effects distributed as Laplace and Uniform. Missingness rates are around 45% and 55% on y1 and y2, respectively. Format is the same as Table 1.

4.1.3. Estimation of the analyst’s model

To mimic the practice of MI, the model given in (3) is assumed to underlie the substantive goal of the analysis, i.e. it is the analyst’s model. Hence, we performed MI inference on the parameters of this model and evaluated the estimation quality using the criteria given in Section 4.1.4.

y1ij = β0 + β1 y2ij + bi + εij,   (3)
where bi and εij are the random effects for the level-2 units and the residual errors for the level-1 units, assumed to be distributed as N(0, σ²b) and N(0, σ²ε), respectively. In a typical analysis, the parameters of interest are β = (β0, β1) and σb. For each of the imputed datasets created under the MLMM model (1) via the R package pan, these parameters were estimated using the R package lme4 (Bates and Sarkar, 2007). In Section 4.2, the performance of MI under the MLMM in estimating these unknown quantities is summarized.
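For a single completed data set, this amounts to a random-intercept fit in lme4; a minimal sketch, assuming a long-format data frame with columns y1, y2 and a cluster identifier (names and placeholder data hypothetical), is:

    # Sketch: fitting the analyst's model (3) to one completed data set with lme4.
    library(lme4)

    set.seed(3)
    imp_dat <- data.frame(cluster = rep(1:20, each = 5), y2 = rnorm(100))   # placeholder data
    imp_dat$y1 <- 1 + 0.5 * imp_dat$y2 + rnorm(20)[imp_dat$cluster] + rnorm(100)

    fit <- lmer(y1 ~ y2 + (1 | cluster), data = imp_dat)
    fixef(fit)                    # estimates of beta0 and beta1
    as.data.frame(VarCorr(fit))   # random-intercept SD (sigma_b) and residual SD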

One of the important issues in MI inference is the compatibility of the MI model with the analyst’s model as well as with the data-generation mechanism. MI practitioners are typically advised to include auxiliary variables either to account for the causes of missingness or to improve the missing-data procedure. Collins et al. (2001) discuss this subject extensively in cross-sectional settings and present a simulation study assessing the benefits and costs of inclusive and restrictive MI strategies. Theoretical aspects of the compatibility of the MI model with the analyst’s model were explored earlier by Meng (1994), who also coined the term “uncongeniality” (i.e. incompatibility of the MI and analyst’s models). In general, to prevent inferential biases, the MI model should be kept general enough to encompass any particular analyst’s model.

4.1.4. Evaluation criteria

Multiple imputations were created using the R package pan. The simulation procedure (data generation, imposing missing values, creating MIs, finding the estimates of the analyst’s model (3) for each of the imputed datasets, and combining these estimates using Rubin’s rules) was repeated 1000 times to fully describe the behavior in a repeated-sampling environment as well as to reduce the simulation error as much as possible. Several quantities were used to measure the accuracy as well as the efficiency of the parameter estimates under MLMM-based MI when the distributional assumptions are violated. Briefly, these quantities are as follows (assuming the unknown parameter is θ; a short computational sketch follows the list):

  • Coverage rate (CR): The percentage of times that the true parameter value is covered by the 95% confidence interval. Here the true parameter value is the average parameter estimate across the simulations before the missing values are imposed. If a procedure is working well, the actual coverage should be close to the nominal rate, which is 95% in our study. However, it is important to evaluate coverage alongside the other measures because high variances can lead to higher CRs. The performance of a procedure is regarded as poor if its coverage drops below 90% (Collins et al., 2001). If a procedure results in CRs that are close to 100% or below 85%, extra caution should be taken when using it.
  • Root-mean-square error (RMSE): Because nonresponse or missing values have an undesirable effect on the variances, it is important to evaluate this adverse effect. An integrated measure of bias and variance is used, evaluating θ̂ in terms of combined accuracy and precision. RMSE(θ̂) is defined as √E[(θ̂ − θ)²].
  • Average width of confidence interval (AW): The distance between the average lower and upper confidence interval limits across 1000 confidence intervals. A high CR along with narrow, calibrated confidence intervals translates into greater accuracy and higher power.
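These three criteria reduce to simple summaries over the 1000 replications; a generic sketch, with hypothetical vector names and placeholder numbers, is:

    # Sketch: computing CR, RMSE and AW for one parameter across 1000 replications.
    # `est`, `lower`, `upper` are placeholder vectors of estimates and 95% CI limits;
    # `truth` is the true parameter value.
    set.seed(4)
    truth <- 1
    est   <- rnorm(1000, mean = truth, sd = 0.1)
    lower <- est - 1.96 * 0.1
    upper <- est + 1.96 * 0.1

    cr   <- mean(lower <= truth & truth <= upper)   # coverage rate (CR)
    rmse <- sqrt(mean((est - truth)^2))             # root-mean-square error (RMSE)
    aw   <- mean(upper) - mean(lower)               # average width of the CI (AW)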

4.1.5. Notes on computation

As noted in Section 3, it is important to monitor the convergence of the Gibbs sampler in the R package pan, the computational engine for producing the multiple imputations under the MLMM model (1). To assess the convergence, the Gibbs sampler was run for 10,000 cycles under mild prior distributions; we set ν1 = 2, Λ1−1 = 2Σ̂, ν2 = 2, and Λ2−1 = 2Ψ̂, where Σ̂ and Ψ̂ were computed by the R package mlmmm (Yucel, 2007). We assessed time-series plots and sample autocorrelations for elements of β, Σ and Ψ for several of the simulations. This exercise suggested that, even in the extreme cases of m = 100, ni = 10 and m = 20, ni = 5, several thousand cycles were needed for the dependence between successive Gibbs draws to die out. To be quite conservative about convergence, each of the simulations in all scenarios used a burn-in period of 10,000 cycles and saved the simulated values of the missing data at cycles 11,000, 12,000, …, 20,000 as the multiple imputations. This required about 45 min on a 2.6 GHz Intel Core Duo workstation for each of the simulation scenarios presented in Tables 1–4. These multiply imputed datasets were used to estimate the model given in (3) using the R package lme4 (Bates and Sarkar, 2007), leading to 10 sets of (β̂, ŝe(β̂)) and (σ̂b, ŝe(σ̂b)), which were then combined using the MI combining rules defined by Rubin (1987).
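In pan, this roughly corresponds to the prior list and the iter/start arguments of the pan() function. The sketch below assumes pan's documented interface pan(y, subj, pred, xcol, zcol, prior, seed, iter, start), that the prior components a, Binv, c, Dinv map to ν1, Λ1−1, ν2, Λ2−1, and that result$y and result$last hold the completed data and the restart state; the data objects are placeholders and the mapping should be verified against the package documentation.

    # Sketch: producing M = 10 imputations with pan, mirroring the burn-in scheme above.
    library(pan)

    set.seed(2)
    m <- 20; ni <- 5; N <- m * ni
    subj <- rep(1:m, each = ni)                 # cluster indicator
    pred <- matrix(1, N, 1)                     # intercept-only design, used for fixed and random parts
    y <- matrix(rnorm(2 * N), N, 2)             # two response columns (placeholder data)
    y[sample(N, 20), 2] <- NA                   # impose some missing values

    prior <- list(a = 2, Binv = 2 * diag(2),    # assumed mapping: nu1 = 2, Lambda1^{-1} = 2 * prior guess for Sigma
                  c = 2, Dinv = 2 * diag(2))    # assumed mapping: nu2 = 2, Lambda2^{-1} = 2 * prior guess for Psi

    burn <- pan(y, subj, pred, xcol = 1, zcol = 1, prior = prior, seed = 101, iter = 10000)
    imputations <- vector("list", 10)
    for (k in 1:10) {                           # save an imputation every 1000 further cycles
      burn <- pan(y, subj, pred, xcol = 1, zcol = 1, prior = prior,
                  seed = 100 + k, iter = 1000, start = burn$last)
      imputations[[k]] <- burn$y                # completed response matrix
    }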

4.2. Results

The simulation results focus on the performance of the MLMM-based MI on the parameter estimates of the analyst’s model given by (3) (i.e. β̂0, β̂1 and σ̂b). Results show that the residual variance term is mostly unaffected by the current model misspecifications and is, therefore, not reported here. Tables 1–4 present the results from scenarios that mimic a missing covariate (y2 missing on average 45% under MAR) and arbitrary missingness (both y1 and y2 missing on average 45% and 55%, respectively, under MCAR). The first column of Tables 1–4 denotes the respective distributions of the random effects and error terms used to simulate the data and the sample size (m, ni). The second column denotes the parameter of interest (PI): β0, β1, and σb. The next column contains the true value (TV) of the PI. Columns 4–7 state the evaluation measures for each unique combination of the first three column values. These are the evaluation measures under the MI inference (first entry) and complete-case-only analysis or case deletion (second entry). For example, when both the random effects and residuals are distributed as normal with m = 100, ni = 10, the true value for β0 is calculated to be −2.713, as the average of β̂0 across the 1000 simulated data sets before missing values are imposed. The average MI estimate across the 1000 simulations is −2.694 and the average complete-case-only (restricting the analysis to completely observed cases) estimate is −1.398. The corresponding coverage rates are 93% and 8.7%, respectively. Both methods of handling missing data use the same model-fitting technique as implemented by the R package lme4 (Bates and Sarkar, 2007).

Tables 1 and 2 show that, across the board, the MI inferences lead to acceptable parameter estimation and coverage, indicating good performance of the MLMM-based MI procedure. We specifically observe estimates with minimal biases and CIs with excellent coverage rates, even when the distributional assumption is clearly violated and when the sample sizes are small. The performance of MI is far superior to the unprincipled method of simple case deletion, which leads to significant biases as well as dismal coverage rates (in some cases as dramatic as 0%). It is also noted that in some conditions the performance of the MI procedure is not as satisfactory. When the underlying distribution of the random effects has heavier tails than normal, specifically a t distribution (see the bottom panel in Table 1), we observe weak performance (e.g. higher RMSEs, lower CRs and larger AWs) in the estimation of σb. However, in most conditions, regardless of the distributional assumption on the random effects, the MI procedure results in well-calibrated estimates with minimal bias and coverage rates that are around the nominal rate of 95%. This consistent performance across the distributions and the varying sample sizes indicates that, under the MLMM-based MI inference, the large- and small-sample properties of the maximum likelihood estimates are satisfied (Demidenko, 2004). Examples of these properties include unbiasedness, smaller uncertainty measures, and decreased RMSE for increased sample size. In the mixed-effects framework, we follow the convention of large m and ni for “large sample” while noting that a large m is more important for the large-sample properties to hold. For more discussion see Demidenko (2004).

The overall performance of the MLMM-based MI is weaker when both variables are subject to missingness under non-normal random effects. We note bigger biases, relatively lower CRs, higher RMSEs and larger AWs in Tables 3 and 4 compared to Tables 1 and 2. This simply means that, under arbitrary patterns of missing data, the inferences are more vulnerable to the misspecification of the random-effect distribution as well as to small sample sizes. The estimates describing the associations and the spread of the random-effect distribution are most impacted. There is little discernible difference in the overall performance in the estimation of β0 between the two missing-data scenarios (univariate versus arbitrary). When both the response and covariate are subject to missingness, even though the estimation of β1 still shows relatively higher RMSE and AW, the CR and small biases indicate that the inference on the fixed effects remains insensitive to the non-normality of the random effects and to small sample sizes. This essentially implies that the MI inference leads to estimates with good frequentist properties even when the joint aspects of the data are not well estimated due to arbitrary patterns of missing data. However, when the distribution of the random effects is not normal, the validity of the MI inferences on the random-effect variance is highly questionable in terms of bias, CR and RMSE.

Recall that the data-generation scenario (given by model (2)) assumes a weak correlation between Y1 and Y2. To assess whether this assumption has any discernible impact on the performance measures, a simulation experiment similar to the one presented above was also conducted under three different conditions representing weak, moderate and high correlations between Y1 and Y2. All scenarios are in agreement on well-calibrated MI inference under the MLMM. Higher correlations between Y1 and Y2 translate into improved performance measures, such as reduced AWs, higher rates of coverage, and lower RMSE. However, it should be noted that even when Y1 and Y2 are weakly correlated, these performance measures still indicate inferences with desirable properties (e.g. unbiasedness, low average width, low RMSE, close to 95% coverage rates). Overall, the level of correlation has little impact on the performance measures (Table 5).

Table 5
Results of performance measures under weak, moderate and high correlation among the Y-variables of model (1); the sample size is assumed to be 100 clusters with 10 units each. Normal random effects are assumed. All the other simulation conditions are the same, with ...

5. Discussion

The overall goal of our work was to assess the impact of the normality assumption on the random effects on MI-based inferences in multilevel incomplete data applications. In problems where the missing values are confined to a small number of variables, or where there are sources of auxiliary information that can be used as predictors in the imputation model, wrongly assuming the random effects to be normal does not seem to affect the final inferences. However, when the missingness is arbitrary in many variables and the rates of missingness are high (or when the sample sizes are small), the validity of MI-based estimation is more adversely affected under the same non-normal random effects. For incorrectly specified distributions, this negative effect is seen most clearly in the parameters describing the relationships and/or the distribution of the random effects. This is somewhat expected, as very little useful data might be available to estimate the “relatedness” aspects of the distribution. Our simulation study also suggests that taking no action about the missing data can lead to inferences that are far more misleading, regardless of correct or incorrect specification of the distributional assumptions.

For the purposes of assessing the sensitivity of MLMM-based MI to distributional assumptions on the random-effects, a limited number of distinct distributions was considered. Therefore, the simulation study does not claim to be generalizable, as there are many other scenarios that can be potentially encountered in real data applications. However, we believe that our simulation study is valuable in the sense that it sheds some light on the effects of making wrong assumptions on the distribution of the random effects (or any distribution on the subject-specific effects when there is not much data support for estimation) in the imputation model.

MI-based procedures, when cautiously used, produce reasonably well-calibrated inference as long as the missingness is not widely spread across the data. However, it is important for the practitioners of missing data methods not to use the MLMM-based software blindly. Similar to the general practice of model-fitting and the associated checks for assessing the goodness-of-fit, inference by multiple imputation should always be performed in conjunction with the tools necessary to assess the plausibility of the underlying assumptions. What are these tools? While they are not as widely implemented as the imputation procedures, there are still well-established techniques for this purpose. Posterior predictive checks, as suggested by Gelman et al. (1996, 2003), can be employed to assess the plausibility of the imputation model. As a direct check for the quality of the assumed model in creating imputations, one can devise a calibration strategy to make the imputations better represent the structure of the data and possibly the varying structure of the clusters. Post-imputation checks and calibration-based methods are presented in detail by Gelman et al. (2005) and Yucel et al. (2008).

The results suggest a number of directions for future research. The first direction is imputation models with less restrictive distributional assumptions, such as those based on marginal models. These models can be particularly useful in small-sample scenarios, especially when the missingness occurs indiscriminately in the response or covariates. A second potentially fruitful direction is to explore ad hoc methods of imputation, e.g. predictive mean matching applied to multilevel settings. These methods usually make minimal assumptions, but may lead to underestimation of the standard errors in multilevel settings. However, with improvements to adjust the standard errors, the pay-off might be considerable when there is reasonable hesitation to use a fully parametric approach for multiple imputation. Finally, we believe that the most important direction is the dissemination of any computational developments and/or improvements by means of statistical software.


Acknowledgments

The authors thank the associate reviewer and the reviewer for helpful comments which greatly improved our manuscript. The first author’s research was partially supported by the National Science Foundation.


References

  • Bates D, Sarkar D. lme4: Linear mixed-effects models using S4 classes. 2007
  • Browne WJ. MCMC estimation in MLwiN (version 2.0) London, UK: Institute of Education, University of London; 2003.
  • Carpenter J, Goldstein H. Multiple imputation in MLwiN. Multilevel modelling newsletter 4. 2004
  • Collins L, Schafer J, Kam C-M. A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods. 2001;6:330–351. [PubMed]
  • Demidenko E. Mixed Models: Theory and Applications. New York, NY: J. Wiley & Sons; 2004.
  • Demirtas H. Simulation-driven inferences for multiply imputed longitudinal datasets. Statistica Neerlandica. 2004;58:466–482.
  • Demirtas H. Multiple imputation under Bayesianly smoothed pattern-mixture models for non-ignorable drop-out. Statistics in Medicine. 2005;24:2345–2363. [PubMed]
  • Demirtas H, Freels S, Yucel R. Plausibility of multivariate normality assumption when multiply imputing non-Gaussian continuous outcomes: A simulation assessment. Journal of Statistical Computation and Simulation. 2008;78:69–84.
  • Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. second ed. London: Chapman & Hall Ltd.; 2003.
  • Gelman A, Mechelen I, Verbeke G, Heitjan D, Meulders M. Multiple imputation for model checking: Completed-data plots with missing and latent data. Biometrics. 2005;61:74–85. [PubMed]
  • Gelman A, Meng X, Stern H. Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica. 1996;6:733–807.
  • Little R, Rubin D. Statistical Analysis with Missing Data. second ed. New York, NY: J. Wiley & Sons; 2002.
  • Meng X. Multiple-imputation inferences with uncongenial sources of input. Statistical Science. 1994;10:538–573.
  • R Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2007. ISBN: 3-900051-07-0, URL:
  • Rasbash J, Steele F, Browne W, Prosser B. MLwiN User’s Manual. Bristol, UK: Centre for Multilevel Modelling; 2006.
  • Rubin DB. Inference and missing data. Biometrika. 1976;63:581–590.
  • Rubin DB. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley and Sons; 1987.
  • Schafer J. Analysis of Incomplete Multivariate Data. London: Chapman & Hall; 1997.
  • Schafer J, Graham J. Missing data: Our view of the state of the art. Psychological Methods. 2002;7:147–177. [PubMed]
  • Schafer J, Yucel R. Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics. 2002;11:421–442.
  • Yucel R. R mlmmm package: Fitting multivariate linear mixed-effects models with missing values. 2007. URL
  • Yucel R, He Y, Zaslavsky A. Using calibration to improve rounding in imputation. The American Statistician. 2008;62:125–129.