Manuscripts are organized according to the underlying “imputation” philosophy implemented by the respective software. The first group shares the common theme of a variable-by-variable approach (also referred to as chained imputation models). This approach is particularly useful in problems involving a set of incompletely observed variables with diverse measurement scales (e.g., continuous, categorical, count, and semi-continuous) and in problems complicated by common survey practices, including skip patterns and truncation. The first paper in this group is by
Su et al. (2011). Their software implements flexible imputation techniques via chained imputation models, together with diagnostic tools that allow users to assess the plausibility of the assumed imputation models. Specifically, their package
mi features a flexible choice of predictors, models, and transformations for chained imputation models; binned residual plots for checking the fit of the conditional distributions used for imputation; and plots for comparing the distributions of observed and imputed data in one and two dimensions. Bayesian models are also used to construct more stable estimates when data are sparse and prior knowledge is available.
The second contribution is by
van Buuren and Groothuis-Oudshoorn (2011), who illustrate an increasingly popular approach to producing multiple imputations in settings where the variables are of varying types and measured with restrictions. They present the most recent version of their R (
R Development Core Team 2011) package called
mice, which imputes incomplete values by fully conditional specification. This package offers many practical solutions, including predictor selection, passive imputation, and automatic pooling to combine estimates from the multiply imputed datasets. These features also extend to multilevel continuous data. Finally, this version adds multilevel MI and interactive use with SPSS (
IBM Corporation 2011). The third contribution presents an implementation of a similar approach in Stata (
StataCorp. 2011). The manuscript by
Royston and White (2011) describes
ice, the Stata module that implements this approach, with fully automatic pooling, to produce multiple imputations.
Royston and White (2011) illustrate this fully-integrated module in Stata using real data from an observational study in ovarian cancer.
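The variable-by-variable idea shared by the packages in this first group can be illustrated with a minimal sketch. It is not the algorithm of mi, mice, or ice: it assumes only two continuous variables, uses a simple linear regression as each conditional model, and adds residual noise without propagating uncertainty in the fitted coefficients, which a proper implementation would also do. The names `fit_line` and `chained_impute` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for y ~ x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / max(len(xs) - 2, 1)) ** 0.5
    return intercept, slope, sd


def chained_impute(rows, n_iter=10, seed=0):
    """Return one completed copy of rows ([a, b] pairs, None = missing)."""
    rng = random.Random(seed)
    data = [list(r) for r in rows]
    missing = [[v is None for v in r] for r in rows]

    # Initialise every missing cell with the observed mean of its column.
    for j in (0, 1):
        col_mean = statistics.fmean(r[j] for r in rows if r[j] is not None)
        for r in data:
            if r[j] is None:
                r[j] = col_mean

    # Cycle through the variables, re-imputing each from the other.
    for _ in range(n_iter):
        for target in (0, 1):
            pred = 1 - target
            # Fit the conditional model on rows where the target was observed.
            xs = [r[pred] for r, m in zip(data, missing) if not m[target]]
            ys = [r[target] for r, m in zip(data, missing) if not m[target]]
            b0, b1, sd = fit_line(xs, ys)
            # Replace only the originally missing cells with noisy predictions.
            for r, m in zip(data, missing):
                if m[target]:
                    r[target] = b0 + b1 * r[pred] + rng.gauss(0, sd)
    return data


rows = [[1.0, 2.1], [2.0, None], [3.0, 6.2], [None, 8.0],
        [5.0, 9.9], [6.0, 12.1], [4.0, None]]
# Different seeds give different completed datasets (multiple imputations).
imputations = [chained_impute(rows, seed=s) for s in range(5)]
print(imputations[0])
```

With mixed measurement scales, each conditional model would be chosen to suit its target variable (for example, a logistic model for a binary variable), which is where this approach gets its flexibility.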
The joint modeling approach is covered next, after the variable-by-variable approach. Carpenter and his colleagues describe a comprehensive module called
REALCOM-IMPUTE of the multilevel model fitting software
MLwiN (
Carpenter et al. 2011). Variables subject to missing values are modeled under a multivariate latent normal model with random effects, which serves as the basis for approximating the underlying posterior predictive distribution. The authors use Markov chain Monte Carlo (MCMC) simulation techniques to fit the imputation models and thus draw the multiple imputations. The software also allows weights that account for the sampling design at both level 1 and level 2. A variety of variable types can be imputed: continuous, ordinal, or nominal. Users can further analyze the imputed datasets under multilevel models and combine estimates using the MI rules defined by
Rubin (1987).
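For a scalar quantity, the combining rules of Rubin (1987) can be sketched as follows. The pooled estimate is the average of the per-dataset estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by a factor of (1 + 1/m). The function name `pool_rubin` is hypothetical and is not taken from any of the packages discussed here.

```python
import statistics


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors.

    Returns (pooled estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b               # total variance
    return q_bar, w, b, t


# Illustrative numbers only: estimates and variances from m = 3 imputed datasets.
q_bar, w, b, t = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, w, b, t)
```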
Another increasingly popular option is the pair of SAS procedures PROC MI and PROC MIANALYZE.
Yuan (2011) illustrates how to conduct MI inference in SAS. PROC MI implements three major techniques for producing multiple imputations. The choice among them depends on the missingness pattern and the type of the imputed variable. For problems with monotone patterns of missingness (i.e., a missing value on one variable implies that all subsequent variables are also missing), the method depends on the type of the variable(s) to be imputed. If the variables are continuous, one can choose matching (using the propensity score or predictive mean) or MCMC, which draws imputations from a multivariate normal distribution. If they are categorical, one can choose logistic regression or a discriminant-function-based method. For arbitrary patterns of missingness, one has to approximate the underlying posterior predictive distribution using a multivariate normal distribution with one of the priors provided by PROC MI (e.g., a ridge or Jeffreys prior).
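Because the choice of method hinges on whether the missingness pattern is monotone, a small check along the following lines may help make the definition concrete. It is a sketch only and is unrelated to how PROC MI detects patterns internally. The helper names `is_monotone` and `monotone_order` are hypothetical, and missing values are represented by `None`.

```python
def is_monotone(rows):
    """True if, in every row, nothing is observed after the first missing value."""
    for row in rows:
        seen_missing = False
        for value in row:
            if value is None:
                seen_missing = True
            elif seen_missing:
                return False
    return True


def monotone_order(rows):
    """Return a column order that makes the pattern monotone, or None.

    Monotone patterns have nested sets of missing rows, so sorting columns
    from fewest to most missing values recovers such an order if one exists.
    """
    n_cols = len(rows[0])
    order = sorted(range(n_cols), key=lambda j: sum(r[j] is None for r in rows))
    reordered = [[r[j] for j in order] for r in rows]
    return order if is_monotone(reordered) else None


monotone = [[1, 2, 3], [1, 2, None], [1, None, None]]
arbitrary = [[None, 1], [1, None]]
print(is_monotone(monotone), monotone_order(arbitrary))
```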
The final contribution illustrates
Amelia (
Honaker et al. 2011).
Amelia integrates two important computational tools, EM and the bootstrap, to produce multiple imputations (
Dempster, Laird, and Rubin 1977;
Efron 1979). It implements a new, computationally improved EM-bootstrapping algorithm as an alternative to MCMC-based solutions. The imputation model still relies on a joint model, but the underlying sampling from the posterior predictive distribution is fundamentally different. Because the computations center on maximum likelihood (or posterior mode) estimates and rely only on a resampling-based algorithm, the approach is computationally efficient. It also includes features to accurately impute cross-sectional datasets, individual time series, or sets of time series for different cross-sections. Finally, it provides graphical diagnostics for the imputed datasets.
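The combination of EM and the bootstrap can be sketched in a few lines for a toy case. This is not Amelia's implementation: it assumes a bivariate normal model, at most one missing value per row, and no priors, and the function names `em_bivariate` and `em_bootstrap_impute` are hypothetical. Each imputed dataset comes from resampling the rows, running EM on the resample to obtain maximum likelihood estimates, and then drawing the missing values in the original data from the implied conditional normal distribution.

```python
import random


def em_bivariate(rows, n_iter=100):
    """EM estimates (mean, covariance) of a bivariate normal; None = missing."""
    n = len(rows)
    mu, var = [], []
    for j in (0, 1):  # start from observed-data means and variances
        obs = [r[j] for r in rows if r[j] is not None]
        m = sum(obs) / len(obs)
        mu.append(m)
        var.append(sum((v - m) ** 2 for v in obs) / len(obs))
    cov = [[var[0], 0.0], [0.0, var[1]]]
    for _ in range(n_iter):
        s1 = [0.0, 0.0]
        s2 = [[0.0, 0.0], [0.0, 0.0]]
        for r in rows:
            e = list(r)
            for k in (0, 1):
                if r[k] is None:
                    # E-step: conditional mean and variance given the other variable.
                    o = 1 - k
                    beta = cov[k][o] / cov[o][o]
                    e[k] = mu[k] + beta * (r[o] - mu[o])
                    s2[k][k] += cov[k][k] - beta * cov[k][o]
            for j in (0, 1):
                s1[j] += e[j]
                for l in (0, 1):
                    s2[j][l] += e[j] * e[l]
        # M-step: update the parameters from the expected sufficient statistics.
        mu = [s / n for s in s1]
        cov = [[s2[j][l] / n - mu[j] * mu[l] for l in (0, 1)] for j in (0, 1)]
    return mu, cov


def em_bootstrap_impute(rows, m=5, seed=0):
    """Return m completed copies of rows using bootstrap + EM parameter draws."""
    rng = random.Random(seed)
    completed = []
    for _ in range(m):
        boot = [rng.choice(rows) for _ in rows]  # resample rows with replacement
        mu, cov = em_bivariate(boot)
        data = []
        for r in rows:
            e = list(r)
            for k in (0, 1):
                if r[k] is None:
                    o = 1 - k
                    beta = cov[k][o] / cov[o][o]
                    cond_var = max(cov[k][k] - beta * cov[k][o], 0.0)
                    cond_mean = mu[k] + beta * (r[o] - mu[o])
                    e[k] = rng.gauss(cond_mean, cond_var ** 0.5)
            data.append(e)
        completed.append(data)
    return completed


# Illustrative data only.
rows = [(1.0, 2.3), (2.0, 3.9), (3.0, None), (4.0, 8.2), (5.0, 9.7),
        (None, 12.4), (7.0, 13.8), (8.0, None), (9.0, 18.1), (10.0, 20.2),
        (11.0, 21.7), (None, 24.5)]
imputed = em_bootstrap_impute(rows, m=3, seed=1)
print(imputed[0][2], imputed[1][2], imputed[2][2])
```

The bootstrap step stands in for drawing parameters from their posterior distribution, which is why no MCMC chain is needed in this scheme.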