
Am Stat. Author manuscript; available in PMC 2010 December 28.

Published in final edited form as:

Am Stat. 2010 August 1; 64(3): 263–267.

doi: 10.1198/tast.2010.09203

PMCID: PMC3010738

NIHMSID: NIHMS258556

Ke-Hai Yuan

University of Notre Dame

University of California, Los Angeles

Correspondence concerning this article should be addressed to Ke-Hai Yuan (Email: kyuan@nd.edu).


**Abstract**

This paper shows that, when variables with missing values are linearly related to observed variables, the normal-distribution-based pseudo MLEs are still consistent. The population distribution may be unknown while the missing data process can follow an arbitrary missing at random mechanism. Enough details are provided for the bivariate case so that readers having taken a course in statistics/probability can fully understand the development. Sufficient conditions for the consistency of the MLEs in higher dimensions are also stated, while the details are omitted.

**1. Introduction**

Incomplete or missing data exist in almost all areas of empirical research. They are especially common when data are collected longitudinally and/or by surveys. There can be various reasons for missing data to occur. The process by which data become incomplete was called the missing data mechanism by Rubin (1976). Missing completely at random (MCAR) is a process in which missingness of data is independent of both the observed and the missing values; missing at random (MAR) is a process in which missingness is independent of the missing values given the observed data. When missingness depends on the missing values themselves given the observed data, the process is not missing at random (NMAR). Missing data with an NMAR mechanism are also referred to as non-ignorable non-responses because maximum likelihood estimates (MLEs) obtained by ignoring the missing data mechanism are generally inconsistent. This paper studies the consistency of the normal-distribution-based MLE with MAR data.

In contrast to many ad hoc procedures for missing data analysis, MLEs have the desired property of being consistent even when the specific MAR mechanism is ignored. When modeling real data, however, specifying the correct distribution form to obtain the true MLE is always challenging if not impossible. The normal distribution is often chosen for convenience, not because practical data tend to come from normally distributed populations. Geary (1947, p. 241) observed that “Normality is a myth; there never was, and never will be a normal distribution.” Such an observation was further supported by Micceri (1989), who examined 440 data sets obtained from journal articles, research projects as well as tests, and found that all were significantly nonnormally distributed. Thus, the normal-distribution-based MLEs for real data typically are pseudo MLEs, whose properties have been obtained by White (1982) and Gourieroux, Monfort and Trognon (1984) in the context of complete data. With missing data, however, according to Laird (1988) and Rotnitzky and Wypij (1994), pseudo MLEs will be inconsistent unless the missing data mechanism is MCAR. The need for a correct likelihood function with an MAR mechanism was also noted by Liang and Zeger (1986) and Little (1993). If a pseudo MLE is not consistent when data are MAR, then only the MCAR mechanism can be ignored when modeling practical multivariate data with unknown population distributions. Thus, in addition to being an important mathematical property, consistency of the normal-distribution-based pseudo MLE with MAR data also has wide implications for many areas of applied statistics where the normal distribution is routinely used to model missing data.

Let **x** = (*x*_{1}, *x*_{2}, …, *x _{q}*)′ be a vector representing a population in which some variables may be missing. Let **x**_{o} and **x**_{m} denote the observed and missing parts of **x**, respectively, and let *r _{j}* = 1 if *x _{j}* is observed and *r _{j}* = 0 if *x _{j}* is missing. An MAR mechanism satisfies

$$P({r}_{j}=0\mid {\mathrm{x}}_{o},{\mathrm{x}}_{m})=P({r}_{j}=0\mid {\mathrm{x}}_{o})={g}_{j}({\mathrm{x}}_{o},{\gamma}_{j}),j=1,2,\cdots ,q$$

(1)

and the *r _{j}*'s are conditionally/locally independent given **x**_{o}. Special cases of (1) include the interval selection model, under which

$${g}_{j}({\mathrm{x}}_{o},{\gamma}_{j})=1$$

when **x**_{o} falls into certain hyper-rectangles; the probit selection model

$${g}_{j}({\mathrm{x}}_{o},{\gamma}_{j})=\Phi ({\gamma}_{j0}+{\gamma}_{j1}^{\prime}{\mathrm{x}}_{o}),$$

where Φ(·) is the cumulative distribution function of *N*(0, 1) and γ_{j1} contains the regression coefficients; and the logistic selection model

$${g}_{j}({\mathrm{x}}_{o},{\gamma}_{j})=\frac{\text{exp}({\gamma}_{j0}+{\gamma}_{j1}^{\prime}{\mathrm{x}}_{o})}{1+\text{exp}({\gamma}_{j0}+{\gamma}_{j1}^{\prime}{\mathrm{x}}_{o})}.$$

The interval selection model is widely used in economics (Amemiya, 1973; Heckman, 1979) while the probit and logistic selection models are commonly used in many other disciplines (Allison, 2001; Little & Rubin, 2002; Molenberghs & Kenward, 2007; Daniels & Hogan, 2008). Under the interval selection model, Yuan (2009) showed that the normal-distribution-based pseudo MLEs are consistent and asymptotically normally distributed even when the underlying distribution is unknown. The purpose of this note is to extend the result of Yuan (2009) by showing that the normal-distribution-based MLEs are still consistent when the underlying population is unknown and when the *g _{j}*(**x**_{o}, γ_{j}) in (1) are essentially arbitrary functions of the observed data.
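As a numerical aside (not from the paper), the logistic selection model above is easy to simulate; the function name and the values γ_{j0} = −1 and γ_{j1} = 0.8 below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_missing(x_obs, gamma0, gamma1):
    """P(r_j = 0 | x_o) under the logistic selection model:
    the probability of missingness depends only on observed values."""
    eta = gamma0 + x_obs @ np.atleast_1d(gamma1)
    return np.exp(eta) / (1.0 + np.exp(eta))

# x1 is always observed; x2 is deleted with probability g(x1)
x1 = rng.normal(size=1000)
p_miss = logistic_missing(x1[:, None], gamma0=-1.0, gamma1=[0.8])
r = rng.uniform(size=x1.size) >= p_miss   # True where x2 is observed
```

The probit version simply replaces the logistic function with the normal CDF Φ(·).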

While all the results in Yuan (2009) can be generalized to an MAR mechanism described by probit and logistic selection models, for brevity we only give the details for the bivariate case in section 2. With missing data and a misspecified likelihood function, little literature exists that facilitates thorough understanding of issues related to consistency. We choose this simple model with enough details so that readers having taken a course in statistics/probability can fully understand the development. We will also state the results for consistency for a general *q* in section 3, but the details will be omitted. We conclude the paper by pointing out that not all pseudo MLEs are consistent.

**2. Consistency in the bivariate case**

Let **x** = (*x*_{1}, *x*_{2})′ with

$$\mu =E\left(\mathbf{x}\right)=\left(\begin{array}{c}\hfill {\mu}_{1}\hfill \\ \hfill {\mu}_{2}\hfill \end{array}\right)\phantom{\rule{1em}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}\Sigma =\text{Cov}\left(\mathbf{x}\right)=\left(\begin{array}{cc}\hfill {\sigma}_{11}\hfill & \hfill {\sigma}_{12}\hfill \\ \hfill {\sigma}_{21}\hfill & \hfill {\sigma}_{22}\hfill \end{array}\right),$$

(2)

where ${\sigma}_{11}={\sigma}_{1}^{2},{\sigma}_{22}={\sigma}_{2}^{2}$ and σ_{12} = σ_{21} = σ_{1}σ_{2}ρ. A sample from **x** with missing values in *x*_{2} can be represented by

$$\begin{array}{cc}\hfill & {x}_{11},\cdots ,{x}_{n1},{x}_{(n+1)1},\cdots ,{x}_{N1}\hfill \\ \hfill & {x}_{12},\cdots ,{x}_{n2},\hfill \end{array}$$

(3)

where *x*_{(n+1)2}, …, *x*_{N2} are missing. The goal is to estimate the parameters in (2) based on (3) under the possibly wrong assumption **x** ~ *N*_{2}(μ,Σ). Notice that the number of cases with *x*_{i2} missing is not controllable, so the *n* in (3) is a random variable.

With two variables, there are a total of 4 possible observed patterns: (*x*_{i1}, *x*_{i2}), (*x*_{i1}, ∗), (∗, *x*_{i2}) and (∗, ∗), where ∗ denotes a missing value. The sample in (3) contains only the first two of the four patterns. We chose this sample because the MLEs enjoy analytical solutions and the proof of their consistency is simple enough to be understood by a broad audience. We will discuss the consistency of the MLEs with more missing data patterns and more variables in section 3.

Let

$$\begin{array}{c}\hfill {\stackrel{\u2012}{x}}_{1}=\frac{1}{N}\sum _{i=1}^{N}{x}_{i1},\phantom{\rule{thinmathspace}{0ex}}{s}_{11}=\frac{1}{N}\sum _{i=1}^{N}{({x}_{i1}-{\stackrel{\u2012}{x}}_{1})}^{2},\hfill \\ \hfill {\stackrel{\u2012}{x}}_{1\ast}=\frac{1}{n}\sum _{i=1}^{n}{x}_{i1},\phantom{\rule{thinmathspace}{0ex}}{s}_{11\ast}=\frac{1}{n}\sum _{i=1}^{n}{({x}_{i1}-{\stackrel{\u2012}{x}}_{1\ast})}^{2},\hfill \\ \hfill {\stackrel{\u2012}{x}}_{2\ast}=\frac{1}{n}\sum _{i=1}^{n}{x}_{i2},\phantom{\rule{thinmathspace}{0ex}}{s}_{22\ast}=\frac{1}{n}\sum _{i=1}^{n}{({x}_{i2}-{\stackrel{\u2012}{x}}_{2\ast})}^{2},\hfill \\ \hfill {s}_{21\ast}=\frac{1}{n}\sum _{i=1}^{n}({x}_{i2}-{\stackrel{\u2012}{x}}_{2\ast})({x}_{i1}-{\stackrel{\u2012}{x}}_{1\ast}),\phantom{\rule{thinmathspace}{0ex}}{\widehat{\beta}}_{21}=\frac{{s}_{21\ast}}{{s}_{11\ast}}=\frac{{\Sigma}_{i=1}^{n}({x}_{i2}-{\stackrel{\u2012}{x}}_{2\ast})({x}_{i1}-{\stackrel{\u2012}{x}}_{1\ast})}{{\Sigma}_{i=1}^{n}{({x}_{i1}-{\stackrel{\u2012}{x}}_{1\ast})}^{2}}.\hfill \end{array}$$

Then it follows from Anderson (1957) that the MLEs of (μ_{1}, σ_{11}, μ_{2}, σ_{22}, σ_{12}) based on the normal distribution assumption by ignoring the missing data mechanism are

$${\widehat{\mu}}_{1}={\stackrel{\u2012}{x}}_{1},\phantom{\rule{thinmathspace}{0ex}}{\widehat{\sigma}}_{11}={s}_{11},$$

(4a)

$${\widehat{\mu}}_{2}={\stackrel{\u2012}{x}}_{2\ast}+{\widehat{\beta}}_{21}({\stackrel{\u2012}{x}}_{1}-{\stackrel{\u2012}{x}}_{1\ast}),$$

(4b)

$${\widehat{\sigma}}_{22}={s}_{22\ast}+{\widehat{\beta}}_{21}^{2}({\widehat{\sigma}}_{11}-{s}_{11\ast}),$$

(4c)

$${\widehat{\sigma}}_{12}={\widehat{\beta}}_{21}{\widehat{\sigma}}_{11}.$$

(4d)
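Estimators (4a)–(4d) are straightforward to compute. The sketch below is ours, not part of the paper (the function name is an assumption); it follows the monotone sample (3), with *x*_{1} fully observed and *x*_{2} observed for the first *n* cases only.

```python
import numpy as np

def anderson_mle(x1, x2_obs):
    """Normal-theory MLEs (4a)-(4d) for a bivariate monotone sample:
    x1 has length N (fully observed); x2_obs has length n <= N and
    holds x2 for the first n cases.  The mechanism is ignored."""
    n = len(x2_obs)
    x1s = x1[:n]                                   # x1 for complete cases
    mu1 = x1.mean()                                # (4a)
    s11 = x1.var()                                 # MLE divides by N
    b21 = np.cov(x2_obs, x1s, bias=True)[0, 1] / x1s.var()
    mu2 = x2_obs.mean() + b21 * (mu1 - x1s.mean()) # (4b)
    s22 = x2_obs.var() + b21**2 * (s11 - x1s.var())  # (4c)
    s12 = b21 * s11                                # (4d)
    return mu1, s11, mu2, s22, s12
```

With no missing values (*n* = *N*) the formulas collapse to the ordinary sample moments.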

Through the work of Rubin (1976) and others, it is widely known that ${\widehat{\mu}}_{2}$, ${\widehat{\sigma}}_{22}$, ${\widehat{\sigma}}_{12}$ are consistent when all the missing data in (3) are MCAR or MAR and **x** ~ *N*_{2}(μ,Σ). In the following, we will study the consistency of ${\widehat{\mu}}_{2}$, ${\widehat{\sigma}}_{22}$ and ${\widehat{\sigma}}_{12}$ using (4) when **x** does not follow the bivariate normal distribution and the missing *x*_{i2}'s in (3) are MAR. For such a purpose, we assume that the population for the data in (3) is

$${x}_{1}={\mu}_{1}+{\sigma}_{1}{z}_{1},\phantom{\rule{thinmathspace}{0ex}}{x}_{2}={\mu}_{2}+{\sigma}_{2}[\rho {z}_{1}+{(1-{\rho}^{2})}^{1\u22152}{z}_{2}],$$

(5)

where *z*_{1} and *z*_{2} are independent with *E*(*z*_{1}) = *E*(*z*_{2}) = 0 and Var(*z*_{1}) = Var(*z*_{2}) = 1. Clearly, (5) follows a bivariate normal distribution only when *z*_{1} ~ *N*(0, 1) and *z*_{2} ~ *N*(0, 1). However, the population mean vector and variance-covariance matrix of **x** = (*x*_{1}, *x*_{2})′ remain the same regardless of the distributions of *z*_{1} and *z*_{2}.
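For concreteness, data from model (5) can be generated with any standardized *z*_{1} and *z*_{2}. The sketch below (ours, for illustration) uses shifted exponentials, one arbitrary nonnormal choice with mean 0 and variance 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_xy(N, mu1, mu2, s1, s2, rho):
    """Draw N cases from model (5) with nonnormal standardized errors:
    z1, z2 are shifted exponentials (mean 0, variance 1)."""
    z1 = rng.exponential(size=N) - 1.0
    z2 = rng.exponential(size=N) - 1.0
    x1 = mu1 + s1 * z1
    x2 = mu2 + s2 * (rho * z1 + np.sqrt(1 - rho**2) * z2)
    return x1, x2
```

Whatever the choice of distributions, E(**x**) and Cov(**x**) match (2), which is all that model (5) fixes.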

Corresponding to (3) there exist independent random variables *z*_{i1} and *z*_{i2} such that

$${x}_{i1}={\mu}_{1}+{\sigma}_{1}{z}_{i1},\phantom{\rule{thinmathspace}{0ex}}{x}_{i2}={\mu}_{2}+{\sigma}_{2}[\rho {z}_{i1}+{(1-{\rho}^{2})}^{1\u22152}{z}_{i2}].$$

Let

$$\begin{array}{c}\hfill {\stackrel{\u2012}{z}}_{1\ast}=\frac{1}{n}\sum _{i=1}^{n}{z}_{i1},\phantom{\rule{thinmathspace}{0ex}}{\stackrel{\u2012}{z}}_{2\ast}=\frac{1}{n}\sum _{i=1}^{n}{z}_{i2},\phantom{\rule{thinmathspace}{0ex}}{s}_{z11\ast}=\frac{1}{n}\sum _{i=1}^{n}{({z}_{i1}-{\stackrel{\u2012}{z}}_{1\ast})}^{2},\hfill \\ \hfill {s}_{z21\ast}=\frac{1}{n}\sum _{i=1}^{n}({z}_{i2}-{\stackrel{\u2012}{z}}_{2\ast})({z}_{i1}-{\stackrel{\u2012}{z}}_{1\ast}),\phantom{\rule{thinmathspace}{0ex}}{s}_{z22\ast}=\frac{1}{n}\sum _{i=1}^{n}{({z}_{i2}-{\stackrel{\u2012}{z}}_{2\ast})}^{2}.\hfill \end{array}$$

Then

$${\stackrel{\u2012}{x}}_{1\ast}={\mu}_{1}+{\sigma}_{1}{\stackrel{\u2012}{z}}_{1\ast},\phantom{\rule{thinmathspace}{0ex}}{s}_{11\ast}={\sigma}_{11}{s}_{z11\ast},\phantom{\rule{thinmathspace}{0ex}}{s}_{21\ast}={\sigma}_{2}{\sigma}_{1}[\rho {s}_{z11\ast}+{(1-{\rho}^{2})}^{1\u22152}{s}_{z21\ast}],$$

(6a)

$${\stackrel{\u2012}{x}}_{2\ast}={\mu}_{2}+{\sigma}_{2}[\rho {\stackrel{\u2012}{z}}_{1\ast}+{(1-{\rho}^{2})}^{1\u22152}{\stackrel{\u2012}{z}}_{2\ast}],\phantom{\rule{thinmathspace}{0ex}}{s}_{22\ast}={\sigma}_{22}[{\rho}^{2}{s}_{z11\ast}+2\rho {(1-{\rho}^{2})}^{1\u22152}{s}_{z21\ast}+(1-{\rho}^{2}){s}_{z22\ast}].$$

(6b)

The equations in (6) allow us to obtain the probability limits of ${\stackrel{\u2012}{x}}_{j\ast}$ and *s*_{jk*} through those of ${\stackrel{\u2012}{z}}_{\mathit{j}\ast}$ and *s*_{zjk*}, which further lead to consistency of ${\widehat{\mu}}_{2}$, ${\widehat{\sigma}}_{12}$ and ${\widehat{\sigma}}_{22}$ in (4).

We also need to connect the observations in (3) to the MAR mechanism. Let *r _{i}* = 1 if *x*_{i2} is observed and *r _{i}* = 0 if *x*_{i2} is missing. The MAR mechanism in (1) then implies

$$P({r}_{i}=0\mid {x}_{i1},{x}_{i2})=P({r}_{i}=0\mid {x}_{i1})=P({r}_{i}=0\mid {z}_{i1})=g({x}_{i1},\gamma )=h\left({z}_{i1}\right),$$

(7)

where the parameter vector γ is omitted from *h*(·). Let the probability density functions (pdf) of *z*_{1} and *z*_{2} be *f*_{1}(*t*) and *f*_{2}(*t*), respectively. Then *n*, the number of complete cases in (3), follows the binomial distribution *B*(*N, p _{o}*), where, with *I _{A}* denoting the indicator function of the event *A*,

$$\begin{array}{cc}\hfill {p}_{o}& =P({r}_{i}=1)=E\left({I}_{\{{r}_{i}=1\}}\right)=E(1-{I}_{\{{r}_{i}=0\}})\hfill \\ \hfill & =1-E\left[E({I}_{\{{r}_{i}=0\}}\mid {x}_{i1})\right]=1-E\left[P({r}_{i}=0\mid {x}_{i1})\right]\hfill \\ \hfill & =1-E\left[h\left({z}_{i1}\right)\right].\hfill \end{array}$$

Let *t*_{i1} be the realized value of *z*_{i1} and *f* be a generic notation for the probability distribution/density function of the involved random variables. It follows from (7) and

$$f({r}_{i}=1,{t}_{i1})={p}_{o}f({t}_{i1}\mid {r}_{i}=1)=P({r}_{i}=1\mid {t}_{i1}){f}_{1}\left({t}_{i1}\right)$$

that

$$f({t}_{i1}\mid {r}_{i}=1)=\frac{1}{{p}_{o}}[1-h\left({t}_{i1}\right)]{f}_{1}\left({t}_{i1}\right).$$

Thus, the *z*_{i1} corresponding to the observed *x*_{i2} are independent, identically distributed (iid), and each follows the distribution with pdf

$${f}_{1\ast}\left(t\right)=\frac{1}{{p}_{o}}[1-h\left(t\right)]{f}_{1}\left(t\right).$$

Notice that, due to the MAR mechanism, the missingness in (3) has nothing to do with *z*_{i2}. Each *z*_{i2} corresponding to either the observed *x*_{i2} or missing *x*_{i2} still has the same distribution as *z*_{2} ~ *f*_{2}(*t*).
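The identity *p _{o}* = 1 − *E*[*h*(*z*_{i1})] is easy to verify by Monte Carlo; the logistic *h* and standardized-exponential *z*_{1} below are arbitrary illustrative choices of ours, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check of p_o = 1 - E[h(z_1)] for a nonnormal z_1
z1 = rng.exponential(size=500000) - 1.0    # mean 0, variance 1
h = 1.0 / (1.0 + np.exp(-(0.5 + z1)))      # P(r_i = 0 | z_i1), logistic
p_o = 1.0 - h.mean()                       # expected fraction of complete cases

missing = rng.uniform(size=z1.size) < h    # simulate the mechanism
```

The empirical complete-case fraction `(~missing).mean()` agrees with `p_o` up to Monte Carlo error, and the retained *z*_{i1}'s follow the tilted density *f*_{1*}.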

Let the mean and variance of *u* ~ *f*_{1*}(*t*) be μ_{z1*} and σ_{z11*}. Let *u*_{1}, *u*_{2}, …, *u _{N}* be iid with pdf *f*_{1*}(*t*), and let ω_{1}, ω_{2}, …, ω_{N} be iid Bernoulli random variables with *P*(ω_{i} = 1) = *p _{o}*, independent of the *u _{i}*'s and the *z*_{i2}'s. Sums over the *n* complete cases in (3) are then equivalent in distribution to the corresponding ω_{i}-weighted sums over all *N* cases.

We are now ready to show the result of consistency. Applying the law of large numbers to the average of ω_{i} yields

$$\frac{n}{N}=\frac{1}{N}\sum _{i=1}^{N}{\omega}_{i}\stackrel{wp1}{\to}{p}_{o},$$

where the equal sign follows from equivalence in distribution and $\stackrel{\mathit{wp}1}{\to}$ denotes convergence with probability one. Repeatedly applying equivalence in distribution and the law of large numbers to the averages of ω_{i}*u _{i}* and ω_{i}*z*_{i2} yields

$${\stackrel{\u2012}{z}}_{1\ast}=\frac{{\Sigma}_{i=1}^{N}{\omega}_{i}{u}_{i}\u2215N}{n\u2215N}\stackrel{wp1}{\to}\frac{{p}_{o}E\left(u\right)}{{p}_{o}}={\mu}_{z1\ast}$$

(8)

and

$${\stackrel{\u2012}{z}}_{2\ast}=\frac{{\Sigma}_{i=1}^{N}{\omega}_{i}{z}_{i2}\u2215N}{n\u2215N}\stackrel{wp1}{\to}\frac{{p}_{o}E\left({z}_{2}\right)}{{p}_{o}}=0.$$

(9)

Similarly,

$$\begin{array}{c}\hfill \frac{1}{n}\sum _{i=1}^{n}{z}_{i1}^{2}=\frac{{\Sigma}_{i=1}^{N}{\omega}_{i}{u}_{i}^{2}\u2215N}{n\u2215N}\stackrel{wp1}{\to}E\left({u}^{2}\right);\hfill \\ \hfill \frac{1}{n}\sum _{i=1}^{n}{z}_{i1}{z}_{i2}=\frac{{\Sigma}_{i=1}^{N}{\omega}_{i}{u}_{i}{z}_{i2}\u2215N}{n\u2215N}\stackrel{wp1}{\to}E\left(u\right)E\left({z}_{2}\right)=0;\hfill \\ \hfill \frac{1}{n}\sum _{i=1}^{n}{z}_{i2}^{2}=\frac{{\Sigma}_{i=1}^{N}{\omega}_{i}{z}_{i2}^{2}\u2215N}{n\u2215N}\stackrel{wp1}{\to}E\left({z}_{2}^{2}\right)=1.\hfill \end{array}$$

Thus,

$${s}_{z11\ast}=\frac{1}{n}\sum _{i=1}^{n}{z}_{i1}^{2}-{\stackrel{\u2012}{z}}_{1\ast}^{2}\stackrel{wp1}{\to}E\left({u}^{2}\right)-{\mu}_{z1\ast}^{2}={\sigma}_{z11\ast};$$

(10)

$${s}_{z21\ast}=\frac{1}{n}\sum _{i=1}^{n}{z}_{i1}{z}_{i2}-{\stackrel{\u2012}{z}}_{1\ast}{\stackrel{\u2012}{z}}_{2\ast}\stackrel{wp1}{\to}0;$$

(11)

$${s}_{z22\ast}=\frac{1}{n}\sum _{i=1}^{n}{z}_{i2}^{2}-{\stackrel{\u2012}{z}}_{2\ast}^{2}\stackrel{wp1}{\to}1.$$

(12)

It is obvious that

$${\widehat{\mu}}_{1}={\stackrel{\u2012}{x}}_{1}\stackrel{wp1}{\to}{\mu}_{1}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{\widehat{\sigma}}_{11}={s}_{11}\stackrel{wp1}{\to}{\sigma}_{11}.$$

(13)

Regarding μ_{1}, μ_{2}, σ_{1}, σ_{2} and ρ as constants, ${\stackrel{\u2012}{x}}_{1\ast}$, ${\stackrel{\u2012}{x}}_{2\ast}$, *s*_{11*}, *s*_{21*} and *s*_{22*} in (6) are just linear combinations of ${\stackrel{\u2012}{z}}_{1\ast}$, ${\stackrel{\u2012}{z}}_{2\ast}$, *s*_{z11*}, *s*_{z21*} and *s*_{z22*}, whose probability limits have already been obtained. Combining (6a), (8), (10) and (11) yields

$${\stackrel{\u2012}{x}}_{1\ast}\stackrel{wp1}{\to}{\mu}_{1\ast}={\mu}_{1}+{\sigma}_{1}{\mu}_{z1\ast},\phantom{\rule{thinmathspace}{0ex}}{s}_{11\ast}\stackrel{wp1}{\to}{\sigma}_{11\ast}={\sigma}_{11}{\sigma}_{z11\ast},\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{s}_{21\ast}\stackrel{wp1}{\to}{\sigma}_{2}{\sigma}_{1}{\sigma}_{z11\ast}\rho .$$

(14)

Combining (6b) and (8) to (12) yields

$${\stackrel{\u2012}{x}}_{2\ast}\stackrel{wp1}{\to}{\mu}_{2}+{\sigma}_{2}\rho {\mu}_{z1\ast}\phantom{\rule{thinmathspace}{0ex}}\text{and}\phantom{\rule{thinmathspace}{0ex}}{s}_{22\ast}\stackrel{wp1}{\to}{\sigma}_{22}[{\rho}^{2}{\sigma}_{z11\ast}+(1-{\rho}^{2})].$$

(15)

Thus,

$${\widehat{\beta}}_{21}=\frac{{s}_{21\ast}}{{s}_{11\ast}}\stackrel{\mathit{wp}1}{\to}\frac{{\sigma}_{2}}{{\sigma}_{1}}\rho .$$

(16)

It follows from (4b) and (13) to (16) that

$${\widehat{\mu}}_{2}\stackrel{\mathit{wp}1}{\to}{\mu}_{2}+{\sigma}_{2}\rho {\mu}_{z1\ast}+\frac{{\sigma}_{2}}{{\sigma}_{1}}\rho [{\mu}_{1}-({\mu}_{1}+{\sigma}_{1}{\mu}_{z1\ast})]={\mu}_{2}.$$

So ${\widehat{\mu}}_{2}$ is consistent. It follows from (4c) and (13) to (16) that

$${\widehat{\sigma}}_{22}\stackrel{\mathit{wp}1}{\to}{\sigma}_{22}[{\rho}^{2}{\sigma}_{z11\ast}+(1-{\rho}^{2})]+\frac{{\sigma}_{22}{\rho}^{2}}{{\sigma}_{11}}({\sigma}_{11}-{\sigma}_{11}{\sigma}_{z11\ast})={\sigma}_{22}.$$

So ${\widehat{\sigma}}_{22}$ is also consistent. It follows from (4d), (13) and (16) that

$${\widehat{\sigma}}_{12}\stackrel{\mathit{wp}1}{\to}\frac{{\sigma}_{2}}{{\sigma}_{1}}\rho {\sigma}_{11}={\sigma}_{12}.$$

So ${\widehat{\sigma}}_{12}$ is, again, consistent.

Notice that the *g*(·,·) or *h*(·) in (7) can be any function of the observed data. Thus, the normal-distribution-based pseudo MLEs are consistent under any MAR mechanism of the form (7).
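The whole argument can be illustrated numerically (our sketch, not part of the paper): generate nonnormal data from (5), delete *x*_{2} by a logistic MAR rule depending on the observed *x*_{1}, and compute the normal-theory MLEs (4a)–(4d). The estimates land near the true values even though the naive complete-case mean of *x*_{2} does not.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200000
mu1, mu2, s1, s2, rho = 0.5, 1.0, 1.0, 2.0, 0.6

# Nonnormal population satisfying (5): standardized exponential z's
z1 = rng.exponential(size=N) - 1.0
z2 = rng.exponential(size=N) - 1.0
x1 = mu1 + s1 * z1
x2 = mu2 + s2 * (rho * z1 + np.sqrt(1 - rho**2) * z2)

# MAR: delete x2 with logistic probability depending on the observed x1
p_miss = 1.0 / (1.0 + np.exp(-(x1 - 1.0)))
obs = rng.uniform(size=N) >= p_miss        # True where x2 is observed

# Normal-theory MLEs (4a)-(4d), ignoring the mechanism
x1o, x2o = x1[obs], x2[obs]
mu1_h, s11_h = x1.mean(), x1.var()
b21 = np.cov(x2o, x1o, bias=True)[0, 1] / x1o.var()
mu2_h = x2o.mean() + b21 * (mu1_h - x1o.mean())
s22_h = x2o.var() + b21**2 * (s11_h - x1o.var())
s12_h = b21 * s11_h
```

Here ${\widehat{\mu}}_{2}$ stays near μ_{2} = 1, ${\widehat{\sigma}}_{22}$ near 4 and ${\widehat{\sigma}}_{12}$ near 1.2, while the complete-case mean of *x*_{2} is visibly biased downward, in line with the derivation above.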

**3. Consistency in general**

Parallel to (5), let

$$\mathrm{x}={({x}_{1},{x}_{2},\dots ,{x}_{q})}^{\prime}=\mu +\mathrm{Az},$$

(17)

where μ = (μ_{1}, μ_{2}, …, μ_{q})′,

$$\mathrm{A}=\left(\begin{array}{ccccc}{a}_{11}& 0& 0& \cdots & 0\\ {a}_{21}& {a}_{22}& 0& \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {a}_{q1}& {a}_{q2}& {a}_{q3}& \cdots & {a}_{\mathit{qq}}\end{array}\right)$$

and satisfies Σ = **AA**′, and **z** = (*z*_{1}, *z*_{2}, …, *z*_{q})′ with *z*_{1}, *z*_{2}, …, *z _{q}* being independent and standardized random variables. Then *E*(**x**) = μ and Cov(**x**) = Σ regardless of the distributions of *z*_{1}, *z*_{2}, …, *z _{q}*.
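A lower triangular **A** with Σ = **AA**′ always exists for a positive definite Σ: it is the Cholesky factor. A minimal numerical check (the Σ below is an arbitrary example of ours):

```python
import numpy as np

# Any positive definite Sigma factors as Sigma = A A' with A lower
# triangular: the Cholesky factor plays the role of A in (17).
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 2.0, 0.5],
                  [0.8, 0.5, 1.0]])
A = np.linalg.cholesky(Sigma)      # lower triangular by convention
assert np.allclose(A @ A.T, Sigma)
assert np.allclose(A, np.tril(A))
```

With this **A**, **x** = μ + **Az** has covariance matrix Σ for any independent standardized *z _{j}*'s.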

Let **x**_{1}, **x**_{2}, …, **x**_{N} be a random sample drawn from **x** with **x**_{i} = μ + **Az**_{i}, where **x**_{i} = (*x*_{i1}, *x*_{i2}, …, *x _{iq}*)′ and **z**_{i} = (*z*_{i1}, *z*_{i2}, …, *z _{iq}*)′. Let *r _{ij}* = 1 if *x _{ij}* is observed and *r _{ij}* = 0 if *x _{ij}* is missing, and let **x**_{io} and **x**_{im} denote the observed and missing parts of **x**_{i}. Suppose the missingness of *x _{ij}* depends only on observed variables among *x*_{i1}, *x*_{i2}, …, *x _{iL}*, so that

$$\begin{array}{cc}\hfill P({r}_{\mathit{ij}}=0\mid {\mathrm{x}}_{\mathit{io}};{\mathrm{x}}_{\mathit{im}})& =P({r}_{\mathit{ij}}=0\mid \{{x}_{i1},{x}_{i2},\dots ,{x}_{\mathit{iL}},\dots \};{\mathrm{x}}_{\mathit{im}})\hfill \\ \hfill & =P({r}_{\mathit{ij}}=0\mid \{{z}_{i1},{z}_{i2},\dots ,{z}_{\mathit{iL}},\dots \};{\mathrm{x}}_{\mathit{im}})\hfill \\ \hfill & =P({r}_{\mathit{ij}}=0\mid \{{z}_{i1},{z}_{i2},\dots ,{z}_{\mathit{iL}},\dots \})\hfill \\ \hfill & =P({r}_{\mathit{ij}}=0\mid {\mathrm{x}}_{\mathit{io}})\hfill \\ \hfill & ={g}_{j}({\mathrm{x}}_{\mathit{io}},{\gamma}_{j})={h}_{j}\left({\mathrm{z}}_{\mathit{iL}}\right).\hfill \end{array}$$

(18)

Thus, the probability of missingness depends only on the observed values, and the missing data mechanism is MAR (Rubin, 1976). Notice that **x**_{iL} = (*x*_{i1}, *x*_{i2}, …, *x _{iL}*)′ is a subset of the observed **x**_{io}, and, because **A** is lower triangular, **x**_{iL} and **z**_{iL} = (*z*_{i1}, *z*_{i2}, …, *z _{iL}*)′ determine each other.

Unlike the problems considered in the previous section, the MLEs do not possess analytical forms when the observed data patterns are not monotonic. So we cannot directly show that the MLEs are consistent as was done in the previous section. We cannot use the established theory of maximum likelihood as in Rubin (1976) either, because the MLEs are obtained based on an incorrect likelihood function. By showing that the normal estimating equation is unbiased at the true population values, Yuan (2009) proved that the normal-distribution-based MLEs are consistent when the missingness of **x**_{im} is due to an interval selection model on the observed variables; essentially the same argument establishes their consistency under the more general mechanism in (18).

**4. Conclusions**

It has been argued that, in any statistical modeling, the distribution specification is at best only an approximation to the real world (Box, 1979). Thus, all MLEs in practice are pseudo MLEs. In the context of missing data, it is nice to know that pseudo MLEs can remain consistent when (17) and (18) hold. We need to note that the data model (17) does not include nonnormal distributions created by nonlinear functions of the independent random variables *z*_{1}, *z*_{2}, …, *z _{q}*, although it includes an infinite number of nonnormal distributions. Yuan (2009) describes an example in which the MLEs are not consistent when **x** depends on the *z _{j}*'s nonlinearly.

To allow missingness to depend on all linear combinations of the previously observed variables, we specified **A** as a lower triangular matrix in (17) so that (*z*_{1}, *z*_{2}, …, *z*_{L}) and (*x*_{1}, *x*_{2}, …, *x*_{L}) determine each other. In practice, a participant may join the study after missing a few occasions and then be missing again. The missingness at the later stage may depend on all the previously observed variables. Such a case can be matched with (17) by specifying that the rows of **A** corresponding to the observed variables form the upper-left part of a lower triangular matrix; the consistency result then still holds.

As with a general MAR missing data mechanism, the MAR condition in (18) cannot be tested. Without extra information beyond the observed sample, it is impossible to distinguish between MAR and NMAR mechanisms (Molenberghs et al., 2008). Similarly, the data model (17) cannot be tested either because the distribution of **z** is arbitrary.

**Acknowledgments**

We would like to thank the editor, an associate editor, and a referee for comments that led to a significant improvement of the paper.

This research was supported by Grants DA00017 and DA01070 from the National Institute on Drug Abuse and a grant from the National Natural Science Foundation of China (30870784).

**References**

- Allison PD. Missing data. Sage; Thousand Oaks, CA: 2001.
- Amemiya T. Regression analysis when the dependent variable is truncated normal. Econometrica. 1973;41:997–1016.
- Anderson TW. Maximum likelihood estimates for the multivariate normal distribution when some observations are missing. Journal of the American Statistical Association. 1957;52:200–203.
- Box GEP. Robustness in the strategy of scientific model building. In: Launer RL, Wilkinson GN, editors. Robustness in statistics. Academic Press; New York: 1979. pp. 201–236.
- Daniels MJ, Hogan JW. Missing data in longitudinal studies: Strategies for Bayesian modeling and sensitivity analysis. Chapman & Hall; Boca Raton, Florida: 2008.
- Geary RC. Testing for normality. Biometrika. 1947;34:209–242.
- Gourieroux C, Monfort A, Trognon A. Pseudo maximum likelihood methods: Theory. Econometrica. 1984;52:681–700.
- Heckman JJ. Sample selection bias as a specification error. Econometrica. 1979;47:153–161.
- Laird NM. Missing data in longitudinal studies. Statistics in Medicine. 1988;7:305–315.
- Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
- Little RJA. Pattern-mixture models for multivariate incomplete data. Journal of the American Statistical Association. 1993;88:125–134.
- Little RJA, Rubin DB. Statistical analysis with missing data. 2nd ed. Wiley; New York: 2002.
- Micceri T. The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin. 1989;105:156–166.
- Molenberghs G, Beunckens C, Sotto C, Kenward MG. Every missing not at random model has got a missing at random counterpart with equal fit. Journal of the Royal Statistical Society B. 2008;70:371–388.
- Molenberghs G, Kenward MG. Missing data in clinical studies. Wiley; Chichester, England: 2007.
- Rotnitzky A, Wypij D. A note on the bias of estimators with missing data. Biometrics. 1994;50:1163–1170.
- Rubin DB. Inference and missing data (with discussion). Biometrika. 1976;63:581–592.
- Schafer JL. Analysis of incomplete multivariate data. Chapman & Hall; London: 1997.
- White H. Maximum likelihood estimation of misspecified models. Econometrica. 1982;50:1–25.
- Yuan K-H. Normal distribution based pseudo ML for missing data: With applications to mean and covariance structure analysis. Journal of Multivariate Analysis. 2009;100:1900–1918.
- Yuan K-H, Lu L. SEM with missing data and unknown population distributions using two-stage ML: Theory and its application. Multivariate Behavioral Research. 2008;43:621–652.
