Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3280824

Formats

Article sections

- Abstract
- 1. INTRODUCTION
- 2. MODELS AND METHODS
- 3. MARKOV CHAIN MONTE CARLO (MCMC)
- 4. NUMERICAL STUDIES
- 5. CONCLUSION
- References

Authors

Related links

J Am Stat Assoc. Author manuscript; available in PMC 2012 June 1.

Published in final edited form as:

PMCID: PMC3280824

NIHMSID: NIHMS299346

Mi-Ok Kim, Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229-3039;

Mi-Ok Kim: gro.cmhcc@mik.koim; Yunwen Yang: ude.sionilli@13gnay

The publisher's final edited version of this article is available at J Am Stat Assoc

See other articles in PMC that cite the published article.

We consider a random effects quantile regression analysis of clustered data and propose a semiparametric approach using empirical likelihood. The random regression coefficients are assumed independent with a common mean, following parametrically specified distributions. The common mean corresponds to the population-average effects of explanatory variables on the conditional quantile of interest, while the random coefficients represent cluster specific deviations in the covariate effects. We formulate the estimation of the random coefficients as an estimating equations problem and use empirical likelihood to incorporate the parametric likelihood of the random coefficients. A likelihood-like statistical criterion function is yield, which we show is asymptotically concave in a neighborhood of the true parameter value and motivates its maximizer as a natural estimator. We use Markov Chain Monte Carlo (MCMC) samplers in the Bayesian framework, and propose the resulting quasi-posterior mean as an estimator. We show that the proposed estimator of the population-level parameter is asymptotically normal and the estimators of the random coefficients are shrunk toward the population-level parameter in the first order asymptotic sense. These asymptotic results do not require Gaussian random effects, and the empirical likelihood based likelihood-like criterion function is free of parameters related to the error densities. This makes the proposed approach both flexible and computationally simple. We illustrate the methodology with two real data examples.

We extend quantile regression (QR) method to a random effects analysis of clustered data in this paper. A dominant paradigm of clustered data analysis is a Gaussian structure where the random effects and random errors are both assumed identically distributed among themselves, following common Gaussian cumulative distributions respectively. Under this structure, however, the effects of the explanatory variables are assumed to affect only the location of the conditional distribution of the response. Such exclusive attention to the location is constraining as the analysis is often motivated by a natural heterogeneity among the clusters. Moreover the response variable is often analyzed on a transformed scale in practice to meet the rather rigid homoscedastic Gaussian distribution assumptions. The results are, however, less straightforward and difficult to interpret on the original scale. In addition the conditional mean analysis, although most popular, does not necessarily adequately address research questions of interest as shown in the Network for the Improvement for Addiction Treatment example below. QR can provide an alternative analytic tool. Conditional regression quantiles are invariant under monotonic transformation like quantiles in a univariate case and are modelled locally for their relationships with the explanatory variables with the relationships allowed to differ from one another.

The aforementioned features of QR has been well recognized with independent data. Methodologies are fully developed and the model is widely applied (see Koenker (2005) for an overview). For the analysis of clustered data, however, a limited number of approaches have been proposed. Jung (1996) considered fixed effects median regression and proposed a quasi-likelihood approach. Koenker (2004) considered a random intercept model and proposed a *l*_{1} penalty approach. The *l*_{1} penalty approach is less stringent in its assumptions than most other methods. While the results of the *l*_{1} penalty approach depend on the choice of a penalty parameter, inference of the fixed effects was not studied with an empirically chosen penalty parameter. The method also may not be applicable to more complex random effects model as it essentially treats random effects as parameters and the increasing dimensionality can be an issue. Geraci and Bottai (2007) assumed an asymmetric Laplace error distribution and proposed an expectation-maximization (EM) estimator. Assuming a common asymmetric Laplace distribution, their method constrains the errors to be not only homoscedastic but also have a mode at the median.

In the Bayesian analysis frame work, several parametric approaches have been proposed, similarly using asymmetric Laplace error densities and mostly for the analysis of independent data (e.g., Yu and Moyeed (2001)). Nonparametric approaches were proposed to avoid the restrictive parametric assumption (e.g., Hanson and Johnson (2002), Kottas and Gelfand (2001) and Kottas and Krnjajic (2009)). Although capturing more general forms of skewness and tail behaviors, these nonparametric approaches also restrict the error densities to necessarily have their modes at the quantile of interest. Reich et al. (2010) relaxed this restriction with an infinite mixture of quantile restricted two component Gaussian mixture densities. These nonparametric Bayesian approaches, however, essentially model the error densities, although avoiding parametrically specifying them. As a consequence they can accommodate the error heteroscedasticity only by correctly specifying its form parametrically in the model. This informative modeling requirement of the error heteroscedasticity is restrictive compared with the generality of independent data QR analysis methodology, and make the computation complex.

We propose a semiparametric approach that does not require modeling of the error densities, thereby realizing all the desirable features of the independent data QR analysis without computational complexity. We assume random regression coefficients which may not be necessarily identically distributed or Gaussian. The random coefficients have a common mean which corresponds to the population-average effects of the explanatory variables on the conditional quantile of interest. The random coefficients represent cluster specific deviations in the covariate effects. Under appropriate conditions discussed later the median regression coincides with the conditional mean, and appears as a robust alternative against outliers in the responses. We consider the estimation of the random effects as an estimating equations problem and use empirical likelihood (EL) to incorporate the parametric likelihood of the random effects. We yield a semiparametric likelihood-like criterion function, which we show is asymptotically concave in a neighborhood of the true parameter value and motivates the maximizer as a natural estimator. We use the Bayesian framework and Markov Chain Monte Carlo (MCMC) samplers for the computation.

Monte Carlo methods have been used for classical estimation problems under various settings where classical methods are handicapped by computational difficulties (see Tian et al. (2007) for an overview). Recent works include Chernozhukov and Hong (2003) and Tian et al. (2007). Chernozhukov and Hong (2003) particularly considered EL and censored QR for the analysis of independent data. In this paper we are concerned with random effects QR analysis and semiparametric likelihood-like criterion function.

A few works in the Bayesian literature also have considered likelihood-like or non-likelihood statistical criterion functions (e.g. Lavine (1995); Dunson et al. (2003); Dunson and Taylor (2005); Lazar (2003), Schennach (2005), Lancaster and Jun (2010)). They were motivated by the computational complexities entailed in nonparametric Bayesian methods and the difficulty of likelihood specification. Lazar (2003) and Lancaster and Jun (2010) specifically considered EL. Most of the works were concerned with independent data analysis with few exceptions. Dunson et al. (2003) considered median regression for a latent variable model with multiple surrogate outcomes under Gausian within-subject dependency structure. Yin (2009) used a quadratic likelihood-like function motivated by generalized estimating equations and proposed Bayesian generalized method of moments method.

The proposed method is similarly motivated: the aforementioned Bayesian nonparametric methods are complex in the computation and require an informative modeling of the error heteroscedasticity. This work does not require modeling of the error densities. EL is one choice that does not require modeling of the error densities. Any non-parametric likelihood such as exponentially tilted EL can be used instead. Using the MCMC sampler, the proposed method also does not require directly estimating the variance of the estimator for inference and is not subject to a known challenge of QR inference of estimating error densities at the quantile of interest. In this paper we provide large sample properties of the resulting quasi-posterior estimators and inference, being the first work clearly showing shrinkage of the random effects estimators toward the population average effect and the asymptotic normality of the population average effect estimator.

The remainder of this paper proceeds as follows. Section 2 formally defines the semi-parametric random effects quantile regression (REQR) estimator and provides their large sample properties. Section 3 describes MCMC methods. Section 4 provides empirical results including the analysis of two real data examples. Section 5 concludes. All the proofs are deferred to the Appendix.

For expository purposes we assume random design points in this section. If the design points are non-stochastic, the results we obtain in this section hold under appropriate conditions on the design sequences.

Let
${\{({y}_{ij},{\mathbf{x}}_{ij})\}}_{j=1}^{{m}_{i}}$, *i* = 1, ···, *n*, denote observations from *n* clusters with cluster size *m _{i}*. For each of the

$${Q}_{i,\tau}\{y\mid \mathbf{x},{\mathbf{b}}_{i}(\tau )\}={\mathbf{x}}^{\top}{\mathbf{b}}_{i}(\tau )\phantom{\rule{0.38889em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\tau \in (0,1),$$

(1)

where **x** Ξ_{i}*R ^{q}* is a random covariate that includes an intercept with a distribution function
${F}_{i}^{X}$, Ξ

- C.1For
*i*= 1, ···,*n*,**b**(_{i}*τ*) are*q*-dimensional random variables supported onwith a common mean_{i}(*β**τ*) and cluster-specific covariance matrix**Σ**(_{i}*τ*) that are positive definite.

Similar to the usual QR analysis for independent data (*m _{i}* = 1),

In order to compare the model with the conditional mean model, we rewrite (1) as

$${y}_{ij}={\mathbf{x}}_{ij}^{\top}\mathit{\beta}+{\mathbf{x}}_{ij}^{\top}{\mathbf{b}}_{i}^{\ast}+{e}_{ij},$$

(2)

where
${\mathbf{b}}_{i}^{\ast}$ are mean zero random effects that correspond to deviations in the covariate effects of the *τ*-th conditional quantiles in the individual clusters from the population-average effect, and *e _{ij}* are random errors with zero

If *τ* = 0.5 and *e _{ij}* have symmetric distributions with a finite mean, the median REQR is equal to the conditional mean:
${\mathbf{x}}_{ij}^{\top}{\mathbf{b}}_{i}(\tau )=E\{{y}_{ij}\mid {\mathbf{x}}_{ij},{\mathbf{b}}_{i}(\tau )\}$. Moreover, the averaged conditional median is the conditional mean of a marginal model:
${\mathbf{x}}_{ij}^{\top}\mathit{\beta}(\tau )=E({y}_{ij}\mid {\mathbf{x}}_{ij})$ for

It is well known with the conditional mean regression that integrating over the random effects produces a marginal model where the regression parameters retain their meaning. Such a relationship does not hold with the REQR. Model (1) is not generally related to a marginal QR model, *Q _{τ}* (

$$E[{F}_{i}\{-{\mathbf{x}}^{\top}{\mathbf{b}}_{i}^{\ast}(\tau )\mid \mathbf{x}\}\mid \mathbf{x}]=\tau \phantom{\rule{0.16667em}{0ex}}\text{uniformly}\phantom{\rule{0.16667em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{all}\phantom{\rule{0.16667em}{0ex}}\mathbf{x}\in \mathrm{\Xi}\phantom{\rule{0.16667em}{0ex}}\text{and}\phantom{\rule{0.16667em}{0ex}}1\le i\le n,$$

where *F _{i}*{·

Let Σ_{1}, ···, **Σ*** _{n}* be parameterized by a vector

- C.2
**b**is the unique solution to_{i}*E*[*ϕ*{_{τ}*y*−**x**^{⏉}**b**}_{i}**x**] = 0, where*ϕ*(_{τ}*t*) =*τ*−*I*_{[}_{t}_{<0]}, the derivative of the check function and the expectation is taken over the conditional distribution of*y*given**x**in the*i*-th cluster. - C.3
*F*(_{i}*t|***x**) has the density*f*(_{i}*t|***x**) that is differentiable at 0 with 0 <*f*(_{i}*t|***x**) < ∞ and ${sup}_{\mathbf{x}\in {\mathrm{\Xi}}_{i},\mid t\mid \phantom{\rule{0.16667em}{0ex}}<\epsilon}\mid {f}_{i}^{\prime}(t\mid \mathbf{x})\mid \phantom{\rule{0.16667em}{0ex}}<\xi $ for ξ > 0 and ε > 0. - C.4
*E*(**xx**^{⏉}) =**D**_{i}_{0}and*E*{*f*(0_{i}*|***x**)**xx**^{⏉}} =**D**_{i}_{1}in the*i*-th cluster for some positive definite matrix**D**_{i}_{0}and**D**_{i}_{1}, where the expectations are taken over**x**from ${F}_{i}^{X}$. - C.5Ξ
is uniformly bounded in_{i}*R*.^{q}

Under conditions C.2–C.4 standard QR method applied to each cluster separately produces estimators of the random coefficients * _{i}* with asymptotic variances
${m}_{i}^{-1}{\mathbf{Q}}_{i}$ where
${\mathbf{Q}}_{i}={\mathbf{D}}_{i1}^{-1}{\mathbf{D}}_{i0}{\mathbf{D}}_{i1}^{-1}$. We define

$${\mathbf{V}}_{i}={\mathbf{\sum}}_{i}+{m}_{i}^{-1}{\mathbf{Q}}_{i}.$$

(3)

Given **x**, **V*** _{i}* is the asymptotic variance of the the per cluster standard QR analysis estimator

We first define a semiparametric likelihood-like criterion function for the random coefficients **b*** _{i}*. Given

$$\underset{b\in {R}^{q}}{min}\sum _{j=1}^{{m}_{i}}{\rho}_{\tau}({y}_{ij}-{\mathbf{x}}_{ij}^{\top}\mathbf{b}),$$

where *ρ _{τ}* (·) is the check function. As well known,

$${m}_{i}^{-1/2}\sum _{j=1}^{{m}_{i}}{\varphi}_{\tau}({y}_{ij}-{\mathbf{x}}_{ij}^{\top}\mathbf{b}){\mathbf{x}}_{ij}\approx 0.$$

This estimating equations formulation motivates the below empirical likelihood (EL) for the coefficient **b*** _{i}* in the

$${L}_{{m}_{i}}(\mathbf{b})=max\left\{\prod _{j=1}^{{m}_{i}}{p}_{j}\mid \sum _{j=1}^{{m}_{i}}{p}_{j}{\varphi}_{\tau}({y}_{ij}-{\mathbf{x}}_{ij}^{\top}\mathbf{b}){\mathbf{x}}_{ij}=0,\sum _{j=1}^{{m}_{i}}{p}_{j}=1,0\le {p}_{j}\le 1\right\},$$

where *p _{j}* denote weights for the

We let *g _{i}*(·

$$\stackrel{\sim}{{L}_{{m}_{i}}}(\mathbf{b}\mid \mathit{\theta})={L}_{{m}_{i}}(\mathbf{b}){g}_{i}(\mathbf{b}\mid \mathit{\theta}).$$

(4)

By the independence among the clusters a semiparametric likelihood-like criterion function for ** θ** is given by

$$\prod _{i=1}^{n}{\int}_{{\mathbf{b}}_{i}\in {\mathrm{\Upsilon}}_{i}}\stackrel{\sim}{{L}_{{m}_{i}}}({\mathbf{b}}_{i}\mid \mathit{\theta})\phantom{\rule{0.16667em}{0ex}}d{\mathbf{b}}_{i}.$$

Under some regularity conditions, the above integral admits a quadratic form of the first order expansion on the log scale centered at *θ*_{0} in a *O* (*n*^{−1/2}) neighborhood of *θ*_{0}, while
$\stackrel{\sim}{{L}_{{m}_{i}}}({\mathbf{b}}_{i}\mid \mathit{\theta})$ is asymptotically equivalent to a multivariate normal density (see (A.6) and (A.7) in the Appendix). This motivates the following maximizers as natural estimators:

$${\overline{\mathbf{b}}}_{{m}_{i}}={\text{argmax}}_{{\mathbf{b}\in \mathrm{\Upsilon}}_{i}}\stackrel{\sim}{{L}_{{m}_{i}}}({\mathbf{b}}_{i}\mid {\overline{\mathit{\theta}}}_{n}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\overline{\mathit{\theta}}}_{n}={\text{argmax}}_{\mathit{\theta}\in \mathrm{\Theta}}\prod _{i=1}^{n}{\int}_{{\mathbf{b}}_{i}\in {\mathrm{\Upsilon}}_{i}}\stackrel{\sim}{{L}_{{m}_{i}}}({\mathbf{b}}_{i}\mid \mathit{\theta})\phantom{\rule{0.16667em}{0ex}}d{\mathbf{b}}_{i}.$$

A direct computation of the estimators is daunting. We instead use the Bayesian framework and MCMC samplers.

Let *π*(** θ**) denote a non-informative prior appropriately defined for

$${\omega}_{n}(\mathit{\theta},{\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n})=\prod _{i=1}^{n}\stackrel{\sim}{{L}_{{m}_{i}}}({\mathbf{b}}_{i}\mid \mathit{\theta})\pi (\mathit{\theta}).$$

We consider a quasi-posterior defined by *ω _{n}*(

$${\stackrel{\sim}{\omega}}_{n}(\mathit{\theta},{\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n})=\frac{{\omega}_{n}(\mathit{\theta},{\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n})}{{\int}_{\mathit{\theta}\in \mathrm{\Theta}}{\int}_{{\mathbf{b}}_{1}\in {\mathrm{\Upsilon}}_{1}}\cdots {\int}_{{\mathbf{b}}_{n}\in {\mathrm{\Upsilon}}_{n}}{\omega}_{n}(\mathit{\theta},{\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n})\phantom{\rule{0.16667em}{0ex}}d{\mathbf{b}}_{n}\cdots d{\mathbf{b}}_{1}\phantom{\rule{0.16667em}{0ex}}d\mathit{\theta}}.$$

(5)

Improper priors may not guarantee a proper posterior and Yang and He (2011) specifically advise avoiding flat priors with their Bayesian EL approach. With the proposed REQR method, however, improper priors will guarantee a proper posterior, if they produce a proper posterior in relation to *g _{i}*(

$${\int}_{\mathit{\theta}\in \mathrm{\Theta}}{\int}_{{\mathbf{b}}_{1}\in {\mathrm{\Upsilon}}_{1}}\cdots {\int}_{{\mathbf{b}}_{n}\in {\mathrm{\Upsilon}}_{n}}{\omega}_{n}(\mathit{\theta},{\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n})\phantom{\rule{0.16667em}{0ex}}d{\mathbf{b}}_{n}\cdots d{\mathbf{b}}_{1}\phantom{\rule{0.16667em}{0ex}}d\mathit{\theta}\le {\int}_{\mathit{\theta}\in \mathrm{\Theta}}{\int}_{{\mathbf{b}}_{1}\in {\mathrm{\Upsilon}}_{1}}\cdots {\int}_{{\mathbf{b}}_{n}\in {\mathrm{\Upsilon}}_{n}}\prod _{i=1}^{n}{g}_{i}({\mathbf{b}}_{i}\mid \mathit{\theta})\pi (\mathit{\theta})\phantom{\rule{0.16667em}{0ex}}d{\mathbf{b}}_{n}\cdots d{\mathbf{b}}_{1}\phantom{\rule{0.16667em}{0ex}}d\mathit{\theta},$$

where the second integral can be finite with a certain choice of improper *π*(** θ**) specific to

$$\mathbf{\sum}\mid {\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n}\sim {IW}_{n-1}({\mathbf{S}}_{n}),\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\mathit{\beta}\mid \mathbf{\sum},{\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n}\sim {N}_{(q)}\left({n}^{-1}\sum _{i=1}^{n}{\mathbf{b}}_{i},\phantom{\rule{0.16667em}{0ex}}{n}^{-1}\mathbf{\sum}\right),$$

(6)

where
${\mathbf{S}}_{n}={\sum}_{i=1}^{n}({\mathbf{b}}_{i}-{n}^{-1}{\sum}_{i=1}^{n}{\mathbf{b}}_{i})\phantom{\rule{0.16667em}{0ex}}{({\mathbf{b}}_{i}-{n}^{-1}{\sum}_{i=1}^{n}{\mathbf{b}}_{i})}^{\top}\phantom{\rule{0.16667em}{0ex}}$ and *IW _{n}*

We let * _{ni}*(

$${\stackrel{\sim}{\mathbf{b}}}_{{m}_{i}}={\int}_{\mathbf{b}\in \mathrm{\Upsilon}}\mathbf{b}\phantom{\rule{0.16667em}{0ex}}{\stackrel{\sim}{\omega}}_{ni}(\mathbf{b})\phantom{\rule{0.16667em}{0ex}}d\mathbf{b},\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}\phantom{\rule{0.38889em}{0ex}}{\stackrel{\sim}{\mathit{\theta}}}_{n}={\int}_{\mathit{\theta}\in \mathrm{\Theta}}\mathit{\theta}\phantom{\rule{0.16667em}{0ex}}{\stackrel{\sim}{\omega}}_{n}(\mathit{\theta})\phantom{\rule{0.16667em}{0ex}}d\mathit{\theta}.$$

(7)

We consider the following specific conditions in addition to Condition C.1.

- C.1.(i) For 1≤
*i*≤*n*,*g*(·|_{i}) =*η**g*(·|) with*η**η*_{0}denoting the true value. - C.1.(ii)
**b**have normal densities with not necessarily identical_{i}**Σ**. The heterogeneity among_{i}**Σ**is limited such that_{i}is a finite vector.*θ*

The requirements of condition C.1.(i) are comparable to those required for a general form of the Bayesian equivalent of the central limit theorem to hold for the posterior distribution of ** η** when standard Bayesian method is applied to
${\{{\mathbf{b}}_{i}\}}_{i=1}^{n}$, if

We first present asymptotic results with the normal densities. We define
${\mathbf{V}}_{i0}={\mathbf{\sum}}_{i0}+{m}_{i}^{-1}{\mathbf{Q}}_{i}$ following the definition of **V*** _{i}* in (3).

Assume condition C.1, C.1.(ii) and C.2–C.5.

*As m*→ ∞, ${\stackrel{\sim}{\mathbf{b}}}_{{m}_{i}}=\{\mathbf{I}-({m}_{i}^{-1}{\mathbf{Q}}_{i}){\mathbf{V}}_{i0}^{-1}\}{\widehat{\mathbf{b}}}_{i}+({m}_{i}^{-1}{\mathbf{Q}}_{i}){\mathbf{V}}_{i0}^{-1}{\stackrel{\sim}{\mathit{\beta}}}_{n}+{o}_{p}({m}_{i}^{-1/2})$_{i}*for all*1 ≤*i*≤*n. Moreover*, $\mathit{Var}({\stackrel{\sim}{\mathbf{b}}}_{{m}_{i}})=\{\mathbf{I}-({m}_{i}^{-1}{\mathbf{Q}}_{i}){\mathbf{V}}_{i0}^{-1}\}\phantom{\rule{0.16667em}{0ex}}({m}_{i}^{-1}{\mathbf{Q}}_{i})+{o}_{p}({m}_{i}^{-1})$.*If m*→ ∞_{i}*for all*1 ≤*i*≤*n and n*→ ∞,*then*${\stackrel{\sim}{\mathit{\beta}}}_{n}={n}^{-1}{\sum}_{i=1}^{n}({\overline{\mathit{V}}}_{n}{\mathbf{V}}_{i0}^{-1})\phantom{\rule{0.16667em}{0ex}}{\widehat{\mathbf{b}}}_{i}+{o}_{p}({n}^{-1/2})$,*where*${\overline{\mathit{V}}}_{n}={\{{n}^{-1}{\sum}_{i=1}^{n}{\mathbf{V}}_{i0}^{-1}\}}^{-1}$. Also n^{1/2}(−_{n}_{0}) ~*AN*_{(}_{q}_{)}(0,)_{n}*Suppose***Σ**=_{i}**Σ***and*_{n}is accordingly defined by the posterior mean._{n}*Consider matrix operator vec*(·)*that takes a matrix and forms a large column vector by stacking the columns*,*and its generalization vech*(·)*for a symmetric matrix*,*respectively described in*Henderson and Searle (1979).*If m*→ ∞_{i}*for all*1 ≤*i*≤*n and n*→ ∞,*then*$${n}^{1/2}\{\mathit{vech}({\mathbf{\sum}^{\sim}}_{n})-\mathit{vech}({\mathbf{\sum}}_{0})\}\sim {AN}_{(q(q+1)/2)}\left(\mathbf{0},2{\left\{{\mathbf{K}}^{\top}\left({n}^{-1}\sum _{i=1}^{n}{\mathbf{V}}_{i0}^{-1}\otimes {\mathbf{V}}_{i0}^{-1}\right)\mathbf{K}\right\}}^{-1}\right),$$*where**K**is a unique matrix given by the relationship vec*(**M**) =**K***vech*(**M**)*for any symmetric matrix***M***of the same dimension as***Σ***and**represents the Kronecker product.*

The first result of the theorem shows shrinkage in the REQR estimators that is comparable to the conditional mean regression: given **x**, the posterior means **x**^{⏉}_{mi} are weighted averages of the population averaged profile **x**^{⏉}* _{n}* and the within cluster QR estimates

From (ii) we note that * _{n}* is a weighted average of the per cluster estimators

In practice we have Markov chains
$\mathbf{S}({\mathbf{b}}_{i})=({\mathbf{b}}_{i}^{(1)},{\mathbf{b}}_{i}^{(2)},\cdots ,{\mathbf{b}}_{i}^{(B)})$, *i* = 1, ···, *n*, and **S**(** θ**) = (

Assume condition C.1, C.1.(ii) and C.2–C.5. For any α (0, 1),

- lim
_{mi→∞}*P*[*c*_{k}_{,}{_{n}**S**(**b**),_{i}*α*/2} ≤**b**≤_{i}*c*_{k}_{,}{_{n}**S**(**b**), 1 −_{i}*α*/2}] = 1 −*α for k*= 1, 2*and*1 ≤*i*≤*n*. - lim
_{n}_{→∞}*P*[*c*_{k}_{,}{_{n}**S**(),*θ**α*/2} ≤*θ*_{0}≤*c*_{k}_{,}{_{n}**S**(), 1 −*θ**α*/2}] = 1 −*α for k*= 1, 2.

The results similarly hold for the inference of *v*(**b*** _{i}*) or

Here each
$\stackrel{\sim}{{L}_{{m}_{i}}}({\mathbf{b}}_{i}\mid \mathit{\theta})$ can be seen as a posterior density with *g _{i}*(

Under the non-normal density condition of C.1.(i), we define **V**_{i}_{0} with **Σ**_{i}_{0} = **Λ**(*η*_{0}) where **Λ**(** η**) is given in Lemma A.3 in the Appendix.

Assume Conditions C.1, C.1.(i) and C.2–C.5.

*As m*→ ∞, ${\stackrel{\sim}{\mathbf{b}}}_{{m}_{i}}=\{\mathbf{I}-({m}_{i}^{-1}{\mathbf{Q}}_{i}){\mathbf{V}}_{i0}^{-1}\}{\widehat{\mathbf{b}}}_{i}+({m}_{i}^{-1}{\mathbf{Q}}_{i}){\mathbf{V}}_{i0}^{-1}{\stackrel{\sim}{\mathit{\beta}}}_{n}+{o}_{p}({m}_{i}^{-1/2})$_{i}*for all*1 ≤*i*≤*n. Moreover*, $\mathit{Var}({\stackrel{\sim}{\mathbf{b}}}_{{m}_{i}})=\{\mathbf{I}-({m}_{i}^{-1}{\mathbf{Q}}_{i}){\mathbf{V}}_{i0}^{-1}\}({m}_{i}^{-1}{\mathbf{Q}}_{i})+{o}_{p}({m}_{i}^{-1})$.*If n*→ ∞*along with m*, 1 ≤_{i}*i*≤*n*,*such that*(min_{1≤}_{i}_{≤})_{n}m_{i}*n*^{−1/2}→ ∞,*then*${n}^{1/2}({\stackrel{\sim}{\mathit{\theta}}}_{n}-{\mathit{\theta}}_{0})\sim AN\left(0,{\left[\mathbf{H}({\mathit{\eta}}_{0}){\mathbf{J}}_{{\mathit{\eta}}_{0}}{\mathbf{\Omega}}_{{\mathit{\eta}}_{0}}^{-1}{\mathbf{J}}_{{\mathit{\eta}}_{0}}{\{\mathbf{H}({\mathit{\eta}}_{0})\}}^{\top}\right]}^{-1}\right)$,*where***H**(·)*is the Jacobian given in the Condition C.1.(i)*,*and***J**_{η0}and**Ω**_{η0}*are quantities related to the hessian and the score of*log(_{n})*η**defined in (*A.6*) in the*Appendix.

Inference results corresponding to Theorem 2 also holds accordingly.

We propose a simple combination of Metropolis-Hastings (M-H) and Gibbs algorithm. As **b*** _{i}* are independent given

$${\stackrel{\sim}{\omega}}_{n}({\mathbf{b}}_{i}\mid \mathit{\theta})\propto {\omega}_{n}(\mathit{\theta},{\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n})\propto \stackrel{\sim}{{L}_{{m}_{i}}}({\mathbf{b}}_{i}\mid \mathit{\theta})={L}_{{m}_{i}}({\mathbf{b}}_{i}){g}_{i}({\mathbf{b}}_{i}\mid \mathit{\theta})\phantom{\rule{0.38889em}{0ex}}\text{for}\phantom{\rule{0.16667em}{0ex}}\text{all}\phantom{\rule{0.16667em}{0ex}}1\le i\le n,$$

where denotes equivalence up to some constants. Many algorithms are available for the computation of *L _{mi}*(

$${\stackrel{\sim}{\omega}}_{n}(\mathit{\theta}\mid {\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n})=\frac{\stackrel{\sim}{\omega}(\mathit{\theta},{\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n})}{{\int}_{\mathit{\theta}\in \mathrm{\Theta}}\stackrel{\sim}{\omega}(\mathit{\theta},{\mathbf{b}}_{1},\cdots ,{\mathbf{b}}_{n})\phantom{\rule{0.16667em}{0ex}}d\mathit{\theta}}\propto \prod _{i=1}^{n}{g}_{i}({\mathbf{b}}_{i}\mid \mathit{\theta})\pi (\mathit{\theta}),$$

where the right hand side may have a readily available closed form as *g _{i}*(

All computations were conducted in R (R Development Core Team, 2006) with the empirical likelihood computed using el.cen.EM2 function in R package ‘emplik’. Proposal distributions for the Metropolis-Hastings algorithm were chosen to have acceptance rates in the range of 0.1 ~ 0.4.

Real data examples were analyzed with 25,000 MCMC samples, the first 5,000 discarded as burn-in, and the standard QR solutions as initial values. The random effects were assumed Gaussian. The convergence of the Markove chains was checked and found satisfactory by the methods recommended by Cowles and Carlin (1996). For example, based on the Z-score by Geweke (1992), we found that all Z-scores were not significant. Specifically for the Example 1 NIATx data, Z-scores took values of 1.217, −0.074, and −1.747 for ** β** = (

Trace plots of the Markov chains: the top panel contains the trace plots for *β*(0.90), population averaged regression coefficients of a second degree polynomial at *τ* = 0.90. The bottom panel contains the trace plots for corresponding random **...**

NIATx We analyze data from the Network for the Improvement for Addiction Treatment (NIATx) using a second degree polynomial REQR. The NIATx is a learning community of alcohol and other drug treatment centers. It uses process improvement strategies to treatment services and facilitates organizational change in the delivery of addition treatment services (Capoccia et al. (2007)). A cross-center evaluation monitored impacts on days between first contact and first treatment. In this paper we analyze data on the delay time from twenty-six programs that provided at least ten months of data. The majority of the centers (22/26) have samples larger than 150 with eighteen of them greater than 200. This allows modeling of high quantiles as shown below. We refer to McCarty et al. (2007) and Hoffman et al. (2008) for more information on the study design, methods and outcomes.

Centers that successfully used process improvement reduced delay times and the reduction was more apparent the longer they implemented process improvements. The decreasing trend in the delay time, however, is not adequately summarized by the conditional mean. A small or zero improvement is expected in the lower tail of the distribution where delays were previously short. In the upper tail where delays were long, however, a more pronounced improvement is expected. The upper half of the data above the median are particularly interesting as the process improvement strategies were anticipated to reduce the spread of the data towards the median in the upper half. This justifies REQR analysis of the data. The distribution of delay times is right skewed (see examples in Figure 2) and analyzed on the log scale. We include admission time of each patient (in month since the beginning of NIATx participation) as a covariate and consider a change in the delay time as a second degree polynomial function of the admission time. For comparison we analyze the data for changes in the conditional mean delay time using restricted maximum likelihood (REML) method. We also conduct QR analysis separately for each center using the standard independent data QR method.

Scatter plots of days between first contact and first treatment against month of admission for four centers: *m* indicate per center sample sizes and supper imposed curves represent regression fits by three different methods: REQR, REML and standard independent **...**

Figure 2 shows the regression results for four centers. Increasing heterogeneity is observed among the centers at higher quantiles. The increasing between center heterogeneity is more clearly depicted in Figure 3. If the success of the process improvement strategies is reflected in decreasing delay time, this suggests that successful centers differe from unsuccessful ones more in the upper quantiles than in the medians. Center 24 is an example where the conditional mean analysis found no significant change, while the proposed REQR method found significant trends locally in the high quantiles. Center 12 exhibits an ideal trend anticipated with the process improvement strategies. The standard independent data QR method is known to sometimes produce crossing quantile fits in a small sample and Center 73 is one such case. Quantiles estimated by the proposed REQR, however, are not crossed. The REQR analysis pools information across the centers and the borrowed information offsets the small sample size.

Figure 4 presents point and interval estimates of the random coefficients of the linear term in the second degree polynomial 0.9-th quantile curves by the independent data QR and proposed RERQ method. Horizontal lines are respectively the population average effect estimate by the REQR method and a simple average of the per center estimates by the independent data QR method. The point estimates by the RERQ method are shrunken toward the population average effect estimate, and the interval estimates by the proposed RERQ method tend to be shorter than those by the standard independent QR method. This supports the asymptotic results of Theorem 1-(i).

Swallow Study In this section we analyze apnea duration data from swallow study previously analyzed in Reich et al. (2010) and reference therein. Apnea duration is the period of nasal airflow cessation during swallowing. Longer durations often indicate age-related function changes for seniors. The duration was measured repeatedly under different experimental conditions defined by feeding condition (examiner- versus self-fed), viscosity (liquild or pudding), and volume (5ml or 10ml). Similar to Reich et al. (2010), we include indicators of the three experimental conditions and their two-way interactions as covariates and analyze a total of 1286 observations measured on twenty-three elderly women of age seventy-five or older, each with fifty-five or fifty-six observations.

The distribution of apnea duration times remains right-skewed after a log transformation and the spread of the data also varies widely across the subjects (see examples in Figure 5). Figure 6 shows that the data distributions under different experimental conditions differ more in the right tail than in the middle. These justify the proposed REQR analysis. We analyze the data for the conditional median and 0.9-th quantile. For comparison we conduct standard QR analysis that assumes all observations are independent.

Boxplots of apnea duration times in seconds on the log scale: Wide boxes represent subjects being fed by the examiners, and narrow boxes represent self-feeding. Shaded boxes are pudding and white boxes are liquid. The horizontal lines are the sample quantiles **...**

Table 1 provides a summary. The REQR results agree well with the boxplot presentation of the data in Figure 6. The significant viscosity and viscosity by feeding condition interaction estimates suggest that swallowing liquid took significantly longer time, particularly when the subjects were fed by examiners. This is indicated by the higher median and 0.9-th sample quantile lines in the white boxes than in the shaded boxes across the experimental conditions, while the differences much bigger in the comparison of the wide white boxes with the wide shaded boxes. The non-significant volume and significant viscosity by volume interaction estimates suggest that swallowing the larger quantity did not necessarily take significantly more time, while the difference in the duration time by the volume was significantly larger when liquid was swallowed compared to pudding. This is implied by the greater height differences in the median and 0.9-th sample quantile lines by volume among the white boxes than the shaded boxes in Figure 6. The estimated effects are much larger for the 0.9-th conditional quantile.

95% interval estimates for the apnea duration example: the REQR method is compared with the standard QR method that assumes independency for the within subject correlation. Intervals with asterisks do not included 0.

Compared to the independent data QR analysis, the REQR provides similar results, mostly agreeing on the significance of the effects implied by the interval estimates. However, the interval estimates tend to be much shorter with the REQR. The shorter intervals are expected as the independent data QR method ignores the within subject correlation in the data. The REQR also finds one more significant effect in the 0.9-th conditional quantile analysis.

We now compare our results with those of Reich et al. (2010). The results are notably different concerning the effects of feeding condition and viscosity by feeding interaction in the 0.9-th conditional quantile analysis. Contrary to our results, Reich et al. (2010) found the feeding condition effect significant, while the viscosity by feeding interaction effect non-significant. The two results are not directly comparable as Reich et al. (2010) considered the population quantiles using the below marginal model:

$${y}_{ij}={\mathbf{x}}_{ij}^{\top}\mathit{\eta}(\tau )+{\mathbf{x}}_{ij}^{\top}\mathit{\gamma}({\alpha}_{i}(\tau )+{e}_{ij}),$$

where *α _{i}*(

We use the below model and study the performance of the proposed REQR method by Monte-Carlo simulation:

$${y}_{ij}={b}_{0i}+{b}_{1i}\times {z}_{ij}+{e}_{ij},i=1,\cdots ,n;j=1,\cdots ,m,$$

where *b*_{0}* _{i}* and

4,000 MCMC samples were drawn per replicated data and the first 1,000 were discarded as burn-in. We randomly sampled values from *N*_{(1)}(0, 1) and used them as initial values. In each simulation setting we examined the convergence of the Markove chains for the first 10 replicated datasets and found the convergence satisfactory similarly by the methods recommended by Cowles and Carlin (1996). We provide examples of trace plots in Figure 7.

Trace plots of the Markov chains: the top panel contains the trace plots of the random intercepts and the bottom panel slopes respectively of 4 clusters in a simulation setting with homogeneous errors from t distribution with 2 degree of freedom. For **...**

The results for the medians (*τ* = 0.5) are summarized and compared with the results of the restricted maximum likelihood (REML) method in Table 2 and and3.3. When the ho-moscedastic Gaussian error assumption of the REML is met, the REQR performed slightly worse in a small sample (*m* = 10) yet comparably in a moderately large sample (*m* = 20). The REQR and REML results are not directly comparable about the intercept when the error distribution is log normal, as they measure different quantities. With this exception, the proposed method outperformed the REML, particularly when inference is concerned with the variances. In a small sample (*m* = 10) the REML reported 0 for the variances of the random effects particularly in an alarmingly large number of cases for the heteroscedastic errors. With the 0 random effects variance estimates, the error variance estimate is inflated. This leads to a poor coverage for the variance parameters and an over-coverage and extended confidence intervals for the mean parameters ** β**. The REQR performed overall decently in a small sample (

Monte-Carlo study results for *τ* = 0.5 with *n* = 20 and *m* = 10 based on 300 replicated datasets: *MSE** indicates the ratio of mean squared errors (MSE) for REML method over MSE for the REQR method. *AL*** similarly indicates the ratio of average interval **...**

Table 4 shows that the proposed method performed reasonably well for the 0.9-th quantile in a small sample with homoscedastic errors yet only for the slope parameters with heteroscedastic errors. Increase in the sample size did not improve the performance much for the intercept parameter. The non-normal distributions under consideration are extreme examples and represent challenging cases under the heteroscedastic setup. Moreover the simulation setups are not favorable with small between cluster variabilities relative to the within cluster variabilities. The high quantile under consideration compounded the difficulty. The performance improved when the between cluster variability increased. For example, when *b*_{0}* _{i}* ~

In this paper we extend QR method to a random effects analysis of clustered data, while fully preserving the features of the standard independent data QR method and admitting non-Gaussian random effects. The parameterization of the proposed REQR model is intuitive. The mean of the cluster specific random coefficients correspond to the population-average effects of the explanatory variables on the conditional quantile of interest. The discrepancy between the random coefficients and their mean corresponds to the deviation in the explanatory variable effects in the individual clusters from the population average. With symmetric error distributions with a finite mean, the median REQR coincides with the conditional mean and can be seen as a robust alternative that performs comparably when the homoscedastic Gaussian assumptions are met and outperforms in other cases. The proposed approach also avoids modeling the error distributions entirely and is computationally simpler than existing nonparametric Bayesian methods.

The authors thank Drs. Dongseok Choi and Dennis McCarty in the Department of Public Health and Preventive Medicine at the Oregon Health & Science University for sharing the Network for the Improvement for Addiction Treatment (NIATx) data. The authors thank the editor, the associate editor and two reviewers for their insightful comments. This research is supported in part by the National Institutes of Health (1R03CA133944-01). The collection of the NIATx data is supported by awards from the Robert Wood Johnson Foundation, the Center for Substance Abuse Treatment, and the National Institute on Drug Abuse (R01 DA018282; R01 DA020832).

We will omit *τ* from notations below wherever the omission does not cause confusion. We let
$\mathbf{z}({\mathbf{x}}_{ij},{y}_{ij},\mathbf{b})={\mathbf{x}}_{ij}{\varphi}_{\tau}({y}_{ij}-{\mathbf{x}}_{ij}^{\top}\mathbf{b})$, *l _{mi}*(

*Assume conditions C.2–C.5. For all* 1 ≤ *i* ≤ *n*, *given*
**b*** _{i}*,

- $\Vert {m}_{i}^{-1/2}{\sum}_{j=1}^{{m}_{i}}\mathbf{z}({\mathbf{x}}_{ij},{y}_{ij},{\widehat{\mathbf{b}}}_{i})\Vert ={o}_{p}(1)$.
- $\left|\right|{\sum}_{j=1}^{{m}_{i}}\{\mathbf{z}({\mathbf{x}}_{ij},{y}_{ij},\mathbf{b})-E\mathbf{z}({\mathbf{x}}_{ij},{y}_{ij},\mathbf{b})\}\left|\right|\phantom{\rule{0.16667em}{0ex}}={O}_{p}({m}_{i}^{1/2})$
*and*${m}_{i}^{-1}{\sum}_{j=1}^{{m}_{i}}\mathbf{z}({\mathbf{x}}_{ij},{y}_{ij},\mathbf{b}){\mathbf{z}}^{\top}({\mathbf{x}}_{ij},{y}_{ij},\mathbf{b})={\mathbf{D}}_{i0}+{o}_{p}(1)$,*uniformly for all***b***in an o*(1)-*neighborhood of***b**._{i} - ${m}_{i}^{-1/2}{\sum}_{j=1}^{{m}_{i}}\mathbf{z}({\mathbf{x}}_{ij},{y}_{ij},\mathbf{b})={m}_{i}^{-1/2}{\sum}_{j=1}^{{m}_{i}}\mathbf{z}({\mathbf{x}}_{ij},{y}_{ij},{\mathbf{b}}_{i})-{\mathbf{D}}_{i1}\{{m}_{i}^{1/2}(\mathbf{b}-{\mathbf{b}}_{i})\}+{O}_{p}({m}_{i}^{-1/2})$,
*uniformly for all***b***in an*$O({m}_{i}^{-1/2})$-*neighborhood of***b**._{i}

(i) follows from Lemma A.2 in Ruppert and Carroll (1980) and (ii)–(iii) follows from Lemma A.5 in Yang and He (2011).

*Assume conditions C.2–C.5. Given*
**b*** _{i}*,

$$\underset{{m}_{i}\to \infty}{liminf}P\{\underset{\left|\right|\mathbf{b}-{\mathbf{b}}_{i}\left|\right|\phantom{\rule{0.16667em}{0ex}}\ge {\delta}_{1}}{sup}{m}_{i}^{-1}\{{l}_{{m}_{i}}(\mathbf{b})-{l}_{{m}_{i}}({\mathbf{b}}_{i})\}\le -{\epsilon}_{1}\}=1.$$

(A.1)

Also for **b** in an open neighborhood of **b**_{i},
${l}_{{m}_{i}}(\mathbf{b})=-{\scriptstyle \frac{{m}_{i}}{2}}{(\mathbf{b}-{\widehat{\mathbf{b}}}_{i})}^{\top}{\mathbf{Q}}_{i}^{-1}(\mathbf{b}-{\widehat{\mathbf{b}}}_{i})+{R}_{{m}_{i}}(\mathbf{b})$, *where there exists a sufficiently small δ*_{2} > 0 *and large M* > 0 *for each ε*_{2} > 0 *such that*

$$\begin{array}{l}\underset{{m}_{i}\to \infty}{limsup}P\left\{\underset{{m}_{i}^{-1/2}M\le \left|\right|\mathbf{b}-{\mathbf{b}}_{i}\left|\right|\phantom{\rule{0.16667em}{0ex}}\le {\delta}_{2}}{sup}{({m}_{i}{\left|\right|\mathbf{b}-{\mathbf{b}}_{i}\left|\right|}^{2})}^{-1}\mid {R}_{{m}_{i}}(\mathbf{b})\mid \phantom{\rule{0.16667em}{0ex}}>{\epsilon}_{2}\right\}<{\epsilon}_{2},\\ \underset{{m}_{i}\to \infty}{limsup}P\left\{\underset{\left|\right|\mathbf{b}-{\mathbf{b}}_{i}\left|\right|\le {m}_{i}^{-1/2}M}{sup}\mid {R}_{{m}_{i}}(\mathbf{b})\mid \phantom{\rule{0.16667em}{0ex}}>{\epsilon}_{2}\right\}=0.\end{array}$$

(A.2)

With **b*** _{i}* replacing

Assume conditions C.1 and C.1.(i) for g(b|**η**). We define
${\{\mathbf{\Lambda}(\mathit{\eta})\}}^{-1}=-\{{\partial}^{2}logg/{(\partial \mathbf{b}{(\partial \mathbf{b})}^{\top})\mid}_{\mathbf{b}=\mathit{\beta}}\}$
*and*
${\stackrel{\sim}{\mathbf{Q}}}_{i}(\mathit{\eta})={[{\mathbf{Q}}_{i}^{-1}+{m}_{i}^{-1}{\{\mathbf{\Lambda}(\mathit{\eta})\}}^{-1}]}^{-1}$*. Also assume conditions C.2–C.5. If* (min_{1≤}_{i}_{≤}* _{n} m_{i}*)

$$log{\stackrel{\sim}{\omega}}_{n}(\mathit{\eta})=\sum _{i=1}^{n}logg({\widehat{\mathbf{b}}}_{i}\mid \mathit{\eta})+{R}_{n}(\mathit{\eta}),$$

(A.3)

*where similar results as in (*A.2*) hold for R _{n}*(

Under C.1.(i) it holds that for any **b**, **b**^{*} with ||**b** − **b**^{*}|| < *ε*,

$$logg(\mathbf{b}\mid \mathit{\eta})=logg({\mathbf{b}}^{\ast}\mid \mathit{\eta})-\frac{1}{2}{(\mathbf{b}-\mathit{\beta})}^{\top}{\{\mathbf{\Lambda}(\mathit{\eta})\}}^{-1}(\mathbf{b}-\mathit{\beta})+\frac{1}{2}{({\mathbf{b}}^{\ast}-\mathit{\beta})}^{\top}{\{\mathbf{\Lambda}(\mathit{\eta})\}}^{-1}({\mathbf{b}}^{\ast}-\mathit{\beta})+{(\mathbf{b}-{\mathbf{b}}^{\ast})}^{\top}\mathbf{r}(\mathbf{b},{\mathbf{b}}^{\ast},\mathit{\eta}),$$

(A.4)

where ||*E*{**r**(**b**, **b**^{*}, ** η**)}|| < ∞ and

$$\stackrel{\sim}{{l}_{{m}_{i}}}(\mathbf{b}\mid \mathit{\eta})={\upsilon}_{{m}_{i}}(\mathbf{b},{\widehat{\mathbf{b}}}_{i},\mathit{\eta})+{\rho}_{{m}_{i}}({\widehat{\mathbf{b}}}_{i},\mathit{\eta}),$$

(A.5)

where
${\upsilon}_{{m}_{i}}(\mathbf{b},{\widehat{\mathbf{b}}}_{i},\mathit{\eta})=log\phi (\mathbf{b}\mid {\widehat{\mathbf{b}}}_{i}^{\ast}(\mathit{\eta}),{m}_{i}^{-1}{\stackrel{\sim}{\mathbf{Q}}}_{i}(\mathit{\eta}))+{R}_{{m}_{i}}(\mathbf{b})+{(\mathbf{b}-{\mathbf{b}}^{\ast})}^{\top}\mathbf{r}(\mathbf{b},{\mathbf{b}}^{\ast},\mathit{\eta})$ and
${\rho}_{{m}_{i}}({\widehat{\mathbf{b}}}_{i},\mathit{\eta})=logg({\widehat{\mathbf{b}}}_{i}\mid \mathit{\eta})+{\scriptstyle \frac{1}{2{m}_{i}}}{({\widehat{\mathbf{b}}}_{i}-\mathit{\beta})}^{\top}{[\mathbf{\Lambda}(\mathit{\eta}){\{{\stackrel{\sim}{\mathbf{Q}}}_{i}(\mathit{\eta})\}}^{-1}\mathbf{\Lambda}(\mathit{\eta})]}^{-1}({\widehat{\mathbf{b}}}_{i}-\mathit{\beta})+{\scriptstyle \frac{1}{2}}log\mid {m}_{i}^{-1}{\stackrel{\sim}{\mathbf{Q}}}_{i}(\mathit{\eta})\mid $, and
${\widehat{\mathbf{b}}}_{i}^{\ast}(\mathit{\eta})={\stackrel{\sim}{\mathbf{Q}}}_{i}(\mathit{\eta})[{\mathbf{Q}}_{i}^{-1}{\widehat{\mathbf{b}}}_{i}+{m}_{i}^{-1}{\{\mathbf{\Lambda}(\mathit{\eta})\}}^{-1}\mathit{\beta}]$. As (min_{1≤}_{i}_{≤}* _{n} m_{i}*)

$$log{\stackrel{\sim}{\omega}}_{n}(\mathit{\eta})=\sum _{i=1}^{n}logg({\widehat{\mathbf{b}}}_{i}\mid \mathit{\eta})+\sum _{i=1}^{n}log{U}_{i}({\widehat{\mathbf{b}}}_{i},\mathit{\eta})+{o}_{p}(1),$$

where *U _{i}*(

$${U}_{i}({\widehat{\mathbf{b}}}_{i},\mathit{\eta})=\int \phi ({\mathit{\zeta}}_{i}\mid \mathbf{0},\mathbf{I})exp[{R}_{{m}_{i}}^{\ast}({\mathit{\zeta}}_{i},{\widehat{\mathbf{b}}}_{i})+{\{{m}_{i}^{-1/2}{\stackrel{\sim}{\mathbf{Q}}}_{i}^{1/2}(\mathit{\eta}){\mathit{\zeta}}_{i}+({\widehat{\mathbf{b}}}_{i}^{\ast}(\mathit{\eta})-{\widehat{\mathbf{b}}}_{i})\}}^{\top}{\mathbf{r}}^{\ast}({\mathit{\zeta}}_{i},{\widehat{\mathbf{b}}}_{i},\mathit{\eta})]d{\mathit{\zeta}}_{i}.$$

We note that |*U _{i}*(

Proofs rely on the factorization of the likelihood like function $\stackrel{\sim}{{l}_{{m}_{i}}}({\mathbf{b}}_{i}\mid \mathit{\eta})$ in an asymptotic sense. We sketch the proofs below.

We prove the result (ii) first. We let

$${\mathbf{\Delta}}_{{\mathit{\eta}}_{0}}=\left\{{\partial \sum _{i=1}^{n}logg({\widehat{\mathbf{b}}}_{i}|\mathit{\eta})/\partial \mathit{\eta}|}_{\mathit{\eta}={\mathit{\eta}}_{0}}\right\},\phantom{\rule{0.38889em}{0ex}}n{\mathbf{J}}_{{\mathit{\eta}}_{0}}=-E\left\{{{\partial}^{2}\sum _{i=1}^{n}logg({\widehat{\mathbf{b}}}_{i}\mid \mathit{\eta})/(\partial \mathit{\eta}\partial {\mathit{\eta}}^{\top})|}_{\mathit{\eta}={\mathit{\eta}}_{0}}\right\}.$$

By Lemma A.3 and expanding (A.3) around *η*_{0}, for any ** η** with ||

$$log{\stackrel{\sim}{\omega}}_{n}(\mathit{\eta})-log{\stackrel{\sim}{\omega}}_{n}({\mathit{\eta}}_{0})={(\mathit{\eta}-{\mathit{\eta}}_{0})}^{\top}{\mathbf{\Delta}}_{{\mathit{\eta}}_{0}}-\frac{1}{2}{(\mathit{\eta}-{\mathit{\eta}}_{0})}^{\top}({nJ}_{{\mathit{\eta}}_{0}})(\mathit{\eta}-{\mathit{\eta}}_{0})+{o}_{p}(1).$$

(A.6)

Under condition C.1.(i) we note that {*Var*(*n*^{−1/2}**Δ**_{η0})}^{1/2}*n*^{−1/2}**Δ**_{η0} → *AN*(**0**, **I**) and **J**_{η0} is positive definite almost surely as *n* → ∞ since (min_{1≤}_{i}_{≤}* _{n} m_{i}*)

$${n}^{1/2}({\stackrel{\sim}{\mathit{\eta}}}_{n}-{\mathit{\eta}}_{0})\sim AN(\mathbf{0},{\mathbf{J}}_{{\mathit{\eta}}_{0}}^{-1}{\mathbf{\Omega}}_{{\mathit{\eta}}_{0}}{\mathbf{J}}_{{\mathit{\eta}}_{0}}^{-1}),$$

where **Ω**_{η0} = *Var* (*n*^{−1/2}
**Δ**_{η0}) + *o _{p}*(1). The result (ii) follows by the Delta method.

For the proof of (i) we note that similar result as in (A.1) in Lemma A.2 holds for log * _{ni}*(

$$log{\stackrel{\sim}{\omega}}_{ni}(\mathbf{b})=log\phi \{\mathbf{b}\mid {\widehat{\mathbf{b}}}_{i}^{\ast}({\mathit{\eta}}_{0}),{m}_{i}^{-1}{\stackrel{\sim}{\mathbf{Q}}}_{i}({\mathit{\eta}}_{0})\}+{\stackrel{\sim}{R}}_{{m}_{i}}(\mathbf{b}),$$

(A.7)

where results corresponding to (A.2) hold for * _{mi}*(

We first prove the results assuming **Σ*** _{i}* =

$$\begin{array}{l}\stackrel{\sim}{{l}_{{m}_{i}}}(\mathbf{b}\mid \mathit{\theta})=log\phi (\mathbf{b}\mid {\widehat{\mathbf{b}}}_{i}^{\ast}(\mathit{\theta}),{m}_{i}^{-1}{\stackrel{\sim}{\mathbf{Q}}}_{i}(\mathit{\theta}))+{R}_{{m}_{i}}(\mathbf{b})\\ +log\phi ({\widehat{\mathbf{b}}}_{i}\mid \mathit{\beta},{\mathbf{V}}_{i}(\mathit{\theta}))+(1/2)log\mid {m}_{i}^{-1}{\mathbf{Q}}_{i}\mid ,\end{array}$$

(A.8)

where ${\mathbf{V}}_{i}(\mathit{\theta})=\mathbf{\sum}+{m}_{i}^{-1}{\mathbf{Q}}_{i}$. For (iii), we note that it follows from McCulloch (1982) that

$$E\left\{-{\frac{{\partial}^{2}\phi ({\widehat{\mathbf{b}}}_{i}\mid \mathit{\beta},{\mathbf{V}}_{i}(\mathit{\theta}))}{\partial \mathit{vech}({\mathbf{V}}_{i}(\mathit{\theta})){(\partial \mathit{vech}({\mathbf{V}}_{i}(\mathit{\theta})))}^{\top}}|}_{\mathit{\theta}={\mathit{\theta}}_{0}}\right\}=\frac{1}{2}{\mathbf{K}}^{\top}({\mathbf{V}}_{i0}^{-1}\otimes {\mathbf{V}}_{i0}^{-1})\mathbf{K},$$

where **K** defines the matrix operator *vech*(·) as shown in the theorem. The desired results follow as the Jacobians of *vech*(**Σ**) with respect to *vech*(**V*** _{i}*) are identity matrices. On the other hand, the results (i) of Theorem 2 follows from Theorem 3 and 4 of Chernozhukov and Hong (2003). (ii) is given by standard Bayesian results respectively. For the proof when

Mi-Ok Kim, Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229-3039.

Yunwen Yang, Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820.

- Bernardo JM, Smith AFM. Bayesian Theory. John Wiley & Sons, Ltd; 2003.
- Capoccia VA, Cotter F, Gustafson DH, Cassidy E, Ford J, Madden L, Owens BH, Farnum SO, McCarty D, Molfenter T. Making ‘Stone Soup’: Improvements in Clinic Access and Retention in Addiction Treatment. Joint Commission Journal on Quality and Patient Safety. 2007;33:95–103. [PubMed]
- Chernozhukov V, Hong H. An MCMC Approach to Classical Estimation. Journal of Econometrics. 2003;115:293–346.
- Cowles MK, Carlin BP. Markov Chain Monte Carlo Convergence Diagnostics: A Compariative Review. Journal of American Statistical Association. 1996;91:883–904.
- Dunson DB, Taylor JA. Approximate Bayesian Inference for Quantiles. Journal of Nonparametric Statistics. 2005;17:385–400.
- Dunson DB, Watson M, Taylor JA. Bayesian Latent Variable Models for Median Regression on Multiple Outcomes. Biometrics. 2003;59:296–304. [PubMed]
- Geraci M, Bottai M. Quantile Regression for Longitudinal Data Using the Asymmetric Laplace Distribution. Biostatistics. 2007;8:140–154. [PubMed]
- Geweke J. Evalating the Accurancy of Sampling-Based Approaches to the Calculation of Posterior Moments. In: Benardo JM, Berger J, Dawid AP, Smith AFM, editors. Bayesian Statistics. Vol. 4. Oxford: Oxford University Press; 1992. pp. 169–193.
- Henderson HV, Searle SR. Vec and Vech Operators for Matrices, With Some Uses in Jacobinas and Multivariate Statistics. The Canadian Journal of Statistics. 1979;7:65–81.
- Hanson T, Johnson WO. Modeling Regression Error with a Mixture of Polya trees. Journal of the American Statistical Association. 2002;97:1020–1033.
- Hoffman KA, Ford JH, Choi D, Gustafson DH, McCarty D. Replication and Sustainability of Improved Access and Retention Within the Network for the Improvement of Addiction Treatment. Drug and Alcohol Dependence. 2008;98:63–69. [PMC free article] [PubMed]
- Jung S. Quasi-likelihood for Median Regression Models. Journal of the American Statistical Association. 1996;91:251–257.
- Koenker R. Quantile Regression for Longitudinal Data. Journal of Multivariate Analysis. 2004;91:74–89.
- Koenker R. Quantile Regression. Cambridge University Press; 2005.
- Kottas A, Gelfand AE. Bayesian Semiparametric Median Regression Modeling. Journal of the American Statistical Association. 2001;96:1458–1468.
- Kottas A, Krnjajic M. Bayesian Nonparametric Modeling in Quantile Regression. Scandinavian Journal of Statistics. 2009;36:297–319.
- Lavine M. On an Approximate Likelihood for Quantiles. Biometrika. 1995;82:220–222.
- Lazar NA. Bayesian Empirical Likelihood. Biometrika. 2003;90:319–326.
- Lancaster T, Jun SJ. Bayesian quantile regression methods. Journal of Applied Econometrics. 2010;25:287–307.
- McCarty D, Gustafson DH, Wisdom JP, Ford J, Choi D, Molfenter T, Capoccia V, Cotter F. The Network for the Improvement of Addiction Treatment (NIATx): Enhancing access and retention. Drug and Alcohol Dependence. 2007;88:138–145. [PMC free article] [PubMed]
- McCulloch CE. Symmetric Matrix Derivatives with Applications. Journal of the American Statistical Association. 1982;77:679–682.
- Owen A. Empirical Likelihood for Linear Models. The Annals of Statistics. 1991;19:90–120.
- Owen A. Empirical Likelihood. Chapman & Hall/CRC Press; 2001.
- Qin J, Lawless J. Empirical Likelihood and General Estimating Equations. The Annals of Statistics. 1994;22:300–325.
- Reich BJ, Bondell HD, Wang HJ. Flexible Bayesian Quantile Regression for Independent and Clustered Data. Biostatistics. 2010;11:337–352. [PubMed]
- Ruppert D, Carroll RJ. Trimmed Least Squares Estimation in the Linear Model. Journal of the American Statistical Association. 1980;75:828–838.
- Schennach SM. Bayesian Exponentially Tilted Empirical Likelihood. Biometrika. 2005;92:31–46.
- Tian L, Liu JS, Wei LJ. Implementation of Estimating Function-Based Inference Procedures With Markov Chain Monte Carlo Samplers. Journal of the American Statistical Association. 2007;102:881–888.
- Yin G. Bayesian Generalized Method of Moments. Bayesian Analysis. 2009;4:191–208.
- Yu K, Moyeed RA. Bayesian Quantile Regression. Statistics and Probability Letters. 2001;54:437–447.
- Yang Y, He X. Bayesian Empirical Likelihood for Quantile Regression. The Annals of Statistics. 2011 under review.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |