J Am Stat Assoc. Author manuscript; available in PMC 2012 June 1.
Published in final edited form as:
PMCID: PMC3280824
NIHMSID: NIHMS299346

# Semiparametric Approach to a Random Effects Quantile Regression Model

Mi-Ok Kim, Associate Professor and Yunwen Yang, Ph.D., Student

## Abstract

We consider a random effects quantile regression analysis of clustered data and propose a semiparametric approach using empirical likelihood. The random regression coefficients are assumed independent with a common mean, following parametrically specified distributions. The common mean corresponds to the population-average effects of explanatory variables on the conditional quantile of interest, while the random coefficients represent cluster specific deviations in the covariate effects. We formulate the estimation of the random coefficients as an estimating equations problem and use empirical likelihood to incorporate the parametric likelihood of the random coefficients. A likelihood-like statistical criterion function is yield, which we show is asymptotically concave in a neighborhood of the true parameter value and motivates its maximizer as a natural estimator. We use Markov Chain Monte Carlo (MCMC) samplers in the Bayesian framework, and propose the resulting quasi-posterior mean as an estimator. We show that the proposed estimator of the population-level parameter is asymptotically normal and the estimators of the random coefficients are shrunk toward the population-level parameter in the first order asymptotic sense. These asymptotic results do not require Gaussian random effects, and the empirical likelihood based likelihood-like criterion function is free of parameters related to the error densities. This makes the proposed approach both flexible and computationally simple. We illustrate the methodology with two real data examples.

Keywords: Empirical likelihood, Markov Chain Monte Carlo, Quasi-posterior distribution

## 1. INTRODUCTION

We extend quantile regression (QR) method to a random effects analysis of clustered data in this paper. A dominant paradigm of clustered data analysis is a Gaussian structure where the random effects and random errors are both assumed identically distributed among themselves, following common Gaussian cumulative distributions respectively. Under this structure, however, the effects of the explanatory variables are assumed to affect only the location of the conditional distribution of the response. Such exclusive attention to the location is constraining as the analysis is often motivated by a natural heterogeneity among the clusters. Moreover the response variable is often analyzed on a transformed scale in practice to meet the rather rigid homoscedastic Gaussian distribution assumptions. The results are, however, less straightforward and difficult to interpret on the original scale. In addition the conditional mean analysis, although most popular, does not necessarily adequately address research questions of interest as shown in the Network for the Improvement for Addiction Treatment example below. QR can provide an alternative analytic tool. Conditional regression quantiles are invariant under monotonic transformation like quantiles in a univariate case and are modelled locally for their relationships with the explanatory variables with the relationships allowed to differ from one another.

The aforementioned features of QR has been well recognized with independent data. Methodologies are fully developed and the model is widely applied (see Koenker (2005) for an overview). For the analysis of clustered data, however, a limited number of approaches have been proposed. Jung (1996) considered fixed effects median regression and proposed a quasi-likelihood approach. Koenker (2004) considered a random intercept model and proposed a l1 penalty approach. The l1 penalty approach is less stringent in its assumptions than most other methods. While the results of the l1 penalty approach depend on the choice of a penalty parameter, inference of the fixed effects was not studied with an empirically chosen penalty parameter. The method also may not be applicable to more complex random effects model as it essentially treats random effects as parameters and the increasing dimensionality can be an issue. Geraci and Bottai (2007) assumed an asymmetric Laplace error distribution and proposed an expectation-maximization (EM) estimator. Assuming a common asymmetric Laplace distribution, their method constrains the errors to be not only homoscedastic but also have a mode at the median.

In the Bayesian analysis frame work, several parametric approaches have been proposed, similarly using asymmetric Laplace error densities and mostly for the analysis of independent data (e.g., Yu and Moyeed (2001)). Nonparametric approaches were proposed to avoid the restrictive parametric assumption (e.g., Hanson and Johnson (2002), Kottas and Gelfand (2001) and Kottas and Krnjajic (2009)). Although capturing more general forms of skewness and tail behaviors, these nonparametric approaches also restrict the error densities to necessarily have their modes at the quantile of interest. Reich et al. (2010) relaxed this restriction with an infinite mixture of quantile restricted two component Gaussian mixture densities. These nonparametric Bayesian approaches, however, essentially model the error densities, although avoiding parametrically specifying them. As a consequence they can accommodate the error heteroscedasticity only by correctly specifying its form parametrically in the model. This informative modeling requirement of the error heteroscedasticity is restrictive compared with the generality of independent data QR analysis methodology, and make the computation complex.

We propose a semiparametric approach that does not require modeling of the error densities, thereby realizing all the desirable features of the independent data QR analysis without computational complexity. We assume random regression coefficients which may not be necessarily identically distributed or Gaussian. The random coefficients have a common mean which corresponds to the population-average effects of the explanatory variables on the conditional quantile of interest. The random coefficients represent cluster specific deviations in the covariate effects. Under appropriate conditions discussed later the median regression coincides with the conditional mean, and appears as a robust alternative against outliers in the responses. We consider the estimation of the random effects as an estimating equations problem and use empirical likelihood (EL) to incorporate the parametric likelihood of the random effects. We yield a semiparametric likelihood-like criterion function, which we show is asymptotically concave in a neighborhood of the true parameter value and motivates the maximizer as a natural estimator. We use the Bayesian framework and Markov Chain Monte Carlo (MCMC) samplers for the computation.

Monte Carlo methods have been used for classical estimation problems under various settings where classical methods are handicapped by computational difficulties (see Tian et al. (2007) for an overview). Recent works include Chernozhukov and Hong (2003) and Tian et al. (2007). Chernozhukov and Hong (2003) particularly considered EL and censored QR for the analysis of independent data. In this paper we are concerned with random effects QR analysis and semiparametric likelihood-like criterion function.

A few works in the Bayesian literature also have considered likelihood-like or non-likelihood statistical criterion functions (e.g. Lavine (1995); Dunson et al. (2003); Dunson and Taylor (2005); Lazar (2003), Schennach (2005), Lancaster and Jun (2010)). They were motivated by the computational complexities entailed in nonparametric Bayesian methods and the difficulty of likelihood specification. Lazar (2003) and Lancaster and Jun (2010) specifically considered EL. Most of the works were concerned with independent data analysis with few exceptions. Dunson et al. (2003) considered median regression for a latent variable model with multiple surrogate outcomes under Gausian within-subject dependency structure. Yin (2009) used a quadratic likelihood-like function motivated by generalized estimating equations and proposed Bayesian generalized method of moments method.

The proposed method is similarly motivated: the aforementioned Bayesian nonparametric methods are complex in the computation and require an informative modeling of the error heteroscedasticity. This work does not require modeling of the error densities. EL is one choice that does not require modeling of the error densities. Any non-parametric likelihood such as exponentially tilted EL can be used instead. Using the MCMC sampler, the proposed method also does not require directly estimating the variance of the estimator for inference and is not subject to a known challenge of QR inference of estimating error densities at the quantile of interest. In this paper we provide large sample properties of the resulting quasi-posterior estimators and inference, being the first work clearly showing shrinkage of the random effects estimators toward the population average effect and the asymptotic normality of the population average effect estimator.

The remainder of this paper proceeds as follows. Section 2 formally defines the semi-parametric random effects quantile regression (REQR) estimator and provides their large sample properties. Section 3 describes MCMC methods. Section 4 provides empirical results including the analysis of two real data examples. Section 5 concludes. All the proofs are deferred to the Appendix.

## 2. MODELS AND METHODS

For expository purposes we assume random design points in this section. If the design points are non-stochastic, the results we obtain in this section hold under appropriate conditions on the design sequences.

### 2.1 Random Effects Quantile Regression (REQR) Model

Let ${(yij,xij)}j=1mi$, i = 1, ···, n, denote observations from n clusters with cluster size mi. For each of the n clusters, we suppose that ${(yij,xij)}j=1mi$ is a random sample from the following random effects quantile regression (REQR) model,

(1)

where x Ξi Rq is a random covariate that includes an intercept with a distribution function $FiX$, Ξi is the support of x in the i-th cluster, and Qi,τ{y|x, bi(τ)} denotes the τ-th conditional quantile of the response y R in the i-th cluster given x and bi(τ). We view the cluster as randomly chosen from a population, and thus the cluster-specific quantile coefficients bi(τ) are random variables. We further assume the below condition:

• C.1
For i = 1, ···, n, bi(τ) are q-dimensional random variables supported on i with a common mean β(τ) and cluster-specific covariance matrix Σi(τ) that are positive definite.

Similar to the usual QR analysis for independent data (mi = 1), bi(τ) denotes the quantile specific effects of the covariates on the responses in the i-th cluster, while the common mean β(τ) corresponds to the population-average effects of the covariates specific to the τ-th conditional quantile of the response. If Σi(τ) = Σ(τ) for all clusters, Σ(τ) represents the variability of the quantile specific covariate effects across the clusters. In this paper we consider the problem of estimating the model for a specific regression quantile with a given τ (0, 1). We therefore omit τ from notations whenever no confusion would arise.

In order to compare the model with the conditional mean model, we rewrite (1) as

(2)

where are mean zero random effects that correspond to deviations in the covariate effects of the τ-th conditional quantiles in the individual clusters from the population-average effect, and eij are random errors with zero τ-th conditional quantiles. Similar to the conditional mean regression, eij are independent of given xij. In the Network for the Improvement for Addiction Treatment (NIATx) example presented in Section 4.1 later, where τ = 0.9 and x = (1, t, t2) is considered with t denoting a random admission time counted from the beginning of the NIATx funding, xβ corresponds to the mean of the 90th percentile wait time at the admission time t. Also indicates a shorter wait time than the population average of the 90th percentile in the i-th addiction treatment center, while indicates a longer wait time.

If τ = 0.5 and eij have symmetric distributions with a finite mean, the median REQR is equal to the conditional mean: . Moreover, the averaged conditional median is the conditional mean of a marginal model: for τ = 0.5. In this case, the proposed method can be seen as a robust alternative to the random effects conditional mean regression against outliers in the responses.

It is well known with the conditional mean regression that integrating over the random effects produces a marginal model where the regression parameters retain their meaning. Such a relationship does not hold with the REQR. Model (1) is not generally related to a marginal QR model, Qτ (y|x) = xζ(τ). Obviously, we cannot expect β(τ) = ζ(τ) unless Ξi = Ξ and

where Fi|x} denote the conditional distribution of eij given xij in model (2) and the expectation is taken with respect to the distribution of .

### 2.2 Estimation

Let Σ1, ···, Σn be parameterized by a vector σ. Define θ = (β, σ) with the true value denoted by θ0 such that β(θ0) = β0 and Σi(θ0) = Σi0. We denote the parameter space of θ by Θ. We assume the following conditions for all 1 ≤ in:

• C.2
bi is the unique solution to E [ϕτ {yxbi}x] = 0, where ϕτ (t) = τI[t<0], the derivative of the check function and the expectation is taken over the conditional distribution of y given x in the i-th cluster.
• C.3
Fi(t|x) has the density fi(t|x) that is differentiable at 0 with 0 < fi(t|x) < ∞ and for ξ > 0 and ε > 0.
• C.4
E(xx) = Di0 and E{fi(0|x)xx} = Di1 in the i-th cluster for some positive definite matrix Di0 and Di1, where the expectations are taken over x from $FiX$.
• C.5
Ξi is uniformly bounded in Rq.

Under conditions C.2–C.4 standard QR method applied to each cluster separately produces estimators of the random coefficients i with asymptotic variances $mi−1Qi$ where $Qi=Di1−1Di0Di1−1$. We define

$Vi=∑i+mi−1Qi.$
(3)

Given x, Vi is the asymptotic variance of the the per cluster standard QR analysis estimator xbi about the population averaged regression quantile of interest: E{(xix β2|x} = xVix. Condition C.5 is a sufficient condition and may be relaxed.

We first define a semiparametric likelihood-like criterion function for the random coefficients bi. Given τ and in the i-th cluster, i is formally defined as a solution to the minimization problem

where ρτ (·) is the check function. As well known, i can be alternatively defined as a solution to estimating equations,

This estimating equations formulation motivates the below empirical likelihood (EL) for the coefficient bi in the i-th cluster:

where pj denote weights for the j-th observation in the cluster respectively. Like in parametric case, EL method considers a parameter as a function of distributions, specifically those empirically defined by a set of weights ${pj}j=1mi$, and defines a likelihood for the parameter by profiling out ${pj}j=1mi$. We refer to Owen (2001) for a general introduction and Qin and Lawless (1994) for its application with estimating equations. We note that the above EL formulation is also valid with fixed design points under appropriate conditions on the design sequence (see Owen (1991) for details), and accordingly the results of this paper hold for fixed design points.

We let gi|η) denote the densities of bi under Condition C.1, where η is a r-variate vector in Ψ parameter space and not necessarily equal to θ. For expository simplicity, we assume η = θ and Ψ = Θ here. We define a semiparametric likelihood-like criterion function for bi by

(4)

By the independence among the clusters a semiparametric likelihood-like criterion function for θ is given by

Under some regularity conditions, the above integral admits a quadratic form of the first order expansion on the log scale centered at θ0 in a O (n−1/2) neighborhood of θ0, while is asymptotically equivalent to a multivariate normal density (see (A.6) and (A.7) in the Appendix). This motivates the following maximizers as natural estimators:

A direct computation of the estimators is daunting. We instead use the Bayesian framework and MCMC samplers.

Let π(θ) denote a non-informative prior appropriately defined for gi(·|θ). Define

We consider a quasi-posterior defined by ωn(θ, b1, ···, bn):

(5)

Improper priors may not guarantee a proper posterior and Yang and He (2011) specifically advise avoiding flat priors with their Bayesian EL approach. With the proposed REQR method, however, improper priors will guarantee a proper posterior, if they produce a proper posterior in relation to gi(b|θ) when applied to ${bi}i=1n$ as if they were observed. From the definition of EL, we note that lim supmi→∞ supbi Lmi(b) ≤ 1 for all 1 ≤ in and every n. We have

where the second integral can be finite with a certain choice of improper π(θ) specific to gi(bi|θ). For example, when gi(b|θ) = (b|β, Σ) for some normal density (·), a joint flat prior is the improper prior π(β, Σ) |Σ|−(q+1)/2, which produces a proper posterior:

(6)

where and IWn−1(·) denotes the inverse-Wishart distribution with (n − 1) degrees of freedom. If such choice of improper priors is not available, we recommend vague priors.

We let ni(b) and n(θ) denote the marginal densities appropriately defined corresponding to bi and θ. The quasi-posterior means are then respectively defined as

(7)

We consider the following specific conditions in addition to Condition C.1.

• C.1.(i) For 1≤ in, gi(·|η) = g(·|η) with η0 denoting the true value.
• C.1.(ii) bi have normal densities with not necessarily identical Σi. The heterogeneity among Σi is limited such that θ is a finite vector.

The requirements of condition C.1.(i) are comparable to those required for a general form of the Bayesian equivalent of the central limit theorem to hold for the posterior distribution of η when standard Bayesian method is applied to ${bi}i=1n$, if bi were observed (see Chapter 5.3 of Bernado and Smith (2003) for example). They are general and admit skewed multivariate normal or multivariate student-t distributions for bi. The limited heterogeneity of the normal densities in condition C.1.(ii) is a standard assumption of the Gaussian conditional mean regression analysis. It essentially enables Σi to be estimated at a parametric rate.

We first present asymptotic results with the normal densities. We define $Vi0=∑i0+mi−1Qi$ following the definition of Vi in (3).

#### Theorem 1

Assume condition C.1, C.1.(ii) and C.2–C.5.

1. As mi → ∞, $b~mi={I−(mi−1Qi)Vi0−1}b^i+(mi−1Qi)Vi0−1β~n+op(mi−1/2)$ for all 1 ≤ in. Moreover, $Var(b~mi)={I−(mi−1Qi)Vi0−1}(mi−1Qi)+op(mi−1)$.
2. If mi → ∞ for all 1 ≤ in and n → ∞, then $β~n=n−1∑i=1n(V¯nVi0−1)b^i+op(n−1/2)$, where $V¯n={n−1∑i=1nVi0−1}−1$. Also n1/2(n0) ~ AN(q) (0, n)
3. Suppose Σi = Σ and n is accordingly defined by the posterior mean n. Consider matrix operator vec(·) that takes a matrix and forms a large column vector by stacking the columns, and its generalization vech(·) for a symmetric matrix, respectively described in Henderson and Searle (1979). If mi → ∞ for all 1 ≤ in and n → ∞, then
where K is a unique matrix given by the relationship vec(M) = K vech(M) for any symmetric matrix M of the same dimension as Σ and represents the Kronecker product.

The first result of the theorem shows shrinkage in the REQR estimators that is comparable to the conditional mean regression: given x, the posterior means xmi are weighted averages of the population averaged profile xn and the within cluster QR estimates xi with the weights defined by ratios between the asymptotic within cluster variances $mi−1Qi$ and Vi0. Much weight will be given to the overall average profile if the within cluster variability of the quantile is large in comparison to the between-cluster variability, whereas much weight will be given to the cluster level estimator if the opposite is true. Also Var(lmi) ≤ Var(li) asymptotically for any linear combination l of the random effects.

From (ii) we note that n is a weighted average of the per cluster estimators i with the weights determined by the relative magnitude of $Vi0−1$. If Σi0 = Σ0, then $V¯n=∑01/2{I−n−1∑i=1n(I+mi∑01/2Qi−1∑01/2)−1}−1∑01/2$. As Qi are positive definite, asymptotically for any linear combination l.

In practice we have Markov chains , i = 1, ···, n, and S(θ) = (θ(1), θ(2), ···, θ(B)) for some large B. We estimate mi, n, Var (mi), Var(n), and Var(vech(n)) by the mean and the variance-covariance matrix of the corresponding MCMC sequences. For a given α (0, 1) we can construct 100(1 − α)% confidence intervals based on the normal approximation results given in Theorem 1. We denote this confidence intervals by [c1,n{S(·), α/2}, c1,n{S(·), 1 − α/2}]. Alternatively we can construct confidence intervals using the sample percentiles of the MCMC sequence S(·). We denote them by [c2,n{S(·), α/2}, c2,n{S(·), 1 − α/2}]. The below theorem shows that both intervals are asymptotically valid.

#### Theorem 2

Assume condition C.1, C.1.(ii) and C.2–C.5. For any α (0, 1),

1. limmi→∞ P[ck,n{S(bi), α/2} ≤ bick,n{S(bi), 1 − α/2}] = 1 − α for k = 1, 2 and 1 ≤ in.
2. limn→∞ P[ck,n{S(θ), α/2} ≤ θ0ck,n{S(θ), 1 − α/2}] = 1 − α for k = 1, 2.

The results similarly hold for the inference of v(bi) or v(θ0) for some given continuously differentiable function v : Θ → R or v : iR. A known challenge of QR inference is that a direct estimation of the variance requires estimating the error densities at the quantile of interest. The proposed method is not subject to this requirement.

Here each can be seen as a posterior density with gi(b|θ) an informative prior, and its asymptotic normality is shown by Lazar (2003) and Yang and He (2011). As does not contain any population level parameters for the error distributions, completely allowing the errors to be heteroscedastic, and is only known to be consistent asymptotically, mi are required to grow along with n for the asymptotic results of θ. With the normal densities assumed for bi, nevertheless, mi are only needed to grow, independently of each other and independent of n. With the non-normal densities, however, we require a rather stricter condition (min1≤in mi)n−1/2 → ∞ as a sufficient condition as shown below.

Under the non-normal density condition of C.1.(i), we define Vi0 with Σi0 = Λ(η0) where Λ(η) is given in Lemma A.3 in the Appendix.

#### Theorem 3

Assume Conditions C.1, C.1.(i) and C.2–C.5.

1. As mi → ∞, $b~mi={I−(mi−1Qi)Vi0−1}b^i+(mi−1Qi)Vi0−1β~n+op(mi−1/2)$ for all 1 ≤ in. Moreover, $Var(b~mi)={I−(mi−1Qi)Vi0−1}(mi−1Qi)+op(mi−1)$.
2. If n → ∞ along with mi, 1 ≤ in, such that (min1≤in mi)n−1/2 → ∞, then , where H(·) is the Jacobian given in the Condition C.1.(i), and Jη0 and Ωη0 are quantities related to the hessian and the score of log n(η) defined in (A.6) in the Appendix.

Inference results corresponding to Theorem 2 also holds accordingly.

## 3. MARKOV CHAIN MONTE CARLO (MCMC)

We propose a simple combination of Metropolis-Hastings (M-H) and Gibbs algorithm. As bi are independent given θ and the posterior is always proportional to the joint, we have

where denotes equivalence up to some constants. Many algorithms are available for the computation of Lmi(b), for example, R package ‘emplik’. We use a M-H algorithm with a pre-specified normal proposal distribution to sample each bi. On the other hand,

where the right hand side may have a readily available closed form as gi(bi|θ) are parametrically specified. A Gibbs sampling or M-H algorithm is used to sample θ. Specifically when bi are assumed having a common Gaussian distribution as in the empirical studies in Section 4, θ = (Σ, μ) can be sampled by a Gibbs sampling algorithms using an inverse-Wishart and a normal density as shown in (6). For expository simplicity, we assume η = θ above. If ηθ, η will be sampled instead of θ and θ is given by h(η) under Condition C.1.(i). As mentioned previously, we avoid modeling the error densities entirely, and the MCMC computation is much simpler.

## 4. NUMERICAL STUDIES

All computations were conducted in R (R Development Core Team, 2006) with the empirical likelihood computed using el.cen.EM2 function in R package ‘emplik’. Proposal distributions for the Metropolis-Hastings algorithm were chosen to have acceptance rates in the range of 0.1 ~ 0.4.

### 4.1 Real Data Examples

Real data examples were analyzed with 25,000 MCMC samples, the first 5,000 discarded as burn-in, and the standard QR solutions as initial values. The random effects were assumed Gaussian. The convergence of the Markove chains was checked and found satisfactory by the methods recommended by Cowles and Carlin (1996). For example, based on the Z-score by Geweke (1992), we found that all Z-scores were not significant. Specifically for the Example 1 NIATx data, Z-scores took values of 1.217, −0.074, and −1.747 for β = (β0, β1, β3) and 1.699, 0.243, −1.514, 0.261, 1.520, and 0.544 for Σ = (Σ11, Σ12, Σ13, Σ22, Σ23, Σ33) respectively. Z-scores for the random effects ranged from −1.897 to 1.723. We provide the trace plots in Figure 1.

Trace plots of the Markov chains: the top panel contains the trace plots for β(0.90), population averaged regression coefficients of a second degree polynomial at τ = 0.90. The bottom panel contains the trace plots for corresponding random ...

#### Example 1

NIATx We analyze data from the Network for the Improvement for Addiction Treatment (NIATx) using a second degree polynomial REQR. The NIATx is a learning community of alcohol and other drug treatment centers. It uses process improvement strategies to treatment services and facilitates organizational change in the delivery of addition treatment services (Capoccia et al. (2007)). A cross-center evaluation monitored impacts on days between first contact and first treatment. In this paper we analyze data on the delay time from twenty-six programs that provided at least ten months of data. The majority of the centers (22/26) have samples larger than 150 with eighteen of them greater than 200. This allows modeling of high quantiles as shown below. We refer to McCarty et al. (2007) and Hoffman et al. (2008) for more information on the study design, methods and outcomes.

Centers that successfully used process improvement reduced delay times and the reduction was more apparent the longer they implemented process improvements. The decreasing trend in the delay time, however, is not adequately summarized by the conditional mean. A small or zero improvement is expected in the lower tail of the distribution where delays were previously short. In the upper tail where delays were long, however, a more pronounced improvement is expected. The upper half of the data above the median are particularly interesting as the process improvement strategies were anticipated to reduce the spread of the data towards the median in the upper half. This justifies REQR analysis of the data. The distribution of delay times is right skewed (see examples in Figure 2) and analyzed on the log scale. We include admission time of each patient (in month since the beginning of NIATx participation) as a covariate and consider a change in the delay time as a second degree polynomial function of the admission time. For comparison we analyze the data for changes in the conditional mean delay time using restricted maximum likelihood (REML) method. We also conduct QR analysis separately for each center using the standard independent data QR method.

Scatter plots of days between first contact and first treatment against month of admission for four centers: m indicate per center sample sizes and supper imposed curves represent regression fits by three different methods: REQR, REML and standard independent ...

Figure 2 shows the regression results for four centers. Increasing heterogeneity is observed among the centers at higher quantiles. The increasing between center heterogeneity is more clearly depicted in Figure 3. If the success of the process improvement strategies is reflected in decreasing delay time, this suggests that successful centers differe from unsuccessful ones more in the upper quantiles than in the medians. Center 24 is an example where the conditional mean analysis found no significant change, while the proposed REQR method found significant trends locally in the high quantiles. Center 12 exhibits an ideal trend anticipated with the process improvement strategies. The standard independent data QR method is known to sometimes produce crossing quantile fits in a small sample and Center 73 is one such case. Quantiles estimated by the proposed REQR, however, are not crossed. The REQR analysis pools information across the centers and the borrowed information offsets the small sample size.

Fitted second degree polynomials for the conditional quantiles at τ {0.50, 0.75, 0.90}.

Figure 4 presents point and interval estimates of the random coefficients of the linear term in the second degree polynomial 0.9-th quantile curves by the independent data QR and proposed RERQ method. Horizontal lines are respectively the population average effect estimate by the REQR method and a simple average of the per center estimates by the independent data QR method. The point estimates by the RERQ method are shrunken toward the population average effect estimate, and the interval estimates by the proposed RERQ method tend to be shorter than those by the standard independent QR method. This supports the asymptotic results of Theorem 1-(i).

Point and interval estimates by the proposed REQR and independent data QR method: the interval estimates correspond to 95% credible intervals for the REQR analysis and 95% confidence intervals for the per center analysis by the independent data QR method. ...

#### Example 2

Swallow Study In this section we analyze apnea duration data from swallow study previously analyzed in Reich et al. (2010) and reference therein. Apnea duration is the period of nasal airflow cessation during swallowing. Longer durations often indicate age-related function changes for seniors. The duration was measured repeatedly under different experimental conditions defined by feeding condition (examiner- versus self-fed), viscosity (liquild or pudding), and volume (5ml or 10ml). Similar to Reich et al. (2010), we include indicators of the three experimental conditions and their two-way interactions as covariates and analyze a total of 1286 observations measured on twenty-three elderly women of age seventy-five or older, each with fifty-five or fifty-six observations.

The distribution of apnea duration times remains right-skewed after a log transformation and the spread of the data also varies widely across the subjects (see examples in Figure 5). Figure 6 shows that the data distributions under different experimental conditions differ more in the right tail than in the middle. These justify the proposed REQR analysis. We analyze the data for the conditional median and 0.9-th quantile. For comparison we conduct standard QR analysis that assumes all observations are independent.

The scatter of apnea duration times in seconds per subject on the log scale.
Boxplots of apnea duration times in seconds on the log scale: Wide boxes represent subjects being fed by the examiners, and narrow boxes represent self-feeding. Shaded boxes are pudding and white boxes are liquid. The horizontal lines are the sample quantiles ...

Table 1 provides a summary. The REQR results agree well with the boxplot presentation of the data in Figure 6. The significant viscosity and viscosity by feeding condition interaction estimates suggest that swallowing liquid took significantly longer time, particularly when the subjects were fed by examiners. This is indicated by the higher median and 0.9-th sample quantile lines in the white boxes than in the shaded boxes across the experimental conditions, while the differences much bigger in the comparison of the wide white boxes with the wide shaded boxes. The non-significant volume and significant viscosity by volume interaction estimates suggest that swallowing the larger quantity did not necessarily take significantly more time, while the difference in the duration time by the volume was significantly larger when liquid was swallowed compared to pudding. This is implied by the greater height differences in the median and 0.9-th sample quantile lines by volume among the white boxes than the shaded boxes in Figure 6. The estimated effects are much larger for the 0.9-th conditional quantile.

95% interval estimates for the apnea duration example: the REQR method is compared with the standard QR method that assumes independency for the within subject correlation. Intervals with asterisks do not included 0.

Compared to the independent data QR analysis, the REQR provides similar results, mostly agreeing on the significance of the effects implied by the interval estimates. However, the interval estimates tend to be much shorter with the REQR. The shorter intervals are expected as the independent data QR method ignores the within subject correlation in the data. The REQR also finds one more significant effect in the 0.9-th conditional quantile analysis.

We now compare our results with those of Reich et al. (2010). The results are notably different concerning the effects of feeding condition and viscosity by feeding interaction in the 0.9-th conditional quantile analysis. Contrary to our results, Reich et al. (2010) found the feeding condition effect significant, while the viscosity by feeding interaction effect non-significant. The two results are not directly comparable as Reich et al. (2010) considered the population quantiles using the below marginal model:

where αi(τ) are random subject effects, accounting for the within subject dependency, and eij are independent within subject errors. As discussed in Section 2.1, our conditional model and the above marginal model are not necessarily related and cannot be compared. Nevertheless we note that our results agree with the boxplot representation of the data better.

### 4.2 Simulations

We use the below model and study the performance of the proposed REQR method by Monte-Carlo simulation:

where b0i and b1i are univariate random variables, respectively sampled from an independent standard normal distribution $N(1)(β0,∑02)$ and $N(1)(β1,∑12)$, zij are sampled from a uniform distribution, U(0, 121/2), and eij are independent of b0i and b1i, sampled from one of the following six distributions: N(1)(0, 1), log normal distribution (LN (0, 1)), t distribution with the degree of freedom 2 (t2), and their heteroscedastic versions.

4,000 MCMC samples were drawn per replicated data and the first 1,000 were discarded as burn-in. We randomly sampled values from N(1)(0, 1) and used them as initial values. In each simulation setting we examined the convergence of the Markove chains for the first 10 replicated datasets and found the convergence satisfactory similarly by the methods recommended by Cowles and Carlin (1996). We provide examples of trace plots in Figure 7.

Trace plots of the Markov chains: the top panel contains the trace plots of the random intercepts and the bottom panel slopes respectively of 4 clusters in a simulation setting with homogeneous errors from t distribution with 2 degree of freedom. For ...

The results for the medians (τ = 0.5) are summarized and compared with the results of the restricted maximum likelihood (REML) method in Table 2 and and3.3. When the ho-moscedastic Gaussian error assumption of the REML is met, the REQR performed slightly worse in a small sample (m = 10) yet comparably in a moderately large sample (m = 20). The REQR and REML results are not directly comparable about the intercept when the error distribution is log normal, as they measure different quantities. With this exception, the proposed method outperformed the REML, particularly when inference is concerned with the variances. In a small sample (m = 10) the REML reported 0 for the variances of the random effects particularly in an alarmingly large number of cases for the heteroscedastic errors. With the 0 random effects variance estimates, the error variance estimate is inflated. This leads to a poor coverage for the variance parameters and an over-coverage and extended confidence intervals for the mean parameters β. The REQR performed overall decently in a small sample (m = 10) with homogeneous errors and reliably across all distributions in a moderate sample (m = 20) across all the error distributions.

Monte-Carlo study results for τ = 0.5 with n = 20 and m = 10 based on 300 replicated datasets: MSE* indicates the ratio of mean squared errors (MSE) for REML method over MSE for the REQR method. AL** similarly indicates the ratio of average interval ...
Monte-Carlo study results for τ = 0.5 with n = 20 and m = 20 based on 500 simulations: MSE* and AL** report the ratios of mean squared errors (MSE) and average interval lengths as in Table 2. CP (%) is the coverage of 95% intervals in percent. ...

Table 4 shows that the proposed method performed reasonably well for the 0.9-th quantile in a small sample with homoscedastic errors yet only for the slope parameters with heteroscedastic errors. Increase in the sample size did not improve the performance much for the intercept parameter. The non-normal distributions under consideration are extreme examples and represent challenging cases under the heteroscedastic setup. Moreover the simulation setups are not favorable with small between cluster variabilities relative to the within cluster variabilities. The high quantile under consideration compounded the difficulty. The performance improved when the between cluster variability increased. For example, when b0i ~ N(1)(0, 4) and b1i, N(1)(0, 4), we obtained comparable coverage results under the heteroscedastic settings as to the homoscedastic setting results reported in Table 4.

Coverage (%) of 95% intervals for τ = 0.9 based on 300 simulations with n = 20.

## 5. CONCLUSION

In this paper we extend QR method to a random effects analysis of clustered data, while fully preserving the features of the standard independent data QR method and admitting non-Gaussian random effects. The parameterization of the proposed REQR model is intuitive. The mean of the cluster specific random coefficients correspond to the population-average effects of the explanatory variables on the conditional quantile of interest. The discrepancy between the random coefficients and their mean corresponds to the deviation in the explanatory variable effects in the individual clusters from the population average. With symmetric error distributions with a finite mean, the median REQR coincides with the conditional mean and can be seen as a robust alternative that performs comparably when the homoscedastic Gaussian assumptions are met and outperforms in other cases. The proposed approach also avoids modeling the error distributions entirely and is computationally simpler than existing nonparametric Bayesian methods.

## Acknowledgments

The authors thank Drs. Dongseok Choi and Dennis McCarty in the Department of Public Health and Preventive Medicine at the Oregon Health & Science University for sharing the Network for the Improvement for Addiction Treatment (NIATx) data. The authors thank the editor, the associate editor and two reviewers for their insightful comments. This research is supported in part by the National Institutes of Health (1R03CA133944-01). The collection of the NIATx data is supported by awards from the Robert Wood Johnson Foundation, the Center for Substance Abuse Treatment, and the National Institute on Drug Abuse (R01 DA018282; R01 DA020832).

## APPENDIX: PROOFS

We will omit τ from notations below wherever the omission does not cause confusion. We let , lmi(b) = log Lmi(b) and . We denote a multivariate normal density by (·|·) and a unit vector in Rq by u.

#### Lemma A.1

Assume conditions C.2–C.5. For all 1 ≤ in, given bi,

1. $‖mi−1/2∑j=1miz(xij,yij,b^i)‖=op(1)$.
2. $||∑j=1mi{z(xij,yij,b)−Ez(xij,yij,b)}||=Op(mi1/2)$ and , uniformly for all b in an o(1)-neighborhood of bi.
3. $mi−1/2∑j=1miz(xij,yij,b)=mi−1/2∑j=1miz(xij,yij,bi)−Di1{mi1/2(b−bi)}+Op(mi−1/2)$, uniformly for all b in an $O(mi−1/2)$-neighborhood of bi.

##### Proof

(i) follows from Lemma A.2 in Ruppert and Carroll (1980) and (ii)–(iii) follows from Lemma A.5 in Yang and He (2011).

#### Lemma A.2

Assume conditions C.2–C.5. Given bi, there exists ε1 > 0 for any δ1 such that

$liminfmi→∞P{sup||b−bi||≥δ1mi−1{lmi(b)−lmi(bi)}≤−ε1}=1.$
(A.1)

Also for b in an open neighborhood of bi, , where there exists a sufficiently small δ2 > 0 and large M > 0 for each ε2 > 0 such that

(A.2)

##### Proof

With bi replacing i, the above results follow directly from Lemma A.2 and A.4 in Yang and He (2011) or Proposition 2 in Chernozhukov and Hong (2003). The results still holds with i by Lemma A.1-(i).

#### Lemma A.3

Assume conditions C.1 and C.1.(i) for g(b|η). We define and $Q~i(η)=[Qi−1+mi−1{Λ(η)}−1]−1$. Also assume conditions C.2–C.5. If (min1≤in mi)n−1/2 → ∞, for any η in an o(1)-neighborhood of η0,

(A.3)

where similar results as in (A.2) hold for Rn(η).

##### Proof

Under C.1.(i) it holds that for any b, b* with ||bb*|| < ε,

(A.4)

where ||E{r(b, b*, η)}|| < ∞ and E[u{r(b, b*; η)/η}u] < ∞ for all η Ψ as ε → 0. As $||bi−b^i||=Op(mi−1/2)$, it follows from Lemma A.2 and (A.4) that given bi and for all b in a o(1)-neighborhood of bi,

(A.5)

where and , and . As (min1≤in mi)n−1/2 → ∞ and (A.5) holds true for all 1≤ in with b = bi, we have

where Ui(i, η) = ∫bi exp{υmi(bi, i, η)} d bi. Consider . For some appropriately modified functions and r*(ζi, i, η),

We note that |Ui(i, η)| = Op(1) by Lemma A.2 for any η Ψ and Ui(i, η) is continuous in η. Also as , and for all η Ψ, and ∫{log (ζi|0, I)/η}(ζi|0, I) dζi = 0. It follows that for all η Ψ and all 1 ≤ in. The desired results follow as n → ∞ with (min1≤in mi)n−1/2 → ∞.

#### Proof of Theorems

Proofs rely on the factorization of the likelihood like function in an asymptotic sense. We sketch the proofs below.

#### Proof of Theorem 3

We prove the result (ii) first. We let

By Lemma A.3 and expanding (A.3) around η0, for any η with ||ηη0|| = O (n−1/2),

(A.6)

Under condition C.1.(i) we note that {Var(n−1/2Δη0)}1/2n−1/2Δη0AN(0, I) and Jη0 is positive definite almost surely as n → ∞ since (min1≤in mi)n−1/2 → ∞. It follows from standard Bayesian equivalent of the central limit theorem results such as Bernado and Smith (2003) (Chapter 5.3) that we have

$n1/2(η~n−η0)~AN(0,Jη0−1Ωη0Jη0−1),$

where Ωη0 = Var (n−1/2 Δη0) + op(1). The result (ii) follows by the Delta method.

For the proof of (i) we note that similar result as in (A.1) in Lemma A.2 holds for log ni(b) under condition C.1.(i). As for all η Ψ, it follows from (A.5) and by the definition of ni(·) that for any b in an open neighborhood of bi that is given,

(A.7)

where results corresponding to (A.2) hold for mi(b). The rest of the proof follows by Theorem 2 of Chernozhukov and Hong (2003).

#### Proof of Theorem 1 and 2

We first prove the results assuming Σi = Σ. For the proof of Theorem 1, it suffices to note that in (A.4) r(b, b*, θ) = 0 and {Λ(η)}−1 = Σ−1 and (A.5) becomes

(A.8)

where $Vi(θ)=∑+mi−1Qi$. For (iii), we note that it follows from McCulloch (1982) that

where K defines the matrix operator vech(·) as shown in the theorem. The desired results follow as the Jacobians of vech(Σ) with respect to vech(Vi) are identity matrices. On the other hand, the results (i) of Theorem 2 follows from Theorem 3 and 4 of Chernozhukov and Hong (2003). (ii) is given by standard Bayesian results respectively. For the proof when ΣiΣ for some i, it suffices to point that (A.8) remains true with Σi.

## Contributor Information

Mi-Ok Kim, Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH 45229-3039.

Yunwen Yang, Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL 61820.

## References

• Bernardo JM, Smith AFM. Bayesian Theory. John Wiley & Sons, Ltd; 2003.
• Capoccia VA, Cotter F, Gustafson DH, Cassidy E, Ford J, Madden L, Owens BH, Farnum SO, McCarty D, Molfenter T. Making ‘Stone Soup’: Improvements in Clinic Access and Retention in Addiction Treatment. Joint Commission Journal on Quality and Patient Safety. 2007;33:95–103. [PubMed]
• Chernozhukov V, Hong H. An MCMC Approach to Classical Estimation. Journal of Econometrics. 2003;115:293–346.
• Cowles MK, Carlin BP. Markov Chain Monte Carlo Convergence Diagnostics: A Compariative Review. Journal of American Statistical Association. 1996;91:883–904.
• Dunson DB, Taylor JA. Approximate Bayesian Inference for Quantiles. Journal of Nonparametric Statistics. 2005;17:385–400.
• Dunson DB, Watson M, Taylor JA. Bayesian Latent Variable Models for Median Regression on Multiple Outcomes. Biometrics. 2003;59:296–304. [PubMed]
• Geraci M, Bottai M. Quantile Regression for Longitudinal Data Using the Asymmetric Laplace Distribution. Biostatistics. 2007;8:140–154. [PubMed]
• Geweke J. Evalating the Accurancy of Sampling-Based Approaches to the Calculation of Posterior Moments. In: Benardo JM, Berger J, Dawid AP, Smith AFM, editors. Bayesian Statistics. Vol. 4. Oxford: Oxford University Press; 1992. pp. 169–193.
• Henderson HV, Searle SR. Vec and Vech Operators for Matrices, With Some Uses in Jacobinas and Multivariate Statistics. The Canadian Journal of Statistics. 1979;7:65–81.
• Hanson T, Johnson WO. Modeling Regression Error with a Mixture of Polya trees. Journal of the American Statistical Association. 2002;97:1020–1033.
• Hoffman KA, Ford JH, Choi D, Gustafson DH, McCarty D. Replication and Sustainability of Improved Access and Retention Within the Network for the Improvement of Addiction Treatment. Drug and Alcohol Dependence. 2008;98:63–69. [PubMed]
• Jung S. Quasi-likelihood for Median Regression Models. Journal of the American Statistical Association. 1996;91:251–257.
• Koenker R. Quantile Regression for Longitudinal Data. Journal of Multivariate Analysis. 2004;91:74–89.
• Koenker R. Quantile Regression. Cambridge University Press; 2005.
• Kottas A, Gelfand AE. Bayesian Semiparametric Median Regression Modeling. Journal of the American Statistical Association. 2001;96:1458–1468.
• Kottas A, Krnjajic M. Bayesian Nonparametric Modeling in Quantile Regression. Scandinavian Journal of Statistics. 2009;36:297–319.
• Lavine M. On an Approximate Likelihood for Quantiles. Biometrika. 1995;82:220–222.
• Lazar NA. Bayesian Empirical Likelihood. Biometrika. 2003;90:319–326.
• Lancaster T, Jun SJ. Bayesian quantile regression methods. Journal of Applied Econometrics. 2010;25:287–307.
• McCarty D, Gustafson DH, Wisdom JP, Ford J, Choi D, Molfenter T, Capoccia V, Cotter F. The Network for the Improvement of Addiction Treatment (NIATx): Enhancing access and retention. Drug and Alcohol Dependence. 2007;88:138–145. [PubMed]
• McCulloch CE. Symmetric Matrix Derivatives with Applications. Journal of the American Statistical Association. 1982;77:679–682.
• Owen A. Empirical Likelihood for Linear Models. The Annals of Statistics. 1991;19:90–120.
• Owen A. Empirical Likelihood. Chapman & Hall/CRC Press; 2001.
• Qin J, Lawless J. Empirical Likelihood and General Estimating Equations. The Annals of Statistics. 1994;22:300–325.
• Reich BJ, Bondell HD, Wang HJ. Flexible Bayesian Quantile Regression for Independent and Clustered Data. Biostatistics. 2010;11:337–352. [PubMed]
• Ruppert D, Carroll RJ. Trimmed Least Squares Estimation in the Linear Model. Journal of the American Statistical Association. 1980;75:828–838.
• Schennach SM. Bayesian Exponentially Tilted Empirical Likelihood. Biometrika. 2005;92:31–46.
• Tian L, Liu JS, Wei LJ. Implementation of Estimating Function-Based Inference Procedures With Markov Chain Monte Carlo Samplers. Journal of the American Statistical Association. 2007;102:881–888.
• Yin G. Bayesian Generalized Method of Moments. Bayesian Analysis. 2009;4:191–208.
• Yu K, Moyeed RA. Bayesian Quantile Regression. Statistics and Probability Letters. 2001;54:437–447.
• Yang Y, He X. Bayesian Empirical Likelihood for Quantile Regression. The Annals of Statistics. 2011 under review.

 PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers.