In this paper we extend the quantile regression (QR) method to a random effects analysis of clustered data. A dominant paradigm of clustered data analysis is a Gaussian structure in which the random effects and the random errors are each assumed to be identically distributed, following their respective common Gaussian distributions. Under this structure, however, the explanatory variables are assumed to affect only the location of the conditional distribution of the response. Such exclusive attention to location is constraining, as the analysis is often motivated by a natural heterogeneity among the clusters. Moreover, in practice the response variable is often analyzed on a transformed scale to meet the rather rigid homoscedastic Gaussian assumptions. The results are then less straightforward and more difficult to interpret on the original scale. In addition, conditional mean analysis, although the most popular, does not necessarily address the research questions of interest adequately, as shown in the Network for the Improvement of Addiction Treatment example below. QR provides an alternative analytic tool. Like quantiles in the univariate case, conditional regression quantiles are invariant under monotone transformations, and each is modelled locally in its relationship with the explanatory variables, so that these relationships are allowed to differ from one quantile to another.
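The invariance property mentioned above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the helper `sample_quantile`, the simulated lognormal responses, and the seed are all assumptions made for illustration. It contrasts the quantile, which commutes with a monotone transformation such as the logarithm, with the mean, which does not.

```python
import numpy as np

def sample_quantile(y, tau):
    """Order-statistic sample quantile: the ceil(n * tau)-th smallest value."""
    y_sorted = np.sort(np.asarray(y, dtype=float))
    k = int(np.ceil(tau * len(y_sorted)))  # 1-based rank
    return y_sorted[max(k, 1) - 1]

rng = np.random.default_rng(0)                     # arbitrary seed (assumption)
y = rng.lognormal(mean=0.0, sigma=1.0, size=501)   # skewed, positive responses
tau = 0.75

# Quantile: transform-then-quantile equals quantile-then-transform.
q_of_log = sample_quantile(np.log(y), tau)
log_of_q = np.log(sample_quantile(y, tau))

# Mean: the two orders of operation generally disagree (Jensen's inequality).
mean_of_log = np.mean(np.log(y))
log_of_mean = np.log(np.mean(y))

print(q_of_log, log_of_q)
print(mean_of_log, log_of_mean)
```

The order-statistic definition of the sample quantile is used on purpose: interpolating definitions satisfy the property only approximately in finite samples.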
The aforementioned features of QR have been well recognized for independent data. The methodology is fully developed and the model is widely applied (see Koenker (2005) for an overview). For the analysis of clustered data, however, only a limited number of approaches have been proposed. Jung (1996) considered fixed effects median regression and proposed a quasi-likelihood approach. Koenker (2004) considered a random intercept model and proposed an l1 penalty approach. The l1 penalty approach is less stringent in its assumptions than most other methods. However, its results depend on the choice of a penalty parameter, and inference on the fixed effects has not been studied for an empirically chosen penalty parameter. The method also may not be applicable to more complex random effects models, as it essentially treats the random effects as parameters and the increasing dimensionality can become an issue. Geraci and Bottai (2007) assumed an asymmetric Laplace error distribution and proposed an expectation-maximization (EM) estimator. By assuming a common asymmetric Laplace distribution, their method constrains the errors not only to be homoscedastic but also to have a mode at the median.
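The link between the asymmetric Laplace density and quantile estimation can be illustrated in the simplest, intercept-only case. The sketch below is an assumption-laden illustration rather than any of the cited estimators: the function names `check_loss` and `ald_loglik`, the simulated data, and the fixed scale `sigma = 1` are hypothetical. It shows that, for a fixed scale, maximizing the asymmetric Laplace log-likelihood over the location is the same as minimizing the QR check loss, and that the minimizer is a sample quantile.

```python
import numpy as np

def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def ald_loglik(y, mu, sigma, tau):
    """Log-likelihood of the asymmetric Laplace density
    f(y) = tau * (1 - tau) / sigma * exp(-rho_tau((y - mu) / sigma))."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return n * np.log(tau * (1 - tau) / sigma) - check_loss((y - mu) / sigma, tau).sum()

rng = np.random.default_rng(1)   # arbitrary seed (assumption)
y = rng.normal(size=101)
tau = 0.25

# Evaluate both criteria with each observation as a candidate location.
candidates = np.sort(y)
total_loss = np.array([check_loss(y - c, tau).sum() for c in candidates])
loglik = np.array([ald_loglik(y, c, 1.0, tau) for c in candidates])

best_by_loss = candidates[np.argmin(total_loss)]
best_by_loglik = candidates[np.argmax(loglik)]
print(best_by_loss, best_by_loglik)   # both equal the 26th order statistic
```

With n = 101 and tau = 0.25, the minimizer is the order statistic of rank ceil(n * tau) = 26, which is why the likelihood-based and loss-based answers agree.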
In the Bayesian framework, several parametric approaches have been proposed, similarly using asymmetric Laplace error densities and mostly for the analysis of independent data (e.g., Yu and Moyeed (2001)). Nonparametric approaches have been proposed to avoid the restrictive parametric assumption (e.g., Hanson and Johnson (2002), Kottas and Gelfand (2001), and Kottas and Krnjajic (2009)). Although they capture more general forms of skewness and tail behavior, these nonparametric approaches still restrict the error densities to have their modes at the quantile of interest. Reich et al. (2010) relaxed this restriction with an infinite mixture of quantile-restricted two-component Gaussian mixture densities. These nonparametric Bayesian approaches, however, still essentially model the error densities, even though they avoid specifying them parametrically. As a consequence, they can accommodate error heteroscedasticity only by correctly specifying its form parametrically in the model. This requirement of informative modeling of the error heteroscedasticity is restrictive compared with the generality of the QR methodology for independent data, and it makes the computation complex.
We propose a semiparametric approach that does not require modeling of the error densities, thereby retaining the desirable features of QR analysis for independent data without the computational complexity. We assume random regression coefficients that need not be identically distributed or Gaussian. The random coefficients have a common mean, which corresponds to the population-average effects of the explanatory variables on the conditional quantile of interest, while their departures from this mean represent cluster-specific deviations in the covariate effects. Under appropriate conditions discussed later, the median regression coincides with the conditional mean regression and thus serves as an alternative that is robust to outliers in the responses. We treat the estimation of the random effects as an estimating equations problem and use empirical likelihood (EL) to incorporate the parametric likelihood of the random effects. This yields a semiparametric likelihood-like criterion function, which we show is asymptotically concave in a neighborhood of the true parameter value, motivating its maximizer as a natural estimator. We use the Bayesian framework and Markov chain Monte Carlo (MCMC) samplers for the computation.
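To make the role of EL with an estimating equation concrete, the following sketch computes the empirical log-likelihood ratio for a single quantile of an independent sample. It is deliberately much simpler than the criterion proposed in this paper: there are no covariates, no clusters, and no random effects likelihood, and the function name `el_log_ratio_quantile` and the simulated data are assumptions. The estimating function is g_i(theta) = tau - 1{y_i <= theta}, for which the EL weights have a closed form.

```python
import numpy as np

def el_log_ratio_quantile(y, theta, tau):
    """Empirical log-likelihood ratio for the tau-th quantile of an i.i.d. sample,
    based on the estimating function g_i(theta) = tau - 1{y_i <= theta}.

    Maximizing sum(log(n * p_i)) subject to sum(p_i) = 1 and sum(p_i * g_i) = 0
    puts total weight tau on observations <= theta and 1 - tau on the rest.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    k = int(np.sum(y <= theta))      # observations at or below theta
    if k == 0 or k == n:             # zero is outside the convex hull of the g_i
        return -np.inf
    return k * np.log(n * tau / k) + (n - k) * np.log(n * (1 - tau) / (n - k))

rng = np.random.default_rng(2)       # arbitrary seed (assumption)
y = np.sort(rng.normal(size=100))
tau = 0.5

theta_mid = 0.5 * (y[49] + y[50])    # exactly half of the sample lies at or below
at_median = el_log_ratio_quantile(y, theta_mid, tau)
away = el_log_ratio_quantile(y, y[79], tau)          # 80 of 100 observations <= theta
outside = el_log_ratio_quantile(y, y[0] - 1.0, tau)  # theta below the whole sample
print(at_median, away, outside)
```

The ratio is zero where the estimating equation is solved exactly by the uniform weights and negative elsewhere, which is the behavior one would want from a likelihood-like criterion that does not model the error density.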
Monte Carlo methods have been used for classical estimation problems in various settings where classical methods are hampered by computational difficulties (see Tian et al. (2007) for an overview). Recent works include Chernozhukov and Hong (2003) and Tian et al. (2007). Chernozhukov and Hong (2003) in particular considered EL and censored QR for the analysis of independent data. In this paper we are concerned with random effects QR analysis and a semiparametric likelihood-like criterion function.
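The general idea of computing a classical estimator with a sampler can be sketched as follows. This is a hedged illustration only: the criterion is the negative check loss for a single quantile with independent data, swapped in for simplicity in place of the semiparametric criterion of this paper, and the prior is flat. The function names, step size, number of draws, and burn-in length are all assumptions. A random-walk Metropolis sampler targets the quasi-posterior proportional to exp(L_n(theta)), and the quasi-posterior mean is used as the point estimate.

```python
import numpy as np

def neg_check_loss(theta, y, tau):
    """Criterion L_n(theta) = -sum_i rho_tau(y_i - theta)."""
    u = y - theta
    return -np.sum(u * (tau - (u < 0)))

def quasi_posterior_draws(y, tau, n_draws=6000, step=0.2, seed=0):
    """Random-walk Metropolis targeting exp(L_n(theta)) under a flat prior."""
    rng = np.random.default_rng(seed)
    theta = float(np.mean(y))                  # arbitrary starting value
    current = neg_check_loss(theta, y, tau)
    draws = np.empty(n_draws)
    for t in range(n_draws):
        proposal = theta + step * rng.normal()
        candidate = neg_check_loss(proposal, y, tau)
        # Accept with probability min(1, exp(candidate - current)).
        if np.log(rng.uniform()) < candidate - current:
            theta, current = proposal, candidate
        draws[t] = theta
    return draws

rng = np.random.default_rng(3)                 # arbitrary seed (assumption)
y = rng.normal(loc=1.0, scale=1.0, size=200)
draws = quasi_posterior_draws(y, tau=0.5)[1000:]   # discard burn-in
point_estimate = draws.mean()                      # quasi-posterior mean
interval = np.quantile(draws, [0.025, 0.975])      # quasi-posterior quantiles
print(point_estimate, np.median(y), interval)
```

The interval printed here only illustrates the mechanics; it is not claimed to be a calibrated confidence interval for this particular criterion.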
A few works in the Bayesian literature have also considered likelihood-like or non-likelihood statistical criterion functions (e.g., Lavine (1995); Dunson et al. (2003); Dunson and Taylor (2005); Lazar (2003); Schennach (2005); Lancaster and Jun (2010)). They were motivated by the computational complexity of nonparametric Bayesian methods and the difficulty of likelihood specification. Lazar (2003) and Lancaster and Jun (2010) specifically considered EL. Most of these works were concerned with independent data analysis, with a few exceptions. Dunson et al. (2003) considered median regression for a latent variable model with multiple surrogate outcomes under a Gaussian within-subject dependency structure. Yin (2009) used a quadratic likelihood-like function motivated by generalized estimating equations and proposed a Bayesian generalized method of moments.
The proposed method is similarly motivated: the aforementioned Bayesian nonparametric methods are computationally complex and require informative modeling of the error heteroscedasticity, whereas this work does not require modeling of the error densities. EL is one choice of criterion with this property; any other nonparametric likelihood, such as exponentially tilted EL, can be used instead. Because it uses an MCMC sampler, the proposed method also does not require direct estimation of the variance of the estimator for inference, and so it is not subject to a known challenge of QR inference, namely estimating the error densities at the quantile of interest. In this paper we provide large sample properties of the resulting quasi-posterior estimators and inference. This is the first work to show clearly the shrinkage of the random effects estimators toward the population-average effect and the asymptotic normality of the population-average effect estimator.
The remainder of this paper proceeds as follows. Section 2 formally defines the semiparametric random effects quantile regression (REQR) estimator and provides its large sample properties. Section 3 describes the MCMC methods. Section 4 provides empirical results, including the analysis of two real data examples. Section 5 concludes. All proofs are deferred to the Appendix.