Apply quantile regression for a high-resolution analysis of changes in wait time to treatment and assess its applicability to quality improvement data compared to least squares regression.
Addiction treatment programs participating in the Network for the Improvement of Addiction Treatment.
We used quantile regression to estimate wait time changes at the 5th, 50th and 95th percentiles and compared the results with mean trends estimated by least squares regression.
Quantile regression analysis found statistically significant changes in the 5% and 95% quantiles of wait time that were not identified using least squares regression.
Quantile regression enabled estimation of changes specific to different percentiles of the wait time distribution. It provided a high-resolution analysis that was more sensitive to changes in quantiles of the wait time distributions.
wait time; process improvement; quantile regression; Network for the Improvement of Addiction Treatment (NIATx)
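As a minimal sketch of why quantile-specific estimates can reveal changes that a central summary hides, the example below uses the check (pinball) loss that underlies quantile regression. The wait times are hypothetical illustration data, not values from the study, and the function names `pinball_loss` and `constant_quantile` are assumptions introduced here.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss for one residual (observed minus predicted)."""
    return residual * tau if residual >= 0 else residual * (tau - 1.0)


def constant_quantile(values, tau):
    """Return the observed value that minimizes total pinball loss at level tau.

    A minimizer always exists among the observed values, so scanning them
    is enough for this small illustration (no covariates are involved).
    """
    return min(values, key=lambda q: sum(pinball_loss(v - q, tau) for v in values))


# Hypothetical wait times in days, before and after an improvement project.
before = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
after = [1, 2, 3, 4, 5, 6, 7, 8, 9, 12]

# Change at the median versus change in the upper tail.
median_shift = constant_quantile(after, 0.5) - constant_quantile(before, 0.5)
upper_shift = constant_quantile(after, 0.95) - constant_quantile(before, 0.95)
print(median_shift, upper_shift)
```

In this toy data the median does not move while the 95% quantile drops sharply, which is the kind of tail-specific change a quantile analysis is designed to pick up.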
Climate change may lead to changes in several aspects of the distribution of climate variables, including changes in the mean, increased variability, and severity of extreme events. In this paper, we propose using spatiotemporal quantile regression as a flexible and interpretable method for simultaneously detecting changes in several features of the distribution of climate variables. The spatiotemporal quantile regression model assumes that each quantile level changes linearly in time, permitting straightforward inference on the time trend for each quantile level. Unlike classical quantile regression, which uses model-free methods to analyze a single quantile or several quantiles separately, we take a model-based approach that jointly models all quantiles, and thus the entire response distribution. In the spatiotemporal quantile regression model, each spatial location has its own quantile function that evolves over time, and the quantile functions are smoothed spatially using Gaussian process priors. We propose a basis expansion for the quantile function that permits a closed-form for the likelihood, and allows for residual correlation modeling via a Gaussian spatial copula. We illustrate the methods using temperature data for the southeast US from the years 1931–2009. For these data, borrowing information across space identifies more significant time trends than classical non-spatial quantile regression. We find a decreasing time trend for much of the spatial domain for monthly mean and maximum temperatures. For the lower quantiles of monthly minimum temperature, we find a decrease in Georgia and Florida, and an increase in Virginia and the Carolinas.
Bayesian hierarchical model; climate change; non-Gaussian data; US temperature data; warming hole
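The linear-in-time assumption described in this abstract can be written compactly. The notation below is an assumption made for illustration (the abstract does not give symbols): q is the quantile function at level τ, location s and time t, with a location-specific intercept and time slope at each quantile level.

```latex
q_{\tau}(s, t) = \beta_{0}(\tau, s) + \beta_{1}(\tau, s)\, t,
\qquad \tau \in (0, 1)
```

Under this reading, inference on the time trend at a given quantile level and location amounts to inference on the slope \beta_{1}(\tau, s).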
Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumptions. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online.
Model Selection; COSSO; Reproducing Kernel Hilbert Space; Kernel Quantile Regression
Mass spectrometry (MS) data are often generated from various biological or chemical experiments and there may exist outlying observations, which are extreme due to technical reasons. The determination of outlying observations is important in the analysis of replicated MS data because elaborate pre-processing is essential for successful analysis with reliable results and manual outlier detection as one of pre-processing steps is time-consuming. The heterogeneity of variability and low replication are often obstacles to successful analysis, including outlier detection. Existing approaches, which assume constant variability, can generate many false positives (outliers) and/or false negatives (non-outliers). Thus, a more powerful and accurate approach is needed to account for the heterogeneity of variability and low replication.
We proposed an outlier detection algorithm using projection and quantile regression in MS data from multiple experiments. The performance of the algorithm and program was demonstrated by using both simulated and real-life data. The projection approach with linear, nonlinear, or nonparametric quantile regression was appropriate in heterogeneous high-throughput data with low replication.
Various quantile regression approaches combined with projection were proposed for detecting outliers. The choice among linear, nonlinear, and nonparametric regressions is dependent on the degree of heterogeneity of the data. The proposed approach was illustrated with MS data with two or more replicates.
Quantile regression (QR) is a very useful statistical tool for learning the relationship between the response variable and covariates. For many applications, one often needs to estimate multiple conditional quantile functions of the response variable given covariates. Although one can estimate multiple quantiles separately, it is of great interest to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among them to gain better estimation accuracy than individually estimated quantile functions. Another important advantage of joint estimation is the feasibility of incorporating simultaneous non-crossing constraints of QR functions. In this paper, we propose a new kernel-based multiple QR estimation technique, namely simultaneous non-crossing quantile regression (SNQR). We use kernel representations for QR functions and apply constraints on the kernel coefficients to avoid crossing. Both unregularised and regularised SNQR techniques are considered. Asymptotic properties such as asymptotic normality of linear SNQR and oracle properties of the sparse linear SNQR are developed. Our numerical results demonstrate the competitive performance of our SNQR over the original individual QR estimation.
asymptotic normality; kernel; multiple quantile regression; non-crossing; oracle property; regularisation; variable selection
Quantile regression, which models the conditional quantiles of the response variable given covariates, usually assumes a linear model. However, this kind of linearity is often unrealistic in real life. One situation where linear quantile regression is not appropriate is when the response variable is piecewise linear but still continuous in covariates. To analyze such data, we propose a bent line quantile regression model. We derive its parameter estimates, prove that they are asymptotically valid given the existence of a change-point, and discuss several methods for testing the existence of a change-point in bent line quantile regression together with a power comparison by simulation. An example of land mammal maximal running speeds is given to illustrate an application of bent line quantile regression in which this model is theoretically justified and its parameters are of direct biological interest.
Bahadur representation; Bootstrap; Change-point; Piecewise linear; Profile estimation
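To make the "piecewise linear but still continuous" shape concrete, here is a minimal sketch of a bent-line function with a single change-point. The parametrization (an intercept, one slope on each side, and a change-point) and the name `bent_line` are assumptions chosen for illustration and may differ from the paper's own notation. The sketch only evaluates the curve and does not estimate its parameters.

```python
def bent_line(x, intercept, slope_before, slope_after, change_point):
    """Evaluate a continuous two-segment line at x.

    The two segments meet at change_point, so the function is continuous
    there even though its slope changes.
    """
    if x <= change_point:
        return intercept + slope_before * x
    value_at_change = intercept + slope_before * change_point
    return value_at_change + slope_after * (x - change_point)


# Hypothetical parameters: rising until x = 4, then gently declining.
params = dict(intercept=1.0, slope_before=0.5, slope_after=-0.25, change_point=4.0)
values = [bent_line(x, **params) for x in (2.0, 4.0, 8.0)]
print(values)
```

In a quantile regression setting, a curve of this form would describe a chosen conditional quantile of the response rather than its mean.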
Quantile regression has emerged as a powerful tool in survival analysis as it directly links the quantiles of patients’ survival times to their demographic and genomic profiles, facilitating the identification of important prognostic factors. In view of limited work on variable selection in this context, we develop a new adaptive-lasso-based variable selection procedure for quantile regression with censored outcomes. To account for random censoring for data with multivariate covariates, we employ the ideas of redistribution-of-mass and effective dimension reduction. Asymptotically our procedure enjoys model selection consistency, that is, it identifies the true model with probability tending to one. Moreover, as opposed to the existing methods, our new proposal requires fewer assumptions, leading to more accurate variable selection. The analysis of a real cancer clinical trial demonstrates that our procedure can identify and distinguish important factors associated with patient sub-populations characterized by short or long survivals, which is of particular interest to oncologists.
Conditional Kaplan-Meier; dimension reduction; kernel; quantile regression; survival analysis; variable selection
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset.
Dimension reduction; heteroscedasticity; linearity condition; local polynomial regression; quantile regression; single-index model
Forecasting higher than expected numbers of health events provides potentially valuable insights in its own right, and may contribute to health services management and syndromic surveillance. This study investigates the use of quantile regression to predict higher than expected respiratory deaths.
Data from 70,830 deaths occurring in New York were used. Temporal, weather and air quality measures were fitted using quantile regression at the 90th percentile with half the data (in-sample). Four QR models were fitted: an unconditional model predicting the 90th percentile of deaths (Model 1), a seasonal/temporal model (Model 2), a seasonal, temporal model plus lags of weather and air quality (Model 3), and a seasonal, temporal model with 7-day moving averages of weather and air quality (Model 4). Models were cross-validated with the out-of-sample data. Performance was measured as the proportionate reduction in the weighted sum of absolute deviations achieved by a conditional model over the unconditional model, i.e., the coefficient of determination (R1).
The coefficient of determination showed an improvement over the unconditional model of between 0.16 and 0.19. The greatest improvement in predictive and forecasting accuracy of daily mortality was associated with the inclusion of seasonal and temporal predictors (Model 2). No gains were made in the predictive models with the addition of weather and air quality predictors (Models 3 and 4). However, forecasting models that included weather and air quality predictors performed slightly better than the seasonal and temporal model alone (i.e., Model 3 > Model 4 > Model 2).
This study provided a new approach to predict higher than expected numbers of respiratory related-deaths. The approach, while promising, has limitations and should be treated at this stage as a proof of concept.
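The performance measure used in this study, the proportionate reduction in the weighted sum of absolute deviations, can be sketched as follows. This is an illustrative reading of the description, not the authors' code: the function names and the daily counts are hypothetical, and the weighting is taken to be the usual quantile check loss at level tau.

```python
def check_loss_sum(observed, predicted, tau):
    """Weighted sum of absolute deviations (check loss) at quantile level tau."""
    total = 0.0
    for y, q in zip(observed, predicted):
        residual = y - q
        # Under-predictions are weighted by tau, over-predictions by (1 - tau).
        total += tau * residual if residual >= 0 else (tau - 1.0) * residual
    return total


def r1(observed, conditional_pred, unconditional_pred, tau):
    """Proportionate reduction in check loss of a conditional model
    relative to an unconditional (constant) model."""
    conditional = check_loss_sum(observed, conditional_pred, tau)
    unconditional = check_loss_sum(observed, unconditional_pred, tau)
    return 1.0 - conditional / unconditional


# Hypothetical daily death counts and 90th-percentile predictions.
observed = [10, 12, 20, 22]
unconditional_pred = [22, 22, 22, 22]   # one constant quantile for all days
conditional_pred = [12, 12, 22, 22]     # quantile varies with, e.g., season

score = r1(observed, conditional_pred, unconditional_pred, tau=0.9)
print(round(score, 3))
```

A value of 0 means the conditional model does no better than the constant quantile, and values closer to 1 indicate a larger reduction in loss.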
Studies on the health impacts of climate change routinely use climate model output as future exposure projection. Uncertainty quantification, usually in the form of sensitivity analysis, has focused predominantly on the variability arising from different emission scenarios or multi-model ensembles. This paper describes a Bayesian spatial quantile regression approach to calibrate climate model output for examining the risks of future temperature on adverse health outcomes. Specifically, we first estimate the spatial quantile process for climate model output using nonlinear monotonic regression during a historical period. The quantile process is then calibrated using the quantile functions estimated from the observed monitoring data. Our model also down-scales the gridded climate model output to the point-level for projecting future exposure over a specific geographical region. The quantile regression approach is motivated by the need to better characterize the tails of future temperature distribution where the greatest health impacts are likely to occur. We applied the methodology to calibrate temperature projections from a regional climate model for the period 2041 to 2050. Accounting for calibration uncertainty, we calculated the number of excess deaths attributed to future temperature for three cities in the US state of Alabama.
Bayesian spatial quantile regression; climate change; model calibration; health impacts
Clinical binary end-point traits are often governed by quantitative precursors. Hence it may be a prudent strategy to analyze a clinical end-point trait by considering a multivariate phenotype vector, possibly including both quantitative and qualitative phenotypes. A major statistical challenge lies in integrating the constituent phenotypes into a reduced univariate phenotype for association analyses. We assess the performances of certain reduced phenotypes using analysis of variance and a model-free quantile-based approach. We find that analysis of variance is more powerful than the quantile-based approach in detecting association, particularly for rare variants. We also find that using a principal component of the quantitative phenotypes and the residual of a logistic regression of the binary phenotype on the quantitative phenotypes may be an optimal method for integrating a binary phenotype with quantitative phenotypes to define a reduced univariate phenotype.
Many studies have investigated racial/ethnic disparities in medication non-adherence in patients with type 2 diabetes using common measures such as medication possession ratio (MPR) or gaps between refills. All these measures including MPR are quasi-continuous and bounded and their distribution is usually skewed. Analysis of such measures using traditional regression methods that model mean changes in the dependent variable may fail to provide a full picture about differential patterns in non-adherence between groups.
A retrospective cohort of 11,272 veterans with type 2 diabetes was assembled from Veterans Administration datasets from April 1996 to May 2006. The main outcome measure was MPR with quantile cutoffs Q1-Q4 taking values of 0.4, 0.6, 0.8 and 0.9. Quantile-regression (QReg) was used to model the association between MPR and race/ethnicity after adjusting for covariates. Comparison was made with commonly used ordinary-least-squares (OLS) and generalized linear mixed models (GLMM).
Quantile-regression showed that Non-Hispanic-Black (NHB) had statistically significantly lower MPR compared to Non-Hispanic-White (NHW) holding all other variables constant across all quantiles with estimates and p-values given as -3.4% (p = 0.11), -5.4% (p = 0.01), -3.1% (p = 0.001), and -2.00% (p = 0.001) for Q1 to Q4, respectively. Other racial/ethnic groups had lower adherence than NHW only in the lowest quantile (Q1) of about -6.3% (p = 0.003). In contrast, OLS and GLMM only showed differences in mean MPR between NHB and NHW while the mean MPR difference between other racial groups and NHW was not significant.
Quantile regression is recommended for analysis of data that are heterogeneous such that the tails and the central location of the conditional distributions vary differently with the covariates. QReg provides a comprehensive view of the relationships between independent and dependent variables (i.e. not just centrally but also in the tails of the conditional distribution of the dependent variable). Indeed, without performing QReg at different quantiles, an investigator would have no way of assessing whether a difference in these relationships might exist.
Medication adherence; Quantile regression; Diabetes; Health disparities
Since quantile regression curves are estimated individually, the quantile curves can cross, leading to an invalid distribution for the response. A simple constrained version of quantile regression is proposed to avoid the crossing problem for both linear and nonparametric quantile curves. A simulation study and a reanalysis of tropical cyclone intensity data show the usefulness of the procedure. Asymptotic properties of the estimator are equivalent to the typical approach under standard conditions, and the proposed estimator reduces to the classical one if there is no crossing. The constrained estimator shows significantly improved performance, adding smoothing and stability across the quantile levels.
Crossing quantile curve; Heteroscedastic error; Quantile regression; Robustness; Smoothing spline; Tropical cyclone
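The crossing problem itself is easy to diagnose for linear quantile curves. The sketch below is a simple check, not the constrained estimator proposed in the paper: it assumes each fitted curve is given as a hypothetical (intercept, slope) pair and tests whether a lower-level curve rises above a higher-level one anywhere on a covariate interval.

```python
def lines_cross(lower, upper, x_min, x_max):
    """Return True if the lower-quantile line exceeds the upper-quantile line
    somewhere on [x_min, x_max].

    lower and upper are (intercept, slope) pairs. The gap between two lines
    is itself linear in x, so checking both interval endpoints is sufficient.
    """
    def gap(x):
        return (upper[0] + upper[1] * x) - (lower[0] + lower[1] * x)

    return gap(x_min) < 0 or gap(x_max) < 0


# Hypothetical separately fitted curves for the 25% and 75% quantiles.
q25 = (1.0, 2.0)   # intercept 1, slope 2
q75 = (3.0, 1.0)   # intercept 3, slope 1

ok_on_narrow_range = not lines_cross(q25, q75, 0.0, 1.0)
crosses_on_wide_range = lines_cross(q25, q75, 0.0, 5.0)
print(ok_on_narrow_range, crosses_on_wide_range)
```

Here the two lines meet at x = 2, so the fits are coherent on [0, 1] but imply an invalid distribution once the covariate range extends past the intersection.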
A time-specific log-linear regression method on quantile residual lifetime is proposed. Under the proposed regression model, any quantile of a time-to-event distribution among survivors beyond a certain time point is associated with selected covariates under right censoring. Consistency and asymptotic normality of the regression estimator are established. An asymptotic test statistic is proposed to evaluate the covariate effects on the quantile residual lifetimes at a specific time point. Evaluation of the test statistic does not require estimation of the variance-covariance matrix of the regression estimators, which involves the probability density function of the survival distribution with censoring. Simulation studies are performed to assess finite sample properties of the regression parameter estimator and test statistic. The new regression method is applied to a breast cancer data set with long-term follow-up to estimate the patients’ median residual lifetimes, adjusting for important prognostic factors.
Breast cancer; Martingale; Minimum dispersion statistic
The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise from real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonparametric varying-coefficient functions and the parametric regression coefficients. To achieve nice efficiency properties, we further develop a semiparametric composite quantile regression procedure. We establish the asymptotic normality of proposed estimators for both the parametric and nonparametric parts and show that the estimators achieve the best convergence rate. Moreover, we show that the proposed method is much more efficient than the least-squares-based method for many non-normal errors and that it only loses a small amount of efficiency for normal errors. In addition, it is shown that the loss in efficiency is at most 11.1% for estimating varying coefficient functions and is no greater than 13.6% for estimating parametric components. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. Finally, we apply the new methods to analyze the plasma beta-carotene level data.
Asymptotic relative efficiency; composite quantile regression; semiparametric varying-coefficient partially linear model; oracle properties; variable selection
The quantile approximation method has recently been proposed as a simple method for deriving confidence intervals for the treatment effect in a random effects meta-analysis. Although easily implemented, the quantiles used to construct intervals are derived from a single simulation study. Here it is shown that altering the study parameters, and in particular introducing changes to the distribution of the within-study variances, can have a dramatic impact on the resulting quantiles. This is further illustrated analytically by examining the scenario where all trials are assumed to be the same size. A more cautious approach is therefore suggested, where the conventional standard normal quantile is used in the primary analysis, but where the use of alternative quantiles is also considered in a sensitivity analysis. Copyright © 2008 John Wiley & Sons, Ltd.
meta-analysis; random effects model; quantile approximation method
Autoregressive (AR) models with finite variance errors have been well studied. This paper is concerned with AR models with heavy-tailed errors, which are useful in various scientific research areas. Statistical estimation for AR models with infinite variance errors is very different from that for AR models with finite variance errors. In this paper, we consider a weighted quantile regression for AR models to deal with infinite variance errors. We further propose an induced smoothing method to deal with computational challenges in weighted quantile regression. We show that the difference between the weighted quantile regression estimate and its smoothed version is negligible. We further propose a test for linear hypotheses on the regression coefficients. We conduct a Monte Carlo simulation study to assess the finite sample performance of the proposed procedures. We illustrate the proposed methodology by an empirical analysis of a real-life data set.
Quantile regression; autoregressive model; infinite variance
Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited.
We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate.
Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting.
At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable.
Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
Because of the low frequency of rare genetic variants in observed data, the statistical power of detecting their associations with target traits is usually low. The collapsing test of collective effect of multiple rare variants is an important and useful strategy to increase the power; in addition, family data may be enriched with causal rare variants and therefore provide extra power. However, when family data are used, both population structure and familial relatedness need to be adjusted for the possible inflation of false positives. Using a unified mixed linear model and family data, we compared six methods to detect the association between multiple rare variants and quantitative traits. Through the analysis of 200 replications of the quantitative trait Q2 from the Genetic Analysis Workshop 17 data set simulated for 697 subjects from 8 extended families, and based on quantile-quantile plots under the null and receiver operating characteristic curves, we compared the false-positive rate and power of these methods. We observed that adjusting for pedigree-based kinship gives the best control for false-positive rate, whereas adjusting for marker-based identity by state slightly outperforms in terms of power. An adjustment based on a principal components analysis slightly improves the false-positive rate and power. Taking into account type-1 error, power, and computational efficiency, we find that adjusting for pedigree-based kinship seems to be a good choice for the collective test of association between multiple rare variants and quantitative traits using family data.
The number of particles in a sample heavily influences the shape of the distribution describing the corresponding individual particle measurements. Selecting an adequate number of particles that prevents biases due to sample size is particularly difficult for complex biological systems in which statistical distributions are not normal. Quantile analysis is a powerful statistical technique that can rapidly compare differences between multiple distributions of individual particles. This report utilizes quantile analysis to show that the number of events detected affects the mobility distributions for rat liver and mouse liver mitochondria, sample individual particles, when analyzed via capillary electrophoresis with laser-induced fluorescence. When the mitochondrial sample is small (e.g. <78), there are not enough events to obtain statistically relevant mobility data. Adsorption to the capillary surface also significantly affects the mobility distribution at a small number of events in uncoated and dynamically coated capillaries. These adsorption effects can be overcome when the mitochondrial load on the capillary is sufficiently large (i.e. >609 and >1426 events for mouse liver on uncoated capillaries and rat liver on dynamically coated capillaries, respectively). It is anticipated that quantile analysis can be used to study other distributions of individual particles, such as nanoparticles, organelles, and biomolecules, and that distributions of these particles will also be dependent on sample size.
Undersampling; quantile analysis; Mitochondria; capillary electrophoresis; adsorption; dynamic coatings
It is estimated that mainland Chinese tourists travelling to Taiwan can bring annual revenues of 400 billion NTD to the Taiwan economy. Thus, how the Taiwanese Government formulates relevant measures to satisfy both sides is the focus of most concern. Taiwan must improve the facilities and service quality of its tourism industry so as to attract more mainland tourists. This paper conducted a questionnaire survey of mainland tourists and used grey relational analysis in grey mathematics to analyze the satisfaction performance of all satisfaction question items. The first eight satisfaction items were used as independent variables, and the overall satisfaction performance was used as a dependent variable for quantile regression model analysis to discuss the relationship between the dependent variable under different quantiles and independent variables. Finally, this study further discussed the predictive accuracy of the least mean regression model and each quantile regression model, as a reference for research personnel. The analysis results showed that other variables could also affect the overall satisfaction performance of mainland tourists, in addition to occupation and age. The overall predictive accuracy of quantile regression model Q0.25 was higher than that of the other three models.
Genetic association of population-based quantitative trait data has traditionally been analyzed using analysis of variance (ANOVA). However, violations of certain statistical assumptions may lead to false-positive association results. In this study, we have explored model-free alternatives to ANOVA using correlations between allele frequencies in the different quantile intervals of the quantitative trait and the quantile values. We performed genome-wide association scans on anti-cyclic citrullinated peptide and rheumatoid factor-immunoglobulin M, two quantitative traits correlated with rheumatoid arthritis, using the data provided in Genetic Analysis Workshop 16. Both the quantitative traits exhibited significant evidence of association on Chromosome 6, although not in the human leukocyte antigen region which is known to harbor a major gene predisposing to rheumatoid arthritis. We found that while a majority of the significant findings using the asymptotic thresholds of ANOVA was not validated using permutations, a relatively higher proportion of the significant findings using the asymptotic cut-offs of the correlation statistic were validated using permutations.
DNA methylation has been shown to play an important role in the silencing of tumor suppressor genes in various tumor types. In order to have a system-wide understanding of the methylation changes that occur in tumors, we have developed a differential methylation hybridization (DMH) protocol that can simultaneously assay the methylation status of all known CpG islands (CGIs) using microarray technologies. A large percentage of signals obtained from microarrays can be attributed to various measurable and unmeasurable confounding factors unrelated to the biological question at hand. In order to correct the bias due to noise, we first implemented a quantile regression model, with a quantile level equal to 75%, to identify hypermethylated CGIs in an earlier work. As a proof of concept, we applied this model to methylation microarray data generated from breast cancer cell lines. However, we were unsure whether 75% was the best quantile level for identifying hypermethylated CGIs. In this paper, we attempt to determine which quantile level should be used to identify hypermethylated CGIs and their associated genes.
We introduce three statistical measurements to compare the performance of the proposed quantile regression model at different quantile levels (95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%), using known methylated genes and unmethylated housekeeping genes reported in breast cancer cell lines and ovarian cancer patients. Our results show that the quantile levels ranging from 80% to 90% are better at identifying known methylated and unmethylated genes.
In this paper, we propose to use a quantile regression model to identify hypermethylated CGIs by incorporating probe effects to account for noise due to unmeasurable factors. Our model can efficiently identify hypermethylated CGIs in both breast and ovarian cancer data.
Quantile and rank normalizations are two widely used pre-processing techniques designed to remove technological noise present in genomic data. Subsequent statistical analysis such as gene differential expression analysis is usually based on normalized expressions. In this study, we find that these normalization procedures can have a profound impact on differential expression analysis, especially in terms of testing power.
We conduct theoretical derivations to show that the testing power of differential expression analysis based on quantile or rank normalized gene expressions can never reach 100% with fixed sample size no matter how strong the gene differentiation effects are. We perform extensive simulation analyses and find the results corroborate theoretical predictions.
Our finding may explain why genes with well documented strong differentiation are not always detected in microarray analysis. It provides new insights in microarray experimental design and will help practitioners in selecting proper normalization procedures.
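For readers unfamiliar with the pre-processing step discussed here, the following is a minimal sketch of quantile normalization across samples. It is a textbook-style illustration rather than the procedure used in the study: the function name and the expression values are hypothetical, and ties are handled by simple ordering rather than by averaging.

```python
def quantile_normalize(samples):
    """Quantile-normalize a list of equally long samples.

    Each value is replaced by the mean, across samples, of the values that
    share its rank, so every sample ends up with the same set of values.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same length")

    sorted_samples = [sorted(s) for s in samples]
    # Mean of the r-th smallest value across all samples.
    rank_means = [sum(s[r] for s in sorted_samples) / len(samples) for r in range(n)]

    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices from smallest to largest
        out = [0.0] * n
        for rank, index in enumerate(order):
            out[index] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical expression values for three genes in two arrays.
array_a = [2.0, 4.0, 6.0]
array_b = [30.0, 10.0, 20.0]
norm_a, norm_b = quantile_normalize([array_a, array_b])
print(norm_a, norm_b)
```

After normalization both arrays contain exactly the same values in different positions, so only the within-array ranks of the genes carry information into the downstream analysis.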
To assess whether Black-White and Hispanic-White disparities increase or abate in the upper quantiles of total health care expenditure, conditional on covariates.
Nationally representative adult population of non-Hispanic Whites, African-Americans, and Hispanics from the 2001 - 2005 Medical Expenditure Panel Surveys.
We examine unadjusted racial/ethnic differences across the distribution of expenditures. We apply quantile regression to measure disparities at the median, 75th, 90th, and 95th quantiles, testing for differences over the distribution of health care expenditures and across income and education categories. We test the sensitivity of the results to comparisons based only on health status and estimate a two-part model to ensure that results are not driven by an extremely skewed distribution of expenditures with a large zero mass.
Black-White and Hispanic-White disparities diminish in the upper quantiles of expenditure, but expenditures for Blacks and Hispanics remain significantly lower than for Whites throughout the distribution. For most education and income categories, disparities exist at the median and decline, but remain significant even with increased education and income.
Blacks and Hispanics receive significantly disparate care at high expenditure levels, suggesting prioritization of improved access to quality care among minorities with critical health issues.
Racial disparities; healthcare expenditures; quantile regression; vigintile