Related Articles

1.  Outlier Detection using Projection Quantile Regression for Mass Spectrometry Data with Low Replication 
BMC Research Notes  2012;5:236.
Background
Mass spectrometry (MS) data are generated from a variety of biological and chemical experiments and may contain outlying observations that are extreme for technical reasons. Identifying these outliers is important in the analysis of replicated MS data because careful pre-processing is essential for reliable results, and manual outlier detection, one of the pre-processing steps, is time-consuming. The heterogeneity of variability and low replication are often obstacles to successful analysis, including outlier detection. Existing approaches, which assume constant variability, can generate many false positives (outliers) and/or false negatives (non-outliers). Thus, a more powerful and accurate approach is needed to account for the heterogeneity of variability and low replication.
Findings
We proposed an outlier detection algorithm using projection and quantile regression in MS data from multiple experiments. The performance of the algorithm and program was demonstrated by using both simulated and real-life data. The projection approach with linear, nonlinear, or nonparametric quantile regression was appropriate in heterogeneous high-throughput data with low replication.
Conclusion
Various quantile regression approaches combined with projection were proposed for detecting outliers. The choice among linear, nonlinear, and nonparametric regressions is dependent on the degree of heterogeneity of the data. The proposed approach was illustrated with MS data with two or more replicates.
doi:10.1186/1756-0500-5-236
PMCID: PMC3514222  PMID: 22587344
2.  Bent Line Quantile Regression with Application to an Allometric Study of Land Mammals' Speed and Mass 
Biometrics  2011;67(1):242-249.
Summary
Quantile regression, which models the conditional quantiles of the response variable given covariates, usually assumes a linear model. However, this kind of linearity is often unrealistic in real life. One situation where linear quantile regression is not appropriate is when the response variable is piecewise linear but still continuous in covariates. To analyze such data, we propose a bent line quantile regression model. We derive its parameter estimates, prove that they are asymptotically valid given the existence of a change-point, and discuss several methods for testing the existence of a change-point in bent line quantile regression together with a power comparison by simulation. An example of land mammal maximal running speeds is given to illustrate an application of bent line quantile regression in which this model is theoretically justified and its parameters are of direct biological interest.
doi:10.1111/j.1541-0420.2010.01436.x
PMCID: PMC3059331  PMID: 20528859
Bahadur representation; Bootstrap; Change-point; Piecewise linear; Profile estimation
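The bent line model described in this abstract can be illustrated with a minimal sketch. It assumes one common parameterization, a line with slope b1 before the change-point t and slope b1 + b2 after it, and scores candidate change-points with the quantile check loss. The function names, the toy data, and the grid are hypothetical and are not taken from the paper, whose profile estimator would also re-fit the other coefficients at each candidate.

```python
def check_loss(u, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def bent_line(x, b0, b1, b2, t):
    """Continuous piecewise-linear curve with a kink at t (assumed form)."""
    return b0 + b1 * x + b2 * max(x - t, 0.0)


def objective(xs, ys, tau, b0, b1, b2, t):
    """Total check loss of the bent line at quantile level tau."""
    return sum(check_loss(y - bent_line(x, b0, b1, b2, t), tau)
               for x, y in zip(xs, ys))


# Toy data lying exactly on a bent line with change-point t = 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [bent_line(x, 1.0, 2.0, -3.0, 3.0) for x in xs]

# Grid search over the change-point only; the other coefficients are held
# fixed at their true values to keep the sketch short.
grid = [1.0, 2.0, 3.0, 4.0, 5.0]
best_t = min(grid, key=lambda t: objective(xs, ys, 0.5, 1.0, 2.0, -3.0, t))
```

On this noise-free toy data the objective is zero only at the true change-point, so the grid search recovers it; with real data the loss surface is not smooth in t, which is one reason a grid or profile search is a natural choice.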
3.  Simultaneous multiple non-crossing quantile regression estimation using kernel constraints 
Quantile regression (QR) is a very useful statistical tool for learning the relationship between the response variable and covariates. For many applications, one often needs to estimate multiple conditional quantile functions of the response variable given covariates. Although one can estimate multiple quantiles separately, it is of great interest to estimate them simultaneously. One advantage of simultaneous estimation is that multiple quantiles can share strength among them to gain better estimation accuracy than individually estimated quantile functions. Another important advantage of joint estimation is the feasibility of incorporating simultaneous non-crossing constraints of QR functions. In this paper, we propose a new kernel-based multiple QR estimation technique, namely simultaneous non-crossing quantile regression (SNQR). We use kernel representations for QR functions and apply constraints on the kernel coefficients to avoid crossing. Both unregularised and regularised SNQR techniques are considered. Asymptotic properties such as asymptotic normality of linear SNQR and oracle properties of the sparse linear SNQR are developed. Our numerical results demonstrate the competitive performance of our SNQR over the original individual QR estimation.
doi:10.1080/10485252.2010.537336
PMCID: PMC3242516  PMID: 22190842
asymptotic normality; kernel; multiple quantile regression; non-crossing; oracle property; regularisation; variable selection
4.  Noncrossing quantile regression curve estimation 
Biometrika  2010;97(4):825-838.
Summary
Since quantile regression curves are estimated individually, the quantile curves can cross, leading to an invalid distribution for the response. A simple constrained version of quantile regression is proposed to avoid the crossing problem for both linear and nonparametric quantile curves. A simulation study and a reanalysis of tropical cyclone intensity data show the usefulness of the procedure. Asymptotic properties of the estimator are equivalent to the typical approach under standard conditions, and the proposed estimator reduces to the classical one if there is no crossing. The constrained estimator shows significantly improved performance because it adds smoothing and stability across the quantile levels.
doi:10.1093/biomet/asq048
PMCID: PMC3371721  PMID: 22822254
Crossing quantile curve; Heteroscedastic error; Quantile regression; Robustness; Smoothing spline; Tropical cyclone
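The crossing problem named in this abstract is easy to state for two fitted linear quantile curves. The sketch below only detects a crossing on a covariate interval; it is not the constrained estimator proposed in the paper. The coefficient pairs and the helper names are hypothetical.

```python
def gap(lower, upper, x):
    """Vertical distance between the upper and lower quantile lines at x."""
    return (upper[0] - lower[0]) + (upper[1] - lower[1]) * x


def lines_cross(lower, upper, x_lo, x_hi):
    """True if the upper-quantile line drops below the lower one on [x_lo, x_hi].

    Each line is an (intercept, slope) pair. The gap is linear in x, so its
    minimum over the interval is attained at one of the two endpoints.
    """
    return min(gap(lower, upper, x_lo), gap(lower, upper, x_hi)) < 0


q25 = (0.0, 1.0)   # hypothetical fitted 25% line: 0 + 1.0 * x
q75 = (1.0, 0.5)   # hypothetical fitted 75% line: 1 + 0.5 * x

no_cross_near = lines_cross(q25, q75, 0.0, 1.0)   # gap stays positive
cross_far = lines_cross(q25, q75, 0.0, 3.0)       # gap is negative at x = 3
```

Two lines with different slopes always cross somewhere, so whether crossing is a practical problem depends on the covariate range over which the curves are used.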
5.  Using quantile regression to investigate racial disparities in medication non-adherence 
Background
Many studies have investigated racial/ethnic disparities in medication non-adherence in patients with type 2 diabetes using common measures such as medication possession ratio (MPR) or gaps between refills. All these measures including MPR are quasi-continuous and bounded and their distribution is usually skewed. Analysis of such measures using traditional regression methods that model mean changes in the dependent variable may fail to provide a full picture about differential patterns in non-adherence between groups.
Methods
A retrospective cohort of 11,272 veterans with type 2 diabetes was assembled from Veterans Administration datasets from April 1996 to May 2006. The main outcome measure was MPR with quantile cutoffs Q1-Q4 taking values of 0.4, 0.6, 0.8 and 0.9. Quantile-regression (QReg) was used to model the association between MPR and race/ethnicity after adjusting for covariates. Comparison was made with commonly used ordinary-least-squares (OLS) and generalized linear mixed models (GLMM).
Results
Quantile-regression showed that Non-Hispanic-Black (NHB) had statistically significantly lower MPR compared to Non-Hispanic-White (NHW) holding all other variables constant across all quantiles with estimates and p-values given as -3.4% (p = 0.11), -5.4% (p = 0.01), -3.1% (p = 0.001), and -2.00% (p = 0.001) for Q1 to Q4, respectively. Other racial/ethnic groups had lower adherence than NHW only in the lowest quantile (Q1), with an estimate of about -6.3% (p = 0.003). In contrast, OLS and GLMM only showed differences in mean MPR between NHB and NHW while the mean MPR difference between other racial groups and NHW was not significant.
Conclusion
Quantile regression is recommended for analysis of data that are heterogeneous such that the tails and the central location of the conditional distributions vary differently with the covariates. QReg provides a comprehensive view of the relationships between independent and dependent variables (i.e. not just centrally but also in the tails of the conditional distribution of the dependent variable). Indeed, without performing QReg at different quantiles, an investigator would have no way of assessing whether a difference in these relationships might exist.
doi:10.1186/1471-2288-11-88
PMCID: PMC3121729  PMID: 21645379
Medication adherence; Quantile regression; Diabetes; Health disparities
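The contrast this abstract draws between mean-based regression and quantile regression can be stated through the two objective functions. The notation below is the standard textbook formulation and is an assumption of this note, not the authors' exact model specification.

```latex
\hat{\beta}_{\mathrm{OLS}} = \arg\min_{\beta} \sum_{i=1}^{n} \left(y_i - x_i^{\top}\beta\right)^2,
\qquad
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_{\tau}\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_{\tau}(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right).
```

Because the coefficient vector is estimated separately for each quantile level, a covariate can carry a different coefficient in the lower tail of the outcome than near its center, which a single mean-based fit cannot show.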
6.  Integrating binary traits with quantitative phenotypes for association mapping of multivariate phenotypes 
BMC Proceedings  2011;5(Suppl 9):S73.
Clinical binary end-point traits are often governed by quantitative precursors. Hence it may be a prudent strategy to analyze a clinical end-point trait by considering a multivariate phenotype vector, possibly including both quantitative and qualitative phenotypes. A major statistical challenge lies in integrating the constituent phenotypes into a reduced univariate phenotype for association analyses. We assess the performances of certain reduced phenotypes using analysis of variance and a model-free quantile-based approach. We find that analysis of variance is more powerful than the quantile-based approach in detecting association, particularly for rare variants. We also find that using a principal component of the quantitative phenotypes and the residual of a logistic regression of the binary phenotype on the quantitative phenotypes may be an optimal method for integrating a binary phenotype with quantitative phenotypes to define a reduced univariate phenotype.
doi:10.1186/1753-6561-5-S9-S73
PMCID: PMC3287913  PMID: 22373144
7.  A re-evaluation of the ‘quantile approximation method’ for random effects meta-analysis 
Statistics in Medicine  2008;28(2):338-348.
The quantile approximation method has recently been proposed as a simple method for deriving confidence intervals for the treatment effect in a random effects meta-analysis. Although easily implemented, the quantiles used to construct intervals are derived from a single simulation study. Here it is shown that altering the study parameters, and in particular introducing changes to the distribution of the within-study variances, can have a dramatic impact on the resulting quantiles. This is further illustrated analytically by examining the scenario where all trials are assumed to be the same size. A more cautious approach is therefore suggested, where the conventional standard normal quantile is used in the primary analysis, but where the use of alternative quantiles is also considered in a sensitivity analysis. Copyright © 2008 John Wiley & Sons, Ltd.
doi:10.1002/sim.3487
PMCID: PMC2991773  PMID: 19016302
meta-analysis; random effects model; quantile approximation method
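The cautious approach suggested in this abstract, a primary interval based on the standard normal quantile plus a sensitivity analysis with alternative quantiles, can be sketched as follows. The pooled estimate, its standard error, and the alternative quantile values are hypothetical placeholders, not figures from the paper.

```python
from statistics import NormalDist


def interval(estimate, std_error, quantile):
    """Symmetric confidence interval: estimate +/- quantile * std_error."""
    half_width = quantile * std_error
    return (estimate - half_width, estimate + half_width)


# Hypothetical pooled effect and standard error, for illustration only.
mu_hat, se = 0.30, 0.10

# Primary analysis: conventional standard normal quantile for a 95% interval.
z = NormalDist().inv_cdf(0.975)
primary = interval(mu_hat, se, z)

# Sensitivity analysis: larger, purely illustrative alternative quantiles.
sensitivity = {q: interval(mu_hat, se, q) for q in (2.2, 2.5)}
```

Comparing the primary interval with the wider sensitivity intervals shows how strongly a conclusion depends on the choice of quantile.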
8.  Regression on Quantile Residual Life 
Biometrics  2009;65(4):1203-1212.
Summary
A time-specific log-linear regression method on quantile residual lifetime is proposed. Under the proposed regression model, any quantile of a time-to-event distribution among survivors beyond a certain time point is associated with selected covariates under right censoring. Consistency and asymptotic normality of the regression estimator are established. An asymptotic test statistic is proposed to evaluate the covariate effects on the quantile residual lifetimes at a specific time point. Evaluation of the test statistic does not require estimation of the variance-covariance matrix of the regression estimators, which involves the probability density function of the survival distribution with censoring. Simulation studies are performed to assess finite sample properties of the regression parameter estimator and test statistic. The new regression method is applied to a breast cancer data set with long-term follow-up to estimate the patients’ median residual lifetimes, adjusting for important prognostic factors.
doi:10.1111/j.1541-0420.2009.01196.x
PMCID: PMC3050018  PMID: 19432781
Breast cancer; Martingale; Minimum dispersion statistic
9.  NEW EFFICIENT ESTIMATION AND VARIABLE SELECTION METHODS FOR SEMIPARAMETRIC VARYING-COEFFICIENT PARTIALLY LINEAR MODELS 
Annals of statistics  2011;39(1):305-332.
The complexity of semiparametric models poses new challenges to statistical inference and model selection that frequently arise from real applications. In this work, we propose new estimation and variable selection procedures for the semiparametric varying-coefficient partially linear model. We first study quantile regression estimates for the nonparametric varying-coefficient functions and the parametric regression coefficients. To achieve nice efficiency properties, we further develop a semiparametric composite quantile regression procedure. We establish the asymptotic normality of proposed estimators for both the parametric and nonparametric parts and show that the estimators achieve the best convergence rate. Moreover, we show that the proposed method is much more efficient than the least-squares-based method for many non-normal errors and that it only loses a small amount of efficiency for normal errors. In addition, it is shown that the loss in efficiency is at most 11.1% for estimating varying coefficient functions and is no greater than 13.6% for estimating parametric components. To achieve sparsity with high-dimensional covariates, we propose adaptive penalization methods for variable selection in the semiparametric varying-coefficient partially linear model and prove that the methods possess the oracle property. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedures. Finally, we apply the new methods to analyze the plasma beta-carotene level data.
doi:10.1214/10-AOS842
PMCID: PMC3109949  PMID: 21666869
Asymptotic relative efficiency; composite quantile regression; semiparametric varying-coefficient partially linear model; oracle properties; variable selection
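Composite quantile regression, which this abstract builds on, pools the check loss over several quantile levels while sharing one set of slope coefficients. The display below writes the idea for a plain linear model, which is a simplification of the semiparametric varying-coefficient setting treated in the paper; the equally spaced levels are a common convention and an assumption here.

```latex
\left(\hat{b}_1, \ldots, \hat{b}_K, \hat{\beta}\right)
  = \arg\min_{b_1, \ldots, b_K,\, \beta}
    \sum_{k=1}^{K} \sum_{i=1}^{n}
    \rho_{\tau_k}\!\left(y_i - b_k - x_i^{\top}\beta\right),
\qquad
\rho_{\tau}(u) = u\left(\tau - \mathbf{1}\{u < 0\}\right),
\qquad
\tau_k = \frac{k}{K+1}.
```

Each quantile level gets its own intercept, while the slope vector is common to all levels, which is what allows information to be combined across quantiles.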
10.  Methods for adjusting population structure and familial relatedness in association test for collective effect of multiple rare variants on quantitative traits 
BMC Proceedings  2011;5(Suppl 9):S35.
Because of the low frequency of rare genetic variants in observed data, the statistical power of detecting their associations with target traits is usually low. The collapsing test of collective effect of multiple rare variants is an important and useful strategy to increase the power; in addition, family data may be enriched with causal rare variants and therefore provide extra power. However, when family data are used, both population structure and familial relatedness need to be adjusted for the possible inflation of false positives. Using a unified mixed linear model and family data, we compared six methods to detect the association between multiple rare variants and quantitative traits. Through the analysis of 200 replications of the quantitative trait Q2 from the Genetic Analysis Workshop 17 data set simulated for 697 subjects from 8 extended families, and based on quantile-quantile plots under the null and receiver operating characteristic curves, we compared the false-positive rate and power of these methods. We observed that adjusting for pedigree-based kinship gives the best control for false-positive rate, whereas adjusting for marker-based identity by state slightly outperforms in terms of power. An adjustment based on a principal components analysis slightly improves the false-positive rate and power. Taking into account type-1 error, power, and computational efficiency, we find that adjusting for pedigree-based kinship seems to be a good choice for the collective test of association between multiple rare variants and quantitative traits using family data.
doi:10.1186/1753-6561-5-S9-S35
PMCID: PMC3287871  PMID: 22373066
11.  A quantile-based method for association mapping of quantitative phenotypes: an application to rheumatoid arthritis phenotypes 
BMC Proceedings  2009;3(Suppl 7):S18.
Genetic association of population-based quantitative trait data has traditionally been analyzed using analysis of variance (ANOVA). However, violations of certain statistical assumptions may lead to false-positive association results. In this study, we have explored model-free alternatives to ANOVA using correlations between allele frequencies in the different quantile intervals of the quantitative trait and the quantile values. We performed genome-wide association scans on anti-cyclic citrullinated peptide and rheumatoid factor-immunoglobulin M, two quantitative traits correlated with rheumatoid arthritis, using the data provided in Genetic Analysis Workshop 16. Both the quantitative traits exhibited significant evidence of association on Chromosome 6, although not in the human leukocyte antigen region which is known to harbor a major gene predisposing to rheumatoid arthritis. We found that while a majority of the significant findings using the asymptotic thresholds of ANOVA was not validated using permutations, a relatively higher proportion of the significant findings using the asymptotic cut-offs of the correlation statistic were validated using permutations.
PMCID: PMC2795914  PMID: 20018007
12.  Evaluation of individual particle capillary electrophoresis experiments via quantile analysis 
Journal of chromatography. A  2007;1157(1-2):446-453.
The number of particles in a sample heavily influences the shape of the distribution describing the corresponding individual particle measurements. Selecting an adequate number of particles that prevents biases due to sample size is particularly difficult for complex biological systems in which statistical distributions are not normal. Quantile analysis is a powerful statistical technique that can rapidly compare differences between multiple distributions of individual particles. This report utilizes quantile analysis to show that the number of events detected affects the mobility distributions for rat liver and mouse liver mitochondria, used here as sample individual particles, when analyzed via capillary electrophoresis with laser-induced fluorescence. When the mitochondrial sample is small (e.g. <78), there are not enough events to obtain statistically relevant mobility data. Adsorption to the capillary surface also significantly affects the mobility distribution at a small number of events in uncoated and dynamically coated capillaries. These adsorption effects can be overcome when the mitochondrial load on the capillary is sufficiently large (i.e. >609 and >1426 events for mouse liver on uncoated capillaries and rat liver on dynamically coated capillaries, respectively). It is anticipated that quantile analysis can be used to study other distributions of individual particles, such as nanoparticles, organelles, and biomolecules, and that distributions of these particles will also be dependent on sample size.
doi:10.1016/j.chroma.2007.04.065
PMCID: PMC2504414  PMID: 17521658
Undersampling; quantile analysis; Mitochondria; capillary electrophoresis; adsorption; dynamic coatings
13.  Identifying hypermethylated CpG islands using a quantile regression model 
BMC Bioinformatics  2011;12:54.
Background
DNA methylation has been shown to play an important role in the silencing of tumor suppressor genes in various tumor types. In order to have a system-wide understanding of the methylation changes that occur in tumors, we have developed a differential methylation hybridization (DMH) protocol that can simultaneously assay the methylation status of all known CpG islands (CGIs) using microarray technologies. A large percentage of signals obtained from microarrays can be attributed to various measurable and unmeasurable confounding factors unrelated to the biological question at hand. In order to correct the bias due to noise, we first implemented a quantile regression model, with a quantile level equal to 75%, to identify hypermethylated CGIs in an earlier work. As a proof of concept, we applied this model to methylation microarray data generated from breast cancer cell lines. However, we were unsure whether 75% was the best quantile level for identifying hypermethylated CGIs. In this paper, we attempt to determine which quantile level should be used to identify hypermethylated CGIs and their associated genes.
Results
We introduce three statistical measurements to compare the performance of the proposed quantile regression model at different quantile levels (95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%), using known methylated genes and unmethylated housekeeping genes reported in breast cancer cell lines and ovarian cancer patients. Our results show that the quantile levels ranging from 80% to 90% are better at identifying known methylated and unmethylated genes.
Conclusions
In this paper, we propose to use a quantile regression model to identify hypermethylated CGIs by incorporating probe effects to account for noise due to unmeasurable factors. Our model can efficiently identify hypermethylated CGIs in both breast and ovarian cancer data.
doi:10.1186/1471-2105-12-54
PMCID: PMC3051900  PMID: 21324121
14.  Measuring Racial/Ethnic Disparities across the Distribution of Health Care Expenditures 
Health services research  2009;44(5 Pt 1):1603-1621.
Objective
To assess whether Black-White and Hispanic-White disparities increase or abate in the upper quantiles of total health care expenditure, conditional on covariates.
Data Source
Nationally representative adult population of non-Hispanic Whites, African-Americans, and Hispanics from the 2001 - 2005 Medical Expenditure Panel Surveys.
Study Design
We examine unadjusted racial/ethnic differences across the distribution of expenditures. We apply quantile regression to measure disparities at the median, 75th, 90th, and 95th quantiles, testing for differences over the distribution of health care expenditures and across income and education categories. We test the sensitivity of the results to comparisons based only on health status and estimate a two-part model to ensure that results are not driven by an extremely skewed distribution of expenditures with a large zero mass.
Principal Findings
Black-White and Hispanic-White disparities diminish in the upper quantiles of expenditure, but expenditures for Blacks and Hispanics remain significantly lower than for Whites throughout the distribution. For most education and income categories, disparities exist at the median and decline, but remain significant even with increased education and income.
Conclusions
Blacks and Hispanics receive significantly disparate care at high expenditure levels, suggesting prioritization of improved access to quality care among minorities with critical health issues.
doi:10.1111/j.1475-6773.2009.01004.x
PMCID: PMC2754550  PMID: 19656228
Racial disparities; healthcare expenditures; quantile regression; vigicile
15.  Measuring Racial/Ethnic Disparities across the Distribution of Health Care Expenditures 
Health Services Research  2009;44(5p1):1603-1621.
Objective
To assess whether black–white and Hispanic–white disparities increase or abate in the upper quantiles of total health care expenditure, conditional on covariates.
Data Source
Nationally representative adult population of non-Hispanic whites, African Americans, and Hispanics from the 2001–2005 Medical Expenditure Panel Surveys.
Study Design
We examine unadjusted racial/ethnic differences across the distribution of expenditures. We apply quantile regression to measure disparities at the median, 75th, 90th, and 95th quantiles, testing for differences over the distribution of health care expenditures and across income and education categories. We test the sensitivity of the results to comparisons based only on health status and estimate a two-part model to ensure that results are not driven by an extremely skewed distribution of expenditures with a large zero mass.
Principal Findings
Black–white and Hispanic–white disparities diminish in the upper quantiles of expenditure, but expenditures for blacks and Hispanics remain significantly lower than for whites throughout the distribution. For most education and income categories, disparities exist at the median and decline, but remain significant even with increased education and income.
Conclusions
Blacks and Hispanics receive significantly disparate care at high expenditure levels, suggesting prioritization of improved access to quality care among minorities with critical health issues.
doi:10.1111/j.1475-6773.2009.01004.x
PMCID: PMC2754550  PMID: 19656228
Racial disparities; health care expenditures; quantile regression; vigicile
16.  Local CQR Smoothing: An Efficient and Safe Alternative to Local Polynomial Regression 
Local polynomial regression is a useful nonparametric regression tool to explore fine data structures and has been widely used in practice. In this paper, we propose a new nonparametric regression technique called local composite-quantile-regression (CQR) smoothing in order to further improve local polynomial regression. Sampling properties of the proposed estimation procedure are studied. We derive the asymptotic bias, variance and normality of the proposed estimate. Asymptotic relative efficiency of the proposed estimate with respect to the local polynomial regression is investigated. It is shown that the proposed estimate can be much more efficient than the local polynomial regression estimate for various non-normal errors, while being almost as efficient as the local polynomial regression estimate for normal errors. Simulation is conducted to examine the performance of the proposed estimates. The simulation results are consistent with our theoretical findings. A real data example is used to illustrate the proposed method.
doi:10.1111/j.1467-9868.2009.00725.x
PMCID: PMC2958780  PMID: 20975930
Asymptotic efficiency; CQR estimator; Kernel function; Local polynomial regression; Nonparametric regression
17.  Invited Commentary: Antecedents of Obesity—Analysis, Interpretation, and Use of Longitudinal Data 
American journal of epidemiology  2007;166(1):14-18.
The obesity epidemic causes misery and death. Most epidemiologists accept the hypothesis that characteristics of the early stages of human development have lifelong influences on obesity-related health outcomes. Unfortunately, there is a dearth of data of sufficient scope and individual history to help unravel the associations of prenatal, postnatal, and childhood factors with adult obesity and health outcomes. Here the authors discuss analytic methods, the interpretation of models, and the use to which such rare and valuable data may be put in developing interventions to combat the epidemic. For example, analytic methods such as quantile and multinomial logistic regression can describe the effects on body mass index range rather than just its mean; structural equation models may allow comparison of the contributions of different factors at different periods in the life course. Interpretation of the data and model construction is complex, and it requires careful consideration of the biologic plausibility and statistical interpretation of putative causal factors. The goals of discovering modifiable determinants of obesity during the prenatal, postnatal, and childhood periods must be kept in sight, and analyses should be built to facilitate them. Ultimately, interventions in these factors may help prevent obesity-related adverse health outcomes for future generations.
doi:10.1093/aje/kwm101
PMCID: PMC1989664  PMID: 17490988
birth weight; body mass index; body size; growth; obesity; overweight
18.  Prediction intervals for future BMI values of individual children - a non-parametric approach by quantile boosting 
Background
The construction of prediction intervals (PIs) for future body mass index (BMI) values of individual children based on a recent German birth cohort study with n = 2007 children is problematic for standard parametric approaches, as the BMI distribution in childhood is typically skewed depending on age.
Methods
We avoid distributional assumptions by directly modelling the borders of PIs by additive quantile regression, estimated by boosting. We point out the concept of conditional coverage to prove the accuracy of PIs. As conditional coverage can hardly be evaluated in practical applications, we conduct a simulation study before fitting child- and covariate-specific PIs for future BMI values and BMI patterns for the present data.
Results
The results of our simulation study suggest that PIs fitted by quantile boosting cover future observations with the predefined coverage probability and outperform the benchmark approach. For the prediction of future BMI values, quantile boosting automatically selects informative covariates and adapts to the age-specific skewness of the BMI distribution. The lengths of the estimated PIs are child-specific and increase, as expected, with the age of the child.
Conclusions
Quantile boosting is a promising approach to construct PIs with correct conditional coverage in a non-parametric way. It is in particular suitable for the prediction of BMI patterns depending on covariates, since it provides an interpretable predictor structure, inherent variable selection properties and can even account for longitudinal data structures.
doi:10.1186/1471-2288-12-6
PMCID: PMC3292459  PMID: 22276940
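The coverage idea discussed in this abstract can be illustrated with a small check of how often observed values fall inside their prediction intervals. The sketch measures only overall (marginal) empirical coverage, not the conditional coverage the authors discuss, and all numbers and names are hypothetical.

```python
def empirical_coverage(lower, upper, observed):
    """Share of observations that fall inside their own interval."""
    hits = sum(lo <= y <= hi for lo, hi, y in zip(lower, upper, observed))
    return hits / len(observed)


# Hypothetical child-specific interval borders and later observed BMI values.
lower = [14.0, 14.5, 15.0, 15.5]
upper = [18.0, 19.0, 20.0, 21.0]
observed = [16.2, 19.4, 17.8, 15.5]

coverage = empirical_coverage(lower, upper, observed)
```

In this setup the interval borders would come from two fitted quantile functions, for example the 2.5% and 97.5% quantiles for a nominal 95% interval, and the empirical share would be compared with that nominal level.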
19.  Quantile Regression for Doubly Censored Data 
Biometrics  2011;68(1):101-112.
SUMMARY
Double censoring often occurs in registry studies when left censoring is present in addition to right censoring. In this work, we propose a new analysis strategy for such doubly censored data by adopting a quantile regression model. We develop computationally simple estimation and inference procedures by appropriately using the embedded martingale structure. Asymptotic properties, including the uniform consistency and weak convergence, are established for the resulting estimators. Moreover, we propose conditional inference to address the special identifiability issues attached to the double censoring setting. We further show that the proposed method can be readily adapted to handle left truncation. Simulation studies demonstrate good finite-sample performance of the new inferential procedures. The practical utility of our method is illustrated by an analysis of the onset of the most commonly investigated respiratory infection, Pseudomonas aeruginosa, in children with cystic fibrosis through the use of the US Cystic Fibrosis Registry.
doi:10.1111/j.1541-0420.2011.01667.x
PMCID: PMC3312995  PMID: 21950348
Conditional inference; Double censoring; Empirical process; Martingale; Regression quantile; Truncation
20.  Quantile-Specific Penetrance of Genes Affecting Lipoproteins, Adiposity and Height 
PLoS ONE  2012;7(1):e28764.
Quantile-dependent penetrance is proposed to occur when the phenotypic expression of a SNP depends upon the population percentile of the phenotype. To illustrate the phenomenon, quantiles of height, body mass index (BMI), and plasma lipids and lipoproteins were compared to genetic risk scores (GRS) derived from single nucleotide polymorphisms (SNPs) having established genome-wide significance: 180 SNPs for height, 32 for BMI, 37 for low-density lipoprotein (LDL)-cholesterol, 47 for high-density lipoprotein (HDL)-cholesterol, 52 for total cholesterol, and 31 for triglycerides in 1930 subjects. Both phenotypes and GRSs were adjusted for sex, age, study, and smoking status. Quantile regression showed that the slope of the genotype-phenotype relationships increased with the percentile of BMI (P = 0.002), LDL-cholesterol (P = 3×10⁻⁸), HDL-cholesterol (P = 5×10⁻⁶), total cholesterol (P = 2.5×10⁻⁶), and triglyceride distribution (P = 7.5×10⁻⁶), but not height (P = 0.09). Compared to a GRS's phenotypic effect at the 10th population percentile, its effect at the 90th percentile was 4.2-fold greater for BMI, 4.9-fold greater for LDL-cholesterol, 1.9-fold greater for HDL-cholesterol, 3.1-fold greater for total cholesterol, and 3.3-fold greater for triglycerides. Moreover, the effect of the rs1558902 (FTO) risk allele was 6.7-fold greater at the 90th than the 10th percentile of the BMI distribution, and that of the rs3764261 (CETP) risk allele was 2.4-fold greater at the 90th than the 10th percentile of the HDL-cholesterol distribution. Conceptually, it may be useful to distinguish environmental effects on the phenotype that in turn alter a gene's phenotypic expression (quantile-dependent penetrance) from environmental effects affecting the gene's phenotypic expression directly (gene-environment interaction).
doi:10.1371/journal.pone.0028764
PMCID: PMC3250394  PMID: 22235250
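Quantile regression, the method this abstract relies on, fits a chosen quantile by minimizing the check (pinball) loss instead of squared error. The following is a minimal sketch of that loss for the intercept-only case, not the authors' analysis; the function names and the toy data are assumptions made for illustration only.

```python
def pinball_loss(residual, tau):
    """Check loss: tau * r when r >= 0, (tau - 1) * r when r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Return the sample value with the smallest total pinball loss.

    For an intercept-only model this is a sample tau-quantile.
    """
    return min(values, key=lambda q: sum(pinball_loss(v - q, tau) for v in values))


# Toy data (hypothetical) with one extreme value.
data = [1.0, 2.0, 3.0, 4.0, 100.0]
median = best_constant(data, 0.5)  # fit at the 50th percentile
upper = best_constant(data, 0.9)   # fit at the 90th percentile
print(median, upper)
```

Here the fit at the 50th percentile ignores the extreme value while the fit at the 90th percentile is pulled to it, which is why estimates can legitimately differ across percentiles of the same outcome.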
21.  Semiparametric Approach to a Random Effects Quantile Regression Model 
We consider a random effects quantile regression analysis of clustered data and propose a semiparametric approach using empirical likelihood. The random regression coefficients are assumed independent with a common mean, following parametrically specified distributions. The common mean corresponds to the population-average effects of explanatory variables on the conditional quantile of interest, while the random coefficients represent cluster specific deviations in the covariate effects. We formulate the estimation of the random coefficients as an estimating equations problem and use empirical likelihood to incorporate the parametric likelihood of the random coefficients. This yields a likelihood-like statistical criterion function, which we show is asymptotically concave in a neighborhood of the true parameter value and motivates its maximizer as a natural estimator. We use Markov Chain Monte Carlo (MCMC) samplers in the Bayesian framework, and propose the resulting quasi-posterior mean as an estimator. We show that the proposed estimator of the population-level parameter is asymptotically normal and the estimators of the random coefficients are shrunk toward the population-level parameter in the first order asymptotic sense. These asymptotic results do not require Gaussian random effects, and the empirical likelihood based likelihood-like criterion function is free of parameters related to the error densities. This makes the proposed approach both flexible and computationally simple. We illustrate the methodology with two real data examples.
doi:10.1198/jasa.2011.tm10470
PMCID: PMC3280824  PMID: 22347760
Empirical likelihood; Markov Chain Monte Carlo; Quasi-posterior distribution
22.  Accelerated Recurrence Time Models 
For the analysis of recurrent events, we propose a generalization of the accelerated failure time model to allow for evolving covariate effects. These so-called accelerated recurrence time models postulate that time to expected recurrence frequency, upon transformation, is a linear function of covariates with frequency-dependent coefficients. This modeling strategy shares the same spirit as quantile regression. An estimation and inference procedure is developed by generalizing Powell’s celebrated (1984, 1986) estimator for censored quantile regression. Consistency and asymptotic normality of the proposed estimator are established. An algorithm is devised to attain good computational efficiency. Simulations demonstrate that this proposal performs well under practical settings. This methodology is illustrated in an application to the well-known bladder cancer study.
doi:10.1111/j.1467-9469.2009.00645.x
PMCID: PMC2813065  PMID: 20161629
accelerated failure time model; censored quantile regression; counting process; recurrent events; varying-coefficient model
23.  Determining under- and oversampling of individual particle distributions in microfluidic electrophoresis with orthogonal laser-induced fluorescence detection 
Electrophoresis  2008;29(7):1431-1440.
This report investigates the effects of sample size on the separation and analysis of individual biological particles using microfluidic devices equipped with an orthogonal LIF detector. A detection limit of 17 ± 1 molecules of fluorophore is obtained using this orthogonal LIF detector under a constant flow of fluorescein, which is a significant improvement over epifluorescence, the most common LIF detection scheme used with microfluidic devices. Mitochondria from rat liver tissue and cultured 143B osteosarcoma cells are used as model biological particles. Quantile–quantile (q–q) plots are used to investigate changes in the distributions. When the number of detected mitochondrial events becomes too large (>72 for rat liver and >98 for 143B mitochondria), oversampling occurs. Statistical overlap theory is used to suggest that the cause of oversampling is that the separation power of the microfluidic device presented is insufficient to adequately separate large numbers of individual mitochondrial events. Fortunately, q–q plots make it possible to identify and exclude these distributions from data analysis. Additionally, when the number of detected events becomes too small (<55 for rat liver and <81 for 143B mitochondria), there are not enough events to obtain a statistically relevant mobility distribution, but these distributions can be combined to obtain a statistically relevant electrophoretic mobility distribution.
doi:10.1002/elps.200700470
PMCID: PMC3037013  PMID: 18386300
Microfluidics; Mitochondria; Oversampling; Quantile–quantile; Under-sampling
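A quantile–quantile plot of the kind used in this abstract pairs the matching quantiles of two samples; points that fall on a straight line suggest the two distributions have the same shape. The following is a minimal sketch of how such pairs can be computed with Python's standard library, not the authors' procedure; `qq_pairs` and the sample values are hypothetical.

```python
import statistics


def qq_pairs(sample_a, sample_b, n=4):
    """Pair the n-1 cut points of two samples, as plotted in a q-q plot."""
    qa = statistics.quantiles(sample_a, n=n)
    qb = statistics.quantiles(sample_b, n=n)
    return list(zip(qa, qb))


# Hypothetical samples: the second is the first scaled by 2,
# so every pair lies on the straight line y = 2x.
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
scaled = [2.0 * x for x in reference]
pairs = qq_pairs(reference, scaled)
print(pairs)
```

Pairs that bend away from a straight line would indicate that the two distributions differ in shape, which is the kind of change a q–q plot is meant to expose.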
24.  ELPIS-JP: a dataset of local-scale daily climate change scenarios for Japan 
We developed a dataset of local-scale daily climate change scenarios for Japan (called ELPIS-JP) using the stochastic weather generators (WGs) LARS-WG and, in part, WXGEN. The ELPIS-JP dataset is based on the observed (or estimated) daily weather data for seven climatic variables (daily mean, maximum and minimum temperatures; precipitation; solar radiation; relative humidity; and wind speed) at 938 sites in Japan and climate projections from the multi-model ensemble of global climate models (GCMs) used in the coupled model intercomparison project (CMIP3) and multi-model ensemble of regional climate models from the Japanese downscaling project (called S-5-3). The capability of the WGs to reproduce the statistical features of the observed data for the period 1981–2000 is assessed using several statistical tests and quantile–quantile plots. Overall performance of the WGs was good. The ELPIS-JP dataset consists of two types of daily data: (i) the transient scenarios throughout the twenty-first century using projections from 10 CMIP3 GCMs under three emission scenarios (A1B, A2 and B1) and (ii) the time-slice scenarios for the period 2081–2100 using projections from three S-5-3 regional climate models. The ELPIS-JP dataset is designed to be used in conjunction with process-based impact models (e.g. crop models) to assess not only the impacts of mean climate change but also the impacts of changes in climate variability, wet/dry spells and extreme events, as well as the uncertainty of future impacts associated with climate models and emission scenarios. The ELPIS-JP offers an excellent platform for probabilistic assessment of climate change impacts and potential adaptation at a local scale in Japan.
doi:10.1098/rsta.2011.0305
PMCID: PMC3261434  PMID: 22291226
ELPIS-JP; stochastic weather generator; LARS-WG; climate change; impact assessment; Japan
25.  Robust imputation method for missing values in microarray data 
BMC Bioinformatics  2007;8(Suppl 2):S6.
Background
When analyzing microarray gene expression data, missing values are often encountered. Most multivariate statistical methods proposed for microarray data analysis cannot be applied when the data have missing values. Numerous imputation algorithms have been proposed to estimate the missing values. In this study, we develop a robust least squares estimation with principal components (RLSP) method by extending the local least squares imputation (LLSimpute) method. The basic idea of our method is to employ quantile regression to estimate the missing values, using the estimated principal components of a selected set of similar genes.
Results
Using the normalized root mean squared error, the performance of the proposed method was evaluated and compared with other previously proposed imputation methods. The proposed RLSP method clearly outperformed the weighted k-nearest neighbors imputation (kNNimpute) and LLSimpute methods, and showed results competitive with the Bayesian principal component analysis (BPCA) method.
Conclusion
Adapting the principal components of the selected genes and employing the quantile regression model improved the robustness and accuracy of missing value imputation. Thus, the proposed RLSP method is, according to our empirical studies, more robust and accurate than the widely used kNNimpute and LLSimpute methods.
doi:10.1186/1471-2105-8-S2-S6
PMCID: PMC1892075  PMID: 17493255
