Search tips
Search criteria

Results 1-11 (11)

Clipboard (0)

Select a Filter Below

more »
Year of Publication
Document Types
author:("Liang, famine")
1.  A fast multilocus test with adaptive SNP selection for large-scale genetic-association studies 
As increasing evidence suggests that multiple correlated genetic variants could jointly influence the outcome, a multilocus test that aggregates association evidence across multiple genetic markers in a considered gene or a genomic region may be more powerful than a single-marker test for detecting susceptibility loci. We propose a multilocus test, AdaJoint, which adopts a variable selection procedure to identify a subset of genetic markers that jointly show the strongest association signal, and defines the test statistic based on the selected genetic markers. The P-value from the AdaJoint test is evaluated by a computationally efficient algorithm that effectively adjusts for multiple-comparison, and is hundreds of times faster than the standard permutation method. Simulation studies demonstrate that AdaJoint has the most robust performance among several commonly used multilocus tests. We perform multilocus analysis of over 26 000 genes/regions on two genome-wide association studies of pancreatic cancer. Compared with its competitors, AdaJoint identifies a much stronger association between the gene CLPTM1L and pancreatic cancer risk (6.0 × 10−8), with the signal optimally captured by two correlated single-nucleotide polymorphisms (SNPs). Finally, we show AdaJoint as a powerful tool for mapping cis-regulating methylation quantitative trait loci on normal breast tissues, and find many CpG sites whose methylation levels are jointly regulated by multiple SNPs nearby.
PMCID: PMC3992564  PMID: 24022295
genome-wide association study; cis-regulating meQTLs mapping; multilocus test; variable selection; multiple comparisons; pathway analysis
2.  Bayesian Analysis for Exponential Random Graph Models Using the Adaptive Exchange Sampler* 
Statistics and its interface  2013;6(4):559-576.
Exponential random graph models have been widely used in social network analysis. However, these models are extremely difficult to handle from a statistical viewpoint, because of the intractable normalizing constant and model degeneracy. In this paper, we consider a fully Bayesian analysis for exponential random graph models using the adaptive exchange sampler, which solves the intractable normalizing constant and model degeneracy issues encountered in Markov chain Monte Carlo (MCMC) simulations. The adaptive exchange sampler can be viewed as a MCMC extension of the exchange algorithm, and it generates auxiliary networks via an importance sampling procedure from an auxiliary Markov chain running in parallel. The convergence of this algorithm is established under mild conditions. The adaptive exchange sampler is illustrated using a few social networks, including the Florentine business network, molecule synthetic network, and dolphins network. The results indicate that the adaptive exchange algorithm can produce more accurate estimates than approximate exchange algorithms, while maintaining the same computational efficiency.
PMCID: PMC3956133  PMID: 24653788
Exchange Algorithm; Exponential Random Graph Model; Adaptive Markov chain Monte Carlo; Social Network
3.  Bayesian Peak Picking for NMR Spectra 
Protein structure determination is a very important topic in structural genomics, which helps people to understand varieties of biological functions such as protein-protein interactions, protein–DNA interactions and so on. Nowadays, nuclear magnetic resonance (NMR) has often been used to determine the three-dimensional structures of protein in vivo. This study aims to automate the peak picking step, the most important and tricky step in NMR structure determination. We propose to model the NMR spectrum by a mixture of bivariate Gaussian densities and use the stochastic approximation Monte Carlo algorithm as the computational tool to solve the problem. Under the Bayesian framework, the peak picking problem is casted as a variable selection problem. The proposed method can automatically distinguish true peaks from false ones without preprocessing the data. To the best of our knowledge, this is the first effort in the literature that tackles the peak picking problem for NMR spectrum data using Bayesian method.
PMCID: PMC4411369  PMID: 24184964
Markov chain Monte Carlo; Nuclear magnetic resonance; Peak picking
4.  Bayesian Detection of Causal Rare Variants under Posterior Consistency 
PLoS ONE  2013;8(7):e69633.
Identification of causal rare variants that are associated with complex traits poses a central challenge on genome-wide association studies. However, most current research focuses only on testing the global association whether the rare variants in a given genomic region are collectively associated with the trait. Although some recent work, e.g., the Bayesian risk index method, have tried to address this problem, it is unclear whether the causal rare variants can be consistently identified by them in the small--large- situation. We develop a new Bayesian method, the so-called Bayesian Rare Variant Detector (BRVD), to tackle this problem. The new method simultaneously addresses two issues: (i) (Global association test) Are there any of the variants associated with the disease, and (ii) (Causal variant detection) Which variants, if any, are driving the association. The BRVD ensures the causal rare variants to be consistently identified in the small--large- situation by imposing some appropriate prior distributions on the model and model specific parameters. The numerical results indicate that the BRVD is more powerful for testing the global association than the existing methods, such as the combined multivariate and collapsing test, weighted sum statistic test, RARECOVER, sequence kernel association test, and Bayesian risk index, and also more powerful for identification of causal rare variants than the Bayesian risk index method. The BRVD has also been successfully applied to the Early-Onset Myocardial Infarction (EOMI) Exome Sequence Data. It identified a few causal rare variants that have been verified in the literature.
PMCID: PMC3724943  PMID: 23922764
5.  Intrinsic Regression Models for Medial Representation of Subcortical Structures 
The aim of this paper is to develop a semiparametric model for describing the variability of the medial representation of subcortical structures, which belongs to a Riemannian manifold, and establishing its association with covariates of interest, such as diagnostic status, age and gender. We develop a two-stage estimation procedure to calculate the parameter estimates. The first stage is to calculate an intrinsic least squares estimator of the parameter vector using the annealing evolutionary stochastic approximation Monte Carlo algorithm and then the second stage is to construct a set of estimating equations to obtain a more efficient estimate with the intrinsic least squares estimate as the starting point. We use Wald statistics to test linear hypotheses of unknown parameters and establish their limiting distributions. Simulation studies are used to evaluate the accuracy of our parameter estimates and the finite sample performance of the Wald statistics. We apply our methods to the detection of the difference in the morphological changes of the left and right hippocampi between schizophrenia patients and healthy controls using medial shape description.
PMCID: PMC3685886  PMID: 23794769
Intrinsic least squares estimator; Medial representation; Semiparametric model; Wald statistic
6.  Efficient p-value evaluation for resampling-based tests 
Biostatistics (Oxford, England)  2011;12(3):582-593.
The resampling-based test, which often relies on permutation or bootstrap procedures, has been widely used for statistical hypothesis testing when the asymptotic distribution of the test statistic is unavailable or unreliable. It requires repeated calculations of the test statistic on a large number of simulated data sets for its significance level assessment, and thus it could become very computationally intensive. Here, we propose an efficient p-value evaluation procedure by adapting the stochastic approximation Markov chain Monte Carlo algorithm. The new procedure can be used easily for estimating the p-value for any resampling-based test. We show through numeric simulations that the proposed procedure can be 100–500 000 times as efficient (in term of computing time) as the standard resampling-based procedure when evaluating a test statistic with a small p-value (e.g. less than 10 − 6). With its computational burden reduced by this proposed procedure, the versatile resampling-based test would become computationally feasible for a much wider range of applications. We demonstrate the application of the new method by applying it to a large-scale genetic association study of prostate cancer.
PMCID: PMC3114653  PMID: 21209154
Bootstrap procedures; Genetic association studies; p-value; Resampling-based tests; Stochastic approximation Markov chain Monte Carlo
7.  Stochastic Generalized Method of Moments 
The generalized method of moments (GMM) is a very popular estimation and inference procedure based on moment conditions. When likelihood-based methods are difficult to implement, one can often derive various moment conditions and construct the GMM objective function. However, minimization of the objective function in the GMM may be challenging, especially over a large parameter space. Due to the special structure of the GMM, we propose a new sampling-based algorithm, the stochastic GMM sampler, which replaces the multivariate minimization problem by a series of conditional sampling procedures. We develop the theoretical properties of the proposed iterative Monte Carlo method, and demonstrate its superior performance over other GMM estimation procedures in simulation studies. As an illustration, we apply the stochastic GMM sampler to a Medfly life longevity study. Supplemental materials for the article are available online.
PMCID: PMC3286612  PMID: 22375093
Generalized linear model; Gibbs sampling; Iterative Monte Carlo; Markov chain Monte Carlo; Metropolis algorithm; Moment condition
8.  A Flexible Bayesian Model for Studying Gene–Environment Interaction 
PLoS Genetics  2012;8(1):e1002482.
An important follow-up step after genetic markers are found to be associated with a disease outcome is a more detailed analysis investigating how the implicated gene or chromosomal region and an established environment risk factor interact to influence the disease risk. The standard approach to this study of gene–environment interaction considers one genetic marker at a time and therefore could misrepresent and underestimate the genetic contribution to the joint effect when one or more functional loci, some of which might not be genotyped, exist in the region and interact with the environment risk factor in a complex way. We develop a more global approach based on a Bayesian model that uses a latent genetic profile variable to capture all of the genetic variation in the entire targeted region and allows the environment effect to vary across different genetic profile categories. We also propose a resampling-based test derived from the developed Bayesian model for the detection of gene–environment interaction. Using data collected in the Environment and Genetics in Lung Cancer Etiology (EAGLE) study, we apply the Bayesian model to evaluate the joint effect of smoking intensity and genetic variants in the 15q25.1 region, which contains a cluster of nicotinic acetylcholine receptor genes and has been shown to be associated with both lung cancer and smoking behavior. We find evidence for gene–environment interaction (P-value = 0.016), with the smoking effect appearing to be stronger in subjects with a genetic profile associated with a higher lung cancer risk; the conventional test of gene–environment interaction based on the single-marker approach is far from significant.
Author Summary
Many common diseases result from a complex interplay of genetic and environmental risk factors. It is important to study the potential genetic and environmental risk factors jointly in order to achieve a better understanding of the mechanisms underlying disease development. The standard single-marker approach that studies the environmental risk factor and one genetic marker at a time could misrepresent the gene–environment interaction, as the single genetic marker might not be an appropriate surrogate for the underlying genetic functioning polymorphisms. We propose a method to look at gene–environment interaction at the gene/region level by integrating information observed on multiple genetic markers within the selected gene/region with measures of environmental exposure. Using data collected in the Environment and Genetics in Lung Cancer Etiology (EAGLE) study, we apply the proposed model to evaluate the joint effect of smoking intensity and genetic variants in the 15q25.1 region and find evidence for gene–environment interaction (P-value = 0.016), with the smoking effect varying according to a subject's genetic profile.
PMCID: PMC3266891  PMID: 22291610
9.  BCL-2 (-938C > A) polymorphism is associated with breast cancer susceptibility 
BMC Medical Genetics  2011;12:48.
BCL-2 (B-cell leukemia/lymphoma 2) gene has been demonstrated to be associated with breast cancer development and a single nucleotide polymorphism (SNP; -938C > A) has been identified recently. To investigate whether this polymorphism functions as a modifier of breast cancer development, we analyzed the distribution of genotype frequency, as well as the association of genotype with clinicopathological characteristics. Furthermore, we also studied the effects of this SNP on Bcl-2 expression in vitro.
We genotyped the BCL-2 (-938C > A) in 114 patients and 107 controls, and analyzed the estrogen receptor (ER), progestogen receptor (PR), C-erbB2 and Ki67 status with immunohistochemistry (IHC). Different Bcl-2 protein levels in breast cancer cell lines were determined using western blot. Logistic regression model was applied in statistical analysis.
We found that homozygous AA genotype was associated with an increased risk (AA vs AC+CC) by 2.37-fold for breast cancer development and significant association was observed between nodal status and different genotypes of BCL-2 (-938C > A) (p = 0.014). AA genotype was more likely to develop into lobular breast cancer (p = 0.036). The result of western blot analysis indicated that allele A was associated with the lower level of Bcl-2 expression in breast cancer cell lines.
AA genotype of BCL-2 (-938C > A) is associated with susceptibility of breast cancer, and this genotype is only associated with the nodal status and pathological diagnosis of breast cancer. The polymorphism has an effect on Bcl-2 expression but needs further investigation.
PMCID: PMC3078853  PMID: 21457555
10.  Longitudinal functional principal component modeling via Stochastic Approximation Monte Carlo 
The authors consider the analysis of hierarchical longitudinal functional data based upon a functional principal components approach. In contrast to standard frequentist approaches to selecting the number of principal components, the authors do model averaging using a Bayesian formulation. A relatively straightforward reversible jump Markov Chain Monte Carlo formulation has poor mixing properties and in simulated data often becomes trapped at the wrong number of principal components. In order to overcome this, the authors show how to apply Stochastic Approximation Monte Carlo (SAMC) to this problem, a method that has the potential to explore the entire space and does not become trapped in local extrema. The combination of reversible jump methods and SAMC in hierarchical longitudinal functional data is simplified by a polar coordinate representation of the principal components. The approach is easy to implement and does well in simulated data in determining the distribution of the number of principal components, and in terms of its frequentist estimation properties. Empirical applications are also presented.
PMCID: PMC2915799  PMID: 20689648
Functional data analysis; Hierarchical models; Longitudinal data; Markov chain monte carlo; Principal Components; Stochastic approximation
11.  Bayesian modeling of ChIP-chip data using latent variables 
BMC Bioinformatics  2009;10:352.
The ChIP-chip technology has been used in a wide range of biomedical studies, such as identification of human transcription factor binding sites, investigation of DNA methylation, and investigation of histone modifications in animals and plants. Various methods have been proposed in the literature for analyzing the ChIP-chip data, such as the sliding window methods, the hidden Markov model-based methods, and Bayesian methods. Although, due to the integrated consideration of uncertainty of the models and model parameters, Bayesian methods can potentially work better than the other two classes of methods, the existing Bayesian methods do not perform satisfactorily. They usually require multiple replicates or some extra experimental information to parametrize the model, and long CPU time due to involving of MCMC simulations.
In this paper, we propose a Bayesian latent model for the ChIP-chip data. The new model mainly differs from the existing Bayesian models, such as the joint deconvolution model, the hierarchical gamma mixture model, and the Bayesian hierarchical model, in two respects. Firstly, it works on the difference between the averaged treatment and control samples. This enables the use of a simple model for the data, which avoids the probe-specific effect and the sample (control/treatment) effect. As a consequence, this enables an efficient MCMC simulation of the posterior distribution of the model, and also makes the model more robust to the outliers. Secondly, it models the neighboring dependence of probes by introducing a latent indicator vector. A truncated Poisson prior distribution is assumed for the latent indicator variable, with the rationale being justified at length.
The Bayesian latent method is successfully applied to real and ten simulated datasets, with comparisons with some of the existing Bayesian methods, hidden Markov model methods, and sliding window methods. The numerical results indicate that the Bayesian latent method can outperform other methods, especially when the data contain outliers.
PMCID: PMC2779819  PMID: 19857265

Results 1-11 (11)