Results 1-7 (7)

The annals of applied statistics  2013;7(1):391-417.
Gaussian Graphical Models (GGMs) have been used to construct genetic regulatory networks, where regularization techniques are widely used because network inference usually falls into a high-dimension-low-sample-size scenario. Yet finding the right amount of regularization can be challenging, especially in an unsupervised setting where traditional methods such as BIC or cross-validation often do not work well. In this paper, we propose a new method, Bootstrap Inference for Network COnstruction (BINCO), to infer networks by directly controlling the false discovery rates (FDRs) of the selected edges. This method fits a mixture model for the distribution of edge selection frequencies to estimate the FDRs, where the selection frequencies are calculated via model aggregation. This method is applicable to a wide range of applications beyond network construction. When we applied our proposed method to building a gene regulatory network with microarray expression breast cancer data, we were able to identify high-confidence edges and well-connected hub genes that could potentially play important roles in understanding the underlying biological processes of breast cancer.
PMCID: PMC3930359  PMID: 24563684
high dimensional data; GGM; model aggregation; mixture model; FDR
Journal of statistical research  2010;44(1):103-107.
In a recent paper [4], Efron pointed out that an important issue in large-scale multiple hypothesis testing is that the null distribution may be unknown and may need to be estimated. Consider a Gaussian mixture model, where the null distribution is known to be normal but both null parameters (the mean and the variance) are unknown. We address the problem with a method based on Fourier transformation. The Fourier approach was first studied by Jin and Cai [9], who focus on the scenario where any non-null effect has either the same or a larger variance than that of the null effects. In this paper, we review the main ideas in [9] and propose a generalized Fourier approach to tackle the problem under another scenario: any non-null effect has a larger mean than that of the null effects, but no constraint is imposed on the variance. This approach and that in [9] complement each other: each approach is successful in a wide class of situations where the other fails. We also extend the Fourier approach to estimate the proportion of non-null effects. The proposed procedures perform well both in theory and on simulated data.
PMCID: PMC3928715  PMID: 24563569
empirical null; Fourier transformation; generalized Fourier transformation; proportion of non-null effects; sample size calculation
3.  Regularized Multivariate Regression for Identifying Master Predictors with Application to Integrative Genomics Study of Breast Cancer 
In this paper, we propose a new method, remMap (REgularized Multivariate regression for identifying MAster Predictors), for fitting multivariate response regression models under the high-dimension-low-sample-size setting. remMap is motivated by investigating the regulatory relationships among different biological molecules based on multiple types of high-dimensional genomic data. In particular, we are interested in studying the influence of DNA copy number alterations on RNA transcript levels. For this purpose, we model the dependence of the RNA expression levels on DNA copy numbers through multivariate linear regressions and use proper regularization to deal with the high dimensionality as well as to incorporate desired network structures. Criteria for selecting the tuning parameters are also discussed. The performance of the proposed method is illustrated through extensive simulation studies. Finally, remMap is applied to a breast cancer study, in which genome-wide RNA transcript levels and DNA copy numbers were measured for 172 tumor samples. We identify a trans-hub region in cytoband 17q12–q21, whose amplification influences the RNA expression levels of more than 30 unlinked genes. These findings may lead to a better understanding of breast cancer pathology.
PMCID: PMC3905690  PMID: 24489618
sparse regression; MAP (MAster Predictor) penalty; DNA copy number alteration; RNA transcript level; v-fold cross validation
4.  SNP set analysis for detecting disease association using exon sequence data 
BMC Proceedings  2011;5(Suppl 9):S91.
Rare variants are believed to play an important role in disease etiology. Recent advances in high-throughput sequencing technology enable investigators to systematically characterize the genetic effects of both common and rare variants. We introduce several approaches that simultaneously test the effects of common and rare variants within a single-nucleotide polymorphism (SNP) set based on logistic regression models and logistic kernel machine models. Gene-environment interactions and SNP-SNP interactions are also considered in some of these models. We illustrate the performance of these methods using the unrelated individuals data from Genetic Analysis Workshop 17. Three true disease genes (FLT1, PIK3C3, and KDR) were consistently selected using the proposed methods. In addition, compared to logistic regression models, the logistic kernel machine models were more powerful, presumably because they reduced the effective number of parameters through regularization. Our results also suggest that a screening step is effective in decreasing the number of false-positive findings, which is often a big concern for association studies.
PMCID: PMC3287933  PMID: 22373133
5.  Partial Correlation Estimation by Joint Sparse Regression Models 
In this paper, we propose a computationally efficient approach, space (Sparse PArtial Correlation Estimation), for selecting non-zero partial correlations under the high-dimension-low-sample-size setting. This method assumes the overall sparsity of the partial correlation matrix and employs sparse regression techniques for model fitting. We illustrate the performance of space by extensive simulation studies. It is shown that space performs well in both non-zero partial correlation selection and the identification of hub variables, and also outperforms two existing methods. We then apply space to a microarray breast cancer data set and identify a set of hub genes which may provide important insights into genetic regulatory networks. Finally, we prove that, under a set of suitable assumptions, the proposed procedure is asymptotically consistent in terms of model selection and parameter estimation.
PMCID: PMC2770199  PMID: 19881892
concentration network; high-dimension-low-sample-size; lasso; shooting; genetic regulatory network
6.  Combining multiple family-based association studies 
BMC Proceedings  2007;1(Suppl 1):S162.
While high-throughput genotyping technologies are becoming readily available, the merit of using these technologies to perform genome-wide association studies has not been established. One major concern is that for studies of complex diseases and traits, the whole-genome approach requires such large sample sizes that both recruitment and genotyping pose considerable challenges. Here we propose a novel statistical method that boosts the effective sample size by combining data obtained from several studies. Specifically, we consider a situation in which various studies have genotyped non-overlapping subjects at largely non-overlapping sets of markers. Our approach, which exploits the local linkage disequilibrium structure without assuming an explicit population model, opens up the possibility of improving statistical power by incorporating existing data into future association studies.
PMCID: PMC2367479  PMID: 18466508
7.  Controlling for false positive findings of trans-hubs in expression quantitative trait loci mapping 
BMC Proceedings  2007;1(Suppl 1):S157.
In the fast-developing field of expression quantitative trait loci (eQTL) studies, much interest has been concentrated on detecting genomic regions containing transcriptional regulators that influence multiple expression phenotypes (trans-hubs). In this paper, we develop statistical methods for eQTL mapping and propose a new procedure for investigating candidate trans-hubs. We use data from the Genetic Analysis Workshop 15 to illustrate our methods. After correlations among expressions were accounted for, the previously detected trans-hubs were no longer significant. Our results suggest that conclusions regarding regulation hot spots should be treated with great caution.
PMCID: PMC2367467  PMID: 18466502
