
Results 1-8 (8)


author:("Gao, shoutgun")
1.  Predicting Disease-Related Subnetworks for Type 1 Diabetes Using a New Network Activity Score 
In this study we investigated the advantage of including network information in prioritizing disease genes of type 1 diabetes (T1D). First, a naïve Bayesian network (NBN) model was developed to integrate information from multiple data sources and to define a T1D-involvement probability score (PS) for each individual gene. The algorithm was validated using known functional candidate genes as a benchmark. Genes with higher PS were found to be more likely to appear in T1D-related publications. Next, a new network activity metric was proposed to evaluate the T1D relevance of protein-protein interaction (PPI) subnetworks. The metric considered the contribution both from individual genes and from network topological characteristics. The predictions were confirmed by several independent datasets, including a genome-wide association study (GWAS) and two large-scale human gene expression studies. We found that novel candidate genes in the T1D subnetworks showed more significant associations with T1D than genes predicted using PS alone. Interestingly, most novel candidates were not encoded within the human leukocyte antigen (HLA) region, and their expression levels showed correlation with disease only in cohorts with low-risk HLA genotypes. The results suggested the importance of mapping disease gene networks in dissecting the genetics of complex diseases, and offered a general approach to network-based disease gene prioritization from multiple data sources.
PMCID: PMC3459426  PMID: 22917479
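As a rough illustration of how a naïve Bayesian model can turn several independent evidence sources into a per-gene probability score, the sketch below combines a prior with one likelihood ratio per source. The function name, the prior, and the likelihood ratios are hypothetical and are not taken from the study; the abstract does not give the actual model details.

```python
import math

def involvement_probability(prior, likelihood_ratios):
    """Posterior probability from a prior and independent likelihood ratios.

    Naive Bayes assumption: evidence sources are conditionally independent,
    so their likelihood ratios multiply (their logarithms add).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical gene: even prior, one source that favours involvement 3:1.
example_score = involvement_probability(0.5, [3.0])
print(round(example_score, 3))
```

Working in log-odds keeps the combination numerically stable when many sources are multiplied together.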
2.  Identification of highly synchronized subnetworks from gene expression data 
BMC Bioinformatics  2013;14(Suppl 9):S5.
There has been a growing interest in identifying context-specific active protein-protein interaction (PPI) subnetworks through integration of PPI and time course gene expression data. However, the interaction dynamics during the biological process under study have not been sufficiently considered previously.
Here we propose a topology-phase locking (TopoPL) based scoring metric for identifying active PPI subnetworks from time series expression data. First, the temporal coordination in gene expression changes is evaluated through phase locking analysis. The results are subsequently integrated with PPI to define an activity score for each PPI subnetwork, based on individual member expression as well as topological characteristics of the PPI network and of the expression temporal coordination network. Lastly, the subnetworks with the top scores in the whole PPI network are identified through simulated annealing search.
Application of TopoPL to simulated data and to the yeast cell cycle data showed that it can more sensitively identify biologically meaningful subnetworks than the method that only utilizes the static PPI topology, or the additive scoring method. Using TopoPL we identified a core subnetwork with 49 genes important to yeast cell cycle. Interestingly, this core contains a protein complex known to be related to arrangement of ribosome subunits that exhibit extremely high gene expression synchronization.
Inclusion of interaction dynamics is important to the identification of relevant gene networks.
PMCID: PMC3698028  PMID: 23901792
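The simulated annealing search mentioned in the abstract above can be sketched as follows. This is a minimal illustration only: the scoring function is a hypothetical stand-in (size-normalised node scores plus the weights of internal edges), not the TopoPL metric, and the sketch does not enforce that the subnetwork stays connected.

```python
import math
import random

def subnetwork_score(members, node_score, edge_weight):
    """Hypothetical activity score: normalised node term plus internal edges."""
    if not members:
        return 0.0
    node_term = sum(node_score[g] for g in members) / math.sqrt(len(members))
    edge_term = sum(w for (a, b), w in edge_weight.items()
                    if a in members and b in members)
    return node_term + edge_term

def anneal(nodes, node_score, edge_weight, steps=2000,
           t_start=1.0, t_end=0.01, seed=0):
    """Search for a high-scoring node set by toggling one node per step."""
    rng = random.Random(seed)
    nodes = sorted(nodes)
    current = {rng.choice(nodes)}
    current_score = subnetwork_score(current, node_score, edge_weight)
    best, best_score = set(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        candidate = current ^ {rng.choice(nodes)}  # add or remove one node
        cand_score = subnetwork_score(candidate, node_score, edge_weight)
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if (cand_score >= current_score
                or rng.random() < math.exp((cand_score - current_score) / temp)):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = set(current), current_score
    return best, best_score

# Toy network with made-up scores.
toy_nodes = ["A", "B", "C", "D"]
toy_node_score = {"A": 2.0, "B": 1.5, "C": -1.0, "D": -2.0}
toy_edges = {("A", "B"): 1.0, ("B", "C"): 0.2, ("C", "D"): 0.1}
best_set, best_value = anneal(toy_nodes, toy_node_score, toy_edges)
print(sorted(best_set), round(best_value, 3))
```

Tracking the best state separately from the current state means the result never gets worse because of a late uphill move that was accepted by chance.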
3.  Cross Tissue Trait-Pathway Network Reveals the Importance of Oxidative Stress and Inflammation Pathways in Obesity-Induced Diabetes in Mouse 
PLoS ONE  2012;7(9):e44544.
Complex disorders often involve dysfunctions in multiple tissue organs. Elucidating the communication among them is important to understanding disease pathophysiology. In this study we integrate multiple tissue gene expression and quantitative trait measurements of an obesity-induced diabetes mouse model, with databases of molecular interaction networks, to construct a cross tissue trait-pathway network. The animals belong to two strains of mice (BTBR or B6), of two obesity statuses (obese or lean), and at two different ages (4 weeks and 10 weeks). Only 10 week obese BTBR animals are diabetic. The expression data were first used to determine the state of every pathway in each tissue, and these states were subsequently used to construct a pathway co-expression network and to define trait-relevant and trait-linking pathways. Among the six tissues profiled, the adipose contains the largest number of trait-linking pathways. Among the eight traits measured, the body weight and plasma insulin level possess the largest number of relevant and linking pathways. Topological analysis of the trait-pathway network revealed that the glycolysis/gluconeogenesis pathway in liver and the insulin signaling pathway in muscle are of top importance to the information flow in the network, with the highest degrees and betweenness centralities. Interestingly, pathways related to metabolism and oxidative stress actively interact with many other pathways in all animals, whereas, among the 10 week animals, the inflammation pathways were preferentially interactive in the diabetic ones only. In summary, our method offers a systems approach to delineate disease trait relevant intra- and cross tissue pathway interactions, and provides insights into the molecular basis of obesity-induced diabetes.
PMCID: PMC3444455  PMID: 23028558
4.  Quantitative utilization of prior biological knowledge in the Bayesian network modeling of gene expression data 
BMC Bioinformatics  2011;12:359.
Bayesian Network (BN) is a powerful approach to reconstructing genetic regulatory networks from gene expression data. However, expression data by itself suffers from high noise and lack of power. Incorporating prior biological knowledge can improve the performance. As each type of prior knowledge on its own may be incomplete or limited by quality issues, integrating multiple sources of prior knowledge to utilize their consensus is desirable.
We introduce a new method to incorporate the quantitative information from multiple sources of prior knowledge. It first uses a naïve Bayesian classifier to assess the likelihood of functional linkage between gene pairs based on prior knowledge. In this study we included cocitation in PubMed and semantic similarity in Gene Ontology annotation. A candidate network edge reservoir is then created in which the copy number of each edge is proportional to the estimated likelihood of linkage between the two corresponding genes. During network simulation, the Markov Chain Monte Carlo sampling algorithm draws from this reservoir at each iteration to generate new candidate networks. We evaluated the new algorithm using both simulated and real gene expression data, including data from a yeast cell cycle study and a mouse pancreas development/growth study. Incorporating prior knowledge led to a ~2-fold increase in the number of known transcription regulations recovered, without a significant change in the false positive rate. In contrast, without the prior knowledge BN modeling is not always better than a random selection, demonstrating the need in network modeling to supplement the gene expression data with additional information.
Our new development provides a statistical means to utilize the quantitative information in prior biological knowledge in the BN modeling of gene expression data, which significantly improves the performance.
PMCID: PMC3203352  PMID: 21884587
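The edge reservoir idea described above can be sketched in a few lines: each candidate edge is copied into a list a number of times proportional to its estimated linkage likelihood, so uniform draws from the list favour well-supported edges. The gene names, likelihood values, and the `scale` parameter are hypothetical; the abstract does not state how copy numbers were actually computed.

```python
import random

def build_edge_reservoir(linkage_likelihood, scale=10):
    """Copy each edge round(likelihood * scale) times into a flat list.

    Edges whose copy count rounds to zero drop out of the reservoir.
    """
    reservoir = []
    for edge, likelihood in linkage_likelihood.items():
        reservoir.extend([edge] * round(likelihood * scale))
    return reservoir

def propose_edge(reservoir, rng):
    """Uniform draw from the reservoir, i.e. likelihood-weighted over edges."""
    return rng.choice(reservoir)

# Made-up likelihoods for three gene pairs.
toy_likelihood = {("g1", "g2"): 0.9, ("g1", "g3"): 0.3, ("g2", "g3"): 0.1}
toy_reservoir = build_edge_reservoir(toy_likelihood)
toy_proposal = propose_edge(toy_reservoir, random.Random(0))
print(len(toy_reservoir), toy_proposal)
```

In an MCMC sampler, a draw like `propose_edge` would supply the edge to add or change when generating the next candidate network; the acceptance step itself is outside this sketch.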
5.  Global analysis of phase locking in gene expression during cell cycle: the potential in network modeling 
BMC Systems Biology  2010;4:167.
In nonlinear dynamic systems, synchrony through oscillation and frequency modulation is a general control strategy to coordinate multiple modules in response to external signals. Conversely, the synchrony information can be utilized to infer interaction. Increasing evidence suggests that frequency modulation is also common in transcription regulation.
In this study, we investigate the potential of phase locking analysis, a technique to study the synchrony patterns, in the transcription network modeling of time course gene expression data. Using the yeast cell cycle data, we show that significant phase locking exists between transcription factors and their targets, between gene pairs with prior evidence of physical or genetic interactions, and among cell cycle genes. When compared with simple correlation we found that the phase locking metric can identify gene pairs that interact with each other more efficiently. In addition, it can automatically address issues of arbitrary time lags or different dynamic time scales in different genes, without the need for alignment. Interestingly, many of the phase locked gene pairs exhibit higher order than 1:1 locking, and significant phase lags with respect to each other. Based on these findings we propose a new phase locking metric for network reconstruction using time course gene expression data. We show that it is efficient at identifying network modules of focused biological themes that are important to cell cycle regulation.
Our result demonstrates the potential of phase locking analysis in transcription network modeling. It also suggests the importance of understanding the dynamics underlying the gene expression patterns.
PMCID: PMC3017040  PMID: 21129191
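A common way to quantify n:m phase locking between two signals is the magnitude of the mean phase-difference vector, which the sketch below computes. It assumes the instantaneous phases have already been extracted from the expression time courses (for example with a Hilbert transform, not shown here), and it is a generic index rather than the specific metric proposed in the paper.

```python
import cmath

def phase_locking_index(phases_a, phases_b, n=1, m=1):
    """Magnitude of the mean of exp(i * (n * phase_a - m * phase_b)).

    Returns a value in [0, 1]: 1 means the generalised phase difference is
    constant (perfect n:m locking), values near 0 mean no consistent relation.
    """
    if not phases_a or len(phases_a) != len(phases_b):
        raise ValueError("phase series must be non-empty and of equal length")
    total = sum(cmath.exp(1j * (n * pa - m * pb))
                for pa, pb in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)

# Synthetic phases (radians) sampled at 200 time points.
times = [0.1 * k for k in range(200)]
base = list(times)                      # reference oscillator
lagged = [t - 0.8 for t in times]       # same frequency, constant lag
doubled = [2.0 * t for t in times]      # twice the frequency
drifting = [1.37 * t for t in times]    # unrelated frequency

locked_1_1 = phase_locking_index(base, lagged)
locked_1_2 = phase_locking_index(doubled, base, n=1, m=2)
unlocked = phase_locking_index(base, drifting)
print(round(locked_1_1, 3), round(locked_1_2, 3), round(unlocked, 3))
```

The lagged pair still scores as fully locked, which illustrates why such an index tolerates arbitrary time lags without prior alignment, and the `n`, `m` arguments cover locking of higher order than 1:1.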
6.  Quality Weighted Mean and T-test in Microarray Analysis Lead to Improved Accuracy in Gene Expression Measurements and Reduced Type I and II Errors in Differential Expression Detection 
Previously we have reported a microarray image processing and data analysis package, Matarray, in which quality scores are defined for every spot to reflect the reliability and variability of the data acquired from each spot. In this article we present a new development in Matarray, where the quality scores are incorporated as weights in the statistical evaluation and data mining of microarray data. With this approach, filtering of poor quality data is automatically achieved through the reduction in their weights, thereby eliminating the need to manually flag or remove bad data points, as well as the problem of missing values. More significantly, utilizing a set of control clones spiked in at known input ratios ranging from 1:30 to 30:1, we find that the quality-weighted statistics lead to more accurate gene expression measurements and more sensitive detection of their changes, with significantly lower type II error rates. Further, we have applied the quality-weighted clustering to a time-course microarray data set, and find that the new algorithm improves grouping accuracy. In summary, incorporating a quantitative quality measure of microarray data as a weight in complex data analysis leads to improved reliability and convenience. In addition, it provides a practical way to deal with the missing value issue in establishing automatic statistical tests.
PMCID: PMC2819534  PMID: 20151041
microarray; quality score; weighted algorithms; accurate expression measurement
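To illustrate how quality scores can act as weights, the sketch below computes a quality-weighted mean of replicate measurements. The values and quality scores are invented, and this is the textbook weighted mean rather than Matarray's own implementation, whose details are not given in the abstract.

```python
def quality_weighted_mean(values, quality):
    """Weighted mean in which each measurement counts by its quality score."""
    if len(values) != len(quality):
        raise ValueError("values and quality must have the same length")
    total_weight = sum(quality)
    if total_weight <= 0:
        raise ValueError("at least one measurement needs a positive quality")
    return sum(v * q for v, q in zip(values, quality)) / total_weight

# Three replicate log-ratios; the third spot is an outlier with quality 0.
replicates = [1.0, 1.2, 5.0]
spot_quality = [1.0, 1.0, 0.0]
weighted = quality_weighted_mean(replicates, spot_quality)
plain = sum(replicates) / len(replicates)
print(round(weighted, 2), round(plain, 2))
```

A spot with zero quality contributes nothing, so poor or missing data are handled without a separate flagging or imputation step.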
7.  Predicting Type 1 Diabetes Candidate Genes using Human Protein-Protein Interaction Networks 
Journal of computer science and systems biology  2009;2:10.4172/jcsb.1000025.
Proteins directly interacting with each other tend to have similar functions and be involved in the same cellular processes. Mutations in genes that code for them often lead to the same family of disease phenotypes. Efforts have been made to prioritize positional candidate genes for complex diseases utilizing the protein-protein interaction (PPI) information. But such an approach is often considered too general to be practically useful for specific diseases.
In this study we investigate the efficacy of this approach in type 1 diabetes (T1D). 266 known disease genes, and 983 positional candidate genes from the 18 established linkage loci of T1D, are compiled from T1Dbase. We found that the PPI network of known T1D genes has distinct topological features from others, with a significantly higher number of interactions among themselves even after adjusting for their high network degrees (p<1e-5). We then define those positional candidates that are first-degree PPI neighbours of the 266 known disease genes to be new candidate disease genes. This leads to a list of 68 genes for further study. Cross validation using the known disease genes as benchmark reveals that the enrichment is ~17.1-fold over random selection, and ~4-fold better than using the linkage information alone. We find that the citations of the new candidates in T1D-related publications are significantly (p<1e-7) more than random, even after excluding the co-citation with the known disease genes; they are significantly over-represented (p<1e-10) in the top 30 GO terms shared by known disease genes. Furthermore, sequence analysis reveals that they contain significantly (p<0.0004) more protein domains that are known to be relevant to T1D. These findings provide indirect validation of the newly predicted candidates.
Our study demonstrates the potential of the PPI information in prioritizing positional candidate genes for T1D.
PMCID: PMC2818071  PMID: 20148193
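The selection rule described above, keeping positional candidates that are first-degree PPI neighbours of known disease genes, reduces to a few set operations. The sketch below uses placeholder gene identifiers and a toy edge list; it does not reproduce the study's data.

```python
def first_degree_candidates(ppi_edges, known_genes, positional_candidates):
    """Positional candidates that directly interact with a known disease gene."""
    known = set(known_genes)
    neighbours = set()
    for a, b in ppi_edges:
        # PPI edges are undirected, so check both endpoints.
        if a in known:
            neighbours.add(b)
        if b in known:
            neighbours.add(a)
    return (neighbours & set(positional_candidates)) - known

# Placeholder identifiers: K* are known genes, P* are positional candidates.
toy_ppi = [("K1", "P1"), ("P2", "K2"), ("P3", "X"), ("K1", "K2")]
new_candidates = first_degree_candidates(toy_ppi, ["K1", "K2"], ["P1", "P2", "P3"])
print(sorted(new_candidates))
```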
8.  TAPPA: topological analysis of pathway phenotype association 
Bioinformatics (Oxford, England)  2007;23(22):3100-3102.
Extracting biological insight from microarray data is important but challenging. Here we describe TAPPA, a Java-based tool for identifying phenotype-associated genetic pathways using pathway topological measures. This is achieved by first calculating a Pathway Connectivity Index (PCI) for each pathway, followed by evaluating its correlation to the phenotypic variation. Our PCI definition not only efficiently captures the contributions from genes that show subtle but consistent changes in expression, but also naturally gives more weight to the hub genes that interact with a large number of other genes in the pathway. TAPPA also allows evaluation of sub-modules within a pathway and their association to phenotypes.
PMCID: PMC2473868  PMID: 17890270
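The abstract does not give the PCI formula, so the sketch below shows only one hypothetical edge-based connectivity index with the two properties it mentions: square-root damping lets small but consistent changes contribute, and hub genes count more because they appear in more edges. The function and the exact form of each term are assumptions for illustration, not TAPPA's definition.

```python
import math

def connectivity_index(expression, edges):
    """Hypothetical index: sum over edges of sign * sqrt|x_a| * sqrt|x_b|.

    `expression` maps gene -> normalised expression value for one sample;
    `edges` lists the interacting gene pairs within the pathway.
    """
    total = 0.0
    for a, b in edges:
        xa, xb = expression[a], expression[b]
        combined = xa + xb
        sign = (combined > 0) - (combined < 0)  # +1, 0 or -1
        total += sign * math.sqrt(abs(xa)) * math.sqrt(abs(xb))
    return total

# Toy pathway: one hub connected to three genes with modest changes.
toy_expression = {"hub": 1.0, "g1": 0.25, "g2": 0.25, "g3": 0.25}
toy_pathway = [("hub", "g1"), ("hub", "g2"), ("hub", "g3")]
toy_index = connectivity_index(toy_expression, toy_pathway)
print(toy_index)
```

Computing such an index per sample and correlating it with the phenotype across samples would correspond to the second step described in the abstract.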
