Results 1-25 (62)

author:("Jia, pailin")
1.  EW_dmGWAS: edge-weighted dense module search for genome-wide association studies and gene expression profiles 
Bioinformatics  2015;31(15):2591-2594.
Summary: We previously developed dmGWAS to search for dense modules in a human protein–protein interaction (PPI) network; it has since become a popular tool for network-assisted analysis of genome-wide association studies (GWAS). dmGWAS weights nodes by using GWAS signals. Here, we introduce an upgraded algorithm, EW_dmGWAS, to boost GWAS signals in a node- and edge-weighted PPI network. In EW_dmGWAS, we utilize condition-specific gene expression profiles for edge weights. Specifically, differential gene co-expression is used to infer the edge weights. We applied EW_dmGWAS to two diseases and compared it with other relevant methods. The results suggest that EW_dmGWAS is more powerful in detecting disease-associated signals.
Availability and implementation: The algorithm of EW_dmGWAS is implemented in the R package dmGWAS_3.0 and is available at
Contact: or
Supplementary information: Supplementary materials are available at Bioinformatics online.
PMCID: PMC4514922  PMID: 25805723
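The idea in entry 1 of scoring a module on both node weights (GWAS signals) and edge weights (differential co-expression) can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name `module_score`, the mixing parameter `lam`, the toy z-scores, and the choice to normalise each sum by the square root of its count. It is not the scoring formula of the dmGWAS_3.0 package.

```python
import math

def module_score(nodes, edges, node_z, edge_z, lam=0.5):
    """Hypothetical combined score: lam * edge part + (1 - lam) * node part."""
    nodes = set(nodes)
    # Node part: sum of node z-scores, scaled by sqrt(number of nodes).
    node_part = sum(node_z[n] for n in nodes) / math.sqrt(len(nodes))
    # Edge part: same scaling, using only edges with both ends in the module.
    inside = [e for e in edges if e[0] in nodes and e[1] in nodes]
    edge_part = sum(edge_z[e] for e in inside) / math.sqrt(len(inside)) if inside else 0.0
    return lam * edge_part + (1 - lam) * node_part

# Toy network with made-up gene labels and z-scores.
node_z = {"g1": 2.0, "g2": 2.0, "g3": 0.0, "g4": 2.0}
edges = [("g1", "g2"), ("g2", "g3"), ("g2", "g4")]
edge_z = {("g1", "g2"): 1.0, ("g2", "g3"): 0.5, ("g2", "g4"): 3.0}

small = module_score({"g1", "g2"}, edges, node_z, edge_z)
grown = module_score({"g1", "g2", "g4"}, edges, node_z, edge_z)
print(small < grown)  # adding g4 brings a strong node and a strong edge
```

In this toy case, a greedy search that grows a module would keep `g4`, because both its node signal and its edge to `g2` raise the combined score.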
2.  Inconsistency and features of single nucleotide variants detected in whole exome sequencing versus transcriptome sequencing: A case study in lung cancer 
Methods (San Diego, Calif.)  2015;83:118-127.
Whole exome sequencing (WES) and RNA sequencing (RNA-Seq) are two main platforms used for next-generation sequencing (NGS). While WES is primarily for DNA variant discovery and RNA-Seq is mainly for measurement of gene expression, both can be used for detection of genetic variants, especially single nucleotide variants (SNVs). How consistently variants can be detected from WES and RNA-Seq has not been systematically evaluated. In this study, we examined the technical and biological inconsistencies in SNV detection using WES and RNA-Seq data from 27 pairs of tumor and matched normal samples. We analyzed SNVs in three categories: WES unique (detected only in WES), RNA-Seq unique (detected only in RNA-Seq), and shared (detected in both). We found a small overlap (average ∼14%) between the SNVs called in WES and RNA-Seq. The WES unique SNVs were mainly due to low coverage, low expression, or their location on the non-transcribed strand in RNA-Seq data, while the RNA-Seq unique SNVs were primarily due to their location outside the WES-capture boundary regions (accounting for ∼71%), as well as low coverage of the regions, low coverage of the mutant alleles, or RNA editing. The shared SNVs had high locus-specific coverage in both WES and RNA-Seq and high gene expression levels. Additionally, WES unique and RNA-Seq unique SNVs showed different nucleotide substitution patterns, e.g., ∼55% of RNA-Seq unique variants were A:T→G:C, a hallmark of RNA editing. This study provides an important evaluation of the inconsistencies of somatic SNVs called in WES and RNA-Seq data.
PMCID: PMC4509831  PMID: 25913717
Single nucleotide variants; whole exome sequencing; RNA-Seq; somatic mutations; allele frequency; RNA editing
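The three SNV categories described in entry 2 amount to set operations on the two call sets. The sketch below is a minimal illustration with made-up variant calls. The key format `(chromosome, position, ref, alt)` and the definition of overlap as shared calls divided by the union of all calls are assumptions, since the abstract does not say how the ∼14% figure was computed.

```python
def categorize_snvs(wes_calls, rnaseq_calls):
    """Split SNV calls into WES-unique, RNA-Seq-unique, and shared sets."""
    wes = set(wes_calls)
    rna = set(rnaseq_calls)
    shared = wes & rna
    union = wes | rna
    return {
        "wes_unique": wes - rna,
        "rnaseq_unique": rna - wes,
        "shared": shared,
        # Assumed definition: shared calls as a fraction of all distinct calls.
        "overlap_fraction": len(shared) / len(union) if union else 0.0,
    }

# Hypothetical calls for one tumor sample.
wes_calls = [("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr3", 42, "G", "A")]
rnaseq_calls = [("chr2", 500, "C", "T"), ("chr7", 77, "A", "G")]

result = categorize_snvs(wes_calls, rnaseq_calls)
print(sorted(result["shared"]))
print(result["overlap_fraction"])
```

Averaging `overlap_fraction` over samples would give a per-cohort summary comparable in spirit to the figure reported in the abstract.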
3.  Genetic Relationship between Schizophrenia and Nicotine Dependence 
Scientific Reports  2016;6:25671.
It is well known that most schizophrenia patients smoke cigarettes. There are different hypotheses postulating the underlying mechanisms of this comorbidity. We used summary statistics from large meta-analyses of plasma cotinine concentration (COT), Fagerström test for nicotine dependence (FTND) and schizophrenia to examine the genetic relationship between these traits. We found that schizophrenia risk scores calculated at P-value thresholds of 5 × 10⁻³ and larger predicted FTND and cigarettes smoked per day (CPD), suggesting that genes most significantly associated with schizophrenia were not associated with FTND/CPD, consistent with the self-medication hypothesis. The COT risk scores predicted schizophrenia diagnosis at P-values of 5 × 10⁻³ and smaller, implying that genes most significantly associated with COT were associated with schizophrenia. These results imply that schizophrenia and FTND/CPD/COT share some genetic liability. Based on this shared liability, we identified multiple long non-coding RNAs and RNA binding protein genes (DA376252, BX089737, LOC101927273, LINC01029, LOC101928622, HY157071, DA902558, RBFOX1 and TINCR), protein modification genes (MANBA, UBE2D3, and RANGAP1) and energy production genes (XYLB, MTRF1 and ENOX1) that were associated with both conditions. Further analyses revealed that these shared genes were enriched in calcium signaling, long-term potentiation and neuroactive ligand-receptor interaction pathways that play a critical role in cognitive functions and neuronal plasticity.
PMCID: PMC4862382  PMID: 27164557
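Entry 3 refers to risk scores calculated at different P-value thresholds. A common way to build such a score is to sum effect-size-weighted allele dosages over the SNPs whose association P-value passes the threshold. The sketch below assumes that construction. The SNP identifiers, effect sizes, and dosages are invented, and the abstract does not describe the authors' exact procedure.

```python
def risk_score(dosages, summary_stats, p_threshold):
    """Sum beta * dosage over SNPs whose summary-statistic P-value <= p_threshold."""
    score = 0.0
    for snp, (beta, p_value) in summary_stats.items():
        if p_value <= p_threshold and snp in dosages:
            score += beta * dosages[snp]
    return score

# Hypothetical summary statistics: SNP -> (effect size, P-value).
summary_stats = {"rsA": (0.20, 1e-8), "rsB": (-0.10, 4e-3), "rsC": (0.05, 0.2)}
# Hypothetical genotype of one individual: SNP -> count of effect alleles (0, 1, or 2).
dosages = {"rsA": 2, "rsB": 1, "rsC": 0}

for threshold in (5e-8, 5e-3, 0.5):
    print(threshold, round(risk_score(dosages, summary_stats, threshold), 3))
```

Loosening the threshold admits more SNPs with weaker evidence, which is why the predictive behaviour of a score can differ between stringent and permissive thresholds.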
4.  Heterogeneous DNA Methylation Contributes to Tumorigenesis Through Inducing the Loss of Coexpression Connectivity in Colorectal Cancer 
Genes, chromosomes & cancer  2014;54(2):110-121.
Increasing evidence indicates the high heterogeneity of cancer cells. Recent studies have revealed distinct subtypes of DNA methylation in colorectal cancer (CRC); however, the mechanism of heterogeneous methylation remains poorly understood. Gene expression is a natural, intermediate quantitative trait that bridges genotypic and phenotypic features. In this work, we studied the role of heterogeneous DNA methylation in tumorigenesis via gene expression analyses. Specifically, we integrated methylation and expression data in normal and tumor tissues, and examined the perturbations in coexpression patterns. We found that the heterogeneity of methylation leads to significant loss of coexpression connectivity in CRC; this finding was validated in an independent cohort. Functional analyses showed that the lost coexpression partners participate in important cancer-related pathways/networks, such as ErbB and mitogen-activated protein kinase (MAPK) signaling pathways. Our analyses suggest that the loss of coexpression connectivity induced by methylation heterogeneity might play an important role in CRC. To our knowledge, this is the first study to interpret methylation heterogeneity in cancer from the perspective of coexpression perturbation. Our results provide new perspectives in tumor biology and may facilitate the identification of potential biomedical therapies for cancer treatment.
PMCID: PMC4785867  PMID: 25407423
5.  Optimizing the sequence of anti-EGFR targeted therapy in EGFR-mutant lung cancer 
Molecular cancer therapeutics  2014;14(2):542-552.
Metastatic EGFR-mutant lung cancers are sensitive to the first- and second- generation EGFR tyrosine kinase inhibitors (TKIs), gefitinib, erlotinib, and afatinib, but resistance develops. Acquired resistance (AR) to gefitinib or erlotinib occurs most commonly (>50%) via the emergence of a second-site EGFR mutation, T790M. Two strategies to overcome T790M-mediated resistance are dual inhibition of EGFR with afatinib plus the anti-EGFR antibody, cetuximab (A+C), or mutant-specific EGFR inhibition with AZD9291. A+C and AZD9291 are now also being tested as first-line therapies, but whether these therapies will extend progression-free survival or induce more aggressive forms of resistance in this setting remains unknown. We modeled resistance to multiple generations of anti-EGFR therapies preclinically in order to understand the effects of sequential treatment with anti-EGFR agents on drug resistance and determine the optimal order of treatment. Using a panel of erlotinib/afatinib-resistant cells including a novel patient-derived cell line (VP-2), we found that AZD9291 was more potent than A+C at inhibiting cell growth and EGFR signaling in this setting. 4 of 4 xenograft-derived A+C-resistant cell lines displayed in vitro and in vivo sensitivity to AZD9291, but 4 of 4 AZD9291-resistant cell lines demonstrated cross-resistance to A+C. Addition of cetuximab to AZD9291 did not confer additive benefit in any preclinical disease setting. This work, emphasizing a mechanistic understanding of the effects of therapies on tumor evolution, provides a framework for future clinical trials testing different treatment sequences. This paradigm is applicable to other tumor types in which multiple generations of inhibitors are now available.
PMCID: PMC4338015  PMID: 25477325
Lung cancer; EGFR; AZD9291; afatinib; cetuximab
6.  Transcriptome Sequencing and Genome-wide Association Analyses Reveal Lysosomal Function and Actin Cytoskeleton Remodeling in Schizophrenia and Bipolar Disorder 
Molecular psychiatry  2014;20(5):563-572.
Schizophrenia (SCZ) and bipolar disorder (BPD) are severe mental disorders with high heritability. Clinicians have long noticed the similarities in clinical symptoms between these disorders. In recent years, accumulating evidence indicates some shared genetic liabilities. However, what is shared remains elusive. In this study, we conducted whole transcriptome analysis of postmortem brain tissues (cingulate cortex) from SCZ, BPD and control subjects, and identified differentially expressed genes in these disorders. We found 105 and 153 genes differentially expressed in SCZ and BPD, respectively. By comparing the t-test scores, we found that many of the genes differentially expressed in SCZ and BPD are concordant in their expression level (q ≤ 0.01, 53 genes; q ≤ 0.05, 213 genes; q ≤ 0.1, 885 genes). Using genome-wide association data from the Psychiatric Genomics Consortium, we found that these differentially and concordantly expressed genes were enriched in association signals for both SCZ (p < 10⁻⁷) and BPD (p = 0.029). To our knowledge, this is the first time that a substantially large number of genes show concordant expression and association for both SCZ and BPD. Pathway analyses of these genes indicated that they are involved in the lysosome, Fc gamma receptor mediated phagocytosis, and regulation of actin cytoskeleton pathways, along with several cancer pathways. Functional analyses of these genes revealed an interconnected pathway network centered on lysosomal function and the regulation of actin cytoskeleton. These pathways and their interacting network were principally confirmed by an independent transcriptome sequencing dataset of hippocampus. Dysregulation of lysosomal function and cytoskeleton remodeling has direct impacts on endocytosis, phagocytosis, exocytosis, vesicle trafficking, neuronal maturation and migration, neurite outgrowth, and synaptic density and plasticity, and different aspects of these processes have been implicated in SCZ and BPD.
PMCID: PMC4326626  PMID: 25113377
7.  A Gene Gravity Model for the Evolution of Cancer Genomes: A Study of 3,000 Cancer Genomes across 9 Cancer Types 
PLoS Computational Biology  2015;11(9):e1004497.
Cancer development and progression result from somatic evolution by an accumulation of genomic alterations. The effects of those alterations on the fitness of somatic cells lead to evolutionary adaptations such as increased cell proliferation, angiogenesis, and altered anticancer drug responses. However, there are few general mathematical models to quantitatively examine how perturbations of a single gene shape subsequent evolution of the cancer genome. In this study, we proposed the gene gravity model to study the evolution of cancer genomes by incorporating the genome-wide transcription and somatic mutation profiles of ~3,000 tumors across 9 cancer types from The Cancer Genome Atlas into a broad gene network. We found that somatic mutations of a cancer driver gene may drive cancer genome evolution by inducing mutations in other genes. This functional consequence is often generated by the combined effect of genetic and epigenetic (e.g., chromatin regulation) alterations. By quantifying cancer genome evolution using the gene gravity model, we identified six putative cancer genes (AHNAK, COL11A1, DDX3X, FAT4, STAG2, and SYNE1). The tumor genomes harboring the nonsynonymous somatic mutations in these genes had a higher mutation density at the genome level compared to the wild-type groups. Furthermore, we provided statistical evidence that hypermutation of cancer driver genes on inactive X chromosomes is a general feature in female cancer genomes. In summary, this study sheds light on the functional consequences and evolutionary characteristics of somatic mutations during tumorigenesis by propelling adaptive cancer genome evolution, which would provide new perspectives for cancer research and therapeutics.
Author Summary
Cancer genome instabilities, such as chromosomal instability and microsatellite instability, have been recognized as a hallmark of cancer for several decades. However, distinguishing cancer functional somatic mutations from massive passenger mutations and non-genetic events is a major challenge in cancer research. Massive genomic alterations present researchers with a dilemma: does this somatic genome evolution contribute to cancer, or is it simply a byproduct of cellular processes gone awry? In this study, we developed a new mathematical model to incorporate the genome-wide transcription and somatic mutation profiles of ~3,000 tumors across 9 cancer types from The Cancer Genome Atlas into a broad gene network. We found that cancer driver genes may shape somatic genome evolution by inducing mutations in other genes in cancer. This functional consequence is often generated by the combined effect of genetic and epigenetic alterations (e.g. chromatin regulation). Moreover, we provided statistical evidence that hypermutation of cancer driver genes on inactive X chromosomes is a general feature in female cancer genomes and found a putative X-inactive specific gene STAG2 in uterine cancer. In summary, this work illustrates the functional consequences and evolutionary characteristics of somatic mutations during tumorigenesis through driving adaptive cancer genome evolution.
PMCID: PMC4564226  PMID: 26352260
8.  Studying Tumorigenesis through Network Evolution and Somatic Mutational Perturbations in the Cancer Interactome 
Molecular Biology and Evolution  2014;31(8):2156-2169.
Cells govern biological functions through complex biological networks. Perturbations to networks may drive cells to new phenotypic states, for example, tumorigenesis. Identifying how genetic lesions perturb molecular networks is a fundamental challenge. This study used large-scale human interactome data to systematically explore the relationship among network topology, somatic mutation, evolutionary rate, and evolutionary origin of cancer genes. We found that cancer proteins have a unique network centrality, which is largely independent of gene essentiality. Cancer genes likely have experienced a lower evolutionary rate and stronger purifying selection than noncancer, Mendelian disease, and orphan disease genes. Cancer proteins tend to have ancient histories, likely originating in early metazoans, although they are younger than proteins encoded by Mendelian disease genes, orphan disease genes, and essential genes. We found that the protein evolutionary origin (age) positively correlates with protein connectivity in the human interactome. Furthermore, we investigated the network-attacking perturbations due to somatic mutations identified from 3,268 tumors across 12 cancer types in The Cancer Genome Atlas. We observed a positive correlation between protein connectivity and the number of nonsynonymous somatic mutations, but a weaker or nonsignificant correlation between protein connectivity and the number of synonymous somatic mutations. These observations suggest that somatic mutational network-attacking perturbations to hub genes play an important role in tumor emergence and evolution. Collectively, this work has broad biomedical implications for both basic cancer biology and the development of personalized cancer therapy.
PMCID: PMC4104318  PMID: 24881052
tumorigenesis; network evolution; network-attacking perturbation; somatic mutation; TCGA
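Entry 8 contrasts the correlation of protein connectivity with nonsynonymous versus synonymous mutation counts. The abstract does not name the correlation statistic, so the sketch below assumes a Spearman rank correlation, implemented from scratch on invented numbers. The variable names and values are illustrative only.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-protein values: network degree and mutation counts.
degree = [1, 3, 5, 10, 40]
nonsynonymous = [0, 2, 1, 6, 9]
synonymous = [2, 1, 3, 1, 2]

print(round(spearman(degree, nonsynonymous), 3))
print(round(spearman(degree, synonymous), 3))
```

With these toy values the first correlation is strongly positive and the second is zero, mirroring the qualitative pattern the abstract reports without reproducing its data.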
9.  A meta-analysis of somatic mutations from next generation sequencing of 241 melanomas: a road map for the study of genes with potential clinical relevance 
Molecular cancer therapeutics  2014;13(7):1918-1928.
Next generation sequencing (NGS) has been used to characterize the overall genomic landscape of melanomas. Here, we systematically examined mutations from recently published melanoma NGS data involving 241 paired tumor-normal samples to identify potentially clinically relevant mutations. Melanomas were characterized according to an in-house clinical assay that identifies well-known specific recurrent mutations in five driver genes: BRAF (affecting V600), NRAS (G12, G13, and Q61), KIT (W557, V559, L576, K642, and D816), GNAQ (Q209), and GNA11 (Q209). Tumors with none of these mutations are termed “pan-negative”. We then mined the driver mutation-positive and pan-negative melanoma NGS data for mutations in 632 cancer genes that could influence existing or emerging targeted therapies. First, we uncovered several genes whose mutations were more likely associated with BRAF- or NRAS-driven melanomas, including TP53 and COL1A1 with BRAF, and PPP6C, KALRN, PIK3R4, TRPM6, GUCY2C, and PRKAA2 with NRAS. Second, we found that the 69 “pan-negative” melanoma genomes harbored alternate infrequent mutations in the 5 known driver genes along with many mutations in genes encoding guanine nucleotide binding protein α-subunits. Third, we identified 12 significantly mutated genes in “pan-negative” samples (ALK, STK31, DGKI, RAC1, EPHA4, ADAMTS18, EPHA7, ERBB4, TAF1L, NF1, SYK, and KDR), including 5 genes (RAC1, ADAMTS18, EPHA7, TAF1L, and NF1) with a recurrent mutation in at least 2 “pan-negative” tumor samples. This meta-analysis provides a road map for the study of additional potentially actionable genes in both driver mutation-positive and pan-negative melanomas.
PMCID: PMC4090262  PMID: 24755198
Melanoma; Next-generation sequencing; Meta-analysis; Driver mutation; BRAF; NRAS; KIT; GNA11; GNAQ
10.  Deciphering Signaling Pathway Networks to Understand the Molecular Mechanisms of Metformin Action 
PLoS Computational Biology  2015;11(6):e1004202.
A drug exerts its effects typically through a signal transduction cascade, which is non-linear and involves intertwined networks of multiple signaling pathways. Construction of such a signaling pathway network (SPNetwork) can enable identification of novel drug targets and deep understanding of drug action. However, it is challenging to synopsize critical components of these interwoven pathways into one network. To tackle this issue, we developed a novel computational framework, the Drug-specific Signaling Pathway Network (DSPathNet). The DSPathNet amalgamates the prior drug knowledge and drug-induced gene expression via random walk algorithms. Using the drug metformin, we illustrated this framework and obtained one metformin-specific SPNetwork containing 477 nodes and 1,366 edges. To evaluate this network, we performed the gene set enrichment analysis using the disease genes of type 2 diabetes (T2D) and cancer, one T2D genome-wide association study (GWAS) dataset, three cancer GWAS datasets, and one GWAS dataset of cancer patients with T2D on metformin. The results showed that the metformin network was significantly enriched with disease genes for both T2D and cancer, and that the network also included genes that may be associated with metformin-associated cancer survival. Furthermore, from the metformin SPNetwork and common genes to T2D and cancer, we generated a subnetwork to highlight the molecule crosstalk between T2D and cancer. The follow-up network analyses and literature mining revealed that seven genes (CDKN1A, ESR1, MAX, MYC, PPARGC1A, SP1, and STK11) and one novel MYC-centered pathway with CDKN1A, SP1, and STK11 might play important roles in metformin’s antidiabetic and anticancer effects. Some results are supported by previous studies. 
In summary, our study 1) develops a novel framework to construct drug-specific signal transduction networks; 2) provides insights into the molecular mode of metformin; 3) serves as a model for exploring signaling pathways to facilitate understanding of drug action, disease pathogenesis, and identification of drug targets.
Author Summary
A deep understanding of a drug’s mechanisms of action is essential not only in the discovery of new treatments but also in minimizing adverse effects. Here, we develop a computational framework, the Drug-specific Signaling Pathway Network (DSPathNet), to reconstruct a comprehensive signaling pathway network (SPNetwork) impacted by a particular drug. To illustrate this computational approach, we used metformin, an anti-diabetic drug, as an example. Starting from collecting the metformin-related upstream genes and inferring the metformin-related downstream genes, we built one metformin-specific SPNetwork via random walk based algorithms. Our evaluation of the metformin-specific SPNetwork by using disease genes and genotyping data from genome-wide association studies showed that our DSPathNet approach efficiently synopsized the drug’s key components and their relationships involved in type 2 diabetes and cancer, and even in metformin’s anticancer activity. This work presents a novel computational framework for constructing individual drug-specific signal transduction networks. Furthermore, its successful application to the drug metformin provides some valuable insights into the mode of metformin action, which will facilitate our understanding of the molecular mechanisms underlying drug treatments, disease pathogenesis, and identification of novel drug targets and repurposed drugs.
PMCID: PMC4470683  PMID: 26083494
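Entry 10 says the network is built "via random walk algorithms" without giving details. One widely used variant is the random walk with restart, which scores every node by its proximity to a set of seed nodes. The sketch below shows that variant on a four-node toy graph. The restart probability, the node labels, and the function name are assumptions, and the code is not the DSPathNet implementation.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until p stops changing.

    adjacency maps each node to a list of its neighbours (undirected graph,
    every node assumed to have at least one neighbour).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                continue
            # Each node passes its non-restart mass equally to its neighbours.
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Toy path graph A - B - C - D with A as the only seed (hypothetical labels).
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(adjacency, seeds={"A"})
print(sorted(scores, key=scores.get, reverse=True))
```

Nodes closer to the seed receive higher steady-state probability, so ranking by score gives a simple way to prioritise genes near known drug-related genes.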
11.  Acquired resistance of EGFR-mutant lung adenocarcinomas to afatinib plus cetuximab is associated with activation of mTORC1 
Cell reports  2014;7(4):999-1008.
Patients with EGFR-mutant lung adenocarcinomas (LUADs) who initially respond to first-generation TKIs develop resistance to these drugs. A combination of the irreversible TKI afatinib and the EGFR antibody cetuximab can be used to overcome resistance to first-generation TKIs; however, resistance to this drug combination eventually emerges. We identified activation of the mTORC1 signaling pathway as a mechanism of resistance to dual inhibition of EGFR in mouse models. Addition of rapamycin reversed resistance in vivo. Analysis of afatinib+cetuximab-resistant biopsy specimens revealed the presence of genomic alterations in genes that modulate mTORC1 signaling including NF2 and TSC1. These findings pinpoint enhanced mTORC1 activation as a mechanism of resistance to afatinib+cetuximab and identify genomic mechanisms that lead to activation of this pathway, revealing a potential therapeutic strategy for treating patients with resistance to these drugs.
PMCID: PMC4074596  PMID: 24813888
12.  NGS Catalog: A Database of Next Generation Sequencing Studies in Humans 
Human mutation  2012;33(6):E2341-E2355.
Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research since their advent only a few years ago, and they are expected to advance at an unprecedented pace in the following years. To provide the research community with a comprehensive NGS resource, we have developed the database Next Generation Sequencing Catalog (NGS Catalog), a continually updated database that collects, curates and manages available human NGS data obtained from published literature. NGS Catalog deposits publication information of NGS studies and their mutation characteristics (SNVs, small insertions/deletions, copy number variations, and structural variants), as well as mutated genes and gene fusions detected by NGS. Other functions include user data upload, NGS general analysis pipelines, and NGS software. NGS Catalog is particularly useful for investigators who are new to NGS but would like to take advantage of these powerful technologies for their own research. Finally, based on the data deposited in NGS Catalog, we summarized features and findings from whole exome sequencing, whole genome sequencing, and transcriptome sequencing studies for human diseases or traits.
PMCID: PMC4431973  PMID: 22517761
next generation sequencing (NGS); exome sequencing; whole genome sequencing; RNA sequencing; disease genome; gene fusion; database
13.  Clinically relevant genes and regulatory pathways associated with NRASQ61 mutations in melanoma through an integrative genomics approach 
Oncotarget  2014;6(4):2496-2508.
Therapies such as BRAF inhibitors have become standard treatment for melanoma patients whose tumors harbor activating BRAFV600 mutations. However, analogous therapies for inhibiting NRAS mutant signaling have not yet been well established. In this study, we performed an integrative analysis of DNA methylation, gene expression, and microRNA expression data to identify potential regulatory pathways associated with the most common driver mutations in NRAS (Q61K/L/R) through comparison of NRASQ61-mutated melanomas with pan-negative melanomas. Surprisingly, we found dominant hypomethylation (98.03%) in NRASQ61-mutated melanomas. We identified 1,150 and 49 differentially expressed genes and microRNAs, respectively. Integrated functional analyses of alterations in all three data types revealed important signaling pathways associated with NRASQ61 mutations, such as the MAPK pathway, as well as other novel cellular processes, such as axon guidance. Further analysis of the relationship between DNA methylation and gene expression changes revealed 9 hypermethylated and down-regulated genes and 112 hypomethylated and up-regulated genes in NRASQ61 melanomas. Finally, we identified 52 downstream regulatory cascades of three hypomethylated and up-regulated genes (PDGFD, ZEB1, and THRB). Collectively, our observation of predominant gene hypomethylation in NRASQ61 melanomas and the identification of NRASQ61-linked pathways will be useful for the development of targeted therapies against melanomas harboring NRASQ61 mutations.
PMCID: PMC4385866  PMID: 25537510
NRAS; melanoma; driver mutation; DNA methylation; gene expression; regulatory pathway
14.  VERSE: a novel approach to detect virus integration in host genomes through reference genome customization 
Genome Medicine  2015;7(1):2.
Fueled by widespread applications of high-throughput next generation sequencing (NGS) technologies and urgent need to counter threats of pathogenic viruses, large-scale studies were conducted recently to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes, and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability as a result of virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection through customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2 that is available at
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-015-0126-6) contains supplementary material, which is available to authorized users.
PMCID: PMC4333248  PMID: 25699093
15.  Protein-Protein Interaction and Pathway Analyses of Top Schizophrenia Genes Reveal Schizophrenia Susceptibility Genes Converge on Common Molecular Networks and Enrichment of Nucleosome (Chromatin) Assembly Genes in Schizophrenia Susceptibility Loci 
Schizophrenia Bulletin  2013;40(1):39-49.
Recent genome-wide association studies have identified many promising schizophrenia candidate genes and demonstrated that common polygenic variation contributes to schizophrenia risk. However, whether these genes represent perturbations to a common but limited set of underlying molecular processes (pathways) that modulate risk to schizophrenia remains elusive, and it is not known whether these genes converge on common biological pathways (networks) or represent different pathways. In addition, the theoretical and genetic mechanisms underlying the strong genetic heterogeneity of schizophrenia remain largely unknown. Using 4 well-defined data sets that contain top schizophrenia susceptibility genes and applying protein-protein interaction (PPI) network analysis, we investigated the interactions among proteins encoded by top schizophrenia susceptibility genes. We found that proteins encoded by top schizophrenia susceptibility genes formed a highly significant interconnected network, and, compared with random networks, these PPI networks are statistically highly significant for both direct connectivity and indirect connectivity. We further validated these results using empirical functional data (transcriptome data from a clinical sample). These highly significant findings indicate that top schizophrenia susceptibility genes encode proteins that interact directly to a significant degree and form a densely interconnected network, suggesting perturbations of common underlying molecular processes or pathways that modulate risk to schizophrenia. Our findings that schizophrenia susceptibility genes encode a highly interconnected protein network may also provide a novel explanation for the observed genetic heterogeneity of schizophrenia, i.e., mutation in any member of this molecular network will lead to the same functional consequences that eventually contribute to risk of schizophrenia.
PMCID: PMC3885298  PMID: 23671194
genome-wide association study; schizophrenia susceptibility genes; protein-protein interaction; common molecular networks; genetic heterogeneity; enrichment
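Entry 15 judges the connectivity of a gene set against random networks. A simpler stand-in for that comparison, shown here as a sketch, is to count direct interactions inside the gene set and compare the count with same-sized gene sets drawn at random from the network. This swaps the abstract's random networks for random gene sets. The toy network, gene labels, and number of permutations are assumptions.

```python
import random

def count_direct_edges(genes, edges):
    """Number of interactions whose two partners are both in the gene set."""
    genes = set(genes)
    return sum(1 for a, b in edges if a in genes and b in genes)

def empirical_p(genes, edges, all_genes, n_perm=1000, seed=0):
    """Empirical P-value: share of random gene sets at least as connected."""
    rng = random.Random(seed)
    observed = count_direct_edges(genes, edges)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(genes))
        if count_direct_edges(sample, edges) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Toy network: g0..g3 fully interconnected, the rest joined in isolated pairs.
all_genes = [f"g{i}" for i in range(20)]
edges = [(f"g{i}", f"g{j}") for i in range(4) for j in range(i + 1, 4)]
edges += [(f"g{i}", f"g{i + 1}") for i in range(4, 20, 2)]

observed, p_value = empirical_p(["g0", "g1", "g2", "g3"], edges, all_genes)
print(observed, p_value < 0.05)
```

A small empirical P-value indicates that the candidate set is more densely interconnected than expected for gene sets of the same size in this network.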
16.  Key regulators in prostate cancer identified by co-expression module analysis 
BMC Genomics  2014;15(1):1015.
Prostate cancer (PrCa) is the most commonly diagnosed cancer in men in the world. Despite the fact that a large number of its genes have been investigated, its etiology remains poorly understood. Furthermore, most PrCa candidate genes have not been rigorously replicated, and the methods by which they biologically function in PrCa remain largely unknown.
Aiming to identify key players in the complex prostate cancer system, we reconstructed PrCa co-expressed modules within functional gene sets defined by the Gene Ontology (GO) annotation (biological process, GO_BP). We primarily identified 118 GO_BP terms that were well-preserved between two independent gene expression datasets and a consequent 55 conserved co-expression modules within them. Five modules were then found to be significantly enriched with PrCa candidate genes collected from expression Quantitative Trait Loci (eQTL), somatic copy number alteration (SCNA), somatic mutation data, or prognostic analyses. Specifically, two transcription factors (TFs) (NFAT and SP1) and three microRNAs (hsa-miR-19a, hsa-miR-15a, and hsa-miR-200b) regulating these five candidate modules were found to be critical to the development of PrCa.
Collectively, our results indicated that genes with similar functions may play important roles in disease through co-expression, and modules with different functions could be regulated by similar genetic components, such as TFs and microRNAs, in a synergistic manner.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1015) contains supplementary material, which is available to authorized users.
PMCID: PMC4258300  PMID: 25418933
Prostate cancer; Co-expression; Gene Ontology; Module; Transcription factor; MicroRNA
17.  MSEA: detection and quantification of mutation hotspots through mutation set enrichment analysis 
Genome Biology  2014;15(10):489.
Many cancer genes form mutation hotspots that disrupt their functional domains or active sites, leading to gain- or loss-of-function. We propose a mutation set enrichment analysis (MSEA) implemented by two novel methods, MSEA-clust and MSEA-domain, to predict cancer genes based on mutation hotspot patterns. MSEA methods are evaluated by both simulated and real cancer data. We find approximately 51% of the eligible known cancer genes form detectable mutation hotspots. Application of MSEA in eight cancers reveals a total of 82 genes with mutation hotspots, including well-studied cancer genes, known cancer genes re-found in new cancer types, and novel cancer genes.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0489-9) contains supplementary material, which is available to authorized users.
PMCID: PMC4226881  PMID: 25348067
18.  Genetic Variation in Iron Metabolism Is Associated with Neuropathic Pain and Pain Severity in HIV-Infected Patients on Antiretroviral Therapy 
PLoS ONE  2014;9(8):e103123.
HIV sensory neuropathy and distal neuropathic pain (DNP) are common, disabling complications associated with combination antiretroviral therapy (cART). We previously associated iron-regulatory genetic polymorphisms with a reduced risk of HIV sensory neuropathy during more neurotoxic types of cART. We here evaluated the impact of polymorphisms in 19 iron-regulatory genes on DNP in 560 HIV-infected subjects from a prospective, observational study, who underwent neurological examinations to ascertain peripheral neuropathy and structured interviews to ascertain DNP. Genotype-DNP associations were explored by logistic regression and permutation-based analytical methods. Among 559 evaluable subjects, 331 (59%) developed HIV-SN, and 168 (30%) reported DNP. Fifteen polymorphisms in 8 genes (p<0.05) and 5 variants in 4 genes (p<0.01) were nominally associated with DNP: polymorphisms in TF, TFRC, BMP6, ACO1, SLC11A2, and FXN conferred reduced risk (adjusted odds ratios [ORs] ranging from 0.2 to 0.7, all p<0.05); other variants in TF, CP, ACO1, BMP6, and B2M conferred increased risk (ORs ranging from 1.3 to 3.1, all p<0.05). Risks associated with some variants were statistically significant either in black or white subgroups but were consistent in direction. ACO1 rs2026739 remained significantly associated with DNP in whites (permutation p<0.0001) after correction for multiple tests. Several of the same iron-regulatory-gene polymorphisms, including ACO1 rs2026739, were also associated with severity of DNP (all p<0.05). Common polymorphisms in iron-management genes are associated with DNP and with DNP severity in HIV-infected persons receiving cART. Consistent risk estimates across population subgroups and persistence of the ACO1 rs2026739 association after adjustment for multiple testing suggest that genetic variation in iron-regulation and transport modulates susceptibility to DNP.
PMCID: PMC4140681  PMID: 25144566
19.  Top associated SNPs in prostate cancer are significantly enriched in cis-expression quantitative trait loci and at transcription factor binding sites 
Oncotarget  2014;5(15):6168-6177.
While genome-wide association studies (GWAS) have revealed thousands of disease risk single nucleotide polymorphisms (SNPs), their functions remain largely unknown. Recent studies have suggested the regulatory roles of GWAS risk variants in several common diseases; however, the complex regulatory structure in prostate cancer is unclear.
We investigated the potential regulatory roles of risk variants in two prostate cancer GWAS datasets by their interactions with expression quantitative trait loci (eQTL) and/or transcription factor binding sites (TFBSs) in three populations.
Our results indicated that the moderately associated GWAS SNPs were significantly enriched with cis-eQTLs and TFBSs in Caucasians (CEU), but not in African Americans (AA) or Japanese (JPT); this was also observed in an independent set of pan-cancer related SNPs from the GWAS Catalog. We found that the eQTL enrichment in the CEU population was tissue-specific to eQTLs from CEU lymphoblastoid cell lines. Importantly, we pinpointed two SNPs, rs2861405 and rs4766642, by overlapping results from cis-eQTL and TFBS as applied to the CEU data.
These results suggested that prostate cancer associated SNPs and pan-cancer associated SNPs are likely to play regulatory roles in CEU. However, the negative enrichment results in AA or JPT and the potential mechanisms remain to be elucidated in additional samples.
PMCID: PMC4171620  PMID: 25026280
prostate cancer; genome-wide association studies; eQTL; TFBS; regulatory variants
20.  Two non-synonymous markers in PTPN21, identified by genome-wide association study data-mining and replication, are associated with schizophrenia 
Schizophrenia research  2011;131(0):43-51.
We conducted data-mining analyses of genome wide association (GWA) studies of the CATIE and MGS-GAIN datasets, and found 13 markers in the two physically linked genes, PTPN21 and EML5, showing nominally significant association with schizophrenia. Linkage disequilibrium (LD) analysis indicated that all 7 markers from PTPN21 shared high LD (r²>0.8), including rs2274736 and rs2401751, the two non-synonymous markers with the most significant association signals (rs2401751, P=1.10×10⁻³ and rs2274736, P=1.21×10⁻³). In a meta-analysis of all 13 replication datasets with a total of 13,940 subjects, we found that the two non-synonymous markers are significantly associated with schizophrenia (rs2274736, OR=0.92, 95% CI: 0.86–0.97, P=5.45×10⁻³ and rs2401751, OR=0.92, 95% CI: 0.86–0.97, P=5.29×10⁻³). One SNP (rs7147796) in EML5 is also significantly associated with the disease (OR=1.08, 95% CI: 1.02–1.14, P=6.43×10⁻³). These 3 markers remain significant after Bonferroni correction. Furthermore, haplotype conditioned analyses indicated that the association signals observed between rs2274736/rs2401751 and rs7147796 are statistically independent. Given the results that 2 non-synonymous markers in PTPN21 are associated with schizophrenia, further investigation of this locus is warranted.
PMCID: PMC4117700  PMID: 21752600
Data-mining; Informatic prioritization; Genetic association study; PTPN21; Non-synonymous SNP
21.  Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives 
Briefings in Bioinformatics  2012;14(4):506-519.
Gene fusions are important genomic events in human cancer because their fusion gene products can drive the development of cancer and thus are potential prognostic tools or therapeutic targets in anti-cancer treatment. Major advancements have been made in computational approaches for fusion gene discovery over the past 3 years due to improvements and widespread applications of high-throughput next generation sequencing (NGS) technologies. To identify fusions from NGS data, existing methods typically leverage the strengths of both sequencing technologies and computational strategies. In this article, we review the NGS and computational features of existing methods for fusion gene detection and suggest directions for future development.
PMCID: PMC3713712  PMID: 22877769
gene fusion; next generation sequencing; cancer; whole genome sequencing; transcriptome sequencing; computational tools
22.  Gastric adenocarcinoma has a unique microRNA signature not present in esophageal adenocarcinoma 
Cancer  2013;119(11):1985-1993.
MicroRNAs (miRNAs) play critical roles in tumor development and progression. The fact that a single miRNA can regulate hundreds of genes places miRNAs at critical hubs of signaling pathways. In this study, we investigated the miRNA expression profile in gastric adenocarcinomas and compared it to esophageal adenocarcinomas to better identify a unique miRNA signature of gastric adenocarcinoma.
Methods and Results
The miRNA expression profile was obtained using Agilent and Exiqon microarray platforms on primary gastric adenocarcinoma tissue samples. The cross comparison of results identified 17 up-regulated and 12 down-regulated miRNAs that overlapped in both platforms. Quantitative real-time RT-PCR was performed for independent validation of a representative set of 8 miRNAs in gastric and esophageal adenocarcinomas as compared to normal gastric mucosa or esophageal mucosa, respectively. The de-regulation of miR-146b-5p, -375, -148a, -31, and -451 was significantly associated with gastric adenocarcinomas. On the other hand, de-regulation of miR-21 (up-regulation) and miR-133b (down-regulation) was detectable in both gastric and esophageal adenocarcinomas. Interestingly, miR-200a was significantly down-regulated in gastric adenocarcinoma (p=0.04) but up-regulated in esophageal adenocarcinoma samples (p=0.001). In addition, the expression level of miR-146b-5p displayed a strong correlation with the tumor staging of gastric cancer.
Gastric adenocarcinoma displays a unique miRNA signature that distinguishes it from esophageal adenocarcinoma. This specific signature could reflect differences in the etiology and/or molecular signaling in these two closely related cancers. Our findings suggest important miRNA candidates that can be investigated for their molecular functions and possible diagnostic, prognostic, and therapeutic role in gastric adenocarcinoma.
PMCID: PMC3731210  PMID: 23456798
miRNA; esophageal adenocarcinoma; gastric adenocarcinoma; microarray; prognosis
23.  Quantitative network mapping of the human kinome interactome reveals new clues for rational kinase inhibitor discovery and individualized cancer therapy 
Oncotarget  2014;5(11):3697-3710.
The human kinome is gaining importance through its promising cancer therapeutic targets, yet no general model to address kinase inhibitor resistance has emerged. Here, we constructed a systems biology-based framework to catalogue the human kinome, including 538 kinase genes, in the broader context of the human interactome. Specifically, we constructed three networks: a kinase-substrate interaction network containing 7,346 pairs connecting 379 kinases to 36,576 phosphorylation sites in 1,961 substrates, a protein-protein interaction network (PPIN) containing 92,699 pairs, and an atomic resolution PPIN containing 4,278 pairs. We identified the conserved regulatory phosphorylation motifs (e.g., Ser/Thr-Pro) using a sequence logo analysis. We found that the typical anticancer target selection strategy, which uses network hubs as drug targets, might lead to a high adverse drug reaction risk. Furthermore, we found that the distinct network centrality of kinases creates a high anticancer drug resistance risk by feedback or crosstalk mechanisms within cellular networks. This notion is supported by systematic network and pathway analyses showing that anticancer drug resistance genes are significantly enriched as hubs and heavily participate in multiple signaling pathways. Collectively, this comprehensive human kinome interactome map sheds light on anticancer drug resistance mechanisms and provides an innovative resource for rational kinase inhibitor design.
PMCID: PMC4116514  PMID: 25003367
Kinome; kinase-substrate interaction; phosphorylation; interactome; resistance; systems biology
24.  Network and Pathway Analysis of Cancer Susceptibility (A) 
Cancer Informatics  2014;13(Suppl 5):125-127.
PMCID: PMC4364546  PMID: 25861212
25.  Patterns and processes of somatic mutations in nine major cancers 
BMC Medical Genomics  2014;7:11.
Cancer genomes harbor hundreds to thousands of somatic nonsynonymous mutations. DNA damage and deficiency of DNA repair systems are two major forces that cause somatic mutations, marking cancer genomes with specific somatic mutation patterns. Recently, several pan-cancer genome studies revealed more than 20 mutation signatures across multiple cancer types. However, detailed cancer-type specific mutation signatures and their different features within (intra-) and between (inter-) cancer types remain largely unexplored.
We employed a matrix decomposition algorithm, namely Non-negative Matrix Factorization, to survey the somatic mutations in nine major human cancers, involving a total of ~2100 genomes.
Our results revealed 3-5 independent mutational signatures in each cancer, implying that a range of 3-5 predominant mutational processes likely underlie each cancer genome. Both mutagen exposure (tobacco and sun) and changes in DNA repair systems (APOBEC family, POLE, and MLH1) were found as mutagenesis forces, each of which marks the genome with an evident mutational signature. We studied the features of several signatures and their combinatory patterns within and across cancers. On one hand, we found each signature may influence a cancer genome with different influential magnitudes even in the same cancer type and the signature-specific load reflects intra-cancer heterogeneity (e.g., the smoking-related signature in lung cancer smokers and never smokers). On the other hand, inter-cancer heterogeneity is characterized by combinatory patterns of mutational signatures, where no cancers share the same signature profile, even between two lung cancer subtypes (lung adenocarcinoma and squamous cell lung cancer).
Our work provides a detailed overview of the mutational characteristics in each of nine major cancers and highlights that the mutational signature profile is representative of each cancer.
PMCID: PMC3942057  PMID: 24552141
Somatic mutation; Cancer; Kataegis; Mutation signature; Mutagen; Heterogeneity
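The abstract above names Non-negative Matrix Factorization as the decomposition used to extract mutational signatures. The sketch below shows the general idea with the standard multiplicative-update rule on a made-up count matrix. It is an illustration under stated assumptions, not the study's pipeline: the update rule, the rank, the toy data, and every function name are hypothetical choices.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared reconstruction error between v and the product w*h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (mutation types x genomes) as w*h.

    w (types x rank) holds the signatures and h (rank x genomes) the
    per-genome exposures. Uses multiplicative updates for squared error.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Toy data (invented): 4 mutation types, 5 genomes, generated from 2 signatures.
true_w = [[5, 0], [3, 1], [0, 4], [1, 2]]
true_h = [[1, 2, 0, 3, 1], [2, 0, 3, 1, 1]]
v = matmul(true_w, true_h)

w, h = nmf(v, rank=2)
print(frobenius_error(v, w, h))
```

In practice the number of signatures is not known in advance and has to be chosen, for example by comparing reconstruction error or stability across several ranks.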
