Results 1-25 (124)

1.  Big data - a 21st century science Maginot Line? No-boundary thinking: shifting from the big data paradigm 
BioData Mining  2015;8:7.
Whether your interests lie in scientific arenas, the corporate world, or in government, you have certainly heard the praises of big data: Big data will give you new insights, allow you to become more efficient, and/or will solve your problems. While big data has had some outstanding successes, many are now beginning to see that it is not the Silver Bullet that it has been touted to be. Here our main concern is the overall impact of big data; the current manifestation of big data is constructing a Maginot Line in science in the 21st century. Big data is no longer just “lots of data” as a phenomenon; the big data paradigm is putting the spirit of the Maginot Line into lots of data. Big data overall is disconnecting researchers from science challenges. We propose No-Boundary Thinking (NBT), which applies no-boundary thinking in problem definition to address science challenges.
PMCID: PMC4323225  PMID: 25670967
Big data; Maginot Line; No-Boundary thinking
2.  VERSE: a novel approach to detect virus integration in host genomes through reference genome customization 
Genome Medicine  2015;7(1):2.
Fueled by widespread applications of high-throughput next generation sequencing (NGS) technologies and urgent need to counter threats of pathogenic viruses, large-scale studies were conducted recently to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes, and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability as a result of virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection through customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2 that is available at
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-015-0126-6) contains supplementary material, which is available to authorized users.
PMCID: PMC4333248
3.  Protein-Protein Interaction and Pathway Analyses of Top Schizophrenia Genes Reveal Schizophrenia Susceptibility Genes Converge on Common Molecular Networks and Enrichment of Nucleosome (Chromatin) Assembly Genes in Schizophrenia Susceptibility Loci 
Schizophrenia Bulletin  2013;40(1):39-49.
Recent genome-wide association studies have identified many promising schizophrenia candidate genes and demonstrated that common polygenic variation contributes to schizophrenia risk. However, whether these genes represent perturbations to a common but limited set of underlying molecular processes (pathways) that modulate risk to schizophrenia remains elusive, and it is not known whether these genes converge on common biological pathways (networks) or represent different pathways. In addition, the theoretical and genetic mechanisms underlying the strong genetic heterogeneity of schizophrenia remain largely unknown. Using 4 well-defined data sets that contain top schizophrenia susceptibility genes and applying protein-protein interaction (PPI) network analysis, we investigated the interactions among proteins encoded by top schizophrenia susceptibility genes. We found that proteins encoded by top schizophrenia susceptibility genes formed a highly significant interconnected network, and, compared with random networks, these PPI networks are statistically highly significant for both direct connectivity and indirect connectivity. We further validated these results using empirical functional data (transcriptome data from a clinical sample). These highly significant findings indicate that top schizophrenia susceptibility genes encode proteins that interact directly and form a densely interconnected network, suggesting perturbations of common underlying molecular processes or pathways that modulate risk to schizophrenia. Our findings that schizophrenia susceptibility genes encode a highly interconnected protein network may also provide a novel explanation for the observed genetic heterogeneity of schizophrenia, i.e., mutation in any member of this molecular network will lead to the same functional consequences that eventually contribute to risk of schizophrenia.
PMCID: PMC3885298  PMID: 23671194
genome-wide association study; schizophrenia susceptibility genes; protein-protein interaction; common molecular networks; genetic heterogeneity; enrichment
4.  Integrated Approach in Systems Biology 
PMCID: PMC4297625  PMID: 25628754
5.  Synergetic regulatory networks mediated by oncogene-driven microRNAs and transcription factors in serous ovarian cancer 
Molecular bioSystems  2013;9(12):10.1039/c3mb70172g.
Although high-grade serous ovarian cancer (OVC) is the most lethal gynecologic malignancy in women, little is known about the regulatory mechanisms in the cellular processes that lead to this cancer. Recently, accumulated lines of evidence have shown that the interplay between transcription factors (TFs) and microRNAs (miRNAs) is critical in cellular regulation during tumorigenesis. A comprehensive investigation of TFs and miRNAs, and their target genes, may provide a deeper understanding of the regulatory mechanisms in the pathology of OVC. In this study, we have integrated three complementary algorithms into a framework, aiming to infer the regulation by miRNAs and TFs in conjunction with gene expression profiles. We demonstrated the utility of our framework by inferring 67 OVC-specific regulatory feed-forward loops (FFLs) initiated by miRNAs or TFs in high-grade serous OVC. By analyzing these regulatory behaviors, we found that all 67 FFLs are consistent in their regulatory effects on genes that are jointly targeted by miRNAs and TFs. Remarkably, we unveiled an unbalanced distribution of FFLs with different oncogenic effects. In total, 31 of the 67 coherent FFLs were mainly initiated by oncogenes. In contrast, only 4 of the FFLs were initiated by tumor suppressor genes. These overwhelmingly observed oncogenic genes were further detected in a sub-network with 32 FFLs centered on miRNA let-7b and TF TCF7L1 to regulate cell differentiation. Closer inspection of the 32 FFLs revealed that 75% of the miRNAs reportedly play functional roles in cell differentiation, especially when enriched in epithelial–mesenchymal transitions. This study provides a comprehensive pathophysiological overview of recurring coherent circuits in OVC that are co-regulated by miRNAs and TFs.
The prevalence of oncogenic coherent FFLs in serous OVC suggests that oncogene-driven regulatory motifs could cooperatively act upon critical cellular processes such as cell differentiation in a highly efficient and consistent manner.
PMCID: PMC3855196  PMID: 24129674
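As a rough illustration of what a feed-forward loop (FFL) is in this setting, the sketch below enumerates TF-initiated FFLs from three edge maps: a TF regulates a miRNA, and both regulate the same target gene. It is not the framework from the abstract, which combines three algorithms with expression profiles; the function name and all node names (`TF_A`, `miR_x`, `GENE1`, ...) are hypothetical placeholders.

```python
def find_ffls(tf_to_mirna, tf_to_gene, mirna_to_gene):
    """Return (tf, mirna, gene) triples where the TF regulates the miRNA
    and both the TF and the miRNA regulate the same gene."""
    ffls = []
    for tf in sorted(tf_to_mirna):
        for mirna in sorted(tf_to_mirna[tf]):
            # Genes targeted jointly by the TF and by the miRNA it regulates.
            shared = tf_to_gene.get(tf, set()) & mirna_to_gene.get(mirna, set())
            ffls.extend((tf, mirna, gene) for gene in sorted(shared))
    return ffls


# Hypothetical toy regulatory edges (placeholders, not data from the study).
tf_to_mirna = {"TF_A": {"miR_x"}, "TF_B": {"miR_y"}}
tf_to_gene = {"TF_A": {"GENE1", "GENE2"}, "TF_B": {"GENE3"}}
mirna_to_gene = {"miR_x": {"GENE1"}, "miR_y": {"GENE4"}}

loops = find_ffls(tf_to_mirna, tf_to_gene, mirna_to_gene)
print(loops)
```

A miRNA-initiated FFL could be listed the same way by swapping the roles of the first two maps.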
6.  Key regulators in prostate cancer identified by co-expression module analysis 
BMC Genomics  2014;15(1):1015.
Prostate cancer (PrCa) is the most commonly diagnosed cancer in men in the world. Despite the fact that a large number of its genes have been investigated, its etiology remains poorly understood. Furthermore, most PrCa candidate genes have not been rigorously replicated, and the methods by which they biologically function in PrCa remain largely unknown.
Aiming to identify key players in the complex prostate cancer system, we reconstructed PrCa co-expressed modules within functional gene sets defined by the Gene Ontology (GO) annotation (biological process, GO_BP). We primarily identified 118 GO_BP terms that were well-preserved between two independent gene expression datasets and a consequent 55 conserved co-expression modules within them. Five modules were then found to be significantly enriched with PrCa candidate genes collected from expression Quantitative Trait Loci (eQTL), somatic copy number alteration (SCNA), somatic mutation data, or prognostic analyses. Specifically, two transcription factors (TFs) (NFAT and SP1) and three microRNAs (hsa-miR-19a, hsa-miR-15a, and hsa-miR-200b) regulating these five candidate modules were found to be critical to the development of PrCa.
Collectively, our results indicated that genes with similar functions may play important roles in disease through co-expression, and modules with different functions could be regulated by similar genetic components, such as TFs and microRNAs, in a synergistic manner.
Electronic supplementary material
The online version of this article (doi:10.1186/1471-2164-15-1015) contains supplementary material, which is available to authorized users.
PMCID: PMC4258300  PMID: 25418933
Prostate cancer; Co-expression; Gene Ontology; Module; Transcription factor; MicroRNA
7.  MSEA: detection and quantification of mutation hotspots through mutation set enrichment analysis 
Genome Biology  2014;15(10):489.
Many cancer genes form mutation hotspots that disrupt their functional domains or active sites, leading to gain- or loss-of-function. We propose a mutation set enrichment analysis (MSEA) implemented by two novel methods, MSEA-clust and MSEA-domain, to predict cancer genes based on mutation hotspot patterns. MSEA methods are evaluated by both simulated and real cancer data. We find approximately 51% of the eligible known cancer genes form detectable mutation hotspots. Application of MSEA in eight cancers reveals a total of 82 genes with mutation hotspots, including well-studied cancer genes, known cancer genes re-found in new cancer types, and novel cancer genes.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-014-0489-9) contains supplementary material, which is available to authorized users.
PMCID: PMC4226881  PMID: 25348067
8.  Functional consequences of somatic mutations in cancer using protein pocket-based prioritization approach 
Genome Medicine  2014;6(10):81.
Recently, a number of large-scale cancer genome sequencing projects have generated a large volume of somatic mutations; however, identifying the functional consequences and roles of somatic mutations in tumorigenesis remains a major challenge. Researchers have identified that protein pocket regions play critical roles in the interaction of proteins with small molecules, enzymes, and nucleic acid. As such, investigating the features of somatic mutations in protein pocket regions provides a promising approach to identifying new genotype-phenotype relationships in cancer.
In this study, we developed a protein pocket-based computational approach to uncover the functional consequences of somatic mutations in cancer. We mapped 1.2 million somatic mutations across 36 cancer types from the COSMIC database and The Cancer Genome Atlas (TCGA) onto the protein pocket regions of over 5,000 protein three-dimensional structures. We further integrated cancer cell line mutation profiles and drug pharmacological data from the Cancer Cell Line Encyclopedia (CCLE) onto protein pocket regions in order to identify putative biomarkers for anticancer drug responses.
We found that genes harboring protein pocket somatic mutations were significantly enriched in cancer driver genes. Furthermore, genes harboring pocket somatic mutations tended to be highly co-expressed in a co-expressed protein interaction network. Using a statistical framework, we identified four putative cancer genes (RWDD1, NCF1, PLEK, and VAV3), whose expression profiles were associated with overall poor survival rates in melanoma, lung, or colorectal cancer patients. Finally, genes harboring protein pocket mutations were more likely to be drug-sensitive or drug-resistant. In a case study, we illustrated that the BAX gene was associated with the sensitivity of three anticancer drugs (midostaurin, vinorelbine, and tipifarnib).
This study provides novel insights into the functional consequences of somatic mutations during tumorigenesis and for anticancer drug responses. The computational approach used might be beneficial to the study of somatic mutations in the era of cancer precision medicine.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-014-0081-7) contains supplementary material, which is available to authorized users.
PMCID: PMC4213513  PMID: 25360158
10.  Genetic Variation in Iron Metabolism Is Associated with Neuropathic Pain and Pain Severity in HIV-Infected Patients on Antiretroviral Therapy 
PLoS ONE  2014;9(8):e103123.
HIV sensory neuropathy and distal neuropathic pain (DNP) are common, disabling complications associated with combination antiretroviral therapy (cART). We previously associated iron-regulatory genetic polymorphisms with a reduced risk of HIV sensory neuropathy during more neurotoxic types of cART. We here evaluated the impact of polymorphisms in 19 iron-regulatory genes on DNP in 560 HIV-infected subjects from a prospective, observational study, who underwent neurological examinations to ascertain peripheral neuropathy and structured interviews to ascertain DNP. Genotype-DNP associations were explored by logistic regression and permutation-based analytical methods. Among 559 evaluable subjects, 331 (59%) developed HIV-SN, and 168 (30%) reported DNP. Fifteen polymorphisms in 8 genes (p<0.05) and 5 variants in 4 genes (p<0.01) were nominally associated with DNP: polymorphisms in TF, TFRC, BMP6, ACO1, SLC11A2, and FXN conferred reduced risk (adjusted odds ratios [ORs] ranging from 0.2 to 0.7, all p<0.05); other variants in TF, CP, ACO1, BMP6, and B2M conferred increased risk (ORs ranging from 1.3 to 3.1, all p<0.05). Risks associated with some variants were statistically significant either in black or white subgroups but were consistent in direction. ACO1 rs2026739 remained significantly associated with DNP in whites (permutation p<0.0001) after correction for multiple tests. Several of the same iron-regulatory-gene polymorphisms, including ACO1 rs2026739, were also associated with severity of DNP (all p<0.05). Common polymorphisms in iron-management genes are associated with DNP and with DNP severity in HIV-infected persons receiving cART. Consistent risk estimates across population subgroups and persistence of the ACO1 rs2026739 association after adjustment for multiple testing suggest that genetic variation in iron-regulation and transport modulates susceptibility to DNP.
PMCID: PMC4140681  PMID: 25144566
11.  Methylation of promoters of microRNAs and their host genes in myelodysplastic syndromes 
Leukemia & lymphoma  2013;54(12):2720-2727.
Myelodysplastic syndromes (MDS) are a group of hematopoietic malignancies characterized by ineffective hematopoiesis. Recently, we identified MDS-associated microRNAs (miRNAs) that are down-regulated in MDS. This study examines possible explanations for that observed down-regulation of miRNA expression in MDS. Since genomic losses are insufficient to explain the down-regulation of all our MDS-associated miRNAs, we explored other avenues. We demonstrate that these miRNAs are predominantly intragenic, and that, in many cases, they and their host genes are expressed in a similar pattern during myeloid maturation, suggesting their co-regulation. This co-regulation is further supported by the down-regulation of several of the host genes in MDS and increased methylation of the shared promoters of several miRNAs and their respective host genes. These studies identify a role of hypermethylation of miRNA promoters in the down-regulation of MDS-associated miRNAs, unifying research on miRNAs in MDS and epigenetic regulation in MDS into a common pathway.
PMCID: PMC4120331  PMID: 23547841
Myelodysplastic syndrome; microRNAs; hypermethylation; host genes; myeloid maturation
12.  Top associated SNPs in prostate cancer are significantly enriched in cis-expression quantitative trait loci and at transcription factor binding sites 
Oncotarget  2014;5(15):6168-6177.
While genome-wide association studies (GWAS) have revealed thousands of disease risk single nucleotide polymorphisms (SNPs), their functions remain largely unknown. Recent studies have suggested the regulatory roles of GWAS risk variants in several common diseases; however, the complex regulatory structure in prostate cancer is unclear.
We investigated the potential regulatory roles of risk variants in two prostate cancer GWAS datasets by their interactions with expression quantitative trait loci (eQTL) and/or transcription factor binding sites (TFBSs) in three populations.
Our results indicated that the moderately associated GWAS SNPs were significantly enriched with cis-eQTLs and TFBSs in Caucasians (CEU), but not in African Americans (AA) or Japanese (JPT); this was also observed in an independent set of pan-cancer related SNPs from the GWAS Catalog. We found that the eQTL enrichment in the CEU population was tissue-specific to eQTLs from CEU lymphoblastoid cell lines. Importantly, we pinpointed two SNPs, rs2861405 and rs4766642, by overlapping results from cis-eQTL and TFBS as applied to the CEU data.
These results suggested that prostate cancer associated SNPs and pan-cancer associated SNPs are likely to play regulatory roles in CEU. However, the negative enrichment results in AA or JPT and the potential mechanisms remain to be elucidated in additional samples.
PMCID: PMC4171620  PMID: 25026280
prostate cancer; genome-wide association studies; eQTL; TFBS; regulatory variants
13.  Two non-synonymous markers in PTPN21, identified by genome-wide association study data-mining and replication, are associated with schizophrenia 
Schizophrenia research  2011;131(0):43-51.
We conducted data-mining analyses of genome wide association (GWA) studies of the CATIE and MGS-GAIN datasets, and found 13 markers in the two physically linked genes, PTPN21 and EML5, showing nominally significant association with schizophrenia. Linkage disequilibrium (LD) analysis indicated that all 7 markers from PTPN21 shared high LD (r² > 0.8), including rs2274736 and rs2401751, the two non-synonymous markers with the most significant association signals (rs2401751, P=1.10×10⁻³ and rs2274736, P=1.21×10⁻³). In a meta-analysis of all 13 replication datasets with a total of 13,940 subjects, we found that the two non-synonymous markers are significantly associated with schizophrenia (rs2274736, OR=0.92, 95% CI: 0.86–0.97, P=5.45×10⁻³ and rs2401751, OR=0.92, 95% CI: 0.86–0.97, P=5.29×10⁻³). One SNP (rs7147796) in EML5 is also significantly associated with the disease (OR=1.08, 95% CI: 1.02–1.14, P=6.43×10⁻³). These 3 markers remain significant after Bonferroni correction. Furthermore, haplotype conditioned analyses indicated that the association signals observed between rs2274736/rs2401751 and rs7147796 are statistically independent. Given the results that 2 non-synonymous markers in PTPN21 are associated with schizophrenia, further investigation of this locus is warranted.
PMCID: PMC4117700  PMID: 21752600
Data-mining; Informatic prioritization; Genetic association study; PTPN21; Non-synonymous SNP
14.  A Tri-Component Conservation Strategy Reveals Highly Confident MicroRNA-mRNA Interactions and Evolution of MicroRNA Regulatory Networks 
PLoS ONE  2014;9(7):e103142.
MicroRNAs are small non-coding RNAs that can regulate the expression of their target genes at the post-transcriptional level. In this study, we propose a tri-component strategy that combines the conservation of microRNAs, homology of mRNA coding regions, and conserved microRNA binding sites in the 3′ untranslated regions to discover conserved microRNA-mRNA interactions. To validate the performance of our conservation strategy, we collected the experimentally validated microRNA-mRNA interactions from three databases as the gold standard. We found that the proposed strategy can improve the performance of existing target prediction algorithms by approximately 2–4 fold. In addition, we demonstrated that the proposed strategy could efficiently retain highly confident interactions from the intersection results of the existing algorithms and filter out the possible false positive predictions in the union one. Furthermore, this strategy can facilitate our ability to trace the homologues in different species that are targeted by the same miRNA family because it combines these three features to identify the conserved miRNA-mRNA interactions during evolution. Through an extensive application of the proposed conservation strategy to a study of the miR-1/206 regulatory network, we demonstrate that the target mRNA recruiting process could be associated with expansion of the miRNA family during its evolution. We also uncovered the functional evolution of the miR-1/206 regulatory network. In this network, the early targeted genes tend to participate in more general and development-related functions. In summary, the conservation strategy can help highlight highly confident miRNA-mRNA interactions and can be further applied to reveal the evolutionary features of miRNA regulatory networks and functions.
PMCID: PMC4108425  PMID: 25054916
Drug responses vary greatly among individuals due to human genetic variations; the study of this variation is known as pharmacogenomics (PGx). Much of the PGx knowledge has been embedded in biomedical literature and there is a growing interest in developing text mining approaches to extract such knowledge. In this paper, we present a study to rank candidate gene-drug relations using a Latent Dirichlet Allocation (LDA) model. Our approach consists of three steps: 1) recognize gene and drug entities in MEDLINE abstracts; 2) extract candidate gene-drug pairs based on different levels of co-occurrence, including abstract level, sentence level, and phrase level; and 3) rank candidate gene-drug pairs using multiple different methods including term frequency, Chi-square test, Mutual Information (MI), a reported Kullback-Leibler (KL) distance based on topics derived from LDA (LDA-KL), and a newly defined probabilistic KL distance based on LDA (LDA-PKL). We systematically evaluated these methods by using a gold standard data set of gene-drug relations derived from PharmGKB. Our results showed that the proposed LDA-PKL method achieved better Mean Average Precision (MAP) than any of the other methods, suggesting its promising uses for ranking and detecting PGx relations.
PMCID: PMC4095990  PMID: 22174297
Gene-drug Relation; Latent Dirichlet Allocation; Ranking; Pharmacogenomics
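The co-occurrence ranking described in steps 2 and 3 can be sketched with one of the simpler baselines. The example below counts abstract-level co-occurrence and ranks pairs by pointwise mutual information; treating "Mutual Information" as the pointwise variant is an assumption, and the LDA-based distances (LDA-KL, LDA-PKL) are not reproduced here. The function name and the gene and drug labels are hypothetical placeholders with made-up counts.

```python
import math
from collections import Counter


def rank_pairs_by_pmi(docs):
    """Rank (gene, drug) pairs by pointwise mutual information.

    docs: list of (genes, drugs) tuples, one per abstract, each a set of names.
    PMI = log2( P(gene, drug) / (P(gene) * P(drug)) ), with probabilities
    estimated as document frequencies.
    """
    n = len(docs)
    gene_df, drug_df, pair_df = Counter(), Counter(), Counter()
    for genes, drugs in docs:
        gene_df.update(genes)
        drug_df.update(drugs)
        pair_df.update((g, d) for g in genes for d in drugs)
    scores = {
        (g, d): math.log2(count * n / (gene_df[g] * drug_df[d]))
        for (g, d), count in pair_df.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical entity mentions in five abstracts (placeholders only).
docs = [
    ({"GENE_A"}, {"drug_x"}),
    ({"GENE_A"}, {"drug_x"}),
    ({"GENE_A", "GENE_B"}, {"drug_y"}),
    ({"GENE_B"}, {"drug_y"}),
    ({"GENE_B"}, {"drug_x", "drug_y"}),
]
ranked = rank_pairs_by_pmi(docs)
for pair, score in ranked:
    print(pair, round(score, 3))
```

Sentence-level or phrase-level co-occurrence would use the same counting with smaller text units in place of whole abstracts.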
16.  Application of next generation sequencing to human gene fusion detection: computational tools, features and perspectives 
Briefings in Bioinformatics  2012;14(4):506-519.
Gene fusions are important genomic events in human cancer because their fusion gene products can drive the development of cancer and thus are potential prognostic tools or therapeutic targets in anti-cancer treatment. Major advancements have been made in computational approaches for fusion gene discovery over the past 3 years due to improvements and widespread applications of high-throughput next generation sequencing (NGS) technologies. To identify fusions from NGS data, existing methods typically leverage the strengths of both sequencing technologies and computational strategies. In this article, we review the NGS and computational features of existing methods for fusion gene detection and suggest directions for future development.
PMCID: PMC3713712  PMID: 22877769
gene fusion; next generation sequencing; cancer; whole genome sequencing; transcriptome sequencing; computational tools
17.  Integrative Genomics and Computational Systems Medicine 
BioMed Research International  2014;2014:945253.
PMCID: PMC4082850  PMID: 25025078
18.  Gastric adenocarcinoma has a unique microRNA signature not present in esophageal adenocarcinoma 
Cancer  2013;119(11):1985-1993.
MicroRNAs (miRNAs) play critical roles in tumor development and progression. The fact that a single miRNA can regulate hundreds of genes places miRNAs at critical hubs of signaling pathways. In this study, we investigated the miRNA expression profile in gastric adenocarcinomas and compared it to esophageal adenocarcinomas to better identify a unique miRNA signature of gastric adenocarcinoma.
Methods and Results
The miRNA expression profile was obtained using Agilent and Exiqon microarray platforms on primary gastric adenocarcinoma tissue samples. The cross comparison of results identified 17 up-regulated and 12 down-regulated miRNAs that overlapped in both platforms. Quantitative real-time RT-PCR was performed for independent validation of a representative set of 8 miRNAs in gastric and esophageal adenocarcinomas as compared to normal gastric mucosa or esophageal mucosa, respectively. The de-regulation of miR-146b-5p, -375, -148a, -31, and -451 was significantly associated with gastric adenocarcinomas. On the other hand, de-regulation of miR-21 (up-regulation) and miR-133b (down-regulation) was detectable in both gastric and esophageal adenocarcinomas. Interestingly, miR-200a was significantly down-regulated in gastric adenocarcinoma (p=0.04) but up-regulated in esophageal adenocarcinoma samples (p=0.001). In addition, the expression level of miR-146b-5p displayed a strong correlation with the tumor staging of gastric cancer.
Gastric adenocarcinoma displays a unique miRNA signature that distinguishes it from esophageal adenocarcinoma. This specific signature could reflect differences in the etiology and/or molecular signaling in these two closely related cancers. Our findings suggest important miRNA candidates that can be investigated for their molecular functions and possible diagnostic, prognostic, and therapeutic role in gastric adenocarcinoma.
PMCID: PMC3731210  PMID: 23456798
miRNA; esophageal adenocarcinoma; gastric adenocarcinoma; microarray; prognosis
19.  Quantitative network mapping of the human kinome interactome reveals new clues for rational kinase inhibitor discovery and individualized cancer therapy 
Oncotarget  2014;5(11):3697-3710.
The human kinome is gaining importance through its promising cancer therapeutic targets, yet no general model to address kinase inhibitor resistance has emerged. Here, we constructed a systems biology-based framework to catalogue the human kinome, including 538 kinase genes, in the broader context of the human interactome. Specifically, we constructed three networks: a kinase-substrate interaction network containing 7,346 pairs connecting 379 kinases to 36,576 phosphorylation sites in 1,961 substrates, a protein-protein interaction network (PPIN) containing 92,699 pairs, and an atomic resolution PPIN containing 4,278 pairs. We identified the conserved regulatory phosphorylation motifs (e.g., Ser/Thr-Pro) using a sequence logo analysis. We found that the typical anticancer target selection strategy, which uses network hubs as drug targets, might lead to a high adverse drug reaction risk. Furthermore, we found that the distinct network centrality of kinases creates a high anticancer drug resistance risk by feedback or crosstalk mechanisms within cellular networks. This notion is supported by systematic network and pathway analyses showing that anticancer drug resistance genes are significantly enriched as hubs and heavily participate in multiple signaling pathways. Collectively, this comprehensive human kinome interactome map sheds light on anticancer drug resistance mechanisms and provides an innovative resource for rational kinase inhibitor design.
PMCID: PMC4116514  PMID: 25003367
Kinome; kinase-substrate interaction; phosphorylation; interactome; resistance; systems biology
20.  A Rapid Association Test Procedure Robust under Different Genetic Models Accounting for Population Stratification 
Human heredity  2013;75(1):23-33.
For genome-wide association studies (GWAS) in case-control data with stratification, a commonly used association test is the generalized Armitage (GA) trend test implemented in the software EIGENSTRAT. The GA trend test uses principal component analysis to correct for population stratification. It usually assumes an additive disease model and can have high power when the underlying disease model is additive or multiplicative, but may have relatively low power when the underlying disease model is recessive or dominant. The purpose of this paper is to provide a test procedure for GWAS with increased power over the GA trend test under the recessive and dominant models while maintaining the power of the GA trend test under the additive and multiplicative models.
We extend a Hardy-Weinberg disequilibrium (HWD) trend test for a homogeneous population to account for population stratification, and then propose a robust association test procedure for GWAS that incorporates information from the extended HWD trend test into the GA trend test.
Results and Conclusions
Our simulation studies and application of our method to a GWAS data set indicate that our proposed method can achieve the purpose described above.
PMCID: PMC3786013  PMID: 23571404
generalized sequential Bonferroni procedure; genome-wide association studies; Hardy-Weinberg trend test; robust test; recessive model
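For orientation, the sketch below computes the plain Cochran-Armitage trend test with additive weights (0, 1, 2) on a single 2×3 genotype table. It is a simplified stand-in: it does not include the principal-component correction for stratification used by the generalized Armitage test in EIGENSTRAT, nor the proposed HWD-based extension. The function name and the genotype counts are illustrative assumptions.

```python
import math


def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for one SNP in an unstratified sample.

    cases, controls: counts per genotype (e.g. aa, Aa, AA).
    Returns (chi-square statistic with 1 df, p-value).
    """
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_xr = sum(x * r for x, r in zip(weights, cases))
    sum_xn = sum(x * t for x, t in zip(weights, totals))
    sum_x2n = sum(x * x * t for x, t in zip(weights, totals))
    denom = n_cases * n_controls * (n * sum_x2n - sum_xn ** 2)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no variation to test
    # Equivalent to N times the squared correlation between score and status.
    chi2 = n * (n * sum_xr - n_cases * sum_xn) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up genotype counts for illustration only.
chi2, p_value = armitage_trend_test(cases=(10, 20, 30), controls=(30, 20, 10))
print(chi2, p_value)
```

Changing `weights` to (0, 0, 1) or (0, 1, 1) gives the recessive and dominant scorings, which is where the additive test is said to lose power.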
21.  Patterns and processes of somatic mutations in nine major cancers 
BMC Medical Genomics  2014;7:11.
Cancer genomes harbor hundreds to thousands of somatic nonsynonymous mutations. DNA damage and deficiency of DNA repair systems are two major forces to cause somatic mutations, marking cancer genomes with specific somatic mutation patterns. Recently, several pan-cancer genome studies revealed more than 20 mutation signatures across multiple cancer types. However, detailed cancer-type specific mutation signatures and their different features within (intra-) and between (inter-) cancer types remain largely unexplored.
We employed a matrix decomposition algorithm, namely Non-negative Matrix Factorization, to survey the somatic mutations in nine major human cancers, involving a total of ~2100 genomes.
Our results revealed 3-5 independent mutational signatures in each cancer, implying that a range of 3-5 predominant mutational processes likely underlie each cancer genome. Both mutagen exposure (tobacco and sun) and changes in DNA repair systems (APOBEC family, POLE, and MLH1) were found as mutagenesis forces, each of which marks the genome with an evident mutational signature. We studied the features of several signatures and their combinatory patterns within and across cancers. On one hand, we found each signature may influence a cancer genome with different influential magnitudes even in the same cancer type and the signature-specific load reflects intra-cancer heterogeneity (e.g., the smoking-related signature in lung cancer smokers and never smokers). On the other hand, inter-cancer heterogeneity is characterized by combinatory patterns of mutational signatures, where no cancers share the same signature profile, even between two lung cancer subtypes (lung adenocarcinoma and squamous cell lung cancer).
Our work provides a detailed overview of the mutational characteristics in each of nine major cancers and highlights that the mutational signature profile is representative of each cancer.
PMCID: PMC3942057  PMID: 24552141
Somatic mutation; Cancer; Kataegis; Mutation signature; Mutagen; Heterogeneity
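The decomposition step can be illustrated with a small Non-negative Matrix Factorization that uses the Lee-Seung multiplicative updates for squared error. This is a sketch under assumptions: the abstract does not say which NMF variant, initialization, or rank-selection procedure was used, and the 4×4 count matrix (mutation types by samples) is a made-up toy built from two patterns.

```python
import random


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


# Toy counts: rows are mutation types, columns are tumor samples.
counts = [
    [5, 1, 3, 0],
    [5, 1, 3, 0],
    [0, 4, 2, 6],
    [0, 4, 2, 6],
]
signatures, exposures = nmf(counts, k=2)
print(frobenius_error(counts, signatures, exposures))
```

In this reading, the columns of `signatures` play the role of mutational signatures and the rows of `exposures` give each sample's load per signature.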
22.  Network-Assisted Prediction of Potential Drugs for Addiction 
BioMed Research International  2014;2014:258784.
Drug addiction is a chronic and complex brain disease that places a heavy burden on the community. Though numerous efforts have been made to identify effective treatments, it is necessary to find more novel therapeutics for this complex disease. As network pharmacology has become a promising approach for drug repurposing, we proposed to apply the approach to drug addiction, which might provide new clues for the development of effective addiction treatment drugs. We first extracted 44 addictive drugs from the NIDA and their targets from DrugBank. Then, we constructed two networks: an addictive drug-target network and an expanded addictive drug-target network by adding other drugs that have at least one common target with these addictive drugs. By performing network analyses, we found that those addictive drugs with similar actions tended to cluster together. Additionally, we predicted 94 nonaddictive drugs with potential pharmacological functions to the addictive drugs. By examining the PubMed data, we found that 51 drugs co-occurred with addictive keywords significantly more often than expected. Thus, the network analyses provide a list of candidate drugs for further investigation of their potential in addiction treatment or risk.
PMCID: PMC3932722  PMID: 24689033
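The network expansion step described in this abstract (adding other drugs that share at least one target with an addictive drug) can be illustrated with a minimal sketch. The function name, the dictionary layout, and the drug and target labels below are hypothetical placeholders, not data or code from the paper; the actual study used NIDA and DrugBank records.

```python
def expand_drug_target_network(addictive_targets, other_targets):
    """Find non-addictive drugs sharing at least one target with an addictive drug.

    Both arguments map a drug name to a set of target names (illustrative layout).
    Returns {other_drug: set of addictive drugs it shares a target with}.
    """
    # Index each target by the addictive drugs that act on it.
    target_to_addictive = {}
    for drug, targets in addictive_targets.items():
        for target in targets:
            target_to_addictive.setdefault(target, set()).add(drug)

    # Keep only the other drugs that hit at least one indexed target.
    linked = {}
    for drug, targets in other_targets.items():
        shared = set()
        for target in targets:
            shared |= target_to_addictive.get(target, set())
        if shared:
            linked[drug] = shared
    return linked


# Toy, made-up example: drug and target labels are placeholders.
addictive = {"drugA": {"T1", "T2"}, "drugB": {"T3"}}
others = {"drugX": {"T2", "T9"}, "drugY": {"T8"}, "drugZ": {"T1", "T3"}}
expanded = expand_drug_target_network(addictive, others)
```

In this toy input, `drugY` shares no target with any addictive drug and is therefore left out of the expanded network.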
23.  VarWalker: Personalized Mutation Network Analysis of Putative Cancer Genes from Next-Generation Sequencing Data 
PLoS Computational Biology  2014;10(2):e1003460.
A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on the mutation frequencies of single genes; they lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large-scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutated genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method to >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
Author Summary
A cancer genome typically harbors both driver mutations, which contribute to tumorigenesis, and passenger mutations, which tend to be neutral and occur randomly. Cancer genomes differ dramatically due to genetic and environmental factors. A major challenge in interpreting the large volume of mutation data identified in cancer genomes using next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. Applying our approach in a large cohort of lung adenocarcinoma samples and melanoma samples, we derived a consensus mutation subnetwork for each cancer containing significantly enriched cancer genes and cancer-related functional pathways. Our results indicated that driver genes occur within a broad spectrum of frequency, interact with each other, and converge in several key pathways that play critical roles in tumorigenesis.
PMCID: PMC3916227  PMID: 24516372
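The abstract above names the Random Walk with Restart (RWR) algorithm as the way interactors are selected in a protein-protein interaction network. As a rough illustration of the generic technique only, the sketch below iterates p(t+1) = (1 - r) W p(t) + r p(0) on a small undirected graph. The restart probability, the convergence tolerance, the degree-based normalization, and the toy graph are all assumptions made for this example; VarWalker's actual settings and implementation are not given here.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Generic RWR on an undirected graph given as {node: set of neighbours}.

    Every node must appear as a key, and every seed must be a node.
    Returns {node: steady-state visiting probability}.
    """
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart`, the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: its remaining mass stays in place.
                nxt[u] += (1.0 - restart) * p[u]
                continue
            # Otherwise the walker moves to a uniformly chosen neighbour.
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D with gene A as the only seed (made-up labels).
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(graph, seeds={"A"})
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes closer to the seed receive higher scores, which is the property that makes RWR useful for picking close interactors of a mutated gene.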
24.  An evidence-based knowledgebase of pulmonary arterial hypertension to identify genes and pathways relevant to pathogenesis† 
†Electronic supplementary information (ESI) available. See DOI: 10.1039/c3mb70496c
Molecular Biosystems  2014;10(4):732-740.
First literature-based, high-quality gene resource focused on pulmonary arterial hypertension (PAH) to identify genes and pathways relevant to PAH pathogenesis.
Pulmonary arterial hypertension (PAH) is a major progressive form of pulmonary hypertension (PH) with more than 4800 patients in the United States. In the last two decades, many studies have identified numerous genes associated with this disease. However, there is no comprehensive research resource for PAH or other PH types that integrates various genetic studies and their related biological information. Thus, the number of associated genes, and their strength of evidence, is unclear. In this study, we tested the hypothesis that a web-based knowledgebase could be used to develop a biological map of highly interrelated, functionally important genes in PAH. We developed the pulmonary arterial hypertension knowledgebase (PAHKB), a comprehensive database with a user-friendly web interface. PAHKB extracts genetic data from all available sources, including association studies, genetic mutations, gene expression, animal models, supporting literature, various genomic annotations, gene networks, cellular and regulatory pathways, and microRNAs. Moreover, PAHKB provides online tools for data browsing and searching, data integration, pathway graphical presentation, and gene ranking. In the current release, PAHKB contains 341 human PH-related genes (293 protein coding and 48 non-coding genes) curated from over 1000 PubMed abstracts. Based on the top 39 ranked PAH-related genes in PAHKB, we constructed a core biological map. This core map was enriched with the TGF-beta signaling pathway, focal adhesion, cytokine–cytokine receptor interaction, and MAPK signaling. In addition, the reconstructed map elucidates several novel cancer signaling pathways, which may provide clues to support the application of anti-cancer therapeutics to PAH. In summary, we have developed a system for the identification of core PH-related genes and identified critical signaling pathways that may be relevant to PAH pathogenesis. This system can be easily applied to other pulmonary diseases.
PMCID: PMC3950334  PMID: 24448676
25.  Interdisciplinary dialogue for education, collaboration, and innovation: Intelligent Biology and Medicine in and beyond 2013 
BMC Genomics  2013;14(Suppl 8):S1.
The 2013 International Conference on Intelligent Biology and Medicine (ICIBM 2013) was held on August 11-13, 2013 in Nashville, Tennessee, USA. The conference included six scientific sessions, two tutorial sessions, one workshop, two poster sessions, and four keynote presentations that covered cutting-edge research topics in bioinformatics, systems biology, computational medicine, and intelligent computing. Here, we present a summary of the conference and an editorial report of the supplements to BMC Genomics and BMC Systems Biology that include 19 research papers selected from ICIBM 2013.
PMCID: PMC4042234  PMID: 24564388
