Results 1-24 (24)

1.  Inflammation induced oxidative stress mediates gene fusion formation in prostate cancer 
Cell reports  2016;17(10):2620-2631.
Approximately 50% of prostate cancers are associated with gene fusions of the androgen-regulated gene, TMPRSS2, to the oncogenic ETS transcription factor, ERG. The three-dimensional proximity of the TMPRSS2 and ERG genes, in combination with DNA breaks, facilitates the formation of TMPRSS2-ERG gene fusions. However, the origins of DNA breaks that underlie gene fusion formation in prostate cancers are far from clear. We demonstrate a role for inflammation-induced oxidative stress in the formation of DNA breaks leading to recurrent TMPRSS2-ERG gene fusions. The transcriptional status and epigenetic features of the target genes influence this effect. Importantly, inflammation-induced de novo genomic rearrangements are blocked by homologous recombination (HR) and promoted by non-homologous end-joining (NHEJ) pathways. In conjunction with the association of proliferative inflammatory atrophy (PIA) with human prostate cancer, our results support a working model in which recurrent genomic rearrangements induced by inflammatory stimuli lead to the development of prostate cancer.
While there is considerable evidence in the literature linking inflammation to the development of prostate cancer, there are few direct links to recurrent driver gene mutations. Mani et al. find a role for inflammation-induced oxidative stress in the formation of DNA breaks leading to recurrent TMPRSS2-ERG gene fusions.
PMCID: PMC5147555  PMID: 27926866
2.  Heat Shock Protein Beta-1 Modifies Anterior to Posterior Purkinje Cell Vulnerability in a Mouse Model of Niemann-Pick Type C Disease 
PLoS Genetics  2016;12(5):e1006042.
Selective neuronal vulnerability is characteristic of most degenerative disorders of the CNS, yet mechanisms underlying this phenomenon remain poorly characterized. Many forms of cerebellar degeneration exhibit an anterior-to-posterior gradient of Purkinje cell loss including Niemann-Pick type C1 (NPC) disease, a lysosomal storage disorder characterized by progressive neurological deficits that often begin in childhood. Here, we sought to identify candidate genes underlying vulnerability of Purkinje cells in anterior cerebellar lobules using data freely available in the Allen Brain Atlas. This approach led to the identification of 16 candidate neuroprotective or susceptibility genes. We demonstrate that one candidate gene, heat shock protein beta-1 (HSPB1), promoted neuronal survival in cellular models of NPC disease through a mechanism that involved inhibition of apoptosis. Additionally, we show that over-expression of wild type HSPB1 or a phosphomimetic mutant in NPC mice slowed the progression of motor impairment and diminished cerebellar Purkinje cell loss. We confirmed the modulatory effect of Hspb1 on Purkinje cell degeneration in vivo, as knockdown by Hspb1 shRNA significantly enhanced neuron loss. These results suggest that strategies to promote HSPB1 activity may slow the rate of cerebellar degeneration in NPC disease and highlight the use of bioinformatics tools to uncover pathways leading to neuronal protection in neurodegenerative disorders.
Author Summary
Niemann-Pick type C1 (NPC) disease is an autosomal recessive lipid storage disorder for which there is no effective treatment. Patients develop a clinically heterogeneous phenotype that typically includes childhood onset neurodegeneration and early death. Mice with loss of function mutations in the Npc1 gene model many aspects of the human disease, including cerebellar degeneration that results in marked ataxia. Cerebellar Purkinje cells in mutant mice exhibit striking selective vulnerability, with neuron loss in anterior lobules and preservation in posterior lobules. As this anterior to posterior gradient is reproduced following cell autonomous deletion of Npc1 and is also observed in other forms of cerebellar degeneration, we hypothesized that it is mediated by differential gene expression. To test this notion, we probed the Allen Brain Atlas to identify 16 candidate neuroprotective or susceptibility genes. We confirmed that one of these genes, encoding the small heat shock protein Hspb1, promotes survival in cell culture models of NPC disease. Moreover, we found that modulating Hspb1 expression in NPC mice promoted (following over-expression) or diminished (following knock-down) Purkinje cell survival, confirming its neuroprotective activity. We suggest that this approach may be similarly used in other diseases to uncover pathways that modify selective neuronal vulnerability.
PMCID: PMC4859571  PMID: 27152617
3.  Non-coding RNA LINC00857 is predictive of poor patient survival and promotes tumor progression via cell cycle regulation in lung cancer 
Oncotarget  2016;7(10):11487-11499.
We employed next-generation RNA sequencing analysis to reveal dysregulated long non-coding RNAs (lncRNAs) in lung cancer utilizing 461 lung adenocarcinomas (LUAD) and 156 normal lung tissues from 3 separate institutions. We identified 281 lncRNAs with significant differential expression between LUAD and normal lung tissue. LINC00857, one of the top deregulated lncRNAs, was overexpressed in tumors and significantly associated with poor survival in LUAD. Knockdown of LINC00857 with siRNAs decreased tumor cell proliferation, colony formation, migration and invasion in vitro, as well as tumor growth in vivo. Overexpression of LINC00857 increased cancer cell proliferation, colony formation and invasion. Mechanistic analyses indicated that LINC00857 mediates tumor progression via cell cycle regulation. Our study highlights the diagnostic/prognostic potential of LINC00857 in LUAD, delineates the functional and mechanistic aspects of its aberrant disease-specific expression, and suggests its potential use as a new therapeutic target.
PMCID: PMC4905488  PMID: 26862852
non-coding RNA; LINC00857; lung adenocarcinoma; prognosis; diagnosis
4.  Landscape of gene fusions in epithelial cancers: seq and ye shall find 
Genome Medicine  2015;7:129.
Enabled by high-throughput sequencing approaches, epithelial cancers across a range of tissue types are seen to harbor gene fusions as integral to their landscape of somatic aberrations. Although many gene fusions are found at high frequency in several rare solid cancers, apart from fusions involving the ETS family of transcription factors which have been seen in approximately 50 % of prostate cancers, several other common solid cancers have been shown to harbor recurrent gene fusions at low frequencies. On the other hand, many gene fusions involving oncogenes, such as those encoding ALK, RAF or FGFR kinase families, have been detected across multiple different epithelial carcinomas. Tumor-specific gene fusions can serve as diagnostic biomarkers or help define molecular subtypes of tumors; for example, gene fusions involving oncogenes such as ERG, ETV1, TFE3, NUT, POU5F1, NFIB, PLAG1, and PAX8 are diagnostically useful. Tumors with fusions involving therapeutically targetable genes such as ALK, RET, BRAF, RAF1, FGFR1–4, and NOTCH1–3 have immediate implications for precision medicine across tissue types. Thus, ongoing cancer genomic and transcriptomic analyses for clinical sequencing need to delineate the landscape of gene fusions. Prioritization of potential oncogenic “drivers” from “passenger” fusions, and functional characterization of potentially actionable gene fusions across diverse tissue types, will help translate these findings into clinical applications. Here, we review recent advances in gene fusion discovery and the prospects for medicine.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-015-0252-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4683719  PMID: 26684754
5.  Transcriptome Meta-Analysis of Lung Cancer Reveals Recurrent Aberrations in NRG1 and Hippo Pathway Genes 
Nature communications  2014;5:5893.
Lung cancer is emerging as a paradigm for disease molecular subtyping, facilitating targeted therapy based on driving somatic alterations. Here, we perform transcriptome analysis of 153 samples representing lung adenocarcinomas, squamous cell carcinomas, large cell lung cancer, adenoid cystic carcinomas and cell lines. By integrating our data with The Cancer Genome Atlas and published sources, we analyze 753 lung cancer samples for gene fusions and other transcriptomic alterations. We show that a higher number of gene fusions is an independent prognostic factor for poor survival in lung cancer. Our analysis confirms the recently reported CD74-NRG1 fusion and suggests that NRG1, NF1 and Hippo pathway fusions may play important roles in tumors without known driver mutations. In addition, we observe exon skipping events in c-MET, which are attributable to splice site mutations. These classes of genetic aberrations may play a significant role in the genesis of lung cancers lacking known driver mutations.
PMCID: PMC4274748  PMID: 25531467
7.  Outlier Kinase Expression by RNA Sequencing as Targets for Precision Therapy 
Cancer discovery  2013;3(3):280-293.
Protein kinases represent the most effective class of therapeutic targets in cancer; therefore determination of kinase aberrations is a major focus of cancer genomic studies. Here, we analyzed transcriptome sequencing data from a compendium of 482 cancer and benign samples from 25 different tissue types, and defined distinct ‘outlier kinases’ in individual breast and pancreatic cancer samples, based on highest levels of absolute and differential expression. Frequent outlier kinases in breast cancer included therapeutic targets like ERBB2 and FGFR4, distinct from MET, AKT2, and PLK2 in pancreatic cancer. Outlier kinases imparted sample-specific dependencies in various cell lines as tested by siRNA knockdown and/or pharmacologic inhibition. Outlier expression of polo-like kinases was observed in a subset of KRAS-dependent pancreatic cancer cell lines, and conferred increased sensitivity to the pan-PLK inhibitor BI 6727. Our results suggest that outlier kinases represent effective precision therapeutic targets that are readily identifiable through RNA-sequencing of tumors.
PMCID: PMC3597439  PMID: 23384775
Pancreatic Cancer; RNA-Seq; Kinases; Outlier Expression; Personalized Medicine
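The abstract above defines outlier kinases by combining absolute and differential expression. A minimal Python sketch of that idea follows; the function name `outlier_kinases`, the thresholds `min_abs` and `min_fold`, the pseudocount, and the example values are all hypothetical illustrations and are not taken from the paper.

```python
from statistics import median

def outlier_kinases(sample, cohort, min_abs=20.0, min_fold=5.0):
    """Flag kinases that are both highly expressed in `sample` (absolute)
    and far above the cohort median (differential).

    sample: dict gene -> expression value in one sample
    cohort: dict gene -> list of expression values across reference samples
    Thresholds are illustrative assumptions, not published cutoffs.
    """
    hits = []
    for gene, value in sample.items():
        baseline = median(cohort[gene])
        # Pseudocount of 1 avoids division by zero for unexpressed genes.
        fold = (value + 1.0) / (baseline + 1.0)
        if value >= min_abs and fold >= min_fold:
            hits.append((gene, value, fold))
    # Strongest differential expression first.
    return sorted(hits, key=lambda h: h[2], reverse=True)

# Made-up expression values in arbitrary units.
sample = {"ERBB2": 250.0, "AKT2": 12.0, "PLK2": 30.0}
cohort = {
    "ERBB2": [10, 12, 8, 250],
    "AKT2": [10, 11, 12, 13],
    "PLK2": [25, 28, 30, 35],
}
flagged = [gene for gene, _, _ in outlier_kinases(sample, cohort)]
```

Requiring both criteria keeps genes that are merely highly expressed everywhere (PLK2 in this toy cohort) from being reported as sample-specific outliers.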
9.  Identification of Targetable FGFR Gene Fusions in Diverse Cancers 
Cancer discovery  2013;3(6):636-647.
Through a prospective clinical sequencing program for advanced cancers, four index cases were identified which harbor gene rearrangements of FGFR2 including patients with cholangiocarcinoma, breast cancer, and prostate cancer. After extending our assessment of FGFR rearrangements across multiple tumor cohorts, we identified additional FGFR gene fusions with intact kinase domains in lung squamous cell cancer, bladder cancer, thyroid cancer, oral cancer, glioblastoma, and head and neck squamous cell cancer. All FGFR fusion partners tested exhibit oligomerization capability, suggesting a shared mode of kinase activation. Overexpression of FGFR fusion proteins induced cell proliferation. Two bladder cancer cell lines that harbor FGFR3 fusion proteins exhibited enhanced susceptibility to pharmacologic inhibition in vitro and in vivo. Due to the combinatorial possibilities of FGFR family fusion to a variety of oligomerization partners, clinical sequencing efforts which incorporate transcriptome analysis for gene fusions are poised to identify rare, targetable FGFR fusions across diverse cancer types.
PMCID: PMC3694764  PMID: 23558953
MI-ONCOSEQ; integrative clinical sequencing; FGFR fusions; driver mutations; therapeutic targets
10.  Identification of Recurrent NAB2-STAT6 Gene Fusions in Solitary Fibrous Tumor by Integrative Sequencing 
Nature genetics  2013;45(2):180-185.
A 44-year old woman with recurrent solitary fibrous tumor (SFT)/hemangiopericytoma was enrolled in a clinical sequencing program including whole exome and transcriptome sequencing. A gene fusion of the transcriptional repressor NAB2 with the transcriptional activator STAT6 was detected. Transcriptome sequencing of 27 additional SFTs all revealed the presence of a NAB2-STAT6 gene fusion. Using RT-PCR and sequencing, we detected this fusion in 51 of 51 SFTs, indicating high levels of recurrence. Expression of NAB2-STAT6 fusion proteins was confirmed in SFT, and the predicted fusion products harbor the early growth response (EGR)-binding domain of NAB2 fused to the activation domain of STAT6. Overexpression of the NAB2-STAT6 gene fusion induced proliferation in cultured cells and activated EGR-responsive genes. These studies establish NAB2-STAT6 as the defining driver mutation of SFT and provide an example of how neoplasia can be initiated by converting a transcriptional repressor of mitogenic pathways into a transcriptional activator.
PMCID: PMC3654808  PMID: 23313952
11.  SLC45A3-ELK4 Chimera in Prostate Cancer: Spotlight on Cis-Splicing 
Cancer discovery  2012;2(7):582-585.
Using a series of detailed experiments, Zhang et al. establish that the prostate cancer RNA chimera SLC45A3-ELK4 is generated by cis-splicing between the two adjacent genes and does not involve DNA rearrangements or trans-splicing. The chimera expression is induced by androgen treatment, likely by overcoming the read-through block imposed by the intergenic CCCTC-insulators bound by the CTCF repressor protein. The chimeric transcript, but not wild type ELK4, is shown to augment prostate cancer cell proliferation.
PMCID: PMC3597435  PMID: 22787087
12.  Expressed Pseudogenes in the Transcriptional Landscape of Human Cancers 
Cell  2012;149(7):1622-1634.
Pseudogene transcripts can provide a novel tier of gene regulation through generation of endogenous siRNAs or miRNA-binding sites. Characterization of pseudogene expression, however, has remained confined to anecdotal observations due to analytical challenges posed by the extremely close sequence similarity with their counterpart coding genes. Here, we describe a systematic analysis of pseudogene “transcription” from an RNA-Seq resource of 293 samples, representing 13 cancer and normal tissue types, and observe a surprisingly prevalent, genome-wide expression of pseudogenes that could be categorized as ubiquitously expressed or lineage and/or cancer specific. Further, we explore disease subtype specificity and functions of selected expressed pseudogenes. Taken together, we provide evidence that transcribed pseudogenes are a significant contributor to the transcriptional landscape of cells and are positioned to play significant roles in cellular differentiation and cancer progression, especially in light of the recently described ceRNA networks. Our work provides a transcriptome resource that enables high-throughput analyses of pseudogene expression.
PMCID: PMC3597446  PMID: 22726445
13.  miRConnect 2.0: identification of oncogenic, antagonistic miRNA families in three human cancers 
BMC Genomics  2013;14:179.
Based on their function in cancer micro(mi)RNAs are often grouped as either tumor suppressors or oncogenes. However, miRNAs regulate multiple tumor relevant signaling pathways raising the question whether two oncogenic miRNAs could be functional antagonists by promoting different steps in tumor progression. We recently developed a method to connect miRNAs to biological function by comparing miRNA and gene array expression data from the NCI60 cell lines without using miRNA target predictions (miRConnect).
We have now extended this analysis to three primary human cancers (ovarian cancer, glioblastoma multiforme, and kidney renal clear cell carcinoma) available at the Cancer Genome Atlas (TCGA), and have correlated the expression of the clustered miRNAs with 158 oncogenic signatures (miRConnect 2.0). We have identified functionally antagonistic groups of miRNAs. One group (the agonists), which contains many of the members of the miR-17 family, correlated with c-Myc induced genes and E2F gene signatures. A group that was directly antagonistic to the agonists in all three primary cancers contains miR-221 and miR-222. Since both miR-17~92 and miR-221/222 are considered to be oncogenic, this points to a functional antagonism of different oncogenic miRNAs. Analysis of patient data revealed that in certain patients agonistic miRNAs predominated, whereas in other patients antagonists predominated. In glioblastoma, a high ratio of miR-17 to miR-221/222 was predictive of better overall survival, suggesting that high miR-221/222 expression is more adverse for patients than high miR-17 expression.
miRConnect 2.0 is useful for identifying activities of miRNAs that are relevant to primary cancers. The new correlation data on miRNAs and mRNAs deregulated in three primary cancers are available at
PMCID: PMC3637148  PMID: 23497354
Oncogenes; Tumor suppressors; Gene array; microRNA (miRNA) groups; NCI60 cell lines
14.  Personalized Oncology Through Integrative High-Throughput Sequencing: A Pilot Study 
Science translational medicine  2011;3(111):111ra121.
Individual cancers harbor a set of genetic aberrations that can be informative for identifying rational therapies currently available or in clinical trials. We implemented a pilot study to explore the practical challenges of applying high-throughput sequencing in clinical oncology. We enrolled patients with advanced or refractory cancer who were eligible for clinical trials. For each patient, we performed whole-genome sequencing of the tumor, targeted whole-exome sequencing of tumor and normal DNA, and transcriptome sequencing (RNA-Seq) of the tumor to identify potentially informative mutations in a clinically relevant time frame of 3 to 4 weeks. With this approach, we detected several classes of cancer mutations including structural rearrangements, copy number alterations, point mutations, and gene expression alterations. A multidisciplinary Sequencing Tumor Board (STB) deliberated on the clinical interpretation of the sequencing results obtained. We tested our sequencing strategy on human prostate cancer xenografts. Next, we enrolled two patients into the clinical protocol and were able to review the results at our STB within 24 days of biopsy. The first patient had metastatic colorectal cancer in which we identified somatic point mutations in NRAS, TP53, AURKA, FAS, and MYH11, plus amplification and overexpression of cyclin-dependent kinase 8 (CDK8). The second patient had malignant melanoma, in which we identified a somatic point mutation in HRAS and a structural rearrangement affecting CDKN2C. The STB identified the CDK8 amplification and Ras mutation as providing a rationale for clinical trials with CDK inhibitors or MEK (mitogen-activated or extracellular signal–regulated protein kinase kinase) and PI3K (phosphatidylinositol 3-kinase) inhibitors, respectively. Integrative high-throughput sequencing of patients with advanced cancer generates a comprehensive, individual mutational landscape to facilitate biomarker-driven clinical trials in oncology.
PMCID: PMC3476478  PMID: 22133722
15.  Gene Fusion Markup Language: a prototype for exchanging gene fusion data 
BMC Bioinformatics  2012;13:269.
An avalanche of next generation sequencing (NGS) studies has generated an unprecedented amount of genomic structural variation data. These studies have also identified many novel gene fusion candidates with more detailed resolution than previously achieved. However, in the excitement and necessity of publishing the observations from this recently developed cutting-edge technology, no community standardization approach has arisen to organize and represent the data with the essential attributes in an interchangeable manner. As transcriptome studies have been widely used for gene fusion discoveries, the current non-standard mode of data representation could potentially impede data accessibility, critical analyses, and further discoveries in the near future.
Here we propose a prototype, Gene Fusion Markup Language (GFML) as an initiative to provide a standard format for organizing and representing the significant features of gene fusion data. GFML will offer the advantage of representing the data in a machine-readable format to enable data exchange, automated analysis interpretation, and independent verification. As this database-independent exchange initiative evolves it will further facilitate the formation of related databases, repositories, and analysis tools. The GFML prototype is made available at
The Gene Fusion Markup Language (GFML) presented here could facilitate the development of a standard format for organizing, integrating and representing the significant features of gene fusion data in an inter-operable and query-able fashion that will enable biologically intuitive access to gene fusion findings and expedite functional characterization. A similar model is envisaged for other NGS data analyses.
PMCID: PMC3607969  PMID: 23072312
16.  Gene Fusions Associated with Recurrent Amplicons Represent a Class of Passenger Aberrations in Breast Cancer
Neoplasia (New York, N.Y.)  2012;14(8):702-708.
Application of high-throughput transcriptome sequencing has spurred highly sensitive detection and discovery of gene fusions in cancer, but distinguishing potentially oncogenic fusions from random, “passenger” aberrations has proven challenging. Here we examine a distinctive group of gene fusions that involve genes present in the loci of chromosomal amplifications—a class of oncogenic aberrations that are widely prevalent in breast cancers. Integrative analysis of a panel of 14 breast cancer cell lines comparing gene fusions discovered by high-throughput transcriptome sequencing and genome-wide copy number aberrations assessed by array comparative genomic hybridization, led to the identification of 77 gene fusions, of which more than 60% were localized to amplicons including 17q12, 17q23, 20q13, chr8q, and others. Many of these fusions appeared to be recurrent or involved highly expressed oncogenic drivers, frequently fused with multiple different partners, but sometimes displaying loss of functional domains. As illustrative examples of the “amplicon-associated” gene fusions, we examined here a recurrent gene fusion involving the mediator of mammalian target of rapamycin signaling, RPS6KB1 kinase in BT-474, and the therapeutically important receptor tyrosine kinase EGFR in MDA-MB-468 breast cancer cell line. These gene fusions comprise a minor allelic fraction relative to the highly expressed full-length transcripts and encode chimera lacking the kinase domains, which do not impart dependence on the respective cells. Our study suggests that amplicon-associated gene fusions in breast cancer primarily represent a by-product of chromosomal amplifications, which constitutes a subset of passenger aberrations and should be factored accordingly during prioritization of gene fusion candidates.
PMCID: PMC3431177  PMID: 22952423
17.  Functionally Recurrent Rearrangements of the MAST Kinase and Notch Gene Families in Breast Cancer 
Nature medicine  2011;17(12):1646-1651.
Breast cancer is a heterogeneous disease, exhibiting a wide range of molecular aberrations and clinical outcomes. Here we employed paired-end transcriptome sequencing to explore the landscape of gene fusions in a panel of breast cancer cell lines and tissues. We observed that individual breast cancers harbor an array of expressed gene fusions. We identified two classes of recurrent gene rearrangements involving microtubule associated serine-threonine kinase (MAST) and Notch family genes. Both MAST and Notch family gene fusions exerted significant phenotypic effects in breast epithelial cells. Breast cancer lines harboring Notch gene rearrangements are uniquely sensitive to inhibition of Notch signaling, and over-expression of MAST1 or MAST2 gene fusions had a proliferative effect both in vitro and in vivo. These findings illustrate that recurrent gene rearrangements play significant roles in subsets of carcinomas and suggest that transcriptome sequencing may serve to identify patients with rare, actionable gene fusions.
PMCID: PMC3233654  PMID: 22101766
18.  The tumor suppressor gene rap1GAP is silenced by mir-101-mediated EZH2 overexpression in invasive squamous cell carcinoma 
Oncogene  2011;30(42):4339-4349.
Rap1GAP is a critical tumor suppressor gene that is down-regulated in multiple aggressive cancers such as head and neck squamous cell carcinoma, melanoma and pancreatic cancer. However, the mechanistic basis of rap1GAP down-regulation in cancers is poorly understood. By employing an integrative approach, we demonstrate polycomb mediated repression of rap1GAP that involves EZH2, a histone methyltransferase in head and neck cancers. We further demonstrate that the loss of miR-101 expression correlates with EZH2 up-regulation, and the concomitant down-regulation of rap1GAP in head and neck cancers. EZH2 represses rap1GAP by facilitating the trimethylation of H3K27, a mark of gene repression, and also hypermethylation of rap1GAP promoter. These results provide a conceptual framework involving a microRNA-oncogene-tumor suppressor axis to understand head and neck cancer progression.
PMCID: PMC3154567  PMID: 21532618
mir101; EZH2; rap1GAP; rap1; promoter hypermethylation
19.  Rearrangements of the RAF Kinase Pathway in Prostate Cancer, Gastric Cancer and Melanoma 
Nature medicine  2010;16(7):793-798.
While recurrent gene fusions involving ETS family transcription factors are common in prostate cancer, their products are considered “undruggable” by conventional approaches. Recently, rare “targetable” gene fusions (involving the ALK kinase) have been identified in 1–5% of lung cancers [1], suggesting that similar rare gene fusions may occur in other common epithelial cancers including prostate cancer. Here we employed paired-end transcriptome sequencing to screen ETS rearrangement negative prostate cancers for targetable gene fusions and identified the SLC45A3-BRAF and ESRP1-RAF1 gene fusions. Expression of SLC45A3-BRAF or ESRP1-RAF1 in prostate cells induced a neoplastic phenotype that was sensitive to RAF and MEK inhibitors. Screening a large cohort of patients, we found that although rare (1–2%), recurrent rearrangements in the RAF pathway tend to occur in advanced prostate cancers, gastric cancers, and melanoma. Taken together, our results emphasize the importance of RAF rearrangements in cancer, suggest that RAF and MEK inhibitors may be useful in a subset of gene fusion-harboring solid tumors, and demonstrate that sequencing of tumor transcriptomes and genomes may lead to the identification of rare targetable fusions across cancer types.
PMCID: PMC2903732  PMID: 20526349
20.  HPeak: an HMM-based algorithm for defining read-enriched regions in ChIP-Seq data 
BMC Bioinformatics  2010;11:369.
Protein-DNA interaction constitutes a basic mechanism for the genetic regulation of target gene expression. Deciphering this mechanism has been a daunting task due to the difficulty in characterizing protein-bound DNA on a large scale. A powerful technique has recently emerged that couples chromatin immunoprecipitation (ChIP) with next-generation sequencing (ChIP-Seq). This technique provides a direct survey of the cistrome of transcription factors and other chromatin-associated proteins. In order to realize the full potential of this technique, increasingly sophisticated statistical algorithms have been developed to analyze the massive amount of data generated by this method.
Here we introduce HPeak, a Hidden Markov model (HMM)-based Peak-finding algorithm for analyzing ChIP-Seq data to identify protein-interacting genomic regions. In contrast to the majority of available ChIP-Seq analysis software packages, HPeak is a model-based approach allowing for rigorous statistical inference. This approach enables HPeak to accurately infer genomic regions enriched with sequence reads by assuming realistic probability distributions, in conjunction with a novel weighting scheme on the sequencing read coverage.
Using biologically relevant data collections, we found that HPeak showed a higher prevalence of the expected transcription factor binding motifs in ChIP-enriched sequences relative to the control sequences when compared to other currently available ChIP-Seq analysis approaches. Additionally, in comparison to the ChIP-chip assay, ChIP-Seq provides higher resolution along with improved sensitivity and specificity of binding site detection. Additional file and the HPeak program are freely available at
PMCID: PMC2912305  PMID: 20598134
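The HMM-based peak-finding idea described above can be sketched as a two-state hidden Markov model over binned read counts, decoded with the Viterbi algorithm. This is only an illustration under stated assumptions: the plain Poisson emissions, the rates `lam_bg` and `lam_enriched`, the self-transition probability `stay`, and the function names are hypothetical and are not HPeak's actual model or code.

```python
import math

def poisson_logpmf(k, lam):
    """Log-probability of observing count k under Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi_enriched(counts, lam_bg=1.0, lam_enriched=8.0, stay=0.9):
    """Label each bin 0 (background) or 1 (enriched) with a two-state HMM.

    counts: read counts per genomic bin
    All parameter values are illustrative assumptions.
    """
    if not counts:
        return []
    lams = (lam_bg, lam_enriched)
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # Uniform prior over the two states for the first bin.
    score = [math.log(0.5) + poisson_logpmf(counts[0], lam) for lam in lams]
    back = []
    for k in counts[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            cands = [score[p] + (log_stay if p == s else log_switch)
                     for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            pointers.append(best)
            new_score.append(cands[best] + poisson_logpmf(k, lams[s]))
        score = new_score
        back.append(pointers)
    # Trace back the most likely state path from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

labels = viterbi_enriched([0, 1, 0, 9, 10, 8, 1, 0])
```

Runs of bins labelled 1 would correspond to candidate read-enriched regions; the transition penalty is what keeps isolated noisy bins from being called on their own.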
21.  Transcriptome Sequencing to Detect Gene Fusions in Cancer 
Nature  2009;458(7234):97-101.
Recurrent gene fusions, typically associated with hematological malignancies and rare bone and soft tissue tumors [1], have been recently described in common solid tumors [2–9]. Here we employ an integrative analysis of high-throughput long and short read transcriptome sequencing of cancer cells to discover novel gene fusions. As a proof of concept we successfully utilized integrative transcriptome sequencing to “re-discover” the BCR-ABL1 [10] gene fusion in a chronic myelogenous leukemia cell line and the TMPRSS2-ERG [2,3] gene fusion in a prostate cancer cell line and tissues. Additionally, we nominated, and experimentally validated, novel gene fusions resulting in chimeric transcripts in cancer cell lines and tumors. Taken together, this study establishes a robust pipeline for the discovery of novel gene chimeras using high throughput sequencing, opening up an important class of cancer-related mutations for comprehensive characterization.
PMCID: PMC2725402  PMID: 19136943
Transcriptome sequencing; Prostate cancer; Bioinformatics; Gene fusions
22.  Metabolomic Profiles Delineate Potential Role for Sarcosine in Prostate Cancer Progression 
Nature  2009;457(7231):910-914.
Multiple, complex molecular events characterize cancer development and progression [1,2]. Deciphering the molecular networks that distinguish organ-confined disease from metastatic disease may lead to the identification of critical biomarkers for cancer invasion and disease aggressiveness. Although gene and protein expression have been extensively profiled in human tumors, little is known about the global metabolomic alterations that characterize neoplastic progression. Using a combination of high throughput liquid and gas chromatography-based mass spectrometry, we profiled more than 1126 metabolites across 262 clinical samples related to prostate cancer (42 tissues and 110 each of urine and plasma). These unbiased metabolomic profiles were able to distinguish benign prostate, clinically localized prostate cancer, and metastatic disease. Sarcosine, an N-methyl derivative of the amino acid glycine, was identified as a differential metabolite that was highly elevated during prostate cancer progression to metastasis and can be detected non-invasively in urine. Sarcosine levels were also elevated in invasive prostate cancer cell lines relative to benign prostate epithelial cells. Knockdown of glycine-N-methyl transferase (GNMT), the enzyme that generates sarcosine from glycine, attenuated prostate cancer invasion. Addition of exogenous sarcosine or knockdown of the enzyme that leads to sarcosine degradation, sarcosine dehydrogenase (SARDH), induced an invasive phenotype in benign prostate epithelial cells. Androgen receptor and the ERG gene fusion product coordinately regulate components of the sarcosine pathway. Taken together, we profiled the metabolomic alterations of prostate cancer progression revealing sarcosine as a potentially important metabolic intermediary of cancer cell invasion and aggressivity.
PMCID: PMC2724746  PMID: 19212411
23.  Molecular Concepts Analysis Links Tumors, Pathways, Mechanisms, and Drugs
Neoplasia (New York, N.Y.)  2007;9(5):443-454.
Global molecular profiling of cancers has shown broad utility in delineating pathways and processes underlying disease, in predicting prognosis and response to therapy, and in suggesting novel treatments. To gain further insights from such data, we have integrated and analyzed a comprehensive collection of “molecular concepts” representing > 2500 cancer-related gene expression signatures from Oncomine and manual curation of the literature, drug treatment signatures from the Connectivity Map, target gene sets from genome-scale regulatory motif analyses, and reference gene sets from several gene and protein annotation databases. We computed pairwise association analysis on all 13,364 molecular concepts and identified > 290,000 significant associations, generating hypotheses that link cancer types and subtypes, pathways, mechanisms, and drugs. To navigate a network of associations, we developed an analysis platform, the Molecular Concepts Map. We demonstrate the utility of the approach by highlighting molecular concepts analyses of Myc pathway activation, breast cancer relapse, and retinoic acid treatment.
PMCID: PMC1877973  PMID: 17534450
Cancer; bioinformatics; gene expression signature; network; oncomine
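The pairwise association analysis between molecular concepts (gene sets) mentioned above can be illustrated with an overlap test. The abstract does not name the statistic used, so the one-sided hypergeometric test below is an assumption chosen because it is a common way to score gene-set overlap; the function name `overlap_pvalue` and the toy sets are hypothetical.

```python
from math import comb

def overlap_pvalue(set_a, set_b, universe_size):
    """One-sided hypergeometric p-value: probability of seeing at least the
    observed overlap between two gene sets drawn from a shared universe
    of `universe_size` genes (assumed to contain both sets).
    """
    a, b = len(set_a), len(set_b)
    k = len(set_a & set_b)  # observed overlap
    total = comb(universe_size, b)
    # Sum the probability mass for overlaps of size k or larger.
    tail = sum(comb(a, i) * comb(universe_size - a, b - i)
               for i in range(k, min(a, b) + 1))
    return tail / total

# Toy gene sets from a 10-gene universe sharing two genes.
p = overlap_pvalue({"MYC", "E2F1", "CDK4"}, {"MYC", "E2F1", "RB1"}, 10)
```

Applied to every pair of concepts, such a score would need multiple-testing correction before any association is called significant.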
24.  Oncomine 3.0: Genes, Pathways, and Networks in a Collection of 18,000 Cancer Gene Expression Profiles
Neoplasia (New York, N.Y.)  2007;9(2):166-180.
DNA microarrays have been widely applied to cancer transcriptome analysis; however, the majority of such data are not easily accessible or comparable. Furthermore, several important analytic approaches have been applied to microarray analysis; however, their application is often limited. To overcome these limitations, we have developed Oncomine, a bioinformatics initiative aimed at collecting, standardizing, analyzing, and delivering cancer transcriptome data to the biomedical research community. Our analysis has identified the genes, pathways, and networks deregulated across 18,000 cancer gene expression microarrays, spanning the majority of cancer types and subtypes. Here, we provide an update on the initiative, describe the database and analysis modules, and highlight several notable observations. Results from this comprehensive analysis are available at
PMCID: PMC1813932  PMID: 17356713
Oncomine; cancer gene expression; microarrays; bioinformatics; differential expression
