1.  Oculus: faster sequence alignment by streaming read compression 
BMC Bioinformatics  2012;13:297.
Background
Despite significant advancement in alignment algorithms, the exponential growth of nucleotide sequencing throughput threatens to outpace bioinformatic analysis. Computation may become the bottleneck of genome analysis if growing alignment costs are not mitigated by further improvement in algorithms. Much gain has been gleaned from indexing and compressing alignment databases, but many widely used alignment tools process input reads sequentially and are oblivious to any underlying redundancy in the reads themselves.
Results
Here we present Oculus, a software package that attaches to standard aligners and exploits read redundancy by performing streaming compression, alignment, and decompression of input sequences. This nearly lossless process (> 99.9%) led to alignment speedups of up to 270% across a variety of data sets, while requiring a modest amount of memory. We expect that streaming read compressors such as Oculus could become a standard addition to existing RNA-Seq and ChIP-Seq alignment pipelines, and potentially other applications in the future as throughput increases.
Conclusions
Oculus efficiently condenses redundant input reads and wraps existing aligners to provide nearly identical SAM output in a fraction of the aligner runtime. It includes a number of useful features, such as tunable performance and fidelity options, compatibility with FASTA or FASTQ files, and adherence to the SAM format. The platform-independent C++ source code is freely available online, at http://code.google.com/p/oculus-bio.
doi:10.1186/1471-2105-13-297
PMCID: PMC3534618  PMID: 23148484
DNA nucleotide sequence alignment streaming identity redundancy compression software algorithm
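The idea described in the Oculus abstract above (collapse redundant reads, align each distinct sequence once, then expand the results back to every original read) can be illustrated with a minimal Python sketch. This is not the Oculus implementation or its API: the function names, the in-memory dictionary, and the `toy_aligner` stand-in are all assumptions made for illustration, and a real tool would stream reads from FASTA/FASTQ and emit SAM records.

```python
from collections import OrderedDict
from typing import Callable, Dict, Iterable, List, Tuple

Read = Tuple[str, str]  # (read name, nucleotide sequence)


def compress_reads(reads: Iterable[Read]) -> "OrderedDict[str, List[str]]":
    """Group read names by identical sequence (the 'compression' step)."""
    groups: "OrderedDict[str, List[str]]" = OrderedDict()
    for name, seq in reads:
        groups.setdefault(seq, []).append(name)
    return groups


def align_with_compression(
    reads: Iterable[Read], aligner: Callable[[str], int]
) -> Dict[str, int]:
    """Call the aligner once per distinct sequence, then copy each
    result to every read that shared that sequence ('decompression')."""
    results: Dict[str, int] = {}
    for seq, names in compress_reads(reads).items():
        hit = aligner(seq)  # one aligner call per unique sequence
        for name in names:
            results[name] = hit
    return results


def toy_aligner(seq: str, reference: str = "TTACGTGGTACC") -> int:
    """Hypothetical stand-in for a real aligner: exact substring search.
    Returns the 0-based match position, or -1 if the read does not occur."""
    return reference.find(seq)


if __name__ == "__main__":
    demo_reads = [("r1", "ACGT"), ("r2", "ACGT"), ("r3", "GGTA")]
    print(align_with_compression(demo_reads, toy_aligner))
```

In this sketch the saving comes only from exact duplicates: three reads trigger two aligner calls. How much redundancy exists, and therefore how much time is saved, depends on the data set.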
2.  Personalized Oncology Through Integrative High-Throughput Sequencing: A Pilot Study 
Science translational medicine  2011;3(111):111ra121.
Individual cancers harbor a set of genetic aberrations that can be informative for identifying rational therapies currently available or in clinical trials. We implemented a pilot study to explore the practical challenges of applying high-throughput sequencing in clinical oncology. We enrolled patients with advanced or refractory cancer who were eligible for clinical trials. For each patient, we performed whole-genome sequencing of the tumor, targeted whole-exome sequencing of tumor and normal DNA, and transcriptome sequencing (RNA-Seq) of the tumor to identify potentially informative mutations in a clinically relevant time frame of 3 to 4 weeks. With this approach, we detected several classes of cancer mutations including structural rearrangements, copy number alterations, point mutations, and gene expression alterations. A multidisciplinary Sequencing Tumor Board (STB) deliberated on the clinical interpretation of the sequencing results obtained. We tested our sequencing strategy on human prostate cancer xenografts. Next, we enrolled two patients into the clinical protocol and were able to review the results at our STB within 24 days of biopsy. The first patient had metastatic colorectal cancer in which we identified somatic point mutations in NRAS, TP53, AURKA, FAS, and MYH11, plus amplification and overexpression of cyclin-dependent kinase 8 (CDK8). The second patient had malignant melanoma, in which we identified a somatic point mutation in HRAS and a structural rearrangement affecting CDKN2C. The STB identified the CDK8 amplification and Ras mutation as providing a rationale for clinical trials with CDK inhibitors or MEK (mitogen-activated or extracellular signal–regulated protein kinase kinase) and PI3K (phosphatidylinositol 3-kinase) inhibitors, respectively. Integrative high-throughput sequencing of patients with advanced cancer generates a comprehensive, individual mutational landscape to facilitate biomarker-driven clinical trials in oncology.
doi:10.1126/scitranslmed.3003161
PMCID: PMC3476478  PMID: 22133722
3.  ChimeraScan: a tool for identifying chimeric transcription in sequencing data 
Bioinformatics  2011;27(20):2903-2904.
Summary: Next generation sequencing (NGS) technologies have enabled de novo gene fusion discovery that could reveal candidates with therapeutic significance in cancer. Here we present an open-source software package, ChimeraScan, for the discovery of chimeric transcription between two independent transcripts in high-throughput transcriptome sequencing data.
Availability: http://chimerascan.googlecode.com
Contact: cmaher@dom.wustl.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btr467
PMCID: PMC3187648  PMID: 21840877
4.  TMPRSS2-ERG-mediated feed-forward regulation of wild-type ERG in human prostate cancers 
Cancer Research  2011;71(16):5387-5392.
Recurrent gene fusions involving ETS family genes are a distinguishing feature of human prostate cancers, with TMPRSS2-ERG fusions representing the most common subtype. The TMPRSS2-ERG fusion transcript and its splice variants are well characterized in prostate cancers; however, not much is known about the levels and regulation of wild-type ERG. By employing an integrative approach, we demonstrate that the TMPRSS2-ERG gene fusion product binds to the ERG locus and drives the over-expression of wild-type ERG in prostate cancers. Knock-down of TMPRSS2-ERG in VCaP cells resulted in the down-regulation of wild-type ERG transcription, while stable over-expression of TMPRSS2-ERG in the gene fusion-negative PC3 cells was associated with the up-regulation of wild-type ERG transcript. Further, androgen signaling-mediated up-regulation of TMPRSS2-ERG resulted in the concomitant up-regulation of wild-type ERG transcription in VCaP cells. The loss of wild-type ERG expression was associated with a decrease in the invasive potential of VCaP cells. Importantly, 38% of clinically localized prostate cancers and 27% of metastatic prostate cancers harboring the TMPRSS2-ERG gene fusions exhibited over-expression of wild-type ERG. Taken together, these results provide novel insights into the regulation of ERG in human prostate cancers.
doi:10.1158/0008-5472.CAN-11-0876
PMCID: PMC3156376  PMID: 21676887
ERG; prostate cancer; gene fusion
5.  Gene Fusions Associated with Recurrent Amplicons Represent a Class of Passenger Aberrations in Breast Cancer 
Neoplasia (New York, N.Y.)  2012;14(8):702-708.
Application of high-throughput transcriptome sequencing has spurred highly sensitive detection and discovery of gene fusions in cancer, but distinguishing potentially oncogenic fusions from random, “passenger” aberrations has proven challenging. Here we examine a distinctive group of gene fusions that involve genes present in the loci of chromosomal amplifications—a class of oncogenic aberrations that are widely prevalent in breast cancers. Integrative analysis of a panel of 14 breast cancer cell lines comparing gene fusions discovered by high-throughput transcriptome sequencing and genome-wide copy number aberrations assessed by array comparative genomic hybridization, led to the identification of 77 gene fusions, of which more than 60% were localized to amplicons including 17q12, 17q23, 20q13, chr8q, and others. Many of these fusions appeared to be recurrent or involved highly expressed oncogenic drivers, frequently fused with multiple different partners, but sometimes displaying loss of functional domains. As illustrative examples of the “amplicon-associated” gene fusions, we examined here a recurrent gene fusion involving the mediator of mammalian target of rapamycin signaling, RPS6KB1 kinase in BT-474, and the therapeutically important receptor tyrosine kinase EGFR in MDA-MB-468 breast cancer cell line. These gene fusions comprise a minor allelic fraction relative to the highly expressed full-length transcripts and encode chimera lacking the kinase domains, which do not impart dependence on the respective cells. Our study suggests that amplicon-associated gene fusions in breast cancer primarily represent a by-product of chromosomal amplifications, which constitutes a subset of passenger aberrations and should be factored accordingly during prioritization of gene fusion candidates.
PMCID: PMC3431177  PMID: 22952423
6.  Transcriptome Sequencing Identifies PCAT-1, a Novel lincRNA Implicated in Prostate Cancer Progression 
Nature biotechnology  2011;29(8):742-749.
High-throughput sequencing of polyA+ RNA (RNA-Seq) in human cancer shows remarkable potential to identify both novel markers of disease and uncharacterized aspects of tumor biology, particularly non-coding RNA (ncRNA) species. We employed RNA-Seq on a cohort of 102 prostate tissues and cell lines and performed ab initio transcriptome assembly to discover unannotated ncRNAs. We nominated 121 such Prostate Cancer Associated Transcripts (PCATs) with cancer-specific expression patterns. Among these, we characterized PCAT-1 as a novel prostate-specific regulator of cell proliferation and target of the Polycomb Repressive Complex 2 (PRC2). We further found that high PCAT-1 and PRC2 expression stratified patient tissues into molecular subtypes distinguished by expression signatures of PCAT-1-repressed target genes. Taken together, the findings presented herein identify PCAT-1 as a novel transcriptional repressor implicated in a subset of prostate cancer patients. These findings establish the utility of RNA-Seq to identify disease-associated ncRNAs that may improve the stratification of cancer subtypes.
doi:10.1038/nbt.1914
PMCID: PMC3152676  PMID: 21804560
prostate cancer; transcriptome; next generation sequencing; non-coding RNA; EZH2
