Results 1-12 (12)

1.  Potential of fecal microbiota for early-stage detection of colorectal cancer 
Molecular Systems Biology  2014;10(11):766.
Several bacterial species have been implicated in the development of colorectal carcinoma (CRC), but CRC-associated changes of fecal microbiota and their potential for cancer screening remain to be explored. Here, we used metagenomic sequencing of fecal samples to identify taxonomic markers that distinguished CRC patients from tumor-free controls in a study population of 156 participants. Accuracy of metagenomic CRC detection was similar to the standard fecal occult blood test (FOBT) and when both approaches were combined, sensitivity improved > 45% relative to the FOBT, while maintaining its specificity. Accuracy of metagenomic CRC detection did not differ significantly between early- and late-stage cancer and could be validated in independent patient and control populations (N = 335) from different countries. CRC-associated changes in the fecal microbiome at least partially reflected microbial community composition at the tumor itself, indicating that observed gene pool differences may reveal tumor-related host–microbe interactions. Indeed, we deduced a metabolic shift from fiber degradation in controls to utilization of host carbohydrates and amino acids in CRC patients, accompanied by an increase of lipopolysaccharide metabolism.
PMCID: PMC4299606  PMID: 25432777
cancer screening; colorectal cancer; fecal biomarkers; human gut microbiome; metagenomics
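The abstract compares detection accuracy in terms of sensitivity and specificity. The sketch below shows how those two numbers are computed from binary test calls, and one naive way to combine two tests: call a sample positive if either test is positive. The OR rule, the function names and all data are hypothetical illustrations. The abstract does not say how the metagenomic classifier and the FOBT were combined.

```python
from typing import Sequence


def sensitivity_specificity(truth: Sequence[bool], calls: Sequence[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of binary calls against ground truth.

    truth[i] is True for a cancer case and False for a tumor-free control.
    Assumes at least one case and one control are present.
    """
    tp = sum(t and c for t, c in zip(truth, calls))
    fn = sum(t and not c for t, c in zip(truth, calls))
    tn = sum((not t) and (not c) for t, c in zip(truth, calls))
    fp = sum((not t) and c for t, c in zip(truth, calls))
    return tp / (tp + fn), tn / (tn + fp)


def combine_or(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> list[bool]:
    """Hypothetical combination rule: positive if either test is positive."""
    return [a or b for a, b in zip(calls_a, calls_b)]


# Invented toy data: 4 cases followed by 4 controls.
truth = [True] * 4 + [False] * 4
fobt = [True, False, False, True, False, False, False, False]
metagenome = [False, True, False, True, False, False, True, False]

print(sensitivity_specificity(truth, fobt))                          # (0.5, 1.0)
print(sensitivity_specificity(truth, combine_or(fobt, metagenome)))  # (0.75, 0.75)
```

On this toy data the OR rule raises sensitivity from 0.5 to 0.75 but lowers specificity from 1.0 to 0.75. This illustrates why keeping the FOBT's specificity when combining the two tests, as the abstract reports, is not automatic.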
3.  Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis 
Bioinformatics  2014;30(9):1300-1301.
We present Oqtans, an open-source workbench for quantitative transcriptome analysis that is integrated into Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy-to-understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at, (ii) a public Galaxy instance at, (iii) a git repository containing all installed software, most of which is also available from (iv) the Galaxy Toolshed, and (v) a share string to use along with Galaxy CloudMan.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3998122  PMID: 24413671
4.  Characterization of drug-induced transcriptional modules: towards drug repositioning and functional understanding 
Drug-induced transcriptional modules (biclusters) were identified and annotated in three human cell lines and rat liver. These were used to assess conservation across systems and to infer and experimentally validate novel drug effects and gene functions.
Biclustering of drug-induced gene expression profiles resulted in modules of drugs and genes, which were enriched in both drug and gene annotations.
Identifying drug-induced transcriptional modules separately in three human cell lines and rat liver allows assessment of their conservation across model systems. About 70% of modules are conserved across cell lines, and a lower bound of 15% was estimated for their conservation across organisms and between the in vitro and in vivo systems.
Drug-induced transcriptional modules can predict novel gene functions. A conserved module associated with (chole)sterol metabolism revealed novel regulators of cellular cholesterol homeostasis; 10 of them were validated in functional imaging assays.
Analysis of drugs clustered into modules can give new insights into their mechanisms of action and provide leads for drug repositioning. We predicted and experimentally validated novel cell cycle inhibitors and modulators of PPARγ, estrogen and adrenergic receptors, with potential for developing new therapies against diabetes and cancer.
In pharmacology, it is crucial to understand the complex biological responses that drugs elicit in the human organism and how well they can be inferred from model organisms. We therefore identified a large set of drug-induced transcriptional modules from genome-wide microarray data of drug-treated human cell lines and rat liver, and first characterized their conservation. Over 70% of these modules were common to multiple cell lines and 15% were conserved between the human in vitro and the rat in vivo system. We then illustrate the utility of conserved and cell-type-specific drug-induced modules by predicting and experimentally validating (i) gene functions, e.g., 10 novel regulators of cellular cholesterol homeostasis and (ii) new mechanisms of action for existing drugs, thereby providing a starting point for drug repositioning, e.g., novel cell cycle inhibitors and new modulators of the α-adrenergic receptor, peroxisome proliferator-activated receptor and estrogen receptor. Taken together, the identified modules reveal the conservation of transcriptional responses towards drugs across cell types and organisms, and improve our understanding of both the molecular basis of drug action and human biology.
PMCID: PMC3658274  PMID: 23632384
cell line models in drug discovery; drug-induced transcriptional modules; drug repositioning; gene function prediction; transcriptome conservation across cell types and organisms
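The abstract describes modules as "enriched" in drug and gene annotations but does not name the statistic used. A common way to quantify such enrichment is a one-sided hypergeometric test. The sketch below assumes that test, and the function name and example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(observed: int, module_size: int, annotated: int, population: int) -> float:
    """One-sided hypergeometric p-value P(X >= observed).

    X is the number of annotated members in a module of `module_size` items
    drawn without replacement from `population` items, `annotated` of which
    carry the annotation (e.g. a gene set or a drug class).
    """
    upper = min(module_size, annotated)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the tail.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, module_size - i)
        for i in range(observed, upper + 1)
    )
    return tail / comb(population, module_size)


# Hypothetical module: 20 genes, 8 carrying an annotation held by 200 of 10,000 genes.
print(hypergeom_enrichment_p(8, 20, 200, 10_000))
```

Exact integer binomial coefficients avoid floating-point underflow in the individual terms. In practice, a p-value like this would be computed for every module and annotation pair and then corrected for multiple testing.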
5.  Prediction of Drug Combinations by Integrating Molecular and Pharmacological Data 
PLoS Computational Biology  2011;7(12):e1002323.
Combinatorial therapy is a promising strategy for combating complex disorders due to improved efficacy and reduced side effects. However, screening new drug combinations exhaustively is impractical given the number of possible pairings between drugs. Here, we present a novel computational approach to predict drug combinations by integrating molecular and pharmacological data. Specifically, drugs are represented by a set of their properties, such as their targets or indications. By integrating several of these features, we show that feature patterns enriched in approved drug combinations are not only predictive of new drug combinations but also provide insights into the mechanisms underlying combinatorial therapy. Further analysis confirmed that 69% of our top-ranked predictions of effective combinations are supported by the literature, while the others represent novel potential drug combinations. We believe that our proposed approach can help to narrow the search space of drug combinations and provide a new way to effectively utilize existing drugs for new purposes.
Author Summary
The combination of distinct drugs in combinatorial therapy can help to improve therapeutic efficacy by overcoming the redundancy and robustness of pathogenic processes, or by lowering the risk of side effects. However, identifying effective drug combinations is cumbersome, given the large search space arising from the number of drugs that could potentially be combined. In this work, we explore various molecular and pharmacological features of drugs, and show that by utilizing combinations of such features it is possible to predict new drug combinations. Benchmarking the approach using approved drug combinations demonstrates that these feature combinations are indeed predictive and can propose promising new drug combinations. In addition, the enriched feature patterns provide insights into the mechanisms underlying drug combinations. For example, they suggest that if two drugs share targets or therapeutic effects, they can be independently combined with a third common drug. The ability to efficiently predict drug combinations should facilitate the development of more efficient drug therapies for a broader range of indications, including hard-to-treat complex diseases.
PMCID: PMC3248384  PMID: 22219721
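The author summary gives one example pattern: if two drugs share targets or therapeutic effects, each can be combined with a third common drug. The toy sketch below applies only that single pattern. If drug c is approved in combination with b, and a shares at least one feature with c, then (a, b) is proposed. This is not the paper's method, which integrated several feature types and scored enriched patterns. The function name, drug names and features are hypothetical.

```python
def propose_pairs(approved, features):
    """Propose new drug pairs by transferring approved partners between similar drugs.

    approved: iterable of 2-element (drug, drug) sequences for approved combinations.
    features: dict mapping each drug to a set of features (e.g. targets, indications).
    Returns a set of frozensets, each an unordered candidate pair not already approved.
    """
    known = {frozenset(pair) for pair in approved}
    proposals = set()
    for first, second in approved:
        # Treat each member of the approved pair in turn as the drug c being "replaced".
        for c, b in ((first, second), (second, first)):
            for a, feats in features.items():
                if a in (b, c):
                    continue
                if feats & features.get(c, set()):  # a shares a feature with c
                    candidate = frozenset((a, b))
                    if candidate not in known:
                        proposals.add(candidate)
    return proposals


# Hypothetical drugs annotated with hypothetical targets T1..T4.
features = {"drugA": {"T1"}, "drugB": {"T2"}, "drugC": {"T1", "T3"}, "drugD": {"T4"}}
approved = [("drugA", "drugB")]
print([sorted(p) for p in propose_pairs(approved, features)])  # [['drugB', 'drugC']]
```

drugC shares target T1 with drugA, so it inherits drugA's approved partner drugB. drugD shares nothing and is not proposed.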
7.  Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project 
Gerstein, Mark B. | Lu, Zhi John | Van Nostrand, Eric L. | Cheng, Chao | Arshinoff, Bradley I. | Liu, Tao | Yip, Kevin Y. | Robilotto, Rebecca | Rechtsteiner, Andreas | Ikegami, Kohta | Alves, Pedro | Chateigner, Aurelien | Perry, Marc | Morris, Mitzi | Auerbach, Raymond K. | Feng, Xin | Leng, Jing | Vielle, Anne | Niu, Wei | Rhrissorrakrai, Kahn | Agarwal, Ashish | Alexander, Roger P. | Barber, Galt | Brdlik, Cathleen M. | Brennan, Jennifer | Brouillet, Jeremy Jean | Carr, Adrian | Cheung, Ming-Sin | Clawson, Hiram | Contrino, Sergio | Dannenberg, Luke O. | Dernburg, Abby F. | Desai, Arshad | Dick, Lindsay | Dosé, Andréa C. | Du, Jiang | Egelhofer, Thea | Ercan, Sevinc | Euskirchen, Ghia | Ewing, Brent | Feingold, Elise A. | Gassmann, Reto | Good, Peter J. | Green, Phil | Gullier, Francois | Gutwein, Michelle | Guyer, Mark S. | Habegger, Lukas | Han, Ting | Henikoff, Jorja G. | Henz, Stefan R. | Hinrichs, Angie | Holster, Heather | Hyman, Tony | Iniguez, A. Leo | Janette, Judith | Jensen, Morten | Kato, Masaomi | Kent, W. James | Kephart, Ellen | Khivansara, Vishal | Khurana, Ekta | Kim, John K. | Kolasinska-Zwierz, Paulina | Lai, Eric C. | Latorre, Isabel | Leahey, Amber | Lewis, Suzanna | Lloyd, Paul | Lochovsky, Lucas | Lowdon, Rebecca F. | Lubling, Yaniv | Lyne, Rachel | MacCoss, Michael | Mackowiak, Sebastian D. | Mangone, Marco | McKay, Sheldon | Mecenas, Desirea | Merrihew, Gennifer | Miller, David M. | Muroyama, Andrew | Murray, John I. | Ooi, Siew-Loon | Pham, Hoang | Phippen, Taryn | Preston, Elicia A. | Rajewsky, Nikolaus | Rätsch, Gunnar | Rosenbaum, Heidi | Rozowsky, Joel | Rutherford, Kim | Ruzanov, Peter | Sarov, Mihail | Sasidharan, Rajkumar | Sboner, Andrea | Scheid, Paul | Segal, Eran | Shin, Hyunjin | Shou, Chong | Slack, Frank J. | Slightam, Cindie | Smith, Richard | Spencer, William C. | Stinson, E. O. | Taing, Scott | Takasaki, Teruaki | Vafeados, Dionne | Voronina, Ksenia | Wang, Guilin | Washington, Nicole L. | Whittle, Christina M. | Wu, Beijing | Yan, Koon-Kiu | Zeller, Georg | Zha, Zheng | Zhong, Mei | Zhou, Xingliang | Ahringer, Julie | Strome, Susan | Gunsalus, Kristin C. | Micklem, Gos | Liu, X. Shirley | Reinke, Valerie | Kim, Stuart K. | Hillier, LaDeana W. | Henikoff, Steven | Piano, Fabio | Snyder, Michael | Stein, Lincoln | Lieb, Jason D. | Waterston, Robert H.
Science (New York, N.Y.)  2010;330(6012):1775-1787.
We systematically generated large-scale data sets to improve genome annotation for the nematode Caenorhabditis elegans, a key model organism. These data sets include transcriptome profiling across a developmental time course, genome-wide identification of transcription factor–binding sites, and maps of chromatin organization. From this, we created more complete and accurate gene models, including alternative splice forms and candidate noncoding RNAs. We constructed hierarchical networks of transcription factor–binding and microRNA interactions and discovered chromosomal locations bound by an unusually large number of transcription factors. Different patterns of chromatin composition and histone modification were revealed between chromosome arms and centers, with similarly prominent differences between autosomes and the X chromosome. Integrating data types, we built statistical models relating chromatin, transcription factor binding, and gene expression. Overall, our analyses ascribed putative functions to most of the conserved genome.
PMCID: PMC3142569  PMID: 21177976
8.  Network Neighbors of Drug Targets Contribute to Drug Side-Effect Similarity 
PLoS ONE  2011;6(7):e22187.
In pharmacology, it is essential to identify the molecular mechanisms of drug action in order to understand adverse side effects. These adverse side effects have been used to infer whether two drugs share a target protein. However, side-effect similarity of drugs could also arise because their target proteins are close in a molecular network and could in turn cause similar downstream effects. In this study, we investigated the proportion of side-effect similarities that is due to targets that are close in the network compared to shared drug targets. We found that only a minor fraction of side-effect similarities (5.8%) is caused by drugs targeting proteins close in the network, compared to side-effect similarities caused by overlapping drug targets (64%). Moreover, the targets that cause similar side effects more often lie in a linear part of the network, having two or fewer interactions, than drug targets in general. Based on these examples, we gained novel insight into the molecular mechanisms of side effects associated with several drug targets. Looking forward, such analyses will be extremely useful in the process of drug development to better understand adverse side effects.
PMCID: PMC3135612  PMID: 21765950
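"Close in the network" and "linear part of the network" can be made concrete with a breadth-first search over a protein interaction graph. In the sketch below, a shared target gives distance 0, and a protein counts as linear if it has two or fewer interaction partners, as stated in the abstract. The abstract does not give a distance cutoff for "close", so none is assumed here. The protein names, edges and helper names are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected adjacency sets from (protein, protein) interaction pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return dict(graph)


def shortest_path_length(graph, source, target):
    """Number of interactions on a shortest path (BFS), or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def min_target_distance(graph, targets_a, targets_b):
    """Smallest network distance between any target of drug A and any of drug B."""
    distances = [
        d for a in targets_a for b in targets_b
        if (d := shortest_path_length(graph, a, b)) is not None
    ]
    return min(distances) if distances else None


def in_linear_region(graph, protein):
    """True if the protein has two or fewer interaction partners."""
    return len(graph.get(protein, ())) <= 2


g = build_graph([("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P2", "P5")])
print(min_target_distance(g, {"P1"}, {"P4", "P5"}))  # 2 (P1-P2-P5)
print(min_target_distance(g, {"P1", "P3"}, {"P3"}))  # 0 (shared target)
print(in_linear_region(g, "P3"), in_linear_region(g, "P2"))  # True False
```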
9.  Support vector machines-based identification of alternative splicing in Arabidopsis thaliana from whole-genome tiling arrays 
BMC Bioinformatics  2011;12:55.
Alternative splicing (AS) is a process that generates several distinct mRNA isoforms from the same gene by splicing different portions out of the precursor transcript. Due to the (patho-)physiological importance of AS, a complete inventory of AS is of great interest. While this is within reach for human and mammalian model organisms, our knowledge of AS in plants remains less complete. Experimental approaches for monitoring AS are either based on transcript sequencing or rely on hybridization to DNA microarrays. Among the microarray platforms facilitating the discovery of AS events, tiling arrays are well-suited for identifying intron retention, the most prevalent type of AS in plants. However, analyzing tiling array data is challenging because of high noise levels and limited probe coverage.
In this work, we present a novel method to detect intron retentions (IR) and exon skips (ES) from tiling arrays. While statistical tests have typically been proposed for this purpose, our method instead uses support vector machines (SVMs), which are valued for their accuracy and robustness to noise. Existing EST and cDNA sequences served for supervised training and evaluation. Analyzing a large collection of publicly available microarray and sequence data for the model plant A. thaliana, we demonstrated that our method is more accurate than existing approaches. The method was applied in a genome-wide screen that yielded 1,355 IR events. A comparison of these IR events to the TAIR annotation and a large set of short-read RNA-seq data showed that 830 of the predicted IR events are novel and that 525 events (39%) overlap with either the TAIR annotation or the IR events inferred from the RNA-seq data.
The method developed in this work expands the scarce repertoire of analysis tools for the identification of alternative mRNA splicing from whole-genome tiling arrays. Our predictions are highly enriched for known AS events and complement the A. thaliana genome annotation with respect to AS. Since all predicted AS events can be precisely attributed to experimental conditions, our work provides a basis for follow-up studies focused on the elucidation of the regulatory mechanisms underlying tissue-specific and stress-dependent AS in plants.
PMCID: PMC3051901  PMID: 21324185
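As a generic illustration of the kind of classifier the abstract contrasts with statistical tests, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method. It is not the paper's model, whose features, kernel and training labels (derived from EST and cDNA evidence) are not described in the abstract. The single feature, a ratio of mean intron-probe to mean exon-probe intensity, plus a constant bias feature, and all data points are hypothetical.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, iterations=20000, seed=0):
    """Linear SVM trained with the Pegasos stochastic sub-gradient method.

    samples: list of equal-length feature vectors; a constant 1.0 feature acts
    as a (regularized) bias term. labels: +1 or -1.
    """
    rng = random.Random(seed)
    w = [0.0] * len(samples[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(samples))
        x, y = samples[i], labels[i]
        eta = 1.0 / (lam * t)  # Pegasos step size
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        # Sub-gradient step on the L2 regularizer lam/2 * ||w||^2: shrink all weights.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # Sub-gradient step on the hinge loss, only when the margin is violated.
        if margin < 1.0:
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    """+1 (hypothetical: intron retained) or -1 (hypothetical: spliced out)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0.0 else -1


# Hypothetical feature vectors: [intron/exon intensity ratio, bias].
retained = [[r, 1.0] for r in (0.8, 0.85, 0.9, 0.95, 1.0)]
spliced = [[r, 1.0] for r in (0.0, 0.05, 0.1, 0.15, 0.2)]
w = train_linear_svm(retained + spliced, [1] * 5 + [-1] * 5)
print(predict(w, [0.95, 1.0]), predict(w, [0.05, 1.0]))
```

A real tiling-array classifier would use many probe-level features per candidate intron or exon and typically a kernel SVM library. The hand-written loop here only shows the hinge-loss and margin mechanics.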
10.  mGene.web: a web service for accurate computational gene finding 
Nucleic Acids Research  2009;37(Web Server issue):W312-W316.
We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures including untranslated regions in an increasing number of organisms. With mGene.web, users can also train the system with their own data for other organisms at the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, such as the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at, it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).
PMCID: PMC2703990  PMID: 19494180
11.  Comprehensive analysis of Arabidopsis expression level polymorphisms with simple inheritance 
In Arabidopsis thaliana, gene expression level polymorphisms (ELPs) between natural accessions that exhibit simple, single locus inheritance are promising quantitative trait locus (QTL) candidates to explain phenotypic variability. It is assumed that such ELPs overwhelmingly represent regulatory element polymorphisms. However, comprehensive genome-wide analyses linking expression level, regulatory sequence and gene structure variation are missing, preventing definitive verification of this assumption. Here, we analyzed ELPs observed between the Eil-0 and Lc-0 accessions. Compared with non-variable controls, 5′ regulatory sequence variation in the corresponding genes is indeed increased. However, ∼42% of all the ELP genes also carry major transcription unit deletions in one parent as revealed by genome tiling arrays, representing a >4-fold enrichment over controls. Within the subset of ELPs with simple inheritance, this proportion is even higher and deletions are generally more severe. Similar results were obtained from analyses of the Bay-0 and Sha accessions, using alternative technical approaches. Collectively, our results suggest that drastic structural changes are a major cause for ELPs with simple inheritance, corroborating experimentally observed indel preponderance in cloned Arabidopsis QTL.
PMCID: PMC2657532  PMID: 19225455
Arabidopsis; eQTL; expression level polymorphism; heritability of expression; QTL; structural gene variation
12.  At-TAX: a whole genome tiling array resource for developmental expression analysis and transcript identification in Arabidopsis thaliana 
Genome Biology  2008;9(7):R112.
A developmental expression atlas, At-TAX, based on whole-genome tiling arrays, is presented along with associated analysis methods.
Gene expression maps for model organisms, including Arabidopsis thaliana, have typically been created using gene-centric expression arrays. Here, we describe a comprehensive expression atlas, Arabidopsis thaliana Tiling Array Express (At-TAX), which is based on whole-genome tiling arrays. We demonstrate that tiling arrays are accurate tools for gene expression analysis and identify more than 1,000 unannotated transcribed regions. Visualizations of gene expression estimates, transcribed regions, and tiling probe measurements are accessible online at the At-TAX homepage.
PMCID: PMC2530869  PMID: 18613972
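The abstract does not describe how transcribed regions were called from tiling probe measurements. A simple stand-in, shown below, reports runs of consecutive probes whose intensity is at or above a threshold and keeps only runs of at least a minimum number of probes. The threshold rule, function name, positions and intensities are all hypothetical. Real tiling-array pipelines normalize and usually smooth probe signals before segmentation.

```python
def transcribed_regions(positions, intensities, threshold, min_probes=3):
    """Return (start, end) probe positions of runs of >= min_probes consecutive
    probes whose intensity is >= threshold.

    positions must be sorted along a single chromosome strand.
    """
    regions = []
    run_start = run_end = None
    run_length = 0
    for pos, value in zip(positions, intensities):
        if value >= threshold:
            if run_start is None:
                run_start, run_length = pos, 0
            run_end = pos
            run_length += 1
        else:
            # A low probe closes the current run; keep it only if long enough.
            if run_start is not None and run_length >= min_probes:
                regions.append((run_start, run_end))
            run_start, run_length = None, 0
    # Close a run that extends to the last probe.
    if run_start is not None and run_length >= min_probes:
        regions.append((run_start, run_end))
    return regions


positions = [0, 10, 20, 30, 40, 50, 60, 70]
intensities = [1, 1, 5, 6, 7, 1, 5, 1]
print(transcribed_regions(positions, intensities, threshold=4, min_probes=2))  # [(20, 40)]
```

The isolated high probe at position 60 is discarded by the `min_probes` requirement, a crude guard against single noisy probes.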