Results 1-10 (10)

1.  A chemo-centric view of human health and disease 
Nature communications  2014;5:5676.
Efforts to compile the phenotypic effects of drugs and environmental chemicals offer the opportunity to adopt a chemo-centric view of human health that does not require detailed mechanistic information. Here, we consider thousands of chemicals and analyze the relationship of their structures with adverse and therapeutic responses. Our study includes molecules related to the etiology of 934 health threatening conditions and used to treat 835 diseases. We first identify chemical moieties that could be independently associated with each phenotypic effect. Using these fragments, we build accurate predictors for approximately 400 clinical phenotypes, finding many privileged and liable structures. Finally, we connect two diseases if they relate to similar chemical structures. The resulting networks of human conditions are able to predict disease comorbidities, as well as identifying potential drug side effects and opportunities for drug repositioning, and show a remarkable coincidence with clinical observations.
PMCID: PMC4338530  PMID: 25435099
Fragment Mining; Disease Models; Disease Networks
The annals of applied statistics  2014;8(1):309-330.
RNA-sequencing has revolutionized biomedical research and, in particular, our ability to study gene alternative splicing. The problem has important implications for human health, as alternative splicing may be involved in malfunctions at the cellular level and multiple diseases. However, the high-dimensional nature of the data and the existence of experimental biases pose serious data analysis challenges. We find that the standard data summaries used to study alternative splicing are severely limited, as they ignore a substantial amount of valuable information. Current data analysis methods are based on such summaries and are hence sub-optimal. Further, they have limited flexibility in accounting for technical biases. We propose novel data summaries and a Bayesian modeling framework that overcome these limitations and determine biases in a non-parametric, highly flexible manner. These summaries adapt naturally to the rapid improvements in sequencing technology. We provide efficient point estimates and uncertainty assessments. The approach allows the study of alternative splicing patterns for individual samples and can also be the basis for downstream analyses. We found a several-fold improvement in estimation mean square error compared to popular approaches in simulations, and substantially higher consistency between replicates in experimental data. Our findings indicate the need for adjusting the routine summarization and analysis of alternative splicing RNA-seq studies. We provide a software implementation in the R package casper.
PMCID: PMC4005600  PMID: 24795787
Alternative Splicing; RNA-Seq; Bayesian modelling; Estimation
3.  Bayesian Model Selection in High-Dimensional Settings 
Journal of the American Statistical Association  2012;107(498):10.1080/01621459.2012.682536.
Standard assumptions incorporated into Bayesian model selection procedures result in procedures that are not competitive with commonly used penalized likelihood methods. We propose modifications of these methods by imposing nonlocal prior densities on model parameters. We show that the resulting model selection procedures are consistent in linear model settings when the number of possible covariates p is bounded by the number of observations n, a property that has not been extended to other model selection procedures. In addition to consistently identifying the true model, the proposed procedures provide accurate estimates of the posterior probability that each identified model is correct. Through simulation studies, we demonstrate that these model selection procedures perform as well as or better than commonly used penalized likelihood methods in a range of simulation settings. Proofs of the primary theorems are provided in the Supplementary Material that is available online.
PMCID: PMC3867525  PMID: 24363474
Adaptive LASSO; Dantzig selector; Elastic net; g-prior; Intrinsic Bayes factor; Intrinsic prior; Nonlocal prior; Nonnegative garrote; Oracle
4.  chroGPS, a global chromatin positioning system for the functional analysis and visualization of the epigenome 
Nucleic Acids Research  2013;42(4):2126-2137.
Development of tools to jointly visualize the genome and the epigenome remains a challenge. chroGPS is a computational approach that addresses this question. chroGPS uses multidimensional scaling techniques to represent similarity between epigenetic factors, or between genetic elements on the basis of their epigenetic state, in 2D/3D reference maps. We emphasize biological interpretability, statistical robustness, integration of genetic and epigenetic data from heterogeneous sources, and computational feasibility. Although chroGPS is a general methodology to create reference maps and study the epigenetic state of any class of genetic element or genomic region, we focus on two specific kinds of maps: chroGPSfactors, which visualizes functional similarities between epigenetic factors, and chroGPSgenes, which describes the epigenetic state of genes and integrates gene expression and other functional data. We use data from the modENCODE project on the genomic distribution of a large collection of epigenetic factors in Drosophila, a model system extensively used to study genome organization and function. Our results show that the maps allow straightforward visualization of relationships between factors and elements, capturing relevant information about their functional properties that helps to interpret epigenetic information in a functional context and derive testable hypotheses.
PMCID: PMC3936722  PMID: 24271395
5.  An Integrated Model of the Transcriptome of HER2-Positive Breast Cancer 
PLoS ONE  2013;8(11):e79298.
Our goal in these analyses was to use genomic features from a test set of primary breast tumors to build an integrated transcriptome landscape model that makes relevant hypothetical predictions about the biological and/or clinical behavior of HER2-positive breast cancer. We interrogated RNA-Seq data from benign breast lesions, ER+, triple negative, and HER2-positive tumors to identify 685 differentially expressed genes, 102 alternatively spliced genes, and 303 genes that expressed single nucleotide sequence variants (eSNVs) that were associated with the HER2-positive tumors in our survey panel. These features were integrated into a transcriptome landscape model that identified 12 highly interconnected genomic modules, each of which represents a cellular processes pathway that appears to define the genomic architecture of the HER2-positive tumors in our test set. The generality of the model was confirmed by the observation that several key pathways were enriched in HER2-positive TCGA breast tumors. The ability of this model to make relevant predictions about the biology of breast cancer cells was established by the observation that integrin signaling was linked to lapatinib sensitivity in vitro and strongly associated with risk of relapse in the NCCTG N9831 adjuvant trastuzumab clinical trial dataset. Additional modules from the HER2 transcriptome model, including ubiquitin-mediated proteolysis, TGF-beta signaling, RHO-family GTPase signaling, and M-phase progression, were linked to response to lapatinib and paclitaxel in vitro and/or risk of relapse in the N9831 dataset. These data indicate that an integrated transcriptome landscape model derived from a test set of HER2-positive breast tumors has potential for predicting outcome and for identifying novel potential therapeutic strategies for this breast cancer subtype.
PMCID: PMC3815156  PMID: 24223926
6.  Dependency of colorectal cancer on a TGF-beta-driven programme in stromal cells for metastasis initiation 
Cancer cell  2012;22(5):571-584.
A large proportion of colorectal cancers (CRCs) display mutational inactivation of the TGF-beta pathway yet, paradoxically, they are characterized by elevated TGF-beta production. Here, we unveil a prometastatic programme induced by TGF-beta in the microenvironment that associates with a high risk of CRC relapse upon treatment. The activity of TGF-beta on stromal cells increases the efficiency of organ colonization by CRC cells, whereas mice treated with a pharmacological inhibitor of TGFBR1 are resilient to metastasis formation. Secretion of IL11 by TGF-beta-stimulated cancer-associated fibroblasts (CAFs) triggers GP130/STAT3 signalling in tumour cells. This crosstalk confers a survival advantage to metastatic cells. The dependency on the TGF-beta stromal programme for metastasis initiation could be exploited to improve the diagnosis and treatment of CRC.
PMCID: PMC3512565  PMID: 23153532
7.  Sequential stopping for high-throughput experiments 
In high-throughput experiments, the sample size is typically chosen informally. Most formal sample-size calculations depend critically on prior knowledge. We propose a sequential strategy that, by updating knowledge when new data are available, depends less critically on prior assumptions. Experiments are stopped or continued based on the potential benefits in obtaining additional data. The underlying decision-theoretic framework guarantees the design to proceed in a coherent fashion. We propose intuitively appealing, easy-to-implement utility functions. As in most sequential design problems, an exact solution is prohibitive. We propose a simulation-based approximation that uses decision boundaries. We apply the method to RNA-seq, microarray, and reverse-phase protein array studies and show its potential advantages. The approach has been added to the Bioconductor package gaga.
PMCID: PMC3520501  PMID: 22908218
Decision theory; Forward simulation; High-throughput experiments; multiple testing; Optimal design; Sample size; Sequential design
8.  dKDM5/LID regulates H3K4me3 dynamics at the transcription-start site (TSS) of actively transcribed developmental genes 
Nucleic Acids Research  2012;40(19):9493-9505.
H3K4me3 is a histone modification that accumulates at the transcription-start site (TSS) of active genes and is known to be important for transcription activation. The way in which H3K4me3 is regulated at TSS and the actual molecular basis of its contribution to transcription remain largely unanswered. To address these questions, we have analyzed the contribution of dKDM5/LID, the main H3K4me3 demethylase in Drosophila, to the regulation of the pattern of H3K4me3. ChIP-seq results show that, at developmental genes, dKDM5/LID localizes at TSS and regulates H3K4me3. dKDM5/LID target genes are highly transcribed and enriched in active RNApol II and H3K36me3, suggesting a positive contribution to transcription. Expression profiling shows that, though weakly, dKDM5/LID target genes are significantly downregulated upon dKDM5/LID depletion. Furthermore, dKDM5/LID depletion results in decreased RNApol II occupancy, particularly by the promoter-proximal Pol IIoser5 form. Our results also show that ASH2, an evolutionarily conserved factor that locates at TSS and is required for H3K4me3, binds and positively regulates dKDM5/LID target genes. However, dKDM5/LID and ASH2 do not bind simultaneously and recognize different chromatin states, enriched in H3K4me3 and not, respectively. These results indicate that, at developmental genes, dKDM5/LID and ASH2 coordinately regulate H3K4me3 at TSS and that this dynamic regulation contributes to transcription.
PMCID: PMC3479210  PMID: 22904080
9.  Deep Sequence Analysis of Non-Small Cell Lung Cancer: Integrated Analysis of Gene Expression, Alternative Splicing, and Single Nucleotide Variations in Lung Adenocarcinomas with and without Oncogenic KRAS Mutations 
KRAS mutations are highly prevalent in non-small cell lung cancer (NSCLC), and tumors harboring these mutations tend to be aggressive and resistant to chemotherapy. We used next-generation sequencing technology to identify pathways that are specifically altered in lung tumors harboring a KRAS mutation. Paired-end RNA-sequencing of 15 primary lung adenocarcinoma tumors (8 harboring mutant KRAS and 7 with wild-type KRAS) was performed. Sequences were mapped to the human genome, and genomic features, including differentially expressed genes, alternate splicing isoforms and single nucleotide variants, were determined for tumors with and without KRAS mutation using a variety of computational methods. Network analysis was carried out on genes showing differential expression (374 genes), alternate splicing (259 genes), and SNV-related changes (65 genes) in NSCLC tumors harboring a KRAS mutation. Genes exhibiting two or more connections from the lung adenocarcinoma network were used to carry out integrated pathway analysis. The most significant signaling pathways identified through this analysis were the NFκB, ERK1/2, and AKT pathways. A 27-gene mutant KRAS-specific subnetwork was extracted based on gene–gene connections from the integrated network, and interrogated for druggable targets. Our results confirm previous evidence that mutant KRAS tumors exhibit activated NFκB, ERK1/2, and AKT pathways and may be preferentially sensitive to targeted therapeutics directed toward these pathways. In addition, our analysis indicates novel, previously unappreciated links between mutant KRAS and the TNFR and PPARγ signaling pathways, suggesting that targeted PPARγ antagonists and TNFR inhibitors may be useful therapeutic strategies for treatment of mutant KRAS lung tumors. Our study is the first to integrate genomic features from RNA-Seq data from NSCLC and to define a first draft genomic landscape model that is unique to tumors with oncogenic KRAS mutations.
PMCID: PMC3356053  PMID: 22655260
transcriptome sequencing; RNA-Seq; KRAS mutation; NSCLC; bioinformatics; network analysis; data integration and computational methods
10.  Chimeric tRNAs as tools to induce proteome damage and identify components of stress responses 
Nucleic Acids Research  2009;38(5):e30.
Misfolded proteins are caused by genomic mutations, aberrant splicing events, translation errors or environmental factors. The accumulation of misfolded proteins is a phenomenon connected to several human disorders, and is managed by stress responses specific to the cellular compartments being affected. In wild-type cells these mechanisms of stress response can be experimentally induced by expressing recombinant misfolded proteins or by incubating cells with large concentrations of amino acid analogues. Here, we report a novel approach for the induction of stress responses to protein aggregation. Our method is based on engineered transfer RNAs that can be expressed in cells or tissues, where they actively integrate in the translation machinery causing general proteome substitutions. This strategy allows for the introduction of mutations of increasing severity randomly in the proteome, without exposing cells to unnatural compounds. Here, we show that this approach can be used for the differential activation of the stress response in the Endoplasmic Reticulum (ER). As an example of the applications of this method, we have applied it to the identification of human microRNAs activated or repressed during unfolded protein stress.
PMCID: PMC2836549  PMID: 20007146
