1.  Extreme genomic erosion after recurrent demographic bottlenecks in the highly endangered Iberian lynx 
Genome Biology  2016;17:251.
Genomic studies of endangered species provide insights into their evolution and demographic history, reveal patterns of genomic erosion that might limit their viability, and offer tools for their effective conservation. The Iberian lynx (Lynx pardinus) is the most endangered felid and a unique example of a species on the brink of extinction.
We generate the first annotated draft of the Iberian lynx genome and carry out genome-based analyses of lynx demography, evolution, and population genetics. We identify a series of severe population bottlenecks in the history of the Iberian lynx that predate its known demographic decline during the 20th century and have greatly impacted its genome evolution. We observe drastically reduced rates of weak-to-strong substitutions associated with GC-biased gene conversion and increased rates of fixation of transposable elements. We also find multiple signatures of genetic erosion in the two remnant Iberian lynx populations, including a high frequency of potentially deleterious variants and substitutions, as well as the lowest genome-wide genetic diversity reported so far in any species.
The genomic features observed in the Iberian lynx genome may hamper short- and long-term viability through reduced fitness and adaptive potential. The knowledge and resources developed in this study will boost the research on felid evolution and conservation genomics and will benefit the ongoing conservation and management of this emblematic species.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-016-1090-1) contains supplementary material, which is available to authorized users.
PMCID: PMC5155386  PMID: 27964752
Conservation genomics; Genetic diversity; Inbreeding; Genetic drift; Lynx
2.  Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells 
Chen, Lu | Ge, Bing | Casale, Francesco Paolo | Vasquez, Louella | Kwan, Tony | Garrido-Martín, Diego | Watt, Stephen | Yan, Ying | Kundu, Kousik | Ecker, Simone | Datta, Avik | Richardson, David | Burden, Frances | Mead, Daniel | Mann, Alice L. | Fernandez, Jose Maria | Rowlston, Sophia | Wilder, Steven P. | Farrow, Samantha | Shao, Xiaojian | Lambourne, John J. | Redensek, Adriana | Albers, Cornelis A. | Amstislavskiy, Vyacheslav | Ashford, Sofie | Berentsen, Kim | Bomba, Lorenzo | Bourque, Guillaume | Bujold, David | Busche, Stephan | Caron, Maxime | Chen, Shu-Huang | Cheung, Warren | Delaneau, Oliver | Dermitzakis, Emmanouil T. | Elding, Heather | Colgiu, Irina | Bagger, Frederik O. | Flicek, Paul | Habibi, Ehsan | Iotchkova, Valentina | Janssen-Megens, Eva | Kim, Bowon | Lehrach, Hans | Lowy, Ernesto | Mandoli, Amit | Matarese, Filomena | Maurano, Matthew T. | Morris, John A. | Pancaldi, Vera | Pourfarzad, Farzin | Rehnstrom, Karola | Rendon, Augusto | Risch, Thomas | Sharifi, Nilofar | Simon, Marie-Michelle | Sultan, Marc | Valencia, Alfonso | Walter, Klaudia | Wang, Shuang-Yin | Frontini, Mattia | Antonarakis, Stylianos E. | Clarke, Laura | Yaspo, Marie-Laure | Beck, Stephan | Guigo, Roderic | Rico, Daniel | Martens, Joost H.A. | Ouwehand, Willem H. | Kuijpers, Taco W. | Paul, Dirk S. | Stunnenberg, Hendrik G. | Stegle, Oliver | Downes, Kate | Pastinen, Tomi | Soranzo, Nicole
Cell  2016;167(5):1398-1414.e24.
Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.
Graphical Abstract
•Genome, transcriptome, and epigenome reference panel in three human immune cell types
•Identified 4,418 genes associated with epigenetic changes independent of genetics
•Described genome-epigenome coordination defining cell-type-specific regulatory events
•Functionally mapped disease mechanisms at 345 unique autoimmune disease loci
As part of the IHEC consortium, this study integrates genetic, epigenetic, and transcriptomic profiling in three immune cell types from nearly 200 people to characterize the distinct and cooperative contributions of diverse genomic inputs to transcriptional variation. Explore the Cell Press IHEC web portal at
PMCID: PMC5119954  PMID: 27863251
immune; monocyte; neutrophil; t-cell; EWAS; histone modification; DNA methylation; transcription; allele specific; QTL
4.  Identifying ELIXIR Core Data Resources 
F1000Research  2016;5:ELIXIR-2422.
The core mission of ELIXIR is to build a stable and sustainable infrastructure for biological information across Europe. At the heart of this are the data resources, tools and services that ELIXIR offers to the life-sciences community, providing stable and sustainable access to biological data. ELIXIR aims to ensure that these resources are available long-term and that the life-cycles of these resources are managed such that they support the scientific needs of the life-sciences, including biological research.
ELIXIR Core Data Resources are defined as a set of European data resources that are of fundamental importance to the wider life-science community and the long-term preservation of biological data. They are complete collections of generic value to life-science, are considered an authority in their field with respect to one or more characteristics, and show high levels of scientific quality and service. Thus, ELIXIR Core Data Resources are of wide applicability and usage.
This paper describes the structures, governance and processes that support the identification and evaluation of ELIXIR Core Data Resources. It identifies key indicators which reflect the essence of the definition of an ELIXIR Core Data Resource and support the promotion of excellence in resource development and operation. It describes the specific indicators in more detail and explains their application within ELIXIR’s sustainability strategy and science policy actions, and in capacity building, life-cycle management and technical actions.
Establishing the portfolio of ELIXIR Core Data Resources and ELIXIR Services is a key priority for ELIXIR and publicly marks the transition towards a cohesive infrastructure.
PMCID: PMC5070591  PMID: 27803796
ELIXIR; Sustainability; Data resources; Indicators; Capacity building; Infrastructure; Bioinformatics; Life sciences
5.  An expanded evaluation of protein function prediction methods shows an improvement in accuracy 
Jiang, Yuxiang | Oron, Tal Ronnen | Clark, Wyatt T. | Bankapur, Asma R. | D’Andrea, Daniel | Lepore, Rosalba | Funk, Christopher S. | Kahanda, Indika | Verspoor, Karin M. | Ben-Hur, Asa | Koo, Da Chen Emily | Penfold-Brown, Duncan | Shasha, Dennis | Youngs, Noah | Bonneau, Richard | Lin, Alexandra | Sahraeian, Sayed M. E. | Martelli, Pier Luigi | Profiti, Giuseppe | Casadio, Rita | Cao, Renzhi | Zhong, Zhaolong | Cheng, Jianlin | Altenhoff, Adrian | Skunca, Nives | Dessimoz, Christophe | Dogan, Tunca | Hakala, Kai | Kaewphan, Suwisa | Mehryary, Farrokh | Salakoski, Tapio | Ginter, Filip | Fang, Hai | Smithers, Ben | Oates, Matt | Gough, Julian | Törönen, Petri | Koskinen, Patrik | Holm, Liisa | Chen, Ching-Tai | Hsu, Wen-Lian | Bryson, Kevin | Cozzetto, Domenico | Minneci, Federico | Jones, David T. | Chapman, Samuel | BKC, Dukka | Khan, Ishita K. | Kihara, Daisuke | Ofer, Dan | Rappoport, Nadav | Stern, Amos | Cibrian-Uhalte, Elena | Denny, Paul | Foulger, Rebecca E. | Hieta, Reija | Legge, Duncan | Lovering, Ruth C. | Magrane, Michele | Melidoni, Anna N. | Mutowo-Meullenet, Prudence | Pichler, Klemens | Shypitsyna, Aleksandra | Li, Biao | Zakeri, Pooya | ElShal, Sarah | Tranchevent, Léon-Charles | Das, Sayoni | Dawson, Natalie L. | Lee, David | Lees, Jonathan G. | Sillitoe, Ian | Bhat, Prajwal | Nepusz, Tamás | Romero, Alfonso E. | Sasidharan, Rajkumar | Yang, Haixuan | Paccanaro, Alberto | Gillis, Jesse | Sedeño-Cortés, Adriana E. | Pavlidis, Paul | Feng, Shou | Cejuela, Juan M. | Goldberg, Tatyana | Hamp, Tobias | Richter, Lothar | Salamov, Asaf | Gabaldon, Toni | Marcet-Houben, Marina | Supek, Fran | Gong, Qingtian | Ning, Wei | Zhou, Yuanpeng | Tian, Weidong | Falda, Marco | Fontana, Paolo | Lavezzo, Enrico | Toppo, Stefano | Ferrari, Carlo | Giollo, Manuel | Piovesan, Damiano | Tosatto, Silvio C.E. | del Pozo, Angela | Fernández, José M. | Maietta, Paolo | Valencia, Alfonso | Tress, Michael L. 
| Benso, Alfredo | Di Carlo, Stefano | Politano, Gianfranco | Savino, Alessandro | Rehman, Hafeez Ur | Re, Matteo | Mesiti, Marco | Valentini, Giorgio | Bargsten, Joachim W. | van Dijk, Aalt D. J. | Gemovic, Branislava | Glisic, Sanja | Perovic, Vladmir | Veljkovic, Veljko | Veljkovic, Nevena | Almeida-e-Silva, Danillo C. | Vencio, Ricardo Z. N. | Sharan, Malvika | Vogel, Jörg | Kansakar, Lakesh | Zhang, Shanshan | Vucetic, Slobodan | Wang, Zheng | Sternberg, Michael J. E. | Wass, Mark N. | Huntley, Rachael P. | Martin, Maria J. | O’Donovan, Claire | Robinson, Peter N. | Moreau, Yves | Tramontano, Anna | Babbitt, Patricia C. | Brenner, Steven E. | Linial, Michal | Orengo, Christine A. | Rost, Burkhard | Greene, Casey S. | Mooney, Sean D. | Friedberg, Iddo | Radivojac, Predrag
Genome Biology  2016;17(1):184.
A major bottleneck in our understanding of the molecular underpinnings of life is the assignment of function to proteins. While molecular experiments provide the most reliable annotation of proteins, their relatively low throughput and restricted purview have led to an increasing role for computational function prediction. However, assessing methods for protein function prediction and tracking progress in the field remain challenging.
We conducted the second critical assessment of functional annotation (CAFA), a timed challenge to assess computational methods that automatically assign protein function. We evaluated 126 methods from 56 research groups for their ability to predict biological functions using Gene Ontology and gene-disease associations using Human Phenotype Ontology on a set of 3681 proteins from 18 species. CAFA2 featured expanded analysis compared with CAFA1, with regard to data set size, variety, and assessment metrics. To review progress in the field, the analysis compared the best methods from CAFA1 to those of CAFA2.
The top-performing methods in CAFA2 outperformed those from CAFA1. This increased accuracy can be attributed to a combination of the growing number of experimental annotations and improved methods for function prediction. The assessment also revealed that the definition of top-performing algorithms is ontology specific, that different performance metrics can be used to probe the nature of accurate predictions, and the relative diversity of predictions in the biological process and human phenotype ontologies. While there was methodological improvement between CAFA1 and CAFA2, the interpretation of results and usefulness of individual methods remain context-dependent.
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-016-1037-6) contains supplementary material, which is available to authorized users.
PMCID: PMC5015320  PMID: 27604469
Protein function prediction; Disease gene prioritization
6.  The Markyt visualisation, prediction and benchmark platform for chemical and gene entity recognition at BioCreative/CHEMDNER challenge 
Biomedical text mining methods and technologies have improved significantly in the last decade. Considerable efforts have been invested in understanding the main challenges of biomedical literature retrieval and extraction and proposing solutions to problems of practical interest. Most notably, community-oriented initiatives such as the BioCreative challenge have enabled controlled environments for the comparison of automatic systems while pursuing practical biomedical tasks. Under this scenario, the present work describes the Markyt Web-based document curation platform, which has been implemented to support the visualisation, prediction and benchmark of chemical and gene mention annotations at BioCreative/CHEMDNER challenge. Creating this platform is an important step for the systematic and public evaluation of automatic prediction systems and the reusability of the knowledge compiled for the challenge. Markyt was not only critical to support the manual annotation and annotation revision process but also facilitated the comparative visualisation of automated results against the manually generated Gold Standard annotations and comparative assessment of generated results. We expect that future biomedical text mining challenges and the text mining community may benefit from the Markyt platform to better explore and interpret annotations and improve automatic system predictions.
Database URL:,
PMCID: PMC5001550  PMID: 27542845
7.  Integrating epigenomic data and 3D genomic structure with a new measure of chromatin assortativity 
Genome Biology  2016;17:152.
Network analysis is a powerful way of modeling chromatin interactions. Assortativity is a network property used in social sciences to identify factors affecting how people establish social ties. We propose a new approach, using chromatin assortativity, to integrate the epigenomic landscape of a specific cell type with its chromatin interaction network and thus investigate which proteins or chromatin marks mediate genomic contacts.
We use high-resolution promoter capture Hi-C and Hi-Cap data as well as ChIA-PET data from mouse embryonic stem cells to investigate promoter-centered chromatin interaction networks and calculate the presence of specific epigenomic features in the chromatin fragments constituting the nodes of the network. We estimate the association of these features with the topology of four chromatin interaction networks and identify features localized in connected areas of the network. Polycomb group proteins and associated histone marks are the features with the highest chromatin assortativity in promoter-centered networks. We then ask which features distinguish contacts amongst promoters from contacts between promoters and other genomic elements. We observe higher chromatin assortativity of the actively elongating form of RNA polymerase 2 (RNAPII) compared with inactive forms only in interactions between promoters and other elements.
Contacts among promoters and between promoters and other elements have different characteristic epigenomic features. We identify a possible role for the elongating form of RNAPII in mediating interactions among promoters, enhancers, and transcribed gene bodies. Our approach facilitates the study of multiple genome-wide epigenomic profiles, considering network topology and allowing the comparison of chromatin interaction networks.
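The idea of assortativity used above can be illustrated on a toy network. The sketch below is not the authors' implementation: it assumes a common definition of assortativity for a scalar node attribute (the Pearson correlation of the attribute across the two ends of each edge), and the fragment names, feature values, and the function name `assortativity` are all hypothetical.

```python
def assortativity(edges, feature):
    """Pearson correlation of a node feature across the two ends of each edge.

    Each undirected edge is counted in both directions, so the two end
    distributions are identical and the correlation reduces to cov / var.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [feature[u], feature[v]]
        ys += [feature[v], feature[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return float("nan")  # undefined when every node has the same value
    return cov / var


# Hypothetical chromatin fragments with the fraction covered by some mark.
mark = {"A": 1.0, "B": 0.9, "C": 0.1, "D": 0.0}

like_with_like = [("A", "B"), ("C", "D")]  # marked fragments contact each other
mixed = [("A", "C"), ("B", "D")]           # marked fragments contact unmarked ones

print(assortativity(like_with_like, mark))  # positive: the mark is assortative
print(assortativity(mixed, mark))           # negative: the mark is disassortative
```

A feature with a value near 1 tends to sit on fragments that contact each other, which is the kind of signal the abstract reports for Polycomb group proteins.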
Electronic supplementary material
The online version of this article (doi:10.1186/s13059-016-1003-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4939006  PMID: 27391817
Assortativity; 3D genome; Chromatin Interaction Network; Embryonic stem cells; Epigenomics; Promoter Capture Hi-C; Enhancers; Polycomb; RNA polymerase
8.  KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily 
BMC Genomics  2016;17(Suppl 2):396.
The association between aberrant signal processing by protein kinases and human diseases such as cancer was established long ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanistic complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease.
KinMutRF is a novel random-forest method to automatically identify pathogenic variants in human kinases. Twenty-six decision trees implemented as a random forest weigh a battery of features that characterize the variants: a) at the gene level, including membership of a Kinbase group and Gene Ontology terms; b) at the PFAM domain level; and c) at the residue level, the types of amino acids involved, changes in biochemical properties, functional annotations from UniProt, Phospho.ELM and FireDB. KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic. All unclassified variants were excluded from the training set. Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in the training and testing, Kin-Driver (643 variants) and Pon-BTK (1495 variants). Moreover, we provide predictions for the 848 protein kinase variants in UniProt that remained unclassified.
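The reported F-score can be cross-checked against the reported precision and recall, assuming it is the standard F1 measure (their harmonic mean). The sketch below is only such a consistency check; the function name `f_score` is illustrative and is not part of KinMutRF.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract.
reported_precision = 0.82
reported_recall = 0.75

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 2))  # 0.78, matching the reported F-score
```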
A public implementation of KinMutRF, including documentation and examples, is available online ( The source code for local installation is released under a GPL version 3 license, and can be downloaded from
KinMutRF is capable of classifying kinase variation with good performance. Predictions by KinMutRF compare favorably in a benchmark with other state-of-the-art methods (i.e. SIFT, Polyphen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM, and VEST). Kinase-specific features rank as the most informative in terms of information gain and are likely responsible for the improvement in prediction performance. This advocates for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-016-2723-1) contains supplementary material, which is available to authorized users.
PMCID: PMC4928150  PMID: 27357839
Protein kinases; Variant prioritization; Pathogenicity prediction; Functional impact; X-linked agammaglobulinemia
9.  POLE and POLD1 mutations in 529 kindred with familial colorectal cancer and/or polyposis: review of reported cases and recommendations for genetic testing and surveillance 
Genetics in Medicine  2015;18(4):325-332.
Germ-line mutations in the exonuclease domains of POLE and POLD1 have been recently associated with polyposis and colorectal cancer (CRC) predisposition. Here, we aimed to gain a better understanding of the phenotypic characteristics of this syndrome to establish specific criteria for POLE and POLD1 mutation screening and to help define the clinical management of mutation carriers.
Genet Med 18 4, 325–332.
The exonuclease domains of POLE and POLD1 were studied in 529 kindred, 441 with familial nonpolyposis CRC and 88 with polyposis, by using pooled DNA amplification and massively parallel sequencing.
Seven novel or rare genetic variants were identified. In addition to the POLE p.L424V recurrent mutation in a patient with polyposis, CRC and oligodendroglioma, six novel or rare POLD1 variants (four of them, p.D316H, p.D316G, p.R409W, and p.L474P, with strong evidence for pathogenicity) were identified in nonpolyposis CRC families. Phenotypic data from these and previously reported POLE/POLD1 carriers point to an associated phenotype characterized by attenuated or oligo-adenomatous colorectal polyposis, CRC, and probably brain tumors. In addition, POLD1 mutations predispose to endometrial and breast tumors.
Our results widen the phenotypic spectrum of the POLE/POLD1-associated syndrome and identify novel pathogenic variants. We propose guidelines for genetic testing and surveillance recommendations.
PMCID: PMC4823640  PMID: 26133394
adenomatous polyposis; genetic testing; hereditary nonpolyposis colorectal cancer; polymerase proofreading-associated polyposis
11.  Correction to “Analyzing the First Drafts of the Human Proteome” 
Journal of proteome research  2015;14(4):1991.
PMCID: PMC4777874  PMID: 25756922
12.  Most Highly Expressed Protein-Coding Genes Have a Single Dominant Isoform 
Journal of proteome research  2015;14(4):1880-1887.
Although eukaryotic cells express a wide range of alternatively spliced transcripts, it is not clear whether genes tend to express a range of transcripts simultaneously across cells, or produce dominant isoforms in a manner that is either tissue-specific or regardless of tissue. To date, large-scale investigations into the pattern of transcript expression across distinct tissues have produced contradictory results. Here, we attempt to determine whether genes express a dominant splice variant at the protein level. We interrogate peptides from eight large-scale human proteomics experiments and databases and find that there is a single dominant protein isoform, irrespective of tissue or cell type, for the vast majority of the protein-coding genes in these experiments, in partial agreement with the conclusions from the most recent large-scale RNAseq study. Remarkably, the dominant isoforms from the experimental proteomics analyses coincided overwhelmingly with the reference isoforms selected by two completely orthogonal sources, the consensus coding sequence variants, which are agreed upon by separate manual genome curation teams, and the principal isoforms from the APPRIS database, predicted automatically from the conservation of protein sequence, structure, and function.
PMCID: PMC4768900  PMID: 25732134
Large-scale proteomics; RNAseq; Alternative splicing; Dominant isoforms; Protein structure; Protein function
13.  ISCB’s initial reaction to New England Journal of Medicine editorial on data sharing 
F1000Research  2016;5:ISCB Comm J-157.
This message is a response from the ISCB to the recent New England Journal of Medicine (NEJM) editorial on data sharing.
PMCID: PMC4786891  PMID: 26998241
Data sharing; Data reuse; Data repositories; Data archiving; Open data
14.  The potential clinical impact of the release of two drafts of the human proteome 
Expert Review of Proteomics  2015;12(6):579-593.
The authors have carried out an investigation of the two “draft maps of the human proteome” published in 2014 in Nature. The findings include an abundance of poor spectra, low-scoring peptide-spectrum matches and incorrectly identified proteins in both these studies, highlighting clear issues with the application of false discovery rates. This noise means that the claims made by the two papers – the identification of high numbers of protein coding genes, the detection of novel coding regions and the draft tissue maps themselves – should be treated with considerable caution. The authors recommend that clinicians and researchers do not use the unfiltered data from these studies. Despite this, these studies will inspire further investigation into tissue-based proteomics. As long as this future work has proper quality controls, it could help produce a consensus map of the human proteome and improve our understanding of the processes that underlie health and disease.
PMCID: PMC4732427  PMID: 26496066
Clinical applications; false discovery rates; human proteome; protein coding genes; proteomics
15.  Pathway and Network Analysis of Cancer Genomes 
Nature methods  2015;12(7):615-621.
Genomic information on tumors from 50 cancer types catalogued by The International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been great interest in developing pathway and network analysis methods that group genes and illuminate the processes involved. We provide an overview of these analysis techniques and show where they guide mechanistic and translational investigations.
PMCID: PMC4717906  PMID: 26125594
17.  Analyzing the First Drafts of the Human Proteome 
Journal of Proteome Research  2014;13(8):3854-3855.
This letter analyzes two large-scale proteomics studies published in the same issue of Nature. At the time of the release, both studies were portrayed as draft maps of the human proteome and great advances in the field. As with the initial publication of the human genome, these papers have broad appeal and will no doubt lead to a great deal of further analysis by the scientific community. However, we were intrigued by the number of protein-coding genes detected by the two studies, numbers that far exceeded what has been reported for the multinational Human Proteome Project effort. We carried out a simple quality test on the data using the olfactory receptor family. A high-quality proteomics experiment that does not specifically analyze nasal tissues should not expect to detect many peptides for olfactory receptors. Neither of the studies carried out experiments on nasal tissues, yet we found peptide evidence for more than 100 olfactory receptors in the two studies. These results suggest that the two studies are substantially overestimating the number of protein coding genes they identify. We conclude that the experimental data from these two studies should be used with caution.
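The quality test described above amounts to counting olfactory receptor genes among the genes for which peptides were identified. The sketch below shows one way such a check might look. It is not the authors' procedure: it assumes that olfactory receptor genes can be recognised by HGNC-style symbols such as OR1A1 or OR51E2, and the gene list and the function name `olfactory_receptor_hits` are hypothetical.

```python
import re

# Assumption: olfactory receptor symbols look like "OR" + family number
# + subfamily letters + member number, e.g. OR1A1 or OR51E2.
OR_PATTERN = re.compile(r"^OR\d+[A-Z]+\d+$")


def olfactory_receptor_hits(identified_genes):
    """Return the distinct olfactory receptor symbols in a list of gene symbols."""
    return sorted({g for g in identified_genes if OR_PATTERN.match(g)})


# Hypothetical genes with peptide evidence in a non-nasal tissue experiment.
genes = ["TP53", "OR1A1", "ORC1", "OR51E2", "ALB", "OR1A1"]

hits = olfactory_receptor_hits(genes)
print(hits)       # ['OR1A1', 'OR51E2']
print(len(hits))  # 2
```

In an experiment that does not sample nasal tissue, a large count here would point to false-positive identifications, which is the reasoning the letter applies.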
PMCID: PMC4334283  PMID: 25014353
proteomics; Nature; human proteome; protein coding genes; olfactory receptors
18.  Alternatively Spliced Homologous Exons Have Ancient Origins and Are Highly Expressed at the Protein Level 
PLoS Computational Biology  2015;11(6):e1004325.
Alternative splicing of messenger RNA can generate a wide variety of mature RNA transcripts, and these transcripts may produce protein isoforms with diverse cellular functions. While there is much supporting evidence for the expression of alternative transcripts, the same is not true for the alternatively spliced protein products. Large-scale mass spectroscopy experiments have identified evidence of alternative splicing at the protein level, but with conflicting results. Here we carried out a rigorous analysis of the peptide evidence from eight large-scale proteomics experiments to assess the scale of alternative splicing that is detectable by high-resolution mass spectroscopy. We find fewer splice events than would be expected: we identified peptides for almost 64% of human protein coding genes, but detected just 282 splice events. This data suggests that most genes have a single dominant isoform at the protein level. Many of the alternative isoforms that we could identify were only subtly different from the main splice isoform. Very few of the splice events identified at the protein level disrupted functional domains, in stark contrast to the two thirds of splice events annotated in the human genome that would lead to the loss or damage of functional domains. The most striking result was that more than 20% of the splice isoforms we identified were generated by substituting one homologous exon for another. This is significantly more than would be expected from the frequency of these events in the genome. These homologous exon substitution events were remarkably conserved—all the homologous exons we identified evolved over 460 million years ago—and eight of the fourteen tissue-specific splice isoforms we identified were generated from homologous exons. The combination of proteomics evidence, ancient origin and tissue-specific splicing indicates that isoforms generated from homologous exons may have important cellular roles.
Author Summary
Alternative splicing is thought to be one means for generating the protein diversity necessary for the whole range of cellular functions. While the presence of alternatively spliced transcripts in the cell has been amply demonstrated, the same cannot be said for alternatively spliced proteins. The quest for alternative protein isoforms has focused primarily on the analysis of peptides from large-scale mass spectroscopy experiments, but evidence for alternative isoforms has been patchy and contradictory. A careful analysis of the peptide evidence is needed to fully understand the scale of alternative splicing detectable at the protein level. Here we analysed peptides from eight large-scale data sets, identifying just 282 splice events among 12,716 genes. This suggests that most genes have a single dominant isoform. Many of the alternative isoforms that we identified were only subtly different from the main splice variant, and one in five was generated by swapping one homologous exon for another. Remarkably, the alternative isoforms generated from homologous exons were highly conserved, first appearing 460 million years ago, and several appear to have tissue-specific roles in the brain and heart. Our results suggest that these particular isoforms are likely to have important cellular roles.
PMCID: PMC4465641  PMID: 26061177
19.  The UBC-40 Urothelial Bladder Cancer cell line index: a genomic resource for functional studies 
BMC Genomics  2015;16(1):403.
Urothelial bladder cancer is a highly heterogeneous disease. Cancer cell lines are useful tools for its study. This is a comprehensive genomic characterization of 40 urothelial bladder carcinoma (UBC) cell lines including information on origin, mutation status of genes implicated in bladder cancer (FGFR3, PIK3CA, TP53, and RAS), copy number alterations assessed using high density SNP arrays, uniparental disomy (UPD) events, and gene expression.
Based on gene mutation patterns and genomic changes we identify lines representative of the FGFR3-driven tumor pathway and of the TP53/RB tumor suppressor-driven pathway. High-density array copy number analysis identified significant focal gains (1q32, 5p13.1-12, 7q11, and 7q33) and losses (i.e. 6p22.1) in regions altered in tumors but not previously described as affected in bladder cell lines. We also identify new evidence for frequent regions of UPD, often coinciding with regions reported to be lost in tumors. Previously undescribed chromosome X losses found in UBC lines also point to potential tumor suppressor genes. Cell lines representative of the FGFR3-driven pathway showed a lower number of UPD events.
Overall, there is a predominance of more aggressive tumor subtypes among the cell lines. We provide a cell line classification that establishes their relatedness to the major molecularly-defined bladder tumor subtypes. The compiled information should serve as a useful reference to the bladder cancer research community and should help to select cell lines appropriate for the functional analysis of bladder cancer genes, for example those being identified through massive parallel sequencing.
Electronic supplementary material
The online version of this article (doi:10.1186/s12864-015-1450-3) contains supplementary material, which is available to authorized users.
PMCID: PMC4470036  PMID: 25997541
Urothelial bladder cancer; Cell line; Genomics; Mutation; Oncogene; Tumor suppressor
20.  APPRIS WebServer and WebServices 
Nucleic Acids Research  2015;43(Web Server issue):W455-W459.
This paper introduces the APPRIS WebServer ( and WebServices ( Both the web servers and the web services are based around the APPRIS Database, a database that presently houses annotations of splice isoforms for five different vertebrate genomes. The APPRIS WebServer and WebServices provide access to the computational methods implemented in the APPRIS Database, while the APPRIS WebServices also allows retrieval of the annotations. The APPRIS WebServer and WebServices annotate splice isoforms with protein structural and functional features, and with data from cross-species alignments. In addition they can use the annotations of structure, function and conservation to select a single reference isoform for each protein-coding gene (the principal protein isoform). APPRIS principal isoforms have been shown to agree overwhelmingly with the main protein isoform detected in proteomics experiments. The APPRIS WebServer allows for the annotation of splice isoforms for individual genes, and provides a range of visual representations and tools to allow researchers to identify the likely effect of splicing events. The APPRIS WebServices permit users to generate annotations automatically in high throughput mode and to interrogate the annotations in the APPRIS Database. The APPRIS WebServices have been implemented using REST architecture to be flexible, modular and automatic.
PMCID: PMC4489225  PMID: 25990727
21.  NOTCH pathway inactivation promotes bladder cancer progression 
NOTCH signaling suppresses tumor growth and proliferation in several types of stratified epithelia. Here, we show that missense mutations in NOTCH1 and NOTCH2 found in human bladder cancers result in loss of function. In murine models, genetic ablation of the NOTCH pathway accelerated bladder tumorigenesis and promoted the formation of squamous cell carcinomas, with areas of mesenchymal features. Using bladder cancer cells, we determined that the NOTCH pathway stabilizes the epithelial phenotype through its effector HES1 and, consequently, loss of NOTCH activity favors the process of epithelial-mesenchymal transition. Evaluation of human bladder cancer samples revealed that tumors with low levels of HES1 present mesenchymal features and are more aggressive. Together, our results indicate that NOTCH serves as a tumor suppressor in the bladder and that loss of this pathway promotes mesenchymal and invasive features.
PMCID: PMC4319408  PMID: 25574842
22.  The Evolutionary Fate of Alternatively Spliced Homologous Exons after Gene Duplication 
Genome Biology and Evolution  2015;7(6):1392-1403.
Alternative splicing and gene duplication are the two main processes responsible for expanding protein functional diversity. Although gene duplication can generate new genes and alternative splicing can introduce variation through alternative gene products, the interplay between the two processes is complex and poorly understood. Here, we have carried out a study of the evolution of alternatively spliced exons after gene duplication to better understand the interaction between the two processes. We created a manually curated set of 97 human genes with mutually exclusively spliced homologous exons and analyzed the evolution of these exons across five distantly related vertebrates (lamprey, spotted gar, zebrafish, fugu, and coelacanth). Most of these exons had an ancient origin (more than 400 Ma). We found examples supporting two extreme evolutionary models for the behaviour of homologous exons after gene duplication. We observed 11 events in which gene duplication was accompanied by splice isoform separation, that is, each paralog specifically conserved just one distinct ancestral homologous exon. At the other extreme, we identified genes in which the homologous exons were always conserved within paralogs, suggesting that the alternative splicing event cannot easily be separated from the function in these genes. That many homologous exons fall in between these two extremes highlights the diversity of biological systems and suggests that the subtle balance between alternative splicing and gene duplication is adjusted to the specific cellular context of each gene.
PMCID: PMC4494069  PMID: 25931610
alternative splicing; gene duplication; protein diversity; homologous exons; subfunctionalization
23.  Structure-PPi: a module for the annotation of cancer-related single-nucleotide variants at protein–protein interfaces 
Bioinformatics  2015;31(14):2397-2399.
Motivation: The interpretation of cancer-related single-nucleotide variants (SNVs) considering the protein features they affect, such as known functional sites, protein–protein interfaces, or relation with already annotated mutations, might complement the annotation of genetic variants in the analysis of NGS data. Current tools that annotate mutations fall short on several aspects, including the ability to use protein structure information or the interpretation of mutations in protein complexes.
Results: We present the Structure–PPi system for the comprehensive analysis of coding SNVs based on 3D protein structures of protein complexes. The 3D repository used, Interactome3D, includes experimental and modeled structures for proteins and protein–protein complexes. Structure–PPi annotates SNVs with features extracted from UniProt, InterPro, APPRIS, dbNSFP and COSMIC databases. We illustrate the usefulness of Structure–PPi with the interpretation of 1 027 122 non-synonymous SNVs from COSMIC and the 1000G Project that provides a collection of ∼172 700 SNVs mapped onto the protein 3D structure of 8726 human proteins (43.2% of the 20 214 SwissProt-curated proteins in UniProtKB release 2014_06) and protein–protein interfaces with potential functional implications.
Availability and implementation: Structure–PPi, along with a user manual and examples, is available online, as is the code for local installations.
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC4495296  PMID: 25765346
24.  Alternative splicing and co-option of transposable elements: the case of TMPO/LAP2α and ZNF451 in mammals 
Bioinformatics  2015;31(14):2257-2261.
Summary: Transposable elements constitute a large fraction of vertebrate genomes and, during evolution, may be co-opted for new functions. Exonization of transposable elements inserted within or close to host genes is one possible way to generate new genes, and alternative splicing of the new exons may represent an intermediate step in this process. The genes TMPO and ZNF451 are present in all vertebrate lineages. Although they are not evolutionarily related, mammalian TMPO and ZNF451 do have something in common—they both code for splice isoforms that contain LAP2alpha domains. We found that these LAP2alpha domains have sequence similarity to repetitive sequences in non-mammalian genomes, which are in turn related to the first ORF from a DIRS1-like retrotransposon. This retrotransposon domestication happened separately and resulted in proteins that combine retrotransposon and host protein domains. The alternative splicing of the retrotransposed sequence allowed the production of both the new and the untouched original isoforms, which may have contributed to the success of the colonization process. The LAP2alpha-specific isoform of TMPO (LAP2α) has been co-opted for important roles in the cell, whereas the ZNF451 LAP2alpha isoform is evolving under strong purifying selection but remains uncharacterized.
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC4495291  PMID: 25735770
25.  Integrated Next-Generation Sequencing and Avatar Mouse Models for Personalized Cancer Treatment 
Current technology permits an unbiased massive analysis of somatic genetic alterations from tumor DNA as well as the generation of individualized mouse xenografts (Avatar models). This work aimed to evaluate our experience integrating these two strategies to personalize the treatment of patients with cancer.
We performed whole-exome sequencing analysis of 25 patients with advanced solid tumors to identify putatively actionable tumor-specific genomic alterations. Avatar models were used as an in vivo platform to test proposed treatment strategies.
Successful exome sequencing analyses have been obtained for 23 patients. Tumor-specific mutations and copy-number variations were identified. All samples profiled contained relevant genomic alterations. Tumor was implanted to create an Avatar model from 14 patients and 10 succeeded. Occasionally, actionable alterations such as mutations in NF1, PI3KA, and DDR2 failed to provide any benefit when a targeted drug was tested in the Avatar and, accordingly, treatment of the patients with these drugs was not effective. To date, 13 patients have received a personalized treatment and 6 achieved durable partial remissions. Prior testing of candidate treatments in Avatar models correlated with clinical response and helped to select empirical treatments in some patients with no actionable mutations.
The use of full genomic analysis for cancer care is encouraging but presents important challenges that will need to be solved for broad clinical application. Avatar models are a promising investigational platform for therapeutic decision making. While limitations still exist, this strategy should be further tested.
PMCID: PMC4322867  PMID: 24634382
