1.  Integrated Next-Generation Sequencing and Avatar Mouse Models for Personalized Cancer Treatment 
Background
Current technology permits an unbiased massive analysis of somatic genetic alterations from tumor DNA as well as the generation of individualized mouse xenografts (Avatar models). This work aimed to evaluate our experience integrating these two strategies to personalize the treatment of patients with cancer.
Methods
We performed whole-exome sequencing analysis of 25 patients with advanced solid tumors to identify putatively actionable tumor-specific genomic alterations. Avatar models were used as an in vivo platform to test proposed treatment strategies.
Results
Successful exome sequencing analyses have been obtained for 23 patients. Tumor-specific mutations and copy-number variations were identified. All samples profiled contained relevant genomic alterations. Tumors from 14 patients were implanted to create Avatar models, and 10 succeeded. Occasionally, actionable alterations such as mutations in NF1, PI3KA, and DDR2 failed to provide any benefit when a targeted drug was tested in the Avatar and, accordingly, treatment of the patients with these drugs was not effective. To date, 13 patients have received a personalized treatment and 6 achieved durable partial remissions. Prior testing of candidate treatments in Avatar models correlated with clinical response and helped to select empirical treatments in some patients with no actionable mutations.
Conclusion
The use of full genomic analysis for cancer care is encouraging but presents important challenges that will need to be solved for broad clinical application. Avatar models are a promising investigational platform for therapeutic decision making. While limitations still exist, this strategy should be further tested.
doi:10.1158/1078-0432.CCR-13-3047
PMCID: PMC4322867  PMID: 24634382
2.  Higher gene expression variability in the more aggressive subtype of chronic lymphocytic leukemia 
Genome Medicine  2015;7(1):8.
Background
Chronic lymphocytic leukemia (CLL) presents two subtypes with drastically different clinical outcomes: IgVH mutated (M-CLL) and IgVH unmutated (U-CLL). So far, these two subtypes have not been associated with clear differences in gene expression profiles. Interestingly, recent results have highlighted important roles for heterogeneity, at both the genetic and the epigenetic level, in CLL progression.
Methods
We analyzed gene expression data of two large cohorts of CLL patients and quantified expression variability across individuals to investigate differences between the two subtypes using different measures and statistical tests. Functional significance was explored by pathway enrichment and network analyses. Furthermore, we implemented a random forest approach based on expression variability to classify patients into disease subtypes.
Results
We found that U-CLL, the more aggressive type of the disease, shows significantly increased variability of gene expression across patients and that, overall, genes that show higher variability in the aggressive subtype are related to cell cycle, development and inter-cellular communication. These functions indicate a potential relation between gene expression variability and the faster progression of this CLL subtype. Finally, a classifier based on gene expression variability was able to correctly predict the disease subtype of CLL patients.
Conclusions
There are strong relations between gene expression variability and disease subtype, linking significantly increased expression variability to phenotypes such as aggressiveness and resistance to therapy in CLL.
Electronic supplementary material
The online version of this article (doi:10.1186/s13073-014-0125-z) contains supplementary material, which is available to authorized users.
doi:10.1186/s13073-014-0125-z
PMCID: PMC4308895  PMID: 25632304
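The idea of quantifying expression variability across individuals within each subtype can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract mentions "different measures and statistical tests" without naming them, so the coefficient of variation used here is an assumption, and the gene names, patient identifiers and expression values are invented toy data.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (undefined for a zero mean)."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return stdev(values) / m


def variability_by_group(expression, groups):
    """Return {gene: {group: CV}} given expression[gene][sample] values."""
    result = {}
    for gene, per_sample in expression.items():
        result[gene] = {
            group: coefficient_of_variation([per_sample[s] for s in samples])
            for group, samples in groups.items()
        }
    return result


# Toy data, invented for illustration only.
expression = {
    "GENE_A": {"p1": 10.0, "p2": 10.5, "p3": 9.5, "p4": 4.0, "p5": 16.0, "p6": 10.0},
    "GENE_B": {"p1": 5.0, "p2": 5.2, "p3": 4.8, "p4": 5.1, "p5": 4.9, "p6": 5.0},
}
groups = {"M-CLL": ["p1", "p2", "p3"], "U-CLL": ["p4", "p5", "p6"]}

cv = variability_by_group(expression, groups)
# Genes whose expression varies more across U-CLL patients than across M-CLL patients.
more_variable_in_u = [g for g, v in cv.items() if v["U-CLL"] > v["M-CLL"]]
```

A per-gene variability table of this kind could then serve as the feature set for a classifier, which is the role the abstract assigns to its random forest.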
3.  Integration of biological data by kernels on graph nodes allows prediction of new genes involved in mitotic chromosome condensation 
Molecular Biology of the Cell  2014;25(16):2522-2536.
A gene function prediction method suitable for the design of targeted RNAi libraries is described and used to predict chromosome condensation genes. Systematic experimental validation of candidate genes in a focused RNAi screen by automated microscopy and quantitative image analysis reveals many new chromosome condensation factors.
The advent of genome-wide RNA interference (RNAi)–based screens puts us in the position to identify genes for all functions human cells carry out. However, for many functions, assay complexity and cost make genome-scale knockdown experiments impossible. Methods to predict genes required for cell functions are therefore needed to focus RNAi screens from the whole genome on the most likely candidates. Although different bioinformatics tools for gene function prediction exist, they lack experimental validation and are therefore rarely used by experimentalists. To address this, we developed an effective computational gene selection strategy that represents public data about genes as graphs and then analyzes these graphs using kernels on graph nodes to predict functional relationships. To demonstrate its performance, we predicted human genes required for a poorly understood cellular function—mitotic chromosome condensation—and experimentally validated the top 100 candidates with a focused RNAi screen by automated microscopy. Quantitative analysis of the images demonstrated that the candidates were indeed strongly enriched in condensation genes, including the discovery of several new factors. By combining bioinformatics prediction with experimental validation, our study shows that kernels on graph nodes are powerful tools to integrate public biological data and predict genes involved in cellular functions of interest.
doi:10.1091/mbc.E13-04-0221
PMCID: PMC4142622  PMID: 24943848
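To make "kernels on graph nodes" concrete, the sketch below builds a diffusion kernel on a small gene graph and ranks non-seed genes by their kernel similarity to known genes. It is only an illustration under stated assumptions: the abstract does not say which kernel the authors used, so the diffusion kernel K = exp(-beta * L), the value of `beta`, the scoring rule and the node names are all hypothetical choices.

```python
def laplacian(nodes, edges):
    """Graph Laplacian L = D - A of an undirected, unweighted graph."""
    idx = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    lap = [[0.0] * size for _ in range(size)]
    for a, b in edges:
        i, j = idx[a], idx[b]
        lap[i][j] -= 1.0
        lap[j][i] -= 1.0
        lap[i][i] += 1.0
        lap[j][j] += 1.0
    return lap


def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def diffusion_kernel(lap, beta=0.5, terms=30):
    """Approximate K = exp(-beta * L) with a truncated Taylor series."""
    size = len(lap)
    scaled = [[-beta * x for x in row] for row in lap]
    kernel = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    term = [row[:] for row in kernel]
    for k in range(1, terms + 1):
        # term now holds (-beta * L)^k / k!
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(size)] for i in range(size)]
    return kernel


def rank_candidates(nodes, kernel, seeds):
    """Order non-seed nodes by summed kernel similarity to the seed nodes."""
    idx = {name: i for i, name in enumerate(nodes)}
    scores = {name: sum(kernel[idx[name]][idx[s]] for s in seeds)
              for name in nodes if name not in seeds}
    return sorted(scores, key=scores.get, reverse=True)


# Toy graph: a chain seed1 - seed2 - near - far, plus one unconnected gene.
nodes = ["seed1", "seed2", "near", "far", "isolated"]
edges = [("seed1", "seed2"), ("seed2", "near"), ("near", "far")]
K = diffusion_kernel(laplacian(nodes, edges))
ranking = rank_candidates(nodes, K, seeds={"seed1", "seed2"})
```

In a setting like the one described, the seeds would be genes already known to take part in the function of interest, and the top of the ranking would define the focused RNAi library.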
4.  Computational approaches to identify functional genetic variants in cancer genomes 
Nature methods  2013;10(8):723-729.
The International Cancer Genome Consortium (ICGC) aims to catalog genomic abnormalities in tumors from 50 different cancer types. Genome sequencing reveals hundreds to thousands of somatic mutations in each tumor, but only a minority drive tumor progression. We present the result of discussions within the ICGC on how to address the challenge of identifying mutations that contribute to oncogenesis, tumor maintenance or response to therapy, and recommend computational techniques to annotate somatic variants and predict their impact on cancer phenotype.
doi:10.1038/nmeth.2562
PMCID: PMC3919555  PMID: 23900255
5.  The pseudo GTPase CENP-M drives human kinetochore assembly 
eLife  2014;3:e02978.
Kinetochores, multi-subunit complexes that assemble at the interface with centromeres, bind spindle microtubules to ensure faithful delivery of chromosomes during cell division. The configuration and function of the kinetochore–centromere interface is poorly understood. We report that a protein at this interface, CENP-M, is structurally and evolutionarily related to small GTPases but is incapable of GTP-binding and conformational switching. We show that CENP-M is crucially required for the assembly and stability of a tetramer also comprising CENP-I, CENP-H, and CENP-K, the HIKM complex, which we extensively characterize through a combination of structural, biochemical, and cell biological approaches. A point mutant affecting the CENP-M/CENP-I interaction hampers kinetochore assembly and chromosome alignment and prevents kinetochore recruitment of the CENP-T/W complex, questioning a role of CENP-T/W as founder of an independent axis of kinetochore assembly. Our studies identify a single pathway having CENP-C as founder, and CENP-H/I/K/M and CENP-T/W as CENP-C-dependent followers.
DOI: http://dx.doi.org/10.7554/eLife.02978.001
eLife digest
When a human cell divides to make new cells, its 46 chromosomes must be replicated and then separated evenly between the two daughter cells. The process of separation is performed by the spindle—a network of fibres that form inside the cell, attach to the chromosomes and pull the copies to the opposite ends of the cell.
The spindle fibres attach to a structure called a kinetochore, which forms at a region of the chromosomes called the centromere. The kinetochore has a layered structure with multiple copies of many proteins, and the inner layer is composed of at least 16 centromeric proteins. These proteins interact directly with the centromere and influence the formation of the rest of the kinetochore and the spindle fibres. While some of the interactions between centromeric proteins have been uncovered, the roles of several of them—including one called CENP-M—remain unknown.
Now, Basilico, Maffini, Weir et al. reveal that CENP-M is essential for assembling and stabilizing the inner layer of the kinetochore. However, while it is structurally and evolutionarily related to enzymes called GTPases, CENP-M is not an enzyme. Instead, the CENP-M protein interacts with three other centromeric proteins to form a complex that becomes part of the inner layer of the kinetochore. Basilico, Maffini, Weir et al. also find that another centromeric protein, CENP-C, appears to start the assembly of the inner layer. This protein then recruits two complexes made of other centromeric proteins to the kinetochore, including the complex that contains CENP-M.
The next challenge will be to reconstitute larger protein complexes that contain more proteins from the inner layer of the kinetochore, so that these assemblies can be studied in greater detail. It will also be important to investigate how CENP-C acts as a scaffold to organize the interface between the kinetochore and the centromere.
DOI: http://dx.doi.org/10.7554/eLife.02978.002
doi:10.7554/eLife.02978
PMCID: PMC4080450  PMID: 25006165
centromeres; kinetochores; mitosis; human
6.  Multiple evidence strands suggest that there may be as few as 19 000 human protein-coding genes 
Human Molecular Genetics  2014;23(22):5866-5878.
Determining the full complement of protein-coding genes is a key goal of genome annotation. The most powerful approach for confirming protein-coding potential is the detection of cellular protein expression through peptide mass spectrometry (MS) experiments. Here, we mapped peptides detected in seven large-scale proteomics studies to almost 60% of the protein-coding genes in the GENCODE annotation of the human genome. We found a strong relationship between detection in proteomics experiments and both gene family age and cross-species conservation. Most of the genes for which we detected peptides were highly conserved. We found peptides for >96% of genes that evolved before bilateria. At the opposite end of the scale, we identified almost no peptides for genes that have appeared since primates, for genes that did not have any protein-like features or for genes with poor cross-species conservation. These results motivated us to describe a set of 2001 potential non-coding genes based on features such as weak conservation, a lack of protein features, or ambiguous annotations from major databases, all of which correlated with low peptide detection across the seven experiments. We identified peptides for just 3% of these genes. We show that many of these genes behave more like non-coding genes than protein-coding genes and suggest that most are unlikely to code for proteins under normal circumstances. We believe that their inclusion in the human protein-coding gene catalogue should be revised as part of the ongoing human genome annotation effort.
doi:10.1093/hmg/ddu309
PMCID: PMC4204768  PMID: 24939910
7.  Recurrent inactivation of STAG2 in bladder cancer is not associated with aneuploidy 
Nature genetics  2013;45(12):10.1038/ng.2799.
Urothelial bladder cancer (UBC) is heterogeneous at the clinical, pathological, and genetic levels. Tumor invasiveness (T) and grade (G) are the main factors associated with outcome and determine patient management (1). A discovery exome sequencing screen (n=17), followed by a prevalence screen (n=60), identified new genes mutated in this tumor coding for proteins involved in chromatin modification (MLL2, ASXL2, BPTF), cell division (STAG2, SMC1A, SMC1B), and DNA repair (ATM, ERCC2, FANCA). STAG2, a subunit of cohesin, was significantly and commonly mutated/lost in UBC, mainly in tumors of low stage/grade, and its loss was associated with improved outcome. Loss of expression was often observed in chromosomally-stable tumors and STAG2 knockdown in bladder cancer cells did not increase aneuploidy. STAG2 reintroduction in non-expressing cells led to reduced colony formation. Our findings indicate that STAG2 is a novel UBC tumor suppressor acting through mechanisms that are different from its role to prevent aneuploidy.
doi:10.1038/ng.2799
PMCID: PMC3840052  PMID: 24121791
exome sequencing; bladder cancer; STAG2; cohesin; aneuploidy; tumor suppressor
9.  Transcriptional dissection of pancreatic tumors engrafted in mice 
Genome Medicine  2014;6(4):27.
Background
Engraftment of primary pancreas ductal adenocarcinomas (PDAC) in mice to generate patient-derived xenograft (PDX) models is a promising platform for biological and therapeutic studies in this disease. However, these models are still incompletely characterized. Here, we measured the impact of the murine tumor environment on the gene expression of the engrafted human tumoral cells.
Methods
We analyzed gene expression profiles from 35 new PDX models and compared them with previously published microarray data from 18 PDX models, 53 primary tumors and 41 cell lines from PDAC. The results obtained in the PDAC system were further compared with publicly available microarray data from 42 PDX models, 108 primary tumors and 32 cell lines from hepatocellular carcinoma (HCC). We developed a robust analysis protocol to explore the gene expression space. In addition, we completed the analysis with a functional characterization of PDX models, including whether changes were caused by the murine environment or by serial passaging.
Results
Our results showed that PDX models derived from PDAC or HCC were clearly different from the cell lines derived from the same cancer tissues. Indeed, PDAC- and HCC-derived cell lines are indistinguishable from each other based on their gene expression profiles. In contrast, the transcriptomes of PDAC and HCC PDX models can be separated into two different groups that share some partial similarity with their corresponding original primary tumors. Our results point to the lack of human stromal involvement in PDXs as a major factor contributing to their differences from the original primary tumors. The main functional differences between pancreatic PDX models and human PDAC are the lower expression of genes involved in pathways related to extracellular matrix and hemostasis and the upregulation of cell cycle genes. Importantly, most of these differences are detected in the first passages after tumor engraftment.
Conclusions
Our results suggest that PDX models of PDAC and HCC retain, to some extent, a gene expression memory of the original primary tumors, while this pattern is not detected in conventional cancer cell lines. Expression changes in PDXs are mainly related to pathways reflecting the lack of human infiltrating cells and the adaptation to a new environment. We also provide evidence of the stability of gene expression patterns over subsequent passages, indicating early phases of the adaptation process.
doi:10.1186/gm544
PMCID: PMC4062047  PMID: 24739241
10.  Late-replicating CNVs as a source of new genes 
Biology Open  2014;3(3):231.
doi:10.1242/bio.20147815
PMCID: PMC4001237  PMID: 24596403
11.  Molecular Evidence for the Inverse Comorbidity between Central Nervous System Disorders and Cancers Detected by Transcriptomic Meta-analyses 
PLoS Genetics  2014;10(2):e1004173.
There is epidemiological evidence that patients with certain Central Nervous System (CNS) disorders have a lower than expected probability of developing some types of Cancer. We tested here the hypothesis that this inverse comorbidity is driven by molecular processes that are common to CNS disorders and Cancers and that are deregulated in opposite directions. We conducted transcriptomic meta-analyses of three CNS disorders (Alzheimer's disease, Parkinson's disease and Schizophrenia) and three Cancer types (Lung, Prostate, Colorectal) previously described with inverse comorbidities. A significant overlap was observed between the genes upregulated in CNS disorders and downregulated in Cancers, as well as between the genes downregulated in CNS disorders and upregulated in Cancers. We also observed expression deregulations in opposite directions at the level of pathways. Our analysis points to specific genes and pathways, the upregulation of which could increase the incidence of CNS disorders and simultaneously lower the risk of developing Cancer, while the downregulation of another set of genes and pathways could contribute to a decrease in the incidence of CNS disorders while increasing the Cancer risk. These results reinforce the previously proposed involvement of the PIN1 gene, Wnt and P53 pathways, and reveal potential new candidates, in particular related to protein degradation processes.
Author Summary
A lower-than-expected probability of developing certain types of Cancer has been observed in patients with CNS disorders, including Alzheimer's disease, Parkinson's disease or Schizophrenia. Understanding such a protective effect could be the key to finding novel treatments for both types of conditions, for instance thanks to drug repurposing. However, little is known about the underlying mechanisms for these intriguing inverse comorbidities. Although environmental causes, drug treatments or lower screening surveys might contribute to the inverse comorbidity between complex disorders, we propose that inverse comorbidity is, at least in part, due to genetic factors.
We observe here that a common set of genes and biological processes are deregulated in opposite directions in CNS disorders and Cancers, i.e. upregulated in CNS disorders and downregulated in Cancers, or vice versa. We propose the alluring hypothesis that the deregulation of these genes and processes could promote CNS disorders and simultaneously lower the initiation or progression of Cancers.
doi:10.1371/journal.pgen.1004173
PMCID: PMC3930576  PMID: 24586201
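A "significant overlap" between two gene lists, such as genes upregulated in one condition and downregulated in another, is commonly assessed with a hypergeometric tail probability. The sketch below shows that calculation; it is an assumption that a test of this kind fits the setting, since the abstract does not state which test was used, and the numbers in the usage lines are invented.

```python
from math import comb


def overlap_p_value(universe_size, set_a_size, set_b_size, overlap):
    """Probability of seeing at least `overlap` shared genes by chance.

    Treats set B as a random draw of `set_b_size` genes from a universe of
    `universe_size` genes, of which `set_a_size` belong to set A.
    """
    max_overlap = min(set_a_size, set_b_size)
    total = comb(universe_size, set_b_size)
    tail = sum(
        comb(set_a_size, k) * comb(universe_size - set_a_size, set_b_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Invented toy numbers: 20 genes measured, two lists of 5 genes each.
p_no_overlap_required = overlap_p_value(20, 5, 5, 0)   # trivially 1.0
p_complete_overlap = overlap_p_value(20, 5, 5, 5)      # one favourable draw out of comb(20, 5)
```

The choice of universe matters: it should contain only genes that could have appeared in either list, for example those measured on every platform in the meta-analysis.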
12.  BioC: a minimalist approach to interoperability for biomedical text processing 
A vast amount of scientific information is encoded in natural language text, and the quantity of such text has become so great that it is no longer economically feasible to have a human as the first step in the search process. Natural language processing and text mining tools have become essential to facilitate the search for and extraction of information from text. This has led to vigorous research efforts to create useful tools and to create humanly labeled text corpora, which can be used to improve such tools. To encourage combining these efforts into larger, more powerful and more capable systems, a common interchange format to represent, store and exchange the data in a simple manner between different language processing systems and text mining tools is highly desirable. Here we propose a simple extensible mark-up language format to share text documents and annotations. The proposed annotation approach allows a large number of different annotations to be represented including sentences, tokens, parts of speech, named entities such as genes or diseases and relationships between named entities. In addition, we provide simple code to hold this data, read it from and write it back to extensible mark-up language files and perform some sample processing. We also describe completed as well as ongoing work to apply the approach in several directions. Code and data are available at http://bioc.sourceforge.net/.
Database URL: http://bioc.sourceforge.net/
doi:10.1093/database/bat064
PMCID: PMC3889917  PMID: 24048470
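As an illustration of what sharing text and annotations in a simple XML format can look like, the sketch below reads annotations from a small document using only the Python standard library. The element and attribute names (`collection`, `document`, `passage`, `annotation`, `infon`, `location`) reflect one reading of the BioC format and should be treated as assumptions to check against the files at the project URL above; the sample text is invented, and this is not the code distributed by the BioC authors.

```python
import xml.etree.ElementTree as ET

# Invented sample; the element names are assumed, see the note above.
SAMPLE = """<collection>
  <source>example</source>
  <date>2013-01-01</date>
  <key>example.key</key>
  <document>
    <id>doc1</id>
    <passage>
      <offset>0</offset>
      <text>GeneX regulates GeneY.</text>
      <annotation id="a1">
        <infon key="type">gene</infon>
        <location offset="0" length="5"/>
        <text>GeneX</text>
      </annotation>
    </passage>
  </document>
</collection>"""


def read_annotations(xml_text):
    """Return (document id, annotation id, type, covered text) tuples."""
    root = ET.fromstring(xml_text)
    rows = []
    for document in root.iter("document"):
        doc_id = document.findtext("id")
        for passage in document.iter("passage"):
            passage_offset = int(passage.findtext("offset"))
            passage_text = passage.findtext("text") or ""
            for ann in passage.iter("annotation"):
                loc = ann.find("location")
                # Locations are taken to be document-level offsets, so shift
                # them by the passage offset before slicing the passage text.
                start = int(loc.get("offset")) - passage_offset
                end = start + int(loc.get("length"))
                ann_type = next(
                    (i.text for i in ann.iter("infon") if i.get("key") == "type"),
                    None,
                )
                rows.append((doc_id, ann.get("id"), ann_type, passage_text[start:end]))
    return rows


annotations = read_annotations(SAMPLE)
```

Recovering the covered text from offsets, rather than trusting the annotation's own text element, is a cheap consistency check when exchanging files between tools.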
13.  wKinMut: An integrated tool for the analysis and interpretation of mutations in human protein kinases 
BMC Bioinformatics  2013;14:345.
Background
Protein kinases are involved in relevant physiological functions, and a large number of mutations in this superfamily have been reported in the literature to affect protein function and stability. Unfortunately, exploring the phenotypic consequences of each individual mutation remains a considerable challenge.
Results
The wKinMut web server offers direct prediction of the potential pathogenicity of mutations using a number of methods, including our recently developed prediction method, which combines information from a range of diverse sources: physicochemical properties; functional annotations from FireDB and Swissprot; kinase-specific characteristics such as membership in specific kinase groups, annotation with disease-associated GO terms and the occurrence of the mutation in PFAM domains; and the relevance of the residues in determining kinase subfamily specificity, from S3Det. This predictor yields interesting results that compare favourably with other methods in the field when applied to protein kinases.
Together with the predictions, wKinMut offers a number of integrated services for the analysis of mutations. These include: the classification of the kinase, information about associations of the kinase with other proteins extracted from iHop, the mapping of the mutations onto PDB structures, pathogenicity records from a number of databases and the classification of mutations in large-scale cancer studies. Importantly, wKinMut is connected with the SNP2L system that extracts mentions of mutations directly from the literature, and therefore increases the possibilities of finding interesting functional information associated with the studied mutations.
Conclusions
wKinMut facilitates the exploration of the information available about individual mutations by integrating prediction approaches with the automatic extraction of information from the literature (text mining) and several state-of-the-art databases.
wKinMut has been used during the last year for the analysis of the consequences of mutations in the context of a number of cancer genome projects, including the recent analysis of Chronic Lymphocytic Leukemia cases and is publicly available at http://wkinmut.bioinfo.cnio.es.
doi:10.1186/1471-2105-14-345
PMCID: PMC3879071  PMID: 24289158
14.  FireDB: a compendium of biological and pharmacologically relevant ligands 
Nucleic Acids Research  2013;42(Database issue):D267-D272.
FireDB (http://firedb.bioinfo.cnio.es) is a curated inventory of catalytic and biologically relevant small ligand-binding residues culled from the protein structures in the Protein Data Bank. Here we present the important new additions since the publication of FireDB in 2007. The database now contains an extensive list of manually curated biologically relevant compounds. Biologically relevant compounds are informative because of their role in protein function, but they are only a small fraction of the entire ligand set. For the remaining ligands, the FireDB provides cross-references to the annotations from publicly available biological, chemical and pharmacological compound databases. FireDB now has external references for 95% of contacting small ligands, making FireDB a more complete database and providing the scientific community with easy access to the pharmacological annotations of PDB ligands. In addition to the manual curation of ligands, FireDB also provides insights into the biological relevance of individual binding sites. Here, biological relevance is calculated from the multiple sequence alignments of related binding sites that are generated from all-against-all comparison of each FireDB binding site. The database can be accessed by RESTful web services and is available for download via MySQL.
doi:10.1093/nar/gkt1127
PMCID: PMC3965074  PMID: 24243844
15.  Late-replicating CNVs as a source of new genes 
Biology Open  2013;2(12):1402-1411.
Summary
Asynchronous replication of the genome has been associated with different rates of point mutation and copy number variation (CNV) in human populations. Here, our aim was to investigate whether the bias in the generation of CNV that is associated with DNA replication timing might have conditioned the birth of new protein-coding genes during evolution. We show that genes that were duplicated during primate evolution are more commonly found among the human genes located in late-replicating CNV regions. We traced the relationship between replication timing and the evolutionary age of duplicated genes. Strikingly, we found that there is a significant enrichment of evolutionarily younger duplicates in late-replicating regions of the human and mouse genomes. Indeed, the presence of duplicates in late-replicating regions gradually decreases as the evolutionary time since duplication extends. Our results suggest that the accumulation of recent duplications in late-replicating CNV regions is an active process influencing genome evolution.
doi:10.1242/bio.20136924
PMCID: PMC3863426  PMID: 24285712
CNV; DNA replication timing; Duplicated genes; Evolution
16.  CheNER: chemical named entity recognizer 
Bioinformatics  2013;30(7):1039-1040.
Motivation: Chemical named entity recognition is used to automatically identify mentions of chemical compounds in text and is the basis for more elaborate information extraction. However, only a small number of applications are freely available to identify such mentions. Particularly challenging and useful is the identification of International Union of Pure and Applied Chemistry (IUPAC) chemical compounds, which, due to the complex morphology of IUPAC names, requires more advanced techniques than the identification of brand names.
Results: We present CheNER, a tool for automated identification of systematic IUPAC chemical mentions. We evaluated different systems using an established literature corpus to show that CheNER has a superior performance in identifying IUPAC names specifically, and that it makes better use of computational resources.
Availability and implementation: http://metres.udl.cat/index.php/9-download/4-chener, http://chener.bioinfo.cnio.es/
Contact: miguel.vazquez@cnio.es
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btt639
PMCID: PMC3967102  PMID: 24227678
18.  An Epistatic Interaction between the PAX8 and STK17B Genes in Papillary Thyroid Cancer Susceptibility 
PLoS ONE  2013;8(9):e74765.
Papillary Thyroid Cancer (PTC) is a heterogeneous and complex disease; susceptibility to PTC is influenced by the joint effects of multiple common, low-penetrance genes, although relatively few have been identified to date. Here we applied a rigorous combined approach to assess both the individual and epistatic contributions of genetic factors to PTC susceptibility, based on one of the largest series of thyroid cancer cases described to date. In addition to identifying the involvement of TSHR variation in classic PTC, our pioneering study of epistasis revealed a significant interaction between variants in STK17B and PAX8. The interaction was detected by MD-MBR (p = 0.00010) and confirmed by other methods, and then replicated in a second independent series of patients (MD-MBR p = 0.017). Furthermore, we demonstrated an inverse correlation between expression of PAX8 and STK17B in a set of cell lines derived from human thyroid carcinomas. Overall, our work sheds additional light on the genetic basis of thyroid cancer susceptibility, and suggests a new direction for the exploration of the inherited genetic contribution to disease using association studies.
doi:10.1371/journal.pone.0074765
PMCID: PMC3781145  PMID: 24086368
19.  Getting personalized cancer genome analysis into the clinic: the challenges in bioinformatics 
Genome Medicine  2012;4(7):61.
Progress in genomics has raised expectations in many fields, and particularly in personalized cancer research. The new technologies available make it possible to combine information about potential disease markers, altered function and accessible drug targets, which, coupled with pathological and medical information, will help produce more appropriate clinical decisions. The accessibility of such experimental techniques makes it all the more necessary to improve and adapt computational strategies to the new challenges. This review focuses on the critical issues associated with the standard pipeline, which includes: DNA sequencing analysis; analysis of mutations in coding regions; the study of genome rearrangements; extrapolating information on mutations to the functional and signaling level; and predicting the effects of therapies using mouse tumor models. We describe the possibilities, limitations and future challenges of current bioinformatics strategies for each of these issues. Furthermore, we emphasize the need for collaboration between the bioinformaticians who implement the software and use the data resources, the computational biologists who develop the analytical methods, and the clinicians, who are the systems' end users and those ultimately responsible for making medical decisions. Finally, the different steps in cancer genome analysis are illustrated through examples of applications.
doi:10.1186/gm362
PMCID: PMC3580417  PMID: 22839973
20.  RUbioSeq: a suite of parallelized pipelines to automate exome variation and bisulfite-seq analyses 
Bioinformatics  2013;29(13):1687-1689.
Motivation: RUbioSeq has been developed to facilitate the primary and secondary analysis of re-sequencing projects by providing an integrated software suite of parallelized pipelines to detect exome variants (single-nucleotide variants and copy number variations) and to perform bisulfite-seq analyses automatically. RUbioSeq's variant analysis results have already been validated and published.
Availability: http://rubioseq.sourceforge.net/.
Contact: mrubioc@cnio.es
doi:10.1093/bioinformatics/btt203
PMCID: PMC3694642  PMID: 23630175
21.  Chapter 14: Cancer Genome Analysis 
PLoS Computational Biology  2012;8(12):e1002824.
Although there is great promise in the benefits to be obtained by analyzing cancer genomes, numerous challenges hinder different stages of the process, from the problem of sample preparation and the validation of the experimental techniques, to the interpretation of the results. This chapter specifically focuses on the technical issues associated with the bioinformatics analysis of cancer genome data. The main issues addressed are the use of database and software resources, the use of analysis workflows and the presentation of clinically relevant action items. We attempt to aid new developers in the field by describing the different stages of analysis and discussing current approaches, as well as by providing practical advice on how to access and use resources, and how to implement recommendations. Real cases from cancer genome projects are used as examples.
doi:10.1371/journal.pcbi.1002824
PMCID: PMC3531315  PMID: 23300415
23.  APPRIS: annotation of principal and alternative splice isoforms 
Nucleic Acids Research  2012;41(Database issue):D110-D117.
Here, we present APPRIS (http://appris.bioinfo.cnio.es), a database that houses annotations of human splice isoforms. APPRIS has been designed to provide value to manual annotations of the human genome by adding reliable protein structural and functional data and information from cross-species conservation. The visual representation of the annotations provided by APPRIS for each gene allows annotators and researchers alike to easily identify functional changes brought about by splicing events. In addition to collecting, integrating and analyzing reliable predictions of the effect of splicing events, APPRIS also selects a single reference sequence for each gene, here termed the principal isoform, based on the annotations of structure, function and conservation for each transcript. APPRIS identifies a principal isoform for 85% of the protein-coding genes in the GENCODE 7 release for ENSEMBL. Analysis of the APPRIS data shows that at least 70% of the alternative (non-principal) variants would lose important functional or structural information relative to the principal isoform.
doi:10.1093/nar/gks1058
PMCID: PMC3531113  PMID: 23161672
24.  ChiTaRS: a database of human, mouse and fruit fly chimeric transcripts and RNA-sequencing data 
Nucleic Acids Research  2012;41(Database issue):D142-D151.
Chimeric RNAs that comprise two or more different transcripts have been identified in many cancers and among the Expressed Sequence Tags (ESTs) isolated from different organisms; they might represent functional proteins and produce different disease phenotypes. The ChiTaRS database of Chimeric Transcripts and RNA-Sequencing data (http://chitars.bioinfo.cnio.es/) collects more than 16 000 chimeric RNAs from humans, mice and fruit flies, 233 chimeras confirmed by RNA-seq reads and ∼2000 cancer breakpoints. The database indicates the expression and tissue specificity of these chimeras, as confirmed by RNA-seq data, and it includes mass spectrometry results for some human entries at their junctions. Moreover, the database has advanced features to analyze junction consistency and to rank chimeras based on the evidence of repeated junction sites. Finally, ‘Junction Search’ screens through the RNA-seq reads found at the chimeras’ junction sites to identify putative junctions in novel sequences entered by users. Thus, ChiTaRS is an extensive catalog of human, mouse and fruit fly chimeras that will extend our understanding of the evolution of chimeric transcripts in eukaryotes and can be advantageous in the analysis of human cancer breakpoints.
doi:10.1093/nar/gks1041
PMCID: PMC3531201  PMID: 23143107
25.  EnrichNet: network-based gene set enrichment analysis 
Bioinformatics  2012;28(18):i451-i457.
Motivation: Assessing functional associations between an experimentally derived gene or protein set of interest and a database of known gene/protein sets is a common task in the analysis of large-scale functional genomics data. For this purpose, a frequently used approach is to apply an over-representation-based enrichment analysis. However, this approach has four drawbacks: (i) it can only score functional associations of overlapping gene/protein sets; (ii) it disregards genes with missing annotations; (iii) it does not take into account the network structure of physical interactions between the gene/protein sets of interest and (iv) tissue-specific gene/protein set associations cannot be recognized.
Results: To address these limitations, we introduce an integrative analysis approach and web-application called EnrichNet. It combines a novel graph-based statistic with an interactive sub-network visualization to accomplish two complementary goals: improving the prioritization of putative functional gene/protein set associations by exploiting information from molecular interaction networks and tissue-specific gene expression data and enabling a direct biological interpretation of the results. By using the approach to analyse sets of genes with known involvement in human diseases, new pathway associations are identified, reflecting a dense sub-network of interactions between their corresponding proteins.
Availability: EnrichNet is freely available at http://www.enrichnet.org.
Contact: Natalio.Krasnogor@nottingham.ac.uk, reinhard.schneider@uni.lu or avalencia@cnio.es
Supplementary Information: Supplementary data are available at Bioinformatics Online.
doi:10.1093/bioinformatics/bts389
PMCID: PMC3436816  PMID: 22962466
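Illustrative note on entry 25: the over-representation-based enrichment analysis that the EnrichNet abstract uses as its baseline is commonly implemented as a one-sided hypergeometric test. The sketch below is a minimal version of that baseline only, not of EnrichNet's graph-based statistic, which the abstract does not specify. The function names and the toy gene identifiers are hypothetical and chosen for illustration.

```python
from math import comb


def overrepresentation_pvalue(universe_size, set_size, query_size, overlap):
    """One-sided hypergeometric tail P(X >= overlap).

    X is the number of annotated genes seen when drawing `query_size` genes
    without replacement from a universe of `universe_size` genes, of which
    `set_size` belong to the reference gene set.
    """
    upper = min(set_size, query_size)
    total = comb(universe_size, query_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrich(query, gene_set, universe):
    """Return (overlap size, p-value) for a query list against one gene set.

    Genes outside the universe are dropped first, which mirrors drawback (ii)
    in the abstract: genes without annotation do not contribute.
    """
    universe = set(universe)
    query = set(query) & universe
    gene_set = set(gene_set) & universe
    overlap = len(query & gene_set)
    p = overrepresentation_pvalue(len(universe), len(gene_set), len(query), overlap)
    return overlap, p


if __name__ == "__main__":
    # Hypothetical toy data: 10 annotated genes, a 3-gene pathway, a 3-gene query.
    universe = [f"G{i}" for i in range(10)]
    pathway = ["G0", "G1", "G2"]
    print(enrich(["G0", "G1", "G2"], pathway, universe))
    print(enrich(["G7", "G8", "G9"], pathway, universe))
```

When the query and the reference set share no genes, the tail sum covers every possible outcome and the p-value is 1, so the test cannot distinguish a query whose proteins interact closely with the pathway from an unrelated one. This corresponds to drawback (i) in the abstract.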
