Reactome (http://www.reactome.org) is a collaboration among groups at the Ontario Institute for Cancer Research, Cold Spring Harbor Laboratory, New York University School of Medicine and The European Bioinformatics Institute, to develop an open source curated bioinformatics database of human pathways and reactions. Recently, we developed a new web site with improved tools for pathway browsing and data analysis. The Pathway Browser is a Systems Biology Graphical Notation (SBGN)-based visualization system that supports zooming, scrolling and event highlighting. It exploits PSICQUIC web services to overlay our curated pathways with molecular interaction data from the Reactome Functional Interaction Network and external interaction databases such as IntAct, BioGRID, ChEMBL, iRefIndex, MINT and STRING. Our Pathway and Expression Analysis tools enable ID mapping, pathway assignment and overrepresentation analysis of user-supplied data sets. To support pathway annotation and analysis in other species, we continue to make orthology-based inferences of pathways in non-human species, applying Ensembl Compara to identify orthologs of curated human proteins in each of 20 other species. The resulting inferred pathway sets can be browsed and analyzed with our Species Comparison tool. Collaborations are also underway to create manually curated data sets on the Reactome framework for chicken, Drosophila and rice.
Reactome is an open source, open access, manually curated, peer-reviewed pathway database of human pathways and processes (1). Pathway annotations are created by expert biologists, in collaboration with Reactome editorial staff and cross-referenced to proteins (UniProt) and genes (NCBI EntrezGene, Ensembl, UCSC and HapMap), small molecules (KEGG Compound and ChEBI), primary research literature (PubMed) and GO controlled vocabularies (2–9). The Reactome data model generalizes the concept of a reaction to include transformations of entities such as transport from one compartment to another and interaction to form a complex, as well as the chemical transformations of classical biochemistry. Entities include nucleic acids, small molecules, proteins (with or without post-translational modifications) and macromolecular complexes. This generalization permits the capture of a range of biological processes that spans signaling, metabolism, transcriptional regulation, apoptosis and synaptic transmission in a single internally consistent, computationally navigable format. Reactome is an all-inclusive resource of human pathways for basic research, genome analysis, pathway modeling, systems biology and education. In the past 2 years, the Reactome data set has nearly doubled in size and new tools for data aggregation and data analysis have become available. To support the continued development of the Reactome knowledgebase, we have redesigned the Reactome web site and data analysis software.
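The generalized notion of a reaction described above can be illustrated with a minimal sketch. The class names, fields and the `describe` heuristic below are hypothetical and chosen for illustration only; they are not the Reactome schema. The sketch shows how transport, complex formation and classical chemical transformation can all be held in one structure that maps input entities to output entities.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PhysicalEntity:
    """Hypothetical entity record: a molecule located in a compartment."""
    name: str
    kind: str          # "protein", "small molecule", "nucleic acid" or "complex"
    compartment: str


@dataclass(frozen=True)
class Reaction:
    """Hypothetical generalized reaction: inputs are converted to outputs."""
    name: str
    inputs: Tuple[PhysicalEntity, ...]
    outputs: Tuple[PhysicalEntity, ...]


def describe(reaction: Reaction) -> str:
    """Classify a reaction with a simple illustrative heuristic."""
    in_names = sorted(e.name for e in reaction.inputs)
    out_names = sorted(e.name for e in reaction.outputs)
    in_comp = {e.compartment for e in reaction.inputs}
    out_comp = {e.compartment for e in reaction.outputs}
    # Same molecules, different location: transport between compartments.
    if in_names == out_names and in_comp != out_comp:
        return "transport"
    # Several inputs joined into a single complex: complex formation.
    if (len(reaction.inputs) > 1 and len(reaction.outputs) == 1
            and reaction.outputs[0].kind == "complex"):
        return "complex formation"
    # Anything else is treated as a chemical transformation.
    return "chemical transformation"


# Toy entities; names "A" and "B" are placeholders, not curated data.
glc_out = PhysicalEntity("glucose", "small molecule", "extracellular region")
glc_in = PhysicalEntity("glucose", "small molecule", "cytosol")
a = PhysicalEntity("A", "protein", "cytosol")
b = PhysicalEntity("B", "protein", "cytosol")
ab = PhysicalEntity("A:B", "complex", "cytosol")
atp = PhysicalEntity("ATP", "small molecule", "cytosol")
adp = PhysicalEntity("ADP", "small molecule", "cytosol")
g6p = PhysicalEntity("glucose 6-phosphate", "small molecule", "cytosol")

uptake = Reaction("glucose uptake", (glc_out,), (glc_in,))
binding = Reaction("A binds B", (a, b), (ab,))
phosphorylation = Reaction("glucose phosphorylation", (glc_in, atp), (g6p, adp))
```

Because every event type shares the same input/output shape, a single traversal routine can walk signaling, transport and metabolic steps alike, which is the practical benefit of the generalization.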
Reactome’s recruitment of expert authors and curators has given us access to key aspects of human biology. The current release of Reactome (Version 34, September 2010) describes the roles of 5272 human proteins (26% of the 20286 human SwissProt entries) and 3504 macromolecular complexes in 3847 reactions organized into 1057 pathways. Over the last year, we have added new higher order topics. Notable additions include the molecular anatomy of transcriptional regulation, a largely complete catalogue of receptors with known ligands involved in GPCR signaling (10), Toll-like Receptors, Chromosome maintenance, Olfactory Signaling, Myogenesis, N-Glycan biosynthesis and Metabolism of RNA. Reactome has prototyped additional types of annotations to support pathway curation. We have curated pathways relating to the insulin signaling cycle to prototype pathway–disease annotations. We have also developed a framework for physiological process annotations, e.g. vesicle transport and glutamate-mediated neurotransmission. To support the creation of the new pathway diagrams, we defined canonical pathways corresponding to 161 discrete biological domains. This simplified the event hierarchy in the new Pathway Browser and minimized event sharing across different pathways. Visualization features were implemented in the Author and Curator Tool to enable the layout and editing of these new pathway diagrams as part of our curation process.
The Reactome data model has been extended to the manual curation of pathways in model organisms. Gallus Reactome (http://gallus.reactome.org), an effort led by Carl Schmidt of the University of Delaware, has been modeled after Reactome. The first public release was in mid-2009, and Gallus Reactome now includes annotations for 127 reactions involving 133 Gallus proteins in the domains of intermediary metabolism and DNA repair. A collaboration among Reactome, Michael Ashburner and Mark Williams at Cambridge University similarly uses Reactome software and hardware to create and maintain a Drosophila pathway database (http://fly.reactome.org/). Its third release went public in mid-2010 and includes data for Wingless, JAK/STAT, Imd, Toll, Hedgehog, Circadian Clock, Hippo/Warts and Planar Cell Polarity signaling pathways.
The rapid growth in content has made the ‘starry sky’ reaction map display unwieldy as a navigation and visualization tool. At the same time, our outreach has grown to encompass diverse user groups interested in browsing a particular process or protein as a textbook, analyzing high-throughput expression data sets, data mining and data aggregation and online resources for education. Our front page has thus been redesigned to support quick, intuitive access to our data and tools as both features of the knowledgebase continue to grow. The new web site retains a comprehensive top menu bar that provides access to all of our tools and resources. It is now accompanied by a sidebar that provides access to basic, widely used tools for pathway browsing and data analysis, and panels that give a thumbnail overview of Reactome information, tutorials, recent news and a view of a recently added pathway of topical interest.
Visualization of full pathway information in a consistent format is vital to support the pathway-based analysis of complex experimental and computational data sets. To support such visual navigation and analysis of Reactome data we have developed, in collaboration with the ENFIN project (11), a new Pathway Browser based upon the Systems Biology Graphical Notation (SBGN) (12). SBGN is a standard graphical representation of biological pathway and network models, i.e. every molecule and reaction has a particular shape, color and cellular location. Our entire content has been organized into 161 canonical pathways, each displayed in this format. The Reactome Pathway Browser consists of four key elements. First, the ‘Search’ bar at the top of the page queries the entire Reactome database. Second, the ‘Pathways’ panel provides a scrolling display of the Reactome canonical pathway hierarchy. Third, clicking on the pathway name displays the corresponding pathway diagram in the ‘Visualization’ panel on the right side. The ‘Visualization’ panel offers interactive and dynamic pathway diagrams permitting zooming, scrolling and highlighting of events and molecules. Fourth, clicking on events and molecules in the pathway diagram uncovers a ‘Details’ panel below the pathway diagram with additional textual information about the events and molecules, respectively. Further functionality is provided in the form of context sensitive menus within the ‘Visualization’ panel (Figure 1). The precise features of the context sensitive menus are determined by the nature of the physical entity (small molecule, protein, complex): (i) a list of the other pathways in Reactome in which the selected entity participates; (ii) a display of the physical entities that contribute to the macromolecular complex; and, optionally (iii) a list of interactors of the entity from selected interaction databases (described later). The pathway diagrams are available for download as static PNG and PDF files. 
Dynamic pathway images compatible with third-party tools like Cytoscape (13) and CellDesigner (14) are currently being developed.
The Reactome data sets are a highly reliable platform for pathway-based data analysis but suffer from a low coverage of human proteins. To increase protein coverage and associated protein annotations, we have integrated molecular interaction (MI) data and network information into the Reactome pathway diagrams (Figure 1). The MI overlay displays proteins interacting with the manually annotated protein components of a Reactome pathway. As mentioned previously, individual protein interactors can be displayed using the context-dependent menus in the pathway ‘Visualization’ workspace. It is also possible to overlay all the interactors for all the pathway proteins by means of the ‘Analyze, Annotate and Upload’ feature of the Pathway Browser. The network overlay tool employs a PSICQUIC interface to implement flexible import of binary MI data into Reactome pathway diagrams. PSICQUIC is already widely implemented by interaction databases, including BioGRID, ChEMBL, IntAct, iRefIndex, MINT and STRING (15–20). The nodes and edges of the network overlay are interactive, providing links to the physical entity and interaction databases, respectively. Two additional interaction data sets are managed by the Reactome group, ‘Reactome’ and ‘Reactome-FIs’. The original ‘Reactome’ data set reflects MI data derived from Reactome reactions and complexes. A new ‘Reactome-FIs’ (functional interactions) data set unites interactions from Reactome and those derived from other pathway databases, including KEGG, BioCyc, Panther, The Cancer Cell Map (http://cancer.cellmap.org/) and PID (7,21–23), with pair-wise interactions gleaned from physical protein–protein interactions in human and model organisms, gene co-expression data, protein domain–domain interactions, protein interactions generated from text mining and GO annotations (24). The ‘Reactome-FIs’ network contains 209988 functional interactions encompassing 10956 proteins (excluding splice isoforms), reflecting 46% of SwissProt proteins.
Comparative analysis of biological processes offers important information on their evolution, and supports metabolic engineering, the study of human disease and the identification of potential drug targets. Curated human reactions were used previously to electronically infer reactions by orthology in 20 evolutionarily divergent species, with the assistance of OrthoMCL (25). To align Reactome more closely with the Ensembl set of genome data and genome analysis tools, we have shifted to Ensembl Compara (26) to support orthology-based reaction inferences in 20 species for which high-quality whole-genome sequence data are available, including all 12 of the species in the GO Reference Genome annotation project (27). Diagrams for predicted pathways in another species can be viewed from within the Pathway Browser (Figure 2). A new Species Comparison tool allows users to compare these predicted pathways with those of human to find reactions and pathways common to a selected species and human (Figure 2).
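The idea of orthology-based reaction inference can be sketched in a few lines. The acceptance rule used here is an assumption made for illustration: a human reaction is carried over to the target species when at least `min_fraction` of its proteins have one or more orthologs. The text does not state the rule Reactome applies, and the function name, parameter and toy identifiers are hypothetical.

```python
def infer_reactions(human_reactions, orthologs, min_fraction=1.0):
    """Return inferred reactions as {reaction: {human protein: orthologs}}.

    human_reactions: dict mapping reaction name -> set of human protein IDs
    orthologs: dict mapping human protein ID -> list of target-species IDs
    min_fraction: assumed threshold on the share of proteins with orthologs
    """
    inferred = {}
    for name, proteins in human_reactions.items():
        if not proteins:
            continue  # nothing to map, so nothing to infer
        # Keep only proteins with at least one ortholog in the target species.
        mapped = {p: orthologs[p] for p in proteins if orthologs.get(p)}
        if len(mapped) / len(proteins) >= min_fraction:
            inferred[name] = mapped
    return inferred


# Toy inputs; identifiers are placeholders, not Ensembl Compara output.
human_reactions = {
    "R1": {"P1", "P2"},
    "R2": {"P3"},
    "R3": {"P1", "P4"},
}
orthologs = {"P1": ["c1"], "P2": ["c2a", "c2b"], "P3": []}

strict = infer_reactions(human_reactions, orthologs)
relaxed = infer_reactions(human_reactions, orthologs, min_fraction=0.5)
```

With the strict default only `R1` is inferred, because `R2` and `R3` each contain a protein with no ortholog; lowering the threshold to 0.5 also admits `R3`.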
Biologists are generating large amounts of functional data through gene expression, copy-number variation, protein–protein, protein–DNA and protein–RNA interactions, protein and metabolite abundance and large-scale DNA-sequencing experiments. Integrating this experimental information with the published literature and biological databases, including pathway databases, is vital to efficient and effective data analysis. Previously, Reactome provided the Skypainter tool for this level of functional data analysis (1). However, with the retirement of the ‘starry-sky’ reaction map and user requests to provide an expanded suite of bioinformatics tools, we redeveloped our data analysis suite to offer powerful and complementary tools. The Pathway Analysis tool analyzes user-supplied lists of genes, proteins and small molecules and provides ID mapping, pathway assignment and overrepresentation analysis (Figure 3). As with Skypainter, the pathway and expression analysis tools accept gene and protein accession numbers and identifiers that are associated with popular commercial platforms (e.g. Illumina, Agilent and Affymetrix). By default, the simplest of these analyses, ID mapping and pathway assignment, is selected. This analysis takes a set of identifiers and maps them to Reactome pathways. The results are presented in a tabular format (Figure 3). The overrepresentation analysis is based upon the previously reported Skypainter tools. The results of both pathway analyses also link to the new Pathway Browser. The Expression Analysis tool is similar in design to the Pathway Analysis tool, but it accepts numerical values (e.g. expression, abundance, fold change or statistical values) and shows how expression/abundance levels affect reactions and pathways in living organisms (Figure 3). Again, the results are provided in a tabular format. Results from both the pathway and expression analyses can be downloaded as a spreadsheet or in tab- or comma-separated format. The colored pathway diagrams can be downloaded in publication quality format. The molecular overlay and context sensitive menu features are also enabled in the colored pathway diagrams, providing links from user-supplied experimental data to Reactome pathways and MIs and networks.
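Pathway assignment followed by overrepresentation analysis can be sketched as below. The statistic is an assumption: the text does not specify which test the tool uses, so this example uses a one-sided hypergeometric test, a common choice for this kind of analysis. The function names and toy pathways are hypothetical, all submitted identifiers are assumed to belong to the background set, and `math.comb` requires Python 3.8 or later.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: background size, K: pathway size,
    n: number of submitted identifiers, k: identifiers found in the pathway.
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the upper tail; comb() returns 0 for impossible terms.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def overrepresentation(user_ids, pathways, background_size):
    """Assign identifiers to pathways and rank pathways by p-value."""
    user = set(user_ids)
    results = []
    for name, members in pathways.items():
        hits = user & set(members)  # pathway assignment step
        p = hypergeom_pvalue(background_size, len(members), len(user), len(hits))
        results.append((name, len(hits), p))
    return sorted(results, key=lambda r: r[2])


# Toy pathways over a background of 10 identifiers (not Reactome data).
pathways = {
    "pathway X": {"g1", "g2", "g3", "g4"},
    "pathway Y": {"g5", "g6", "g7", "g8", "g9", "g10"},
}
ranked = overrepresentation(["g1", "g2", "g5"], pathways, background_size=10)
```

In this toy run, two of the three submitted identifiers fall in `pathway X`, so it ranks first. No correction for multiple testing is applied in the sketch; a real analysis over many pathways would normally add one.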
In collaboration with NCBI, Reactome annotations of pathways are being deposited into the NCBI BioSystems database, a large data repository for cataloguing molecules (nucleic acids, proteins, small molecules, drugs, etc.) that interact in biological systems (28). Reactome is part of the BioPAX Consortium to develop a data-exchange language to describe pathways, reactions and interactions (29). We have partnered with the Gene Set Enrichment Analysis (GSEA) group at the Broad Institute to expand the collection of curated gene sets in the Molecular Signatures Database (MSigDB) to include Reactome’s high-quality pathway data (30). Reactome participated in this year’s Google Summer of Code program, collaborating with WikiPathways (31). The integration of pathway and interaction data has been a key element of the Reactome redevelopment. We have provided a new file format for the exchange of binary interaction data, based upon the PSI-MITAB format (32). In response to user requests, we recently changed the representation of protein modifications in Reactome to the PSI-MOD standard (33). We continue to support the use of Reactome data for ontology development through our relationships with the Gene and Protein Ontology groups. Reactome web pages link out to many online bioinformatics databases. This year, additional cross-references to RCSB Protein Data Bank (34), Comparative Toxicogenomics Database (35), DockBlaster (36), BioGPS (37) and dbSNP (38) have been added to the protein pages. Reactome software and data are now distributed under the terms of a Creative Commons Attribution 3.0 Unported License, which grants parties the non-exclusive right to use, distribute and create derivative works based on Reactome, provided that the software and information are correctly attributed to CSHL, OICR and EBI.
National Human Genome Research Institute at the National Institutes of Health (grant number P41 HG003751); European Union 6th Framework Programme ‘ENFIN’ (grant number LSHG-CT-2005-518254). Funding for open access charge: Ontario Institute for Cancer Research.
Conflict of interest statement. None declared.
We are grateful to the many researchers who have volunteered to be external authors and reviewers. Development of the Reactome data model and fly and chicken databases is a collaborative project and this work benefited greatly from our interactions with Carl Schmidt, Mark Williams and Michael Ashburner.