DOE JGI sequencing projects, ongoing and completed
Close to 4000 DOE JGI projects of different types are publicly available and searchable in our database. These projects include different genomic products, such as standard and improved drafts, finished genomes, gene expression profiling, resequencing, metagenome projects and others. The ‘Project List’ links on the Genome Portal page (http://genome.jgi.doe.gov) and on most other Portal pages bring users to a list of DOE JGI projects with a detailed description of each project, including its scope and current status, taxon, the JGI program and the project lead. The Resources column lists the tools available for each project. Some of these tools, e.g. download, are available for all genomes, while others are taxon-, project type- or stage-dependent. For example, a plant genome will be linked to Phytozome and a fungal genome to MycoCosm.
Annotated DOE JGI genomes
The Genome Portal provides unified access to all annotated genomes and metagenomes available at the DOE JGI along with specialized analytical tools to navigate these data sets and compare genomes of related organisms. It is available at http://genome.jgi.doe.gov or via the ‘Genomes’ tab on the JGI home page (http://www.jgi.doe.gov/). The Portal home page also provides worldwide statistics on the usage of the JGI resources and information about the latest genome releases and new tool development.
From this page a user selects the organism and/or the tools to work with. There are over 3500 annotated genomes in the JGI database, and three convenient ways to find a particular genome of interest: an interactive Tree of Life, a selection menu at the top of the page, and the Search function.
The Tree of Life organizes the sequenced genomes by domains of life and links to Organism home pages. Clicking on a branch name produces a menu displaying the genomes available in that kingdom, phylum, class or order (Figure 1). Selecting a genome connects a user to the corresponding organism page or pages in different resources.
Figure 1. The Genome Portal page. A pull-down menu for the ‘Fungi’ branch of Eukaryota is shown. Search, BLAST and Download functions are available for the entire selected group. Each genome is linked to the organism page in the related resources.
The same result can be achieved using the selection menu at the top of the page, which allows step-by-step genome selection: first choose All JGI Genomes, Bacteria, Archaea, Eukaryotic or Metagenome, then an organism available for this group, and finally the page to view. The latest addition to the JGI Genome Portal is the Search function, which enables searching for genomes by keyword (e.g. plants, Eukaryota), name, taxonId or projectId. Typing the beginning of a word in the text window brings up a pull-down menu with relevant search term choices.
Each organism's home page contains a description of the project, BLAST, download and links to specialized resources. For many eukaryotes (5–11) the menu also includes several analytical tools described in the next section. The specialized JGI database resources connected to the portal include Integrated Microbial Genomes (IMG) (2) and Metagenomes (IMG/M) (3); Phytozome for green plant genomes (D. M. Goodstein et al., submitted for publication); and MycoCosm, the Fungal Genomics Resource that provides access to the annotated fungal genomes and tools for their analysis, as described further in the text.
MycoCosm, an integrated fungal genomics resource
MycoCosm was released in March 2010 in response to a call from the fungal community for integration of all fungal genomes and analytical tools in one place. MycoCosm brings together fungal genomics data and interactive analytical tools for diverse fungi that are important for energy and environment, which is the focus of the JGI Fungal program (12). MycoCosm integrates genomics data from the DOE JGI and its users and promotes user community participation in data submission, annotation and analysis.
Over 100 newly sequenced and annotated fungal genomes from JGI and elsewhere are available to the public through MycoCosm, and new annotated genomes are being added to this resource upon completion of annotation. MycoCosm offers web-based genome analysis tools for fungal biologists to ‘navigate’ through sequenced genomes and explore them in the context of ‘genome-centric’ and ‘comparative views’.
The MycoCosm home page provides search capabilities for annotated fungal genomes and visual navigation across their phylogenetic tree, where each node represents a group of phylogenetically related organisms and links to both genome-centric and comparative analysis tools (Figure 2). Each node includes a list of organisms and enables search and analysis within this list. Thus, by clicking on different nodes of the tree, a user can adjust the search and analysis space from a single organism to the entire list of fungi. The Search function allows users to type an organism name or part of it and jump directly to a specific genome without browsing the tree.
Figure 2. The MycoCosm home page includes a genome search function and displays major branches of the Fungal Tree of Life with nodes representing phylogenetically related groups. Clicking on a node brings up a drop-down menu (shown on lower right) linked to an integrated ...
MycoCosm genome-centric view
The genome-centric view includes the genome browser, download, BLAST and search capabilities within the data for a single genome, the VISTA tools for the analysis of whole-genome alignments, functional profiles and gene clusters (Figure 3).
Figure 3. The genome-centric view of MycoCosm includes several tools (listed in the top menu), illustrated here by the Genome Browser (top), Synteny shown as an interactive VISTADot plot (lower left) and a KOG functional profile (lower right). Genome Browser tracks shown ...
The Genome browser is the centerpiece of the MycoCosm genome-centric view and is based on an earlier version of the UCSC Genome Browser (15) with a configurable selection of tracks (Figure 3). It displays predicted gene models and annotations along with different lines of evidence in support of these predictions (e.g. gene and protein expression profiles). It also displays other types of data mapped to a genome assembly, such as VISTA tracks of genome conservation (16), G+C profiles and annotation features including regions of homology, domains, repeats, non-coding genes and others. These features are either computed automatically or loaded by registered users as custom tracks. Predicted genome features in each track are linked to pages describing them and can also be linked to external resources. Gene model tracks are linked to the annotation reports and community annotation tools, which allow registered users to revise the predicted annotations.
Community annotation is a unique model across sequencing centers, developed by the DOE JGI to engage users in collective analysis and improvement of genome annotations, and it has resulted in many successful projects (14). Registered users participating in a particular genome project can validate and improve predicted gene models and annotations. Such gene models become highlighted on the browser (Figure 3). Structural modifications are supported by the tools linked to the Genome browser, which allow users to copy exons and gene models from any track, change them, or create them de novo. Functional annotation tools are linked to annotation reports and enable users to curate functional assignments such as gene name and description, and to communicate with other annotators.
Functional profiles of genomes are based on summaries of predicted gene annotations according to the GO (20), KEGG (21) and KOG (22) classifications. Each profile is accessible as a separate tab and is searchable according to the classification nomenclature (Figure 3). The profile lists the number of genes assigned to each functional category in the classification and links each number to the list of proteins assigned to that category. For every reference genome, a user can also compare its functional profile with the profiles of related genomes to investigate gene family expansions or contractions at different levels of granularity.
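The profile comparison described above can be pictured with a minimal Python sketch. It is not the MycoCosm implementation: the gene identifiers, category labels and function names are hypothetical, and each gene is assumed to carry exactly one category, which is a simplification of real GO, KEGG or KOG annotations.

```python
from collections import Counter

def functional_profile(gene_to_category):
    """Count the genes assigned to each functional category in one genome."""
    return Counter(gene_to_category.values())

def compare_profiles(reference, other):
    """Map each category to (reference count, other count, difference)."""
    categories = sorted(set(reference) | set(other))
    return {c: (reference[c], other[c], reference[c] - other[c]) for c in categories}

# Hypothetical toy annotations: gene ID -> KOG-style category label.
genome_a = {
    "a1": "Signal transduction",
    "a2": "Signal transduction",
    "a3": "Transcription",
}
genome_b = {
    "b1": "Signal transduction",
    "b2": "Transcription",
    "b3": "Transcription",
    "b4": "Energy production",
}

comparison = compare_profiles(functional_profile(genome_a), functional_profile(genome_b))
for category, (n_a, n_b, diff) in comparison.items():
    # A positive difference suggests an expansion in the reference genome.
    print(f"{category}\t{n_a}\t{n_b}\t{diff:+d}")
```

A category missing from one genome simply counts as zero, so expansions and contractions show up as signed differences.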
Genome conservation and synteny can be explored using VISTA Point, designed for visualization and analysis of pairwise and multiple DNA alignments (16) at different levels of resolution in three visualization modes: (i) VISTA Browser, which enables visual comparative analysis of complete genome assemblies using pairwise and multiple large-scale alignments; (ii) VISTA Synteny Viewer, a multi-tiered graphical display of pairwise alignments at three different levels of resolution; (iii) VistaDot, an interactive two-dimensional dot-plot genome synteny viewer across multiple chromosomes/scaffolds (Figure 3). VISTA tools are also available through Phytozome and IMG for the plant and microbial genomes, respectively.
MycoCosm comparative view
The comparative view provides a different context for analyzing and summarizing information for entire groups of genomes, predefined in MycoCosm and corresponding to its nodes (Figure 2). Unlike the genome-centric view, there is no reference genome in this analysis. BLAST and search functions in this view therefore differ from the genome-centric versions in their ability to search across multiple genomes simultaneously and compare analysis results side by side. For example, a keyword or BLAST search for protein kinases in Basidiomycota or Ascomycota will show differences in the number of genes found or BLAST hits across different members of these phyla. In addition, a user can save and download search results in different formats (FASTA, GFF) or download sequences and annotations for an entire group of organisms or a subset of it using the download tab.
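As a rough illustration of a multi-genome search summary and the two download formats, the following Python sketch tallies hits per genome and serializes them as FASTA and as nine-column GFF. Everything here is assumed for illustration: the hit records, genome names, sequences and function names are hypothetical, GFF3 is chosen as the GFF flavor, and the code does not reproduce the portal's actual export.

```python
from collections import Counter

def hits_per_genome(hits):
    """Count search hits for each genome in the group."""
    return Counter(hit["genome"] for hit in hits)

def to_fasta(hits, width=60):
    """Format hits as FASTA records, wrapping sequences at `width` characters."""
    lines = []
    for hit in hits:
        lines.append(f">{hit['id']} {hit['genome']}")
        seq = hit["protein"]
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

def to_gff3(hits, source="example_search"):
    """Format hits as GFF3 gene features (nine tab-separated columns)."""
    lines = ["##gff-version 3"]
    for hit in hits:
        # Attribute values are assumed to contain no characters needing escaping.
        attributes = f"ID={hit['id']};Name={hit['name']}"
        lines.append("\t".join([
            hit["scaffold"], source, "gene",
            str(hit["start"]), str(hit["end"]),
            ".", hit["strand"], ".", attributes,
        ]))
    return "\n".join(lines) + "\n"

# Hypothetical hits from a keyword search across two genomes.
hits = [
    {"genome": "GenomeA", "id": "A_101", "name": "kinase_1", "scaffold": "scaffold_1",
     "start": 1200, "end": 2450, "strand": "+", "protein": "MSTKQLLE"},
    {"genome": "GenomeA", "id": "A_202", "name": "kinase_2", "scaffold": "scaffold_3",
     "start": 500, "end": 1900, "strand": "-", "protein": "MGNRWKVA"},
    {"genome": "GenomeB", "id": "B_007", "name": "kinase_1", "scaffold": "scaffold_2",
     "start": 88, "end": 1300, "strand": "+", "protein": "MSSKELLD"},
]

print(dict(hits_per_genome(hits)))
print(to_fasta(hits), end="")
print(to_gff3(hits), end="")
```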
The Cluster view enables exploration of gene families within a given group of organisms. Clusters are built using the Markov clustering algorithm MCL (23) and all-against-all BLAST alignments of the proteins from the entire data set. On the Clusters front page, a user can find clusters of interest using gene search or cardinality filters to identify genome-specific clusters or those conserved across multiple genomes from the group (Figure 4). Each cluster is linked to the Cluster Details page, where a user can explore the pattern of protein domains, intron–exon structure and local genomic context of each of the cluster members side by side. For some clusters a user can also examine a precomputed multiple alignment of protein sequences and a species-reconciled phylogenetic tree with predicted gain/loss of genes.
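To make the clustering step concrete, here is a small pure-Python sketch of the basic MCL iteration (expansion by matrix squaring, then inflation and column normalization) applied to pairwise similarity scores. It is a teaching sketch rather than the MCL program or the JGI pipeline: the protein names and scores are hypothetical stand-ins for BLAST results, and the self-loop weighting, inflation value and thresholds are assumptions.

```python
def normalize_columns(m):
    """Scale each column of a square matrix so that it sums to 1 (in place)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m

def mcl_clusters(scores, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster items from symmetric pairwise scores with a basic MCL loop."""
    names = sorted({name for pair in scores for name in pair})
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), score in scores.items():
        m[index[a]][index[b]] = m[index[b]][index[a]] = float(score)
    for i in range(n):
        m[i][i] = max(m[i]) or 1.0  # self-loop (assumed weighting)
    normalize_columns(m)
    for _ in range(max_iter):
        # Expansion: square the matrix to spread flow along longer paths.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: raise entries to a power and renormalize to favor strong flow.
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each row that still holds flow defines a cluster of the columns it attracts.
    clusters = {frozenset(names[j] for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)

# Hypothetical similarity scores between proteins (e.g. BLAST bit scores).
scores = {("p1", "p2"): 200, ("p2", "p3"): 180, ("p4", "p5"): 150}
clusters = mcl_clusters(scores)
print(clusters)
```

Larger inflation values generally produce finer-grained clusters; the dense matrices used here are only practical for small inputs.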
Figure 4. The MycoCosm comparative view includes several functions designed for analyzing groups of genomes (listed in the top menu), as illustrated by the Cluster view. The Cluster front page (top) lists the two largest clusters of genes conserved in all four Eurotiomycetes ...
An on-line video tutorial is available from the link on the main MycoCosm page (Figure 2). It provides additional information on all features of MycoCosm and walks a user through the genome analysis process step by step. Several analytical tools are also available outside of MycoCosm for other eukaryotes (4–11).
The Genome Portal web site is built on Apache HTTPD, Tomcat and MySQL. Most of the Genome Portal components have been developed using Java and a variety of available open-source tools and technologies. Our scalable database architecture is based on MySQL servers and currently contains more than 25 TB of genomics data. Four load-balanced web servers communicate with two back-end database servers. A web-driven automated build system takes each machine silently out of the cluster, builds a new version of the portal on it and puts the machine back into the cluster, so that updates can be applied without disruption to users. This setup also makes the portal resilient against hardware failures.
Data is fed into the portal by the JGI's annotation pipelines via an API that makes the data available to authorized users immediately. An advanced monitoring system allows administrators to quickly assess issues and deal with them before they become problems that may impact web site and database performance.
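The one-machine-at-a-time update described above can be sketched as a small simulation. This is only a schematic of the general rolling-update idea under stated assumptions: the `WebServer` class, server names and version strings are hypothetical, and the real build system, load balancer and health checks are not modeled.

```python
from dataclasses import dataclass

@dataclass
class WebServer:
    name: str
    version: str
    in_rotation: bool = True

def rolling_update(servers, new_version):
    """Update servers one at a time so the rest keep serving requests."""
    log = []
    for server in servers:
        server.in_rotation = False  # take the machine out of the cluster
        serving = [s.name for s in servers if s.in_rotation]
        if not serving:
            raise RuntimeError("no server left to handle requests")
        server.version = new_version  # stands in for building the new portal version
        server.in_rotation = True  # put the machine back into the cluster
        log.append((server.name, len(serving)))
    return log

# Four hypothetical web servers, matching the count given in the text.
servers = [WebServer(f"web{i}", "v1") for i in range(1, 5)]
update_log = rolling_update(servers, "v2")
for name, still_serving in update_log:
    print(f"{name} updated while {still_serving} servers stayed in rotation")
```

Because only one machine leaves the rotation at a time, three of the four servers remain available throughout the update.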