LocusLink provides a very easy-to-use, gene-centric view of the “sequence information space,” but what if a scientist is more interested in seeing the gene of interest in context, particularly now that human genome sequencing is complete? A number of portals called genome browsers have been developed that allow users to access genomic data and, more importantly, view annotations that have been made on the underlying sequence data.
As alluded to above, NCBI has its own Map Viewer, a tool through which experimentally verified genes, predicted genes, genomic markers, physical maps, genetic maps, and sequence variation data can be viewed. Currently, the Map Viewer can be used to view the genomes of 19 organisms (Figure 7), with the number increasing as the genomes of more and more model organisms are sequenced. The Map Viewer is integrated into other NCBI tools, allowing one to link between the Map Viewer, LocusLink, and the main integrated information retrieval system at NCBI, Entrez.
Figure 7 The NCBI Map Viewer home page. From this page, users can select any of the available organisms for which map information is available and perform targeted queries (by gene, location, or any of a number of other criteria). Information on constructing queries ...
Continuing with the MLH1 example, to find the genomic context of the MLH1 gene, one would simply change the pull-down menu marked Search on the Map Viewer home page (http://www.ncbi.nlm.nih.gov/mapview/; see Figure 7) to Human, type “MLH1” in the text box, then press Go.
The resulting screen is shown in Figure 8. Notice that there are several red bars marking chromosome 3; below the chromosome, the number 3 appears, indicating 3 hits. Below the pictogram is a list of all the matches found for MLH1: the 1st corresponds to the gene itself (“locus”), an STS marker is next, followed by an entry from OMIM (“MIM”). Clicking on the hyperlinked MLH1 in the “locus” line returns the map shown in Figure 9, which is the default map display. The header above the actual map gives some overview information about the map itself: there are 1906 genes on chromosome 3, and the region that is currently being displayed (the “sequence coordinates”) runs from 36805K to 36948K. These numbers are also indicated in the blue bar to the left of the maps; the region displayed is also indicated by the red tick mark next to the ideogram in the blue sidebar, relative to the known cytogenetic banding patterns on chromosome 3.
Figure 8 The results of an NCBI Map Viewer search, using “human” as the organism and “MLH1” as the query. The query returned 3 matches, one of which is to the MLH1 locus. See text for details.
Figure 9 The default map view. Three maps are displayed: the cytogenetic gene map, the UniGene cluster map, and the “Genes_seq” map (known and putative genes that have been placed as a result of alignments of mRNAs to individual contigs). See text ...
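The sequence coordinates reported in the map header lend themselves to a quick calculation of how much sequence is on display. The short Python sketch below assumes that the trailing “K” in labels such as 36805K denotes thousands of base pairs; the helper name `parse_coordinate` and the handling of an “M” suffix are illustrative assumptions and are not part of the Map Viewer itself.

```python
def parse_coordinate(label: str) -> int:
    """Convert a coordinate label such as '36805K' to base pairs.

    Assumption: 'K' means thousands and 'M' means millions of base pairs.
    A label without a suffix is taken to be in base pairs already.
    """
    multipliers = {"K": 1_000, "M": 1_000_000}
    label = label.strip().upper()
    if label and label[-1] in multipliers:
        return int(round(float(label[:-1]) * multipliers[label[-1]]))
    return int(label)


# Coordinates quoted in the text for the region around MLH1 on chromosome 3.
start = parse_coordinate("36805K")
end = parse_coordinate("36948K")

# Width of the displayed region, in kilobases.
span_kb = (end - start) / 1_000
print(start, end, span_kb)  # 36805000 36948000 143.0
```

Under this reading of the labels, the default view covers roughly 143 kb of chromosome 3.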
Three maps are shown in the main window, to the left, as long, vertical bars. The map marked Genes_cyto (for “genes-cytogenetic”) shows the cytogenetic locations of genes as reported in LocusLink. Twenty genes, in addition to MLH1, have been cytogenetically mapped to this region of chromosome 3. The next map, marked HsUniG (for “Human UniGene”) shows the positions of UniGene clusters (described above); put another way, mRNA and EST sequences that comprise a UniGene cluster map to this region. On the left side of this particular map are gray bars that form what appears to be a histogram. These bars are intended to illustrate the density of aligned mRNAs and ESTs in this region.
The thick blue lines to the right of this map are intended to illustrate exons. The final, right-most map is labeled Genes_seq (for “gene sequence”). The map occupying the right-most position in any view is called the “master map,” and the information appearing to the right of all the maps pertains to that master map. Three genes are plotted on the master map in this particular view: an EPM2A-interacting protein 1, then the MLH1 gene, which was the basis of the query (highlighted in red), and finally a leucine-rich repeat interacting protein 2 (LRRFIP2). For each gene, an indication of the gene’s structure is given by the blue line running along the right side of the map, with exons being represented as thick blue bars and introns being represented as the thinner, intervening blue lines. Finally, note the arrow immediately to the right of each gene name; this arrow represents the direction of transcription for each gene.
Whereas the default map view is useful to gain a sense of what a particular genomic region looks like, there are additional maps available that may shed more light on the biological properties of a particular area of interest. To change the maps that are shown in any particular view, the user can click on the link marked Maps & Options that appears at the top of any Map Viewer page (see Figure 9, top). Clicking on the link brings up the Maps & Options window, allowing the user to customize the view (Figure 10). For purposes of this example, we will remove the cytogenetic and UniGene maps; to do so, the user would highlight each of these in the Maps Displayed list on the right, then click <<Remove. Suppose that we wished to add the GenomeScan map (to see where all predicted genes are located) and the Variation map (to see the position of all known SNPs). To do so, select each of these from the list on the left, then click Add>>. When done, the window should be similar to that shown in Figure 10. Once these selections are made, the user would click the Apply button, and then the Close button. This will then recast the map as shown in Figure 11. Looking first at the 2 maps to the left, we see that the Genes_seq map and the Gscan (GenomeScan) map are not the same. The reason for this is that the GenomeScan map gives the results of a gene prediction algorithm (9), whereas the Genes_seq map’s annotation is based on known and putative genes that have been placed as a result of alignments of mRNAs to individual contigs. Note that the MLH1 gene is marked in red (as before, the center-most gene on the map).
Figure 10 The Maps & Options window. This control panel is used to select from the available maps and change the order in which the maps are displayed.
Figure 11 A new map view for the MLH1 gene, using the options shown in Figure 10. Notice that the Variation map showing all known SNPs is now the “master map.” See text for details.
The master map (the right-most map) is now the Variation map, giving a different display than before. As with the UniGene map in Figure 9, the gray bars shown to the left of the Variation map indicate the density of SNPs at any given position. Some positions are simply marked with the number of variations (for example, “11 variations”), indicating that the map is too dense to display information on each individual SNP; simply zoom in to get more information at those positions (see below). In this view, numerous SNPs can be seen, each marked with an “rs” number. Clicking on that rs number would bring the user to the dbSNP page for that particular SNP, which is similar in appearance (but not identical) to the Variation page shown in the LocusLink example above.
Moving across from the rs number is a series of columns of interest. The column labeled Map indicates whether a particular SNP has been mapped to the genome. If the SNP has been mapped to a single position, a single green down-arrow would be shown (as in Figure 11); if the SNP has been mapped to multiple positions, a double down-arrow would be shown. The column labeled Gene indicates whether the SNP of interest is associated with a particular genomic feature. In each row of the Gene column, notice that there is an L, T, and C either “lit up” or “grayed out.” If the L (“locus,” blue) is lit, as it is for most of the SNPs in Figure 11, that indicates that the SNP lies within 2 kb of the 5′ end of a gene or within 500 bases of the 3′ end of a gene. If the T (“transcript,” green) is lit, the marker overlaps with a known mRNA. Finally, if the C (“coding,” orange) is lit, part or all of the SNP marker position overlaps with the coding region of a gene. The columns that follow provide additional information about the quality of the SNP marker, and more information on each of these can be found by clicking on the blue column headers.
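The L, T, and C indicators described above amount to three interval tests on a SNP position. The following Python sketch is one way to express that logic; it is not the code used by dbSNP or the Map Viewer. The `GeneModel` class, the `ltc_flags` function, and the toy coordinates are all hypothetical, and the sketch assumes that the L region also includes the gene body itself, in addition to the 2-kb and 500-base flanks named in the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


@dataclass
class GeneModel:
    """Hypothetical, simplified gene model used only for this illustration."""
    strand: str              # "+" or "-"
    exons: List[Interval]    # exons of the aligned mRNA
    coding: List[Interval]   # coding portions of those exons


def overlaps(pos: int, intervals: List[Interval]) -> bool:
    """True if pos falls inside any of the intervals."""
    return any(start <= pos <= end for start, end in intervals)


def ltc_flags(pos: int, gene: GeneModel,
              upstream: int = 2000, downstream: int = 500) -> Dict[str, bool]:
    """Return which of the L, T, and C indicators would be lit for a SNP."""
    gene_start = min(start for start, _ in gene.exons)
    gene_end = max(end for _, end in gene.exons)
    # The 5' flank lies before the gene on the + strand and after it on the - strand.
    if gene.strand == "+":
        lo, hi = gene_start - upstream, gene_end + downstream
    else:
        lo, hi = gene_start - downstream, gene_end + upstream
    return {
        "L": lo <= pos <= hi,              # locus: gene plus flanking regions (assumed)
        "T": overlaps(pos, gene.exons),    # transcript: overlaps the mRNA
        "C": overlaps(pos, gene.coding),   # coding: overlaps the coding region
    }


# Toy gene with made-up coordinates; these are not MLH1 positions.
toy = GeneModel(strand="+",
                exons=[(10_000, 10_200), (11_000, 11_300)],
                coding=[(10_100, 10_200), (11_000, 11_150)])

for snp in (8_500, 10_050, 11_100, 12_000):
    print(snp, ltc_flags(snp, toy))
```

In this toy case, a SNP 1.5 kb upstream of the gene lights only the L, a SNP in the 5′ untranslated part of the first exon lights L and T, a SNP in the coding part of the second exon lights all three, and a SNP 700 bases past the 3′ end lights none.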
In addition to changing the maps shown in any given view, the user can navigate by clicking anywhere on the ideogram on the left, or zoom in and out by clicking on the “out-zoom-in” picture above the ideogram. There are also short, gray bars at the top and bottom of each map that allow the user to “scroll up” or “scroll down,” moving to the next genomic segment.
This particular example only scratches the surface of what can be done with the NCBI Map Viewer. To further complicate matters, there are 2 other genome browsers that have found widespread usage and should be in the arsenal of every molecular biologist. These browsers take slightly different approaches to visualizing genomic data, and users may prefer 1 browser over another to answer a particular biological question. The 1st of these, the UCSC Genome Browser (10), is based on the concept of “tracks,” where each track represents a particular type of annotation; this roughly corresponds to NCBI’s maps. The annotation tracks available through UCSC include known genes, predicted genes, EST alignments, and cross-species homologies, to name a few. The strengths of the UCSC browser include its ease of navigation and the ability of individual users to display their own custom annotation tracks on a map, allowing them to correlate their own experimental data with publicly available data. An example of the UCSC display is given in Figure 12, showing the region around the MLH1 gene. Unlike the NCBI maps that run from top to bottom, the UCSC tracks run from left to right, allowing for more data to be displayed in a more intuitive fashion.
Figure 12 The UCSC Genome Browser. The region shown is for the human MLH1 gene. The overall organization is as a series of “tracks” that go from left to right, as opposed to the NCBI maps that go from top to bottom. The appearance of each track ...
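Custom annotation tracks of the kind mentioned for the UCSC browser are commonly supplied as plain text in BED format: a `track` header line followed by one tab-separated line per feature giving the chromosome, a 0-based start, an end, and an optional name. The Python sketch below writes such a track. The function name `make_bed_track`, the track name, and the two features are hypothetical placeholders rather than real experimental data, and the coordinates simply reuse the Map Viewer region quoted earlier, which may not match the assembly shown in a given UCSC session.

```python
from typing import Iterable, Tuple

Feature = Tuple[str, int, int, str]  # (chromosome, start, end, name)


def make_bed_track(name: str, description: str,
                   features: Iterable[Feature]) -> str:
    """Build the text of a minimal BED custom track.

    BED starts are 0-based and ends are exclusive, so each feature
    must satisfy 0 <= start < end.
    """
    if any(ch.isspace() for ch in name):
        raise ValueError("track name must not contain whitespace")
    lines = [f'track name={name} description="{description}"']
    for chrom, start, end, label in features:
        if not 0 <= start < end:
            raise ValueError(f"invalid interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Two made-up features placed inside the region discussed in the text.
track_text = make_bed_track(
    name="myExperiment",
    description="Hypothetical features near MLH1",
    features=[
        ("chr3", 36_805_000, 36_806_000, "hypothetical_peak_1"),
        ("chr3", 36_900_000, 36_900_500, "hypothetical_peak_2"),
    ],
)
print(track_text)
```

The resulting text could be pasted into the browser's custom track form or saved to a file; keeping the track as plain text makes it easy to regenerate whenever the underlying experimental data change.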
Finally, the Ensembl browser, developed by the Wellcome Trust Sanger Institute and EMBL’s European Bioinformatics Institute, contains comprehensive genome annotation resulting from gene predictions and experimental data for 9 species (11). One of Ensembl’s features is the ability to easily perform comparative analyses between species through the availability of DNA-DNA alignments, orthologous protein information, and large-scale synteny information. Ensembl’s “ContigView” around the MLH1 gene is shown in Figure 13. As the user moves down the Web page, more detailed information is provided, moving from the chromosomal level in the 1st panel to an overview of the region of interest in the 2nd panel to a detailed view in the 3rd panel; scrolling to the bottom of the page (not shown) would eventually bring the user to the base pair level of detail.
Figure 13 The Ensembl browser. The region shown is for the human MLH1 gene. The page begins with a chromosomal view, and then moves through various levels of detail. Controls in the section marked Detailed View can be used to either zoom in or out or navigate 5′ ...