This unit describes the Wash U Epigenome Browser, a next-generation genomic data visualization system. The Browser currently hosts ENCODE and Roadmap Epigenomics data for human and model organisms. The Browser displays many sequencing-based data sets across all or part of the genome, on specific gene sets or pathways, and in the context of their metadata. Investigators can order, filter, aggregate, classify and display data interactively based on given feature sets including metadata features, annotated biological pathways, and user-defined collections of genes or genomic coordinates. Further, statistical tests can be performed on selected data. Individual labs can upload their sequencing or array-based data as custom tracks and display them in the context of consortium data, allowing for direct comparisons. The Browser is an increasingly important and widely accessible tool for deriving biological insights from unprecedented amounts of high-quality genomic, epigenomic and expression data.
Newer and higher throughput genomic technologies, including next-generation sequencing, have revolutionized genome sciences. They have also generated unprecedented amounts of data. ENCODE and the NIH Roadmap Epigenomics projects alone have generated several thousand sequencing-based, genome-wide measurements of transcription factor binding and epigenetic marks including DNA methylation and histone modifications among others, in many cell types and tissues, creating a resource and reference for the scientific community. Additionally, small groups of researchers are able to rapidly obtain huge volumes of genomic and epigenomic data. Comprehensive analysis requires an informatics platform to integrate, visualize, and compare very heterogeneous datasets in the context of the rich metadata that accompany them.
The Wash U Epigenome Browser (a.k.a Human Epigenome Browser at Washington University in St. Louis, http://epigenomegateway.wustl.edu) represents a novel web-based genome browser designed to meet these needs. The Browser supports integration, visualization, and analysis of large, sequencing-based datasets. Genome-associated quantitative measurements are displayed as genome heatmaps wherein color gradients reflect strength of signal. Metadata such as cell type, assay type, epigenetic mark and phenotype of a sample are distinguished in the metadata color map alongside the genome heatmap. Investigators can zoom and pan in an intuitive style to examine as many as hundreds of datasets at detail levels ranging from whole-genome down to individual nucleotides. Data tracks can be sorted, organized, dragged and dropped individually or in combination with their metadata. Investigators can also toggle between heatmap views and height-based ‘wiggle’ plots. Additionally, investigators can visualize data for selected genomic features such as promoters, any set of genes or genomic coordinates, or any user defined pathways that can be dynamically obtained from popular pathway resources (e.g. KEGG; see UNIT 1.12). Investigators can also apply standard statistical analyses and display the results graphically on the browser. These features help investigators quickly obtain insights from genome-scale data that can be used to test, or even generate, hypotheses.
The Epigenome Browser currently hosts close to four thousand datasets from more than one hundred human cell/tissue types from the Human Epigenome Atlas and ENCODE projects. These include an increasing number of full, single-base resolution DNA methylomes, profiles of different types of histone marks, locations of open chromatin, small RNAs, and strand-specific RNA-seq based expression and splicing profiles. The Browser is freely accessible to the public, and is being expanded to support model organisms including mouse and fruit fly. The Browser uses advanced, multi-resolution data formats that require minimal system resources and support remote data access. Thus, investigators can display their own genome-wide data and metadata on the Browser as custom tracks for direct comparison with reference data without having to transfer the entire dataset to the Browser. The Browser can be easily expanded to support any large sequencing-based genomics project, including the many that focus on human diseases. The Epigenome Browser is an increasingly important and widely accessible tool for mining biological insights from unprecedented amounts of high-quality genomic, epigenomic and expression data.
This Unit walks the reader through a series of examples that examine histone modification patterns over the human nicotinic receptor family genes. By following the instructions, the reader will become familiar with the Browser interface and be able to navigate it, bring up data sets of interest for display, and interpret the data patterns. The reader will also be introduced to some advanced functions, including genomic juxtaposition and Gene Set View, which focus the view on a subset of the genome, and the Gene Plot, which identifies data patterns with respect to a gene set.
The Wash U Epigenome Browser contains many components and offers diverse functionality. High accessibility is achieved through an advanced user interface. In this Protocol, major components and usability are illustrated by following a series of examples.
Computer with Internet access, preferably with mouse-like pointing devices. On touch-screen devices, some peripheral features including cursor hovering effects will not work, but core functions won’t be affected.
This web service runs best on open source web browsers including Chromium, Google Chrome, and Mozilla Firefox. Please upgrade your web browser to the most current version to achieve optimal performance. Microsoft Internet Explorer is currently not supported.
The Browser panel is composed of the following major components. In the center is the genome heatmap, showing track data over a specific genomic position. Each row in the genome heatmap represents one track containing genome-wide numerical data. On the right side of the genome heatmap is the metadata color map, in which the metadata annotations of the tracks in the genome heatmap are represented as different colors. Each column is one metadata term, with the term name printed on top. Below the genome heatmap lie various genomic feature tracks aligned with the heatmap. The floating toolbox can be found at the top right of the Browser. It retains its position in the web browser window when the page is scrolled. The floating toolbox contains navigation buttons, the message console, and many other control options that remain hidden until invoked. At the bottom of the page is the control panel, with the navigation bar on the left and the actual interface on the right. Clicking a tab in the navigation bar reveals the corresponding control interface.
It can be clearly seen that there is a strong H3K4me3 signal over the 3 kb region centered on the CHRNA7 transcription start site in most of the tracks, indicating the H3K4me3 mark is clearly present at CHRNA7’s promoter region in these samples. This suggests that the CHRNA7 promoter is in an open chromatin state and can be actively expressed. Focus the view on the group of tracks annotated by the metadata term “Blood”. If the tracks marked “Blood” are scattered among other tracks in the genome heatmap, group them together by clicking the word “Blood” on top of the metadata color map. In blood samples, the promoter H3K4me3 signal is not uniform among the various blood-cell types. Only in “mobilized CD34 primary cells” is strong H3K4me3 signal clearly observed. Otherwise no enrichment is observed and only background level signal is present in this region. Such absence of H3K4me3 enrichment can also be observed in other tissue samples including bone marrow and liver. This indicates the H3K4me3 pattern at the CHRNA7 promoter is regulated in a tissue-specific manner, and the CHRNA7 gene activity in these samples could be regulated in the same manner as well. However, to properly support such a hypothesis, data on other epigenetic marks and gene expression is needed.
After juxtaposition is run, the Browser shows data over the genes in the gene track but not over the intergenic regions. Click the “zoom out” button in the floating toolbox to include more genes in the view. Scroll the genome heatmap to expose nearby genes. This view provides a quick survey of H3K4me3 patterns over CHRNA7 and its neighbors. Notice the gene KLF13 to the left. It has a strong H3K4me3 signal at its 5′ end in almost all samples, and shows strong gene body H3K4me3 signals in some blood samples. Look to the right and find GREM1. This gene also has strong H3K4me3 signals at the gene’s 3′ end in many samples, but lacks a 5′ end signal in blood samples, indicating that GREM1 might not be active in blood and immune cells.
The Browser is showing H3K4me3 data over the gene promoters. You can scroll or zoom to adjust the viewing region. A strong H3K4me3 signal is present at some gene promoters, including those of CHRNA7, KLF13, and FAN1. Conversely, many other promoters lack H3K4me3 signal, indicating that these genes might be inactive in these samples. Among the genes having a strong H3K4me3 signal over their promoters, CHRNA7 stands out by showing a patchy pattern (the H3K4me3 signal exists in only a subset of the samples), while the KLF13 and FAN1 promoters show a consistently strong H3K4me3 signal over all samples.
The H3K4me3 signal appears strongly in a subset of the nAChR gene family (CHRNA7, CHRNB1, CHRNE, CHRNA5). Most of the signal appears at the 5′ end of the genes, which is consistent with the previous observation of the H3K4me3 pattern at the CHRNA7 promoter. The CHRNB1 gene is a notable exception: its H3K4me3 signal is most intense over the second to last exon (zoom in and enlarge this gene to get a detailed view). The H3K4me3 mark also shows a spreading pattern over the CHRNE gene body.
Similar to the gene body view in Fig. 6, a strong H3K4me3 signal is found in a subset of nAChR family genes at the regions around the transcription start sites. As demonstrated in Fig. 3, the region belonging to CHRNA7 shows a patchy pattern where the H3K4me3 signal is observed in a subset of samples. A strong consistent H3K4me3 signal can be found in the CHRNA5 region.
This plot shows a clustering pattern of genomic intervals with respect to the H3K4me3 ChIP-Seq track. In the plot a heatmap visualizes data from the particular H3K4me3 track over the genomic intervals, so that the data pattern at different intervals can be contrasted and the variance can be brought out. Each line in the heatmap belongs to one genomic interval. The maximum data value over all genomic intervals is used to set the color of the heatmap cells, so dark red indicates a data value close to the maximum. On the left of the heatmap a dendrogram shows how genomic intervals are grouped together by hierarchical clustering. Genomic intervals belonging to genes CHRNA5 and CHRNA7 are set apart from other intervals, as they harbor the strongest H3K4me3 signal. Because the peaks of the signals occur at different locations, the two genes do not belong to the same sub-cluster.
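The scaling and clustering described above can be sketched in a few lines of Python. This is a minimal illustration, not the Browser's implementation: the text does not state which distance metric or linkage rule the Gene Plot uses, so Euclidean distance and average linkage are assumptions here, and the interval names and signal values are made-up toy data.

```python
import math

def scale_to_max(profiles):
    """Divide every value by the global maximum, so 1.0 maps to the darkest color."""
    peak = max(max(row) for row in profiles.values())
    if peak <= 0:
        raise ValueError("need at least one positive data value")
    return {name: [v / peak for v in row] for name, row in profiles.items()}

def euclidean(a, b):
    """Euclidean distance between two equal-length signal profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(profiles):
    """Agglomerative clustering with average linkage (assumed linkage rule).

    Returns the merges in the order they happen; each merge is
    (members_of_cluster_a, members_of_cluster_b, distance).
    """
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [euclidean(profiles[a], profiles[b])
                         for a in clusters[i] for b in clusters[j]]
                dist = sum(pairs) / len(pairs)
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Toy data: four genomic intervals, each summarized as four bins of signal.
toy_profiles = {
    "weak_1": [0, 1, 0, 1],
    "weak_2": [1, 0, 1, 0],
    "strong_left": [8, 2, 0, 0],   # peak at the left edge of the interval
    "strong_right": [0, 0, 2, 8],  # peak at the right edge of the interval
}
scaled = scale_to_max(toy_profiles)
merges = average_linkage(scaled)
```

In this toy example the two weak intervals merge first, while the two strong intervals never pair directly with each other because their peaks sit in different bins. This mirrors the observation that intervals with strong signal at different locations do not fall into the same sub-cluster.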
The Wash U Epigenome Browser is a powerful web service for visualizing whole-genome data sets, especially high-throughput DNA sequencing data. It extends the UCSC Genome Browser (Dreszer et al., 2011, UNIT 1.4) and the UCSC Cancer Genomics Browser (Sanborn et al., 2011) and it features novel functions. It is rooted in the well-known and dependable UCSC Genome Browser code base, achieving optimum server performance, and it adopts the twin-heatmap visualization approach of the Cancer Genomics Browser in visualizing data. The Wash U Browser focuses on hosting public data sets. At the moment its data comes exclusively from large consortium projects where data are produced in massive amounts and scope of coverage, and are relatively easy to collect and assemble. As of May 2012, the Wash U Browser is hosting over 4000 sequencing experiment tracks from the Roadmap Epigenomics (http://www.roadmapepigenomics.org/) and ENCODE (http://genome.ucsc.edu/ENCODE/) Projects. With the Wash U Browser, it is possible to navigate such enormous and diverse datasets and mine for biological information. Investigators can have 500+ tracks of various types and sources displayed in a single screen, and group the tracks in a meaningful way by the metadata attributes so that data patterns can be revealed. Facet browsing ensures investigators are able to identify tracks of interest in a directed and progressive manner. The Wash U Browser is sophisticated inside, but with its intuitive user interface, simple tasks are made easy and complicated tasks are made possible.
Genome annotation data used by the Wash U Epigenome Browser are downloaded from the UCSC Genome Browser database. The data are presented in the form of various genomic feature tracks and are assigned into meaningful groups for easy identification. In helping investigators analyze and understand sequencing experimental data in the context of genomic annotations, various unique functionalities have been developed, including data juxtaposition, Gene Set View, and Gene Plot, as demonstrated in this Unit. These functionalities allow investigators to view data from diverse angles and obtain insights that are otherwise difficult to achieve with conventional genome browser displays.
Engineering-wise, the Wash U Browser adopts the latest web technologies to deliver the best user experience. All client-server interactions use Ajax, a technique that updates website content without reloading the web page. The Browser processes many of its tasks in the user’s web browser, including facet browsing and track rendering, thus keeping server queries to a minimum. Tracks can be re-rendered with custom styles almost instantly. All these features contribute to a thoroughly modern web tool with satisfying performance and responsiveness.
The protocols in this Unit cover the essentials of the Wash U Browser, including relocating to a genomic location, selecting tracks via the facet browsing function, scrolling and zooming to adjust the view, and using data juxtaposition and Gene Set View to focus on specific features of the genome. While this content should address most expectations of what a genome browser should accomplish, the Wash U Browser is also equipped with an array of advanced features that are not demonstrated here because they require additional expertise and resources. These features apply special visualization techniques or automate routine tasks, and can be tremendously useful in specific contexts. A brief description of them follows.
The statistical analysis functions include correlation analysis, pairwise comparisons, and hypothesis testing. With them the user can perform preliminary and exploratory analyses on the data displayed in the genome heatmap. The correlation function assesses similarity between a target track and the other tracks in the genome heatmap. Only track data from the current view in the genome heatmap is used for the correlation; data beyond the view is not. To compare two groups of heatmap tracks, pairwise comparisons can be used to calculate the log2(ratio) as a straightforward metric. These values are rendered as a quantitative track below the genome heatmap to show the result of the comparison. For rigorous comparisons, hypothesis tests can be used to derive P values as a measure of track data variation. Similar to the log2(ratio) track from the pairwise comparisons, a track of P values is displayed to visualize the hypothesis test result. A P value cutoff can be set to select genomic intervals showing significant P values from the test.
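The two simplest metrics above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the Browser's code: the text does not say which correlation coefficient is used, how groups are summarized, or how zero values are handled, so Pearson correlation, per-bin group means, and a pseudocount of 1 are all assumptions, and the function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two tracks binned over the current view."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("tracks must have the same length (at least 2 bins)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / math.sqrt(var_x * var_y)

def log2_ratio(group_a, group_b, pseudocount=1.0):
    """Per-bin log2 of mean(group_a) / mean(group_b).

    Each group is a list of tracks; each track is a list of per-bin values.
    The pseudocount (an assumption) avoids division by zero and log of zero.
    """
    n_bins = len(group_a[0])
    ratios = []
    for i in range(n_bins):
        mean_a = sum(track[i] for track in group_a) / len(group_a)
        mean_b = sum(track[i] for track in group_b) / len(group_b)
        ratios.append(math.log2((mean_a + pseudocount) / (mean_b + pseudocount)))
    return ratios

# Toy values only: two perfectly proportional tracks, and two one-track groups.
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
ratios = log2_ratio([[3.0, 1.0]], [[1.0, 1.0]])
```

A positive log2(ratio) marks bins where the first group has the higher signal and a negative value marks bins where the second group does, which is why the result reads naturally as a quantitative track.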
The bird’s eye view function helps the user obtain the “whole picture” of any track in the Browser. The user can select a genome heatmap track and view its data pattern over all chromosomes via the bird’s eye view. Gene tracks can be displayed in the same way, in which case gene density data is shown instead. The bird’s eye view function can display multiple tracks in the same view so that they can be contrasted and customized with different colors and scales. Even log2(ratio) or P value data from statistical analyses can be displayed to show genome-wide comparison results.
The custom track function helps the user display their private data; this is very useful as the custom tracks behave in the same way as native tracks. Users can visualize their custom track alongside the native tracks, and apply all analysis and visualization techniques, for example running juxtaposition with a custom genomic feature track. Data must be prepared into binary, indexed file formats (bigWig, or bigBed as described in Kent WJ, 2010, and BAM as described in Li H, 2009), and the files will be placed on a web server for access. Such network-based file access is fast and secure, as only the small portion of data used for display is read and transmitted and never requires any file upload. Fortunately, the preparation procedures of custom track files are standardized and require only modest hardware investment and routine bioinformatics skills.
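The idea of reading only a small portion of a remote file can be illustrated with a standard HTTP byte-range request. This is a sketch of the general mechanism, not a description of the Browser's internals: the text states only that a small portion of the data is read and transmitted, so the use of the `Range` header here is an assumption, and the URL and function names are hypothetical.

```python
import urllib.request

def build_range_request(url, offset, length):
    """Build an HTTP request for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last_byte = offset + length - 1  # HTTP byte ranges are inclusive
    return urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{last_byte}"}
    )

def fetch_bytes(url, offset, length, timeout=10):
    """Send the range request and return the bytes the server sends back.

    A server that supports ranges answers 206 Partial Content with only the
    requested slice; a server that ignores the header returns the whole file.
    """
    request = build_range_request(url, offset, length)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()

# Hypothetical track URL; no network access happens until fetch_bytes is called.
header_request = build_range_request("http://example.org/my_track.bigWig", 0, 64)
```

Indexed formats such as bigWig and bigBed suit this access pattern because the index tells the reader which byte ranges cover the region on display, so the web server hosting the file should support range requests.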
The dataHUB function is a natural extension of the custom track function. With dataHUB, the user can organize all of his/her custom tracks in one place, and display them on the Browser with one single step instead of laborious one-by-one uploading. Tracks inside a hub can be a mixture of types (quantitative data, annotation data, read alignment) and are not restricted to be on the same web server or host. Moreover, users can define custom metadata and annotate tracks to intelligently identify and organize everything in the hub. The dataHUB function doesn’t restrict the number of tracks in a hub and is a light-weight, yet powerful tool for building centralized collections of genomics data on the web.
The URL parameter and session functions create a record of the user’s browsing status. Via URL parameters, the user can control the contents displayed in the Browser, such as the genome assembly, the set of tracks, and the genomic position at which to show the data. Users can compose such a URL and share their customized contents. Alternatively, session functions can be used to record a user’s browsing status. A session saves the user’s status in a database on the server, where it can be retrieved by a session ID. Saved session information is kept in the database for a limited period, usually 3–6 months, whereas a URL with parameters has a virtually unlimited life span and is the preferred way of making one’s browsing status permanent.
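Composing a shareable URL of the kind described above might look like the following sketch. The parameter names (`genome`, `position`, `tracks`), the base address, the track names, and the coordinates are all placeholders chosen for illustration; the actual parameter names accepted by the Browser are documented in its online manual and should be taken from there.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed base address for illustration; check the online manual for the real one.
BASE_URL = "http://epigenomegateway.wustl.edu/browser/"

def compose_view_url(genome, position, tracks, base_url=BASE_URL):
    """Encode a browsing status into a query string.

    The keys below are hypothetical placeholders, not the Browser's documented
    parameter names.
    """
    query = urlencode({
        "genome": genome,            # genome assembly, e.g. "hg19"
        "position": position,        # region to display
        "tracks": ",".join(tracks),  # tracks to show, comma-separated
    })
    return f"{base_url}?{query}"

# Made-up coordinates and track names, for illustration only.
url = compose_view_url("hg19", "chr15:1000-2000", ["trackA", "trackB"])
decoded = parse_qs(urlsplit(url).query)
```

Because the whole state travels in the URL itself, nothing has to persist on the server, which is why such a link does not expire the way a saved session does.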
The Wash U Epigenome Browser is under active development in order to keep pace with the fast-moving field of genome science. New feature releases, bug fixes, and adjustments to existing features happen regularly. Users can refer to our online manuals for up-to-date instructions at http://epigenomegateway.wustl.edu/browser/manual/, and are encouraged to subscribe to our blog (washugb.blogspot.com) or Twitter feed (@WashUGBrowser) to receive news and announcements.
We thank Dr. David Haussler, Dr. James Kent and their UCSC Genome Browser team for critical advice on browser development. We also thank Dr. Joseph F. Costello at UCSF, Dr. Xiaole Shirley Liu at Harvard Medical School, and Dr. Pamela Madden at Wash U for helpful discussions and support. Dr. Heather Lawson and Ms. Rebecca Lowdon helped with manuscript revision. We would especially like to thank the anonymous reviewer who offered constructive comments. X.Z. is supported by NIDA’s R25 program DA027995. T.W. is supported in part by NIH grant 5U01ES017154, the March of Dimes Foundation, the Edward Jr. Mallinckrodt Foundation, P50CA134254 and a grant from the Foundation for Barnes-Jewish Hospital.