This video describes how the maize genome was sequenced and assembled, and how the sequence can be viewed and used via the MaizeGDB Genome Browser. It is accessible online at http://tutorial.maizegdb.org/?p=365. Here, we outline the type and level of information provided in the video, which is divided into six sections. Note that the text presented here is tied to the schematic movies in the video.
Understanding the B73 genome assemblies
The maize B73 genome was sequenced using a bacterial artificial chromosome (BAC)-by-BAC approach. There are currently three B73 genome assemblies, each released as an improvement on the previous one. First, the Missouri Mapping Project (8) created a BAC library from the maize B73 inbred. In addition, the project created high-resolution intermated B73 X Mo17 (IBM) genetic maps (7), and used these maps to order the BACs relative to each other. The result was a series of overlapping BAC clones assembled in ‘contigs’. This series of overlapping BACs was called the ‘Fingerprint Contig’ (FPC) map or ‘the physical map’ (7). Next, the MGSC used the FPC map to select a minimum tiling path (MTP) of BAC clones for sequencing. The first assembly of sequences was called the ‘BAC-based Assembly’. This assembly is a series of overlapping BACs ordered by the FPC map with sequence elements mapped to those physical locations. The second version of the assembly was called ‘B73 RefGen_v1’. This was the first sequence-based pseudomolecule assembly for maize, and is the only assembly currently published or available via GenBank (3). The current version is an improvement called ‘B73 RefGen_v2’. This assembly is a global reassembly of BAC sequences integrating fosmid reads, B73 Optical Map data (9) and maize-sorghum synteny (F. Wei et al., manuscript in preparation). Subsequent versions are expected to be named incrementally, i.e. B73 RefGen_v3 would be next.
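To make the idea of a minimum tiling path concrete, the sketch below picks a small set of overlapping clones that still covers each contig from end to end. It is only an illustration: it assumes every clone already has start and end coordinates on the physical map (real MTP selection worked from fingerprint overlaps), and the `Clone` type, the function name and the coordinates are all hypothetical.

```python
from typing import List, Tuple

# (name, start, end) in arbitrary physical-map units; all values are hypothetical.
Clone = Tuple[str, int, int]


def minimum_tiling_path(clones: List[Clone]) -> List[Clone]:
    """Greedy interval cover: among the clones that start at or before the
    end of the region covered so far, keep the one that reaches furthest."""
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    path: List[Clone] = []
    if not ordered:
        return path
    covered_end = ordered[0][1]
    i = 0
    while i < len(ordered):
        best = None
        # Scan every clone that touches or overlaps the covered region.
        while i < len(ordered) and ordered[i][1] <= covered_end:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None:
            # No clone reaches the covered region: a gap between contigs.
            # Restart the cover at the next clone.
            covered_end = ordered[i][1]
            continue
        if best[2] > covered_end:
            path.append(best)
            covered_end = best[2]
    return path


example = [("a", 0, 100), ("b", 50, 180), ("c", 90, 150),
           ("d", 170, 300), ("e", 160, 250)]
print([name for name, _, _ in minimum_tiling_path(example)])
# prints ['a', 'b', 'd']
```

Clones "c" and "e" are left out because "b" and "d" already span the same region, which is the sense in which the path is minimal.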
How the Maize Genome Consortium Sequenced each BAC
A separate, more detailed video on this topic is available, but it is briefly covered in the ‘How to use the MaizeGDB Genome Browser’ movie, because how the BACs were sequenced is important for understanding what the sequence represents. Each BAC clone was shotgun sequenced and assembly was aided by scaffolding where possible (the scaffolding process is visually explained in our ‘How the Maize Genome Consortium Sequenced each BAC’ movie tutorial). Thus, each BAC sequence is represented as a series of about 5–12 segments with relative order and orientation noted, if known. One hundred ‘Ns’ are used to mark places in the pseudomolecule assembly where the sequence is not contiguous. This, of course, complicates the identification of genes, so researchers benefit greatly from an awareness of the quality of the underlying sequence.
Creating a pseudomolecule
This part of the tutorial explains how the BACs and contigs were joined to make the pseudomolecule (again, a separate, more detailed video on this topic is available). Contigs and BACs are anchored to the Intermated B73 X Mo17 genetic map, which can usually provide order and orientation. Within a single contig, BAC sequences were merged based on strict criteria (≥99% identity, and other criteria outlined in the video). Once BACs within a contig were assembled, the contigs were strung together to make up a pseudomolecule for each chromosome. The gaps between contigs are represented by 1000 ‘Ns’ in the pseudomolecule, to distinguish this type of gap from the ones between pieces of sequence within each BAC (represented as 100 ‘Ns’).
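The two gap conventions described above (100 ‘Ns’ within a BAC, 1000 ‘Ns’ between contigs) can be told apart programmatically. The following is a minimal sketch that assumes gaps appear as runs of exactly 100 or 1000 ‘N’ characters; runs of any other length are labelled ‘other’, since ‘N’ can also stand for ordinary ambiguous bases. The function name and the toy sequence are hypothetical.

```python
import re
from typing import List, Tuple


def find_gaps(sequence: str) -> List[Tuple[int, int, str]]:
    """Return (0-based start, length, kind) for every run of N in the sequence."""
    gaps = []
    for match in re.finditer(r"N+", sequence.upper()):
        length = match.end() - match.start()
        if length == 1000:
            kind = "between contigs"
        elif length == 100:
            kind = "within BAC"
        else:
            kind = "other"
        gaps.append((match.start(), length, kind))
    return gaps


# Toy pseudomolecule: two sequence pieces from one BAC, then a new contig.
toy = "ACGT" + "N" * 100 + "GGCC" + "N" * 1000 + "TTAA"
for gap in find_gaps(toy):
    print(gap)
# prints (4, 100, 'within BAC')
# prints (108, 1000, 'between contigs')
```

On a real chromosome the sequence would be read from a FASTA file, and a summary of gap counts per region gives a rough indication of the quality of the underlying sequence.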
How these assemblies look on the MaizeGDB Genome Browser
For the video, after each description of how the assembly was generated, the display fades to a view of how the pseudomolecule is represented within the context of the MaizeGDB Genome Browser. Most maize researchers have used the MaizeGDB Genome Browser, so this is an attempt to relate the visuals researchers are accustomed to with the methods that were used for sequencing and assembly.
Entering the MaizeGDB Genome Browser
There are several ways to enter the Genome Browser from the MaizeGDB home page, and each is shown in this section of the video. Researchers can click on ‘Genome Browser’ on the home page, or use the simple search box at the top of any header or footer, and select ‘genome browser’ from the pull-down menu. Another way to enter is through the MaizeGDB implementation of BLAST (10). Researchers can BLAST a sequence against a genome assembly then upload hits to the browser as a separate, private track. We also have additional outreach videos that spend several minutes going through exactly how to do this.
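For researchers who run BLAST locally rather than through MaizeGDB, hits can in principle be turned into an annotation file for upload as a private track. The sketch below converts standard 12-column tabular BLAST output into GFF3 lines. It assumes the default tabular column order and that the subject identifiers match the sequence names used by the browser; the function name is hypothetical, and whether a given browser instance accepts the resulting file is not something the video confirms.

```python
from typing import Iterable, List


def blast_tab_to_gff3(lines: Iterable[str], source: str = "blastn") -> List[str]:
    """Convert 12-column tabular BLAST hits into GFF3 'match' features."""
    out = ["##gff-version 3"]
    hit_number = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id, subject_id = fields[0], fields[1]
        s_start, s_end = int(fields[8]), int(fields[9])
        evalue = fields[10]
        # Tabular BLAST reports minus-strand hits with subject start > end.
        strand = "+" if s_start <= s_end else "-"
        start, end = min(s_start, s_end), max(s_start, s_end)
        hit_number += 1
        attributes = f"ID=hit{hit_number};Name={query_id}"
        out.append("\t".join([subject_id, source, "match", str(start),
                              str(end), evalue, strand, ".", attributes]))
    return out


hit = "q1\tchr1\t99.0\t100\t1\t0\t1\t100\t500\t401\t1e-50\t180"
print("\n".join(blast_tab_to_gff3([hit])))
```

The E-value is placed in the GFF3 score column, and the subject coordinates are reordered so that start is never greater than end, as GFF3 requires.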
General features of the genome browser
We use GBrowse (11), and many of the features available within the MaizeGDB implementation of GBrowse are similar to those offered at other databases [e.g. FlyBase (12) and TAIR (13)]. Several features are shown in the videos, including ‘rubberbanding’ your region of choice (i.e. choosing a specific region on the genome browser by clicking the mouse button and holding it down while selecting a region), how to get more information on the data in each track, collapsing and expanding tracks, making customized views, and more.
In the next part of the tutorial, some lesser-known features of the MaizeGDB Genome Browser are discussed: these were included based upon specific questions we received from researchers. Examples include how to: download sequence files, bookmark a region, upload private tracks and share private tracks. At the end of the tutorial, viewers are invited again to submit questions anytime, and our contact information is given.