The bovine genome was assembled at the Baylor College of Medicine Human Genome Sequencing Center using a combined method similar to that used for the rat genome[9] and, more recently, the sea urchin genome[8]. The combined strategy is a hybrid of the whole-genome shotgun (WGS) approach used for the mouse genome and the hierarchical (BAC clone) approach used for the human genome. The sequencing combines BAC shotgun reads with WGS reads from small insert libraries as well as BAC end sequences (BES).
The DNA for the small insert WGS libraries was from white blood cells from the Hereford cow L1 Dominette 01449. The source of the BAC library DNA was Hereford bull L1 Domino 99375, the sire of the former animal.
Two early assembly versions (Btau_1.0 and Btau_2.0) were prepared using only whole genome shotgun (WGS) reads from small insert clones and BES. Contigs from Btau_2.0 were used in the subsequent assembly.
Btau_3.1 was produced using the Atlas genome assembly system with a combination of WGS and BAC sequence[10]. The assembly process consisted of multiple phases (Figure 1). Sequences from each BAC were assembled with Phrap, first with just the BAC-generated sequences, then in combination with the WGS reads that overlapped the BAC, as an enriched BAC (eBAC). BACs were sequenced either as individual clone libraries or as pools of arrayed clones (see read statistics in Table and basepair statistics in Table ). BAC reads from individual libraries or from deconvoluted pools were assembled as individual BACs. In total, 19,667 BAC projects (12,549 individually sequenced clones and 7,118 clones from BAC pools) were sequenced and assembled. Details of BAC assembly methods are provided below. Contigs from the Btau_2.0 WGS assembly were used to fill the gaps in the BAC-based assembly (e.g. those due to gaps in the BAC tiling path), creating the combined assembly, Btau_3.1.
Figure 1. The genome assembly process. Sequence from pooled BACs, individual BACs and whole-genome shotgun was combined in a number of different ways as outlined here. At the top left, pooled BACs were deconvoluted and assembled as individual BACs.
The assembled contigs and scaffolds of the Btau_3.1 assembly were placed on the chromosomes using a version of the Integrated Bovine Map that represents merged data from several independent maps[11]. Btau_4.0 is the latest assembly. It added relatively little new sequence data, so contigs and scaffolds were not significantly changed. Instead of the Integrated Bovine Map, however, Btau_4.0 used the ILTX[12] and BAC fingerprint contig[11] maps to place contigs and scaffolds in the genome, and split scaffolds based on consistent bovine and sheep BES data[13], resulting in more accurate chromosome structures.
Overall, 90% of the total genome was placed on chromosomes in the Btau_4.0 assembly (Table ). This assembly was tested against available bovine sequence data sets (Tables and Additional file 1). Of the 1.04 million EST sequences, 95.0% were contained in the assembled contigs. Assuming the ESTs are uniformly distributed throughout the genome, the estimated genome size is 2.87 Gb (2.73 Gb/0.95). The quality of the assembly was also tested by alignment to 73 finished BACs. Coverage was high: between 92.5% and 100.0% (average 98.5%) of each BAC's sequence was present in the assembly. The assembled contigs and scaffolds aligned linearly to the finished BACs, suggesting that misassemblies are rare.
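The genome size estimate above is a simple scaling of the assembled length by the fraction of ESTs recovered. The following minimal sketch reproduces that arithmetic; the function name `estimate_genome_size` is illustrative and not part of any assembly pipeline, and the calculation inherits the stated assumption that ESTs are uniformly distributed across the genome.

```python
def estimate_genome_size(assembled_gb: float, fraction_found: float) -> float:
    """Scale the assembled length by the fraction of ESTs found in the contigs.

    Assumes (as in the text) that ESTs sample the genome uniformly, so the
    fraction of ESTs found approximates the fraction of the genome assembled.
    """
    if not 0 < fraction_found <= 1:
        raise ValueError("fraction_found must be in (0, 1]")
    return assembled_gb / fraction_found


# Values taken from the text: 2.73 Gb assembled, 95.0% of ESTs found.
estimate = estimate_genome_size(2.73, 0.95)
print(f"Estimated genome size: {estimate:.2f} Gb")
```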
Scaffold Placement Statistics for Btau_4.0
Two groups have used SNP linkage data to order scaffolds on particular chromosomes: one ordered scaffolds on Chr6[14] and another placed scaffolds on Chr19 and Chr29[15]. Their studies provided additional evidence for scaffold placements and independent measurements of assembly quality. Scaffolds in Btau_4.0 have an order entirely consistent with the evidence from these three chromosomes, while both Btau_3.1 and the composite map[11] show misplaced scaffolds (see the summary in Table , and details in Additional file 2).
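One way to picture a consistency check between scaffold order and linkage evidence is sketched below. This is an illustrative assumption, not the procedure used by the cited groups: each scaffold is summarized by the mean linkage position (cM) of its SNPs, and the assembly order is called consistent if those means run monotonically along the chromosome. The function name and the toy scaffold names and positions are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def order_is_consistent(assembly_order: List[str],
                        snp_cm_by_scaffold: Dict[str, List[float]]) -> bool:
    """Return True if mean linkage positions are monotonic along the assembly order.

    Scaffolds without any mapped SNPs are skipped. Either orientation of the
    linkage map (increasing or decreasing) is accepted.
    """
    centers = [mean(snp_cm_by_scaffold[s])
               for s in assembly_order
               if snp_cm_by_scaffold.get(s)]
    pairs = list(zip(centers, centers[1:]))
    increasing = all(a <= b for a, b in pairs)
    decreasing = all(a >= b for a, b in pairs)
    return increasing or decreasing


# Hypothetical toy data: linkage positions (cM) of SNPs on three scaffolds.
toy_cm = {"scafA": [1.0, 2.0], "scafB": [5.0, 6.5], "scafC": [9.0]}
print(order_is_consistent(["scafA", "scafB", "scafC"], toy_cm))
print(order_is_consistent(["scafB", "scafA", "scafC"], toy_cm))
```

A real comparison would also need to handle ties, marker genotyping error and scaffold orientation, which this sketch ignores.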
Comparison to Independent Chromosome Maps
Further assessment of the Btau_4.0 assembly was performed by comparing dense SNP linkage maps, constructed from genotyping 17,482 SNPs in 2,637 bulls belonging to 108 half-sib families, with the physical positions of the SNPs on all autosomal chromosomes. The analysis revealed that 134 SNPs were incorrectly positioned within the assembly. This relatively small number (<0.8%) indicates the high degree of precision in the Btau_4.0 assembly. These misplaced SNPs were relocated in the linkage map to a position corresponding to the most closely linked, correctly assigned SNP. Additionally, 568 SNPs from 321 unplaced scaffolds were mapped to linkage groups.
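The misplacement rate and the relocation rule described above can be sketched as follows. The fraction uses the counts from the text. The relocation function is a hedged illustration only: it assumes each SNP has a linkage group and a position in cM, and that "most closely linked" can be approximated by the smallest cM distance within the same linkage group. The names `relocate_misplaced`, `anchors` and `misplaced`, and all toy values, are hypothetical.

```python
from typing import Dict, Tuple

# Counts from the text: 134 misplaced SNPs out of 17,482 genotyped.
misplaced_fraction = 134 / 17482
print(f"Misplaced SNPs: {100 * misplaced_fraction:.2f}%")


def relocate_misplaced(
    misplaced: Dict[str, Tuple[str, float]],
    anchors: Dict[str, Tuple[str, float, int]],
) -> Dict[str, Tuple[str, int]]:
    """Assign each misplaced SNP the position of its nearest correctly placed SNP.

    misplaced: SNP id -> (linkage group, cM position)
    anchors:   SNP id -> (linkage group, cM position, physical position in bp)
    Returns:   SNP id -> (linkage group, physical position in bp)
    SNPs with no anchor in their linkage group are left out of the result.
    """
    relocated = {}
    for snp, (group, cm) in misplaced.items():
        candidates = [(abs(cm - a_cm), bp)
                      for a_group, a_cm, bp in anchors.values()
                      if a_group == group]
        if candidates:
            _, nearest_bp = min(candidates)
            relocated[snp] = (group, nearest_bp)
    return relocated


# Hypothetical toy data for illustration only.
anchors = {
    "snpA": ("Chr6", 10.0, 1_000_000),
    "snpB": ("Chr6", 25.0, 9_000_000),
    "snpC": ("Chr19", 12.0, 4_000_000),
}
misplaced = {"snpX": ("Chr6", 22.5)}
print(relocate_misplaced(misplaced, anchors))
```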