Whole-genome XhoI and PvuII optical maps of Y. pestis strain KIM were constructed to validate sequence assemblies and simplify the gap closure aspects of a parallel-sequencing project. The XhoI map was used solely to guide sequence assembly through the validation of nascent sequence contigs. This process worked in several ways: (i) nascent sequence contigs were aligned with the map to assess sequence assembly errors, (ii) validated sequence contigs were placed and oriented on the map, and (iii) gaps were characterized between mapped sequence contigs. Since the finished sequence contained information gleaned from the XhoI optical map, the PvuII map was reserved and used as a purely independent means of sequence validation.
The finished sequence of Y. pestis strain KIM5 is 4.60 Mb (10), very close to our estimate of 4.57 Mb based on the XhoI optical restriction map; the difference, or map error, is a mere 0.7%. This genome sizing error is generally smaller than that associated with other whole-genome physical maps constructed using pulsed-field gel electrophoresis (12). Previously, the genome size of Y. pestis strain KIM was estimated by two-dimensional pulsed-field gel electrophoresis (Spe) to be 4.21 Mb, which indicates a sizing error of about 8.5% compared to sequence data. An accurate and independent estimation of genome size is critical when embarking on or concluding a microbial sequencing project, since goals and end points can then be precisely known. We found that the in silico PvuII map constructed from the finished sequence of Y. pestis strain KIM was almost congruent with its PvuII optical restriction map counterpart. The differences between them stemmed mainly from the absence of, or errors associated with, small restriction fragments (smaller than 2 kb).
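The comparison described above can be illustrated with a minimal sketch, which is not the software used in this study. It assumes the standard recognition sequences for XhoI (C^TCGAG) and PvuII (CAG^CTG); the function names `in_silico_fragments` and `percent_error` are hypothetical, and restriction sites spanning the origin of a circular sequence are ignored for simplicity.

```python
import re

# Standard recognition sequences and cut offsets (assumed, not taken from the text):
# XhoI cuts C^TCGAG, PvuII cuts CAG^CTG.
SITES = {"XhoI": ("CTCGAG", 1), "PvuII": ("CAGCTG", 3)}


def in_silico_fragments(seq, enzyme, circular=False):
    """Return ordered fragment lengths (bp) from a complete digest of seq.

    Simplification: a site that spans the origin of a circular sequence
    is not detected.
    """
    site, offset = SITES[enzyme]
    seq = seq.upper()
    # Lookahead pattern so that every occurrence of the site is reported.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    if not cuts:
        return [len(seq)]
    if circular:
        # Interior fragments, plus the fragment that wraps around the origin.
        frags = [b - a for a, b in zip(cuts, cuts[1:])]
        frags.append(len(seq) - cuts[-1] + cuts[0])
        return frags
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def percent_error(estimate, reference):
    """Relative difference between a size estimate and a reference, in percent."""
    return abs(estimate - reference) / reference * 100.0


# Genome sizes quoted in the text (Mb): optical map and PFGE versus sequence.
optical_error = percent_error(4.57, 4.60)
pfge_error = percent_error(4.21, 4.60)
```

Rounded to one decimal place, `optical_error` and `pfge_error` reproduce the 0.7% and 8.5% figures quoted above.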
Advancements in DNA mounting, image collection, processing, and map assembly software have resulted in a new optical mapping system capable of high-resolution analysis. These advancements are evident in the map data presented here. For example, the XhoI map of Y. pestis strain KIM was generated by the old system before modifications. The average fragment sizing error was 3.11 kb, and more than half of the fragments smaller than 2 kb based on the in silico map were missing. In contrast, the higher-resolution PvuII map (average fragment size, 12.06 kb) was constructed using the modified system, with the results showing twice the precision (1.56 kb, [Fig. ]). The recent success in the construction of a BamHI optical map of Shigella flexneri, with an average fragment size of 10.72 kb based on the finished optical map (unpublished results) and an average fragment size standard deviation of 1.48 kb, provides another example of how greatly the resolution and precision of optical mapping have been improved. At first glance, the sizing accuracy of optical mapping appears to have been reduced in the modified system, because the average relative sizing error (optical map versus in silico map) is smaller for the XhoI map (5.14%) than for the PvuII map (6.00%). The reason for this result is that both the old and the modified optical mapping systems have reduced accuracy when sizing small fragments, and the two maps differ in their proportion of small fragments. In the XhoI optical map, less than 1/10 of the fragments are smaller than 5 kb, while in the PvuII map, about 1/3 of the fragments are smaller than 5 kb, as shown in Fig. and F. We therefore see a slight increase in the relative sizing error in the PvuII optical map.
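The distinction between absolute and relative sizing error can be made concrete with a small sketch. The exact error definitions used for the values above are not restated here, so the sketch assumes mean absolute difference (kb) and mean relative difference (percent) over matched fragment pairs; the function name `sizing_errors` and all fragment sizes are hypothetical and chosen only to illustrate the effect.

```python
def sizing_errors(pairs):
    """Compute (mean absolute error in kb, mean relative error in percent).

    pairs: list of (optical_kb, in_silico_kb) tuples for matched fragments.
    Both definitions are assumptions made for illustration.
    """
    abs_errs = [abs(optical - ref) for optical, ref in pairs]
    rel_errs = [abs(optical - ref) / ref for optical, ref in pairs]
    n = len(pairs)
    return sum(abs_errs) / n, 100.0 * sum(rel_errs) / n


# Hypothetical data: a coarse map made up of large fragments only ...
coarse_map = [(52.0, 50.0), (38.0, 40.0), (31.5, 30.0)]
# ... and a finer map sized more precisely but containing fragments under 5 kb.
fine_map = [(3.4, 3.0), (4.5, 5.0), (20.6, 20.0)]

coarse_abs, coarse_rel = sizing_errors(coarse_map)
fine_abs, fine_rel = sizing_errors(fine_map)
```

With these invented values, the finer map has the smaller absolute error but the larger relative error, because a few hundred base pairs of error is a large fraction of a 3-kb fragment. This is the same pattern as the XhoI and PvuII comparison above.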
With increasing resolution of the optical mapping system, maps could be used for comprehensive genotyping. Since most genes are highly conserved among different strains of the same bacterial species or even closely related species (20), this provides the basis for optical maps to identify large syntenic regions and instances of chromosomal rearrangements across closely related microbial genomes. Such rearrangements may be discerned as large insertions, deletions, or translocations. Given the sequence of a prototypic or reference strain, this would enable molecular studies without the need to sequence each microbe used in the comparison. For example, the flanking regions around a translocation event identified by a map versus sequence analysis could be sequenced by the pinpoint generation of appropriate amplicons. We are actively making progress toward this goal.
A composite map constructed from the linear addition of independently derived restriction enzyme maps is more informative than a single-enzyme map having the same average restriction fragment size (14). The construction of such maps is generally not problematic for clones or for bacterial genomes, where macro- and microrestriction maps are commonly combined. However, expansive, high-resolution optical maps of entire genomes present new technical challenges in the alignment of congruent fragments from different restriction enzyme maps. What is sought is the proper registration of restriction fragments relative to each other (i.e., does the XhoI fragment come before or after the PvuII fragment?). Our previous efforts to align restriction sites of different enzymes have produced some unavoidable errors, especially for small fragments (smaller than 5 kb) (12). In this study, the composite optical map of Y. pestis KIM with XhoI and PvuII was constructed based on the alignments of the two separate optical maps with the in silico map made from the finished sequence. We conclude from this analysis that a new algorithm should be developed to systematically align multiple optical maps with sequence data, which takes into account both map and sequence errors.
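The idea of a composite map as the linear addition of single-enzyme maps can be sketched as follows. This is an idealized illustration, not the procedure used in this study: it assumes the cut sites of each enzyme have already been placed on a common linear coordinate system (for example, by alignment to an in silico map) and that positions are error free, which is exactly the condition that is hard to meet in practice. The function name `composite_map` and the example coordinates are hypothetical.

```python
def composite_map(cut_sites, genome_length):
    """Merge single-enzyme cut positions into one ordered composite map.

    cut_sites: dict mapping enzyme name to cut positions on a shared
               linear coordinate system (assumed already registered).
    Returns a list of (left_boundary, right_boundary, fragment_length),
    where each boundary is an enzyme name or "start"/"end".
    """
    # Label every cut with its enzyme, then order all cuts by position.
    labelled = sorted(
        (pos, enzyme)
        for enzyme, positions in cut_sites.items()
        for pos in positions
    )
    fragments = []
    prev_pos, prev_label = 0, "start"
    for pos, label in labelled + [(genome_length, "end")]:
        fragments.append((prev_label, label, pos - prev_pos))
        prev_pos, prev_label = pos, label
    return fragments


# Hypothetical coordinates (kb) on a 500-kb linear molecule.
example = composite_map({"XhoI": [100, 400], "PvuII": [250]}, 500)
```

In this toy case the order of XhoI and PvuII sites is unambiguous. With real optical maps, a sizing error of a few kilobases can exceed the distance between neighboring sites of different enzymes, so a simple positional sort could reverse their order; this is the registration problem described above.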
In summary, the high-resolution maps we have presented were constructed using a newly advanced optical mapping system, and we have shown how such maps can facilitate large-scale sequencing efforts. Beyond this, we plan to use these same advancements to lay the groundwork for large-scale comparative genome studies that would evade analysis by microarray-based approaches.