The Giardia lamblia genome consists of 12 Mb divided among 5 chromosomes ranging in size from approximately 1 to 4 Mb. The assembled contigs of the genotype A1 isolate, WB, were previously mapped along the 5 chromosomes on the basis of hybridization of plasmid clones representing the contigs to chromosomes separated by PFGE. In the current report, we have generated an MluI optical map of the WB genome to improve the accuracy of the physical map. This has allowed us to correct several assembly errors and to better define the extent of the subtelomeric regions that are not included in the genome assembly.
The published sequence of the Giardia lamblia genotype A1 isolate, WB, consists of 11.7 Mb divided among 306 contigs. Some of these contigs were joined into larger scaffolds, primarily by “contig-joining” clones that linked contigs even in the absence of continuous sequence. These results were supplemented by multiple BAC clones that were end-sequenced and physically mapped to specific chromosomes using pulsed field gel electrophoresis (PFGE). Subsequent physical mapping studies using NotI-digested chromosomes of another genotype A1 isolate, BRIS/83/HEP/106 [2, 3], have contributed further toward a complete physical map. The current manuscript describes the use of optical mapping to refine and extend the physical map of the WB isolate.
WB-C6 Giardia trophozoites were used to generate the optical map. The WB isolate was originally axenized from a patient who most likely acquired his giardiasis in Afghanistan, and it has since been cloned a number of times. The C6 clone from the laboratory of Dr. Fran Gillin (UC San Diego) was used for the genome project and also for the optical mapping described here. However, because the WB isolate has undergone many rounds of replication in the laboratory, any changes that occur rapidly, such as changes in the subtelomeric regions (STRs), may have produced differences between the organisms used for the genome project and those used for the optical mapping.
Trophozoites were grown to confluence, pelleted, and embedded in soft agarose as previously described. They were then digested with proteinase K in the presence of 1% Sarkosyl. Optical mapping was performed by OpGen (Gaithersburg, Maryland) [6, 7]. The agarose blocks were melted and digested with β-agarase. The DNA was then mounted on a glass optical mapping surface and digested in situ with MluI, so that the order of the individual restriction fragments was maintained. The DNA was stained with the fluorescent dye YOYO-1 and imaged by fluorescence microscopy. This allowed fragment sizes to be estimated from the intensity of the fluorescent labeling. OpGen software was used to generate an MluI restriction map and then to compare that map with the available genomic sequence data. The map was generated at 150-fold coverage. An algorithm that incorporates the length of the alignment and the quality of the individual restriction fragments was used to overlay the sequence contigs (and secondarily the scaffolds) onto the optical map. Individual sequence contigs could be flagged as problematic if regions of match were followed by complete mismatch, suggesting an assembly error within that contig.
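The sizing step can be illustrated with a minimal sketch. It makes two simplifying assumptions. First, integrated YOYO-1 intensity is taken to be proportional to fragment length. Second, a co-imaged sizing standard of known length is assumed to be available. The function names and numbers are hypothetical and do not describe OpGen's software. Because the fragments stay in order on the surface, cumulative fragment sizes give the positions of the MluI sites along each molecule.

```python
from itertools import accumulate
from typing import List, Sequence


def estimate_fragment_sizes(
    intensities: Sequence[float],
    standard_intensity: float,
    standard_length_bp: int,
) -> List[int]:
    """Convert integrated fluorescence intensities to fragment sizes in bp.

    Assumes intensity is linear in DNA length and calibrates the
    bp-per-intensity-unit factor from a standard of known length
    (a hypothetical calibration, not OpGen's procedure).
    """
    if standard_intensity <= 0:
        raise ValueError("standard_intensity must be positive")
    bp_per_unit = standard_length_bp / standard_intensity
    return [round(value * bp_per_unit) for value in intensities]


def sites_from_fragments(fragment_sizes: Sequence[int]) -> List[int]:
    """Return cut-site positions implied by an ordered list of fragment sizes.

    The final cumulative sum is the end of the molecule, not a cut site.
    """
    return list(accumulate(fragment_sizes))[:-1]
```

Suppose a standard of 50,000 bp gives an intensity of 5000. Then `estimate_fragment_sizes([1000, 2500, 500], 5000, 50000)` gives `[10000, 25000, 5000]`, and `sites_from_fragments` places the two cut sites at 10,000 and 35,000 bp.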
Contigs that matched the optical map over their entire length were left intact. Those that matched the optical map over only a portion of their length were split at the point of discrepancy (c13, c27 and c29; Table 1). Conversely, two contigs were joined if they overlapped on the contig map and shared regions of sequence identity consistent with their positions on the optical map (17a and 53; 61 and 29a; Table 1).
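The splitting rule can be sketched as follows. The rule is that a contig is split where a region of match is followed by complete mismatch. The sketch assumes that each in silico MluI fragment of a contig has already been scored as matching or not matching the optical map. The minimum mismatch-run length is a hypothetical parameter. OpGen's actual scoring also weighs alignment length and fragment quality, which this sketch ignores.

```python
from typing import List, Optional, Sequence, Tuple


def find_split_point(matches: Sequence[bool], min_mismatch_run: int = 3) -> Optional[int]:
    """Return the index where a contig should be split, or None.

    A split is called at the first fragment of a run of at least
    `min_mismatch_run` consecutive mismatches that follows a matched region.
    """
    if min_mismatch_run < 1:
        raise ValueError("min_mismatch_run must be at least 1")
    seen_match = False
    run_start = None
    for i, ok in enumerate(matches):
        if ok:
            seen_match = True
            run_start = None  # a match breaks any mismatch run
        elif seen_match:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_mismatch_run:
                return run_start
    return None


def split_fragments(fragments: Sequence[int], index: int) -> Tuple[List[int], List[int]]:
    """Split an ordered fragment list into the 'a' and 'b' parts of a contig."""
    return list(fragments[:index]), list(fragments[index:])
```

A single stray mismatch does not trigger a split. A sustained mismatch after a matched region does.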
The MluI optical map yielded a genome size of 12.1 Mb divided among five chromosomes ranging in size from 1.46 to 4.43 Mb. The map contained 1463 MluI sites, with an average restriction fragment size of 8.29 kb. By comparison, PFGE estimates of chromosome size range from 1.6 to 3.5 Mb (Table 1). The total genome size estimated by the optical map is remarkably similar to the 12 Mb estimated by PFGE. It is also close to the 11.7 Mb of the published genome, which did not include the rDNA repeats. Although the total sizes were nearly identical, the individual chromosome estimates differed. PFGE had underestimated the size of chromosome 5 (assuming that the optical map is indeed more accurate), whereas the optical-map estimates for the other chromosomes were smaller than those obtained by PFGE.
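For linear chromosomes, site count and average fragment size are related as follows. A molecule with n internal sites yields n + 1 fragments. A genome of five chromosomes with N sites therefore has N + 5 fragments. The text does not state which convention was used for the 8.29 kb figure. The function below is therefore an illustration with toy inputs, not a recomputation of the reported value.

```python
from typing import Sequence


def mean_fragment_size(lengths_bp: Sequence[int], sites: Sequence[int]) -> float:
    """Average restriction fragment size over a set of linear chromosomes.

    A linear molecule with n internal cut sites yields n + 1 fragments.
    """
    if len(lengths_bp) != len(sites):
        raise ValueError("need one site count per chromosome")
    total_fragments = sum(n + 1 for n in sites)
    return sum(lengths_bp) / total_fragments
```

For example, chromosomes of 100,000 and 50,000 bp carrying 9 and 4 sites give 15 fragments and a mean of 10,000 bp.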
The assembly of the published WBC6 genome consisted of 306 contigs, numbered in order of decreasing size. These contigs are identified in GenBank and in the Giardia genome database as AACB02000001-AACB02000306. Many of the contigs were joined into scaffolds, most frequently on the basis of longer contig-joining clones. Contigs 1 through 70, with the exception of contig 66, were placed onto the optical map (Fig 1; a more detailed view of contig placement is given in Supplementary Figure 1). However, contigs 13, 27 and 29 were each split into two fragments. Contig 13a was placed onto chromosome 5, but contig 13b was not placed. The two fragments of contig 27 were placed on chromosomes 5 and 1, respectively. Contig 29a was also placed onto chromosome 5, but contig 29b (39.8 kb) was not placed on the map. There were nine places on the optical map with “negative gaps”, meaning that two adjacent contigs overlapped. In each case, we used BLAST comparisons of the adjacent contig sequences to look for regions of sequence identity near the contig ends that would allow them to be joined. We identified sequence overlaps for two of the nine contig pairs. These two pairs were joined and then analyzed with the OpGen software, which confirmed that the joined contigs were compatible with the optical map. For the remaining seven pairs of overlapping but unjoined contigs, misassembled sequences may be present at one or both of the adjacent ends; this remains to be determined.
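The end-overlap search can be approximated by an exact suffix-prefix match. This is a simplified stand-in for the BLAST comparisons described above. The sketch makes three assumptions. Both contigs are already in the same orientation. The overlap must be identical sequence, whereas BLAST tolerates mismatches and gaps. The minimum overlap length is a hypothetical parameter.

```python
from typing import Optional


def longest_end_overlap(left: str, right: str, min_overlap: int = 20) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 if no exact overlap of at least `min_overlap` bp exists.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    # Try the longest possible overlap first and shrink toward min_overlap.
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def join_if_overlapping(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Merge two same-orientation contigs across an exact end overlap, if any."""
    k = longest_end_overlap(left, right, min_overlap)
    return left + right[k:] if k else None
```

A real comparison would also test the reverse complement of one contig. It would also accept near-identical overlaps scored by alignment rather than exact string equality.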
Contigs smaller than contig 70 (34.2 kb) had too few MluI sites to allow direct placement onto the optical map. However, several were contained in scaffolds of the published genome. These were left in their scaffold positions if they did not contradict the optical map.
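Whether a contig carries enough MluI sites for direct placement can be checked by in silico digestion. MluI recognizes ACGCGT and cuts after the first base (A^CGCGT). The site is palindromic, so scanning one strand finds every site. The minimum site count used here is a hypothetical threshold. The text reports only that contigs smaller than 34.2 kb had too few sites.

```python
from typing import List

MLUI_SITE = "ACGCGT"  # MluI recognition sequence; cleavage is A^CGCGT


def mlui_cut_positions(seq: str) -> List[int]:
    """Return 0-based top-strand cut positions for every MluI site in seq."""
    seq = seq.upper()
    cuts = []
    index = seq.find(MLUI_SITE)
    while index != -1:
        cuts.append(index + 1)  # the cut falls after the leading A
        index = seq.find(MLUI_SITE, index + 1)
    return cuts


def in_silico_fragments(seq: str) -> List[int]:
    """Return MluI fragment sizes (bp) for a linear sequence, in order.

    Sizes are measured between top-strand cuts; the 4-nt overhangs are ignored.
    """
    bounds = [0] + mlui_cut_positions(seq) + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def directly_placeable(seq: str, min_sites: int = 5) -> bool:
    """Hypothetical rule: enough MluI sites to align the contig on its own."""
    return len(mlui_cut_positions(seq)) >= min_sites
```

The ordered fragment list from `in_silico_fragments` is the sequence-side counterpart of an optical map. A contig with only one or two fragments gives an aligner little pattern to match.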
With the exception of the end gaps, 95% of the genome is covered by the optical map. The genome assembly omitted the STRs entirely. A subsequent report described the sequences at most of the STRs, but the optical map provides the first accurate assessment of the extent of the STRs not covered by the sequence assembly. The 10 end gaps ranged in size from 2 to 819 kb, and all but one were smaller than 45 kb. The exception is the 819 kb gap at one end of chromosome 5. Much of this gap consists of a repetitive region of MluI fragments 4400–4600 bp in size. We believe this most likely represents the rDNA repeat region. The rDNA repeat is 5566 bp long and contains only one MluI site. Even so, this is the only repeat region in the optical map compatible with prior data placing the rDNA sequence in subtelomeric repeats. Prior data indicated that three genotype A1 isolates (Portland, ISR, and CAT) varied greatly in the locations of their rDNA repeats. In different isolates, these repeats lie in the STRs of different chromosomes. Even among different cloned lines of the ISR isolate, the sizes of the rDNA-containing subtelomeric regions varied substantially. This is particularly remarkable since the chromosome-internal regions show very little sequence variability.
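A region like the one at the end of chromosome 5 appears in an ordered restriction map as a long run of consecutive fragments of nearly uniform size. The sketch below finds the longest such run within a size window. The 4400–4600 bp window comes from the text, while the fragment list in the test is toy data.

```python
from typing import Sequence, Tuple


def longest_run_in_window(
    fragments: Sequence[int], low: int = 4400, high: int = 4600
) -> Tuple[int, int, int]:
    """Find the longest run of consecutive fragments with low <= size <= high.

    Returns (start index, number of fragments, total bp of the run),
    or (0, 0, 0) if no fragment falls in the window.
    """
    best_start, best_len = 0, 0
    run_start = None
    for i, size in enumerate(fragments):
        if low <= size <= high:
            if run_start is None:
                run_start = i
            run_len = i - run_start + 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_start = None  # an out-of-window fragment ends the run
    total_bp = sum(fragments[best_start:best_start + best_len])
    return best_start, best_len, total_bp
```

The total bp of the run gives a rough lower bound on the extent of the repeat region. It does not say which repeat the fragments represent.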
A map placing the contigs and supercontigs (Sc) onto a physical map derived from PFGE hybridization studies has recently been published. Many sections of the map in the current study are identical to those obtained by PFGE, but there are a few notable differences. Some of these differences arose because the optical map split some of the contigs and Sc, or allowed additional Sc to be placed between two existing Sc. For example, Sc 1764 and 1761 were adjacent at the right end of chromosome 4 on the PFGE-based map, whereas Sc 1801 was placed between them on the optical map. The differences not explained by splitting contigs or Sc are found primarily in the subtelomeric regions. For example, the PFGE-based map placed Sc 1769 and 1767 at the left ends of chromosomes 1 and 2, respectively. In the optical map, Sc 1769 was at the end of chromosome 2 and Sc 1767 at the end of chromosome 1. We suggest that these subtelomeric differences may result from the use of different isolates in the two studies.
The optical map has provided independent verification for the majority of the contigs and supercontigs of the WB Giardia genome as originally published. In addition, it has corrected several errors that resulted from misassembly. We believe the increased accuracy of the current map will facilitate improved analysis of recombination and of gene expression that depends on local chromosomal context.
Supplementary Fig 1: The placement of the contigs along each of the chromosomes is shown. “Sequence ID” is the full name of the contig sequence in GiardiaDB. “Contig” is the shortened name, corresponding to the unique last three digits of the full name. “Length” gives the length of each contig sequence, and “Gap” gives the number of bp between adjacent contigs; a negative gap indicates overlap between adjacent contigs. “Along the chromosome” indicates the cumulative distance across the chromosome as determined from the optical data.
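The “Gap” and “Along the chromosome” columns are related by simple accumulation. For illustration, assume that each row's gap is the distance from the end of the preceding contig to the start of the current contig. For the first row, the gap is measured from the chromosome start. Under that assumption, the sketch below derives the cumulative coordinates and the overlapping pairs flagged by negative gaps. The `Placement` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Placement:
    """One row of a contig-placement table (hypothetical field names)."""
    contig: str
    length: int      # contig length in bp
    gap_before: int  # bp from previous contig end (or chromosome start); negative = overlap


def cumulative_positions(rows: Sequence[Placement]) -> List[Tuple[str, int, int]]:
    """Return (contig, start, end) coordinates along the chromosome."""
    coords = []
    position = 0
    for row in rows:
        start = position + row.gap_before
        end = start + row.length
        coords.append((row.contig, start, end))
        position = end
    return coords


def overlapping_pairs(rows: Sequence[Placement]) -> List[Tuple[str, str]]:
    """Return adjacent contig pairs whose placement shows a negative gap."""
    return [
        (rows[i - 1].contig, rows[i].contig)
        for i in range(1, len(rows))
        if rows[i].gap_before < 0
    ]
```

Pairs returned by `overlapping_pairs` correspond to the “negative gaps” that were checked by BLAST for possible joining.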
This work was funded in part by the Woods Hole Center for Oceans and Human Health, jointly funded by the National Science Foundation (OCE-0430724) and the National Institute for Environmental Health Sciences (P50 ES012742).