We have previously reported a draft genome sequence and analysis of the Giardia assemblage B isolate GS [22]. We here describe the first genome analysis of a non-human Giardia isolate, that of assemblage E isolate P15. We chose a high-throughput strategy to sequence the P15 genome, taking advantage of the low cost and rapid data generation of 454 sequencing. We have examined the genetic diversity of a mostly uncharacterized assemblage of Giardia, and identified intraspecific variation that can be used for population studies or genotyping in a clinical setting. Comparative genome analysis of the assemblage A isolate WB and the assemblage B isolate GS previously revealed evidence of chromosomal shuffling, an abundance of allelic sequence heterozygosity (ASH) and approximately 900 conserved open reading frames that were left unannotated in the first Giardia genome project.
We have estimated the core gene content of Giardia to be approximately 91% of the total gene content, with the remaining 9% involved in antigenic variation and functions related to host specificity. A few members of the VSP and HCMP families were found to be positionally conserved between all three genomes, but the function of these syntenic surface proteins is not known. The Giardia genomes are relatively similar in terms of core gene content, but tend to differ with respect to synteny and in genes responsible for antigenicity. We have identified several small-scale genomic changes, including insertions and deletions of coding and noncoding sequences. A small number of isolate-specific genes were identified, and in most cases these are located in nonsyntenic regions of the genome. As expected, the Giardia isolates have diverged repertoires of surface proteins (VSPs and HCMPs), primarily encoded in nonsyntenic regions of the genomes. A similar genomic arrangement is present in the trypanosomes, where syntenic regions are interspersed with large regions of surface molecule genes, noncoding sequences and repeats. The roles of such regions are unknown, but they could constitute regions of increased genomic plasticity and could play a role in the generation of antigenic variation through recombination events. A chromosome-wide analysis of gene order has been hindered in our assembly by the lack of long-range continuity, a result of the limited sequencing read length. Future efforts to characterize the Giardia spp. genomes would benefit from using paired-end libraries or other strategies to join repeated regions and to avoid the typical homopolymer errors associated with the 454 technology.
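The core gene estimate above can be illustrated with a minimal sketch. It assumes that genes have already been clustered into ortholog groups and that the core fraction is taken per isolate; the function name, the group identifiers and the toy data are hypothetical and do not come from the actual annotation.

```python
def core_gene_summary(groups_by_isolate):
    """Return the core ortholog groups and, per isolate, the core fraction.

    groups_by_isolate maps an isolate name to the set of ortholog-group
    IDs found in that genome (hypothetical input format).
    """
    # Core = groups present in every isolate.
    core = set.intersection(*groups_by_isolate.values())
    # Fraction of each isolate's groups that belong to the core.
    fractions = {
        isolate: len(core) / len(groups)
        for isolate, groups in groups_by_isolate.items()
    }
    return core, fractions


# Toy input: made-up group IDs, not real Giardia annotations.
toy = {
    "WB": {"og1", "og2", "og3", "og4", "og5"},
    "GS": {"og1", "og2", "og3", "og4", "og6"},
    "P15": {"og1", "og2", "og3", "og4", "og7"},
}
core, fractions = core_gene_summary(toy)
```

Groups outside the intersection are the candidates for isolate-specific or variable content, such as the surface protein families discussed above.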
A very low level of allelic sequence heterozygosity was detected in the P15 genome, which contrasts with findings in the GS genome but is consistent with observations from the WB genome. The striking differences in ASH levels could reflect different biology regarding sexual reproduction, a recent mixture of two isolates in GS, or the presence of active meiotic components in either isolate. It has been suggested that the assemblage A isolates may have a mechanism to maintain low levels of ASH, and that such a mechanism could also be operating in some assemblage E isolates. However, the low ASH level in P15 conflicts with genotyping data from assemblage E isolates, in which double-peaked chromatograms have been observed [32]. Low ASH may be a feature of this particular assemblage E isolate, but the high abundance of double peaks from other isolates could also reflect the fact that mixed infections with different assemblage E subgenotypes are more frequent in animals. Further studies of additional assemblage E isolates are needed to resolve this issue. Several genes identified as involved in DNA repair and meiosis in Giardia were previously shown to display larger than average sequence divergence, as well as insertions and deletions, in the GS isolate compared to WB [22]. An analogous comparison with the P15 isolate reveals substantial divergence in the genes Rad50, Rad52 and Mre11, as well as insertions in Smc5, Mlh1 and Rad50 and deletions in Rad50, Msh6, Dmc1a and Smc1 that are specific for the ASH-rich GS lineage. Only one insertion in Rad50 was specific for the WB lineage, whereas none were detected in P15. However, it should also be noted that genes displaying a high degree of divergence in the GS lineage are also the most diverged when comparing WB and P15. Future studies of these differences, correlated to the level of ASH in disparate isolates, could reveal a mechanism for dealing with ASH accumulation. The differences in gene synteny and ASH among the three Giardia isolates suggest that inter-genotype genetic diversity is ubiquitous.
Indels have previously been noted as a prominent feature of many Giardia proteins when compared to eukaryotic homologs. Small indels are an abundant feature in pairwise comparisons between all three genomes, and they are dispersed over a major fraction of the genes. In contrast to single nucleotide polymorphisms (SNPs), which can be synonymous, indels within coding sequence always alter the primary amino acid sequence and are therefore more likely to impact protein function. We found that a number of genes involved in core processes of the cell have indels, which could alter the functional specificity or efficiency of these proteins. Indels have been described in other species and have been attributed to DNA polymerase slippage [33].
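As an illustration of why indels in coding sequence change the protein, the following minimal sketch locates gap runs in a pairwise nucleotide alignment and labels each as in-frame (length divisible by three, so residues are added or removed) or frameshift (all downstream codons change). The function name, the labels and the toy alignment are hypothetical; this is not the indel detection procedure used in the study.

```python
import re


def classify_indels(aligned_a, aligned_b):
    """List gap runs in a pairwise alignment as (start, length, kind, where)."""
    runs = []
    for where, seq in (("gap_in_a", aligned_a), ("gap_in_b", aligned_b)):
        # Each run of '-' characters is one indel event in this sketch.
        for match in re.finditer(r"-+", seq):
            length = match.end() - match.start()
            kind = "in-frame" if length % 3 == 0 else "frameshift"
            runs.append((match.start(), length, kind, where))
    return sorted(runs)


# Toy alignment: made-up sequences for illustration only.
toy_a = "ATGGCT---AAATAA"
toy_b = "ATGGCTGGGAA-TAA"
indels = classify_indels(toy_a, toy_b)
```

In this toy case the three-base gap removes one codon, while the single-base gap shifts the reading frame.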
Several groups encompassing metabolic genes were found to have elevated dN/dS ratios (Figure ). Such changes could alter the substrate specificity of enzymes or alter metabolic pathways. It is possible that these enzymes tolerate variation in the amino acid sequence without any effect on protein function. Another possibility is that these enzymes evolve more rapidly due to adaptations to the intestinal microenvironment in the host organism. A similar differential pattern was observed for genes grouped according to developmental stage: genes expressed in certain developmental stages have slightly different distributions of dN/dS values (Figure ). The reason for this could be that certain stages need to be more tightly regulated and cannot easily accommodate variation. Another possibility is that some stages expose proteins to the external environment, which could result in selective constraints imposed by the host immune system.
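To make the dN/dS comparison concrete, the sketch below computes the uncorrected proportions of nonsynonymous (pN) and synonymous (pS) differences between two aligned coding sequences, in the spirit of the Nei-Gojobori counting approach. It is a simplified illustration under stated assumptions: the standard genetic code, gap-free uppercase sequences, codons with more than one difference skipped, changes to stop codons counted as nonsynonymous, and no multiple-hit correction. The function names are hypothetical and this is not necessarily the method used to produce the ratios reported here.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code (assumption), codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Number of synonymous sites in one codon (between 0 and 3)."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                total += 1 / 3  # one of three possible changes at this position
    return total


def pn_ps(seq_a, seq_b):
    """Return (pN, pS) for two aligned, gap-free coding sequences."""
    syn_diff = nonsyn_diff = n_codons = 0
    s_sites = 0.0
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        a, b = seq_a[i:i + 3], seq_b[i:i + 3]
        mismatches = sum(x != y for x, y in zip(a, b))
        if mismatches > 1:
            continue  # simplification: ignore codons with multiple changes
        n_codons += 1
        s_sites += (syn_sites(a) + syn_sites(b)) / 2
        if mismatches == 1:
            if CODON_TABLE[a] == CODON_TABLE[b]:
                syn_diff += 1
            else:
                nonsyn_diff += 1
    n_sites = 3 * n_codons - s_sites
    pn = nonsyn_diff / n_sites if n_sites else float("nan")
    ps = syn_diff / s_sites if s_sites else float("nan")
    return pn, ps


# Toy sequences: one synonymous (CTT/CTC) and one nonsynonymous (ATG/ATA) change.
pn, ps = pn_ps("CTTATGGCT", "CTCATAGCT")
```

The ratio pN/pS then approximates dN/dS for closely related sequences; values well below one are consistent with purifying selection, and elevated values with relaxed constraint or adaptive change.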
We found evidence for a highly conserved set of core genes in Giardia, which we propose is essentially common to all Giardia intestinalis isolates. This set of genes lies in genomic regions with mostly conserved synteny. Despite this, polymorphisms in the form of synonymous and nonsynonymous SNPs and indels are common in this set of genes, and these could change the function or specificity of the expressed proteins. Cases of disrupted synteny also occur in these regions, which indicates that chromosomal recombination takes place in Giardia or in certain lineages of the species. Experimental evidence for recombination between assemblages is missing, but recombination has been reported to occur between A2 isolates [34].
A large fraction of genes in the diplomonad genome have been proposed to have been acquired via horizontal gene transfer, due to their clustering with bacterial sequences in phylogenetic trees [10, 35, 26]. Here we found a number of genes only present in certain isolates (Figure ). Some of these genes could be involved in shaping phenotypic differences, whereas others could be non-functional remnants from recent horizontal gene transfers, or represent recent losses in one or two of the lineages. An example of a recently acquired gene is the acetyl transferase gene with an apparent bacterial origin, which was identified in P15 but is absent in WB and GS (Additional file 5). However, it is presently unknown whether the gene is expressed and what the precise function of its putative product might be. Our gene content comparison (Figure ) showed that gene acquisition and loss are much less frequent than in intestinal bacteria, such as Escherichia coli [36]. Nevertheless, it indicates that gene acquisition is an extant evolutionary process also in a eukaryotic parasite, which likely contributes to diversification of Giardia isolates over evolutionary timescales.
The dynamics of the G. intestinalis genomes are further exemplified by a putative pseudogene found in a conserved core region of assemblage E. Evolutionary analyses revealed that the gene was most likely recently acquired from bacteria, probably of the mammalian gut flora (Additional file 5). The gene then turned into a pseudogene in P15, whereas it is under purifying selection in WB and GS. The presence of an intact copy of this gene exclusively in the human-infecting isolates is intriguing; it is tempting to speculate that it might be related to host specificity.
We could not find any clear evidence for chromosomal duplications in our genome data, but we detected a genomic region in P15 that harbors 13 genes that are absent in WB but appear to be present in GS. A preliminary analysis suggests that this region may be absent from other assemblage A isolates as well, and the protein products of these genes could constitute good targets for the development of assemblage-discriminating antibodies for serological purposes. Host specificity or phenotypic variation could also manifest at the level of expression. Differential expression of genes between isolates could provide a means for creating host specificity and phenotypic differences. The regulation of gene expression is likely to be mostly post-transcriptional because of the short relaxed promoters in Giardia, and it is possible that short RNAs could be involved. Future efforts to characterize the Giardia transcriptome would benefit from using high-throughput sequencing to enhance the resolution and confirm the expression of proposed gene models.
The repeat content of the Giardia genome is approximately 9% depending on which gene families are taken into consideration. This can be compared to the 50% repeat content of another protozoan parasite, T. cruzi [37]. The relatively low complexity of the Giardia genome makes it ideal for rapid comparative genomics using shotgun sequencing.
Repeated areas in the Giardia genome contain pseudogenes, retro-transposons and low complexity repeats, and they could constitute areas of increased genomic plasticity or sites where recombination occurs, as indicated by their association with breaks in synteny. Proteins that are encoded in these regions have no positional orthologs in other genomes but tend to be classified into one of the larger gene families in Giardia. In other genomes with short intergenic distances, gene order has mostly been conserved. Also, the bidirectional promoters of Giardia could act to prevent reshuffling by inactivating genes upon rearrangement. Whether genomic rearrangements occur randomly, or in a controlled manner by a certain mechanism or at certain genomic locations, warrants further investigation. It is known that co-regulated genes act as a hindrance to genomic rearrangements. It could therefore be proposed that the lack of gene regulation at the transcriptional level allows for larger genomic rearrangements in Giardia without affecting gene function. The presence of retro-transposon derived sequences in non-syntenic genomic positions in WB and P15 could indicate that retro-transposons have contributed to shaping the Giardia genomes by increasing the genomic plasticity.
The Giardia genomes contain a very large fraction of hypothetical genes that code for proteins of unknown function, with limited expression or proteomic data to support them. Most of these genes appear to be randomly distributed across the genome. The lack of significant hits to other protozoan genomes likely reflects the strong adaptation of the parasite to its host environment in combination with the evolutionary divergence of these organisms. As the number of sequences from diplomonads increases, due to a number of ongoing diplomonad sequencing projects, several Giardia proteins will likely be assigned orthologs, but experimental efforts will be required to elucidate the function of these proteins. The generation of the GS and P15 draft genomes has provided information about the strong conservation of these genes, which indicates that they have a functional role in Giardia biology.
It is clear that the genomic data alone cannot resolve the question of host specificity and mechanisms of pathogenicity, but the genome sequence data provide an evolutionary insight into how parasite genomes have been shaped over the course of evolution. Moreover, the identification of lineage specific genes and gene variants will provide candidates for future functional studies. We hope that the draft genome sequence of this assemblage E isolate will provide useful information for future studies of the differences between Giardia strains and the diplomonads in general.