Enterocytozoon bieneusi is a unicellular enteric fungal pathogen and the most common cause of human microsporidiosis. The frequent detection of this organism in animals, including companion animals, livestock and wildlife, has raised the question of the importance of animal reservoirs in the epidemiology of this pathogen. A partial sequence of the ribosomal internal transcribed spacer (ITS) has been widely used as a genetic marker for studying the molecular epidemiology of E. bieneusi. With the aim of comparing E. bieneusi ITS genotypes originating from different host species, and of assessing the potential for zoonotic transmission, E. bieneusi ITS sequences retrieved from GenBank were analyzed using two metrics of diversity, rarefaction and phylogenetic distance. In spite of the human ITS sample being geographically more diverse, ITS sequence diversity in animals exceeded that of humans. In both host groups much of the ITS diversity remains to be sampled. Using quantitative phylogenetic tests we found evidence for a partial but significant segregation of E. bieneusi ITS sequences according to host species. Host-specific segregation was confirmed by hierarchical analysis of molecular variation. To improve our understanding of the epidemiology of human microsporidiosis and strengthen the study of E. bieneusi populations, efforts to genotype additional E. bieneusi isolates from wildlife and companion animals should be prioritized and the geographic and species diversity of animal samples should be increased. Due to the possibility of genetic recombination in this species, additional unlinked genetic markers need to be developed and included in future studies.
Based on the sequence of a small number of loci (Hirt et al., 1999; Keeling, 2003) the microsporidia are thought to represent an early-diverging branch of the Fungi (James et al., 2006). Several unique characteristics are shared by Microsporidia including the presence of a single ribosomal internal transcribed spacer (ITS), the lack of mitochondria, and the presence of a polar tube serving to propel the spore content into the host cell. Microsporidia are parasitic, and one species, E. bieneusi, is the most common cause of human intestinal microsporidiosis and an opportunistic infection in AIDS.
The epidemiology of E. bieneusi, and in particular the importance of zoonotic transmission, remains unclear. The study of the molecular epidemiology of human microsporidiosis has been driven by surveys of geographically restricted parasite populations collected from humans (Breton et al., 2007; Leelayoova et al., 2006; Rinder et al., 1997; Sulaiman et al., 2003a; ten Hove et al., 2009; Tumwine et al., 2002), livestock (Buckholt et al., 2002; Dengjel et al., 2001; Mathis et al., 1999; Santin et al., 2005; Sulaiman et al., 2004), or wildlife species (Lobo et al., 2006; Sulaiman et al., 2003b). These studies are all based on the sequence of the ribosomal ITS and have contributed to a growing collection of sequences of this locus (Santin and Fayer, 2009). Comparisons of E. bieneusi ITS genotypes from different host species have revealed an apparent lack of host specificity, as ITS genotypes are shared among human and animal hosts. These observations have been interpreted as evidence of zoonotic transmission (Drosten et al., 2005), a view supported by experiments demonstrating transmission between different host species (Feng et al., 2006; Kondova et al., 1998; Tzipori et al., 1997).
As a consequence of several sequencing efforts initiated over 10 years ago (Rinder et al., 1997), a relatively large number of ITS sequences have been deposited in GenBank. These sequence data are far from being random in space or with respect to host origin, but pooling data from disparate surveys into a global collection enables testing for the presence of discrete ITS populations, particularly as they relate to host species.
Although a partial sequence of the E. bieneusi genome is available (Akiyoshi et al., 2009), and the sequence of a few other loci has been obtained from multiple isolates (Akiyoshi et al., 2007), the ITS is the only marker which has been used for studying the molecular epidemiology of this species. The reliance on a single locus constrains the interpretation of the data, particularly for an organism such as E. bieneusi for which information on the occurrence of sexual recombination is lacking. In the absence of recombination a single genetic marker would be adequate for studying the epidemiology of this pathogen, but if genetically distinct genotypes recombine in nature, extrapolating from a single locus to the entire genome may lead to the wrong conclusions.
Bearing in mind the limitations of the E. bieneusi ITS database and the constraints of the single-locus typing method, we report here an analysis of the GenBank ITS sequence collection using diversity analysis, phylogenetic tests and analysis of molecular variation (AMOVA). The results indicate host-specific structuring of the E. bieneusi ITS diversity.
E. bieneusi ITS sequences were downloaded from GenBank and aligned with Clustal W accessed through the Accessory Applications menu of BioEdit (Hall, 1999). Sequences were trimmed to 243 nucleotides corresponding to nucleotide position 56–298 in GenBank sequence AB359945 as this region is present in a majority of GenBank entries. Sequences which did not span this region were excluded. A total of 135 sequences were retained including 68 sequences from human infections, 16 sequences of bovine origin, 14 sequences from infected pigs and 15 sequences from cats and dogs combined. Unless stated otherwise, bovine, porcine, feline and canine hosts are referred to as “livestock”. An additional 22 sequences came from E. bieneusi isolated from wildlife, including raccoons, muskrats, beavers, marmosets and birds.
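The trimming step can be illustrated with a minimal sketch. It assumes the alignment is held as a dictionary of equal-length gapped strings and that "spanning the region" means having a base (not a gap) at both ends of the trimmed window; the function name, the toy alignment and this spanning criterion are illustrative assumptions, not the procedure actually carried out in BioEdit.

```python
def trim_to_reference(alignment, ref_id, start, end):
    """Trim an alignment to reference positions start..end (1-based, inclusive).

    alignment: dict mapping sequence id -> gapped sequence (all the same length).
    Sequences with a gap at either end of the window are dropped (assumed
    criterion for "did not span this region").
    """
    ref = alignment[ref_id]
    # Alignment columns that hold a reference base (gaps in the reference skipped)
    cols = [i for i, base in enumerate(ref) if base != "-"]
    first, last = cols[start - 1], cols[end - 1]
    trimmed = {}
    for name, seq in alignment.items():
        region = seq[first:last + 1]
        if region[0] != "-" and region[-1] != "-":
            trimmed[name] = region
    return trimmed


# Toy alignment (hypothetical sequences, not GenBank data)
toy = {"ref": "ACGTACGT", "a": "-CGTACGT", "b": "ACGTAC--"}
print(trim_to_reference(toy, "ref", 2, 6))

# Positions 56-298 inclusive correspond to 298 - 56 + 1 = 243 nucleotides
print(298 - 56 + 1)
```

With real data the call would take the form `trim_to_reference(alignment, "AB359945", 56, 298)`, provided the reference entry is part of the alignment.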
Rarefaction analysis (Gotelli and Colwell, 2001; Sanders, 1968) was used to compare the diversity among E. bieneusi ITS sequences originating from different hosts. Aligned ITS sequences were exported in FASTA format to Microsoft Excel and sorted such that identical sequences were located on adjacent rows. Each unique sequence was then numbered and sequence/sample/abundance combinations saved in plain text format as described in the EstimateS User Guide at http://viceroy.eeb.uconn.edu/EstimateS and by Hughes and Hellmann (2005). In the User Guide this format is referred to as “Format 3”. Individual-based analytical rarefaction estimates and their standard deviation were calculated with this program.
A phylogenetic tree was constructed by importing aligned ITS sequences (n=135) into Mega 4.1 (Tamura et al., 2007) downloaded from http://megasoftware.net/. Evolutionary distances were computed with the Maximum Composite Likelihood method (Tamura et al., 2004).
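For readers unfamiliar with individual-based analytical rarefaction, the expected number of genotypes in a random subsample of n sequences drawn without replacement from N sequences is E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N_i is the abundance of genotype i. The sketch below implements only this expectation; it is not the EstimateS code, it omits the standard deviation, and the abundance vector is hypothetical.

```python
from math import comb


def rarefied_richness(abundances, n):
    """Expected number of distinct genotypes in a random subsample of n sequences.

    abundances: number of sequences observed for each genotype.
    Uses the hypergeometric (sampling without replacement) formula.
    """
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("subsample size must lie between 0 and the sample size")
    denom = comb(total, n)
    # For each genotype: 1 - probability that it is absent from the subsample
    return sum(1 - comb(total - a, n) / denom for a in abundances)


# Hypothetical sample: three genotypes observed 5, 3 and 2 times
counts = [5, 3, 2]
for size in (1, 5, 10):
    print(size, round(rarefied_richness(counts, size), 3))
```

Evaluating the function at every subsample size from 1 to N traces a rarefaction curve of the kind shown in Fig. 1, and evaluating two samples at a common n is what allows samples of unequal size to be compared.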
The program Unifrac (Lozupone et al., 2006) was used to investigate the extent and statistical significance of any clustering of E. bieneusi ITS sequences according to host species. Unifrac tests whether related sequences cluster according to a user-defined property, such as the environment from which they were obtained. The test measures the phylogenetic distance among populations by comparing the lengths of the branches in a phylogenetic tree which are unique to an environment to the total of the branch lengths in the tree. To determine the statistical significance of the fraction of unique branch length, the environments are randomized and the distribution of the Unifrac Distances (UD) over many replicates determined. The p value represents the fraction of randomized trees that generate UD values greater than the value obtained from the experimental tree. Here, we extend Unifrac’s application to testing whether E. bieneusi ITS sequences cluster according to host species. The web interface of the program at http://bmf2.colorado.edu/unifrac/index.psp was used. The program requires the input of two files: a tree file in Newick format (http://evolution.genetics.washington.edu/phylip/newick_doc.html) encoding a phylogenetic tree generated from the aligned sequences, and an “environmental file” described below. To create the tree file, a pairwise distance matrix was calculated from ITS sequences aligned with BioEdit using the Dnadist program of PHYLIP (Felsenstein, J. 2005. PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle) accessed from the BioEdit Accessory Applications menu. The distance matrix was saved in text format and input into the Neighbor program or Fitch program of the PHYLIP package. Trees (Saitou and Nei, 1987) were drawn with these programs using the default values, except that an outgroup sequence was defined (see below).
The file output by these programs was then uploaded into Unifrac. Also uploaded was the “environmental file” which is a tab-delimited text file linking the designation of each sequence (in this case the GenBank accession number) with the “environment”, i.e., the host species or group of host species (e.g., livestock) from which each E. bieneusi sequence originated. The outgroup was excluded. A Unifrac Distance (UD) of 1 indicates complete segregation, whereas a distance of zero is the theoretical lower bound of the UD obtained if sequences and environment (i.e., host) are not associated.
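The logic of the UD and of its randomization test can be sketched on a toy tree. The code below is a simplified illustration, not the Unifrac program: the tree, branch lengths and host labels are hypothetical, the tree is stored as a child-to-parent table rather than parsed from a Newick file, and a randomized value is counted when it is at least as large as the observed one (an assumption made here so that ties are handled).

```python
import random

# Hypothetical rooted tree: node -> (parent, branch length leading to the node)
TREE = {
    "n1": ("root", 0.1), "n2": ("root", 0.1),
    "h1": ("n1", 0.2), "h2": ("n1", 0.2),
    "c1": ("n2", 0.2), "c2": ("n2", 0.2),
}
# Environment table: sequence name -> host
ENV = {"h1": "human", "h2": "human", "c1": "cattle", "c2": "cattle"}


def unifrac(tree, env, a, b):
    """Unweighted distance: unique branch length / total branch length for a and b."""
    seen = {node: set() for node in tree}
    for leaf, host in env.items():
        if host not in (a, b):
            continue
        node = leaf
        while node in tree:  # walk from the leaf up to the root
            seen[node].add(host)
            node = tree[node][0]
    unique = sum(tree[n][1] for n, hosts in seen.items() if len(hosts) == 1)
    total = sum(tree[n][1] for n, hosts in seen.items() if hosts)
    return unique / total


def p_value(tree, env, a, b, reps=999, seed=0):
    """Fraction of label randomizations with a UD at least as large as observed."""
    rng = random.Random(seed)
    observed = unifrac(tree, env, a, b)
    leaves = list(env)
    labels = [env[leaf] for leaf in leaves]
    hits = 0
    for _ in range(reps):
        rng.shuffle(labels)
        if unifrac(tree, dict(zip(leaves, labels)), a, b) >= observed:
            hits += 1
    return hits / reps


print(unifrac(TREE, ENV, "human", "cattle"))
print(p_value(TREE, ENV, "human", "cattle"))
```

In this toy tree the two hosts occupy separate clades, so every branch is unique to one host and the UD reaches its maximum of 1. With only four sequences the randomization test has little power, which illustrates why significance depends on sample size as well as on the degree of segregation.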
Consistent with the extensive divergence among the Microsporidia (James et al., 2006), no microsporidian ITS from GenBank could be aligned with the E. bieneusi ITS. Even ITS sequences from other Enterocytozoon or Encephalitozoon species were too divergent. To satisfy Unifrac’s need for phylogenetic trees rooted with an outgroup, an artificial outgroup sequence was created by combining 100-bp fragments from three E. bieneusi ITS identified by the following GenBank accession numbers: EF014430, DQ683757, DQ885585. These sequences were chosen because they are located on the most divergent branches of the E. bieneusi ITS phylogeny. In addition, random SNPs were introduced at ~100-bp intervals to ensure that the artificial outgroup would not group with any E. bieneusi ITS branch. To ensure that the artificial outgroup did not affect the results, the analysis was replicated with a different outgroup, which was derived from the Nucleospora salmonis ITS. Because the genetic distance between E. bieneusi ITS sequences and that of N. salmonis exceeded the threshold set by Dnadist, the distance was reduced by manually “mutating” 30% of the N. salmonis positions to the bases found in E. bieneusi AF242475 of human origin. The divergence of the outgroups with respect to the population of E. bieneusi ITS sequences was individually verified by inspecting a Neighbor-Joining tree drawn from all ITS sequences plus the outgroup generated in silico. Finally, UDs were converted into a “heatmap” using a Microsoft Excel macro written by Yukihiro Yabuta downloaded from http://homepage.mac.com/yabyab/my.html. UDs were color coded such that the largest value is represented with the warmest color (red), and smaller distances with colder colors. If present, a UD of 0 would be shown in deep blue. Principal Coordinate Analysis (PCoA) as implemented in the Unifrac program was used to cluster the host-associated ITS sequences.
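The two outgroup constructions can be mimicked programmatically. The following is only a sketch under stated assumptions: which fragment was taken from which sequence, where exactly the SNPs were placed, and which N. salmonis positions were changed are not specified above, so the choices made here (consecutive fragments, one substitution near the middle of each interval, randomly chosen positions) and all function names are hypothetical.

```python
import random


def chimera(seqs, fragment=100):
    """Join consecutive fragments, taking the i-th fragment from the i-th sequence."""
    return "".join(s[i * fragment:(i + 1) * fragment] for i, s in enumerate(seqs))


def add_snps(seq, interval=100, seed=0):
    """Replace one base per interval with a different, randomly chosen base."""
    rng = random.Random(seed)
    out = list(seq)
    for pos in range(interval // 2, len(out), interval):
        out[pos] = rng.choice([b for b in "ACGT" if b != out[pos]])
    return "".join(out)


def pull_toward(outgroup, reference, fraction=0.3, seed=0):
    """Copy the reference base into a random fraction of aligned positions."""
    if len(outgroup) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    rng = random.Random(seed)
    out = list(outgroup)
    for pos in rng.sample(range(len(out)), round(fraction * len(out))):
        out[pos] = reference[pos]
    return "".join(out)


# Placeholder sequences standing in for the three divergent ITS entries
parents = ["A" * 300, "C" * 300, "G" * 300]
artificial = add_snps(chimera(parents))
print(len(artificial))

# Placeholder for reducing the distance between a distant outgroup and a reference
print(pull_toward("A" * 10, "C" * 10).count("C"))
```

Fixing the random seed keeps the artificial outgroup reproducible, which matters when the same outgroup has to be reused across the different subsamples analyzed.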
Host-specific segregation of ITS alleles was also investigated with a 2-level hierarchical AMOVA (Excoffier et al., 2005). A model of four groups corresponding to ITS from human, bovine, canine/feline and porcine E. bieneusi isolates was tested.
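The partition of variation underlying AMOVA can be illustrated with a deliberately simplified version. The sketch below uses a single level (sequences within host groups) rather than the 2-level hierarchy analyzed here, takes the number of pairwise differences as the squared distance, and omits the permutation test; the function names and toy sequences are illustrative assumptions and do not reproduce the software used for the analysis.

```python
from itertools import combinations


def pairwise_differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def amova_one_level(seqs, groups):
    """Return (variance among groups, variance within groups, Phi_ST)."""
    n = len(seqs)
    labels = sorted(set(groups))
    g = len(labels)
    # Total sum of squared deviations from all pairwise distances
    ssd_total = sum(pairwise_differences(seqs[i], seqs[j])
                    for i, j in combinations(range(n), 2)) / n
    ssd_within = 0.0
    sizes = []
    for label in labels:
        members = [s for s, lab in zip(seqs, groups) if lab == label]
        sizes.append(len(members))
        ssd_within += sum(pairwise_differences(a, b)
                          for a, b in combinations(members, 2)) / len(members)
    ssd_among = ssd_total - ssd_within
    msd_among = ssd_among / (g - 1)
    msd_within = ssd_within / (n - g)
    # Average group size corrected for unequal sample sizes
    n0 = (n - sum(k * k for k in sizes) / n) / (g - 1)
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    phi_st = var_among / (var_among + var_within)
    return var_among, var_within, phi_st


# Toy example: two host groups with no shared sequences
seqs = ["AAAA", "AAAA", "CCCC", "CCCC"]
hosts = ["human", "human", "bovine", "bovine"]
print(amova_one_level(seqs, hosts))
```

The percentage of variation among groups corresponds to 100 × var_among / (var_among + var_within), that is, 100 × Phi_ST in this one-level case. Negative estimates can occur when groups are no more differentiated than expected by chance.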
Rarefaction is commonly used to compare taxonomic diversity among samples of different size. To assess whether ITS diversity differs among host species, rarefaction curves were plotted for sequences originating from humans, livestock and all the other animal hosts combined (Fig. 1A). The graph shows that many of the ITS sequences collected to date are unique. The steep slopes and the lack of a plateau indicate that much of ITS diversity remains to be sampled.
Using rarefaction analysis we compared the ITS diversity among parasites originating from different host species. A total of 41 ITS genotypes were found in the 13 animal species (livestock and wildlife) represented in the database. This sample displays greater taxonomic richness than sequences from humans (n=67, 35.1 genotypes; 95% confidence interval (C.I.) 33.7–36.9), but does not differ significantly from the diversity found in each of the four livestock species (cattle, pig, cat and dog). Rarefied to the livestock sample size of 45, the estimated ITS richness for all animal species combined was 30.3. The 95% C.I. of 24.7–35.6 genotypes includes the livestock ITS richness of 27, indicating a lack of significant difference. Similarly, the rarefied human E. bieneusi ITS diversity of 25.3 (95% C.I. 20.1–30.6) is not significantly different from that of the livestock samples. The main conclusion we draw from this analysis is that E. bieneusi ITS sequences in humans are less diverse than in all animals combined. Among the three groups of host species, non-livestock animal hosts (wildlife) form the smallest group (n=22) and were therefore not analyzed separately.
Because the geographical diversity of the ITS host-specific groups could impact the analyses, the geographical diversity of the human, animal, and livestock ITS collections was compared (Fig. 1B) using the same approach. Rarefied to the animal sample size of 67, the geographic diversity of human ITS (11.8 countries, 95% C.I. 11.1–12.6) was significantly higher than that of the ITS sequences originating from all animals combined (9 countries). When the human and animal groups are rarefied to 45, the livestock sample size, both samples are geographically more diverse than the livestock group, which originated from only 7 countries. We infer that lack of geographic diversity is unlikely to be the cause of the lower ITS diversity found in human infections. To the contrary, the higher taxonomic diversity in animals was observed in spite of this group being geographically significantly less diverse than the human sample. We also note that the country rarefaction curves have reached, or are close to, a plateau, particularly in the case of the animal collections. This indicates a lack of geographic diversity in these two groups, which is particularly pronounced in the collection from livestock and animals in general.
Phylogenetic trees (Fig. 2) are commonly drawn to visualize the relationship among sequences. Several studies have used such methods to visualize the relations among E. bieneusi ITS sequences from different collections. A limitation of this approach is that evolutionary trees are typically interpreted based on visual inspection. To test whether there is a statistically significant difference among E. bieneusi ITS sequences isolated from different hosts, we applied the Unifrac method (Lozupone et al., 2006) and the “Phylogenetic Test” (P test) (Martin, 2002) to the entire collection of 135 GenBank entries. To rule out bias due to ITS sequences from rare or highly divergent host species, these analyses were performed with different samples; i.e., the complete sample of 135 ITS sequences, the human and livestock subsample of 113 sequences, a human and livestock subsample of 109 sequences which excludes the four most divergent dog samples, and a subsample of 99 sequences originating from the most abundant host species, human, bovine, canine and feline. To facilitate the interpretation, the UDs are represented as a heat map and a cluster diagram (Fig. 3). In a comparison of sequences from humans and each animal subgroup (wildlife excluded) UD values exceeded those among animal groups. UD between human and bovine sequences was the largest (0.42), followed by human-cat (0.38), human-pig (0.38) and human-dog (0.35). In contrast, comparisons among the four animal subgroups yielded smaller UD ranging from 0.08 for cat-dog and pig-dog, to 0.22 for cat-calf. Although none of the human-animal UD approaches the maximum value of 1 which would indicate complete separation, the higher distance in human-animal comparisons is apparent (Fig. 3A). A similar picture emerged from a Principal Coordinate Analysis (PCoA), which shows a pig/dog/cat cluster which is well separated from the bovine group and even more distant from the human group (Fig. 3B).
When all host species were tested together, both Unifrac and P test revealed a significant clustering by host (p<0.01), regardless of whether all 135 sequences were included, or only human and livestock (n=113), or human and livestock minus pigs (n=99) were analyzed. In addition we also tested whether removal of four E. bieneusi ITS sequences from dogs (AB359947, AB359946, EU650273, DQ885585) would affect the results. Because these sequences are divergent from the remaining sequences (Fig. 2), we wanted to ensure that Unifrac significance was not a result of the long single-species branch formed by these four sequences. The results of the analysis of the remaining 109 sequences (113−4=109) again showed significant clustering by host (p=0.01). PCoA with n=109 generated a clustering similar to that observed when 113 sequences were included (not shown). The choice of outgroup (see Materials and Methods) did not affect these results, nor did the algorithm used to draw the phylogenetic tree. Using a Fitch-Margoliash tree (as opposed to Neighbor-Joining) led to the same conclusion. UD values and P-test thus are indicative of partial host-specificity among E. bieneusi ITS sequences, particularly with respect to sequences originating from humans and animals.
To verify the results obtained with Unifrac, AMOVA was used to test the model of four (human, bovine, feline, canine) host-associated E. bieneusi ITS groups. Consistent with the above analyses, variation among groups was highly significant (p≈0), regardless of whether the four divergent sequences from dogs were included or omitted (Table 1). In the AMOVA of 113 human and livestock sequences, variation among groups accounted for 23.8% of total variation. As expected based on their significant sequence divergence, removal of the four sequences from dogs lowered the percentage of variation among populations, which now accounted for 16.7% of the total.
Our understanding of the epidemiology of human microsporidiosis is still superficial and will evolve as more samples, particularly sequences originating from currently under-sampled host species and geographical locations, are analyzed. The present study represents a preliminary analysis of global E. bieneusi ITS diversity in different host species based on statistical methods to test for host-specific segregation. The main limitations of these analyses are that only one locus is available and that some host species, which may potentially be important for understanding transmission, are missing or are under-sampled. The lack of geographical diversity, particularly for the animal sequences, was already mentioned above in the context of the rarefaction analysis.
The diversity of E. bieneusi is expected to increase as additional loci are included, particularly if recombinant genotypes are found. If the life cycle of E. bieneusi includes a sexual phase, the ITS genotypes by themselves would not be sufficient as isolates carrying identical ITS alleles may differ at other loci. In the absence of genetic recombination the global E. bieneusi population could be viewed as a collection of clonally reproducing genotypes which can be adequately characterized with a single marker. However, evidence of meiotic recombination in other microsporidia, such as Kneallhazia solenopsae, a parasite of fire ants (Sokolova and Fuxa, 2008), Andreanna caspii, a mosquito parasite (Simakova et al., 2008), and Vairimorpha, parasites of Lepidoptera (Jouvenaz and Ellis, 1986), indicates that sexual recombination in E. bieneusi should be investigated. The fact that sex is widespread among fungi supports this argument. Examining the possibility of genetic recombination will require the identification of unlinked polymorphic markers and the development of multilocus genotyping tools. Multilocus genotypes would reveal the extent of linkage disequilibrium in E. bieneusi populations, and enable other population biology analyses. On the other hand, a purely asexual mode of reproduction would imply that genetic diversity can be captured by typing a single locus. Moving to a multilocus genotyping method would be particularly productive if all participating laboratories were to adopt the same markers. The recent evidence of a unified approach to E. bieneusi ITS nomenclature (Santin and Fayer, 2009) indicates that a coordinated effort may also be possible for expanding from the current single-locus method to a multi-locus system. Country rarefaction curves indicate that sampling efforts aiming at diversifying the geography of animal collection should be prioritized.
It is unknown to what extent the ITS sequence is under selection and whether any of the polymorphisms included in the analyses are the result of positive selection. In addition, since it is unknown if E. bieneusi genomes recombine, speculating on the effect that such polymorphisms may have on the parasite’s phenotype would be premature.
The detection of significant clustering of ITS in phylogenetic tests and AMOVA, and the difference in taxonomic richness between humans and animal hosts indicate that ITS alleles are not distributed randomly among host species. Consistent with previously published ITS phylogenies (Drosten et al., 2005), UD values smaller than 1 indicate partial segregation by host. This conclusion was also inferred by others on the basis of sequence comparisons (Cama et al., 2007; Dengjel et al., 2001; Leelayoova et al., 2008; Mathis et al., 2005; Santin et al., 2005; Sulaiman et al., 2004). Recently, ten Hove and co-workers found that in a group of 57 isolates from the Netherlands and Malawi ITS sequences isolated from pigs and cattle formed separate clusters, and that E. bieneusi from immunosuppressed individuals shared the ITS genotype C (ten Hove et al., 2009). Together with the present analysis of a global ITS collection, these studies are consistent with partial segregation of ITS genotypes according to host and with the occurrence of zoonotic transmission. If future multilocus genotypes confirm this model, it is conceivable that E. bieneusi may comprise genotypes with different host ranges, some infecting only certain species and others circulating among a wider range of host species, including humans. Improved genotyping methods, in particular multilocus genotyping protocols, and the expansion of genotype information to other hosts and countries will be needed to further dissect the epidemiology of human microsporidiosis.
Financial support from the NIAID (grants R01AI052781 and R21AI064118) is gratefully acknowledged. Our thanks to Zijin Wu for assistance with data analysis and Alejandro Greenberg for critical comments on the manuscript.