To develop and assess a microsatellite technique to characterize populations of Schistosoma mansoni from humans.
For each of five patients, we calculated the allele count and frequency at 11 loci for several pools of miracidia (50 and 100), and compared these to population values, determined by amplifying microsatellites from 186 to 200 individual miracidia per patient.
We were able to detect up to 94.5% of alleles in pools. Allele count and frequency strongly and significantly correlated between singles and pools; marginally significant differences (P < 0.05) were detected for one patient (pools of 50) for allele frequencies and for two patients (pools of 100) for allele counts. Kato–Katz egg counts and number of alleles per pool did not co-vary, indicating that further direct comparisons of the results from these two techniques are needed.
Allele counts and frequency profiles from pooling provide important information about infection intensity and complexity, beyond that obtained from traditional methods. Although we are not advocating use of pooling to replace individual genotyping studies, it can potentially be useful in certain applications as a rapid and cost effective screening method for studies of S. mansoni population genetics, or as a more informative way to quantify and characterize human worm populations.
Schistosomiasis is one of the world’s neglected diseases (Chitsulo et al. 2000; Engels et al. 2002; Hotez & Ferris 2006; Steinmann et al. 2006) and remains stubbornly entrenched in many developing African countries (van der Werf et al. 2003). It exemplifies an insidious, chronic and debilitating infection with a life cycle that resists control. Its public health impact is probably greater than generally recognized (King & Dangerfield-Cha 2008). As this parasite continues to burden human communities in many countries, our need for a better understanding of schistosomes grows. One area of study that is increasingly being emphasized is the elucidation of broad-scale distribution patterns of worm genetic diversity. Markers such as microsatellites can be used to study the forces influencing host-parasite coevolution and maintenance (for example, see Thiele et al. 2008) and have the potential to inform control strategies.
Population genetic studies of schistosomes in humans, however, are made difficult by the inaccessible location of adult worms in the host. Schistosoma mansoni adults are typically found in the portal veins draining the large intestine (Roberts & Janovy 2004), a location from which sampling is only possible by the highly invasive procedure of perfusion. An alternative to this impractical sampling regime has been to sample the parasite’s progeny, which can be performed in two ways. Progeny can be sampled indirectly by subjecting material collected from patients to laboratory passage through snail and murine hosts (Curtis et al. 2002; Rodrigues et al. 2002; Stohler et al. 2004; Agola et al. 2006; Gower et al. 2007; Steinauer et al. 2008a; Thiele et al. 2008). Adult worms, perfused from the mice, are then used as an indirect reflection of the patient’s original worm population. The drawbacks of this technique include high cost, time delays, intense effort, use of numerous laboratory animals, bottlenecking and/or host-induced selection pressures (Stohler et al. 2004; Sorensen et al. 2006; Gower et al. 2007). The alternative is that progeny can be sampled by genotyping eggs or miracidia directly from a patient (Silva et al. 2006; Sorensen et al. 2006; Gower et al. 2007).
One technical limitation to using miracidia to study genetic population parameters of schistosomes is the number of specimens needed for adequate sampling. It has been suggested that for each population of parasites, as many as 1000 adults may need to be genotyped to represent adequately species with large natural ranges (Jarne & Theron 2001). Ultimately, the individual sampling and testing of many thousands of miracidia would prove cost and time prohibitive, especially considering that broad-scale studies of population structure and diversity would require sampling over large geographic ranges.
Pooling of templates has been suggested to circumvent the cost prohibitive nature of genotyping individuals (Pacek et al. 1993). In this approach, templates (or individuals) are combined into pools prior to DNA amplification and analysis. In a novel study, pooling was tested for S. mansoni miracidia (Silva et al. 2006) using laboratory strains and synthetic pools. Pooling proved to be a reliable way to reconstruct features about the population, such as population allele frequencies from the individual samples contributing to the pool (Silva et al. 2006). This pooling approach has, however, not been tested in samples from endemic areas where much higher diversity and genetic complexity have been found (Morgan et al. 2005; Steinauer et al. 2008a, b).
Using miracidia collected from humans living in a major endemic focus for S. mansoni, Lake Victoria, Kisumu, Kenya, we used 11 microsatellite markers to investigate the efficiency of the pooling approach as an estimate of population genetic parameters and tested the possibilities of using pooling as a way to characterize adult worm infrapopulations. The basis of comparison was allele frequency data derived from individual miracidia from the same patient’s stool sample.
The five participants (referred to as 20, 22, 24, 25 and 26) in this study were adult males and were chosen from a larger population, which was part of a long-term longitudinal study (Karanja et al. 1998, 2002; Mwinzi et al. 2001). Patients are exposed to schistosomes as they work and stand in the shallow waters of the lake to wash cars, trucks and motorcycles. Faecal contamination along these 45 m of shoreline, within downtown Kisumu, is common (Steinauer et al. 2008a). Recruitment of participants started in 1995, as part of a study by the University of Georgia, Centers for Disease Control and Prevention (CDCP) and the Kenya Medical Research Institute (KEMRI). Faecal samples were obtained in February 2007, after they had been used for their original purpose (Karanja et al. 1997, 1998).
Faecal samples were returned to the laboratory within 2–4 h of collection, and while transported were maintained in the dark and at 4 °C. The number of eggs/g was determined using the traditional Kato–Katz thick faecal smear technique (Kato & Miura 1954; Katz et al. 1972). For each patient, two to six independent smears were made on a single day or for up to 6 consecutive days (Table 1).
Eggs were removed from faecal samples by sieving homogenized samples sequentially through 710, 425, 212 and 45 μm sieves using commercial water. Eggs were washed several times and placed into a light-exposed 1-l flask with water. Water at the top of the flask was poured into Petri dishes and miracidia were individually pipetted into sterile 96-well plates, or pooled into sterile 1500 μl plastic tubes using a micropipettor and a counter to ensure accurate collecting. All miracidia from each patient, for pools and singles, were collected from a single faecal sample within a 4 h time interval.
Genomic DNA from singles or pools was extracted using a modified HotShot method (Truett et al. 2000; Steinauer et al. 2008b). Eleven previously published microsatellite loci (Durand et al. 2000; Curtis et al. 2002; Silva et al. 2006) were amplified in two multiplexed PCR reactions, P17 and P22 panels (Table 2), as described previously (Steinauer et al. 2008b). PCR products were genotyped (ABI3130; Applied Biosystems, Foster City, CA), scored with GeneMapper 4.0 (Applied Biosystems) and verified manually. As the frequency of a rare allele in pools could be as low as 0.5%, all peaks regardless of size were scored manually, and the area of each peak was calculated by GeneMapper. Only peaks falling within preset bins and showing normal allele amplification shapes (Birnbaum & Rosenbaum 2002) were included in subsequent analyses.
Allele frequencies were calculated differently for individual miracidium (singles) and pooled miracidia (pools) samples. Allele frequencies for singles were calculated by traditional methods; all genotypes of singles per patient were summed and a percentage was calculated. For pools, the total area of genotyped allele peaks at each locus was summed and individual allele frequencies were then calculated by dividing each peak area by the summed total area. Thus, allele frequencies were based on (1) pools derived from 50 or 100 miracidia and (2) 186–200 single miracidia per patient (Table 1).
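The two frequency calculations described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that each single miracidium contributes two allele copies per locus, and the function names and example values are hypothetical.

```python
from collections import Counter


def single_allele_frequencies(genotypes):
    """Allele frequencies at one locus from individually typed miracidia.

    genotypes: list of (allele_a, allele_b) tuples, one per miracidium.
    Assumption: every miracidium contributes two allele copies.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def pool_allele_frequencies(peak_areas):
    """Allele frequencies at one locus from the peak areas of one pool.

    peak_areas: dict mapping allele (fragment length) to its peak area.
    Each frequency is the peak area divided by the summed area at the locus.
    """
    total = sum(peak_areas.values())
    return {allele: area / total for allele, area in peak_areas.items()}


# Hypothetical data for a single locus (not taken from the study).
singles = [(210, 212), (210, 210), (212, 216), (210, 216)]
pool = {210: 5200.0, 212: 2600.0, 216: 2200.0}

print(single_allele_frequencies(singles))
print(pool_allele_frequencies(pool))
```

Both functions return frequencies that sum to 1 within a locus, so the profiles from singles and pools can be compared allele by allele.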
Allele counts were also calculated differently for singles and pools. For pools, peaks for each locus were summed for each patient. For singles, to make their counts comparable to pools, random resampling was used to calculate the average number of alleles expected for 50 and 100 single miracidia. Resampling calculations were done with a script written in the Perl programming language (O’Reilly Media, Inc., Sebastopol, CA). Thus, allele counts were based on (1) pools derived from 50 or 100 miracidia and (2) resampled populations of 50 or 100 single miracidia.
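The resampling step can be sketched in Python as shown below. The original script was written in Perl and is not reproduced here; this version assumes sampling without replacement and a fixed number of replicates, and the function name, parameters and example genotypes are hypothetical.

```python
import random


def expected_allele_count(genotypes, sample_size, replicates=1000, seed=0):
    """Average number of distinct alleles in random subsamples of singles.

    genotypes: list of (allele_a, allele_b) tuples at one locus.
    sample_size: number of miracidia per subsample (e.g. 50 or 100).
    Assumption: subsamples are drawn without replacement.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(replicates):
        sample = rng.sample(genotypes, sample_size)
        total += len({allele for genotype in sample for allele in genotype})
    return total / replicates


# Hypothetical locus: 100 typed miracidia, one allele carried by only 10 of them.
genotypes = [(210, 212)] * 90 + [(210, 230)] * 10

print(expected_allele_count(genotypes, 50))
print(expected_allele_count(genotypes, 100))
```

Because rare alleles are missed in some subsamples, the expected count for 50 miracidia is at most the count for the full set, which is why the singles data are resampled to the pool size before the two are compared.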
Associations between variables were calculated using Pearson’s product–moment correlation, except in cases of non-parametric data where Spearman’s rank-order correlation was more appropriate. Groups of paired measures (for each individual locus) were tested for differences using paired t-tests (two-tailed), except in cases of non-parametric data where Wilcoxon’s signed-rank tests were used. All statistics were performed with the VassarStats utility (http://faculty.vassar.edu/lowry/VassarStats.html).
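For readers who want to reproduce the parametric tests outside VassarStats, the two statistics can be computed as in the sketch below. It uses only the Python standard library, returns the test statistics without P values, and uses hypothetical per-locus values; it is not the procedure the authors ran.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two paired sequences."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / math.sqrt(var_d / n), n - 1


# Hypothetical per-locus allele counts (singles vs. pools), not study data.
singles_counts = [25.8, 2.6, 11.2, 14.9, 8.1]
pool_counts = [22.0, 3.0, 11.0, 16.0, 7.0]

r = pearson_r(singles_counts, pool_counts)
t, df = paired_t(singles_counts, pool_counts)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, t = {t:.3f}, df = {df}")
```

If SciPy is available, `scipy.stats.pearsonr`, `scipy.stats.spearmanr`, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` provide the same tests together with P values.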
The parent studies from which the current data were obtained were approved by the institutional review boards of the University of Georgia and the CDCP, the Scientific Steering Committee of the KEMRI and the KEMRI/National Ethics Review Board of Kenya.
For each patient, we obtained from 6 to 10 pools of miracidia (50 or 100 individuals per pool) and 186–200 single miracidia. Combining all data, we detected a total of 192 unique alleles in singles and pooled data across the 11 loci. We detected 183 alleles from singles (977 miracidia), 184 alleles from 20 pools of 50 (1000 miracidia) and 185 alleles from 18 pools of 100 (1800 miracidia). Based on allele resampling from the singles data, allelic richness was highest for locus C5, containing 23.1–28.5 (n = 50) and 29.4–37.0 (n = 100) alleles per patient; whereas locus D2 contained the least diversity, with 2.2–3.0 (pools 50) and 2.5–3.5 (pools 100) alleles per patient (Table 3). Allele counts based on resampled singles for all 11 loci ranged from 116.8 to 130.6 (n = 50) and 136.8 to 151.4 (n = 100).
To compare allele counts between pools and singles, two analyses were performed. The first analysis was based on the average rate of allele detection in pools for each patient; pools of 50 performed better than pools of 100 (94.5% and 89.6% respectively; Table 3). More alleles were detected in pools than single samples at several loci (especially D2, S6 and S1). For all patients, there was a significant positive correlation between the allele counts, per locus, of singles and pools of 50 (r² = 0.782, P < 0.01) or pools of 100 (r² = 0.8372, P < 0.01; Figure 1). For individual patients, allele counts per locus in pools differed marginally significantly (P < 0.05) from those in singles for one patient (pools of 50) and two patients (pools of 100; Table 3).
Two allele rich loci performed poorest in percent allele detection (D7 and C5), with rates below 85% and below 80% for pools of 50 and pools of 100 respectively (Table 3). If these loci are excluded from the analysis, the detection rate for pools averaged over all patients increases to 100% and 96.4% for pools of 50 and 100, respectively, and the difference between both pool sizes and singles is no longer statistically significant (t-test; both P > 0.05) for any of the patients.
The second allele count analysis was based on a comparison of the detection rates of each allele between singles and pools, across all patients. For each locus, we calculated the expected allele count based on singles and subtracted the pool allele count (Figure 2). For this analysis, allele counts were rounded to the nearest integer. A value of 0 indicates agreement between singles and pools. For all pools combined, 19.86% of loci were scored the same between pools and singles (pools 50: 15.45%; pools 100: 24.74%). Additionally, pools more often missed alleles detected in singles than detected alleles not seen in singles (Figure 2).
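The per-locus comparison described above amounts to a rounded difference followed by a tally. The sketch below illustrates it with hypothetical values; the function names are not from the original analysis, and Python's built-in `round` resolves exact .5 ties to the nearest even integer, which may differ from the rounding rule the authors used.

```python
def count_differences(expected_singles, pool_counts):
    """Rounded expected allele count from singles minus the pool count, per locus."""
    return [round(e) - p for e, p in zip(expected_singles, pool_counts)]


def summarize(differences):
    """Share of loci where pools agree with, miss, or exceed the singles count."""
    n = len(differences)
    return {
        "agree": sum(d == 0 for d in differences) / n,        # difference of 0
        "pool_missed": sum(d > 0 for d in differences) / n,   # pool found fewer alleles
        "pool_extra": sum(d < 0 for d in differences) / n,    # pool found more alleles
    }


# Hypothetical values for four loci in one pool (not study data).
expected = [25.8, 2.6, 11.2, 14.9]
pool = [22, 3, 11, 16]

diffs = count_differences(expected, pool)
print(diffs)
print(summarize(diffs))
```

Positive differences correspond to alleles seen in singles but missed by the pool, and negative differences to alleles seen only in the pool.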
Pool allele frequencies correlated well with singles (Figure 1). Overall, pools of 50 (r² = 0.880, P < 0.01) were better correlated with singles than pools of 100 (r² = 0.840, P < 0.01). Some loci (C1, D6 and D7) did not correlate as well as others (C5 and D2; Table 4). The degree of correlation of data from pools and singles varied among patients. For example, the correlation for patient 22 was much higher than for patient 20.
Deviation of allele frequency estimates between pools and singles for each of the 192 encountered alleles was calculated by taking the absolute difference of the average for all pools from all patients, and from the calculated frequencies from singles (Figure 3). The mean allele error rates were higher for pools of 100 (mean = 11.1, σ = 11.7) than pools of 50 (mean = 11.6, σ = 14.3). Although the mean error rates were relatively high, much of the variation was due to few alleles; more than half of the total error was confined to 16% of alleles, whereas 42% of alleles contained 10% of the total error (Figure 3). Furthermore, 86% of all alleles varied less than 20% between allele frequencies of pools and singles. Tests of allele frequencies for each locus individually revealed no significant differences between pools and singles for each patient (Table 4). When comparing the allele frequencies between singles and pools of 100, for the 100 most common alleles (as measured by singles), the majority of the error was found in the overestimation of rare alleles.
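The deviation measure and the concentration of error among a few alleles can be illustrated with the sketch below. It assumes the error is the absolute difference between the pooled and singles frequency of each allele, with an allele absent from one data set given a frequency of 0; the function names and frequencies are hypothetical and do not come from the study.

```python
def absolute_errors(pool_freqs, single_freqs):
    """Absolute frequency difference per allele between pools and singles.

    Assumption: an allele missing from one data set has frequency 0 there.
    """
    alleles = set(pool_freqs) | set(single_freqs)
    return {
        allele: abs(pool_freqs.get(allele, 0.0) - single_freqs.get(allele, 0.0))
        for allele in alleles
    }


def share_of_alleles_for_error(errors, fraction=0.5):
    """Smallest share of alleles whose summed error reaches `fraction` of the total."""
    ranked = sorted(errors.values(), reverse=True)
    total = sum(ranked)
    running = 0.0
    for i, error in enumerate(ranked, start=1):
        running += error
        if running >= fraction * total:
            return i / len(ranked)
    return 1.0


# Hypothetical frequencies at one locus (not study data).
pool = {210: 0.50, 212: 0.28, 216: 0.22}
singles = {210: 0.55, 212: 0.20, 216: 0.15, 220: 0.10}

errors = absolute_errors(pool, singles)
print(errors)
print(share_of_alleles_for_error(errors, fraction=0.5))
```

Ranking alleles by error before accumulating is what allows a statement of the form "more than half of the total error was confined to a given share of alleles".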
Kato–Katz values ranged from 83 to 367 eggs/g, with exceptionally high standard deviations 19–183 eggs/g (Table 1). Although small sample size did not allow for statistical comparisons, allele counts from singles or pools did not appear to vary with eggs/g (Figure 4).
The number of alleles per locus from singles and pools was significantly and negatively correlated with the detection rate of alleles by pools (Figure 5). The influence was slightly greater on pools of 50 (r² = 0.189, P < 0.01) than on pools of 100 (r² = 0.170, P < 0.01). Allele length was significantly and negatively correlated with the allele detection rate by pools (pools 100: r² = 0.0475, P < 0.01; pools 50: r² = 0.0523, P < 0.01). Alleles detected in singles but not in pools were significantly longer than alleles correctly scored for both pool sizes (Table 5; P < 0.01), suggesting that most of the alleles not detected by pools are from loci with longer alleles, and that most of the alleles detected by pools but not by singles are in loci with shorter alleles. We also found that there was a significant negative correlation between population allele frequency (based on singles) and detection rate of individual alleles in each pool (pools 100: r² = 0.0936, P < 0.01; pools 50: r² = 0.1386, P < 0.01), suggesting that as alleles become increasingly rare, they are also increasingly difficult to detect in pools.
This is the first study to test the use of pooled miracidia coupled with microsatellite analysis to characterize S. mansoni infrapopulations from patients within endemic areas. Silva et al. (2006) included seven loci containing a total of 18 alleles to analyse a laboratory-maintained S. mansoni strain. In the current study, we detected a total of 192 alleles (11 loci) from five patients; not surprisingly, this population had a much higher diversity (Morgan et al. 2005; Steinauer et al. 2008a, b). These results clearly indicate that studies using pooling in natural S. mansoni populations must consider high allele diversity. Although the samples used for this study were highly diverse, we nonetheless found a high degree of correlation between allele measurements of pooled and singles data (Figure 1).
The advantage of analyzing DNA from pooled miracidia is clear when considering time and cost compared to sampling individuals. We estimate that pooling could reduce cost and effort by one to two orders of magnitude. Currently, genotyping 100 miracidia individually costs about US$250 and takes about 1 week, including the time needed for data analysis and reanalysis of failed sequences. In contrast, two pools of 50 miracidia can be genotyped for US$5 in a matter of hours. The sampling breadth afforded by these savings must be balanced against the technique’s possible decreased sensitivity and information content.
To assess the sensitivity of pooling, several technical issues pertinent to the general approach of microsatellite analysis of pools must be addressed. First, pooling must be able to correctly infer allele numbers. Allele dropout can be a serious problem for population genetic analyses and detection of rare alleles. Despite their small contribution to genetic variation, the presence of such alleles can be of considerable theoretical and practical importance (Kraft & Sall 1999). We found that the pooling approach used in this study detects about 8% fewer alleles than singles, although in some cases we detected alleles in pools that were not seen in singles.
The genetic differences measured in this study between singles and pools could have arisen from several sources. Actual genotypic representation may have differed between groups of singles and pools. As miracidia were randomly sampled from the population derived from a faecal sample, it is possible that genotypes had differential representation in groups of singles and pools. Another source of error in allele counts of singles and pools may be that the two techniques are independent measures of allelic differences, each with its own error rate. Thus, both measures may contain genotyping errors. Genotyping error can result from unequal allelic amplification, a phenomenon caused by differences in allele sizes or polymerase initiation (Daniels et al. 1998), and from stutter peaks, which are small peaks on either side of the real allele peak (Perlin et al. 1995). Both of these errors can be decreased by the use of a training set of templates and the introduction of a mathematical correction factor (Collins et al. 2000; Schnack et al. 2004). Due to our experimental protocol and the small size of miracidia, which are at the current limit of reproducible extractions, we were not able to include such a correction factor. Furthermore, although our data show that allele length negatively correlated with allele detection rates in pools, the correlations were weak (r² = 0.0523 and 0.0475 for pools of 50 and 100, respectively) and it is unlikely that any correction factor could have significantly improved the correlation between singles and pools.
The second technical issue pertinent to pooling approaches is that they must be able to amplify alleles in proportion to their true frequency. Amplification artefacts (see above) can have potentially serious impact on allele frequency. We attempted to minimize these artefacts through modification of amplification cycle, marker choice and optimization (Steinauer et al. 2008b). Our correlations of allele frequency between pools and singles were remarkably high given the high allelic diversity of the populations sampled.
The most serious drawback to pooling assays is the loss of information on individuals. Thus, with the loss of data on heterozygosity, the Hardy–Weinberg equilibrium cannot be determined and more sophisticated population analysis tools (such as those to calculate inbreeding coefficients, linkage disequilibrium and population structure) cannot be used. Although we are not advocating pooling as an alternative to individual worm genotyping in population genetics studies, the approach can still be useful in other contexts with more immediate uses. Pooling could be used as a rapid and cost-effective method to characterize worm populations by producing allelic frequency profiles, from which worm populations from individual patients or geographic areas could potentially be identified and monitored in relation to climate or ecological change, or following treatment of patients with anthelmintics. Such analyses could lead to a better-targeted analysis of individual miracidia, making the investment of large amounts of resources for typing individuals more tractable.
Another possible use of pooling is as an alternative or supplement to traditional worm burden quantification methods. Currently, there are few reliable methods to adequately measure schistosome infections in humans. The most commonly used surrogate diagnostic tool for infection (and intensity) is the Kato–Katz technique (Kato & Miura 1954; Katz et al. 1972), which relies on enumeration of parasite ova in stool samples. However, the sensitivity and meaning of single or even multiple examinations may be unclear due to variation in the distribution of eggs within the stool, in daily egg output and between observers (De Vlas et al. 1997; Engels et al. 1997; Kongs et al. 2001; Utzinger et al. 2001). The Kato–Katz technique relies on 20–50 mg of sieved stool per test. Our pooling technique uses a patient’s entire stool sample, of up to hundreds of milligrams, and thus is likely to be several orders of magnitude more sensitive. The other commonly used surrogate diagnostic test is for parasite-derived antigens in the blood (CAA and CCA), which has been used to detect schistosome infections (Van Lieshout et al. 1995; Mutapi et al. 1997) and determine worm burdens (Barsoum et al. 1990; Agnew et al. 1995). Studying the relationship between worm burdens and eggs/g or amount of circulating antigens in humans has been nearly impossible. Only studies of S. mansoni in non-human primates have revealed a linear relationship between worm burdens and surrogate methods (Wilson et al. 2006). However, each of these surrogates often misses low-intensity infections, prompting the call for more sensitive methods (Teesdale et al. 1985; Wilson et al. 2006).
In addition to providing data on worm burdens, it would be helpful if new diagnostic techniques could also provide clues about the genetic complexity of infections. We believe that the data presented here suggest that microsatellite analysis of pools can provide a simple alternative to characterizing worm infrapopulations. We have found that microsatellite analysis has painted a very different picture of the patient’s worm infrapopulation than traditional methods (Figure 4). The Kato–Katz method indicated that eggs/g varied among the five patients by more than fourfold, whereas genotyping indicated that the parasite allele count is much more similar (Table 3). Thus, though patient 25 had a Kato–Katz count more than four times higher than patient 22, their parasite infrapopulations contained similar allele counts. Although data based on microsatellites showed that allele counts among these patients are similar, additional studies are needed to determine how these allele counts correlate with worm burdens. We believe what we have described here provides a strong incentive for more research to tease out whether pool allele counts could serve as an alternative or adjunct to Kato–Katz counts and whether this technique could be used as a reliable indicator of infection intensity and complexity. Knowing the complexity of adult worm infrapopulations or metapopulations may ultimately help guide and monitor treatment of individual patients or allow for the focus of treatment to specific geographic areas.
Primary funding was provided by NIH grant AI044913. We thank George Rosenberg for writing the Perl script and acknowledge support from the UNM Molecular Biology Facility and from NIH grant 1P20RR18754 (IDeA Program of the National Center for Research Resources) and J.M. Kinuthia, M.W. Mutuku, B.N. Mungai, B. Abudho, and B. Mualuko for field and laboratory assistance. Field support was provided by D.G. Colley, E. Secor, and D.M.S. Karanja (NIH grant R01AI053695). We would also like to thank three reviewers for their helpful comments.