A number of chemical, microbial, and eukaryotic indicators have been proposed for identifying the sources of fecal pollution in water bodies. None of the indicators tested to date can determine the source of fecal pollution in water on its own. However, the combined use of different indicators has proven to be the best way to define predictive models for determining fecal pollution sources. Molecular methods are promising tools that could complement standard microbiological water analysis. In this study, the feasibility of several proposed molecular indicators for microbial source tracking (MST) was compared (marker names are in parentheses): host-specific Bacteroidetes (HF134, HF183, CF128, and CF193), Bifidobacterium adolescentis (ADO), Bifidobacterium dentium (DEN), the esp gene of Enterococcus faecium, and host-specific mitochondrial DNA associated with humans, cattle, and pigs (Humito, Bomito, and Pomito, respectively). None of the individual molecular markers tested enabled 100% source identification; they should be combined with other markers to raise sensitivity and specificity and to increase the number of sources identified. MST predictive models using only these molecular markers were developed and evaluated by considering the lowest number of molecular indicators needed to obtain the highest rate of fecal source identification. The combined use of three molecular markers (ADO, Bomito, and Pomito) enabled correct identification of 75.7% of the samples, with differentiation between human, swine, bovine, and poultry sources. Discrimination between human and nonhuman fecal pollution was possible using two markers, ADO and Pomito (84.6% correct identification). The percentage of correct identification increased with the number of markers analyzed.
The best predictive model for distinguishing human from nonhuman fecal sources was based on 5 molecular markers (HF134, ADO, DEN, Bomito, and Pomito) and provided 90.1% correct classification.
Fecal pollution represents a serious public health problem. Pathogens from infected animals and humans can be introduced into the environment through feces and cause health risks, environmental degradation, and economic losses. In recent years, the environmental and sanitary regulations of water authorities have focused on the total fecal load that a water body can hold and on determining the source of fecal pollution. Accurate and reliable methods for detecting fecal pollution are needed to reduce its occurrence, prevent future spills, decrease economic losses, and support legal action.
Total coliforms, fecal coliforms, enterococci, and Escherichia coli have traditionally been used as microbial fecal indicators in water. These microorganisms are easy to enumerate by cultivation methods. However, they do not identify the source of fecal pollution.
Fecal pollution of surface waters comes from point or diffuse sources, including municipal sewage, slaughterhouse wastewater, manure and biosolid disposal, wildlife, and undetermined runoff. Reliable microbial source tracking (MST) methods can determine fecal sources efficiently and rapidly and facilitate cost-effective remediation. In recent years, various library-dependent and library-independent MST methods have been developed that analyze phenotypic and/or genomic characteristics (39, 54, 55). Library-dependent methods (LDM) require a comprehensive library of isolates from known sources; isolates from unknown sources are classified by comparison with those in the library (57). LDM include antibiotic resistance analysis, carbon source utilization, repetitive PCR, and ribotyping. However, the geographic and temporal stability of these LDM, and the numerical methods they use, have been questioned (26, 44). Some cultivation-based methods have also been described, such as the detection of specific enterotoxins of E. coli strains (30, 31, 43), the differentiation and enumeration of sorbitol-fermenting bifidobacteria (10, 37), and the enumeration of phages that infect host-specific Bacteroides spp. (8, 45). Cultivation methods detect only viable bacteria and may give a biased picture of the populations, misrepresenting bacterial diversity (60). PCR-based methods allow the detection of organisms that are difficult to grow, including anaerobes of the genera Bacteroides and Bifidobacterium (5, 9, 14, 63), Rhodococcus coprophilus (51), methanogenic archaea (59), and viruses (27). More detailed information on MST methods can be found in several technical reviews (8, 17, 54, 55, 57).
Bifidobacterium and Bacteroides have been proposed as possible source-tracking indicators for waterborne fecal pathogens (3, 18, 33, 37, 41, 48). Several Bifidobacterium species, such as Bifidobacterium adolescentis, Bifidobacterium dentium, and Bifidobacterium longum, are thought to be human host specific (58), whereas others have been linked to certain domestic animals (20, 61). A multiplex PCR has been developed to detect human fecal pollution in water by analyzing the presence of B. dentium and B. adolescentis (9). Bacteroidetes markers are mainly based on the design of host-specific oligonucleotides (for example, to detect human, ruminant, and swine pollution) that target uncultured populations (5, 15, 32, 46, 47). Geographical differences in host specificity have been observed when these markers are applied in different regions of the world (1, 2, 11, 21, 22, 24, 40, 42). Detection of the esp gene, which encodes an enterococcal surface protein, has also been proposed as an indicator of human fecal pollution (53). This gene has been associated with virulence, colonization, and biofilm formation in Enterococcus faecium and Enterococcus faecalis (25). However, recent studies have indicated that detection of esp may not always be related to human fecal pollution (12, 35). Other MST indicators are based on eukaryotic molecular markers. Martellini et al. (38) designed a PCR protocol that targets eukaryotic genetic markers as a fecal source tracking method for differentiating human, porcine, bovine, and ovine fecal pollution. This protocol consists of nested PCRs that amplify mitochondrial DNA from the host cells. Multiplex and real-time PCR methods for mitochondrial MST indicators have also been developed (4, 13, 52).
It has been shown that no single microbial or chemical MST indicator can determine the source of fecal pollution; therefore, a selection of indicators is required (7, 8, 22, 24). Predictive models that combine indicators have been developed to distinguish between human and nonhuman pollution, and these models have reached a 100% success rate (7, 24, 56). However, they are mostly based on culture-dependent methods, and discrimination among different animal sources has yet to be achieved. In this study, microbial and eukaryotic molecular markers were compared for use as MST indicators, and potential combinations were evaluated. Finally, MST predictive models using only these molecular markers were developed using several established statistical methods. The models were evaluated by considering the lowest number of molecular indicators needed to obtain the highest rate of discrimination among fecal sources.
A total of 144 samples of wastewater, feces, and slurry were collected from human and animal sources. Human sewage samples were obtained from 9 urban wastewater treatment plants (50 samples). Animal samples were obtained from the following sources: wastewater effluents from two poultry slaughterhouses (31 samples); swine feces and slurry from 4 slaughterhouses and a farm (38 samples); and ruminant samples from 6 slaughterhouses and two bovine farms (25 samples). All samples were kept at 4°C and analyzed within 6 h. Most of the samples (132) were taken within a radius of 200 km of our laboratory. Twelve frozen samples from European countries, obtained through an international research project (EVK1-CT1 2000-00080), were also analyzed: 3 samples from France, 2 from Sweden, 2 from the United Kingdom, and 5 from Cyprus.
To minimize the carryover of PCR inhibitors, genomic DNA was extracted from 200 μl of each sample using a QIAamp DNA blood minikit (Qiagen) according to the manufacturer's instructions.
ADO-DEN multiplex PCR was used to detect the human fecal source markers B. adolescentis (ADO) and B. dentium (DEN) (9, 11). A first PCR was performed using the Bifidobacterium-specific, 16S rRNA gene-targeting primers lm26 and lm3 (28). The resulting 1,420-bp amplicon was then used as the template for the ADO-DEN multiplex PCR with species-specific primer sets (9). B. adolescentis DSM 20083T and B. dentium DSM 20084T were used as controls. The strains were grown in reinforced clostridial medium (Oxoid, Hampshire, United Kingdom), and their DNA was extracted using a previously described protocol (29).
Specific PCR primers designed by Bernhard and Field (5) were used to discriminate human and ruminant Bacteroidetes sources of fecal contamination. The forward primers HF134F and HF183F, each paired with the genus-specific reverse primer Bac708R, detect human Bacteroidetes (5), and the forward primers CF128F and CF193F, paired with Bac708R, were used to detect the ruminant Bacteroidetes group. These primers were designed on the basis of markers from uncultured populations identified by terminal restriction fragment length polymorphism (T-RFLP) analysis of fecal Bacteroidetes communities (6). Consequently, no reference strain was available as a positive control; instead, positive samples (a positive urban wastewater sample for the human markers and a positive slurry sample for the ruminant markers) were used. Each 25-μl PCR mixture contained 1× Taq polymerase buffer (Eppendorf, Hamburg, Germany), 0.2 μM each primer, 0.625 U of Taq polymerase (Eppendorf), 1.5 mM MgCl2, 200 μM each deoxynucleoside triphosphate, and 1 μl of extracted DNA. Amplification was performed in a Perkin-Elmer thermocycler with an initial denaturation at 94°C for 2 min; 30 cycles of 94°C for 1 min, a marker-specific annealing temperature for 1 min (62°C for the ruminant marker and 63°C for the human marker), and 72°C for 1.5 min; and a final 7-min extension at 72°C (K. Field, personal communication).
The method for detecting the esp gene requires a prior cultivation step to enrich the sample and increase the probability of detecting the gene (53). Samples were first filtered through 0.45-μm-pore-size membrane filters (Ez-Pak Membrane; Millipore). The filters were then placed on m-Enterococcus (mE) agar and incubated for 48 h at 37°C. Filters bearing presumptive enterococcal colonies were suspended in tryptic soy broth (Conda, Madrid, Spain) and incubated for 3 h at 41°C. DNA was then extracted using a QIAamp DNA blood minikit (Qiagen) according to the manufacturer's instructions. Thirty-five samples were analyzed following this published procedure for detecting the esp gene (53), which includes the pre-enrichment step, and 77 samples were analyzed directly by PCR without pre-enrichment. The gene was amplified using a forward primer specific for the esp gene, which has been suggested to be specifically associated with human E. faecium strains (53). The reverse primer used in this study was described by Hammerum and Jensen (25) for nonspecific detection of esp. E. faecium strain C68, kindly provided by L. B. Rice (Louis Stokes Cleveland VA Medical Center, Cleveland, OH), was used as a positive control.
Three nested-PCR assays were performed on each sample to detect mitochondrial host-specific DNA associated with humans, cattle, and pigs (designated Humito, Bomito, and Pomito, respectively). Primers and the conditions used were the same as those described previously (38).
Sensitivity (r) and specificity (s) were calculated as r = TP/(TP + FN) and s = TN/(TN + FP), where TP (true positives) is the number of samples from the marker's target host that were positive for the marker, FN (false negatives) is the number of target-host samples that were negative, TN (true negatives) is the number of samples from other hosts that were negative, and FP (false positives) is the number of samples from other hosts that were positive.
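To make the formulas concrete, a minimal Python sketch computes r and s from the esp counts given in the Results (10 of 13 human samples positive; 6 of 10 pig and 1 of 5 cow samples positive among the 35 samples analyzed with enrichment). Treating the remaining 22 of those 35 samples as the nonhuman set is an inference from these counts, not a figure stated in the text, and the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """r = TP / (TP + FN): fraction of target-host samples positive for the marker."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """s = TN / (TN + FP): fraction of nontarget samples negative for the marker."""
    return tn / (tn + fp)


# esp marker with enrichment: 10 of 13 human samples were positive.
tp, fn = 10, 3
# Positives among nonhuman samples: 6 pig + 1 cow.
fp = 6 + 1
# Assumption: the other 35 - 13 = 22 enriched samples were all nonhuman.
tn = (35 - 13) - fp

print(f"r = {sensitivity(tp, fn):.1%}")  # r = 76.9%
print(f"s = {specificity(tn, fp):.1%}")  # s = 68.2%
```

The printed values round to the 77% sensitivity and 68% specificity reported for esp, which is consistent with that reading of the counts.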
Bayes' formula was used to calculate the conditional probability that a sample positive for a marker originates from the marker's target host. This conditional probability of correct classification is the probability that a detection of a PCR marker is a true positive (32). For example, the probability that a sample comes from a human fecal source (H), given a positive result for a human marker (T), is

$$P(H \mid T) = \frac{P(T \mid H)\,P(H)}{P(T \mid H)\,P(H) + P(T \mid H')\,P(H')} = \frac{P(H)\,r}{P(H)\,r + [1 - P(H)](1 - s)}$$

where P(T|H) = r is the probability of a positive signal in a human sample (for example, a positive result for the B. adolescentis marker in sewage), P(T|H′) = 1 − s is the probability of a positive signal with that marker in an animal wastewater sample, P(H) is the prior probability that a sample is of human origin, and P(H′) = 1 − P(H) is the prior probability that it is of nonhuman origin.
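The formula can be evaluated directly. The following minimal Python sketch uses the ADO sensitivity (95.6%) and specificity (74.3%) reported in the Results; the prior P(H) = 0.5 is an arbitrary illustrative assumption, not the value used in the study, so the output is not meant to reproduce Table 2. The function name is hypothetical.

```python
def prob_source_given_positive(prior: float, r: float, s: float) -> float:
    """Bayes' rule: P(H|T) = P(H) r / (P(H) r + (1 - P(H)) (1 - s))."""
    true_pos = prior * r                  # P(T|H) P(H)
    false_pos = (1.0 - prior) * (1.0 - s)  # P(T|H') P(H')
    return true_pos / (true_pos + false_pos)


# ADO sensitivity and specificity from the Results; priors are illustrative only.
for prior in (0.1, 0.5, 0.9):
    p = prob_source_given_positive(prior, r=0.956, s=0.743)
    print(f"P(H) = {prior:.1f} -> P(human | ADO+) = {p:.3f}")
```

The loop shows why the prior matters: with the same r and s, a positive ADO result is far less conclusive when human sources are rare among the samples tested.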
Several classifiers were used to develop predictive models for MST, chosen using the criterion of simplicity (16). Linear and quadratic discriminant analysis (LDA and QDA) are widely used parametric methods that model the class distributions as multivariate Gaussians and use maximum-likelihood estimates in place of the population parameters (means, covariances, and priors for every class). LDA assumes that all classes share the same covariance matrix. QDA does not need this assumption; however, many more parameters must be estimated from the data available for each class, so the estimates are less reliable. An observation (a water sample in MST studies) is assigned to the class whose center (mean) is closest in squared Mahalanobis distance. These methods are attractive because they need no parameter tuning and their limited complexity (quadratic at most) is a solid guard against overfitting the data. One highly practical simplification is the so-called naïve Bayes classifier, which assumes that the variables (the selected indicators in MST studies) are class-conditionally independent (an assumption that is not as rigid as assuming independent variables). Despite its simplicity, the naïve Bayes classifier has been shown to perform comparably to more sophisticated methods, such as artificial neural networks, in some domains.
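Because every marker is recorded as present or absent, a naïve Bayes classifier reduces to multiplying per-marker probabilities. The sketch below is a minimal Bernoulli naïve Bayes in plain Python under stated assumptions: Laplace smoothing with alpha = 1, markers not analyzed (None) skipped in both training counts and prediction, and hypothetical function names and toy profiles. The study used Weka's implementation, whose details may differ.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Optional, Tuple

# marker name -> 1 (positive), 0 (negative), or None (not analyzed)
Profile = Dict[str, Optional[int]]
Model = Tuple[Dict[str, float], Callable[[str, str], float]]


def train_naive_bayes(samples: List[Profile], labels: List[str], alpha: float = 1.0) -> Model:
    """Estimate class priors and smoothed per-class P(marker positive)."""
    class_counts = Counter(labels)
    positives: Dict[str, Counter] = defaultdict(Counter)  # class -> marker -> n positive
    analyzed: Dict[str, Counter] = defaultdict(Counter)   # class -> marker -> n results
    for profile, label in zip(samples, labels):
        for marker, value in profile.items():
            if value is None:  # missing result skipped; the sample's other markers still count
                continue
            analyzed[label][marker] += 1
            positives[label][marker] += value
    priors = {c: k / len(labels) for c, k in class_counts.items()}

    def p_positive(c: str, marker: str) -> float:
        # Laplace smoothing keeps probabilities away from exactly 0 or 1.
        return (positives[c][marker] + alpha) / (analyzed[c][marker] + 2 * alpha)

    return priors, p_positive


def predict_naive_bayes(model: Model, profile: Profile) -> str:
    """Return the class with the highest log posterior, skipping missing markers."""
    priors, p_positive = model
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for marker, value in profile.items():
            if value is None:
                continue
            p = p_positive(c, marker)
            # Class-conditional independence: per-marker log-likelihoods simply add.
            score += math.log(p if value == 1 else 1.0 - p)
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical training profiles for two markers (not data from the study).
train = [
    {"ADO": 1, "Pomito": 0}, {"ADO": 1, "Pomito": None}, {"ADO": 1, "Pomito": 0},
    {"ADO": 0, "Pomito": 1}, {"ADO": 1, "Pomito": 1}, {"ADO": 0, "Pomito": 1},
]
labels = ["H", "H", "H", "NH", "NH", "NH"]
model = train_naive_bayes(train, labels)
print(predict_naive_bayes(model, {"ADO": 1, "Pomito": None}))  # H
```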
The k-nearest-neighbors technique (k-NN) is a very intuitive nonparametric technique that classifies new observations based on their distance to observations in the training set. Given an unlabeled observation x, k-NN finds the k labeled observations closest to x and assigns x to the majority class within this set. The method is analytically tractable and straightforward to implement. On the negative side, it has large storage requirements, because the whole training set must be kept, and computationally intensive recall, because every new observation must be compared with all stored observations.
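For presence/absence marker profiles, a natural distance is the number of markers on which two samples disagree (the Hamming distance); for 0/1 vectors this equals the squared Euclidean distance, so both rank neighbors identically. The distance choice, the fixed marker order, the function names, and the toy profiles below are assumptions for illustration, not details given in the text.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A profile lists marker results in a fixed order, e.g. (ADO, DEN, Pomito): 1 = +, 0 = -.
Labeled = Tuple[Sequence[int], str]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of markers on which two presence/absence profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train: List[Labeled], x: Sequence[int], k: int = 3) -> str:
    """Assign x to the majority class among its k nearest stored profiles."""
    # Every prediction scans the full stored training set ("intensive recall").
    neighbors = sorted(train, key=lambda item: hamming(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    # Ties go to the class encountered first among the sorted neighbors.
    return votes.most_common(1)[0][0]


# Hypothetical (ADO, DEN, Pomito) profiles, not data from the study.
train = [((1, 1, 0), "H"), ((1, 0, 0), "H"), ((1, 1, 0), "H"),
         ((0, 0, 1), "NH"), ((1, 0, 1), "NH"), ((0, 0, 1), "NH")]
print(knn_predict(train, (1, 0, 1)))
```

An odd k avoids most voting ties in a two-class (H-NH) problem.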
In C4.5 decision trees (DTs), the internal nodes are questions about the possible values of a variable, and the leaves are decision nodes labeled with a class. Trees are grown in an iterative top-down process: at every step, the remaining set of observations is split according to the variable that most reduces the uncertainty (e.g., as measured by entropy) of this set with respect to the classes. Once the tree has been built, test observations are classified through a sequence of questions, one per internal node. The main benefit of DTs is interpretability, since the decision for a test observation can be expressed as the conjunction of decisions taken along the path from the root to the corresponding leaf. The main drawback is that DTs are quite unstable learners: adding or removing a small set of observations may result in an entirely different tree.
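The split-selection step can be illustrated in a few lines. C4.5 ranks candidate variables by gain ratio, that is, the entropy reduction (information gain) divided by the entropy of the split itself. The sketch below shows only this criterion, not tree growing, pruning, or C4.5's missing-value handling; the function names and toy rows are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List

Row = Dict[str, Hashable]


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of the class labels in a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def partition(rows: List[Row], labels: List[str], var: str) -> Dict[Hashable, List[str]]:
    """Group the class labels by the value each row takes for `var`."""
    groups: Dict[Hashable, List[str]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[var], []).append(label)
    return groups


def information_gain(rows: List[Row], labels: List[str], var: str) -> float:
    """Reduction in entropy obtained by splitting the node on `var`."""
    n = len(labels)
    groups = partition(rows, labels, var).values()
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)


def gain_ratio(rows: List[Row], labels: List[str], var: str) -> float:
    """C4.5 criterion: information gain divided by the entropy of the split."""
    n = len(labels)
    sizes = [len(g) for g in partition(rows, labels, var).values()]
    split_info = -sum(s / n * math.log2(s / n) for s in sizes)
    return information_gain(rows, labels, var) / split_info if split_info > 0 else 0.0


# Hypothetical node with four samples and two candidate markers.
rows = [{"ADO": 1, "Pomito": 0}, {"ADO": 1, "Pomito": 0},
        {"ADO": 1, "Pomito": 1}, {"ADO": 0, "Pomito": 1}]
labels = ["H", "H", "NH", "NH"]
best = max(["ADO", "Pomito"], key=lambda v: gain_ratio(rows, labels, v))
print(best)  # Pomito
```

Here Pomito separates the two classes perfectly, so it removes all uncertainty and is chosen for the root question.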
Not all markers were analyzed for all 144 samples in the study, and reliable imputation of the missing results was difficult. The C4.5 decision tree handles this situation with built-in procedures that average over the possible values (decision tree a [DTa]). We were also interested in assessing trees developed by treating a missing result as a further possible value (decision tree b [DTb]). The naïve Bayes classifier ignored the missing values in the probability counts while still using the remaining markers of the affected sample. The other methods discarded the sample.
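Two of these strategies are simple data-preparation steps. A minimal sketch, assuming results are stored as 1/0/None per marker: the DTb-style encoding turns "not analyzed" into a third category, and complete-case deletion (what the text says the remaining methods did) drops any sample lacking a selected marker. The helper names and the "NA" label are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# marker name -> 1 (positive), 0 (negative), or None (not analyzed)
Profile = Dict[str, Optional[int]]


def as_extra_category(profile: Profile) -> Dict[str, str]:
    """DTb-style encoding: 'not analyzed' becomes a third value next to + and -."""
    return {m: "NA" if v is None else ("+" if v == 1 else "-") for m, v in profile.items()}


def complete_cases(
    samples: List[Profile], labels: List[str], markers: List[str]
) -> Tuple[List[Profile], List[str]]:
    """Drop every sample lacking a result for any selected marker (absent keys count as missing)."""
    kept = [(s, y) for s, y in zip(samples, labels)
            if all(s.get(m) is not None for m in markers)]
    return [s for s, _ in kept], [y for _, y in kept]


# Hypothetical samples: the second one was never tested for Pomito.
samples = [{"ADO": 1, "Pomito": 0}, {"ADO": 1, "Pomito": None}, {"ADO": 0, "Pomito": 1}]
labels = ["H", "H", "NH"]
print(as_extra_category(samples[1]))                                # {'ADO': '+', 'Pomito': 'NA'}
print(len(complete_cases(samples, labels, ["ADO", "Pomito"])[0]))  # 2
print(len(complete_cases(samples, labels, ["ADO"])[0]))            # 3
```

Complete-case deletion never discards more samples when fewer markers are selected, which is one practical argument for models that use only a few markers.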
In all cases, the classifiers were evaluated by leave-one-out cross-validation. The analyses were performed using the software packages Minitab 15 and Weka 3.6 (http://www.cs.waikato.ac.nz/ml/weka/) (19, 62).
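Leave-one-out cross-validation trains one model per sample, each time holding that sample out and testing on it, so every sample is predicted exactly once by a model that never saw it. The generic sketch below assumes any classifier exposed as a fit-and-predict function; the 1-nearest-neighbor stand-in, the function names, and the toy profiles are hypothetical, and the study itself used Weka and Minitab.

```python
from typing import Callable, List, Sequence, Tuple

Profile = Tuple[int, ...]
FitPredict = Callable[[List[Profile], List[str], Profile], str]


def leave_one_out_accuracy(samples: Sequence[Profile], labels: Sequence[str],
                           fit_predict: FitPredict) -> float:
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = [s for j, s in enumerate(samples) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def one_nearest_neighbor(train_x: List[Profile], train_y: List[str], x: Profile) -> str:
    """Stand-in classifier: label of the closest profile by Hamming distance."""
    dists = [sum(a != b for a, b in zip(t, x)) for t in train_x]
    return train_y[dists.index(min(dists))]  # ties go to the earliest training sample


# Hypothetical (ADO, Pomito) profiles; the last sample matches neither group exactly.
samples = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1), (0, 1), (1, 1)]
labels = ["H", "H", "H", "NH", "NH", "NH", "NH"]
print(leave_one_out_accuracy(samples, labels, one_nearest_neighbor))
```

The returned fraction corresponds to the percentage of correct classification used to compare models.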
The sensitivity and specificity of the markers were tested against DNA extracted from host-specific samples. Comparisons between the positive samples for each marker are shown in Table 1. The values obtained for the sensitivity, specificity, and conditional probability of each marker are shown in Table 2.
The ADO marker was detected in 95.6% of the 45 human samples analyzed. The DEN marker had lower sensitivity, detecting 64.4% of the 45 human samples, but higher specificity (91.8%) than the ADO marker (74.3%). The percentages of correct human identification were very similar: 80% for the ADO marker and 77% for the DEN marker.
Only 12 of the 40 human samples were amplified with the human Bacteroidetes marker HF134, and only 20 with HF183, corresponding to sensitivities of 30% and 50%, respectively. The ruminant Bacteroidetes marker CF128 was detected in 26% of the 19 ruminant samples analyzed, and no ruminant samples were amplified with the CF193 marker. Despite the lower sensitivity of the Bacteroidetes markers, their specificity was high: 81% and 71% for the human markers HF134 and HF183, respectively, and 100% for the ruminant marker CF128. The percentage of correct human identification was 32% for HF134 and 50% for HF183. CF128 was detected exclusively in ruminant samples, giving 100% correct ruminant identification despite its low sensitivity.
Detection of the esp gene of E. faecium strains from human sources was performed in two ways: directly, under the same conditions as the other molecular markers (without the enrichment culture step), and following the previously described protocol (53). Without the initial enrichment step, the analysis had very low sensitivity (4.4%) and a low percentage of correct classification (5.9%). Applying the published protocol, which requires a 48-h culture of enterococci followed by 3 h of enrichment, raised the sensitivity to 77%: 10 of the 13 human samples analyzed by this method were amplified. With this procedure, however, the marker was also detected in 6 of the 10 pig samples and 1 of the 5 cow samples tested, resulting in a low specificity of 68%.
The three mitochondrial DNA markers for detecting human (Humito), bovine (Bomito), and porcine (Pomito) pollution exhibited high sensitivities: 84.4%, 84.2%, and 87.9%, respectively. The percentages of correct classification were 76%, 69%, and 81%, respectively. Specificity was also high for the ruminant (87%) and swine (90.1%) markers but much lower for the human mitochondrial DNA marker (41.1%).
The CF193 marker and the esp gene (with and without the enrichment step) were not used to develop MST predictive models, because of their low sensitivity and low percentages of correct classification. Several models were developed to distinguish between the four possible fecal pollution sources (labeled HPBP: human, porcine, bovine, poultry) (Table 3). Models for discriminating human from nonhuman sources (labeled H-NH) are also shown in Table 3. The HPBP model with the highest correct classification was based on seven markers (HF134, HF183, ADO, DEN, Humito, Bomito, and Pomito); this model used LDA and correctly classified the source of 79.5% of the samples analyzed. Other methods used fewer markers to discriminate the source of fecal pollution. A DTa model correctly classified 71.3% of the samples using four markers, ADO, HF183, Bomito, and Pomito (Fig. 1, top left), and a DTb needed only three markers (ADO, Bomito, and Pomito) to achieve 75.7% accuracy (Fig. 1, top right). Higher percentages of correct classification were obtained with the models developed to discriminate between human and nonhuman pollution. The best H-NH model used 5 markers and QDA, giving 90.1% correct classification. Models using fewer markers were also developed: a DTa needed only 3 markers (ADO, HF183, and Pomito) to classify 87.5% of the samples correctly (Fig. 1, bottom left), and another DTa model correctly classified 89.7% of the samples using four markers, ADO, DEN, HF134, and Humito (not shown). Interestingly, as with the HPBP models, a DTb needed only two markers (ADO and Pomito) to achieve an accuracy as high as 84.6% (Fig. 1, bottom right).
B. adolescentis has been reported as one of the most abundant Bifidobacterium species in the human microbiota. B. dentium is less abundant but is also part of the normal human microbiota (50, 58). Therefore, both species have been proposed as possible markers of human fecal pollution in the environment (9, 36, 41). Detection of both species by ADO-DEN multiplex PCR was used in a European project involving five countries and a total of 116 samples, in which the sensitivity and specificity were 93.7% and 74.3% for ADO and 64.4% and 91.8% for DEN (7). Similar values were obtained in the present study (Table 2), in which ADO had the highest sensitivity and DEN the highest specificity. Both markers correctly classified a high percentage of samples; however, they should be combined with other markers to increase the capacity for correct classification and the number of sources identified.
Bacteroidetes markers have been designed from a genomic library of uncultured species from human and ruminant feces (6). These markers have been used in many locations around the world, with varying ability to identify the source of fecal pollution (1, 11, 17, 21, 22, 24, 42). The geographical variability of HF183 and CF128 was tested in European countries in the framework of an international research project (21). The specificity and sensitivity of the human marker (HF183) were high, 80 to 100%, in the four countries tested (the United Kingdom, Ireland, France, and Portugal). The sensitivity of the CF128 marker was also high in these four countries, but its specificity was low in Portugal (40.8%). Consequently, markers should be tested in each geographical area in which the method is to be used. Lower sensitivity of the HF183 and CF128 markers was observed in this study (Table 2), although the results are similar to those of an analysis performed in Nebraska (34). Limited geographical stability of these markers and the type of samples analyzed could explain the lower values. The Bacteroidetes markers have mostly been tested in feces, whereas the samples in this study were mainly wastewater from different sources, and the bacterial composition and even the survival of the populations might differ between feces and wastewater. Variations between studies carried out by different authors have also been observed (21, 23). The robustness of PCR-based assays needs to be improved to overcome such shortcomings.
The esp marker is associated with pathogenic human strains of E. faecium, so its prevalence in the environment is expected to be low. The method for detecting the esp gene (53) requires an enrichment step before PCR; without this step, the technique is not sensitive enough to detect the marker. Even with the enrichment step, the esp marker showed a high rate of nonspecific detection: it was found in 60% of the pig and 20% of the cow fecal samples. The nonspecificity of this marker has also been reported in other recent studies (12, 35).
Many procedures for detecting mitochondrial markers have been developed in recent years, including source tracking methods for fecal pollution (13, 38, 52). Large numbers of exfoliated epithelial cells are shed in feces; each cell contains many mitochondria, and each mitochondrion carries many copies of mitochondrial DNA. Human feces can contain around 1 × 10^7 copies of mitochondrial DNA/g (13). The use of nested PCR increases the sensitivity of the technique. However, little is known about the prevalence of this type of marker in the environment. In this study, high sensitivity was obtained for all three markers tested (Table 1). Specificity was high for the bovine and swine mitochondrial markers but not for the human marker.
An ideal MST marker should show no cross-reaction with nontarget hosts. Additionally, MST markers that show high specificity need to be tested with many samples from a large number of geographical areas (49). Many factors could determine the feasibility of MST markers when they are applied in the environment, and the ideal marker has not yet been found. None of the individual molecular markers tested in this study enabled 100% correct classification. When applied individually, the markers ADO, Pomito, Humito, Bomito, and DEN showed the highest sensitivity and, except for Humito, the highest specificity. The use of a combination of traditional and molecular markers has already been proposed for developing MST predictive models (7), and different numerical methods could be applied to define such models using the lowest number of markers. In this study, MST predictive models based exclusively on molecular markers were defined. Although no single molecular marker correctly classified all samples, combining them increased the capacity for source identification. Models were developed that distinguish human from nonhuman fecal pollution or the 4 fecal sources (human, porcine, bovine, and poultry). These models used a moderate or low number of markers and correctly classified a high percentage of samples. Three of the molecular markers studied (ADO, Bomito, and Pomito) enabled correct classification of 75.7% of the samples, with differentiation between human, swine, bovine, and poultry sources. Discrimination between human and nonhuman fecal pollution is possible using two markers, ADO and Pomito, which correctly classify 84.6% of the samples. The percentage of correct classification increases with the number of markers analyzed.
The best predictive model for distinguishing human from nonhuman fecal sources is based on 5 molecular markers (HF134, ADO, DEN, Bomito, and Pomito) and yields 90.1% correct classification. The models presented should be judged in light of their interpretability, the composition and number of markers used, and the percentages of correct classification per group; together they offer alternative approaches suited to different needs. From a practical point of view, decision trees should be the preferred models for the H-NH task, especially if interpretability is the main concern, and they also tend to offer solutions with fewer markers. For the HPBP task, the choice is between the decision tree and the naïve Bayes classifier, which offers slightly better overall performance; the naïve Bayes classifier is better at identifying human and bovine sources, whereas the decision tree is better for poultry and porcine sources.
In conclusion, some molecular markers could be considered potential MST indicators. They could be used as new parameters in combination with culture-dependent MST indicators (host specific or nonspecific) to develop feasible universal predictive models for determining fecal pollution sources in water bodies.
This study was supported by the Spanish Government, research project CGL2007-65980-C02-01.
Published ahead of print on 29 January 2010.