Model community experiment.
The idea of the qfingerprinting approach is to apply a standard fingerprinting strategy to dilution series of the target samples, in order to identify the highest dilution at which an OTU is still PCR amplifiable (Fig. ). The qfingerprinting strategy was applied to a mixture of four plasmids harboring cloned ITS inserts whose individual lengths and concentrations (Fig. , ) were known beforehand in order to test the validity of the concept. Capillary electrophoresis separation of the defined samples produced 24 electropherograms (21 dilution samples and 3 negative controls) for the same plasmid mixture (Fig. ). A clear difference in peak height and a gradual peak loss at dilutions 10−4 to 10−6 were observed as a function of the initial plasmid concentrations.
FIG. 2. Electropherograms obtained for the qARISA approach performed on a model community. The decimal dilution factor is indicated in each panel by a figure (0 for undiluted, −1 for 0.1, −2 for 0.02, etc.) and the letter K (no DNA control). Only (more ...)
FIG. 3. Calculations of abundance values for each OTU present in the model community. After binning, the RFI (in percentages of total sample intensity) for each OTU is indicated when present at the corresponding dilution and replicate levels. Using the qARISA (more ...)
Following the straightforward binning of the data (WS = 2, Sh = 0.1), the binned table (series of diluted replicated samples by OTUs) was then converted to log10 values by following the consensus and the continuity rules (see Materials and Methods). The consensus rule can be illustrated, for instance, by OTUs 524, 525, and 534 (Fig. ). Indeed, the corresponding peaks were considered present only for the latter two OTUs, because a peak was identified in two of three replicates at a given dilution level. An example of the continuity rule may be found with OTU 537 for which peaks satisfying the consensus rule were detected at dilution 100 and at 10−2, but not at 10−1. In that case the peak presence was established only for the undiluted sample at 100 (Fig. ). Using those two rules, OTU presence until the highest dilutions was recorded and transformed into log10 abundance values.
Interestingly, the RFI values did not proportionally change with the dilution factor (Fig. ). At the undiluted level (i.e., standard ARISA conditions), the RFI values ranging from 13 to 31% did not even reflect the fact that the concentration of each target sequence was initially consisting of 10-fold differences. Indeed, RFI values are obtained by standardizing the individual peak areas per sample (see, for example, references 44
). It is thus not surprising that these values do not follow the dilution trend imposed in the study because their magnitude is directly related to the behavior of the other peaks present in the given sample (see also reference 4
). Consequently, the final calculations of abundance values from the dilution series should take the occurrence of the OTUs into account, instead of their RFI values.
On the electropherograms (Fig. ) and in the resulting sample-by-OTU table (Fig. ), several OTUs were associated with small peaks, some of which satisfied both the consensus and the continuity rules (e.g., OTUs 522, 525, and 534). Peaks that did not correspond to one of the target sequences in the mixture were always found at low abundances. This may be due to the fact that because few targets were present in the artificial mixture, the background level of false-positive peaks was higher. Thus, unspecific, but weak amplifications were more likely to be detected. When more sequences are present in the DNA mixture, as with environment samples, the small, inconsistent peaks may be diluted out by the presence of more numerous and highly dominant peaks. In a standard ARISA, however, these small peaks would have been considered as real OTUs, whereas here the difference in magnitude reveals that they may most likely be considered PCR artifacts.
By taking the initial insert concentrations and the respective dilution factors into consideration, the theoretical detection limit of the technique can be estimated. By taking the most abundant OTU (601 bp) at the initial concentration of 0.4 ng per reaction at the undiluted level, and assuming 660 g/mol per bp (thus, 6.59 × 10−19
g per fragment), a total of ~6 × 108
fragment copies of the target sequence were then initially present in the reaction. The highest dilutions where amplifications still occurred were between 10−5
(Fig. and ), indicating that a minimum of 600 to 6,000 target sequences were needed for a successful amplification to occur under our experimental conditions. This is in accordance with the detection thresholds of 103
cells per ml of sample generally reached when molecular community fingerprinting methods are used (6
Overall, the qfingerprinting concept and analysis pipeline worked well on the mixture of known ITS sequences. The difference in the magnitude of the initial target ratios was maintained because the derived abundances obtained via qfingerprinting are not based on the final, biased quantitative estimate of PCR success, but rather on the qualitative outcome of the PCR amplification at different dilution levels. The test on plasmids also illustrates the need to apply advanced numerical methods to identify the best binning frames so as to identify the most likely dilution levels for OTU occurrence. Indeed, misclassification of OTUs between replicates at a given dilution level could lead to a false rejection or acceptance of the consensus or continuity rules by altering the continuity of the series of positive amplifications.
Application to environmental samples.
Samples originating from deep marine sediments forming a natural depth gradient were chosen to illustrate the advantages of the new method. A total of nine successive 1-cm sediment layers were subjected to qARISA, thus producing a total of 189 reactions (nine layers × seven dilution levels × three replicates). A total of 15,361 peaks were retained after capillary electrophoresis, which ranged from 100 to 1,000 bp and whose absolute peak areas were over 50 fluorescence units. No peaks satisfying the aforementioned criteria were observed for the eight negative controls. Due to the large number of peaks, using an automatic binning technique was very important because OTU identification and quantification had to satisfy both the consensus and the continuity rules, which could not be performed manually.
After the scanning mode of the binning algorithm was applied, the correlation coefficients between samples were all 0.46 for window sizes 2, 3, 4, and 5 bp and corresponded to total OTU numbers of 438, 298, 225, and 181, respectively. For window sizes of 0.5 and 1 bp, the correlation coefficients were smaller (r = 0.32 and 0.38, respectively), but the OTU numbers were much higher, with 789 and 815 OTUs identified, respectively. These findings are not surprising since increasing the window size leads to lumping more peaks together, hence to a smaller number of OTUs finally identified. As a compromise between high correlation among samples and high OTU number, the binning results of window size 2 were chosen for subsequent analyses. A total of 437 bins (OTUs) were thus obtained out of the 15,361 initial peaks detected when all replicated samples were considered. After the qfingerprinting function was applied to convert the binned table into a log10 abundance table (i.e., by taking the dilution series into account, as well as the consensus and continuity rules), a total of 332 OTUs were finally obtained for the nine environmental samples. For each OTU, an abundance value that ranged from 0 (absent) to 7 (present at abundance levels 7 orders of magnitude higher than at the undiluted level) was obtained, and this produced on average 173 ± 20 (standard deviation) OTUs per sample.
The variability in the quantitative estimates obtained via qfingerprinting was already taken into consideration at the levels of peak size calling and interexperimental variability by using the appropriate binning method and consensus rules. However, the current implementation does not enable the calculation of confidence intervals for those estimates. Alternatively, the MPN (5
) approach could be used to analyze the data in a probabilistic framework. MPN-PCR and direct dilution calculations provide slightly different estimations of abundances, but they are mostly linearly related to each other (see, for example, references 35
). The direct dilution calculation was chosen here for its ease of implementation when simultaneously dealing with hundreds of OTUs. It should be noted that the accuracy of the quantification depends on the number of replicates and dilution rates, with more accuracy obtained when more replicates are used per dilution level. The average accuracy has been shown to be almost identical for dilution rates between 2 and 10, but coefficients of variation are more stable and slightly lower for twofold dilution assays (5
). The dilution rate may obviously be adapted by taking previous knowledge, specific needs, or available resources into account.
Based on the detection limit calculated in the model community experiment (see above), a specific OTU may be detected if its number of copies is at least in the range 600 to 6,000 copies at the undiluted level, under ideal amplification conditions (i.e., if we assume that DNA from the synthetic community has the same properties as that from environmental communities). This would be indicated by an abundance value of 1 log10
in the result table. Consequently, when values of 7 log10
abundance are obtained (see Fig. ), this corresponds to at least 6 × 108
target copies detected for the specific OTUs. Noticeably, this number is very close to the reported estimates of highest cell number found in those samples, which ranged from 1.0 × 109
to 6.0 × 109
). Despite this interesting correspondence in abundance estimates, it should be noted that the abundance values obtained via qfingerprinting are difficult to convert to cell densities or biomass because such conversion would require knowledge of the copy number per genome and the genome size, and both numbers can be very variable (9
FIG. 5. Two-way clustering of OTU abundances obtained by standard ARISA and by qARISA. A complete linkage algorithm was used to perform a cluster analysis of the Euclidean-based dissimilarity matrices inferred from RFI values (standard ARISA) and log10 values (more ...) Comparison of qARISA and standard ARISA.
The distributions of RFI values and log10 values obtained from standard ARISA and qARISA, respectively, could not be considered as drawn from the same data distribution (Fig. ), as also confirmed by a significant Kolmogorov-Smirnov test (D = 0.8878, P < 0.001; based on 1,560 nonzero values in each case). The latter test takes into account differences in both location and shape of the two empirical cumulative distribution functions. The same conclusions were also obtained if the respective abundance data were first split into frequencies of occurrence per sample and then compared between the standard ARISA and the qARISA approaches (D = 0.77 to 0.98; all P < 0.001).
FIG. 4. Comparison of the distribution of abundance values and of their effects on sample dissimilarities when ARISA versus qARISA is used on a set of environmental samples. (A) Comparison of all non-null values (1,560 values) from the two distributions (standard (more ...)
At the level of sample similarities, the standard ARISA versus qARISA approaches, however, produced significantly linearly related distance matrices (Mantel r = 0.73, P < 0.001 as determined by 1,000 permutations; Fig. ), although a low coefficient of determination (R2 = 0.54) indicated the existence of strong disagreement between the two approaches. As noted from the two abundance distributions, using RFI values from the standard ARISA instead of qARISA-derived values would generally lead to lower dissimilarities among samples, i.e., to concluding that samples are more similar than they really are. The slope coefficient of the linear regression, beta, was estimated to be 0.81 ± 0.129 (standard error), leading to a 95% confidence interval ranging from 0.55 to 1.08 (assuming 34 degrees of freedom for the Student t distribution). This indicates that beta was not significantly different from 1, i.e., that the two methods give related estimates of dissimilarities among samples. Finally, NMDS ordinations based on the two approaches, although displaying substantial deviations from each other (Fig. ), were nevertheless significantly correlated (symmetric Procrustes rotation correlation of 0.88; P < 0.001).
In conclusion, using qARISA versus ARISA led to various levels of discrepancies for data distribution, estimation of sample dissimilarities, and ordination results. The significant correlation between sample ordinations implies that the standard ARISA would still be robust for inferring changes in community structure, despite its inability to accurately estimate OTU abundance. This may originate from the DNA normalization steps common to both procedures, i.e., before the initial PCR, but also before separating DNA fragments by capillary electrophoresis. In addition, it has been shown that normalized ARISA peak area may relate well to quantitative relative abundance, as determined by flow cytometry (3
), and this also explains why the standard ARISA has been successfully used thus far to describe community dynamics over spatial and temporal gradients in a variety of ecosystems (13
). Despite the apparent limited differences between the two approaches, it is not yet known whether the two techniques would (i) still yield comparable sample ordinations when more samples are included in the analyses and (ii) offer the same ecological conclusions when contextual interpretations (i.e., when additional environmental, spatial, or temporal variables are available to explain diversity patterns) of the two sample ordinations are undertaken. Those important points will need particular attention in future qfingerprinting applications.
Validity of the continuity rule.
One of the main assumptions used in the qfingerprinting approach is that, for each OTU, PCR success is monotonically related to the concentration of the OTU target sequence in the sample. It is, however, known that PCR yield may be optimal (i) at high dilution levels of the DNA solution because impurities or PCR inhibitors may be diluted out or (ii) in the middle of a target DNA concentration range because a trade-off between DNA concentration and a low level of contaminants must be found to obtain a successful PCR amplification, thus producing a unimodal distribution of amplification success for a given OTU. The first consideration should not be a major issue if we assume that, in the case of impurities, PCR inhibition would apply to all OTUs at the same time because it is the efficiency of the DNA polymerase that is mostly affected and not a specific OTU amplification. Note that this point could be further debated because it is highly likely that impurities selectively trap DNA sequences based on sequence composition (see, for example, references 45
), but this goes beyond the scope of the present study.
The second hypothesis deserves more attention because it will directly entail a modification of how the algorithm currently assesses OTU presence and calculates the final OTU abundance table. By examining the raw data for each OTU before applying the qfingerprinting calculations, on average 86% ± 10% of the OTUs were present at dilution 100, and for the existing OTUs whose amplification patterns did not start at the undiluted level (i.e., presenting unimodal rather than linear patterns) they consisted more than 61% of the time of OTUs appearing only once at one dilution level, suggesting that they may be considered as unreliable peaks. Thus, it seems that the continuity rule used in OTU quantification represents a valid hypothesis to describe the behavior of OTU amplification across the dilution series. It must be noted that this conclusion, although being valid for this data set, could be inappropriate if more impurities are present (e.g., if different samples or extraction protocols are used) or if particular PCR conditions are more sensitive to suboptimal conditions.
Relationship between OTU distribution patterns and depth.
The environmental samples were coming from nine 1-cm-thick layers from the same core, and it was particularly interesting to look at the OTU distribution patterns and abundance changes across this natural depth gradient. The overall community behavior with respect to depth when determined by detrended correspondence analysis revealed that a short gradient length of 1.74 was associated with the qARISA data set, suggesting that most OTUs were linearly responding to environmental gradients (37
). This was confirmed by RDA, indicating that depth significantly explained the variation in the data set (P
< 0.001, based on 1,000 permutations of the multivariate model).
A detailed analysis of individual OTU abundance was performed by using two-way cluster analysis and heatmap plot representations (Fig. ), which allow a display of OTU abundance relationships as a function of depth (the sample order with depth was kept fixed).
To allow for comparisons of abundance classes between RFI and log10 values, RFI values that ranged from 0 to 13% were first converted into a discrete 0 to 7 scale. Therefore, although the ranges of color codes (0 to 7) were the same for standard and quantitative ARISAs, their meaning was different: for standard ARISA they corresponded to fluctuations of ~1 log10 value difference [log10(13) = 1.11], whereas for qARISA the variation corresponds to 7-order-of-magnitude changes (decimal logarithmic scale). Clustering RFI values from standard ARISA produced overall a very different picture of inter-OTU relationships and of abundance changes with depth compared to the outcome of qARISA log10 abundance values (Fig. ). Noticeably, OTUs in standard ARISA seemed to fall into two main clusters, whereas four, more variable clusters (designated by the letters A to D) could be identified with qARISA data for a similar cutoff level (Fig. ). It is important to note that clustering RFI values from standard ARISA could lead to very unpredictable patterns because, as mentioned above, these values cannot truly represent OTU abundances from a sample since they are derived from the overall peak abundances from a given electrophoretic profile. For instance, the two most abundant OTUs in standard ARISA were located in cluster A or B when qARISA was used, i.e., clusters associated with low abundant OTUs.
The four main clusters obtained with qARISA data reflected different patterns of OTU occurrence among samples and different OTU abundance categories, i.e., high, average, or low abundance. Nonparametric Spearman correlation tests were subsequently used to determine significant OTU relationships with depth. Among the 199 OTUs (60% of total OTUs) of cluster A, the abundances of 17 and 2 OTUs significantly decreased and increased with depth, respectively, and contributed to 9.5% of all OTUs in the cluster. Since the highest OTU abundance was 4 log10, cluster A may be qualified as composed of “rare OTUs with low abundance.” Cluster B consisted of 84 OTUs (25% of the whole data set). Among them, two and one OTU, respectively, significantly decreased and increased with depth, representing 3.5% of the OTUs in the cluster. As such, the OTUs in cluster B may be designated as “common, but generally not affected by depth.” Cluster C consisted of 40 OTUs (12% of all OTUs), with 24 OTUs significantly increasing with depth (i.e., 60% of the OTUs of the cluster), leading to the most “depth-associated” cluster. Finally, cluster D with only nine OTUs (3% of all OTUs) consisted of OTUs that were present in several samples and at high abundance (ranging from 5 to 7 log10) but whose abundance overall displayed no trend with depth. This cluster may be designated as that of “common and dominant” OTUs.
Only 27 OTUs (8% of the total OTUs) were present in all nine samples with an average abundance over 2 log10
(mean abundance for all nonzero values), and 20 OTUs (6%) were present in all samples, but with abundances lower than 2. A total of 57 (17%) rare OTUs (i.e., occurring in one sample only) were all found with an average abundance below 2 log10
. The large majority (69% or 228 OTUs) of all OTUs were thus not classified as either dominant or rare. It may thus be concluded that qARISA did not only amplify the most common OTUs present in the samples, since the most abundant only consisted of 8% of all OTUs detected. The large majority (85%) was found to be composed of rare or average OTUs with generally low abundance and with weak relationships with depth. It must be noted that the finding of well-represented OTU categories from our environmental samples has a lot to do with the choice of the primer pairs. The ARISA primers in the present study were chosen for their ability to evenly amplify different OTUs in the same sample compared to other primer sets (4
; unpublished results). Therefore, the finding of various OTU categories should not directly be attributed to the qfingerprinting approach but to the choice of PCR primers and amplification conditions.
The finding of a substantial proportion of rare or average OTUs is contrary to the common belief that fingerprinting methods only detect the most abundant microbes in a sample (reviewed in reference 1
) and suggests that qfingerprinting may also be useful for studying the dynamics of the “rare biosphere” (16
). It is, however, difficult to attribute the very rare types either to PCR artifacts, as could be suggested by the low-abundance peaks detected in our model community experiments, or to a true representation of the community composition. It has previously been suggested that rare OTUs (often associated with small peak sizes) may be artifacts of fingerprinting methods (2
), thus resulting in fallacious description of microbial community structure (see, for example, reference 55
). Using the consensus rule as a conservative way to validate the presence of a given OTU may be one step toward rejecting trivial PCR artifacts. Caution is therefore required at this stage to avoid overinterpreting the presence of rare types in the data, especially when additional molecular proofs are not available, at least for some of the rare OTUs. Alternatively, cutoff abundance values may also be used to remove the low-abundant OTUs if one considers their presence as artifactual.
In most microbial diversity analyses, biotic interactions between components of the microbial community are overlooked, despite the fact that they may be extremely important in determining community dynamics and changes in ecosystem processes, as evidenced from macroecological studies (53
). Here, the qfingerprinting approach enabled interrelationships between OTUs to be further examined in contrast to the traditional fingerprinting approaches that only offer a qualitative description of OTU presence or absence. To facilitate the depiction of the relationships, PCA was chosen since it helps identify OTUs whose abundance profiles covary positively, negatively, or neutrally with respect to each other (Fig. ), insights that cannot be easily deduced from Fig. , for instance. In the ordination plot, OTU vectors that are orthogonal to each other may be considered as behaving independently, whereas the ones that are colinear may be seen as positively or negatively covarying, depending on the angles between the vectors being compared (see, for example, references 23
for further interpretation of PCA ordination plots). Noticeably, individual OTU patterns were clearly spanning all directions in the PCA ordination plot (Fig. ) and therefore revealed the existence of much more individual variability than what could have been assumed within a single sediment core.
FIG. 6. PCA of OTU distribution patterns. PCA with scaling focusing on inter-OTU distances for the 332 OTU distribution patterns was chosen to illustrate how inter-OTU relationships could be deduced from qARISA-derived abundance values. The percentages of explained (more ...)
Interestingly, it is also possible to determine the number of OTUs sharing the exact same abundance profile, i.e., being present in the same samples and displaying the same abundance values. The number of OTU patterns occurring only once was 237 (i.e., 90.5% of the 262 abundance patterns), the numbers of those occurring two and three times were 16 (6.1%) and 4 (1.5%), respectively, and those occurring 4, 11, 12, and 20 times were each found at less than 1% of the total patterns (i.e., 1 or 2 patterns) (Fig. ), thus suggesting that the large majority of OTUs had distinct abundance patterns. Patterns shared by more than two OTUs were mostly associated with low-abundant OTUs found in one or two samples, thus suggesting that the main structure in the data set was created by specific variation in the individual OTU abundances rather by common OTU behaviors. As previously seen (Fig. ), most OTU distributions did not follow the depth gradient, and this is confirmed here by the finding of only a few OTU vectors being correlated to the depth vector superimposed in the ordination plot (Fig. ). It therefore seems that some other unmeasured factor(s) may be responsible for the observed patterns, if we opt for a deterministic explanation, and/or that stochastic (random) processes may be at play in the current system.
Advantages and limitations of the quantitative approach.
The present study demonstrates the usefulness of using the qfingerprinting strategy to simultaneously determine the relative abundance of hundreds of OTUs, and this could readily be applied to tens to hundreds of samples. This quantitative approach represents a major improvement for the understanding of microbial diversity patterns at the level of individual OTUs in contrast to standard community fingerprinting methods that merely qualitatively screen for OTU presence or absence in samples. At the level of whole community ordination, i.e., not comparing individual OTUs but sample similarity based on OTU abundances, patterns retrieved by the two approaches were slightly different but still consistent with each other. Hence, if detailed information about each OTU abundance is required in a given study, the new, quantitative strategy should be preferred over the standard strategy. If the study merely focuses on overall patterns of sample similarities and accurate determination of OTU abundance is not needed, then the standard fingerprinting method with the necessary DNA standardization and normalization steps described in the present study should be favored. However, caution must be taken when interpreting diversity patterns based on qualitative measures of β diversity since this has been shown to lead to different ecological insights concerning the factors that structure microbial communities (26
An additional advantage of the new strategy is the fact that there is no need to design individual PCR primers for each OTU, as would be required for a difficult, multi-OTU quantitative PCR approach. Because it is straightforward, the technique will easily complement the existing applications of ARISA, T-RFLP, and SSCP, and any other fingerprinting methods based on ribosomal or functional genes if the latter are based on high-throughput sorting of the fragments (e.g., via capillary electrophoresis) and if internal size standards are included in every sample. These last conditions are important in order to make sure that a high reliability is associated with peak size calling and that further correction of size calling inaccuracy is possible without losing sensitivity in signal detection.
Despite the considerable advantages of the new strategy, the same weaknesses and limitations as found with fingerprinting methods and more generally with any PCR-based methods may be predicted. For instance, the method also relies on correct DNA extraction methods to obtain a good representation of the original sample diversity and is affected by primer biases, the generation of chimeric sequences, etc., and all of those issues have been extensively reviewed (50
). The specificity of the strategy is directly related to the choice of the PCR primers, and this fact may be used to narrow or to expand the range of targeted OTUs (this may be thus seen as a strength or a weakness). Diversity under- or overestimation common to many fragment-based fingerprinting techniques may also be problematic (for a review, see references 1
). Finally, because many PCRs are done in parallel for the dilution series, higher risk of contaminations and cost increase have to be considered. However, because dilution rates can be adapted to the specific needs of the study and as access to sequencing facilities and automatic pipetting devices becomes easier and cheaper, large-scale, high-throughput applications of the technique should become more feasible and affordable in the future.
Future applications of the qfingerprinting strategy should provide datasets that will help establish better mathematical models of intracommunity dynamics and more generally datasets on which microbial ecological theories and community assembly rules can be further developed (15
). Detailed analyses of OTU distribution and abundance may also be foreseen when additional contextual parameters are available. For instance, OTU indicator analysis could be performed to determine whether various microbial OTUs are influenced by the same environmental factors or are indicative of specific community status, since microbes may play a major role in detecting and characterizing changes in environmental conditions (32
). In addition, density-dependent processes such as quorum-sensing mediated microbial responses (52
) may be monitored using qfingerprinting, and this will help us to better understand the relationships between community structure and function in complex ecosystems. Other examples of application should be found in the description of the allelic diversity associated with functional genes so as to quantify specific groups of microorganisms involved in biogeochemical processes, bioremediation, or plant protection and to identify their interactions and behavior under specific environmental conditions.