The effect of alignment on genetic distances
For each of the 13 regions I used various alignment methods to calculate 91,131,750 pairwise distances assuming that a series of consecutive gaps represented one insertion or deletion. The SILVA, greengenes, and RDP alignments represent a gradation in the level of attention given to aligning the variable regions and are each guided by the secondary structure of the 16S rRNA gene. In contrast, the MUSCLE and pairwise alignments are attempts to optimize the alignment between sequences based on a limited number of parameters that are set a priori
. To compare the pairwise distances calculated for the same pairs of sequences across alignments, I calculated the regression coefficients describing the relationship between the distances for the greengenes, RDP, MUSCLE, and pairwise alignments and the SILVA alignment for each region (). Distance calculations for this analysis assumed that consecutive gap positions were the product of a single insertion or deletion mutation (i.e. one gap). With the exception of the V3 and V4 regions, the RDP alignment for each of the regions predicted greater genetic diversity than that of the SILVA alignment. Interestingly, the greengenes alignment, which does a poor job of aligning the variable regions, predicted between 9 and 33% more genetic diversity for each region than the RDP alignment, which does not attempt to align the variable regions. Visual inspection of the greengenes alignment suggests that in many instances the variable region alignments are somewhat random 
. I observed that the MUSCLE-generated alignments described considerably greater genetic diversity than any of the other methods for the V3, V6, and V9 regions (); however, the use of pairwise alignments yielded smaller distances than those calculated with the other alignment methods because pairwise alignment methods optimize the alignment without the constraint of preserving positional homology across multiple sequences (). Perhaps most worrisome is the observation that with the exception of the distances calculated from pairwise alignments, regressions of the other alignment methods to the SILVA-based alignment typically did a poor job of accounting for the variation in the distances (). These data make it clear that variation in alignment quality can have a significant impact on the genetic diversity that is calculated between the same pairs of sequences.
Table 2 Slope coefficients and R2 values for the comparison of one gap distances calculated for SINA-aligned sequences extracted from different regions within the 16S rRNA gene sequence to one gap distances calculated using sequences aligned by different methods. (more ...)
Effect of alignment on interpretation of α-diversity
Considering the poor correlation between the distances generated from the five alignment methods, it was necessary to determine the effect of this variation on the ability to accurately describe and compare communities. As expected based on the genetic distance analysis, the number of OTUs observed using the greengenes alignment was routinely higher than that observed using the other alignment methods and the number of OTUs observed using the pairwise alignment method was routinely the lowest (). Inspection of these lineage through time plots identified a stair-like appearance for many of the regions. This was due to the loss of information as sequence length decreased. The most extreme example of this phenomenon was for the V6 region that had an average sequence length of 60 bp. Each difference between a pair of V6 sequences changed the distance by approximately 0.0167 units, which is the step-length observed for the V6 data in . When the phylogenetic diversity of the datasets was calculated, the greengenes aligned sequences had the highest phylogenetic diversity and the pairwise aligned sequences had the lowest (). One limitation of the phylogenetic diversity metric is that it is difficult to interpret the statistic and so it is unclear how biologically meaningful the level of variation observed is in .
The number of OTUs observed as a function of genetic distance for various regions within the 16S rRNA gene when using different sequence alignments.
The phylogenetic diversity observed for different regions within the 16S rRNA gene when using different alignments.
Effect of alignment on interpretation of β-diversity
To describe β-diversity, I used two OTU-based metrics ( and ) and two phylogenetic-based metrics () to measure the sensitivity of the metrics to alignment quality. Sequences were partitioned so that they would represent two samplings of communities whose Jaccard similarity index was 0.80, but whose Morisita-Horn similarity index was 0.60 with a cutoff of 0.05 when defining OTUs with full-length sequences. Because the sampling of the two simulated communities was limited (ca. 6,750 sequences per community), the Jaccard and unweighted UniFrac statistics did not equal the expected values. Within this simulation framework, the effect of alignment was generally highly statistically significant across metrics of β-diversity (p
0.001); however it is unclear how biologically meaningful the observed differences were.
Figure 3 The Jaccard coefficient calculated between two mock communities (described in Materials & Methods) for different OTU definitions and alignments.
Figure 4 The Morisita-Horn coefficient calculated between two mock communities (described in Materials & Methods) using different OTU definitions and alignments.
Figure 5 Unweighted and weighted UniFrac similarity values calculated between two mock communities (described in Materials & Methods) using different alignments.
The effect of distance calculation method on genetic distances
Using the same SILVA-aligned sequences that I analyzed above, I investigated the effect of different distance calculation methods on downstream analyses. Specifically, I considered the one gap calculator (i.e. a gap of any length between two sequences represents a single mutation) and each gap (i.e. gaps length n, represent n mutations) and ignore gap calculators (i.e. gapped characters are not considered in calculating a distance; ). The slope of lines forced through the origin indicated that the each gap calculator calculated between 0 (V4) and 9% (V3) more genetic diversity than the one gap calculator. With the exception of the V3 region (69%), the regression between the each gap and one gap calculators accounted for more than 87% of the variation in the distances. The differences in the explanatory power of the regression were a function of frequency of gaps longer than 1 nucleotide. The ignore gap calculator calculated between 2 (V4) and 7% (V9) less genetic diversity than the one gap calculator. The regression between the ignore gap and one gap calculators accounted for more than 94% of the variation in the data. Until there is a more well-developed theoretical basis for selecting a method for treating gaps in sequence alignments, these results suggest that treating gaps of any length as a single mutation is a middle ground between ignoring them and treating each of them as a separate evolutionary event.
Slope and R2 values for the regression of one gap pairwise distances onto each gap, ignore gap, Lane mask-filtered one gap, and kmer distances using SINA-aligned sequences over the same region.
Pairwise kmer distances were much larger than the alignment-based calculators and their regression onto the one gap calculated distances accounted for between 83 and 97% of the variation observed between the distances. In order to have no risk of falsely ignoring true one gap pairwise distances smaller than 0.10, it was necessary to keep kmer distances smaller than 0.45 (V19) to 0.73 (V6). This would result in needing to calculate between 3.3- and 9.1-fold more distances than would be needed by alignment-based methods.
Effect of distance calculation method on interpretation of α-diversity
Lacking a theoretical basis for treating gaps as a single evolutionary event, I was curious how much measures of α- and β-diversity are affected by the choice of a distance calculator. I used an OTU-based approach to determine the effect of distance calculation methods on the richness of OTUs within the dataset () and a phylogeny-based approach using total branch length to measure phylogenetic diversity (). As would be predicted, the number of observed OTUs at any genetic distance was greatest with the each gap and least with the ignore gap calculators; the one gap and each gap calculators generated comparable numbers of OTUs. When I analyzed the effect of region and distance calculation method on the phylogenetic diversity of the datasets, there were qualitative trends between methods and regions that could have been predicted from the regression analysis in (). These analyses suggest that the difference observed in α-diversity when using either the one gap or each gap calculator is unlikely to be biologically meaningful.
The number of OTUs observed as a function of genetic distance for various regions within the 16S rRNA gene when using different methods of calculating distances and masking sequences.
The phylogenetic diversity observed for different regions within the 16S rRNA gene when using different methods of calculating distances and masking sequences.
Effect of distance calculation method on interpretation of β-diversity
I next investigated what effect each calculator method had on two OTU-based ( and ) and two phylogeny-based β-diversity measures (). For the OTU-based metrics, ignoring gaps resulted in an over-estimate of the similarity between the two communities and counting each gap resulted in an under-estimate. Increasing and decreasing the cutoff used to define the OTUs had a parallel effect on the Jaccard and Morisita-Horn indices ( and ). These results occurred because ignoring gaps and increasing the threshold each dampen the differences between sequences and pull more sequences into an OTU so that more OTUs are likely to be shared; the same phenomenon was observed when sequences were filtered using the Lane mask (see below). Penalizing each gap or making the OTU definition more stringent had the opposite effect. The calculated Jaccard coefficients were not significantly different between the one gap and each gap distance calculation methods when using the 0.03 and 0.05 OTU cutoffs (); all four distance calculation methods yielded statistically significant differences in Morisita-Horn coefficients, regardless of the OTU cutoff. For the phylogeny-based methods, the observed differences between each of the distance calculation methods were statistically significant (). Although the differences between distance calculation methods were highly statistically significant (p
0.001), it is unclear how biologically meaningful the differences were.
Figure 8 Jaccard similarity values calculated between two mock communities (described in Materials & Methods) for different OTU definitions, methods of calculating distances, and masking sequences.
Figure 9 The Morisita-Horn coefficient calculated between two mock communities (described in Materials & Methods) using different OTU definitions, methods of calculating distances, and masking sequences.
Figure 10 Unweighted and weighted UniFrac similarity values calculated between two mock communities (described in Materials & Methods) using different methods of calculating distances and masking sequences.
Effects of filtering sequences using the Lane mask on analysis
To circumvent alignment quality problems, the Lane mask has been used to filter variable regions from 16S rRNA genes. Results of analyses using filtered sequences aligned by any method or when distances were calculated by any method did not vary to a meaningful degree. Comparison of distances calculated using filtered sequences to those calculated using unfiltered sequences showed that filtering significantly reduced the genetic diversity observed between sequences (). With the exception of the V4 and V6 regions, masking removed between 15 and 45% of the genetic diversity. The V4 region is largely unaffected by the Lane mask and the average length of V6 sequences following the Lane mask treatment was only 27 bp, which made the resulting pairwise distances of dubious value (). As would be expected, the number of OTUs and phylogenetic diversity observed using Lane mask-filtered sequences was significantly lower than those calculated with the unfiltered sequences. For the four β-diversity measures, when the Lane mask-filtered sequences were analyzed, the communities appeared more similar than for non-filtered SILVA-aligned sequences (–). One explanation for this observation is that because filtering makes sequences more similar to each other, it also makes communities appear more similar to each other. Although useful for broad-scale phylogenetic analysis at the level of a kingdom or phylum, filters remove the sequence information necessary to differentiate populations within a community. Ultimately, application of such filters is troublesome because it mutes the signals that differentiate communities.
Relationship between the genetic diversity calculated between full-length and regional sequences
I compared the one gap distances calculated for each of the 12 regions from each alignment to the one gap distances calculated from the full-length SILVA alignments (). The regression of pairwise distances calculated from a sub-region onto distances calculated from full-length sequences was rarely near 1.00. The most extreme case was the V6 region for which distances were nearly 3-fold higher than distances calculated using full-length sequences. Conversely, sequences from the V9 region were 33% less diverse than their full-length counterparts. In general, genetic diversity decreased along the length of the 16S rRNA gene. Although one could use these regression coefficients to relate data collected from one region to that from full-length sequences, the ability of the regression to explain the variation observed between sub-region and full-length sequences was quite poor. As expected, longer regions did the best job of relating the variation between sub-regions and full-length sequences. For example, when using the SILVA alignments, the regression of the V14, V35, and V69 distances onto the full-length distances accounted for 87, 77, and 77% of the variation in distances. Shorter regions such as the V3, V6, and V9 accounted for 26, 36, and 46% of the variation (). This analysis revealed that all sub-regions are limited in their capacity to serve as surrogates for full-length 16S rRNA gene sequences.
Table 4 Regression coefficients and R2 values for the comparison of one gap distances calculated for different regions within the 16S rRNA gene sequence and aligned by different methods to the one gap distances calculated using SINA aligned full-length sequences. (more ...)
Relationship between sub-region differences and differences in α-diversity
The distance-based analysis clearly showed significant differences between distances calculated from sub-regions and full-length sequences. The OTU-based analysis in demonstrates that there was a clear difference in the number of OTUs observed across regions for a given genetic distance as well as the level of curvature observe observed in the lineage-through-time plots ( and ). In the phylogenetic-based analysis those regions that described more genetic diversity than the full-length sequences had greater phylogenetic diversity than the phylogenetic diversity calculated for the full-length sequences whereas the regions that described less genetic diversity yielded greater phylogenetic diversity ( and ).
Relationship between sub-region differences and differences in β-diversity
Using pyrotag data introduces several complexities to β-diversity analyses. Moving across regions, but using the same OTU definition could lead one to overestimate community similarity. For example, the average Morisita-Horn similarity for full-length SILVA-aligned sequences with one gap distances was 0.56. Using similarly treated sequences from the V12, V13, V14, and V23 regions I calculated Morisita-Horn values between 0.57 and 0.60; however those from the other 8 regions yielded values between 0.64 (V2) and 0.79 (V9). For a single region, changing the OTU cutoff also had a significant effect on the Morisita-Horn index. For instance, full-length SILVA-aligned sequences yielded 0.52, 0.56, and 0.86 for cutoffs of 0.03, 0.05, and 0.10. This spread in Morisita-Horn values between the 0.03 and 0.10 OTU cutoffs (0.34) was the largest of any region. The narrowest spread was observed for the V6 region (0.06). In contrast to the Morisita-Horn values, there was little variation in the unweighted or weighted UniFrac statistic when comparing sequences analyzed by the same alignment and distance calculation method. With the exception of the V6 region (0.33), the average unweighted UniFrac values varied between 0.24 (V13, V14, V19) and 0.30 (V9) and with the exception of the V12 region (0.69), the average weighted UniFrac values varied between 0.80 (V13) and 0.87 (V9); the value for the full-length sequence was 0.82. Similar to the α-diversity measure of phylogenetic diversity, an added complication of phylogeny-based methods is the complexity of interpreting the proportion of branch length that is shared between or unique to two communities and how such proportions relate to classical β-diversity measures. Thus, it is difficult to interpret the biological significance of such variation. Regardless, the results of the OTU- and phylogeny-based analyses demonstrate that caution must be taken in extrapolating results from one region to another.