In silico evaluation of miniprimer targets in environmental sequences.
To determine whether miniprimer PCR might expand the phylogenetic breadth of 16S rRNA gene sequences detected in environmental samples, the environmental sequence database was searched for putative 16S rRNA gene amplicons. Importantly, this database compiles sequences obtained not by PCR but by random shotgun cloning of DNA directly extracted from environmental samples. A modeled PCR search identified putative 16S rRNA gene amplicons delimited by forward and reverse primer sequences separated by an appropriate distance (Fig. ). Our goal was not to determine the fraction of 16S rRNA sequences in the database identified by the primers but to compare experimental methods for 16S rRNA gene discovery. Each database sequence and its reverse complement were searched using several of the best-performing primer pairs (Fig. ); additional primer pairs were also designed to identify smaller amplicons because the average sequence length of the environmental sequence database entries is ~1,000 nt, which lowers the probability of identifying long amplicons. To compare the abundances of putative amplicons defined by pairs of long primers and miniprimers, matched primer pairs were used to target the same regions of the 16S rRNA gene (Fig. ). Miniprimers identified 1,648 putative amplicons and the standard primers identified 448; surprisingly, of the 1,648 miniprimer sequences, 971 were not identified by any long-primer pair tested. By BLAST comparison against a database of 30,312 nonredundant, nonchimeric bacterial and archaeal 16S rRNA gene sequences of at least 1,350 nt, 1,068 miniprimer and 301 long-primer sequences were verified to be regions of 16S rRNA genes (Fig. ). Importantly, each miniprimer pair identified 2- to 10-fold more putative 16S rRNA gene amplicons than its matched long-primer pair (Fig. ). This analysis predicted that miniprimer pairs can amplify more 16S rRNA gene sequences in environmental samples than can standard long primers.
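The modeled PCR search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the search program used in this study: the function names, the exact-match rule (no mismatches tolerated outside IUPAC degeneracy), and the length window are all hypothetical choices.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a DNA string (IUPAC codes supported)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Translate a (possibly degenerate) primer into a regex pattern."""
    return "".join(IUPAC[base] for base in primer.upper())


def find_amplicons(seq, fwd, rev, min_len, max_len):
    """Return (strand, start, end, amplicon) for each putative amplicon.

    The sequence and its reverse complement are both scanned. On each
    strand the forward primer must occur upstream of the reverse
    complement of the reverse primer, and the primer-to-primer length
    must fall within [min_len, max_len] (assumed window).
    """
    # Lookahead lets overlapping forward-primer sites be reported.
    fwd_re = re.compile("(?=" + primer_regex(fwd) + ")")
    rev_re = re.compile(primer_regex(reverse_complement(rev)))
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for f in fwd_re.finditer(s):
            start = f.start()
            for r in rev_re.finditer(s, start + len(fwd)):
                if min_len <= r.end() - start <= max_len:
                    hits.append((strand, start, r.end(), s[start:r.end()]))
    return hits
```

Calling `find_amplicons(entry, fwd, rev, min_len, max_len)` once per database entry and primer pair, and counting the entries with at least one hit, would give tallies comparable in kind to the amplicon counts reported above. A search that tolerates mismatches would need an approximate matcher in place of the regular expressions.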
FIG. 3. Comparison of putative miniprimer and long-primer amplicons in the GenBank environmental sequence database. Pairs of long primers and miniprimers were used to identify putative 16S rRNA gene amplicons in the GenBank environmental sequence database. Horizontal ...
Next, the phylogenetic breadth represented by the putative 16S rRNA gene amplicons was determined. In sum, the miniprimer and long-primer pairs identified 1,068 and 301 total putative 16S rRNA gene sequences, respectively. To build the best phylogenies and remove redundant sequences, the longest sequence identified from each database entry was used in instances when the same database entry was identified by multiple primer pairs. After removing replicated sequences, the remaining 685 miniprimer and 205 long-primer sequences were aligned to a comprehensive 16S rRNA gene sequence database (16). One long-primer sequence and 14 miniprimer sequences could not be satisfactorily aligned to the database. A phylogenetic tree was constructed using the remaining sequences, and the resulting taxonomic distribution indicated that almost all the sequences originated from Bacteria, with Proteobacteria sequences in the majority (see S1 in the supplemental material). For almost every taxonomic class, miniprimers identified more sequences than long primers. Also, miniprimers identified sequences in some classes for which long primers identified no sequences (see S1 in the supplemental material); this generally occurred for classes in which few members were identified but also occurred for Sargasso Sea group 11 (SAR11), the most abundant class of marine bacteria currently known (29). In addition, miniprimers identified nine sequences from Archaea, a domain for which long primers identified no sequences. This suggests that miniprimer PCR can amplify more sequences in more phylogenetic groups than can standard long primers.
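The redundancy-removal step described in this paragraph, in which only the longest sequence is kept when one database entry is identified by several primer pairs, can be sketched as below. The tuple layout, the function name, and the identifiers in the test data are hypothetical; the study's actual bookkeeping is not described.

```python
def longest_per_entry(hits):
    """Keep the longest putative amplicon found for each database entry.

    `hits` is an iterable of (entry_id, primer_pair, sequence) tuples
    (an assumed layout). Returns {entry_id: (primer_pair, sequence)}.
    Ties keep the first hit encountered.
    """
    best = {}
    for entry_id, primer_pair, seq in hits:
        current = best.get(entry_id)
        if current is None or len(seq) > len(current[1]):
            best[entry_id] = (primer_pair, seq)
    return best
```

Keeping the longest hit favors the amplicon with the most alignable positions, which is consistent with the stated aim of building the best phylogenies.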
Evaluation of miniprimer PCR with a soil sample.
To determine whether the computational analysis reflected experimental results, miniprimer PCR was used in a study to amplify 16S rRNA genes from soil by use of the primer pairs 27F-10/1505R-10, 12F-10/1505R-10, and 524F-10/1505R-10 (Table and Fig. ). In this study, we did not attempt a comprehensive analysis of a soil sample but only an assessment of the miniprimers and whether they could detect sequences with mismatches to the standard primers. After PCR and cloning, 32 clones from each primer pair were partially sequenced from one end to examine the primer binding regions. Products from the 27F-10/1505R-10 PCR were chosen for further study because 27F-10 comprises the first 10 nt from the 5′ half of the 27F suite of primers (Table ) and thus enabled the identification of amplicons whose sequences had mismatches to the 3′ half of 27F (see S2 in the supplemental material). In addition, the 27F-10/1505R-10 miniprimer pair amplifies nearly the entire 16S rRNA gene, and it performed well in pilot experiments. Very-high-quality sequences were obtained for 10 of the 32 27F-10/1505R-10 sequences; of these 10 end sequences, 5 began with the 27F-10 sequence and 5 began with the 1505R-10 sequence. Two of the sequences anchored by the miniprimer 27F-10, A01 and A04, contained mismatches to the 27F-P binding sequence and also to the sequence of 27F-HT, a degenerate primer designed to be more general (Table ; also see S2 in the supplemental material).
After A01 and A04 were fully sequenced, it was noted that the A01 sequence also had a mismatch within the 4 nt at the 3′ end of the 1492R binding site that are not shared by 1505R-10 (see S2 in the supplemental material). The A01 and A04 sequences were aligned to the comprehensive 16S rRNA gene sequence database to assess the sequences within conserved regions (see S2 in the supplemental material). The alignment revealed that A01 and A04 contained mismatches within the 27F-P and 1492R-P primer binding regions and within other regions of the 16S rRNA gene that are very highly conserved across the Bacteria domain (see S2 in the supplemental material). Importantly, predictions of rRNA secondary structure based on the ARB alignments supported the presence of the divergent bases in the A01 and A04 sequences. Of the nine divergent bases observed (see S2 in the supplemental material), seven are located in base-paired regions. For five of these seven bases, the expected complementary changes are present at the proper locations to preserve Watson-Crick pairing. In the remaining two cases, G:G and G:A base pairs are present for all members of the phylogenetic clusters in which A01 and A04 reside, respectively. An examination of 20 randomly selected divergent bases within conserved helical regions also revealed the expected complementary bases at the correct locations to preserve Watson-Crick pairing for all 20 sites chosen (data not shown). Thus, given the covariation present in the sequences, it is exceedingly unlikely that the divergent bases arose from PCR errors or chimeric sequences.
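The covariation argument can be illustrated with a small check for compensatory changes. This is a sketch only: it assumes that the base-paired positions of a helix are already known from a secondary-structure model and supplied as index pairs, and the function names and toy sequences are hypothetical rather than taken from the A01 or A04 data.

```python
# Canonical Watson-Crick pairs in RNA notation.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def to_rna(seq):
    return seq.upper().replace("T", "U")


def classify_divergent_pairs(consensus, seq, pairs):
    """Classify base-paired positions where `seq` differs from `consensus`.

    `pairs` holds 0-based (i, j) index pairs that are base paired in the
    assumed secondary-structure model. A divergent pair is 'compensated'
    if the bases in `seq` still form a Watson-Crick pair, otherwise
    'uncompensated'. Pairs identical to the consensus are skipped.
    """
    consensus, seq = to_rna(consensus), to_rna(seq)
    result = {}
    for i, j in pairs:
        if seq[i] == consensus[i] and seq[j] == consensus[j]:
            continue
        paired = (seq[i], seq[j]) in WATSON_CRICK
        result[(i, j)] = "compensated" if paired else "uncompensated"
    return result
```

A random PCR error in one strand of a helix would usually show up as "uncompensated", whereas a genuine evolutionary change is often accompanied by a matching change in the partner base, which is the reasoning applied above.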
Miniprimer PCR characterization of the Candelaria microbial mat community.
We applied the method to a mature microbial mat community from an extreme environment, the hypersaline Candelaria lagoon of the Cabo Rojo salterns. This mat community was chosen for study because it was expected to have intermediate diversity and interesting phylotypes. To identify members of the mat community for this study, we constructed six 16S rRNA gene sequence libraries with either the miniprimer or standard long-primer techniques.
We constructed libraries with the miniprimer pair 27F-10/1505R-10 and two different long-primer pairs: 27F-P/1492R-P and 27F-HT/1492R-HT (Table ). The primers 27F-P and 1492R-P are “first-generation” nondegenerate universal bacterial primers that remain in widespread use today (1), and 27F-HT and 1492R-HT are more recently published primers that are based on the original first-generation primers and include several degenerate positions to broaden their scope of targets (37). These primers produce nearly full-length 16S rRNA gene sequences and thus provide a fair assessment of the miniprimer method relative to standard methods in common use; duplicate libraries were constructed with each of the primer pairs (Table ).
Candelaria microbial mat 16S rRNA gene libraries constructed
The Candelaria microbial mat sequence libraries comprised over 40 bacterial divisions, with the Chloroflexi, Bacteroidetes, Halanaerobiales, and Planctomycetes divisions having the greatest representation (Fig. ). All of the most highly populated divisions identified were represented in each of the libraries, though in different proportions (Fig. ). Notably, miniprimers amplified a greater proportion of sequences that could not be classified at or below the division level (see below).
FIG. 4. Distribution of microbial mat library sequences in major bacterial divisions identified. The 10 taxonomic divisions with the highest representation in the libraries are shown as their relative proportions within the miniprimer (M) and two types of long-primer ...
The miniprimers amplified more sequences with poor matches to previously isolated 16S rRNA gene sequences (Fig. ). Miniprimer libraries contained a larger fraction than did long-primer libraries of sequences that matched the database at distances greater than 0.10 (Fig. ); conversely, the miniprimer libraries matched many fewer sequences at distances less than or equal to 0.05. The distributions of database matches for the two long-primer libraries were very similar and more similar to each other than either distribution was to the miniprimer distribution (Fig. ). Thus, the miniprimer method appears to amplify more novel sequences than the long-primer methods.
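The comparison above rests on the fraction of each library whose best database match falls into a given distance interval. A minimal sketch of that tabulation follows; the bin edges of 0.05, 0.10, and 0.20 are taken from the thresholds mentioned in the text, but the exact intervals used for the figure are an assumption, as is the function name.

```python
from bisect import bisect_left
from collections import Counter


def distance_fractions(distances, edges=(0.05, 0.10, 0.20)):
    """Fraction of sequences per best-match distance interval.

    Bin 0 holds d <= edges[0], bin k holds edges[k-1] < d <= edges[k],
    and the last bin holds d > edges[-1].
    """
    if not distances:
        raise ValueError("no distances supplied")
    counts = Counter(bisect_left(edges, d) for d in distances)
    return [counts[k] / len(distances) for k in range(len(edges) + 1)]
```

Computing this vector separately for the miniprimer and long-primer libraries gives distributions that can be compared bin by bin.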
FIG. 5. Distribution of microbial mat best database matches. The fractions of sequences whose best database matches fall into the designated distance intervals are shown for the long-primer (P, 27F-P/1492R-P; H, 27F-HT/1492R-HT) and miniprimer (M) libraries. ...
A small fraction of sequences matched the database at distances greater than 0.20 (Fig. ). Most of these sequences were not placed into defined taxa or were placed into recently defined taxa, many of which include sequences isolated from similar environments (Fig. ). A few groups contain clusters of miniprimer sequences, in particular a subgroup within WS6 and the CR1 and CR2 groups (Fig. ). The clusters CR1 and CR2 (Fig. ; also see S3 in the supplemental material) formed at distances of 0.26 and 0.24, respectively, from previously isolated sequences and thus are likely to be representatives of new division-level taxa. Another four clusters formed at distances slightly below the division level: CR3 to CR5 at 0.18 and CR6 at 0.17 (see S3 in the supplemental material). Interestingly, seven of the eight sequences defining CR1 and CR2, and 25 of the 32 sequences defining CR3 to CR6, are miniprimer sequences. Many other sequences also branched deeply at distances of 0.29 to 0.16 to their nearest neighbors but did not meet all the criteria for defining new division-level taxa. A small fraction of library sequences accounted for these putative novel groups: approximately 3.5% of library sequences comprised monophyletic groups at distances of at least 0.17, with 5.9% of the miniprimer sequences contributing and 1.5% of the two long-primer libraries combined contributing (1.7% from the 27F-P/1492R-P and 1.2% from the 27F-HT/1492R-HT libraries, respectively).
FIG. 6. Microbial mat sequences with low-scoring database matches. Miniprimer (red), long-primer (green and blue), and reference (black) sequences were assembled into a phylogenetic tree by use of maximum likelihood (35). The best-scoring tree of 20 independent ...
Other Candelaria sequences expanded the membership of previously defined taxonomic groups. The Chloroflexi group Eub6 is almost entirely composed of sequences cloned in this study and from the Guerrero Negro microbial mats (26) and was the largest single taxonomic group from any division to which miniprimer sequences contributed; the other Chloroflexi subtaxa were largely populated by long-primer sequences, suggesting the existence of a primer bias for particular groups of Chloroflexi. Similar subdivision bias is demonstrated by the Halanaerobiales sequences: most Halobacteroidaceae family sequences were from miniprimer libraries, whereas most Halanaerobiaceae family sequences were from long-primer libraries. A large fraction of miniprimer sequences was clustered within the Planctomycetes group, and sequences from all libraries were distributed over many Deltaproteobacteria groups. Five sequences contributed to the Deltaproteobacteria group GN04, which was recently defined based on sequences isolated from the Guerrero Negro microbial mat. Several miniprimer sequences populated the Spirochaetes group GN05-1, which was first defined as a group of Guerrero Negro sequences and is currently exclusive to sequences cloned from microbial mats. Very few miniprimer sequences were from the Bacteroidetes divisions, though large fractions of long-primer sequences were classified in these divisions (Fig. ). The remaining divisions into which library sequences were placed each contributed less than 5% to their respective libraries.
To evaluate the reproducibility of the three library construction methods and to compare the methods, the 16S rRNA gene sequence libraries were compared to each other by calculating similarity indices (34) using 97% sequence similarity to define OTUs. Here, we focus on the classic incidence-based Sørensen similarity index, a measure of membership, and the Clayton θ, a nonparametric maximum likelihood estimator of community structure that considers abundance (41). These calculations demonstrate two trends (Table ): first, duplicate libraries constructed with the same primers show high levels of similarity in both membership and structure; and second, libraries constructed with long primers are typically more similar to each other than to libraries constructed with miniprimers. These trends are particularly evident when relative abundances are considered (θ, Table ). Importantly, the data suggest no significant dependence of library membership or structure on the particular polymerase used, as demonstrated by the high similarities of duplicate libraries constructed with the same primer pairs, 27F-P/1492R-P or 27F-HT/1492R-HT, but amplified with different polymerases (Tables and ). Moreover, the miniprimer method is as reproducible as either of the other methods tested here using long-primer pairs. Thus, these data suggest that differences in composition between the miniprimer and long-primer libraries are not due to the enzyme used for amplification.
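The two indices discussed above can be sketched as follows. The incidence-based index is the classic Sørensen coefficient, 2S12/(S1 + S2). For θ, the sketch assumes the Yue and Clayton form, Σ(a_i·b_i) / (Σa_i² + Σb_i² − Σ(a_i·b_i)), computed from observed relative abundances a_i and b_i; whether this matches the software and settings used for the table is an assumption, and the function names are hypothetical.

```python
def sorensen(a, b):
    """Incidence-based Sorensen index from two {otu: count} mappings."""
    shared = len(set(a) & set(b))
    return 2 * shared / (len(a) + len(b))


def theta_yc(a, b):
    """Abundance-based theta (assumed Yue-Clayton form) for two libraries.

    `a` and `b` map OTU identifiers to positive clone counts. Relative
    abundances are plugged in directly, with no bias correction.
    """
    total_a, total_b = sum(a.values()), sum(b.values())
    otus = set(a) | set(b)
    pa = {o: a.get(o, 0) / total_a for o in otus}
    pb = {o: b.get(o, 0) / total_b for o in otus}
    s_ab = sum(pa[o] * pb[o] for o in otus)
    s_aa = sum(p * p for p in pa.values())
    s_bb = sum(p * p for p in pb.values())
    return s_ab / (s_aa + s_bb - s_ab)
```

Both indices range from 0 (no overlap) to 1 (identical membership or structure). Because θ weights OTUs by abundance, two libraries can share most of their OTUs and still score low on θ if the dominant OTUs differ, which is why the trends are most evident when relative abundances are considered.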
Estimation of library similarities
The coverage and richness of the 1,281 Candelaria sequences were estimated by calculating rarefaction curves and the nonparametric diversity estimators Chao1 and Ace1 (see S4 in the supplemental material). Rarefaction analysis suggested the sequences represented a diversity of at least 423 phylotypes defined at an identity threshold of 97% but indicated that sampling was far from complete at this threshold. At an 80% identity threshold, the estimates appear to have reached plateaus, suggesting that sampling is more complete at approximately the division level. Chao1 and Ace1 suggest the presence of 91 to 124 taxa (95% confidence intervals) at the 80% threshold, and the rarefaction analysis indicates that the libraries represent 87 taxa at this level. Thus, though most division-level taxa have probably been identified in the microbial mat sample, these analyses suggest a high degree of diversity at a greater taxonomic resolution and that many organisms have yet to be identified.
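The richness estimates above can be illustrated with point estimates for Chao1 and for rarefied richness. The sketch assumes the bias-corrected Chao1 form, S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs observed once and twice, and the standard analytical rarefaction expectation; confidence intervals and the Ace1 estimator are not reproduced, and the function names are hypothetical.

```python
from math import comb


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-OTU clone counts."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def rarefied_richness(counts, n):
    """Expected number of OTUs in a random subsample of n sequences."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the library size")
    # An OTU with c clones is missed with probability C(total-c, n)/C(total, n).
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts if c > 0)
```

Evaluating `rarefied_richness` over a range of n gives a rarefaction curve. A curve that is still rising at the full library size, as reported at the 97% threshold, indicates incomplete sampling, whereas a plateau, as reported at the 80% threshold, suggests that most taxa at that level have been sampled.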
Lastly, we examined the miniprimer clone sequences to quantify mismatches within the regions that would be targeted by the long primers 27F-P, 27F-HT, 1492R-P, and 1492R-HT (Fig. ). For the 27F binding region, 309 and 266 of the 598 miniprimer sequences deviated from the 27F-P and 27F-HT primer sequences, respectively, at one or more nucleotides; sequences with two mismatches to 27F-P and 27F-HT numbered 20 and 10, respectively, and 1 sequence had three mismatches to the 27F-P sequence. Though the miniprimer sequence consensus largely agreed with the sequences of the long 27F primers, the nucleotide frequency at position 12 did not agree with the degeneracy present at this position in 27F-HT. Position 12 of 27F-HT is designed to match A or C. In the cloned miniprimer sequence libraries, C was present in the majority of sequences, but G was present more than four times as often as A, and T was observed twice. Similar analysis of the 1492R binding region identified one sequence from the miniprimer clone libraries having a single mismatch to 1492R-P and 1492R-HT (data not shown).
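The mismatch tally described above can be sketched as a comparison of each cloned binding-site sequence against a primer written with IUPAC degenerate codes. The function names are hypothetical, and the sketch assumes that sites have already been extracted and aligned to the primer without gaps.

```python
from collections import Counter

# Bases matched by each IUPAC code.
IUPAC_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Number of positions where `site` is not matched by `primer`."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have equal length")
    return sum(
        base not in IUPAC_SETS[code]
        for code, base in zip(primer.upper(), site.upper())
    )


def base_frequencies(sites, position):
    """Count the bases observed at a 1-based primer position."""
    return Counter(site[position - 1].upper() for site in sites)
```

With this approach, a degenerate position such as an M (A or C) counts as a mismatch only when the cloned sequence carries G or T, and `base_frequencies` gives the kind of per-position tally reported for position 12.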
FIG. 7. Mismatches of 27F-P and 27F-HT primers to microbial mat miniprimer library sequences. (A) Portions of long primers 27F-P and 27F-HT that bind past the 3′ end of miniprimer 27F-10. (B) Bases represented by degenerate codes in 27F-HT. (C and D) ...