Designing predicted primers using conserved fragments of 16S rDNA sequences
We identified continuous conservative sites (>14 nt) in the Archaea and Eubacteria separately. They were positioned on the E. coli 16S rRNA gene by using a pairwise alignment and converted to conservative fragments. There were 8 archaea-specific and 11 eubacteria-specific conservative fragments of various lengths. Most of the conservative archaeal and eubacterial fragments were numbered according to approximate positions on the E. coli 16S rRNA gene, and only four fragments lacked any counterparts: eubacterial fragments 104–120, 683–707, and 1177–1197, and archaeal fragment 1225–1242 (). Among the overlapping fragments, we found obvious sequence variations such as between archaeal 344–367 and eubacterial 314–368. The differences in these fragments possibly reflect the major characteristics of the functional parts of the 16S rRNA transcripts, which probably developed after the divergence of the Archaea and Eubacteria.
The conservative fragments in archaeal and eubacterial 16S rDNAs.
Next, we selected candidate primers (15 nt) from the fragments by checking their coverage rates. A high coverage rate indicates a high percentage of bacteria in our dataset with a target site for the candidate primer. Every candidate primer was examined with a sliding window, which was moved across the fragments (). Although all the sites were highly conservative, the coverage rates of the candidate primers on the same fragment varied markedly and might be distributed across a larger range than that shown in . The candidate primers containing degenerate sites clearly corresponded to low coverage rates (), suggesting that introduction of the degeneracies could not ensure complete matches between the primers and their targets, and that the degeneracies by themselves pointed to the positions of weak sites in the candidate primers as well as in the conservative fragments.
Coverage rates of candidate primers within a conservative fragment.
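The sliding-window scan described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a sequence counts as covered when it contains a perfect match to the candidate primer (with IUPAC degenerate codes expanded). The function names and toy sequences are hypothetical.

```python
# IUPAC nucleotide codes mapped to the bases each code stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(primer, site):
    """True if every base of `site` is allowed by the primer code at that position."""
    return len(primer) == len(site) and all(
        base in IUPAC[code] for code, base in zip(primer, site)
    )


def has_target(primer, sequence):
    """True if `sequence` contains at least one perfect target site for `primer`."""
    k = len(primer)
    return any(matches(primer, sequence[i:i + k])
               for i in range(len(sequence) - k + 1))


def coverage_rate(primer, sequences):
    """Fraction of sequences that contain a target site (assumed definition)."""
    return sum(has_target(primer, s) for s in sequences) / len(sequences)


def sliding_candidates(fragment, sequences, width=15):
    """Slide a fixed-width window across a conserved fragment.

    Returns (start, candidate, coverage) for every window position.
    """
    results = []
    for start in range(len(fragment) - width + 1):
        candidate = fragment[start:start + width]
        results.append((start, candidate, coverage_rate(candidate, sequences)))
    return results


# Toy data (hypothetical): a 6-nt "fragment" scanned with 4-nt windows.
toy_sequences = ["TTACGTACTT", "GGACGTGCGG", "ACGAACGG"]
toy_fragment = "ACGTRC"
for start, candidate, cov in sliding_candidates(toy_fragment, toy_sequences, width=4):
    print(start, candidate, round(cov, 3))
```

With real data, `width` would be 15 and `sequences` would be the aligned or unaligned 16S rDNA collection. Scanning unaligned sequences as done here ignores position, so a production version would probably restrict the search to the expected region.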
After we filtered out the candidate primers with coverage rates below 90%, the remaining overlapping primers were merged and new coverage rates were measured for the merged primers (). Thirty candidate primers (13 for the Archaea and 17 for the Eubacteria) were identified and are of potential use in designing forward and reverse primers. Notably, eubacterial conservative fragment 104–120 did not contain candidate primers that met the selection criteria. Some primers for the Archaea and Eubacteria were not only numbered with the same E. coli rDNA positions but were also highly similar in sequence pattern. Therefore, they were defined as predicted universal primers: U515–532, U785–800, U909–928, and U1052–1071 ().
The coverage rate of predicted primers.
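The filter-and-merge step can be illustrated with a short interval-merging sketch. It assumes each candidate is a fixed-width window identified by its start position, and that two passing windows are merged whenever they overlap; the function name and the example values are hypothetical. As in the text, the coverage rate of each merged primer would then need to be measured again.

```python
def merge_passing_windows(candidates, width=15, threshold=0.90):
    """Keep windows with coverage >= threshold and merge overlapping ones.

    candidates: iterable of (start, coverage) for fixed-width windows.
    Returns a list of (start, end) half-open intervals, one per merged primer.
    """
    merged = []
    for start, coverage in sorted(candidates):
        if coverage < threshold:
            continue  # filtered out: coverage below the cutoff
        end = start + width  # exclusive end of this window
        if merged and start < merged[-1][1]:
            # Overlaps the previous kept interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


# Hypothetical coverage values for windows on one fragment.
example = [(0, 0.95), (1, 0.97), (2, 0.80), (3, 0.92), (30, 0.99)]
print(merge_passing_windows(example))
```

Note that in this sketch a failing window (start 2 above) does not split the merged primer if its passing neighbours still overlap each other; whether the original procedure behaved this way is not stated in the text.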
Coverage rates of predicted and known primers
To evaluate the accuracy of our prediction, the predicted primers were compared with 29 known primers including 13 Archaea-specific, 9 Eubacteria-specific, and 7 universal primers (). After cleaning the overlapping primers, we found that our predicted primers contained a novel primer, A884–898, which has not been reported previously. Although nearly all the predicted and known primers were located in the same regions, some of the known primers were probably problematic because of the lack of sufficient degeneracies and the low degree of conservation at some sites in the primers. Therefore, the coverage rates of these primers were compared with those of the predicted primers.
Coverage rate of known primers.
For the predicted primers, the average coverage rates of the archaeal and eubacterial primers were 96% and 96.2%, respectively. The average coverage rate of the predicted universal primers was 96%. The values for the known archaeal, eubacterial, and universal primers were 85%, 77.4%, and 84.3%, respectively. Overall, the coverage rates of all the predicted primers were above 90%, whereas the coverage rates of the 11 known primers (30.6% of all known primers) were lower than 90% (). The coverage rates of the predicted primers were significantly higher than those of the known primers (Spearman test; P<0.00001). Our results also cast doubt on the validity of some known universal primers, as three out of the seven showed poor coverage in Archaea or Eubacteria: the coverage rate of U779 in Archaea was only 5%. The remaining primers, U341F, U519F, U789F, and U1053F, are highly recommended for their high coverage rates in all bacteria. U341F was not included among our predicted universal primers, as polymorphisms and dissimilarities in this region would introduce too many degeneracies when both the Archaea and Eubacteria are considered.
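The summary figures in this paragraph (group averages and the share of primers below the 90% cutoff) are simple to compute once per-primer coverage rates are available. The sketch below uses made-up primer names and values purely for illustration; it does not reproduce the reported numbers or the statistical test.

```python
from statistics import mean


def summarize(coverages, threshold=0.90):
    """Summarize a {primer_name: coverage_rate} mapping."""
    below = sorted(name for name, cov in coverages.items() if cov < threshold)
    return {
        "mean": mean(coverages.values()),
        "below": below,
        "share_below": len(below) / len(coverages),
    }


# Hypothetical coverage rates, not values from the study.
toy_known = {"P1": 0.95, "P2": 0.85, "P3": 0.05, "P4": 0.99}
print(summarize(toy_known))
```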
Phylum specificity of predicted and known primers
As described above, we generated a list of predicted and known primers with a high coverage rate for both the Archaea and Eubacteria. However, it was a challenge to amplify the 16S rRNA sequences of all the bacteria in environmental samples. Generally, the dominant and well-characterized bacterial phyla could be detected easily according to the principles of primer design. The problem was how to identify the minority bacterial phyla; occasionally, a whole phylum was missed. In the RDP database, the numbers of bacteria from different phyla differed substantially, and the failure to detect a small phylum might reduce the overall coverage rate by less than 1%. Therefore, it was necessary to assess the phylum specificity of our predicted primers as an evaluation supplementary to the coverage rate.
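A small worked example shows why the overall coverage rate can hide a completely missed phylum. The phylum names and counts below are invented for illustration only.

```python
def coverage_by_phylum(hits, totals):
    """Per-phylum coverage: matched sequences divided by all sequences in that phylum."""
    return {phylum: hits.get(phylum, 0) / n for phylum, n in totals.items()}


def overall_coverage(hits, totals):
    """Coverage over the pooled dataset, ignoring phylum boundaries."""
    return sum(hits.get(p, 0) for p in totals) / sum(totals.values())


# Hypothetical dataset: two large phyla fully matched, one small phylum fully missed.
totals = {"MajorA": 60000, "MajorB": 39500, "Minor": 500}
hits = {"MajorA": 60000, "MajorB": 39500, "Minor": 0}

print(overall_coverage(hits, totals))    # pooled coverage stays very high
print(coverage_by_phylum(hits, totals))  # the small phylum has zero coverage
```

In this toy case the pooled coverage is 99.5% even though one phylum is not detected at all, which is the reason a per-phylum breakdown is informative.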
We first displayed the coverage spectrum of the 13 Archaea-specific primers. In the Crenarchaeota and Euryarchaeota, the percentage of failed detections was below 10% for the primers, indicating that the coverage of these Archaea was rather stable (). However, the coverage of Korarchaeota and Nanoarchaeota varied remarkably, ranging from 0% to 100%. Primers A785–800, A899–913, and A905–936 were not suitable for Korarchaeota, as indicated by their 100% failure rates. The highly variable coverage rates of these primers in Nanoarchaeota were not surprising because there were only three representatives of this taxon (>1200 nt) in the database. In light of the spectrum found in this test, A519–539 could provide the best coverage of all archaeal phyla. Although some primers failed to cover Korarchaeota completely, they provided location information for the design of Korarchaeota-specific primers. Among the 12 known Archaea-specific and universal primers examined, U906F and U1053F performed better than the others (Fig. S1). This result confirms that the Archaea-specific primers do not have high coverage rates in Korarchaeota and/or Nanoarchaeota.
Phylum specificity of predicted primers for Archaea.
The same test was performed with 17 predicted Eubacteria-specific primers on 25 eubacterial phyla. Most of the primers showed a weakness in finding targets in a small spectrum of eubacterial phyla (). E969–983 was the best primer because it displayed the lowest average rate (1%) of failed detections, followed by E1063–1081 with an average failure percentage of 4.6%. The highest average failure percentage (32.8%) was observed for E1177–1193. Surprisingly, the difference between E783–797 and E785–806 was 9%, although all of E783–797 except its first two nucleotides lies within E785–806. Therefore, different primers show clear phylum specificity, and fine adjustment of the primer target could achieve better coverage. This was verified by the varying rates of failed detections observed for the same phylum dataset using different primers. We thus measured the average of the rates for individual phyla to determine the bacterial phyla that were most easily detected, and the results showed that Firmicutes, Gemmatimonadetes, and Proteobacteria were the phyla with the highest rates of match to the primers. The average percentages of failed detections for these three phyla were 1.47%, 1.54%, and 1.9%, respectively. In contrast, Planctomycetes and TM7 were associated with the highest average rates of failed detections (40% and 31.8%, respectively) with large standard deviations (42% and 43%, respectively), indicating that the coverage of the primers in these two phyla is not stable. These results could be foreseen because the overwhelming number of representatives from Firmicutes and Proteobacteria () caused a bias in primer design. The polymorphisms in the minority phyla were largely ignored, leading to insufficient degeneracies in the primers.
Phylum specificity of predicted primers for Eubacteria.
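The per-phylum averages and standard deviations discussed above can be computed from a table of failed-detection rates indexed by primer and phylum. The sketch below is illustrative only: the primer and phylum labels and the rates are hypothetical, and it assumes the population standard deviation, since the text does not say which form was used.

```python
from statistics import mean, pstdev


def phylum_summary(failure):
    """Mean and standard deviation of failure rates per phylum, across primers.

    failure: {primer_name: {phylum_name: failed_detection_rate}}
    Returns {phylum_name: (mean_rate, sd_rate)}.
    """
    phyla = sorted({p for rates in failure.values() for p in rates})
    summary = {}
    for phylum in phyla:
        rates = [r[phylum] for r in failure.values() if phylum in r]
        summary[phylum] = (mean(rates), pstdev(rates))
    return summary


# Hypothetical failure rates for two primers and two phyla.
toy_failure = {
    "primer_1": {"PhylumX": 0.01, "PhylumY": 0.0},
    "primer_2": {"PhylumX": 0.03, "PhylumY": 1.0},
}
summary = phylum_summary(toy_failure)
# Rank phyla from most to least easily detected (lowest mean failure first).
for phylum, (avg, sd) in sorted(summary.items(), key=lambda item: item[1][0]):
    print(phylum, round(avg, 3), round(sd, 3))
```

A large standard deviation relative to the mean, as for the second toy phylum, corresponds to the unstable coverage described for Planctomycetes and TM7.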
The performance of known primers was also assessed. Of the top three phyla, Firmicutes and Proteobacteria were most easily detected with the known primers (Fig. S2). A minor phylum, Deferribacteres, was the phylum best covered by the known primers, with the lowest average rate (0.45%) of failed detections, followed by Deinococcus and Acidobacteria. This finding suggests that the 16S rDNA sequences collected previously from the RDP and GenBank were less biased toward certain phyla. However, the usefulness of the known primers for Verrucomicrobia was limited, and half the known primers showed >50% failed detections, perhaps reflecting the lack of representatives of this phylum when the primers were designed. Among the known primers, U515 and E517 are highly recommended in light of their wide spectrum of perfect coverage. E1099F also had an overall high coverage rate, although it failed to detect most of the Planctomycetes (Fig. S2).
Assessment of Cyanobacteria-specific primers
The above results are useful for studies that focus on a specific phylum. By designing primers for a phylum of interest, only the 16S rDNA of the desired bacterial species is amplified for subsequent studies. We examined three Cyanobacteria-specific primers, CYA106F, CYA359F, and CYA781R. The coverage rate for all Eubacteria was 31.7% for primer CYA106F, 7.4% for CYA359F, and 2.3% for CYA781R. We classified the identified bacterial species and found that CYA106F was not specific for the Cyanobacteria. CYA106F, CYA359F, and CYA781R could be used to identify 80%, 98%, and 92%, respectively, of the 4655 cyanobacterial sequences in our collection. Moreover, CYA106F and CYA359F had many targets in Firmicutes: 75% of 94475 Firmicutes sequences were targets of CYA106F and 9% were targets of CYA359F. However, CYA781R had an extremely low coverage rate (0.001%) in Firmicutes. An appropriate combination of forward and reverse primers could therefore avoid generating a mixture of amplicons from Firmicutes. These primers, although designed from an earlier database collection, are still useful today.
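The point about primer combinations can be made concrete: an amplicon is expected only from sequences that carry targets for both the forward and the reverse primer, so a reverse primer with almost no off-target hits limits what the pair amplifies. The sketch below uses hypothetical sequence identifiers and hit sets, not data from the study.

```python
from collections import Counter


def pair_coverage(forward_hits, reverse_hits, phylum_of):
    """Per-phylum fraction of sequences matched by BOTH primers of a pair.

    forward_hits, reverse_hits: sets of sequence IDs matched by each primer.
    phylum_of: {sequence_id: phylum_name} for every sequence in the dataset.
    """
    both = forward_hits & reverse_hits          # sequences expected to amplify
    totals = Counter(phylum_of.values())        # dataset size per phylum
    amplified = Counter(phylum_of[s] for s in both)
    return {phylum: amplified[phylum] / n for phylum, n in totals.items()}


# Hypothetical toy dataset: three cyanobacterial and two Firmicutes sequences.
phylum_of = {
    "cy1": "Cyanobacteria", "cy2": "Cyanobacteria", "cy3": "Cyanobacteria",
    "fi1": "Firmicutes", "fi2": "Firmicutes",
}
forward_hits = {"cy1", "cy2", "fi1", "fi2"}   # forward primer also hits Firmicutes
reverse_hits = {"cy1", "cy2", "cy3"}          # reverse primer is phylum-specific

print(pair_coverage(forward_hits, reverse_hits, phylum_of))
```

In this toy case the non-specific forward primer matches every Firmicutes sequence, yet the pair amplifies none of them because the reverse primer has no Firmicutes targets.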
Distance of the primers to variant regions of 16S rRNA genes
We put the predicted and known primers onto the same map to compare their relative distances to the 16S rRNA variant regions. Three of these regions (V3, V5, and V6) in E. coli are shown in . The primers were concentrated in six narrow regions, spanning the three variant regions. For those primers with high coverage rates, the predicted and known primers overlapped strongly. The “hot” regions where the primers bind were: 321–364, 505–539, 783–806, 884–939, 947–984, and 1045–1081. The sizes of the amplicons from the V3 region and V5–V6 region were about 180 nt and 270 nt, respectively. Both could be completely sequenced with the 454 FLX platform.
Primer distributions and distances to variant regions.
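As a rough consistency check on the amplicon sizes, the listed hot regions can be used to bracket the possible product lengths. The sketch below assumes that the V3 amplicon is flanked by the 321–364 and 505–539 regions and the V5–V6 amplicon by the 783–806 and 1045–1081 regions; this pairing is an assumption, since the exact primer pairs are not specified here, and the function names are hypothetical.

```python
def span(start, end):
    """Length in nucleotides of an inclusive coordinate range."""
    return end - start + 1


def amplicon_bounds(forward_region, reverse_region):
    """Bracket the product length for primers placed inside two hot regions.

    Returns (insert_only, outermost): the length between the two regions
    (excluding both), and the length from the first base of the forward
    region to the last base of the reverse region.
    """
    insert_only = span(forward_region[1] + 1, reverse_region[0] - 1)
    outermost = span(forward_region[0], reverse_region[1])
    return insert_only, outermost


# Hot regions as listed in the text (E. coli numbering).
HOT = [(321, 364), (505, 539), (783, 806), (884, 939), (947, 984), (1045, 1081)]

print("V3 (assumed pairing):", amplicon_bounds(HOT[0], HOT[1]))
print("V5-V6 (assumed pairing):", amplicon_bounds(HOT[2], HOT[5]))
```

Under these assumptions the brackets are 140 to 219 nt for V3 and 238 to 299 nt for V5–V6, which contain the reported sizes of about 180 nt and 270 nt.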