Using the GS-FLX platform (454 Life Sciences), we recently described the first cpn60-based phylogenetic analysis by deep sequencing (49), comparing cpn60-based to 16S rRNA gene variable region 2 and 3-based phylogenies in matched samples. While the overall structure of the microbiota was similar, resolution (i.e., the number of distinct phylotypes detected per sample) was greatly increased by using cpn60 UT (49). Our findings indicate that cpn60 UT is ideally suited to deep sequencing using next-generation platforms.
The main advantage of defining consensus sequences using an assembly method, as opposed to selecting a representative sequence after clustering, is that the consensus sequence is the OTU summarizing all of the reads in each assembly for phylogenetic inference. This approach allows visual inspection of the aligned sequences used to create the consensus sequence. The impact of sequencing error in OTU assignment may also be mitigated by using a consensus-based approach, although this possibility was not formally evaluated.
Extensive sequence heterogeneity may, in part, represent technical error arising from sequencing or sequence assembly. However, heterogeneity was more extensive in some phylotypes than in others. Extensive Gardnerella heterogeneity was first observed in clone libraries and isolates and confirmed by deep sequencing. A similar observation in three independent data sets indicates that deep-sequencing error is unlikely to be the sole explanation for these differences.
Distinct G. vaginalis biotypes have been described in early culture-based studies, as well as more recently (24, 43), and recent genome level analyses indicate extensive genomic rearrangement in this organism (58). A recent study of HIV+ African women detected four distinct 16S rRNA gene v6-based G. vaginalis phylotypes (29), and we have recently observed similar patterns based on cpn60 UT sequences from published G. vaginalis genomes. The possible biological significance of the heterogeneity observed in this study is unknown.
Since no unambiguous rule for collapsing OTUs was established, we decided that the OTU as defined by newbler assembly was the most direct and reproducible basis for OTU definition. Whether this set of OTUs over- or underestimates the true sequence diversity was not established in this study; however, potential errors in OTU calling are restricted to the highest resolution (i.e., at branch tips) and are therefore unlikely to affect the study's main conclusions.
As in our previous study, we observed differences in the overall proportion of specific OTUs in clone libraries versus deep-sequencing reads. Although we used the same primers to amplify the cpn60 UT for either method, different molecular and bioinformatic biases associated with the two methods are likely. Also, clone libraries were generated in a smaller group (n = 10) that overlapped with but was not entirely a subset of the individuals selected for deep sequencing (n = 44). Therefore, differences in relative abundances may be related to methodological biases and/or different individuals in these subsets.
Small sequence differences between similar OTUs defined in clone libraries versus pyrosequencing may reflect actual differences between closely related strains or may result from different biases associated with the generation of consensus sequences using different software and data sets for clones (Gap4 using the full-length cpn60 UT) versus pyrosequencing reads (newbler using reads ranging from 150 to 400 bp). A closer look reveals that all OTUs found only in clone libraries are very close to OTUs found in reads. This question was formally addressed by comparing total branch length based on OTUs from reads (16.5 base substitutions per site or bsps), clones (5.1 bsps), and isolates (6.1 bsps). The total branch length when adding clone-only OTUs to read OTUs was 19.1 bsps. Therefore, the amount of unique branch length contributed by clone-only OTUs is 2.6 bsps.
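The branch-length comparison above reduces to simple arithmetic, which can be sketched as follows. This is a minimal illustration using only the totals quoted in the text; the function names are hypothetical, and expressing the gain as a share of the combined tree is an added convenience rather than a calculation reported in the study.

```python
# Total branch lengths quoted in the text, in base substitutions per site (bsps).
READS_BSPS = 16.5              # OTUs from pyrosequencing reads only
READS_PLUS_CLONES_BSPS = 19.1  # read OTUs plus clone-only OTUs


def unique_branch_length(combined: float, baseline: float) -> float:
    """Branch length added to the baseline tree by the extra OTUs."""
    return combined - baseline


def share_of_combined(combined: float, baseline: float) -> float:
    """Added branch length as a fraction of the combined tree (illustrative)."""
    return unique_branch_length(combined, baseline) / combined


unique = unique_branch_length(READS_PLUS_CLONES_BSPS, READS_BSPS)
share = share_of_combined(READS_PLUS_CLONES_BSPS, READS_BSPS)

print(f"Unique branch length from clone-only OTUs: {unique:.1f} bsps")
print(f"Share of combined tree: {share:.1%}")
```

Under these figures, the clone-only OTUs account for a minority of the combined branch length, consistent with the observation that they sit close to OTUs already found in reads.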
The large number of OTUs with species level designations in the cpnDB contrasts with recent 16S rRNA gene-based molecular studies that largely describe the vaginal microbiota at the genus or higher taxonomic levels (29, 34, 46, 51). Our findings indicate that the higher resolution of the protein-encoding cpn60 UT allows greater separation between closely related organisms than other conventional targets do (49). We report 55 species in 29 genera from 200-bp assemblies representing just over 600,000 reads, compared with 23 species in 10 genera reported from over 12,000,000 72-bp Illumina reads of 16S rRNA gene v6 in a much larger group of African women (29).
The other studies report virtually no species level information. For example, all studies report that Prevotella was observed in most individuals; however, none report species level designations for this genus. In contrast, our study detected nine Prevotella species (P. amnii, P. bergensis, P. bivia, P. buccalis, P. corporis, P. disiens, P. melaninogenica, P. timonensis, and P. zoogleoformans) and three Porphyromonas species (P. asaccharolytica, P. gingivalis, and P. uenonis) in the phylum Bacteroidetes with an average of 96.8% identity.
The phylogeny of deep-sequencing OTUs shown in the context of clone libraries and isolate sequences demonstrates that the overall composition of the vaginal microbiota in this group is strikingly consistent with what has been observed in other groups of women worldwide (see Table S1 in the supplemental material) (25, 46, 51). Since these women report a median of four commercial sex work clients per day (range, 1 to 10) and frequent postcoital douching (48), it is somewhat surprising that their vaginal microbiota should be so similar to that of other groups. This finding suggests a robustness of vaginal microbial communities despite frequent physical disturbance.
The hypothesis that HESN women have increased levels of specific types of Lactobacillus and/or reduced levels of specific types of BV-related organisms compared to those of HIV-N and HIV+ women was not supported. We originally estimated that a sample size of 15 individuals in each group would be sufficient to detect a 50% difference in the presence of a specific OTU; however, the high resolution of the present data set (number of OTUs defined) and the extensive variability observed between individuals make it difficult to draw firm conclusions based on such small numbers of individuals at a single time point. Longitudinal studies designed to determine the dynamics of the vaginal microbiota in relation to pro- and antiinflammatory mucosal factors may reveal distinct patterns in HESN individuals. The present study provides a solid framework on which to base future studies.
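The difficulty of detecting group differences with 15 individuals per group can be illustrated with a small exact power calculation. The text does not state how the original sample size estimate was made, so everything below is an assumption for illustration only: a two-sided Fisher's exact test, a significance level of 0.05, and a "50% difference" read as OTU prevalence of 25% in one group versus 75% in the other. The function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k positives among n individuals with prevalence p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def fisher_two_sided_p(a: int, b: int, n: int) -> float:
    """Two-sided Fisher exact p-value for a/n vs. b/n positives (equal groups)."""
    total = a + b
    denom = comb(2 * n, total)

    def prob(x: int) -> float:
        # Hypergeometric probability that group 1 holds x of the positives.
        return comb(n, x) * comb(n, total - x) / denom

    observed = prob(a)
    lo, hi = max(0, total - n), min(n, total)
    # Sum all tables at least as unlikely as the observed one (with tolerance).
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))


def exact_power(n: int, p1: float, p2: float, alpha: float = 0.05) -> float:
    """Probability of rejecting at level alpha, enumerating every outcome."""
    power = 0.0
    for a in range(n + 1):
        for b in range(n + 1):
            if fisher_two_sided_p(a, b, n) <= alpha:
                power += binom_pmf(a, n, p1) * binom_pmf(b, n, p2)
    return power


# Assumed scenario: prevalence 25% vs. 75%, 15 individuals per group.
print(f"Power with n = 15 per group: {exact_power(15, 0.25, 0.75):.2f}")
print(f"Power with n = 30 per group: {exact_power(30, 0.25, 0.75):.2f}")
```

This sketch considers a single OTU. With many OTUs tested at once and large between-individual variability, a multiple-testing correction would lower the effective power further, which is in line with the caution expressed above.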
Our results suggest that HIV+ women in this cohort have a vaginal microbiota, including increased E. coli and reduced L. crispatus levels, distinct from that of HESN or HIV-N women. However, a recent 16S rRNA gene-based survey of the vaginal microbiota in HIV+ women in Tanzania did not detect any individuals with dominant E. coli (29), indicating that this finding may be a specific characteristic of HIV+ women in this cohort, since most also receive daily prophylaxis with the antibiotic trimethoprim-sulfamethoxazole (Septrin) (48).
Using UniFrac PCA, we classified samples according to phylogenetic characteristics determined by deep sequencing, i.e., a “molecular definition” of BV. We found that samples with BV+ and BV− diagnoses by Nugent score were significantly different in terms of the richness, diversity, and abundance of Gardnerella, Bacteroidetes, Clostridiales, Lactobacillus, and Proteobacteria OTUs by deep sequencing, indicating an excellent correspondence between molecular and traditional definitions for these categories.
A cluster of samples with intermediate levels of Lactobacillus and dominant Gardnerella was identified (mBVI). These samples also had intermediate levels of Clostridiales, Bacteroidetes, and Proteobacteria abundance compared to those of BV+ and BV− samples. Although a discussion of BV dynamics remains speculative in the absence of longitudinal data, these samples hypothetically represent a transition stage between a Lactobacillus-dominated phenotype (BV−), characterized by a shift to Gardnerella dominance as levels of Lactobacillus fall, and gradual establishment of the Gardnerella-Prevotella-Clostridiales codominant phenotype (BV+). These findings are consistent with in vitro work suggesting that nutritional interactions between Gardnerella and Prevotella are likely to be involved in the establishment of BV (44).
By statistical analysis of the relative abundance of specific OTUs and k-means clustering techniques in the Genespring environment, we have defined several potential biomarkers of BV, including increased levels of Clostridiales clones NC030 and NC040, increased P. amnii, and reduced levels of L. iners. These OTUs are good candidates for rapid approaches to quantify fluctuations in specific bacterial populations associated with BV, as we have recently shown (17).
Surprisingly, several alpha- and betaproteobacterial OTUs, including the newly described species V. paradoxus, were found to be most abundant in BV− samples. Like many others, we have shown that increased Lactobacillus abundance is strongly tied to BV− status; however, no previous studies that we are aware of have described a non-Lactobacillus organism with increased abundance in BV− individuals. Further work is required to confirm this potentially important finding.
Understanding fluctuations in specific bacterial populations in the vagina and their influence on the susceptibility of the mucosal barrier to infection with HIV and other pathogens is critical to defining mechanisms underlying the well-known association between BV and HIV. The highly detailed species and strain level definition of the vaginal microbiota related to BV in this study provides a framework in which to track specific microbes in HIV-exposed individuals in order to address hypotheses about vaginal microbiology, mucosal resistance factors, immune quiescence, and resistance to HIV.