This study compared the genic contents of 17 S. pneumoniae strains. Genes from all strains were organized into orthologous clusters, and these clusters were quantified for all genomes. When the genomes were analyzed together, fewer than 50% of all the orthologous clusters (corresponding to ~73% of the total CDSs) were conserved among all strains. When the genomes of individual strains were evaluated, 21 to 32% of the orthologous clusters were noncore. Predictions using the finite supragenome model suggest that the total number of orthologous clusters in the S. pneumoniae species is around 5,100 and the total number of core orthologous clusters is around 1,380. These large strain differences illustrate the enormous genic diversity within this species, as postulated in the distributed-genome hypothesis (7). The engines driving this genomic plasticity are threefold: first, it has been demonstrated that chronic infections by nasopharyngeal pathogens are generally polyclonal in nature (11; J. R. Gilsdorf, presented at the 9th International Symposium on Recent Advances in Otitis Media, 3 to 7 June 2007); second, the bacteria in these chronic infections adopt a biofilm mode of growth, which greatly increases the kinetics of horizontal gene transfer (8); and third, S. pneumoniae employs highly energetic fratricidal as well as autocompetence and autotransformation mechanisms for the release and uptake of pneumococcal DNA, respectively, from the surrounding environment (35). Collectively, these phenomena result in a continual reassortment of genic characters among strains within a polyclonal biofilm infection. The pathological consequence is that the host's adaptive immune system continually encounters novel strains: the pathogen can generate diversity faster than the host can adapt to it, which makes clearance very difficult and ensures chronicity of infection.
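The core/noncore bookkeeping summarized at the start of this discussion can be illustrated with a minimal sketch. It assumes that the orthologous clusters are already available as a presence/absence table; the function names and the three-strain toy data are hypothetical and are not taken from the analysis pipeline used in this study.

```python
def classify_clusters(presence, strains):
    """Label each orthologous cluster as core, distributed, or unique.

    presence: dict mapping a cluster id to the set of strains carrying it.
    strains:  set of all strain names in the comparison.
    """
    labels = {}
    for cluster, carriers in presence.items():
        k = len(carriers & strains)
        if k == len(strains):
            labels[cluster] = "core"         # present in every strain
        elif k == 1:
            labels[cluster] = "unique"       # present in exactly one strain
        elif k > 1:
            labels[cluster] = "distributed"  # present in some, but not all
    return labels


def noncore_fraction(presence, labels, strain):
    """Fraction of the clusters carried by one strain that are not core."""
    carried = [c for c, carriers in presence.items() if strain in carriers]
    noncore = [c for c in carried if labels[c] != "core"]
    return len(noncore) / len(carried)


# Hypothetical toy data: three strains and four clusters.
strains = {"A", "B", "C"}
presence = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
    "c4": {"A", "B", "C"},
}
labels = classify_clusters(presence, strains)
print(labels)
print(noncore_fraction(presence, labels, "A"))
```

In this toy table, half of the clusters are core when the strains are pooled, whereas one-third of the clusters carried by strain A are noncore. The two quantities are computed over different denominators, which is why the pooled and per-strain percentages reported above differ.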
In a previous study, we constructed individual genomic libraries from the eight CGS S. pneumoniae clinical isolates (CGSSp9BS68, CGSSp14BS69, CGSSp11BS70, CGSSp3BS71, CGSSp23BS72, CGSSp6BS73, CGSSp18BS74, and CGSSp19BS75). Of the 4,793 clones sequenced, ~16% were not present in the TIGR4 reference strain, suggesting that many genes were not conserved across the species. In addition, the screen identified genes unrelated to any streptococcal sequences; analysis of the distribution of a subset of 58 of these showed that they were not uniformly distributed across the eight strains (37). These results are in complete agreement with this study; both studies underscore the genomic plasticity of S. pneumoniae.
The use of the finite supragenome model suggests that 99% of the orthologous clusters in the supragenome that have population frequencies equal to or higher than 0.1 can be identified after the sequencing of 33 strains and that the 17 available strains provide ~95% coverage of this set. When analyzing the S. agalactiae supragenome, Tettelin and colleagues presented a different mathematical model, generated using the assumption that noncore genes are sampled in the population with equal probabilities (43). Unlike the finite supragenome model, this model predicts that a constant number of new strain-specific genes will be identified with the addition of each genome, such that sequencing a limited number of strains would not provide major coverage of the supragenome.
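The intuition behind coverage estimates of this kind can be shown with a short calculation. The sketch below is not the finite supragenome model itself; it assumes only that strains are sampled independently and that a cluster is present in each strain with a fixed population frequency, and the function name is hypothetical.

```python
def detection_probability(freq, n_strains):
    """Probability that a cluster with population frequency `freq`
    is present in at least one of `n_strains` independently sampled strains."""
    return 1.0 - (1.0 - freq) ** n_strains


# Rarest clusters in the set considered above (frequency 0.1).
for n in (17, 33):
    print(n, round(detection_probability(0.1, n), 3))
# prints:
# 17 0.833
# 33 0.969
```

Under these assumptions, a cluster at exactly 0.1 frequency is the hardest member of the set to detect, and any cluster at a higher frequency is detected with higher probability. Coverage averaged over the whole set of clusters with frequencies of 0.1 or more is therefore expected to exceed these values.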
Our analysis includes clinical strains from multiple locations including the United States, the United Kingdom, Norway, and Spain. Diversity is generated from DNA exchange among strains; thus, it is tempting to consider that strains from the same geographical location may be more similar, since they have a higher probability of exchanging genetic information (directly or indirectly, via other strains). Interestingly, we did not observe this with our limited number of strains. While it is possible that a correlation between geographical distance and genic diversity will be observed when a larger number of strains from multiple geographic regions are sequenced and compared, we must nonetheless consider that this correlation may not exist. This result would be explained if the vast majority of the orthologous clusters in the S. pneumoniae supragenome have been in the species for a very long time, and horizontal transfer from other species and new mutations have introduced only a minority of the supragenome's orthologous clusters, or if the extent of human population migration is now so high (at least in the West) that human pathogens are essentially homogenized around the world.
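One way to test for a geographic signal is to compare pairwise genic differences within and between locations. The sketch below assumes that the genic difference between two strains is the number of orthologous clusters present in exactly one of them; the strain names, cluster identifiers, and locations are hypothetical toy values and are not data from this study.

```python
from itertools import combinations


def genic_differences(clusters_a, clusters_b):
    """Number of clusters present in exactly one of the two strains."""
    return len(clusters_a ^ clusters_b)


def mean_differences_by_location(cluster_sets, location):
    """Mean pairwise genic difference for same-location and
    different-location strain pairs, returned as (within, between)."""
    within, between = [], []
    for a, b in combinations(sorted(cluster_sets), 2):
        d = genic_differences(cluster_sets[a], cluster_sets[b])
        (within if location[a] == location[b] else between).append(d)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Hypothetical toy data: four strains from two locations.
cluster_sets = {
    "s1": {1, 2, 3},
    "s2": {1, 2, 4},
    "s3": {1, 5},
    "s4": {1, 2, 3, 5},
}
location = {"s1": "X", "s2": "X", "s3": "Y", "s4": "Y"}
print(mean_differences_by_location(cluster_sets, location))
# prints: (2.0, 2.5)
```

With only a handful of strains per location, as in this study, a difference between the two means of this size would not by itself be evidence of a correlation; a permutation test over the location labels would be one way to assess it once more genomes are available.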
This enormous genetic diversity calls attention to the need for markers of human virulence phenotypes and highlights the potential difficulty associated with this task. S. pneumoniae strains are presently categorized based on capsule type and MLST. The capsular serotype is an important virulence factor and affects the ability of pneumococci to cause invasive disease (2). For example, the difference in virulence between type 2 D39, which is highly virulent in the murine model of infection, and unencapsulated R6, which is avirulent, is attributed to the loss of the capsule. However, it is critical to remember that even within the same capsular type, virulence is highly related to the genetic background of the strains (20). The virulence phenotypes displayed by the eight strains isolated in Pittsburgh differ significantly in a chinchilla model of S. pneumoniae infection; these differences may be due to distinct serotypes, genotypes, or both (M. Forbes and J. Hayes, personal communication). Our data in Fig. clearly show that the serotype cannot be correlated with the genic content, since strains of serotypes 14, 6, and 23 were not grouped based on genic differences. An analysis of the type 6A and 6B capsular biosynthetic loci found that sequence variation was related to the MLST profile, yet there was also ample evidence of horizontal transfer to unrelated lineages (27). Phylogenetic trees using MLST from the seven CGS clinical strains of known MLST types did not closely resemble the phylogenies created from genic differences (data not shown). Together these data suggest that in some cases, the serotype, MLST type, and/or genetic background may correlate, but in other cases, they do not, as would be expected from strains undergoing high rates of intraspecies horizontal gene transfer. Since pathogenesis is probably a consequence not only of capsular type but also of multiple other genes, MLST type and serotype alone are not ideal markers for the disease phenotype of S. pneumoniae.
Previous work on six strains of Streptococcus agalactiae described a supragenome with ~80% core orthologous clusters and the remaining set consisting of partially shared and strain-specific orthologous clusters (43). In addition, these data resemble, in qualitative and quantitative terms, our comparison of 13 H. influenzae strains. The total number of H. influenzae orthologous clusters for the 13 strains was 2,786, of which 52% were core, 29% were distributed, and 19% were unique. Taken together, these studies suggest that a high degree of genic variation is common among multiple species. However, it may not be universal; analyses of eight Bacillus anthracis strains revealed substantially less variation among strains, with no new genes uncovered after analysis of only four genomes (43). It is possible that this degree of variation is intrinsic to naturally transforming bacteria such as H. influenzae and S. pneumoniae, which undergo extensive DNA recombination events. In addition, both of these bacteria exist exclusively in the human mucosa, where they form biofilms (15). Cells in a biofilm are embedded in an extracellular polymeric matrix that is rich in nucleic acids; thus, biofilms may provide ideal environments to foster such genomic plasticity (32). There is also quantitative similarity, since both species have core genomes that seem to stabilize around 1,400 proteins, or ~50% of the supragenome. This similarity suggests that similar evolutionary forces may be determining the equilibrium between core genes, noncore genes, and genome size.
This diversity suggests caution in the use of model strains to test and develop vaccines and drugs, since effective targets in one strain may be missing in a significant percentage of the other strains. It is probable that these bacteria have evolved multiple and redundant mechanisms to evade immunity and adapt to variations among hosts and their commensal microbiota.