The α-proteobacteria represent one of the most diverse bacterial subdivisions, displaying extreme variations in lifestyle, geographical distribution and genome size. Species for which genome data are available have been classified into a species tree based on a conserved set of vertically inherited core genes. By mapping the variation in gene content onto the species tree, genomic changes can be associated with adaptations to specific growth niches. Genes for adaptive traits are mostly located in ‘plasticity zones’ in the bacterial genome, which also contain mobile elements and are highly variable across strains. By physically separating genes for information processing from genes involved in interactions with the surrounding environment, the rate of evolutionary change can be substantially enhanced for genes underlying adaptation to new growth habitats, possibly explaining the ecological success of the α-proteobacterial subdivision.
The Proteobacteria represent the largest bacterial group that is currently recognized in the domain Bacteria. The name ‘proteobacteria’ is derived from the Greek god Proteus, who had the ability to change shape, to indicate that the proteobacterial species come in many forms, colours and shapes. However, the diversity of this group is not restricted to shape: extreme variation is also observed in lifestyle, metabolic capacity and ecological significance.
Over the past decade, the genome sequences of nearly 100 α-proteobacterial species have been determined, revealing an astonishing plasticity in genome size and architecture. In this paper, we will reflect on the diversity of the α-proteobacterial genomes and the coherence of the α-proteobacterial lineage in the light of all this genetic flexibility. We will argue that the α-proteobacteria represent an excellent model system for studying bacterial genome evolution because of the ease with which they adapt to new growth habitats. In this respect, the α-proteobacteria represent the bacterial equivalents of the Darwin finches.
Bacterial classification schemes are mostly based on the sequence diversity of the rRNA genes because of their universal presence and slow rate of evolution, enabling inferences of deep divergences. Likewise, core genes involved in information processing such as transcription and translation tend to evolve by vertical descent and are frequently used to delineate species relationships. Using the latter approach, a species tree has been inferred for all α-proteobacterial species with a sequenced genome (figure 1; Williams et al. 2007).
By mapping the growth habitat of each organism onto the species tree (figure 1), major environmental shifts can be inferred. This is of interest, since the α-proteobacteria are found in all imaginable habitats, ranging from the ocean floor to volcanic environments, in many of which they represent the most abundant bacterial group (Dutilh et al. 2008). For example, nitrogen-fixing members of the Rhizobiales are highly abundant in the soil where they interact with plant roots, whereas the SAR11 clade and the Rhodobacterales account for as much as 30–50% of all bacteria in the ocean surface waters (Giovannoni et al. 2005). Adaptations to vertebrate and invertebrate hosts have occurred several times independently (Batut et al. 2004), with up to 76 per cent of all arthropods and most filarial nematodes being infected with Wolbachia (Bourtzis & Miller 2003). Since the species composition pattern is characteristically different for each environment, we conclude that the ecological abundance of the α-proteobacteria is not simply an effect of ‘everything is everywhere’, but rather a reflection of highly specific adaptation patterns.
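The idea of inferring environmental shifts from a tree can be illustrated with a minimal sketch of Fitch parsimony, which counts the smallest number of habitat changes needed to explain the habitats observed at the tips. The tree shape, taxon names and habitat labels below are hypothetical and are not taken from figure 1. Parsimony is used here only as one simple way to do such a mapping, and no claim is made that it is the method behind the published tree.

```python
# Toy Fitch parsimony: minimum number of habitat shifts on a rooted binary tree.
# Tree, taxon names and habitats are hypothetical illustration data.

def fitch(node, habitats):
    """Return (candidate_states, shifts) for the subtree rooted at `node`.

    A leaf is a string naming a taxon; an internal node is a (left, right) tuple.
    """
    if isinstance(node, str):
        return {habitats[node]}, 0
    left_states, left_shifts = fitch(node[0], habitats)
    right_states, right_shifts = fitch(node[1], habitats)
    shared = left_states & right_states
    if shared:
        # Both subtrees agree on at least one state: no shift is needed here.
        return shared, left_shifts + right_shifts
    # No state in common: one shift must have occurred below this node.
    return left_states | right_states, left_shifts + right_shifts + 1

tree = ((("taxon_A", "taxon_B"), "taxon_C"), ("taxon_D", "taxon_E"))
habitats = {
    "taxon_A": "intracellular",
    "taxon_B": "intracellular",
    "taxon_C": "soil",
    "taxon_D": "marine",
    "taxon_E": "marine",
}

root_states, min_shifts = fitch(tree, habitats)
print(min_shifts)  # 2
```

In this toy case, two shifts suffice to explain three habitats across five taxa, and the ancestral habitat at the root remains ambiguous.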
If every gene in the α-proteobacteria evolved by vertical descent, all genes would support the same tree topology. This is obviously not the case: only 33–97% of all genes have their most similar homologue in another α-proteobacterial species (Esser et al. 2007). Furthermore, these genomes show a 10-fold variation in genome size, ranging from 1 to more than 9 Mbp (figure 1), with species-specific genes counting in the thousands (Boussau et al. 2004). Thus, we may ask: are all genes equally flexible and how does this variability correlate with adaptations to specific growth niches?
Present in all species, including even the smallest genomes, are approximately 200 genes for DNA, RNA and protein syntheses and another 40 genes for nucleotide and cofactor biosyntheses (Boussau et al. 2004). Gene content statistics for different functional categories show weak increases in gene number with genome size for genes involved in basic information processes, such as transcription and translation (approx. 4 genes Mbp−1; Boussau et al. 2004). Much more dramatic increases are observed for genes involved in adaptability processes, such as energy metabolism, transport and regulatory functions (approx. 80–100 genes Mbp−1), with some of the smallest genomes having virtually no regulatory genes (Boussau et al. 2004).
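To give a feel for what these slopes imply, the following sketch treats the quoted values as simple linear rates and applies them across the 1 to 9 Mbp size range mentioned earlier. Linearity over the whole range is an assumption made only for this illustration, and the function name is hypothetical.

```python
# Rough arithmetic on the quoted slopes, assuming gene number scales
# linearly with genome size over the whole range (an illustrative assumption).

def extra_genes(rate_per_mbp, small_mbp, large_mbp):
    """Expected difference in gene number between two genome sizes."""
    return rate_per_mbp * (large_mbp - small_mbp)

SMALL_MBP, LARGE_MBP = 1.0, 9.0

information = extra_genes(4, SMALL_MBP, LARGE_MBP)           # approx. 4 genes per Mbp
adaptability_low = extra_genes(80, SMALL_MBP, LARGE_MBP)     # lower bound, 80 genes per Mbp
adaptability_high = extra_genes(100, SMALL_MBP, LARGE_MBP)   # upper bound, 100 genes per Mbp

print(information)                          # 32.0
print(adaptability_low, adaptability_high)  # 640.0 800.0
```

Under this assumption, a 9 Mbp genome would carry only about 30 more information-processing genes than a 1 Mbp genome, but several hundred more genes at the adaptability rate.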
We anticipate these variations in gene content to reflect the quality and stability of the growth environment. Indeed, by mapping the fluctuations in genome size onto the species tree, periods of massive genome reduction and expansion can be identified with hundreds of genes lost and gained in direct association with important environmental shifts (Boussau et al. 2004). For example, bacteria adapted to the cytoplasm of eukaryotic host cells, which is a relatively static and rich growth environment, have evolved towards small genomes, whereas free-living bacteria adapted to soil, which is more variable in nutritional quality, have genomes of much larger sizes. Thousands of genes have been eliminated and thousands of genes have been acquired. Below, we discuss the mechanisms underlying these dramatic changes in gene content.
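Once gene content has been assigned to an ancestral node and to one of its descendants, the genes gained and lost along the connecting branch follow from a simple set comparison. The sketch below assumes that the ancestral gene set is already known, although in practice it has to be inferred. The gene family names are hypothetical placeholders.

```python
# Gains and losses along one branch, given gene sets for its two ends.
# The ancestral set is assumed to be known; all gene names are hypothetical.

def branch_changes(ancestor_genes, descendant_genes):
    """Return (gained, lost) gene sets for the branch ancestor -> descendant."""
    gained = descendant_genes - ancestor_genes
    lost = ancestor_genes - descendant_genes
    return gained, lost

ancestor = {"core_1", "core_2", "biosynthesis_1", "biosynthesis_2", "regulator_1"}
host_adapted = {"core_1", "core_2", "transporter_1"}

gained, lost = branch_changes(ancestor, host_adapted)
print(sorted(gained))  # ['transporter_1']
print(sorted(lost))    # ['biosynthesis_1', 'biosynthesis_2', 'regulator_1']
```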
Gene loss has been a dominant process in the evolution of the intracellular members of Rickettsiales (figure 2a), which have single circular genomes in the 1–1.5 Mbp range (Darby et al. 2007). Among genes discarded in Rickettsia are those involved in the biosynthesis of metabolites that are instead imported from the nutritionally rich cytoplasm of the eukaryotic host cell, such as vitamins and amino acids (Darby et al. 2007; Fuxelius et al. 2007). This reductive mode of evolution is still ongoing, as manifested in high fractions of fragmented genes and non-coding contents of up to 30 per cent (Fuxelius et al. 2008). Although weakly mutated genes may still be expressed, a majority are severely degraded and probably non-functional (Darby et al. 2007).
Genomic streamlining is observed in the SAR11 clade, here represented by Pelagibacter ubique (figure 2a). This genome has also suffered from gene loss, but in contrast to the Rickettsiales genomes it contains neither pseudogenes nor remnants of mobile elements. With a genome size of 1.3 Mbp, and an intergenic distance of only 3 bp, this is the most compact bacterial genome sequenced to date (Giovannoni et al. 2005). It is suggested that genomic streamlining in P. ubique is the result of selection on genome size per se to increase the surface-to-volume ratio and thereby facilitate uptake of nutrients that are limiting in the oceans, such as phosphorus and nitrogen.
Gene transfer to the nuclear genome of the host may also contribute to genome reduction in obligate host-associated bacteria (figure 2a). For example, bacterial genes from Wolbachia have been discovered in the nuclear genomes of their fly hosts; it is not yet known whether these transferred genes contribute bacterial functions (Hotopp et al. 2007). Such gene transfers underlie the reductive evolution of the mitochondrial genome, which is derived from an endosymbiotic α-proteobacterium (Kurland & Andersson 2000).
In addition to the chromosome, many species of the Rhodobacterales contain auxiliary replicons up to several megabase pairs in size. Transfer of genes located on auxiliary replicons may lead to rapid expansion of the gene pool (figure 2b). Some of these replicons, also called megaplasmids, contain plasmid-like replication systems and were presumably once derived from plasmids (Cevallos et al. 2008). The megaplasmids are extremely variable among strains (Giuntini et al. 2005) and contain mostly genes that confer adaptive traits, thereby facilitating niche specificity. As such, they represent a playground for evolution, where selection may rapidly drive any gene circulating with the conjugation system to fixation.
The auxiliary replicons may also integrate into the main chromosome (figure 2b), as observed in Bartonella (Alsmark et al. 2004) and Bradyrhizobium (Viprey et al. 2000). Such integration events serve to stabilize the auxiliary gene pool, thereby preventing its loss from the population. Most α-proteobacterial genomes also contain genomic islands that carry genes encoding important adaptive traits that can integrate, excise and be transferred across species (Sullivan & Ronson 1998).
Finally, host-adapted bacterial genomes may expand in size by the acquisition of genes from the host (figure 2b). For example, ‘eukaryotic-like’ genes encoding proteins with tetratricopeptide and ankyrin (ANK) repeat domains are present in several members of the Rickettsiales (Wu et al. 2004; Cho et al. 2007; Klasson et al. 2008). By analogy with the secretion of ANK proteins in Legionella and Coxiella (Pan et al. 2008), it is thought that the ANK proteins in the Rickettsiales are secreted and used to modulate the functions of the host cell.
The difference in size among α-proteobacterial genomes reflects adaptations to different environments, just as the different sizes of the beaks of the Darwin finches reflect adaptations to different food sources. By placing the genomic characteristics within a phylogenetic framework, the association between genotypic and phenotypic diversities can be explored (figure 1). Two opposing mechanisms explain most of the genome size variations observed: loss of genes in obligate host-adapted bacteria (figure 2a) and gain of genes in free-living bacteria (figure 2b). Most of the variability in gene content is associated with genes conferring adaptive traits, with bacteria living in isolated and rich environments having a smaller auxiliary gene pool than bacteria adapted to environments that frequently change in nutritional quality. This variability is largely confined to the mobile gene pool, so that genes for adaptive traits can be amplified and exchanged upon demand without compromising the stability of the vertically inherited core genes.
The acquisition of new adaptive traits in bacterial genomes is rarely due to single nucleotide polymorphisms. Much more dramatic changes, such as deletions, duplication–divergence and de novo acquisitions of genes, need to be invoked. These changes are often associated with repetitive sequences, i.e. regions that are often not well resolved in eukaryotic draft genome sequences. Here, there may be an important lesson for future genomic studies of adaptive traits in eukaryotes, such as the evolution of different beak sizes in the Darwin finches.
T.J.G.E. acknowledges the support from a Marie Curie Intra-European fellowship by the European Union. S.G.E.A. acknowledges the support of the Swedish Research Council, the Swedish Foundation for Strategic Research, the European Union, the Göran Gustafsson Foundation, and the Knut and Alice Wallenberg Foundation.
One contribution of 11 to a Special Feature on ‘Whole organism perspectives on understanding molecular evolution’.