The genomes of obligate bacterial symbionts are highly reduced in size and gene repertoires due to a combination of factors. First, the symbiotic lifestyle in a nutrient-rich host environment renders numerous genes superfluous, allowing the inactivation of many previously functional regions. In addition, the dynamics of symbiont transmission to new hosts involve severe restrictions in population size and impose clonality, thereby reducing the efficacy of selection and fostering the accumulation of deleterious mutations. When combined with the pervasive mutational bias in bacteria, in which deletions outnumber insertions, regions that are not under strong selective constraints erode and are eventually lost, leading to small and compact genomes.
Due to difficulties in defining the variety or size of functional elements that might occur within intergenic spacers, the most common comparative methods, such as Ka/Ks ratio tests, are not useful for differentiating spacers (or those portions of spacers) that are functional from those that are inert. This problem is further compounded by the extreme AT-richness of symbiont genomes, which can cause erroneous results from motif-finding and structural algorithms. Therefore, we tested the degree of conservation for a series of short sequences (k-mers) across orthologous regions from Buchnera aphidicola of varying degrees of phylogenetic relatedness (Figure S1). To enhance the strength and validity of these tests, we generated complete Buchnera genome sequences for two aphid species (Buchnera-Ua and Buchnera-Ak), which provided information at intermediate levels of relatedness.
Based on the presence of identical k-mers within orthologous regions across genomes, intergenic spacers (IGSs) contain an excess of conserved k-mers relative to protein-coding regions, indicating most IGSs contain some type of functional elements. Because these analyses require that k-mers be identical, many of the functional regions within IGSs are considerably longer than the associated k-mers but do not show perfect conservation along their entire lengths. Also, we found that conserved k-mers are often located near one another in the same IGS (k-mer blocks), suggesting that they are parts of the same functional element.
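The idea of identical k-mers shared across orthologous spacers, and of grouping neighbouring conserved k-mers into blocks, can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function names, the value of k, and the `max_gap` threshold are all hypothetical, and the sketch ignores positional alignment between genomes (it only asks whether a k-mer occurs anywhere in each ortholog).

```python
from typing import List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_kmer_positions(reference: str, orthologs: List[str], k: int) -> List[int]:
    """Start positions in `reference` of k-mers present identically in every ortholog."""
    shared = kmers(reference, k)
    for seq in orthologs:
        shared &= kmers(seq, k)
    return [i for i in range(len(reference) - k + 1) if reference[i:i + k] in shared]


def kmer_blocks(positions: List[int], k: int, max_gap: int) -> List[Tuple[int, int]]:
    """Merge conserved k-mers separated by at most max_gap bases.

    Returns (start, end) intervals with an exclusive end coordinate.
    `positions` must be sorted in ascending order.
    """
    blocks: List[Tuple[int, int]] = []
    for pos in positions:
        start, end = pos, pos + k
        if blocks and start - blocks[-1][1] <= max_gap:
            # Close enough to the previous block: extend it.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks


# Toy sequences (invented for illustration, not Buchnera data).
reference_igs = "TTTTGACCGTAAAAAGGCATCTTT"
ortholog_igs = "CCGACCGTCCCCGGCATCCC"
hits = conserved_kmer_positions(reference_igs, [ortholog_igs], k=6)
blocks = kmer_blocks(hits, k=6, max_gap=5)
```

With these toy inputs, two conserved 6-mers separated by five bases are merged into a single block, mirroring the interpretation that nearby conserved k-mers may belong to one longer, imperfectly conserved element.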
Orthologous IGSs exhibit not only sequence conservation, as reflected in the elevated numbers of identical k-mers, but also substantial conservation of length across Buchnera genomes. The similarity in spacer length among Buchnera lineages could be attributable either to selection on the functional elements within spacers or simply to shared ancestry, such that genomes retain ancestral spacer lengths due to lack of time for mutations affecting length to occur. However, the latter explanation is excluded by the observation that DNA from inactivated functional elements is largely eliminated across the time scales corresponding to divergence of these lineages (up to 70 MYA). For example, along the lineage leading to Buchnera-Ua, DNA for 36 genes or pseudogenes was eliminated, with only 13 pseudogenes recognizable in the genome. Thus, the conservation of spacer lengths is largely attributable to functional constraints.
Taken as a whole, our analyses established that at least 201 of the 336 IGSs of at least 50 nucleotides in length encode functional elements. Some of those remaining might harbor functional sequences that do not rely on conserved motifs; for example, the standard sigma-70 binding sites (RpoD sites) have a relatively weak and AT-rich consensus sequence in E. coli (nnTAtAaT). However, many IGSs probably consist of decaying pseudogenes. We note that the IGSs that do not contain conserved k-mers have more often undergone changes in length during the divergence of Buchnera-Ap and Buchnera-Ak, as expected for functionally inactive regions (Fisher's r
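The comparison above (IGSs with or without conserved k-mers, against IGSs whose length did or did not change) has the shape of a 2×2 contingency table. The statistic reported in the text is truncated, so the following is only a sketch under the assumption that a two-sided Fisher's exact test is an appropriate way to examine such a table. The function name is hypothetical and the counts in the usage line are a toy example, not values from this study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
      rows    = IGS with conserved k-mers / IGS without conserved k-mers
      columns = length changed / length unchanged
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Toy counts only; not data from the paper.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
```

For real analyses, `scipy.stats.fisher_exact` provides the same test; the pure-Python version is shown only to make the calculation explicit.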
IGS features in the Buchnera-ApTokyo genome.
Lengths of orthologous IGSs in Buchnera-Ap and Buchnera-Ak.
Our analyses focused on a cluster of Buchnera corresponding to aphids in the Aphidinae and including the focal species, Buchnera-Ap. This depth of comparison provided sufficient divergence to detect conserved elements not due to recent shared ancestry. Searching for conserved IGS regions in the more distant genomes of Buchnera-Cc and Buchnera-Bp did reveal some of the same elements. However, these were relatively few due to the reduction in number of clearly identifiable orthologous IGSs (reflecting divergence in gene repertoires) and to the lack of strict conservation of sequence for stretches corresponding to k-mer lengths. The intermediate level of comparison enabled by the newly sequenced genomes was critical to detecting conserved elements.
In addition to the variation observed in intergenic spacers, the process of genome reduction is also expected to cause differences in the gene catalogs and the pseudogene contents of these genomes. Among Buchnera from aphids in the subfamily Aphidinae, the newly sequenced Buchnera-Ua encodes the fewest protein-coding genes. Certain of these gene losses in Buchnera-Ua may reflect changes in its nutritional ecology related to the host plant, a greater reliance upon the host, or the presence of an additional symbiont. The phloem sap ingested by U. ambrosiae feeding on one of its host plants (Tithonia fruticosa) contains very high amounts of arginine (25% of free amino acids); elevated arginine in the diet potentially has led to relaxation of selection for the maintenance of ornithine biosynthesis, resulting in the loss of that pathway in Buchnera-Ua. Alternatively, these gene losses (argECB, argA, argD) may be influenced by increased metabolic cooperation between U. ambrosiae and Buchnera-Ua, in that the activity of an aphid-derived ornithine aminotransferase (EC 2.6.1.13), involved in analogous functionality, was recently demonstrated to be up-regulated in the bacteriocytes of A. pisum. The Buchnera-Ua genome also has lost the genes for pantothenate biosynthesis (panBC), possibly due to the transfer from dependence on Buchnera for pantothenate provisioning to dependence on another bacterial symbiont, Hamiltonella defensa, which is universally present in U. ambrosiae and closely related Uroleucon from North America. This Uroleucon-associated strain of H. defensa appears to be a stable coevolving symbiont of this clade of Uroleucon species, along with Buchnera (P. Degnan, unpublished).
Ongoing gene erosion in Buchnera has resulted in 15 convergent cases of gene inactivation and loss along independent lineages of the Aphidinae. Although several of these events involve highly degraded or deleted genes (e.g., ansA, hflD, hns), nine involve inactivating mutations generated by an indel in a homopolymeric tract. Such mutations are common in endosymbiont genomes due to their highly biased base compositions and are commonly interpreted as ‘recent’ gene inactivations. In fact, between 10 and 70% of disrupted genes identified in Buchnera genomes result from indels occurring within homopolymeric repeats. However, it has been demonstrated that mRNAs for an inactivated locus in Buchnera of the aphid Rhopalosiphum padi can be corrected by transcriptional slippage to yield functional proteins. This phenomenon has been suggested to play a role in regulating gene expression. Therefore, while some convergent gene loss may be the result of independent inactivation events, reflecting low functional constraint, it is plausible that some of these mutations provide an alternative means of gene regulation in Buchnera.
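To make the notion of an indel within a homopolymeric tract concrete, the sketch below locates runs of a single base in a coding sequence and checks whether an indel of a given size would shift the reading frame. It is an illustration only: the function names are hypothetical, and the minimum tract length (`min_len`) is an assumed threshold, since the text does not state how tracts were defined.

```python
import re
from typing import List, Tuple


def homopolymer_tracts(seq: str, min_len: int = 8) -> List[Tuple[int, int, str]]:
    """Return (start, length, base) for each run of one base at least min_len long.

    `min_len` is an assumed threshold, not a value taken from the study.
    """
    # A base followed by at least (min_len - 1) further copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), len(m.group(0)), m.group(1))
        for m in pattern.finditer(seq.upper())
    ]


def is_frameshifting(indel_len: int) -> bool:
    """An indel disrupts the reading frame unless its length is a multiple of 3."""
    return indel_len % 3 != 0


# Toy sequence (invented): a 9-base poly(A) tract inside a short ORF-like string.
toy_gene = "ATG" + "A" * 9 + "CGT" + "T" * 4 + "GG"
tracts = homopolymer_tracts(toy_gene, min_len=8)
```

A single-base insertion or deletion in such a tract gives `is_frameshifting(1) == True`, which is the kind of inactivating mutation that transcriptional slippage could, in principle, correct at the mRNA level.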
Many of the spacers that do not contain functional elements are pseudogenes in various stages of decay, including some newly identifiable on the basis of comparisons between Buchnera-Ak and Buchnera-Ua. Although the symbiosis between Buchnera and aphids has existed for more than 150 million years and the ancestral Buchnera already had a highly reduced genome, the loss of genes has been ongoing during this period, even among strains confined to a single aphid host species, as observed for the Buchnera-Ap strains (Fig. 2). The continuous production of new pseudogenes, and the resulting new intergenic spacers, is perhaps surprising given the long co-evolution and functional interdependence of the symbiont and host. However, Buchnera genomes are not nearly the smallest genomes found in symbiotic bacteria, implying that symbiotic bacteria are able to endure and compensate for continued gene loss.