Highly reduced genomes of 144–416 kilobases have been described from nutrient-provisioning bacterial symbionts of several insect lineages [1–5]. Some host insects have formed stable associations with pairs of bacterial symbionts that live in specialized cells and provide them with essential nutrients; genomic data from these systems have revealed remarkable levels of metabolic complementarity between the symbiont pairs [3, 4, 6, 7]. The mealybug Planococcus citri (Hemiptera: Pseudococcidae) contains dual bacterial symbionts with an unprecedented organization: an unnamed Gammaproteobacteria, for which we propose the name Candidatus Moranella endobia, lives inside the Betaproteobacteria Candidatus Tremblaya princeps. Here we describe the complete genomes and metabolic contributions of these unusual nested symbionts. We show that while there is little overlap between the symbionts in retained genes involved in nutrient production, several essential amino acid pathways in the mealybug assemblage require a patchwork of interspersed gene products from Tremblaya, Moranella, and possibly P. citri. Furthermore, while Tremblaya has the smallest cellular genome yet described, it contains a genomic inversion present in both orientations in individual insects, in stark contrast to the extreme structural stability typical of highly reduced bacterial genomes [4, 9, 10].
With only 138,927 base pairs (bps) and 121 protein-coding genes, Tremblaya has the smallest cellular genome yet described (Figure S1). There are two copies of the 16S-23S-5S ribosomal DNA operon contained within an identical 5,702 bp duplication (genome coordinates 46,430-52,131 and 124,024-129,725), as expected from previous work sequencing rDNA-positive clones from a mealybug genomic library. The Tremblaya genome encodes 44 ribosomal proteins, which is similar to other highly reduced bacterial symbiont genomes. In contrast to these other tiny genomes, however, Tremblaya lacks many translation-related gene homologs outside of the ribosome itself. Its genome encodes no functional aminoacyl-tRNA synthetases, and is missing homologs for both translational release factors (prfA and prfB), elongation factor EF-Ts (tsf), ribosome recycling factor (frr), and peptide deformylase (def). Similar to the alphaproteobacterial insect symbiont Candidatus Hodgkinia cicadicola, which at 144 kilobases (kb) is the next-largest cellular genome, Tremblaya has the unusual property of having a small genome with a high guanine + cytosine (GC) content (58.8%), although unlike Hodgkinia it uses the standard bacterial genetic code. Compared to most highly reduced bacterial genomes, the Tremblaya genome is unusually gene-sparse, with a coding density of only 72.9%, an average intergenic length of 389 bps, and 19 recognizable pseudogenes (Tables 1 and 2).
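The figures quoted for the rDNA duplication can be checked with simple arithmetic. The Python sketch below is illustrative only: the variable and function names are introduced here, and the coordinates are assumed to be 1-based and inclusive, which the text does not state explicitly.

```python
# Values quoted in the text; all names are illustrative, not from the authors.
GENOME_LENGTH_BP = 138_927
CODING_DENSITY = 0.729
# Assumption: coordinates are 1-based and inclusive.
RDNA_COPIES = [(46_430, 52_131), (124_024, 129_725)]


def span_length(start, end):
    """Length of a 1-based, inclusive coordinate interval."""
    return end - start + 1


# Both copies of the duplication should have the same length.
copy_lengths = [span_length(start, end) for start, end in RDNA_COPIES]

# Fraction of the genome taken up by the two rDNA-containing copies.
duplicated_fraction = sum(copy_lengths) / GENOME_LENGTH_BP

# Approximate number of coding bases implied by the quoted coding density.
approx_coding_bp = round(CODING_DENSITY * GENOME_LENGTH_BP)

print(copy_lengths)  # [5702, 5702]
print(f"{duplicated_fraction:.1%} of the genome lies in the duplication")
print(f"about {approx_coding_bp} coding bases")
```

Under the inclusive-coordinate assumption, both intervals reproduce the stated 5,702 bp length, and the two copies together account for roughly 8% of the genome.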
One of the most striking features of the Tremblaya genome is a 7,032 bp region flanked by two 71 bp inverted repeats, which is found in both orientations in the population (Figure 1). During genome assembly, reads overlapping either end of this 7 kb region were found contiguous with both ends of an assembly gap, suggesting genomic polymorphism in the 12 insects that were pooled for genome sequencing. PCR amplification on DNA extractions from 10 additional insects confirmed that both orientations of the inversion exist in the Tremblaya population in individual insects (data not shown). It is currently unknown whether both orientations are present within individual Tremblaya cells. Tremblaya encodes only six genes involved in DNA replication, recombination, or repair: the α, β, ε, and γ/τ subunits of DNA polymerase III, the replicative DNA helicase, and the DNA replication inhibitor CspD (dnaENQXB and cspD). Thus, at this evolutionary stage, Tremblaya does not appear to be able to catalyze this inversion itself, as no genes known to be directly involved in recombination are present in the genome (e.g., recA, recBCD, recE, recF, or recJ).
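The geometry of an inversion flanked by inverted repeats can be sketched with a toy sequence. The Python example below is a minimal illustration, not the authors' analysis: the sequences and function names are hypothetical, and the repeats are 6 bp rather than 71 bp. It shows that reverse-complementing the whole repeat-flanked region leaves both repeats unchanged and flips only the interior, which is consistent with reads from either end of the region joining both sides of the assembly gap.

```python
# Toy illustration; sequences and names are hypothetical, not from the genome.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(genome, start, end):
    """Replace genome[start:end] (0-based, end-exclusive) with its reverse complement."""
    return genome[:start] + reverse_complement(genome[start:end]) + genome[end:]


left_repeat = "ACGGTC"
right_repeat = reverse_complement(left_repeat)  # inverted copy of the left repeat
inner = "AAATTTGGGCCCA"                         # stands in for the interior of the region
genome = "TT" + left_repeat + inner + right_repeat + "GG"

start = 2
end = start + len(left_repeat) + len(inner) + len(right_repeat)
flipped = invert_segment(genome, start, end)

# The repeats are unchanged; only the interior is reverse-complemented.
expected = "TT" + left_repeat + reverse_complement(inner) + right_repeat + "GG"
print(flipped == expected)                              # True
print(invert_segment(flipped, start, end) == genome)    # True: the flip is reversible
```

Because the two orientations share identical repeat sequences at both boundaries, short reads that end inside a repeat cannot by themselves distinguish the orientations; only reads or amplicons spanning a repeat into the interior can.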
Most insects that feed exclusively on plant sap have developed symbioses with microorganisms to supplement their nutritionally deficient diet [14, 15]. Genomic work on the symbionts of sap-feeding insects has clearly shown that essential amino acid production is the primary nutritional role of these intracellular bacteria [2–7, 9, 16, 17]. As mealybugs feed exclusively on phloem sap, it has been assumed that at least one of their two intracellular bacterial symbionts is involved in provisioning essential amino acids. This assumption leads to the prediction that either Tremblaya or Moranella (or both) would retain genes necessary for the production of at least some essential amino acids. Analysis of the Tremblaya genome supports this hypothesis, as it encodes several gene homologs (29 of 121 genes, or 22% of the genome by nucleotide count) involved in the synthesis of the 10 essential amino acids, although no single pathway is complete in Tremblaya alone (Figure 2). The leucine and valine pathways are nearly complete, although gene homologs for the branched-chain amino acid aminotransferase (BCA) and the small subunit (IlvN) of acetohydroxybutanoate synthase/acetolactate synthase are both missing. In Escherichia coli, the active sites for both acetohydroxybutanoate synthase and acetolactate synthase are on the large subunit (IlvB), and this subunit is functional in the absence of IlvN, although the Vmax for both reactions is greatly reduced. It is therefore possible that these reactions are catalyzed by IlvB alone in Tremblaya. The aphid symbiont Buchnera is also missing the BCA homolog, and it has been predicted that this reaction is carried out by the insect in the Buchnera-pea aphid symbiosis. The recent completion of the pea aphid genome confirmed that a BCA gene homolog is encoded in the aphid genome; it is also highly expressed in aphid bacteriocytes at both the mRNA and protein levels.
As the BCA activity is encoded in all other sequenced insect genomes, it seems likely that this reaction is ancestrally present in insects and retained by P. citri. Evidence for this hypothesis comes from a BCA homolog found in an expressed sequence tag (EST) library of Maconellicoccus hirsutus (GenBank accession EH218655.1), a mealybug belonging to the same subfamily as P. citri. Additionally, a few low-coverage contigs in the present work assumed to be from the mealybug genome have top hits to animal BCA homologs in the GenBank non-redundant database (nr), indicating that a gene coding for the branched-chain amino acid aminotransferase activity is present in the P. citri genome (Table S1). As in aphids, the mechanism used by the mealybug system in the production of methionine is not clear (Figure 2). Certain sequences in the M. hirsutus EST library (EH216258.1, EH218567.1, and EH211908.1) and in the P. citri assembly (Table S1) correspond to gene homologs of cystathionine gamma-synthase (CGL), cystathionine beta-lyase (CBL), and homocysteine S-methyltransferase (MetE), although in this case the precise annotation of these genes is uncertain. As Tremblaya also encodes a homolog of metE, the production of methionine may involve a combination of gene products from Tremblaya and P. citri, although further work is needed to identify the exact genes involved in this pathway.
At 538,294 bps in length, the Moranella genome is almost 4 times larger than the Tremblaya genome (Figure S1), although it codes for only 15 gene homologs involved in essential amino acid production, compared with 29 in Tremblaya (Figure 2). As in Tremblaya, no single essential amino acid pathway is complete in Moranella alone, and only 3 gene homologs involved in essential amino acid biosynthesis are present in both the Tremblaya and Moranella genomes. Strikingly, complete synthesis of tryptophan and threonine requires a patchwork of gene products from Tremblaya and Moranella, while phenylalanine, arginine, and isoleucine biosynthesis require genes from Tremblaya, Moranella, and possibly the mealybug host (Figure 2). As for the BCA homolog discussed above, genes that encode aspartate aminotransferase (AAT), ornithine aminotransferase (OAT), and threonine dehydratase (TDH) activities can be found in several insect genomes [20, 21], and copies of AAT and OAT orthologs are present in the EST library from the mealybug Maconellicoccus hirsutus (GenBank accessions EH215711.1 (AAT) and EH214276.1 (OAT)). Additionally, a few low-coverage contigs in the present work have top hits to animal AAT, OAT, and TDH homologs in the nr database, indicating the possible presence of these genes in the P. citri genome (Table S1). Therefore, the last step in phenylalanine synthesis, the production of ornithine for arginine biosynthesis, and the threonine dehydratase activity of the isoleucine pathway may be performed by the insect, as has been predicted in the Buchnera-pea aphid symbiosis [20–22]. Our results suggest that lysine and histidine are not produced by this symbiotic partnership, as none of the missing genes in the bacterial pathways are thought to be present in animal genomes, and no obvious homologs can be identified in the low-coverage sequence contigs.
Phloem sap of various plants is known to vary widely in the levels of essential amino acids present, and it is possible that lysine and histidine are acquired at sufficient levels in the mealybug diet.
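The division of labor described in this section can be summarized as a small data structure. The Python sketch below is a bookkeeping aid only: the names are introduced here, the host entries are putative contributions as discussed in the text, and pathways not assigned in this section (leucine, valine, methionine) are deliberately left out.

```python
# Gene-homolog counts for essential amino acid biosynthesis, as quoted in the text.
TREMBLAYA_GENES = 29
MORANELLA_GENES = 15
SHARED_GENES = 3

# Inclusion-exclusion: homologs present in at least one of the two symbionts.
distinct_homologs = TREMBLAYA_GENES + MORANELLA_GENES - SHARED_GENES

# Contributors per pathway as described in this section. "P. citri" marks a
# putative host contribution; None marks pathways described as not produced.
PATHWAYS = {
    "tryptophan": {"Tremblaya", "Moranella"},
    "threonine": {"Tremblaya", "Moranella"},
    "phenylalanine": {"Tremblaya", "Moranella", "P. citri"},
    "arginine": {"Tremblaya", "Moranella", "P. citri"},
    "isoleucine": {"Tremblaya", "Moranella", "P. citri"},
    "lysine": None,
    "histidine": None,
}


def pathways_involving(partner):
    """Pathways whose listed contributors include the given partner."""
    return sorted(name for name, members in PATHWAYS.items()
                  if members is not None and partner in members)


def incomplete_pathways():
    """Pathways described as not produced by the partnership."""
    return sorted(name for name, members in PATHWAYS.items() if members is None)


print(distinct_homologs)               # 41
print(pathways_involving("P. citri"))  # ['arginine', 'isoleucine', 'phenylalanine']
print(incomplete_pathways())           # ['histidine', 'lysine']
```

With only 3 of 41 distinct homologs shared, the two symbiont gene sets for essential amino acid synthesis overlap very little, which is the sense in which the pathways form a patchwork.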
While the pattern of complementary gene loss and retention in the mealybug assemblage implies a functional symbiosis, especially in the context of the extreme genome reduction observed in Tremblaya, we sought to verify gene expression from a pathway that involved contributions from all three organisms in the symbiosis. We tested for expression of all genes in the phenylalanine pathway, which requires gene products from Tremblaya, Moranella, and P. citri (Figure 2), by RT-PCR on RNA extracted from purified mealybug bacteriomes. We targeted a region of the P. citri AAT mRNA predicted to be spliced (Figure S2), reasoning that a functional transcript should be processed to mature mRNA in tissues where it is actively expressed. All 10 genes, including the spliced form of AAT mRNA from the low-coverage P. citri genome assembly, were expressed (Figure S2).
Several multi-species symbiotic communities that provision nutrients to their hosts have been studied using genomic methods [3, 4, 6, 7, 26, 27]. In most of these previous examples, complete or near-complete pathways for the synthesis of individual nutrients exist exclusively in one member of the community [3, 4, 7, 27]. For example, in both glassy-winged sharpshooter and cicada, Sulcia muelleri (Bacteroidetes) has near-complete pathways for the production of 8 of the 10 essential amino acids while the partnering symbiont produces the remaining 2 [3, 7] (in glassy-winged sharpshooter, the partner is the Gammaproteobacteria Baumannia cicadellinicola; in cicada, the partner is the Alphaproteobacteria Hodgkinia cicadicola). In the related spittlebug system, Sulcia has lost the ability to make tryptophan (thereby retaining the ability to make 7 instead of 8 of the essential amino acids), and the partnering bacterial symbiont, the Betaproteobacteria Zinderia insecticola, has retained a perfectly complementary set of gene homologs for the synthesis of the remaining 3 essential amino acids. In all of these cases, complete amino acid biosynthetic pathways are partitioned between bacterial partners; for example, gene homologs comprising the tryptophan biosynthetic pathway exist in either one symbiont or the other, never in both.
One system that seems to be an exception is the Buchnera/Serratia dual symbiosis in the aphid Cinara cedri [5, 26]. In this example, the first two steps in the tryptophan pathway (trpEG) are present on a plasmid in Buchnera, while the remaining tryptophan biosynthetic genes are found on a fragment of the Serratia genome, suggesting that an intermediate of the pathway must pass between Buchnera and Serratia for tryptophan to be produced. However, because the Serratia genome is not yet complete, it is not known definitively that trpEG homologs are not encoded in the Serratia genome. Nevertheless, assuming that the steps in tryptophan biosynthesis are divided between Buchnera and Serratia in C. cedri, this example does not compare to the complexity of the step-by-step metabolic interdependency shown in the mealybug system (Figure 2).
As in other dual-symbiont insect systems [3, 4, 6, 7], it is unclear how transport of metabolites occurs between co-symbionts. Even when only considering phenylalanine and tryptophan synthesis, Tremblaya and Moranella would have to pass a minimum of 5 metabolites between the two partners to produce tryptophan, and 2 metabolite exchanges would be required for the synthesis of phenylalanine (not including the last step, putatively catalyzed by the mealybug AAT activity). Moranella encodes only a handful of genes involved in membrane transport, and none are specific for amino acids or their precursors (data not shown). If proteins were imported or exported between the cells to accomplish phenylalanine and tryptophan biosynthesis, then 6 proteins would have to be transported in either direction. Some components of the Sec translocation machinery are present in the Moranella genome (secAEGY and yidC; secB exists as a pseudogene), and it is possible that these are used to transport some proteins across Moranella's inner membrane. A search for signal peptides in the Moranella proteome revealed 27 proteins with N-terminal secretory signal peptides; however, none was involved in essential amino acid biosynthesis (data not shown). As the Tremblaya genome encodes no predicted transporters, it is unlikely that either Moranella or Tremblaya is capable of specifically controlling the import and export of the metabolites or enzymes needed for amino acid biosynthesis.
Similar to other highly reduced bacterial genomes, Tremblaya has retained 19 of 21 small subunit and 25 of 33 large subunit ribosomal protein genes, suggesting that translation still occurs in Tremblaya cells. While other highly reduced bacterial symbiont genomes are missing certain aminoacyl-tRNA synthetases [1–4], Tremblaya is the first bacterial genome published without any functional aminoacyl-tRNA synthetase copies in its genome (copies of the arginyl- and cysteinyl-tRNA synthetases exist as pseudogenes). Furthermore, Tremblaya is missing several gene homologs for translation-related functions (for example, both translational release factors) that are normally found in the smallest bacterial genomes. Because the bacterial translational machinery is significantly different from the eukaryotic machinery (in particular the release factors), it seems unlikely that all of these missing functions could be complemented by host-encoded activities. It is possible that, as in organelles, some Tremblaya genes have been transferred to the host genome and their products subsequently reimported; this would have to be accomplished by host-encoded transporters. However, in two insects that have stably associated symbiotic bacteria and for which complete insect genomes exist (pea aphid and body louse), no transfer of functional genes between symbiont and host was found [30, 31], suggesting that this may not be a common solution to missing genes in insect symbiont genomes.
Like other highly reduced symbiont genomes, Tremblaya encodes only a few gene homologs involved in DNA replication, and none are directly implicated in DNA repair or recombination. The loss of recombinogenic activities, combined with a dearth of repetitive sequence, is thought to explain the unusual and extreme structural stability found in reduced symbiont genomes [9, 32], where bacterial genomes separated by 20–200 million years show no rearrangements or duplications [7, 9, 10, 17]. This extreme structural stability is restricted to bacteria with severely reduced genomes that are subject to strict vertical transmission within hosts. In contrast, other obligate intracellular bacteria with larger genomes and more promiscuous transmission routes can harbor large numbers of mobile genetic elements and are more structurally dynamic [33, 34]. As Tremblaya has the smallest reported bacterial genome, it is surprising to observe an inversion in its genome (Figure 1). At present, the age and origin of this inversion are unknown. If no evolutionary advantage exists for the inversion occurring in one orientation versus the other, then it is likely a recent event, as evolutionarily old, selectively neutral polymorphisms are expected to be lost through genetic drift in small populations, such as those that exist for intracellular symbionts [35, 36]. If the inversion is somehow advantageous, it could be either ancient or recent, and maintained through selection; it is currently unclear what the selective advantage of this genomic polymorphism would be in Tremblaya.
In the 1960s, summarizing decades of microscopical work carried out by numerous investigators, Paul Buchner described the symbiotic structures in P. citri bacteriocytes as “roundish or longish mucilaginous globules [now known to be Tremblaya] in which the symbionts [Moranella] are thickly embedded” (Figure S3). He noted at least two morphological forms of Moranella: a reproductive form where cells were small in size and in the process of dividing, and a degenerative phase in which Moranella cells became unevenly shaped and elongated. The particular Moranella form was dependent on the life stage of the insect and seemed to be synchronized within a bacteriocyte.
Adding evidence to the idea that Moranella has life stages distinct from Tremblaya, recent work has shown that the infection levels of Tremblaya and Moranella are uncoupled in mealybugs, at least in males. Specifically, during male development, the number of Moranella cells relative to Tremblaya cells drops significantly as the insects age, while in female insects the levels of the two symbionts remain roughly equivalent over the entire life cycle. These results led the authors to suggest that Tremblaya plays an essential role in controlling the levels of the Gammaproteobacteria (Moranella), and that a novel bacterial endo- and exocytosis-like mechanism might be involved in the Tremblaya-Moranella symbiosis. Given that Tremblaya has an extremely limited coding capacity that is largely devoted to essential amino acid biosynthesis and translation, and that only 7 genes are of completely unknown function, it seems impossible that Tremblaya itself controls any structural aspect of the symbiosis. Likewise, the Moranella genome does not encode any genes involved in traditional infective strategies (such as Type III or IV secretion systems), and does not indicate any obvious pathway by which it could be an active participant involved in seeking out the Tremblaya cytoplasm. Thus, it seems likely that the host is largely in control of the structure and organization of this bacteria-within-a-bacterium symbiosis.
The recent discoveries of bacterial symbiont genomes that are reduced in size and gene content far beyond what was once considered possible raise important but difficult-to-answer questions about how—and in some cases, whether—these bacteria carry out the necessary functions needed for their own survival, as well as for the survival of their symbiotic partners [2, 12, 39]. The missing activities resulting from such extensive gene loss could be compensated by several possible mechanisms, such as: i) gene products or metabolites of either host or bacterial origin imported from the host [30, 40, 41]; ii) gene products or metabolites imported directly from the co-symbiont (if present), or iii) genetic co-adaptations to the loss of genes within the reduced genome itself. Tremblaya presents these same issues, but with the added complexity of having a bacterial symbiont with a relatively rich gene set residing in its cytoplasm. This adds another possible mechanism—extremely speculative, to be sure—for the solution to gene loss in Tremblaya: iv) the direct use of Moranella gene products due to a simple, passive mechanism such as Moranella cell lysis within the cell membrane system of Tremblaya. Were the host able to control the system so that selective lysing of Moranella cells were possible (or if those cells occasionally spontaneously lyse on their own), this would give Tremblaya ready access to Moranella gene products that could catalyze the diverse and numerous functions missing from the Tremblaya genome. This scenario would obviate the need for multiple host-encoded transport processes involving molecules as potentially diverse as aminoacylated tRNAs (or their synthetases), translational control factors, several metabolites or enzymes involved in essential amino acid production, and possibly even enzymes that catalyze the Tremblaya genomic inversion. 
Evidence that the levels of Tremblaya and Moranella cells are decoupled in mealybugs would seem to at least indirectly support this hypothesis. In any case, further work is needed to understand how Tremblaya can survive with such a limited gene set.
Bacterial genomes are in general compact and gene-dense, with typical coding densities of 80–90%. Reduced bacterial genomes tend to be, but are not always, more gene-dense than average (Table 1). Excluding Tremblaya, the four smallest bacterial genomes have coding densities in the range of 93–97%, and so it is surprising that Tremblaya has a coding density of only 73%, making it the smallest but also one of the least gene-dense bacterial genomes published (Table 1). The other examples of bacteria with very low coding densities, Mycobacterium leprae (50%), Sodalis glossinidius (52%), and Rickettsia prowazekii (76%), are also intracellular bacteria undergoing reductive genome evolution, but have genomes 8–30 times larger than that of Tremblaya. These larger gene-sparse genomes are likely in the phase of genome reduction associated with the shift from a free-living to an obligate intracellular lifestyle, where the constant exposure to the stable and rich environment of the host cell combined with a severe reduction in population size (and subsequent reduction in the efficacy of purifying selection) allows large numbers of pseudogenes to accumulate [42, 43]. These pseudogenes are eventually purged from the genome through mutational patterns favoring deletions, leading to small gene-dense genomes such as those from insect nutritional symbionts.
As the smallest described cellular genome, Tremblaya is a surprising exception to this rule. One possible explanation is that the Tremblaya genome was gene-dense prior to the acquisition of Moranella, and that establishment of the symbiosis relaxed the selective constraints on Tremblaya genes that were redundant with the more gene-rich Moranella genome. Basal lineages of mealybugs in the same subfamily as P. citri (Pseudococcinae) seem to contain Tremblaya without the intracellular gammaproteobacterial endosymbiont [23, 45], indicating that Moranella was acquired after the establishment of Tremblaya. The patterns of gene pseudogenization also fit this hypothesis, as most pseudogenized Tremblaya genes have functional Moranella homologs (Table 2). Importantly, all 3 Tremblaya pseudogenes involved in essential amino acid biosynthesis (carB, argF, and argH) are present in the Moranella genome as functional copies, exactly as would be expected in such a complementary and interdependent nutritional symbiosis.
We propose the name Candidatus Moranella endobia for the gammaproteobacterial symbiont living in the cytoplasm of the Betaproteobacteria Candidatus Tremblaya princeps, which itself lives in the cytoplasm of the mealybug Planococcus citri bacteriocytes. Previous phylogenetic work has established Moranella as a member of the Enterobacteriaceae, with the tsetse fly symbiont Sodalis glossinidius as its closest relative. Due to the uncertain relationships of the gammaproteobacterial symbionts of mealybugs [38, 45], we are only naming the gammaproteobacterial symbiont present in the genus Planococcus, although this name could be used in other taxa if future work warrants the designation. Moranella refers to the American evolutionary biologist Nancy A. Moran, and endobia reflects the unusual property of living exclusively in the Tremblaya cytoplasm (endo = inside, bia = living; feminine form). Unique properties of Moranella include the exclusive existence in Tremblaya cells and the 16S rDNA sequence GTCTTGAACTGTGGCTTTCGTAGTT (positions 839–863, E. coli numbering).
We thank N. Moran for helpful discussions and suggestions throughout the course of this work, M. Riley and O. A. Duhl for help in naming Moranella, and N. Fukushima for performing the RT-PCR experiments. This work was funded by the University of Arizona’s Center for Insect Science National Institutes of Health Training Grant 1K12GM00708 and the National Science Foundation Montana EPSCoR grant EPS-0701906 (J.P.M). The Tremblaya and Moranella genome sequence data have been submitted to the DDBJ/EMBL/GenBank databases under accession Nos. CP002244 and CP002243. The authors declare no competing financial interests.