Commensal symbionts, thought to be intermediate between obligate mutualists and facultative parasites, offer insight into forces driving the evolutionary transition into mutualism. Using macroarrays developed for a close relative, Escherichia coli, we applied a heterologous array hybridization approach to infer the genomic compositions of a clade of bacteria that have recently established symbiotic associations: Sodalis glossinidius with the tsetse fly (Diptera, Glossina spp.) and Sitophilus oryzae primary endosymbiont (SOPE) with the rice weevil (Coleoptera, Sitophilus oryzae). Functional biologies within their hosts currently reflect different forms of symbiotic associations. Their hosts, members of distant insect taxa, occupy distinct ecological niches and have evolved to survive on restricted diets of blood for tsetse and cereal for the rice weevil. Comparison of genome contents between the two microbes indicates statistically significant differences in the retention of genes involved in carbon compound catabolism, energy metabolism, fatty acid metabolism, and transport. The greatest reductions have occurred in carbon catabolism, membrane proteins, and cell structure-related genes for Sodalis and in genes involved in cellular processes (i.e., adaptations towards cellular conditions) for SOPE. Modifications in metabolic pathways, in the form of functional losses complementing particularities in host physiology and ecology, may have occurred upon initial entry from a free-living to a symbiotic state. It is possible that these adaptations, by streamlining genomes, make a free-living state no longer feasible for the harnessed microbe.
Symbiosis, a term coined by Anton de Bary in 1879, refers to two (or more) species that live in close association with each other, typically with one on or within the body of the other. Symbiotic relations are extremely pervasive, with examples found throughout the earth's biota. It is estimated that the majority of the arguably largest class of invertebrates, Insecta, are involved in some type of symbiosis, with the majority of these relations being shared with bacteria. These associations form a dynamic spectrum with regard to their necessity as well as their physiological impact on those involved.
Obligate mutualists of insects are thought to enable their hosts to survive on restrictive diets, typically consisting of a single food source, by provisioning nutritional supplements such as amino acids and B vitamins (10, 40). Symbiont loss often results in detrimental fitness costs for the host, such as sterility, growth impairment, and shortened life spans (40). Congruence of host and obligate mutualist phylogenies, dating back to ancient times, has been demonstrated for a number of insect systems, including Buchnera and aphids, Wigglesworthia and tsetse flies, Blochmannia and ants, Carsonella and psyllids, and Blattabacterium and cockroaches (recently reviewed in reference 52). Many of these arthropod-associated mutualists form distinct but related lineages in the γ-proteobacteria, with genome diversification presumably having occurred upon respective symbiotic establishments. Analyses of obligates reveal drastically reduced genomes, such as the 450 to 653 kb of Buchnera, the 697 kb of Wigglesworthia, the 800 kb of Blochmannia, and the 680 kb of Baumannia (37, 52). These reduced genomes are reflective of tight associations with host physiology and ecology such that pathways for superfluous compounds or compounds deemed beneficial but unnecessary are eliminated (35, 36). Among other hallmarks of these genomes are increased genetic drift, accelerated sequence evolution, and adenine-thymine (AT) bias (6, 36).
In contrast to the well-studied obligates, less is known about the functions and evolution of commensal genomes. These symbionts apparently are recent inhabitants (4, 12), with much broader tissue tropism in their hosts (i.e., variations among host sex and age) and a patchier occurrence within populations (12, 48) and different host species (14). While commensals are found to be maternally transmitted (10, 12, 14, 18), horizontal transfer among related host species is common (4, 12, 17, 43), as evidenced by the lack of congruity between symbiont and host genomes. Recent works have noted some benefits for host biology arising from commensals, such as temperature tolerance (13, 34) and increased resistance against parasitoid development (41) in aphids. It has also been suggested that commensals may influence traits such as host susceptibility towards disease (28) and the transmission of other microbes, such as trypanosome infection in tsetse flies (51). Thus, commensals can be considered intermediates between the highly evolved mutualist, strictly vertically transmitted and indispensable for its host, and the facultative parasite, whose horizontal modes of transmission have typically been associated with virulence (18).
A group of bacteria, characterized from distant insect orders such as Diptera, Coleoptera, and Hemiptera, form a distinct lineage within the γ-proteobacteria in close association with pathogenic and free-living enteric microbes. Here, we compare the genomic makeup of two of the symbionts within this lineage: Sodalis glossinidius, a symbiont with the tsetse fly (Glossina spp.; Diptera, Glossinidae), and Sitophilus oryzae primary endosymbiont (SOPE), a symbiont with the rice weevil (Sitophilus oryzae; Coleoptera, Dryophthoridae). Both of these recently established symbionts (4, 24) show no A+T bias in several of their sequenced genes (2, 24). Although their genomes, 2.0 Mb for Sodalis (2) and 3.0 Mb for SOPE (11), have been reduced in size in comparison to free-living relatives, they are significantly larger than those of obligates. Their hosts, members of distant insect taxa, occupy distinct ecological niches and have evolved to survive on specialized diets (i.e., blood for tsetse flies and cereal for rice weevils). In addition to Sodalis, tsetse flies harbor an ancient obligate mutualist, Wigglesworthia, and members of both insect host taxa have been invaded by the parasitic Wolbachia (25, 42). The functional biology of the symbionts within their respective hosts suggests that Sodalis fulfills a commensalist role while SOPE has acquired obligate traits, since its elimination negatively affects the energy-dependent activities of its weevil host (21, 25, 39). Furthermore, while Sodalis lives both within insect cells and extracellularly in hemolymph (14), SOPE appears confined to specialized cells within its host (25, 26). It has been possible to cultivate Sodalis in vitro in cell-free medium (50), while attempts to cultivate SOPE have failed (A. Heddi, unpublished data), further supporting their symbiotic statuses. 
Discoveries of invasion mechanisms, such as a type III secretion system (15, 16), in these symbionts suggest that parasitism, through an attenuated adaptive process arising from coevolution, may have developed into beneficial relationships (15, 16, 20, 27, 46). Thus, these symbionts can serve as models for studying the preliminary impact of a lifestyle shift, from free-living to symbiotic, on microbial genome composition.
The goal of this study was to gain insight into forces driving the evolutionary transition to symbiosis and their consequences on the genomic composition of microbial participants. The symbiont genomes were comparatively analyzed by a heterologous array hybridization approach using Escherichia coli macroarrays in attempts to understand how host ecology may influence selection for adaptations in microbial genomes and how initial genome reductions may harness organisms into symbiotic lifestyles. The information obtained for SOPE was compared to previous results obtained with Sodalis employing a similar technique (2). We discuss the divergence observed for the two sister bacterial genomes in light of their respective unique niches.
Sitophilus oryzae weevils (Chinese strain) were reared on wheat at 27.5°C and 70% relative humidity. Bacteriomes were dissected from fourth instar larvae, and total SOPE DNA was prepared as previously described (22). Tsetse flies, Glossina morsitans subsp. morsitans, were maintained at 23°C with 60% relative humidity and received defibrinated bovine blood through an artificial membrane system. The symbiont, Sodalis, was obtained from pupae and cultured on a layer of C6/36 cells as described previously (50). Purification of Wolbachia from Drosophila melanogaster (Canton-S) and DNA extraction were performed as previously described (9).
For genome comparison studies, the E. coli K-12 gene array, which contains the 4,290 PCR-amplified open reading frames (ORFs) identified in the sequenced genome (8) spotted onto nylon membranes (Panorama macroarrays; Genosys Biotechnologies Inc.), was used. Each ORF is spotted in duplicate over three panels for control purposes. For hybridization probes, symbiont DNA was radioactively labeled with [α-33P]ATP (ICN, La Jolla, Calif.) by use of a Pol I/DNase I nick translation kit (Gibco). Hybridization reactions were performed in 45% formamide-5× Denhardt's solution-5× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate)-0.5% sodium dodecyl sulfate (SDS) buffer at 42°C as described previously (2). The arrays were washed at 42°C for 30 min in 2× SSC-0.1% SDS and 0.1× SSC-0.1% SDS followed by 0.1× SSC-0.5% SDS. Arrays were exposed to maximum resolution films (BMR; Eastman Kodak Company), and signals were scored on the basis of strong, medium, or weak hybridization intensities (six-, four-, and twofold above background noise, respectively, as verified by density readings obtained by utilizing a Kodak Image Station 2000R system) as described previously (2). There were no cases for which duplicate spots gave contradictory results. Hybridization with Wolbachia DNA was performed under the same conditions to evaluate potential contamination and therefore its contribution to the signals detected with SOPE DNA. There was negligible hybridization with Wolbachia DNA, confirming that results obtained with SOPE DNA represent true homologs.
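The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the six-, four-, and twofold values are treated as "at least" cutoffs on the density-to-background ratio, and the rule for reconciling duplicate spots that fall into different intensity classes (report the weaker class) is an assumption, not a procedure stated in the text.

```python
from typing import Optional

# Fold-over-background cutoffs taken from the text:
# strong = sixfold, medium = fourfold, weak = twofold.
THRESHOLDS = [("strong", 6.0), ("medium", 4.0), ("weak", 2.0)]


def score_spot(density: float, background: float) -> Optional[str]:
    """Return 'strong', 'medium', 'weak', or None (no signal scored)."""
    if background <= 0:
        raise ValueError("background must be positive")
    fold = density / background
    for label, cutoff in THRESHOLDS:
        if fold >= cutoff:
            return label
    return None


def score_orf(dup_a: float, dup_b: float, background: float) -> Optional[str]:
    """Score one ORF from its duplicate spots.

    Raises if one duplicate is detected and the other is not, which
    corresponds to a contradictory result.
    """
    a = score_spot(dup_a, background)
    b = score_spot(dup_b, background)
    if (a is None) != (b is None):
        raise ValueError("duplicate spots disagree on detection")
    if a is None:
        return None
    # Hypothetical tie-break: report the weaker of the two classes.
    order = {"weak": 0, "medium": 1, "strong": 2}
    return min(a, b, key=order.__getitem__)
```

With a background density of 100, for example, duplicate readings of 650 and 450 would be scored as a medium signal under this conservative tie-break.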
ORFs were identified according to array specifications and entered into spreadsheets (Microsoft Excel) (available online [http://info.med.yale.edu/eph/html/faculty/aksoy/]; upon request, they can be provided electronically). Macros were created to identify shared as well as unique genes between Sodalis and SOPE based on two independent sets of search criteria, gene name and DBGET numerical designation. Genes were compiled into functional categories as provided by Genosys software, and potential metabolic pathways were reconstructed by use of the Kyoto Encyclopedia of Genes and Genomes web site (http://www.genome.ad.jp/kegg/). Only the metabolic pathways for which complete gene sets could be detected were included in the comparative functional analyses. A two-tailed Fisher's exact test was performed with the SAS system for Windows (30) to compare gene retention (i.e., number of genes detected) per functional category between SOPE and Sodalis. The effect of symbiont species on the proportion of genes retained with E. coli homologs was tested in these analyses. Significance is reported at the 5% level.
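For readers without access to SAS, the per-category comparison can be approximated with a plain-Python two-tailed Fisher's exact test. The sketch below is not the authors' SAS procedure. It assumes that each functional category yields a 2x2 table of detected versus undetected E. coli ORFs for SOPE and for Sodalis, and the counts in the usage lines are hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability of observing x in the top-left cell given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical category of 100 arrayed ORFs:
# rows = symbiont (SOPE, Sodalis), columns = (detected, not detected).
p_value = fisher_exact_two_tailed(60, 40, 30, 70)
significant = p_value < 0.05  # 5% level, as in the text
```

Summing all tables that are no more probable than the observed one is a common definition of the two-sided P value; other definitions exist and can give slightly different values for asymmetric margins.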
A heterologous array hybridization approach was used to infer the composition of the symbiont genomes. The E. coli macroarray contains 4,290 ORFs, representing its known genome, and functional roles have been assigned to 1,962 of these. A total of 2,084 orthologs could be detected with SOPE DNA, hybridizing with strong (10%), medium (37%), and weak (53%) hybridization intensities. These orthologs could account for approximately 70% of the 3.0-Mb SOPE genome (11), assuming an average size of 1 kb per gene. Of the 2,084 homologs detected in the SOPE genome, 1,450 have assigned roles in E. coli, while the remaining correspond to genes with hypothetical functions. In a similar study, the Sodalis genome of 2.0 Mb was found to hybridize to 1,812 E. coli orthologs (2), including 1,316 with described functional roles (Table 1). Both symbionts harbor large extrachromosomal elements, adding up to approximately 135 kb in Sodalis (2) and 138 kb in SOPE (11). The hybridization data obtained with Sodalis DNA reflect only the chromosomal homologs, but the homologs detected with SOPE include genes recognized by plasmid and chromosomal DNA. If the potential plasmid-specific homologs are excluded, Sodalis and SOPE genomes show homology to similar numbers of E. coli ORFs.
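The coverage estimate of approximately 70% follows directly from the ortholog count, the assumed average gene size of 1 kb, and the genome size. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the Sodalis value is our own application of the same 1-kb assumption to the numbers reported above, not a figure stated in the text.

```python
def coverage_fraction(n_orthologs: int, genome_mb: float, avg_gene_kb: float = 1.0) -> float:
    """Fraction of a genome accounted for by detected orthologs.

    Assumes every detected ortholog occupies avg_gene_kb of sequence
    (1 kb per gene, as assumed in the text).
    """
    return n_orthologs * avg_gene_kb / (genome_mb * 1000.0)


sope = coverage_fraction(2084, 3.0)     # 2,084 kb of 3,000 kb, about 0.69
sodalis = coverage_fraction(1812, 2.0)  # 1,812 kb of 2,000 kb, about 0.91
```

Because the 1-kb average is a rough assumption and the SOPE count may include plasmid-specific homologs, these fractions should be read as order-of-magnitude estimates.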
Comparative analysis of the ORFs detected showed that 1,525 E. coli orthologs were shared by both symbionts and 1,169 of these corresponded to ORFs with assigned functions (Table 1). To understand how differences in symbiont-host biology may have impacted the loss of genes from SOPE and Sodalis genomes, a comparative analysis of the ORFs (with known roles in E. coli) detected in each functional category was conducted. Statistically significant differences were found among symbionts in the proportion of ORFs coding for carbon compound catabolism (Fisher's exact test; P = 0.00009), cell structure (Fisher's exact test; P = 0.0073), energy metabolism (Fisher's exact test; P = 0.0071), fatty acid metabolism (Fisher's exact test; P = 0.0253), and transport (Fisher's exact test; P = 0.00004). No significant differences in housekeeping categories such as translation and posttranslational modification, cell processes, transcription, nucleotide biosynthesis and metabolism, membrane proteins, and DNA replication, recombination, modification, and repair were recorded between the two symbionts. Furthermore, ORFs encoding the regulatory sigma factors 24 (rpoE), 54 (rpoN), 32 (rpoH), and 70 (rpoD) were detected with the hybridization of both SOPE and Sodalis DNAs, indicating that these bacteria have retained the ability to respond to environmental stimuli.
While sharing many ORFs, SOPE and Sodalis genomes have retained unique (i.e., not detected with the other symbiont) genes (281 and 147 genes, respectively) homologous to those with known functions in E. coli. In Sodalis, the majority of these genes correspond to cellular processes (16%), central intermediate metabolism (15%), and amino acid biosynthesis and metabolism (13%) (Fig. 1). SOPE maintains a greater number of unique genes associated with functions such as transport (17%), carbon compound catabolism (14%), cell structure (14%), and energy metabolism (13%) (Fig. 2). These unique genes may play important roles in the specific associations of SOPE and Sodalis within their weevil and tsetse fly hosts.
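The shared and unique gene counts can be recovered with simple set operations, and the reported numbers are internally consistent: 1,450 and 1,316 ORFs with assigned functions were detected for SOPE and Sodalis, respectively, and 1,169 of these were shared. The sketch below is an illustration only. The function name is hypothetical, genes are keyed by gene name alone, and the five-gene lists are a toy subset chosen from genes discussed in this article, not the actual hybridization data.

```python
def partition_orfs(sope: set, sodalis: set):
    """Split detected ORFs into shared, SOPE-only, and Sodalis-only sets."""
    return sope & sodalis, sope - sodalis, sodalis - sope


# Toy subset keyed by gene name (illustrative, not the full data set).
sope_detected = {"rpoE", "hisP", "glcC", "glcD"}
sodalis_detected = {"rpoE", "hisP", "glcC", "hisB"}
shared, sope_only, sodalis_only = partition_orfs(sope_detected, sodalis_detected)

# Consistency check on the counts reported in the text
# (ORFs with assigned functions in E. coli).
SOPE_FUNCTIONAL, SODALIS_FUNCTIONAL, SHARED_FUNCTIONAL = 1450, 1316, 1169
sope_unique = SOPE_FUNCTIONAL - SHARED_FUNCTIONAL        # 281
sodalis_unique = SODALIS_FUNCTIONAL - SHARED_FUNCTIONAL  # 147
```

Matching on a second independent key, such as a numerical database designation, would guard against gene-name synonyms, in the spirit of the two search criteria used for the spreadsheet macros.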
Based on the presence of all the genes necessary for complete metabolic pathways, both symbionts appear capable of synthesizing many amino acids, including Ala, Cys, Asp, Gly, Ile, Lys, Leu, Met, Asn, Phe, Pro, Gln, Arg, Ser, Thr, Trp, Tyr, and Val. SOPE, however, appears deficient in the production of the essential amino acid histidine (His) (no hisBCFHI), although the His transporter (hisP) is detected, suggesting that His may be obtained from the weevil host with its strict cereal diet. In contrast, His synthesis is possible for Sodalis, as evidenced by the retention of hisABCDFGHI, and accordingly hisP shows a very weak signal in comparison to that for SOPE, suggesting relaxed selection towards this gene and possibly inactivation. Both SOPE and Sodalis appear unable to synthesize glutamic acid, but Glu transporters (gltK, -L, and -P) are retained within their genomes. Although Glu is a nonessential amino acid, it is necessary for Arg, Pro, and Gln production, and symbionts may utilize transporters to attain this needed precursor externally. Glutamic acid has been shown to be a major product of tsetse fly metabolism (32, 33) and accounts for about 40% of the total amino acids of wheat (http://www.nal.usda.gov).list_nut.pl). Complete nucleotide biosynthesis and metabolism appear possible for SOPE and Sodalis, and both are capable of fatty acid biosynthesis commencing from acetyl-coenzyme A.
Both symbionts appear to lack the ability to synthesize thiamine (B1), folate, and nicotinamide, but they are capable of riboflavin, pyridoxine, lipoic acid, and protoheme production. They also have in common the retention of ORFs that encode sugar isomerases (e.g., pgm, galE, and araA) and enzymes that metabolize simple sugars, like glucose (e.g., glgC, galU, ptsG, malX, crr, pgi, prkB, and mrsA). The most abundant sugar in insect hemolymph, alpha-trehalose, can also be metabolized by both symbionts. Sodalis has lost many genes encoding enzymes that are involved in intermediate pathways or that metabolize plant sugars. In contrast, SOPE has retained the majority of enzymes that catabolize plant sugars (β-d-glucosides). Hence, SOPE, but not Sodalis, is able to hydrolyze molecules such as cellobiose and salicin. Sodalis and SOPE possess the activator for glycolate (glcC), a product of plant photorespiration, but Sodalis lacks the ability to convert glycolate plus P to phosphoglycolate. Furthermore, SOPE retains most of the glycolate oxidases (glcCDB), while Sodalis maintains glcC only.
To understand the impact of different symbiotic relations on genome functions, we grouped orthologs detected with known functions into categories and compared them to those present in the annotated genome sequences of free-living E. coli (8), the obligate mutualists Wigglesworthia (3) and Buchnera (44), and the parasite Rickettsia (7) (Table 2). Since full genome annotations are not available for Sodalis and SOPE, the following percentages represent the minimum values for their respective genomes. A smaller percentage of E. coli orthologs devoted to cell processes is detected in the Sodalis and SOPE genomes (5.9 and 6.8%, respectively) than in the obligates Wigglesworthia (12.1%) and Buchnera (8.4%). In contrast, higher percentages of the SOPE and Sodalis genomes are involved in amino acid biosynthesis and metabolism, energy metabolism, and cofactor biosynthesis than of the genome of Rickettsia, which is parasitic and relies on its host for many of these functions. In comparison to E. coli, SOPE and Sodalis appear to have smaller percentages of orthologs devoted to central intermediate metabolism (9.7% compared to 8 and 5.9%, respectively), cell structure (9.4% compared to 8 and 4.8%, respectively), and cell processes (9.7% compared to 6.8 and 5.9%, respectively).
Overall genomic contents can be reconstructed quickly and inexpensively when complete sequences are unavailable by utilizing hybridization to arrays available for closely related organisms. Although this technique has been successfully applied to gain a broad understanding of genome composition, it lacks the ability to identify events that render genes inactive, such as point mutations and frame shifts caused by deletions and insertions. Genes that are not found in the arrayed genome as well as species-specific genes also cannot be identified. Thus, a fraction of the genome of interest may remain unsampled when utilizing arrays of close relatives. Furthermore, the hybridization of genomes with nucleotide composition bias may lead to the detection of artifacts and result in variance between gene array hybridization and genome-wide sequence analysis (1, 3). Like any comparative genome approach, this method is unable to detect nonorthologous gene displacement (29), by which unrelated or distantly related proteins can be recruited for the same functions. Despite these limitations, this technique allows us to adequately compare genomes of microbes with similar guanine and cytosine (G+C) contents and with close phylogenetic distances to the organism from which the arrays were constructed. The identification of the full repertoire of genes comprising operons can further validate the results. For this study, related symbionts from two insect orders, Coleoptera and Diptera, were compared to delineate what alterations in microbial genome contents may have occurred since divergence from a common ancestor due to distinct host environments. To our knowledge, this is the first report describing microbial genome disintegration during the early phases of symbiotic establishment and in relation to environmental constraints.
Multiple phylogenetic analyses of Enterobacteriaceae species have suggested a distinct lineage for Sodalis and SOPE, indicating that they are members of a single bacterial taxon that have diverged from a common ancestral organism (4, 24). Recent data on the Dryophthoridae endosymbiont phylogeny have estimated the divergence between Sitophilus endosymbionts and Sodalis to be <25 million years (C. Lefevre, personal communication). The fact that symbionts related to Sodalis and SOPE are found in other insect taxa suggests that a progenitor had the ability to enter into relations with a wide range of hosts, perhaps as an insect pathogen (15). With time, pathogenic effects may have been attenuated while functions and/or products important for the evolutionary success of both partners were retained. It will be interesting to identify those genes and their functions that are present in symbionts but absent from E. coli. Equally intriguing is whether these genes were present in the common ancestor of enteric bacteria or were acquired horizontally after divergence.
Sodalis, known as the secondary symbiont of the tsetse fly, is transmitted vertically to intrauterine larvae through the mother's milk (14, 31) and has both intra- and extracellular localization in the midgut, muscle, salivary glands, fat body, and hemolymph (14). Although Sodalis maintains an overall similar tissue tropism between species, its density appears to vary among tsetse fly species (14). In contrast to Sodalis, SOPE has a strict intracellular localization within specialized structures called bacteriomes (25). Bacteriomes differentiate early during insect embryonic development and remain attached to the intestine at the junction of the foregut and midgut during the four larval stages. In young adults, bacteriomes are found in the mesenteric ceca of the intestine. However, in 2- to 3-week-old individuals, bacteriomes disappear, remaining only in the female ovaries from where bacteria are maternally transmitted to the offspring (25). SOPE has been shown to supply weevil larvae with vitamins such as pantothenic acid, riboflavin, and biotin (25) and is a source of amino acids such as phenylalanine and proline (19). The symbiont also interacts with mitochondrial oxidative phosphorylation by increasing mitochondrial enzymatic activity (23), thus extending the flight ability and other energy-dependent activities of its host (21, 25, 39). Yet the symbiosis can be disrupted without causing whole host population lethality (39).
Similarities in genomic makeup between SOPE and Sodalis are still highly evident, with retention of many of the same genetic components involved in housekeeping functions such as translation and posttranslational modification, cell processes, transcription, and nucleotide biosynthesis. The cellular machinery involved in these processes appears to be conserved in mutualists and commensals, in contrast to parasitic microbes, which exploit their hosts for many of these resources. Furthermore, the greater conservation of translational, as opposed to transcriptional, processes supports stronger genetic regulation at the translational level for Sodalis and SOPE.
Competence in complete nucleotide biosynthesis and metabolism, amino acid biosynthesis and metabolism, energy metabolism, and cofactor biosynthesis suggests that SOPE and Sodalis may be approaching mutualism in their symbiotic associations. The retention of genes involved in regulatory functions, such as sigma factors, supports their recent symbiotic establishment. Obligate mutualists that live intracellularly, sheltered from environmental fluctuations, within their hosts have lost such genes with associated regulatory functions due to lack of need (3, 35, 44).
Despite the vast similarities in detected ORFs for SOPE and Sodalis, specialized modifications towards host environment, particularly in metabolic functions, appear to have occurred that are reminiscent of the genome tailoring of ancient symbionts. The significantly greater number of energy metabolism and fatty acid metabolism genes detected for SOPE than for Sodalis may be due to the restricted cereal diet of SOPE's weevil host. Lipids, prominent in the tsetse fly blood meal, provide more energy than plant carbohydrates because they are in a more reduced form (53). The erosion of fatty acid metabolism pathways in Sodalis might reflect the natural abundance of such products in the host environment. SOPE, with a greater capacity for carbon catabolism, is capable of metabolizing plant sugars in the diet of its host, which is composed of as much as 70% starch but is very low in lipid components (http://www.nal.usda.gov). The purging of genes involved in plant sugar metabolism from the Sodalis genome can be interpreted as an adaptive response to tsetse fly nutritional behavior. Since tsetse flies do not feed on plant material but on blood, which is low in carbohydrates and rich in simple sugars such as glucose and trehalose, Sodalis has lost unnecessary pathways that catabolize plant sugars such as starch.
The higher number of unique ORFs corresponding to cellular processes (perhaps adaptations necessary for intra- and extracellular localization), central intermediate metabolism, and amino acid biosynthesis and metabolism for Sodalis and to carbon compound catabolism, cell structure, energy metabolism, and fatty acid metabolism and transport for SOPE may be indicative of differences in genome retention since their last common ancestor and suggests bacterial domestication by the host. These differences in retention will influence what is further maintained in the SOPE and Sodalis genomes as relations with their hosts further evolve.
Utilizing the intensively studied, 250-million-year-old Buchnera-aphid association (38) as a model, some authors have suggested that significant changes in microbial genome structure transpire early upon symbiotic establishment (36). Large deletions, which typically span multiple genes, occur during the initiation of symbiosis, resulting not from selection for DNA loss but from decreased selection to maintain locus functionality (36, 49). Such massive genome reduction early upon symbiosis is supported by near perfect gene order conservation in the whole-genome sequences of three divergent strains of Buchnera (44, 47, 49). It has been inferred that the content of these early deletions determines the degree of selection on remaining loci and ultimately governs the eventual genetic inventory of the reduced genome. Only later, at an exponentially decreasing pace, are some genes eliminated through inactivation and gradual erosion (5, 45). These losses, resulting in the reduction of microbial functional flexibility, are expected to restrict the evolutionary options for the microbes, ultimately harnessing them into specific symbiotic lifestyles. Such drastic genome erosion may enable the recruitment of newer symbiotic associations to replace functions lost in the ancient obligate mutualists and potentially allow hosts to exploit new niches.
Despite their close taxonomic relatedness, the genomes of SOPE and Sodalis have been shaped differentially due to adaptations to their unique host environments. As a result, these organisms have diverged extensively and appear to be tailored to subsist on different metabolites provided in their host diets. These findings are of relevance for our applied genetic engineering studies, in which we explore the use of symbionts to block transmission of pathogens in their insect hosts. Our results suggest that the symbionts described here are anchored tightly to host biology through restricted metabolic capabilities and therefore may not be able to undergo horizontal transmission and establishment in distant insect taxa.
We thank Trevor W. Rudy for computer programming assistance, John Brownstein for support with statistical analyses, and members of the Aksoy lab and R.V.M.R.'s doctoral committee for enlightening discussions.
This work was funded by NIH/NIAID grant AI-34033 to S.A. R.V.M.R. is the recipient of NIH training grant T32AI07404 and a CDC fellowship for training in vector-borne infectious diseases, T01/CCT122306-01.