Lateral gene transfer (LGT) between bacteria constitutes a strong force in prokaryote evolution, transforming the hierarchical tree of life into a network of relationships between species. In contrast, only a few cases of LGT from eukaryotes to prokaryotes have been reported so far. The distal animal intestine is predominantly a bacterial ecosystem, supplying the host with energy from dietary polysaccharides through carbohydrate-active enzymes absent from the host genome. It has been suggested that LGT is particularly important for the evolution of the human microbiota. Here we show evidence for the first eukaryotic gene identified in multiple gut bacterial genomes. In the genome sequences of several gut bacteria, we found a typically eukaryotic glycoside hydrolase necessary for starch breakdown in plants. The distribution of this gene is patchy in gut bacteria, and it is otherwise detected only in a few environmental bacteria.
We speculate that the transfer of this gene to gut bacteria occurred through a sequence of two key LGT events: first, an original eukaryotic gene was transferred, probably from Archaeplastida, to environmental bacteria specialized in plant polysaccharide degradation; second, the gene was transferred from the environmental bacteria to gut microbes.
LGT allows for rapid transfer of genes under strong selection and represents one way that members of the microbiota could share metabolic capabilities. It has been shown that LGT is particularly important for the evolution of the microbiota in the human distal gastrointestinal tract.1 Polysaccharide utilization is an important activity in the lower intestine, and the ability of resident bacteria to utilize different polysaccharides provides a distinct competitive advantage.2 Recently, it has been demonstrated that Bacteroides have acquired new useful genes from environmental microbes.3 Bacteroides are the most frequent bacteria in the human gut microbiota, and harvest a vast array of dietary and host-derived glycans via outer membrane protein complexes. Genes encoding these proteins are clustered together in similarly patterned polysaccharide utilization loci (PULs). Notably, Bacteroides thetaiotaomicron, a prototypic Bacteroides, possesses 88 PULs, differing in polysaccharide specificity.2 Intriguingly, we have found in the genomes of various species of Bacteroidales an isolated glycoside hydrolase coding gene that belongs to CAZy family GH77. The top-scoring BLASTp hit among characterized proteins was Arabidopsis thaliana DPE2 (Disproportionating Enzyme 2). Plant DPE2 proteins are modular glycoside hydrolases consisting of a GH77 domain interrupted by an insertion of ~150 amino acids and two carbohydrate-binding modules (CBM20) at the N-terminal extension (Fig. 1). The DPE2 gene codes for a 4-α-glucanotransferase (EC 2.4.1.25) essential for maltose metabolism during the conversion of transitory starch to sucrose in the cytosol of plant cells.4 Previous phylogenetic analyses support the eukaryotic origin of DPE2-like coding genes.5 Moreover, DPE2-like genes are present only in Bacteroidales and are absent from all other groups of Bacteria, including Cyanobacteria. 
This further argues for an origin in the nuclear genomes of eukaryotes and not from past endosymbiosis and transfer from an ancestral plastid.
To elucidate the ancestry and evolutionary history of bacterial DPE2-like proteins, we constructed phylogenetic trees. DPE2-like genes were identified in a few eukaryotic taxa and in a small group of gut and environmental bacteria (Table 1). The phylogenetic analysis showed that bacterial DPE2-like enzymes form a highly supported group branching with their eukaryotic orthologs (Fig. 2). The cluster of bacterial enzymes was positioned inside the eukaryotic cluster. Interestingly, DPE2-like enzymes of environmental bacteria were positioned at the base of the bacterial cluster. The tree topology suggests that one LGT event occurred from Eukaryota, probably Archaeplastida, to an ancient environmental bacterium similar to Haliscomenobacter hydrossis or Paludibacter propionicigenes. H. hydrossis is sporadically observed in aeration tanks of sewage treatment plants and in paper industry wastewater treatment plants, and P. propionicigenes is a fermentative anaerobe isolated from plant residue and rice roots in irrigated rice-field soil. Both bacteria are specialized in the degradation of plant polysaccharides and could potentially come into contact in a common environment. Later, another LGT event occurred, from these environmental bacteria to gut Bacteroidales, possibly with food as the vector. Interestingly, we identified only one bacterial species that possesses two GH77-family genes. Succinatimonas hippei, a human gut gammaproteobacterium, has a DPE2-like gene (Fig. 2) as well as a prototypic bacterial GH77 (Fig. 1). We propose that S. hippei acquired the DPE2-like gene recently from Bacteroidales inside the human gut, and has also retained the original bacterial GH77 gene. An alternative scenario cannot be excluded: the gut community may have acquired the DPE2 gene directly from plants or another eukaryote dwelling in the animal intestine, and environmental bacteria may later have acquired it by LGT from gut bacteria released into the environment. 
However, the branching order in the topology we have reconstructed places environmental bacteria in a basal position and thus tends to favor a first transfer to environmental bacteria, followed by a transfer to gut bacteria.
In eukaryotes, we studied the conservation of intron positions. Two introns located between the two CBM20 modules are shared by the eukaryotic lineages, supporting the common origin of DPE2-like genes (Fig. S1). The possible eukaryotic progenitors are Rhodophyta DPE2 genes. Interestingly, Rhodophyta coding genes, including DPE2, are mostly intronless, which could facilitate the transfer of the eukaryotic gene to recipient bacteria.6,7 The phylogeny of the two CBM20 modules reflects the GH77 module phylogeny (Fig. 3), indicating their presence in the common ancestor of eukaryotic DPE2-like proteins.
Many complex plant polysaccharides are resistant to digestion due to either insolubility or lack of host-encoded hydrolytic enzymes. These carbohydrates are not absorbed in the upper gastrointestinal tract but serve as a major source of carbon and energy for the distal gut microbial community. These “nondigestible” dietary carbohydrate substrates include the so-called resistant starch fraction, plant cell wall material and oligosaccharides. Polysaccharide degradation is one of the core functions encoded by the human gut microbiota, and the ability to target these substrates resides in many different PULs.8 The starch utilization system (SUS) was the first PUL to be described. Although the SUS is essential for the growth of B. thetaiotaomicron on starch, SUS genes are not required for growth on maltose, a typical byproduct of starch breakdown9 (and other publications by Salyers and co-workers). Furthermore, none of the SUS enzymes has so far been shown to degrade maltose. Here we show that the gut bacterial DPE2-like gene was most likely acquired by an LGT event of eukaryotic origin, and we suspect that it is involved in maltose degradation. MalQ, another GH77 enzyme, is indispensable to the maltose regulon of Escherichia coli, transferring maltosyl and longer dextrinyl residues onto glucose, maltose and longer maltodextrins. This operon is absent from the genomes of gut bacteria belonging to the order Bacteroidales. It has been shown that E. coli mutants lacking MalQ amylomaltase can no longer grow on maltose, but this ability can be restored by A. thaliana DPE2.10 It is possible that DPE2 represents a substantial competitive advantage to the host and the microbiota, providing gut bacteria with the capacity to degrade resistant starch byproducts. 
We speculate that the LGT event leading to the acquisition of DPE2 by the gut microbiota was crucial in establishing the host-bacterial relationship with animals during evolution, and that other similar gene transfers can be expected.
The putative GH77 sequences were identified using the CAZy annotation pipeline,11 and the DPE2 protein homologues were identified in several databases (DB). Protein sequence data were retrieved by BLASTp searches against the NCBI non-redundant database, EUpathDB (http://eupathdb.org/eupathdb/), Phytozome v7.0 (http://www.phytozome.net/) and the Galdieria sulphuraria DB (http://genomics.msu.edu/cgi-bin/galdieria/blast.cgi) using the Bacteroidales DPE2-like sequences and Arabidopsis thaliana DPE2 (AT2G40840) as queries. Only sequences that aligned over the entire length of the GH77 module at the protein level with an e-value no higher than 1e-50 were kept for multiple sequence alignment, in order to retain as many informative sites as possible for phylogenetic reconstruction. Sequences from the same species that were 100% identical to one another or entirely included in a longer one were eliminated to remove redundancy. Few Rhodophyte sequences are publicly available. For this reason, three additional fragmentary Rhodophyte DPE2 sequences were included in the analysis: two Porphyridium cruentum and one Calliarthron tuberculosum protein sequences. These sequences were obtained from a non-public database hosted at Rutgers University, which is part of an ongoing sequencing project. DPE2-like sequences were aligned using MUSCLE with default parameters12 and multiple sequence alignments were manually examined using JALVIEW.13,14
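The filtering and redundancy-removal rules described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Hit` record, its field names and the `gh77_coverage` value (fraction of the GH77 module covered by the alignment) are hypothetical, and the hits are assumed to have been parsed from BLASTp output beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one BLASTp hit (field names are assumptions)."""
    species: str
    seq_id: str
    sequence: str         # protein sequence of the hit
    evalue: float         # BLASTp e-value
    gh77_coverage: float  # fraction of the GH77 module aligned, 0.0 to 1.0


def filter_hits(hits, max_evalue=1e-50, min_coverage=1.0):
    """Keep hits covering the whole GH77 module with e-value <= max_evalue."""
    return [
        h for h in hits
        if h.evalue <= max_evalue and h.gh77_coverage >= min_coverage
    ]


def remove_redundancy(hits):
    """Within each species, drop sequences that are identical to, or entirely
    contained in, a sequence that has already been kept."""
    by_species = {}
    for h in hits:
        by_species.setdefault(h.species, []).append(h)

    kept_all = []
    for group in by_species.values():
        # Longest first, so shorter included fragments are compared
        # against the longer sequences that may contain them.
        ordered = sorted(group, key=lambda h: len(h.sequence), reverse=True)
        kept = []
        for h in ordered:
            if any(h.sequence in k.sequence for k in kept):
                continue  # identical or fully included: redundant
            kept.append(h)
        kept_all.extend(kept)
    return kept_all
```

Substring matching is only an approximation of "entirely included in a longer one"; it treats a fragment as redundant only when it matches the longer sequence exactly, with no mismatches.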
We performed phylogenetic analyses using two different approaches, Bayesian estimation and bootstrapped maximum likelihood, as described by Danchin.15 We rooted the tree using as outgroup four bacterial GH77 proteins that non-controversially belong to a different CAZy subfamily: YP_003248951.1 (Fibrobacter succinogenes ssp. succinogenes S85), YP_004699220.1 (Spirochaeta caldaria DSM 7334), CAN92536.1 (Sorangium cellulosum “So ce 56”) and ABS24534.1 (Anaeromyxobacter sp. Fw109-5). BLAST and HMM libraries are used as complementary comparison tools for family and subfamily assignment in the CAZy database. These bacterial-type GH77 sequences (GH77_2, Fig. 1) were chosen because they are the closest GH77 sequences that are not DPE2-like (GH77_3, Fig. S2). Bayesian phylogenetic reconstructions were done using the MrBayes software16 with a mixture of models, an estimated gamma distribution of rates of evolution and an estimation of the proportion of invariable sites. By default, 100,000 generations were run for each phylogeny reconstruction. If the average standard deviation of split frequencies was not below 0.05 after 100,000 generations, additional generations were run until convergence was reached (<0.05). Consensus trees and statistics were obtained after systematically discarding the first 25% of generated trees as burn-in. Posterior probability support values are reported for each node in Figure 2. To obtain support from a second independent method, we also performed phylogenetic analyses using maximum likelihood (ML) estimation with the RAxML software.17 We systematically ran 100 bootstrap replicates followed by an ML search for the best-scoring tree. For the DPE2 and CBM20 phylogenies we selected the WAG model of amino acid evolution because it returned the best posterior probability score in the corresponding Bayesian phylogenetic analysis. We used a model with four categories of estimated gamma rates of evolution as well as an estimate of the proportion of invariable sites. 
The overall topology of the ML consensus trees corresponds to that of the Bayesian trees and the values between parentheses in Figure 2 correspond to bootstrap values. Trees were generated using FigTree (http://tree.bio.ed.ac.uk/software/figtree/).
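The stopping rule used for the Bayesian runs (100,000 generations, extended until the average standard deviation of split frequencies drops below 0.05, then 25% burn-in) can be sketched as follows. This is an illustration of the control logic only, not a MrBayes interface: `run_chunk` is a hypothetical callable that advances the chains and returns the current value of the diagnostic, and the `max_generations` safety cap is an assumption that is not part of the described procedure.

```python
def run_until_converged(run_chunk, chunk=100_000, threshold=0.05,
                        max_generations=2_000_000):
    """Run chunks of generations until the split-frequency diagnostic
    falls below `threshold`.

    run_chunk(n) is a hypothetical callable: it advances the analysis by
    n generations and returns the average standard deviation of split
    frequencies. max_generations is an assumed safety cap.
    """
    total = 0
    while total < max_generations:
        asdsf = run_chunk(chunk)
        total += chunk
        if asdsf < threshold:
            return total, asdsf
    raise RuntimeError("no convergence within max_generations")


def discard_burn_in(samples, fraction=0.25):
    """Drop the first `fraction` of sampled trees before summarizing."""
    n_burn = int(len(samples) * fraction)
    return samples[n_burn:]
```

In practice `run_chunk` would wrap whatever mechanism is used to continue the analysis and read the diagnostic from its output; that wrapper is deliberately left out here.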
Exon-intron structures were predicted based on alignment of the corresponding protein sequences with genome assemblies, using the online tool WEBSCIPIO.18 Positions of introns were reported on the protein sequences by inserting the string “XXXXX” at the junction between two consecutive exons. We generated multiple alignments to determine the conservation of intron positions between species and clades.
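The intron-marking step can be illustrated with a short Python sketch. It assumes, for simplicity, that each exon's contribution is already expressed as a whole number of amino acids (intron phase is ignored); the function name and the `exon_lengths` argument are hypothetical and do not correspond to WEBSCIPIO output fields.

```python
def mark_intron_positions(protein, exon_lengths, marker="XXXXX"):
    """Insert `marker` at every junction between consecutive exons.

    exon_lengths: number of amino acids encoded by each exon, in order.
    Assumption: the lengths sum to the full protein length.
    """
    if sum(exon_lengths) != len(protein):
        raise ValueError("exon lengths do not add up to the protein length")

    pieces = []
    start = 0
    for length in exon_lengths:
        pieces.append(protein[start:start + length])
        start += length
    # Joining the exon-encoded pieces places one marker per intron.
    return marker.join(pieces)
```

Once the marked sequences are aligned, a shared intron position shows up as a column block of “XXXXX” markers across species.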
We thank D. Bhattacharya for C. tuberculosum and P. cruentum sequences, and A. Weber for the G. sulphuraria sequence.
No potential conflicts of interest were disclosed.
Previously published online: www.landesbioscience.com/journals/mge/article/20375