Related Articles

1.  The Last Universal Common Ancestor: emergence, constitution and genetic legacy of an elusive forerunner 
Biology Direct  2008;3:29.
Since the reclassification of all life forms in three Domains (Archaea, Bacteria, Eukarya), the identity of their alleged forerunner (Last Universal Common Ancestor or LUCA) has been the subject of extensive controversies: progenote or already complex organism, prokaryote or protoeukaryote, thermophile or mesophile, product of a protracted progression from simple replicators to complex cells or born in the cradle of "catalytically closed" entities? We present a critical survey of the topic and suggest a scenario.
LUCA does not appear to have been a simple, primitive, hyperthermophilic prokaryote but rather a complex community of protoeukaryotes with an RNA genome, adapted to a broad range of moderate temperatures, genetically redundant, morphologically and metabolically diverse. LUCA's genetic redundancy predicts loss of paralogous gene copies in divergent lineages to be a significant source of phylogenetic anomalies, i.e. instances where a protein tree departs from the SSU-rRNA genealogy; consequently, horizontal gene transfer may not have the rampant character assumed by many. Examination of membrane lipids suggests that LUCA had sn1,2 ester fatty acid lipids from which Archaea emerged from the outset as thermophilic by "thermoreduction," with a new type of membrane, composed of sn2,3 ether isoprenoid lipids; this occurred without major enzymatic reconversion. Bacteria emerged by reductive evolution from LUCA and some lineages further acquired extreme thermophily by convergent evolution. This scenario is compatible with the hypothesis that the RNA to DNA transition resulted from different viral invasions as proposed by Forterre. Beyond the controversy opposing "replication first" to "metabolism first", the predictive arguments of theories on "catalytic closure" or "compositional heredity" heavily weigh in favour of LUCA's ancestors having emerged as complex, self-replicating entities from which a genetic code arose under natural selection.
Life was born complex and the LUCA displayed that heritage. It had the "body" of a mesophilic eukaryote well before maturing by endosymbiosis into an organism adapted to an atmosphere rich in oxygen. Abundant indications suggest reductive evolution of this complex and heterogeneous entity towards the "prokaryotic" Domains Archaea and Bacteria. The word "prokaryote" should be abandoned because it is epistemologically unsound.
This article was reviewed by Anthony Poole, Patrick Forterre, and Nicolas Galtier.
PMCID: PMC2478661  PMID: 18613974
2.  Natural history of the E1-like superfamily: implication for adenylation, sulfur transfer and ubiquitin conjugation 
Proteins  2009;75(4):895-910.
The E1-like superfamily is central to ubiquitin (Ub) conjugation, biosynthesis of cysteine, thiamine and MoCo and several secondary metabolites. Yet, its functional diversity and evolutionary history are not well understood. We develop a natural classification of this superfamily and use it to decipher the major adaptive trends occurring in the evolution of the E1-like superfamily. Within the Rossmann fold, E1-like proteins are closest to NAD(P)/FAD-dependent dehydrogenases and S-AdoMet-dependent methyltransferases. Hence, their phosphotransfer activity is an independent catalytic “invention” with respect to such activities seen in other Rossmannoid folds. Sequence and structure analysis reveals a striking diversity of residues and structures involved in adenylation, sulfotransfer and substrate-binding between different E1-like families, allowing us to predict previously uncharacterized functional adaptations. E1-like proteins are fused to several previously undetected domains, such as a predicted sulfur transfer domain containing a novel superfamily of the TATA-binding protein fold, different types of catalytic domains, a novel winged helix-turn-helix domain and potential adaptor domains related to Ub conjugation. Based on these fusions we develop a generalized model for the linking of E1 catalyzed adenylation/thiolation with further down-stream reactions. This is likely to involve a dynamic interplay between the E1 active sites and diverse fused C-terminal domains. We also predict participation of E1-like domains in previously uncharacterized bacterial secondary metabolism pathways, new cysteine biosynthesis systems, such as those associated with archaeal O-phosphoseryl tRNA, metal-sulfur cluster assembly (e.g. in nitrogen fixation) and Ub-conjugation.
Evolutionary reconstructions suggest that the last universal common ancestor (LUCA) contained a single E1-like domain possessing both phosphotransfer and thiolating activities and participating in multiple sulfotransfer reactions. The E1-like superfamily subsequently expanded to include 26 families clustering into three major radiations. These are broadly involved in ubiquitin activation, cofactor and cysteine biosynthesis, and biosynthesis of secondary metabolites. In light of this we present evidence that in eukaryotes other E1-like enzymes, such as Urm1, were independently recruited for Ubl conjugation, probably functioning without conventional E2-like enzymes.
PMCID: PMC2732565  PMID: 19089947
3.  A Bioenergetic Basis for Membrane Divergence in Archaea and Bacteria 
PLoS Biology  2014;12(8):e1001926.
The deepest split in the tree of life is between archaea and bacteria. We show this split can be explained by the late evolution of impermeable membranes, for energetic reasons.
Membrane bioenergetics are universal, yet the phospholipid membranes of archaea and bacteria—the deepest branches in the tree of life—are fundamentally different. This deep divergence in membrane chemistry is reflected in other stark differences between the two domains, including ion pumping and DNA replication. We resolve this paradox by considering the energy requirements of the last universal common ancestor (LUCA). We develop a mathematical model based on the premise that LUCA depended on natural proton gradients. Our analysis shows that such gradients can power carbon and energy metabolism, but only in leaky cells with a proton permeability equivalent to fatty acid vesicles. Membranes with lower permeability (equivalent to modern phospholipids) collapse free-energy availability, precluding exploitation of natural gradients. Pumping protons across leaky membranes offers no advantage, even when permeability is decreased 1,000-fold. We hypothesize that a sodium-proton antiporter (SPAP) provided the first step towards modern membranes. SPAP increases the free energy available from natural proton gradients by ∼60%, enabling survival in 50-fold lower gradients, thereby facilitating ecological spread and divergence. Critically, SPAP also provides a steadily amplifying advantage to proton pumping as membrane permeability falls, for the first time favoring the evolution of ion-tight phospholipid membranes. The phospholipids of archaea and bacteria incorporate different stereoisomers of glycerol phosphate. We conclude that the enzymes involved took these alternatives by chance in independent populations that had already evolved distinct ion pumps. Our model offers a quantitatively robust explanation for why membrane bioenergetics are universal, yet ion pumps and phospholipid membranes arose later and independently in separate populations. 
Our findings elucidate the paradox that archaea and bacteria share DNA transcription, ribosomal translation, and ATP synthase, yet differ in equally fundamental traits that depend on the membrane, including DNA replication.
Author Summary
The archaea and bacteria are the deepest branches of the tree of life. The two groups are similar in morphology and share some fundamental biochemistry, including the genetic code, but the differences between them are stark, and rank among the great unsolved problems in biology. The composition of cell membranes and walls is utterly different in the two groups, while the mechanism of DNA replication seems unrelated. We address a specific paradox, giving new insight into this deep evolutionary split: membrane bioenergetics are universal, yet the membranes themselves are not. We resolve this paradox by considering the energetics of a hypothetical last universal common ancestor (LUCA) in geochemically sustained proton gradients. Using a quantitative model, we show that LUCA could have used proton gradients to drive carbon and energy metabolism, but only if the membranes were leaky. This requirement precluded ion pumping and the early evolution of phospholipid membranes. We constrain a pathway leading from LUCA to the deep divergence of archaea and bacteria on the basis of incremental increases in free-energy availability. We support our inferences with comparative biochemistry and phylogenetics, and show why the late evolution of modern membranes forced divergence in other traits such as DNA replication.
PMCID: PMC4130499  PMID: 25116890
4.  Phylogenomic analysis of glycogen branching and debranching enzymatic duo 
Branched polymers of glucose are universally used for energy storage in cells, taking the form of glycogen in animals, fungi, Bacteria, and Archaea, and of amylopectin in plants. Some enzymes involved in glycogen and amylopectin metabolism are similarly conserved in all forms of life, but some, interestingly, are not. In this paper we focus on the phylogeny of glycogen branching and debranching enzymes, which respectively introduce and remove the α(1–6) bonds that give glucose polymers their unique branching structure.
We performed a large-scale phylogenomic analysis of branching and debranching enzymes in over 400 completely sequenced genomes, including more than 200 from eukaryotes. We show that branching and debranching enzymes can be found in all kingdoms of life, including all major groups of eukaryotes, and thus were likely to have been present in the last universal common ancestor (LUCA) but have been lost in seemingly random fashion in numerous single-celled eukaryotes. We also show how animal branching and debranching enzymes evolved from their LUCA ancestors by acquiring additional domains. Furthermore, we show that enzymes commonly perceived as orthologous, such as human branching enzyme GBE1 and E. coli branching enzyme GlgB, are in fact related by a gene duplication and consequently paralogous.
Despite being usually associated with animal liver glycogen and plant starch, energy storage in the form of branched glucose polymers is clearly an ancient process and has probably been present in the last universal common ancestor of all present life. The evolution of the enzymes enabling this form of energy storage is more complex than previously thought and illustrates the need for explicit phylogenomic analysis in the study of even seemingly “simple” metabolic enzymes. Patterns of conservation in the evolution of the glycogen/starch branching and debranching enzymes hint at some as yet unknown mechanisms, as mutations disrupting these patterns lead to a variety of genetic diseases in humans and other mammals.
PMCID: PMC4236520  PMID: 25148856
Glycogen; Starch; Branching; Debranching; Glycogen storage disease; AGL; GBE1; GlgB; GlgX; TreX
5.  A comprehensive evolutionary classification of proteins encoded in complete eukaryotic genomes 
Genome Biology  2004;5(2):R7.
We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs from seven eukaryotic genomes. The analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes.
Sequencing the genomes of multiple, taxonomically diverse eukaryotes enables in-depth comparative-genomic analysis which is expected to help in reconstructing ancestral eukaryotic genomes and major events in eukaryotic evolution and in making functional predictions for currently uncharacterized conserved genes.
We examined functional and evolutionary patterns in the recently constructed set of 5,873 clusters of predicted orthologs (eukaryotic orthologous groups or KOGs) from seven eukaryotic genomes: Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Arabidopsis thaliana, Saccharomyces cerevisiae, Schizosaccharomyces pombe and Encephalitozoon cuniculi. Conservation of KOGs through the phyletic range of eukaryotes strongly correlates with their functions and with the effect of gene knockout on the organism's viability. The approximately 40% of KOGs that are represented in six or seven species are enriched in proteins responsible for housekeeping functions, particularly translation and RNA processing. These conserved KOGs are often essential for survival and might approximate the minimal set of essential eukaryotic genes. The 131 single-member, pan-eukaryotic KOGs we identified were examined in detail. For around 20 that remained uncharacterized, functions were predicted by in-depth sequence analysis and examination of genomic context. Nearly all these proteins are subunits of known or predicted multiprotein complexes, in agreement with the balance hypothesis of evolution of gene copy number. Other KOGs show a variety of phyletic patterns, which points to major contributions of lineage-specific gene loss and the 'invention' of genes new to eukaryotic evolution. Examination of the sets of KOGs lost in individual lineages reveals co-elimination of functionally connected genes. Parsimonious scenarios of eukaryotic genome evolution and gene sets for ancestral eukaryotic forms were reconstructed. The gene set of the last common ancestor of the crown group consists of 3,413 KOGs and largely includes proteins involved in genome replication and expression, and central metabolism. 
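The phyletic-pattern bookkeeping described above can be illustrated with a minimal sketch. It assumes each KOG is stored as the set of species in which it has a member; the three-letter species labels, the KOG identifiers and the toy data are hypothetical placeholders, not the published data set.

```python
from typing import Dict, Set

# Hypothetical short labels for the seven genomes named in the abstract.
SPECIES = ("Cel", "Dme", "Hsa", "Ath", "Sce", "Spo", "Ecu")


def phyletic_pattern(members: Set[str]) -> str:
    """Encode presence (1) or absence (0) of a KOG in each species, in SPECIES order."""
    return "".join("1" if s in members else "0" for s in SPECIES)


def fraction_widely_conserved(kogs: Dict[str, Set[str]], min_species: int = 6) -> float:
    """Fraction of KOGs represented in at least `min_species` of the genomes."""
    if not kogs:
        return 0.0
    known = set(SPECIES)
    hits = sum(1 for members in kogs.values() if len(members & known) >= min_species)
    return hits / len(kogs)


# Toy input with made-up identifiers, for illustration only.
toy_kogs = {
    "KOG_A": set(SPECIES),
    "KOG_B": set(SPECIES) - {"Ecu"},
    "KOG_C": {"Cel", "Hsa"},
    "KOG_D": {"Ath"},
    "KOG_E": {"Sce", "Spo", "Ecu"},
}
```

With real data, `fraction_widely_conserved` would correspond to the share of KOGs represented in six or seven species, and grouping KOGs by `phyletic_pattern` would expose candidate lineage-specific losses.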
Only 44% of the KOGs, mostly from the reconstructed gene set of the last common ancestor of the crown group, have detectable homologs in prokaryotes; the remainder apparently evolved via duplication with divergence and invention of new genes.
The KOG analysis reveals a conserved core of largely essential eukaryotic genes as well as major diversification and innovation associated with evolution of eukaryotic genomes. The results provide quantitative support for major trends of eukaryotic evolution noticed previously at the qualitative level and a basis for detailed reconstruction of evolution of eukaryotic genomes and biology of ancestral forms.
PMCID: PMC395751  PMID: 14759257
6.  Small but versatile: the extraordinary functional and structural diversity of the β-grasp fold 
Biology Direct  2007;2:18.
The β-grasp fold (β-GF), prototyped by ubiquitin (UB), has been recruited for a strikingly diverse range of biochemical functions. These functions include providing a scaffold for different enzymatic active sites (e.g. NUDIX phosphohydrolases) and iron-sulfur clusters, RNA-soluble-ligand and co-factor-binding, sulfur transfer, adaptor functions in signaling, assembly of macromolecular complexes and post-translational protein modification. To understand the basis for the functional versatility of this small fold we undertook a comprehensive sequence-structure analysis of the fold and developed a natural classification for its members.
As a result we were able to define the core distinguishing features of the fold and numerous elaborations, including several previously unrecognized variants. Systematic analysis of all known interactions of the fold showed that its manifold functional abilities arise primarily from the prominent β-sheet, which provides an exposed surface for diverse interactions or additionally, by forming open barrel-like structures. We show that in the β-GF both enzymatic activities and the binding of diverse co-factors (e.g. molybdopterin) have independently evolved on at least three occasions each, and iron-sulfur-cluster-binding on at least two independent occasions. Our analysis identified multiple previously unknown large monophyletic assemblages within the β-GF, including one which unifies versions found in the fasciclin-1 superfamily, the ribosomal protein L25, the phosphoribosyl AMP cyclohydrolase (HisI) and glutamine synthetase. We also uncovered several new groups of β-GF domains including a domain found in bacterial flagellar and fimbrial assembly components, and 5 new UB-like domains in the eukaryotes.
Evolutionary reconstruction indicates that the β-GF had differentiated into at least 7 distinct lineages by the time of the last universal common ancestor of all extant organisms, encompassing much of the structural diversity observed in extant versions of the fold. The earliest β-GF members were probably involved in RNA metabolism and subsequently radiated into various functional niches. Most of the structural diversification occurred in the prokaryotes, whereas the eukaryotic phase was mainly marked by a specific expansion of the ubiquitin-like β-GF members. The eukaryotic UB superfamily diversified into at least 67 distinct families, of which at least 19–20 families were already present in the eukaryotic common ancestor, including several protein and one lipid conjugated forms. Another key aspect of the eukaryotic phase of evolution of the β-GF was the dramatic increase in domain architectural complexity of proteins related to the expansion of UB-like domains in numerous adaptor roles.
This article was reviewed by Igor Zhulin, Arcady Mushegian and Frank Eisenhaber.
PMCID: PMC1949818  PMID: 17605815
7.  What Does Virus Evolution Tell Us about Virus Origins?
Journal of Virology  2011;85(11):5247-5251.
Despite recent advances in our understanding of diverse aspects of virus evolution, particularly on the epidemiological scale, revealing the ultimate origins of viruses has proven to be a more intractable problem. Herein, I review some current ideas on the evolutionary origins of viruses and assess how well these theories accord with what we know about the evolution of contemporary viruses. I note the growing evidence for the theory that viruses arose before the last universal cellular ancestor (LUCA). This ancient origin theory is supported by the presence of capsid architectures that are conserved among diverse RNA and DNA viruses and by the strongly inverse relationship between genome size and mutation rate across all replication systems, such that pre-LUCA genomes were probably both small and highly error prone and hence RNA virus-like. I also highlight the advances that are needed to come to a better understanding of virus origins, most notably the ability to accurately infer deep evolutionary history from the phylogenetic analysis of conserved protein structures.
PMCID: PMC3094976  PMID: 21450811
8.  Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases 
The eukaryotic RNA-dependent RNA polymerase (RDRP) is involved in the amplification of regulatory microRNAs during post-transcriptional gene silencing. This enzyme is highly conserved in most eukaryotes but is missing in archaea and bacteria. No evolutionary relationship between RDRP and other polymerases has been reported so far, hence the origin of this eukaryote-specific polymerase remains a mystery.
Using extensive sequence profile searches, we identified bacteriophage homologs of the eukaryotic RDRP. The comparison of the eukaryotic RDRP and their homologs from bacteriophages led to the delineation of the conserved portion of these enzymes, which is predicted to harbor the catalytic site. Further, detailed sequence comparison, aided by examination of the crystal structure of the DNA-dependent RNA polymerase (DDRP), showed that the RDRP and the β' subunit of DDRP (and its orthologs in archaea and eukaryotes) contain a conserved double-psi β-barrel (DPBB) domain. This DPBB domain contains the signature motif DbDGD (b is a bulky residue), which is conserved in all RDRPs and DDRPs and contributes to catalysis via a coordinated divalent cation. Apart from the DPBB domain, no similarity was detected between RDRP and DDRP, which leaves open two scenarios for the origin of RDRP: i) RDRP evolved at the onset of the evolution of eukaryotes via a duplication of the DDRP β' subunit followed by dramatic divergence that obliterated the sequence similarity outside the core catalytic domain and ii) the primordial RDRP, which consisted primarily of the DPBB domain, evolved from a common ancestor with the DDRP at a very early stage of evolution, during the RNA world era. The latter hypothesis implies that RDRP had been subsequently eliminated from cellular life forms and might have been reintroduced into the eukaryotic genomes through a bacteriophage. Sequence and structure analysis of the DDRP led to further insights into the evolution of RNA polymerases. In addition to the β' subunit, β subunit of DDRP also contains a DPBB domain, which is, however, distorted by large inserts and does not harbor a counterpart of the DbDGD motif. The DPBB domains of the two DDRP subunits together form the catalytic cleft, with the domain from the β' subunit supplying the metal-coordinating DbDGD motif and the one from the β subunit providing two lysine residues involved in catalysis. 
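As an illustration of how the DbDGD signature could be located in a protein sequence, here is a minimal sketch. The abstract only says that "b" is a bulky residue; the particular residue set used below (F, Y, W, L, I, M, V) is an assumption, as are the function name and the toy sequence fragment.

```python
import re
from typing import List, Tuple

# Assumption: which amino acids count as "bulky" is not specified in the
# abstract; this set is a placeholder chosen for illustration.
BULKY = "FYWLIMV"

# A lookahead is used so that overlapping occurrences are also reported.
MOTIF = re.compile(rf"(?=(D[{BULKY}]DGD))")


def find_dbdgd(seq: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched text) for each DbDGD occurrence."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF.finditer(seq)]


# Toy fragment, for illustration only.
print(find_dbdgd("MKNADFDGDQMAVH"))
```

A plain pattern match like this only flags candidates; the analysis in the abstract relied on sequence profile searches and structural comparison rather than a bare motif scan.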
Given that the two DPBB domains of DDRP contribute completely different sets of active residues to the catalytic center, it is hypothesized that the ultimate ancestor of RNA polymerases functioned as a homodimer of a generic, RNA-binding DPBB domain. This ancestral protein probably did not have catalytic activity and served as a cofactor for a ribozyme RNA polymerase. Subsequent evolution of DDRP and RDRP involved accretion of distinct sets of additional domains. In the DDRPs, these included an RNA-binding Zn-ribbon, an AT-hook-like module and a sandwich-barrel hybrid motif (SBHM) domain. Further, lineage-specific accretion of SBHM domains and other, DDRP-specific domains is observed in bacterial DDRPs. In contrast, the orthologs of the β' subunit in archaea and eukaryotes contain a four-stranded α + β domain that is shared with the α-subunit of bacterial DDRP, eukaryotic DDRP subunit RBP11, translation factor eIF1 and type II topoisomerases. The additional domains of the RDRPs remain to be characterized.
Eukaryotic RNA-dependent RNA polymerases share the catalytic double-psi β-barrel domain, containing a signature metal-coordinating motif, with the universally conserved β' subunit of DNA-dependent RNA polymerases. Beyond this core catalytic domain, the two classes of RNA polymerases do not have common domains, suggesting early divergence from a common ancestor, with subsequent independent domain accretion. The β-subunit of DDRP contains another, highly diverged DPBB domain. The presence of two distinct DPBB domains in two subunits of DDRP is compatible with the hypothesis that the ultimate ancestor of RNA polymerases was an RNA-binding DPBB domain that had no catalytic activity but rather functioned as a homodimeric cofactor for a ribozyme polymerase.
PMCID: PMC151600  PMID: 12553882
9.  Analysis of two domains with novel RNA-processing activities throws light on the complex evolution of ribosomal RNA biogenesis 
Frontiers in Genetics  2014;5:424.
Ribosomal biogenesis has been extensively investigated, especially to identify the elusive nucleases and cofactors involved in the complex rRNA processing events in eukaryotes. Large-scale screens in yeast identified two biochemically uncharacterized proteins, TSR3 and TSR4, as being key players required for rRNA maturation. Using multiple computational approaches we identify the conserved domains comprising these proteins and establish sequence and structural features providing novel insights regarding their roles. TSR3 is unified with the DTW domain into a novel superfamily of predicted enzymatic domains, with the balance of the available evidence pointing toward an RNase role with the archaeo-eukaryotic TSR3 proteins processing rRNA and the bacterial versions potentially processing tRNA. TSR4, its other eukaryotic homologs PDCD2/rp-8, PDCD2L, Zfrp8, and trus, the predominantly bacterial DUF1963 proteins, and other uncharacterized proteins are unified into a new domain superfamily, which arose from an ancient duplication event of a strand-swapped, dimer-forming all-beta unit. We identify conserved features mediating protein-protein interactions (PPIs) and propose a potential chaperone-like function. While contextual evidence supports a conserved role in ribosome biogenesis for the eukaryotic TSR4-related proteins, there is no evidence for such a role for the bacterial versions. Whereas TSR3-related proteins can be traced to the last universal common ancestor (LUCA) with a well-supported archaeo-eukaryotic branch, TSR4-related proteins of eukaryotes are derived from within the bacterial radiation of this superfamily, with archaea entirely lacking them. This provides evidence for “systems admixture,” which followed the early endosymbiotic event, playing a key role in the emergence of the uniquely eukaryotic ribosome biogenesis process.
PMCID: PMC4275035  PMID: 25566315
rRNA; TSR4; TSR3; 20S; 18S rRNA; tRNA; DTW domain; endosymbiosis
10.  On the evolution of the tRNA-dependent amidotransferases, GatCAB and GatDE 
Journal of molecular biology  2008;377(3):831-844.
Glutaminyl-tRNA synthetase and asparaginyl-tRNA synthetase evolved from glutamyl-tRNA synthetase and aspartyl-tRNA synthetase, respectively, after the split in the last universal communal ancestor (LUCA). Glutaminyl-tRNAGln and asparaginyl-tRNAAsn were likely formed in LUCA by amidation of the mischarged species, glutamyl-tRNAGln and aspartyl-tRNAAsn, by tRNA-dependent amidotransferases as is still the case in most bacteria and all known archaea. The amidotransferase GatCAB is found in both domains of life while the heterodimeric amidotransferase, GatDE, is found only in Archaea. The GatB and GatE subunits belong to a unique protein family with Pet112 that is encoded in the nuclear genomes of numerous eukaryotes. GatE was thought to have evolved from GatB after the emergence of the modern lines of descent. Our phylogenetic analysis, though, places the split between GatE and GatB prior to the phylogenetic divide between Bacteria and Archaea and finds Pet112 to be of mitochondrial origin. In addition, GatD appears to have emerged prior to the bacterial-archaeal phylogenetic divide. Thus, while GatDE is an archaeal signature protein it likely was present in LUCA together with GatCAB. Archaea retained both amidotransferases while Bacteria emerged with only GatCAB. The presence of GatDE has favored a unique archaeal tRNAGln that may be preventing acquisition of glutaminyl-tRNA synthetase in Archaea. Archaeal GatCAB on the other hand has not favored a distinct tRNAAsn suggesting tRNAAsn recognition is not a major barrier to the retention of asparaginyl-tRNA synthetase in more Archaea.
PMCID: PMC2366055  PMID: 18279892
tRNA-dependent amidotransferase; GatCAB; GatDE; Pet112; LUCA
11.  Comparative genomics of proteins involved in RNA nucleocytoplasmic export 
The establishment of the nuclear membrane resulted in the physical separation of transcription and translation, and presented early eukaryotes with a formidable challenge: how to shuttle RNA from the nucleus to the locus of protein synthesis. In prokaryotes, mRNA is translated as it is being synthesized, whereas in eukaryotes mRNA is synthesized and processed in the nucleus, and it is then exported to the cytoplasm. In metazoa and fungi, the different RNA species are exported from the nucleus by specialized pathways. For example, tRNA is exported by exportin-t in a RanGTP-dependent fashion. By contrast, mRNAs are associated with ribonucleoproteins (RNPs) and exported by an essential shuttling complex (TAP-p15 in human, Mex67-mtr2 in yeast) that transports them through the nuclear pore. The different RNA export pathways appear to be well conserved among members of Opisthokonta, the eukaryotic supergroup that includes Fungi and Metazoa. However, it is not known whether RNA export in the other eukaryotic supergroups follows the same export routes as in opisthokonts.
Our objective was to reconstruct the evolutionary history of the different RNA export pathways across eukaryotes. To do so, we screened an array of eukaryotic genomes for the presence of homologs of the proteins involved in RNA export in Metazoa and Fungi, using human and yeast proteins as queries.
Our genomic comparisons indicate that the basic components of the RanGTP-dependent RNA pathways are conserved across eukaryotes, and thus we infer that these are traceable to the last eukaryotic common ancestor (LECA). On the other hand, several of the proteins involved in RanGTP-independent mRNA export pathways are less conserved, which would suggest that they represent innovations that appeared later in the evolution of eukaryotes.
Our analyses suggest that the LECA possessed the basic components of the different RNA export mechanisms found today in opisthokonts, and that these mechanisms became more specialized throughout eukaryotic evolution.
PMCID: PMC3032688  PMID: 21223572
12.  Metabolic network reconstruction of Chlamydomonas offers insight into light-driven algal metabolism 
A comprehensive genome-scale metabolic network of Chlamydomonas reinhardtii, including a detailed account of light-driven metabolism, is reconstructed and validated. The model provides a new resource for research of C. reinhardtii metabolism and in algal biotechnology.
The genome-scale metabolic network of Chlamydomonas reinhardtii (iRC1080) was reconstructed, accounting for >32% of the estimated metabolic genes encoded in the genome, and including extensive details of lipid metabolic pathways.
This is the first metabolic network to explicitly account for stoichiometry and wavelengths of metabolic photon usage, providing a new resource for research of C. reinhardtii metabolism and developments in algal biotechnology.
Metabolic functional annotation and the largest transcript verification of a metabolic network to date was performed, at least partially verifying >90% of the transcripts accounted for in iRC1080. Analysis of the network supports hypotheses concerning the evolution of latent lipid pathways in C. reinhardtii, including very long-chain polyunsaturated fatty acid and ceramide synthesis pathways.
A novel approach for modeling light-driven metabolism was developed that accounts for both light source intensity and spectral quality of emitted light. The constructs resulting from this approach, termed prism reactions, were shown to significantly improve the accuracy of model predictions, and their use was demonstrated for evaluation of light source efficiency and design.
Algae have garnered significant interest in recent years, especially for their potential application in biofuel production. The hallmark model eukaryotic microalga Chlamydomonas reinhardtii has been widely used to study photosynthesis, cell motility and phototaxis, cell wall biogenesis, and other fundamental cellular processes (Harris, 2001). Characterizing algal metabolism is key to engineering production strains and understanding photobiological phenomena. Based on extensive literature on C. reinhardtii metabolism, its genome sequence (Merchant et al, 2007), and gene functional annotation, we have reconstructed and experimentally validated the genome-scale metabolic network for this alga, iRC1080, the first network to account for detailed photon absorption permitting growth simulations under different light sources. iRC1080 accounts for 1080 genes, associated with 2190 reactions and 1068 unique metabolites and encompasses 83 subsystems distributed across 10 cellular compartments (Figure 1A). Its >32% coverage of estimated metabolic genes is a tremendous expansion over previous algal reconstructions (Boyle and Morgan, 2009; Manichaikul et al, 2009). The lipid metabolic pathways of iRC1080 are considerably expanded relative to existing networks, and chemical properties of all metabolites in these pathways are accounted for explicitly, providing sufficient detail to completely specify all individual molecular species: backbone molecule and stereochemical numbering of acyl-chain positions; acyl-chain length; and number, position, and cis–trans stereoisomerism of carbon–carbon double bonds. Such detail in lipid metabolism will be critical for model-driven metabolic engineering efforts.
We experimentally verified transcripts accounted for in the network under permissive growth conditions, detecting >90% of tested transcript models (Figure 1B) and providing validating evidence for the contents of iRC1080. We also analyzed the extent of transcript verification by specific metabolic subsystems. Some subsystems stood out as more poorly verified, including chloroplast and mitochondrial transport systems and sphingolipid metabolism, all of which exhibited <80% of transcripts detected, reflecting incomplete characterization of compartmental transporters and supporting a hypothesis of latent pathway evolution for ceramide synthesis in C. reinhardtii. Additional lines of evidence from the reconstruction effort similarly support this hypothesis including lack of ceramide synthetase and other annotation gaps downstream in sphingolipid metabolism. A similar hypothesis of latent pathway evolution was established for very long-chain fatty acids (VLCFAs) and their polyunsaturated analogs (VLCPUFAs) (Figure 1C), owing to the absence of this class of lipids in previous experimental measurements, lack of a candidate VLCFA elongase in the functional annotation, and additional downstream annotation gaps in arachidonic acid metabolism.
The network provides a detailed account of metabolic photon absorption by light-driven reactions, including photosystems I and II, light-dependent protochlorophyllide oxidoreductase, provitamin D3 photoconversion to vitamin D3, and rhodopsin photoisomerase; this network accounting permits the precise modeling of light-dependent metabolism. iRC1080 accounts for effective light spectral ranges through analysis of biochemical activity spectra (Figure 3A), either reaction activity or absorbance at varying light wavelengths. Defining effective spectral ranges associated with each photon-utilizing reaction enabled our network to model growth under different light sources via stoichiometric representation of the spectral composition of emitted light, termed prism reactions. Coefficients for different photon wavelengths in a prism reaction correspond to the ratios of photon flux in the defined effective spectral ranges to the total emitted photon flux from a given light source (Figure 3B). This approach distinguishes the amount of emitted photons that drive different metabolic reactions. We created prism reactions for most light sources that have been used in published studies for algal and plant growth including solar light, various light bulbs, and LEDs. We also included regulatory effects resulting from lighting conditions, insofar as published studies allowed. Light and dark conditions have been shown to affect metabolic enzyme activity in C. reinhardtii on multiple levels: transcriptional regulation, chloroplast RNA degradation, translational regulation, and thioredoxin-mediated enzyme regulation. Through application of our light model and prism reactions, we were able to closely recapitulate experimental growth measurements under solar, incandescent, and red LED lights. Through unbiased sampling, we were able to establish the tremendous statistical significance of the accuracy of growth predictions achievable through implementation of prism reactions.
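The coefficient definition above (photon flux in each effective spectral range divided by the total emitted photon flux) can be illustrated with a minimal sketch. The function name, the spectrum values, and the two spectral ranges below are hypothetical toy inputs chosen for illustration; they are not taken from iRC1080 or from any measured light source.

```python
def prism_coefficients(spectrum, ranges):
    """Return the fraction of total photon flux falling in each spectral range.

    spectrum: dict mapping wavelength (nm) -> emitted photon flux (any unit)
    ranges:   dict mapping a range label -> (low_nm, high_nm), inclusive
    Both inputs are illustrative assumptions, not data from the study.
    """
    total = sum(spectrum.values())
    if total <= 0:
        raise ValueError("total emitted photon flux must be positive")
    coeffs = {}
    for label, (low, high) in ranges.items():
        # Sum the flux of every sampled wavelength inside this range.
        in_range = sum(flux for wl, flux in spectrum.items() if low <= wl <= high)
        coeffs[label] = in_range / total
    return coeffs


# Hypothetical toy light source sampled at four wavelengths.
toy_spectrum = {450: 10.0, 550: 20.0, 650: 30.0, 680: 40.0}
# Hypothetical effective spectral ranges for two photon-using reactions.
toy_ranges = {"blue_range": (400, 500), "red_range": (600, 700)}

coeffs = prism_coefficients(toy_spectrum, toy_ranges)
print(coeffs)
```

In this toy case the 550 nm photons fall in neither range, so the coefficients do not sum to one; that matches the idea that only part of the emitted light drives a given reaction.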
Finally, application of the photosynthetic model was demonstrated prospectively to evaluate light utilization efficiency under different light sources. The results suggest that, of the existing light sources, red LEDs provide the greatest efficiency, about three times as efficient as sunlight. Extending this analysis, the model was applied to design a maximally efficient LED spectrum for algal growth. The result was a 677-nm peak LED spectrum with a total incident photon flux of 360 μE/m2/s, suggesting that for the simple objective of maximizing growth efficiency, LED technology has already reached an effective theoretical optimum.
In summary, the C. reinhardtii metabolic network iRC1080 that we have reconstructed offers insight into the basic biology of this species and may be employed prospectively for genetic engineering design and light source design relevant to algal biotechnology. iRC1080 was used to analyze lipid metabolism and generate novel hypotheses about the evolution of latent pathways. The predictive capacity of metabolic models developed from iRC1080 was demonstrated in simulating mutant phenotypes and in evaluation of light source efficiency. Our network provides a broad knowledgebase of the biochemistry and genomics underlying global metabolism of a photoautotroph, and our modeling approach for light-driven metabolism exemplifies how integration of largely unvisited data types, such as physicochemical environmental parameters, can expand the diversity of applications of metabolic networks.
Metabolic network reconstruction encompasses existing knowledge about an organism's metabolism and genome annotation, providing a platform for omics data analysis and phenotype prediction. The model alga Chlamydomonas reinhardtii is employed to study diverse biological processes from photosynthesis to phototaxis. Recent heightened interest in this species results from an international movement to develop algal biofuels. Integrating biological and optical data, we reconstructed a genome-scale metabolic network for this alga and devised a novel light-modeling approach that enables quantitative growth prediction for a given light source, resolving wavelength and photon flux. We experimentally verified transcripts accounted for in the network and physiologically validated model function through simulation and generation of new experimental growth data, providing high confidence in network contents and predictive applications. The network offers insight into algal metabolism and potential for genetic engineering and efficient light source design, a pioneering resource for studying light-driven metabolism and quantitative systems biology.
PMCID: PMC3202792  PMID: 21811229
Chlamydomonas reinhardtii; lipid metabolism; metabolic engineering; photobioreactor
13.  Comprehensive analysis of the HEPN superfamily: identification of novel roles in intra-genomic conflicts, defense, pathogenesis and RNA processing 
Biology Direct  2013;8:15.
The major role of enzymatic toxins that target nucleic acids in biological conflicts at all levels has become increasingly apparent thanks in large part to the advances of comparative genomics. Typically, toxins evolve rapidly hampering the identification of these proteins by sequence analysis. Here we analyze an unexpectedly widespread superfamily of toxin domains most of which possess RNase activity.
The HEPN superfamily is comprised of all α-helical domains that were first identified as being associated with DNA polymerase β-type nucleotidyltransferases in prokaryotes and animal Sacsin proteins. Using sensitive sequence and structure comparison methods, we vastly extend the HEPN superfamily by identifying numerous novel families and by detecting diverged HEPN domains in several known protein families. The new HEPN families include the RNase LS and LsoA catalytic domains, KEN domains (e.g. RNaseL and Ire1) and the RNase domains of RloC and PrrC. The majority of HEPN domains contain conserved motifs that constitute a metal-independent endoRNase active site. Some HEPN domains lacking this motif probably function as non-catalytic RNA-binding domains, such as in the case of the mannitol repressor MtlR. Our analysis shows that HEPN domains function as toxins that are shared by numerous systems implicated in intra-genomic, inter-genomic and intra-organismal conflicts across the three domains of cellular life. In prokaryotes HEPN domains are essential components of numerous toxin-antitoxin (TA) and abortive infection (Abi) systems and in addition are tightly associated with many restriction-modification (R-M) and CRISPR-Cas systems, and occasionally with other defense systems such as Pgl and Ter. We present evidence of multiple modes of action of HEPN domains in these systems, which include direct attack on viral RNAs (e.g. LsoA and RNase LS) in conjunction with other RNase domains (e.g. a novel RNase H fold domain, NamA), suicidal or dormancy-inducing attack on self RNAs (RM systems and possibly CRISPR-Cas systems), and suicidal attack coupled with direct interaction with phage components (Abi systems). These findings are compatible with the hypothesis on coupling of pathogen-targeting (immunity) and self-directed (programmed cell death and dormancy induction) responses in the evolution of robust antiviral strategies. 
We propose that altruistic cell suicide mediated by HEPN domains and other functionally similar RNases was essential for the evolution of kin and group selection and cell cooperation. HEPN domains were repeatedly acquired by eukaryotes and incorporated into several core functions such as endonucleolytic processing of the 5.8S-25S/28S rRNA precursor (Las1), a novel ER membrane-associated RNA degradation system (C6orf70), and sensing of unprocessed transcripts at the nuclear periphery (Swt1). Multiple lines of evidence suggest that, similar to prokaryotes, HEPN proteins were recruited to antiviral, antitransposon, apoptotic systems or RNA-level response to unfolded proteins (Sacsin and KEN domains) in several groups of eukaryotes.
Extensive sequence and structure comparisons reveal unexpectedly broad presence of the HEPN domain in an enormous variety of defense and stress response systems across the tree of life. In addition, HEPN domains have been recruited to perform essential functions, in particular in eukaryotic rRNA processing. These findings are expected to stimulate experiments that could shed light on diverse cellular processes across the three domains of life.
This article was reviewed by Martijn Huynen, Igor Zhulin and Nick Grishin.
PMCID: PMC3710099  PMID: 23768067
14.  Comparative Analysis of RNA Families Reveals Distinct Repertoires for Each Domain of Life 
PLoS Computational Biology  2012;8(11):e1002752.
The RNA world hypothesis, that RNA genomes and catalysts preceded DNA genomes and genetically-encoded protein catalysts, has been central to models for the early evolution of life on Earth. A key part of such models is continuity between the earliest stages in the evolution of life and the RNA repertoires of extant lineages. Some assessments seem consistent with a diverse RNA world, yet direct continuity between modern RNAs and an RNA world has not been demonstrated for the majority of RNA families, and, anecdotally, many RNA functions appear restricted in their distribution. Despite much discussion of the possible antiquity of RNA families, no systematic analyses of RNA family distribution have been performed. To chart the broad evolutionary history of known RNA families, we performed comparative genomic analysis of over 3 million RNA annotations spanning 1446 families from the Rfam 10 database. We report that 99% of known RNA families are restricted to a single domain of life, revealing discrete repertoires for each domain. For the 1% of RNA families/clans present in more than one domain, over half show evidence of horizontal gene transfer (HGT), and the rest show a vertical trace, indicating the presence of a complex protein synthesis machinery in the Last Universal Common Ancestor (LUCA) and consistent with the evolutionary history of the most ancient protein-coding genes. However, with limited interdomain transfer and few RNA families exhibiting demonstrable antiquity as predicted under RNA world continuity, our results indicate that the majority of modern cellular RNA repertoires have primarily evolved in a domain-specific manner.
Author Summary
In cells, DNA carries recipes for making proteins, and proteins perform chemical reactions, including replication of DNA. This interdependency raises questions for early evolution, since one molecule seemingly cannot exist without the other. A resolution to this problem is the RNA world, where RNA is postulated to have been both genetic material and primary catalyst. While artificially selected catalytic RNAs strengthen the chemical plausibility of an RNA world, a biological prediction is that some RNAs should date back to this period. In this study, we ask to what degree RNAs in extant organisms trace back to the common ancestor of cellular life. Using the Rfam RNA families database, we systematically screened genomes spanning the three domains of life (Archaea, Bacteria, Eukarya) for RNA genes, and examined how far back in evolution known RNA families can be traced. We find that 99% of RNA families are restricted to a single domain. Limited conservation within domains implies ongoing emergence of RNA functions during evolution. Of the remaining 1%, half show evidence of horizontal transfer (movement of genes between organisms), and half show an evolutionary history consistent with an RNA world. The oldest RNAs are primarily associated with protein synthesis and export.
PMCID: PMC3486863  PMID: 23133357
15.  Prokaryotic homologs of Argonaute proteins are predicted to function as key components of a novel system of defense against mobile genetic elements 
Biology Direct  2009;4:29.
In eukaryotes, RNA interference (RNAi) is a major mechanism of defense against viruses and transposable elements as well as of regulating translation of endogenous mRNAs. The RNAi systems recognize the target RNA molecules via small guide RNAs that are completely or partially complementary to a region of the target. Key components of the RNAi systems are proteins of the Argonaute-PIWI family, some of which function as slicers, the nucleases that cleave the target RNA that is base-paired to a guide RNA. Numerous prokaryotes possess the CRISPR-associated system (CASS) of defense against phages and plasmids that is, in part, mechanistically analogous but not homologous to eukaryotic RNAi systems. Many prokaryotes also encode homologs of Argonaute-PIWI proteins but their functions remain unknown.
We present a detailed analysis of Argonaute-PIWI protein sequences and the genomic neighborhoods of the respective genes in prokaryotes. Whereas eukaryotic Ago/PIWI proteins always contain PAZ (oligonucleotide binding) and PIWI (active or inactivated nuclease) domains, the prokaryotic Argonaute homologs (pAgos) fall into two major groups in which the PAZ domain is either present or absent. The monophyly of each group is supported by a phylogenetic analysis of the conserved PIWI-domains. Almost all pAgos that lack a PAZ domain appear to be inactivated, and the respective genes are associated with a variety of predicted nucleases in putative operons. An additional, uncharacterized domain that is fused to various nucleases appears to be a unique signature of operons encoding the short (lacking PAZ) pAgo form. By contrast, almost all PAZ-domain containing pAgos are predicted to be active nucleases. Some proteins of this group (e.g., that from Aquifex aeolicus) have been experimentally shown to possess nuclease activity, and are not typically associated with genes for other (putative) nucleases. Given these observations, the apparent extensive horizontal transfer of pAgo genes, and their common, statistically significant over-representation in genomic neighborhoods enriched in genes encoding proteins involved in the defense against phages and/or plasmids, we hypothesize that pAgos are key components of a novel class of defense systems. The PAZ-domain containing pAgos are predicted to directly destroy virus or plasmid nucleic acids via their nuclease activity, whereas the apparently inactivated, PAZ-lacking pAgos could be structural subunits of protein complexes that contain, as active moieties, the putative nucleases that we predict to be co-expressed with these pAgos. All these nucleases are predicted to be DNA endonucleases, so it seems most probable that the putative novel phage/plasmid-defense system targets phage DNA rather than mRNAs. 
Given that in eukaryotic RNAi systems, the PAZ domain binds a guide RNA and positions it on the complementary region of the target, we further speculate that pAgos function on a similar principle (the guide being either DNA or RNA), and that the uncharacterized domain found in putative operons with the short forms of pAgos is a functional substitute for the PAZ domain.
The hypothesis that pAgos are key components of a novel prokaryotic immune system that employs guide RNA or DNA molecules to degrade nucleic acids of invading mobile elements implies a functional analogy with the prokaryotic CASS and a direct evolutionary connection with eukaryotic RNAi. The predictions of the hypothesis including both the activities of pAgos and those of the associated endonucleases are readily amenable to experimental tests.
This article was reviewed by Daniel Haft, Martijn Huynen, and Chris Ponting.
PMCID: PMC2743648  PMID: 19706170
16.  Algorithms for computing parsimonious evolutionary scenarios for genome evolution, the last universal common ancestor and dominance of horizontal gene transfer in the evolution of prokaryotes 
Comparative analysis of sequenced genomes reveals numerous instances of apparent horizontal gene transfer (HGT), at least in prokaryotes, and indicates that lineage-specific gene loss might have been even more common in evolution. This complicates the notion of a species tree, which needs to be re-interpreted as a prevailing evolutionary trend, rather than the full depiction of evolution, and makes reconstruction of ancestral genomes a non-trivial task.
We addressed the problem of constructing parsimonious scenarios for individual sets of orthologous genes given a species tree. The orthologous sets were taken from the database of Clusters of Orthologous Groups of proteins (COGs). We show that the phyletic patterns (patterns of presence-absence in completely sequenced genomes) of almost 90% of the COGs are inconsistent with the hypothetical species tree. Algorithms were developed to reconcile the phyletic patterns with the species tree by postulating gene loss, COG emergence and HGT (the latter two classes of events were collectively treated as gene gains). We prove that each of these algorithms produces a parsimonious evolutionary scenario, which can be represented as mapping of loss and gain events on the species tree. The distribution of the evolutionary events among the tree nodes substantially depends on the underlying assumptions of the reconciliation algorithm, e.g. whether or not independent gene gains (gain after loss after gain) are permitted. Biological considerations suggest that, on average, gene loss might be a more likely event than gene gain. Therefore different gain penalties were used and the resulting series of reconstructed gene sets for the last universal common ancestor (LUCA) of the extant life forms were analysed. The number of genes in the reconstructed LUCA gene sets grows as the gain penalty increases. However, qualitative examination of the LUCA versions reconstructed with different gain penalties indicates that, even with a gain penalty of 1 (equal weights assigned to a gain and a loss), the set of 572 genes assigned to LUCA might be nearly sufficient to sustain a functioning organism. Under this gain penalty value, the numbers of horizontal gene transfer and gene loss events are nearly identical. This result holds true for two alternative topologies of the species tree and even under random shuffling of the tree. 
Therefore, the results seem to be compatible with approximately equal likelihoods of HGT and gene loss in the evolution of prokaryotes.
The notion that gene loss and HGT are major aspects of prokaryotic evolution was supported by quantitative analysis of the mapping of the phyletic patterns of COGs onto a hypothetical species tree. Algorithms were developed for constructing parsimonious evolutionary scenarios, which include gene loss and gain events, for orthologous gene sets, given a species tree. This analysis shows, contrary to expectations, that the number of predicted HGT events that occurred during the evolution of prokaryotes might be approximately the same as the number of gene losses. The approach to the reconstruction of evolutionary scenarios employed here is conservative with regard to the detection of HGT because only patterns of gene presence-absence in sequenced genomes are taken into account. In reality, horizontal transfer might have contributed to the evolution of many other genes also, which makes it a dominant force in prokaryotic evolution.
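To illustrate how a gain penalty shifts a reconstruction toward an older gene origin, the following is a minimal sketch of generic Sankoff-style weighted parsimony on a single presence-absence pattern. It is not the authors' algorithm: the tree encoding (nested tuples of leaf names), the function name, the toy species, and the choice to charge one gain for a gene already present at the root (standing in for gene emergence) are all assumptions made for illustration.

```python
INF = float("inf")


def parsimony_cost(tree, present, gain_penalty=1.0, loss_cost=1.0):
    """Return (minimum cost, gene_in_root) for one phyletic pattern.

    tree:    nested tuples of leaf names, e.g. (("A", "B"), ("C", "D"))
    present: set of leaf names in which the gene is observed
    A gene present at the root is charged one gain (assumed emergence).
    """

    def costs(node):
        # Returns (cost if the gene is absent here, cost if it is present here).
        if isinstance(node, str):
            return (INF, 0.0) if node in present else (0.0, INF)
        absent_total, present_total = 0.0, 0.0
        for child in node:
            child_absent, child_present = costs(child)
            # Parent lacks the gene: a child that has it needs a gain.
            absent_total += min(child_absent, child_present + gain_penalty)
            # Parent has the gene: a child that lacks it needs a loss.
            present_total += min(child_present, child_absent + loss_cost)
        return absent_total, present_total

    root_absent, root_present = costs(tree)
    with_root_gene = root_present + gain_penalty
    return min(root_absent, with_root_gene), with_root_gene < root_absent


# Hypothetical four-species tree and a patchy phyletic pattern.
toy_tree = (("A", "B"), ("C", "D"))
toy_pattern = {"A", "C"}

low_penalty = parsimony_cost(toy_tree, toy_pattern, gain_penalty=1.0)
high_penalty = parsimony_cost(toy_tree, toy_pattern, gain_penalty=3.0)
print(low_penalty, high_penalty)
```

With equal weights the patchy pattern is explained by two independent gains and the gene is absent from the root; with a gain penalty of 3 the cheapest scenario is one ancestral gain followed by two losses. This mirrors, in a toy setting, the reported trend that the reconstructed ancestral gene set grows as the gain penalty increases.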
PMCID: PMC149225  PMID: 12515582
17.  Ribonucleotide reduction - horizontal transfer of a required function spans all three domains 
Ribonucleotide reduction is the only de novo pathway for synthesis of deoxyribonucleotides, the building blocks of DNA. The reaction is catalysed by ribonucleotide reductases (RNRs), an ancient enzyme family comprising three classes. The classes have distinct operational constraints and are broadly distributed across organisms from all three domains, though few class I RNRs have been identified in archaeal genomes, and classes II and III likewise appear rare across eukaryotes. In this study, we examine whether this distribution is best explained by presence of all three classes in the Last Universal Common Ancestor (LUCA), or by horizontal gene transfer (HGT) of RNR genes. We also examine to what extent environmental factors may have impacted the distribution of RNR classes.
Our phylogenies show that the Last Eukaryotic Common Ancestor (LECA) possessed a class I RNR, but that the eukaryotic class I enzymes are not directly descended from class I RNRs in Archaea. Instead, our results indicate that archaeal class I RNR genes have been independently transferred from bacteria on two occasions. While LECA possessed a class I RNR, our trees indicate that this is ultimately bacterial in origin. We also find convincing evidence that eukaryotic class I RNR has been transferred to the Bacteroidetes, providing a stunning example of HGT from eukaryotes back to Bacteria. Based on our phylogenies and available genetic and genomic evidence, class II and III RNRs in eukaryotes also appear to have been transferred from Bacteria, with subsequent within-domain transfer between distantly-related eukaryotes. Under the three-domains hypothesis the RNR present in the last common ancestor of Archaea and eukaryotes appears, through a process of elimination, to have been a dimeric class II RNR, though limited sampling of eukaryotes precludes a firm conclusion as the data may be equally well accounted for by HGT.
Horizontal gene transfer has clearly played an important role in the evolution of the RNR repertoire of organisms from all three domains of life. Our results clearly show that class I RNRs have spread to Archaea and eukaryotes via transfers from the bacterial domain, indicating that class I likely evolved in the Bacteria. However, against the backdrop of ongoing transfers, it is harder to establish whether class II or III RNRs were present in the LUCA, despite the fact that ribonucleotide reduction is an essential cellular reaction and was pivotal to the transition from RNA to DNA genomes. Instead, a general pattern of ongoing horizontal transmission emerges wherein environmental and enzyme operational constraints, especially the presence or absence of oxygen, are likely to be major determinants of the RNR repertoire of genomes.
PMCID: PMC3019208  PMID: 21143941
18.  Combination of the loss of cmnm5U34 with the lack of s2U34 modifications of tRNALys, tRNAGlu and tRNAGln altered mitochondrial biogenesis and respiration 
Journal of molecular biology  2009;395(5):1038.
The yeast Saccharomyces cerevisiae MTO2, MTO1 and MSS1 genes encode highly conserved tRNA-modifying enzymes for the biosynthesis of cmnm5s2U34 in mitochondrial tRNALys, tRNAGlu and tRNAGln. In fact, Mto1p and Mss1p are involved in the biosynthesis of the cmnm5 group (cmnm5U34), while Mto2p is responsible for the 2-thiouridylation (s2U34) of these tRNAs. Previous studies showed that partial modifications at U34 in mitochondrial tRNA enabled mto1, mto2 and mss1 strains to respire. In this report, we investigated the functional interaction between the MTO2, MTO1 and MSS1 genes by using the mto2, mto1 and mss1 single, double and triple mutants. Strikingly, the deletion of MTO2 was synthetically lethal with a mutation of MSS1 or deletion of MTO1 on medium containing glycerol, but not on medium containing glucose. Interestingly, there were no detectable levels of 9 tRNAs including tRNALys, tRNAGlu and tRNAGln in mto2/mss1, mto2/mto1 and mto2/mto1/mss1 strains. Furthermore, mto2/mss1, mto2/mto1 and mto2/mto1/mss1 mutants exhibited extremely low levels of COX1 and CYTB mRNA, 15S and 21S rRNA as well as the complete loss of mitochondrial protein synthesis. The synthetic enhancement combinations likely resulted from the completely abolished modification at U34 of tRNALys, tRNAGlu and tRNAGln, caused by the combination of eliminating the 2-thiouridylation by the mto2 mutation with the absence of the cmnm5U34 by the mto1 or mss1 mutation. The complete loss of modifications at U34 of tRNAs altered mitochondrial RNA metabolism, causing a degradation of mitochondrial tRNA, mRNA and rRNAs. As a result, failures in mitochondrial RNA metabolism were responsible for the complete loss of mitochondrial translation. Consequently, defects in mitochondrial protein synthesis caused the instability of the mitochondrial genomes, thus producing the respiratory-deficient phenotypes.
Therefore, our findings demonstrated a critical role of modifications at U34 of tRNALys, tRNAGlu and tRNAGln in maintenance of the mitochondrial genome, mitochondrial RNA stability, translation and respiratory function.
PMCID: PMC2818684  PMID: 20004207
Mitochondrial tRNA; nucleotide modification; respiration; biogenesis; metabolism
19.  On the Origin of Cells and Viruses: Primordial Virus World Scenario 
It is proposed that the pre-cellular stage of biological evolution unraveled within networks of inorganic compartments that harbored a diverse mix of virus-like genetic elements. This stage of evolution might comprise the Last Universal Cellular Ancestor (LUCA) that more appropriately could be denoted Last Universal Cellular Ancestral State (LUCAS). This scenario for the origin of cellular life recapitulates the early ideas of J. B. S. Haldane sketched in his classic 1928 essay. However, unlike in Haldane’s day, there is now considerable support for this scenario from four major lines of comparative-genomic evidence: i) lack of homology between the core components of the DNA replication systems of the two primary lines of descent of cellular life forms, archaea and bacteria, ii) distinct membrane chemistries and lack of homology between the enzymes of lipid biosynthesis in archaea and bacteria, iii) spread of several viral hallmark genes, which encode proteins with key functions in viral replication and morphogenesis, among numerous and extremely diverse groups of viruses, in contrast to their absence in cellular life forms, and iv) the extant archaeal and bacterial chromosomes appear to be shaped by accretion of diverse, smaller replicons, suggesting a continuity between the hypothetical, primordial virus stage of life’s evolution and the dynamic prokaryotic world that existed ever since. Under the viral model of pre-cellular evolution, the key components of cells including the replication apparatus, membranes, and molecular complexes involved in membrane transport and translocation originated as components of virus-like entities. The two surviving types of cellular life forms, archaea and bacteria, might have emerged from the LUCAS independently, along with, probably, numerous forms now extinct.
PMCID: PMC3380365  PMID: 19845627
comparative genomics; evolution of cells; evolution of viruses; origin of membranes; viral hallmark genes
20.  From Endosymbiont to Host-Controlled Organelle: The Hijacking of Mitochondrial Protein Synthesis and Metabolism  
PLoS Computational Biology  2007;3(11):e219.
Mitochondria are eukaryotic organelles that originated from the endosymbiosis of an alpha-proteobacterium. To gain insight into the evolution of the mitochondrial proteome as it proceeded through the transition from a free-living cell to a specialized organelle, we compared a reconstructed ancestral proteome of the mitochondrion with the proteomes of alpha-proteobacteria as well as with the mitochondrial proteomes in yeast and man. Overall, there has been a large turnover of the mitochondrial proteome during the evolution of mitochondria. Early in the evolution of the mitochondrion, proteins involved in cell envelope synthesis have virtually disappeared, whereas proteins involved in replication, transcription, cell division, transport, regulation, and signal transduction have been replaced by eukaryotic proteins. More than half of what remains from the mitochondrial ancestor in modern mitochondria corresponds to translation, including post-translational modifications, and to metabolic pathways that are directly, or indirectly, involved in energy conversion. Altogether, the results indicate that the eukaryotic host has hijacked the proto-mitochondrion, taking control of its protein synthesis and metabolism.
Author Summary
Mitochondria are compartments of the eukaryotic cell that originated from the endosymbiosis of an alpha-proteobacterium. The bacterial-like metabolism of this early endosymbiont was thought to differ substantially from that of modern mitochondria, but so far we do not know the details of this bacterium-to-organelle transformation. To address this issue, we used an evolutionary approach to find genes derived from the ancestor of mitochondria. By identifying eukaryotic genes that are closely related to alpha-proteobacterial ones, we reconstructed a set of genes derived from the mitochondrial ancestor. We used that set to infer the ancestral mitochondrial metabolism, and subsequently compared it with those of modern mitochondria, as reconstructed from proteomics data from yeast and human. This allowed us to trace the metabolic evolution of mitochondria. What we found is that there has been a large turnover of the protein content of mitochondria, which has affected some pathways more than others. Pathways for protein synthesis and those involved in energy conversion have been preferentially retained in the mitochondrion, whereas those involved in replication, transcription, cell division, transport, regulation, and signal transduction have been replaced by eukaryotic proteins. Our findings show how the eukaryotic host has taken control of the endosymbiont, effectively hijacking those pathways that it could use.
PMCID: PMC2062474  PMID: 17983265
21.  Evolution of vacuolar proton pyrophosphatase domains and volutin granules: clues into the early evolutionary origin of the acidocalcisome 
Biology Direct  2011;6:50.
Volutin granules appear to be universally distributed and are morphologically and chemically identical to acidocalcisomes, which are electron-dense granular organelles rich in calcium and phosphate, whose functions include storage of phosphorus and various metal ions, metabolism of polyphosphate, maintenance of intracellular pH, osmoregulation and calcium homeostasis. Prokaryotes are thought to differ from eukaryotes in that they lack membrane-bounded organelles. However, it has been demonstrated that as in acidocalcisomes, the calcium and polyphosphate-rich intracellular "volutin granules (polyphosphate bodies)" in two bacterial species, Agrobacterium tumefaciens, and Rhodospirillum rubrum, are membrane bound and that the vacuolar proton-translocating pyrophosphatases (V-H+PPases) are present in their surrounding membranes. Volutin granules and acidocalcisomes have been found in organisms as diverse as bacteria and humans.
Here, we show that volutin granules also occur in Archaea and are, therefore, present in the three superkingdoms of life (Archaea, Bacteria and Eukarya). Molecular analyses of V-H+PPase pumps, which acidify the acidocalcisome lumen and are diagnostic proteins of the organelle, also reveal the presence of this enzyme in all three superkingdoms, suggesting it is ancient and universal. Since V-H+PPase sequences contained too limited a phylogenetic signal to fully resolve the ancestral nodes of the tree, we investigated the divergence of protein domains in the V-H+PPase molecules. Using the Protein family (Pfam) database, we found a domain in the protein, PF03030. The domain is shared by 31 species in Eukarya, 231 in Bacteria, and 17 in Archaea. The universal distribution of the V-H+PPase PF03030 domain, which is associated with the V-H+PPase function, suggests the domain and the enzyme were already present in the Last Universal Common Ancestor (LUCA).
The importance of the V-H+PPase function and the evolutionary dynamics of these domains support the early origin of the acidocalcisome organelle. In particular, the universality of volutin granules and presence of a functional V-H+PPase domain in the three superkingdoms of life reveals that the acidocalcisomes may have appeared earlier than the divergence of the superkingdoms. This result is remarkable and highlights the possibility that a high degree of cellular compartmentalization could already have been present in the LUCA.
This article was reviewed by Anthony Poole, Lakshminarayan Iyer and Daniel Kahn
PMCID: PMC3198990  PMID: 21974828
22.  The origin of introns and their role in eukaryogenesis: a compromise solution to the introns-early versus introns-late debate? 
Biology Direct  2006;1:22.
Ever since the discovery of 'genes in pieces' and mRNA splicing in eukaryotes, origin and evolution of spliceosomal introns have been considered within the conceptual framework of the 'introns early' versus 'introns late' debate. The 'introns early' hypothesis, which is closely linked to the so-called exon theory of gene evolution, posits that protein-coding genes were interrupted by numerous introns even at the earliest stages of life's evolution and that introns played a major role in the origin of proteins by facilitating recombination of sequences coding for small protein/peptide modules. Under this scenario, the absence of spliceosomal introns in prokaryotes is considered to be a result of "genome streamlining". The 'introns late' hypothesis counters that spliceosomal introns emerged only in eukaryotes, and moreover, have been inserted into protein-coding genes continuously throughout the evolution of eukaryotes. Beyond the formal dilemma, the more substantial side of this debate has to do with possible roles of introns in the evolution of eukaryotes.
I argue that several lines of evidence now suggest a coherent solution to the introns-early versus introns-late debate, and the emerging picture of intron evolution integrates aspects of both views although, formally, there seems to be no support for the original version of introns-early. Firstly, there is growing evidence that spliceosomal introns evolved from group II self-splicing introns, which are present, usually in small numbers, in many bacteria and probably moved into the evolving eukaryotic genome from the α-proteobacterial progenitor of the mitochondria. Secondly, the concept of a primordial pool of 'virus-like' genetic elements implies that self-splicing introns are among the most ancient genetic entities. Thirdly, reconstructions of the ancestral state of eukaryotic genes suggest that the last common ancestor of extant eukaryotes had an intron-rich genome. Thus, it appears that ancestors of spliceosomal introns have indeed existed since the earliest stages of life's evolution, in formal agreement with the introns-early scenario. However, there is no evidence that these ancient introns ever became widespread before the emergence of eukaryotes; hence, the central tenet of introns-early, the role of introns in the early evolution of proteins, has no support. At the same time, the demonstration that numerous introns invaded eukaryotic genes at the outset of eukaryotic evolution, and that subsequent intron gain has been limited in many eukaryotic lineages, implicates introns as an ancestral feature of eukaryotic genomes and refutes radical versions of introns-late. Perhaps most importantly, I argue that the intron invasion triggered other pivotal events of eukaryogenesis, including the emergence of the spliceosome, the nucleus, the linear chromosomes, the telomerase, and the ubiquitin signaling system. This concept of eukaryogenesis, in a sense, revives some tenets of the exon hypothesis by assigning to introns crucial roles in eukaryotic evolutionary innovation.
The scenario of the origin and evolution of introns that is most compatible with the results of comparative genomics and theoretical considerations goes as follows: self-splicing introns since the earliest stages of life's evolution – numerous spliceosomal introns invading genes of the emerging eukaryote during eukaryogenesis – subsequent lineage-specific loss and gain of introns. The intron invasion, probably spawned by the mitochondrial endosymbiont, might have critically contributed to the emergence of the principal features of the eukaryotic cell. This scenario combines aspects of the introns-early and introns-late views.
This article was reviewed by W. Ford Doolittle, James Darnell (nominated by W. Ford Doolittle), William Martin, and Anthony Poole.
PMCID: PMC1570339  PMID: 16907971
23.  Predicting the Minimal Translation Apparatus: Lessons from the Reductive Evolution of Mollicutes 
PLoS Genetics  2014;10(5):e1004363.
Mollicutes is a class of parasitic bacteria that have evolved from a common Firmicutes ancestor mostly by massive genome reduction. With genomes under 1 Mbp in size, most Mollicutes species retain the capacity to replicate and grow autonomously. The major goal of this work was to identify the minimal set of proteins that can sustain ribosome biogenesis and translation of the genetic code in these bacteria. Using the experimentally validated genes from the model bacteria Escherichia coli and Bacillus subtilis as input, genes encoding proteins of the core translation machinery were predicted in 39 distinct Mollicutes species, 33 of which are culturable. The set of 260 input genes encodes proteins involved in ribosome biogenesis, tRNA maturation and aminoacylation, as well as protein cofactors required for mRNA translation and RNA decay. A core set of 104 of these proteins is found in all species analyzed. Genes encoding proteins involved in post-translational modifications of ribosomal proteins and translation cofactors, post-transcriptional modifications of tRNA and rRNA, ribosome assembly and RNA degradation are the most frequently lost. As expected, genes coding for aminoacyl-tRNA synthetases, ribosomal proteins and initiation, elongation and termination factors are the most persistent (i.e. conserved in a majority of genomes). Enzymes introducing nucleotide modifications in the anticodon loop of tRNA, in helix 44 of 16S rRNA and in helices 69 and 80 of 23S rRNA, all essential for decoding and facilitating peptidyl transfer, are maintained in all species. Reconstruction of genome evolution in Mollicutes revealed that, besides many gene losses, occasional gains by horizontal gene transfer also occurred.
This analysis not only showed that slightly different solutions for preserving a functional, albeit minimal, protein-synthesizing machinery have emerged in these successive rounds of reductive evolution, but also has broad implications for guiding the reconstruction of a minimal cell by synthetic biology approaches.
Author Summary
In all cells, proteins are synthesized from the message encoded by mRNA using complex machineries involving many proteins and RNAs. In this process, named translation, the ribosome plays a central role. The elements involved in both ribosome biogenesis and its function are extremely conserved in all organisms from the simplest bacteria to mammalian cells. Most of the 260 known proteins involved in translation have been identified and studied in the bacteria Escherichia coli and Bacillus subtilis, two common cellular models in biology. However, comparative genomics has shown that the translation protein set can be much smaller. This is true for bacteria belonging to the class Mollicutes, which are characterized by reduced genomes and hence considered as models for minimal cells. Using a homology inference approach and expert analyses, we identified the translation apparatus proteins for 39 of these organisms. Although striking variations were found from one group of species to another, some Mollicutes species require half as many proteins as E. coli or B. subtilis. This analysis allowed us to determine a set of proteins necessary for translation in Mollicutes and define the translation apparatus that would be required in a cellular chassis mimicking a minimal bacterial cell.
PMCID: PMC4014445  PMID: 24809820
24.  Origin and Evolution of the Ribosome 
The modern ribosome was largely formed at the time of the last universal common ancestor, LUCA. Hence its earliest origins likely lie in the RNA world. Central to its development were RNAs that spawned the modern tRNAs and a symmetrical region deep within the large ribosomal RNA (rRNA), where the peptidyl transferase reaction occurs. To understand pre-LUCA developments, it is argued that events that are coupled in time are especially useful if one can infer a likely order in which they occurred. Using such timing events, the relative ages of various proteins and individual regions within the large rRNA are inferred. An examination of the properties of modern ribosomes strongly suggests that the initial peptides made by the primitive ribosomes were likely enriched for L-amino acids, but did not completely exclude D-amino acids. This has implications for the nature of peptides made by the first ribosomes. From the perspective of ribosome origins, the immediate question regarding coding is when it arose rather than how the assignments evolved. The modern ribosome is very dynamic, with tRNAs moving in and out and the mRNA moving relative to the ribosome. These movements may have become possible as a result of the addition of a template to hold the tRNAs. That template would subsequently become the mRNA, thereby allowing the evolution of the code and making an RNA genome useful. Finally, a highly speculative timeline of major events in ribosome history is presented and possible future directions are discussed.
The ribosome evolved before the last universal common ancestor. Evidence from primary sequences, high-resolution structural studies, and functional properties of various components provides significant insights into that evolutionary history, which is linked to the origins of the code and chirality.
PMCID: PMC2926754  PMID: 20534711
25.  The mechanistic and evolutionary aspects of the 2′- and 3′-OH paradigm in biosynthetic machinery 
Biology Direct  2013;8:17.
The translation machinery underlies a multitude of biological processes within the cell. Even in its simplest mode of operation, the modern translation apparatus is extremely complex in design and implementation, and involves different RNA and protein factors. According to the “RNA world” idea, the critical link in the translation machinery may be assigned to an adaptor tRNA molecule. Its exceptional functional and structural characteristics are of primary importance in understanding the evolutionary relationships among all these macromolecular components.
Presentation of the hypothesis
The 2′-3′ hydroxyls of the tRNA A76 constitute chemical groups of critical functional importance, as they are implicated in almost all phases of protein biosynthesis. They contribute to: a) each step of the tRNA aminoacylation reaction catalyzed by aminoacyl-tRNA synthetases (aaRSs); b) the isomerase activity of EF-Tu, involving a mixture of the 2′(3′)- aminoacyl tRNA isomers as substrates, thereby producing the required combination of amino acid and tRNA; and c) peptide bond formation at the peptidyl transferase center (PTC) of the ribosome. We hypothesize that specific functions assigned to the 2′-3′ hydroxyls during peptide bond formation co-evolved, together with two modes of attack on the aminoacyl-adenylate carbonyl typical for two classes of aaRSs, and alongside the isomerase activity of EF-Tu. Protein components of the translational apparatus are universally recognized as being of ancient origin, possibly replacing RNA-based enzymes that may have existed before the last universal common ancestor (LUCA). We believe that a remnant of these processes is still imprinted on the organization of modern-day translation.
Testing and implications of the hypothesis
Earlier publications indicate that it is possible to select ribozymes capable of attaching the aa-AMP moiety to RNA molecules. The scenario described herein would gain general acceptance if a ribozyme able to activate the amino acid and transfer it onto the terminal ribose of the tRNA were found in any life form or generated in vitro. Interestingly, recent studies have demonstrated the plausibility of using metals, likely abundant under primordial conditions, as biomimetic catalysts of the aminoacylation reaction.
This article was reviewed by Henri Grosjean, Manuel Santos and Eugene Koonin. For complete reviews, go to the Reviewers’ reports section.
PMCID: PMC3716924  PMID: 23835000
Aminoacyl-tRNA synthetases; Elongation factor EF-Tu; Ribosome; 2′-3′ hydroxyls of the ribose
