1.  The transition from noncoded to coded protein synthesis: did coding mRNAs arise from stability-enhancing binding partners to tRNA? 
Biology Direct  2010;5:16.
Understanding the origin of protein synthesis has been notoriously difficult. We have taken as a starting premise Wolf and Koonin's view that "evolution of the translation system is envisaged to occur in a compartmentalized ensemble of replicating, co-selected RNA segments, i.e., in an RNA world containing ribozymes with versatile activities".
Presentation of the hypothesis
We propose that coded protein synthesis arose from a noncoded process in an RNA world as a natural consequence of the accumulation of a range of early tRNAs and their serendipitous RNA binding partners. We propose that, initially, RNA molecules with 3' CCA termini that could be aminoacylated by ribozymes, together with an ancestral peptidyl transferase ribozyme, produced small peptides with random or repetitive sequences. Our concept is that the first tRNA arose in this context from the ligation of two RNA hairpins and could be similarly aminoacylated at its 3' end to become a substrate for peptidyl transfer catalyzed by the ancestral ribozyme. Within this RNA world we hypothesize that proto-mRNAs appeared first simply as serendipitous binding partners, forming complementary base pair interactions with the anticodon loops of tRNA pairs. Initially this may have enhanced stability of the paired tRNA molecules so they were held together in close proximity, better positioning the 3' CCA termini for peptidyl transfer and enhancing the rate of peptide synthesis. If there were a selective advantage for the ensemble through the peptide products synthesized, it would provide a natural pathway for the evolution of a coding system with the expansion of a cohort of different tRNAs and their binding partners. The whole process could have occurred quite unremarkably for such a profound acquisition.
Testing the hypothesis
It should be possible to test the different parts of our model using the isolated contemporary 50S ribosomal subunit initially, and then with RNAs transcribed in vitro together with a minimal set of ribosomal proteins that are required today to support protein synthesis.
Implications of the hypothesis
This model proposes that genetic coding arose de novo from complementary base pair interactions between tRNAs and single-stranded RNAs present in the immediate environment.
This article was reviewed by Eugene Koonin, Rob Knight and Berthold Kastner (nominated by Laura Landweber).
PMCID: PMC2859854  PMID: 20377916
2.  Comparative Analysis of RNA Families Reveals Distinct Repertoires for Each Domain of Life 
PLoS Computational Biology  2012;8(11):e1002752.
The RNA world hypothesis, that RNA genomes and catalysts preceded DNA genomes and genetically-encoded protein catalysts, has been central to models for the early evolution of life on Earth. A key part of such models is continuity between the earliest stages in the evolution of life and the RNA repertoires of extant lineages. Some assessments seem consistent with a diverse RNA world, yet direct continuity between modern RNAs and an RNA world has not been demonstrated for the majority of RNA families, and, anecdotally, many RNA functions appear restricted in their distribution. Despite much discussion of the possible antiquity of RNA families, no systematic analyses of RNA family distribution have been performed. To chart the broad evolutionary history of known RNA families, we performed comparative genomic analysis of over 3 million RNA annotations spanning 1446 families from the Rfam 10 database. We report that 99% of known RNA families are restricted to a single domain of life, revealing discrete repertoires for each domain. For the 1% of RNA families/clans present in more than one domain, over half show evidence of horizontal gene transfer (HGT), and the rest show a vertical trace, indicating the presence of a complex protein synthesis machinery in the Last Universal Common Ancestor (LUCA) and consistent with the evolutionary history of the most ancient protein-coding genes. However, with limited interdomain transfer and few RNA families exhibiting demonstrable antiquity as predicted under RNA world continuity, our results indicate that the majority of modern cellular RNA repertoires have primarily evolved in a domain-specific manner.
Author Summary
In cells, DNA carries recipes for making proteins, and proteins perform chemical reactions, including replication of DNA. This interdependency raises questions for early evolution, since one molecule seemingly cannot exist without the other. A resolution to this problem is the RNA world, where RNA is postulated to have been both genetic material and primary catalyst. While artificially selected catalytic RNAs strengthen the chemical plausibility of an RNA world, a biological prediction is that some RNAs should date back to this period. In this study, we ask to what degree RNAs in extant organisms trace back to the common ancestor of cellular life. Using the Rfam RNA families database, we systematically screened genomes spanning the three domains of life (Archaea, Bacteria, Eukarya) for RNA genes, and examined how far back in evolution known RNA families can be traced. We find that 99% of RNA families are restricted to a single domain. Limited conservation within domains implies ongoing emergence of RNA functions during evolution. Of the remaining 1%, half show evidence of horizontal transfer (movement of genes between organisms), and half show an evolutionary history consistent with an RNA world. The oldest RNAs are primarily associated with protein synthesis and export.
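The headline figure above (the share of RNA families restricted to a single domain of life) comes down to a simple tally once each family has been mapped to the domains in which it is annotated. The following is a minimal sketch of that tally only, not the authors' pipeline. The family names and domain assignments are hypothetical placeholders, not Rfam data.

```python
from collections import Counter

# Hypothetical toy annotations: family name -> domains in which it was found.
# These entries are illustrative only and are NOT taken from Rfam.
family_domains = {
    "family_A": {"Bacteria"},
    "family_B": {"Eukarya"},
    "family_C": {"Archaea"},
    "family_D": {"Bacteria", "Archaea", "Eukarya"},
    "family_E": {"Eukarya"},
}


def domain_breadth(families):
    """Count how many families occur in exactly 1, 2 or 3 domains."""
    return Counter(len(domains) for domains in families.values())


def single_domain_fraction(families):
    """Return the fraction of families restricted to a single domain."""
    breadth = domain_breadth(families)
    return breadth[1] / len(families)


if __name__ == "__main__":
    print(domain_breadth(family_domains))
    print(single_domain_fraction(family_domains))
```

Separating vertical descent from horizontal transfer among the multi-domain families needs phylogenetic evidence that a tally like this cannot supply.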
PMCID: PMC3486863  PMID: 23133357
3.  Circularity and self-cleavage as a strategy for the emergence of a chromosome in the RNA-based protocell 
Biology Direct  2013;8:21.
It is now popularly accepted that an “RNA world” existed in early evolution. During division of RNA-based protocells, random distribution of individual genes (simultaneously as ribozymes) between offspring might have resulted in gene loss, especially when the number of gene types increased. Therefore, the emergence of a chromosome carrying linked genes was critical for the prosperity of the RNA world. However, there were quite a few immediate difficulties for this event to occur. For example, a chromosome would be much longer than individual genes, and thus more likely to degrade and less likely to replicate completely; the copying of the chromosome might start at middle sites and be only partial; and, without a complex transcription mechanism, the synthesis of distinct ribozymes would become problematic.
Inspired by features of viroids, which have been suggested as “living fossils” of the RNA world, we supposed that these difficulties could have been overcome if the chromosome adopted a circular form and small, self-cleaving ribozymes (e.g. the hammerhead ribozymes) resided at the sites between genes. Computer simulation using a Monte-Carlo method was conducted to investigate this hypothesis. The simulation shows that an RNA chromosome can spread (increase in quantity and be sustained) in the system if it is a circular one and its linear “transcripts” are readily broken at the sites between genes; the chromosome works as genetic material and ribozymes “coded” by it serve as functional molecules; and both circularity and self-cleavage are important for the spread of the chromosome.
In the RNA world, circularity and self-cleavage may have been adopted as a strategy to overcome the immediate difficulties for the emergence of a chromosome (with linked genes). The strategy suggested here is very simple and likely to have been used in this early stage of evolution. By demonstrating the possibility of the emergence of an RNA chromosome, this study opens up the prospect of a prosperous RNA world, populated by RNA-based protocells with a number of genes, showing complicated functions.
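One of the difficulties named above, that copying may start at an internal site and so be only partial, can be illustrated with a toy Monte Carlo estimate. This is a sketch under strong simplifying assumptions and is not the authors' simulation: copying is assumed to begin at a random gene boundary and proceed in one direction, the circular template is copied for exactly one lap, and self-cleavage is assumed to release every fully copied gene. All function names and parameters are hypothetical.

```python
import random


def genes_copied(n_genes, start, circular):
    """Return indices of genes fully copied when copying begins at the
    boundary before gene `start` and proceeds in one direction."""
    if circular:
        # One full lap around a circle passes through every gene.
        return [(start + i) % n_genes for i in range(n_genes)]
    # On a linear template, genes upstream of the start site are missed.
    return list(range(start, n_genes))


def mean_fraction_copied(n_genes, circular, trials=10_000, seed=0):
    """Average fraction of genes recovered over random start sites."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        start = rng.randrange(n_genes)
        total += len(genes_copied(n_genes, start, circular))
    return total / (trials * n_genes)


if __name__ == "__main__":
    print("linear:  ", mean_fraction_copied(5, circular=False))
    print("circular:", mean_fraction_copied(5, circular=True))
```

In this toy setting the circular template always yields the full gene set, while the linear one loses the genes upstream of the start site. Degradation, replication fidelity and protocell division, which the published simulation addresses, are left out.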
This article was reviewed by Sergei Kazakov (nominated by Laura Landweber), Nobuto Takeuchi (nominated by Anthony Poole), and Eugene Koonin.
PMCID: PMC3765326  PMID: 23971788
4.  Evolutionary connection between the catalytic subunits of DNA-dependent RNA polymerases and eukaryotic RNA-dependent RNA polymerases and the origin of RNA polymerases 
The eukaryotic RNA-dependent RNA polymerase (RDRP) is involved in the amplification of regulatory microRNAs during post-transcriptional gene silencing. This enzyme is highly conserved in most eukaryotes but is missing in archaea and bacteria. No evolutionary relationship between RDRP and other polymerases has been reported so far, hence the origin of this eukaryote-specific polymerase remains a mystery.
Using extensive sequence profile searches, we identified bacteriophage homologs of the eukaryotic RDRP. The comparison of the eukaryotic RDRP and their homologs from bacteriophages led to the delineation of the conserved portion of these enzymes, which is predicted to harbor the catalytic site. Further, detailed sequence comparison, aided by examination of the crystal structure of the DNA-dependent RNA polymerase (DDRP), showed that the RDRP and the β' subunit of DDRP (and its orthologs in archaea and eukaryotes) contain a conserved double-psi β-barrel (DPBB) domain. This DPBB domain contains the signature motif DbDGD (b is a bulky residue), which is conserved in all RDRPs and DDRPs and contributes to catalysis via a coordinated divalent cation. Apart from the DPBB domain, no similarity was detected between RDRP and DDRP, which leaves open two scenarios for the origin of RDRP: i) RDRP evolved at the onset of the evolution of eukaryotes via a duplication of the DDRP β' subunit followed by dramatic divergence that obliterated the sequence similarity outside the core catalytic domain and ii) the primordial RDRP, which consisted primarily of the DPBB domain, evolved from a common ancestor with the DDRP at a very early stage of evolution, during the RNA world era. The latter hypothesis implies that RDRP had been subsequently eliminated from cellular life forms and might have been reintroduced into the eukaryotic genomes through a bacteriophage. Sequence and structure analysis of the DDRP led to further insights into the evolution of RNA polymerases. In addition to the β' subunit, the β subunit of DDRP also contains a DPBB domain, which is, however, distorted by large inserts and does not harbor a counterpart of the DbDGD motif. The DPBB domains of the two DDRP subunits together form the catalytic cleft, with the domain from the β' subunit supplying the metal-coordinating DbDGD motif and the one from the β subunit providing two lysine residues involved in catalysis.
Given that the two DPBB domains of DDRP contribute completely different sets of active residues to the catalytic center, it is hypothesized that the ultimate ancestor of RNA polymerases functioned as a homodimer of a generic, RNA-binding DPBB domain. This ancestral protein probably did not have catalytic activity and served as a cofactor for a ribozyme RNA polymerase. Subsequent evolution of DDRP and RDRP involved accretion of distinct sets of additional domains. In the DDRPs, these included an RNA-binding Zn-ribbon, an AT-hook-like module and a sandwich-barrel hybrid motif (SBHM) domain. Further, lineage-specific accretion of SBHM domains and other, DDRP-specific domains is observed in bacterial DDRPs. In contrast, the orthologs of the β' subunit in archaea and eukaryotes contain a four-stranded α + β domain that is shared with the α-subunit of bacterial DDRP, eukaryotic DDRP subunit RBP11, translation factor eIF1 and type II topoisomerases. The additional domains of the RDRPs remain to be characterized.
Eukaryotic RNA-dependent RNA polymerases share the catalytic double-psi β-barrel domain, containing a signature metal-coordinating motif, with the universally conserved β' subunit of DNA-dependent RNA polymerases. Beyond this core catalytic domain, the two classes of RNA polymerases do not have common domains, suggesting early divergence from a common ancestor, with subsequent independent domain accretion. The β-subunit of DDRP contains another, highly diverged DPBB domain. The presence of two distinct DPBB domains in two subunits of DDRP is compatible with the hypothesis that the ultimate ancestor of RNA polymerases was an RNA-binding DPBB domain that had no catalytic activity but rather functioned as a homodimeric cofactor for a ribozyme polymerase.
PMCID: PMC151600  PMID: 12553882
5.  The Rodin-Ohno hypothesis that two enzyme superfamilies descended from one ancestral gene: an unlikely scenario for the origins of translation that will not be dismissed 
Biology Direct  2014;9:11.
Because amino acid activation is rate-limiting for uncatalyzed protein synthesis, it is a key puzzle in understanding the origin of the genetic code. Two unrelated classes (I and II) of contemporary aminoacyl-tRNA synthetases (aaRS) now translate the code. Observing that codons for the most highly conserved, Class I catalytic peptides, when read in the reverse direction, are very nearly anticodons for Class II defining catalytic peptides, Rodin and Ohno proposed that the two superfamilies descended from opposite strands of the same ancestral gene. This unusual hypothesis languished for a decade, perhaps because it appeared to be unfalsifiable.
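The sense/antisense relationship described above depends on reading the opposite strand of a gene in its own 5'-to-3' direction, which is the reverse complement of the coding strand. A minimal sketch of that operation follows. The example sequence is a hypothetical placeholder and is not taken from any synthetase gene.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the opposite strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def codons(rna):
    """Split a sequence into complete triplets, dropping any remainder."""
    usable = len(rna) - len(rna) % 3
    return [rna[i:i + 3] for i in range(0, usable, 3)]


def antisense_codons(sense):
    """Codons encoded by the strand opposite to `sense`."""
    return codons(reverse_complement(sense))


if __name__ == "__main__":
    sense = "AUGGCU"  # hypothetical two-codon example
    print(codons(sense))            # ['AUG', 'GCU']
    print(antisense_codons(sense))  # ['AGC', 'CAU']
```

The first antisense codon pairs with the last sense codon, so the two encoded peptides run in antiparallel orientations, as the proposed alignment requires.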
The proposed sense/antisense alignment makes important predictions. Fragments that align in antiparallel orientations, and contain the respective active sites, should catalyze the same two reactions catalyzed by contemporary synthetases. Recent experiments confirmed that prediction. Invariant cores from both classes, called Urzymes after Ur = primitive, authentic, plus enzyme and representing ~20% of the contemporary structures, can be expressed and exhibit high, proportionate rate accelerations for both amino-acid activation and tRNA acylation. A major fraction (60%) of the catalytic rate acceleration by contemporary synthetases resides in segments that align sense/antisense. Bioinformatic evidence for sense/antisense ancestry extends to codons specifying the invariant secondary and tertiary structures outside the active sites of the two synthetase classes. Peptides from a designed, 46-residue gene constrained by Rosetta to encode Class I and II ATP binding sites with fully complementary sequences both accelerate amino acid activation by ATP ~400 fold.
Biochemical and bioinformatic results substantially enhance the posterior probability that ancestors of the two synthetase classes arose from opposite strands of the same ancestral gene. The remarkable acceleration by short peptides of the rate-limiting step in uncatalyzed protein synthesis, together with the synergy of synthetase Urzymes and their cognate tRNAs, introduce a new paradigm for the origin of protein catalysts, emphasize the potential relevance of an operational RNA code embedded in the tRNA acceptor stems, and challenge the RNA-World hypothesis.
This article was reviewed by Dr. Paul Schimmel (nominated by Laura Landweber), Dr. Eugene Koonin and Professor David Ardell.
PMCID: PMC4082485  PMID: 24927791
Aminoacyl-tRNA synthetases; Urzymes; Genetic code; Origin of Translation; RNA World hypothesis; Amino acid activation; Structural homology; Ancestral genes; Sense/antisense coding
6.  The scenario on the origin of translation in the RNA world: in principle of replication parsimony 
Biology Direct  2010;5:65.
It is now believed that, in the origin of life, proteins were "invented" in an RNA world. However, due to the complexity of a possible RNA-based proto-translation system, this evolutionary process seems quite complicated and the associated scenario remains very blurry. Considering that RNA can bind amino acids with specificity, it has been reasonably supposed that initial peptides might have been synthesized on "RNA templates" containing multiple amino acid binding sites. This "Direct RNA Template (DRT)" mechanism is attractive because it should be the simplest mechanism for RNA to synthesize peptides, and is thus very likely to have been adopted initially in the RNA world. How this mechanism could then develop into a proto-translation system is an interesting problem.
Presentation of the hypothesis
Here an explanation of this problem is offered, based on the principle of "replication parsimony": genetic information tends to be utilized in a parsimonious way under selection pressure, due to its replication cost (e.g., in the RNA world, nucleotides and ribozymes for RNA replication). Because a DRT would be quite long even for a short peptide, its replication cost would be great. Thus the diversity and the length of functional peptides synthesized by the DRT mechanism would be seriously limited. Adaptors (proto-tRNAs) would arise to allow a DRT's complementary strand (called "C-DRT" here) to direct the synthesis of the same peptide synthesized by the DRT itself. Because the C-DRT is a necessary part of the DRT's replication, fewer rounds of the DRT's replication would be needed to synthesize a given number of copies of the functional peptide, thus saving the replication cost. Acting through adaptors, C-DRTs could transform into much shorter templates (called "proto-mRNAs" here) and take over the role of DRTs, thus significantly saving the replication cost. A proto-rRNA corresponding to the small subunit rRNA would then emerge to aid the binding of proto-tRNAs and proto-mRNAs, allowing the reduction of base pairs between them (ultimately resulting in the triplet anticodon/codon pair), thus further saving the replication cost. In this context, the replication cost saved would allow the appearance of more and longer functional peptides and, finally, proteins. The hypothesis could be called "DRT-RP" ("RP" for "replication parsimony").
Testing the hypothesis
The scenario described here is open to experimental work at several key steps, including the compact DRT mechanism, the development of adaptors from aa-aptamers, the synthesis of peptides by proto-tRNAs and proto-mRNAs without the participation of proto-rRNAs, etc. Interestingly, a recent computer simulation study has demonstrated the plausibility of one of the evolutionary processes driven by replication parsimony in the scenario.
Implication of the hypothesis
An RNA-based proto-translation system could arise gradually from the DRT mechanism according to the principle of "replication parsimony", that is, to save the replication cost of RNA templates for functional peptides. A surprising side deduction from the logic of the hypothesis is that complex, biosynthetic amino acids might have entered the genetic code earlier than simple, prebiotic amino acids, which is contrary to the common view. Overall, the present discussion clarifies the blurry scenario concerning the origin of translation with a major clue, which shows vividly how life could "manage" to exploit potential chemical resources in nature, eventually in an efficient way over evolution.
This article was reviewed by Eugene V. Koonin, Juergen Brosius, and Arcady Mushegian.
PMCID: PMC3002371  PMID: 21110883
7.  Modelling evolution on design-by-contract predicts an origin of Life through an abiotic double-stranded RNA world 
Biology Direct  2007;2:12.
It is generally believed that life first evolved from single-stranded RNA (ssRNA) that both stored genetic information and catalyzed the reactions required for self-replication.
Presentation of the hypothesis
By modeling early genome evolution on the engineering paradigm design-by-contract, an alternative scenario is presented in which life started with the appearance of double-stranded RNA (dsRNA) as an informational storage molecule while catalytic single-stranded RNA was derived from this dsRNA template later in evolution.
Testing the hypothesis
It was investigated whether this scenario could be implemented mechanistically by starting with abiotic processes. Double-stranded RNA could be formed abiotically by hybridization of oligoribonucleotides that are subsequently non-enzymatically ligated into a double-stranded chain. Thermal cycling driven by the diurnal temperature cycles could then replicate this dsRNA when strands of dsRNA separate and later rehybridize and ligate to reform dsRNA. A temperature-dependent partial replication of specific regions of dsRNA could produce the first template-based generation of catalytic ssRNA, similar to the developmental gene transcription process. Replacement of these abiotic processes by enzymatic processes would guarantee functional continuity. Further transition from a dsRNA to a dsDNA world could be based on minor mutations in template and substrate recognition sites of an RNA polymerase and would leave all existing processes intact.
Implications of the hypothesis
Modeling evolution on a design pattern, the 'dsRNA first' hypothesis can provide an alternative mechanistic evolutionary scenario for the origin of our genome that preserves functional continuity.
This article was reviewed by Anthony Poole, Eugene Koonin and Eugene Shakhnovich.
PMCID: PMC1866227  PMID: 17466073
8.  Multilevel Selection in Models of Prebiotic Evolution II: A Direct Comparison of Compartmentalization and Spatial Self-Organization 
PLoS Computational Biology  2009;5(10):e1000542.
Multilevel selection has been indicated as an essential factor for the evolution of complexity in interacting RNA-like replicator systems. There are two types of multilevel selection mechanisms: implicit and explicit. For implicit multilevel selection, spatial self-organization of replicator populations has been suggested, which leads to higher level selection among emergent mesoscopic spatial patterns (traveling waves). For explicit multilevel selection, compartmentalization of replicators by vesicles has been suggested, which leads to higher level evolutionary dynamics among explicitly imposed mesoscopic entities (protocells). Historically, these mechanisms have been considered separately, each for its own interest. Here, we make a direct comparison between spatial self-organization and compartmentalization in simulated RNA-like replicator systems. Firstly, we show that both mechanisms achieve the macroscopic stability of a replicator system through the evolutionary dynamics on mesoscopic entities that counteract that of microscopic entities. Secondly, we show that a striking difference exists between the two mechanisms regarding their possible influence on the long-term evolutionary dynamics, which happens under an emergent trade-off situation arising from the multilevel selection. The difference is explained in terms of the difference in the stability between self-organized mesoscopic entities and externally imposed mesoscopic entities. Thirdly, we show that a sharp transition happens in the long-term evolutionary dynamics of the compartmentalized system as a function of replicator mutation rate. Fourthly, the results imply that spatial self-organization can allow the evolution of stable folding in parasitic replicators without any specific functionality in the folding itself.
Finally, the results are discussed in relation to the experimental synthesis of chemical Darwinian systems and to the multilevel selection theory of evolutionary biology in general. To conclude, novel evolutionary directions can emerge through interactions between the evolutionary dynamics on multiple levels of organization. Different multilevel selection mechanisms can produce a difference in the long-term evolutionary trend of identical microscopic entities.
Author Summary
The origin of life has long attracted scientific inquiry. The RNA world hypothesis suggests that, before the evolution of DNA and protein, primordial life was based on RNA-like molecules both for information storage and chemical catalysis. In the simplest form, an RNA world consists of RNA molecules that can catalyze the replication of their own copies. Thus, an interesting question is whether a system of RNA-like replicators can increase its complexity through Darwinian evolution and approach the modern form of life. It is, however, known that simple natural selection acting on individual replicators is insufficient to account for the evolution of complexity due to the evolution of parasite-like templates. Two solutions have been suggested: compartmentalization of replicators by membranes (i.e., protocells) and spatial self-organization of a replicator population. Here, we make a direct comparison of the two suggestions by computer simulations. Our results show that the two suggestions can lead to unanticipated and contrasting consequences in the long-term evolution of replicating molecules. The results also imply a novel advantage in the spatial self-organization for the evolution of complexity in RNA-like replicator systems.
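The parasite problem mentioned above can be illustrated with a deliberately simple well-mixed toy model. It is a sketch under stated assumptions, not the spatial or compartment model studied in the paper. Catalysts copy any template they meet, parasites are assumed to be better templates but to provide no catalysis, and the pool is renormalized each step. The rate constants `k_c` and `k_p`, the step size and the starting fractions are hypothetical.

```python
def step(c, p, k_c=1.0, k_p=1.5, dt=0.1):
    """Advance catalyst (c) and parasite (p) fractions by one time step.

    Both are copied at a rate proportional to the catalyst fraction;
    the parasite is assumed to be the better template (k_p > k_c).
    """
    new_c = c + dt * k_c * c * c
    new_p = p + dt * k_p * p * c
    total = new_c + new_p
    return new_c / total, new_p / total


def simulate(c=0.99, p=0.01, steps=500):
    """Return the list of (catalyst, parasite) fractions over time."""
    history = [(c, p)]
    for _ in range(steps):
        c, p = step(c, p)
        history.append((c, p))
    return history


if __name__ == "__main__":
    history = simulate()
    print("initial parasite fraction:", history[0][1])
    print("final parasite fraction:  ", history[-1][1])
```

Without any higher-level structure the parasite fraction only rises in this toy pool. This is the motivation for comparing compartmentalization and spatial self-organization as counter-measures.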
PMCID: PMC2757730  PMID: 19834556
9.  On the Origin of DNA Genomes: Evolution of the Division of Labor between Template and Catalyst in Model Replicator Systems 
PLoS Computational Biology  2011;7(3):e1002024.
The division of labor between template and catalyst is a fundamental property of all living systems: DNA stores genetic information whereas proteins function as catalysts. The RNA world hypothesis, however, posits that, at the earlier stages of evolution, RNA acted as both template and catalyst. Why would such division of labor evolve in the RNA world? We investigated the evolution of DNA-like molecules, i.e. molecules that can function only as template, in minimal computational models of RNA replicator systems. In the models, RNA can function as both template-directed polymerase and template, whereas DNA can function only as template. Two classes of models were explored. In the surface models, replicators are attached to surfaces with finite diffusion. In the compartment models, replicators are compartmentalized by vesicle-like boundaries. Both models displayed the evolution of DNA and the ensuing division of labor between templates and catalysts. In the surface model, DNA provides the advantage of greater resistance against parasitic templates. However, this advantage is at least partially offset by the disadvantage of slower multiplication due to the increased complexity of the replication cycle. In the compartment model, DNA can significantly delay the intra-compartment evolution of RNA towards catalytic deterioration. These results are explained in terms of the trade-off between template and catalyst that is inherent in RNA-only replication cycles: DNA releases RNA from this trade-off by making it unnecessary for RNA to serve as template and so rendering the system more resistant against evolving parasitism. Our analysis of these simple models suggests that the lack of catalytic activity in DNA by itself can generate a sufficient selective advantage for RNA replicator systems to produce DNA. Given the widespread notion that DNA evolved owing to its superior chemical properties as a template, this study offers a novel insight into the evolutionary origin of DNA.
Author Summary
At the core of all biological systems lies the division of labor between the storage of genetic information and its phenotypic implementation, in other words, the functional differentiation between templates (DNA) and catalysts (proteins). This fundamental property of life is believed to have been absent at the earliest stages of evolution. The RNA world hypothesis, the most realistic current scenario for the origin of life, posits that, in primordial replicating systems, RNA functioned both as template and as catalyst. How would such division of labor emerge through Darwinian evolution? We investigated the evolution of DNA-like molecules in minimal computational models of RNA replicator systems. Two models were considered: one where molecules are adsorbed on surfaces and another one where molecules are compartmentalized by dividing cellular boundaries. Both models exhibit the evolution of DNA and the ensuing division of labor, revealing the simple governing principle of these processes: DNA releases RNA from the trade-off between template and catalyst that is inevitable in the RNA world and thereby enhances the system's resistance against parasitic templates. Hence, this study offers a novel insight into the evolutionary origin of the division of labor between templates and catalysts in the RNA world.
PMCID: PMC3063752  PMID: 21455287
10.  How Amino Acids and Peptides Shaped the RNA World 
Life  2015;5(1):230-246.
The “RNA world” hypothesis is seen as one of the main contenders for a viable theory on the origin of life. Relatively small RNAs have catalytic power, RNA is everywhere in present-day life, the ribosome is seen as a ribozyme, and rRNA and tRNA are crucial for modern protein synthesis. However, this view is incomplete at best. The modern protein-RNA ribosome most probably is not a distorted form of a “pure RNA ribosome” evolution started out with. Though the oldest center of the ribosome seems “RNA only”, we cannot conclude from this that it ever functioned in an environment without amino acids and/or peptides. Very small RNAs (versatile and stable due to basepairing) and amino acids, as well as dipeptides, coevolved. Remember, it is the amino group of aminoacylated tRNA that attacks peptidyl-tRNA, destroying the bond between peptide and tRNA. This activity of the amino acid part of aminoacyl-tRNA illustrates the centrality of amino acids in life. With the rise of the “RNA world” view of early life, the pendulum seems to have swung too much towards the ribozymatic part of early biochemistry. The necessary presence and activity of amino acids and peptides is in need of highlighting. In this article, we try to bring the role of the peptide component of early life back into focus. We argue that an RNA world completely independent of amino acids never existed.
PMCID: PMC4390850  PMID: 25607813
coevolutionary theory; RNA world; prebiotic chemistry; evolution
11.  The cosmological model of eternal inflation and the transition from chance to biological evolution in the history of life 
Biology Direct  2007;2:15.
Recent developments in cosmology radically change the conception of the universe as well as the very notions of "probable" and "possible". The model of eternal inflation implies that all macroscopic histories permitted by laws of physics are repeated an infinite number of times in the infinite multiverse. In contrast to the traditional cosmological models of a single, finite universe, this worldview provides for the origin of an infinite number of complex systems by chance, even as the probability of complexity emerging in any given region of the multiverse is extremely low. This change in perspective has profound implications for the history of any phenomenon, and life on earth cannot be an exception.
Origin of life is a chicken and egg problem: for biological evolution that is governed, primarily, by natural selection, to take off, efficient systems for replication and translation are required, but even barebones cores of these systems appear to be products of extensive selection. The currently favored (partial) solution is an RNA world without proteins in which replication is catalyzed by ribozymes and which serves as the cradle for the translation system. However, the RNA world faces its own hard problems as ribozyme-catalyzed RNA replication remains a hypothesis and the selective pressures behind the origin of translation remain mysterious. Eternal inflation offers a viable alternative that is untenable in a finite universe, i.e., that a coupled system of translation and replication emerged by chance, and became the breakthrough stage from which biological evolution, centered around Darwinian selection, took off. A corollary of this hypothesis is that an RNA world, as a diverse population of replicating RNA molecules, might have never existed. In this model, the stage for Darwinian selection is set by anthropic selection of complex systems that rarely but inevitably emerge by chance in the infinite universe (multiverse).
The plausibility of different models for the origin of life on earth directly depends on the adopted cosmological scenario. In an infinite universe (multiverse), emergence of highly complex systems by chance is inevitable. Therefore, under this cosmology, an entity as complex as a coupled translation-replication system should be considered a viable breakthrough stage for the onset of biological evolution.
This article was reviewed by Eric Bapteste, David Krakauer, Sergei Maslov, and Itai Yanai.
PMCID: PMC1892545  PMID: 17540027
12.  The Origins of the RNA World 
The general notion of an “RNA World” is that, in the early development of life on the Earth, genetic continuity was assured by the replication of RNA and genetically encoded proteins were not involved as catalysts. There is now strong evidence indicating that an RNA World did indeed exist before DNA- and protein-based life. However, arguments regarding whether life on Earth began with RNA are more tenuous. It might be imagined that all of the components of RNA were available in some prebiotic pool, and that these components assembled into replicating, evolving polynucleotides without the prior existence of any evolved macromolecules. A thorough consideration of this “RNA-first” view of the origin of life must reconcile concerns regarding the intractable mixtures that are obtained in experiments designed to simulate the chemistry of the primitive Earth. Perhaps these concerns will eventually be resolved, and recent experimental findings provide some reason for optimism. However, the problem of the origin of the RNA World is far from being solved, and it is fruitful to consider the alternative possibility that RNA was preceded by some other replicating, evolving molecule, just as DNA and proteins were preceded by RNA.
RNA probably preceded DNA as the genetic material. RNA in this “RNA world” may have arisen by prebiotic catalysis or from an earlier “RNA-like world” based on p-RNA or PNA.
PMCID: PMC3331698  PMID: 20739415
13.  On origin of genetic code and tRNA before translation 
Biology Direct  2011;6:14.
Synthesis of proteins is based on the genetic code - a nearly universal assignment of codons to amino acids (aas). A major challenge to the understanding of the origins of this assignment is the archetypal "key-lock vs. frozen accident" dilemma. Here we re-examine this dilemma in light of 1) the fundamental veto on "foresight evolution", 2) modular structures of tRNAs and aminoacyl-tRNA synthetases, and 3) the updated library of aa-binding sites in RNA aptamers successfully selected in vitro for eight amino acids.
The aa-binding sites of arginine, isoleucine and tyrosine contain both their cognate triplets, anticodons and codons. We have noticed that these cases might be associated with palindrome-dinucleotides. For example, one-base shift to the left brings arginine codons CGN, with CG at 1-2 positions, to the respective anticodons NCG, with CG at 2-3 positions. Formally, the concomitant presence of codons and anticodons is also expected in the reverse situation, with codons containing palindrome-dinucleotides at their 2-3 positions, and anticodons exhibiting them at 1-2 positions. A closer analysis reveals that, surprisingly, RNA binding sites for Arg, Ile and Tyr "prefer" (exactly as in the actual genetic code) the anticodon(2-3)/codon(1-2) tetramers to their anticodon(1-2)/codon(2-3) counterparts, despite the seemingly perfect symmetry of the latter. However, since in vitro selection of aa-specific RNA aptamers apparently had nothing to do with translation, this striking preference provides a new strong support to the notion of the genetic code emerging before translation, in response to catalytic (and possibly other) needs of ancient RNA life. Consistently with the pre-translation origin of the code, we propose here a new model of tRNA origin by the gradual, Fibonacci process-like, elongation of a tRNA molecule from a primordial coding triplet and 5'DCCA3' quadruplet (D is a base-determinator) to the eventual 76 base-long cloverleaf-shaped molecule.
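The one-base shift described above can be illustrated with a short sketch. It assumes strict Watson-Crick pairing (wobble is ignored) and writes both codon and anticodon 5'→3'; the function names are hypothetical and are not taken from the article. It shows that a self-complementary ("palindrome") dinucleotide at codon positions 1-2 reappears at anticodon positions 2-3.

```python
# Illustrative sketch only: strict Watson-Crick pairing, no wobble (an assumption).
COMPLEMENT = {"A": "U", "U": "A", "C": "G", "G": "C"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA string, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def is_palindrome_dinucleotide(dinucleotide):
    """A dinucleotide is 'palindromic' if it equals its own reverse complement."""
    return len(dinucleotide) == 2 and reverse_complement(dinucleotide) == dinucleotide


def shifted_dinucleotide(codon):
    """Return (codon positions 1-2, anticodon positions 2-3) for one codon."""
    anticodon = reverse_complement(codon)
    return codon[:2], anticodon[1:]


if __name__ == "__main__":
    # Arginine CGN, isoleucine AUU/AUC/AUA and tyrosine UAU/UAC codons.
    for codon in ["CGA", "CGC", "CGG", "CGU", "AUU", "AUC", "AUA", "UAU", "UAC"]:
        print(codon, reverse_complement(codon), shifted_dinucleotide(codon))
```

For example, the arginine codon CGA gives the anticodon UCG, so CG moves from positions 1-2 to positions 2-3.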
Taken together, our findings necessarily imply that primordial tRNAs, tRNA aminoacylating ribozymes, and (later) the translation machinery in general have been co-evolving to "fit" the (likely already defined) genetic code, rather than the opposite way around. Coding triplets in this primal pre-translational code were likely similar to the anticodons, with second and third nucleotides being more important than the less specific first one. Later, when the code was expanding in co-evolution with the translation apparatus, the importance of 2-3 nucleotides of coding triplets "transferred" to the 1-2 nucleotides of their complements, thus distinguishing anticodons from codons. This evolutionary primacy of anticodons in genetic coding makes the hypothesis of primal stereo-chemical affinity between amino acids and cognate triplets, the hypothesis of coding coenzyme handles for amino acids, the hypothesis of tRNA-like genomic 3' tags suggesting that tRNAs originated in replication, and the hypothesis of ancient ribozymes-mediated operational code of tRNA aminoacylation not mutually contradicting but rather co-existing in harmony.
This article was reviewed by Eugene V. Koonin, Wentao Ma (nominated by Juergen Brosius) and Anthony Poole.
PMCID: PMC3050877  PMID: 21342520
14.  mRNA turnover rate limits siRNA and microRNA efficacy 
Based on a simple model of the mRNA life cycle, we predict that mRNAs with high turnover rates in the cell are more difficult to perturb with RNAi. We test this hypothesis using a luciferase reporter system and obtain additional evidence from a variety of large-scale data sets, including microRNA overexpression experiments and RT–qPCR-based efficacy measurements for thousands of siRNAs. Our results suggest that mRNA half-lives will influence how mRNAs are differentially perturbed whenever small RNA levels change in the cell, not only after transfection but also during differentiation, pathogenesis and normal cell physiology.
What determines how strongly an mRNA responds to a microRNA or an siRNA? We know that properties of the sequence match between the small RNA and the mRNA are crucial. However, large-scale validations of siRNA efficacies have shown that certain transcripts remain recalcitrant to perturbation even after repeated redesign of the siRNA (Krueger et al, 2007). Weak response to RNAi may thus be an inherent property of the mRNA, but the underlying factors have proven difficult to uncover.
siRNAs induce degradation by sequence-specific cleavage of their target mRNAs (Elbashir et al, 2001). MicroRNAs, too, induce mRNA degradation, and ∼80% of their effect on protein levels can be explained by changes in transcript abundance (Hendrickson et al, 2009; Guo et al, 2010). Given that multiple factors act simultaneously to degrade individual mRNAs, we here consider whether variable responses to micro/siRNA regulation may, in part, be explained simply by the basic dynamics of mRNA turnover. If a transcript is already under strong destabilizing regulation, it is theoretically possible that the relative change in abundance after the addition of a novel degrading factor would be less pronounced compared with a stable transcript (Figure 1). mRNA turnover is achieved by a multitude of factors, and the influence of such factors on targetability can be explored. However, their combined action, including yet unknown factors, is summarized into a single property: the mRNA decay rate.
First, we explored the theoretical relationship between the pre-existing turnover rate of an mRNA, and its expected susceptibility to perturbation by a small RNA. We assumed a basic model of the mRNA life cycle, in which the rate of transcription is constant and the rate of degradation is described by first-order kinetics. Under this model, the relative change in steady-state expression level will become smaller as the pre-existing decay rate grows larger, independent of the transcription rate. This relationship persists also if we assume various degrees of synergy and antagonism between the pre-existing factors and the external factor, with increasing synergism leading to transcripts being more equally targetable, regardless of their pre-existing decay rate.
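The basic model in the preceding paragraph can be sketched numerically. This is a minimal illustration, assuming the small RNA simply adds a first-order decay term to the pre-existing one (no synergy or antagonism); the function and parameter names are hypothetical and the rate values are arbitrary.

```python
def steady_state_level(transcription_rate, decay_rate):
    """Steady state of dm/dt = transcription_rate - decay_rate * m."""
    return transcription_rate / decay_rate


def remaining_fraction(basal_decay, added_decay, transcription_rate=1.0):
    """Fraction of the original steady-state level left after a small RNA
    adds an extra first-order decay term (purely additive: an assumption)."""
    before = steady_state_level(transcription_rate, basal_decay)
    after = steady_state_level(transcription_rate, basal_decay + added_decay)
    return after / before  # algebraically basal_decay / (basal_decay + added_decay)


if __name__ == "__main__":
    added = 0.01  # arbitrary illustrative rate, per minute
    for basal in (0.001, 0.01, 0.1):
        print(f"basal decay {basal}: remaining fraction "
              f"{remaining_fraction(basal, added):.3f}")
```

Because the transcription rate cancels in the ratio, the remaining fraction depends only on the two decay terms, and it approaches 1 (little knockdown) as the pre-existing decay rate grows.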
We next generated a series of four luciferase reporter constructs with destabilizing AU-rich elements (AREs) of various strengths incorporated into their 3′ UTRs. To evaluate how the different constructs would respond to perturbation, we performed co-transfections with an siRNA targeted at the coding region of the luciferase gene. This reduced the signal of the non-destabilized construct to 26% compared with a control siRNA. In contrast, the most destabilized construct showed 42% remaining reporter activity, and we could observe a dose–response relationship across the series.
The reporter experiment encouraged an investigation of this effect on real-world mRNAs. We analyzed a set of 2622 siRNAs, for which individual efficacies were determined using RT–qPCR 48 h post-transfection in HeLa cells. Of these, 1778 could be associated with an experimentally determined decay rate (Figure 4A). Although the overall correlation between the two variables was modest (Spearman's rank correlation rs=0.22, P<1e−20), we found that siRNAs directed at high-turnover (t1/2<200 min) and medium-turnover (200<t1/2<1000 min) transcripts were significantly less effective than those directed at low-turnover (t1/2>1000 min) transcripts (P<8e−11 and 4e−9, respectively, two-tailed KS-test, Figure 4B). While 41.6% (498/1196) of the siRNAs directed at low-turnover transcripts reached 10% remaining expression or better, only 16.7% (31/186) of the siRNAs that targeted high-turnover mRNAs reached this high degree of silencing (Figure 4B). Reduced targetability (25.2%, 100/396) was also seen for transcripts with medium-turnover rate.
Our results based on siRNA data suggested that turnover rates could also influence microRNA targeting. By assembling genome-wide mRNA expression data from 20 published microRNA transfections in HeLa cells, we found that predicted target mRNAs with short and medium half-life were significantly less repressed after transfection than their long-lived counterparts (P<8e−5 and P<0.03, respectively, two-tailed KS-test). Specifically, 10.2% (293/2874) of long-lived targets versus 4.4% (41/942) of short-lived targets were strongly (z-score <−3) repressed. siRNAs are known to cause off-target effects that are mediated, in part, by microRNA-like seed complementarity (Jackson et al, 2006). We analyzed changes in transcript levels after transfection of seven different siRNAs, each with a unique seed region (Jackson et al, 2006). Putative ‘off-targets' were identified by mapping of non-conserved seed matches in 3′ UTRs. We found that low-turnover mRNAs (t1/2 >1000 min) were more affected by seed-mediated off-target silencing than high-turnover mRNAs (t1/2 <200 min), with twice as many long-lived seed-containing transcripts (3.8 versus 1.9%) being strongly (z-score <−3) repressed.
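The turnover classes used in these comparisons can be expressed as a small helper. This sketch uses the half-life thresholds quoted above (high turnover below 200 min, low turnover above 1000 min) and the standard first-order relation between half-life and decay rate; treating everything in between, including the exact boundary values, as "medium" is an assumption, and the names are hypothetical.

```python
import math

HIGH_TURNOVER_MAX_MIN = 200.0   # t1/2 below this: high turnover
LOW_TURNOVER_MIN_MIN = 1000.0   # t1/2 above this: low turnover


def decay_rate_from_half_life(half_life_min):
    """First-order decay rate (per minute) from a half-life in minutes."""
    return math.log(2) / half_life_min


def turnover_class(half_life_min):
    """Classify a transcript by half-life; boundary handling is an assumption."""
    if half_life_min < HIGH_TURNOVER_MAX_MIN:
        return "high"
    if half_life_min > LOW_TURNOVER_MIN_MIN:
        return "low"
    return "medium"


if __name__ == "__main__":
    for half_life in (60, 500, 1500):  # illustrative half-lives in minutes
        print(half_life, turnover_class(half_life),
              round(decay_rate_from_half_life(half_life), 5))
```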
In summary, mRNA turnover rates have an important influence on the changes exerted by small RNAs on mRNA levels. It can be assumed that mRNA half-lives will influence how mRNAs are differentially perturbed whenever small RNA levels change in the cell, not only after transfection but also during differentiation, pathogenesis and normal cell physiology.
The microRNA pathway participates in basic cellular processes and its discovery has enabled the development of si/shRNAs as powerful investigational tools and potential therapeutics. Based on a simple kinetic model of the mRNA life cycle, we hypothesized that mRNAs with high turnover rates may be more resistant to RNAi-mediated silencing. The results of a simple reporter experiment strongly supported this hypothesis. We followed this with a genome-wide scale analysis of a rich corpus of experiments, including RT–qPCR validation data for thousands of siRNAs, siRNA/microRNA overexpression data and mRNA stability data. We find that short-lived transcripts are less affected by microRNA overexpression, suggesting that microRNA target prediction would be improved if mRNA turnover rates were considered. Similarly, short-lived transcripts are more difficult to silence using siRNAs, and our results may explain why certain transcripts are inherently recalcitrant to perturbation by small RNAs.
PMCID: PMC3010119  PMID: 21081925
microRNA; mRNA decay; RNAi; siRNA
15.  The evolution and functional repertoire of translation proteins following the origin of life 
Biology Direct  2010;5:15.
The RNA world hypothesis posits that the earliest genetic system consisted of informational RNA molecules that directed the synthesis of modestly functional RNA molecules. Further evidence suggests that it was within this RNA-based genetic system that life developed the ability to synthesize proteins by translating genetic code. Here we investigate the early development of the translation system through an evolutionary survey of protein architectures associated with modern translation.
Our analysis reveals a structural expansion of translation proteins immediately following the RNA world and well before the establishment of the DNA genome. Subsequent functional annotation shows that representatives of the ten most ancestral protein architectures are responsible for all of the core protein functions found in modern translation.
We propose that this early robust translation system evolved by virtue of a positive feedback cycle in which the system was able to create increasingly complex proteins to further enhance its own function.
This article was reviewed by Janet Siefert, George Fox, and Antonio Lazcano (nominated by Laura Landweber)
PMCID: PMC2873265  PMID: 20377891
16.  RNase MRP and the RNA processing cascade in the eukaryotic ancestor 
BMC Evolutionary Biology  2007;7(Suppl 1):S13.
Within eukaryotes there is a complex cascade of RNA-based macromolecules that process other RNA molecules, especially mRNA, tRNA and rRNA. An example is RNase MRP processing ribosomal RNA (rRNA) in ribosome biogenesis. One hypothesis is that this complexity was present early in eukaryotic evolution; an alternative is that an initial simpler network later gained complexity by gene duplication in lineages that led to animals, fungi and plants. Recently there has been a rapid increase in support for the complexity-early theory because the vast majority of these RNA-processing reactions are found throughout eukaryotes, and thus were likely to be present in the last common ancestor of living eukaryotes, herein called the Eukaryotic Ancestor.
We present an overview of the RNA processing cascade in the Eukaryotic Ancestor and investigate, in particular, RNase MRP, which was previously thought to have evolved later in eukaryotes because of its apparently limited distribution in fungi, animals and plants. Recent publications, as well as our own genomic searches, find previously unknown RNase MRP RNAs, indicating that RNase MRP has a wide distribution in eukaryotes. Combining secondary structure and promoter region analysis of RNAs for RNase MRP, along with analysis of the target substrate (rRNA), allows us to discuss this distribution in the light of eukaryotic evolution.
We conclude that RNase MRP can now be placed in the RNA-processing cascade of the Eukaryotic Ancestor, highlighting the complexity of RNA-processing in early eukaryotes. Promoter analyses of MRP-RNA suggest that regulation of the critical processes of rRNA cleavage can vary, showing that even these key cellular processes (for which we expect high conservation) show some species-specific variability. We present our consensus MRP-RNA secondary structure as a useful model for further searches.
PMCID: PMC1796607  PMID: 17288571
17.  Human miRNA Precursors with Box H/ACA snoRNA Features 
PLoS Computational Biology  2009;5(9):e1000507.
MicroRNAs (miRNAs) and small nucleolar RNAs (snoRNAs) are two classes of small non-coding regulatory RNAs, which have been much investigated in recent years. While their respective functions in the cell are distinct, they share interesting genomic similarities, and recent sequencing projects have identified processed forms of snoRNAs that resemble miRNAs. Here, we investigate a possible evolutionary relationship between miRNAs and box H/ACA snoRNAs. A comparison of the genomic locations of reported miRNAs and snoRNAs reveals an overlap of specific members of these classes. To test the hypothesis that some miRNAs might have evolved from snoRNA encoding genomic regions, reported miRNA-encoding regions were scanned for the presence of box H/ACA snoRNA features. Twenty miRNA precursors show significant similarity to H/ACA snoRNAs as predicted by snoGPS. These include molecules predicted to target known ribosomal RNA pseudouridylation sites in vivo for which no guide snoRNA has yet been reported. The predicted folded structures of these twenty H/ACA snoRNA-like miRNA precursors reveal molecules which resemble the structures of known box H/ACA snoRNAs. The genomic regions surrounding these predicted snoRNA-like miRNAs are often similar to regions around snoRNA retroposons, including the presence of transposable elements, target site duplications and poly(A) tails. We further show that the precursors of five H/ACA snoRNA-like miRNAs (miR-151, miR-605, miR-664, miR-215 and miR-140) bind to dyskerin, a specific protein component of functional box H/ACA small nucleolar ribonucleoprotein complexes, suggesting that these molecules have retained some H/ACA snoRNA functionality. The detection of small RNA molecules that share features of miRNAs and snoRNAs suggests that these classes of RNA may have an evolutionary relationship.
Author Summary
The major functions known for RNA were long believed to be either messenger RNAs, which function as intermediates between genes and proteins, or ribosomal RNAs and transfer RNAs which carry out the translation process. In recent years, however, newly discovered classes of small RNAs have been shown to play important cellular roles. These include microRNAs (miRNAs), which can regulate the production of specific proteins, and small nucleolar RNAs (snoRNAs), which recognise and chemically modify specific sequences in ribosomal RNA. Although miRNAs and snoRNAs are currently believed to be generated by different cellular pathways and to function in different cellular compartments, members of these two types of small RNAs display numerous genomic similarities, and a small number of snoRNAs have been shown to encode miRNAs in several organisms. Here we systematically investigate a possible evolutionary relationship between snoRNAs and miRNAs. Using computational analysis, we identify twenty genomic regions encoding miRNAs with highly significant similarity to snoRNAs, both on the level of their surrounding genomic context as well as their predicted folded structure. A subset of these miRNAs display functional snoRNA characteristics, strengthening the possibility that these miRNA molecules might have evolved from snoRNAs.
PMCID: PMC2730528  PMID: 19763159
18.  Small Cofactors May Assist Protein Emergence from RNA World: Clues from RNA-Protein Complexes 
PLoS ONE  2011;6(7):e22494.
It is now widely accepted that at an early stage in the evolution of life an RNA world arose, in which RNAs both served as the genetic material and catalyzed diverse biochemical reactions. Over time, proteins gradually replaced RNAs because of their superior catalytic properties. It is therefore important to investigate how primitive functional proteins emerged from the RNA world, which can shed light on the evolutionary pathway of life from the RNA world to the modern world. In this work, based on an analysis of RNA-protein complexes, we propose that the emergence of most primitive functional proteins was assisted by early nucleotide cofactors, while only a minority were induced directly by RNAs. Furthermore, the present findings have significant implications for exploring the composition of primitive RNA, i.e., the adenine base as its principal building block.
PMCID: PMC3138788  PMID: 21789260
19.  Comparative genomics of eukaryotic small nucleolar RNAs reveals deep evolutionary ancestry amidst ongoing intragenomic mobility 
Small nucleolar (sno)RNAs are required for posttranscriptional processing and modification of ribosomal, spliceosomal and messenger RNAs. Their presence in both eukaryotes and archaea indicates that snoRNAs are evolutionarily ancient. The location of some snoRNAs within the introns of ribosomal protein genes has been suggested to belie an RNA world origin, with the exons of the earliest protein-coding genes having evolved around snoRNAs after the advent of templated protein synthesis. Alternatively, this intronic location may reflect more recent selection for coexpression of snoRNAs and ribosomal components, ensuring rRNA modification by snoRNAs during ribosome synthesis. To gain insight into the evolutionary origins of this genetic organization, we examined the antiquity of snoRNA families and the stability of their genomic location across 44 eukaryote genomes.
We report that dozens of snoRNA families are traceable to the Last Eukaryotic Common Ancestor (LECA), but find only weak similarities between the oldest eukaryotic snoRNAs and archaeal snoRNA-like genes. Moreover, many of these LECA snoRNAs are located within the introns of host genes independently traceable to the LECA. Comparative genomic analyses reveal, however, that the intronic location of LECA snoRNAs is not ancestral, suggesting the pattern we observe is the result of ongoing intragenomic mobility. Analysis of human transcriptome data indicates that the primary requirement for hosting intronic snoRNAs is a broad expression profile. Consistent with ongoing mobility across broadly-expressed genes, we report a case of recent migration of a non-LECA snoRNA from the intron of a ubiquitously expressed non-LECA host gene into the introns of two LECA genes during the evolution of primates.
Our analyses show that snoRNAs were a well-established family of RNAs at the time when eukaryotes began to diversify. While many are intronic, this association is not evolutionarily stable across the eukaryote tree; ongoing intragenomic mobility has erased signal of their ancestral gene organization, and neither introns-first nor evolved co-expression adequately explain our results. We therefore present a third model — constrained drift — whereby individual snoRNAs are intragenomically mobile and may occupy any genomic location from which expression satisfies phenotype.
PMCID: PMC3511168  PMID: 22978381
snoRNA; Last Eukaryotic Common Ancestor; Intron; Retrotransposition; Introns-first; Constrained drift
20.  The catalytic mechanism of hairpin ribozyme studied by hydrostatic pressure 
Nucleic Acids Research  2005;33(8):2557-2564.
The discovery of ribozymes strengthened the RNA world hypothesis, which assumes that these precursors of modern life both stored information and acted as catalysts. For the first time among extensive studies on ribozymes, we have investigated the influence of hydrostatic pressure on the hairpin ribozyme catalytic activity. High pressures are of interest when studying life under extreme conditions and may help to understand the behavior of macromolecules at the origins of life. Kinetic studies of the hairpin ribozyme self-cleavage were performed under high hydrostatic pressure. The activation volume of the reaction (34 ± 5 ml/mol) calculated from these experiments is of the same order of magnitude as those of common protein enzymes, and reflects an important compaction of the RNA molecule during catalysis, associated with a release of water. Kinetic studies were also carried out under osmotic pressure and confirmed this interpretation and the involvement of water movements (78 ± 4 water molecules per RNA molecule). Taken together, these results are consistent with structural studies indicating that loops A and B of the ribozyme come into close contact during the formation of the transition state. While validating baro-biochemistry as an efficient tool for investigating dynamics at work during RNA catalysis, these results provide a complementary view of ribozyme catalytic mechanisms.
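To give a feel for what an activation volume of this size means, the sketch below applies the standard transition-state relation ln(k_P/k_0) = −ΔV‡·ΔP/(RT). Only the 34 ml/mol value comes from the abstract; the pressure increase of 100 MPa and the temperature of 298.15 K are illustrative assumptions, and the function name is hypothetical.

```python
import math

GAS_CONSTANT = 8.314462618  # J/(mol*K)


def rate_ratio(activation_volume_ml_per_mol, pressure_increase_mpa,
               temperature_k=298.15):
    """Ratio k(P)/k(P0) from ln(k_P/k_0) = -dV * dP / (R * T).

    The temperature default and any pressure passed in are illustrative
    assumptions, not conditions reported in the study.
    """
    volume_m3 = activation_volume_ml_per_mol * 1e-6   # ml/mol -> m^3/mol
    pressure_pa = pressure_increase_mpa * 1e6          # MPa -> Pa
    return math.exp(-volume_m3 * pressure_pa / (GAS_CONSTANT * temperature_k))


if __name__ == "__main__":
    # Reported activation volume (34 ml/mol) at an assumed +100 MPa.
    print(f"k(P)/k(P0) = {rate_ratio(34.0, 100.0):.3f}")
```

Under these assumed conditions a positive activation volume of 34 ml/mol corresponds to roughly a fourfold slowing of the reaction.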
PMCID: PMC1088306  PMID: 15870387
21.  On the origin of the translation system and the genetic code in the RNA world by means of natural selection, exaptation, and subfunctionalization 
Biology Direct  2007;2:14.
The origin of the translation system is, arguably, the central and the hardest problem in the study of the origin of life, and one of the hardest in all evolutionary biology. The problem has a clear catch-22 aspect: high translation fidelity can hardly be achieved without a complex, highly evolved set of RNAs and proteins, but an elaborate protein machinery could not evolve without an accurate translation system. The origin of the genetic code and whether it evolved on the basis of a stereochemical correspondence between amino acids and their cognate codons (or anticodons), through selectional optimization of the code vocabulary, as a "frozen accident" or via a combination of all these routes is another wide open problem despite extensive theoretical and experimental studies. Here we combine the results of comparative genomics of translation system components, data on interaction of amino acids with their cognate codons and anticodons, and data on catalytic activities of ribozymes to develop conceptual models for the origins of the translation system and the genetic code.
Our main guide in constructing the models is the Darwinian Continuity Principle whereby a scenario for the evolution of a complex system must consist of plausible elementary steps, each conferring a distinct advantage on the evolving ensemble of genetic elements. Evolution of the translation system is envisaged to occur in a compartmentalized ensemble of replicating, co-selected RNA segments, i.e., in an RNA World containing ribozymes with versatile activities. Since evolution has no foresight, the translation system could not evolve in the RNA World as the result of selection for protein synthesis and must have been a by-product of evolution driven by selection for another function, i.e., the translation system evolved via the exaptation route. It is proposed that the evolutionary process that eventually led to the emergence of translation started with the selection for ribozymes binding abiogenic amino acids that stimulated ribozyme-catalyzed reactions. The proposed scenario for the evolution of translation consists of the following steps: binding of amino acids to a ribozyme resulting in an enhancement of its catalytic activity; evolution of the amino-acid-stimulated ribozyme into a peptide ligase (predecessor of the large ribosomal subunit) yielding, initially, a unique peptide activating the original ribozyme and, possibly, other ribozymes in the ensemble; evolution of self-charging proto-tRNAs that were selected, initially, for accumulation of amino acids, and subsequently, for delivery of amino acids to the peptide ligase; joining of the peptide ligase with a distinct RNA molecule (predecessor of the small ribosomal subunit) carrying a built-in template for more efficient, complementary binding of charged proto-tRNAs; evolution of the ability of the peptide ligase to assemble peptides using exogenous RNAs as templates for complementary binding of charged proto-tRNAs, yielding peptides with the potential to activate different ribozymes; evolution of the translocation function of the protoribosome leading to the production of increasingly longer peptides (the first proteins), i.e., the origin of translation. The specifics of the recognition of amino acids by proto-tRNAs and the origin of the genetic code depend on whether or not there is a physical affinity between amino acids and their cognate codons or anticodons, a problem that remains unresolved.
We describe a stepwise model for the origin of the translation system in the ancient RNA world such that each step confers a distinct advantage onto an ensemble of co-evolving genetic elements. Under this scenario, the primary cause for the emergence of translation was the ability of amino acids and peptides to stimulate reactions catalyzed by ribozymes. Thus, the translation system might have evolved as the result of selection for ribozymes capable of, initially, efficient amino acid binding, and subsequently, synthesis of increasingly versatile peptides. Several aspects of this scenario are amenable to experimental testing.
This article was reviewed by Rob Knight, Doron Lancet, Alexander Mankin (nominated by Arcady Mushegian), and Arcady Mushegian.
PMCID: PMC1894784  PMID: 17540026
22.  Protecting exons from deleterious R-loops: a potential advantage of having introns 
Biology Direct  2007;2:11.
Accumulating evidence indicates that the nascent RNA can invade and pair with one strand of DNA, forming an R-loop structure that threatens the stability of the genome. In addition, the costs and benefits of introns are still under debate.
At least three factors are likely required for the R-loop formation: 1) sequence complementarity between the nascent RNA and the target DNA, 2) spatial juxtaposition between the nascent RNA and the template DNA, and 3) accessibility of the template DNA and the nascent RNA. The removal of introns from pre-mRNA reduces the complementarity between RNA and the template DNA and avoids the spatial juxtaposition between the nascent RNA and the template DNA. In addition, the secondary structures of group I and group II introns may act as spatial obstacles for the formation of R-loops between nearby exons and the genomic DNA.
Organisms may benefit from introns by avoiding deleterious R-loops. The potential contribution of this benefit in driving intron evolution is discussed. I propose that additional RNA polymerases may inhibit R-loop formation between preceding nascent RNA and the template DNA. This idea leads to a testable prediction: intermittently transcribed genes and genes with frequently prolonged transcription should have higher intron density.
This article was reviewed by Dr. Eugene V. Koonin, Dr. Alexei Fedorov (nominated by Dr. Laura F Landweber), and Dr. Scott W. Roy (nominated by Dr. Arcady Mushegian).
PMCID: PMC1863416  PMID: 17459149
23.  Pseudo-Replication of [GADV]-Proteins and Origin of Life 
The RNA world hypothesis on the origin of life is generally considered the key to solving the "chicken and egg dilemma" concerning the evolution of genes and proteins as observed in modern organisms. This hypothesis, however, contains several serious weak points. We have a counterproposal, the [GADV]-protein world hypothesis (abbreviated as the GADV hypothesis), in which we suggest that life originated from a [GADV]-protein world, which comprised proteins composed of four amino acids: Gly [G], Ala [A], Asp [D], and Val [V]. A new concept, "pseudo-replication", is crucial for the description of the emergence of life. The new hypothesis plausibly explains not only how life originated from the initial chaotic protein world, but also how genes, the genetic code, and proteins co-evolved.
PMCID: PMC2680631  PMID: 19468323
GADV hypothesis; pseudo-replication; [GADV]-protein world; origin of life
24.  Evolution of the Division of Labor between Genes and Enzymes in the RNA World 
PLoS Computational Biology  2014;10(12):e1003936.
The RNA world is a very likely interim stage of the evolution after the first replicators and before the advent of the genetic code and translated proteins. Ribozymes are known to be able to catalyze many reaction types, including cofactor-aided metabolic transformations. In a metabolically complex RNA world, early division of labor between genes and enzymes could have evolved, where the ribozymes would have been transcribed from the genes more often than the other way round, benefiting the encapsulating cells through this dosage effect. Here we show, by computer simulations of protocells harboring unlinked RNA replicators, that the origin of replicational asymmetry producing more ribozymes from a gene template than gene strands from a ribozyme template is feasible and robust. Enzymatic activities of the two modeled ribozymes are in trade-off with their replication rates, and the relative replication rates compared to those of complementary strands are evolvable traits of the ribozymes. The degree of trade-off is shown to have the strongest effect in favor of the division of labor. Although some asymmetry between gene and enzymatic strands could have evolved even in earlier, surface-bound systems, the shown mechanism in protocells seems inevitable and under strong positive selection. This could have preadapted the genetic system for transcription after the subsequent origin of chromosomes and DNA.
Author Summary
The RNA world refers to the stage of early evolution when RNA macromolecules were responsible both for storing hereditary information and performing enzymatic activities. Conflict arises between these two functions, however, as enzymatic activities of the ribozymes are in tradeoff with their replication rates. Here we address this problem by investigating the evolutionary emergence of a primordial transcription-like system in model protocells inhabited by unlinked replicators. Our numerical analysis demonstrates that division of labor between genes and enzymes could have emerged, given that there was a moderate to strong tradeoff between the enzymatic and template efficiency of one strand of the ribozymes. This division of labor results in a strong asymmetry in the numbers of the enzymatic and genetic strands of the macromolecules, in favor of the former. We offer insight into the emergence of the first transcription-like system, which is today characteristic of all known life forms.
PMCID: PMC4256009  PMID: 25474573
25.  tRNA Signatures Reveal a Polyphyletic Origin of SAR11 Strains among Alphaproteobacteria 
PLoS Computational Biology  2014;10(2):e1003454.
Molecular phylogenetics and phylogenomics are subject to noise from horizontal gene transfer (HGT) and bias from convergence in macromolecular compositions. Extensive variation in size, structure and base composition of alphaproteobacterial genomes has complicated their phylogenomics, sparking controversy over the origins and closest relatives of the SAR11 strains. SAR11 are highly abundant, cosmopolitan aquatic Alphaproteobacteria with streamlined, A+T-biased genomes. A dominant view holds that SAR11 are monophyletic and related to both Rickettsiales and the ancestor of mitochondria. Other studies dispute this, finding evidence of a polyphyletic origin of SAR11 with most strains distantly related to Rickettsiales. Although careful evolutionary modeling can reduce bias and noise in phylogenomic inference, entirely different approaches may be useful to extract robust phylogenetic signals from genomes. Here we develop simple phyloclassifiers from bioinformatically derived tRNA Class-Informative Features (CIFs), features predicted to target tRNAs for specific interactions within the tRNA interaction network. Our tRNA CIF-based model robustly and accurately classifies alphaproteobacterial genomes into one of seven undisputed monophyletic orders or families, despite great variability in tRNA gene complement sizes and base compositions. Our model robustly rejects monophyly of SAR11, classifying all but one strain as Rhizobiales with strong statistical support. Yet remarkably, conventional phylogenetic analysis of tRNAs classifies all SAR11 strains identically as Rickettsiales. We attribute this discrepancy to convergence of SAR11 and Rickettsiales tRNA base compositions. Thus, tRNA CIFs appear more robust to compositional convergence than tRNA sequences generally. Our results suggest that tRNA-CIF-based phyloclassification is robust to HGT of components of the tRNA interaction network, such as aminoacyl-tRNA synthetases. We explain why tRNAs are especially advantageous for prediction of traits governing macromolecular interactions from genomic data, and why such traits may be advantageous in the search for robust signals to address difficult problems in classification and phylogeny.
Author Summary
If gene products work well in the networks of foreign cells, their genes may transfer horizontally between unrelated genomes. What factors dictate the ability to integrate into foreign networks? Different RNAs and proteins must interact specifically in order to function well as a system. For example, tRNA functions are determined by the interactions they have with other macromolecules. We have developed ways to predict, from genomic data alone, how tRNAs distinguish themselves to their specific interaction partners. Here, as proof of concept, we built a robust computational model from these bioinformatic predictions in seven lineages of Alphaproteobacteria. We validated our model by classifying hundreds of diverse alphaproteobacterial taxa and tested it on eight strains of SAR11, a phylogenetically controversial group that is highly abundant in the world's oceans. We found that different strains of SAR11 are more distantly related, both to each other and to mitochondria, than widely believed. We explain conflicting results about SAR11 as an artifact of bias created by the variability in base contents of alphaproteobacterial genomes. While this bias affects tRNAs too, our classifier appears unexpectedly robust to it. More broadly, our results suggest that traits governing macromolecular interactions may be more faithfully vertically inherited than the macromolecules themselves.
PMCID: PMC3937112  PMID: 24586126

