Life is based on replication and evolution. But replication cannot be taken for granted. We must ask what there was prior to replication and evolution. How does evolution begin? We have proposed prelife as a generative system that produces information and diversity in the absence of replication. We model prelife as a binary soup of active monomers that form random polymers. ‘Prevolutionary’ dynamics can have mutation and selection prior to replication. Some sequences might have catalytic activity, thereby enhancing the rates of certain prelife reactions. We study the selection criteria for these prelife catalysts. Their catalytic efficiency must be above certain critical values. We find a maintenance threshold and an initiation threshold. The former is a linear function of sequence length, and the latter is an exponential function of sequence length. Therefore, it is extremely hard to select for prelife catalysts that have long sequences. We compare prelife catalysis with a simple model for replication. Assuming fast template-based elongation reactions, we can show that replicators have selection thresholds that are independent of their sequence length. Our calculation demonstrates the efficiency of replication and provides an explanation of why replication was selected over other forms of prelife catalysis.
The defining feature of biological systems is evolution. Biological organisms are products of evolutionary processes and are capable of undergoing further evolution. We think of the evolutionary process as modifying the traits of living systems. But how does evolution get started? How can we formulate a dynamical system that leads to the origin of evolution? What is there just before evolution begins? This paper is an extension of earlier work that tries to approach such questions (Nowak & Ohtsuki 2008; Manapat et al. 2009). In these papers, we have defined ‘prelife’ as a chemical system that can lead to information and diversity and that is capable of selection and mutation, but does not yet have replication. We have modelled prelife as a soup of active monomers, which can give rise to polymers. Here, we assume that some polymers have catalytic activity: they increase the rate of certain reactions in prelife. We study the criteria for the selection of prelife catalysts. We compare prelife catalysts with replicators, which have the ability to make copies of themselves.
The origin of life is a transition from chemistry to biology. There have been many theoretical and empirical studies concerning the origin of life (Oparin 1953; Crick 1968; Orgel 1968, 1992; Eigen 1971; Dyson 1982, 1999; Eigen & Schuster 1982; Kuppers 1983; Stein & Anderson 1984; Farmer et al. 1986; Szathmary & Demeter 1987; Sievers & von Kiedrowski 1994; Fontana & Schuster 1998; Luther et al. 1998; Lifson & Lifson 1999; de Duve 2005, 2007). One line of research attempts to understand how chemical processes on early Earth can spontaneously synthesize the basic building blocks of life (Miller 1953; Allen & Ponnamperuma 1967; Miller & Orgel 1974; Hargreaves et al. 1977; Rao et al. 1982; Rushdi & Simoneit 2001; Benner et al. 2002; Ricardo et al. 2004; Benner & Ricardo 2005; Wächtershäuser 2007). RNA has the ability to store genetic information and catalyse chemical reactions. Therefore, the proposal has been made that early life existed in an ‘RNA world’ (Orgel 1986; Joyce 1989, 2002; Ellington & Szostak 1990; Cech 1993; Johnston et al. 2001; Steitz & Moore 2003; Hughes et al. 2004). Bartel & Szostak (1993) discovered an RNA sequence that can catalyse RNA polymerization.
Some critics, however, argue that RNA is too complicated and fragile to arise spontaneously and that the origin of life must have been based on simpler molecules, metabolic networks or compositional genomes (Shapiro 1984, 2006, 2007; Kauffman 1986; Morowitz et al. 1988; Segre et al. 1998, 2000). Sometimes this debate is called ‘RNA first’ versus ‘metabolism first’. Our own position is the following. All currently known biological organisms use RNA or DNA. At some time, such a system must have evolved. Therefore, it is a valid programme to investigate the principles that govern the emergence of a biological polymer that carries information. When this event took place, complicated chemical cycles must have been present, which generate the compounds needed for the biological polymers. In this sense, metabolism first is certainly true, but an RNA-like system is needed for the emergence of genetic evolution.
A crucial step in the origin of life is the formation of the first cell (Szostak et al. 2001; Hanczyc et al. 2003; Chen & Szostak 2004a,b; Chen et al. 2004, 2005; Chen 2006). Fatty acids are simple molecules that can be synthesized under prebiotic conditions. They can self-assemble into bilayer vesicles, which can undergo growth and division. A decisive question is whether cells preceded information carrying polymers or vice versa. In the context of our theory, the ordering of these two events affects the population structure. If polymers came first, then their emergence can be studied in well-mixed populations. If cells came first, then the emergence of polymers should be studied in structured meta-populations containing ensembles of dividing subpopulations. From the perspective of mathematical analysis, the logical first step is to study well-mixed populations (as we will do here) and later move to evolutionary dynamics in structured populations (Nowak & May 1992; Rousset 2004; Traulsen & Nowak 2006; Ohtsuki et al. 2006; Taylor et al. 2007; Tarnita et al. 2009).
Eigen & Schuster (1977, 1979) developed a hugely influential molecular theory of chemical evolution. Their quasi-species theory studies the competition of different replicators (McCaskill 1984; Eigen et al. 1989; Nowak & Schuster 1989; Nowak 1992). Hypercycles are cooperative interactions between two or more replicators. By contrast, our theory of prelife does not begin with the presence of replicators; instead, we study mutation and selection prior to replication (Nowak & Ohtsuki 2008; Manapat et al. 2009). Therefore, we study the origin of evolution and the competition between life (which is based on replication) and prelife (chemistry without replication). Fontana & Buss (1994a,b) use λ calculus to study a generative chemistry with and without replication.
This paper is structured as follows. In §2, we present prelife and fully symmetric prelife. In §3, we discuss partial and perfect prelife catalysts. They give rise to hysteresis (bistability). In §4, we discuss a simple replicator. A brief summary of our findings is given in §5.
We consider two types of activated monomers: 0* and 1*. They are produced by prebiotic chemistry, and they decay at certain rates. They can also become deactivated to generate inactivated monomers, 0 and 1. Activated monomers participate in co-polymerization reactions. Let i denote a binary string. We consider the following chemical reactions: i + 0* → i0 and i + 1* → i1. These chemical reactions can generate all binary strings. Inactivated monomers cannot be used for the elongation reactions, but they can react with active monomers; for example, 0 + 1* → 01.
The chemical kinetics of prelife are described by the following system of linear differential equations:

$$\dot{x}_i = a_i x_{i'} - (d + a_{i0} + a_{i1})\,x_i. \qquad (2.1)$$
The index $i$ represents all binary strings (or sequences). The abundance of sequence $i$ is denoted by $x_i$. Longer strings are produced from shorter ones by adding either a 0* or a 1* on the right side. Each string, $i$, has one precursor, denoted by $i'$, and two followers, denoted by $i0$ and $i1$ (figure 1a). For example, 010 is the precursor of 0101. The two followers of 0101 are 01010 and 01011. For the precursors of strings 0 and 1, we set $x_{0'} = x_{1'} = 1$. The rate constant $a_i$ denotes the rate at which string $i$ is formed from string $i'$ by addition of an activated monomer (which is either 0* or 1*). Equation (2.1) assumes that the concentrations of activated monomers are at constant steady-state levels. This happens, for example, when the decay rate of activated monomers is greater than the rate at which they are used up in prelife reactions. In the following, we take the steady-state densities of activated monomers to be subsumed in the rate constants. All strings are removed (decay) at rate $d$.
(a) The tree of prelife. Activated monomers, 0* and 1*, form (random) polymers. Activated monomers can become deactivated, 0* → 0 and 1* → 1. Activated monomers can attach to the end of strings. For simplicity, we assume that all strings ...
Prelife dynamics define a tree (more precisely a double tree) with the two roots, 0 and 1. This ‘tree of prelife’ has infinitely many lineages (figure 1a). Half of all lineages start from 0 and the other half start from 1. A lineage is a sequence of infinitely many strings, each of which is a follower of the previous one. For example, one such lineage contains all all-0 strings: 0, 00, 000, … . Another lineage contains alternating sequences (that start with 0): 0, 01, 010, 0101, … . We could also consider prelife with more than two types of monomers, but this extension is not necessary for the purpose of this paper.
For fully symmetric prelife, we assume $a_0 = a_1 = \lambda/2$ and $a_i = a$ for all other sequences, $i$. In this case, all sequences of length $n$ have the same equilibrium abundance, $[\lambda/(2a)]\,[a/(2a + d)]^n$. The total abundance of all strings is $\lambda/d$.
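The two closed-form statements above can be checked numerically. The following is a minimal sketch, assuming only the equilibrium abundance quoted in the text; the function names and the truncation length `max_len` are illustrative choices, not part of the original model.

```python
def symmetric_prelife_abundance(n, lam, a, d):
    """Equilibrium abundance of any single string of length n (fully symmetric prelife)."""
    return (lam / (2 * a)) * (a / (2 * a + d)) ** n


def total_abundance(lam, a, d, max_len=2000):
    """Sum the abundances of all strings up to length max_len.

    There are 2**n strings of length n, so the contribution of length n is
    (lam / 2a) * (2a / (2a + d))**n. The geometric series is truncated at
    max_len (an assumption of this sketch; convergence is slow when d << a).
    """
    ratio = 2 * a / (2 * a + d)
    return sum((lam / (2 * a)) * ratio ** n for n in range(1, max_len + 1))


if __name__ == "__main__":
    lam, a, d = 1.0, 1.0, 1.0
    print(symmetric_prelife_abundance(1, lam, a, d))  # abundance of the string 0 (or 1)
    print(total_abundance(lam, a, d), lam / d)        # the two values should agree
```

Because each length class contributes a term of a geometric series with ratio 2a/(2a + d) < 1, the sum converges to λ/d for any positive decay rate.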
Prelife catalysis means that some sequences have the ability to enhance the rates of certain prelife reactions. For example, sequence j might catalyse the reaction i + 0* → i0 at rate c (figure 1b). In this case, the rate of formation of sequence i0 can be written as ai0xi + cxixj. The first term denotes the rate of the uncatalysed reaction. The second term denotes the rate of the catalysed reaction, which is proportional to the abundance of the catalyst, xj. In a subsequent paper, we plan to study sets of prelife catalysts, but here we focus on the dynamics of individual catalysts. We consider a prelife catalyst that enhances some (or all) of its upstream reactions (figure 1a). Our aim is to calculate the equilibrium abundance of such a catalyst. Therefore, we can study the conditions for selection of catalysed over uncatalysed prelife.
Let us consider fully symmetric prelife. Without loss of generality, we assume that the catalyst is the all-0 sequence of length $n$, which we denote by $0^n$. There are $n - 1$ upstream reactions in the lineage leading from 0 to $0^n$. The rate of each reaction, $0^k + 0^{*} \to 0^{k+1}$, is enhanced by $c_k$ times the abundance of $0^n$. The parameter $c_k$ can be either zero or positive. In order to understand this system, we study the abundances of sequences of the form $0^k$, where $k = 1, 2, \ldots$ . We change our previous notation and let $x_k$ denote the abundance of $0^k$. We have the following system of ordinary differential equations:

$$\begin{aligned}
\dot{x}_1 &= \frac{\lambda}{2} - (2a + d + c_1 x_n)\,x_1,\\
\dot{x}_k &= (a + c_{k-1} x_n)\,x_{k-1} - (2a + d + c_k x_n)\,x_k \quad (k = 2, 3, \ldots),
\end{aligned} \qquad (3.1)$$

where $c_k = 0$ for $k \ge n$.
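We are interested in the equilibrium abundance of the prelife catalyst, which we denote by $\hat{x}_n$. A straightforward calculation shows that it is given as a root of the following polynomial equation:

$$2(2a + d)\,\hat{x}_n \prod_{k=1}^{n-1} (2a + d + c_k \hat{x}_n) = \lambda \prod_{k=1}^{n-1} (a + c_k \hat{x}_n). \qquad (3.2)$$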
Imagine a prelife catalyst of length $n$ that catalyses $m$ ($1 \le m \le n - 1$) of its $n - 1$ upstream reactions. For analytical simplicity, we assume that $c_k$ is either $c$ or 0. That is, $m$ entries of the vector $(c_1, \ldots, c_{n-1})$ are $c$ and the others are zero. In this case, the equilibrium abundance, $\hat{x}_n$, is given as a root of the equation

$$2(2a + d)^{n-m}\,\hat{x}_n\,(2a + d + c\hat{x}_n)^m = \lambda\,a^{n-1-m}\,(a + c\hat{x}_n)^m. \qquad (3.3)$$
Note that equation (3.3) does not depend on which particular $m$ reactions out of the $n - 1$ upstream reactions are enhanced. For a general $c$, equation (3.3) cannot be solved explicitly. Nevertheless, we obtain the following result. There exists a critical threshold of $m$, denoted by $m_{cr}$. If $m \le m_{cr}$, then the equilibrium abundance, $\hat{x}_n$, is a monotone increasing function of the catalytic activity, $c$. If $m > m_{cr}$, then we observe a hysteresis effect: for an interval of intermediate $c$-values, equation (3.3) has three positive roots, two of which correspond to stable equilibria and one to an unstable equilibrium. Which of the two stable equilibria is reached depends on the initial abundance of the catalyst. For a detailed analysis, see appendix A.
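As a special case, let us study a sequence that enhances the rates of all of its upstream reactions. Therefore, we have $m = n - 1$. The equilibrium abundance, $\hat{x}_n$, is given as a root of the polynomial equation

$$2(2a + d)\,\hat{x}_n\,(2a + d + c\hat{x}_n)^{n-1} = \lambda\,(a + c\hat{x}_n)^{n-1}. \qquad (3.4)$$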
For $c = \infty$, we obtain the maximum abundance, $\hat{x}_n = \lambda/[2(2a + d)]$ ($\equiv \hat{x}_n^{\max}$). For a general $c$, we obtain the following result. There exists a threshold for the length of the catalyst, $n_{cr}$. When $n \le n_{cr}$ (figure 2), the equilibrium abundance $\hat{x}_n$ is a monotone increasing function of $c$. When $n > n_{cr}$ (figure 3), we find two branches of stable equilibria (the solid lines in figure 3a) and one unstable equilibrium between them (the dotted line in figure 3a). The upper branch exists for $c \ge c_1$, whereas the lower branch exists for $c \le c_2$. For $c_1 \le c \le c_2$, the equilibrium abundance, $\hat{x}_n$, depends on the initial abundance of the catalyst. If the catalyst is initially rare, then it will reach the lower equilibrium (figure 3b). If the catalyst is initially present at high abundance, then it will reach the higher equilibrium (figure 3c).
The equilibrium abundances of the all-0 strings, $0^1, 0^2, 0^3, \ldots$, are shown as a function of the catalytic activity, $c$. The catalyst, $0^4$, is shown in red. Shorter sequences are shown in blue and longer sequences in black. We use a = 1, d = 1 and ...
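The first threshold, $c_1$, is the critical value of $c$ that is needed to maintain the catalyst at high abundance. The second threshold, $c_2$, is the critical value that is needed to initiate high abundance of the catalyst when it is not common in the beginning. Therefore, we call $c_1$ and $c_2$ the ‘maintenance threshold’ and the ‘initiation threshold’, respectively. For large $n$, we obtain, to leading order,

$$c_1 \approx \frac{2e(a + d)(2a + d)}{\lambda}\,n, \qquad c_2 \approx \frac{2a(2a + d)^2}{e\lambda(a + d)\,n}\left(\frac{2a + d}{a}\right)^{n-1}. \qquad (3.5)$$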
Here e = 2.718281… (appendix B). The maintenance threshold, c1, grows as a linear function of the sequence length, n. The initiation threshold, c2, grows (approximately) as an exponential function of the sequence length, n. Therefore, it is extremely difficult to select for a catalyst that has a long sequence. At the same time, it is unlikely that short sequences have good (or any) catalytic activity.
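The two thresholds can also be evaluated exactly for any finite n and compared with the large-n estimates. The sketch below is hedged: it assumes that the equilibrium condition for a perfect catalyst is x = [λ/2(2a + d)]·[(a + cx)/(2a + d + cx)]^(n−1), and it uses the substitution ξ = (a + d)/(2a + d + cx) in the spirit of appendix A, under which the turning points of c(x) solve (n − 1)(2a + d)ξ² − n(a + d)ξ + (a + d) = 0. The asymptotic formulas in `asymptotic_thresholds` are leading-order expressions derived from that quadratic for this sketch, and all function names are illustrative.

```python
import math


def turning_points(n, a, d):
    """Roots (xi_minus, xi_plus) of (n-1)(2a+d) xi^2 - n(a+d) xi + (a+d) = 0.

    Returns None when the discriminant is not positive (no bistability).
    """
    A = (n - 1) * (2 * a + d)
    B = n * (a + d)
    C = a + d
    disc = B * B - 4 * A * C
    if disc <= 0:
        return None
    s = math.sqrt(disc)
    return (B - s) / (2 * A), (B + s) / (2 * A)


def activity_at(xi, n, lam, a, d):
    """Catalytic activity c on the equilibrium curve, parametrised by xi."""
    x = lam / (2 * (2 * a + d)) * (1 - xi) ** (n - 1)
    return ((a + d) / xi - (2 * a + d)) / x


def thresholds(n, lam, a, d):
    """Exact (c1, c2): maintenance and initiation thresholds, or None."""
    roots = turning_points(n, a, d)
    if roots is None:
        return None
    xi_minus, xi_plus = roots
    c1 = activity_at(xi_minus, n, lam, a, d)  # local minimum of c(x)
    c2 = activity_at(xi_plus, n, lam, a, d)   # local maximum of c(x)
    return c1, c2


def asymptotic_thresholds(n, lam, a, d):
    """Leading-order large-n estimates (an assumption of this sketch)."""
    c1 = 2 * math.e * (a + d) * (2 * a + d) * n / lam
    c2 = (2 * a * (2 * a + d) ** 2 / (math.e * lam * (a + d) * n)
          * ((2 * a + d) / a) ** (n - 1))
    return c1, c2


if __name__ == "__main__":
    for n in (6, 20, 100):
        print(n, thresholds(n, 1.0, 1.0, 1.0), asymptotic_thresholds(n, 1.0, 1.0, 1.0))
```

With a = d = λ = 1, the exact values show c1 growing roughly in proportion to n while c2 grows by about a factor of (2a + d)/a = 3 per added monomer, which is the contrast the text draws between maintenance and initiation.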
An intuitive biological summary is the following. The system has two equilibria: E1 and E2. At E1, the catalyst has low abundance; all sequences have almost the same abundances as in uncatalysed prelife. At E2, the catalyst has high abundance; it ‘dominates’ the population (figure 3). We say that at equilibrium E2, the catalyst has been selected over uncatalysed prelife. If the catalytic activity, c, is less than the threshold c1, then only E1 is stable. If c is greater than c2, then only E2 is stable. If c is between c1 and c2, then both equilibria are stable. Which one will be chosen depends on the initial condition. Therefore, if the prelife catalyst is already present at high abundance, then it will remain so as long as c is greater than c1. On the other hand, if the catalyst is initially not present at high abundance, then it will gain high abundance only if c is greater than c2. This ‘chemical hysteresis’ is caused by the bistability of our system.
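The hysteresis described above can be illustrated by integrating the dynamics from two different initial conditions. This is a minimal sketch under stated assumptions: it takes the kinetics of the all-0 lineage to be dx_1/dt = λ/2 − (2a + d + c·x_n)·x_1 and dx_k/dt = (a + c·x_n)·x_{k−1} − (2a + d + c_k·x_n)·x_k, with c_k = c for k < n and c_n = 0, and it uses a plain forward-Euler scheme. The parameter values (n = 6, a = d = λ = 1, c = 160) are illustrative choices that place c between the two thresholds for this n.

```python
def simulate_catalyst(n, c, lam, a, d, x_init, t_end=40.0, dt=0.005):
    """Forward-Euler integration of the strings 0^1 .. 0^n with perfect catalyst 0^n.

    x[k] holds the abundance of the string 0^(k+1). Returns the final state.
    """
    x = list(x_init)
    for _ in range(int(t_end / dt)):
        xn = x[-1]
        dx = []
        for k in range(n):
            # Inflow: monomer supply for 0^1, otherwise the (catalysed)
            # elongation of the precursor string.
            inflow = lam / 2 if k == 0 else (a + c * xn) * x[k - 1]
            # Outflow: decay (d), two uncatalysed elongations (2a), plus the
            # catalysed step to the next all-0 string. The elongation of the
            # catalyst 0^n itself is not catalysed.
            boost = c * xn if k < n - 1 else 0.0
            dx.append(inflow - (2 * a + d + boost) * x[k])
        x = [xk + dt * dxk for xk, dxk in zip(x, dx)]
    return x


if __name__ == "__main__":
    n, lam, a, d, c = 6, 1.0, 1.0, 1.0, 160.0
    rare = simulate_catalyst(n, c, lam, a, d, [0.0] * n)
    common = simulate_catalyst(
        n, c, lam, a, d, [0.0] * (n - 1) + [lam / (2 * (2 * a + d))]
    )
    print("catalyst initially absent :", rare[-1])
    print("catalyst initially common :", common[-1])
```

Starting without the catalyst, the system settles near the uncatalysed prelife abundance; starting with the catalyst at its theoretical maximum, it settles at a much higher abundance for the same value of c. Forward Euler is adequate here because the step size is small relative to the fastest rate in the system; a stiff solver would be the safer choice for much larger c.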
Imagine that a sequence $i$ can make a copy of itself by using activated monomers. For fully symmetric prelife, we can once again assume without loss of generality that the replicator is the all-0 sequence of length $n$, denoted by $0^n$. The replication starts from the primer, 0, and incorporates activated monomers 0* for elongation.
The difference between the perfect prelife catalyst and the replicator is the following. The prelife catalyst can attach to a sequence and increase the rate at which the activated monomer is added. Afterwards, the catalyst dissociates from the elongated sequence. By contrast, the replicator attaches to a primer and then holds on to the growing sequence. Therefore, the catalytic activity of the replicator can ‘walk along’ the entire sequence. In both cases, we assume that the catalysed elongation step is not rate limiting. Consequently, for the replicator, a single rate-limiting bimolecular reaction is sufficient (attaching between template and primer). For the perfect prelife catalyst, we need n − 1 rate-limiting bimolecular reactions (see figure 1b).
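As before, let $x_k$ be the abundance of the sequence $0^k$ ($k = 1, \ldots, n$). The consumption of primers is described by the term $-r x_1 x_n$. If we assume perfect replication, two copies of the replicator are produced from one primer and one replicator. Therefore, the net production of replicators is described by the term $r x_1 x_n$. In the general case, we obtain the following system of differential equations:

$$\begin{aligned}
\dot{x}_1 &= \frac{\lambda}{2} - (2a + d)\,x_1 - r x_1 x_n,\\
\dot{x}_k &= a x_{k-1} - (2a + d)\,x_k \quad (k = 2, \ldots, n - 1),\\
\dot{x}_n &= a x_{n-1} - (2a + d)\,x_n + \delta r x_1 x_n.
\end{aligned} \qquad (4.1)$$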
Here the parameter δ represents the efficacy of replication. A perfect replication leads to δ = 1. If replication is always unsuccessful, we have δ = −1, because replicators are consumed in vain. In general, δ takes a value between −1 and 1. In appendix C, we provide a derivation of equation (4.1) by examining the detailed mechanism of the replication process. A key assumption there is that the template-based elongation steps are not rate limiting. In the following, we study δ > 0, otherwise replicators are never selected.
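From equation (4.1), it is easy to see that the equilibrium abundance of the replicator, $\hat{x}_n$, is given as the positive root of the following quadratic equation:

$$r\hat{x}_n^2 + \left[(2a + d) - \frac{\delta\lambda r}{2(2a + d)}\right]\hat{x}_n - \frac{\lambda}{2}\left(\frac{a}{2a + d}\right)^{n-1} = 0. \qquad (4.2)$$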
For large $r$, we obtain $\hat{x}_n^{\max} = \delta\lambda/[2(2a + d)]$, which agrees with $\hat{x}_n^{\max}$ in the case of $c = \infty$ for prelife catalysts (see §3b) up to the factor $\delta$. However, the dependence of the equilibrium abundance on $r$ is qualitatively different from that on $c$ in prelife catalysts. One can show that if the efficacy of replication exceeds $\delta^* = [a/(2a + d)]^{n-1}$, the equilibrium abundance $\hat{x}_n$ increases monotonically with $r$. Bistability is never observed (figure 4). There exists a critical threshold of $r$ given by

$$r^* = \frac{2(2a + d)^2\,\{f\delta - [a/(2a + d)]^{n-1}\}}{f(1 - f)\,\delta^2\lambda}. \qquad (4.3)$$
If $r > r^*$ holds, the equilibrium abundance of the replicator is more than a fraction $f$ ($0 < f < 1$) of its theoretical maximum, i.e. $\hat{x}_n > f\hat{x}_n^{\max}$. Interestingly, the threshold in equation (4.3) converges to a fixed value, $2(2a + d)^2/[(1 - f)\delta\lambda]$, for large $n$. In contrast to prelife catalysts, long replicators can be selected over prelife.
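The length independence of the replication threshold can be made concrete with a short calculation. The sketch below assumes that the equilibrium abundance of the replicator is the positive root of r·x² + [(2a + d) − δλr/(2(2a + d))]·x − (λ/2)·[a/(2a + d)]^(n−1) = 0, which is what the replicator kinetics described in this section imply when the primer is a monomer. The closed form in `critical_replication_rate` follows from setting x equal to f times the maximum abundance in that quadratic. Function names and parameter values are illustrative.

```python
import math


def replicator_equilibrium(n, r, delta, lam, a, d):
    """Positive equilibrium abundance of the replicator 0^n (monomer primer)."""
    q = a / (2 * a + d)
    tail = (lam / 2) * q ** (n - 1)          # minus the constant term of the quadratic
    if r == 0:
        return tail / (2 * a + d)            # plain prelife abundance
    b = (2 * a + d) - delta * lam * r / (2 * (2 * a + d))
    root = math.sqrt(b * b + 4 * r * tail)
    # Use the numerically stable form of the positive root for either sign of b.
    return 2 * tail / (b + root) if b > 0 else (root - b) / (2 * r)


def max_abundance(delta, lam, a, d):
    """Limit of the equilibrium abundance as r goes to infinity."""
    return delta * lam / (2 * (2 * a + d))


def critical_replication_rate(n, f, delta, lam, a, d):
    """Replication rate r* at which the equilibrium reaches f times the maximum."""
    q = a / (2 * a + d)
    return (2 * (2 * a + d) ** 2 * (f * delta - q ** (n - 1))
            / (f * (1 - f) * delta ** 2 * lam))


if __name__ == "__main__":
    lam, a, d, delta, f = 1.0, 1.0, 1.0, 0.8, 0.5
    for n in (5, 10, 20, 60):
        print(n, critical_replication_rate(n, f, delta, lam, a, d))
    print("limit:", 2 * (2 * a + d) ** 2 / ((1 - f) * delta * lam))
```

For these parameters the threshold is already within a fraction of a per cent of its limiting value at n = 10, in contrast to the initiation threshold of a prelife catalyst, which keeps growing with every added monomer.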
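Now we consider a scenario where the primer of replication is not a monomer, but a sequence of length $\ell$ ($> 1$). As before, suppose that the replicator is $0^n$. The primer of the replication is given by $0^\ell$ ($1 < \ell < n$). Replication is described by the term $r x_\ell x_n$. Taking into account the efficacy of replication, we obtain the following system of differential equations:

$$\begin{aligned}
\dot{x}_1 &= \frac{\lambda}{2} - (2a + d)\,x_1,\\
\dot{x}_k &= a x_{k-1} - (2a + d)\,x_k \quad (2 \le k \le n - 1,\ k \ne \ell),\\
\dot{x}_\ell &= a x_{\ell-1} - (2a + d)\,x_\ell - r x_\ell x_n,\\
\dot{x}_n &= a x_{n-1} - (2a + d)\,x_n + \delta r x_\ell x_n.
\end{aligned} \qquad (4.4)$$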
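A calculation shows that the equilibrium abundance of the replicator, denoted by $\hat{x}_n$, is given by the positive root of the quadratic equation

$$r\hat{x}_n^2 + \left[(2a + d) - \frac{\delta\lambda r}{2(2a + d)}\left(\frac{a}{2a + d}\right)^{\ell-1}\right]\hat{x}_n - \frac{\lambda}{2}\left(\frac{a}{2a + d}\right)^{n-1} = 0. \qquad (4.5)$$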
The equilibrium abundance of replicators monotonically increases with $r$ if and only if the efficacy exceeds

$$\delta^* = \left(\frac{a}{2a + d}\right)^{n-\ell}. \qquad (4.6)$$

Therefore, for a fixed length of the replicator, $n$, the required efficacy grows exponentially with the length of the primer, $\ell$. A replicator that requires a longer primer is less likely to be selected.
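Suppose the efficacy exceeds the value given by equation (4.6). We obtain $\hat{x}_n = [\delta\lambda/(2a)]\,[a/(2a + d)]^{\ell}$ ($\equiv \hat{x}_n^{\max}$) as $r \to \infty$. The critical threshold of the replication constant, denoted by $r^*$, is given by

$$r^* = \frac{2(2a + d)^2\,\{f\delta - [a/(2a + d)]^{n-\ell}\}}{f(1 - f)\,\delta^2\lambda}\left(\frac{2a + d}{a}\right)^{\ell-1}. \qquad (4.7)$$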
This threshold means that if $r > r^*$, then the equilibrium abundance of the replicator exceeds a fraction $f$ ($0 < f < 1$) of its theoretical maximum, i.e. $\hat{x}_n > f\hat{x}_n^{\max}$. For a fixed primer length, $\ell$, the threshold (4.7) tends to a constant, $r^* = 2(2a + d)^2\,[(2a + d)/a]^{\ell-1}/[(1 - f)\delta\lambda]$, for large $n$. Thus, the critical threshold (4.7) converges to a fixed value for increasing $n$, which is consistent with the result found in §4a. The intuitive explanation for this finding is that the catalysed elongation steps of the replication process are not rate limiting. Therefore, the length of the replicator does not affect the rate of replication.
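The effect of primer length can be explored in the same way. The sketch below assumes that only the primer string 0^ℓ is consumed by replication (rate r·x_ℓ·x_n) and that the replicator gains δ·r·x_ℓ·x_n, so that its equilibrium abundance is the positive root of r·x² + [(2a + d) − δλr·q^(ℓ−1)/(2(2a + d))]·x − (λ/2)·q^(n−1) = 0 with q = a/(2a + d). The variable `ell` stands for the primer length; all names and parameter values are illustrative.

```python
import math


def primed_equilibrium(n, ell, r, delta, lam, a, d):
    """Positive equilibrium abundance of replicator 0^n with primer 0^ell (r > 0)."""
    q = a / (2 * a + d)
    tail = (lam / 2) * q ** (n - 1)
    b = (2 * a + d) - delta * lam * r * q ** (ell - 1) / (2 * (2 * a + d))
    root = math.sqrt(b * b + 4 * r * tail)
    # Numerically stable positive root for either sign of b.
    return 2 * tail / (b + root) if b > 0 else (root - b) / (2 * r)


def primed_max_abundance(ell, delta, lam, a, d):
    """Limit of the equilibrium abundance as r goes to infinity."""
    return delta * lam / (2 * a) * (a / (2 * a + d)) ** ell


def required_efficacy(n, ell, a, d):
    """Efficacy above which the equilibrium abundance increases with r."""
    return (a / (2 * a + d)) ** (n - ell)


def primed_critical_rate(n, ell, f, delta, lam, a, d):
    """Replication rate at which the equilibrium reaches f times the maximum."""
    q = a / (2 * a + d)
    return (2 * (2 * a + d) ** 2 * (f * delta - q ** (n - ell))
            / (f * (1 - f) * delta ** 2 * lam * q ** (ell - 1)))


if __name__ == "__main__":
    lam, a, d, delta, f, n = 1.0, 1.0, 1.0, 0.8, 0.5, 12
    for ell in (1, 2, 3, 4):
        print(ell, required_efficacy(n, ell, a, d),
              primed_critical_rate(n, ell, f, delta, lam, a, d))
```

Each additional monomer in the primer multiplies both the required efficacy and the critical replication rate by (2a + d)/a, which is the sense in which a replicator that needs a longer primer is harder to select.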
We have studied the selection criteria for prelife catalysts and replicators. By prelife catalysts, we mean sequences that can enhance certain reactions in prelife. The perfect prelife catalyst is a (hypothetical) sequence that enhances the rates of all reactions in its own production lineage. We show that even for a perfect prelife catalyst, it is very difficult to achieve a high equilibrium abundance, because the catalytic activity has to exceed a threshold value that grows exponentially with the sequence length. By contrast, sequences that can replicate can achieve high equilibrium abundance even if they have considerable length. The critical replication rate is almost independent of the length of the replicator. However, the required efficacy of replication grows with the length of the primer.
Our selection thresholds arise, because there is competition between prelife and catalytic prelife, on the one hand, and between prelife and replication (life), on the other hand. The latter is especially interesting because prelife is needed to build the sequences for replication (the replicator and the primer), but then prelife and life compete for the same resources (activated monomers). This tension between prelife and life leads to the origin of evolution.
Support from the John Templeton Foundation, the NSF/NIH joint program in mathematical biology (NIH grant no. R01GM078986), the Bill and Melinda Gates Foundation (Grand Challenges grant 37874), and J. Epstein is gratefully acknowledged.
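First, we will study equation (3.4). Writing $x$ for $\hat{x}_n$, it can be rewritten as

$$c = \frac{(2a + d)\,u - a}{x\,(1 - u)}, \qquad u = \left[\frac{2(2a + d)\,x}{\lambda}\right]^{1/(n-1)}. \qquad (\text{A 1})$$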
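Therefore, we can regard $c$ as a single-valued function of $x$. Let $c(x)$ be the right-hand side of equation (A 1). Its derivative with respect to $x$ is

$$c'(x) = \frac{(n - 1)(2a + d)\,\xi^2 - n(a + d)\,\xi + (a + d)}{(n - 1)\,x^2\,\xi^2}, \qquad (\text{A 2})$$

where $\xi \equiv 1 - [2(2a + d)x/\lambda]^{1/(n-1)}$. As $c$ is non-negative, from equation (A 1), we need $0 < \xi \le (a + d)/(2a + d)$. Solving $c'(x) = 0$ leads to

$$(n - 1)(2a + d)\,\xi^2 - n(a + d)\,\xi + (a + d) = 0. \qquad (\text{A 3})$$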
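Let $D$ be the discriminant of the quadratic equation for $\xi$, equation (A 3), that is, $D = (a + d)\,[n^2(a + d) - 4(n - 1)(2a + d)]$. For $n \ge 2$, $D$ vanishes at

$$n = n_{cr} \equiv \frac{2\left[(2a + d) + \sqrt{a(2a + d)}\right]}{a + d}. \qquad (\text{A 4})$$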
Also, $D$ is strictly negative at $n = 2$. Thus, if $2 \le n \le n_{cr}$, then $D \le 0$, which means that $c'(x)$ is always non-negative. Therefore, $c = c(x)$ is a monotone increasing function of $x$, and so is its inverse function $x = x(c)$. If $n > n_{cr}$, then $D > 0$, which means that equation (A 3) has two distinct roots. We can prove that these two roots satisfy $0 < \xi < (a + d)/(2a + d)$. Therefore, $c = c(x)$ has one local maximum and one local minimum, leading to the S-shaped curve in figure 3a.
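The criterion for bistability can be checked directly. The sketch below assumes that c′(x) = 0 reduces to the quadratic (n − 1)(2a + d)ξ² − n(a + d)ξ + (a + d) = 0, so that its discriminant is D = (a + d)[n²(a + d) − 4(n − 1)(2a + d)], and that c(x) = [(2a + d)u − a]/[x(1 − u)] with u = [2(2a + d)x/λ]^(1/(n−1)). The closed form returned by `critical_length` is the larger root of D = 0 in n. Function names and the grid size are illustrative.

```python
import math


def discriminant(n, a, d):
    """Discriminant D of the quadratic in xi that locates the turning points of c(x)."""
    return (a + d) * (n * n * (a + d) - 4 * (n - 1) * (2 * a + d))


def critical_length(a, d):
    """Catalyst length n_cr above which c(x) is no longer monotone."""
    return 2 * ((2 * a + d) + math.sqrt(a * (2 * a + d))) / (a + d)


def activity(x, n, lam, a, d):
    """Catalytic activity c needed for a perfect catalyst to have equilibrium x."""
    u = (2 * (2 * a + d) * x / lam) ** (1.0 / (n - 1))
    return ((2 * a + d) * u - a) / (x * (1 - u))


def is_monotone(n, lam, a, d, points=2000):
    """Sample c(x) between the prelife abundance and the maximum abundance."""
    x_max = lam / (2 * (2 * a + d))
    x_min = x_max * (a / (2 * a + d)) ** (n - 1)
    xs = [x_min + (x_max - x_min) * i / points for i in range(points)]
    cs = [activity(x, n, lam, a, d) for x in xs]
    return all(c_next > c_prev for c_prev, c_next in zip(cs, cs[1:]))


if __name__ == "__main__":
    a, d, lam = 1.0, 1.0, 1.0
    print("n_cr =", critical_length(a, d))
    for n in (3, 4, 5, 6):
        print(n, discriminant(n, a, d), is_monotone(n, lam, a, d))
```

For a = d = 1 the critical length lies between 4 and 5, so a catalyst of length 4 responds smoothly to increasing c, whereas catalysts of length 5 or more show the S-shaped curve.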
Next we study equation (3.3), which can be rewritten in the same form as equation (3.4) by setting $n' \equiv m + 1$ and $\lambda' \equiv \lambda\,[a/(2a + d)]^{n-1-m}$. Therefore, similar conclusions can be drawn. If $n' > n_{cr}$, or equivalently, if

$$m > m_{cr} \equiv n_{cr} - 1 \qquad (\text{A 5})$$

holds, then the system shows bistability, with a maintenance threshold, $c_1$, and an initiation threshold, $c_2$.
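First, we study a perfect catalyst that catalyses all of its upstream reactions. When $n > n_{cr}$, solving equation (A 3) yields

$$\xi_\pm = \frac{n(a + d) \pm \sqrt{D}}{2(n - 1)(2a + d)}, \qquad D = (a + d)\,[n^2(a + d) - 4(n - 1)(2a + d)]. \qquad (\text{B 1})$$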
Recall that we have defined $\xi$ as $\xi = 1 - [2(2a + d)x/\lambda]^{1/(n-1)}$, so $x_\pm = [\lambda/(2(2a + d))]\,(1 - \xi_\pm)^{n-1}$. Note that $x_+ < x_-$. The function $c = c(x)$ has its local maximum at $x = x_+$ and its local minimum at $x = x_-$. We obtain $c_1 = c(x_-)$ and $c_2 = c(x_+)$ (figure 3a). A direct calculation yields the asymptotic estimates of these values given in the main text. We use $(1 + 1/n)^n \approx e = 2.718281\ldots$ for large $n$.
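Similarly, for a catalyst that catalyses a fraction $\theta$ ($= m/(n - 1)$) of its upstream reactions with the catalytic activity $c$, we obtain the following leading-order asymptotic estimates of $c_1$ and $c_2$ for large $n$:

$$c_1 \approx \frac{2e(a + d)(2a + d)\,\theta n}{\lambda}\left(\frac{2a + d}{a}\right)^{(1-\theta)(n-1)}, \qquad c_2 \approx \frac{2a(2a + d)^2}{e\lambda(a + d)\,\theta n}\left(\frac{2a + d}{a}\right)^{n-1}, \qquad (\text{B 2})$$

where $0 < \theta \le 1$. Therefore, the two thresholds grow (approximately) exponentially with $n$ when the catalyst enhances some of its upstream reactions ($0 < \theta < 1$). Only when the catalyst enhances all of its upstream reactions ($\theta = 1$) does the maintenance threshold, $c_1$, grow linearly with $n$.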
Here we explain the underlying mechanics of replication and provide a detailed derivation of equation (4.1). Let $0^n$ denote the replicator. As in the main text, we denote the abundance of sequence $0^k$ by $x_k$ ($k = 1, \ldots, n$). We assume direct as opposed to complementary replication. The replication process starts when an (inactivated) monomer 0, which is a primer, attaches to a replicator, which is a template. This reaction is described by the term $\alpha x_1 x_n$. The resulting complex between the template and the primer grows in length by incorporating activated monomers 0* one by one until it becomes the full double strand of $0^n$. We call these steps elongation reactions. Let $y_k$ denote the abundance of the complex between the template (of length $n$) and the growing sequence that has reached length $k$. The abundance of the full double strand is given by $y_n$. For simplicity, we assume that the reaction rate of each elongation step is constant and given by $\beta$. The full double strand separates at rate $\gamma$ (for example, via temperature oscillations). All sequences and complexes decay at rate $d$. We obtain the following system of differential equations:

$$\begin{aligned}
\dot{x}_1 &= \frac{\lambda}{2} - (2a + d)\,x_1 - \alpha x_1 x_n,\\
\dot{x}_k &= a x_{k-1} - (2a + d)\,x_k \quad (k = 2, \ldots, n - 1),\\
\dot{x}_n &= a x_{n-1} - (2a + d)\,x_n - \alpha x_1 x_n + 2\gamma y_n,\\
\dot{y}_1 &= \alpha x_1 x_n - (\beta + d)\,y_1,\\
\dot{y}_k &= \beta y_{k-1} - (\beta + d)\,y_k \quad (k = 2, \ldots, n - 1),\\
\dot{y}_n &= \beta y_{n-1} - (\gamma + d)\,y_n.
\end{aligned} \qquad (\text{C 1})$$
Recall that the stationary density of activated monomers is subsumed in the rate constants, $\lambda$, $a$ and $\beta$. We assume that the rate of template-based elongation, $\beta$, is much faster than other rate constants such as $a$, $d$ and $\gamma$. For the quasi-equilibrium abundance of full double strands, we obtain

$$y_n \approx \frac{\alpha x_1 x_n}{\gamma + d}. \qquad (\text{C 2})$$
Rewriting parameters as r = α and δ = (γ − d)/(γ + d) reproduces equation (4.1) in the main text.
We note that the assumption of fast elongation (large β) is entirely consistent with our model for prelife catalysis, which also contains an implicit assumption of a fast ‘elongation’ step. The prelife catalyst, 0n, binds its target sequence, 0k, to form a complex [0n 0k]. This complex reacts very fast with an activated monomer, 0*, to give rise to [0n 0k+1]. Subsequently, the complex dissociates into 0n and 0k+1. Equation (3.1) assumes that the elongation reaction is not rate limiting. Therefore, a replicator with a fast elongation reaction is the proper comparison for the prelife catalyst described by equation (3.1). The difference between the replicator and the prelife catalyst is that the catalytic activity of the replicator ‘walks along’ the sequence, whereas the prelife catalyst can accelerate only a single elongation step and dissociates subsequently.
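The role of the fast-elongation assumption can be illustrated with a small calculation. The sketch below rests on an assumed reading of the mechanism described in this appendix: a template-primer complex must pass through n − 1 elongation steps, each at rate β in competition with decay at rate d, and the full double strand then separates at rate γ (releasing two replicators) or decays at rate d. Under that reading, the net number of replicators gained per binding event is 2·[γ/(γ + d)]·[β/(β + d)]^(n−1) − 1, which tends to δ = (γ − d)/(γ + d) as β grows. The function names are illustrative.

```python
def effective_efficacy(n, beta, gamma, d):
    """Net replicators gained per template-primer binding event at finite beta.

    Assumes n - 1 elongation steps (rate beta) each racing decay (rate d),
    followed by strand separation (rate gamma) racing decay (rate d).
    """
    survive_elongation = (beta / (beta + d)) ** (n - 1)
    separate = gamma / (gamma + d)
    # One template is consumed on binding; two strands are released on separation.
    return 2 * separate * survive_elongation - 1


def limiting_efficacy(gamma, d):
    """Efficacy in the limit of infinitely fast elongation."""
    return (gamma - d) / (gamma + d)


if __name__ == "__main__":
    n, gamma, d = 20, 3.0, 1.0
    for beta in (1.0, 10.0, 100.0, 1e4, 1e6):
        print(beta, effective_efficacy(n, beta, gamma, d))
    print("limit:", limiting_efficacy(gamma, d))
```

When β is comparable to d, long replicators lose most complexes before elongation completes and the effective efficacy is negative; only when elongation is fast does the efficacy become independent of n, which is the regime in which the comparison with prelife catalysts is made.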