Europe PMC Funders Group
Phys Rev Lett. Author manuscript; available in PMC 2012 October 19.
Published in final edited form as:
Published online 2012 January 30.
PMCID: PMC3476000

Optimal Placement of Origins for DNA Replication


DNA replication is an essential process in biology and its timing must be robust so that cells can divide properly. Random fluctuations in the formation of replication starting points, called origins, and in the subsequent activation of proteins lead to variations in the replication time. We analyse these stochastic properties of DNA replication and derive the origin positions that minimize the replication time. We show that under some conditions the minimization of replication time leads to the grouping of origins, and we relate this to experimental data from a number of species that show origin grouping.

The replication of the DNA content of a cell is one of the most important processes in living organisms. It ensures that the information needed to synthesize proteins and cellular components is passed on to daughter cells in a robust and timely fashion. Replication takes place during the S phase of the cell cycle, and it starts from specific locations in the chromosome called origins. In order to function in a particular round of the cell cycle, possible origin locations (loci) must undergo a sequence of binding events before the S phase starts. This culminates in the clamping of one or more pairs of ring-shaped Mcm2-7 molecules around the DNA; this is known as licensing. Below we denote a pair of Mcm2-7 molecules as pMcm. Features of human replication have been studied with the help of the yeast S. cerevisiae and X. laevis frog embryos. In S. cerevisiae licensing is only possible at a set of specific points in each chromosome, characterized by the presence of specific DNA sequences, whereas in X. laevis embryos the licensing proteins can bind at virtually any location in the genome [1]. When a licensed locus activates in the S phase, two replication forks are created at the origin, and they move in opposite directions with approximately constant speed, duplicating the DNA as they travel through the chromosome [Fig. 1(a)]. Both origin licensing and origin activation time are stochastic events, since they result from molecular processes involving low-abundance species. In X. laevis, both the loci that are licensed and licensed loci selected for activation vary randomly from cell to cell, whereas in S. cerevisiae each of the fixed loci has a certain probability of being activated in any given cell—the competence—which reflects the fraction of cells in a population in which that locus has had time to be licensed before the S phase starts [2].

FIG. 1
(a) Coordinate system for origin loci with d1, d2 being the distance from the left or right end of the chromosome, respectively. x is the position coordinate along the chromosome. Replication forks travel at a speed v away from the origins. The grey regions ...

The total time it takes to replicate a cell’s DNA, the replication time, is a quantity of crucial importance for biology. Rapid replication is clearly an evolutionary advantage, because it affects the minimum time cells need to divide. The location of the origins is one of the crucial factors determining the replication time, and it is reasonable to expect that evolution has selected the loci so that the replication time is minimized. There are a number of recent theoretical and modeling works on the dynamics of DNA replication (reviewed in [3]). Previous theoretical works on S. cerevisiae have used the experimentally determined loci as given parameters, without attempting to understand why the origins are located where they are [2,4-6]. Inspection of the loci on an S. cerevisiae genome map reveals groups of two or three very close origins, which are prominent on most chromosomes [7]. There is also experimental evidence for grouping in X. laevis, where origins appear to be organized in groups of 5 to 10 pMcms [8-10]. Most existing models of replication in X. laevis [9,11-15] (an exception is [16]) assume the origins to be random and independent of each other, so they cannot explain pMcm grouping or the observed maximum spacing of 25 kilobases (kb) between adjacent origins [9].

In this Letter, we use a simplified mathematical model of the DNA replication process to determine the origin locations on a chromosome that give the shortest average replication time. We also determine how this optimal placement depends on parameters such as the origin competences and the width of the activation-time probability distribution. We show that, contrary to what one might expect, in many cases the replication time is minimized by placing origins close together in groups like those observed in real chromosomes. This suggests that origin locations have been selected to minimize the replication time. Analysis of our model reveals that grouping is favored for low-competence origins and for origins with large stochastic fluctuations in their activation time. Origins may have an appreciable likelihood either of failing to activate (low competence) or of taking a very long time to activate. In that case, grouping them helps reduce the risk that large regions of the chromosome are not replicated on time. If the origins are highly competent and have a well-defined activation time, it is optimal to maximize coverage and distribute the origins evenly along the chromosome. We further show an abrupt transition in the optimal configuration of origins, from isolated to grouped. It occurs as the locus competence decreases (in S. cerevisiae) and also as the width of the activation-time distribution increases. We give an intuitive explanation of this phenomenon. We argue that it is robust and independent of the particular details of the model. These results are derived analytically and tested through numerical simulations. We also compare the predictions of our theory quantitatively with the available experimental data for both S. cerevisiae and X. laevis, and find that they match well.

We start by analyzing stochasticity in the licensing of fixed origin loci, as in S. cerevisiae. DNA is modeled as a one-dimensional segment of unit length, and for simplicity we consider only two loci on the chromosome. The two origin loci have competences p1 and p2, the probabilities that they have been licensed and can therefore start replication forks. We initially assume that origins activate at a well-defined time, which we set to t = 0. All replication forks travel at the same unit speed along the DNA. We consider the geometry depicted in Fig. 1(a): d1 (d2) is the distance from the left (right) end of the chromosome to the leftmost (rightmost) locus. If both loci fail to be licensed, we postulate that replication will eventually take place anyway, with a replication time T0. For example, this stretch of DNA could be replicated by forks from origins outside the region we are considering. As will become clear shortly, our results do not depend on T0. It is just a mathematical device that prevents us from dealing with infinite replication times.

If only one of the loci is licensed, the replication time is the time it takes the forks from that locus to reach the far end of the segment. Thus Td1 = 1 − d1 if only locus 1 is licensed, and Td2 = 1 − d2 if only locus 2 is licensed. If both loci have been licensed, the replication time is Td1,d2 = max{d1, d2, (1 − d1 − d2)/2}. It is set by the longest time either for a fork to reach an end of the segment or for the two inner forks to collide. It can be shown that an asymmetric placement of loci never has a shorter replication time than the corresponding symmetric configuration (that is, with d1 = d2). We therefore consider only symmetric locus placements, d1 = d2 = d with 0 ≤ d ≤ 1/2. The average replication time is then given by

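$$T_{\rm rep}(d) = (1 - p_1)(1 - p_2)\,T_0 + \big[p_1(1 - p_2) + p_2(1 - p_1)\big](1 - d) + p_1 p_2 \max\Big\{d,\ \frac{1 - 2d}{2}\Big\}.$$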

Trep is a continuous, piecewise-linear function of d whose slope changes only at d = 1/4. On 0 ≤ d ≤ 1/2 its minimum can therefore only lie at d = 0, d = 1/4, or d = 1/2. Placing the loci at the ends of the segment (d = 0) is clearly not a minimum of Trep. The replication times for d = 1/4 and 1/2 are Trep(d = 1/2) = (1 − p1)(1 − p2)T0 + (p1 + p2 − p1p2)/2 and Trep(d = 1/4) = (1 − p1)(1 − p2)T0 + (3p1 + 3p2 − 5p1p2)/4. The two loci group together (d = 1/2) to achieve the minimum replication time if Trep(d = 1/2) < Trep(d = 1/4), which leads to the condition

$$p_1 + p_2 > 3\,p_1 p_2. \qquad (1)$$

Notice that T0 drops out. The inequality Eq. (1) divides the p1-p2 plane into two regions, in which grouped or isolated loci, respectively, are optimal. This is shown in Fig. 1(b), where the analytical result is confirmed by stochastic simulations. Above the curve, Trep is minimized by keeping the loci apart (d = 1/4); below it, by organizing them in a group (d = 1/2). Perhaps surprisingly, the grouped region is actually larger than the isolated-loci region. In general, if one of the loci has low competence, grouping gives the minimum replication time. In fact, dividing Eq. (1) by p1p2 gives 1/p1 + 1/p2 > 3. So if one locus has a competence below 50% (1/p1 > 2), grouping is optimal regardless of the competence of the other (1/p2 ≥ 1), even if the other is close to 100% competent.
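The closed-form expressions above are easy to check numerically. The Python sketch below evaluates the symmetric average replication time on a grid of d, compares the grid minimum with the prediction of Eq. (1), and estimates the fraction of the p1-p2 square that favors grouping. The function names and the placeholder value T0 = 2.0 are our own illustrative choices; T0 only adds a d-independent term. Integrating Eq. (1) analytically (isolated loci need p2 > p1/(3p1 − 1) with p1 > 1/2) gives an isolated-loci area of 1/3 − 2 ln 2/9 ≈ 0.18. Grouping is therefore optimal over roughly 82% of the plane, which quantifies the statement that the grouped region is the larger one.

```python
import numpy as np

def mean_trep_two_loci(d, p1, p2, t0=2.0):
    """Average replication time for loci at d and 1 - d on a unit segment, unit fork speed."""
    d = np.asarray(d, dtype=float)
    none = (1 - p1) * (1 - p2) * t0                  # neither locus licensed
    one = (p1 * (1 - p2) + p2 * (1 - p1)) * (1 - d)  # exactly one licensed: fork must reach far end
    both = p1 * p2 * np.maximum(d, (1 - 2 * d) / 2)  # both licensed: ends or collision point
    return none + one + both

def grouping_is_optimal(p1, p2):
    """Condition (1): grouped loci (d = 1/2) beat isolated loci (d = 1/4)."""
    return p1 + p2 > 3 * p1 * p2

def grouped_area_fraction(n=1000):
    """Fraction of the unit p1-p2 square where grouping is optimal (midpoint grid)."""
    p = (np.arange(n) + 0.5) / n
    p1, p2 = np.meshgrid(p, p)
    return float(np.mean(grouping_is_optimal(p1, p2)))

if __name__ == "__main__":
    d = np.linspace(0.0, 0.5, 501)
    for p1, p2 in [(0.9, 0.9), (0.9, 0.4), (0.6, 0.6)]:
        d_best = d[np.argmin(mean_trep_two_loci(d, p1, p2))]
        print(p1, p2, "grid optimum d =", d_best,
              "| Eq. (1) predicts grouping:", grouping_is_optimal(p1, p2))
    print("grouped fraction of the p1-p2 square:", grouped_area_fraction())
```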

For equal competences, p1 = p2 = p, the grouped configuration is optimal if p < 2/3. This result predicts a sharp transition when p drops below 2/3: the optimal spatial distribution of the loci changes from isolated (d = 1/4) to grouped at the center. We ran a numerical optimization algorithm (using genetic algorithms [17]) to find the locus positions with the least replication time for a range of p; the results are shown in Fig. 1(c). The same transition also occurs for non-identical values of p1 and p2, whenever one crosses from the dark to the light regions of Fig. 1(b).
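The text uses genetic algorithms [17] for this optimization. For two loci, a plain exhaustive grid search is enough to see the transition, so the sketch below swaps in that simpler technique. It assumes equal competences p, unit fork speed, a hypothetical T0 = 2.0 and a grid step of 0.01. It does not force the placement to be symmetric, so it also illustrates that asymmetric placements do not win. When only one locus is licensed, the sketch uses the distance to whichever end is farther, which reduces to Td = 1 − d for the symmetric case.

```python
import itertools

def mean_trep(x1, x2, p, t0=2.0):
    """Expected replication time for loci at x1 <= x2 on [0, 1], both with competence p."""
    both = max(x1, 1 - x2, (x2 - x1) / 2)  # ends reached, or the two inner forks collide
    only1 = max(x1, 1 - x1)                # lone origin at x1: its forks must reach both ends
    only2 = max(x2, 1 - x2)                # likewise for a lone origin at x2
    return p * p * both + p * (1 - p) * (only1 + only2) + (1 - p) ** 2 * t0

def best_positions(p, steps=100):
    """Exhaustive grid search over x1 <= x2 (used here instead of a genetic algorithm)."""
    grid = [k / steps for k in range(steps + 1)]
    return min(itertools.combinations_with_replacement(grid, 2),
               key=lambda xy: mean_trep(xy[0], xy[1], p))

if __name__ == "__main__":
    # Scan p across the predicted threshold 2/3.
    for p in (0.5, 0.6, 0.65, 0.66, 0.67, 0.7, 0.9):
        print(p, best_positions(p))
```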

The above results may at first seem quite counterintuitive; one might expect the configuration with the least replication time to correspond to isolated loci (d = 1/4). However, the origins may have a significant chance of failing to activate. In that case, one side of the chromosome must often wait for a fork from the origin on the other side, which increases Trep. For low competences it therefore becomes advantageous to place both loci at the center, which is at most half the segment length from any point on the chromosome. A single licensed locus at the center replicates the segment in time 1/2, whereas a single licensed locus at d = 1/4 needs time 3/4. This explains the grouping condition p < 2/3.

In reality, eukaryotic chromosomes have more than two loci [3]. We therefore next investigate a chromosome with many loci and examine the conditions under which isolated origin loci are favourable compared to groups. For simplicity we assume that all loci have identical competence p. We treat a group of loci as a single locus with an effective competence peff, the probability that at least one locus in the group is licensed. For a group of m loci, peff = 1 − (1 − p)^m. We assume that one large group of n identical loci breaks up into two groups of equal size, each consisting of n/2 loci. Applying the two-locus result to the two half-groups, keeping all n loci in a single group gives the minimum replication time as long as peff(n/2) < 2/3. That is, the locus competence must be below the critical probability

$$p_c = 1 - 3^{-2/n}.$$

Figure 2(a) shows that simulations confirm this analytical value of pc for increasing group sizes. These results clearly show that large groups of highly competent loci are unfavourable, whereas low-competence loci tend to form groups. The optimal group size increases with decreasing competence.
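The critical competence follows directly from the two-locus threshold peff = 2/3. The short Python sketch below computes peff and pc = 1 − 3^(−2/n) for a few group sizes. It checks that at p = pc each half-group sits exactly on the two-locus threshold. The function names are our own illustrative choices.

```python
import math

def effective_competence(p, m):
    """Probability that at least one of m loci, each with competence p, is licensed."""
    return 1.0 - (1.0 - p) ** m

def critical_competence(n):
    """Competence below which one group of n loci beats two groups of n/2 loci.

    Solves effective_competence(p_c, n / 2) = 2/3, i.e. p_c = 1 - 3**(-2 / n).
    """
    return 1.0 - 3.0 ** (-2.0 / n)

if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        pc = critical_competence(n)
        # At p = p_c the half-group effective competence equals the threshold 2/3.
        print(n, round(pc, 3), math.isclose(effective_competence(pc, n / 2), 2 / 3))
```

Because pc decreases with n, lower competences make ever larger groups favourable, matching the trend described above.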

FIG. 2
(a) Probability pc at which groups separate vs number of loci per group n. Shown are simulations (circles) and the analytical prediction pc = 1 − 3^(−2/n) (line). (b) Distribution of origin loci on yeast chromosome VI with known (grey) and unknown competences [2,18]. The distribution ...

Our hypothesis is that selective pressure has influenced the positions of origin loci through the minimization of the replication time. Fig. 2(b) shows locus competence and location data for yeast chromosome VI, which has been studied extensively [2,18]. Competences cannot be measured for all loci (white), because some loci are too close either to the end of the chromosome or to an adjacent locus. We searched for the optimal positions of the loci in the region with known competences using a genetic algorithm [17]. There is an identifiability problem, since all strong loci have p ≈ 90%, so we constrained the ordering of the loci during the optimization. This search does not consider inter-origin variation in activation time. Even so, the predicted locus distribution closely resembles the actual spacing, with a score of F = 0.11 [19]. In particular, we recover the group in the middle, in which an origin locus with 58% competence is placed next to one with 88% competence. Repeated runs of the optimization algorithm produce minimum-Trep solutions with F = 0.12 on average.

The above discussion focused on predefined loci in yeast and ignored additional noise, such as variation in origin activation time. Fig. 3(d) shows that the pattern of origin grouping found above is preserved in the two-origin model with stochastic variation in origin activation time. As we explain in the remainder of this Letter, grouping is important for swift replication under conditions of low competence and large noise.

FIG. 3
(a) Origin position x that minimizes Trep for 2 pMcms on a segment of unit length, as the standard deviation σ of their activation time increases. (b) Inset: Trep as a function of σ for realistic parameters as given in the text. ...

We now examine stochastic activation time, using X. laevis embryos as a model organism. Unlike loci in yeast, any DNA locus in an X. laevis embryo can bind pMcm and become an origin. Surprisingly, experiments reveal roughly equally spaced groups of 5-10 pMcms separated by approximately 10 kb [8-10]. We use the same approach as above, but now with respect to stochasticity in the origin activation time. Here an “origin” is defined as a locus to which at least one pMcm has bound. It therefore corresponds to a 100% competent locus in the notation used so far. It is well accepted by biologists, however, that origin activation time is stochastic. For simplicity we assume that the pMcms at an origin activate with uniform probability at any time within a window. The window's lower boundary is t0 = 0 min, and its upper boundary tb is at most the length of S phase (20 min). In addition, all pMcms are assumed to be identical, with the same activation-time distribution, whose standard deviation is σ = tb/√12.
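For the uniform activation window with t0 = 0, the standard deviation follows from a short, standard calculation (not specific to this Letter):

$$\sigma^2 = \frac{1}{t_b}\int_0^{t_b}\Big(t - \frac{t_b}{2}\Big)^2\,dt = \frac{t_b^2}{12}, \qquad \sigma = \frac{t_b}{\sqrt{12}} \approx 0.29\,t_b .$$

With the maximal window tb = 20 min this gives σ ≈ 5.8 min. Conversely, a target σ corresponds to a window of width tb = √12 σ ≈ 3.46 σ.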

We expect the optimal configuration to again switch from isolated pMcms to groups as σ increases; increasing σ plays a role analogous to decreasing competence in the previous scenario. We test this prediction using the two-origin model with one pMcm bound to each origin, and we find numerically the optimal (minimum-Trep) origin positions as a function of σ. The results are presented in Fig. 3(a). We again use a segment of unit length, with forks progressing at unit speed, v = 1. We observe a sharp transition at σ ≈ 0.25, above which it is best to place both origins in the middle of the segment, as in the case of varying competence. A minor difference from the previous case is that for σ < 0.25 the optimal origin position is not constant.
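The two-origin experiment can be sketched as a small Monte Carlo in Python. This is a minimal sketch under our own assumptions, not the authors' code. Both origins are always licensed and fire at times drawn uniformly from [0, tb] with tb = √12 σ. Forks move at v = 1 on the unit segment, and the search is restricted to the symmetric pair x, 1 − x on a grid. The helper `replication_times` (a hypothetical name) computes Trep exactly. Each point x is first reached at min_i(ti + |x − xi|/v), and the maximum of that envelope can only occur at a segment end or where two opposing forks meet. An origin that is passively replicated before it fires never attains that minimum, so passive replication needs no special treatment.

```python
import numpy as np

def replication_times(pos, t_act, length=1.0, v=1.0):
    """Exact Trep for each row of activation times t_act (shape: trials x origins).

    Point x is first reached at min_i(t_i + |x - x_i| / v); the maximum of this
    envelope lies at a segment end or at a collision point of two forks.
    """
    pos = np.asarray(pos, dtype=float)
    t = np.atleast_2d(np.asarray(t_act, dtype=float))
    # Collision point of a right-moving fork from origin a with a left-moving fork from b.
    meet = 0.5 * (pos[:, None] + pos[None, :] + v * (t[:, None, :] - t[:, :, None]))
    ends = np.broadcast_to([0.0, length], (t.shape[0], 2))
    cand = np.concatenate([ends, np.clip(meet.reshape(t.shape[0], -1), 0.0, length)], axis=1)
    arrival = t[:, None, :] + np.abs(cand[:, :, None] - pos[None, None, :]) / v
    return arrival.min(axis=2).max(axis=1)

def optimal_position(sigma, trials=4000, steps=100, seed=1):
    """Grid search for x in [0, 1/2] minimizing mean Trep with origins at x and 1 - x."""
    rng = np.random.default_rng(seed)
    t_b = np.sqrt(12.0) * sigma                  # uniform [0, t_b] has std t_b / sqrt(12)
    t = rng.uniform(0.0, t_b, size=(trials, 2))  # the same draws are reused for every x
    xs = np.linspace(0.0, 0.5, steps + 1)
    means = [replication_times([x, 1.0 - x], t).mean() for x in xs]
    return float(xs[int(np.argmin(means))])

if __name__ == "__main__":
    for sigma in (0.0, 0.1, 0.2, 0.3, 0.4):
        print(f"sigma = {sigma:.1f}: optimal x = {optimal_position(sigma):.3f}")
```

Reusing the same activation-time draws for every candidate x (common random numbers) makes the comparison between positions less noisy. We have not checked that this sketch reproduces the transition point σ ≈ 0.25 reported above; its printed optima are only meant for qualitative comparison with Fig. 3(a).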

We now apply this model to more origins and pMcms, using realistic parameters so that we can relate the results to what is experimentally known about the pMcm distribution in X. laevis. We model a stretch of DNA of 100 kb with fork speed v = 1 kb/min [20]. To determine whether the minimum-replication-time configuration requires pMcm grouping, we distributed 64 pMcms in total. This corresponds on average to one pMcm per 1.5 kb, as found in nature [8]. The pMcms are placed either in 64/n evenly spaced groups of n ∈ {1, 2, 4, 8, 16} pMcms, or completely at random along the 100 kb segment. Other authors have estimated σ at 6-10 min in X. laevis [13,21] as well as in S. cerevisiae [2,5,20,22]. Our results [Fig. 3(b)] indicate that grouping with an equal spacing of up to 12.5 kb achieves precise and fast DNA synthesis before the end of S phase (20 min) for σ within these limits. We also find that 8 groups of 8 pMcms give a Trep 1.1 min shorter than random loci. A shorter Trep is achieved even when the number of pMcms in these 8 groups varies (data not shown). Grouping pMcms also protects the overall replication process against fluctuations from one round of the cell cycle to another, because a single initiation event at an origin is sufficient to start replication forks; a similar problem is discussed in [14].
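A comparable Monte Carlo sketch for the 100 kb setting is shown below. Its assumptions are ours: 64 pMcms on a 100 kb segment, v = 1 kb/min, and each pMcm firing at a time drawn uniformly from [0, tb] with tb = √12 σ. A group fires as soon as its first pMcm fires. Evenly spaced groups are centered in equal bins, and σ = 5 min is an illustrative value. `placement="random"` places each of the 64 pMcms independently, which is our reading of "completely random". The function names are hypothetical, and the sketch is not claimed to reproduce the 1.1 min difference reported above.

```python
import numpy as np

def replication_time(pos, t_act, length, v=1.0):
    """Exact Trep: the latest time at which any point of [0, length] is first reached."""
    pos = np.asarray(pos, dtype=float)
    t = np.asarray(t_act, dtype=float)
    # Candidate maxima: segment ends and collision points of opposing forks.
    meet = 0.5 * (pos[:, None] + pos[None, :] + v * (t[None, :] - t[:, None]))
    cand = np.concatenate(([0.0, length], np.clip(meet.ravel(), 0.0, length)))
    arrival = t[None, :] + np.abs(cand[:, None] - pos[None, :]) / v
    return arrival.min(axis=1).max()

def mean_trep(n_per_group, sigma, placement="even", total=64, length=100.0,
              v=1.0, trials=500, seed=0):
    """Monte Carlo mean Trep (min) for `total` pMcms on a segment of `length` kb."""
    rng = np.random.default_rng(seed)
    t_b = np.sqrt(12.0) * sigma  # uniform [0, t_b] activation window
    n_groups = total // n_per_group
    samples = []
    for _ in range(trials):
        if placement == "even":
            pos = (np.arange(n_groups) + 0.5) * length / n_groups
            # A group fires as soon as the first of its pMcms fires.
            t = rng.uniform(0.0, t_b, size=(n_groups, n_per_group)).min(axis=1)
        elif placement == "random":
            pos = rng.uniform(0.0, length, size=total)
            t = rng.uniform(0.0, t_b, size=total)
        else:
            raise ValueError(f"unknown placement: {placement}")
        samples.append(replication_time(pos, t, length, v))
    return float(np.mean(samples))

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"{64 // n} groups of {n}: mean Trep = {mean_trep(n, 5.0, trials=200):.2f} min")
    print(f"random: mean Trep = {mean_trep(1, 5.0, placement='random', trials=200):.2f} min")
```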

In a natural environment, groups are unlikely to be spaced strictly equally. We therefore relax this assumption by taking evenly spaced groups and perturbing the location of each group by a small random amount drawn from a Gaussian distribution. This variation allows us to compare our simulations with available experimental data on replicated genomic regions. These data were captured as center-to-center distances about 5 min after the onset of replication (cf. [9]). Figure 3(c) shows that our result agrees with the current understanding of the biological community, i.e., groups of 5-10 pMcms about every 10 kb. This arrangement may be achieved by regulating pMcm-loading proteins whose binding affinity decreases around existing origins [23,24]. A random placement represents the data similarly well. Even so, Trep remains smaller for these groups, although they are no longer equally spaced, just as seen before [cf. Fig. 3(b)]. This shows that grouping of origins remains favorable even in a more general setting [Fig. 3(d)].
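The Gaussian perturbation of group positions can be added to the same kind of Monte Carlo sketch. Here the jitter standard deviation `jitter_sd`, the clipping of perturbed positions to the segment, and the remaining parameters are illustrative assumptions. Those parameters are 8 groups of 8 pMcms, σ = 5 min, v = 1 kb/min and a 100 kb segment. The sketch computes only the mean Trep. It does not compute the center-to-center distance statistics that Fig. 3(c) compares with experiment.

```python
import numpy as np

def replication_time(pos, t_act, length, v=1.0):
    """Exact Trep: the latest time at which any point of [0, length] is first reached."""
    pos = np.asarray(pos, dtype=float)
    t = np.asarray(t_act, dtype=float)
    # Candidate maxima: segment ends and collision points of opposing forks.
    meet = 0.5 * (pos[:, None] + pos[None, :] + v * (t[None, :] - t[:, None]))
    cand = np.concatenate(([0.0, length], np.clip(meet.ravel(), 0.0, length)))
    arrival = t[None, :] + np.abs(cand[:, None] - pos[None, :]) / v
    return arrival.min(axis=1).max()

def jittered_mean_trep(jitter_sd, n_groups=8, n_per_group=8, sigma=5.0,
                       length=100.0, v=1.0, trials=300, seed=0):
    """Mean Trep (min) when evenly spaced pMcm groups are shifted by Gaussian offsets (kb)."""
    rng = np.random.default_rng(seed)
    t_b = np.sqrt(12.0) * sigma  # uniform [0, t_b] activation window
    even = (np.arange(n_groups) + 0.5) * length / n_groups
    samples = []
    for _ in range(trials):
        # Shift each group independently; clipping keeps groups on the segment (an assumption).
        pos = np.clip(even + rng.normal(0.0, jitter_sd, size=n_groups), 0.0, length)
        # A group fires as soon as the first of its pMcms fires.
        t = rng.uniform(0.0, t_b, size=(n_groups, n_per_group)).min(axis=1)
        samples.append(replication_time(pos, t, length, v))
    return float(np.mean(samples))

if __name__ == "__main__":
    for sd in (0.0, 1.0, 2.0, 4.0):
        print(f"jitter sd = {sd:.0f} kb: mean Trep = {jittered_mean_trep(sd):.2f} min")
```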


We thank C.A. Nieduszynski and C.A. Brackley for critical reading of the manuscript. We also would like to thank the referees for their suggestions to the manuscript. This work has been supported through the Scottish University Life Sciences Alliance and the Biotechnology and Biological Sciences Research Council (Grants No. BB/G001596/1 and No. BB-G010722).


[1] Kelly TJ, Brown GW. Annu. Rev. Biochem. 2000;69:829.
[2] de Moura APS, et al. Nucleic Acids Res. 2010;38:5623.
[3] Hyrien O, Goldar A. Chromosome Res. 2009;18:147.
[4] Spiesser TW, Klipp E, Barberis M. Mol. Genet. Genomics. 2009;282:25.
[5] Brümmer A, et al. PLoS Comput. Biol. 2010;6:e1000783.
[6] Yang SC-H, Rhind N, Bechhoefer J. Mol. Syst. Biol. 2010;6:404.
[7] Nieduszynski CA, et al. Nucleic Acids Res. 2007;35:D40.
[8] Mahbubani HM, et al. J. Cell Biol. 1997;136:125.
[9] Blow JJ, et al. J. Cell Biol. 2001;152:15.
[10] Edwards MC, et al. J. Biol. Chem. 2002;277:33049.
[11] Jun S, Bechhoefer J. Phys. Rev. E. 2005;71:011909.
[12] Zhang H, Bechhoefer J. Phys. Rev. E. 2006;73:051903.
[13] Goldar A, et al. PLoS ONE. 2008;3:e2919.
[14] Yang SC-H, Bechhoefer J. Phys. Rev. E. 2008;78:041917.
[15] Blow JJ, Ge XQ. EMBO J. 2009;10:406.
[16] Jun S, et al. Cell Cycle. 2004;3:211.
[18] Shirahige K, et al. Mol. Cell. Biol. 1993;13:5043.
[19] The score F measures the difference between the gap distributions of the optimized and random cases. A gap is defined as the separation between the ith experimental locus position and the corresponding position from the optimization. The random reference is the analogous average separation that arises from placing a locus uniformly at random. F = 0 means that the optimization fits the experimental locus positions perfectly; F ≈ 1 indicates no difference from a random placement.
[20] Raghuraman MK, et al. Science. 2001;294:115.
[21] Herrick J. J. Mol. Biol. 2002;320:741.
[22] Sekedat MD, et al. Mol. Syst. Biol. 2010;6:353.
[23] Rowles A, Tada S, Blow JJ. J. Cell Sci. 1999;112:2011.
[24] Oehlmann M, Score AJ, Blow JJ. J. Cell Biol. 2004;165:181.