Previous studies have led to a picture wherein the replication of DNA progresses at variable rates over different parts of the budding yeast genome. These prior experiments, focused on production of nascent DNA, have been interpreted to imply that the dynamics of replication fork progression are strongly affected by local chromatin structure/architecture, and by interaction with machineries controlling transcription, repair and epigenetic maintenance. Here, we adopted a complementary approach for assaying replication dynamics using whole genome time-resolved chromatin immunoprecipitation combined with microarray analysis of the GINS complex, an integral member of the replication fork. Surprisingly, our data show that this complex progresses at highly uniform rates regardless of genomic location, revealing that replication fork dynamics in yeast is simpler and more uniform than previously envisaged. In addition, we show how the synergistic use of experiment and modeling leads to novel biological insights. In particular, a parsimonious model allowed us to accurately simulate fork movement throughout the genome and also revealed a subtle phenomenon, which we interpret as arising from low-frequency fork arrest.
In mitotic division, cells duplicate their DNA in S phase to ensure that the proper genetic material is passed on to their progeny. This process of DNA replication is initiated from several hundred specific sites, termed origins of replication, spaced across the genome. It is essential for replication to begin only after G1 and finish before the initiation of anaphase (Blow and Dutta, 2005; Machida et al, 2005). To ensure proper timing, the beginning stages of DNA replication are tightly coupled to the cell cycle through the activity of cyclin-dependent kinases (Nguyen et al, 2001; Masumoto et al, 2002; Sclafani and Holzen, 2007), which promote the accumulation of the pre-replication complex (pre-RC) at the origins and initiate replication. Replication fork movement occurs subsequent to the firing of origins on recruitment of the replicative helicase and the other fork-associated proteins as the cell enters S phase (Diffley, 2004). The replication machinery itself (polymerases, PCNA, etc.) trails behind the helicase, copying the newly unwound DNA in the wake of the replication fork.
One component of the pre-RC, the GINS complex, consists of a highly conserved set of paralogous proteins (Psf1, Psf2, Psf3 and Sld5; Kanemaki et al, 2003; Kubota et al, 2003; Takayama et al, 2003). In Xenopus egg extracts, GINS has been shown to localize at sites of unwound DNA (Pacek et al, 2006). In yeast, the complex associates with paused replication forks (Calzada et al, 2005), and directly interacts with several fork proteins (Gambus et al, 2006). Chromatin immunoprecipitation (ChIP) experiments show that the GINS complex moves away from specific autonomous replication sequences (ARSs) at the time of initiation (Takayama et al, 2003; Kanemaki and Labib, 2006). The GINS complex has been biochemically isolated with Cdc45 and Mcm2-7, which together are referred to as the replisome progression complex (RPC) or the CMG (Cdc45-MCM-GINS) (Gambus et al, 2006; Moyer et al, 2006); the CMG has been shown to have helicase activity in vitro (Moyer et al, 2006). Taken together, these data suggest that the GINS complex is an integral component of the replication fork and that its interaction with the genome correlates directly to the movement of the fork (reviewed in Labib and Gambus, 2007). Here, we used the GINS complex as a surrogate to measure features of the dynamics of replication—that is, to determine which origins in the genome are active, the timing of their firing and the rates of replication fork progression.
Previously, as many as 732 sites within the 16 chromosomes of the Saccharomyces cerevisiae genome have been reported as potential origins of replication (Nieduszynski et al, 2007) (www.oridb.org). Three separate studies monitoring the production of newly replicated DNA identified, respectively, 332, 260 and 444 origins in the yeast genome (Raghuraman et al, 2001; Yabuki et al, 2002; Feng et al, 2006). In addition, experiments using ChIP combined with microarray analysis (ChIP-chip) to search for the genomic localization of MCM and ORC proteins identified, respectively, 422 (Wyrick et al, 2001) and 529 sites (Xu et al, 2006). Thus, although it is likely that virtually all origins have been identified, exactly which are active during S phase remains ambiguous.
The timing of origin firing and the rates of fork progression have also been investigated by monitoring nascent DNA synthesis (Raghuraman et al, 2001; Yabuki et al, 2002). Origin firing was observed to occur as early as 14 min into the cell cycle and as late as 44 min (Raghuraman et al, 2001). A wide range of nucleotide incorporation rates (0.5–11 kb/min) were observed, with a mean of 2.9 kb/min (Raghuraman et al, 2001), whereas a second study reported a comparable mean rate of DNA duplication of 2.8±1.0 kb/min (Yabuki et al, 2002). In addition to these observations, replication has been inferred to progress asymmetrically from certain origins (Raghuraman et al, 2001). These data have been interpreted to mean that the dynamics of replication fork progression are strongly affected by local chromatin structure or architecture, and perhaps by interaction with the machineries controlling transcription, repair and epigenetic maintenance (Deshpande and Newlon, 1996; Rothstein et al, 2000; Raghuraman et al, 2001; Ivessa et al, 2003). In this study, we adopted a complementary approach for assaying replication dynamics, in which we followed GINS complexes as they traverse the genome during the cell cycle. These studies led us to a different view of replication fork dynamics wherein fork progression throughout the genome is symmetrical around origins, highly uniform in rate and little affected by genomic location.
In this study, we are particularly interested in the effect of chromosomal location on the dynamics of replication fork progression. To address this question on a genome-wide scale, we followed the spatial and temporal association of a protein that assembles at origins with the pre-RC and has been inferred to travel with the advancing replication forks. For this purpose, we chose Psf2, a component of the GINS complex (Takayama et al, 2003), and collected ChIP-chip (Ren et al, 2000; Tackett et al, 2005) data as a function of time throughout the cell cycle (shown for chromosome XVI and for the whole genome in Figure 1; Supplementary Figure S4).
At early times (20 min), the GINS complex does not exhibit any observable interaction with the genome as evidenced by the lack of peaks in the ChIP-chip signal (Figure 1) and lack of binding to other protein members of the RPC (data not shown). As the cell cycle progresses, the GINS complex begins to interact with specific chromosomal sites (indicated by the peaks and the dashed lines in Figure 1)—nearly all of which correspond to previously posited origins (Supplementary Table S1; Nieduszynski et al, 2007). The height of the peaks provides a measure of average occupancy, and their spreading indicates that some fraction of the origins within the cell population have fired. The data for chromosome XVI (Figure 1) indicate that 12 origins are occupied to some extent in the population, almost all of which have fired by 25 min. By 30 min the number of occupied origins increases to ~28, after which time no additional origins appear to be populated. Bi-directional GINS progression from each of these origins (averaged over the population of cells) can be inferred from the spreading of the edges of the corresponding peaks. By 35 min, many of these edges have merged with those from adjacently spreading GINS and fewer locations on the chromosome remain wholly unoccupied. Spreading continues with time such that by 50 min GINS has progressed to most regions of the chromosome, and reduced occupancy is observed in the regions surrounding the origins. By 60 min the overall occupancy has fallen sharply across the chromosome indicating that the replication process for most of the population is nearing completion and the GINS complex has been released. The GINS re-associates with the origins at the beginning of the next cell cycle (data not shown). Thus, the Psf2 reporter provides an animated view of the GINS progression across an entire chromosome.
Similar data to that shown in Figure 1 were obtained for all 16 yeast chromosomes (Supplementary Figure S4). At the time resolution of the experiment (5 min), we discern three broad categories that describe the association of GINS with origins. The first includes those origins that fire in the interval 20–25 min. The second includes those that fire in the interval 25–30 min. The final category includes origins to which the GINS complex binds, but from which bidirectional spreading is not observed. Although others have reported similar early and late firing origins, our data reveal the majority of origins fire in a much narrower time window—that is, 15 versus 30 min (Yamashita et al, 1997; Poloumienko et al, 2001; Raghuraman et al, 2001; Yabuki et al, 2002). Applying this classification system to chromosome XV (Figure 2A), we identified a total of 29 origins, of which 17 are category 1, 9 are category 2 and 3 are category 3. In the whole genome, we identified 168, 135 and 24 origins, respectively, in categories 1, 2 and 3 (Supplementary Table S1, which also provides a detailed comparison with previous studies). Overall, we observe 303 origins that give rise to active GINS progression (i.e. categories 1 and 2). Although a small number of late firing origins may be obscured by proximal early firing origins, our data were determined with sufficiently high resolution and signal-to-noise to provide an accurate map of the majority of origins that are active in the cell cycle.
Most of the 303 active origins fire within a short time window (~10 min) (Figures 1 and 2A; Supplementary Figure S4). We determined the progression rates for 278 spreading peaks emanating from 199 of these 303 origins by direct measurement of the positions of the peak edges as a function of time (Figure 2B). We obtain an average progression rate of 1.6±0.3 kb/min, a value that is considerably lower and more narrowly distributed than previous replication rate estimates (2.9 kb/min with a range of 0.5–11 kb/min (Raghuraman et al, 2001) or 2.8±1.0 kb/min (Yabuki et al, 2002)). The bidirectional spreading that emanates from the origins progressed to the left and right at closely comparable rates.
From these data, we infer that this movement is largely symmetric and occurs at a highly characteristic rate, implying that, in contrast to previous inferences relating chromatin position to replication rate, there are few obvious chromosomal features that locally alter uniform GINS progression. To test the generality of this rather simple view of GINS progression throughout the genome, we used our measurements to generate an iterative model of this process. Our model (Materials and methods) uses a parsimonious set of assumptions: (1) the firing time of a given origin in a population of cells is normally distributed with a mean firing time specific to the origin; (2) the standard deviation of the firing times is constant for all origins in each simulated region; (3) the velocity of progression is the same for all forks (v=1.6 kb/min); and (4) GINS fall off the chromosomes when adjacent forks collide.
Examples comparing simulations based on our model with experimental data are provided in Figure 3. The top panel (Figure 3A) compares a region containing a single category 1 origin and two adjacent category 2 origins. The model accurately recapitulates the features exhibited in the experimental data. The same is true of the more complex region containing seven category 1 origins shown in Figure 3B. It also accounts for fork movement between the most widely spaced origins (~100 kb apart) (Figure 3C). Such regions are completely filled in at our experimentally determined average rate of 1.6 kb/min, within the duration of S phase (~30–35 min at room temperature). Figure 3D and E show that regions proximal to the telomeres and centromeres are as easily modeled as any other region of the genome. Category 3 origins were also readily modeled by allowing a small amount of binding without bidirectional spreading (Figure 3F). We conclude that our simple model successfully recapitulates GINS movement throughout the genome, indicating that this movement is largely uniform irrespective of location. By varying the assumed fork velocity, our model also allows us to estimate the accuracy of velocities inferred from the time-resolved ChIP-chip analysis. Velocity changes as small as 0.4 kb/min can readily be discerned (Supplementary Figure S8).
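The statement that the widest inter-origin gaps fill within S phase can be checked with simple arithmetic. The sketch below is only illustrative: it assumes two flanking origins whose forks converge at the same constant rate, and the function name `time_to_fill` and the optional firing delay are hypothetical additions, not part of the published model.

```python
V_KB_PER_MIN = 1.6  # average fork rate reported in the text

def time_to_fill(gap_kb, delay_min=0.0, v=V_KB_PER_MIN):
    """Minutes after the first origin fires until the gap is fully traversed.

    Assumes two converging forks moving at the same constant rate v;
    delay_min is how much later the second origin fires (hypothetical knob).
    """
    if delay_min >= gap_kb / v:
        # The first fork crosses the whole gap before the second origin fires.
        return gap_kb / v
    # The forks meet where the distances they have covered sum to the gap.
    return (gap_kb + v * delay_min) / (2 * v)

print(time_to_fill(100.0))                 # simultaneous firing
print(time_to_fill(100.0, delay_min=5.0))  # second origin fires 5 min later
```

Under these assumptions a 100 kb gap closes in roughly 31 min with simultaneous firing, which is in line with the ~30–35 min duration of S phase quoted above.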
For simplicity, all the origins illustrated in Figure 3A–E are assumed to fire with unit efficiency. A refined model should include the possibility that different origins fire with different efficiencies. For example, replication is known to initiate at a reduced efficiency in many late firing origins (Yamashita et al, 1997; Poloumienko et al, 2001; Yabuki et al, 2002). In addition, there is a pattern of alternating groups of early and late firing origins across the genome (McCarroll and Fangman, 1988; Yamashita et al, 1997; Poloumienko et al, 2001; Raghuraman et al, 2001; Yabuki et al, 2002). We observe similar trends in this work (see Supplementary Table S1). As an example, we consider the cluster of late firing origins located on the right arm of chromosome XV between 500 and 800 kb (Figure 2A). We find that our model fails to accurately simulate this region if the firing efficiency for these late firing origins is too high (100%) or too low (10%). However, we can accurately simulate fork dynamics in this region if we assume that the efficiencies for these sites are decreased to 50% relative to the surrounding early firing origins (Figure 4). Thus, the model not only provides an overall picture of the dynamic nature of GINS, but also offers a means to extract quantitative details concerning factors such as firing efficiency.
Indeed, the method is sufficiently sensitive to detect small heterogeneities in replication dynamics and in the process sheds new light on the phenomenon described as ‘replication pausing’, that is, short-duration stalling of forks at numerous specific sites in the genome (Deshpande and Newlon, 1996; Ivessa et al, 2003; Azvolinsky et al, 2006; Azvolinsky et al, 2009). For example, pausing intervals at tRNA genes have previously been estimated at ~10 s (Deshpande and Newlon, 1996; Ivessa et al, 2003), which is ~4 times longer than the interval required for unimpeded replication fork transit of a tRNA gene. Although such events would seem too subtle to detect with 5 min resolution (Azvolinsky et al, 2009), we observed sharp features at 267 of the 275 tRNA genes (Supplementary Figures S11 and S12), indicating that our methodology is sensitive enough to detect small perturbations, and that GINS movement can indeed be hindered to some degree (Figure 5). However, rather than a brief pause of each fork as it passes through these sites, these features persist late into the cell cycle. Simulation of several different scenarios (Figure 6; Supplementary Figure S2) suggests that these features may represent infrequent long-term arrest events, occurring with a probability of <0.5% for any given fork passing through a tRNA gene (Supplementary Figure S3). This phenomenon was also observed at 81 of the 83 snoRNA and snRNA genes, and 95 of the 100 other most highly transcribed genes in the genome (Figure 5; Supplementary Figures S2 and S11), and appears to be independent of the direction of transcription.
This study monitors the dynamic progression of the GINS complex along the genome during the cell cycle. Our data, along with previously determined interactions of the GINS complex, strongly suggest that GINS moves with the helicase at the replication fork. Our time-resolved ChIP-chip data reveal that GINS binds to active replication origins and spreads bi-directionally and symmetrically as S phase progresses. A similar approach has been used to monitor Pol localization on yeast chromosome III, albeit at lower time resolution (Hiraga et al, 2005). As monitored by GINS movement, the majority of origins appear to fire in the first ~15 min of S phase. A small fraction (~10%) of the origins to which GINS binds show no evidence of spreading (category 3 origins), although it remains possible that these peaks represent passively fired origins (Shirahige et al, 1998). Once an active origin fires, the GINS complex moves at an almost constant rate of 1.6±0.3 kb/min. Its movement through the inter-origin regions is consistent with that of a protein complex associated with a smoothly moving replication fork. This progression rate is considerably lower and more tightly distributed than those inferred from previous genome-wide measurements assayed through nascent DNA production (Raghuraman et al, 2001; Yabuki et al, 2002). It is of interest to consider potential sources of these discrepancies, such as those arising from the integration inherent to monitoring nascent DNA accumulation. Thus, for example, integration in regions flanking inefficient origins produces aberrant skewing of the measured times of replication. Such problems are avoided by our direct measurement of a specific fork-associated protein Psf2 (discussed in Supplementary Figure S1). 
However, it may be conjectured that the discrepancies discussed above can be explained by the possibility that the replication forks are not tightly coupled to the replicative polymerases with respect to their dynamics (Walter and Newport, 2000; Pacek et al, 2006). We do not consider this likely because it would require that the polymerases leave the origin considerably later than the fork, exposing large stretches of unpaired DNA.
In this work, we also observe a large number of low-intensity persistent features at sites of high transcriptional activity (e.g. tRNA genes). We were able to accurately simulate these features by assuming they are the result of low probability arrest of replication forks at these sites. Previously, pausing of forks had been observed at such sites, where in certain cases the pause appeared to be coordinated with head-on collisions between the replication and transcriptional machineries (Deshpande and Newlon, 1996). Here, we do not observe any significant directional dependence of fork arrest at the tRNA genes, 40% of which transcribe in the same direction as the movement of the replication fork. Thus, the presently described features at highly transcribed genes do not appear to exclusively correlate with head-on collisions. The extremely low frequency of these events in wild-type cells suggests they are due to low probability stochastic occurrences during the replication process. It is hoped that future studies will resolve whether these persistent features indeed represent rare instances of fork arrest, or are the result of some alternative process. These may include, for example, the deposition of GINS complexes (or perhaps more specifically Psf2) once a pause has been resolved.
In this work, we have made extensive use of modeling to test a number of different hypotheses and assumptions. In particular, iterative modeling allowed us to infer that GINS progression is uniform and smooth throughout the genome. We have also shown the potential of simulations for estimating firing efficiencies. In the future, extending such firing efficiency simulations to the whole genome should allow us to make correlations with chromosomal features such as nucleosome occupancy. Such correlations may help in determining factors that govern the probability of replication initiation throughout the genome.
S. cerevisiae strains are from the W303 background. Strain MSY1 was made by tagging the PSF2 gene with a protein A (PrA) affinity tag at the C-terminal coding sequence through homologous recombination (Tackett et al, 2005). The BAR1 gene, which encodes for the α-factor protease, was replaced with the KANMX4 selection cassette, making the strain hypersensitive to α-factor block. Cells were grown in rich YEPD medium for synchronization.
MSY1 was grown in a 10 l Bioflo 410 fermenter (New Brunswick Scientific) to a density of ~7 × 10⁶/ml at 30°C. The mating pheromone α-factor was added to a final concentration of 50 nM, and the cells were incubated for another 3 h. Cell arrest in G1 was verified by the presence of the shmoo morphology throughout the culture. The cells were harvested and centrifuged at 5000 r.p.m. at 4°C and washed twice with ice-cold YPD. The culture was re-suspended as quickly as possible in fresh media at 25°C. Initial cell density for all time-course experiments was ~1 × 10⁷/ml. Budding indices for both types of time-course experiments (before and after formaldehyde incubation) are shown in the Supplementary Figure S9.
Time points were collected in two separate experiments—(a) every 15 min after release from the G1 block from 0 to 105 min and (b) every 5 min beginning 20 min after block (through S phase) from 20 to 50 min. For each time point to be used for ChIP, a 750 ml sample was cross-linked by incubation in 1% formaldehyde at room temperature for 20 min. The ChIP sample was centrifuged at 4500 r.p.m. for 4 min at 4°C, washed with ice-cold Tris pH 7.8 and centrifuged again at 3000 r.p.m. for 3 min at 4°C. The pellet was re-suspended in 500 μl of cold Lysis buffer (50 mM Hepes pH 7.5, 1.2% polyvinylpyrrolidine) and the cells were dripped into liquid nitrogen forming small pellets. The pellets were stored at −75°C until use, and then cryogenically ground three times at 30 Hz with a mixer mill (Retsch MM301). Ground samples were kept frozen until resuspended for IPs.
In all, 10 ml of cells from each time point were fixed in 70% ethanol for 1 h at room temperature, and prepared for FACS analysis using previously described methods (Haase and Lew, 1997). Cells were stained with 50 μg/ml propidium iodide, and DNA content was analyzed using a FACSCalibur I (BD Biosciences). Data analysis was performed with FlowJo 8.3.3 (http://www.flowjo.com/).
ChIP was performed as reported earlier (Tackett et al, 2005); 0.5 g of each frozen, ground time point was suspended in 1 ml of ChIP lysis buffer (20 mM Hepes pH 7.5, 150 mM NaCl, protease inhibitor cocktail (Roche)), and sonicated at 4°C, to an average size of ~400 bp (Supplementary Figure S10). The IP sample was incubated for 1 h at 4°C with IgG-conjugated 3 μm dynabeads (Invitrogen 143-01), and eluted as reported earlier (Tackett et al, 2005). A PCR cleanup kit was used to purify the DNA samples (Qiagen 28104).
Real-time PCR was performed as sample validation before array hybridization (Supplementary Figure S13); 1 μl of undiluted sample was analyzed in 20 μl reaction volume with 900 nM forward (ARS306taqfor—5′-TCGTCTAAGTCCTTGTAATGTAAGGTAAGA-3′) and reverse primers (ARS306taqrev—5′-GCTTGGGTTTGTGACTTACTAACG-3′), and 250 nM probe (ARS306taqprobe—5′-FAM-TGCAAGCATCTTGTTTGTAACGCGATTG-TAMRA-3′). Samples were analyzed with a 7900HT Sequence Detection System (Applied Biosystems), for 45 cycles of denaturation at 94°C for 15 s, annealing at 43°C for 30 s and extension at 72°C for 30 s. Results are normalized to ACT1 levels (primer sequences are ACT1taqfor—5′-CTCCGTCTGGATTGGTGGTT-3′, ACTtaq1rev—5′-TGGACCACTTTCGTCGTATTCTT-3′, ACT1taqprobe—5′-FAM-TTGACTACCTTCCAACAA-TAMRA-3′).
The samples were prepared for hybridization to yeast 4x44k whole genome microarrays (Agilent) with average spatial resolution of ~290 nt, and analyzed as described earlier (Ren et al, 2000; Tackett et al, 2005). The program SignalMap (Nimblegen) was used to visualize the data. The list of previously known origins was obtained from the OriDB website (Nieduszynski et al, 2007).
Fork velocities were estimated by measuring the distance between the peak edges in successive time points (Figure 2B), and dividing by 5 min. When edges began to merge with each other we ignored these data points. The rates are reported in kilobases per minute (kb/min).
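The edge-tracking calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `fork_velocities`, the use of `None` to flag time points where edges had merged, and the example edge positions are all hypothetical.

```python
FRAME_MIN = 5.0  # interval between successive time points, from the text

def fork_velocities(edge_positions_kb, frame_min=FRAME_MIN):
    """Rates (kb/min) between successive time points for one peak edge.

    edge_positions_kb: edge position at each time point, in kb.
    None marks a time point where the edge had merged with a
    neighbouring peak; any interval touching it is skipped.
    """
    rates = []
    for prev, curr in zip(edge_positions_kb, edge_positions_kb[1:]):
        if prev is None or curr is None:
            continue  # merged edges are ignored, as in the text
        rates.append(abs(curr - prev) / frame_min)
    return rates

# Hypothetical right-hand edge positions (kb) at 25, 30, 35 and 40 min.
edges = [100.0, 108.0, 116.0, None]
rates = fork_velocities(edges)
mean_rate = sum(rates) / len(rates)
print(rates, mean_rate)
```

Taking the absolute displacement lets the same function handle edges moving to the left or to the right of an origin.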
The microarray results were returned as log2 ratios of IP fluorescence versus WCE fluorescence. For the purpose of normalization, the data were binned using a 5000 bp bin size. The distributions for the average intensities and the RMSD for these bins can be seen in Supplementary Figure S5A. The first normalization of the data was done by finding the intensity where 3% of the bins had a negative average intensity. This gives a robust zero level that is independent of occasional large negative outliers in the data. The centroid of the RMSD distribution was then found for each time point and the intensities were scaled in such a way that these centroids were set to 1 (Supplementary Figure S5B). The second normalization was performed by finding an intensity threshold that separated the signal from the noise, and by normalizing the centroid of the RMSD distribution of the noise (Supplementary Figure S5C). An additional correction was used for time points where the data were dominated by signal. This correction (1.5-fold for the 40 min time point) was based on the assumption that the number of forks in a wave is constant after firing of that origin has ceased and before the front of the wave has reached an adjacent wave of forks traveling in the opposite direction (see, for example the 35 and 40 min data in Figure 3C).
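The first normalization step can be illustrated with a short sketch. Several details here are assumptions rather than the published procedure: RMSD is taken as the root-mean-square deviation of the probes in a bin about that bin's mean, the "centroid" of the RMSD distribution is approximated by its median, and all function names are hypothetical. The second normalization and the signal-dominated correction are not covered.

```python
import math
from statistics import mean, median

BIN_BP = 5000  # bin size stated in the text

def bin_values(positions_bp, log_ratios, bin_bp=BIN_BP):
    """Group probe log2 ratios by genomic bin index."""
    bins = {}
    for pos, val in zip(positions_bp, log_ratios):
        bins.setdefault(pos // bin_bp, []).append(val)
    return bins

def rmsd(values):
    """Root-mean-square deviation about the mean (assumed definition)."""
    m = mean(values)
    return math.sqrt(mean([(x - m) ** 2 for x in values]))

def first_normalization(positions_bp, log_ratios, negative_fraction=0.03):
    """Shift to a robust zero level, then scale the typical bin RMSD to 1."""
    bins = bin_values(positions_bp, log_ratios)
    bin_means = sorted(mean(v) for v in bins.values())
    # Zero level: the bin average below which about 3% of bins fall.
    zero = bin_means[int(negative_fraction * len(bin_means))]
    # Scale: median per-bin RMSD stands in for the "centroid" (assumption).
    scale = median(rmsd(v) for v in bins.values())
    return [(x - zero) / scale for x in log_ratios]
```

Anchoring the zero level to a low quantile of the bin averages, rather than to the minimum, is what makes it insensitive to occasional large negative outliers.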
A small number of microarray time points exhibited signal saturation. This was corrected by re-scanning the same arrays at a lower gain, and adjusting the intensities of the saturated pixels according to average intensity ratio between the two scans (Dudley et al, 2002).
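A minimal sketch of this kind of two-scan correction is shown below. It assumes the "average intensity ratio" is the ratio of summed intensities over pixels that are unsaturated in the high-gain scan; the function name, the saturation threshold and the example values are hypothetical.

```python
def correct_saturation(high_gain, low_gain, saturation_level):
    """Replace saturated high-gain intensities with rescaled low-gain ones.

    high_gain, low_gain: intensities of the same pixels from the two scans.
    saturation_level: high-gain value at or above which a pixel is
    treated as saturated (hypothetical threshold).
    """
    # Estimate the scan-to-scan ratio from unsaturated pixels only.
    pairs = [(h, l) for h, l in zip(high_gain, low_gain)
             if h < saturation_level and l > 0]
    ratio = sum(h for h, _ in pairs) / sum(l for _, l in pairs)
    return [h if h < saturation_level else l * ratio
            for h, l in zip(high_gain, low_gain)]

# Hypothetical intensities; the third pixel is saturated in the first scan.
corrected = correct_saturation([100, 200, 65535], [10, 20, 1000], 65535)
print(corrected)
```

Unsaturated pixels are returned unchanged, so the correction only affects the part of the dynamic range that the first scan could not record.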
Fork progression was modeled using the following set of assumptions: (i) the start of replication at origin i is normally distributed with an average start time ti and a standard deviation σ; (ii) replication progresses at a constant velocity, v=1.6 kb/min, for all replication forks and over the whole genome; (iii) each origin has an associated efficiency, ei; (iv) pausing might occur at a pause site j in a fraction (f) of the cells with a probability (Pj) and a duration (dj).
The mean start times, the standard deviation of the start times and origin efficiencies were determined from our data by minimizing the sum of the square differences between simulation results and experimental data. For each pair of adjacent origins, this minimization procedure used five slices of the data. These slices were chosen to encompass each of the origins and three equidistant regions between them, and the width of the slices was chosen to be 10% of the distance between the neighboring origins. The median was calculated for each of these slices, and used to calculate the sum of square differences. Each origin was assumed to have its own mean start time, ti, but the standard deviation of the start time was assumed to be constant in each simulation region. We also assumed that the GINS complex does not linger at the origin for any significant period of time, and after binding moves away at 1.6 kb/min. Finally, the GINS complexes fall off when adjacent forks collide. Each simulation was performed 10 000 times (to simulate 10 000 cells), using a random number generator to determine start times within the Gaussian distribution for a particular origin. Examples of the effects of changing the parameters used in the optimizations are presented in Supplementary Figures S6–S8.
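The forward simulation described above can be sketched as a small Monte Carlo routine. This is an illustration under stated assumptions, not the authors' code: pausing and parameter fitting are omitted, chromosome ends are ignored, occupancy is simply the fraction of simulated cells with a fork in each bin, and the names `fork_positions` and `simulate_occupancy` are hypothetical.

```python
import random

V_KB_PER_MIN = 1.6  # constant fork velocity assumed by the model

def fork_positions(origins_kb, fire_times, t, v=V_KB_PER_MIN):
    """Positions (kb) of surviving forks in one cell at time t.

    fire_times[i] is the firing time of origin i, or None if it did not fire.
    A fork survives at position p only if no fork from any origin reached p
    earlier, which covers both fork collisions and passive replication.
    """
    fired = [(x, ft) for x, ft in zip(origins_kb, fire_times)
             if ft is not None and ft <= t]
    forks = []
    for x, ft in fired:
        reach = v * (t - ft)
        for p in (x - reach, x + reach):
            if all(fj + abs(p - xj) / v >= t - 1e-9 for xj, fj in fired):
                forks.append(p)
    return forks

def simulate_occupancy(origins_kb, mean_times, sigma, efficiencies, t,
                       bin_kb, n_bins, n_cells=10_000, seed=0):
    """Fraction of simulated cells with a fork in each bin at time t."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_cells):
        # Each origin fires with its efficiency at a normally distributed time.
        fire = [rng.gauss(m, sigma) if rng.random() < e else None
                for m, e in zip(mean_times, efficiencies)]
        for p in fork_positions(origins_kb, fire, t):
            b = int(p // bin_kb)
            if 0 <= b < n_bins:
                counts[b] += 1
    return [c / n_cells for c in counts]

# Hypothetical region: two origins 60 kb apart, the second later and 50% efficient.
profile = simulate_occupancy([20.0, 80.0], [22.0, 27.0], 2.0, [1.0, 0.5],
                             t=35.0, bin_kb=5.0, n_bins=20)
print(profile)
```

A fit of the kind described in the text would wrap `simulate_occupancy` in a minimization over the mean start times, σ and the efficiencies, comparing the simulated profile with the measured signal at each time point.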
The complete experimental data can be found at http://prowl.rockefeller.edu/data/yeast_repl and have been deposited in MIAME-compliant form in the Gene Expression Omnibus (accession number GSE19818).
We thank Bruz Marzolf for help with hybridizing and scanning of the arrays, and Conrad Nieduszynski for assistance with creating origin lists. We are grateful to Michael O'Donnell, Frederick Cross, Michael Rout, Andrew Murray, Bruce Stillman and Paul Nurse for insightful discussion and suggestions. We also thank all members of the Chait laboratory for their assistance. This work was supported by NIH grants U54 RR022220 (JA and BTC), P50 GM076547 (JA), and P41 RR00862 (BTC).
The authors declare that they have no conflict of interest.