DNA unwinding and polymerization are complex processes involving many intermediate species in the reactions. Our understanding of these processes is limited because the rates of the reactions and the existence of intermediate species are not apparent without specially designed experimental techniques and data analysis procedures. In this chapter we describe how pre-steady state single-turnover measurements analyzed by model-based methods can be used for estimating the elementary rate constants. Using the hexameric helicase and the DNA polymerase from bacteriophage T7 as model systems, we provide stepwise procedures for measuring the kinetics of the reactions they catalyze based on radioactivity and fluorescence. We also describe analysis of the experimental measurements using publicly available models and the software gfit (http://gfit.sf.net).
Helicases are motor proteins that use the chemical energy of NTP hydrolysis to separate the strands of double stranded nucleic acids (1–4). Often helicases work in association with other proteins, such as the DNA polymerase or single strand binding proteins to perform their function with a greater efficiency. Characterization of the nucleic acid unwinding activity involves measurement of the rate and the processivity of the helicase-catalyzed unwinding reaction. These unwinding parameters can be used as basic handles to understand the mechanism of unwinding and the role of the helicase in a particular biological process. Measurement of the unwinding rate of helicases as a function of DNA duplex stability provided insights into the active or passive nature of the helicase-catalyzed reaction (5). Measurement of the unwinding rate of phage T7 hexameric helicase in the presence of the DNA polymerase provided insights into the synergistic action of the two motor proteins in DNA replication (6).
In this chapter, we describe assays to measure DNA unwinding catalyzed by the helicase, and DNA polymerization catalyzed by the helicase and polymerase proteins using the hexameric ring-shaped T7 gp4 helicase as our model system. We outline procedures to fit the kinetic data to specific models that provide kinetic parameters such as the rate of DNA unwinding and the rate of nucleotide incorporation during DNA polymerization.
The product of bacteriophage T7 gene 4 (gp4) is a ring-shaped protein (7–9) that has both DNA unwinding and primase activities (10). T7 gp4 can move along single stranded (ss) (11) or double stranded (ds) (12) DNA and separate DNA strands using the energy of dTTP hydrolysis. The protein can also synthesize short (4–5-mer) RNA primers using ATP and CTP at priming sites on the lagging DNA strand. T7 gp4 unwinds dsDNA according to the strand exclusion model, wherein one strand of the dsDNA is threaded into the central channel of the helicase and the other strand is excluded as the helicase unwinds the dsDNA (13,14). This strand exclusion model is generally accepted for ring-shaped helicases (15–17). The unwinding rate of T7 gp4 depends on dsDNA stability, and its speed of DNA unwinding is slower than its speed of translocation along ssDNA (5). The other key component of the T7 replication complex is the gp5 DNA polymerase (18–20). Interestingly, the rate of DNA unwinding by T7 gp4 is accelerated by the polymerase and approaches the ssDNA translocation rate of the helicase (6). Accurate measurements of the kinetic rates of translocation, unwinding, and polymerization provide both quantitative and qualitative insights into the DNA replication mechanism.
Unwinding and polymerization pathways comprise many interacting reaction steps such as substrate binding, catalysis, and conformational changes. The large number of these steps makes their identification and characterization difficult. Therefore, one prerequisite of a successful experimental design is its ability to decouple the pathway: to study a part of the pathway in isolation or while controlling the influence of the other parts. The clearest results are produced by experiments that measure the kinetics of one fully decoupled reaction step.
One strategy to maximize decoupling, enabled by recent technological advances, involves monitoring reactions occurring with single molecules. These techniques have been successfully applied to both DNA unwinding and polymerization (21–28). Unfortunately, only parts of DNA processing pathways can currently be observed at the single-molecule level, which brings us back to conventional measurement techniques that integrate signals originating from billions of enzyme molecules. During the course of a reaction, as the system reaches a steady state, molecules of enzyme become distributed along the reaction pathway, populating all of the reaction species and participating in all reactions simultaneously. The combined signal from such a heterogeneous mixture is hard to interpret in terms of individual reaction rates.
These problems are overcome by using pre-steady state and single round kinetic techniques. Pre-steady state kinetic measurements involve synchronization of the system – assembling the reaction mixture in a way that populates only one species of the pathway. This can be done, for example, by withholding a component required for the next reaction step. After the reaction in the synchronized mixture is started, but before it reaches a steady state, the measured signal can be attributed to a few steps that follow the synchronization point.
The pre-steady state period of a reaction is quite short because molecular systems naturally tend to lose synchronization. To extend it and to be able to characterize more reaction steps in one experiment, single round conditions can be used. Single round conditions effectively prevent the system from reaching a steady state by allowing each molecule of enzyme to transform no more than one molecule of substrate. In the case of an unwinding reaction, this can be achieved by adding a helicase trap at the time of initiation. An excess of ssDNA can capture free helicase molecules from solution, preventing them from re-binding to new substrate molecules. Pre-steady state single round approaches enhance our ability to decouple unwinding and polymerization pathways and to measure the rates of their reaction steps.
DNA unwinding rates have been measured using both bulk and single molecule techniques (5,6,29). Single round conditions simplify interpretation of a bulk kinetic measurement result. Helicases can processively unwind stretches of dsDNA longer than their binding site. Therefore an unwinding reaction can be described as an n-step process and for experiments conducted under single round conditions, the results can be fit with a stepping equation (30,31). Such analysis of unwinding kinetics data for dsDNA substrates of different lengths can be used for estimating helicase stepping rate and size and for assessing processivity of the helicase; i.e., how far the helicase moves along the DNA before it falls off.
Ring-shaped helicases assemble around ssDNA, and assembly is usually a slow step (relative to the rate of DNA unwinding). Therefore, to measure the unwinding rate (rather than the assembly rate) and to synchronize the reactions, it is important to preassemble the helicase on the DNA substrate prior to reaction start. T7 gp4, like other ring-shaped helicases, binds to the DNA only in the presence of its nucleotide substrate (dTTP in the case of T7 gp4). This makes the preassembly of the helicase on the DNA substrate, without reaction occurring during the assembly period, challenging. Based on our finding that T7 gp4 forms hexamers and binds DNA in the absence of Mg(II), we arrived at the following assembly procedure (this might be applicable to other helicases that require the presence of NTP to bind DNA). We preassemble the helicase on the DNA by adding dTTP, but by leaving out Mg(II). In the absence of Mg(II) and in the presence of added EDTA (to chelate contaminating divalent metal ions), T7 gp4 does not hydrolyze dTTP or unwind DNA (32).
Two types of unwinding assays are described: a discontinuous gel-based radiometric assay and a continuous stopped-flow fluorescence assay. Both are all-or-none unwinding assays in which unwinding rates are obtained from the kinetics of end product formation, i.e., the kinetics of the appearance of the fully unwound DNA. Since unwinding by T7 gp4 is fast, the kinetics are measured using a rapid quenched-flow or a stopped-flow apparatus (Fig. 2.1) that allows mixing on the millisecond time scale. A typical DNA unwinding substrate for the hexameric helicase is a fork DNA. Two short DNA strands (top and bottom) are annealed to generate a duplex region (40 bp, here) and ssDNA overhangs at one end. A 5′ ssDNA overhang (dT35) in the top strand is needed for helicase binding (as for the DnaB family helicases, which are 5′-3′ helicases), and the 3′ ssDNA overhang (dT15) in the bottom strand is required for strand exclusion during unwinding (Fig. 2.2). In the gel-based assay, one of the DNA strands is radiolabeled, so the radioactive fork substrate and the ssDNA product can be quantified after they are resolved by native PAGE. The kinetics of DNA unwinding are fit to obtain the unwinding rate. In the fluorescence-based assay, a fluorescent dye (fluorescein) is incorporated at the 5′ end of the bottom strand (Fig. 2.3). A run of three guanosines at the 3′ end of the top strand quenches fluorescein fluorescence when the substrate is duplexed. When the helicase unwinds the dsDNA and the top strand is displaced away from the dye, the fluorescence increases. The time-dependent increase in fluorescence is measured continuously in a stopped-flow apparatus.
DNA polymerization rates can be estimated from the individual nucleotide incorporation rates measured using transient state kinetic methods. DNA polymerases incorporate hundreds to thousands of nucleotides during polymerization in a template-dependent manner, adding one dNMP to the primer at a time and moving with a step size of one nucleotide. Each nucleotide is added at a different rate that depends on several factors, not all well understood, one of which is the sequence context around the base to be added. The nucleotide addition rate can be determined accurately using a combination of rapid kinetics, product analysis on a high resolution sequencing gel, and data analysis. The rapid kinetic methods, which provide millisecond time resolution, capture the formation and decay of intermediate products; the sequencing gels resolve the DNA products with single-base resolution; and data analysis extracts the single nucleotide incorporation rates and the polymerase off-rates from the observed kinetics of primer elongation.
DNA polymerase extends a primer annealed to a template DNA by utilizing dNTPs as substrates. When the template is single stranded, the polymerase can copy the template without the helicase, but when the template is double stranded, the polymerase requires the helicase to unwind the dsDNA. T7 DNA polymerase requires T7 gp4 to catalyze strand displacement DNA synthesis. The rate of DNA unwinding by the helicase with concomitant DNA synthesis can be measured by the unwinding assays described above using a replication fork substrate. Alternatively, the kinetics of the reaction can be measured by following the primer extension reaction. We describe methods to obtain the rate of each nucleotide addition in the primer extension reaction. Replication fork substrates are made by annealing the top and bottom ssDNAs to generate a duplex region (40 bp, here) and two ssDNA overhangs. A third strand, a primer (24 mer, here) is annealed to the bottom strand (of a defined sequence) to create a primer/template junction (Fig 2.4.). The primer is either radiolabeled or fluorescein-labeled (at the 5′-end) to follow the primer extension kinetics. Reaction products are separated by PAGE with single-base resolution to measure the amount of each intermediate.
Even the most advanced experimental techniques cannot fully decouple all steps involved in the unwinding and polymerization processes. Most experimental observations arise not from one but from multiple simultaneously occurring reactions. Such results do not directly provide a value for any reaction rate, or even a confirmation that a given reaction step actually takes place. This information can be extracted from the results by applying model-based (regression) analysis, which has the following goals: a) to test whether the proposed model is in agreement with the observations, b) to estimate the parameters of the model, and c) to estimate their confidence intervals.
In this chapter we describe analysis of unwinding and polymerization data using gfit, an open source program (http://gfit.sourceforge.net) (33). The current version of gfit runs within the MATLAB environment and uses computational models written in the MATLAB language. Programming, however, is only involved in creating new models; all analysis tasks in gfit that use existing models are performed through a graphical user interface and do not require any knowledge of programming or of the MATLAB environment. In the following sections we discuss details of the analysis procedure, the computational problems that it presents, and the ways to address these problems.
Analysis starts with creating a computational model based on existing knowledge about the system. The model should be able to compute (simulate) a predicted result of each experiment. To perform computations, the model uses two kinds of inputs: experimental conditions, known values from the experimental protocol (e.g., incubation time, nucleotide concentration) and model parameters, usually unknown, intrinsic properties of the system (e.g., translocation step size, rate of nucleotide incorporation). Although, for any given experiment, conditions and parameters are easily distinguishable, the same variable (e.g., enzyme concentration) may appear as a condition in one experiment and as a parameter in another.
Correctness of a model can never be demonstrated conclusively. However, given certain parameter values, a model may produce simulations closely matching (fitting) experimental measurements. Testing whether the model is consistent with all the experimental observations, a key step in data analysis, is performed by global curve fitting (optimization). Failure to find a good set of parameter values usually means that the model is not accurate and requires structural changes. Finding a good fit not only demonstrates consistency, but also provides estimates of the parameters. Such a result does not conclude the analysis, because the estimated parameter values carry little meaning without an indication of their uniqueness and an estimate of their confidence intervals. Restarting optimization many times from randomly selected positions in the parameter space may lead to the discovery of alternative parameter sets that fit the data, an indication of low confidence. Underdetermined parameters may stem from an overly complicated model or from insufficient experimental data. The problem can be solved by simplifying the model, by adding explicit constraints to parameters (e.g., equating two related rate constants, sharing a parameter between multiple experiments), or by providing more experimental results. Global analysis of experiments that highlight different aspects of the system’s behavior provides the most rigorous validation of the model and allows estimation of its parameters with the highest confidence.
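The random-restart idea described above can be illustrated with a minimal Python sketch. It is not gfit code: the toy single-exponential model, the parameter bounds, and the simple bounded coordinate search used as the local optimizer are all assumptions chosen to keep the example self-contained. The data are synthetic and noise-free, generated from hypothetical values A = 0.8 and k = 5.

```python
import math
import random

def model(t, params):
    """Toy time course (an assumption for illustration): A * (1 - exp(-k t))."""
    amplitude, rate = params
    return amplitude * (1.0 - math.exp(-rate * t))

def sum_of_squares(params, times, values):
    """Goodness of fit: sum of squared residuals between simulation and data."""
    return sum((model(t, params) - y) ** 2 for t, y in zip(times, values))

def local_search(objective, start, bounds, min_scale=1e-8):
    """Bounded coordinate search; a simple stand-in for a real local optimizer."""
    best = list(start)
    best_val = objective(best)
    scale = 0.25  # step size as a fraction of each parameter's allowed range
    while scale > min_scale:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for sign in (1.0, -1.0):
                trial = list(best)
                trial[i] = min(hi, max(lo, best[i] + sign * scale * (hi - lo)))
                val = objective(trial)
                if val < best_val:
                    best, best_val = trial, val
                    improved = True
        if not improved:
            scale /= 2.0  # no move helped at this step size, so refine it
    return best, best_val

def random_restarts(objective, bounds, n_starts=10, seed=0):
    """Run the local search from random starting points; return optima sorted by fit."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        results.append(local_search(objective, start, bounds))
    results.sort(key=lambda item: item[1])  # best (lowest) goodness of fit first
    return results

# Synthetic, noise-free data from hypothetical "true" parameters.
times = [0.05 * i for i in range(21)]
values = [model(t, (0.8, 5.0)) for t in times]
bounds = [(0.0, 2.0), (0.1, 50.0)]  # lower and upper bounds for A and k

results = random_restarts(lambda p: sum_of_squares(p, times, values), bounds)
best_params, best_sse = results[0]
```

If several restarts end at clearly different parameter sets with similar goodness of fit, the parameters are poorly determined by the data; if they all converge to the same set, the estimate is more likely to be unique.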
Although procedures for model-based data analysis are well established, their practical application to biological systems presents several computational challenges and places rigid requirements on the analysis software. The main function of the software is to provide methods for statistical analysis that operate on experimental data and computational models supplied by the researcher. Biological research projects are highly dynamic. Therefore the software should make it easy to modify the model, to include new experiments in the global analysis, to change statistical weights, to use the same parameter value for multiple experiments, etc. The desired flexibility can be achieved by developing a project-specific analysis program (34), although our experience shows that this approach requires frequent modifications of the program code, which slow down the analysis and lead to programming errors.
Computational models are the central parts of the analysis. The models have to simulate a wide range of biological processes and experimental procedures. Ordinary differential equations (ODEs) are currently the most common method for modeling biological systems. However, defining a model strictly as a set of ODEs is unacceptable: some systems have to be simulated with a variable number of ODEs (e.g., unwinding and polymerization, as discussed below); others require solving ODEs several times while changing initial concentrations to replicate pre-incubation, dilution, and mixing performed as part of the experimental procedure; yet others may require completely different computational techniques. Since simulation is usually the slowest step in the analysis process, it should be performed as efficiently as possible. An analysis procedure may involve 10^3 to 10^8 simulations with different experimental conditions and parameters. To carry out the analysis without human intervention, each output of the model has to be directly comparable with the result of the corresponding experiment. In short, simulations require accuracy, flexibility, and efficiency. These requirements are best met by writing models in a general purpose programming language (e.g., C++, Python, MATLAB), which makes it possible to capture all details of the mechanism and experimental design accurately and to use the most efficient algorithms.
Considering the above requirements, model-based analysis of experiments in this chapter was performed using gfit, software that solves the general case of this problem (33). Since gfit uses MATLAB scripts as models, it can be used for analysis of any type of experimental data: kinetic, thermodynamic, or any other. Details about writing gfit models are beyond the scope of this chapter. Instead, we describe how the existing models can be used for data analysis. Models and sample datasets used in this chapter, more detailed documentation, and more modeling examples can be found at the gfit website: http://gfit.sourceforge.net.
Experimental data in gfit is represented by a collection of experiments. Each experiment may include multiple variables, which store numerical information. A variable may contain a scalar (single number), a vector (a column of numbers), or a matrix (arrays with more than two dimensions are also supported, but do not appear in this chapter). gfit uses variables for communicating with models. Input variables are used for sending a model the data it needs to perform a simulation; simulation results are received from the model as output variables. Since every model has its own special requirements for input data and produces different types of results, the model has to describe its input and output variables, their names, dimensional relationships, and other properties. By reading model description, gfit learns how the model should be run and what kind of simulations to expect from it.
The same information is used by gfit for importing experimental data. During import, the data is checked against the requirements of the currently selected model to determine whether it is suitable for simulation and fitting. If a critical piece of information is missing, or if the data violates some of the model’s requirements, it is rejected. For example, it is an error not to include the variable time in an unwinding experiment, because the unwinding.m model lists this variable as a requirement. It is also an error to supply a vector of 10 numbers as time and a vector of 11 numbers as F, because the model stipulates that the two variables should be vectors of equal length.
Experimental data is imported into gfit from a spreadsheet by copying a block of data to the clipboard. Many experiments arranged side by side can be imported in one operation. Each experiment must have a name starting with a letter and containing any characters thereafter. All names of experiments should appear in the top row of the imported block. Variables for each experiment should occupy spreadsheet cells directly below, or below and to the right of, the experiment’s name. Names and other properties of variables should follow the model’s definitions. To obtain the names of variables defined by the current model, select menu Model → Copy data headers. This command places a sample header for one experiment into the clipboard. Paste the contents of the clipboard into an empty spreadsheet. Note that the header contains all variables defined by the model; not all of them have to appear in actual experimental data. A variable’s data occupies a rectangular block of spreadsheet cells. The top left corner of each block should appear immediately below the variable’s name. The number for a scalar variable may also occupy the cell immediately to the right of its name. In that case, a colon, “:”, should be added to the name.
Experimental data is imported from the clipboard by selecting menu Data → Paste-add Data. If the import is successful, gfit generates the parameters required for simulation and fitting of the imported experiments. For each parameter, the name, optimization flag, lower bound constraint, current value, and upper bound constraint are shown. To simulate experiments, gfit combines data from parameters and experimental conditions and sends it to the model. The way each parameter is used for simulation of each experiment can be defined by the researcher by selecting Combine Parameters and Separate Elements from the Parameters menu or after right-clicking the parameter name. The same parameter can be used in one or in many experiments, in one or in many input variables, and, if the input variable is an array, in multiple positions of the array.
A checked optimization flag indicates that the parameter should be optimized during fitting. During fitting, the parameter value may vary between the lower and upper bound constraints, which can be adjusted by the researcher.
Unwinding can be modeled as a process involving multiple steps of equal size, s, and rate, kf. The only observable product, ssDNA, is produced at the last step. Its appearance is simulated by the model unwinding.m as an incomplete gamma function (30,31,35). This method is computationally efficient and allows a continuous (not only integer) number of steps. To account for heterogeneity in the helicase population, the model calculates the sum of N unwinding processes. Other parameters of the model that can be estimated by fitting include the unwinding amplitude, A, the background signal, F0, and the minimal stable duplex length, minD. It should be noted that the estimated step size s is strongly affected by the enzyme’s heterogeneity and thus may be inaccurate.
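The n-step description above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code of unwinding.m: it assumes a single helicase population (N = 1), assumes that the number of steps is (L - minD)/s for a duplex of length L, and evaluates the regularized lower incomplete gamma function P(n, kf·t) with a plain power series that is adequate only for moderate arguments (roughly kf·t below a few hundred). The function names and the numerical values in the example are hypothetical.

```python
import math

def regularized_lower_gamma(a, x, tol=1e-15, max_terms=100000):
    """P(a, x) for a > 0 and moderate x >= 0, using the series
    P(a, x) = exp(-x) * x**a / Gamma(a + 1) * sum_k x**k / ((a+1)...(a+k))."""
    if x <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    for k in range(1, max_terms):
        term *= x / (a + k)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a + 1.0))

def unwinding_signal(t, duplex_length, kf, s, A=1.0, F0=0.0, minD=0.0):
    """Signal from fully unwound DNA at time t for an n-step process.

    Assumption: n = (duplex_length - minD) / s, which may be non-integer."""
    n_steps = (duplex_length - minD) / s
    if n_steps <= 0.0:
        return F0 + A  # duplex shorter than the minimal stable length
    return F0 + A * regularized_lower_gamma(n_steps, kf * t)

# Hypothetical example: 40 bp duplex, 2 bp steps at 50 steps per second.
time_points = [0.0, 0.1, 0.2, 0.4, 0.8]
trace = [unwinding_signal(t, 40.0, kf=50.0, s=2.0, A=0.9, F0=0.05) for t in time_points]
```

Because the gamma function accepts any positive real first argument, the step size s can be varied continuously by an optimizer. For an integer number of steps the expression reduces to the familiar sum-of-exponentials stepping equation.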
Polymerization is another example of a multi-step process with the number of steps, N-1, equal to the length of the DNA template. At each step, the polymerase can either add a nucleotide to the growing chain, with rate kf, or dissociate from the substrate, with rate kd. The process is modeled as N ODEs. Since intermediate polymerization species can be resolved on a gel, a vector of concentrations is measured for each time point. The result of the experiment is therefore a matrix of size M by N, where M is the number of time points. To avoid writing a dedicated model for each template length, gfit makes it possible to write a universal model, polymerase_ni.m, that always simulates the correct number of steps according to the experimental data.
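A minimal Python sketch of this kind of simulation is given below. It is not polymerase_ni.m, and it differs from the N-ODE formulation in one respect that should be read as an assumption: it tracks polymerase-bound and released DNA separately (2N state variables) and reports their sum for each primer length, on the premise that both run at the same position on the gel and that released DNA is not rebound under single round conditions. The number of species is taken from the length of the rate vectors, so the same function handles any template length. The rate values are hypothetical.

```python
def derivatives(state, kf, kd):
    """Time derivatives for bound species (first n entries) and released DNA (last n)."""
    n = len(kf) + 1
    bound = state[:n]
    d = [0.0] * (2 * n)
    for i in range(n):
        inflow = kf[i - 1] * bound[i - 1] if i > 0 else 0.0
        if i < n - 1:
            d[i] = inflow - (kf[i] + kd[i]) * bound[i]  # extend or dissociate
            d[n + i] = kd[i] * bound[i]                 # released, not rebound
        else:
            d[i] = inflow                               # full-length product accumulates
    return d

def rk4_step(state, h, kf, kd):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = derivatives(state, kf, kd)
    k2 = derivatives([y + 0.5 * h * k for y, k in zip(state, k1)], kf, kd)
    k3 = derivatives([y + 0.5 * h * k for y, k in zip(state, k2)], kf, kd)
    k4 = derivatives([y + h * k for y, k in zip(state, k3)], kf, kd)
    return [y + h / 6.0 * (a + 2.0 * b + 2.0 * c + e)
            for y, a, b, c, e in zip(state, k1, k2, k3, k4)]

def simulate_polymerization(kf, kd, times, dt=1e-4):
    """Return an M-by-N matrix of DNA fractions; times must be in ascending order.

    kf and kd are vectors of length N-1 (one forward and one dissociation rate per step)."""
    n = len(kf) + 1
    state = [1.0] + [0.0] * (2 * n - 1)  # all DNA starts as bound, unextended primer
    t = 0.0
    rows = []
    for target in times:
        while target - t > 1e-12:
            h = min(dt, target - t)
            state = rk4_step(state, h, kf, kd)
            t += h
        rows.append([state[i] + state[n + i] for i in range(n)])
    return rows

# Hypothetical example: four incorporation steps, hence five resolvable species.
matrix = simulate_polymerization(kf=[50.0] * 4, kd=[1.0] * 4, times=[0.0, 0.01, 0.05, 0.2])
```

Each row of the returned matrix corresponds to one time point and each column to one primer length, which matches the M-by-N layout of the gel quantification described above.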
Forward and dissociation rates at each step do not have to be the same. Therefore, for each simulation the model requires vectors of length N-1 for kf and kd. Accordingly, for each experiment gfit generates N-1 parameters for each of the variables. By default, these parameters are linked (forced to keep the same value) into single parameters kf and kd. The researcher can either separate the parameters completely or group them in any desired fashion. For example, one could try fitting the data while linking the kf values according to the incorporated base type: kfG, kfT, kfA, kfC.
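The effect of linking can be pictured as building the full N-1 vector from a smaller set of free parameters. The short Python sketch below is only an illustration of that bookkeeping, not of how gfit stores parameters internally; the function name, the sequence of incorporated bases, and the rate values are all hypothetical.

```python
def expand_linked_rates(incorporated_bases, linked_rates):
    """Build the per-step rate vector from rates shared by base type."""
    return [linked_rates[base] for base in incorporated_bases]

# Hypothetical sequence of bases added to the primer, one per step.
incorporated_bases = "GATTC"

# Default linking: a single kf shared by every step (one free parameter).
kf_default = [100.0] * len(incorporated_bases)

# Linking by base type: four free parameters (kfG, kfA, kfT, kfC), values hypothetical.
linked_by_base = {"G": 200.0, "A": 150.0, "T": 120.0, "C": 180.0}
kf_by_base = expand_linked_rates(incorporated_bases, linked_by_base)

# Fully separated parameters would instead need one free value per step.
n_free_separated = len(incorporated_bases)
n_free_by_base = len(set(incorporated_bases))
```

Grouping reduces the number of free parameters the optimizer has to determine, which generally makes the remaining parameters better constrained by the same data.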
A zip archive containing all models and data used in this chapter can be downloaded from http://gfit.sourceforge.net. The archive also includes a readme.txt file with the most current information. The data files are in tab/newline-delimited format. The files can be opened in a text editor, but it is more convenient to view them in a spreadsheet application.
T7 gp4-catalyzed DNA unwinding activity requires a fork substrate (Fig. 2.2 and 2.3). When the activity of the helicase and polymerase together is measured, the unwinding substrate is further modified by annealing a primer to the bottom strand to create a primer/template junction for DNA polymerase binding (Fig. 2.4). All the proteins are preassembled on the DNA substrate in the presence of dTTP, but in the absence of Mg(II). Reactions are started as ‘standing start’ reactions by adding Mg(II) (see Note 7). The rapid quenched-flow methods for gel-based DNA unwinding assays (31) and for strand displacement DNA synthesis by the polymerase-helicase in the replisome (6) are described here. A high throughput stopped-flow fluorescence-based DNA unwinding assay for the helicase is also explained (42).
The gfit software can be downloaded from http://gfit.sourceforge.net. This website also contains the most up-to-date installation instructions.
The authors would like to thank the Patel lab members for proofreading the chapter and testing the model, and acknowledge National Institutes of Health grant GM55310.
1. Check all the protein preparations for nuclease activity to avoid anomalous results due to DNA substrate degradation.
2. BT1 membranes used for Elutrap are brittle and need extra care while handling. BT2 membranes need to be stored at 4°C and kept moist at all times. Reverse the polarity of the electrodes for about 30 s before taking an eluted fraction of the oligonucleotide to dislodge any DNA sticking to the membranes. Use fresh gloves and blades for handling each oligonucleotide while cutting out the band from the preparatory gel and setting up the elution, to prevent intermixing and contamination due to handling. Do not overexpose the DNA band to UV light.
3. It is very important to get accurate oligonucleotide concentrations for preparing correctly annealed substrate for all the assays described here.
4. Sephadex G-25 spin column purification works well for all oligonucleotides above 8 nt; however, the recovery and purity may vary depending on the secondary structure of the oligo. Other matrices such as P-5 or P-30 from BioRad, or Sephadex G-50, could also be tried if needed. Use medium size particle grade matrix powder for making the spin column.
5. It is good to have the GC content distributed evenly through the entire duplex region to avoid biased rates due to localized high or low GC patches. During oligonucleotide design, avoid the primase recognition site 3′-CTG (on the top oligo) to prevent primase activity when it is not needed. Avoid freezing and thawing the annealed substrates so that they retain their proper form. It is best to make them fresh or, if required, store them at 4°C.
6. Do not include Tris or any other competing amine group in the labeling reactions with fluorescein, to ensure efficient labeling of the amine group. Fluorescein-labeled DNA substrate should always be protected from light by using dark tubes for reactions and storage. Keep the stopped-flow reaction chamber covered with aluminum foil to avoid exposing the dye to outside light.
7. In all the assays described here, it is important to keep a basal level of EDTA in the protein assembly solution to chelate any contaminating magnesium ions present in the buffers and so prevent an uncontrolled start of the reaction.
8. All buffer solutions should be filtered through 0.22 micron filters whenever possible to avoid blockage or bacterial growth in the RQF or stopped-flow instruments. Avoid air bubbles in tubes, loops, and syringes in both instruments when collecting data. A vacuum line is required for the RQF instrument for efficient flushing and drying of the loops. Do not connect the RQF instrument exit line to vacuum when the valves are in the ‘load’ position (reactants in the sample loading syringe) or in ‘fire’ mode during the experiment. Clean the RQF and stopped-flow instrument tubes before and after use to prevent any blockage or buildup in the system. Never fire either instrument while the valves are in the ‘load’ position.
9. The quenched DNA unwinding reactions should be loaded on the gel immediately, if possible as the experiment is being carried out, to avoid re-annealing of the complementary strands, especially in reactions without trap.
10. Use fresh DTT for T7 DNA polymerase assembly. Store the assembled T7 DNA polymerase on ice until use (2–3 h) and do not use a frozen assembly.
11. Sequencing gels should be made with high purity acrylamide solutions and reagents. The gels should be run at high power to reach temperatures around 45–50°C and give high resolution of the bands. A gel prepared with a wedge spacer (0.25–0.4 mm here) gives good resolution for bands in the 20–80 mer size range. The ratio of sequencing gel loading dye to sample should be increased if the length or GC content of the duplex region of the DNA substrate is higher than described here. Xylene cyanol dye interferes with fluorescein intensity measurements and should not be used with fluorescent samples.
12. The parameters of unwinding can be estimated more reliably by globally fitting unwinding kinetics for substrates of different lengths and similar GC composition. Processivity is best determined from the gel-based assay by analyzing unwinding amplitude as a function of dsDNA length (31).
13. A fitting operation can be aborted at any time by clicking Cancel. If, for any reason, fitting did not complete successfully, the latest set of parameters will appear in the right column. To continue optimization, copy the new parameter values to the starting ones by clicking the column header optimum or value, then click the Fit button. Sometimes a successful fit can be improved simply by restarting it.
14. The random restart method searches parameter space for the best fit by continuously reinitializing local optimization from randomly selected parameter values. The results of the local optimizations are accumulated in the file Mgfit/Temp/Optim_nnn.txt. The file can be opened in a spreadsheet and will appear as a table with each row representing parameter values at a local optimum. The first column of the table contains goodness of fit values. Parameters appear in the table in the order they were discovered. Sort the table by the first column to locate the best fits.