Helicases are motor proteins that use the chemical energy of NTP hydrolysis to separate the strands of double stranded nucleic acids (1
). Often helicases work in association with other proteins, such as the DNA polymerase or single strand binding proteins to perform their function with a greater efficiency. Characterization of the nucleic acid unwinding activity involves measurement of the rate and the processivity of the helicase-catalyzed unwinding reaction. These unwinding parameters can be used as basic handles to understand the mechanism of unwinding and the role of the helicase in a particular biological process. Measurement of the unwinding rate of helicases as a function of DNA duplex stability provided insights into the active or passive nature of the helicase-catalyzed reaction (5
). Measurement of the unwinding rate of phage T7 hexameric helicase in the presence of the DNA polymerase provided insights into the synergistic action of the two motor proteins in DNA replication (6
).
In this chapter, we describe assays to measure DNA unwinding catalyzed by the helicase, and DNA polymerization catalyzed by the helicase and polymerase proteins using the hexameric ring-shaped T7 gp4 helicase as our model system. We outline procedures to fit the kinetic data to specific models that provide kinetic parameters such as the rate of DNA unwinding and the rate of nucleotide incorporation during DNA polymerization.
The product of bacteriophage T7 gene 4 (gp4) is a ring-shaped protein (7
) that has both DNA unwinding and primase activities (10
). T7 gp4 can move along single stranded (ss) (11
) or double stranded (ds) (12
) DNA and separate DNA strands using the energy of dTTP hydrolysis. The protein can also synthesize short (4–5-mer) RNA primers using ATP and CTP at priming sites on the lagging DNA strand. T7 gp4 unwinds dsDNA by the strand exclusion model, wherein one strand of the dsDNA is threaded into the central channel of the helicase and the other strand is excluded as the helicase unwinds the dsDNA (13
). This strand exclusion model is accepted generally for ring shaped helicases (15
). The unwinding rate of T7 gp4 depends on dsDNA stability, and its speed of DNA unwinding is slower than its speed of translocation along ssDNA (5
). The other key component of T7 replication complex is gp5 DNA polymerase (18
). Interestingly, the rate of DNA unwinding by T7 gp4 is accelerated by the polymerase and approaches the ssDNA translocation rate of the helicase (6
). Accurate measurements of kinetic rates of translocation, unwinding, and polymerization provide both quantitative and qualitative insights about the DNA replication mechanism.
1.1. Methods for measuring reaction kinetics
Unwinding and polymerization pathways comprise many interacting reaction steps such as substrate binding, catalysis, and conformational changes. The large number of these steps makes their identification and characterization difficult. Therefore, one prerequisite of a successful experimental design is its ability to decouple the pathway: to study a part of the pathway in isolation or while controlling the influence of the other parts. The clearest results are produced by experiments that measure the kinetics of one fully decoupled reaction step.
One strategy to maximize decoupling, enabled by recent technological advances, involves monitoring reactions of single molecules. These techniques have been successfully applied to both DNA unwinding and polymerization (21
). Unfortunately, only parts of DNA processing pathways can currently be observed at the single-molecule level, which brings us back to conventional measurement techniques that integrate signals originating from billions of enzyme molecules. During the course of the reaction, as the system reaches a steady state, enzyme molecules become distributed along the reaction pathway, populating all of the reaction species and participating in all reactions simultaneously. The combined signal from such a heterogeneous mixture is hard to interpret in terms of individual reaction rates.
These problems are overcome by using pre-steady state and single round kinetic techniques. Pre-steady state kinetic measurements involve synchronization of the system – assembling the reaction mixture in a way that populates only one species of the pathway. This can be done, for example, by withholding a component required for the next reaction step. After the reaction in the synchronized mixture is started, but before it reaches a steady state, the measured signal can be attributed to a few steps that follow the synchronization point.
The pre-steady state period of a reaction is quite short because molecular systems naturally tend to lose synchronization. To extend it and to characterize more reaction steps in one experiment, single round conditions can be used. Single round conditions effectively prevent the system from reaching a steady state by allowing each enzyme molecule to transform no more than one molecule of substrate. In the case of an unwinding reaction, this can be achieved by adding a helicase trap at the time of initiation: an excess of ssDNA captures free helicase molecules from solution, preventing them from re-binding to new substrate molecules. Pre-steady state single round approaches enhance our ability to decouple unwinding and polymerization pathways and to measure the rates of their reaction steps.
1.1.1. Unwinding kinetics
DNA unwinding rates have been measured using both bulk and single molecule techniques (5
). Single round conditions simplify interpretation of a bulk kinetic measurement result. Helicases can processively unwind stretches of dsDNA longer than their binding site. Therefore an unwinding reaction can be described as an n-step process and for experiments conducted under single round conditions, the results can be fit with a stepping equation (30
). Such analysis of unwinding kinetics data for dsDNA substrates of different lengths can be used for estimating helicase stepping rate and size and for assessing processivity of the helicase; i.e., how far the helicase moves along the DNA before it falls off.
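As an illustration of what a stepping equation can look like, the following is a generic form for a chain of identical irreversible steps, written under stated assumptions; it is not necessarily the exact expression used in the cited work. We assume that a duplex of length $L$ is unwound in $n = L/s$ steps of size $s$, that each step occurs with rate $k_f$, and that the helicase dissociates from every intermediate with rate $k_d$. The symbols $L$, $k_d$, $p$, and $A$ are introduced here only for illustration.

```latex
% Fraction of fully unwound DNA under single round conditions (integer n).
% p is the per-step processivity; A is an overall amplitude.
\[
  p = \frac{k_f}{k_f + k_d}, \qquad
  F(t) = A\, p^{\,n} \left[ 1 - e^{-(k_f + k_d)\,t}
         \sum_{r=0}^{n-1} \frac{\bigl((k_f + k_d)\,t\bigr)^{r}}{r!} \right]
\]
% The bracket is the regularized lower incomplete gamma function
% P(n, (k_f + k_d) t), which also extends the expression to non-integer n.
```

In this form the lag before product appears grows with $n$, the final amplitude falls as $p^{n}$ with increasing duplex length, and the average unwinding rate is approximately $s\,k_f$ base pairs per unit time, which is why data for several duplex lengths help separate step size, stepping rate, and processivity.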
1.1.1.1. Assembly of helicase on the DNA substrate
Ring-shaped helicases assemble around ssDNA, and assembly is usually a slow step (relative to the rate of DNA unwinding). Therefore, to measure the unwinding rate (rather than the assembly rate) and to synchronize the reactions, it is important to preassemble the helicase on the DNA substrate prior to reaction start. T7 gp4, like other ring-shaped helicases, binds to the DNA only in the presence of its nucleotide substrate (dTTP in the case of T7 gp4). This makes the preassembly of the helicase on the DNA substrate, without reaction occurring during the assembly period, challenging. Based on our finding that T7 gp4 forms hexamers and binds DNA in the absence of Mg(II), we arrived at the following assembly procedure (this might be applicable to other helicases that require the presence of NTP to bind DNA). We preassemble the helicase on the DNA by adding dTTP, but by leaving out Mg(II). In the absence of Mg(II) and in the presence of added EDTA (to chelate contaminating divalent metal ions), T7 gp4 does not hydrolyze dTTP or unwind DNA (32
).
1.1.1.2. DNA substrates and DNA unwinding kinetics
Two types of unwinding assays are described: a discontinuous gel-based radiometric assay and a continuous stopped-flow fluorescence assay. Both are all-or-none unwinding assays in which unwinding rates are obtained from the kinetics of end product formation, i.e., the kinetics of the appearance of the fully unwound DNA. Since unwinding by T7 gp4 is fast, the kinetics are measured using a rapid quenched-flow or a stopped-flow apparatus (.) that allows mixing on the millisecond time scale. A typical DNA unwinding substrate for the hexameric helicase is a fork DNA. Two short DNA strands (top and bottom) are annealed to generate a duplex region (40 bp, here) and ssDNA overhangs at one end. A 5′ ssDNA overhang (dT35) in the top strand is needed for helicase binding (the DnaB family helicases are 5′-3′ helicases), and the 3′ ssDNA overhang (dT15) in the bottom strand is required for strand exclusion during unwinding (.). In the gel-based assay, one of the DNA strands is radiolabeled, so the radioactive fork substrate and the ssDNA product can be quantified after they are resolved by native PAGE. The kinetics of DNA unwinding are fit to obtain the unwinding rate. In the fluorescence-based assay, a fluorescent dye (fluorescein) is incorporated at the 5′ end of the bottom strand (.). A run of three guanosines at the 3′ end of the top strand quenches fluorescein fluorescence when the substrate is duplexed. When the helicase unwinds the dsDNA and the top strand is displaced away from the dye, the fluorescence increases. The time-dependent increase in fluorescence is measured continuously in a stopped-flow apparatus.
Fig. 2.1 Instrumental designs for the rapid kinetic studies. a) Chemical quenched-flow RQF-3 (www.kintek-corp.com, figure kindly provided by Prof. K.A. Johnson). Sample A and Sample B are loaded in sample loops from the load ports via a three way valve. Upon firing ...
Fig. 2.2 Gel-based radiometric assay for DNA unwinding. a) The DNA unwinding fork substrate design with radiolabeled top strand. T7 gp4 assembles on the top strand and moves in the 5′ to 3′ direction to unwind the dsDNA substrate. b) Representative ...
Fig. 2.3 Fluorescence-based stopped-flow assay for DNA unwinding. a) DNA unwinding fork substrate design with fluorescein in the lower strand and GGG at the 3′ end in the top strand. T7 gp4 moves in the 5′ to 3′ direction to unwind the dsDNA ...
1.1.2. Polymerization kinetics
DNA polymerization rates can be estimated from the individual nucleotide incorporation rates measured using transient state kinetic methods. DNA polymerases incorporate hundreds to thousands of nucleotides during polymerization in a template-dependent manner, adding one dNMP to the primer at a time and moving with a step size of one nucleotide. Each nucleotide is added at a different rate that depends on several factors, not all well understood, one of which is the sequence context around the base to be added. The nucleotide addition rate can be determined accurately using a combination of rapid kinetics, product analysis on a high resolution sequencing gel, and data analysis. The rapid kinetic methods, which provide millisecond time resolution, capture the formation and decay of intermediate products; the sequencing gels resolve the DNA products with single-base resolution; and data analysis extracts the single nucleotide incorporation rates and the polymerase off-rates from the observed kinetics of primer elongation.
1.1.2.1. DNA synthesis kinetics
DNA polymerase extends a primer annealed to a template DNA by utilizing dNTPs as substrates. When the template is single stranded, the polymerase can copy the template without the helicase, but when the template is double stranded, the polymerase requires the helicase to unwind the dsDNA. T7 DNA polymerase requires T7 gp4 to catalyze strand displacement DNA synthesis. The rate of DNA unwinding by the helicase with concomitant DNA synthesis can be measured by the unwinding assays described above using a replication fork substrate. Alternatively, the kinetics of the reaction can be measured by following the primer extension reaction. We describe methods to obtain the rate of each nucleotide addition in the primer extension reaction. Replication fork substrates are made by annealing the top and bottom ssDNAs to generate a duplex region (40 bp, here) and two ssDNA overhangs. A third strand, a primer (24 mer, here) is annealed to the bottom strand (of a defined sequence) to create a primer/template junction (.). The primer is either radiolabeled or fluorescein-labeled (at the 5′-end) to follow the primer extension kinetics. Reaction products are separated by PAGE with single-base resolution to measure the amount of each intermediate.
Fig. 2.4 Strand displacement DNA synthesis by helicase-polymerase replisome. a) T7 DNA polymerase and T7 gp4 are assembled on the replication fork substrate with a radiolabeled DNA primer. T7 gp4 on top strand moves to unwind the dsDNA and DNA polymerase extends ...
1.2. Analysis of reaction kinetics measurements
Even the most advanced experimental techniques cannot fully decouple all steps involved in the unwinding and polymerization processes. Most experimental observations arise not from one but from multiple simultaneously occurring reactions. Such results do not directly provide a value for any reaction rate or even a confirmation that the reaction step actually takes place. This information can be extracted from the results by applying model-based (regression) analysis that has the following goals: a) to test if the proposed model is in agreement with the observations, b) to estimate parameters of the model and c) to estimate their confidence intervals.
In this chapter we describe analysis of unwinding and polymerization data using gfit, an open source program (http://gfit.sourceforge.net). The current version of gfit runs within the MATLAB environment and uses computational models written in the MATLAB language. Programming, however, is only involved in creating new models; all analysis tasks in gfit using existing models are performed through a graphical user interface and do not require any knowledge of programming or of the MATLAB environment. In the following sections we discuss details of the analysis procedure, computational problems that it presents, and the ways to address these problems.
1.2.1. Steps involved in model-based analysis
Analysis starts with creating a computational model based on existing knowledge about the system. The model should be able to compute (simulate) a predicted result of each experiment. To perform computations, the model uses two kinds of inputs: experimental conditions, known values from the experimental protocol (e.g., incubation time, nucleotide concentration) and model parameters, usually unknown, intrinsic properties of the system (e.g., translocation step size, rate of nucleotide incorporation). Although, for any given experiment, conditions and parameters are easily distinguishable, the same variable (e.g., enzyme concentration) may appear as a condition in one experiment and as a parameter in another.
Correctness of a model can never be demonstrated conclusively. However, given certain parameter values, a model may produce simulations closely matching (fitting) experimental measurements. Testing whether the model is consistent with all the experimental observations, a key step in data analysis, is performed by global curve fitting (optimization). Failure to find a good set of parameter values usually means that the model is not accurate and requires structural changes. Finding a good fit not only demonstrates consistency, but also provides estimates of parameters. Such a result does not conclude the analysis, because the estimated parameter values carry little meaning without an indication of their uniqueness and an estimate of their confidence intervals. Restarting optimization many times from randomly selected positions in the parameter space may lead to discovery of alternative parameter sets fitting the data, an indication of low confidence. Underdetermined parameters may stem from an overly complicated model or from insufficient experimental data. The problem can be solved by simplifying the model, by adding explicit constraints to parameters (e.g., equating two related rate constants, sharing a parameter between multiple experiments), or by providing more experimental results. Global analysis of experiments that highlight different aspects of the system’s behavior provides the most rigorous validation of the model and allows estimation of its parameters with the highest confidence.
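The fitting strategy described above (a global objective over several experiments, a parameter shared between them, and restarts from random positions) can be sketched in a few lines. The example below is only an illustration in Python, not gfit code: the single-exponential `model`, the compass-search minimizer, the bounds, and the synthetic data are all assumptions chosen to keep the sketch self-contained.

```python
import math
import random

def model(t, amplitude, rate):
    """Single-exponential time course; a stand-in for a real kinetic model."""
    return amplitude * (1.0 - math.exp(-rate * t))

def global_ssr(params, experiments):
    """Sum of squared residuals over all experiments.

    params = [shared_rate, amplitude_1, amplitude_2, ...]: the rate is shared
    by every experiment, while each experiment has its own amplitude.
    """
    rate = params[0]
    total = 0.0
    for amplitude, (times, values) in zip(params[1:], experiments):
        for t, y in zip(times, values):
            total += (model(t, amplitude, rate) - y) ** 2
    return total

def compass_search(objective, start, lower, upper, step=0.5, tol=1e-6):
    """Simple derivative-free local minimizer restricted to box bounds."""
    x = list(start)
    best = objective(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] = min(upper[i], max(lower[i], x[i] + delta))
                value = objective(trial)
                if value < best:
                    x, best, improved = trial, value, True
        if not improved:
            step *= 0.5  # refine the search once no neighbor is better
    return x, best

def multistart_fit(experiments, lower, upper, restarts=8, seed=0):
    """Restart the local search from random points; return fits sorted by SSR."""
    rng = random.Random(seed)
    fits = []
    for _ in range(restarts):
        start = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        fits.append(compass_search(lambda p: global_ssr(p, experiments),
                                   start, lower, upper))
    return sorted(fits, key=lambda fit: fit[1])

# Synthetic, noise-free data for two experiments sharing one rate constant.
times = [0.2 * i for i in range(1, 26)]
true_rate = 1.5
experiments = [
    (times, [model(t, 1.0, true_rate) for t in times]),
    (times, [model(t, 0.4, true_rate) for t in times]),
]
fits = multistart_fit(experiments, lower=[0.01, 0.0, 0.0], upper=[10.0, 2.0, 2.0])
best_params, best_ssr = fits[0]
rate_spread = max(f[0][0] for f in fits) - min(f[0][0] for f in fits)
```

If restarts that reach a similarly low residual report clearly different parameter values (a large `rate_spread` here), the parameter is poorly determined by the data; with real, noisy data this check complements, but does not replace, a proper confidence interval estimate.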
1.2.2. Software for model-based analysis
Although procedures for model-based data analysis are well established, their practical applications to biological systems present several computational challenges and place rigid requirements on the analysis software. The main function of the software is to provide methods for statistical analysis that operate on experimental data and computational models supplied by the researcher. Biological research projects are highly dynamic. Therefore the software should make it easy to modify the model, to include new experiments into global analysis, to change statistical weights, to use the same parameter value for multiple experiments, etc. The desired flexibility can be achieved by developing a project-specific analysis program (34
), although our experience shows that this approach requires frequent modifications of the program code, which slow down the analysis and lead to programming errors.
Computational models are the central parts of the analysis. The models have to simulate a wide range of biological processes and experimental procedures. Ordinary differential equations (ODEs) are currently the most common method of modeling biology. However, defining a model strictly as a set of ODEs is unacceptable, because some systems have to be simulated with a variable number of ODEs (e.g., unwinding and polymerization, as discussed below), others by solving ODEs several times while changing initial concentrations to replicate pre-incubation, dilution, and mixing performed as part of the experimental procedure, and yet others may require completely different computational techniques. Since simulation is usually the slowest step in the analysis process, it should be performed as efficiently as possible. An analysis procedure may involve 10^3 to 10^8 simulations with different experimental conditions and parameters. To carry out the analysis without human intervention, each output of the model has to be directly comparable with the result of the corresponding experiment. In short, simulations require accuracy, flexibility, and efficiency. These requirements can be best met by writing models in a general purpose programming language (e.g., C++, Python, MATLAB), which makes it possible to capture all details of the mechanism and experimental design accurately and to use the most efficient algorithms.
Considering the above requirements, model-based analysis of experiments in this chapter was performed using gfit, software that solves the general case of this problem (33
). Since gfit uses MATLAB scripts as models, it can be used for analysis of any type of experimental data: kinetic, thermodynamic, or any other. Details about writing gfit models are beyond the scope of this chapter. Instead, we describe how the existing models can be used for data analysis. Models and sample datasets used in this chapter, more detailed documentation, and more modeling examples can be found at the gfit website (http://gfit.sourceforge.net).
1.2.3. Data analysis using gfit
Experimental data in gfit is represented by a collection of experiments. Each experiment may include multiple variables, which store numerical information. A variable may contain a scalar (single number), a vector (a column of numbers), or a matrix (arrays with more than two dimensions are also supported, but do not appear in this chapter). gfit uses variables for communicating with models. Input variables are used for sending a model the data it needs to perform a simulation; simulation results are received from the model as output variables. Since every model has its own special requirements for input data and produces different types of results, the model has to describe its input and output variables, their names, dimensional relationships, and other properties. By reading the model description, gfit learns how the model should be run and what kind of simulations to expect from it.
The same information is used by gfit for importing experimental data. During import, the data is checked against the requirements of the currently selected model to determine whether it is suitable for simulation and for fitting. If a critical piece of information is missing, or if the data violates some of the model’s requirements, it is rejected. For example, it is an error not to include the variable time in an unwinding experiment, because the unwinding.m model lists this variable as a requirement. It is also an error to include a vector of 10 numbers as time and a vector of 11 numbers as F, because the model stipulates that the two variables should be vectors of equal length.
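The two import checks mentioned above can be illustrated with a minimal sketch. This is not gfit's implementation (gfit runs in MATLAB); the function name `validate_experiment`, the dictionary layout, and the way requirements are passed in are hypothetical and serve only to make the logic concrete.

```python
def validate_experiment(experiment, required=("time",),
                        equal_length=(("time", "F"),)):
    """Return a list of problems; an empty list means the data is accepted.

    experiment   -- dict mapping variable names to lists of numbers
    required     -- names that must be present (hypothetical model description)
    equal_length -- pairs of vectors that must have the same length
    """
    problems = []
    for name in required:
        if name not in experiment:
            problems.append(f"missing required variable '{name}'")
    for first, second in equal_length:
        if first in experiment and second in experiment:
            n_first = len(experiment[first])
            n_second = len(experiment[second])
            if n_first != n_second:
                problems.append(
                    f"'{first}' has {n_first} values but '{second}' has {n_second}")
    return problems

# The two error cases described in the text, plus one acceptable experiment.
ok = validate_experiment({"time": [0.0, 0.1, 0.2], "F": [0.0, 0.3, 0.5]})
no_time = validate_experiment({"F": [0.0, 0.3, 0.5]})
mismatch = validate_experiment({"time": list(range(10)), "F": list(range(11))})
```

Collecting all problems in a list, rather than stopping at the first one, lets every issue in a pasted block be reported in a single pass.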
Experimental data is imported into gfit from a spreadsheet by copying a block of data to the clipboard. Many experiments arranged side by side can be imported in one operation. Each experiment must have a name starting with a letter and containing any characters thereafter. All names of experiments should appear in the top row of the imported block. Variables for each experiment should occupy spreadsheet cells directly below, or below and to the right of, the experiment’s name. Names and other properties of variables should follow the model’s definitions. To obtain the names of variables defined by the current model, select menu Model → Copy data headers. This command places a sample header for one experiment into the clipboard. Paste the contents of the clipboard into an empty spreadsheet. Note that the header contains all variables defined by the model; not all of them have to appear in actual experimental data. A variable’s data occupies a rectangular block of spreadsheet cells. The top left corner of each block should appear immediately below the variable’s name. The number for a scalar variable may also occupy the cell immediately to the right of its name. In that case, a colon, “:”, should be added to the name.
Experimental data is imported from the clipboard by selecting menu Data → Paste-add Data. If import is successful, gfit generates the parameters required for simulation and fitting of the imported experiments. For each parameter, the name, optimization flag, lower bound constraint, current value, and upper bound constraint are shown. To simulate experiments, gfit combines data from parameters and experimental conditions and sends it to the model. The way each parameter is used for simulation of each experiment can be defined by the researcher by selecting Combine Parameters and Separate Elements from the Parameters menu or after right-clicking the parameter name. The same parameter can be used in one or in many experiments, in one or in many input variables, and, if the input variable is an array, in multiple positions of the array.
A checked optimization flag indicates that the parameter should be optimized during fitting. During fitting, the parameter value may vary between the lower and upper bound constraints, which can be adjusted by the researcher.
1.2.4. Computational model for unwinding
Unwinding can be modeled as a process involving multiple steps of equal size, s, and rate, kf. The only observable product, ssDNA, is produced at the last step. Its appearance is simulated by the model unwinding.m as an incomplete gamma function (30
). This method is computationally efficient and allows a continuous (not only integer) number of steps. To account for heterogeneity in the helicase population, the model calculates the sum of N unwinding processes. Other parameters of the model that can be estimated by fitting include the unwinding amplitude, A, the background signal, F0, and the minimal stable duplex length, minD. It should be noted that the estimated step size s is strongly affected by the enzyme’s heterogeneity and thus may be inaccurate.
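To make the incomplete gamma description concrete, the following Python sketch computes a time course of this general shape. It is not a port of unwinding.m: we assume that the number of steps is (duplex length - minD) / s, and that each of the N populations contributes its own amplitude and stepping rate; the function names and the numerical values are hypothetical. The regularized incomplete gamma function is evaluated with the standard series and continued-fraction expansions so that the sketch needs only the standard library.

```python
import math

def gammainc_p(a, x, eps=1e-14, max_iter=10000):
    """Regularized lower incomplete gamma function P(a, x) for a > 0."""
    if x <= 0.0:
        return 0.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion; converges quickly when x < a + 1.
        term = total = 1.0 / a
        denom = a
        for _ in range(max_iter):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * eps:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz method) for Q(a, x) = 1 - P(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return 1.0 - h * math.exp(log_prefactor)

def unwinding_signal(times, duplex_length, s, minD, F0, populations):
    """Signal F(t) = F0 + sum over populations of A * P(n, kf * t).

    populations -- list of (A, kf) pairs, one per helicase population
    n           -- (duplex_length - minD) / s; may be non-integer (assumption)
    """
    n_steps = (duplex_length - minD) / s
    return [F0 + sum(A * gammainc_p(n_steps, kf * t) for A, kf in populations)
            for t in times]

# Hypothetical values: 40 bp duplex, two populations with different rates.
times = [0.0, 0.1, 0.2, 0.5, 1.0, 5.0]
signal = unwinding_signal(times, duplex_length=40, s=2.0, minD=10, F0=0.05,
                          populations=[(0.6, 60.0), (0.2, 20.0)])
```

Because the number of steps enters only as the first argument of the incomplete gamma function, it can vary continuously during fitting, which is the property the text refers to.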
1.2.5. Computational model for polymerization
Polymerization is another example of a multi-step process with the number of steps, N-1, equal to the length of the DNA template. At each step, the polymerase can either add a nucleotide to the growing chain with rate kf, or dissociate from the substrate with rate kd. The process is modeled as N ODEs. Since intermediate polymerization species can be resolved on a gel, a vector of concentrations is measured for each time point. The result of the experiment is therefore a matrix of size M by N, where M is the number of time points. To avoid writing dedicated models for each template length, gfit makes it possible to write a universal model, polymerase_ni.m, that always simulates the correct number of steps according to the experimental data.
Forward and dissociation rates at each step do not have to be the same. Therefore, for each simulation the model requires vectors of length N-1 for kf and kd. Accordingly, for each experiment gfit generates N-1 parameters for each of the variables. By default, these parameters are linked (forced to keep the same value) into single parameters kf and kd. The researcher may either separate the parameters completely or group them in any desired fashion. For example, one could try fitting the data while linking the kf values according to the incorporated base type: kfG, kfT, kfA, kfC.
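A simulation of this kind can be sketched as follows. The code is an illustrative Python example, not polymerase_ni.m. As an assumption of ours, it tracks polymerase-bound and dissociated DNA of each length separately (2N state variables rather than the N ODEs mentioned above) and reports their sum as the gel band, assuming single round conditions in which dissociated DNA is not extended further. The fixed-step Runge-Kutta integrator, the rate values, and the base sequence are hypothetical.

```python
import math

def rhs(y, kf, kd):
    """Time derivatives; y = bound species 0..N-1 followed by dissociated ones."""
    n = len(kf) + 1
    dy = [0.0] * (2 * n)
    for i in range(n - 1):
        forward = kf[i] * y[i]   # nucleotide addition: length i -> i + 1
        off = kd[i] * y[i]       # polymerase dissociation at length i
        dy[i] -= forward + off
        dy[i + 1] += forward
        dy[n + i] += off
    return dy

def rk4_step(y, h, kf, kd):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = rhs(y, kf, kd)
    k2 = rhs([a + 0.5 * h * b for a, b in zip(y, k1)], kf, kd)
    k3 = rhs([a + 0.5 * h * b for a, b in zip(y, k2)], kf, kd)
    k4 = rhs([a + h * b for a, b in zip(y, k3)], kf, kd)
    return [a + h / 6.0 * (b + 2.0 * c + 2.0 * d + e)
            for a, b, c, d, e in zip(y, k1, k2, k3, k4)]

def simulate_gel(times, kf, kd, max_step=1e-3):
    """Return an M x N matrix: fraction of DNA of each length at each time.

    times must be in ascending order; all DNA starts as unextended primer.
    """
    n = len(kf) + 1
    y = [1.0] + [0.0] * (2 * n - 1)
    t = 0.0
    rows = []
    for target in times:
        steps = max(1, math.ceil((target - t) / max_step))
        h = (target - t) / steps
        for _ in range(steps):
            y = rk4_step(y, h, kf, kd)
        t = target
        rows.append([y[i] + y[n + i] for i in range(n)])  # bound + dissociated
    return rows

# Linking forward rates by incorporated base type (hypothetical values, 1/s).
rate_by_base = {"G": 80.0, "A": 60.0, "T": 50.0, "C": 70.0}
incorporated = "GATC"                      # bases added at steps 1..N-1
kf = [rate_by_base[base] for base in incorporated]
kd = [0.5] * len(kf)                       # one linked dissociation rate
times = [0.0, 0.01, 0.02, 0.05, 0.1]
gel = simulate_gel(times, kf, kd)
```

Since the number of species is taken from the length of `kf`, the same function handles any template length, and building `kf` from a four-entry dictionary reduces the N-1 forward rates to four fitted values, one per base type.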