|Home | About | Journals | Submit | Contact Us | Français|
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Since real time PCR was first developed, several approaches to estimating the initial quantity of template in an RT-PCR reaction have been tried. While initially only the early thermal cycles corresponding to exponential duplication were used, lately there has been an effort to use all of the cycles in a PCR. The efforts have included both fitting empirical sigmoid curves and more elaborate mechanistic models that explore the chemical reactions taking place during each cycle. The more elaborate mechanistic models require many more parameters than can be fit from a single amplification, while the empirical models provide little insight and are difficult to tailor to specific reactants.
We directly estimate the initial amount of amplicon using a simplified mechanistic model based on chemical reactions in the annealing step of the PCR. The basic model includes the duplication of DNA with the digestion of Taqman probe and the re-annealing between previously synthesized DNA strands of opposite orientation. By modelling the amount of Taqman probe digested and matching that with the observed fluorescence, the conversion factor between the number of fluorescing dye molecules and observed fluorescent emission can be estimated, along with the absolute initial amount of amplicon and the rate parameter for re-annealing. The model is applied to several PCR reactions with known amounts of amplicon and is shown to work reasonably well. An expanded version of the model allows duplication of amplicon without release of fluorescent dye, by adding 1 more parameter to the model. The additional process is helpful in most cases where the initial primer concentration exceeds the initial probe concentration. Software for applying the algorithm to data may be downloaded at http://www.niehs.nih.gov/research/resources/software/pcranalyzer/
We present proof of the principle that a mechanistically based model can be fit to observations from a single PCR amplification. Initial amounts of amplicon are well estimated without using a standard solution. Using the ratio of the predicted initial amounts of amplicon from 2 PCRs is shown to work well even when the absolute amounts of amplicon are underestimated in the individual PCRs.
The powerful technique of cDNA amplification by polymerase chain reactions (PCR) is used to quantify small amounts of mRNA in a variety of applications from genomics to medicine[1,2]. Using reverse transcription, the mRNA is converted to cDNA, which is then amplified by repeated duplication in the PCR process (RT-PCR). Various detection systems are available, including Taqman probes and SYBR Green, which mark the amplification process by fluorescent emission and so allow monitoring the DNA amplification in real time. The Taqman system uses a sequence-specific probe that attaches to a nucleotide sequence within the template. After the attachment of primer and Taq enzyme, the duplication of the template strands proceeds in the extension step. As part of the extension, the attached part of the probe molecule is digested, separating the fluorescent reporter dye from the quencher sequence. Probe molecules are 'used up' in the duplication process. By contrast, the SYBR Green system binds non-specifically to any double-stranded DNA complex and emits fluorescence only in the attached state. The dye molecules dissociate during the denaturation step, but are used again in later cycles.
It is the goal of quantitative PCR to use the record of observed fluorescent emissions to estimate the initial amount of cDNA, either in absolute terms of the amount of cDNA at the start of the amplification or in relative terms when the initial amount of cDNA for one PCR is estimated as a fold change from that of another PCR. In either case, the estimation requires a mathematical expression or model to link the initial amount of amplicon to the observed fluorescence. The most well known approach uses data from the short span of cycles whose fluorescent emission just rise above the background fluorescence, but for which duplication can still be approximated by an exponential curve, such as the one below:
Here FC is the observed fluorescence at the Cth thermal cycle, and F0, the fluorescence that would be emitted by the initial amount of amplicon, when duplicated. The quantity FC is generally found to be less than F0 2C, so that, using equation 1, the efficiency, E, can be no greater than 1.
Many different and ingenious approaches have been published using this expression; however, the use of this expression brings with it 2 important drawbacks:
1) The observations of relatively few cycles are used, and those cycles are not always easy to identify.
2) Information beyond the observations from the PCR amplifications is needed to convert from the observed fluorescent scale to estimated quantities of amplicon in nmol/L.
Absolute quantitation using this principal is achieved using the standard curve technique , where the efficiency of the duplication is found by amplifying serial dilutions of a known concentration of the amplicon. This approach can be costly both in time taken to construct the standard solutions and in reagents for the PCR reactions since multiple samples are used. Alternatively, Swillens et al applied equation 1 specifically to PCRs run with the Taqman probe, making use of the fact that the probe is used up in amplification process . By assuming that the entire initial amount of probe is used up and relating the initial probe amount to the final fluorescent reading, they estimated the conversion factor between number of probe molecules and observed fluorescence, thus estimating the initial amount of amplicon absolutely.
The effort to use more of the thermal cycles in a PCR has led to fitting a mathematical expression for a sigmoid curve to all of the fluorescent data using nonlinear regression [7-9]. The curves used by these authors can fit the observations very well, but do not reflect any of the internal processes of a PCR, so that it is difficult to fit them more precisely by adding information that may be known about specific reactants. There is also no intrinsic reason to believe that a curve that fits observations above the background fluorescence will also fit the observations below the background fluorescence, which could also affect the estimates of the initial amount of amplicon. Furthermore, when the sigmoid curves are applied to a dye that is not inactivated by the duplication process, a standard solution with a known amount of the target molecules is still needed to estimate the conversion factor between fluorescent emission and number of target molecules [7,8]. Goll et al applied numerous sigmoid expressions to observations generated by the Taqman probe but again used solutions with known copy number to estimate the conversion factor .
Alternatively, detailed models of the reactions taking place in the anneal and extension steps of the PCR have also been presented [10,11]. Using reaction equations to describe each chemical reaction in the duplication process, a detailed picture is built up of what is known of the process. Including all reactions allows insight into consequences of limiting nucleotide bases, primers or enzyme for example, but requires a large number of parameters that are approximated with values from the literature. The models describe the chemical processes in a PCR consistent with detection systems similar to SYBR Green, so that there are no equations comparable to the annealing and digestion of the Taqman probe. This simplifies the system of equations, but again requires the information of a standard solution to estimate the conversion factor between fluorescent emission and number of target molecules.
In this paper, we present a compromise between the last two approaches. That is, we present a model based on the chemical reactions taking place in the anneal step of PCRs with Taqman probe, but use simplifying assumptions, to reduce the model to only 3 parameters. All parameters are shown to be estimable using the observations from a single amplification. Since the model includes the digestion of probe molecules as part of the duplication, a more precise estimate of the amount of probe converted to fluorescence and therefore a more precise estimate of the conversion factor is possible.
In the body of the paper we present the basic model in detail, including the duplication process with probe digestion as well as the re-annealing of template. In the Appendices we present equations that represent an additional process that may be important to particular PCRs: duplication of template without release of fluorescence, i.e. 'unmarked duplication'.
We fit the model to 3 data sets with known initial concentrations of amplicons to evaluate the model's estimates. Adding the unmarked duplication process improved fits and estimates in cases where the initial amount of probe was smaller than the initial amount of primer. For one of the data sets the estimates are quite good, and in all cases the estimates are of the right order of magnitude, though sometimes underestimated. In the cases where the initial amount of amplicon is underestimated, the fold changes of different concentrations of the amplicon are still well estimated, indicating that a factor common to both amplifications has been left out of the model.
The model presented in this paper estimates the initial amount of template that is amplified through the PCR process. If the PCR process is used to quantify an unknown amount of mRNA (as opposed to an unknown amount of cDNA directly), the mRNA must first be reverse transcribed to form what is in theory an equivalent amount of cDNA. The efficiency of the transcription varies with the practitioner and the reagents, but estimating the efficiency of the transcription is beyond the scope of this paper. It is the transcribed amount of cDNA that is estimated by our approach.
Once the mRNA has been transcribed, the PCR process acts to duplicate the cDNA through a succession of thermal cycles. The thermal cycles alternate between 2 phases: the anneal/extension step and the denaturation step. Ideally in the anneal/extension step, each single strand of template is hybridized by both a probe molecule and a primer molecule. When a molecule of Taq enzyme attaches to such a triplex, the extension process begins and nucleotide bases attach to form a duplex consisting of the original DNA template and a completed second strand of opposite orientation. As part of the extension process, the Taq enzyme also digests the attached part of the Taqman probe, releasing and separating the fluorescent marker from the quencher sequence. A given Taqman probe only attaches to DNA sequences of a particular orientation. For the data in this paper, the probe molecules were designed to attach to molecules of the DNA sequences corresponding to the cDNA transcribed from the original sample of mRNA. The single-stranded DNA of opposite orientation duplicates without probe attachment and so without fluorescent emission. We assume duplication of strands of either orientation proceeds at the same rate. We also assume all duplexes separate during the hotter denaturation step.
The simplified model presented in this paper, includes an approximation to the above duplication process, predicting the amounts of both probe and primers used by the PCR for each cycle. In addition, the model includes the process of re-annealing, whereby single strands of DNA previously synthesized and of opposite orientation hybridize. The competition for single strands of DNA by this second process diminishes the predicted efficiency of the duplication process over the course of the PCR [10,12]. Details are shown in the Methods section. We show the results below of the basic model and the model extended by adding the unmarked duplication process.
The model was applied to 3 data sets described in the method section. The 3 data sets amplified a total of 3 gene sequences, RnaseP, Cyp1B1 and Actin. In each case, the initial concentrations of the amplicon were known, with amplifications run on several known dilutions of the original concentration. Tables Tables11 through through44 in the body of the paper compare the estimated amount of amplicon with the known concentration. Figure Figure11 shows an example of the model fit to an amplification run of the RNaseP data set.
In Table Table1,1, we present the estimation results of the initial amounts of amplicon for the RNaseP data set. As PCRs are frequently run for 40 rather than 60 cycles, the results are analyzed using just the first 40 cycles as well as using all available 60 cycles. Both the basic model and the extended model (allowing for duplication of the target sequence without release of fluorescence; see Appendix (additional file 1)) were used.
Even when the basic model is used with the data of only 40 cycles (2nd column), the estimates of the initial concentration of amplicon are of the right order of magnitude (although substantially underestimated), ranging from 57% to 72% of the true value. The estimates are somewhat better for the larger initial amounts, (63 – 72% of the true value) than for the smallest initial amounts (57 – 59% of the true value), suggesting that there is information carried in the later cycles of a PCR. Figure Figure2A2A shows that the measured fluorescence at the 40th cycle is further from the horizontal asymptote for the lowest initial amounts of amplicon so that there are fewer information-carrying observations for the lower dilutions.
The results of using the basic model with all 60 cycles are shown in the 3rd column of Table Table11 and range from 66% of the true value to 85%. For these data the estimates do not seem to improve with higher initial concentrations, suggesting that 60 cycles were sufficient for these initial amounts of amplicon. When the extended model is used, even the estimates using only 40 thermal cycles improve consistently, with the quality of the estimate again increasing with the initial concentration. Using the expanded model with 60 cycles leads to some overestimation for the highest initial concentration, but generally estimates initial amounts of amplicon very close to the true values.
In Table Table2,2, the same patterns seen in Table Table11 are repeated: the basic model with 40 cycles shows the poorest estimates, especially for the lowest initial concentrations. Adding the last 20 cycles improves all estimates, but the improvement is greater for the lower initial concentrations. The estimates are again of more uniform quality for the basic model applied to the data from all 60 cycles. Allowing for duplication of target sequence without fluorescent emission (i.e., using the expanded model) again improves the estimates using both 40 and 60 thermal cycles of data. For this data set all estimates underestimated the true initial amount of amplicon, estimating values between 53% and 67% of the true value.
In Table Table3,3, we use the absolute estimates found earlier to estimate fold changes between dilutions of the initial amounts of amplicon. In each case we compared the highest concentration to a lower one. For the Cyp1B1 data, this meant computing 3 estimates of a fold change, while we were able to compute 4 estimates of each fold change for the RNaseP data set.
In contrast to the absolute estimates, the estimates of the fold changes are not improved by using the expanded model over the basic model. The best estimates generally come from using 60 cycles of data with the basic model, which provides the largest amount of data with the fewest parameters to be estimated. The conclusion is that the basic model may underestimate the amount of amplicon by a consistent factor over the various concentrations, so that the fold change remains well estimated. In that case having fewer parameters to estimate with more data provides better estimates. Correspondingly, the worst estimates of the fold change are made using the expanded model with 40 cycles of data.
Fold change estimates made using the method  are listed in the final column of Table Table33 for comparison. In this approach a 'threshold value' is set for fluorescent emission, low enough that amplification is still exponential. The dilution fold is estimated using the fractional cycle numbers (CT) at which the amplifications cross that threshold in the following formula:
Here Fc is the estimated fold change. The subscripts 'high' and 'low' refer respectively to the highest initial concentration of amplicon and the lower concentration being compared to it. The threshold was set at 10 times the standard deviation of the first 10 cycles of the highest concentration in the comparison. Generally, the estimates made using all 60 cycles with the basic model are closer to the true fold changes.
The last data set comprises the averaged observations over 3 replicates of several PCRs applied to the Cyp1B1 and Actin sequences with various combinations of initial probe and primer concentrations. The results are shown in Table Table4.4. The 1st column gives the initial concentrations of probe and primers; the 2nd column gives the initial number of amplicon molecules. Columns 3 and 4 give the results for the Actin sequence; the last 2 columns give the results for the Cyp1B1 sequence.
The results vary between probe/primer combinations and gene sequence, with good results for Cyp1B1 generally for the higher concentrations. The initial amounts of Actin amplicon are considerably underestimated, although these estimates also improve for higher initial amounts. Nonetheless there is a consistent pattern that shows some improvement of the estimates using the extended model, but only for those cases where the initial concentration of primer exceeded the initial concentration of probe. This is consistent with the improvement in estimates seen with the extended model in Tables Tables11 and and2,2, both of which also show results for amplifications with more primer than probe available. For every estimate in Table Table44 where the concentration of probe is at least as high as that of primers, the basic model either outperforms the expanded model or fails to be improved by the extended model. We conclude that as long as there is at least as much probe as primer available, the preferential attachment of template to probe over primer precludes duplication of template without probe attachment. In that case, the model with the fewer parameters will provide the better estimate, since the extra parameter may be over-fitting to noise.
In this paper we explore balancing mechanistic modelling of the chemical reactions taking place in a PCR with the requirement that the model be estimable using the data from a single PCR amplification without an additional standard solution. The model was developed using Taqman probes to monitor the duplication. The Taqman detection system worked well, because the probe molecules are digested in the course of duplication. This characteristic allows the estimation of the conversion factor between the number of digested probe molecules and the observed fluorescence. While others have used this characteristic before , they required that all the probe molecules be used up. By modelling the consumption of probe molecules, the conversion factor can in principle by calculated using the predicted number of digested probe molecules and comparing that to the observed increase in fluorescence.
The presence of unused probe molecules after the completion of the PCR was predicted by the extended model in both data sets containing replicate data (RnaseP and Cyp1B1). In both of these data sets more primer was available than probe, but with both re-annealing and unmarked duplication included in the model, unused probe was left even though the upper asymptote was clearly reached (Fig (Fig2).2). Between 82% and 93.5% of probe was used in the RnaseP amplifications and around 95% for the Cyp1B1 amplifications.
In applying the model to the data, we found that the observed upper asymptotes corresponding to identical combinations of initial amounts of probe and primers did not always have the same values. This is visible in Fig. Fig.2B,2B, showing the Cyp1B1 replicate data. We were unable to reproduce the differences in the upper asymptotes without allowing different conversion factors between observed fluorescence and the number of digested probe molecules. For this reason we did not use a common conversion factor.
The results of the estimation of amplicon were mixed. For the RNaseP amplifications, the estimation of the initial amount of amplicon was quite good; for the Cyp1B1 and the Actin sequences the initial amount of amplicon was underestimated, sometimes severely so. Even in these cases, the fact that the estimates were of the right order of magnitude indicate that the approximations used in the model were reasonable. The underestimations do indicate that more work needs to be done with this approach before it can be used by itself or in a clinical setting.
In order to simplify the chemical reactions taking place in the PCR to the point that the model could be estimated by a single amplification means that many processes are necessarily left out. The cumulative effect of all unmodeled processes could be significant, even if the individual unmodeled reactions have a small effect on the outcome. We mention here several of the reactions that were not included in the results presented here.
Primer dimerization occurs when primers specific to DNA strands of opposite orientation hybridize during the anneal step. Primer-dimers may extend during the extension step, in which case the primers are no longer usable, even after denaturation. If the dimers do not extend, then the primers may be used in later cycles . Although a differential equation describing primer-dimerization can be constructed (See Appendix A (additional file 1)), the problem is that the additional parameter has to be estimated. If primer-dimerization is added to the extended model, then a total of 5 parameters have to be estimated for a single amplification, which is sometimes too many. We did attach the primer-dimerization equation to the basic model and applied the result to some of the amplifications of the averaged data. For these data, adding the additional reaction improved neither the fit nor the estimate.
The amount of Taq enzyme might become limiting, if the enzyme degrades over the course of the PCR as a result of repeated exposures to high temperatures. The decrease in enzymatic activity has been investigated  and did not seem to have a strong impact on the amplification process.
In simplifying the model, we were especially ruthless with the extension step, assuming perfect irreversible extension for all cases. This means we did not examine the quantity of nucleotides being limiting or the possibility of incomplete extension . Unlike the primer-dimerization, these processes would be very difficult to incorporate into the model, and still have it be estimable from a single amplification.
In spite of these shortcomings however, the model (both the basic and the extended models) did prove to be estimable in all cases. The predictions when compared to the known concentrations of amplicon were quite good in the case of the extended model applied to the RnaseP sequence and of the right order of magnitude (though underestimated) in the other cases. Furthermore, the fold change estimates using the basic model did well for both the RnaseP and the Cyp1B1 sequence. In addition, the model seems to be sensitive enough that processes included in the model that do not contribute to the output of a particular amplification also do not improve the fit. In particular, this was the case for the unmarked duplication process where enough probe was available (Table (Table44).
There are advantages to having a model that can estimate initial quantities of amplicon from a single amplification; using only the data from a single amplification means that no additional reference tissues need to be analyzed, no standard solutions need to be made up, no additional replicates or dilutions need to be run, and there are no concerns about whether the efficiencies of the amplifications or the conversion factors linking fluorescent emissions to molecules are nearly enough the same across replicates or reference solutions. There are also advantages to using a model based on the science of the process: such a model allows the estimated quantities to have meaning within the context of the chemical reactions, provides a basis for comparison across experiments run under differing conditions and allows greater insight into the critical components that govern the net outcome of a complicated series of reactions.
In this paper we present proof of the principle that it is possible to estimate absolute concentrations of template in single PCR amplifications through modeling the biochemical processes present within the thermal cycles. That the drastically simplified model nonetheless produced good estimates in some cases and at least estimates of the right order of magnitude in others shows that the assumptions going into the model produce good approximations.
PCR quantification was applied to solutions with known concentrations of human gene sequences from 3 data sets. The PCRs in each data set were run for 60 thermal cycles using 50 μl wells. Cycles consisted of anneal/extension steps (each lasting 60 seconds at 60°C) alternating with denature steps (each lasting 15 seconds at 95°C). The first data set consists of 20 PCRs run on the RnaseP sequence and includes 4 replicates of each of 5 initial amounts of amplicon (1250, 2500, 5 × 103, 104, and 2 × 104 molecules). The probe and primers were supplied by an AB Biosystems pre-packaged kit. For this data set a 7900 model machine was used with a 96 well plate and a kit with part number 4310982. (Identical probe and primers for models 7300 or 7500 come in part number 4350584.) The initial primer concentrations were all 900 nmol/L; the initial probe concentrations, all 250 nmol/L.
The second data set consists of 12 PCRs on Cyp1B1, including 3 replicates of each of 4 initial concentrations of amplicon (103, 104, 105, and 106 molecules). The primers for both orientations of each sequence were found using the Primer Express software program (Applied Biosystems). The forward primer used was GCC AGC CAG GAC ACC CT with the reverse primer, GAT CCA ATT CTG CCT GCA CTC. The probe used was CGC TTGC AGT GGC TGC TCC TCC T. The Taqman probe carried a FM reporter label with Tamra as quencher. The initial primer concentrations were 200 nmol/L; the initial probe concentrations, 100 nmol/L.
The final data set contains PCRs run on both Cyp1B1 and Actin sequences with 4 different combinations of initial probe and primer concentrations. The primer sequences used were again found using the Primer Express software program (Applied Biosystems). For the Actin sequence, the forward primer was CCT GGC ACC CAG CAC AAT and the reverse primer was GCT GAT CCA CAT CTG CTG GAA. The probe used was ATC AAG ATC ATT GCT CCT CCT GAG CGC with the VIC reporter label and a Tamra quencher. The 4 combinations of initial probe and primer concentrations used for both gene sequences were 400 nmol/L probe with 400 nmol/L primer, 400 nmol/L probe with 200 nmol/L primer, 200 nmol/L primer with 400 nmol/L probe and 200 nmol/L for both probe and primer. For each combination of probe and primer 12 PCRs were run with 3 replicates of 4 initial concentrations of amplicon (3.32 × 10-5 nmol/L, 3.32 × 10-6 nmol/L,, 3.32 × 10-7 nmol/L, and 3.32 × 10-8 nmol/L, all in 50 μl of solution.) Only the mean observations, averaged over the 3 replicates were available.
The model provides a simplified description of the duplication of DNA template using the Taqman probe fluorescent detection system. We focus on reactions taking place in the anneal step of the PCR amplification. The attachment of Taq enzyme and subsequent extension of complexes formed in the anneal step is assumed to follow immediately upon the attachment of primer and is therefore not separately modeled. Similarly, we assume that during the denaturation or melting step all duplexes 'melt' or separate into single strands, so that these reactions are also not separately modeled. The reactions that form the focus of the basic model are given in Table Table5.5. The DNA template is referred to as A with the sequence specific primer, P. The anti-sense counterparts are denoted by primes. The Taqman probe molecule is assumed to hybridize only with A (not A') and is denoted by Q.
The reactions listed in Table Table55 are shown as reversible, with the exception of the re-annealing of existing strands of DNA leading directly to the duplex, A'A. As mentioned by other authors , the duplex A'A rarely dissociates. If the reactions as they appear in Table Table55 are allowed to go to equilibrium, very little product will be formed, as the re-annealing reaction would eventually use up all the single DNA strands. In their paper, Gevertz et al  therefore consider all reactions as being stopped by the end of the anneal/extension step, rather than letting all reactions go to equilibrium. However, when PCR runs with anneal/extension steps varying from 30 seconds to 90 seconds were compared, no discernible differences were observed (not shown). From this we concluded that the reactions in an anneal/extension step do go to equilibrium very quickly.
Our solution is to assume that although the reactions attaching probe or primer are themselves reversible, the attachment of a primer molecule is followed very quickly by the attachment of Taq enzyme. We further assume that extension follows quickly and irreversibly and continues until the duplex is made. In effect then, the reactions in Table Table55 forming the triplex, AQP, and the complex, A'P', are irreversible and can compete with the re-annealing reactions, even if the reactions are all allowed to go to equilibrium.
Table Table55 does not contain the reaction attaching primer P to template A. For the basic model, we assume that the probe molecules are designed to attach more quickly to the template than the primers, so that the rate-limiting step is that of the attachment of primer, immediately followed by enzyme attachment and extension. We relax this assumption in the extended model discussed in more detail in the Appendix A(additional file 1). If the rate-limiting step is the attachment of the primer rather than the probe, we can also assume that double-stranded DNA formed from the complex, A'P', is produced at the approximately the same rate as that from the complex, AQP. For this reason, the annealing and extension processes associated with the anti-sense strands, A', are also not modeled separately, but assumed to proceed at the same rate as for the sense strands, A.
Table Table66 shows the differential equations actually making up the basic model. The subscript i refers to the anneal step in the ith thermal cycle. Thus, Ai, Qi and Pi refer to the amounts of template, probe and primer available at the start of the ith anneal step. All quantities in the equations in Table Table66 are in units of nmol/L, since that is how the probe and primer are measured. The 1st equation in Table Table66 approximates the result of the first two reactions in Table Table5.5. The intermediate product of the 1st reaction, AQ, is assumed to be short-lived, progressing quickly to AQP with the attachment of a primer. The solution to the 1st equation, wiq(t), then reflects the amount of double-stranded DNA formed by attachment of probe and primer to template with subsequent extension. The factors (Qi - wiq(t)) and (Pi - wiq(t)) refer to the amounts of probe and primer respectively available at time t within the ith anneal step, being the amount of probe and primer present at the start of the ith anneal step minus the amounts already used in duplications by time t.
The solution to the 2nd equation, wir(t), predicts the amount of double stranded DNA formed by re-annealing of previously synthesized single DNA strands. If Ai denotes the amount of single strands of template, then (Ai - wiq(t) - wir(t)) reflects the amount of single template DNA available at time t within the ith anneal step. The re-annealing process combines strands of opposite orientation, so that (Ai - wiq(t) - wir(t)) should be multiplied by a factor reflecting the amount of available anti-sense DNA available at time t. However, since we assumed the double-stranded DNA is formed from anti-sense strands at the same rate as from the template, there should always be the same amount of single-stranded DNA of both orientations present. For that reason, we square the factor denoting the amount of available single-stranded DNA of the template orientation. Each denaturation step is assumed to separate all duplexes, so that the amounts of wiq(0) and wir(0) are zero at the start of each anneal step.
Since we assume all reactions go to equilibrium in the anneal/extensions step, we want the equilibrium solutions to the equations in Table Table66 for each thermal cycle. Unfortunately, setting the derivatives equal to zero does not give a fully defined solution. The relative amount of double-stranded DNA formed by duplication as opposed to re-annealing is determined by the relative speed of the 2 modeled processes and can only be calculated by numerically integrating the differential equations until the solutions, wiq(t) and wir(t), are very close to their respective equilibrium values, wiqE and wirE. Since equilibrium as characterized by our model occurs when all single-stranded DNA is incorporated into double-stranded DNA (either by duplication or by re-annealing), we numerically integrate the differential equation system until the term (Ai - wiq(t) - wir(t))2 is less than 10-12. The solutions of the differential equations at that time are taken to be the equilibrium values for that time step.
The two rate constants, Mq and Mr do not change over the course of the amplification, but are even so not both identifiable, since, as long as both are large enough to ensure equilibrium is reached before the end of the anneal/extension step, only their relative sizes determine wiqE and wirE. Hence Mq can be set equal to 1, leaving Mr to be estimated by fitting the model to the fluorescent output.
From the assumptions encoded in Table Table6,6, the amount of duplicated template at the end of the ith anneal/extension step, wirE, is equal to the amount of probe and primer digested during the ith anneal/extension step. Thus the predicted amounts of primer, probe and single-stranded amplicon available at the start of the i + 1st anneal/extension step can be calculated as
The first cycle is started with the known initial amounts of primer and probe, P0 and Q0, as well as the unknown amount of initial amplicon, A0. Although A0 is not shown explicitly in the equations in Table Table6,6, it is incorporated into all values of wiqE through the above equations. A0 is estimated by optimizing the fit of the model to the fluorescent data.
If wiqE estimates the amount of probe digested in the ith cycle, the amount of probe digested throughout a PCR amplification of n cycles is estimated as . Given this value, the calibration coefficient between total number of probe molecules digested and the total observed fluorescence over the entire amplification is calculated as
Note that since the amounts of probe and primer are given in nmol/L (in a 50 μl tubule), the estimate Qused is also in nmol/L. (See Appendix B(additional file 1) for the conversion formula between the concentration nmol/L in the 50 ml tubules and number of molecules.). The cost function minimized to fit the data is given below
where Fi denotes the observed (cumulative) fluorescence corresponding to the ith cycle, and there are n thermal cycles in the amplification. The optimization is organized as follows:
1) Initial guesses are made for A0 (unknown initial amount of amplicon) and Mr (the rate constant for the re-annealing process).
2) Using those guesses, the model is run, ie, the differential equations are integrated for each anneal/extension step and the amounts of product, probe and primer updated between cycles.
3) At the end of the last cycle, the amount of probe used up by the amplification is calculated and equation 2 is used to compute the estimate for Kf.
4) Using the predicted Kf and the predicted equilibrium values for the solutions to the 1st differential equation for each cycle, the cost function is evaluated.
5) An optimization routine is used to determine the next guess for A0 and Mr.
Steps 2 – 5 are then repeated until the cost function is minimized.
The cost function describes weighted least squares estimation, since the squared difference between the observed fluorescence and the predicted fluorescence is multiplied by the ith increment in fluorescence. We found that this weighting factor improved the fit by weighting more heavily the linear part of the sigmoid curve of cumulative fluorescence, which is the least affected by noise. To avoid contamination by the noise in the early cycles, we start the sum in the cost function with the first cycle (shown as thr in the above cost function) whose observed value is larger than 10 times the standard deviation of the observations over the first 10 cycles. Other thresholds could have been used, but this one was easily programmable.
CJP conceived of the study and helped draft the manuscript. MVS developed the model. CRM provided valuable discussion and data. NJW provided early discussion pointing to the Taqman probe. MK provided insight and valuable discussions about the chemistry of the processes. All authors contributed in preparation of the final manuscript.
The Appendix to the paper as a Word document containing details of the expanded model described in the text, as well as the details of the estimation and optimization methods.
The authors would like to acknowledge the contribution of Shawn Harris (Senior software developer at Constella Group, LLC) in the coding of the algorithm for distribution. The work of Dr. Smith and Mr. Harris was supported by the National Institute of Environmental Health Sciences contract #GS-00F-0003L.