The value of qRT-PCR depends critically on appropriate analysis and interpretation of the outcome of cDNA amplification. Important parameters for this analysis include 1) the magnitude of the noise due to stochastic properties of PCR amplification and fluorescence detection; 2) the identification and selection of the EP; 3) the calculation of the amplification efficiency; 4) the selection of the threshold for CT determination; and 5) the choice of whether to use individual or average efficiencies for data analysis. The method presented here, based on quantitative assessment of the kinetics of individual reactions, offers an accurate and convenient way to address all of these issues and eliminates the need for the estimates or qualitative judgments often embedded in other methods.
Noise, or stochastic variation in the detected fluorescence level, exists in all qRT-PCR reactions. In the initial cycles, because the fluorescence is low, the influence of the noise is more pronounced. Typical quantification procedures fit a straight line to the signal before amplification and use it as the baseline. In the software supplied with most real-time PCR machines (e.g., MyIQ, iCycler from BioRad, ABI systems, Stratagene MX systems, etc.), the baseline is taken as a straight line fitted to data from the first few cycles, typically 10 cycles or a value arbitrarily chosen by the user. However, at high concentrations of starting template, PCR product can be amplified earlier than the cycles used by such algorithms. Moreover, linear regression algorithms (Peirson et al., 2003; Ramakers et al., 2003) require transformation of the raw fluorescence data from linear to logarithmic form after subtraction of the baseline. Because the baseline is usually calculated only from the initial or ground phase before amplification begins, it is not reasonable to subtract this value from all later cycles, since this violates the basis of linear regression. Tichopad et al. recommended nonlinear regression using parameter y0 of the three-parameter simple exponent model as the baseline, a suggestion which avoided the violation of assumptions needed for linear analysis. However, the method they used for SPE determination, Outlier-SPE (Tichopad et al., 2003), is still based on the assumption that a perfect baseline can be fit using only the points from the ground phase. Note that the SPE is not the baseline of the EP, but the first detectable point in the EP. Since noise has maximal influence on the ground phase, systematic errors were introduced. In addition, the variable noise levels of different platforms will also result in variance in outlier detection. The fitted baseline usually has a positive slope due to the slight fluorescence increase of the ground phase. Because of the influence of noise, the baseline is often overestimated. In the KOD method, the authors subtracted a baseline that was the arithmetic average of the five lowest fluorescence readings (slope = 0), meaning that the baseline value is more likely to be underestimated, as the authors reported (Bar et al., 2003). Although the authors claimed that in typical experiments OBS and UBS can be visualized, a small OBS or UBS is actually very hard to see since the value of the baseline is low. Instead of using only the ground phase, we used the noise level of the ground phase (RNoise) based on all points (whole curve fitting) to determine the SPE. This ensures that the amplification cycles used for efficiency estimation do not include points that have low fluorescence readings and fall within the noise level.
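The two baseline conventions discussed above can be compared with a minimal sketch. It is not the implementation used by any instrument software or by the KOD authors; the function names, the symmetric logistic used to generate synthetic curves, and all numeric values are assumptions chosen for illustration.

```python
import math


def linear_baseline(fluorescence, n_cycles=10):
    """Least-squares line through the first n_cycles readings.

    Returns (slope, intercept); cycle numbers start at 1.
    """
    xs = list(range(1, n_cycles + 1))
    ys = fluorescence[:n_cycles]
    x_mean = sum(xs) / n_cycles
    y_mean = sum(ys) / n_cycles
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def lowest_mean_baseline(fluorescence, k=5):
    """Flat baseline: arithmetic mean of the k lowest readings (slope = 0)."""
    return sum(sorted(fluorescence)[:k]) / k


def synthetic_curve(midpoint, n_cycles=40, background=100.0, plateau=1000.0):
    """Hypothetical noise-free curve: flat background plus a symmetric logistic."""
    return [background + plateau / (1.0 + math.exp(-(c - midpoint) / 1.5))
            for c in range(1, n_cycles + 1)]


late = synthetic_curve(midpoint=25)   # low starting template
early = synthetic_curve(midpoint=12)  # high starting template

slope_late, _ = linear_baseline(late)
slope_early, _ = linear_baseline(early)
print("slope, late amplification: ", slope_late)
print("slope, early amplification:", slope_early)
print("mean of five lowest (late):", lowest_mean_baseline(late))
```

With the early-amplifying curve, the first ten cycles already contain product, so the fitted line acquires a steep slope and no longer describes the background. This is the failure mode described above for high template concentrations.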
Another critical step is choosing the optimal part of the PCR amplification phase for calculation of efficiency. Liu and Saint proposed an amplification plot method to address this problem, but their method does not provide an objective procedure for determining the EP before performing nonlinear regression (Liu and Saint, 2002a). Another group chose the “window-of-linearity” method (an iterative linear regression algorithm), in which the start and end points of the EP are set subjectively and a search is made for the line in a logarithmic plot with the highest R² value and a slope close to the maximum slope (Ramakers et al., 2003). Peirson et al. suggested yet another linear regression method using the mid-value point in the logarithmic plot, emphasizing that the points chosen for regression should be equally distributed around the mid-value point to achieve the highest accuracy (Peirson et al., 2003). However, these methods require visual inspection of all amplification curves to choose suitable windows for the EP. In Peirson’s method, as in the MyIQ software, the standard deviation of cycles 1 to 10 is used as RNoise after performing baseline subtraction, which may not be reasonable for all reactions. Moreover, due to the influence of noise, not all points that are equally distributed on both sides of the mid-value point are suitable for the efficiency estimate. Also, Peirson et al. used the maximal fluorescence of the entire curve to calculate the mid-value point, which is therefore the middle value of the whole curve and much higher than that of the exponential phase. As described above, we use an entirely objective method based on the logistic model (Noise-SPE and SDM) to identify the EP.
After defining the EP, the efficiency is calculated. Although Liu and Saint (2002b) and Rutledge (2004) suggested that R0 can be calculated directly from the fitted sigmoid model instead of using efficiency and CT, the underlying calculation depends on how well the sigmoid model matches the PCR kinetics. As noted above, the point-symmetric sigmoid model does not fit the curve of the overall reaction, especially within the EP. Even using an improved S-shaped model, such as the logistic model, we still found a much higher MSE within the EP compared with the three-parameter simple exponent model, the theoretical curve of PCR kinetics (>20 times). Generally, a suitable whole S-shaped curve fit can do a very good job of defining the exponential phase, since it accurately predicts the cycles on both sides of the exponential and plateau phases, where the fluorescence changes are less dramatic (lower MSE, data not shown). The whole S-shaped curve fit fails, however, to fit the cycle regions of most rapid change (e.g., the exponential and plateau phases) with high enough fidelity (larger MSE than the exponent model).
To calculate the efficiency, we chose the three-parameter simple exponential nonlinear regression, also proposed by others (Tichopad et al., 2003). However, in a PCR reaction, the EP occurs in a very early part of the amplification, where fluorescence levels are relatively low, and is therefore more influenced by noise, as discussed above. In our method, we used the P-value of the regression to control the contribution (the weight) of each candidate efficiency to the final efficiency estimate. In practice, we found that with greater noise, fewer candidate efficiencies have a P-value less than 0.05. Typically, the EP spans ca. 8 cycles and the iterative regression will find ca. 15 windows containing 4–8 cycles. The method then calculates all the candidate efficiencies and their related P-values. In contrast, Tichopad’s method (Tichopad et al., 2003) performs only one regression based on all points found in the EP. Our approach makes the whole algorithm very robust over thousands of tested samples across different platforms.
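The window count quoted above can be checked by direct enumeration: an 8-cycle EP contains 5 + 4 + 3 + 2 + 1 = 15 runs of 4 to 8 consecutive cycles. The sketch below enumerates those windows and fits the three-parameter exponential y = y0 + a·E^x to each one. It is an illustration under stated assumptions, not the published algorithm: the grid search over E, the function names, and the synthetic noise-free data are hypothetical, and the P-value weighting is omitted because its formula is not given in this section. Here E denotes the per-cycle amplification factor, with 2.0 meaning perfect doubling.

```python
def candidate_windows(ep_start, ep_end, min_len=4, max_len=8):
    """All runs of min_len..max_len consecutive cycles inside [ep_start, ep_end]."""
    windows = []
    for length in range(min_len, max_len + 1):
        for first in range(ep_start, ep_end - length + 2):
            windows.append((first, first + length - 1))
    return windows


def fit_exponential(cycles, fluorescence, e_min=1.001, e_max=2.5, step=0.001):
    """Fit y = y0 + a * E**(x - x_first) and return (E, y0, a, sse).

    For each trial E on a grid, y0 and a follow from ordinary linear
    least squares on the regressor z = E**(x - x_first); the E with the
    smallest sum of squared errors wins.
    """
    n = len(cycles)
    y_mean = sum(fluorescence) / n
    best = None
    n_steps = int(round((e_max - e_min) / step))
    for i in range(n_steps + 1):
        e = e_min + i * step
        z = [e ** (x - cycles[0]) for x in cycles]
        z_mean = sum(z) / n
        szz = sum((v - z_mean) ** 2 for v in z)
        szy = sum((v - z_mean) * (y - y_mean) for v, y in zip(z, fluorescence))
        a = szy / szz
        y0 = y_mean - a * z_mean
        sse = sum((y - (y0 + a * v)) ** 2 for v, y in zip(z, fluorescence))
        if best is None or sse < best[3]:
            best = (e, y0, a, sse)
    return best


# Hypothetical noise-free EP spanning cycles 15..22 with E = 1.9.
ep_cycles = list(range(15, 23))
signal = {c: 5.0 + 0.5 * 1.9 ** (c - 15) for c in ep_cycles}

windows = candidate_windows(15, 22)
candidates = []
for first, last in windows:
    xs = list(range(first, last + 1))
    ys = [signal[c] for c in xs]
    candidates.append(fit_exponential(xs, ys)[0])

print(len(windows), "windows; candidate E range:", min(candidates), max(candidates))
```

On noise-free data every window returns essentially the same E. With real data the candidates scatter, which is where a per-window weight such as the regression P-value becomes useful.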
As another important parameter for qRT-PCR, the CT values should be determined within the EP to reflect the initial concentration of the template. Currently, there are several different methods for estimating CT: the fit point, TaqMan threshold, SDM (Tichopad et al., 2002; Wittwer et al., 2001), and FDM (Tichopad et al., 2004) methods. Since the mid-value point is also located within the exponential phase, it potentially can be used for objective CT determination.
In the fit point method, an intersecting line is placed arbitrarily in a logarithmic plot at the base of the exponential portion of the amplification curves. This method can result in systematic errors due to the baseline subtraction and subjective judgment. Another problem is that all samples should come from the same experimental plate so that a single threshold can be set across all samples. This constrains the number of samples that can be reliably compared.
The TaqMan threshold method refines the fit point method by fitting a line at 10 times the standard deviation of the fluorescence in the ground phase (Holland et al., 1991). However, using 10 times the value is an arbitrary choice and does not guarantee that the CT will be in the exponential phase.
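A threshold of this kind can be sketched as follows. This is an illustration only: the choice of cycles 1 to 10 as the ground phase, the mean-based baseline subtraction, the linear interpolation between cycles, the function name, and the synthetic curve with a fixed wiggle standing in for noise are all assumptions, not the behavior of any particular instrument software.

```python
import math
import statistics


def threshold_ct(fluorescence, ground_cycles=10, multiplier=10.0):
    """Return (ct, threshold) for a threshold at multiplier x SD of the ground phase.

    The mean of the ground-phase readings is subtracted as a flat baseline,
    and CT is the fractional cycle (1-based) at which the corrected signal
    first crosses the threshold after the ground phase.
    """
    ground = fluorescence[:ground_cycles]
    baseline = statistics.mean(ground)
    threshold = multiplier * statistics.stdev(ground)
    corrected = [f - baseline for f in fluorescence]
    for i in range(ground_cycles, len(corrected)):
        lo, hi = corrected[i - 1], corrected[i]
        if lo < threshold <= hi:
            # index i - 1 is cycle i; interpolate between cycle i and cycle i + 1
            return i + (threshold - lo) / (hi - lo), threshold
    return None, threshold


# Hypothetical curve: background 100, a fixed wiggle in place of noise,
# and a logistic amplification signal centered on cycle 25.
wiggle = [0.4, -0.2, 0.1, -0.5, 0.3, 0.0, -0.1, 0.2, -0.3, 0.1]
curve = [100.0 + wiggle[(c - 1) % 10]
         + 1000.0 / (1.0 + math.exp(-(c - 25) / 1.5))
         for c in range(1, 41)]

ct, threshold = threshold_ct(curve)
print("threshold:", threshold, "CT:", ct)
```

Changing `multiplier` shifts the CT, and nothing in the calculation checks that the crossing point lies inside the exponential phase, which is the limitation noted above.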
Since the FDM, SDM, and mid-value point methods calculate the CT of an individual sample from its own kinetics, they can potentially be used for cross-plate analyses as long as the noise levels across plates are similar. Nevertheless, the FDM value is usually not in the EP, while the fluorescence at the mid-value point is relatively small and more easily influenced by noise. In practice, we found that the CT values for samples with extremely low concentrations of initial template (e.g., CT > 32 in a reaction with a total of 40 cycles) are usually much less accurate than others, which is easily detected as much larger differences among replicates. In some cases, the corresponding reaction curves do not even reach the SDM before the last cycle, although the logistic regression might still return an (underestimated) value for the SDM. Because they do not provide sufficiently accurate information about the reaction, these data should be excluded regardless of which downstream computation (standard curve or Miner) is chosen. Alternatively, the experiment can be repeated with more template, with optimized experimental conditions, or, if the variance among replicates is acceptable, with a carefully increased total number of cycles.
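The per-sample nature of the FDM and SDM can be seen in a minimal sketch that locates both from the curve's own finite differences. This is a simplification for illustration: it returns whole cycles rather than fractional values and works on raw readings, whereas the method described here derives the SDM from the fitted logistic model. The function name and the synthetic symmetric logistic curve are assumptions.

```python
import math


def derivative_maxima(fluorescence):
    """Return (fdm_cycle, sdm_cycle) using central finite differences.

    Cycles are 1-based; the first and last cycles are skipped because a
    central difference needs a neighbor on each side.
    """
    first_diff = {}
    second_diff = {}
    for cycle in range(2, len(fluorescence)):
        prev_y = fluorescence[cycle - 2]   # reading at cycle - 1
        this_y = fluorescence[cycle - 1]   # reading at cycle
        next_y = fluorescence[cycle]       # reading at cycle + 1
        first_diff[cycle] = (next_y - prev_y) / 2.0
        second_diff[cycle] = next_y - 2.0 * this_y + prev_y
    fdm = max(first_diff, key=first_diff.get)
    sdm = max(second_diff, key=second_diff.get)
    return fdm, sdm


# Hypothetical noise-free curve: symmetric logistic centered on cycle 22.
curve = [100.0 + 1000.0 / (1.0 + math.exp(-(c - 22) / 1.5))
         for c in range(1, 41)]

fdm, sdm = derivative_maxima(curve)
print("FDM cycle:", fdm, "SDM cycle:", sdm)
```

The SDM falls earlier than the FDM, closer to the exponential phase. If the returned SDM sits at the last usable cycle, the curve has probably not reached its true SDM within the run, which corresponds to the unreliable low-template case described above.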
Since knowledge of amplification efficiency is critical for accurate real-time PCR quantification, using the mean efficiency of all samples for each gene is still recommended (Tichopad et al., 2004). Applying individual corrections can introduce systematic error because only a small number (~8) of data points within the EP are available for the individual efficiency calculation. Different efficiencies from only triplicate samples might have a considerable effect on R0 because any error in the measured efficiencies is exponentially magnified (Peirson et al., 2003). Alternatively, when differences among samples in inhibitors (hemoglobin, heparin, glycogen, fats, Ca²⁺, etc.) or enhancers (glycerol, BSA, gene 32 protein, Taq extender, E. coli ssDNA binding protein, etc.) of RT-PCR need to be considered in a particular experiment (Rossen et al., 1992; Tichopad et al., 2004; Wilson, 1997), using more replicates to calculate a mean efficiency for each experimental group, and thereby obtain a comparable R0, is advised. On the other hand, the individual efficiency allows additional investigation of the quality of each reaction.
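The exponential magnification of efficiency errors can be made concrete with a short worked example. It assumes the standard exponential relation R0 = R_CT / E^CT, where E is the per-cycle amplification factor (2.0 for perfect doubling) and R_CT is the fluorescence at the threshold; the function names and the numeric values are chosen for illustration only.

```python
def initial_template(threshold_fluorescence, amplification_factor, ct):
    """R0 under the exponential model R_CT = R0 * E**CT."""
    return threshold_fluorescence / amplification_factor ** ct


def fold_difference(e_a, e_b, ct):
    """Ratio of the R0 computed with e_a to the R0 computed with e_b at the same CT."""
    return initial_template(1.0, e_a, ct) / initial_template(1.0, e_b, ct)


# Two efficiency estimates that differ by 0.05 (about 2.6%), both at CT = 25.
ratio = fold_difference(1.90, 1.95, ct=25)
print("R0 ratio for E = 1.90 versus E = 1.95 at CT 25:", ratio)

# The same 0.05 difference matters less at a lower CT.
print("same comparison at CT 15:", fold_difference(1.90, 1.95, ct=15))
```

At CT = 25 the two estimates of R0 differ nearly twofold even though the efficiencies differ by less than 3%. This is why averaging efficiencies across samples of the same gene can stabilize the final quantification.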
In summary, the algorithm described here uses the kinetics of individual reactions for accurate estimates of efficiency and CT without the need for preparing a standard curve. Furthermore, this method allows all the key parameters for the quantification procedure to be objectively estimated, which is especially convenient for beginning users and for large sample sizes. It is hence economical for qRT-PCR analysis and robust for samples with high noise levels across real-time platforms.