Currently, mainstream analysis of qPCR data is based on the Ct value of each sample and a PCR efficiency value per amplicon. Applying an equation derived from Equation 1 then yields either an estimate of the starting concentration, expressed in arbitrary fluorescence units, or an estimate of the ratio between two starting concentrations of the transcript of interest (Equation 2 and Equations 3B or 3C, respectively). This article deals with the analysis of qPCR data obtained by monitoring DNA-binding dyes such as SYBR Green I, but most of the principles discussed also apply to data collected with other fluorescent chemistries (e.g. hydrolysis probes). However, analysis of such data sets requires extra data-processing steps that are not discussed in this text.
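As an illustration of this Ct-and-efficiency calculation, the sketch below assumes the commonly used form N0 = Nq / E^Ct, where Nq is the fluorescence threshold and E is the efficiency expressed as fold increase per cycle (between 1 and 2). The function names and the assumption that both samples share one threshold and one efficiency are ours, chosen for illustration only.

```python
def starting_concentration(nq, efficiency, ct):
    """Starting amount in arbitrary fluorescence units.

    Assumes N0 = Nq / E**Ct, with `efficiency` given as fold
    increase per cycle (1 = no amplification, 2 = perfect doubling).
    """
    return nq / efficiency ** ct


def relative_quantity(efficiency, ct_a, ct_b):
    """Ratio N0_a / N0_b for two samples of the same amplicon.

    Assumes both samples share the same threshold Nq and the same
    efficiency, so that Nq cancels from the ratio.
    """
    return efficiency ** (ct_b - ct_a)


# Hypothetical values: sample A crosses the threshold 3 cycles
# earlier than sample B, so it started with more template.
n0_a = starting_concentration(nq=1.0, efficiency=2.0, ct=20)
n0_b = starting_concentration(nq=1.0, efficiency=2.0, ct=23)
fold = relative_quantity(efficiency=2.0, ct_a=20, ct_b=23)
```

Because the efficiency is raised to the power Ct, even a small error in E propagates strongly into the estimated starting concentration.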
Analysis of qPCR data requires the derivation of a PCR efficiency value from the observed data. This article shows that the observed PCR efficiency is strongly influenced by small errors in the applied baseline correction. As described, it proves impossible to estimate a baseline value from the so-called ground phase data because the source of this fluorescence is not clear. The main source of baseline fluorescence is unbound fluorochrome (e.g. SYBR Green), which is not fully nonfluorescent (4). However, baseline fluorescence also depends on sample dilution, and thus on total cDNA concentration, and on primer concentration (B). Given these sources and the unidentified interactions between them, predicting and modeling baseline behavior is currently infeasible. Our conclusion that there is insufficient ground for developing an algorithm that determines the baseline from the ground phase data is in line with the findings of others (7
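The sensitivity of the observed efficiency to the baseline can be illustrated with a small numerical sketch. It assumes noise-free synthetic data consisting of a constant baseline plus an exponentially increasing signal, and derives the efficiency from the slope of a straight-line fit to the log-transformed, baseline-corrected values. All numbers (baseline of 100, efficiency of 1.9, cycles 15 to 24) are hypothetical.

```python
import numpy as np


def observed_efficiency(fluor, cycles, baseline):
    """Efficiency (fold increase per cycle) from a log-linear fit.

    Subtracts `baseline`, fits log10(fluorescence) against cycle
    number, and converts the slope back with E = 10**slope.
    """
    corrected = fluor - baseline
    slope, _intercept = np.polyfit(cycles, np.log10(corrected), 1)
    return 10 ** slope


# Synthetic exponential phase (assumed values, no noise).
cycles = np.arange(15, 25)
true_baseline, n0, true_eff = 100.0, 1e-4, 1.9
fluor = true_baseline + n0 * true_eff ** cycles

eff_exact = observed_efficiency(fluor, cycles, true_baseline)
eff_under = observed_efficiency(fluor, cycles, 0.99 * true_baseline)
eff_over = observed_efficiency(fluor, cycles, 1.01 * true_baseline)
```

In this sketch a baseline that is 1% too low flattens the lower end of the log-linear plot and gives an efficiency below the true value, whereas a baseline that is 1% too high steepens it and gives an efficiency above the true value.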
The baseline estimation algorithm described in the current article is based on the kinetic model of PCR amplification (Equation 1) and a constant PCR efficiency. Cycle-dependent changes in PCR efficiency are predicted by sigmoid models used in qPCR analysis (20). The use of such sigmoid models is not based on biophysical/biochemical considerations of PCR kinetics, but mainly on their good fit to raw qPCR data. Recent papers show that despite their overall good fit, these models do not fit well to the exponential phase data (7). Therefore, these ‘empirical’ models do not provide a solid basis for modeling of the behavior of the PCR efficiency during the PCR reaction. On the other hand, it was established that, when modeling PCR as a statistical branching process, PCR efficiency is constant from the first cycle until the beginning of the plateau phase (30). A modeling study based on kinetic annealing confirmed this notion (23). Moreover, the N0 value estimated with Equation 2, at large enough Ct, has been shown to be an unbiased estimate of the real starting amount (22
With a constant PCR efficiency, the value of each data point up to the start of the plateau phase is the sum of the baseline fluorescence and an exponentially increasing amplicon-dependent fluorescence (Equation 4). An algorithm that searches for the baseline value that results in the longest straight line of data points when plotted on a semi-logarithmic scale isolates the exponentially increasing part of the observed fluorescence values. This algorithm requires a sufficiently large baseline-to-plateau ratio as well as low observation noise. In datasets that do not fulfill these requirements, a reliable straight line in the log-linear phase will not be found. The baseline value can be lowered by lowering the primer concentration (B); observation noise can be reduced by setting a fixed, instead of an adaptive, exposure time in the qPCR apparatus. Note that the baseline estimation algorithm does not include a ‘goodness-of-fit’ criterion. The chosen algorithm ensures that points at lower cycle numbers are included only as long as they deviate randomly from the straight line defined by the points in the upper part of the exponential phase. Such a provision would not be possible if the algorithm included a ‘goodness-of-fit’ criterion for the whole log-linear phase.
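The idea of searching for the baseline that gives the longest straight line on a semi-logarithmic scale can be sketched as follows. This is a simplified grid search written for illustration and is not the procedure implemented in LinRegPCR. The function names, the tolerance `tol`, the number of seed points `n_seed`, the candidate grid and the assumption that the last point of the exponential phase (`top`) is already known are all ours, and the data are synthetic and noise-free.

```python
import numpy as np


def straight_run_length(log_f, top, n_seed=4, tol=0.02):
    """Number of consecutive points on the line set by the top points.

    A line is fitted through the `n_seed` points ending at index
    `top`; lower points are added while they stay within `tol`
    (log10 units) of that line.
    """
    seed = np.arange(top - n_seed + 1, top + 1)
    slope, intercept = np.polyfit(seed, log_f[seed], 1)
    length, i = n_seed, top - n_seed
    # A nan or -inf value fails the comparison and ends the run.
    while i >= 0 and abs(log_f[i] - (slope * i + intercept)) <= tol:
        length += 1
        i -= 1
    return length


def estimate_baseline(fluor, top, candidates, n_seed=4, tol=0.02):
    """Candidate baseline giving the longest straight log-linear run."""
    best_baseline, best_length = None, 0
    for b in candidates:
        with np.errstate(invalid="ignore", divide="ignore"):
            log_f = np.log10(fluor - b)  # nan where fluor < b
        if not np.all(np.isfinite(log_f[top - n_seed + 1: top + 1])):
            continue  # the seed points themselves must be usable
        length = straight_run_length(log_f, top, n_seed, tol)
        if length > best_length:
            best_baseline, best_length = b, length
    return best_baseline, best_length


# Synthetic data: baseline 100, efficiency 1.9, cycles 0 to 24.
cycles = np.arange(25)
fluor = 100.0 + 1e-4 * 1.9 ** cycles
baseline, n_points = estimate_baseline(
    fluor, top=24, candidates=np.linspace(99.0, 101.0, 21)
)
```

In this sketch, points at lower cycle numbers are accepted one at a time against the line defined by the upper points, rather than being judged by an overall goodness-of-fit over the whole log-linear phase. With real, noisy data the tolerance would have to reflect the observation noise.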
Even after minimizing PCR efficiency variability and setting a W-o-L per amplicon, similar samples show slightly different observed PCR efficiencies. To the best of our knowledge, no sample-dependent PCR efficiency differences have ever been reported (10). Variability of the PCR efficiency values has been attributed to a limited precision of individual data (12) and thus reflects mainly a statistical error and not a real difference (16). Accordingly, most researchers choose to use a fixed or the mean efficiency per amplicon in their analysis of qPCR data. The symmetric distribution of the individual efficiency values (e.g. B and inset of C) justifies using the arithmetic mean efficiency. Although the use of a fixed PCR efficiency for all samples per amplicon is well supported, it is still important to use an efficiency value that represents the true efficiency. Equation 7 shows that the bias in the expression ratio resulting from using a common efficiency value for two amplicons, instead of the amplicon-specific efficiencies, depends on the relative difference in efficiencies as well as the Ct values of both samples. An example of such a bias is illustrated in D.
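The dependence of this bias on both the efficiency difference and the Ct values can be illustrated numerically. The sketch below does not reproduce Equation 7. It assumes the ratio form E_ref^Ct_ref / E_target^Ct_target for a target and a reference amplicon measured at the same threshold, and it takes the arithmetic mean of the two efficiencies as the common value. All function names and numbers are hypothetical.

```python
def expression_ratio(e_target, ct_target, e_ref, ct_ref):
    """Target/reference ratio, assuming an equal threshold Nq for both."""
    return e_ref ** ct_ref / e_target ** ct_target


def common_efficiency_bias(e_target, ct_target, e_ref, ct_ref):
    """Ratio computed with one common efficiency, divided by the
    ratio computed with the amplicon-specific efficiencies.

    A value of 1 means no bias.
    """
    e_common = (e_target + e_ref) / 2  # assumed choice of common value
    biased = expression_ratio(e_common, ct_target, e_common, ct_ref)
    specific = expression_ratio(e_target, ct_target, e_ref, ct_ref)
    return biased / specific


# Same efficiency difference, evaluated at two sets of Ct values.
bias_early = common_efficiency_bias(1.8, 25, 1.9, 20)
bias_late = common_efficiency_bias(1.8, 30, 1.9, 25)
no_bias = common_efficiency_bias(1.9, 25, 1.9, 20)
```

With these assumed values the common efficiency underestimates the ratio, and the underestimation grows when both Ct values are five cycles higher, although the efficiencies themselves are unchanged.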
Based on the results and considerations in the current paper, the LinRegPCR analysis program (15) has been updated. Although this updated version of the program can be used in a ‘load-and-click’ mode, the different sources of variation in qPCR analysis mean that no analysis system can be used as a black box. Every user of qPCR should stay aware of hitherto unknown variables affecting the analysis. The experimental set-up should be aimed at recognizing the variables of interest and should enable the analysis of the significance of such variables. Analysis systems cannot relieve the researcher of this task.