Real-time PCR is one of the most sensitive and reliably quantitative methods for gene expression analysis. It has been broadly applied to microarray verification, pathogen quantification, cancer quantification, transgenic copy number determination and drug therapy studies [1]. A PCR has three phases, the exponential phase, the linear phase and the plateau phase, as shown in Figure 1A. The exponential phase is the earliest segment of the PCR, in which product increases exponentially because reagents are not yet limiting. The linear phase is characterized by a linear increase in product as PCR reagents become limiting. The PCR eventually reaches the plateau phase during later cycles, where the amount of product no longer changes because some reagents are depleted. Real-time PCR exploits the fact that, under ideal conditions, the quantity of PCR product in the exponential phase is proportional to the quantity of initial template [5]. During the exponential phase, the PCR product ideally doubles during each cycle if efficiency is perfect, i.e. 100%. It is possible to bring the PCR amplification efficiency close to 100% in the exponential phase if the PCR conditions, primer characteristics, template purity and amplicon lengths are optimal.
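The proportionality argument can be made concrete with the ideal exponential-phase model $N_n = N_0 E^n$, where $N_0$ is the initial template amount, $n$ the cycle number and $E$ the fold increase in product per cycle ($E = 2$ at 100% efficiency). The sketch below is a minimal illustration under that assumption only: it ignores the linear and plateau phases entirely, and the function names are hypothetical. It shows that two templates differing 8-fold reach a fixed product threshold 3 cycles apart when $E = 2$.

```python
import math


def product_after(n_cycles: float, initial_copies: float, efficiency: float = 2.0) -> float:
    """Ideal exponential-phase product: N_n = N_0 * E**n (no reagent limitation)."""
    return initial_copies * efficiency ** n_cycles


def cycles_to_threshold(threshold: float, initial_copies: float, efficiency: float = 2.0) -> float:
    """Fractional cycle n at which N_0 * E**n equals the threshold amount."""
    return math.log(threshold / initial_copies) / math.log(efficiency)


if __name__ == "__main__":
    # One starting copy after 10 cycles at 100% efficiency (E = 2).
    print(product_after(10, 1))              # 1024.0
    # The same 10 cycles at 90% efficiency (E = 1.9) give far less product.
    print(round(product_after(10, 1, 1.9)))  # 613
    # An 8-fold difference in starting template shifts the threshold cycle by log2(8) = 3.
    shift = cycles_to_threshold(1e10, 1e3) - cycles_to_threshold(1e10, 8e3)
    print(round(shift, 6))                   # 3.0
```

The second line of output is a reminder of why efficiency matters: a modest drop from 2.0 to 1.9 per cycle compounds to roughly 40% less product after only ten cycles.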
Figure 1 Real-time PCR. (A) Theoretical plot of PCR cycle number against PCR product amount is depicted. Three phases can be observed for PCRs: exponential phase, linear phase and plateau phase. (B) shows a theoretical plot of PCR cycle number against logarithm …
Both genomic DNA and reverse-transcribed cDNA can be used as templates for real-time PCR. The dynamics of PCR are typically monitored through DNA-binding dyes such as SYBR Green or DNA hybridization probes such as molecular beacons (Stratagene) or TaqMan probes (Applied Biosystems) [2]. The basis of real-time PCR is a direct positive association between the fluorescence signal of the dye and the number of amplicons. As shown in Figure 1, a plot of the log2-transformed fluorescence signal against cycle number yields a linear range in which the logarithm of the fluorescence signal correlates with the original template amount. A baseline and a threshold can then be set for further analysis. The cycle number at which the log-transformed fluorescence reaches the threshold is defined as the Ct (threshold cycle) number; it is the observed value in most real-time PCR experiments and therefore the primary statistical metric of interest.
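To make the threshold step concrete, here is a minimal sketch of one common way to read a Ct value off a single amplification curve. It assumes that the baseline is the mean raw fluorescence over a user-chosen range of early cycles and that Ct is found by linear interpolation of the log2 signal between the two cycles that bracket the threshold. Instruments and analysis packages use their own baseline and threshold algorithms, so `threshold_cycle` and its parameters are hypothetical illustrations rather than any vendor's method.

```python
import math
from typing import Optional, Sequence


def threshold_cycle(
    fluorescence: Sequence[float],
    threshold_log2: float,
    baseline_cycles: range = range(3, 16),
) -> Optional[float]:
    """Return the fractional cycle at which log2(baseline-subtracted signal)
    first reaches threshold_log2, or None if it never does.

    fluorescence[i] is the raw reading at cycle i + 1; baseline_cycles are
    1-based cycle numbers assumed to lie before detectable amplification.
    """
    baseline = sum(fluorescence[c - 1] for c in baseline_cycles) / len(baseline_cycles)
    prev_cycle, prev_log = None, None
    for i, raw in enumerate(fluorescence):
        cycle = i + 1
        signal = raw - baseline
        if signal <= 0:
            # log2 is undefined for a non-positive signal; restart the bracket.
            prev_cycle, prev_log = None, None
            continue
        log_signal = math.log2(signal)
        if log_signal >= threshold_log2:
            if prev_log is None:
                return float(cycle)
            # Linear interpolation in log2 space between the bracketing cycles.
            return prev_cycle + (threshold_log2 - prev_log) / (log_signal - prev_log)
        prev_cycle, prev_log = cycle, log_signal
    return None
```

For example, for the simulated curve `[100 + 2 ** (c - 20) for c in range(1, 41)]`, a log2 threshold of 5.5 with baseline cycles 1 to 5 gives a Ct very close to 25.5. Interpolating in log space, rather than on raw fluorescence, matches the linear range of the log-transformed plot described above.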
Real-time PCR data can be quantified either absolutely or relatively. Absolute quantification employs an internal or external calibration curve to derive the input template copy number. Absolute quantification is important when the exact transcript copy number needs to be determined; however, relative quantification is sufficient for most physiological and pathological studies. Relative quantification relies on comparing the expression of a target gene with that of a reference gene, and the expression of the same gene in the target sample with that in the reference sample [7].
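As an illustration of absolute quantification with an external calibration curve, the sketch below fits Ct against log10 copy number for a dilution series of standards by ordinary least squares, then inverts the fitted line for an unknown sample. The function names and the assumption of a single straight-line fit over the whole dilution range are ours, not part of any particular protocol.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(copies: Sequence[float], cts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = intercept + slope * log10(copies); returns (slope, intercept)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(cts) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, cts))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the input copy number of an unknown."""
    return 10 ** ((ct - intercept) / slope)
```

With idealized standards whose Ct falls by exactly $\log_2 10 \approx 3.32$ cycles per tenfold dilution (100% efficiency), the fitted slope is about -3.32 and an unknown's copy number is recovered exactly; real standards scatter around the line, and the estimate is only as good as the fit over the dilution range.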
Since relative quantification is the goal of most real-time PCR experiments, several data analysis procedures have been developed. Two mathematical models are very widely applied: the efficiency-calibrated model [7] and the ΔΔCt model [9]. The experimental systems for both models are similar. The experiment involves a control sample and a treatment sample. For each sample, a target gene and a reference gene (as an internal control) are amplified by PCR from serially diluted aliquots. Typically, several replicates are run at each dilution to derive the amplification efficiency. PCR amplification efficiency can be defined either as a percentage (from 0 to 1) or as the fold increase in PCR product per cycle (from 1 to 2). Unless it is specified as percentage amplification efficiency (PE), amplification efficiency (E) in this article refers to the fold increase in PCR product per cycle (1 to 2), so that $E = 1 + \text{PE}$. The efficiency-calibrated model is a more generalized ΔΔCt model. Ct number is first plotted against cDNA input (or the logarithm of cDNA input), and the slope of the plot is used to determine the amplification efficiency (E). ΔCt for each gene (target or reference) is then calculated by subtracting the Ct number of the treatment sample from that of the control sample. As shown in Equation 1, the ratio of target gene expression in treatment versus control is the ratio of the target gene efficiency ($E_{\text{target}}$) raised to the power of the target ΔCt ($\Delta Ct_{\text{target}}$) to the reference gene efficiency ($E_{\text{reference}}$) raised to the power of the reference ΔCt ($\Delta Ct_{\text{reference}}$). The ΔΔCt model can be derived from the efficiency-calibrated model if both the target and reference genes reach their highest PCR amplification efficiency. In this case, both the target efficiency ($E_{\text{target}}$) and the reference efficiency ($E_{\text{reference}}$) equal 2, indicating that the amplicon doubles during each cycle, and the same expression ratio is then obtained as $2^{-\Delta\Delta Ct}$ (Equation 2).
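$$\text{Ratio} = \frac{(E_{\text{target}})^{\Delta Ct_{\text{target}}}}{(E_{\text{reference}})^{\Delta Ct_{\text{reference}}}} \qquad \text{(Equation 1)}$$
where $\Delta Ct_{\text{target}} = Ct_{\text{target, control}} - Ct_{\text{target, treatment}}$ and $\Delta Ct_{\text{reference}} = Ct_{\text{reference, control}} - Ct_{\text{reference, treatment}}$.
$$\text{Ratio} = 2^{-\Delta\Delta Ct} \qquad \text{(Equation 2)}$$
where $\Delta\Delta Ct = \Delta Ct_{\text{reference}} - \Delta Ct_{\text{target}}$.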
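The two models can be sketched in a few lines of Python. The sketch assumes Ct has been regressed on log10 of the input amount, in which case the standard relation $E = 10^{-1/\text{slope}}$ converts the slope into a fold-per-cycle efficiency. All function names are hypothetical, and the single ΔCt values passed in stand for whatever replicate summaries an actual analysis would use.

```python
def efficiency_from_slope(slope: float) -> float:
    """Fold increase per cycle (E) from the slope of Ct versus log10(input amount)."""
    return 10 ** (-1.0 / slope)


def delta_ct(ct_control: float, ct_treatment: float) -> float:
    """ΔCt = Ct(control) - Ct(treatment) for one gene."""
    return ct_control - ct_treatment


def efficiency_calibrated_ratio(
    e_target: float, dct_target: float, e_reference: float, dct_reference: float
) -> float:
    """Equation 1: E_target**ΔCt_target / E_reference**ΔCt_reference."""
    return e_target ** dct_target / e_reference ** dct_reference


def ddct_ratio(dct_target: float, dct_reference: float) -> float:
    """Equation 2: 2**(-ΔΔCt), with ΔΔCt = ΔCt_reference - ΔCt_target."""
    ddct = dct_reference - dct_target
    return 2.0 ** (-ddct)
```

For instance, with hypothetical Ct values of 25.0 (control) and 23.0 (treatment) for the target gene and 20.0 for the reference gene in both samples, `delta_ct` gives 2.0 and 0.0, and both `ddct_ratio(2.0, 0.0)` and `efficiency_calibrated_ratio(2.0, 2.0, 2.0, 0.0)` return 4.0. Setting the target efficiency to 1.9 instead lowers the efficiency-calibrated ratio to about 3.61, which is the discrepancy the ΔΔCt model ignores when efficiencies fall short of 2.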
Even though both the efficiency-calibrated and ΔΔCt models are widely applied in gene expression studies, few papers thoroughly discuss the statistical considerations involved in analyzing the effect of each experimental factor or in significance testing. One of the few studies that employed substantial statistical analysis used the REST® software […]. The software presented in that article is based on the efficiency-calibrated model and employs randomization tests to obtain the significance level. However, the article did not provide a detailed model for the effects of the different experimental factors involved. Another statistical study of real-time PCR data used a simple linear regression model to estimate the ratio through Ct calculation [10]. However, that study used the log-transformed fluorescence as the dependent variable in the model, which we believe does not adequately reflect the nature of real-time PCR data. Instead, Ct should be the dependent variable for statistical analysis, because it is the outcome value directly influenced by treatment, concentration and sample effects. Both studies used efficiency-calibrated models. Despite the publication of these two methods, many research articles that report real-time PCR data do not present P values or confidence intervals [11]. We believe that these statistics are desirable to facilitate robust interpretation of the data.
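To illustrate treating Ct as the dependent variable, the sketch below fits the linear model $Ct = \beta_0 + \beta_1 \cdot \text{treatment} + \beta_2 \cdot \text{target} + \beta_3 \cdot \text{treatment} \times \text{target} + \varepsilon$ to replicate Ct values at a single input amount, with 0/1 indicators for the treatment sample and the target gene. Because this model has one parameter per cell, its least-squares fit reproduces the four cell means, and with ΔCt = Ct(control) - Ct(treatment) and ΔΔCt = ΔCt(reference) - ΔCt(target), the interaction coefficient $\beta_3$ equals ΔΔCt. This is a simplified illustration and not the authors' SAS model: it omits the concentration effect, assumes equal residual variance in all four cells, and uses hypothetical names and Ct values.

```python
from math import sqrt
from statistics import mean
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # (gene, condition)
CELLS: List[Cell] = [(g, c) for g in ("reference", "target") for c in ("control", "treatment")]


def ddct_regression(ct: Dict[Cell, List[float]]) -> Tuple[float, float, int]:
    """Estimate ΔΔCt as the interaction coefficient b3 of
    Ct = b0 + b1*treatment + b2*target + b3*treatment*target.

    Returns (b3, standard error of b3, residual degrees of freedom).
    """
    m = {cell: mean(ct[cell]) for cell in CELLS}
    # With one parameter per cell, the least-squares fitted values are the
    # cell means, so b3 is the difference-in-differences of those means.
    b3 = (m["target", "treatment"] - m["target", "control"]) - (
        m["reference", "treatment"] - m["reference", "control"]
    )
    # Pooled residual variance: squared deviations from each cell mean, df = N - 4.
    sse = sum((x - m[cell]) ** 2 for cell in CELLS for x in ct[cell])
    df = sum(len(ct[cell]) for cell in CELLS) - 4
    s2 = sse / df
    # b3 is a +/-1 contrast of four independent cell means.
    se = sqrt(s2 * sum(1 / len(ct[cell]) for cell in CELLS))
    return b3, se, df


if __name__ == "__main__":
    ct = {
        ("reference", "control"): [20.0, 20.2, 19.8],
        ("reference", "treatment"): [20.1, 19.9, 20.0],
        ("target", "control"): [25.0, 25.2, 24.8],
        ("target", "treatment"): [23.0, 23.1, 22.9],
    }
    b3, se, df = ddct_regression(ct)
    print(b3, se, df, 2 ** -b3)
```

With these hypothetical replicates, $\beta_3 \approx -2.0$ with a standard error of about 0.183 on 8 degrees of freedom, so the estimated ratio is $2^{2} = 4$. Modelling Ct directly keeps the error term on the scale that the instrument actually reports.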
A priori, we consider the confidence interval and P value of ΔΔCt data to be very important because they directly influence the interpretation of the ratio. Without proper statistical modeling and analysis, the interpretation of real-time PCR data may lead the researcher to false-positive conclusions, which is especially troublesome in clinical applications. We therefore developed four statistical methodologies for processing real-time PCR data using a modified ΔΔCt method. The statistical methodologies can be adapted to other mathematical models with modifications. SAS programs implementing the methodologies and data control are presented with real-time PCR practitioners in mind, for turnkey data analysis. Standard deviations, confidence intervals and P values are presented directly from the SAS output. We also include an analysis of the sample data set and the SAS programs for the analysis in the online supplementary materials.
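Because the ratio is $2^{-\Delta\Delta Ct}$, a confidence interval for ΔΔCt maps directly onto one for the ratio: the transformation is monotone decreasing, so the endpoints swap, and the resulting interval is asymmetric around the point estimate. The sketch below assumes a ΔΔCt estimate with a standard error and residual degrees of freedom from some linear model (for example, a Ct-based regression) and uses a two-sided t test. To stay dependency-free it evaluates Student's t distribution by numerical integration rather than calling a statistics library; in practice a library routine such as SciPy's `scipy.stats.t` would be the usual choice. All function names are hypothetical.

```python
import math
from typing import Tuple


def t_cdf(x: float, df: int, n: int = 4000) -> float:
    """Student's t CDF via composite Simpson integration of the density over [0, |x|]."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(u: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(u * u / df))

    a = abs(x)
    h = a / n  # n must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Quantile of Student's t for 0.5 <= p < 1, by bisection on t_cdf."""
    lo, hi = 0.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ratio_with_ci(ddct: float, se: float, df: int, level: float = 0.95) -> Tuple[float, float, float]:
    """Point estimate and confidence limits of Ratio = 2**(-ΔΔCt)."""
    t_crit = t_quantile(1 - (1 - level) / 2, df)
    lower_ddct, upper_ddct = ddct - t_crit * se, ddct + t_crit * se
    # 2**(-x) decreases in x, so the upper ΔΔCt limit gives the lower ratio limit.
    return 2 ** -ddct, 2 ** -upper_ddct, 2 ** -lower_ddct


def p_value(ddct: float, se: float, df: int) -> float:
    """Two-sided P value for H0: ΔΔCt = 0 (equivalently, Ratio = 1)."""
    return 2 * (1 - t_cdf(abs(ddct / se), df))
```

For ΔΔCt = -2.0 with a standard error of 0.183 on 8 degrees of freedom, the ratio is 4.0 and its 95% limits lie roughly between 3.0 and 5.4; 4.0 is their geometric rather than arithmetic midpoint, which is why reporting the ratio as "4.0 ± x" would misstate the uncertainty.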