An often underestimated critical issue underlying the reliability of gene expression results is the quality of the RNA samples. To assess the impact of RNA quality on the results, a useful quality parameter is needed as well as a measurable outcome. In this study, we have examined the quality of an unprecedentedly large series of 740 clinical RNA samples using six RNA quality parameters. We did not aim to identify the best parameter measuring RNA quality. Rather, we developed and applied an analytical framework using novel methods for evaluation of RNA quality in relation to qPCR results. With this framework, we demonstrated a measurable influence of RNA quality on the gene expression results.
A significant, albeit imperfect, positive correlation was found between all RNA quality parameters; each parameter appears to capture a different aspect of RNA quality. While the Alu repeat sequence expression level and the normalization factor based on four reference genes were determined on randomly primed pre-amplified material, the HPRT1 5′–3′ dCq and HPRT1 3′ Cq value were measured on cDNA obtained from anchored oligo-dT priming of original RNA. Therefore, a possible explanation for the lower correlation between these parameters is the use of a different RT priming strategy, resulting in successful pre-amplification of partially compromised RNA samples in case of random priming (23). This might also explain why some samples classified as bad quality based on RQI or 18S/28S rRNA ratio (measured on total RNA) turn out to be better quality samples based on the other methods (measured on cDNA).
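As an illustration of how agreement between two quality parameters could be quantified, the following is a minimal sketch of a rank correlation computed over paired per-sample values. The choice of Spearman's rank correlation is an assumption made for this example and is not stated in the text above; the function names are hypothetical. Parameters are assumed to be oriented in the same direction (for example, higher value meaning better quality) before comparison.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A coefficient well below 1 for a pair of parameters would be consistent with the observation that the parameters do not rank samples identically.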
Upon careful interpretation of the impact of RNA quality on the results using the novel methods, we clearly observed an effect on the variation of reference gene expression, on the significance of differential expression of prognostic marker genes and on the classification performance using a multigene signature. In contrast to some reports in the literature (8), the results obtained from this study indicate that the process of normalization does not completely resolve the effect of compromised RNA quality on the final results. A substantial impact of RNA quality on the standard deviation of each of the four reference genes when normalized by the three other reference genes was observed. This is in accordance with our previous report indicating that reference gene expression stability is influenced by RNA quality and that genes display varying sensitivity to RNA degradation (7). As lower RNA quality generally results in higher Cq values, it is important to note that the observed increase in variation is not simply due to sampling noise that occurs when the number of input molecules is low. Indeed, the measured Cq values of the reference genes are found in a range well below values at which sampling noise is expected; furthermore, the same observations are made when RNA quality metrics are used that do not depend on the input amount (such as 18S/28S rRNA ratio, RQI and 5′–3′ dCq).
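The reference gene analysis described above can be sketched as follows. This is a minimal illustration, not the exact procedure used in the study: it assumes normalization is done in Cq space by subtracting the arithmetic mean Cq of the three other reference genes (equivalent to a geometric mean of relative quantities only if all assays have the same amplification efficiency). The gene names `REF_A` to `REF_D` and the function names are hypothetical.

```python
from statistics import mean, stdev


def normalized_cq(sample, target):
    """Cq of `target` minus the mean Cq of the remaining reference genes.

    `sample` maps gene name -> Cq value for one RNA sample.
    """
    others = [cq for gene, cq in sample.items() if gene != target]
    return sample[target] - mean(others)


def normalized_sd(samples, target):
    """Standard deviation of the normalized Cq of `target` across samples."""
    return stdev([normalized_cq(s, target) for s in samples])


# Hypothetical Cq values for one quality stratum (assumption, not study data).
example_group = [
    {"REF_A": 20.0, "REF_B": 20.0, "REF_C": 20.0, "REF_D": 20.0},
    {"REF_A": 21.0, "REF_B": 20.0, "REF_C": 20.0, "REF_D": 20.0},
    {"REF_A": 22.0, "REF_B": 20.0, "REF_C": 20.0, "REF_D": 20.0},
]
sd_ref_a = normalized_sd(example_group, "REF_A")
```

Computing `normalized_sd` separately for samples grouped by an RNA quality parameter, and comparing the values between groups, would show whether normalization removes the quality-related variation.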
RNA quality also has a noticeable influence on the significance of differential expression of individual marker genes between two divergent risk groups of cancer patients. Some genes appear to be sensitive to RNA quality, while others are not. Surprisingly, for a few genes, the results seem better when RNA was of lower quality. This puzzling observation is in accordance with findings within the EU FP7 Spedia project on standardization and improvement of pre-analytical procedures for in vitro diagnostics (M. Kubista, personal communication). Upon extensive correlation analyses using different parameters, including qPCR assay and gene expression specific characteristics such as amplicon length, transcript length, distance to 3′-end, assay amplification efficiency, mean Cq value and magnitude of differential expression between the two risk groups, no clear explanation was found as to why some genes were more sensitive to RNA quality than others (data not shown).
Our data further show that the performance of gene expression based classification as a function of RNA integrity is influenced by the number of genes included in the classifier and by the nature of the applied classification algorithm. The correlation signature algorithm seems to be the least sensitive to RNA quality, and an expression signature built with a larger number of genes results in more robust classification.
Overall, the 5′–3′ dCq and normalization factor quality parameters appear to have the largest influence on the qPCR expression results obtained on fresh frozen biopsies. As such, they appear to constitute the most useful parameters to qualify RNA samples. The advantage of using a 5′/3′ ratio assay or the normalization factor to assess the integrity of an RNA sample before its use in a gene expression study is that it specifically addresses the integrity of a messenger RNA molecule. This is not the case for other methods such as microfluidic electrophoresis that predominantly inspect the ribosomal RNA profile to infer RNA quality. Undoubtedly, such methods provide an indication of total RNA quality but are not necessarily the most appropriate to predict the integrity of mRNA transcripts that form the actual template in RT–qPCR analyses. Of note, the 5′–3′ dCq value in itself depends not only on the RNA quality, but may also be influenced by reverse transcriptase efficiency and assay performance (qPCR efficiency). The last two factors do not need to be taken into account if the RNA quality parameter is used to rank the samples in the same study according to RNA quality. However, if the aim is to establish a quality cut-off value, these factors should be considered. The evaluation of qPCR assays targeting the 3′-end and the 5′ start of other reference genes is needed in order to confirm our results and establish 5′–3′ dCq as a valuable RNA quality parameter.
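To make the within-study ranking use of the 5′–3′ dCq concrete, the following is a minimal sketch. It assumes the dCq is defined as the Cq of the 5′ assay minus the Cq of the 3′ assay on oligo-dT primed cDNA, and that a larger value indicates lower mRNA integrity; both the sign convention and the function names are assumptions made for this illustration, and the Cq values are hypothetical.

```python
def delta_cq_5_3(cq_5prime, cq_3prime):
    """5'-3' dCq, assumed here to be Cq(5' assay) minus Cq(3' assay)."""
    return cq_5prime - cq_3prime


def rank_by_integrity(samples):
    """Order sample names from lowest dCq (assumed best) to highest dCq.

    `samples` maps sample name -> (Cq of 5' assay, Cq of 3' assay).
    """
    return sorted(samples, key=lambda name: delta_cq_5_3(*samples[name]))


# Hypothetical Cq pairs (assumption, not study data).
example_samples = {
    "s1": (26.0, 24.0),  # dCq = 2.0
    "s2": (30.0, 24.5),  # dCq = 5.5
    "s3": (25.0, 24.0),  # dCq = 1.0
}
ranking = rank_by_integrity(example_samples)
```

Because reverse transcriptase efficiency and assay efficiency shift all dCq values measured with the same protocol in a similar way, such a ranking is only meaningful within one study; an absolute cut-off would require accounting for those factors.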
Clearly, further studies are warranted to establish a cut-off value for inclusion of a given sample in a gene expression study. We propose that pilot experiments be initiated that include positive and negative control samples in order to establish a study-specific cut-off. It is expected that this value will depend on the observed expression difference, the target abundance, the intra-group expression variability, the sensitivity to degradation of the target, the gene expression measurement method (RT–qPCR versus microarray versus massively parallel sequencing), and the nature of the samples (e.g. fresh frozen versus formalin-fixed paraffin-embedded). The cut-off value will also depend on the purpose of the study; a more stringent assessment of RNA quality is probably needed when a therapeutic decision is required for an individual patient compared to drawing statistical conclusions for a group of samples. Nonetheless, the inability to diagnose or assess prognosis of patients due to inferior RNA quality is unacceptable. Therefore, efforts should be made to overcome this problem and to increase the percentage of eligible cases for gene expression profiling in a clinical setting. For instance, laboratories should be trained to use standard operating procedures for the extraction and storage of high quality RNA. Furthermore, random-primed reverse transcription could allow samples with some degree of RNA fragmentation to become eligible (27). The number of usable samples can also be increased if gene expression profiling is performed immediately after sampling of the tumour (which is clearly a compromising factor in retrospective studies) as it is known that storage of RNA samples might lead to degradation (10).
The results from this study demonstrate that monitoring RNA quality is of critical importance to obtain meaningful and reliable gene expression data and to ensure reproducibility of the results. This study confirms the need for proper RNA integrity control and proposes a framework to assess the value of an RNA quality parameter by measuring its impact on the gene expression results.