Although microarrays have been used extensively as discovery tools in biological and biomedical studies, it remains an open question whether this technology can be applied reliably in clinical practice and regulatory decision making, where high precision and accuracy are required. A series of studies have evaluated performance across various commercial and home-brewed microarray platforms; however, most of them focused on the level of concordance between platforms. While these analyses highlighted critical issues such as cross-platform compatibility, they tended to reach conflicting conclusions because of the "relative to relative" nature of such comparisons. What is lacking in these studies is a "gold standard" data set that allows different microarray platforms to be evaluated against a common "ground truth". One commonly used approach for establishing such a "ground truth" is to spike in synthetic bacterial transcripts at known concentrations in a dilution series spanning a large dynamic range [22]; however, this approach draws its information from a very limited number of transcripts and is prone to experimental artifacts. An alternative strategy is to use a well-accepted reference data set generated by a reliable independent technology, such as real-time PCR for gene expression measurements. In this study, we constructed a large reference data set of gene expression measurements using TaqMan Gene Expression Assays and real-time PCR technology. We also demonstrated how to use such a data set to evaluate the performance of different microarray platforms.
We first evaluated the detection sensitivity and accuracy of the two selected microarray platforms, using the TaqMan Gene Expression Assay real-time PCR data set as the reference. We chose the detection thresholds recommended by each manufacturer as the baseline for comparison. These recommended thresholds are somewhat arbitrary and are not necessarily based on the same parameters; nevertheless, they are widely adopted by researchers, so evaluating their effect on detection sensitivity and accuracy can help refine them and improve the interpretation of microarray results. Our results showed that both microarray platforms achieve reasonably good sensitivity in signal detection, whereas specificity tends to be relatively low, especially for Agilent microarrays, which had a ~50% false positive rate. It is worth noting that the differences in detection sensitivity and specificity we observed could be caused by suboptimal bioinformatics algorithms used to define the detection thresholds and do not necessarily reflect the inherent quality or accuracy of the respective platforms. Several strategies could improve the detection specificity of microarrays, including better probe design, hybridization conditions that minimize cross-hybridization, and improved image analysis software and algorithms for more accurate signal quantification and detection thresholding.
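To make the sensitivity and specificity comparison concrete, the following Python sketch tallies microarray detection calls against reference calls from real-time PCR. It is illustrative only and is not the analysis code used in this study: the function name `detection_metrics`, the boolean input format, and the definition of the false positive rate as FP / (FP + TN) are assumptions.

```python
def detection_metrics(array_calls, reference_calls):
    """Compare microarray detection calls with reference detection calls.

    Both arguments are equal-length sequences of booleans (True = detected).
    Assumption: the reference calls (e.g. from real-time PCR) are treated
    as the ground truth, and false positive rate = FP / (FP + TN).
    """
    if len(array_calls) != len(reference_calls):
        raise ValueError("call lists must have the same length")

    tp = fp = tn = fn = 0
    for on_array, in_reference in zip(array_calls, reference_calls):
        if on_array and in_reference:
            tp += 1  # detected by both platforms
        elif on_array and not in_reference:
            fp += 1  # detected by the array only
        elif not on_array and in_reference:
            fn += 1  # missed by the array
        else:
            tn += 1  # detected by neither platform

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    specificity = tn / (tn + fp) if (tn + fp) else nan
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1.0 - specificity,
    }


# Toy calls for eight hypothetical genes (not data from this study).
array_calls = [True, True, True, False, True, False, False, True]
reference_calls = [True, True, True, True, False, False, False, False]
metrics = detection_metrics(array_calls, reference_calls)
```

Applied per platform at the manufacturer's recommended threshold, a tally of this kind yields sensitivity and false positive figures comparable in form to those discussed above.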
This study also evaluated how well the microarray platforms correlate with the TaqMan real-time PCR platform in detecting differential expression (Figure and Figure ). Our analysis provided a high-resolution examination of microarray performance in detecting differential expression at different expression levels and at different fold changes. We confirmed that microarrays have acceptable sensitivity and accuracy in detecting differential expression, especially for genes with high and medium expression levels and for changes of > 2-fold. These results support the notion that microarrays, as exploratory tools for genome-wide gene expression screening, can achieve acceptable reliability.
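A stratified evaluation of this kind can be sketched as follows. The Python example is a hypothetical illustration, not the procedure used in this study: it assumes that every input gene is differentially expressed according to the reference platform, that reference fold changes are given in log2 units, and that a boolean flag records whether the microarray also detected the change. The function name and the bin labels are invented for this example.

```python
def sensitivity_by_fold_change(genes, log2_cutoff=1.0):
    """Fraction of reference-positive genes detected by the array, per bin.

    genes: iterable of (reference_log2_fold_change, detected_by_array).
    log2_cutoff: bin boundary; 1.0 corresponds to a 2-fold change.
    Assumption: all genes passed in are differentially expressed
    according to the reference platform.
    """
    counts = {"small": [0, 0], "large": [0, 0]}  # [detected, total]
    for ref_log2fc, detected in genes:
        key = "large" if abs(ref_log2fc) >= log2_cutoff else "small"
        counts[key][1] += 1
        if detected:
            counts[key][0] += 1

    nan = float("nan")
    return {
        key: (detected / total if total else nan)
        for key, (detected, total) in counts.items()
    }


# Toy input (not data from this study).
genes = [
    (2.0, True), (-1.5, True), (1.0, False), (3.0, True),  # >= 2-fold
    (0.5, False), (-0.3, True), (0.8, False),              # < 2-fold
]
rates = sensitivity_by_fold_change(genes)
```

The same binning could be applied to reference expression level instead of fold change to examine performance at low, medium, and high expression.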
Our study also characterized some of the limitations of microarrays, in particular the ratio compression phenomenon shown in Figure . A certain level of fold change compression is expected for microarray platforms because of various technical limitations, including limited dynamic range, signal saturation, and cross-hybridization. The two-color system analyzed in this study (Agilent microarrays) appears to have more severe ratio compression, which could be attributed to several factors: (1) The concentration of the 60mer probes on Agilent microarrays depends on the coupling efficiency of the in-situ oligonucleotide synthesis and on probe length. Lower efficiency may result in low probe concentration and therefore limit the dynamic range of the platform; (2) Two-color systems such as Agilent arrays use two different fluorescent dyes that have different dynamic ranges and quantum yields. These intrinsic differences may be partially adjusted by intra-array intensity-dependent normalization but may not be completely eliminated. Theoretically, dye-swapping experiments may help to further correct the biases introduced by the two dyes. In reality, however, dye swapping is not always practical because of cost and limitations in sample amount. Finally, ratio compression can also be introduced by certain data-processing and normalization algorithms that aim to reduce variance (e.g. lowess normalization for Agilent microarrays and the RMA method for Affymetrix microarrays). Our analysis suggests that, for a given microarray platform, the balance between variance reduction and ratio compression will ultimately determine the overall accuracy in detecting differential expression. Other microarray limitations revealed by our study include the significant decrease in overall accuracy of differential expression detection at low expression levels (Figure ) and the relatively poor sensitivity in detecting small fold changes (i.e. < 2-fold).
Although these limitations have long been suspected by many, the large-scale "reference" data set from our study provides, for the first time, a more quantitative view of them.
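One simple way to quantify ratio compression, offered here as an assumption rather than as the method used in this study, is the least-squares slope of microarray log2 fold changes regressed on the reference log2 fold changes. A slope near 1 indicates little compression; a slope well below 1 indicates that the microarray under-reports fold changes. The function name `compression_slope` and the toy values are hypothetical.

```python
def compression_slope(reference_log2fc, array_log2fc):
    """Least-squares slope of array log2 fold changes on reference values.

    Assumption: both inputs are equal-length sequences of log2 fold
    changes for the same genes, with the reference (e.g. real-time PCR)
    treated as the ground truth.
    """
    n = len(reference_log2fc)
    if n != len(array_log2fc):
        raise ValueError("inputs must have the same length")
    if n < 2:
        raise ValueError("at least two genes are required")

    mean_x = sum(reference_log2fc) / n
    mean_y = sum(array_log2fc) / n
    sxx = sum((x - mean_x) ** 2 for x in reference_log2fc)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(reference_log2fc, array_log2fc)
    )
    if sxx == 0:
        raise ValueError("reference fold changes must not all be equal")
    return sxy / sxx


# Toy values: the array reports half of each reference log2 fold change.
reference = [-3.0, -1.0, 0.0, 1.0, 3.0]
array = [-1.5, -0.5, 0.0, 0.5, 1.5]
slope = compression_slope(reference, array)  # 0.5, i.e. strong compression
```

Computing this slope separately for each platform, or before and after a given normalization step, would give a rough indication of how much compression each contributes.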
Lastly, although real-time PCR based on TaqMan Gene Expression Assays is a well-accepted "gold standard" for gene expression measurements, we are aware that it has its own limitations and is also affected by experimental error. In addition, the different probe design strategies for microarrays (usually 3' biased and targeting a composite of transcripts) and TaqMan Gene Expression Assays (usually without a priori bias and targeting a single transcript or a subset of transcripts) may account for a small percentage of the discordance observed between the microarray and real-time PCR results. For example, the expression profiles of gene NM_003640 measured by the two microarray platforms are highly correlated with each other but anti-correlated with the profile measured by the TaqMan Gene Expression Assay (Hs00175353_m1, Figure ). NM_003640 is a relatively long transcript with 5917 bases and 37 exons. While the TaqMan assay was designed against the junction of exons 2 and 3, which is > 4 kb from the 3' end, the microarray probes were usually designed close to the 3' end (mostly within 1.5 kb of the 3' end for Applied Biosystems probes). In this particular instance, the data suggest that the TaqMan Gene Expression Assay is potentially detecting splice variants in addition to those detected by the array probes. This difference in probe design may result in microarrays and TaqMan assays quantifying different populations of transcripts (e.g. products of alternative splicing or degradation). These factors may change the absolute metrics (i.e. TRP, TFP, and anti-correlation rate); nevertheless, they would not change the general conclusions and trends we observed. We think that most of the discrepancies between TaqMan-based real-time PCR and microarrays are due to the difference in sensitivity limits between a PCR-based approach and a hybridization-based approach. It is clear that at high expression levels the correlation between the two approaches is much better (Figure ).