Depending on availability or the set of genes to be quantified, large scale gene expression profiling studies have used different versions of chips of a given microarray platform. For the data sets analyzed in this study two types of Affymetrix chips were used: the HuFL oligo chips and the U95Av2 chips. These contain 20 and 16 oligo probes per probe set, respectively. For the cDNA microarray studies, the pool of actual clones shows a very high level of diversity between various studies. Therefore, the exact number of overlapping probes depended on both the specific generation of Affymetrix platform and the set of cDNA clones to which it was mapped. A summary of these data is listed in Table .
Summary of mapping cDNA microarray features to probes on Affymetrix gene-chips.
Comparison of cDNA and Affymetrix expression measurements
Because cDNA microarray measurements are typically reported as the log ratio of an experimental (Cy5) and control (Cy3) channel, direct comparison with single-channel Affymetrix data required that one of the two data sources be converted to a scale compatible with the other. Because the spot-size on robotically spotted cDNA microarrays can vary substantially, considering only the experimental channel would have given expression measurements prone to probe-quantity artifacts. On the other hand, without direct measurement on the Affymetrix platform of the control RNA used in the cDNA hybridization, it was impossible to replicate exactly the reference response level of each measurement feature.
We attempted to address this difficulty by assuming that the reference RNA batches chosen for each cDNA hybridization uniformly reflect the diversity of experimental transcript populations, and therefore that the mean of a gene's measured expression level across all experiments may serve as a reference for the normalization of Affymetrix data (methods). We verified that the mean expression measured by each Affymetrix array did not vary substantially (max - min < 0.25).
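The normalization idea above can be illustrated with a minimal sketch. It assumes the Affymetrix values are already on a log scale and arranged as a genes × arrays matrix; the function names are hypothetical, and the exact procedure is the one given in the methods, not this code.

```python
import numpy as np

def center_by_gene_mean(log_expr):
    """Subtract each gene's mean across all arrays (rows = genes, columns = arrays).

    The per-gene mean plays the role of the missing reference channel, so the
    result is a pseudo log ratio comparable to cDNA log(Cy5/Cy3) values.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    return log_expr - log_expr.mean(axis=1, keepdims=True)

def array_mean_range(log_expr):
    """Return max - min of the per-array mean expression (a simple sanity check)."""
    col_means = np.asarray(log_expr, dtype=float).mean(axis=0)
    return float(col_means.max() - col_means.min())
```

A small value of `array_mean_range` corresponds to the check reported in the text that the per-array means differ by less than 0.25.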
Sequence-overlapping probes give greater cross-platform consistency for the NCI-60 panel
The NCI-60 cell line panel consists of sixty well characterized human tumor cell lines derived from patients with leukaemia, melanoma, and lung, colon, central nervous system, ovarian, renal, breast and prostate cancers. This cell line panel was developed by the Developmental Therapeutics Program of the National Cancer Institute and is routinely used to screen potential anticancer drugs [5].
The gene expression profiles of the NCI-60 cell line panel measured by cDNA microarray and by Affymetrix HuFL oligo chips constitute a unique data source. To the best of our knowledge, it is the only publicly available dataset in which replicates of a large number of diverse RNA samples have been quantified by these two microarray platforms. Affymetrix microarray probe sets were classified based on their shared sequence identity across the two platforms.
Since the actual number of overlapping probes can be between 0 and 20, a large number of potential stratification schemes can be implemented. However, for a clear presentation of results we chose to compare the following classes representing different levels of shared identity: a) Affymetrix probe sets that share a Unigene ID with a cDNA clone (termed Shared Unigene probes); b) Affymetrix probe sets containing probes that could be sequence-matched to the same transcript sequence as the cDNA clone, but for which no Affymetrix probe actually overlaps the cDNA clone sequence (termed Shared Transcript probes); c) Affymetrix probe sets with 1 to 10 probes sequence-overlapping with the cDNA clone (termed Partially Overlapping probes); d) Affymetrix probe sets with 20 (i.e. all) probes sequence-overlapping with the cDNA clone (termed Completely Overlapping probes); e) alt-CDF or "redefined probe sets", for which all probes across the entire array that matched a given cDNA clone insert were used to define a new derivative probe set. This new probe set may contain only a subset (even a single probe) of an original probe set; in other cases, probes from several original probe sets were joined into the new derivative probe set (fig ). For "partially overlapping" and "completely overlapping" probes (classes c and d), the entire original probe set was used for calculating gene expression levels, whereas for the "redefined" probe sets (class e) only the sequence-mapped probes were retained.
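The stratification into classes a) to d) can be sketched as a simple decision rule on the number of overlapping probes. This is only an illustration: the function name, argument names and label strings are hypothetical, and the sketch assigns the most specific label even though the classes in the text need not be mutually exclusive. Probe sets with 11 to 19 overlapping probes are not assigned to any class in the text, so they are returned as unclassified here.

```python
def classify_probe_set(n_overlapping, n_probes=20,
                       shares_transcript=False, shares_unigene=False):
    """Assign an Affymetrix probe set to one of the overlap classes.

    n_overlapping     -- number of probes overlapping the cDNA clone insert
    n_probes          -- probes in the original probe set (20 on HuFL chips)
    shares_transcript -- probes map to the same transcript as the clone
    shares_unigene    -- probe set and clone share a Unigene ID
    """
    if not 0 <= n_overlapping <= n_probes:
        raise ValueError("n_overlapping must lie between 0 and n_probes")
    if n_overlapping == n_probes:
        return "completely_overlapping"   # class d
    if 1 <= n_overlapping <= 10:
        return "partially_overlapping"    # class c
    if n_overlapping == 0:
        if shares_transcript:
            return "shared_transcript"    # class b
        if shares_unigene:
            return "shared_unigene"       # class a
        return "unmatched"
    return "unclassified"                 # 11..n_probes-1: not defined in the text
```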
Figure 1 Composition of redefined Affymetrix probe-sets based on overlap with cDNA clone insert sequence. Stacked histograms show the distribution of probe-set size for sets consisting of a single Affymetrix-defined probe-set (black) and for those comprised of (more ...)
Figure demonstrates the correlation between the Affymetrix and cDNA microarray measurements for the various types of matched probes across the two platforms. Increasing the number of overlapping Affymetrix probes increased cross-platform consistency both for matched genes and for matched cell lines. Additionally, concordance was greatest when only sequence-overlapping probes were used by redefining probe sets, even though in some cases only a single Affymetrix probe was considered. Redefined probes and completely overlapping probes showed the highest concordance levels. (The cumulative correlation distributions of the two showed little difference; however, the former method allowed a 4-fold increase in the number of available genes.) These results imply that probes targeting identical transcript sequence regions give substantially stronger concordance than probes that target the same contiguous transcript molecule at different sequence regions. To further investigate the effect of direct sequence overlap, we examined the performance of Affymetrix probe sets that can be sequence-mapped to the same transcript molecule but show no actual overlap with the cDNA clone insert ("shared transcript" probes, class b). These probe sets showed the lowest correlation. This might be due to a number of factors, including the presence of splice variants, the probes being subject to different cross-hybridization patterns, or incorrect clone sequence predictions.
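The two concordance measures discussed above (per matched gene and per matched cell line) can be sketched as row-wise and column-wise Pearson correlations between two aligned matrices. The sketch assumes both matrices are genes × cell lines, with rows and columns already matched across platforms; the function names are hypothetical.

```python
import numpy as np

def rowwise_pearson(a, b):
    """Pearson correlation between corresponding rows of two equally shaped matrices."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    num = (a_c * b_c).sum(axis=1)
    den = np.sqrt((a_c ** 2).sum(axis=1) * (b_c ** 2).sum(axis=1))
    # Rows with zero variance yield NaN rather than raising a warning.
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den

def gene_and_cell_line_correlations(affy, cdna):
    """Return (per-gene, per-cell-line) cross-platform correlations."""
    affy = np.asarray(affy, dtype=float)
    cdna = np.asarray(cdna, dtype=float)
    return rowwise_pearson(affy, cdna), rowwise_pearson(affy.T, cdna.T)
```

Sorting either returned vector gives the cumulative correlation distribution for the corresponding probe-matching class.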
Figure 2 Sequence-overlapping probes give greater cross-platform concordance for the NCI-60 panel. (A) Pearson correlation coefficient was calculated for each gene between its expression values measured on the Affymetrix Hu6800 platform and its expression values (more ...)
Figure also shows, however, that a significant number of probes matched by complete sequence overlap show rather poor correlation (around zero) across the two platforms. The same applies to redefined probe sets. Because we used Pearson correlation as our concordance metric, we expect genes for which the signal fluctuation is below the resolution of the measurement platform to have low levels of concordance, since the corresponding correlations will be computed between noise. We investigated the effect of removing genes with low levels of variation across the cell lines on the cross-platform concordance (Fig. ). Specifically, we removed genes from the Affymetrix dataset with standard deviations below 0.388 (the 50th percentile of standard deviation in the full Unigene-mapped dataset) and genes from the cDNA dataset with standard deviations below 0.265 (the 50th percentile of standard deviation in the full cDNA dataset). Matched gene and cell-line concordance was then assessed as described, using the genes remaining in both datasets (Fig. ).
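The filtering step can be sketched as follows, assuming each dataset is a genes × cell lines matrix with one identifier per row. The function names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator was used. With `percentile=50`, the cutoffs would correspond to the values 0.388 and 0.265 quoted above for the respective datasets.

```python
import numpy as np

def filter_by_sd(expr, gene_ids, percentile=50.0):
    """Keep genes whose standard deviation across cell lines is at or above
    the given percentile of all per-gene standard deviations.

    Returns the set of retained gene identifiers and the cutoff used.
    """
    expr = np.asarray(expr, dtype=float)
    sd = expr.std(axis=1, ddof=1)          # assumption: sample standard deviation
    cutoff = np.percentile(sd, percentile)
    kept = {g for g, s in zip(gene_ids, sd) if s >= cutoff}
    return kept, float(cutoff)

def genes_passing_both(affy, affy_ids, cdna, cdna_ids, percentile=50.0):
    """Genes that survive the filter independently in both datasets."""
    kept_affy, _ = filter_by_sd(affy, affy_ids, percentile)
    kept_cdna, _ = filter_by_sd(cdna, cdna_ids, percentile)
    return sorted(kept_affy & kept_cdna)
```

Each platform is filtered against its own distribution of standard deviations, and only the intersection is carried forward to the concordance analysis.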
Effect of standard deviation filtering on cross-platform NCI-60 concordance. Genes are filtered by removing those with low standard deviations across the 60 cell lines (methods). Matching features are determined and concordance assessed as in Figure 1.
As expected, removing these genes substantially increased both gene and cell-line concordance (Fig. ). This improvement was substantially greater than that obtained by filtering genes based on mean expression (data not shown). Specifically, the range of median gene correlation increased from approximately 0.2 – 0.4 to 0.4 – 0.6. Interestingly, filtering did not give a substantial improvement near the low end of the distribution, suggesting that some correlations of < 0.1 may be due to incorrect mappings or non-functional probes.
Finally, we noted that "complete overlap" matched pairs performed better than redefined probe sets after standard deviation filtering. This may be due to a number of factors, such as the potentially small number of probes interrogating a given transcript level (in some cases only a single probe). Alternatively, the redefined probe sets may contain spurious probes in cases where a false-positive clone sequence prediction led to the combination of several Affymetrix-defined probe sets. In any case, the ~4-fold increase in the number of mapped genes available through redefined probe sets may offset the small reduction in concordance.
Highly correlated genes are expected to produce a more reproducible unsupervised classification of the cell lines than that derived from a larger pool of genes with less correlation. This can be evaluated in several ways. For example, the hierarchical classification trees derived from the Affymetrix gene chip and cDNA microarray based measurements can be visually compared. Improved reproducibility of classification is indicated by the fact that more cell lines show similar or identical classification on the two hierarchical trees (fig ).
Figure 4 Conserved clustering pattern of the NCI-60 cell lines profiled using cDNA microarray and Affymetrix gene chips. Data was normalized as described (methods). Average linkage Pearson correlation hierarchical clustering was computed for each dataset. Cell (more ...)
Encouraged by our initial success, we merged the Affymetrix and cDNA microarray based gene expression profiles and hierarchically clustered the composite data set. More consistent measurements of gene expression across the two platforms would result in a greater number of instances in which the measurements of the same cell-line cluster together. In addition, co-clustering of cell lines of similar origin also provides circumstantial evidence that the gene expression profiles accurately reflect a certain tumor subtype.
Indeed, hierarchical clustering of the combined datasets resulted in a greater number of matched cell-lines clustering together when only sequence-overlapping measurements were used (fig ). The majority of matched cell lines are more correlated to one another than to any other cell line from either platform. This was not the case when the expression measurements were Unigene-matched (fig ).
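The statement that most matched cell lines are more correlated to one another than to any other profile can be checked directly, without building a tree. The sketch below is a simpler proxy for the hierarchical clustering described in the text, not a reimplementation of it. It assumes two genes × cell lines matrices restricted to the same genes, with cell lines in the same column order; the function name is hypothetical.

```python
import numpy as np

def fraction_matched_nearest(affy, cdna):
    """Fraction of profiles whose most correlated profile in the combined
    dataset is the same cell line measured on the other platform."""
    affy = np.asarray(affy, dtype=float)
    cdna = np.asarray(cdna, dtype=float)
    n = affy.shape[1]
    combined = np.hstack([affy, cdna])           # genes x 2n profiles
    corr = np.corrcoef(combined.T)               # 2n x 2n Pearson correlations
    np.fill_diagonal(corr, -np.inf)              # ignore self-correlation
    nearest = corr.argmax(axis=1)
    # Profile i (Affymetrix) pairs with profile n + i (cDNA) and vice versa.
    partner = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    return float(np.mean(nearest == partner))
```

Comparing this fraction between the Unigene-matched and sequence-overlapping gene sets gives a single number summarizing the difference visible in the clustering trees.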
Figure 5 Improved hierarchical clustering of combined NCI-60 cell-lines profiled by Affymetrix gene-chip and cDNA microarray by sequence-overlapping probe measurements. The gene expression profiles obtained for the sixty cell lines by the Affymetrix gene chips (more ...)
We were somewhat disconcerted by the fact that some of the cell lines showed a completely different localization on the two hierarchical trees. For example, the colon cancer cell line HT-29 clusters together with other colon cancer cell lines on the cDNA microarray derived tree, but it is placed in a different cluster on the Affymetrix gene chip based classification tree (fig ). An obvious explanation for this discrepancy would be the failure of the Affymetrix gene chip based measurement. Since no replicates were produced for any of the measurements, there is no statistically sound way of evaluating the quality of any of the gene expression profiles except by some circumstantial measures. For example, most cell lines had cross-platform correlation coefficients larger than 0.2 (Fig ). HT-29 was the single outlier, with correlation consistently near 0. We obtained an alternative measurement of the same cell line based on an HG-U133A Affymetrix gene chip (a generous gift of Avalon Pharmaceuticals Inc.). We extracted a gene expression profile using the "redefined probe sets" strategy. This gene expression vector produced a much higher correlation coefficient (0.208) with the corresponding cDNA microarray measurements.
Sequence overlapping measurements improve cross-platform classification of breast cancer subtypes
We sought further confirmation of our method using gene expression profiles derived from various human tissue samples. These data sets do not allow highly controlled side-by-side comparisons such as the analysis of in vitro cell lines presented above. Therefore, we needed to rely on "indirect" measures of cross-platform consistency, such as classification reproducibility.
Namely, we investigated whether sequence matching of probes would enable us to reproduce the classification of primary breast tumor derived gene expression profiles produced by different microarray platforms.
A breast-cancer subtype classifier was derived from a cohort of patients profiled on cDNA microarrays [1]. This classifier transferred to Affymetrix HuFL gene expression data [6] only to a limited extent [7]. Recently, we improved on those results by using only those Affymetrix and cDNA probes that could be mapped to the same transcript [4]. This earlier publication, however, did not involve the selective use of only those oligo probes that actually matched the cDNA clone. Here we introduced the use of "redefined probe sets" as described in the methods. This was coupled with an advanced normalization method, RMA [8], leading to a strong overall improvement over the original results of Sørlie et al [7] (fig ). In particular, with two exceptions, all samples could be assigned to a breast cancer subtype defined by the cDNA microarray derived centroids. In addition, more than 70% of all samples clustered in their own well-defined clusters.
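Assigning Affymetrix-profiled samples to cDNA-derived subtype centroids can be sketched as a nearest-centroid rule. The sketch assumes that similarity is measured by Pearson correlation over the matched genes and that a sample is left unassigned when its best correlation falls below a threshold; both the `min_corr` parameter and the function name are hypothetical, and the criteria actually used are those in the methods.

```python
import numpy as np

def assign_to_centroids(samples, centroids, labels, min_corr=0.1):
    """Assign each sample to the subtype whose centroid it correlates with best.

    samples   -- genes x n_samples matrix (Affymetrix, matched genes only)
    centroids -- genes x n_subtypes matrix (derived from cDNA data)
    labels    -- subtype name for each centroid column
    min_corr  -- hypothetical threshold; below it the sample stays unassigned
    """
    samples = np.asarray(samples, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    assigned = []
    for j in range(samples.shape[1]):
        corrs = [np.corrcoef(samples[:, j], centroids[:, k])[0, 1]
                 for k in range(centroids.shape[1])]
        best = int(np.argmax(corrs))
        assigned.append(labels[best] if corrs[best] >= min_corr else None)
    return assigned
```

Samples returned as `None` play the role of the exceptions mentioned above that could not be assigned to any subtype.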
Figure 6 Increased efficiency of breast cancer subtype classification transfer from cDNA microarray to Affymetrix HuFL gene-chip tumor-profiles by sequence-overlapping probe measurements. Tumor samples profiled on the Affymetrix platform were classified according (more ...)
Furthermore, we compared the transfer of the cDNA-based classifier [7] to two additional cohorts of breast cancer samples profiled on Affymetrix HG-U95Av2 gene-chips [9], using both the 'shared Unigene' (fig ) and 'redefined probe sets' (fig ) to match measurements (see methods). Since true classes are usually not known a priori for novel cancer subtypes, we focused our attention on a subtype whose gene expression profile is associated with an independent immunohistochemical marker: Her-2 / erbB2 status. Significantly, the classification based on 'redefined probe sets' contains a larger and more coherent ERBB2+ subtype cluster than that based on shared Unigene identifier. The validity of this cluster was substantiated by the immunohistochemical assessment of Her-2 status (available only for the Santorini cohort); all of the tested samples in this cluster stained positive for Her-2 amplification.
Figure 7 Increased efficiency of breast cancer subtype classification transfer from cDNA microarray to Affymetrix HG-U95Av2 gene-chip tumor-profiles by sequence-overlapping probe measurements. Tumor samples profiled on the Affymetrix platform were classified according (more ...)
Sequence-overlapping measurements improve cross-platform similarity of normal lung samples
Finally, we evaluated our sequence-overlap probe set redefinition method on a third cDNA platform. In this case, we evaluated the cross-platform similarity of normal lung samples profiled on cDNA microarrays [11] and Affymetrix HG-U95Av2 gene chips [12]. These two independent data sets contain normal samples from different patients. However, a robust gene expression profile was detected in both studies for the normal lung tissue samples [11]. If this robust, normal gene expression profile is accurately measured by both microarray platforms, then a high Pearson correlation coefficient would be expected between the normal samples, independently of the microarray platform used for a given tissue sample. Therefore, we calculated the correlation coefficient between each possible pair of normal gene expression profiles across the two platforms. Two probe matching strategies, the Unigene-based and the sequence-overlap-based mappings, were compared (fig ). The significance of the observed increase in cross-platform correlation was assessed at p = 0.0002 (methods), further highlighting the advantage of using only sequence-overlapping measurements for cross-platform comparison.
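The all-pairs comparison described above can be sketched as a cross-platform correlation matrix followed by an empirical cumulative distribution. The sketch assumes two genes × samples matrices restricted to the same matched genes; the function names are hypothetical, and the significance test itself (described in the methods) is not reproduced here.

```python
import numpy as np

def cross_platform_correlations(cdna, affy):
    """Pearson correlation between every cDNA sample and every Affymetrix sample.

    cdna -- genes x m matrix, affy -- genes x n matrix (same genes, same order)
    Returns an m x n matrix of correlation coefficients.
    """
    cdna = np.asarray(cdna, dtype=float)
    affy = np.asarray(affy, dtype=float)
    m = cdna.shape[1]
    full = np.corrcoef(cdna.T, affy.T)   # (m + n) x (m + n), samples as variables
    return full[:m, m:]                  # keep only the cross-platform block

def empirical_cdf(values):
    """Sorted values and their cumulative fractions, ready for plotting."""
    v = np.sort(np.ravel(values))
    return v, np.arange(1, v.size + 1) / v.size
```

Running both functions once per mapping strategy yields the two cumulative distributions being compared; for the lung data this would be 5 × 17 correlations per strategy.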
Figure 8 Increased cross-platform similarity of normal lung samples by sequence-overlapping probe measurements. Shown are the cumulative distributions of the 5 × 17 cross-platform sample correlations (see methods). Substantially greater similarity is observed (more ...)