DNA from the cell lines was labelled and hybridised in duplicate to all four microarray types following the manufacturers' instructions, and the same DNA preparations were used for all platforms and replicates in order to avoid effects of DNA quality or cell line variability on the results. Data processing and analysis were performed using the corresponding software for each of the platforms with default settings.
Reproducibility
To measure the reproducibility of log2 ratios between replicate hybridisations, Pearson's correlation was calculated based on all data points (Table ). Illumina microarrays showed the highest degree of correlation between replicates, 0.96 and 0.94 for TC-32 and OSA, respectively. Agilent microarrays also showed a high correlation between replicates, 0.91 for both cell lines, while Affymetrix microarrays showed an intermediate correlation of 0.75 and 0.84 for TC-32 and OSA, respectively. Nimblegen microarrays showed the lowest correlation, 0.63 and 0.73 for TC-32 and OSA, respectively. Despite its more complex karyotype, the OSA cell line showed slightly better correlation between replicates than the TC-32 cell line for Affymetrix and Nimblegen microarrays. Scatter plots of log2 ratios for the replicate hybridisations for all microarray platforms are shown in Additional file 1.
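As an illustration of the reproducibility measure, the following minimal sketch computes Pearson's correlation between the log2 ratios of two replicate hybridisations. The function name `pearson` and the example values are hypothetical and are not taken from the study data.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log2 ratios for the same probes in two replicate hybridisations.
rep1 = [-1.02, -0.95, 0.03, 0.01, 0.55, 0.98]
rep2 = [-0.90, -1.05, -0.04, 0.08, 0.61, 0.91]
r = pearson(rep1, rep2)
print(f"Pearson's r between replicates: {r:.2f}")
```

In practice the two lists would hold one value per probe for the whole array, matched by probe identifier.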
Table 2. Reproducibility of replicate hybridisations and signal response to copy number
In a previous study of melanoma cell lines using lower resolution oligonucleotide microarrays from Affymetrix, Agilent and Nimblegen, similar results were observed [8]. In that study, the highest resolution array from Agilent tested (185 K) showed the highest correlations (ranging from 0.72 to 0.86 for the different samples hybridised), whereas the highest resolution array from Affymetrix tested (500 K) showed intermediate correlations (0.54-0.67). Again, the Nimblegen array (1500 K) showed the lowest correlations (0.27-0.57). In addition, a significantly higher degree of correlation was observed for the higher-density microarrays compared to the lower-density microarrays from the same supplier (Agilent 185 K vs 44 K and Affymetrix 500 K vs 100 K) [8]. Although not directly comparable, this is in concordance with the even higher correlations between replicate hybridisations observed with the higher-resolution microarrays from Agilent and Affymetrix used here.
Similar results have also been observed between replicate hybridisations of Affymetrix 500 K arrays, Agilent 44B arrays and Illumina Hap550 arrays for one leukaemia cell line [14]. The standard deviation for each probe across four replicate hybridisations was calculated, and Illumina arrays showed the lowest median standard deviation (0.059), followed by Agilent (0.083) and Affymetrix (0.101). However, when the median standard deviation was normalized to the number of measurements on each platform, all platforms showed similar levels of variation [14].
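The per-probe measure described for that study can be sketched as follows. This is only an illustration: the use of the sample standard deviation is an assumption, the function name `median_probe_sd` and the values are hypothetical, and the subsequent normalisation to the number of measurements is not attempted because its exact form is not given here.

```python
import statistics


def median_probe_sd(replicate_values):
    """Median of the per-probe standard deviations.

    replicate_values: one list per probe, holding that probe's log2 ratio
    in each replicate hybridisation.
    """
    # Sample standard deviation per probe (an assumption, see lead-in).
    per_probe_sd = [statistics.stdev(values) for values in replicate_values]
    return statistics.median(per_probe_sd)


# Hypothetical log2 ratios for three probes across four replicates.
probes = [
    [0.10, 0.12, 0.08, 0.11],
    [-0.95, -1.05, -0.90, -1.10],
    [0.55, 0.62, 0.50, 0.60],
]
print(f"median per-probe SD: {median_probe_sd(probes):.3f}")
```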
Signal response to copy number
To quantify and evaluate the signal response to copy number for each platform, the measured DNA copy numbers for five specific chromosomes or chromosomal regions in TC-32 (representing 0, 1, 2, 3 and 4 copies) and one specific chromosomal region in OSA (representing high-level amplification) were compared with the expected ratios based on available cytogenetic and molecular information. Homozygous deletion of the locus around the tumour suppressor gene CDKN2A in 9p21.3, heterozygous deletion of 9p21.3-p21.2, normal copy number of chromosome 2, three copies of chromosome 8 and four copies of the long arm of chromosome 1 were measured in TC-32, and high-level amplification of the locus around the oncogene MDM2 in 12q15 was measured in OSA. For both replicate hybridisations, the average log2 ratio was determined for each normalized segment, as well as the standard deviation of the signals, and this is presented in Table . Figure shows a plot of the measured average log2 ratios compared to the theoretical values. In addition, the size of the homozygous deletion of the CDKN2A locus in TC-32 and the high-level amplification of the MDM2 locus in OSA were estimated (Table ). Copy number plots of the homozygous deletion of the CDKN2A locus as well as the heterozygous deletion of 9p21.3-p21.2 in TC-32 are shown in Additional file 2 for all microarray platforms.
The values were in general similar between the replicate hybridisations for all platforms and regions. Agilent arrays showed the highest dynamic range, from an average log2 ratio of -4.78 to 4.72 for the homozygous deletion of the CDKN2A locus and the high-level amplification of the MDM2 locus, respectively, and gave log2 ratios close to the theoretical values for all regions (Table ). Affymetrix results were second best for regions of increased copy number (3 and 4 copies), but were less accurate for regions of decreased copy number (0 and 1 copies). The Illumina results deviated most from the expected log2 ratios for 3 and 4 copies (theoretical value 0.58 and 1.0, respectively), giving on average log2 ratios of 0.22 and 0.36, respectively, while the Nimblegen data deviated most for 0 and 1 copies (theoretical value <-1 and -1.0, respectively), giving on average log2 ratios of -0.87 and -0.46, respectively. For the high-level amplification of the MDM2 locus in OSA, expected to have a log2 ratio well above 3 based on previous results [19, 20], Nimblegen arrays gave the second best values, with an average log2 ratio of 2.52, whereas Affymetrix and Illumina arrays gave average log2 ratios of 1.93 and 1.24, respectively.
A regression line was calculated for the measured average log2 ratios of the regions representing 1, 2, 3 and 4 copies, and the slope and R2 values are given in Figure . The regions representing 0 and > 10 copies were omitted from the regression line, since the expected log2 ratio for the homozygous deletion is not an exact number (< -3) and the exact log2 ratio for the high-level amplification is unknown (> 3). All platforms showed a high linearity of the measured log2 ratios, but the slope of the regression line varied. The slope of the measurements for the Agilent arrays was closest to the theoretical value, followed by Affymetrix and Illumina arrays, whereas the measurements from the Nimblegen arrays deviated most.
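The relation between copy number and theoretical log2 ratio, and the meaning of the regression slope, can be sketched as follows. This is a minimal illustration under stated assumptions: a diploid reference, an ordinary least-squares fit of measured against theoretical log2 ratios (the exact regression set-up used for the figure is not specified here), and hypothetical measured values that are compressed to half the theoretical response.

```python
import math


def theoretical_log2_ratio(copies, ploidy=2):
    """Expected log2 ratio for a region with `copies` copies against a
    reference with `ploidy` copies (diploid reference assumed)."""
    return math.log2(copies / ploidy)


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


copies = [1, 2, 3, 4]
expected = [theoretical_log2_ratio(c) for c in copies]  # -1.0, 0.0, ~0.58, 1.0

# Hypothetical platform whose signal is compressed to half the ideal response.
measured = [0.5 * e for e in expected]

slope, intercept, r_squared = linear_fit(expected, measured)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.2f}")
```

With this set-up a slope of 1 corresponds to the ideal response, and a slope below 1 with a high R squared corresponds to a linear but compressed signal.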
In a previous study using lower resolution oligonucleotide microarrays from Affymetrix (250 K), Agilent (185 K) and Illumina (317 K), as well as BAC arrays (32 K), for screening chronic lymphocytic leukaemia, similar results were observed [9]. Agilent 185 K arrays and 32 K BAC arrays showed the highest dynamic range, where the Agilent arrays showed the most correct response to loss of one copy and gain of one copy, whereas the BAC arrays showed the most correct response to homozygous deletion. A notable difference in the scale of log2 ratios has also previously been observed between Agilent 44 K arrays, Illumina 109 K arrays and ROMA/Nimblegen 82 K arrays for screening breast cancer, with higher signals for the Agilent arrays [6]. Affymetrix 100 K and 250 K arrays also showed a higher mean log2 ratio of chromosome X than Nimblegen 385 K arrays in sex-mismatched hybridisations of patients with submicroscopic genomic copy number variations [11]. In a previous study on melanoma cell lines, Agilent 185 K arrays showed the highest signals for 4 copies (average log2 ratio 0.86) as well as the highest signal to noise ratio, whereas Affymetrix 500 K arrays showed intermediate values (average 0.55) and Nimblegen 1500 K arrays showed the lowest values (average 0.37) [8].
Concerning the variation in log2 ratios within a chromosome or chromosomal region, Affymetrix arrays showed the highest standard deviation for the regions representing 2, 3 and 4 copies, whereas the other platforms showed equal variation. Illumina and Agilent arrays showed the highest standard deviation for the regions representing 1 and 0 copies, respectively (0.57 and 1.55). Nimblegen arrays showed the lowest standard deviations for the regions of loss, but this is most likely due to compression of the log2 ratios since the Nimblegen data deviated most for 0 and 1 copies. For the high-level amplification of the MDM2 locus, Agilent showed the lowest and Nimblegen the highest standard deviations (0.23 and 0.99, respectively).
The baseline variation has also been determined in previous studies. For the screening of chronic lymphocytic leukaemia, Affymetrix 250 K arrays, Agilent 185 K arrays and Illumina 317 K arrays showed similar average log2 ratio and standard deviation of a region with normal copy number (chromosome 1) [9]. However, when assessing the baseline variation in the form of autocorrelation of the whole genome, Agilent showed the lowest variation, followed by Affymetrix and Illumina. For the previous study of patients with submicroscopic genomic copy number variations, Nimblegen 385 K arrays showed a higher standard deviation of the log2 ratios of the whole genome (excluding regions harbouring the variations) than the Affymetrix 250 K and 100 K arrays [11].
The distribution of log2 ratios of all probes within the six specific chromosomes or chromosomal regions (representing 0, 1, 2, 3, 4 and >10 copies, respectively) for one hybridisation from all microarray platforms is shown in Figure . The replicate hybridisation showed a similar pattern (data not shown). The distribution is shown for both the normalized data and a smoothed version of the same data. The data were smoothed using Gaussian smoothing with a window size of 50 kb and standard deviation of 10 kb, in order to reduce the variation. Smoothing was not possible for the small regions representing 0 and >10 copies (the CDKN2A and MDM2 loci) due to an insufficient number of probes. For the normalized data, the curves for the different copy number levels were by far best separated by the Agilent data, whereas the curves were highly overlapping for the Affymetrix data. However, smoothing had a large effect on the Affymetrix data, narrowing the distributions of the log2 ratios and thus better separating the curves. The smoothing also further improved the Agilent, Illumina and Nimblegen data, but to a smaller extent. The most difficult separation for all platforms was between 3 and 4 copies, and this was particularly difficult with the Illumina data, where the two curves were highly overlapping.
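The smoothing step can be sketched as a position-based Gaussian kernel. This is one plausible reading of the description, not the implementation used in the study: the 50 kb window is assumed to be the total width centred on each probe, weights are assumed to follow a Gaussian with a 10 kb standard deviation of genomic distance, and the function name and probe data are hypothetical.

```python
import math


def gaussian_smooth(positions, values, window=50_000, sd=10_000):
    """Smooth log2 ratios along one chromosome.

    positions: probe positions in base pairs
    values:    log2 ratio of each probe
    window:    total window width in bp, centred on each probe (assumption)
    sd:        standard deviation of the Gaussian kernel in bp
    """
    half_window = window / 2
    smoothed = []
    for centre in positions:
        weight_sum = 0.0
        value_sum = 0.0
        for pos, value in zip(positions, values):
            distance = pos - centre
            if abs(distance) <= half_window:
                weight = math.exp(-0.5 * (distance / sd) ** 2)
                weight_sum += weight
                value_sum += weight * value
        # The probe itself always falls in its own window, so weight_sum >= 1.
        smoothed.append(value_sum / weight_sum)
    return smoothed


# Hypothetical region of 3 copies (log2 ratio about 0.58) with noisy probes
# spaced 5 kb apart.
positions = [i * 5_000 for i in range(20)]
raw = [0.58 + (0.3 if i % 2 == 0 else -0.3) for i in range(20)]
smoothed = gaussian_smooth(positions, raw)
print(f"raw spread:      {max(raw) - min(raw):.2f}")
print(f"smoothed spread: {max(smoothed) - min(smoothed):.2f}")
```

The quadratic loop is kept for readability; for array-scale data a sliding window over sorted positions would be the natural optimisation. A region with very few probes leaves little to average over, which is in line with smoothing not being feasible for the small CDKN2A and MDM2 loci.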
Similar results were observed in a previous study of melanoma cell lines, with the distribution of log2 ratios of regions of 2 and 4 copies [8]. The distribution was best separated for the Agilent 185 K and 44 K arrays, whereas the Affymetrix 500 K and 100 K arrays showed intermediate results. For the Nimblegen 1500 K arrays, the distributions of log2 ratios of regions of 2 and 4 copies were indistinguishable. However, in line with the observations in this study, smoothing of the data improved the results for all platforms, with the most pronounced effect for the Nimblegen 1500 K arrays and Affymetrix 100 K arrays, the two lowest ranking in terms of signal to noise ratios [8].
The size of the small aberrations of CDKN2A and MDM2 was estimated manually, and this revealed very similar results for all microarray platforms (Table ). Only Affymetrix arrays showed a difference in size of the deletion of the CDKN2A locus between the replicate hybridisations for TC-32, detected to be 166 and 148 kb, respectively.
Detection of DNA copy number aberrations
The aberrant regions examined for signal response to copy number were scored using the analysis software provided for each microarray platform with default settings. In addition, data from all platforms were exported into Nexus (BioDiscovery) in order to make an independent scoring of the aberrations. This software also has advantages when it comes to downstream integration with other genome-level data. In Nexus, all four platforms were analysed using the rank segmentation algorithm with default settings. Table shows the scoring of the aberrant regions examined for signal response to copy number from the platform-specific analysis software as well as Nexus for one hybridisation from all platforms. The replicate hybridisation showed a similar pattern (data not shown). Detection of the homozygous deletion of the CDKN2A locus as well as the heterozygous deletion of 9p21.3-p21.2 in TC-32 using Nexus for all platforms is shown in Additional file 3.
Table 3. Detection of copy number of specific regions
Using the platform-specific analysis software, Affymetrix, Agilent and Illumina scored the correct copy number level for all regions representing 0, 1, 2, 3 and 4 copies in TC-32, and indicated the high-level amplification in OSA. The only exception was the scoring of 4 copies of 1q using the Illumina data, where the software segmented the region into segments of mainly 3 copies and some smaller segments of 4 copies. For Nimblegen, the corresponding analysis software segments the data and displays them, without giving copy number scores.
Using the platform-independent Nexus software, all regions were determined to have the correct copy number for the Affymetrix data (Table ). The region of 1 copy was over-scored as a homozygous deletion for Agilent, because of the low log2 ratios of this segment (average -0.97) and the threshold for homozygous deletion in Nexus (default log2 ratio < -1.0). The region of 4 copies was scored as one copy less for Illumina, most likely due to compression of the log2 values. For the Nimblegen data, Nexus detected the homozygous deletion as 1 copy and the 1q region as 3 copies instead of 4, most likely also due to compression of the log2 values.
In a previous study, Affymetrix 500 K arrays, Agilent 244 K arrays and Nimblegen 385 K arrays were compared for detection of submicroscopic constitutional aberrations [15]. In that study, using the corresponding analysis program, all 10 previously known abnormalities investigated were detected using the Agilent data, whereas one and three aberrations were not identified using the Affymetrix and Nimblegen data, respectively. However, using the software dChip in combination with an R script, all aberrations were detected using the Affymetrix data as well [15]. For the comparison of Affymetrix 500 K arrays, Agilent 44B arrays and Illumina Hap550 arrays, all known alterations in a leukaemia cell line were identified using both platform-specific software and a platform-independent analysis (circular binary segmentation) [14].
The overall number of copy number aberrations in chromosomes 1-22 detected by Nexus is given in Table , for both TC-32 and OSA for all microarray platforms and both replicate hybridisations. Detection of the copy number aberrations is given in Additional File 4. Nexus divides the copy number aberrations into four categories: homozygous copy loss, loss, gain and high copy gain, depending on the log2 ratio of the segments. The number of detected copy number aberrations was in general similar between the replicate hybridisations, except for the categories gain and loss in TC-32 by Affymetrix, where the replicates differed by 31 and 36 aberrations, respectively.
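Threshold-based scoring of segments into these four categories can be sketched as follows. Only the homozygous deletion cut-off (log2 ratio < -1.0) is taken from the text above; the other three thresholds are hypothetical placeholders chosen for illustration and should not be read as the Nexus defaults. The function name `classify_segment` is likewise an assumption.

```python
def classify_segment(mean_log2,
                     homozygous_loss=-1.0,  # default stated in the text
                     loss=-0.2,             # hypothetical placeholder
                     gain=0.2,              # hypothetical placeholder
                     high_gain=0.7):        # hypothetical placeholder
    """Assign a copy number category to a segment from its mean log2 ratio."""
    if mean_log2 < homozygous_loss:
        return "homozygous copy loss"
    if mean_log2 < loss:
        return "loss"
    if mean_log2 > high_gain:
        return "high copy gain"
    if mean_log2 > gain:
        return "gain"
    return "no change"


# Average log2 ratios quoted in the text for selected regions and platforms.
for label, value in [
    ("Agilent, CDKN2A homozygous deletion", -4.78),
    ("Nimblegen, CDKN2A homozygous deletion", -0.87),
    ("Illumina, 4 copies of 1q", 0.36),
    ("Agilent, MDM2 amplification", 4.72),
]:
    print(f"{label}: {classify_segment(value)}")
```

Under these cut-offs the Nimblegen average of -0.87 for the homozygous deletion stays above -1.0 and is scored as a loss, which illustrates how compression of the log2 ratios can shift a region into a neighbouring category.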
Table 4. Detection of copy number aberrations
The number of detected aberrations varied between the platforms, in general showing that microarrays with a higher number of probes detect more segments of copy number aberrations. Affymetrix showed by far the highest number of aberrations, as expected with the 1.8 million probes on the array, and most additional aberrations compared to the other platforms were small regions (Table and Additional File 4). However, Agilent, with only 236 K probes on the array, also showed a high number of small aberrations in the OSA cell line. Illumina identified approximately the same number of total aberrations as Agilent, whereas Nimblegen identified considerably fewer than the other platforms (Table ). Some differences between the platforms were observed for larger regions, for instance the heterozygous deletion of 4q in OSA detected by Agilent, which was partly detected by Illumina and not at all by Nimblegen. Affymetrix detected several small regions within 4q as deletions (Additional File 4).
Similar results were observed in the analysis of the total number of chromosome segments altered in a leukaemia cell line, where the highest resolution arrays (Illumina Hap550) showed the highest number of identified segments, followed by Affymetrix 500 K arrays and Agilent 44B arrays [14]. On the other hand, for the screening of chronic lymphocytic leukaemia, the lowest resolution array (Agilent 185 K) detected the highest number of platform-specific copy number aberrations, followed by Affymetrix 250 K arrays and Illumina 317 K arrays [9]. Most of these aberrations were smaller segments. For the aberrations detected in common by two of the platforms, most often the Affymetrix and Agilent platforms showed concordant results [9]. A comparison of copy number aberrations detected in 18 melanoma cell lines by Affymetrix 500 K arrays and Agilent 244 K arrays showed a similar number of total aberrations detected with a 29% overlap between the two platforms [8].
Scoring of loss of heterozygosity
An advantage of the Affymetrix and Illumina platforms is that they also provide global polymorphism information, and thus can indicate regions of LOH that could be involved in loss-of-function mutations, haploinsufficiency, etc. SNP analysis of the Affymetrix and Illumina data was performed in Nexus using the SNP-FASST segmentation algorithm with default settings.
In general, Affymetrix detected slightly more regions of allelic changes overall for both samples, but both platforms detected allelic changes in the regions with copy number aberrations scored by Nexus. In addition, regions of copy number-neutral allelic changes were identified, and detection of the copy number-neutral LOH of 1q in OSA using Nexus is shown in Additional file 5. Nexus divides the detection of allelic changes into two categories, LOH and allelic imbalance, depending on the distribution of the allelic ratio plot. For the copy number-neutral LOH of 1q in OSA, as well as other similar regions, the allelic changes were scored as LOH for the Illumina data, whereas the allelic changes were only scored as allelic imbalance for the Affymetrix data, due to a less defined distribution of the allelic ratio plot. The allelic ratios of the Illumina SNP data were overall better separated and thus more precisely scored, but all the allelic changes identified using the Illumina data were also identified using the Affymetrix data. Thus, the two platforms both perform well in detecting regions of allelic imbalance based on the SNP data.
Detection of LOH has previously been compared for Affymetrix 250 K arrays and Illumina 317 K arrays for chronic lymphocytic leukaemia [9]. Most loci were concordant between the two platforms, especially for regions > 4 Mb, but more differences were observed for smaller regions. The Illumina arrays showed in general a higher detection rate, in contrast with this study, but also a lower noise level in the LOH analysis, in agreement with this study.