The accurate and high-resolution mapping of DNA copy number aberrations (CNA) has become an important tool for biological and medical research. From understanding the extent of natural genetic variation [1], to associations with diseases such as HIV [2], to elucidating the mechanisms of tumourigenesis [3], such research is dependent on the quality of the data generated.
Numerous reports on the use and comparison of copy number profiling platforms have appeared [4], and more recently an approach to performing meta-analyses across such platforms has been described [11]. Early studies [12] suggested a high level of concordance between BAC-based aCGH and SNP-based platforms (Affymetrix 10 K array) in detecting CNA, but did not formally compare them. Greshock et al. performed the first systematic comparison of multiple platforms on melanoma cell lines and found that the Agilent 185 K arrays showed a high level of sensitivity and specificity, and that the increased probe density of the Affymetrix arrays (100 K and 500 K) resulted in increased confidence in detection for these platforms. These results were echoed by Gunnarsson et al., who also examined the performance of several older copy number profiling platforms (a 32 K BAC array, the Affymetrix 250 K SNP array, the Agilent 185 K oligonucleotide array, and the Illumina 317 K SNP array) in 10 chronic lymphocytic leukaemia (CLL) samples. They concluded that all platforms performed reasonably well at detecting large alterations, but that BAC probes were too large to detect small alterations. While Agilent offered the highest sensitivity, the increased density of the SNP-CGH platforms (Affymetrix and Illumina) compensated for their increased technical variability, with Affymetrix detecting a higher degree of CNA than Illumina. A further aCGH study did not compare platforms, but did investigate the influence of cellularity on copy number detection [13] and concluded that modern high-resolution arrays could cope with high levels of contamination.
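To make the effect of cellularity concrete, the following is a minimal sketch of the standard mixture argument, not a method taken from any of the studies cited above. It assumes a sample in which a fraction of cells (the cellularity) carry an aberration and the remainder are diploid normal cells, and it assumes an idealized platform whose log2 ratio is an unbiased measure of average copy number. The function name and parameters are hypothetical.

```python
import math


def expected_log2_ratio(copy_number: int, cellularity: float,
                        normal_ploidy: int = 2) -> float:
    """Expected log2 ratio of a region in a tumour/normal cell mixture.

    Assumptions (illustrative only): tumour cells carry `copy_number`
    copies of the region, contaminating normal cells carry
    `normal_ploidy` copies, and the platform responds linearly to the
    average copy number in the sample.
    """
    if not 0.0 <= cellularity <= 1.0:
        raise ValueError("cellularity must lie between 0 and 1")
    if copy_number < 0:
        raise ValueError("copy_number must be non-negative")
    # Average copy number across tumour and normal cells.
    mixed = cellularity * copy_number + (1.0 - cellularity) * normal_ploidy
    if mixed == 0:
        # Homozygous deletion in a pure tumour sample.
        return float("-inf")
    return math.log2(mixed / normal_ploidy)
```

Under these assumptions a single-copy gain yields an expected log2 ratio of about 0.58 in a pure tumour sample but only about 0.32 at 50% cellularity, which is why contamination makes aberrations harder to separate from noise.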
To attempt a fair and formal comparison of copy-number profiling platforms in a general setting is an almost futile exercise. Quantification of performance is difficult even with idealized data, and while measures such as the theoretical power to discover a single copy loss or gain [7], or the 'functional resolution' of the platform [6], have been proposed, these tend either to measure a very specific aspect of the platform or to appear flawed under close examination. Such idealized data are, in any case, difficult to obtain, as one has to ask what is fair in terms of numbers entering the experimental design. Should one Illumina array be compared to one Nimblegen array, or should the two-channel Nimblegen array be compared to two arrays from the single-colour technology? Should the two-colour platform be penalized by an inefficient design to allow easier comparison, or the SNP-based platform credited for the additional information that it brings? If, as is often the case, the main experimental constraint is financial, then comparing $1000 of one technology to $1000 of another technology would seem sensible. However, the relative costs of platforms will vary from laboratory to laboratory and with time, and such an approach would foist the authors' view of microarray economics on the reader.
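As an illustration of how narrow a measure such as theoretical power can be, the sketch below computes the power of a one-sided z-test on the mean log2 ratio of a set of probes. This is not the definition used in [7]; it is a textbook calculation that assumes independent probes with identical Gaussian noise, and the function name, parameters, and default significance level are all hypothetical.

```python
import math
from statistics import NormalDist


def detection_power(shift: float, probe_sd: float, n_probes: int,
                    alpha: float = 0.05) -> float:
    """Power to detect a mean log2-ratio shift across n_probes probes.

    Assumptions (illustrative only): probe noise is independent and
    Gaussian with standard deviation `probe_sd`, and detection is a
    one-sided z-test at significance level `alpha`.
    """
    if probe_sd <= 0 or n_probes < 1:
        raise ValueError("probe_sd must be positive and n_probes at least 1")
    std_normal = NormalDist()
    # Critical value of the one-sided test under the null of no change.
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    # Standard error of the mean over the probes covering the region.
    std_error = probe_sd / math.sqrt(n_probes)
    # Probability that the test statistic exceeds the critical value.
    return 1.0 - std_normal.cdf(z_alpha - abs(shift) / std_error)
```

In this sketch the result depends only on the ratio of the shift to the per-probe noise and on the number of probes covering the region, so a dense but noisy platform and a sparse but precise one can score identically. It says nothing about probe placement, breakpoint accuracy, or non-Gaussian artefacts, which is the sense in which such a measure captures only one specific aspect of a platform.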
Additionally, the results from such an exercise are only as good as the analysis methods used, and in that regard one has two options, both flawed. Naturally, the platforms will require different pre-processing strategies, but if different methods of analysis are also used for segmentation, then the performance of the technology will be confounded with the adequacy of the algorithm. This punishes newer technologies for which analytical methodologies are not yet mature. The alternative, to use a common approach for the analysis of all platforms, is undesirable firstly because that approach is likely to have been developed for one of the technologies and may thus introduce bias, and secondly because the deliberate use of a sub-optimal analysis does not provide useful information to inform decisions in the real world. Nonetheless, informative qualitative comparisons that illuminate the relative strengths and weaknesses of each platform can be made without performing segmentation. We acknowledge that some users will be primarily interested in a comparison based on existing analytical tools, rather than in the potential of each platform, but that is not the purpose of this study.
This study differs from previous comparative assessments of copy number profiling platforms in that we have attempted to characterize the strengths and weaknesses of various platforms in as unbiased a fashion as possible by avoiding measures that cannot be fairly computed, highlighting areas of potential bias, and emphasizing a graphical assessment of performance that provides insight about the underlying technology as well as the specific platform. Inevitably, despite considerable effort, these comparisons will be shaped by our own prejudices concerning copy number analysis, but we have made the raw data available for others to draw their own conclusions.
Due to the speed of platform development, it is typical for a platform to be superseded by one with a greater number of features before comparisons involving it are published. The generation of platforms described here has not yet been the subject of an in-depth comparison, but has already been superseded since this study was performed. Nonetheless, the underlying technologies are similar and a comparison is still informative. Implications for the new generations are discussed in the New Platforms section.
Herein we describe a comparison based on the analysis of two cell lines, six primary breast tumours, including matched normal samples, and two HapMap individuals. The SUM159 and MT3 cell lines and HapMap samples were selected based on the presence of known chromosomal aberrations, while the tumours are highly heterogeneous and hence present additional complexity for copy number analysis, not least with regard to their varying degrees of cellularity.
Here we present an analysis of probe coverage on each of the microarray platforms and a technical description of their reproducibility, sensitivity, and noise. We also provide an in-depth visual assessment of the ability of the different platforms to identify a range of sizes of copy number aberration. Lastly, we provide a publicly available dataset resulting from the processing of a range of samples (chosen to evaluate different abilities) on each platform. This information will allow interested parties to make decisions based on their own circumstances, preferences, and constraints.