In the 2007 Association of Biomolecular Resource Facilities Microarray Research Group project, we analyzed HL-60 DNA with five platforms: Agilent, Affymetrix 500K, Affymetrix U133 Plus 2.0, Illumina, and RPCI 19K BAC arrays. Copy number variation was analyzed using circular binary segmentation (CBS) analysis of log ratio scores from four independently assessed hybridizations of each platform. Data obtained from these platforms were assessed for reproducibility and the ability to detect formerly reported copy number variations in HL-60. In HL-60, all of the tested platforms detected genomic DNA amplification of the 8q24 locus, trisomy 18, and monosomy X; and deletions at loci 5q11.2~q31, 9p21.3~p22, 10p12~p15, 14q22~q31, and 17p12~p13.3. In the HL-60 genome, at least two of the five platforms detected five novel losses and five novel gains. This report provides guidance in the selection of platforms based on this wide-ranging evaluation of available CGH platforms.
Comparative genomic hybridization (CGH) measures DNA copy number differences between a reference genome and a test genome. The DNA samples are differentially labeled and hybridized to an immobilized substrate. In early CGH experiments, the DNA targets were hybridized to metaphase chromosome spreads in fluorescent in situ hybridization (FISH) assays. This technology later evolved so that the DNA targets are hybridized to microarrays containing cDNA fragments or bacterial artificial chromosomes (BACs). Recent commercial offerings from Agilent, Affymetrix, and Illumina derive copy number differences using oligonucleotide microarrays representing 500,000 or more loci. In most commercial assays, genomic DNA is labeled and hybridized to microarrays designed for single nucleotide polymorphism (SNP) genotyping analyses. Interestingly, Auer et al.1 have recently shown that expression microarrays, such as the Affymetrix U133 series, can also be used to identify copy number differences.
It has become apparent that copy number variants are quite common in the human genome and can have dramatic phenotypic consequences as a result of altering gene dosage, disrupting coding sequences, or perturbing long-range gene regulation. These DNA anomalies are associated with many genetic diseases including congenital anomalies, developmental delay, and mental retardation. As a result, many arrays have been designed to diagnose these DNA alterations as well as to detect gains and losses of tumor suppressor and oncogenes.2,3 Identifying the specific segmental genomic alterations and the genes they contain will yield molecular targets for diagnostics and therapy. For these treatments to be effective, a reliable and accurate identification of genomic alterations associated with a given disease is essential. The following paragraphs identify recent studies that characterize the reliability and accuracy of various CGH technologies.
Huang et al.4 identified multiple regions of amplification and deletion using whole genome sampling analysis (WGSA) on a panel of human breast cancer cell lines. Their WGSA simultaneously genotyped over 10,000 SNPs by allele-specific hybridization to perfect match and mismatch probes synthesized on a single array. With a mean inter-SNP distance of 244 kb, they obtained a resolution primarily attributed to their high-density oligonucleotide array.4
A research group including Wellcome Trust, Affymetrix, the University of Tokyo, and others employed Affymetrix GeneChip Human Mapping 550K Early Access arrays and clone-based CGH on 270 HapMap samples. They identified 1447 copy number variations (CNVs) ranging in size from 960 bp to 3.4 Mb. These CNVs contained hundreds of genes, disease loci, functional elements, and segmental duplications, and provided the framework for the first comprehensive global map of human CNVs.5,6
By hybridizing genomic representations of breast and lung carcinoma cell line and lung tumor DNA to SNP arrays, and measuring locus-specific hybridization intensity, Zhao et al.7 detected both known and novel genomic amplifications and homozygous deletions in these cancer samples. Comparison with BAC and cDNA array analysis showed that the three platforms gave generally comparable results. The BAC arrays showed the highest signal-to-noise ratios, making them better suited to detect single-copy alterations. However, the SNP arrays allow copy number changes and genotype to be measured in the same experiment.7
Recent advances in array-based CGH technology have refined the determination of chromosomal gains and losses. These refinements are dependent on improved arrayCGH performance characteristics that have been evaluated in recent review articles.8–10 Coe et al.9 have defined a “functional resolution” for arrayCGH technology that incorporates the size and uniformity of element spacing on the array as well as the sensitivity of each platform to single-copy alterations. They propose that the detection sensitivity of an array is best described by the probability of detecting any alteration of a given size.
The goal of the 2007 Association of Biomolecular Resource Facilities (ABRF) Microarray Research Group (MARG) project was to assess the ability of current technologies to detect chromosomal aberrations. For this assessment we selected five CGH platforms, a test genome with a variety of known gains and losses, and analysis software that would facilitate comparison of the resolution of each platform. At the time of the study, the five platforms represented the state of the art for detecting chromosomal aberrations: Agilent CGH 44B Microarray, Illumina HumanHap 550 BeadChip, Affymetrix GeneChip Human Mapping 500K Array Set, a human BAC19K array developed by Roswell Park Cancer Institute, and the Affymetrix Human Genome U133 Plus 2.0 gene expression array. Each platform was assessed on its repeatability between replicates and on detection of the reported gains and losses in the HL-60 cell line compared with reference DNA.
Genomic DNA was isolated from HL-60 leukemia cells11 and human female normal lung DNA was purchased from the Biochain Institute (Hayward, CA). DNA purity was assessed by measurement of the 260/280-nm absorption ratio. The HL-60 tumor line was derived from the bone marrow cells of a patient with acute myelogenous leukemia.12
We relied on two CGH studies to compile the reference set of HL-60 genetic alterations. To characterize the HL-60 cell line genetic alterations, Ulger et al.13 employed microarrays with 1003 non-overlapping BAC and PAC clones that provided an average resolution of 3 Mb. These investigators detected 10 copy number changes in the HL-60 cell line: amplification at 8q24, trisomy 18, and monosomy X; as well as deletions at loci in 5q11.2~q31, 6q12, 9p21.3~p22, 10p12~p15, 14q22~q31, 16q21, and 17p12~p13.3. A more recent paper by Peiffer et al.,14 which used whole-genome BeadChip microarrays (Illumina, Hayward, CA) to assay up to 317,000 SNP loci, detected 8 of the 10 chromosomal aberrations reported in the Ulger et al.13 research. HL-60 loss at chromosomes 6q12 and 16q21 was not detected by Peiffer et al.14 and therefore not included in our reference set. The nucleotide start-and-stop positions of the 8 reference variants used in this study are listed in Supplemental Material.
Detailed descriptions of each of the selected platforms are provided below and summarized in Table 1. The physical locations for all probes are reported as NCBI Build 35 genome assembly, also referred to as HG17 (http://genome.ucsc.edu). For the Affymetrix (Santa Clara, CA) U133 platform, the physical position is reported as the central nucleotide for a probe set, after performing a reannotation based on probe sequences.1 No location is reported for probe sets with less than 50% homology to Build 35.
DNA printing solutions were prepared from sequence-connected Roswell Park Cancer Institute-11 BAC clones (RPCI; Buffalo, New York) by ligation-mediated polymerase chain reaction (PCR) as described previously.15–17 The minimal tiling RPCI BAC array contains ~19,000 BAC clones that were chosen by virtue of their sequence tagged site (STS) content, paired BAC end-sequence, and association with heritable disorders and cancer. The backbone of the array consists of ~4600 BAC clones that were directly mapped to specific, single-chromosomal positions by FISH.15 Each clone is printed in duplicate on amino-silanated glass slides (Schott Nexterion type A+) using a MicroGrid II TAS arrayer (Genomic Solutions, Ann Arbor, MI). The BAC DNA products have ~80 μm diameter spots with 150 μm center-to-center spacing creating an array of ~39,000 elements. The printed slides dry overnight and are UV-crosslinked (350 mJ) in a Stratalinker 2400 (Stratagene, La Jolla, CA) immediately before hybridization. A complete list of the RPCI-11 BAC clones spotted on the 19K microarray can be found at http://microarrays.roswellpark.org.
Genomic DNA (1 μg) was fluorescently labeled using random primers and Klenow in the BioArray CGH Labeling System (Enzo Life Sciences, Farmingdale, NY). The four HL-60 targets were labeled with Cy3-modified nucleotides. The four normal lung targets were labeled with Cy5-modified nucleotides. Prior to hybridization, each HL-60 target was coprecipitated with each normal lung target in the presence of 100-μg human fluorimetric Cot-1 DNA (Invitrogen, Carlsbad, CA) to block repetitive elements. The four targets were hybridized to four 19K BAC microarrays in SlideHyb Buffer #3 (Ambion, Austin, TX) including yeast tRNA (Invitrogen). Hybridization and washing occurred in a GeneMachine hybridization station (Genomic Solutions) as described.18 The slides were scanned immediately at 5-μm resolution using a GenePix 4200AL laser scanner (Molecular Devices, Sunnyvale, CA) in both Cy3 (HL-60) and Cy5 (normal lung) channels.
Image analysis was performed using the ImaGene version 7.0.1 software (BioDiscovery, El Segundo, CA). Low-intensity and low-quality spots were flagged and excluded from further processing. Copy number was estimated using the log2 ratios of the HL-60/normal lung signals which were normalized using a subgrid LoESS procedure, with the clones on the sex chromosome given a weight of 0. Replicate values on the same microarray were averaged. Regions of segmental duplications and regions of large scale variation in the human genome were identified as previously described.19–22
The protocol for CGH analysis using gene expression microarrays has been described elsewhere.1 Briefly, four HL-60 targets and four normal targets were generated by DNaseI fragmentation of genomic DNA (10 μg) followed by biotin labeling using terminal transferase. The eight targets were individually hybridized to eight GeneChip® Human Genome U133 Plus 2.0 arrays (Affymetrix). The hybridized arrays were washed, stained with streptavidin-phycoerythrin, and scanned for fluorescent signals using standard gene expression protocols (Affymetrix). Signals for individual probe sets were calculated using the robust multiarray average algorithm.23 After log2 transformation, the median signal for each set of four replicates was determined. Copy number was estimated from the difference between the individual replicate log2 HL-60 signals and the median log2 normal lung signal, which is equivalent to the HL-60/normal lung ratio. Statistical significance was defined as p < 0.05 using a two-tailed test assuming equal variance.
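The copy number estimate described above for the expression arrays can be illustrated with a minimal Python sketch. The function name and the intensity values are hypothetical and are not taken from the study; the sketch assumes linear-scale signals as input and omits the significance test.

```python
import math
import statistics


def log2_ratios_vs_median_reference(test_signals, reference_signals):
    """Return one log2(test/reference) estimate per test replicate for a single probe set.

    The reference value is the median of the log2-transformed reference signals.
    """
    ref_median_log2 = statistics.median(math.log2(s) for s in reference_signals)
    return [math.log2(s) - ref_median_log2 for s in test_signals]


# Hypothetical linear-scale signals for one probe set (illustration only).
hl60_signals = [200.0, 210.0, 190.0, 205.0]
normal_signals = [400.0, 420.0, 380.0, 400.0]

ratios = log2_ratios_vs_median_reference(hl60_signals, normal_signals)
for replicate, ratio in enumerate(ratios, start=1):
    print(f"replicate {replicate}: log2 ratio = {ratio:.3f}")
```

A value near -1 on this scale corresponds to roughly half the reference signal, which is what a single-copy loss would produce in the absence of background.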
The Affymetrix GeneChip Human Mapping 500K Array Set is comprised of two arrays, each capable of genotyping on average 250,000 SNPs (approximately 262,000 for Nsp arrays and 238,000 for Sty arrays). The probe sequences are based on restriction enzyme digests. The resulting arrays have a nonuniform probe distribution with a median physical distance between SNPs of 2.5 kb and an average distance between SNPs of 5.8 kb. Eighty-five percent of the human genome is within 10 kb of an SNP. The assay requires a sample size of 250 ng of raw genomic DNA. Four replicates of the HL-60 sample were hybridized to individual arrays. The platform-specific analysis relied on Partek Genomics Suite software (http://www.affymetrix.com/products/software/compatible/index.affx).
For the CBS analysis, a random set of 40 samples taken from the HapMap cohort were used as a reference. Copy number was estimated from log2 ratios which were calculated for each HL-60 replicate relative to the mean of the HapMap controls. All initial processing including normalization and construction of ratios was performed in Genotyping Console v. 2.1 (Affymetrix) using default parameters for the 500K microarray.
Agilent Technologies has developed a 44B arrayCGH platform with 60-mer oligonucleotide probes synthesized in situ using inkjet technology. It includes 40,000 probes that span the human genome with an average spatial resolution of approximately 75 kb, including coding and noncoding sequences. It includes one probe per gene for RefSeq and GenBank known genes and three probes for each of approximately 1100 known cancer genes of importance. This platform requires 25 ng of total genomic DNA to detect chromosomal changes across the entire genome.
Three pairs of HL-60 and normal lung targets were prepared from 0.3 μg of input DNA, while the fourth pair was prepared on a separate day from 3.0 μg of DNA. Copy number was estimated using normalized log ratios for each replicate and these were obtained from Agilent CGH Analytics software using the aberration detection method 1 (ADM-1 or “adamone”) algorithm.
ADM-1 is an aberration algorithm that identifies all aberrant intervals in a given sample with consistently high or low log ratios based on the statistical score. The ADM algorithms search for intervals in which the average log ratio of the sample and reference channels exceeds a user-specified threshold. In contrast to the Z-score algorithm, the ADM algorithms do not rely upon a set window size, instead sampling adjacent probes to arrive at a robust estimation of the true range of the aberrant segment. The output differs from that of the CBS algorithm by reporting statistically significant aberrant regions, allowing rapid genomic assessment. The ADM-1 statistical score is computed as the average normalized log ratios of all probes in the genomic interval multiplied by the square root of the number of these probes. It represents the deviation of the average of the normalized log ratios from its expected value of zero. The ADM-1 score is proportional to the height h (absolute average log ratio) of the genomic interval, and to the square root of the number of probes in the interval. Roughly, for an interval to have a high ADM-1 score, it should have high height and/or include a large number of probes.
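As a small illustration of the score described above, the following Python sketch computes the mean normalized log ratio of an interval multiplied by the square root of the number of probes. It covers only the score for a given interval, not the interval search performed by the Agilent software; the function name and input values are hypothetical. The sign is kept so that losses give negative scores, whereas the text describes the height as an absolute value.

```python
import math


def adm1_score(log_ratios):
    """Signed ADM-1-style score: mean log ratio times sqrt(number of probes)."""
    n = len(log_ratios)
    if n == 0:
        raise ValueError("interval must contain at least one probe")
    mean_log_ratio = sum(log_ratios) / n
    return mean_log_ratio * math.sqrt(n)


# Hypothetical intervals: same height, different probe counts.
short_gain = adm1_score([0.5] * 4)    # 0.5 * sqrt(4)
long_gain = adm1_score([0.5] * 16)    # 0.5 * sqrt(16)
long_loss = adm1_score([-0.5] * 16)   # negative for a loss

print(short_gain, long_gain, long_loss)
```

The comparison of the two gains shows the point made in the text: at equal height, the interval with more probes receives the higher score.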
The Illumina Infinium Whole-Genome Genotyping assay provides high-resolution profiling of both loss of heterozygosity and DNA copy number changes. This technology relies on BeadChips that are constructed by random assembly of bead pools into microwell patterned stripes on a silicon substrate. Each stripe is loaded with a unique bead pool composed of tens of thousands of different bead types for a total complexity of hundreds of thousands of bead types across the BeadChip. Processed measurements from these BeadChips provide normalized intensity and allelic ratio for each SNP. The platform performs a parallel assay of approximately 550K locations in the genome. Similar to the Affymetrix platform, a single sample is hybridized to each array. Four replicates of the HL-60 sample were hybridized to individual arrays. Copy number was estimated using the log2 ratio of each hybridization relative to a control dataset included in the Illumina BeadStudio software. This software was utilized for all normalization and ratio construction for the four Illumina replicates using manufacturer recommended settings.
Software has been developed by many groups for data visualization and statistical analysis as a result of the increase in arrayCGH applications and the diversity of platform technologies. A recent article has identified the breadth of this software development2 and another has evaluated several analytical techniques.24 Lai et al. compared 11 different algorithms for analyzing arrayCGH data including both segment detection methods and smoothing methods.24 They found that some segmentation methods, especially CGHseg25 and circular binary segmentation (CBS),26 appear to perform consistently well.24 For consistency in comparisons, we chose CBS to identify gains and losses from all of our selected platforms.
All CBS analyses were performed using the DNAcopy package in R/Bioconductor. Log ratios were used as input and each replicate on each platform was run independently. All runs were implemented with the default CBS settings. The final set of segments, segment scores (smoothed log ratios for the segment), and associated markers were output from each run. Copy number calls (gain or loss) were assigned to a segment after applying a threshold to each segment (described below).
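The final calling step can be sketched in a few lines of Python. This is an illustration of thresholding segment scores only, not of the segmentation itself; the function name, the threshold values, and the segment scores are hypothetical placeholders, since the thresholds actually used are reported in the Supplemental Material.

```python
def call_segment(score, loss_threshold, gain_threshold):
    """Assign a copy number call to one segment from its CBS segment score."""
    if score <= loss_threshold:
        return "loss"
    if score >= gain_threshold:
        return "gain"
    return "no change"


# Hypothetical thresholds and segment scores (illustration only).
LOSS_THRESHOLD = -0.3
GAIN_THRESHOLD = 0.25

segment_scores = [-0.85, -0.05, 0.02, 0.40]
calls = [call_segment(s, LOSS_THRESHOLD, GAIN_THRESHOLD) for s in segment_scores]
print(calls)
```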
Reproducibility of CBS segment score across the four replicates summarized in Figure 1 was examined in the following fashion. Platform-to-platform variation in CBS segment score indicating a potential variant was normalized across platforms (by scaling). Scaled CBS segment scores were then mapped to each marker associated with the segment. The mapped scores were then averaged across four replicates for each marker and treated as the “fitted” value. Root mean squared error (RMSE) was computed for each individual segment from each array by comparing the segment score relative to the fitted value for each marker in the segment. Log10(RMSE for a segment) was then plotted versus the log10(# of markers in the segment). As this creates thousands of points for the Illumina and Affymetrix array systems, we then implemented a lowess smoothing procedure per platform to visualize the results.
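The RMSE procedure described above can be sketched as follows, under several assumptions: markers are indexed 0 to n-1 along the genome, each replicate is a list of hypothetical (start, end, score) segments with an exclusive end index, scores are already scaled across platforms, and the segments of each replicate cover every marker. Two replicates are used for brevity where the study used four. The lowess smoothing step is not shown.

```python
import math


def per_marker_scores(segments, n_markers):
    """Map each segment score onto the markers the segment contains."""
    scores = [0.0] * n_markers
    for start, end, score in segments:
        for marker in range(start, end):
            scores[marker] = score
    return scores


def segment_rmse(replicates, n_markers):
    """Return (replicate index, start, end, marker count, RMSE) for every segment."""
    mapped = [per_marker_scores(segs, n_markers) for segs in replicates]
    # "Fitted" value: mean mapped score across replicates at each marker.
    fitted = [sum(column) / len(column) for column in zip(*mapped)]
    results = []
    for rep_index, segs in enumerate(replicates):
        for start, end, score in segs:
            squared = [(score - fitted[m]) ** 2 for m in range(start, end)]
            rmse = math.sqrt(sum(squared) / len(squared))
            results.append((rep_index, start, end, end - start, rmse))
    return results


# Hypothetical data: two replicates whose breakpoint differs by one marker.
replicates = [
    [(0, 2, -1.0), (2, 4, 0.0)],
    [(0, 3, -1.0), (3, 4, 0.0)],
]
results = segment_rmse(replicates, n_markers=4)
for rep_index, start, end, n, rmse in results:
    print(f"replicate {rep_index}, markers {start}-{end - 1}, n={n}, RMSE={rmse:.3f}")
```

Segments whose endpoints agree across replicates get an RMSE of zero, while a shifted breakpoint contributes error only at the markers where the replicates disagree.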
To determine optimal thresholds for terming a segment as a putative variant, we examined plots of CBS segment scores relative to segment size as well as cumulative distributional plots of CBS segment scores (see Supplemental Material). Change point analysis of ordered CBS segment scores related to loss was relatively straightforward for all platforms. However, for gains, change point analysis did not have the same clarity. Some platforms (such as Agilent 44B) did not have an obvious threshold for gains. Therefore, we assumed integer gains and losses and used a simple model, segment score = $\log_2[(d + Bg)/(2 + Bg)]$, where $Bg$ stands for general background and $d$ represents either the smallest loss (1 copy) or a gain (3 copies) relative to typical (2 copies). Because $(3 + Bg)/(2 + Bg) = 2 - (1 + Bg)/(2 + Bg)$, it follows that if $l_t$ is the threshold for loss, then $g_t = \log_2(2 - 2^{l_t})$ is the threshold for gain. This simple model was used for Agilent 44B, Affymetrix U133 Plus 2.0, and Affymetrix 500K. However, the data strongly suggested different gain thresholds for BAC and Illumina arrays. The final thresholds chosen as well as CBS segment scores relative to segment length are shown in Supplemental Material.
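The relationship between the loss and gain thresholds under this simple background model can be checked numerically. The Python sketch below is illustrative only: the function names and the background value are hypothetical, and the thresholds actually chosen are those in the Supplemental Material.

```python
import math


def expected_score(copies, background):
    """Model segment score for a given copy number: log2((d + Bg) / (2 + Bg))."""
    return math.log2((copies + background) / (2.0 + background))


def gain_threshold_from_loss(loss_threshold):
    """Gain threshold implied by a loss threshold: log2(2 - 2**lt)."""
    value = 2.0 - 2.0 ** loss_threshold
    if value <= 0.0:
        raise ValueError("loss threshold must be below 1 for the model to apply")
    return math.log2(value)


# Hypothetical background level (illustration only).
background = 0.5
loss_t = expected_score(1, background)       # single-copy loss
gain_t = gain_threshold_from_loss(loss_t)    # implied single-copy gain

print(f"loss threshold = {loss_t:.3f}, gain threshold = {gain_t:.3f}")
print(f"direct model value for 3 copies = {expected_score(3, background):.3f}")
```

With no background, the model gives the familiar values of -1 for a single-copy loss and about 0.585 for a single-copy gain; increasing the background compresses both toward zero.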
The value for each replicate in a reference region was assigned by calculating the mean of segments that surpassed the threshold, weighted by their segment length (number of markers in the segment). This value was retained along with the percentage of the region that is covered after applying the threshold. Both measures are presented in Figure 2.
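A minimal Python sketch of this summary is given below, under stated assumptions: segments are hypothetical (start, end, marker count, score) tuples that do not overlap one another, coverage is measured in base pairs of overlap with the reference region, and each passing segment is weighted by its full marker count. The study's exact handling of segments that only partly overlap a region is not specified here, so those choices are illustrative.

```python
def summarize_region(segments, region_start, region_end, threshold, direction="loss"):
    """Return (marker-weighted mean score, percent of region covered) for passing segments."""
    passing = []
    covered_bp = 0
    for start, end, n_markers, score in segments:
        overlap = min(end, region_end) - max(start, region_start)
        if overlap <= 0:
            continue  # segment lies outside the reference region
        if direction == "loss":
            is_call = score <= threshold
        else:
            is_call = score >= threshold
        if is_call:
            passing.append((n_markers, score))
            covered_bp += overlap
    if not passing:
        return None, 0.0
    total_markers = sum(n for n, _ in passing)
    weighted_mean = sum(n * s for n, s in passing) / total_markers
    percent_covered = 100.0 * covered_bp / (region_end - region_start)
    return weighted_mean, percent_covered


# Hypothetical segments and reference region (illustration only).
segments = [
    (0, 600, 30, -0.8),
    (600, 900, 10, -0.4),
    (900, 1500, 20, 0.0),
]
weighted_mean, percent_covered = summarize_region(
    segments, region_start=0, region_end=1000, threshold=-0.3, direction="loss"
)
print(weighted_mean, percent_covered)
```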
The data files generated from CGH consist of log ratios of signal intensities from disease or selected DNA compared with a control DNA, indexed by physical location of each probe on the array. The goal is to detect discontinuities in signal intensities that represent break points in copy number of physically contiguous genomic regions. In our study, five platforms were assessed for repeatability of the log-ratio signal and CNV detection. The probe annotation on all platforms was standardized to sequence information in NCBI Build 35 / HG17 [http://genome.ucsc.edu].
The CGH platforms were selected to represent both traditional microarrays containing large BAC clones and more recent high-density oligonucleotide microarrays (Table 1). Most of the oligonucleotide arrayCGH platforms were designed for DNA analysis. However, we included a novel approach that calculates DNA copy number using microarrays designed for RNA expression profiling.1 Genomic DNA from HL-60 leukemia cells11,27 and normal human female lung (Biochain Institute) was distributed to three sites for replicate CGH analyses on five different platforms. Each site relied on platform-specific protocols for target preparation, hybridization, quality assessment, and image analysis methods.
The number of hybridizations and the source of normal reference data varied between platforms (Table 1). Two platforms used traditional (two-color) CGH methods, where both HL-60 and normal lung targets were hybridized to the same microarray. Three platforms hybridized HL-60 or lung targets to separate (one-color) microarrays. For the Illumina and Affymetrix genotyping microarrays, gains and losses were evaluated by comparing HL-60 signals with reference datasets provided by the manufacturers (see Materials and Methods). For the novel approach using Affymetrix gene expression microarrays, HL-60 signals were compared with a reference value based on the median of four hybridizations from normal human DNA.
We initially compared the repeatability of the five array-CGH platforms by calculating standard deviations in the HL-60 ratios for each probe on the microarrays across the four replicate hybridizations (Table 2). The median standard deviation of the BAC technology is approximately three- to fivefold lower than that of the oligonucleotide arrayCGH platforms. However, when the standard deviation is normalized by the square root of the number of measurements (a proxy for a type of standard error of the mean under certain assumptions), then all of the platforms had similar levels of variation per probe.
Next we determined the ability of the five arrayCGH platforms to detect a reference set of HL-60 gains and losses. This evaluation used circular binary segmentation (CBS) as well as many manufacturer-recommended analysis methods for each platform (see Materials and Methods). The Supplemental Material contains chromosome depictions for some of the arrayCGH platforms. With the exception of the gene expression platform, individual hybridizations were evaluated separately. Previous CGH analyses of the HL-60 cell line detected monosomy X, trisomy 18, five regions of genetic loss, and one region of amplification in 8q24.13,14,28 Each of these reference changes was detected in all array-CGH platforms by platform-specific and CBS analyses (data not shown). In addition, the lower density microarrays reported further chromosome alterations (see Supplemental Material) that were sometimes detected by a single platform. The absence of additional changes in the high-density platforms using the platform-specific analysis methods may be related to the stringent thresholds and/or the data filtration processes (see Materials and Methods). For example, the Genomics Suite (Partek) software used for Affymetrix 500K analysis included repeated data smoothing and was set to exclude copy number variations less than 0.5 Mb.
While the platform-specific analyses are tailored for individual platforms, the differences in data filtering and smoothing approaches limited the extent of our arrayCGH comparison. Therefore, we also compared arrayCGH results derived from a common analysis method, CBS.26 Based on signal ratios of HL-60 to either reference data or cohybridized normal lung DNA, the CBS algorithm examines regions across multiple consecutive probes and divides chromosomes into contiguous segments with similar copy number ratios. The score for each chromosome segment reflects both the direction and the magnitude of any copy number change. Chromosome regions with no copy number changes have a CBS score near 0. Negative scores reflect chromosome loss and positive scores indicate gain.
For each hybridization in the five arrayCGH platforms, CBS scores and chromosome segments were calculated using the R/Bioconductor software (DNAcopy). The numbers of CBS segments per hybridization are listed in Table 3 and the distributions of segment sizes are presented in Figure 1B. Individual hybridizations using the same arrayCGH platform generated a similar number of chromosome regions. Interestingly, the number of probes on the arrayCGH platform did not predict the number of chromosome segments. Low-density platforms divided the genome into larger chromosome segments. For example, the BAC and Agilent platforms lacked any chromosome segments less than 100,000 kb, while the segment size for other platforms varied from 1000 kb to over 100 Mb (Fig. 1B).
While probes in the same CBS chromosome segment have the same CBS score, the endpoints of that segment may vary slightly between individual hybridizations. Therefore, we developed a novel RMSE approach to evaluate repeatability of CBS scores between replicates using the same arrayCGH platform (see Materials and Methods). Log10 plots of the RMSE results illustrate the relative variance in segments with the same number of probes (Fig. 1A) or segments of the same length (Fig. 1B). Large chromosome segments and those containing the most probes tended to be less variable. Given the same number of markers on a segment, the BAC arrayCGH platform generally had 10-fold lower RMSE scores indicating the highest repeatability between replicates (Figure 1A). This result may be somewhat expected given that the BAC microarrays had the lowest median standard deviation in the platform-specific analyses. However, all platforms had similar RMSE scores when segments of equal length were considered (Figure 1B).
For each arrayCGH platform, the distribution of CBS scores for all segments identified in the four hybridizations is presented in the Supplemental Material. Interestingly, all platforms reported more negative CBS scores, indicative of chromosome losses, than positive scores. This finding coincides with previous findings of more losses than gains in the HL-60 cells.13,14
Detection of the reference set of HL-60 alterations was investigated using CBS analysis of log ratios from four technical replicates from each platform. Unique thresholds for defining chromosome loss or gain were developed for each platform based on its distribution of CBS scores as described in Materials and Methods and Supplemental Material.
The value for each replicate in a reference region was assigned by calculating the mean of segments that surpassed the threshold, weighted by their segment length (number of markers in the segment). This value was retained along with the percentage of the region that is covered after applying the threshold. Both measures are presented in Figure 2.
All platforms detected eight of the eight reference gains and losses in most of the individual hybridizations. In addition, for the regions where there is strong segment score agreement, some indicate different levels of the portion of the region affected. For example, for the 5q11.2–5q31 loss, all but one platform indicates that there is approximately a 90% loss in that region. The lone exception is the Affymetrix U133 Plus 2.0 system which indicates a wide range of the region portion having a loss (0%–75%). We see similar behavior for 9p21.3–9p22. However, at 17p12–17p13.3, the Illumina array shows the widest range. In general, the BAC array shows the most repeatability for the eight regions of HL-60 examined when it comes to estimating the portion of the reference region affected, followed closely by the Affymetrix 500K.
Further genetic alterations were detected in some, but not all, of the platforms studied. For example, a review of five gain and five loss events is presented in Supplemental Material, S11 in the CBS Analysis. It is likely that several of these novel CNVs represent true alterations in the HL-60 cell line, given that the identification of these novel regions was conducted in the same way as for the reference regions.
There are other platform features to consider when selecting a platform for arrayCGH research. Some practical features are detailed in Table 4. Sample-to-protocol completion times were accomplished within 2 days for most platforms. Similarly, target labeling prices ranged from $125 to $180 per sample (varied by a factor of 1.4) depending on an institution’s pricing schedule. One feature that may distinguish these platforms is the volume of resulting data generated from one sample and the capacity for each laboratory to process the data into copy number values. The highest density microarrays, Illumina Hap550 and Affymetrix 500K, each contain more than 500,000 probes resulting in relatively large data files to be used in downstream analysis.
This paper is the first to compare CGH results from a BAC and three genotyping oligonucleotide microarray platforms. It also includes a novel arrayCGH platform using an RNA expression microarray to detect DNA copy number variation.1 Using both platform-specific analyses and a common CBS approach, we found that all five of the arrayCGH platforms detected 100% of the eight previously reported CNVs in almost all replicates. Our results suggest that, at this level of resolution, the selection of an arrayCGH platform may depend more on practical considerations such as price than on a substantial difference in technical performance.
CGH analysis methods are still evolving and are often optimized to particular platform features or applications. In order to provide a less biased review, the CGH results for each platform in this study were analyzed by two different approaches. The platform-specific analyses detected all of the eight reference regions in all of the platforms. Likewise, the common CBS analysis method detected the eight reference regions. It should be noted that the common analysis method was based on the well-characterized CBS algorithm26 and included unique detection thresholds for each platform, but has not been optimized or repeatedly validated. The same default software settings were used for both oligonucleotide and BAC platforms, which resulted in approximately 2000 segments per Illumina Hap550 hybridization, but fewer than 350 segments per hybridization with the other arrayCGH platforms, including the Affymetrix 500K microarray, which has a probe density similar to that of the Illumina microarray.
Unlike microarray-based expression profiling, most arrayCGH studies do not include replicate hybridizations of the same DNA sample. Therefore, the repeatability of the individual probe or overall analysis results is a key consideration during platform selection. This study includes four replicate hybridizations for each platform and the reproducibility of the results was confirmed at multiple levels. As shown in previous research,29 the BAC platform showed the smallest level of probe variation when the overall standard deviation (Table 2) or RMSE values (Fig. 1A) were considered. However, all platforms showed a similar level of variation per probe when normalized to the number of measures on each platform (Table 2 and Fig. 1B). Detection of reference CNV regions and the number of CBS segments per hybridization (Table 3) were generally consistent between replicates on all platforms, although the weighted segment scores and region detected percentages derived from the common CBS analysis sometimes fluctuated (Fig. 2). Overall, these results are encouraging for the use of single hybridizations on any of the arrayCGH platforms.
Other arrayCGH studies have relied on male versus female differences in X chromosome probes to model known copy number differences.30 In contrast, we examined normal versus tumor DNA derived from HL-60 cells. This comparison may have been especially challenging given the choice of the HL-60 cell line, which contains multiple CNVs, and because subpopulations in the cell line may result in noninteger changes in copy number.
The capability of arrayCGH to detect changes in genomic regions throughout the genome depends on the size and positioning of clones on the array. BAC arrays, with a resolution in the 150-kb range, typically have large segments from 100 to 160 kb, whereas oligonucleotide arrayCGH platforms contain shorter segment sizes on the order of 50 to 100 kb. The high-density oligonucleotide arrays theoretically offer higher resolution, but reportedly may not be as robust as the BAC platform.24
Theoretically, oligonucleotide array platforms should provide improved detection of gains and losses compared with BAC arrays because the spatial resolution is in the 35-kb range and the number of probes per array is increased compared with BAC arrays. Due to their small target size, however, oligonucleotide arrays suffer from poorer signal-to-noise ratios, which often result in a significant number of false-positive outliers. Typically 20–50 adjacent oligonucleotides are necessary for a reliable call (i.e., >90%). Thus, identification of regions of CNV requires the use of statistical tools and more complex algorithms. In this report, the evaluation of oligonucleotide arrays, including Agilent, Illumina 550K, Affymetrix 500K, and Affymetrix U133 Plus 2.0, and a custom BAC array showed good detection of the previously reported gains and losses for the HL-60 cell line as well as finding a number of novel CNVs.
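A simple way to see why many adjacent probes are pooled before a call is made: if probe noise is assumed to be independent with equal variance (an idealization that real arrays only approximate), the standard error of a window's mean log ratio shrinks with the square root of the number of probes. The function name and the example values are illustrative.

```python
import math


def window_standard_error(probe_sd: float, n_probes: int) -> float:
    """Standard error of the mean log ratio over n_probes adjacent probes.

    Assumes independent probe noise with a common standard deviation.
    """
    if n_probes < 1:
        raise ValueError("n_probes must be at least 1")
    return probe_sd / math.sqrt(n_probes)
```

For example, with a hypothetical per-probe standard deviation of 0.5, averaging 25 adjacent probes brings the standard error of the window mean down to 0.1.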
In the absence of any substantial technical differences, platform selection will also be based on practical considerations and intended application. In this study, novel HL-60 deletions were detected by some, but not all, platforms (see Supplemental Material). This detection difference may be related to platform resolution and/or analysis settings. Therefore, some arrayCGH platforms may be best suited for discovery research, while others are preferable for diagnostic applications. In this study, the gene expression platform1 generated unexpectedly comparable results to other arrayCGH methodologies. This alternative approach may be particularly useful for genomes where genotyping arrays are not yet available or in combined expression-genotype studies where a single set of probes might be useful.
This study was initiated in December 2006 and so utilized microarray products available at that time. Significant advances in the array products and CGH software have since been made; for example, commercial arrays with more than 1 million probes (or probe sets) are now available. However, the methods and results in this study will be useful in evaluating these new arrayCGH products, as basic microarray hybridization technology remains the same.
Identifying the specific segmental genomic alterations and the genes they contain will enrich our understanding of disease processes and may identify molecular targets for improved diagnosis and subsequent therapeutic strategies. As CGH technologies continue to evolve, it is important to evaluate their performance so that practitioners with limited resources may have guidance in platform selection.
|Reference Region||Chromosome||Start Nucleotide||End Nucleotide||Variant Type|
|ArrayCGH Platform||Loss 2q||Loss 4p||Loss 9p12~q21||Loss 16q23||Loss 16||Gain 17||Gain 19|
|Custom BAC 19K||+||+||+||+||−||−||−|
HL-60 CGH Results Using BAC 19K Microarray. Regions with gains and losses in more than one replicate are listed. Red lines mark CBS segments.
HL-60 CGH Results Using Affymetrix 500K Microarray. Regions with statistically significant gains and losses using Partek software are highlighted.
HL-60 CGH Results for Chromosome 9 Using Agilent 44K Microarray. Regions with gains and losses using ADM-1 algorithm are shaded and marked by bottom line.
HL-60 CGH Results for Chromosome 9 Using Illumina Hap550 Microarray.
HL-60 CGH Results for Chromosome 7.
Distribution of CBS Segments for the Affymetrix U133 Microarray.
Distribution of CBS Segments for the Agilent Microarray.
Distribution of CBS Segments for the BAC 19K Microarrays.
Distribution of CBS Segments for the Affymetrix 500K Microarrays.
Distribution of CBS Segments for the Illumina Hap550 Microarrays.
Additional Regions Using Common CBS Analysis.
The authors gratefully acknowledge the key assistance of the following individuals: Agnes Viale for processing Agilent, Affymetrix, and Illumina arrays; Herbert Auer for DNA preparation and hybridization to the Affymetrix U133 Plus microarrays; Devin McQuaid and Jeff Conroy at RPCI for their guidance and CGH technical advice; Xiaowen Wang and Tom Downey for assistance with CNV analysis using the Partek software; as well as Anniek De Witte for analysis of Agilent microarrays. We also appreciate financial support provided by ABRF and supplies generously donated by Affymetrix. The research described in this article has been reviewed by the National Health and Environmental Effects Research Laboratory, US Environmental Protection Agency, and approved for publication. Approval does not signify that the contents necessarily reflect the views and the policies of the Agency, nor does mention of trade names or commercial products constitute endorsement or recommendation for use.