The goal of this work was to test the ability of oligonucleotide-based arrays to reproduce the results of focused bacterial artificial chromosome (BAC)-based arrays used clinically in comparative genomic hybridization experiments to detect constitutional copy number changes in genomic DNA.
Custom oligonucleotide (oligo) arrays were designed using the Agilent Technologies platform to give high-resolution coverage of regions within the genome sequence coordinates of BAC/P1 artificial chromosome (PAC) clones that had already been validated for use in previous versions of clone arrays used in clinical practice. Standard array-comparative genomic hybridization experiments, including a simultaneous blind analysis of a set of clinical samples, were conducted on both array platforms to identify copy number differences between patient samples and normal reference controls.
Initial experiments successfully demonstrated the capacity of oligo arrays to emulate BAC data without the need for dye-reversal comparisons. Empirical data and computational analyses of oligo response and distribution from a pilot array were used to design an optimized array of 44,000 oligos (44K). This custom 44K oligo array consists of probes localized to the genomic positions of >1400 fluorescence in situ hybridization-verified BAC/PAC clones covering more than 140 regions implicated in genetic diseases, as well as all clinically relevant subtelomeric and pericentromeric regions.
Our data demonstrate that oligo-based arrays offer a valid alternative to focused BAC arrays. Furthermore, they have significant advantages, including better design flexibility, avoidance of repetitive sequences, manufacturing processes amenable to good manufacturing practice standards in the future, increased robustness because of an enhanced dynamic range (signal to background), and increased resolution that allows for detection of smaller regions of change.
The advent of array-based copy number analysis using comparative genomic hybridization (CGH) or non-CGH methods, including analysis of single nucleotide polymorphisms, has been a breakthrough in the detection of chromosomal copy number changes in the clinical setting.1 This approach has been shown to be superior to both classical cytogenetic banding methods and fluorescence in situ hybridization (FISH)-based methods because of the greatly improved resolution and highly multiplexed nature of the method.2–4 It is clear that this “molecular cytogenetic” methodology will continue to expand the capabilities for correlations between chromosomal aberrations and clinical phenotypes. This will be invaluable, not only for the diagnostic potential, but also for eventual discovery of the true genotypic basis for specific syndromic features at the molecular level.
Until recently, most clinical applications of array-CGH, other than some cancer studies, have been based on arrays constructed by covalent attachment to glass slides of DNA from whole clones, typically cosmids, P1 artificial chromosomes (PACs), or bacterial artificial chromosomes (BACs), or of polymerase chain reaction products generated from such clones. Although arrays with whole genome coverage have been produced,5–10 the great majority of clinical cases have been analyzed on much more focused arrays, partly because of the problems of production, analysis, and interpretation of the extensive amount of data that can be generated in array experiments.11,12 These arrays have been primarily concentrated in specific genomic regions that have either been shown to correlate with genetic disorders, or which are expected to have that potential (e.g., subtelomeric and pericentromeric regions).13–16
Because of the technical limitations of array production and establishment of rigorous quality control standards for spotted clone-based arrays, it seemed likely that this approach would be only a temporary solution to detect genomic copy number changes and would most likely be supplanted by oligonucleotide arrays.17 The latter have been shown to be very powerful platforms for many types of hybridization-based studies, including analysis of gene expression, DNA methylation, chromatin immunoprecipitation, and single nucleotide polymorphism (reviewed in studies by Koczan and Thiesen, Shaikh, and Zahir and Friedman18–20). Preliminary work has been performed on several of the competing platforms to demonstrate proof-of-principle for applicability to copy number analysis.21–25 However, systematic validations and studies of specific technical aspects have been limited.26 Here, we have undertaken a side-by-side comparison of custom-designed oligonucleotide-focused arrays with our clinical BAC arrays to address these issues.
There has been considerable debate recently about the relative merits of focused versus “whole-genome” array analysis.27–29 The approach described here easily lends itself to future expansion that will blur the distinctions between focused and nonfocused arrays by allowing many options for array design and analysis.
All patients studied were referred to the BCM cytogenetics laboratory for clinical array-CGH analysis. DNA was extracted from whole blood using the Puregene DNA extraction kit (Gentra, Minneapolis, MN) according to the manufacturer’s instructions.
Our Version 5.0 BAC microarray (BAC V5) included 853 BAC and PAC clones; it was designed to contain 3–10 BAC/PAC clones for regions corresponding to 75 known genomic disorders as well as all 41 subtelomeric regions and 43 pericentromeric regions.15 The Version 6 BAC microarray (BAC V6) includes 1472 BAC and PAC clones. This version covers approximately 150 genomic disorders with minimum backbone coverage of every chromosome at the 650-band level of cytogenetic resolution (http://www.bcm.edu/cma/table.htm).
Microarrays were synthesized using ink-jet technology with phosphoramidite chemistry (Agilent Technologies, Inc., Santa Clara, CA).30,31 Probe sequences were chosen from the HD CGH database (eArray, Agilent Technologies), designed in silico, and empirically validated using two-color array-CGH methods.32 The entire human genome was tiled, and oligos were selected based on melting temperature (Tm), secondary structure, and homology to other sites in the human genome. Oligos specially designed for array-CGH were Tm-matched by trimming them from 60 nucleotide bases (60 mers) until they matched the target temperature. Although most oligo sequences were 60 bases, some shorter ones (down to 45 mers) were selected to make the oligo selection isothermal. Nonhybridizing nucleotide stilts were used to make all the oligos a uniform 60 bases in length. Oligos were also searched for homology to the human genome (Build 35 hg17) to avoid cross-hybridization, which could confound positional mapping of the oligo. Only unique oligos were selected for inclusion in the HD database.
Three oligo array designs corresponding to BAC V5 and V6 arrays were manufactured by Agilent Technologies using standard procedures. First, an oligo array containing 40,937 oligonucleotides of approximately 60 bases mapping within the sequence coordinates of BAC/PAC locations for our BAC V5 array was synthesized (oligo V5). The genome sequence coordinates were determined for BAC clones using the UCSC Genome Browser resources with the May 2004 (hg17) build. Two oligonucleotide arrays were subsequently developed to obtain approximate equivalence in coverage to our BAC V6 array; both of these arrays were based on the March 2006 (hg18) build. To optimize the oligo selection, initially an array containing approximately 100,000 oligos was synthesized with two arrays on each slide. Ten hybridizations were performed with these arrays using male (M) and female (F) reference DNAs (4 × F vs. F, 2 × M vs. M, 4 × M vs. F). An analysis of the intensity distribution from these hybridizations showed consistently low intensities for <5% of the oligos, and these oligos were then eliminated. To further reduce the number of oligos, the range of each BAC interval covered by the oligos was determined as a percentage of the BAC region; a uniform coverage statistic based on splitting each BAC interval into 15K bins and calculating the observed deviation from uniform coverage of each BAC was also computed. Oligos were then eliminated so as to maintain high coverage as a percent of the BAC and high uniformity in the distribution along the BAC. Finally, we enforced the rule that BACs retain a minimum of approximately 10–15 oligos whenever possible. After this preselection process, 42,640 oligonucleotides corresponding to genomic regions covered by the BAC V6 arrays were chosen. This targeted oligo array (oligo V6) was manufactured in a 4 × 44K format, with an average of 28–30 oligos per region previously covered by a single BAC clone.
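The two per-BAC selection statistics described above (range covered as a percentage of the BAC, and deviation from uniform coverage across bins) can be illustrated with a minimal sketch. This is not the authors' code: the function name `coverage_stats`, the reading of "15K bins" as 15,000-bp bins, and the particular deviation measure (half the summed absolute difference between observed and expected bin counts, divided by the number of oligos) are all assumptions made for illustration.

```python
def coverage_stats(bac_start, bac_end, oligo_positions, bin_size=15_000):
    """Return (percent of the BAC spanned by oligos, deviation from uniform coverage).

    Hypothetical helper. The deviation is 0.0 when every bin holds the same
    number of oligos and approaches 1.0 when all oligos fall in a single bin.
    """
    inside = sorted(p for p in oligo_positions if bac_start <= p < bac_end)
    length = bac_end - bac_start
    if not inside:
        return 0.0, 1.0  # no oligos: treated here as the worst case (assumption)

    # Range covered: distance between first and last oligo, as a percent of the BAC.
    span_pct = 100.0 * (inside[-1] - inside[0]) / length

    # Count oligos per bin along the BAC interval.
    n_bins = -(-length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for p in inside:
        counts[(p - bac_start) // bin_size] += 1

    # Compare observed counts with the count expected under uniform coverage.
    expected = len(inside) / n_bins
    deviation = sum(abs(c - expected) for c in counts) / (2 * len(inside))
    return span_pct, deviation


# One oligo in the middle of each 15-kb bin of a 150-kb BAC versus ten clustered oligos.
even = [7_500 + 15_000 * i for i in range(10)]
clustered = [1_000 * i for i in range(10)]
print(coverage_stats(0, 150_000, even))
print(coverage_stats(0, 150_000, clustered))
```

Under these assumptions, a pruning step would preferentially remove oligos whose removal keeps the first value high and the second value low for each BAC.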
All array-CGH analyses were performed with gender-matched reference DNA from a single phenotypically normal male or female unless otherwise noted. The procedures for probe labeling and hybridization of our BAC arrays were reported previously.15 The procedures for DNA digestion, labeling, and hybridization for the oligo arrays were performed according to the manufacturer’s instructions, with some modifications. Briefly, 1–2 μg of genomic DNA from experimental and gender-matched reference samples were digested with AluI (10 units) and RsaI (10 units) (Promega, Madison, WI) at 37°C for 2 hours. The labeling reaction was performed using the Bioprime CGH Labeling Module (Invitrogen, Carlsbad, CA) at 37°C for 2 hours in the presence of cyanine 5-dCTP (for the experimental sample) or cyanine 3-dCTP (for the reference sample) (PerkinElmer, Boston, MA). For experiments involving dye-swap labeling, two experiments were performed with reversal of the dye labels incorporated into the control and test samples. Experimental and reference DNAs for each hybridization were purified, pooled, and incubated with human Cot-1 DNA (Invitrogen) and blocking agent (Agilent Technologies). The labeled samples were applied to an array, which was placed in a microarray hybridization chamber (Agilent Technologies), hybridized for more than 20 hours at 65°C in a rotating hybridization oven and washed according to the manufacturer’s protocol (Agilent Technologies).
The slides were scanned into image files using a GenePix Model 4000B microarray scanner (Molecular Devices, Sunnyvale, CA) or an Agilent Microarray Scanner (PN G2565BA). Microarray image files of oligo arrays were quantified using Agilent Feature Extraction software (v9.0), and text file outputs from the quantitation analysis were imported either into the Agilent CGH Analytics software program or converted to BAC-level emulation data by combining oligo data corresponding to regions encompassed by BAC clones (“emulated BAC clone”) and then using our in-house analysis package for copy number analysis, as described previously.12,15,33
Initially, two basic questions were addressed in this study. First, would oligonucleotide-based arrays reliably recapitulate the findings of both increased and decreased copy number changes detected by focused BAC arrays? Second, was the classical dye-reversal design used for array-CGH with clone-based arrays necessary to obtain statistically valid quantitative data? To address these questions, we initially developed a BAC emulation array by selecting 60-mer oligonucleotides for most of the clones included in our BAC V5 array. Exact coverage equivalence was not possible with the particular clone set used to design this pilot array because approximately 7% of the targets on the clone array lacked sufficient DNA sequence information for precise placement on the human genome sequence assembly. For the remaining 790 clones, genomic sequence coordinates were used to select approximately 41,000 oligonucleotides from the Agilent CGH-HD database, which were printed in a 1 × 44K format. A total of 20 independent hybridizations were performed with these arrays. To perform direct comparisons with BAC data, hybridization ratios from all oligos mapping within the genome sequence coordinates of individual BACs were averaged to give a single regional value.
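The regional averaging used for BAC emulation can be sketched as follows. The data layout (tuples of chromosome, position, and log2 ratio) and the names `emulate_bacs`, `BAC_A`, and `BAC_B` are hypothetical; the only step taken from the text is that the ratios of all oligos mapping within a BAC's sequence coordinates are averaged into a single value.

```python
from statistics import mean


def emulate_bacs(bacs, oligos):
    """Average oligo log2 ratios within each BAC interval.

    bacs:   dict mapping clone name -> (chromosome, start, end)
    oligos: iterable of (chromosome, position, log2_ratio)
    Returns a dict mapping clone name -> mean log2 ratio, or None if no oligo maps there.
    """
    oligos = list(oligos)
    emulated = {}
    for name, (chrom, start, end) in bacs.items():
        values = [ratio for c, pos, ratio in oligos
                  if c == chrom and start <= pos <= end]
        emulated[name] = mean(values) if values else None
    return emulated


# Illustrative (made-up) coordinates and ratios.
bacs = {"BAC_A": ("chr1", 1_000, 2_000), "BAC_B": ("chr2", 5_000, 6_000)}
oligos = [
    ("chr1", 1_100, -0.8),
    ("chr1", 1_500, -0.6),
    ("chr1", 2_500, 0.0),   # outside BAC_A, ignored
    ("chr2", 1_500, 0.5),   # right position range, wrong chromosome
]
print(emulate_bacs(bacs, oligos))
```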
To test the need for dye-reversal for array-CGH with oligo-based arrays, we performed a hybridization with gender-mismatched normal controls and four hybridizations with gender-matched clinical samples on V5 oligo arrays by dye-swap labeling; representative images are shown in Figure 1. The average log2 ratios showing statistically significant gains or losses for the genomic regions corresponding to the BAC sequences are summarized in Table 1. From these initial data, we concluded that dye-reversal was not necessary because all the predicted changes on each sample were completely consistent between the two experimental designs and, importantly, no new regions were detected by the oligo array that might lead to false-positives. A series of 10 additional clinical samples with a variety of copy number changes were then analyzed with the V5 oligo arrays without dye-swap. In all cases, there was 100% concordance between the original BAC results and the BAC emulation results from the oligo array (data not shown).
Having demonstrated the basic validity of the BAC emulation approach, we developed an optimized oligo version of a higher density BAC array (V6). As before, BAC endpoint coordinates were determined to direct regional oligonucleotide selection and allow emulation of BAC data from the corresponding clone array. We have found previously that, even with careful selection, the actual performance of some oligos may be inconsistent or suboptimal. Therefore, to use empirical data for better optimization, arrays of >100K oligos were first tested with a series of normal male and female control hybridizations; it is important to note that these control samples showed no evidence of false-positive changes involving multiple consecutive oligos but did reproducibly give single oligo values that were more than 6 standard deviations from the mean. These data were combined and used to remove the approximately 5% of oligos that gave the most variable signals, reducing background noise as much as possible. Oligo distributions within each BAC were then plotted and used to select optimal coverage with an average of one oligo per 5 kb of insert sequence. These criteria allowed selection of an approximately 44K oligo array that was essentially equivalent to the V6 BAC array in terms of genomic coverage, but with a potential 3–5-fold increase in resolution within regions previously covered by a single BAC probe because of the oligo probe redundancy at each location.
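The empirical filtering step, in which the most variable oligos across normal control hybridizations are discarded, might look like the sketch below. The variability measure (population standard deviation of each oligo's log2 ratios across control hybridizations) and the name `drop_most_variable` are assumptions; the text states only that about 5% of oligos with the most variable signals were removed.

```python
from statistics import pstdev


def drop_most_variable(log2_by_oligo, fraction=0.05):
    """Remove the given fraction of oligos with the largest spread across controls.

    log2_by_oligo: dict mapping oligo id -> list of log2 ratios, one per
                   control-versus-control hybridization.
    Returns a new dict without the most variable oligos.
    """
    # Rank oligos from most to least variable.
    ranked = sorted(log2_by_oligo,
                    key=lambda oligo: pstdev(log2_by_oligo[oligo]),
                    reverse=True)
    n_drop = int(len(ranked) * fraction)
    dropped = set(ranked[:n_drop])
    return {oligo: values for oligo, values in log2_by_oligo.items()
            if oligo not in dropped}


# Twenty made-up oligos; "oligo_7" is deliberately noisy across four control runs.
controls = {f"oligo_{i}": [0.01, -0.02, 0.00, 0.02] for i in range(20)}
controls["oligo_7"] = [0.9, -0.8, 0.7, -0.6]
kept = drop_most_variable(controls)
print(len(kept), "oligo_7" in kept)
```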
The final 44K oligo V6 array design was validated in two stages. First, the same 14 clinical samples tested on the V5 oligo pilot array, of which 13 had copy number changes of apparent clinical significance based on prior BAC V5 data, plus three additional samples from patients, which had given interesting patterns previously on V5 BAC arrays (B5-4, B5-11, and B5-17), were run for comparison with previous findings; the results are summarized in Table 2. Then 21 patient samples that had been tested previously on V6 BAC arrays were analyzed. Of this group, six had been previously shown to have nonpolymorphic copy number changes on V6 BAC arrays; the results for these patients are also summarized in Table 2. In Stage 2, parallel blind analyses of 62 new patient samples were performed simultaneously on V6 BAC and V6 oligo arrays. Eighteen of the 62 cases (29%) showed one or more chromosomal locations with copy number differences (Table 2). Eleven cases (18%) showed a significant gain or loss of two or more emulated BAC clones that were suggestive of clinically relevant genomic imbalances. The remaining 7 of 18 cases gave changes that could not be dismissed as common polymorphisms and are, therefore, included in the table even though most of them were shown to also be present in a parent.
Representative side-by-side comparisons of the log ratio plots for four hybridizations are shown in Figure 2. In nearly every case there was complete concordance in the detected region of change, with the average log ratio values from the pooled oligo data consistently showing a significantly larger value than was found with the corresponding BAC clone DNA (Table 2). There were a few instances where the oligo array detected additional changes that were not statistically significant on the BAC array. For example, in Cases C5 and C47, the BAC array clone log ratio did not achieve the cutoff value of ±0.2, although in retrospect both had values well above the baseline (Table 2). The BAC array-CGH analysis also failed to detect the single clone copy number change in Cases C37 and C57. Further investigation showed that these copy number differences are caused by gains or losses involving only a portion of the sequence contained within an individual BAC clone region. By using the Web-based software to examine the copy number detected at the level of the oligos instead of at the level of the whole emulated BAC clone, these smaller changes were easily detected (Fig. 3). Furthermore, because of the improved dynamic range observed using the oligo array, these smaller “partial BAC clone changes” are now detected above the normal threshold cutoff value of ±0.2. Importantly, we found that even after careful selection of the oligos used on the final clinical microarray, variability in the hybridization intensities and relative ratios at the level of individual oligos was still observed (Fig. 3 right panel). Therefore, it remains imperative to focus the analysis on binned groups of oligos rather than examining the oligo results independently. It should also be noted that a few regions that we have found previously to be highly polymorphic with BAC arrays were also detected with the appropriate oligo-based emulation arrays. However, we have excluded these from the data discussed because they would not be considered in clinical evaluation.
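As a rough illustration of calling a change against the ±0.2 cutoff and then localizing a partial BAC clone change with binned groups of oligos, consider the sketch below. It is not the in-house analysis package or the Web-based software mentioned above. The function names and the window of five consecutive oligos are assumptions; only the ±0.2 threshold and the principle of analyzing groups of oligos rather than single oligos come from the text.

```python
from statistics import mean


def call_region(log2_ratios, threshold=0.2):
    """Classify an emulated BAC from the mean log2 ratio of its oligos."""
    avg = mean(log2_ratios)
    if avg >= threshold:
        return "gain"
    if avg <= -threshold:
        return "loss"
    return "normal"


def flag_windows(log2_ratios, window=5, threshold=0.2):
    """Return start indices of windows of consecutive oligos whose mean
    log2 ratio reaches the threshold in either direction."""
    flagged = []
    for start in range(len(log2_ratios) - window + 1):
        if abs(mean(log2_ratios[start:start + window])) >= threshold:
            flagged.append(start)
    return flagged


# Made-up emulated BAC: 30 oligos, of which only the last 10 lie in a deleted segment.
ratios = [0.0] * 20 + [-0.9] * 10
print(call_region(ratios))    # whole-BAC call from the pooled oligos
print(flag_windows(ratios))   # windows that localize the change within the BAC
```

Averaging over windows, rather than inspecting each oligo alone, reflects the caution expressed above about variability at the level of individual oligos.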
Cases with copy number changes on the sex chromosomes were of particular interest to us. In our experience with 7482 clinical samples using gender-matched reference DNA on V5 and V6 BAC arrays, we find that approximately 14% of the abnormal clinical array-CGH cases show abnormalities involving the X or Y chromosomes (unpublished data). In the 62 blinded clinical samples that were tested using the oligo array and gender-matched control samples, we found that five (8%) had genomic imbalances involving the X chromosome ranging from approximately 700 kb to the entire chromosome (Table 2; Cases C5, C21, C28, C35, and C40). Comparative BAC and oligo array results for these are shown in Figure 4. Importantly, the potential for identifying mosaicism involving the sex chromosomes is markedly enhanced by the increased sensitivity of detection as well as by the use of gender-matched controls for all clinical samples (Case B6-1, Fig. 4).
Array-CGH is a powerful new approach to the quantitative determination of genomic copy number changes. As a diagnostic method, it has many advantages over classical cytogenetic or FISH techniques for evaluation of constitutional chromosomal changes leading to phenotypes as general as developmental delay, dysmorphic features, or mental retardation,8,12,22,34–42 as well as for diagnosis of many known specific genetic disorders resulting from deletions and duplications.1 The technology has been developed for clinical use primarily with arrays based on large clones, particularly BAC clones, spotted on glass arrays.13,15,43,44 However, the rapid developments in oligonucleotide-based arrays, including not only technical issues such as probe flexibility and density, but also decreasing costs and improved software capabilities, make these arrays a more attractive approach for the future. To validate a change to this platform for clinical implementation, we have performed comparative studies between our BAC arrays and custom designed oligonucleotide arrays that focus on the same sequences in the genome that are covered by these clones. From a technical viewpoint, the overall similarity between protocols greatly facilitates transition between the two platforms. In addition, this general approach allows the use of prior experience with identification of regions of copy number variation acquired from BAC arrays in interpretation of results. It is also very compatible with the continued application of FISH for independent validation of copy number changes, which we believe is still an important final step.
Although in our experience, the oligo array data are very robust and sensitive, there is additional information that can be uncovered by FISH analysis because it is the only clinical laboratory methodology that provides both copy number information and chromosomal location for gain of genomic material (i.e., insertions and translocations) at the level of an individual cell. This information is important not only for the patient, but also for family counseling and risk assessment.
To demonstrate proof of principle, in Stage 1, a total of 58 independent hybridizations were performed on oligo arrays that emulated one of our BAC array designs. This included DNA samples from 14 patients that were analyzed by both V5 and V6 oligo arrays, four of which were performed by the classical dye-reversal design. In addition, 24 patient samples were retrospectively analyzed on the V6 oligo array and compared with the known BAC V5 or V6 results. Genomic imbalance was previously identified in 22 of the 38 patients. We found 100% accuracy in detection of all expected copy number changes.
In Stage 2 of the validation process, 62 patient samples were analyzed in parallel on BAC V6 and oligo V6 arrays. This blind analysis of clinical samples had a detection rate of 18% (11 of 62) for clinically relevant copy number changes. The results of the side-by-side analysis showed that the oligo array detected 100% of the changes identified by BAC arrays. In addition, a few further copy number changes were detected, which can be attributed to the increased sensitivity and resolution of the oligo arrays. We did not detect any false-negatives, demonstrating that the data generated from the BAC-emulated oligo arrays are qualitatively comparable with, or superior to, standard BAC array-CGH analysis.
With BAC arrays there is intrinsic signal variability because of the probe complexity resulting from the large size and repetitive DNA content of the clones, as well as issues in array production with large DNA fragments. Therefore, dye-swap experiments are normally used, in which comparing or combining the data helps compensate for some of the experimental variability and, therefore, minimizes the occurrence of false-positive or false-negative results. The demonstration of equivalent data from a single experiment for oligo arrays significantly simplifies the analysis for CGH and reduces the costs.
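For readers unfamiliar with the dye-reversal design, one common way of combining the two hybridizations is sketched below. It assumes that both arrays report log2(Cy5/Cy3) per probe, so that a true copy number change appears with opposite sign in the swapped experiment while a probe-specific dye bias appears with the same sign and cancels. The function name and this particular combination rule are illustrative and are not taken from the protocol described here.

```python
def combine_dye_swap(forward, reverse):
    """Combine per-probe log2(Cy5/Cy3) values from a dye-swap pair.

    forward: test sample labeled with Cy5, reference with Cy3
    reverse: reference labeled with Cy5, test sample with Cy3
    Returns per-probe estimates of log2(test/reference) with additive dye bias removed.
    """
    if len(forward) != len(reverse):
        raise ValueError("both hybridizations must cover the same probes")
    return [(f - r) / 2 for f, r in zip(forward, reverse)]


# Made-up values: a loss, a pure dye-bias artifact, and a gain.
forward = [-0.9, 0.1, 0.6]
reverse = [0.7, 0.1, -0.4]
print(combine_dye_swap(forward, reverse))
```

With a single-hybridization design, as shown to be sufficient for the oligo arrays, this combination step is simply omitted.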
During the course of these experiments it became obvious that there were two other primary experimental advantages of the oligo-based arrays: increased dynamic range and the potential for higher resolution detection of copy number changes. First, an extended dynamic range is extremely important in assessing the validity of experimentally detected changes within regions covered by a single clone. Additionally, this increased dynamic range also facilitates the detection of mosaicism (Fig. 4B). In general, the mean value (log2 ratio) for emulated BAC clone regions showing copy number loss (total =76) was −0.716 for the oligo-based data, compared with a value of −0.379 for the corresponding clones on the BAC arrays (Fig. 5). For gains (total =186), the value was 0.565 for oligo-based data and 0.262 for BAC arrays (Fig. 5). Thus, the copy number changes on the oligo-based arrays were significantly closer to the theoretical log2 ratios for single copy loss or gain (−1 and +0.58, respectively), compared with clone arrays where the lower signals are potentially attributable to the inability to completely block some cross-hybridization from repetitive DNA, even with Cot-1 preassociation. Furthermore, the error and signal-to-noise properties of the binned oligo data were superior to the BAC array results. In more than 90% of instances the oligo data gave T statistics with stronger evidence to detect a copy number change than the corresponding T statistic from the BAC level data (data not shown).
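The theoretical values quoted above follow from the ratio of test to reference copy number, and the observed means can be expressed as a fraction of those values. The sketch below assumes a diploid reference (two copies); the `mosaic_fraction` parameter, which mixes abnormal and normal cells in proportion, is an added illustration of why mosaic changes give smaller ratios and is not a model stated in the text.

```python
import math


def expected_log2_ratio(test_copies, ref_copies=2, mosaic_fraction=1.0):
    """Theoretical log2(test/reference) for a copy number change.

    mosaic_fraction is the assumed proportion of cells carrying the change;
    the remaining cells are taken to have the reference copy number.
    """
    mean_copies = mosaic_fraction * test_copies + (1 - mosaic_fraction) * ref_copies
    return math.log2(mean_copies / ref_copies)


loss = expected_log2_ratio(1)   # single-copy loss
gain = expected_log2_ratio(3)   # single-copy gain
print(round(loss, 3), round(gain, 3))

# Observed mean log2 ratios reported in the text, as a fraction of the theoretical value.
observed = {"oligo loss": -0.716, "BAC loss": -0.379, "oligo gain": 0.565, "BAC gain": 0.262}
for label, value in observed.items():
    theoretical = loss if "loss" in label else gain
    print(label, round(value / theoretical, 2))

# A loss present in half of the cells gives a ratio well short of -1.
print(round(expected_log2_ratio(1, mosaic_fraction=0.5), 3))
```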
A second advantage of the oligo data is the ability to examine changes smaller than the average BAC size (~150 kb). For BAC arrays, confirmation by FISH analysis can sometimes produce ambiguous results. A “diminished” FISH hybridization signal is often interpreted as a possible “partial deletion” of the region detected by the clone used as the probe, and a “partial duplication” is extremely challenging to distinguish by FISH analysis. Our BAC-emulation oligo array allows for visualization of the copy number change detected at the level of a BAC clone as well as by each individual oligo, thus verifying that a diminished signal observed by FISH analysis is indeed a partial deletion. This technology further provides the possibility of more accurate mapping of deletion/duplication breakpoints (Fig. 3). However, caution must be taken to avoid “over-calling” copy number changes. In our experience, the performance of individual oligos can vary (see right panels in Fig. 3). At this time, we believe that it is neither practical nor necessary to determine whether a copy number change detected by a single oligo reflects a true loss or gain in the patient or is a technical artifact. Instead, we rely on a large database comprising all the clinical cases assayed by our laboratory using array-CGH to determine whether the copy number change is significant at the level of the binned BAC-emulation as well as the individual oligos.
Copy number changes involving the sex chromosomes are being detected with increasing frequencies. We estimate that in our experience with 7482 cases analyzed on BAC V5 and V6 arrays, approximately 14% of clinically relevant copy number changes were detected on either the X or Y chromosome (unpublished data). We find that copy number changes involving the sex chromosomes are more difficult to detect using gender mismatched reference DNA (unpublished data). The importance of using gender-matched reference DNA for array-CGH is highlighted in the data shown for the five patient samples with copy number changes involving the X chromosome (Fig. 4) and, in particular, the mosaic case (B6-1) shown in Figure 4B. Furthermore, the marked increase in dynamic range achieved using an oligo platform allows for ease of detecting very subtle changes in copy number of genomic regions on the sex chromosomes that may be missed using gender mismatched controls. This increase in sensitivity becomes more obvious when comparing the average log2 ratio for copy number change detected by BAC and oligo arrays (Table 2; Cases B5-12, B6-1, C28, C35, and C40). For the mosaic 45,X/46,XX case (B6-1), there is a 1.5-fold increase in dynamic range for the detection of the loss of genomic material on the oligo array compared with the BAC array. On average, the increase in dynamic range for copy number changes detected on the X chromosome is 2.55-fold for genomic loss and 2.3-fold for gains.
The use of whole-genome oligonucleotide arrays for research studies is a very powerful tool because of the high resolution obtained from such arrays. They have recently been used to screen a variety of patient populations to understand the genetic factors underlying phenotypes such as developmental delay, dysmorphic features, mental retardation,8,22,34–42 and autism.45–47 We have used them routinely for follow-up of clinical cases to, for example, map deletion or duplication endpoints and examine sequences at translocation breakpoints.48–50 For clinical analysis there are, however, additional practical issues that come into consideration. The human genome has shown much more plasticity than anticipated with regard to copy number variation, which may or may not have clinical relevance.51–53 This creates a significant challenge in data interpretation, in terms of deciding whether observed changes in an individual’s DNA relative to a control are important. Ideally, such changes can be studied for association with inheritance patterns from parents to determine their origin, but this increases both the cost and the complexity of the analysis. As more knowledge is gained about how to interpret such changes and as robust validation methods are developed for small changes, it is possible that whole genome tiling array analysis will become a routine diagnostic test. The data presented here represent an important intermediate step in that direction, by focusing the analysis within specific regions where clinical interpretation is assisted by precedents from BAC-based diagnostic arrays. This logic may of course be equally applicable to other specific genomic regions, such as genes, depending on the type of analysis and degree of resolution desired; however, as the resolution for genomic imbalances detected by array-CGH increases, FISH analysis may not be an option for validation, and alternative strategies will need to be developed.
We believe that the transition to oligo arrays is a very positive step that will greatly improve the quality assurance for production arrays and will offer the possibility of easier upgrades in the content of future arrays as clinical implementation continues to advance to higher-resolution genome analysis.
The authors thank the CMA and FISH laboratories at Baylor College of Medicine, Medical Genetics Laboratories for their contributions to this work.
Disclosure: The Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from the chromosomal microarray analysis offered in the Clinical Cytogenetics laboratory. C.E.C. is employed by and owns stock in Agilent Technologies, Inc.