A database of apparently benign copy number variants (bCNVs) detected by a Spectral Genomics Inc./PerkinElmer BAC array platform has been maintained through the University of Utah Comparative Genomic Hybridization laboratory since 2005. The target population for this database represents 1,275 patients with abnormal phenotypes, primarily children referred for developmental delay and mental retardation. These bCNVs are independent of any identified copy number abnormality detected. The most common 35 bCNVs observed and their frequencies are reported here, and a subset of ten of the patients studied was evaluated on a new oligonucleotide CNV array set designed by Agilent Technologies. There was a 76% concordance of calls for these 35 bCNVs detected by both array platforms in the same patients. The higher resolution of the Agilent oligonucleotide array compared to the BAC array allowed determination of the precise breakpoints of the observed CNVs, in addition to documentation of additional CNVs of smaller sizes. As expected, observed CNVs and their frequencies were generally consistent with those of other previously published and available databases, including the Database of Genomic Variants (http://projects.tcag.ca/variation/). The availability of these data should assist other clinical laboratories in the evaluation of CNVs of unknown clinical significance.
It has been estimated that as much as one fifth of the normal human genome contains copy number variants (CNVs), as assessed by array comparative genomic hybridization (aCGH) (Iafrate et al., 2004; Carter, 2007). The detection and interpretation of these CNVs in a clinical laboratory setting has presented challenges, especially in determining whether or not they are pathogenic when analyzing patients with abnormal phenotypes (Lee et al., 2007). The discovery of new pathogenic CNVs has also led to the identification of new genetic syndromes (Koolen et al., 2006; Krepischi-Santos et al., 2006; Sharp et al., 2006; Shaw-Smith et al., 2006). It has been estimated that a large fraction (~40%) of CNVs detected in children with developmental delay or mental retardation are actually benign CNVs and inherited from a parent (de Ravel et al., 2006). For consistency in this special issue of the journal, ‘bCNVs’ will be defined as benign copy number variants; but we otherwise generally refer to these as benign copy number changes (bCNCs) to distinguish them from pathogenic ones (pCNCs) or those of unknown clinical significance (uCNC) (Aston et al., 2008).
Benign CNVs have been defined differently in different scientific disciplines. Typically, CNVs would represent sequences of at least 1 kb in size, and if they are true population polymorphisms would be present in at least 1% of a particular population (Scherer et al., 2007). Traditional cytogenetic analysis, on the other extreme, identifies polymorphic regions which are visible at the microscopic level (Brothman et al., 2006) which would generally be greater than 5 Mb. The Database of Genomic Variants (DGV) (http://projects.tcag.ca/variation/) has become a resource for information regarding the frequency of particular bCNVs, yet recent studies indicate that the majority of bCNV loci are smaller in size than those reported by the DGV (de Smith et al., 2007; Korbel et al., 2007a; Kidd et al., 2008; Perry et al., 2008). The availability of detailed information regarding bCNVs which are not associated with any pathogenic phenotype is extremely important for interpretation of aCGH findings on clinical specimens.
There have been numerous reports of CNVs in various populations (e.g. Redon et al., 2006; Qiao et al., 2007; Jakobsson et al., 2008), but these are almost exclusively from cohorts of phenotypically normal individuals and often cell lines rather than DNA derived directly from patient specimens. The Cytogenetics Laboratory at the University of Utah has been using aCGH to evaluate DNA derived from blood of patients with developmental delay, mental retardation and ‘syndromic’ or dysmorphic phenotypes since 2005 (Aston et al., 2008). Initial work was done using the Spectral Genomics Inc./PerkinElmer BAC-based aCGH, and more recently has encompassed oligonucleotide-based aCGH (Agilent Technologies). A database of changes detected in all patients studied has been maintained, and the data were analyzed in conjunction with publicly available databases (such as the DGV) to categorize genomic regions which likely contain bCNVs.
We present here a summary of findings describing 35 of the most frequently observed bCNVs detected in a cohort of 1,275 clinically affected individuals, initially studied using a BAC array of about 1 Mb resolution. These data are compared with publicly available databases, and a subset was compared with that obtained when the same patient DNA was hybridized to a high resolution oligonucleotide array set designed to specifically assess CNV regions throughout the human genome.
The University of Utah Cytogenetics/Microarray Laboratory data include results from two bacterial artificial chromosome (BAC) array chips purchased from Spectral Genomics Inc./PerkinElmer (SGI/PE) (Turku, Finland): the Constitutional Chip™ (CC) and the SpectralChip 2600™ (1Mb). Patients were initially studied using either the 1Mb, CC, or both. The CC is a targeted array designed to screen for 41 recognized microdeletion or microduplication syndromes plus 41 unique subtelomeric regions. Clones are concentrated in the areas of interest with a sparse backbone of clones in other areas of the genome. This design results in large gaps between clones in some regions. The version of the chip most recently used (Constitutional Chip™ 3.0) was printed with 604 BAC clones. The 1Mb is a whole genome array. The most recent chip used (SC2600-B33) spaced 2,610 BAC clones along the genome at roughly 1-Mb intervals (Aston et al., 2008). The majority of the CNVs identified by the University of Utah were identified on the 1Mb chip. Both chips were based on NCBI Build 35.1.
Clones detecting copy number changes that map to regions of known CNVs as well as single clones of unknown significance were added to this database. An entry was made each time a patient exhibited a copy number change of a BAC in any of these regions. Abnormalities with known clinical significance were excluded. From this database, we were able to determine which clones represented sequences which were frequently deleted or duplicated in this patient population. We identified the most common CNV regions and then selected DNAs from ten patients who exhibited copy number changes of those common regions for further study with the Agilent CNV microarray set.
All protocols followed the manufacturer's recommendations with minor modifications. DNA was isolated from whole blood or fixed cell pellets with the Puregene Genomic DNA Purification Kit (Gentra Systems, Minneapolis, MN). DNA concentration was measured using a NanoDrop Spectrophotometer (NanoDrop Technologies, Inc, Wilmington, DE).
The genomic BAC microarrays were performed using Spectral Genomics chips (SGI/PE 1Mb or CC) and reagents. Normal male or female DNA (representing pooled individuals) from Promega (Madison, WI) was used as a sex-mismatched control. The BAC microarrays were run in duplicate using a dye-reversal strategy. A copy number variant was called only if confirmed on both hybridizations. Chips were scanned using either the Gene Pix 4000B or the Agilent Microarray scanner and the Gene Pix Pro 6.0 software package (Axon Instruments, Union City, CA). The data were analyzed using the SpectralWare software program (SGI/PE).
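The dye-reversal rule described above (a variant is called only if confirmed on both hybridizations) can be sketched as follows. This is a minimal illustration and not the SpectralWare implementation: the function name `confirmed_call`, the log2 ratio cutoff of 0.3, and the convention that both values are expressed as log2(Cy5/Cy3) are assumptions made for the example.

```python
from typing import Optional


def confirmed_call(forward_log2: float,
                   reverse_log2: float,
                   cutoff: float = 0.3) -> Optional[str]:
    """Classify one BAC clone from a dye-reversal pair.

    forward_log2: log2(Cy5/Cy3) with patient DNA labeled in Cy5.
    reverse_log2: log2(Cy5/Cy3) with the dyes swapped (patient in Cy3).
    cutoff: hypothetical threshold, not taken from the paper.
    """
    # A true gain raises the patient channel in both hybridizations,
    # so the ratio is positive in the forward run and negative in the swap.
    if forward_log2 >= cutoff and reverse_log2 <= -cutoff:
        return "gain"
    # A true loss shows the mirror-image pattern.
    if forward_log2 <= -cutoff and reverse_log2 >= cutoff:
        return "loss"
    # Anything seen in only one hybridization is not called.
    return None
```

A clone that deviates in only one of the two hybridizations returns `None`, which mirrors the requirement that a call be confirmed on both.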
The Agilent CNV microarray set G4423B (AMADIDs 018897 and 018898) contains 392,824 oligonucleotide probes targeting 19,400 disjoint intervals spanning 28% of the genome and printed on two slides containing 244K probes each. These probes were selected from Agilent's database of 24.3 million oligonucleotide probes suitable for aCGH at an average spacing of 100 bp within the non-repeat masked regions of the genome. The 19,400 intervals were derived from 4,902 CNV regions and 16,110 in-dels from the DGV as well as 8,061 regions from the UCSC segmental duplication track. Median probe spacing within targeted regions varied from approximately 200 bp for smaller intervals to as high as 2,400 bp for larger intervals (average 982 bp). An additional 69,785 probes were selected from the remaining regions of the genome to provide whole genome coverage with a median spacing of 27 kb. The array probes were selected using an empirical model that utilizes scores for homology, thermodynamics, secondary structure and sequence complexity. To ensure adequate coverage of regions duplicated in the genome, homology criteria were relaxed to allow probes with multiple perfect matches in the reference human genome assembly. Probes for chromosomes 1–8 were assigned to array 1, and probes for chromosomes 9–22, X and Y were assigned to array 2. About 10,000 probes were repeated on both arrays for normalization.
A set of ten samples which showed common bCNVs based on the BAC array data set was chosen for analysis on the Agilent Human Genome CNV microarray. Normal male or female DNA (representing pooled individuals) from Promega was used as a sex-mismatched control. Sample and control DNAs were labeled with Cy5 or Cy3, respectively, using the Agilent DNA labeling kit. Following the manufacturer's recommended hybridization and washes, the arrays were scanned at 5 μm resolution using the Agilent microarray scanner and analyzed using Feature Extraction v220.127.116.11, DNA Analytics v4.0 (Agilent Technologies), and prototype software tools using similar statistical methods. In addition to the ten clinical specimens, two self-self experiments for two patient samples were analyzed to estimate a false positive rate and to fine tune the parameters used for data analysis. A replicate hybridization was performed on one patient to determine reproducibility.
Data from the two arrays in each CNV array set were merged and treated as a single dataset for each experiment. Log2 ratios of fluorescent signals for each measured probe sequence were assigned to all perfect genomic matches of this sequence in the NCBI 36.1 build of the reference genome resulting in probes being assigned to a total of 500,974 genomic locations. This expanded set of log2 ratios was analyzed with a modified version of the ADM2 statistical algorithm (Lipson et al., 2005, 2006). In brief, the ADM2 algorithm uses an iterative procedure to identify all genomic regions for which the weighted average of the measured log2 ratios from probes in the region deviates from its expected value of 0 by more than a given threshold. In the version of the ADM2 algorithm used for this study, the statistical score of each genomic interval uses the measured log2 ratio of each probe sequence only once, even if a given genomic sequence has multiple perfect matches in this interval. The modified ADM2 algorithm was applied with a threshold of 5, minimum absolute average log2 ratio in called intervals of 0.2 and minimum of 2 probes.
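The final filtering step can be illustrated with a short sketch. It is not the ADM2 algorithm itself, which is iterative and uses a weighted average; the score used here (absolute mean log2 ratio times the square root of the probe count, divided by a noise estimate), the unweighted mean, and the names `Hit`, `score_interval` and `passes_filters` are assumptions for illustration. The sketch shows the two points stated above: each probe sequence contributes only once per interval, and a call must meet the threshold of 5, a minimum absolute average log2 ratio of 0.2 and a minimum of 2 probes.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Hit:
    probe_id: str      # identifier of the probe sequence
    position: int      # coordinate of one perfect genomic match
    log2_ratio: float  # measured log2 ratio for that probe


def score_interval(hits: List[Hit], start: int, end: int,
                   noise_sd: float) -> Tuple[int, float, float]:
    """Return (probe count, mean log2 ratio, score) for [start, end]."""
    unique: Dict[str, float] = {}
    for h in hits:
        if start <= h.position <= end:
            # A probe with several matches in the interval counts once.
            unique.setdefault(h.probe_id, h.log2_ratio)
    n = len(unique)
    if n == 0:
        return 0, 0.0, 0.0
    mean = sum(unique.values()) / n
    # Assumed score: deviation from 0 scaled by sqrt(n) and noise.
    score = abs(mean) * math.sqrt(n) / noise_sd
    return n, mean, score


def passes_filters(n: int, mean: float, score: float,
                   threshold: float = 5.0,
                   min_abs_mean: float = 0.2,
                   min_probes: int = 2) -> bool:
    """Apply the three cutoffs quoted in the text."""
    return (score >= threshold
            and abs(mean) >= min_abs_mean
            and n >= min_probes)
```

The `noise_sd` argument stands in for whatever noise estimate the real software derives from the data; its value in any example is hypothetical.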
The initial set of BAC aCGH data from the SGI/PE platform was evaluated as part of a clinical assay for 1,275 patients referred for suspected genetic abnormalities (primarily developmental delay or mental retardation) (Aston et al., 2008). CNVs thought to be bCNVs were entered into the local University of Utah database for categorizing location and frequency. Examples of the most frequently observed (presumably) benign CNVs detected by the SGI/PE platform in ten of the 1,275 patients are shown in Table 1, along with information on frequencies with which these regions have been reported in the DGV. Additional information on how these regions compare with other databases and reported frequencies of gains and losses is included in Supplemental Table 1 (for supplemental material, see www.karger.com/doi/10.1159/000184696).
A subset of ten patient DNA samples from the University of Utah sample set was selected for further characterization using the Agilent CNV array set. The number of oligonucleotide probes in each of the 35 most frequently observed bCNVs from the BAC aCGH data set is shown in Table 1. Using the modified ADM2 algorithm described in the Methods section, we detected on average 546 CNV regions on the autosomes in each of the ten patients (Fig. 1 and Supplemental Table 2). As none of the 35 most frequent bCNVs found from the BAC aCGH data set were observed on either sex chromosome, only the autosomes were included in this analysis. In the self-self experiments, we detected nine and 64 calls, respectively. The patient sample with 64 calls in the self-self experiment also had the largest number of calls in the hybridization vs. the reference sample (834 gains and losses). This sample had been stored at 4°C for two years prior to running on the Agilent array while other samples were stored for less than one year, a factor that could impact the performance of the assay. This sample was excluded from subsequent analysis.
CNV regions detected using the Agilent CNV array set in the ten patients overlapped 28 of the 35 most frequent bCNV regions (80%) from the Utah database, with no copy number variations observed for the remaining seven regions (clones) in this set of patients (Supplemental Table 1). The discordant regions may be due to differences in the reference DNAs used in the different studies, as the source of the normal individuals used in the pooled DNA lots may have varied. Other explanations include possible mis-mapping of BACs on early genome reference builds, or false positive/negative results on either array. There are also several regions where significant differences were seen between the CNV frequencies from the University of Utah data and the frequencies reported in the DGV (Table 1). While some of these differences may be due to population sampling, an explanation for differences in five of the regions noted in Table 1 is that these regions were not represented on newer versions of the BAC array, and hence would be under-represented in the later patients studied.
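The overlap figure quoted above (28 of 35 regions, 80%) is the fraction of BAC-defined regions that are touched by at least one oligonucleotide call. A minimal sketch of that calculation follows; the interval representation and the names `overlaps` and `fraction_overlapped` are assumptions for illustration, and the coordinates in the usage example are invented, not taken from the study.

```python
from typing import List, Tuple

# (chromosome, start, end) with inclusive coordinates
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def fraction_overlapped(regions: List[Interval],
                        calls: List[Interval]) -> float:
    """Fraction of regions overlapped by at least one call."""
    if not regions:
        return 0.0
    hit = sum(1 for r in regions if any(overlaps(r, c) for c in calls))
    return hit / len(regions)


# Toy example with made-up coordinates: one of two regions is overlapped.
bac_regions = [("chr1", 100, 200), ("chr2", 100, 200)]
oligo_calls = [("chr1", 150, 160), ("chr2", 300, 400)]
toy_fraction = fraction_overlapped(bac_regions, oligo_calls)
```

With the counts reported in the text, the same ratio is 28 / 35 = 0.8.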
The median size of CNV calls in the Agilent CNV array data set is 4.9 kb (Fig. 2), similar to size distributions previously reported in high resolution studies (de Smith et al., 2007; Korbel et al., 2007b; Kidd et al., 2008; Perry et al., 2008). A total of 746 CNVs detected in the ten samples were larger than 50 kb (14%), and 716 CNVs were smaller than 1 kb (13%). Known abnormalities were detected in two patients with both platforms used: one on chromosome 7 at 72.3 Mb (the patient had Williams syndrome) and one showing a complex subtelomere abnormality on chromosome 22 at 44.7 Mb and at 47 Mb (data not shown).
The total of 5,460 CNVs detected in the ten patients on the autosomes (Fig. 1 and Supplemental Table 2) represents 1,547 copy number variant regions, of which 948 regions were detected in more than one patient (Supplemental Table 3). Six hundred sixty-four (70%) of the CNV regions called in more than one sample had consistent (± five probes) breakpoints while some showed different breakpoints within a CNV region (see examples in Figs. 3, 4, 5 and Supplemental Tables 3 and 4). Of the 28 most frequent bCNV regions from the Utah database that were observed in the Agilent CNV array data set, eight were detected in multiple samples and showed variants with consistent breakpoints (up to five probes). The lengths of the variant regions observed in different individuals in 15 of these 28 regions indicated greater than 50% variability. The differing breakpoints could potentially represent different variants in the same region and/or regions with complex CNV structure (Perry et al., 2008).
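Collapsing individual calls into CNV regions, each region being the union of overlapping gains and/or losses across samples, can be sketched as below. This is an illustrative reconstruction under stated assumptions rather than the software used in the study: the tuple layout, the function name `merge_calls`, and the toy coordinates are hypothetical, and the ± five probe breakpoint comparison is not modeled.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Call = Tuple[str, str, int, int]          # (patient, chromosome, start, end)
Region = Tuple[str, int, int, Set[str]]   # (chromosome, start, end, patients)


def merge_calls(calls: List[Call]) -> List[Region]:
    """Merge overlapping calls per chromosome into CNV regions."""
    by_chrom: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for patient, chrom, start, end in calls:
        by_chrom[chrom].append((start, end, patient))

    regions: List[Region] = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        patients: Set[str] = set()
        # Sweep calls in order of start coordinate.
        for start, end, patient in sorted(by_chrom[chrom]):
            if cur_start is not None and start <= cur_end:
                # Overlaps the open region: extend it.
                cur_end = max(cur_end, end)
                patients.add(patient)
            else:
                # Close the previous region, if any, and open a new one.
                if cur_start is not None:
                    regions.append((chrom, cur_start, cur_end, patients))
                cur_start, cur_end, patients = start, end, {patient}
        if cur_start is not None:
            regions.append((chrom, cur_start, cur_end, patients))
    return regions


# Toy input with made-up coordinates.
toy_calls = [
    ("P1", "chr1", 100, 200),
    ("P2", "chr1", 150, 300),
    ("P1", "chr1", 500, 600),
    ("P3", "chr2", 10, 20),
]
toy_regions = merge_calls(toy_calls)
# Regions seen in more than one patient.
shared = [r for r in toy_regions if len(r[3]) > 1]
```

In the toy input, the two overlapping chr1 calls from different patients form one shared region, while the other two calls remain single-patient regions.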
CNVs have been reported and noted in numerous publications and within multiple publicly available databases; the samples from which they have been ascertained generally represent a plethora of individuals. Previous focused studies of these widespread variant regions have included different populations of individuals, but all presumably with normal phenotypes. Array CGH analysis of patients with congenital abnormalities has become a main (and sometimes a first tier) assay for clinical cytogenetics and genetics laboratories, yet how to distinguish normal variants from those of clinical significance has created a dilemma. A diagnosis of a normal variant (bCNV) should not generally be considered an abnormality, nor should it be (by itself) a major factor in interpretation of a laboratory result. Of note, one should consider the possibility that bCNVs can be observed in homozygous form, and these may in fact not be clinically insignificant (Curry et al., 2008).
New syndromes are now being discovered at one of the most rapid paces in the history of our field (Krepischi-Santos et al., 2006), primarily owing to the recognition of new pathogenic copy number variants (pCNVs). Furthermore, multiple, individually rare de novo CNVs have been associated with neurodevelopmental disorders such as autism spectrum disorders (Sebat et al., 2004) and schizophrenia (Walsh et al., 2008). When clinical laboratories began aCGH testing, newly identified copy number changes were commonly seen in patients with anomalies, but the findings were difficult to interpret when they involved novel sites in the genome: it was hard to determine whether these changes were pathogenic or benign. Examination of parental samples helps in this assessment, but these studies are both costly and time consuming. Distinguishing pCNVs from bCNVs is essential in such clinical evaluation (Lee et al., 2007).
The University of Utah Cytogenetics Laboratory has maintained a database of all copy number changes on all patients studied by aCGH since 2005. As the number of patients evaluated increased, it became apparent that many of the CNVs detected were consistent with those reported in public databases such as the DGV, while new ones were also observed. The DGV contains gains and losses reported in various studies in normal individuals in all 35 of the most frequent (presumably) bCNV regions defined by clones in the Utah database. Even CNVs detected in individual studies overlap with a significant proportion of these regions. Namely, 25 overlap CNV regions reported in Perry et al. (2008), and 17 CNV regions reported in de Smith et al. (2007) (Supplemental Table 1).
Clinically significant findings were observed in approximately 17% of cases among the 1,275 patients analyzed with the SGI/PE BAC array platform (Aston et al., 2008), yet the patients who showed these pCNVs (some with duplications or deletions of chromosomal regions >10 Mb) often showed the same bCNVs observed in normal patients. As we have shown here, use of a high density CNV array set dramatically increases the number of CNV calls. The overall resolution of clinical array platforms is increasing, and the number of detected CNVs in clinical studies would be expected to rise. The goals of this study were to compare the initial entries in the Utah database with those of other reported CNVs, and to further characterize the most commonly observed CNVs using a high resolution array set designed to measure CNVs.
As we anticipated, bCNVs in the patient population studied appear to be similar to bCNVs described in the public domain from normal individuals, yet actual frequencies of gains and losses differ in some instances (Table 1 and Supplemental Table 1). Similar but not exact frequencies were observed in the subset of patients analyzed using the Agilent CNV array set. This is not surprising and can likely be attributed to the small sampling studied. While gain vs. loss information is generally helpful in assessing potential pathogenicity of a newly identified CNV (Lee et al., 2007), we are cautious not to over-interpret gain/loss values in this current study as the sources of reference DNAs are likely different from those used in other comparative studies. Also, as expected, a much greater number of CNVs was detected with the high density Agilent platform when compared with the SGI/PE BAC array, with a total of 440–721 calls per individual (Fig. 1). In the subset of ten patients evaluated at high density, the modal size of CNVs was in the 1-kb range, with approximately 7% of CNVs 100 kb or larger (Fig. 2). These findings are consistent with those of other reports (de Smith et al., 2007; Korbel et al., 2007b; Kidd et al., 2008; Perry et al., 2008).
This report represents summary data from 1,275 initial patients, with only a fraction of those (ten patients) studied with the high density CNV array set. Expansion of these data is inevitable as more laboratories perform aCGH for diagnostics. The CNVs listed in Table 1 are certainly not a comprehensive list, but represent those most frequently observed in the University of Utah aCGH database, and also represent loci which we believe are bCNVs. As noted above, the clinically significant abnormality detection rate by aCGH at Utah has been approximately 17% using the SGI/PE BAC array platform. The ten patients used in this study were a sampling of all patients studied, and were picked as they exhibited the most frequently seen bCNVs. Indeed, two of these patients (20%) showed clinically significant abnormalities in addition to the bCNVs.
Characterization of the sizes of the CNVs observed in the high resolution oligonucleotide array data indicates that, similar to data from other studies, some CNVs are extremely stable (without any detectable variability of breakpoints), while others appear to have variable breakpoints (Fig. 5 and Supplemental Table 4). The overall size of the copy number variable regions and the locations of specific breakpoints, e.g. those that impact functional genomic sequences, could influence the phenotypic manifestations of specific CNVs. It should be considered that there may be other factors involved in phenotypic associations of CNVs such as incomplete penetrance and/or instances where CNVs (or mutations, epigenetic effects, etc.) elsewhere in the genome may affect the clinical manifestation of specific CNVs. Whether bCNVs truly are benign remains a key question.
In the medical cytogenetics community, it is important that clinical correlations with detailed copy number change data are assessed and documented whenever possible. Data sharing and communication between laboratories performing aCGH (or copy number assessment by other methods) regarding findings and clinical correlates will improve our understanding of these variants, and how they should be interpreted.
Thirty-five of the most commonly observed CNVs in the Utah database, collected from a set of 1,275 patients using the SGI/PE BAC array platform described, are listed in descending frequency. The number of oligonucleotide probes represented on the Agilent CNV array set for the corresponding genomic location is indicated along with the number of gains and losses detected in ten patients using the Agilent CNV array set. Subsequent columns include data on the number of variations detected in these regions in other studies (de Smith et al. (2007), Perry et al. (2008)) and the latest version (April, 2008) of the Database of Genomic Variants.
Gains and losses detected for each of the ten patients using the Agilent CNV array set. For each gain or loss the information about its genomic coordinates, size, average log2 ratio, number of probe hits in the region, statistical significance of the call and its overlap with the 35 BAC clones from the Utah database, if any, is indicated. CNV calls were made using the modified ADM2 algorithm with threshold 5, minimum absolute average log2 ratio in called intervals of 0.2 and minimum of 2 probes as described in the Methods section.
Summary of copy number variable regions detected in ten patients by the Agilent CNV array set. Each CNV region is a union of overlapping gains and/or losses detected in ten samples. For each CNV region, the information about its genomic coordinates, size, number of gains and losses detected in ten samples, overlap with 35 BAC clones from the Utah database, if any, and consistency of the CNV boundaries using +/- five probe calls in ten samples is indicated.
Summary of base pair start and stop locations for each copy number variation observed using the Agilent CNV array set corresponding with 28 BAC clones from the Utah database. Variability between patients in breakpoints, based on +/- five oligonucleotide probes is noted.
We would like to thank Drs. Sarah South and Jennifer Warren for their thoughtful discussions and critical reading of this manuscript.
This work was supported by the University of Utah CGH Microarray Laboratory and the ARUP Institute for Clinical and Experimental Pathology, ARUP Laboratories, Salt Lake City, UT (USA).