1.  The Effect of Algorithms on Copy Number Variant Detection 
PLoS ONE  2010;5(12):e14456.
The detection of copy number variants (CNVs) and the results of CNV-disease association studies rely on how CNVs are defined, and because array-based technologies can only infer CNVs, CNV-calling algorithms can produce vastly different findings. Several authors have noted the large-scale variability between CNV-detection methods, as well as the substantial false positive and false negative rates associated with those methods. In this study, we use variations of four common algorithms for CNV detection (PennCNV, QuantiSNP, HMMSeg, and cnvPartition) and two definitions of overlap (any overlap and an overlap of at least 40% of the smaller CNV) to illustrate the effects of varying algorithms and definitions of overlap on CNV discovery.
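The two overlap definitions compared in this study can be made concrete with a short sketch. The Python below is illustrative only. The `CNV` class, the 1-based inclusive coordinate convention and the function names are assumptions; they are not taken from the paper or from any of the four callers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNV:
    """A called CNV; coordinates are assumed 1-based and inclusive (hypothetical convention)."""
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap_bp(a: CNV, b: CNV) -> int:
    """Base pairs shared by two CNVs; 0 if they are on different chromosomes or disjoint."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def any_overlap(a: CNV, b: CNV) -> bool:
    """Definition 1: the CNVs share at least one base pair."""
    return overlap_bp(a, b) > 0


def overlaps_smaller(a: CNV, b: CNV, fraction: float = 0.4) -> bool:
    """Definition 2: the shared region covers at least `fraction` of the smaller CNV."""
    return overlap_bp(a, b) >= fraction * min(a.length, b.length)


# Hypothetical calls: they share 20 kb, i.e. 20% of the smaller (100 kb) CNV.
a = CNV("chr1", 1_000_001, 1_100_000)
b = CNV("chr1", 1_080_001, 1_300_000)
print(any_overlap(a, b), overlaps_smaller(a, b))  # True False
```

The first definition treats these two calls as the same CNV, and the second does not. This is one way the choice of overlap definition can change overlap-based counts such as novelty or concordance.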
Methodology and Principal Findings
We used a 56K Illumina genotyping array enriched for CNV regions to generate hybridization intensities and allele frequencies for 48 Caucasian schizophrenia cases and 48 age-, ethnicity-, and gender-matched control subjects. No algorithm found a difference in CNV burden between the two groups. However, the total number of CNVs called ranged from 102 to 3,765 across algorithms. The mean CNV size ranged from 46 kb to 787 kb, and the average number of CNVs per subject ranged from 1 to 39. The number of novel CNVs not previously reported in normal subjects ranged from 0 to 212.
Conclusions and Significance
Motivated by the public availability of multiple genome-wide SNP arrays, investigators are conducting numerous analyses to identify additional putative CNVs in complex genetic disorders. However, the number of CNVs identified in array-based studies, and whether these CNVs are novel or valid, will depend on the algorithm(s) used. Thus, given the variety of methods in use, there will be many false positives and false negatives. Guidelines for identifying CNVs inferred from high-density arrays and a gold standard for validating CNVs are both needed.
PMCID: PMC3012691  PMID: 21209939
2.  Software comparison for evaluating genomic copy number variation for Affymetrix 6.0 SNP array platform 
BMC Bioinformatics  2011;12:220.
Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely-available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance amongst regions of copy number gains/deletions and the false-positive rate amongst deleted segments.
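One simple way to summarize the bias and variability of locus-level copy-number estimates is sketched below. The abstract does not state the exact statistics used. This sketch assumes bias is the median estimate minus the diploid value of two, and that variability is the median absolute deviation (MAD). The function name and example values are hypothetical.

```python
import statistics


def locus_bias_and_spread(copy_numbers, expected=2.0):
    """Summarize one sample's locus-level copy-number estimates.

    Returns (bias, mad):
      bias: median estimate minus the expected diploid copy number
      mad:  median absolute deviation around that median (a robust spread measure)
    """
    med = statistics.median(copy_numbers)
    mad = statistics.median(abs(x - med) for x in copy_numbers)
    return med - expected, mad


# Hypothetical estimates from one sample at five loci in a diploid region.
bias, mad = locus_bias_and_spread([2.2, 2.3, 2.1, 2.25, 2.15])
print(round(bias, 3), round(mad, 3))  # 0.2 0.05
```

Under these assumptions, a package whose medians sit closest to two would show the smallest absolute bias, and one with tighter estimates would show the smallest MAD.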
APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of the packages evaluated, only PennCNV provides copy-number-specific quality-control metrics, which identified 136 samples with poor-quality CNV data. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce detected, as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate for deletions was similar for PennCNV and CRLMM/VanillaIce.
If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data, then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias and small variability, and it detects more regions while maintaining an estimated false-positive rate similar to that of CRLMM/VanillaIce. More generally, we argue that software developers should provide guidance on evaluating and choosing settings in order to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating their concordance/discordance and then considering the union of regions for downstream association tests.
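The concordance and union steps recommended above can be sketched as follows. The abstract does not define concordance precisely. This sketch assumes it is the fraction of one caller's calls that overlap any call from the other caller, and it represents CNVs as (chromosome, start, end) tuples with inclusive coordinates. Function names and example calls are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def concordance(calls_a, calls_b):
    """Fraction of calls in calls_a overlapped by at least one call in calls_b."""
    if not calls_a:
        return 0.0
    hits = sum(any(overlaps(x, y) for y in calls_b) for x in calls_a)
    return hits / len(calls_a)


def union_regions(*call_sets):
    """Merge calls from several callers into non-overlapping regions.

    Overlapping or directly adjacent calls on the same chromosome are joined.
    """
    merged = []
    for chrom, start, end in sorted(c for calls in call_sets for c in calls):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2] + 1:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical deletion calls from two callers for one sample.
penncnv = [("chr1", 100, 500), ("chr1", 900, 1200), ("chr2", 50, 80)]
vanillaice = [("chr1", 450, 700), ("chr3", 10, 20)]
print(concordance(penncnv, vanillaice))  # 0.3333333333333333
print(union_regions(penncnv, vanillaice))
```

Concordance computed this way is asymmetric. Here it is 1/3 from PennCNV's side but 1/2 from VanillaIce's, so the direction should be stated. Deletions and duplications would normally be merged separately.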
PMCID: PMC3146450  PMID: 21627824
3.  Genomic Landscape of a Three-Generation Pedigree Segregating Affective Disorder 
PLoS ONE  2009;4(2):e4474.
Bipolar disorder (BPD) is a common psychiatric illness with a complex mode of inheritance. Besides traditional linkage and association studies, which require large sample sizes, analysis of common and rare chromosomal copy number variants (CNVs) in extended families may provide novel insights into the genetic susceptibility to complex disorders. Using the Illumina HumanHap550 BeadChip with over 550,000 SNP markers, we genotyped 46 individuals in a three-generation Old Order Amish pedigree with 19 affected (16 BPD and three major depression) and 27 unaffected subjects. Using the PennCNV algorithm, we identified 50 CNV regions that ranged in size from 12 to 885 kb and encompassed at least 10 single nucleotide polymorphisms (SNPs). Of 19 well-characterized CNV regions that were available for combined genotype-expression analysis, 11 (58%) were associated with expression changes of genes within, partially within or near these CNV regions in fibroblasts or lymphoblastoid cell lines at a nominal P value <0.05. To further investigate the mode of inheritance of CNVs in the large pedigree, we analyzed a set of four CNVs, located at 6q27, 9q21.11, 12p13.31 and 15q11, all of which were enriched in subjects with affective disorders. We additionally show that these variants affect the expression of neuronal genes within or near the rearrangement. Our analysis suggests that family-based studies of the combined effect of common and rare CNVs at many loci may represent a useful approach in the genetic analysis of susceptibility to mental disorders.
PMCID: PMC2637422  PMID: 19214233
4.  Accuracy of CNV Detection from GWAS Data 
PLoS ONE  2011;6(1):e14511.
Several computer programs are available for detecting copy number variants (CNVs) using genome-wide SNP arrays. We evaluated the performance of four CNV detection software suites (Birdsuite, Partek, HelixTree, and PennCNV-Affy) in the identification of both rare and common CNVs. Each program's performance was assessed in two ways. The first was its recovery rate, i.e., its ability to call 893 CNVs previously identified in eight HapMap samples by paired-end sequencing of whole-genome fosmid clones, and 51,440 CNVs identified by array Comparative Genome Hybridization (aCGH) followed by validation procedures, in 90 HapMap CEU samples. The second was its performance in calling rare and common CNVs in the Bipolar Genome Study (BiGS) data set (1001 bipolar cases and 1033 controls, all of European ancestry) as measured by the Affymetrix SNP 6.0 array. Accuracy in calling rare CNVs was assessed by positive predictive value, based on the proportion of rare CNVs validated by quantitative real-time PCR (qPCR), while accuracy in calling common CNVs was assessed by false positive/false negative rates based on qPCR validation results from a subset of common CNVs. Birdsuite recovered the highest percentages of known HapMap CNVs containing >20 markers in two reference CNV datasets. The recovery rate increased as CNV frequency decreased. In the tested rare CNV data, Birdsuite and Partek had higher positive predictive values than the other software suites. In a test of three common CNVs in the BiGS dataset, Birdsuite's calls were 98.8% consistent with qPCR quantification in one CNV region, but accuracy in the other two regions was unacceptably low. We found relatively poor consistency between the two “gold standards”: the sequence data of Kidd et al. and the aCGH data of Conrad et al. Algorithms for calling CNVs, especially common ones, need substantial improvement, and a “gold standard” for detecting CNVs remains to be established.
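The two accuracy measures described here reduce to simple ratios once qPCR results are available: positive predictive value for rare CNVs, and false positive/false negative rates for common CNVs. The sketch below assumes per-sample boolean calls and qPCR-based truth labels for a single CNV region. The function names and numbers are hypothetical, and the study's exact definitions may differ.

```python
def positive_predictive_value(n_confirmed: int, n_tested: int) -> float:
    """PPV: fraction of tested CNV calls that were confirmed (e.g. by qPCR)."""
    return n_confirmed / n_tested


def error_rates(calls, truth):
    """False positive and false negative rates of per-sample calls at one CNV region.

    calls, truth: equal-length sequences of booleans (True = CNV present),
    where `truth` comes from an independent assay such as qPCR.
    """
    fp = sum(c and not t for c, t in zip(calls, truth))
    fn = sum(t and not c for c, t in zip(calls, truth))
    n_neg = sum(not t for t in truth)
    n_pos = sum(bool(t) for t in truth)
    fpr = fp / n_neg if n_neg else 0.0
    fnr = fn / n_pos if n_pos else 0.0
    return fpr, fnr


# Hypothetical: 18 of 20 rare calls confirmed; six samples typed at one common CNV.
print(positive_predictive_value(18, 20))  # 0.9
print(error_rates([True, True, False, False, True, False],
                  [True, False, False, True, True, False]))
```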
PMCID: PMC3020939  PMID: 21249187
5.  Genome-wide analysis shows increased frequency of CNV deletions in Dutch schizophrenia patients 
Biological psychiatry  2011;70(7):655-662.
Since 2008, multiple studies have reported on copy number variations (CNVs) in schizophrenia. However, many regions are unique events with minimal overlap between studies, which makes it difficult to gain a comprehensive overview of all CNVs involved in the aetiology of schizophrenia. We performed a systematic CNV study based on a homogeneous genome-wide dataset, targeting all CNVs ≥50 kb. We complemented this analysis with a review of cytogenetic and chromosomal abnormalities in schizophrenia reported in the literature, with the aim of combining classical genetic findings with our current understanding of genomic variation.
We investigated 834 Dutch schizophrenia patients and 672 Dutch controls. CNVs were included if they were detected by both QuantiSNP and PennCNV and contained known protein-coding genes. The integrated identification of CNV regions and cytogenetic loci indicates regions of interest (CROIs).
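The inclusion rule described here can be sketched as an interval filter: a CNV is kept only if both QuantiSNP and PennCNV detect it and it contains a known protein-coding gene. The abstract does not say how agreement between the two programs was judged. This sketch assumes any overlap counts. The gene coordinates and function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def consensus_genic_calls(primary, other_callers, genes):
    """Keep calls from `primary` that overlap a call from every other caller
    and overlap at least one gene interval."""
    kept = []
    for cnv in primary:
        supported = all(any(overlaps(cnv, o) for o in calls) for calls in other_callers)
        genic = any(overlaps(cnv, g) for g in genes)
        if supported and genic:
            kept.append(cnv)
    return kept


# Hypothetical calls for one subject and one hypothetical gene interval.
quantisnp = [("chr22", 18_900_000, 21_500_000), ("chr5", 1_000, 5_000)]
penncnv = [("chr22", 19_000_000, 21_000_000), ("chr5", 2_000, 3_000)]
genes = [("chr22", 19_700_000, 19_800_000)]
print(consensus_genic_calls(quantisnp, [penncnv], genes))
```

The chr5 call is supported by both programs but contains no gene, so only the chr22 call survives. Passing more than one list in `other_callers` extends the same rule to a three-way consensus, such as the requirement used in entry 15.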
In total, 2,437 CNVs were identified with an average number of 2.1 CNVs per subject for both cases and controls. We observed significantly more deletions, but not duplications, in schizophrenia cases versus controls. The CNVs identified coincide with loci previously reported in the literature, confirming well-established schizophrenia CROIs 1q42 and 22q11.2, as well as indicating a potentially novel CROI on chromosome 5q35.1.
Chromosomal deletions are more prevalent in schizophrenia patients than in healthy subjects and therefore constitute a risk factor for pathogenicity. The combination of our CNV data with previously reported cytogenetic abnormalities in schizophrenia provides an overview of potentially interesting regions for positional candidate genes.
PMCID: PMC3137747  PMID: 21489405
copy number variation; schizophrenia; cytogenetic abnormality; deletion; duplication; candidate gene
6.  Copy Number Variations and Primary Open-Angle Glaucoma 
This study has identified rare and recurrent deletions and duplications in POAG patients in the first large-scale, whole-genome study of structural variation performed in a sample of POAG patients and POAG-free subjects.
This study sought to investigate the role of rare copy number variation (CNV) in age-related disorders of blindness, with a focus on primary open-angle glaucoma (POAG). Data are reported from a whole-genome copy number screen in a large cohort of 400 individuals with POAG and 500 age-matched glaucoma-free subjects.
DNA samples from patients and controls were tested for CNVs using a combination of two microarray platforms. The signal intensity data generated from these arrays were then analyzed with multiple CNV detection programs including CNAG version 2.0, PennCNV, and dChip.
A total of 11 validated CNVs were identified as recurrent in the POAG set and absent in the age-matched control set. This set included CNVs on 5q23.1 (DMXL1, DTWD2), 20p12 (PAK7), 12q14 (C12orf56, XPOT, TBK1, and RASSF3), 12p13.33 (TULP3), and 10q24.31 (PAX2), among others. The CNVs presented here are exceedingly rare and are not found in the Database of Genomic Variants. Moreover, expression data from ocular tissue support a role for these CNV-implicated genes in vision-related processes. In addition, the CNV locations of DMXL1 and PAK7 overlap previously identified linkage signals for glaucoma on 5q23.1 and 20p12, respectively.
The data are consistent with the hypothesis that rare CNV plays a role in the development of POAG.
PMCID: PMC3207715  PMID: 21310917
7.  Novel common copy number variation for early onset extreme obesity on chromosome 11q11 identified by a genome-wide analysis 
Human Molecular Genetics  2010;20(4):840-852.
Heritability of obesity is substantial, and recent meta-analyses of genome-wide association studies (GWASs) have been successful in detecting several robustly associated genomic regions for obesity using single-nucleotide polymorphisms (SNPs). However, taken together, these SNPs explain only a small proportion of the overall heritability. Copy number variations (CNVs) might contribute to the ‘missing heritability’. We searched genome-wide for association between common CNVs and early-onset extreme obesity. Four hundred and twenty-four case-parent obesity trios, plus an independent sample of 453 extremely obese children and adolescents and 435 normal-weight and lean adult controls, were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We detected 20 common copy number variable regions (CNVRs) that were associated with obesity. The most promising CNVRs were followed up in an independent sample of 365 obesity trios, confirming the association for two candidate CNVRs. We found that a common CNVR covering exclusively the three olfactory receptor genes OR4P4, OR4S2 and OR4C6 was associated with obesity (combined P-value = 0.015 in a total of 789 families; odds ratio for the obesity effect allele = 1.19; 95% confidence interval = 1.016–1.394). We also replicated two common deletions (near NEGR1 and at chromosome 10q11.22) that have previously been reported to be associated with body weight. Additionally, our data support a rare CNV on chromosome 16 that has recently been reported by two independent groups, although rare CNVs were not the focus of our study. We conclude that common CNVs are unlikely to contribute substantially to the genetic basis of early-onset extreme obesity.
PMCID: PMC3024044  PMID: 21131291
8.  A Duplication CNV That Conveys Traits Reciprocal to Metabolic Syndrome and Protects against Diet-Induced Obesity in Mice and Men 
PLoS Genetics  2012;8(5):e1002713.
The functional contribution of CNV to human biology and disease pathophysiology has undergone limited exploration. Recent observations in humans indicate a tentative link between CNV and weight regulation. Smith-Magenis syndrome (SMS), manifesting obesity and hypercholesterolemia, results from a deletion CNV at 17p11.2 but is sometimes due to haploinsufficiency of a single gene, RAI1. The reciprocal duplication in 17p11.2 causes Potocki-Lupski syndrome (PTLS). We previously constructed mouse strains with a deletion, Df(11)17, or duplication, Dp(11)17, of the mouse genomic interval syntenic to the SMS/PTLS region. We demonstrate that Dp(11)17 is obesity-opposing. It conveys a highly penetrant, strain-independent phenotype of reduced weight, leaner body composition, lower total cholesterol (TC) and LDL, and increased insulin sensitivity, and this phenotype is not due to alterations in food intake or activity level. When fed a high-fat diet, Dp(11)17/+ mice display much less weight gain and metabolic change than WT mice, demonstrating that the Dp(11)17 CNV protects against metabolic syndrome. Reciprocally, Df(11)17/+ mice with the deletion CNV have increased weight, higher fat content, decreased HDL, and reduced insulin sensitivity, manifesting a bona fide metabolic syndrome. These observations in the deficiency animal model are supported by human data from 76 SMS subjects. Further, studies on knockout/transgenic mice showed that the metabolic consequences of the Dp(11)17 and Df(11)17 CNVs are not due solely to dosage alterations of Rai1, the predominant dosage-sensitive gene for SMS and likely also PTLS. Our experiments in chromosome-engineered mouse CNV models for human genomic disorders demonstrate that a CNV can be causative for weight/metabolic phenotypes. Furthermore, we explored the biology underlying the contribution of CNV to the physiology of weight control and energy metabolism.
The high penetrance, strain independence, and resistance to dietary influences associated with the CNVs in this study are features distinct from most SNP–associated metabolic traits and further highlight the potential importance of CNV in the etiology of both obesity and MetS as well as in the protection from these traits.
Author Summary
Genetic factors play a large role in obesity. However, despite recent technical progress in the search for genetic variants, the identities of causative and contributory genetic factors remain largely unknown. Whereas nucleotide sequence variation has been studied extensively with respect to its potential contribution to obesity, copy number variations (CNV), in which genes exist in abnormal numbers of copies mostly due to duplication or deletion, have only more recently been observed to be associated with human obesity. In this report, we utilize chromosome engineered mouse strains harboring a deletion or duplication CNV to address the potential functional impact of CNVs on weight control and metabolism. We show that the duplication CNV leads to lower body weight; it is also metabolically advantageous and protects from diet-induced obesity and metabolic syndrome (MetS). The deletion CNV causes a “mirror” phenotype with increased body weight and MetS–like phenotypes. Importantly, these effects manifest regardless of the genetic background and do not appear to be attributable to any single gene. These findings demonstrate experimentally that CNV can be causative for weight and metabolic phenotypes and highlight the potential relevance and importance of CNV in the etiology of obesity/MetS and the protection from these traits.
PMCID: PMC3359973  PMID: 22654670
9.  Copy Number Variation in Familial Parkinson Disease 
PLoS ONE  2011;6(8):e20988.
Copy number variants (CNVs) are known to cause Mendelian forms of Parkinson disease (PD), most notably in SNCA and PARK2. PARK2 has a recessive mode of inheritance; however, recent evidence demonstrates that a single CNV in PARK2 (but not a single missense mutation) may increase risk for PD. We recently performed a genome-wide association study for PD that excluded individuals known to have either a LRRK2 mutation or two PARK2 mutations. Data from the Illumina370Duo arrays were re-clustered using only white individuals with high quality intensity data, and CNV calls were made using two algorithms, PennCNV and QuantiSNP. After quality assessment, the final sample included 816 cases and 856 controls. Results varied between the two CNV calling algorithms for many regions, including the PARK2 locus (genome-wide p = 0.04 for PennCNV and p = 0.13 for QuantiSNP). However, there was consistent evidence with both algorithms for two novel genes, USP32 and DOCK5 (empirical, genome-wide p-values<0.001). PARK2 CNVs tended to be larger, and all instances that were molecularly tested were validated. In contrast, the CNVs in both novel loci were smaller and failed to replicate using real-time PCR, MLPA, and gel electrophoresis. The DOCK5 variation is more akin to a VNTR than a typical CNV and the association is likely caused by artifact due to DNA source. DNA for all the cases was derived from whole blood, while the DNA for all controls was derived from lymphoblast cell lines. The USP32 locus contains many SNPs with low minor allele frequency leading to a loss of heterozygosity that may have been spuriously interpreted by the CNV calling algorithms as support for a deletion. Thus, only the CNVs within the PARK2 locus could be molecularly validated and associated with PD susceptibility.
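A back-of-the-envelope calculation shows why a locus rich in low-frequency SNPs can mimic a deletion. Assume Hardy-Weinberg equilibrium and independent SNPs (no linkage disequilibrium, itself a simplification). A SNP with minor allele frequency p is then heterozygous with probability 2p(1 − p). A run of n SNPs therefore contains no heterozygous genotype with probability equal to the product of (1 − 2p(1 − p)) over the run. The allele frequencies below are hypothetical, not those of the USP32 region.

```python
import math


def prob_no_heterozygote(mafs):
    """Probability that an individual is homozygous at every SNP in a run,
    assuming Hardy-Weinberg equilibrium and independence between SNPs."""
    return math.prod(1 - 2 * p * (1 - p) for p in mafs)


# Twenty SNPs with minor allele frequency 0.02 versus twenty with 0.40.
print(round(prob_no_heterozygote([0.02] * 20), 3))  # 0.449
print(prob_no_heterozygote([0.40] * 20) < 1e-5)     # True
```

Under these assumptions, roughly half of individuals would show no heterozygous call across the low-frequency run. A caller that reads a missing heterozygous signal as evidence of copy loss could be misled, which is consistent with the explanation the authors give for the USP32 signal.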
PMCID: PMC3149037  PMID: 21829596
10.  Rare chromosomal deletions and duplications in attention-deficit hyperactivity disorder: a genome-wide analysis 
Lancet  2010;376(9750):1401-1408.
Large, rare chromosomal deletions and duplications known as copy number variants (CNVs) have been implicated in neurodevelopmental disorders similar to attention-deficit hyperactivity disorder (ADHD). We aimed to establish whether burden of CNVs was increased in ADHD, and to investigate whether identified CNVs were enriched for loci previously identified in autism and schizophrenia.
We undertook a genome-wide analysis of CNVs in 410 children with ADHD and 1156 unrelated ethnically matched controls from the 1958 British Birth Cohort. Children of white UK origin, aged 5–17 years, who met diagnostic criteria for ADHD or hyperkinetic disorder, but not schizophrenia or autism, were recruited from community child psychiatry and paediatric outpatient clinics. Single nucleotide polymorphisms (SNPs) were genotyped in the ADHD and control groups with two arrays; CNV analysis was limited to SNPs common to both arrays and included only samples with high-quality data. CNVs in the ADHD group were validated with comparative genomic hybridisation. We assessed the genome-wide burden of large (>500 kb), rare (<1% population frequency) CNVs according to the average number of CNVs per sample, with significance assessed via permutation. Locus-specific tests of association were undertaken for test regions defined for all identified CNVs and for 20 loci implicated in autism or schizophrenia. Findings were replicated in 825 Icelandic patients with ADHD and 35 243 Icelandic controls.
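The burden statistic and permutation test described above can be sketched as follows. The sketch assumes each sample has already been reduced to a count of qualifying CNVs (large, >500 kb, and rare, <1% frequency). It uses a one-sided test of the difference in mean CNVs per sample with an add-one correction. Both choices are assumptions rather than the study's documented procedure, and the counts are hypothetical.

```python
import random


def burden_permutation_test(case_counts, control_counts, n_perm=10_000, seed=0):
    """One-sided permutation test that cases carry more qualifying CNVs per sample.

    case_counts, control_counts: number of qualifying CNVs in each sample.
    Returns (case_rate, control_rate, p_value).
    """
    case_rate = sum(case_counts) / len(case_counts)
    control_rate = sum(control_counts) / len(control_counts)
    observed = case_rate - control_rate

    pooled = list(case_counts) + list(control_counts)
    n_case, n_ctrl = len(case_counts), len(control_counts)
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign case/control labels
        diff = sum(pooled[:n_case]) / n_case - sum(pooled[n_case:]) / n_ctrl
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return case_rate, control_rate, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical counts: 30 of 100 cases and 5 of 100 controls carry one qualifying CNV.
cases = [1] * 30 + [0] * 70
controls = [1] * 5 + [0] * 95
print(burden_permutation_test(cases, controls, n_perm=2_000))
```

Shuffling the pooled counts keeps the total number of CNVs fixed while breaking any link with case status. This is what makes the permuted differences a null distribution for the observed rate difference.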
Data for full analyses were available for 366 children with ADHD and 1047 controls. 57 large, rare CNVs were identified in children with ADHD and 78 in controls, showing a significantly increased rate of CNVs in ADHD (0·156 vs 0·075; p=8·9×10⁻⁵). This increased rate of CNVs was particularly high in those with intellectual disability (0·424; p=2·0×10⁻⁶), although there was also a significant excess in cases with no such disability (0·125, p=0·0077). An excess of chromosome 16p13.11 duplications was noted in the ADHD group (p=0·0008 after correction for multiple testing), a finding that was replicated in the Icelandic sample (p=0·031). CNVs identified in our ADHD cohort were significantly enriched for loci previously reported in both autism (p=0·0095) and schizophrenia (p=0·010).
Our findings provide genetic evidence of an increased rate of large CNVs in individuals with ADHD and suggest that ADHD is not purely a social construct.
Action Research; Baily Thomas Charitable Trust; Wellcome Trust; UK Medical Research Council; European Union.
PMCID: PMC2965350  PMID: 20888040
11.  Copy Number Variants in German Patients with Schizophrenia 
PLoS ONE  2013;8(7):e64035.
Large rare copy number variants (CNVs) have been recognized as significant genetic risk factors for the development of schizophrenia (SCZ). However, due to their low frequency (1∶150 to 1∶1000) among patients, large sample sizes are needed to detect an association between specific CNVs and SCZ. So far, the majority of genome-wide CNV analyses have focused on reporting only CNVs that reached a significant P-value within the study cohort and merely confirmed the frequency of already-established risk-carrying CNVs. As a result, CNVs with a very low frequency that might be relevant for SCZ susceptibility are lost for secondary analyses. In this study, we provide a concise collection of high-quality CNVs in a large German sample consisting of 1,637 patients with SCZ or schizoaffective disorder and 1,627 controls. All individuals were genotyped on Illumina's BeadChips and putative CNVs were identified using QuantiSNP and PennCNV. Only those CNVs that were detected by both programs and spanned ≥30 consecutive SNPs were included in the data collection and downstream analyses (2,366 CNVs, 0.73 CNVs per individual). The genome-wide analysis did not reveal a specific association between a previously unknown CNV and SCZ. However, the group of CNVs previously reported to be associated with SCZ was more frequent in our patients than in the controls. The publication of our dataset will serve as a unique, easily accessible, high-quality CNV data collection for other research groups. The dataset could be useful for the identification of new disease-relevant CNVs that are currently overlooked due to their very low frequency and lack of power for their detection in individual studies.
PMCID: PMC3699619  PMID: 23843933
12.  Rare Copy Number Variations in Adults with Tetralogy of Fallot Implicate Novel Risk Gene Pathways 
PLoS Genetics  2012;8(8):e1002843.
Structural genetic changes, especially copy number variants (CNVs), represent a major source of genetic variation contributing to human disease. Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital heart disease, but to date little is known about the role of CNVs in the etiology of TOF. Using high-resolution genome-wide microarrays and stringent calling methods, we investigated rare CNVs in a prospectively recruited cohort of 433 unrelated adults with TOF and/or pulmonary atresia at a single centre. We excluded those with recognized syndromes, including 22q11.2 deletion syndrome. We identified candidate genes for TOF based on converging evidence between rare CNVs that overlapped the same gene in unrelated individuals and from pathway analyses comparing rare CNVs in TOF cases to those in epidemiologic controls. Even after excluding the 53 (10.7%) subjects with 22q11.2 deletions, we found that adults with TOF had a greater burden of large rare genic CNVs compared to controls (8.82% vs. 4.33%, p = 0.0117). Six loci showed evidence for recurrence in TOF or related congenital heart disease, including typical 1q21.1 duplications in four (1.18%) of 340 Caucasian probands. The rare CNVs implicated novel candidate genes of interest for TOF, including PLXNA2, a gene involved in semaphorin signaling. Independent pathway analyses highlighted developmental processes as potential contributors to the pathogenesis of TOF. These results indicate that individually rare CNVs are collectively significant contributors to the genetic burden of TOF. Further, the data provide new evidence for dosage sensitive genes in PLXNA2-semaphorin signaling and related developmental processes in human cardiovascular development, consistent with previous animal models.
Author Summary
Congenital heart disease affects nearly 1% of all live births. Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital heart disease. This condition is associated with hemizygous deletions of chromosome 22q11.2 and chromosomal trisomies, but little else is known about the genetic heterogeneity of this complex disease. We used high-resolution microarrays and stringent methods to study structural (copy number) variants in a systematically phenotyped cohort of unrelated adults with TOF. We found that individually rare genic copy number variants (CNVs) were collectively significant contributors to the genetic burden in TOF. Among CNVs that implicated candidate genes of interest were loss CNVs overlapping the PLXNA2 gene, which codes for plexin A2. This is the first study to show a role for this semaphorin receptor in human congenital heart disease, consistent with a Plxna2 mouse knockout phenotype. Pathway analyses comparing rare exonic loss CNVs in the TOF sample to those in controls implicated other novel gene sets, suggesting new pathogenetic mechanisms.
PMCID: PMC3415418  PMID: 22912587
13.  Detecting Large Copy Number Variants Using Exome Genotyping Arrays In a Large Swedish Schizophrenia Sample 
Molecular psychiatry  2013;18(11):1178-1184.
Although copy number variants (CNVs) are important in genomic medicine, CNVs have not been systematically assessed for many complex traits. Several large rare CNVs increase risk for schizophrenia (SCZ) and autism and often demonstrate pleiotropic effects; however, their frequencies in the general population and in other complex traits are unknown. Genotyping large numbers of samples is essential for progress. Large cohorts from many different diseases are being genotyped using exome-focused arrays designed to detect uncommon or rare protein-altering sequence variation. Although these arrays were not designed for CNV detection, the hybridization intensity data generated in each experiment could, in principle, be used for gene-focused CNV analysis. Our goal was to evaluate the extent to which CNVs can be detected using data from one particular exome array (the Illumina Human Exome Bead Chip). We genotyped 9,100 Swedish subjects (3,962 cases with SCZ and 5,138 controls) using both standard GWAS arrays and exome arrays. In comparison to CNVs detected using GWAS arrays, we observed high sensitivity and specificity for detecting genic CNVs ≥400 kb, including known pathogenic CNVs. We also replicated the literature finding that cases with SCZ had greater enrichment for genic CNVs. Our data confirm the association of SCZ with 16p11.2 duplications and 22q11.2 deletions and suggest a novel association with deletions at 11q12.2. Our results suggest the utility of exome-focused arrays in surveying large genic CNVs in very large samples. They thereby open the door to new opportunities such as well-powered CNV assessment and comparisons between different diseases. The use of a single platform also minimizes potential confounding factors that could impact accurate detection.
PMCID: PMC3966073  PMID: 23938935
schizophrenia; copy number variation; structural variation; genotyping; Illumina; exome array
14.  Genome-wide algorithm for detecting CNV associations with diseases 
BMC Bioinformatics  2011;12:331.
SNP genotyping arrays have been developed to characterize single-nucleotide polymorphisms (SNPs) and DNA copy number variations (CNVs). Nonparametric and model-based statistical algorithms have been developed to detect CNVs from SNP data using the marker intensities. However, these algorithms lack specificity for detecting small CNVs, owing to the high false positive rate when calling CNVs based on the intensity values. Therefore, the resulting association tests lack power even if the CNVs affecting disease risk are common. An alternative procedure called PennCNV uses information from both the marker intensities and the genotypes and therefore has increased sensitivity.
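Genotype information helps because of the B allele frequency (BAF), which PennCNV models alongside marker intensity (log R ratio). For a locus carrying n copies, k of which are B alleles, the expected BAF is k/n. Each copy-number state therefore predicts a different set of BAF bands. The sketch below only enumerates these idealized values; real BAF data are noisy and also depend on factors such as sample contamination, which it ignores. The function name is hypothetical.

```python
from fractions import Fraction


def expected_baf_bands(copy_number: int):
    """Idealized B allele frequencies for a locus with `copy_number` copies:
    k B alleles out of n copies gives BAF = k/n for k = 0..n."""
    if copy_number == 0:
        return []  # homozygous deletion: no alleles, so BAF carries no information
    return [Fraction(k, copy_number) for k in range(copy_number + 1)]


for n in range(4):
    print(n, [str(f) for f in expected_baf_bands(n)])
# 0 []
# 1 ['0', '1']
# 2 ['0', '1/2', '1']
# 3 ['0', '1/3', '2/3', '1']
```

A hemizygous deletion removes the heterozygous 1/2 band, and a single-copy duplication splits it into 1/3 and 2/3. These patterns are not visible in intensity alone, which illustrates why combining intensities with genotypes can add sensitivity.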
By using the hidden Markov model (HMM) implemented in PennCNV to derive the probabilities of different copy number states, which we subsequently used in a logistic regression model, we developed a new genome-wide algorithm to detect CNV associations with diseases. We compared this new method with an association test applied to the most probable copy number state for each individual. PennCNV provides this state after it performs an initial HMM analysis followed by application of the Viterbi algorithm, which removes information about copy number probabilities. In one of our simulation studies, we showed that for large CNVs (number of SNPs ≥ 10), the association tests based on PennCNV calls gave more significant results, but the new algorithm retained high power. For small CNVs (number of SNPs < 10), the logistic algorithm provided smaller average p-values (e.g., p = 7.54e-17 when relative risk RR = 3.0) in all the scenarios and could capture signals that PennCNV did not (e.g., p = 0.020 when RR = 3.0). In a second set of simulations, we showed that the new algorithm is more powerful in detecting disease associations with small CNVs (number of SNPs ranging from 3 to 5) under different penetrance models (e.g., when RR = 3.0, for relatively weak signals, power = 0.8030 compared with 0.2879 for the association tests based on PennCNV calls). The new method was implemented in the software GWCNV. It is freely available at, distributed under a GPL license.
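The core idea of the new method is to feed HMM copy-number probabilities into a logistic regression instead of a single Viterbi call. The abstract does not give GWCNV's exact model, so the sketch below makes three assumptions. The covariate is the posterior-mean copy number, i.e. the sum of each state times its probability. An intercept-plus-slope model is fitted by Newton-Raphson. The slope is tested with a 1-df likelihood-ratio test. All names and data are hypothetical, and the sketch omits safeguards (e.g. for perfect separation) that production code would need.

```python
import math


def expected_dosage(posteriors, states=(0, 1, 2, 3, 4)):
    """Posterior-mean copy number: sum over states of state * P(state)."""
    return sum(s * p for s, p in zip(states, posteriors))


def _prob(b0, b1, x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def fit_logistic(x, y, iters=25):
    """Fit logit P(case) = b0 + b1*x by Newton-Raphson; return (b0, b1, log-likelihood)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _prob(b0, b1, xi)
            w = p * (1.0 - p)
            g0 += yi - p          # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w              # Fisher information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det == 0.0:
            break
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step: information^-1 @ score
        b1 += (h00 * g1 - h01 * g0) / det
    ll = sum(yi * math.log(_prob(b0, b1, xi)) + (1 - yi) * math.log(1 - _prob(b0, b1, xi))
             for xi, yi in zip(x, y))
    return b0, b1, ll


def lr_test_pvalue(x, y):
    """Likelihood-ratio test of b1 = 0; chi-square(1) tail probability via erfc."""
    _, _, ll_full = fit_logistic(x, y)
    n, n1 = len(y), sum(y)
    ll_null = n1 * math.log(n1 / n) + (n - n1) * math.log(1 - n1 / n)
    stat = max(0.0, 2.0 * (ll_full - ll_null))
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical posterior-mean dosages: cases tend toward one copy, controls toward two.
x = [1.2, 1.1, 2.0, 1.3, 1.0, 2.0, 1.4, 2.0,    # cases
     2.0, 1.9, 2.0, 1.1, 2.0, 2.0, 1.8, 2.0]    # controls
y = [1] * 8 + [0] * 8
b0, b1, _ = fit_logistic(x, y)
print(round(b1, 2), round(lr_test_pvalue(x, y), 4))
```

The posterior-mean dosage keeps the uncertainty that a Viterbi call discards. For example, a sample that is 70% likely to carry a single-copy deletion contributes 1.3 rather than 1 or 2. This is a plausible intuition for the power gain reported for small CNVs, not a description of GWCNV's internals.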
We conclude that the new algorithm is more sensitive and can be more powerful in detecting CNV associations with diseases than the existing HMM algorithm, especially when the CNV association signal is weak and a limited number of SNPs are located in the CNV.
PMCID: PMC3173460  PMID: 21827692
15.  Copy number variations in 6q14.1 and 5q13.2 are associated with alcohol dependence 
Excessive alcohol use is the third leading cause of preventable death and is highly correlated with alcohol dependence, a heritable phenotype. Many genetic factors for alcohol dependence have been found, but many remain unknown. In search of additional genetic factors, we examined the association between DSM-IV alcohol dependence and all common copy number variations (CNV) with good reliability in the Study of Addiction: Genetics and Environment (SAGE).
All participants in SAGE were interviewed using the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA) as part of three contributing studies. A total of 2,610 non-Hispanic European American samples were genotyped on the Illumina Human 1M array. We called CNVs with cnvPartition, PennCNV, and QuantiSNP and examined only CNVs identified by all three programs. Association was tested both for the CNV as a whole (as a deletion or duplication) and for probes in the CNV region. Quantitative polymerase chain reaction (qPCR) was used to validate the CNVs in the laboratory.
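The study kept only CNVs reported by all three programs, but the abstract does not say how calls from different programs were matched. The sketch below assumes that a call is "supported" by another program when that program reports a CNV of the same type on the same chromosome covering at least `min_frac` of the smaller of the two calls. With `min_frac=0.0`, any overlap counts. `Call`, `supported`, and `consensus` are hypothetical names, not part of any of the named tools.

```python
from typing import List, NamedTuple, Optional

class Call(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # inclusive
    kind: str    # "del" or "dup"

def overlap_bp(a: Call, b: Call) -> int:
    """Shared base pairs between two calls of the same type on the same chromosome."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)

def supported(a: Call, b: Call, min_frac: float) -> bool:
    """True if b overlaps a by at least min_frac of the smaller call (any overlap if 0)."""
    ov = overlap_bp(a, b)
    if ov == 0:
        return False
    smaller = min(a.end - a.start + 1, b.end - b.start + 1)
    return ov / smaller >= min_frac

def consensus(reference: List[Call], others: List[List[Call]],
              min_frac: float = 0.0, min_support: Optional[int] = None) -> List[Call]:
    """Keep reference calls supported by min_support other programs (default: all)."""
    need = len(others) if min_support is None else min_support
    kept = []
    for c in reference:
        support = sum(any(supported(c, o, min_frac) for o in tool) for tool in others)
        if support >= need:
            kept.append(c)
    return kept
```

For example, `consensus(penncnv_calls, [quantisnp_calls, cnvpartition_calls])` would keep the PennCNV calls confirmed by both other programs. Setting `min_support=1` relaxes the rule to "detected by at least two algorithms" when the reference program counts as one. Setting `min_frac=0.4` approximates an overlap rule of at least 40% of the smaller CNV.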
CNVs in 6q14.1 (P = 1.04 × 10−6) and 5q13.2 (P = 3.37 × 10−4) were significantly associated with alcohol dependence after adjustment for multiple testing. The 5q13.2 region contains multiple candidate genes previously associated with various neurological disorders. The 6q14.1 region is a gene desert that has been associated with mental retardation and language delay. The CNV in 5q13.2 was validated by qPCR, whereas only part of the CNV on 6q14.1 was. Thus, the CNV on 6q14.1 should be viewed with caution.
This is the first study to show an association between DSM-IV alcohol dependence and CNVs. CNVs in regions previously associated with neurological disorders may be associated with alcohol dependence.
PMCID: PMC3436997  PMID: 22702843
Copy Number Variations; Alcohol dependence; CNV Accuracy
Genetic epidemiology  2012;36(3):253-262.
A major concern for all copy number variation (CNV) detection algorithms is their reliability and repeatability. However, it is difficult to evaluate the reliability of CNV calling strategies because there are no gold standard data that would tell us which CNVs are real. We propose that if CNVs are called in duplicate samples, or are inherited from parent to child, then they can be considered validated CNVs. We used two large family-based Genome-Wide Association Study (GWAS) datasets from the GENEVA consortium to examine concordance rates of CNV calls between duplicate samples, parent-child pairs, and unrelated pairs. Our goal was to recommend ways to filter and use CNV calls in GWAS datasets that do not include family data. We used PennCNV as our primary CNV-calling algorithm and tested CNV calls using different datasets and marker sets, and with various filters on CNVs and samples. Using the Illumina core HumanHap550 SNP (single nucleotide polymorphism) set, we saw duplicate concordance rates of approximately 55% and parent-child transmission rates of approximately 28% in our datasets. GC model adjustment and sample quality filtering had little effect on these reliability measures. Stratification on CNV size and DNA sample type did have some effect. Overall, our results show that it is probably not possible to find a CNV calling strategy (including filtering and algorithm) that will give a set of “reliable” CNV calls using current chip technologies. But if we understand the error process, we can still use CNV calls appropriately in genetic association studies.
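Both reliability measures depend on when two calls are judged to "match", and the abstract does not specify this. A minimal sketch follows. It assumes calls are `(chrom, start, end, type)` tuples and treats any shared base pair between calls of the same type as a match. The function names are hypothetical, and these definitions need not match the ones the study used. For example, a full transmission analysis would check a child's calls against both parents.

```python
def overlaps(a, b):
    """True if calls a and b (chrom, start, end, type) share a type and at least one base."""
    return a[0] == b[0] and a[3] == b[3] and max(a[1], b[1]) <= min(a[2], b[2])

def duplicate_concordance(calls_a, calls_b):
    """Fraction of all calls in two replicates that are matched in the other replicate."""
    matched_a = sum(any(overlaps(a, b) for b in calls_b) for a in calls_a)
    matched_b = sum(any(overlaps(b, a) for a in calls_a) for b in calls_b)
    total = len(calls_a) + len(calls_b)
    return (matched_a + matched_b) / total if total else float("nan")

def transmission_rate(child_calls, parent_calls):
    """Fraction of a child's calls that are also called in the given parent."""
    if not child_calls:
        return float("nan")
    matched = sum(any(overlaps(c, p) for p in parent_calls) for c in child_calls)
    return matched / len(child_calls)
```

Averaging `duplicate_concordance` over all duplicate pairs, and `transmission_rate` over all parent-child pairs, would give dataset-level figures analogous to the approximately 55% and 28% reported above. Computing the same quantity for unrelated pairs gives a baseline for chance overlap.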
PMCID: PMC3696390  PMID: 22714937
evaluation; CNV calling strategies; family-based GWAS
17.  Genome-wide association study identifies a maternal copy-number deletion in PSG11 enriched among preeclampsia patients 
Specific genetic contributions for preeclampsia (PE) are currently unknown. This genome-wide association study (GWAS) aims to identify maternal single nucleotide polymorphisms (SNPs) and copy-number variants (CNVs) involved in the etiology of PE.
A genome-wide scan was performed on 177 PE cases (diagnosed according to National Heart, Lung and Blood Institute guidelines) and 116 normotensive controls. White female study subjects from Iowa were genotyped on Affymetrix SNP 6.0 microarrays. CNV calls made using a combination of four detection algorithms (Birdseye, Canary, PennCNV, and QuantiSNP) were merged using CNVision and screened with stringent prioritization criteria. Due to limited DNA quantities and the deleterious nature of copy-number deletions, it was decided a priori that only deletions would be selected for assay on the entire case-control dataset using quantitative real-time PCR.
The top four SNP candidates had allelic or genotypic p-values between 10−5 and 10−6; however, none surpassed the Bonferroni-corrected significance threshold. Three recurrent rare deletions that met the prioritization criteria and were detected in multiple cases were selected for targeted genotyping. A locus of particular interest showed an enrichment of case deletions in 19q13.31 (5/169 cases and 1/114 controls); this locus encompasses the PSG11 gene, contiguous to a highly plastic genomic region. All algorithm calls for these regions were confirmed by assay.
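The abstract does not say which test was applied to the 19q13.31 counts. For a rare-deletion enrichment like this, a one-sided Fisher's exact test is a common choice. The sketch below implements it directly from the hypergeometric distribution, with the total number of carriers held fixed. `fisher_one_sided` is a hypothetical helper, and its output is not the p-value reported in the paper.

```python
import math

def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """P(X >= case_carriers), X ~ Hypergeometric: carriers drawn among n_cases subjects."""
    n_total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    denom = math.comb(n_total, n_cases)
    upper = min(carriers, n_cases)
    # Sum the probabilities of tables at least as enriched in cases as the observed one.
    return sum(
        math.comb(carriers, x) * math.comb(n_total - carriers, n_cases - x)
        for x in range(case_carriers, upper + 1)
    ) / denom
```

Calling `fisher_one_sided(5, 169, 1, 114)` gives the one-sided p-value for the 5/169 versus 1/114 table above. A two-sided test, or a test that accounts for the number of loci screened, would give a larger value.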
CNVs may confer risk for PE and represent interesting regions that warrant further investigation. Top SNP candidates identified from the GWAS, although not genome-wide significant, may be useful to inform future studies in PE genetics.
PMCID: PMC3476390  PMID: 22748001
Copy-number variant; Genome-wide association study; Microarray analysis; Preeclampsia; Single nucleotide polymorphism
18.  Evaluation of copy number variation detection for a SNP array platform 
BMC Bioinformatics  2014;15:50.
Copy Number Variations (CNVs) are usually inferred from Single Nucleotide Polymorphism (SNP) arrays using software packages that implement various algorithms. However, the performance of these packages is not well understood, so it is difficult to select one or several of them for CNV detection on a SNP array platform.
We selected four publicly available software packages designed for CNV calling from Affymetrix SNP arrays: Birdsuite, dChip, Genotyping Console (GTC), and PennCNV. A publicly available dataset generated by array-based Comparative Genomic Hybridization (CGH), with a resolution of 24 million probes per sample, was treated as the “gold standard”. Against this CGH-based dataset, we assessed the success rate, average stability rate, sensitivity, consistency, and reproducibility of the four packages. We also compared the efficiency of detecting CNVs jointly with two, three, or all four packages with that of a single package.
In terms of the number of CNVs detected, Birdsuite detected the most and GTC the fewest. Birdsuite and dChip showed obvious detection bias, and GTC appeared inferior because it detected the fewest CNVs. We then examined the consistency between the calls of each package and those of the other three, and found that dChip had the lowest consistency and GTC the highest. Compared with the CGH results, GTC called the most CNVs in the matching group, with PennCNV-Affy second, and the fewest CNVs in the non-overlapping group. With regard to the reproducibility of CNV calling, larger CNVs were usually replicated better; PennCNV-Affy showed the best consistency and Birdsuite the poorest.
We found that PennCNV outperformed the other three packages in the sensitivity and specificity of CNV calling. Nevertheless, each calling method has its own advantages and limitations for different analyses. An optimal calling strategy might therefore be identified by using multiple algorithms to evaluate the concordance and discordance of SNP array-based CNV calls.
PMCID: PMC4015297  PMID: 24555668
CNV; CGH; Evaluation; Comparison; Performance test; Reproducibility test; Success rate; Birdsuite; dChip; GTC; PennCNV
19.  New Copy Number Variations in Schizophrenia 
PLoS ONE  2010;5(10):e13422.
Genome-wide screenings for copy number variations (CNVs) in patients with schizophrenia have demonstrated the presence of several CNVs that increase the risk of developing the disease, as well as a growing number of large rare CNVs whose contribution to schizophrenia remains unknown. Using Affymetrix 6.0 arrays, we undertook a systematic search for CNVs in 172 patients with schizophrenia and 160 healthy controls, all of Italian origin, with the aim of confirming previously identified loci and identifying novel schizophrenia susceptibility genes. We found five patients with a CNV in one of the regions most convincingly implicated as risk factors for schizophrenia: NRXN1 and the 16p13.1 region were each deleted in a single patient and 15q11.2 in 2 patients, whereas the 15q13.3 region was duplicated in one patient. Furthermore, we found CNVs at the 2q12.2, 3q29 and 17p12 loci in three different patients. These loci had previously been reported as deleted or duplicated in patients with schizophrenia but had never been formally associated with the disease. We found 5 large CNVs (>900 kb) in 4q32, 5q14.3, 8q23.3, 11q25 and 17q12 in five different patients that could include new candidate schizophrenia susceptibility genes. In conclusion, the identification of previously reported CNVs and of new, rare, large CNVs further supports a model of schizophrenia that includes the effect of multiple, rare, highly penetrant variants.
PMCID: PMC2954184  PMID: 20967226
20.  Relative Burden of Large CNVs on a Range of Neurodevelopmental Phenotypes 
PLoS Genetics  2011;7(11):e1002334.
While numerous studies have implicated copy number variants (CNVs) in a range of neurological phenotypes, their impact relative to disease severity has been difficult to ascertain due to small sample sizes, lack of phenotypic details, and heterogeneity in platforms used for discovery. Using a customized microarray enriched for genomic hotspots, we assayed for large CNVs among 1,227 individuals with various neurological deficits including dyslexia (376), sporadic autism (350), and intellectual disability (ID) (501), as well as 337 controls. We show that the frequency of large CNVs (>1 Mbp) is significantly greater for ID–associated phenotypes compared to autism (p = 9.58×10−11, odds ratio = 4.59), dyslexia (p = 3.81×10−18, odds ratio = 14.45), or controls (p = 2.75×10−17, odds ratio = 13.71). There is a striking difference in the frequency of rare CNVs (>50 kbp) in autism (10%, p = 2.4×10−6, odds ratio = 6) or ID (16%, p = 3.55×10−12, odds ratio = 10) compared to dyslexia (2%), with essentially no difference in large CNV burden between dyslexia patients and controls. Rare CNVs were more likely to arise de novo in ID (64%) than in autism (40%) or dyslexia (0%). We observed a significantly increased large CNV burden in individuals with ID and multiple congenital anomalies (MCA) compared to ID alone (p = 0.001, odds ratio = 2.54). Our data suggest that large CNV burden positively correlates with the severity of childhood disability: ID with MCA being most severely affected and dyslexics being indistinguishable from controls. When autism without ID was considered separately, the increase in CNV burden compared to controls was modest (p = 0.07, odds ratio = 2.33).
Author Summary
Deletions and duplications, termed copy number variants (CNVs), have been implicated in a variety of neurodevelopmental disorders including intellectual disability (ID), autism, and schizophrenia. Our understanding of the relevance of large, rare CNVs across neurodevelopmental phenotypes that vary in severity and prevalence has been limited because previous studies analyzed one disorder at a time, used different CNV detection platforms, had insufficient sample sizes, and lacked detailed clinical information. We tested 1,227 individuals with different neurological diseases including dyslexia, autism, and ID using the same CNV detection platform. We observed striking differences in CNV burden and inheritance characteristics among these cohorts and show that ID is the primary correlate of large CNV burden. This correlation is well illustrated by a comparison of autism patients with and without ID, where the latter show only modest increases in large CNV burden compared to controls. We also find a significant depletion in the frequency of large CNVs in dyslexia compared to the other cohorts. Further studies on larger sets of individuals using high-resolution arrays and next-generation sequencing are warranted for a detailed understanding of the relative contribution of genetic variants to neurodevelopmental disorders.
PMCID: PMC3213131  PMID: 22102821
21.  The Impact of CNVs on Outcomes for Infants with Single Ventricle Heart Defects 
Human genomes harbor copy number variants (CNVs), regions of DNA gains or losses. While pathogenic CNVs are associated with congenital heart disease (CHD), their impact on clinical outcomes is unknown. This study sought to determine whether pathogenic CNVs among infants with single ventricle (SV) physiology were associated with inferior neurocognitive and somatic growth outcomes.
Methods and Results
Genomic DNAs from 223 subjects of two National Heart, Lung, and Blood Institute-sponsored randomized clinical trials with infants with SV CHD and 270 controls from The Cancer Genome Atlas project were analyzed for rare CNVs >300 kb using array comparative genomic hybridization. Neurocognitive and growth outcomes at 14 months from the CHD trials were compared among subjects with and without pathogenic CNVs. Putatively pathogenic CNVs, comprising 25 duplications and 6 deletions, had a prevalence of 13.9%, significantly greater than the 4.4% rate of such CNVs among controls. CNVs associated with genomic disorders were found in 13 cases but no control. Several CNVs likely to be causative of SV CHD were observed, including aberrations altering the dosage of GATA4, MYH11, and GJA5. Subjects with pathogenic CNVs had worse linear growth, and those with CNVs associated with known genomic disorders had the poorest neurocognitive and growth outcomes. A minority of children with pathogenic CNVs were noted to be dysmorphic on clinical genetics examination.
Pathogenic CNVs appear to contribute to the etiology of SV forms of CHD in at least 10% of cases; they are clinically subtle but adversely affect outcomes in the children who harbor them.
PMCID: PMC3987966  PMID: 24021551
copy number variant; congenital cardiac defect; outcome; hypoplastic left heart syndrome
22.  Genome-Wide Survey of Large Rare Copy Number Variants in Alzheimer’s Disease Among Caribbean Hispanics 
G3: Genes|Genomes|Genetics  2012;2(1):71-78.
Recent genome-wide association studies have identified significant associations between Alzheimer’s disease (AD) and variations in CLU, PICALM, BIN1, CR1, MS4A4/MS4A6E, CD2AP, CD33, EPHA1, and ABCA7. However, the pathogenic variants in these loci have not yet been found. We conducted a genome-wide scan for large copy number variation (CNV) in a dataset of Caribbean Hispanic origin (554 controls and 559 AD cases that were previously investigated in a SNP-based genome-wide association study using the Illumina HumanHap 650Y platform). We ran four CNV calling algorithms to obtain high-confidence calls for large CNVs (>100 kb) that were detected by at least two algorithms. Global burden analyses did not reveal significant differences between cases and controls in CNV rate, distribution of deletions or duplications, total or average CNV size, or number of genes affected by CNVs. However, we observed a nominal association between AD and a ∼470 kb duplication on chromosome 15q11.2 (P = 0.037). This duplication, encompassing up to five genes (TUBGCP5, CYFIP1, NIPA2, NIPA1, and WHAMML1), was present in 10 cases (2.6%) and 3 controls (0.8%). The dosage increase of the CYFIP1 and NIPA1 genes was further confirmed by quantitative PCR. The current study did not detect CNVs that affect the novel AD loci identified by recent genome-wide association studies. However, because the array technology used in our study has limitations in detecting small CNVs, future studies must carefully assess novel AD genes for the presence of disease-related CNVs.
PMCID: PMC3276183  PMID: 22384383
gene; deletion; duplication; Alzheimer’s Disease; copy number variants
23.  Increased rate of sporadic and recurrent rare genic copy number variants in Parkinson's disease among Ashkenazi Jews 
To date, only one genome-wide study has assessed the contribution of copy number variants (CNVs) to Parkinson's disease (PD). We conducted a genome-wide scan for CNVs in a case–control dataset of Ashkenazi Jewish (AJ) origin (268 PD cases and 178 controls). Using high-confidence CNVs, we examined the global genome-wide burden of large (≥100 kb) and rare (≤1% in the dataset) CNVs between cases and controls. A total of 986 such CNVs were observed in our dataset of 432 subjects. Global burden analyses did not reveal significant differences between cases and controls in CNV rate, distribution of deletions or duplications, or number of genes affected by CNVs. Overall deletions (total CNV size and ≥2× frequency) were found 1.4 times more often in cases than in controls (P = 0.019). Large CNVs (≥500 kb) were also significantly associated with PD (P = 0.046, 1.24-fold higher in cases than in controls). Global burden was elevated for rare CNV regions. Specifically, for OVOS2 on Chr12p11.21, CNVs were observed only in PD cases (n = 7) and not in controls (P = 0.028), and this was experimentally validated. A total of 81 PD cases carried a rare genic CNV that was absent in controls. Ingenuity pathway analysis (IPA) identified ATXN3, FBXW7, CHCHD3, HSF1, KLC1, and MBD3 in the same disease pathway as known PD genes.
PMCID: PMC3782064  PMID: 24073418
Ashkenazi Jews; candidate genes; case–control study; CNV; Parkinson's disease
24.  Genome-wide analysis of rare copy number variations reveals PARK2 as a candidate gene for attention-deficit/hyperactivity disorder 
Molecular Psychiatry  2012;19(1):115-121.
Attention-deficit/hyperactivity disorder (ADHD) is a common, highly heritable neurodevelopmental disorder. Genetic loci have not yet been identified by genome-wide association studies. Rare copy number variations (CNVs), such as chromosomal deletions or duplications, have been implicated in ADHD and other neurodevelopmental disorders. To identify rare (frequency ⩽1%) CNVs that increase the risk of ADHD, we performed a whole-genome CNV analysis based on 489 young ADHD patients and 1285 adult population-based controls and identified one significantly associated CNV region. In tests for a global burden of large (>500 kb) rare CNVs, we observed a nonsignificant (P=0.271) 1.126-fold enriched rate of subjects carrying at least one such CNV in the group of ADHD cases. Locus-specific tests of association were used to assess if there were more rare CNVs in cases compared with controls. Detected CNVs, which were significantly enriched in the ADHD group, were validated by quantitative (q)PCR. Findings were replicated in an independent sample of 386 young patients with ADHD and 781 young population-based healthy controls. We identified rare CNVs within the parkinson protein 2 gene (PARK2) with a significantly higher prevalence in ADHD patients than in controls (P=2.8 × 10−4 after empirical correction for genome-wide testing). In total, the PARK2 locus (chr 6: 162 659 756–162 767 019) harboured three deletions and nine duplications in the ADHD patients and two deletions and two duplications in the controls. By qPCR analysis, we validated 11 of the 12 CNVs in ADHD patients (P=1.2 × 10−3 after empirical correction for genome-wide testing). In the replication sample, CNVs at the PARK2 locus were found in four additional ADHD patients and one additional control (P=4.3 × 10−2). Our results suggest that copy number variants at the PARK2 locus contribute to the genetic susceptibility of ADHD. Mutations and CNVs in PARK2 are known to be associated with Parkinson disease.
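The PARK2 p-values are reported "after empirical correction for genome-wide testing", but the abstract does not describe the procedure. One common approach is a min-p permutation test: shuffle case/control labels, record the smallest locus p-value in each permutation, and compare each observed p-value with that null distribution. The sketch below assumes each locus is summarized as carrier/non-carrier per subject and tested with a one-sided hypergeometric (Fisher) tail. All function names are hypothetical, and this is not necessarily the authors' method.

```python
import math
import random

def upper_tail_p(k, carriers, n_cases, n_total):
    """P(X >= k), X ~ Hypergeometric(n_total, carriers, n_cases): case carriers at a locus."""
    denom = math.comb(n_total, n_cases)
    return sum(
        math.comb(carriers, x) * math.comb(n_total - carriers, n_cases - x)
        for x in range(k, min(carriers, n_cases) + 1)
    ) / denom

def locus_pvalues(carrier_matrix, is_case):
    """One-sided enrichment p-value per locus; row[i] == 1 if subject i carries a CNV there."""
    n_total, n_cases = len(is_case), sum(is_case)
    pvals = []
    for row in carrier_matrix:
        carriers = sum(row)
        k = sum(1 for c, y in zip(row, is_case) if c and y)
        pvals.append(upper_tail_p(k, carriers, n_cases, n_total))
    return pvals

def min_p_corrected(carrier_matrix, is_case, n_perm=1000, seed=0):
    """Genome-wide corrected p-values from the permutation null of the minimum p-value."""
    rng = random.Random(seed)
    observed = locus_pvalues(carrier_matrix, is_case)
    labels = list(is_case)
    null_min = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any genotype-phenotype link, keep the case count
        null_min.append(min(locus_pvalues(carrier_matrix, labels)))
    # Add-one estimator so a corrected p-value is never exactly zero.
    return [(1 + sum(m <= p for m in null_min)) / (n_perm + 1) for p in observed]
```

Because every permutation keeps all loci together, the correction respects correlation between neighboring CNV regions. The smallest attainable corrected value is `1 / (n_perm + 1)`, so reaching a value near 2.8 × 10−4 would require several thousand permutations.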
PMCID: PMC3873032  PMID: 23164820
ADHD; children; CNVs; GWAS; PARK2
25.  A Genome-Wide Investigation of SNPs and CNVs in Schizophrenia 
PLoS Genetics  2009;5(2):e1000373.
We report a genome-wide assessment of single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) in schizophrenia. We investigated SNPs using 871 patients and 863 controls, following up the top hits in four independent cohorts comprising 1,460 patients and 12,995 controls, all of European origin. We found no genome-wide significant associations, nor could we provide support for any previously reported candidate gene or genome-wide associations. We went on to examine CNVs using a subset of 1,013 cases and 1,084 controls of European ancestry, and a further set of 60 cases and 64 controls of African ancestry. We found that eight cases and zero controls carried deletions greater than 2 Mb, of which two, at 8p22 and 16p13.11-p12.4, are newly reported here. A further evaluation of 1,378 controls identified no deletions greater than 2 Mb, suggesting a high prior probability of disease involvement when such deletions are observed in cases. We also provide further evidence for some smaller, previously reported, schizophrenia-associated CNVs, such as those in NRXN1 and APBA2. We could not provide strong support for the hypothesis that schizophrenia patients have a significantly greater “load” of large (>100 kb), rare CNVs, nor could we find common CNVs that associate with schizophrenia. Finally, we did not provide support for the suggestion that schizophrenia-associated CNVs may preferentially disrupt genes in neurodevelopmental pathways. Collectively, these analyses provide the first integrated study of SNPs and CNVs in schizophrenia and support the emerging view that rare deleterious variants may be more important in schizophrenia predisposition than common polymorphisms. While our analyses do not suggest that implicated CNVs impinge on particular key pathways, we do support the contribution of specific genomic regions in schizophrenia, presumably due to recurrent mutation. 
On balance, these data suggest that very few schizophrenia patients share identical genomic causation, potentially complicating efforts to personalize treatment regimens.
Author Summary
Schizophrenia is a highly heritable disease. While the drugs commonly used to treat schizophrenia offer important relief from some symptoms, other symptoms are not well treated, and the drugs cause serious adverse effects in many individuals. This has fueled intense interest over the years in identifying genetic contributors to schizophrenia. In this paper, we first show that common genetic variants, the focus of most research until recently, do not seem to have a major impact on schizophrenia predisposition. We then provide further evidence that very rare, large DNA deletions and duplications contribute to or explain a minority of schizophrenia cases. Although the small number of events identified here does not restrict focus to a finite set of molecular pathways, we do show one event that deletes a gene known to interact with DISC1, a gene known to cause psychiatric problems in one family. Such convergent findings have potential implications for the development of new therapies and patient subclassifications. We conclude that schizophrenia genetics research must turn sharply toward the identification of rare genetic contributors and that the most important tool in this effort will be complete whole-genome sequencing of patients whose clinical characteristics have been very thoroughly assessed.
PMCID: PMC2631150  PMID: 19197363

