Brain arteriovenous malformations (BAVM) are clusters of abnormal blood vessels, with shunting of blood from the arterial to venous circulation and a high risk of rupture and intracranial hemorrhage. Most BAVMs are sporadic, but they also occur in patients with Hereditary Hemorrhagic Telangiectasia, a Mendelian disorder caused by mutations in genes in the transforming growth factor beta (TGFβ) signaling pathway.
To investigate whether copy number variations (CNVs) contribute to risk of sporadic BAVM, we performed a genome-wide association study in 371 sporadic BAVM cases and 563 healthy controls, all Caucasian. Cases and controls were genotyped using the Affymetrix 6.0 array. CNVs were called using the PennCNV and Birdsuite algorithms and analyzed via segment-based and gene-based approaches. Common and rare CNVs were evaluated for association with BAVM.
A CNV region on 1p36.13, containing the neuroblastoma breakpoint family, member 1 gene (NBPF1), was significantly enriched with duplications in BAVM cases compared to controls (P = 2.2×10−9); NBPF1 was also significantly associated with BAVM in gene-based analysis using both PennCNV and Birdsuite. We experimentally validated the 1p36.13 duplication; however, the association did not replicate in an independent cohort of 184 sporadic BAVM cases and 182 controls (OR = 0.81, P = 0.8). Rare CNV analysis did not identify genes significantly associated with BAVM.
We did not identify common CNVs associated with sporadic BAVM that replicated in an independent cohort. Replication in larger cohorts is required to elucidate the possible role of common or rare CNVs in BAVM pathogenesis.
Copy number variations (CNVs), a major source of human genetic polymorphism, have been suggested to have an important role in genetic susceptibility to common diseases such as cancer, immune diseases and neurological disorders. Nasopharyngeal carcinoma (NPC) is a multifactorial tumor closely associated with genetic background and with a male preponderance over female (3:1). Previous genome-wide association studies have identified single-nucleotide polymorphisms (SNPs) that are associated with NPC susceptibility. Here, we sought to explore the possible association of CNVs with NPC predisposition. Utilizing genome-wide SNP-based arrays and five CNV-prediction algorithms, we identified eight regions with CNV that were significantly overrepresented in NPC patients compared with healthy controls. These CNVs included six deletions (on chromosomes 3, 6, 7, 8 and 19), and two duplications (on chromosomes 7 and 12). Among them, the CNV located at chromosome 6p21.3, with single-copy deletion of the MICA and HCP5 genes, showed the highest association with NPC. Interestingly, it was more specifically associated with an increased NPC risk among males. This gender-specific association was replicated in an independent case–control sample using a self-established deletion-specific polymerase chain reaction strategy. To the best of our knowledge, this is the first study to explore the role of constitutional CNVs in NPC, using a genome-wide platform. Moreover, we identified eight novel candidate regions with CNV that merit future investigation, and our results suggest that similar to neuroblastoma and prostate cancer, genetic structural variations might contribute to NPC predisposition.
Genetic studies have identified numerous genes reproducibly associated with asthma, yet these studies have focused almost entirely on single nucleotide polymorphisms (SNPs), and virtually ignored another highly prevalent form of genetic variation: Copy Number Variants (CNVs).
To survey the prevalence of CNVs in genes previously associated with asthma, and to assess whether CNVs represent the functional asthma-susceptibility variants at these loci.
We genotyped 383 asthmatic trios participating in the Childhood Asthma Management Program (CAMP) using a comparative genomic hybridization (CGH) array designed to interrogate 20,092 CNVs. To ensure comprehensive assessment of all potential asthma candidate genes, we purposely used liberal asthma gene inclusion criteria, resulting in consideration of 270 candidate genes previously implicated in asthma. We performed statistical testing using FBAT-CNV.
Copy number variation in asthma candidate genes was prevalent, with 21% of tested genes residing near or within one of 69 CNVs. In 6 instances, the complete candidate gene sequence resides within the CNV boundaries. On average, asthmatic probands carried 6 asthma-candidate CNVs (range 1–29). However, the vast majority of identified CNVs were of rare frequency (< 5%), and were not statistically associated with asthma. Modest evidence for association with asthma was observed for 2 CNVs near NOS1 and SERPINA3. Linkage disequilibrium analysis suggests that CNV effects are unlikely to explain previously detected SNP associations with asthma.
Although a substantial proportion of asthma-susceptibility genes harbor polymorphic CNVs, the majority of these variants do not confer increased asthma risk. The lack of linkage disequilibrium (LD) between CNVs and asthma-associated SNPs suggests that these CNVs are unlikely to represent the functional variant responsible for most known asthma associations.
Structural genetic changes, especially copy number variants (CNVs), represent a major source of genetic variation contributing to human disease. Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital heart disease, but to date little is known about the role of CNVs in the etiology of TOF. Using high-resolution genome-wide microarrays and stringent calling methods, we investigated rare CNVs in a prospectively recruited cohort of 433 unrelated adults with TOF and/or pulmonary atresia at a single centre. We excluded those with recognized syndromes, including 22q11.2 deletion syndrome. We identified candidate genes for TOF based on converging evidence between rare CNVs that overlapped the same gene in unrelated individuals and from pathway analyses comparing rare CNVs in TOF cases to those in epidemiologic controls. Even after excluding the 53 (10.7%) subjects with 22q11.2 deletions, we found that adults with TOF had a greater burden of large rare genic CNVs compared to controls (8.82% vs. 4.33%, p = 0.0117). Six loci showed evidence for recurrence in TOF or related congenital heart disease, including typical 1q21.1 duplications in four (1.18%) of 340 Caucasian probands. The rare CNVs implicated novel candidate genes of interest for TOF, including PLXNA2, a gene involved in semaphorin signaling. Independent pathway analyses highlighted developmental processes as potential contributors to the pathogenesis of TOF. These results indicate that individually rare CNVs are collectively significant contributors to the genetic burden of TOF. Further, the data provide new evidence for dosage sensitive genes in PLXNA2-semaphorin signaling and related developmental processes in human cardiovascular development, consistent with previous animal models.
Congenital heart disease affects nearly 1% of all live births. Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital heart disease. This condition is associated with hemizygous deletions of chromosome 22q11.2 and chromosomal trisomies, but little else is known about the genetic heterogeneity of this complex disease. We used high-resolution microarrays and stringent methods to study structural (copy number) variants in a systematically phenotyped cohort of unrelated adults with TOF. We found that individually rare genic copy number variants (CNVs) were collectively significant contributors to the genetic burden in TOF. Among CNVs that implicated candidate genes of interest were loss CNVs overlapping the PLXNA2 gene that codes for plexin A2. This is the first study to show a role for this semaphorin receptor in human congenital heart disease, consistent with a Plxna2 mouse knockout phenotype. Pathway analyses comparing rare exonic loss CNVs in the TOF sample to controls implicated other novel gene sets that suggest new pathogenetic mechanisms.
A small number of rare, recurrent genomic copy number variants (CNVs) are known to substantially increase susceptibility to schizophrenia. As a consequence of the low fecundity in people with schizophrenia and other neurodevelopmental phenotypes to which these CNVs contribute, CNVs with large effects on risk are likely to be rapidly removed from the population by natural selection. Accordingly, such CNVs must frequently occur as recurrent de novo mutations. In a sample of 662 schizophrenia proband–parent trios, we found that rare de novo CNV mutations were significantly more frequent in cases (5.1% all cases, 5.5% family history negative) compared with 2.2% among 2623 controls, confirming the involvement of de novo CNVs in the pathogenesis of schizophrenia. Eight de novo CNVs occurred at four known schizophrenia loci (3q29, 15q11.2, 15q13.3 and 16p11.2). De novo CNVs of known pathogenic significance in other genomic disorders were also observed, including deletion at the TAR (thrombocytopenia absent radius) region on 1q21.1 and duplication at the WBS (Williams–Beuren syndrome) region at 7q11.23. Multiple de novos spanned genes encoding members of the DLG (discs large) family of membrane-associated guanylate kinases (MAGUKs) that are components of the postsynaptic density (PSD). Two de novos also affected EHMT1, a histone methyl transferase known to directly regulate DLG family members. Using a systems biology approach and merging novel CNV and proteomics data sets, systematic analysis of synaptic protein complexes showed that, compared with control CNVs, case de novos were significantly enriched for the PSD proteome (P=1.72 × 10−6). This was largely explained by enrichment for members of the N-methyl-D-aspartate receptor (NMDAR) (P=4.24 × 10−6) and neuronal activity-regulated cytoskeleton-associated protein (ARC) (P=3.78 × 10−8) postsynaptic signalling complexes. 
In an analysis of 18 492 subjects (7907 cases and 10 585 controls), case CNVs were enriched for members of the NMDAR complex (P=0.0015) but not ARC (P=0.14). Our data indicate that defects in NMDAR postsynaptic signalling and, possibly, ARC complexes, which are known to be important in synaptic plasticity and cognition, play a significant role in the pathogenesis of schizophrenia.
CNV; de novo; DLG; EHMT1; postsynaptic; schizophrenia
Copy number variants (CNVs) have recently been recognized as a common form of genomic variation in humans. Hundreds of CNVs can be detected in any individual genome using genomic microarrays or whole genome sequencing technology, but their phenotypic consequences are still poorly understood. Rare CNVs have been reported as a frequent cause of neurological disorders such as mental retardation (MR), schizophrenia and autism, prompting widespread implementation of CNV screening in diagnostics. In previous studies we have shown that, in contrast to benign CNVs, MR-associated CNVs are significantly enriched in genes whose mouse orthologues, when disrupted, result in a nervous system phenotype. In this study we developed and validated a novel computational method for differentiating between benign and MR-associated CNVs using structural and functional genomic features to annotate each CNV. In total 13 genomic features were included in the final version of a Naïve Bayesian Tree classifier, with LINE density and mouse knock-out phenotypes contributing most to the classifier's accuracy. After demonstrating that our method (called GECCO) perfectly classifies CNVs causing known MR-associated syndromes, we show that it achieves high accuracy (94%) and negative predictive value (99%) on a blinded test set of more than 1,200 CNVs from a large cohort of individuals with MR. These results indicate that this classification method will be of value for objectively prioritizing CNVs in clinical research and diagnostics.
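The two performance figures quoted above, accuracy and negative predictive value, can be made concrete with a short sketch. The confusion-matrix counts below are invented for illustration only; they are not the study's counts, and the function names are hypothetical.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all CNVs whose predicted class matches the true class."""
    return (tp + tn) / (tp + fp + tn + fn)


def negative_predictive_value(tn, fn):
    """Fraction of CNVs predicted benign that are truly benign."""
    return tn / (tn + fn)


# Hypothetical counts (NOT taken from the paper): "positive" = MR-associated.
tp, fp, tn, fn = 40, 60, 1090, 10

print(f"accuracy = {accuracy(tp, fp, tn, fn):.3f}")
print(f"NPV      = {negative_predictive_value(tn, fn):.3f}")
```

A high negative predictive value matters in a prioritization setting because it indicates that CNVs set aside as benign are rarely disease-associated.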
Rare copy number variants (CNVs) are a frequent cause of neurological disorders such as mental retardation (MR). However CNVs are also commonly identified in healthy individuals. It is therefore crucial for both diagnostic and research applications to be able to distinguish between disease-causing CNVs and “benign” CNVs occurring as normal genomic variation. Separating these two types can take advantage of significant differences in their genomic contents. For example, benign CNVs are enriched in repetitive sequences. By contrast, CNVs associated with MR tend to have high densities of functional elements, including genes whose mouse orthologues, when knocked-out, lead to specific nervous system abnormalities. We have developed a novel objective approach that is effective in distinguishing MR-associated CNVs from benign CNVs based on the presence of 13 genomic attributes. This method is able to achieve high accuracies in a cohort of CNVs known to cause MR and in a cohort of individuals with unexplained MR. The development of this technique promises to substantially improve the methodology for determining the pathogenicity of CNVs.
Anthrax and its etiologic agent remain a biological threat. Anthrax vaccine is highly effective, but vaccine-induced IgG antibody responses vary widely following required doses of vaccinations. Such variation can be related to genetic factors, especially genomic copy number variants (CNVs) that are known to be enriched among genes with immunologic function. We have tested this hypothesis in two study populations from a clinical trial of anthrax vaccination.
We performed CNV-based genome-wide association analyses separately on 794 European Americans and 200 African-Americans. Antibodies to protective antigen were measured at week 8 (early response) and week 30 (peak response) using an enzyme-linked immunosorbent assay. We used DNA microarray data (Affymetrix 6.0) and two CNV detection algorithms, hidden Markov model (PennCNV) and circular binary segmentation (GeneSpring), to determine CNVs in all individuals. Multivariable regression analyses were used to identify CNV-specific associations after adjusting for relevant non-genetic covariates.
Within the 22 autosomal chromosomes, 2,943 non-overlapping CNV regions were detected by both algorithms. Genomic insertions containing HLA-DRB5, DRB1 and DQA1/DRA genes in the major histocompatibility complex (MHC) region (chromosome 6p21.3) were moderately associated with elevated early antibody response (β = 0.14, p = 1.78×10−3) among European Americans, and the strongest association was observed between peak antibody response and a segmental insertion on chromosome 1, containing NBPF4, NBPF5, STXMP3, CLCC1, and GPSM2 genes (β = 1.66, p = 6.06×10−5). For African-Americans, segmental deletions spanning PRR20, PCDH17 and PCH68 genes on chromosome 13 were associated with elevated early antibody production (β = 0.18, p = 4.47×10−5). Population-specific findings aside, one genomic insertion on chromosome 17 (containing NSF, ARL17 and LRRC37A genes) was associated with elevated peak antibody response in both populations.
Multiple CNV regions, including the one consisting of MHC genes that is consistent with earlier research, can be important to humoral immune responses to anthrax vaccine adsorbed.
Genetic factors predisposing individuals to cancer remain elusive in the majority of patients with a familial or clinical history suggestive of hereditary breast cancer. Germline DNA copy number variation (CNV) has recently been implicated in predisposition to cancers such as neuroblastomas as well as prostate and colorectal cancer. We evaluated the role of germline CNVs in breast cancer susceptibility, in particular those with low population frequencies (rare CNVs), which are more likely to cause disease.
Using whole-genome comparative genomic hybridization on microarrays, we screened a cohort of women fulfilling criteria for hereditary breast cancer who did not carry BRCA1/BRCA2 mutations.
The median numbers of total and rare CNVs per genome were not different between controls and patients. However, a total of 26 rare germline CNVs were identified in 68 cancer patients, a proportion that was significantly different (P = 0.0311) from that in the control group (23 rare CNVs in 100 individuals). Several of the genes affected by CNV in patients and controls had already been implicated in cancer.
This study is the first to explore the contribution of germline CNVs to BRCA1/2-negative familial and early-onset breast cancer. The data suggest that rare CNVs may contribute to cancer predisposition in this small cohort of patients, and this trend needs to be confirmed in larger population samples.
Submicroscopic (less than 2 Mb) segmental DNA copy number changes are a recently recognized source of genetic variability between individuals. The biological consequences of copy number variants (CNVs) are largely undefined. In some cases, CNVs that cause gene dosage effects have been implicated in phenotypic variation. CNVs have been detected in diverse species, including mice and humans. Published studies in mice have been limited by resolution and strain selection. We chose to study 21 well-characterized inbred mouse strains that are the focus of an international effort to measure, catalog, and disseminate phenotype data. We performed comparative genomic hybridization using long oligomer arrays to characterize CNVs in these strains. This technique increased the resolution of CNV detection by more than an order of magnitude over previous methodologies. The CNVs range in size from 21 to 2,002 kb. Clustering strains by CNV profile recapitulates aspects of the known ancestry of these strains. Most of the CNVs (77.5%) contain annotated genes, and many (47.5%) colocalize with previously mapped segmental duplications in the mouse genome. We demonstrate that this technique can identify copy number differences associated with known polymorphic traits. The phenotype of previously uncharacterized strains can be predicted based on their copy number at these loci. Annotation of CNVs in the mouse genome combined with sequence-based analysis provides an important resource that will help define the genetic basis of complex traits.
A major goal of genetics and genomics is to understand how genetic differences between individuals (genotypes) translate into variation in disease susceptibility, behavior, and many other organism-level characteristics (phenotypes). While the sizes of genetic variants range from a single base to whole chromosomes, historically, only the extreme ends of this spectrum have been explored. DNA copy number variants (CNVs) lie between these two extremes, ranging in size from hundreds to millions of bases. The recent application of microarray technology to detect genetic variation in humans has led to the realization that CNVs are common. In fact, rough estimates indicate that CNVs and small-scale variants may constitute similar proportions of total genomic DNA. In this report, the authors characterize 80 CNVs across the genomes of 21 inbred strains of mice. The identification and characterization of mouse CNVs are important because inbred strains of mice are the most widely used model system to explore biomedical genetics. These CNVs are located near another class of genomic features, segmental duplications, more often than would be expected by chance, which supports the hypothesis that CNVs and segmental duplications are causally linked. Importantly, many of the CNVs contain known genes and thus may underlie both gene expression and phenotypic variation between strains.
Copy number variation (CNV) is one of the most prevalent genetic variations in the genome, leading to an abnormal number of copies of moderate to large genomic regions. High-throughput technologies such as next-generation sequencing often identify thousands of CNVs involved in biological or pathological processes. Despite the growing demand to filter and classify CNVs by factors such as frequency in population, biological features, and function, surprisingly, no online web server for CNV annotations has been made available to the research community. Here, we present CNVannotator, a web server that accepts an input set of human genomic positions in a user-friendly tabular format. CNVannotator can perform genomic overlaps of the input coordinates using various functional features, including a list of the reported 356,817 common CNVs, 181,261 disease CNVs, as well as 140,342 SNPs from genome-wide association studies. In addition, CNVannotator incorporates 2,211,468 genomic features, including ENCODE regulatory elements, cytoband, segmental duplication, genome fragile site, pseudogene, promoter, enhancer, CpG island, and methylation site. For cancer research community users, CNVannotator can apply various filters to retrieve a subgroup of CNVs pinpointed in hundreds of tumor suppressor genes and oncogenes. In total, 5,277,234 unique genomic coordinates with functional features are available to generate an output in a plain text format that is free to download. In summary, we provide a comprehensive web resource for human CNVs. The annotated results along with the server can be accessed at http://bioinfo.mc.vanderbilt.edu/CNVannotator/.
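The genomic overlap step described above can be illustrated with a minimal sketch. This is not CNVannotator's code or interface; the function names, the tuple layout, and the example coordinates and labels are all assumptions made for illustration, and intervals are treated as closed.

```python
def intervals_overlap(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def annotate_cnvs(cnvs, features):
    """Attach to each input CNV the labels of all overlapping features.

    cnvs:     list of (chrom, start, end)
    features: list of (chrom, start, end, label)
    """
    # Group features by chromosome so each CNV is only compared
    # against features on its own chromosome.
    by_chrom = {}
    for chrom, start, end, label in features:
        by_chrom.setdefault(chrom, []).append((start, end, label))

    annotated = []
    for chrom, start, end in cnvs:
        hits = [label
                for f_start, f_end, label in by_chrom.get(chrom, [])
                if intervals_overlap(start, end, f_start, f_end)]
        annotated.append((chrom, start, end, hits))
    return annotated


# Hypothetical coordinates and labels, for illustration only.
features = [("chr1", 1000, 2000, "promoter_A"),
            ("chr1", 5000, 6000, "CpG_island_B"),
            ("chr2", 100, 900, "enhancer_C")]
cnvs = [("chr1", 1500, 5500), ("chr2", 1000, 2000)]
for row in annotate_cnvs(cnvs, features):
    print(row)
```

A linear scan per chromosome is adequate for small inputs; for feature sets of the size quoted above, an interval tree or a sorted sweep would be the usual choice.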
Investigators have linked rare copy number variation (CNVs) to neuropsychiatric diseases, such as schizophrenia. One hypothesis is that CNV events cause disease by affecting genes with specific brain functions. Under these circumstances, we expect that CNV events in cases should impact brain-function genes more frequently than those events in controls. Previous publications have applied “pathway” analyses to genes within neuropsychiatric case CNVs to show enrichment for brain-functions. While such analyses have been suggestive, they often have not rigorously compared the rates of CNVs impacting genes with brain function in cases to controls, and therefore do not address important confounders such as the large size of brain genes and overall differences in rates and sizes of CNVs. To demonstrate the potential impact of confounders, we genotyped rare CNV events in 2,415 unaffected controls with Affymetrix 6.0; we then applied standard pathway analyses using four sets of brain-function genes and observed an apparently highly significant enrichment for each set. The enrichment is simply driven by the large size of brain-function genes. Instead, we propose a case-control statistical test, cnv-enrichment-test, to compare the rate of CNVs impacting specific gene sets in cases versus controls. With simulations, we demonstrate that cnv-enrichment-test is robust to case-control differences in CNV size, CNV rate, and systematic differences in gene size. Finally, we apply cnv-enrichment-test to rare CNV events published by the International Schizophrenia Consortium (ISC). This approach reveals nominal evidence of case-association in neuronal-activity and the learning gene sets, but not the other two examined gene sets. The neuronal-activity genes have been associated in a separate set of schizophrenia cases and controls; however, testing in independent samples is necessary to definitively confirm this association. Our method is implemented in the PLINK software package.
Specific rare deletion and duplication events in the genome have now been shown to be associated with neuropsychiatric diseases, such as events at 16p11.2 with autism and at 22q11.21 with schizophrenia. However, controversy remains as to whether rare events impacting certain pathways as a group increase the risk of disease, and if so, what those pathways are. Other studies have used standard gene-set enrichment approaches to demonstrate that events discovered in cases contain more genes in neuro-developmental pathways than would be expected by chance. However, these analyses do not explicitly compare the relative enrichment in cases to any enrichment that may also be present in controls. Therefore, they can be confounded by the large size of brain genes or by larger size or frequency of CNVs in cases. Here we propose a case-control statistical test to assess whether a key pathway is differentially impacted by CNVs in cases compared to controls. Our approach is robust to skewed gene sizes and case-control differences in CNV rate and size.
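The core idea of comparing cases directly against controls, rather than against a chance expectation, can be sketched with a simple label-permutation test. This is not the cnv-enrichment-test as implemented in PLINK: it does not adjust for CNV size, CNV rate, or gene size, and the function names and toy data are assumptions made for illustration.

```python
import random


def hit_rate_difference(is_case, hits):
    """Rate of gene-set-hitting CNVs in cases minus the rate in controls."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    case_hits = sum(h for c, h in zip(is_case, hits) if c)
    ctrl_hits = sum(h for c, h in zip(is_case, hits) if not c)
    return case_hits / n_case - ctrl_hits / n_ctrl


def permutation_enrichment_test(is_case, hits, n_perm=2000, seed=0):
    """One-sided permutation p-value for a higher hit rate in cases.

    is_case[i]: True if CNV i was observed in a case.
    hits[i]:    True if CNV i overlaps a gene in the set of interest.
    Assumes at least one case CNV and one control CNV.
    """
    rng = random.Random(seed)
    observed = hit_rate_difference(is_case, hits)
    labels = list(is_case)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between status and hits
        if hit_rate_difference(labels, hits) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy data: 50 case CNVs (30 hit the set), 50 control CNVs (10 hit the set).
is_case = [True] * 50 + [False] * 50
hits = [True] * 30 + [False] * 20 + [True] * 10 + [False] * 40
print(permutation_enrichment_test(is_case, hits))
```

Because the gene set is the same for cases and controls, a set that is hit often merely because its genes are large raises both rates and largely cancels in the difference, which is the motivation given above for a case-control comparison.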
Many DNA copy-number variations (CNVs) are known to lead to phenotypic variations and pathogenesis. While CNVs are often common only in a small number of samples in the studied population or patient cohort, previous work has not focused on using advanced data mining techniques for the customized identification of CNV regions that appear only in subsets of samples, which is needed to reliably answer questions such as “Which are all the chromosomal fragments showing nearly identical deletions or insertions in more than 30% of the individuals?”
We introduce a tool for mining CNV subspace patterns, namely SubPatCNV, which is capable of identifying all aberrant CNV regions specific to arbitrary sample subsets larger than a support threshold. By design, SubPatCNV is the implementation of a variation of approximate association pattern mining algorithm under a spatial constraint on the positional CNV probe features. In benchmark test, SubPatCNV was applied to identify population specific germline CNVs from four populations of HapMap samples. In experiments on the TCGA ovarian cancer dataset, SubPatCNV discovered many large aberrant CNV events in patient subgroups, and reported regions enriched with cancer relevant genes. In both HapMap data and TCGA data, it was observed that SubPatCNV employs approximate pattern mining to more effectively identify CNV subspace patterns that are consistent within a subgroup from high-density array data.
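The kind of question posed above (which contiguous fragments show the same aberration in at least a given number of samples) can be illustrated with a deliberately simplified sketch. It is an exact, greedy scan over consecutive probes, not SubPatCNV's approximate pattern mining algorithm; the function name, the call encoding (-1 loss, 0 neutral, 1 gain), and the toy data are assumptions.

```python
def frequent_deletion_runs(calls, min_support):
    """Find runs of consecutive probes deleted in the same sample subset.

    calls:       dict mapping sample id -> list of per-probe states
                 (-1 loss, 0 neutral, 1 gain), all lists of equal length.
    min_support: minimum number of samples that must share the run.

    For each start probe the run is extended while at least min_support
    samples are deleted at every probe in it. Runs that do not reach past
    an already reported run are skipped.
    """
    n_probes = len(next(iter(calls.values())))
    runs = []
    last_end = -1
    for start in range(n_probes):
        shared = {s for s in calls if calls[s][start] == -1}
        if len(shared) < min_support:
            continue
        end = start
        while end + 1 < n_probes:
            still_deleted = {s for s in shared if calls[s][end + 1] == -1}
            if len(still_deleted) < min_support:
                break
            shared, end = still_deleted, end + 1
        if end > last_end:
            runs.append((start, end, sorted(shared)))
            last_end = end
    return runs


# Toy data: 4 samples, 5 probes; a support of 2 corresponds to 50%.
calls = {
    "a": [0, -1, -1, -1, 0],
    "b": [0, -1, -1, 0, 0],
    "c": [0, 0, -1, -1, 0],
    "d": [0, 0, 0, 0, 0],
}
print(frequent_deletion_runs(calls, min_support=2))
```

The supporting sample set can only shrink as a run grows, so greedy extension finds the longest run for each start probe; tolerating a few mismatching probes or samples, as approximate mining does, is what this sketch leaves out.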
SubPatCNV, available through http://sourceforge.net/projects/subpatcnv/, is a unique scalable open-source software tool that provides the flexibility of identifying CNV regions specific to sample subgroups of different sizes from high-density CNV array data.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-014-0426-7) contains supplementary material, which is available to authorized users.
DNA copy-number variations; Approximate pattern mining; HapMap; Cancer
The human genome displays extensive copy-number variation (CNV). Recent discoveries have shown that large segments of DNA, ranging in size from hundreds to thousands of nucleotides, are either deleted or duplicated. This CNV may encompass genes, leading to a change in phenotype, including drug response phenotypes. Gemcitabine and 1-β-D-arabinofuranosylcytosine (AraC) are cytidine analogues used to treat a variety of cancers. Previous studies have shown that genetic variation may influence response to these drugs. In the present study, we set out to test the hypothesis that variation in copy number might contribute to variation in cytidine analogue response phenotypes.
We used a cell-based model system consisting of 197 ethnically-defined lymphoblastoid cell lines for which genome-wide SNP data were obtained using Illumina 550 and 650 K SNP arrays to study cytidine analogue cytotoxicity. 775 CNVs with allele frequencies > 1% were identified in 102 regions across the genome. 87/102 of these loci overlapped with previously identified regions of CNV. Association of CNVs with gemcitabine and AraC IC50 values identified 11 regions with permutation p-values < 0.05. Multiplex ligation-dependent probe amplification assays were performed to verify the 11 CNV regions that were associated with this phenotype, with false positive and false negative rates for the in-silico findings of 1.3% and 0.04%, respectively. We also had basal mRNA expression array data for these same 197 cell lines, which allowed us to quantify mRNA expression for 41 probesets in or near the CNV regions identified. We found that 7 of those 41 genes were highly expressed in our lymphoblastoid cell lines, and one of the seven genes (SMYD3) that was significant in the CNV association study was selected for further functional experiments. Those studies showed that knockdown of SMYD3 in pancreatic cancer cell lines increased gemcitabine and AraC resistance during cytotoxicity assays, consistent with the results of the association analysis.
These results suggest that CNVs may play a role in variation in cytidine analogue effect. Therefore, association studies of CNVs with drug response phenotypes in cell-based model systems, when paired with functional characterization, might help to identify CNV that contributes to variation in drug response.
Recent discovery of the copy number variation (CNV) in normal individuals has widened our understanding of genomic variation. However, most of the reported CNVs have been identified in Caucasians, which may not be directly applicable to people of different ethnicities. To profile CNV in an East Asian population, we screened CNVs in 3578 healthy, unrelated Korean individuals, using the Affymetrix Genome-Wide Human SNP array 5.0. We identified 144 207 CNVs using a pooled data set of 100 randomly chosen Korean females as a reference. The average number of CNVs per genome was 40.3, which is higher than that of CNVs previously reported using lower resolution platforms. The median size of CNVs was 18.9 kb (range 0.2–5406 kb). Copy number losses were 4.7 times more frequent than copy number gains. CNV regions (CNVRs) were defined by merging overlapping CNVs identified in two or more samples. In total, 4003 CNVRs were defined encompassing 241.9 Mb accounting for ∼8% of the human genome. A total of 2077 CNVRs (51.9%) were potentially novel. Known CNVRs were larger and more frequent than novel CNVRs. Sixteen percent of the CNVRs were observed in ≥1% of study subjects and 24% overlapped with the OMIM genes. A total of 476 (11.9%) CNVRs were associated with segmental duplications. CNVs/CNVRs identified in this study will be valuable resources for studying human genome diversity and its association with disease.
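The CNVR definition above (merging overlapping CNVs identified in two or more samples) can be sketched as a sort-and-sweep interval merge. The abstract does not give the exact merging rule, so this is one plausible reading: any overlap joins CNVs into a region, and a region is kept only if at least two distinct samples contribute. The function name and example coordinates are hypothetical.

```python
from collections import defaultdict


def merge_cnvs_into_cnvrs(cnvs, min_samples=2):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    cnvs: iterable of (sample_id, chrom, start, end), closed intervals.
    Returns a list of (chrom, start, end, n_samples) for merged regions
    supported by at least min_samples distinct samples.
    """
    by_chrom = defaultdict(list)
    for sample, chrom, start, end in cnvs:
        by_chrom[chrom].append((start, end, sample))

    cnvrs = []
    for chrom in sorted(by_chrom):
        cur_start = cur_end = None
        samples = set()
        # Sweep calls in order of start position, growing the current region
        # while the next call overlaps it.
        for start, end, sample in sorted(by_chrom[chrom]):
            if cur_start is not None and start <= cur_end:
                cur_end = max(cur_end, end)
                samples.add(sample)
                continue
            if cur_start is not None and len(samples) >= min_samples:
                cnvrs.append((chrom, cur_start, cur_end, len(samples)))
            cur_start, cur_end, samples = start, end, {sample}
        # Flush the last open region on this chromosome.
        if cur_start is not None and len(samples) >= min_samples:
            cnvrs.append((chrom, cur_start, cur_end, len(samples)))
    return cnvrs


# Hypothetical calls, for illustration only.
calls = [("s1", "chr1", 100, 200), ("s2", "chr1", 150, 300),
         ("s3", "chr1", 500, 600),
         ("s1", "chr2", 10, 50), ("s1", "chr2", 40, 90)]
print(merge_cnvs_into_cnvrs(calls))
```

Stricter rules, such as requiring a minimum reciprocal overlap before merging, would yield fewer and tighter regions; the choice affects CNVR counts and sizes.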
Attention-deficit/hyperactivity disorder (ADHD) is a common, highly heritable neurodevelopmental disorder. Genetic loci have not yet been identified by genome-wide association studies. Rare copy number variations (CNVs), such as chromosomal deletions or duplications, have been implicated in ADHD and other neurodevelopmental disorders. To identify rare (frequency ⩽1%) CNVs that increase the risk of ADHD, we performed a whole-genome CNV analysis based on 489 young ADHD patients and 1285 adult population-based controls and identified one significantly associated CNV region. In tests for a global burden of large (>500 kb) rare CNVs, we observed a nonsignificant (P=0.271) 1.126-fold enriched rate of subjects carrying at least one such CNV in the group of ADHD cases. Locus-specific tests of association were used to assess if there were more rare CNVs in cases compared with controls. Detected CNVs, which were significantly enriched in the ADHD group, were validated by quantitative (q)PCR. Findings were replicated in an independent sample of 386 young patients with ADHD and 781 young population-based healthy controls. We identified rare CNVs within the parkinson protein 2 gene (PARK2) with a significantly higher prevalence in ADHD patients than in controls (P=2.8 × 10−4 after empirical correction for genome-wide testing). In total, the PARK2 locus (chr 6: 162 659 756–162 767 019) harboured three deletions and nine duplications in the ADHD patients and two deletions and two duplications in the controls. By qPCR analysis, we validated 11 of the 12 CNVs in ADHD patients (P=1.2 × 10−3 after empirical correction for genome-wide testing). In the replication sample, CNVs at the PARK2 locus were found in four additional ADHD patients and one additional control (P=4.3 × 10−2). Our results suggest that copy number variants at the PARK2 locus contribute to the genetic susceptibility of ADHD. Mutations and CNVs in PARK2 are known to be associated with Parkinson disease.
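As a worked illustration of a locus-specific case-control comparison, the discovery counts reported above for the PARK2 locus (12 CNVs among 489 patients, 4 among 1285 controls) can be placed in a 2×2 table and tested with a one-sided Fisher's exact test. This is a sketch under assumptions, not the study's procedure: it assumes each CNV is carried by a different subject, and it yields a nominal p-value, whereas the reported P values are empirically corrected for genome-wide testing and so are not expected to match.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls, columns are carriers/non-carriers. Returns the
    hypergeometric probability of seeing at least `a` carriers among cases
    with all table margins held fixed.
    """
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_carriers)
    p = 0.0
    for k in range(a, min(n_cases, n_carriers) + 1):
        p += comb(n_cases, k) * comb(n_total - n_cases, n_carriers - k) / denom
    return p


# Counts from the abstract; one carrier per CNV is an assumption.
case_carriers, n_cases = 12, 489
ctrl_carriers, n_ctrls = 4, 1285
a, b = case_carriers, n_cases - case_carriers
c, d = ctrl_carriers, n_ctrls - ctrl_carriers

odds_ratio = (a * d) / (b * c)
print(f"odds ratio = {odds_ratio:.2f}")
print(f"nominal one-sided p = {fisher_exact_greater(a, b, c, d):.2e}")
```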
ADHD; children; CNVs; GWAS; PARK2
Chromosomal instability in exfoliated urothelial cells has been associated with the development of bladder cancer. Here, we analyzed the accumulation of copy number variations (CNVs) using fluorescence in situ hybridization in cancer cases and explored factors associated with the detection of CNVs in tumor-free men.
The prospective UroScreen study was designed to investigate the performance of UroVysion™ and other tumor tests for the early detection of bladder cancer in chemical workers from 2003–2010. We analyzed a database compiling CNVs of chromosomes 3, 7, and 17 and at 9p21 that were detected in 191,434 exfoliated urothelial cells from 1,595 men. We assessed the accumulation of CNVs in 1,400 cells isolated from serial samples that were collected from 18 cancer cases up to the time of diagnosis. A generalized estimating equation model was applied to evaluate the influence of age, smoking, and urine status on CNVs in cells from tumor-free men.
Tetrasomy of chromosomes 3, 7 and 17, and DNA loss at 9p21 were the most frequently observed forms of CNV. In bladder cancer cases, we observed an accumulation of CNVs that started approximately three years before diagnosis. During the year prior to diagnosis, cells from men with high-grade bladder cancer accumulated more CNVs than those obtained from cases with low-grade cancer (CNV < 2: 7.5% vs. 1.1%, CNV > 2: 16-17% vs. 9-11%). About 1% of cells from tumor-free men showed polysomy of chromosomes 3, 7, or 17 or DNA loss at 9p21. Men aged ≥50 years had 1.3-fold more cells with CNVs than younger men; however, we observed no further age-related accumulation of CNVs in tumor-free men. Significantly more cells with CNVs were detected in samples with low creatinine concentrations.
We found an accumulation of CNVs during the development of bladder cancer starting three years before diagnosis, with more altered cells identified in high-grade tumors. Also, a small fraction of cells with CNVs were exfoliated into urine of tumor-free men, mainly exhibiting tetraploidy or DNA loss at 9p21. Whether these cells are preferentially cleared from the urothelium or are artifacts needs further exploration.
Aneuploidy; Bladder cancer; Chromosomal instability; Copy number variation; DNA gain; DNA loss; Fluorescence in situ hybridization; Tetrasomy
The detection of copy number variants (CNVs) and the results of CNV-disease association studies rely on how CNVs are defined, and because array-based technologies can only infer CNVs, CNV-calling algorithms can produce vastly different findings. Several authors have noted the large-scale variability between CNV-detection methods, as well as the substantial false positive and false negative rates associated with those methods. In this study, we use variations of four common algorithms for CNV detection (PennCNV, QuantiSNP, HMMSeg, and cnvPartition) and two definitions of overlap (any overlap and an overlap of at least 40% of the smaller CNV) to illustrate the effects of varying algorithms and definitions of overlap on CNV discovery.
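The two overlap definitions named above can be made concrete with a short sketch. It is only an illustration under stated assumptions: the `CNV` tuple, the inclusive base-pair coordinates, and the function names are hypothetical and are not taken from the study or from any of the four algorithms.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # inclusive, base pairs (assumed convention)
    end: int    # inclusive, base pairs (assumed convention)


def length(cnv: CNV) -> int:
    return cnv.end - cnv.start + 1


def overlap_bp(a: CNV, b: CNV) -> int:
    """Number of base pairs shared by two CNV calls (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def any_overlap(a: CNV, b: CNV) -> bool:
    """Definition 1: the calls share at least one base pair."""
    return overlap_bp(a, b) > 0


def overlaps_smaller_by(a: CNV, b: CNV, min_fraction: float = 0.40) -> bool:
    """Definition 2: the shared span covers at least `min_fraction`
    of the smaller of the two calls."""
    smaller = min(length(a), length(b))
    return overlap_bp(a, b) / smaller >= min_fraction


call_1 = CNV("chr1", 100_000, 200_000)
call_2 = CNV("chr1", 190_000, 400_000)  # clips the tail of call_1
call_3 = CNV("chr1", 150_000, 400_000)  # covers about half of call_1

print(any_overlap(call_1, call_2), overlaps_smaller_by(call_1, call_2))
print(any_overlap(call_1, call_3), overlaps_smaller_by(call_1, call_3))
```

In this toy example the first pair counts as the same CNV under the any-overlap definition but not under the 40% definition, which is one way the choice of definition can change how many CNVs are counted as shared or novel.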
Methodology and Principal Findings
We used a 56 K Illumina genotyping array enriched for CNV regions to generate hybridization intensities and allele frequencies for 48 Caucasian schizophrenia cases and 48 age-, ethnicity-, and gender-matched control subjects. No algorithm found a difference in CNV burden between the two groups. However, the total number of CNVs called ranged from 102 to 3,765 across algorithms. The mean CNV size ranged from 46 kb to 787 kb, and the average number of CNVs per subject ranged from 1 to 39. The number of novel CNVs not previously reported in normal subjects ranged from 0 to 212.
Conclusions and Significance
Motivated by the availability of multiple publicly available genome-wide SNP arrays, investigators are conducting numerous analyses to identify putative additional CNVs in complex genetic disorders. However, the number of CNVs identified in array-based studies, and whether these CNVs are novel or valid, will depend on the algorithm(s) used. Thus, given the variety of methods used, there will be many false positives and false negatives. Both guidelines for the identification of CNVs inferred from high-density arrays and the establishment of a gold standard for validation of CNVs are needed.
Infantile spasms (IS) is a specific type of epileptic encephalopathy associated with severe developmental disabilities. Genetic factors are strongly implicated in IS; however, the exact genetic defects remain unknown in the majority of cases. Rare single-gene mutations and copy number variants (CNVs) have been implicated in IS in children in Western countries. The objective of this study was to dissect the role of copy number variations in Chinese children with infantile spasms.
We used the Agilent Human Genome CGH microarray 180 K for genome-wide detection of CNVs. Real-time qPCR was used to validate the CNVs. We performed genomic and medical annotations for individual CNVs to determine the pathogenicity of CNVs related to IS.
We report herein the first genome-wide CNV analysis in children with IS, detecting a total of 14 CNVs in a cohort of 47 Chinese children with IS. Four CNVs (4/47 = 8.5%) (1q21.1 gain; 1q44, 2q23.1, and 17p13 loss) are considered to be pathogenic. The CNV loss at 17p13.3 contains PAFAH1B1 (LIS1), a causative gene for lissencephaly. Although the CNVs at 1q21.1, 1q44, and 2q23.1 have been previously implicated in a wide spectrum of clinical features including autism spectrum disorders (ASD) and generalized seizure, our study is the first report identifying them in individuals with a primary diagnosis of IS. The CNV loss in the 1q44 region contains HNRNPU, a strong candidate gene recently suggested in IS by the whole exome sequencing of children with IS. The CNV loss at 2q23.1 includes MBD5, a methyl-DNA binding protein that is a causative gene of ASD and a candidate gene for epileptic encephalopathy. We also report a distinct clinical presentation of IS, microcephaly, intellectual disability, and absent hallux in a case with the 2q23.1 deletion.
Our findings strongly support the role of CNVs in infantile spasms and expand the clinical spectrum associated with 2q23.1 deletion. In particular, our study implicates the HNRNPU and MBD5 genes in Chinese children with IS. Our study also supports the view that the molecular mechanisms of infantile spasms are conserved across different ethnic backgrounds.
Infantile spasms; Copy number variants; Array CGH; Autism spectrum disorders; MBD5; HNRNPU
Genomic copy number variants (CNVs) involving >1 kb of DNA have recently been found to be widely distributed throughout the human genome. They represent a newly recognized form of DNA variation in normal populations, discovered through screening of the human genome using high-throughput and high resolution methods such as array comparative genomic hybridization (array-CGH). In order to understand their potential significance and to facilitate interpretation of array-CGH findings in constitutional disorders and cancers, we studied 27 normal individuals (9 Caucasian; 9 African American; 9 Hispanic) using commercially available 1 Mb resolution BAC array (Spectral Genomics). A selection of CNVs was further analyzed by FISH and real-time quantitative PCR (RT-qPCR).
A total of 42 different CNVs were detected in 27 normal subjects. Sixteen (38%) were not previously reported. Thirteen of the 42 CNVs (31%) contained 28 genes listed in OMIM. FISH analysis of 6 CNVs (4 previously reported and 2 novel CNVs) in normal subjects resulted in the confirmation of copy number changes for 1 of 2 novel CNVs and 2 of 4 known CNVs. Three CNVs tested by FISH were further validated by RT-qPCR and comparable data were obtained. This included the lack of copy number change by both RT-qPCR and FISH for clone RP11-100C24, one of the most common known copy number variants, as well as confirmation of deletions for clones RP11-89M16 and RP5-1011O17.
We have described 16 novel CNVs in 27 individuals. Further study of a small selection of CNVs indicated concordant and discordant array vs. FISH/RT-qPCR results. Although a large number of CNVs has been reported to date, quantification using independent methods and detailed cellular and/or molecular assessment has been performed on a very small number of CNVs. This information is, however, very much needed as it is currently common practice to consider CNVs reported in normal subjects as benign changes when detected in individuals affected with a variety of developmental disorders.
Copy number variations (CNVs) are genomic structural variants that are found in healthy populations and have been observed to be associated with disease susceptibility. Existing methods for CNV detection are often performed on a sample-by-sample basis, which is not ideal for large datasets where common CNVs must be estimated by comparing the frequency of CNVs in the individual samples. Here we describe a simple and novel approach to locate genome-wide CNVs common to a specific population, using human ancestry as the phenotype.
We utilized our previously published Genome Alteration Detection Analysis (GADA) algorithm to identify common ancestry CNVs (caCNVs) and built a caCNV model to predict population structure. We identified a 73-caCNV signature using a training set of 225 healthy individuals of European, Asian, and African ancestry. The signature was validated on an independent test set of 300 individuals with similar ancestral background. The error rate in predicting ancestry in this test set was 2% using the 73-caCNV signature. Among the caCNVs identified, several were previously confirmed experimentally to vary by ancestry. Our signature also contains a caCNV region with a single microRNA (MIR270), which represents the first reported variation of microRNA by ancestry.
We developed a new methodology to identify common CNVs and demonstrated its performance by building a caCNV signature to predict human ancestry with high accuracy. The utility of our approach could be extended to large case–control studies to identify CNV signatures for other phenotypes such as disease susceptibility and drug response.
Autism spectrum disorders (ASDs) are childhood neurodevelopmental disorders with complex genetic origins (refs 1–4). Previous studies focusing on candidate genes or genomic regions have identified several copy number variations (CNVs) that are associated with an increased risk of ASDs (refs 5–9). Here we present the results from a whole-genome CNV study on a cohort of 859 ASD cases and 1,409 healthy children of European ancestry who were genotyped with ~550,000 single nucleotide polymorphism markers, in an attempt to comprehensively identify CNVs conferring susceptibility to ASDs. Positive findings were evaluated in an independent cohort of 1,336 ASD cases and 1,110 controls of European ancestry. Besides previously reported ASD candidate genes, such as NRXN1 (ref. 10) and CNTN4 (refs 11, 12), several new susceptibility genes encoding neuronal cell-adhesion molecules, including NLGN1 and ASTN2, were enriched with CNVs in ASD cases compared to controls (P = 9.5 × 10−3). Furthermore, genes involved in the ubiquitin pathways, including UBE3A, PARK2, RFWD2 and FBXO40, were affected by CNVs, within or surrounding the genes, that were not observed in controls (P = 3.3 × 10−3). We also identified duplications 55 kilobases upstream of complementary DNA AK123120 (P = 3.6 × 10−6). Although these variants may be individually rare, they target genes involved in neuronal cell-adhesion or ubiquitin degradation, indicating that these two important gene networks expressed within the central nervous system may contribute to the genetic susceptibility of ASD.
Genome-wide association studies (GWASs) have identified multiple genetic susceptibility loci for breast cancer. However, these loci explain only a small fraction of the heritability. Very few studies have evaluated copy number variation (CNV), another important source of human genetic variation, in relation to breast cancer risk.
We conducted a CNV GWAS in 2623 breast cancer patients and 1946 control subjects using data from Affymetrix SNP Array 6.0 (stage 1). We then replicated the most promising CNV using real-time quantitative polymerase chain reaction (qPCR) in an independent set of 4254 case patients and 4387 control subjects (stage 2). All subjects were recruited from population-based studies conducted among Chinese women in Shanghai.
Of the 268 common CNVs (minor allele frequency ≥ 5%) investigated in stage 1, the strongest association was found for a common deletion in the APOBEC3 genes (P = 1.1×10−4) and was replicated in stage 2 (odds ratio =1.35, 95% confidence interval [CI] = 1.27 to 1.44; P = 9.6×10−22). Analyses of all samples from both stages using qPCR data produced odds ratios of 1.31 (95% CI = 1.21 to 1.42) for a one-copy deletion and 1.76 (95% CI = 1.57 to 1.97) for a two-copy deletion (P = 2.0×10−24).
We provide convincing evidence for a novel breast cancer locus at the APOBEC3 genes. This CNV is one of the strongest common genetic risk variants identified so far for breast cancer.
MicroRNAs (miRNAs) are a family of short, non-coding RNAs modulating expression of human protein coding genes (miRNA target genes). Their dysfunction is associated with many human diseases, including neurodevelopmental disorders. It has been recently shown that genomic copy number variations (CNVs) can cause aberrant expression of integral miRNAs and their target genes, and contribute to intellectual disability (ID).
To better understand the CNV-miRNA relationship in ID, we investigated the prevalence and function of miRNAs and miRNA target genes in five groups of CNVs. Three groups of CNVs were from 213 probands with ID (24 de novo CNVs, 46 familial and 216 common CNVs), one group of CNVs was from a cohort of 32 cognitively normal subjects (67 CNVs) and one group of CNVs represented 40 ID related syndromic regions listed in DECIPHER (30 CNVs) which served as positive controls for CNVs causing or predisposing to ID. Our results show that (1) the number of miRNAs is significantly higher in de novo or DECIPHER CNVs than in familial or common CNV subgroups (P < 0.01); (2) miRNAs with brain related functions are more prevalent in de novo CNV groups compared to common CNV groups; (3) more miRNA target genes are found in de novo, familial and DECIPHER CNVs than in the common CNV subgroup (P < 0.05); and (4) the MAPK signaling cascade is found to be enriched among the miRNA target genes from de novo and DECIPHER CNV subgroups.
Our findings reveal an increase in miRNA and miRNA target gene content in de novo versus common CNVs in subjects with ID. Their expression profile and participation in pathways support a possible role of miRNA copy number change in cognition and/or CNV-mediated developmental delay. Systematic analysis of expression/function of miRNAs in addition to coding genes integral to CNVs could uncover new causes of ID.
Micro RNA (miRNA); Copy number variants (CNVs); Copy number variant regions (CNVRs); Intellectual disabilities (ID); Functional pathways
Copy number variants (CNVs) are known to cause Mendelian forms of Parkinson disease (PD), most notably in SNCA and PARK2. PARK2 has a recessive mode of inheritance; however, recent evidence demonstrates that a single CNV in PARK2 (but not a single missense mutation) may increase risk for PD. We recently performed a genome-wide association study for PD that excluded individuals known to have either a LRRK2 mutation or two PARK2 mutations. Data from the Illumina370Duo arrays were re-clustered using only white individuals with high quality intensity data, and CNV calls were made using two algorithms, PennCNV and QuantiSNP. After quality assessment, the final sample included 816 cases and 856 controls. Results varied between the two CNV calling algorithms for many regions, including the PARK2 locus (genome-wide p = 0.04 for PennCNV and p = 0.13 for QuantiSNP). However, there was consistent evidence with both algorithms for two novel genes, USP32 and DOCK5 (empirical, genome-wide p-values<0.001). PARK2 CNVs tended to be larger, and all instances that were molecularly tested were validated. In contrast, the CNVs in both novel loci were smaller and failed to replicate using real-time PCR, MLPA, and gel electrophoresis. The DOCK5 variation is more akin to a VNTR than a typical CNV and the association is likely caused by artifact due to DNA source. DNA for all the cases was derived from whole blood, while the DNA for all controls was derived from lymphoblast cell lines. The USP32 locus contains many SNPs with low minor allele frequency leading to a loss of heterozygosity that may have been spuriously interpreted by the CNV calling algorithms as support for a deletion. Thus, only the CNVs within the PARK2 locus could be molecularly validated and associated with PD susceptibility.
Copy number data are routinely being extracted from genome-wide association study chips using a variety of software. We empirically evaluated and compared four freely-available software packages designed for Affymetrix SNP chips to estimate copy number: Affymetrix Power Tools (APT), Aroma.Affymetrix, PennCNV and CRLMM. Our evaluation used 1,418 GENOA samples that were genotyped on the Affymetrix Genome-Wide Human SNP Array 6.0. We compared bias and variance in the locus-level copy number data, the concordance amongst regions of copy number gains/deletions and the false-positive rate amongst deleted segments.
APT had median locus-level copy numbers closest to a value of two, whereas PennCNV and Aroma.Affymetrix had the smallest variability associated with the median copy number. Of those evaluated, only PennCNV provides copy number specific quality-control metrics and identified 136 poor CNV samples. Regions of copy number variation (CNV) were detected using the hidden Markov models provided within PennCNV and CRLMM/VanillaIce. PennCNV detected more CNVs than CRLMM/VanillaIce; the median number of CNVs detected per sample was 39 and 30, respectively. PennCNV detected most of the regions that CRLMM/VanillaIce did as well as additional CNV regions. The median concordance between PennCNV and CRLMM/VanillaIce was 47.9% for duplications and 51.5% for deletions. The estimated false-positive rate associated with deletions was similar for PennCNV and CRLMM/VanillaIce.
If the objective is to perform statistical tests on the locus-level copy number data, our empirical results suggest that PennCNV or Aroma.Affymetrix is optimal. If the objective is to perform statistical tests on the summarized segmented data, then PennCNV would be preferred over CRLMM/VanillaIce. Specifically, PennCNV allows the analyst to estimate locus-level copy number, perform segmentation and evaluate CNV-specific quality-control metrics within a single software package. PennCNV has relatively small bias and small variability, and it detects more regions while maintaining an estimated false-positive rate similar to that of CRLMM/VanillaIce. More generally, we advocate that software developers provide guidance on evaluating and choosing optimal settings in order to obtain optimal results for an individual dataset. Until such guidance exists, we recommend trying multiple algorithms, evaluating concordance/discordance and subsequently considering the union of regions for downstream association tests.
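The recommendation to consider the union of regions from several algorithms can be sketched as an interval merge. This is a minimal illustration, not code from PennCNV or CRLMM/VanillaIce: the function name `union_regions`, the `(chrom, start, end)` tuple layout with inclusive coordinates, and the toy call sets are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates


def union_regions(*call_sets: Iterable[Region]) -> List[Region]:
    """Merge CNV calls from any number of algorithms into a
    non-overlapping set of regions (the union of all calls)."""
    # Pool every call, then sort by chromosome and start position.
    pooled = sorted(region for calls in call_sets for region in calls)
    merged: List[Region] = []
    for chrom, start, end in pooled:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous region on the same chromosome: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy call sets standing in for the output of two algorithms (hypothetical).
calls_a = [("chr1", 100, 200), ("chr2", 50, 80)]
calls_b = [("chr1", 150, 300), ("chr1", 500, 600)]

union = union_regions(calls_a, calls_b)
print(union)
```

Taking the union favors sensitivity over specificity: a region called by only one algorithm is retained, so false positives from either method carry into the downstream association tests.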