Copy number variations (CNVs) are genomic structural variants that are found in healthy populations and have been observed to be associated with disease susceptibility. Existing methods for CNV detection are often performed on a sample-by-sample basis, which is not ideal for large datasets where common CNVs must be estimated by comparing the frequency of CNVs in the individual samples. Here we describe a simple and novel approach to locate genome-wide CNVs common to a specific population, using human ancestry as the phenotype.
We utilized our previously published Genome Alteration Detection Analysis (GADA) algorithm to identify common ancestry CNVs (caCNVs) and built a caCNV model to predict population structure. We identified a 73 caCNV signature using a training set of 225 healthy individuals from European, Asian, and African ancestry. The signature was validated on an independent test set of 300 individuals with similar ancestral background. The error rate in predicting ancestry in this test set was 2% using the 73 caCNV signature. Among the caCNVs identified, several were previously confirmed experimentally to vary by ancestry. Our signature also contains a caCNV region with a single microRNA (MIR270), which represents the first reported variation of microRNA by ancestry.
We developed a new methodology to identify common CNVs and demonstrated its performance by building a caCNV signature to predict human ancestry with high accuracy. The utility of our approach could be extended to large case–control studies to identify CNV signatures for other phenotypes such as disease susceptibility and drug response.
Obesity is an increasingly common disorder that predisposes to several medical conditions, including type 2 diabetes. We investigated whether large and rare copy-number variations (CNVs) differentiate moderate to extreme obesity from never-overweight control subjects.
RESEARCH DESIGN AND METHODS
Using single nucleotide polymorphism (SNP) arrays, we performed a genome-wide CNV survey on 430 obese case subjects (BMI >35 kg/m2) and 379 never-overweight control subjects (BMI <25 kg/m2). All subjects were of European ancestry and were genotyped on the Illumina HumanHap550 arrays with ∼550,000 SNP markers. The CNV calls were generated by PennCNV software.
CNVs >1 Mb were found to be overrepresented in case versus control subjects (odds ratio [OR] = 1.5 [95% CI 0.5–5]), and CNVs >2 Mb were present in 1.3% of the case subjects but were absent in control subjects (OR = infinity [95% CI 1.2–infinity]). When focusing on rare deletions that disrupt genes, even more pronounced effect sizes are observed (OR = 2.7 [95% CI 0.5–27.1] for CNVs >1 Mb). Interestingly, obese case subjects who carry these large CNVs have moderately high BMI and do not appear to be extreme cases. Several CNVs disrupt known candidate genes for obesity, such as a 3.3-Mb deletion disrupting NAP1L5 and a 2.1-Mb deletion disrupting UCP1 and IL15.
Our results suggest that large CNVs, especially rare deletions, confer risk of obesity in patients with moderate obesity and that genes impacted by large CNVs represent intriguing candidates for obesity that warrant further study.
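The "OR = infinity" result for CNVs >2 Mb follows directly from a 2x2 carrier table in which no control subject carries such a CNV. The following is a minimal sketch of that arithmetic, not the study's analysis code. The carrier counts (6 of 430 cases, 0 of 379 controls) are illustrative assumptions chosen to be close to the reported 1.3%, and the function names are hypothetical.

```python
from math import comb, inf

def odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Sample odds ratio from a 2x2 carrier table."""
    a = case_carriers                    # cases with a large CNV
    b = case_total - case_carriers       # cases without
    c = control_carriers                 # controls with a large CNV
    d = control_total - control_carriers # controls without
    if b * c == 0:
        # A zero cell in the denominator gives an unbounded estimate.
        return inf if a * d > 0 else float("nan")
    return (a * d) / (b * c)

def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """P(at least this many carriers are cases) under the hypergeometric null."""
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    upper = min(carriers, case_total)
    favourable = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return favourable / comb(total, case_total)

# Illustrative counts only (assumed, not taken from the study).
print(odds_ratio(6, 430, 0, 379))        # inf
print(fisher_one_sided(6, 430, 0, 379))  # one-sided exact p-value
```

With a zero cell the point estimate is unbounded, so only the lower confidence limit is informative, which is how the abstract reports it.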
Autism spectrum disorders (ASDs) are childhood neurodevelopmental disorders with complex genetic origins (refs 1–4). Previous studies focusing on candidate genes or genomic regions have identified several copy number variations (CNVs) that are associated with an increased risk of ASDs (refs 5–9). Here we present the results from a whole-genome CNV study on a cohort of 859 ASD cases and 1,409 healthy children of European ancestry who were genotyped with ~550,000 single nucleotide polymorphism markers, in an attempt to comprehensively identify CNVs conferring susceptibility to ASDs. Positive findings were evaluated in an independent cohort of 1,336 ASD cases and 1,110 controls of European ancestry. Besides previously reported ASD candidate genes, such as NRXN1 (ref. 10) and CNTN4 (refs 11, 12), several new susceptibility genes encoding neuronal cell-adhesion molecules, including NLGN1 and ASTN2, were enriched with CNVs in ASD cases compared to controls (P = 9.5 × 10^−3). Furthermore, genes involved in the ubiquitin pathways, including UBE3A, PARK2, RFWD2 and FBXO40, were affected by CNVs within or surrounding them that were not observed in controls (P = 3.3 × 10^−3). We also identified duplications 55 kilobases upstream of complementary DNA AK123120 (P = 3.6 × 10^−6). Although these variants may be individually rare, they target genes involved in neuronal cell-adhesion or ubiquitin degradation, indicating that these two important gene networks expressed within the central nervous system may contribute to the genetic susceptibility of ASD.
Bipolar disorder (BPD) is a common psychiatric illness with a complex mode of inheritance. Besides traditional linkage and association studies, which require large sample sizes, analysis of common and rare chromosomal copy number variants (CNVs) in extended families may provide novel insights into the genetic susceptibility of complex disorders. Using the Illumina HumanHap550 BeadChip with over 550,000 SNP markers, we genotyped 46 individuals in a three-generation Old Order Amish pedigree with 19 affected (16 BPD and three major depression) and 27 unaffected subjects. Using the PennCNV algorithm, we identified 50 CNV regions that ranged in size from 12 to 885 kb and encompassed at least 10 single nucleotide polymorphisms (SNPs). Of 19 well-characterized CNV regions that were available for combined genotype-expression analysis, 11 (58%) were associated with expression changes of genes within, partially within or near these CNV regions in fibroblasts or lymphoblastoid cell lines at a nominal P value <0.05. To further investigate the mode of inheritance of CNVs in the large pedigree, we analyzed a set of four CNVs, located at 6q27, 9q21.11, 12p13.31 and 15q11, all of which were enriched in subjects with affective disorders. We additionally show that these variants affect the expression of neuronal genes within or near the rearrangement. Our analysis suggests that family-based studies of the combined effect of common and rare CNVs at many loci may represent a useful approach in the genetic analysis of disease susceptibility of mental disorders.
Studies that analyzed single nucleotide polymorphisms (SNP) in various genes have shown that genetic factors are strongly associated with age-related macular degeneration (AMD) susceptibility. Copy number variation (CNV) may be an additional type of genetic variation that contributes to AMD pathogenesis. This study investigated CNV in 4 AMD-relevant genes in Korean AMD patients and control subjects.
Four CNV candidate regions located in AMD-relevant genes (VEGFA, ARMS2/HTRA1, CFH and VLDLR) were selected based on the outcomes of our previous study, which elucidated common CNVs in Asian populations. Real-time PCR-based TaqMan Copy Number Assays were performed on the CNV candidates in 273 AMD patients and 257 control subjects.
The predicted copy number (PCN, 0, 1, 2 or 3+) of each region was called using the CopyCaller program. All candidate genes except ARMS2/HTRA1 showed CNV in at least one individual, in which losses of VEGFA and VLDLR represent novel findings in the Asian population. When the frequencies of PCN were compared, only the gain in VLDLR showed significant differences between AMD patients and control subjects (p = 0.025). Comparisons of the raw copy values (RCV) revealed that 3 of 4 candidate genes showed significant differences (2.03 vs. 1.92 for VEGFA, p<0.01; 2.01 vs. 1.97 for CFH, p<0.01; 1.97 vs. 2.01, p<0.01 for ARMS2/HTRA1).
CNVs located in AMD-relevant genes may be associated with AMD susceptibility. Further investigations encompassing larger patient cohorts are needed to elucidate the role of CNV in AMD pathogenesis.
Structural variations such as copy number variants (CNV) influence the expression of different phenotypic traits. Algorithms to identify CNVs through SNP-array platforms are available. The ability to evaluate well-characterized CNVs such as GSTM1 (1p13.3) deletion provides an important opportunity to assess their performance.
773 cases and 759 controls from the SBC/EPICURO Study were genotyped in the GSTM1 region using TaqMan, Multiplex Ligation-dependent Probe Amplification (MLPA), and Illumina Infinium 1 M SNP-array platforms. CNV calls provided by TaqMan and MLPA were highly concordant and replicated the association between GSTM1 and bladder cancer. This was not the case when CNVs were called from the Illumina 1 M data using available algorithms, since no deletion was detected across the study samples. In contrast, when the Log R Ratio (LRR) was used as a continuous measure for the 5 probes contained in this locus, we were able to detect their association with bladder cancer using simple regression models or more sophisticated methods such as the ones implemented in the CNVtools package.
This study highlights an important limitation in the CNV calling from SNP-array data in regions of common aberrations and suggests that there may be added advantage for using LRR as a continuous measure in association tests rather than relying on calling algorithms.
Bladder cancer risk; Glutathione S-transferase mu 1 (GSTM1); Copy number variation (CNV); SNP-array
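Using LRR as a continuous measure in a simple regression model can be sketched as follows. This is an assumption-laden illustration rather than the study's pipeline: it averages the LRR over the 5 probes per sample and fits a one-predictor logistic regression by Newton-Raphson in plain Python. The function names and all LRR values are hypothetical, and the CNVtools package is not used here.

```python
from math import exp

def mean_lrr(probe_values):
    """Average Log R Ratio over the probes in the locus for one sample."""
    return sum(probe_values) / len(probe_values)

def fit_logistic(x, y, iterations=25):
    """Fit logit P(case) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Entries of the (negative) Hessian.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: (LRR at 5 probes, case status). Values are made up.
samples = [
    ([-0.62, -0.58, -0.60, -0.61, -0.59], 1),
    ([-0.52, -0.48, -0.50, -0.49, -0.51], 1),
    ([-0.12, -0.08, -0.10, -0.11, -0.09], 1),
    ([0.02, -0.02, 0.00, 0.01, -0.01], 1),
    ([-0.42, -0.38, -0.40, -0.41, -0.39], 1),
    ([-0.52, -0.48, -0.50, -0.51, -0.49], 0),
    ([0.01, -0.01, 0.00, 0.02, -0.02], 0),
    ([0.12, 0.08, 0.10, 0.11, 0.09], 0),
    ([-0.12, -0.08, -0.10, -0.09, -0.11], 0),
    ([0.07, 0.03, 0.05, 0.06, 0.04], 0),
]
x = [mean_lrr(probes) for probes, _ in samples]
y = [status for _, status in samples]
b0, b1 = fit_logistic(x, y)
print(b0, b1)  # a negative b1 means lower LRR (fewer copies) goes with case status
```

The slope is estimated without ever assigning a discrete copy number, which is the point the abstract makes about regions where calling algorithms fail.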
Alcohol dependence (AD) is a complex disorder characterized by psychiatric and physiological dependence on alcohol. AD is reflected by regular alcohol drinking, which is highly heritable. In this study, to identify susceptibility genes associated with alcohol drinking, we performed a genome-wide association study of copy number variants (CNVs) in 2,286 Caucasian subjects using the Affymetrix SNP6.0 genotyping array. We replicated our findings in 1,627 Chinese subjects with the same genotyping array. We identified two CNVs, CNV207 (combined p-value 1.91E-03) and CNV1836 (combined p-value 3.05E-03), that were associated with alcohol drinking. CNV207 and CNV1836 are located downstream of the genes LTBP1 (870 kb) and FGD4 (400 kb), respectively. LTBP1, by interacting with TGFB1, may down-regulate enzymes directly participating in alcohol metabolism. FGD4 plays a role in clustering and trafficking of the GABAA receptor and may subsequently influence alcohol drinking through activating CDC42. Our results provide suggestive evidence that the newly identified CNV regions and relevant genes may contribute to the genetic mechanism of alcohol dependence.
Variation in human intelligence is approximately 50% heritable, but understanding of the genes involved is limited. Several forms of genetic variation remain under-studied in relation to intelligence, one of which is copy number variation (CNV). Using single-nucleotide polymorphism (SNP) -based microarrays, we genotyped CNVs genome-wide in a birth cohort of 723 New Zealanders, and correlated them with four intelligence-related phenotypes. We found no significant association for any common CNV after false discovery correction, which is consistent with previous work. In contrast to a previous study, however, we found no effect on any cognitive measure of rare CNV burden, defined as total number of bases inserted or deleted in CNVs rarer than 5%. We discuss possible reasons for this failure to replicate, including interaction between CNV and aging in determining the effects of rare CNVs. While our results suggest that no CNV assayable by SNP chips contributes more than a very small amount to variation in human intelligence, it remains possible that common CNVs in segmental duplication arrays, which are not well covered by SNP chips, are important contributors.
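The rare CNV burden defined above (total bases inserted or deleted in CNVs rarer than 5%) can be computed per individual along the following lines. This is a minimal sketch under stated assumptions: each call already carries a region identifier that groups equivalent CNVs across individuals, coordinates are half-open base-pair positions, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict

def rare_cnv_burden(calls, n_samples, max_frequency=0.05):
    """Sum the bases in rare CNVs per sample.

    calls: list of (sample_id, region_id, start, end) tuples.
    A region counts as rare when fewer than max_frequency of samples carry it.
    """
    carriers = defaultdict(set)
    for sample, region, _start, _end in calls:
        carriers[region].add(sample)

    burden = defaultdict(int)
    for sample, region, start, end in calls:
        if len(carriers[region]) / n_samples < max_frequency:
            burden[sample] += end - start
    return dict(burden)

# Hypothetical calls in a cohort of 100: region "A" is common (10 carriers),
# region "B" is seen once and therefore rare.
calls = [("s%d" % i, "A", 0, 100) for i in range(10)]
calls.append(("s1", "B", 1000, 6000))
print(rare_cnv_burden(calls, n_samples=100))  # {'s1': 5000}
```

The resulting per-sample totals are what would then be correlated with each cognitive measure.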
Genetic factors predisposing individuals to cancer remain elusive in the majority of patients with a familial or clinical history suggestive of hereditary breast cancer. Germline DNA copy number variation (CNV) has recently been implicated in predisposition to cancers such as neuroblastomas as well as prostate and colorectal cancer. We evaluated the role of germline CNVs in breast cancer susceptibility, in particular those with low population frequencies (rare CNVs), which are more likely to cause disease.
Using whole-genome comparative genomic hybridization on microarrays, we screened a cohort of women fulfilling criteria for hereditary breast cancer who did not carry BRCA1/BRCA2 mutations.
The median numbers of total and rare CNVs per genome were not different between controls and patients. However, a total of 26 rare germline CNVs were identified in 68 cancer patients, a proportion that was significantly different (P = 0.0311) from that in the control group (23 rare CNVs in 100 individuals). Several of the genes affected by CNV in patients and controls had already been implicated in cancer.
This study is the first to explore the contribution of germline CNVs to BRCA1/2-negative familial and early-onset breast cancer. The data suggest that rare CNVs may contribute to cancer predisposition in this small cohort of patients, and this trend needs to be confirmed in larger population samples.
A major motivation for seeking disease-associated genetic variation is to identify novel risk processes. Although rare copy number variants (CNVs) appear to contribute to attention deficit hyperactivity disorder (ADHD), common risk variants (single-nucleotide polymorphisms [SNPs]) have not yet been detected using genome-wide association studies (GWAS). This raises the concern as to whether future larger-scale, adequately powered GWAS will be worthwhile. The authors undertook a GWAS of ADHD and examined whether associated SNPs, including those below conventional levels of significance, influenced the same biological pathways affected by CNVs.
The authors analyzed genome-wide SNP frequencies in 727 children with ADHD and 5,081 comparison subjects. The gene sets that were enriched in a pathway analysis of the GWAS data (the top 5% of SNPs) were tested for an excess of genes spanned by large, rare CNVs in the children with ADHD.
No SNP achieved genome-wide significance levels. As previously reported in a subsample of the present study, large, rare CNVs were significantly more common in case subjects than comparison subjects. Thirteen biological pathways enriched for SNP association significantly overlapped with those enriched for rare CNVs. These included cholesterol-related and CNS development pathways. At the level of individual genes, CHRNA7, which encodes a nicotinic receptor subunit previously implicated in neuropsychiatric disorders, was affected by six large duplications in case subjects (none in comparison subjects), and SNPs in the gene had a gene-wide p value of 0.0002 for association in the GWAS.
Both common and rare genetic variants appear to be relevant to ADHD and index shared biological pathways.
To date, hundreds of thousands of copy-number variations (CNVs) have been reported using various platforms. The proportion of Asians in these data is, however, relatively small as compared with that of other ethnic groups, such as Caucasians and Yorubas. Because of limitations in platform resolution and the high noise level in signal intensity, in most CNV studies (particularly those using single nucleotide polymorphism arrays), the average number of CNVs in an individual is less than the number of known CNVs. In this study, we ascertained reliable, common CNV regions (CNVRs) and identified actual frequency rates in the Korean population to provide more CNV information. We performed two-stage analyses for detecting structural variations with two platforms. We discovered 576 common CNVRs (88 CNV segments on average in an individual), and 87% (501 of 576) of these CNVRs overlapped by ≥1 bp with previously validated CNV events. Interestingly, from the frequency analysis of CNV profiles, 52 of 576 CNVRs had a frequency rate of <1% in the 8842 individuals. Compared with other common CNV studies, this study found six common CNVRs that were not reported in previous CNV studies. In conclusion, we propose a data-driven detection approach to discover common CNVRs, including those unreported in the previous Korean CNV study, while minimizing false positives. Through our approach, we successfully discovered more common CNVRs than the previous Korean CNV study and conducted frequency analysis. These results will be a valuable resource for the effective level of CNVs in the Korean population.
common copy-number variation; CNV profile; Asian CNV; structural variation
Copy-number variants (CNVs) are a source of genetic variation that increasingly are associated with human disease. However, the role of CNVs in human lifespan is to date unknown. To identify CNVs that influence mortality at old age, we analyzed genome-wide CNV data in 5178 participants of the Rotterdam Study (RS1) and positive findings were evaluated in 1714 participants of the second cohort of the Rotterdam Study (RS2) and in 4550 participants of the Framingham Heart Study (FHS). First, we assessed the total burden of rare (frequency <1%) and common (frequency >1%) CNVs for association with mortality during follow-up. These analyses were repeated by stratifying CNVs by type and size. Secondly, we assessed individual common CNV regions (CNVR) for association with mortality. We observed that the burden of common but not of rare CNVs influences mortality. A higher burden of large (≥500 kb) common deletions associated with 4% higher mortality [hazard ratio (HR) per CNV 1.04, 95% confidence interval (CI) 1.02–1.07, P = 5.82 × 10^−5] in the 11 442 participants of RS1, RS2 and FHS. In the analysis of 312 individual common CNVRs, we identified two regions (11p15.5; 14q21.3) that associated with higher mortality in these cohorts. The 11p15.5 region (combined HR 1.59, 95% CI 1.31–1.93, P = 2.87 × 10^−6) encompasses 41 genes, of which some have previously been related to longevity, whereas the 14q21.3 region (combined HR 1.57, 95% CI 1.19–2.07, P = 1.53 × 10^−3) does not encompass any genes. In conclusion, the burden of large common deletions, as well as common CNVs in 11p15.5 and 14q21.3 region, associate with higher mortality.
Accurate and efficient genome-wide detection of copy number variants (CNVs) is essential for understanding human genomic variation, genome-wide CNV association studies, cytogenetics research and diagnostics, and independent validation of CNVs identified from sequencing-based technologies. Numerous array-based platforms for CNV detection exist, utilizing array Comparative Genome Hybridization (aCGH), Single Nucleotide Polymorphism (SNP) genotyping or both. We have quantitatively assessed the abilities of twelve leading genome-wide CNV detection platforms to accurately detect Gold Standard sets of CNVs in the genome of HapMap CEU sample NA12878, and found significant differences in performance. The technologies analyzed were the NimbleGen 4.2 M, 2.1 M and 3×720 K Whole Genome and CNV focused arrays, the Agilent 1×1 M CGH and High Resolution and 2×400 K CNV and SNP+CGH arrays, the Illumina Human Omni1Quad array and the Affymetrix SNP 6.0 array. The Gold Standards used were a 1000 Genomes Project sequencing-based set of 3997 validated CNVs and an ultra high-resolution aCGH-based set of 756 validated CNVs. We found that sensitivity, total number, size range and breakpoint resolution of CNV calls were highest for CNV focused arrays. Our results are important for cost-effective CNV detection and validation for both basic and clinical applications.
We assessed the role of rare copy number variants (CNVs) in Alzheimer's disease (AD) using intensity data from 3260 AD cases and 1290 age-matched controls from the genome-wide association study (GWAS) conducted by the Genetic and Environmental Risk for Alzheimer's disease Consortium (GERAD). We did not observe a significant excess of rare CNVs in cases, although we did identify duplications overlapping APP and CR1 which may be pathogenic. We looked for an excess of CNVs in loci which have been highlighted in previous AD CNV studies, but did not replicate previous findings. Through pathway analyses, we observed suggestive evidence for biological overlap between single nucleotide polymorphisms and CNVs in AD susceptibility. We also identified that our sample of elderly controls harbours significantly fewer deletions >1 Mb than younger control sets in previous CNV studies on schizophrenia and bipolar disorder (P = 8.9 × 10^−4 and 0.024, respectively), raising the possibility that healthy elderly individuals have a reduced rate of large deletions. Thus, in contrast to diseases such as schizophrenia, autism and attention deficit/hyperactivity disorder, CNVs do not appear to make a significant contribution to the development of AD.
DNA copy number variations (CNVs) are an important component of genetic variation, affecting a greater fraction of the genome than single nucleotide polymorphisms (SNPs). The advent of high-resolution SNP arrays has made it possible to identify CNVs. Characterization of widespread constitutional (germline) CNVs has provided insight into their role in susceptibility to a wide spectrum of diseases, and somatic CNVs can be used to identify regions of the genome involved in disease phenotypes. The role of CNVs as risk factors for cancer is currently underappreciated. However, the genomic instability and structural dynamism that characterize cancer cells would seem to make this form of genetic variation particularly intriguing to study in cancer. Here, we provide a detailed overview of the current understanding of the CNVs that arise in the human genome and explore the emerging literature that reveals associations of both constitutional and somatic CNVs with a wide variety of human cancers.
Copy number variations (CNVs), a major source of human genetic polymorphism, have been suggested to have an important role in genetic susceptibility to common diseases such as cancer, immune diseases and neurological disorders. Nasopharyngeal carcinoma (NPC) is a multifactorial tumor closely associated with genetic background and with a male preponderance over female (3:1). Previous genome-wide association studies have identified single-nucleotide polymorphisms (SNPs) that are associated with NPC susceptibility. Here, we sought to explore the possible association of CNVs with NPC predisposition. Utilizing genome-wide SNP-based arrays and five CNV-prediction algorithms, we identified eight regions with CNV that were significantly overrepresented in NPC patients compared with healthy controls. These CNVs included six deletions (on chromosomes 3, 6, 7, 8 and 19), and two duplications (on chromosomes 7 and 12). Among them, the CNV located at chromosome 6p21.3, with single-copy deletion of the MICA and HCP5 genes, showed the highest association with NPC. Interestingly, it was more specifically associated with an increased NPC risk among males. This gender-specific association was replicated in an independent case–control sample using a self-established deletion-specific polymerase chain reaction strategy. To the best of our knowledge, this is the first study to explore the role of constitutional CNVs in NPC, using a genome-wide platform. Moreover, we identified eight novel candidate regions with CNV that merit future investigation, and our results suggest that similar to neuroblastoma and prostate cancer, genetic structural variations might contribute to NPC predisposition.
Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association.
Here we present a new R package that integrates: (i) data import from most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the Copy Number calls, identification of recurrent CNVs, multivariate analysis of population structure, and tools for performing association studies. Using a large dataset containing 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset) we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (3 million probe arrays) on a single core computer, and provides a flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis.
The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results.
Understanding the genetic basis of schizophrenia continues to be a major challenge. The research done during the last two decades has provided several candidate genes which unfortunately have not been consistently replicated across or within a population. The recent genome-wide association studies (GWAS) and copy number variation (CNV) studies have provided important evidence suggesting a role of both common and rare large CNVs in schizophrenia genesis. The burden of rare copy number variations appears to be increased in schizophrenia patients. A consistent observation among the GWAS studies is the association with schizophrenia of genetic markers in the major histocompatibility complex (6p22.1)-containing genes including NOTCH4 and histone protein loci. Molecular genetic studies are also demonstrating that there is more overlap between the susceptibility genes for schizophrenia and bipolar disorder than previously suspected. In this review we summarize the major findings of the past decade and suggest areas of future research.
schizophrenia; linkage; candidate gene; whole genome association study, GWAS; copy number variation; CNV; common variant; rare variant
Extensive studies are currently being performed to associate disease susceptibility with one form of genetic variation, namely single nucleotide polymorphisms (SNPs). In recent years another type of common genetic variation has been characterised, namely structural variation, including copy number variations (CNVs). To determine the overall contribution of CNVs to complex phenotypes we have performed association analyses of expression levels of 14,925 transcripts with SNPs and CNVs in individuals who are part of the International HapMap project. SNPs and CNVs captured 83.6% and 17.7% of the total detected genetic variation in gene expression, respectively, but the signals from the two types of variation had little overlap. Interrogation of the genome for both types of variants may be an effective way to elucidate the causes of complex phenotypes and disease in humans.
The detection of copy number variants (CNVs) and the results of CNV-disease association studies rely on how CNVs are defined, and because array-based technologies can only infer CNVs, CNV-calling algorithms can produce vastly different findings. Several authors have noted the large-scale variability between CNV-detection methods, as well as the substantial false positive and false negative rates associated with those methods. In this study, we use variations of four common algorithms for CNV detection (PennCNV, QuantiSNP, HMMSeg, and cnvPartition) and two definitions of overlap (any overlap and an overlap of at least 40% of the smaller CNV) to illustrate the effects of varying algorithms and definitions of overlap on CNV discovery.
Methodology and Principal Findings
We used a 56 K Illumina genotyping array enriched for CNV regions to generate hybridization intensities and allele frequencies for 48 Caucasian schizophrenia cases and 48 age-, ethnicity-, and gender-matched control subjects. No algorithm found a difference in CNV burden between the two groups. However, the total number of CNVs called ranged from 102 to 3,765 across algorithms. The mean CNV size ranged from 46 kb to 787 kb, and the average number of CNVs per subject ranged from 1 to 39. The number of novel CNVs not previously reported in normal subjects ranged from 0 to 212.
Conclusions and Significance
Motivated by the availability of multiple publicly available genome-wide SNP arrays, investigators are conducting numerous analyses to identify putative additional CNVs in complex genetic disorders. However, the number of CNVs identified in array-based studies, and whether these CNVs are novel or valid, will depend on the algorithm(s) used. Thus, given the variety of methods used, there will be many false positives and false negatives. Both guidelines for the identification of CNVs inferred from high-density arrays and the establishment of a gold standard for validation of CNVs are needed.
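The two definitions of overlap compared in this study (any overlap, and an overlap of at least 40% of the smaller CNV) can be written as two small predicates. The following is a minimal sketch, assuming each CNV is a (chromosome, start, end) tuple with half-open base-pair coordinates; the function names are hypothetical and are not taken from any of the four algorithms.

```python
def overlap_bp(a, b):
    """Shared base pairs between two (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))

def any_overlap(a, b):
    """Definition 1: the two CNVs share at least one base pair."""
    return overlap_bp(a, b) > 0

def overlap_40pct_of_smaller(a, b, min_fraction=0.40):
    """Definition 2: the shared span covers >= 40% of the smaller CNV."""
    smaller = min(a[2] - a[1], b[2] - b[1])
    return smaller > 0 and overlap_bp(a, b) / smaller >= min_fraction

# Hypothetical CNVs: the same pair matches under one definition but not the other.
cnv_a = ("chr1", 100, 200)
cnv_b = ("chr1", 180, 400)
print(any_overlap(cnv_a, cnv_b))               # True
print(overlap_40pct_of_smaller(cnv_a, cnv_b))  # False
```

A pair such as the one above is counted as a known CNV under the first definition and as a novel one under the second, which is one way the choice of definition changes the number of novel CNVs reported.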
Genome-wide association studies (GWAS) based on single nucleotide polymorphisms (SNPs) revolutionized our perception of the genetic regulation of complex traits and diseases. Copy number variations (CNVs) promise to shed additional light on the genetic basis of monogenic as well as complex diseases and phenotypes. Indeed, the number of detected associations between CNVs and certain phenotypes is constantly increasing. However, while several software packages support the determination of CNVs from SNP chip data, the downstream statistical inference of CNV-phenotype associations is still subject to complicated and inefficient in-house solutions, thus strongly limiting the performance of GWAS based on CNVs.
CONAN is a freely available client-server software solution which provides an intuitive graphical user interface for categorizing, analyzing and associating CNVs with phenotypes. Moreover, CONAN assists the evaluation process by visualizing detected associations via Manhattan plots in order to enable a rapid identification of genome-wide significant CNV regions. Various file formats including the information on CNVs in population samples are supported as input data.
CONAN facilitates the performance of GWAS based on CNVs and the visual analysis of calculated results. CONAN provides a rapid, valid and straightforward software solution to identify genetic variation underlying the 'missing' heritability for complex traits that remains unexplained by recent GWAS. The freely available software can be downloaded at http://genepi-conan.i-med.ac.at.
Genome-wide association (GWA) studies have identified common variants that are associated with a variety of traits and diseases, but most studies have been performed in European-derived populations. Here, we describe the first genome-wide analyses of imputed genotype and copy number variants (CNVs) for anthropometric measures in African-derived populations: 1188 Nigerians from Igbo-Ora and Ibadan, Nigeria, and 743 African-Americans from Maywood, IL. To improve the reach of our study, we used imputation to estimate genotypes at ∼2.1 million single-nucleotide polymorphisms (SNPs) and also tested CNVs for association. No SNPs or common CNVs reached a genome-wide significance level for association with height or body mass index (BMI), and the best signals from a meta-analysis of the two cohorts did not replicate in ∼3700 African-Americans and Jamaicans. However, several loci previously confirmed in European populations showed evidence of replication in our GWA panel of African-derived populations, including variants near IHH and DLEU7 for height and MC4R for BMI. Analysis of global burden of rare CNVs suggested that lean individuals possess greater total burden of CNVs, but this finding was not supported in an independent European population. Our results suggest that there are not multiple loci with strong effects on anthropometric traits in African-derived populations and that sample sizes comparable to those needed in European GWA studies will be required to identify replicable associations. Meta-analysis of this data set with additional studies in African-ancestry populations will be helpful to improve power to detect novel associations.
Although copy number variations (CNVs) are expected to affect various diseases, little is known about the association between CNVs and breast cancer susceptibility. Therefore, we investigated this relation. Array comparative genomic hybridization was performed to search for candidate CNVs related to breast cancer susceptibility. Subsequent quantitative real-time polymerase chain reaction was carried out for confirmation. We found seven CNV markers associated with breast cancer risk. The means of the relative copy numbers of patients with a history of breast cancer and women in the control group were 0.8 and 1.8 for Hs06535529_cn on 1p36.12 (P < 0.0001), 2.9 and 2.2 for Hs03103056_cn on 3q26.1 (P < 0.0001), 1.2 and 1.8 for Hs03899300_cn on 15q26.3 (P < 0.0001), 1.0 and 1.5 for Hs03908783_cn on 15q26.3 (P < 0.0001), and 1.1 and 1.7 for Hs03898338_cn on 15q26.3 (P < 0.0001), respectively. Interestingly, nine or more copies of Hs04093415_cn on 22q12.3 were found only in 8/193 (4.1 %) patients with a history of breast cancer and in none of the controls (P = 0.0081). Similarly, 12 or more copies of Hs040908898_cn on 22q12.3 were found only in 7/193 (3.6 %) patients with a history of breast cancer and in none of the controls (P = 0.016). A combination of two CNVs resulted in 80.3 % sensitivity, 80.6 % specificity, 82.4 % positive predictive value, and 78.3 % negative predictive value for the prediction of breast cancer susceptibility. These findings may lead to a new means of risk assessment for breast cancer. Confirmatory studies using independent data sets are needed to support our findings.
CNV; Breast cancer susceptibility; CGH; Real-time PCR; Digital PCR
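The four diagnostic measures quoted in the breast cancer abstract above (sensitivity, specificity, positive and negative predictive value) all derive from a single 2×2 confusion table. The sketch below shows the arithmetic only. The abstract gives 193 patients but does not state the size of the control group or any cell counts, so the counts used here are hypothetical values chosen to be consistent with the reported percentages. The function name `diagnostic_metrics` is likewise illustrative.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Return the four standard measures for a binary classifier.

    tp/fn: cases called positive/negative; tn/fp: controls called negative/positive.
    """
    return {
        "sensitivity": tp / (tp + fn),  # fraction of cases detected
        "specificity": tn / (tn + fp),  # fraction of controls correctly cleared
        "ppv": tp / (tp + fp),          # fraction of positive calls that are cases
        "npv": tn / (tn + fn),          # fraction of negative calls that are controls
    }


# ASSUMPTION: hypothetical counts, not taken from the study.
# 155 + 38 = 193 cases matches the patient total in the abstract;
# the control total (137 + 33 = 170) is an illustrative guess.
counts = {"tp": 155, "fn": 38, "tn": 137, "fp": 33}
metrics = diagnostic_metrics(**counts)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because predictive values depend on the ratio of cases to controls in the sample, the PPV and NPV from a case-control design like this one would not carry over directly to a screening population with a different prevalence.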
The recent discovery of copy number variation (CNV) in normal individuals has widened our understanding of genomic variation. However, most reported CNVs have been identified in Caucasians, and these findings may not be directly applicable to people of other ethnicities. To profile CNVs in an East Asian population, we screened 3578 healthy, unrelated Korean individuals using the Affymetrix Genome-Wide Human SNP Array 5.0. We identified 144 207 CNVs using a pooled data set of 100 randomly chosen Korean females as a reference. The average number of CNVs per genome was 40.3, which is higher than the numbers previously reported using lower-resolution platforms. The median CNV size was 18.9 kb (range 0.2–5406 kb). Copy number losses were 4.7 times more frequent than copy number gains. CNV regions (CNVRs) were defined by merging overlapping CNVs identified in two or more samples. In total, 4003 CNVRs were defined, encompassing 241.9 Mb and accounting for ∼8% of the human genome. A total of 2077 CNVRs (51.9%) were potentially novel. Known CNVRs were larger and more frequent than novel CNVRs. Sixteen percent of the CNVRs were observed in ≥1% of study subjects, and 24% overlapped with OMIM genes. A total of 476 CNVRs (11.9%) were associated with segmental duplications. The CNVs and CNVRs identified in this study will be valuable resources for studying human genome diversity and its association with disease.
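The CNVR definition in the abstract above (merging overlapping CNVs identified in two or more samples) can be illustrated with a standard sort-and-sweep interval merge. This is a minimal sketch under stated assumptions, not the authors' pipeline: the `CNV` record, the `merge_cnvrs` function, the closed-interval overlap test, and the toy coordinates are all hypothetical, and the abstract does not specify details such as a minimum overlap fraction.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    sample: str
    chrom: str
    start: int  # assumed inclusive
    end: int    # assumed inclusive


def merge_cnvrs(calls: list[CNV], min_samples: int = 2) -> list[tuple[str, int, int, int]]:
    """Merge overlapping CNV calls into regions seen in at least `min_samples` samples.

    Returns (chrom, start, end, n_samples) tuples in sorted order.
    """
    regions: list[dict] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        last = regions[-1] if regions else None
        if last and last["chrom"] == call.chrom and call.start <= last["end"]:
            # Overlaps the open region: extend it and record the sample.
            last["end"] = max(last["end"], call.end)
            last["samples"].add(call.sample)
        else:
            # No overlap (or new chromosome): start a new region.
            regions.append({"chrom": call.chrom, "start": call.start,
                            "end": call.end, "samples": {call.sample}})
    return [(r["chrom"], r["start"], r["end"], len(r["samples"]))
            for r in regions if len(r["samples"]) >= min_samples]


# Toy calls (hypothetical coordinates, not from the study).
calls = [
    CNV("s1", "chr1", 100, 500),
    CNV("s2", "chr1", 400, 900),
    CNV("s3", "chr1", 2000, 2500),  # seen in one sample only, so dropped
    CNV("s1", "chr2", 100, 300),
    CNV("s2", "chr2", 250, 400),
]
cnvrs = merge_cnvrs(calls)
print(cnvrs)
```

Counting distinct samples rather than calls means that two overlapping calls from the same individual do not by themselves qualify a region, which is one reasonable reading of "identified in two or more samples".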
Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to play an important role in genetic susceptibility to common disease. To address this, we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array, we typed ~19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated ~50% of all common CNVs larger than 500 bp. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and from cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease: IRGM for Crohn's disease; HLA for Crohn's disease, rheumatoid arthritis, and type 1 diabetes; and TSPAN8 for type 2 diabetes. In each case, however, the locus had previously been identified in SNP-based studies, reflecting our observation that most common CNVs that are well typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.