Copy-number variants (CNVs) reshape gene structure, modulate gene expression, and contribute to significant phenotypic variation. Previous studies have revealed CNV patterns in natural populations of Drosophila melanogaster and suggested that selection and mutational bias shape genomic patterns of CNV. Although previous CNV studies focused on heterogeneous strains, here, we established a number of second-chromosome substitution lines to uncover CNV characteristics when homozygous. The percentage of genes harboring CNVs is higher than found in previous studies. More CNVs are detected in homozygous than heterozygous substitution strains, suggesting the comparative genomic hybridization arrays underestimate CNV owing to heterozygous masking. We incorporated previous gene expression data collected from some of the same substitution lines to investigate relationships between CNV gene dosage and expression. Most genes present in CNVs show no evidence of increased or diminished transcription, and the fraction of such dosage-insensitive CNVs is greater in heterozygotes. More than 70% of the dosage-sensitive CNVs are recessive with undetectable effects on transcription in heterozygotes. A deficiency of singletons in recessive dosage-sensitive CNVs supports the hypothesis that most CNVs are subject to negative selection. On the other hand, relaxed purifying selection might account for the higher number of protein–protein interactions in dosage-insensitive CNVs than in dosage-sensitive CNVs. Dosage-sensitive CNVs that are upregulated and downregulated coincide with copy-number increases and decreases. Our results help clarify the relation between CNV dosage and gene expression in the D. melanogaster genome.
copy-number variation; gene expression; gene dosage sensitivity; recessive CNV; selection
Genomic structural changes, such as gene Copy Number Variations (CNVs), are extremely abundant in the human genome. An enormous effort is currently ongoing to recognize and catalogue human CNVs and their associations with abnormal phenotypic outcomes. Recently, several reports related neuropsychiatric diseases (e.g. autism spectrum disorders, schizophrenia, mental retardation, behavioral problems, epilepsy) with specific CNVs. Moreover, for some conditions, both the deletion and duplication of the same genomic segment are related to the phenotype. Syndromes associated with CNVs (microdeletion and microduplication) have long been known to display specific neurobehavioral traits. It is important to note that not every gene is susceptible to gene dosage changes and there are only a few dosage sensitive genes. Smith-Magenis (SMS) and Potocki-Lupski (PTLS) syndromes are associated with a reciprocal microdeletion and microduplication within chromosome 17p11.2 in humans. The dosage sensitive gene responsible for most phenotypes in SMS has been identified: the Retinoic Acid Induced 1 (RAI1) gene. Studies on mouse models and humans suggest that RAI1 is likely the dosage sensitive gene responsible for clinical features in PTLS. In addition, the human RAI1 gene has been implicated in several neurobehavioral traits such as spinocerebellar ataxia (SCA2), schizophrenia and non-syndromic autism. In this review we discuss the evidence of RAI1 as a dosage sensitive gene, its relationship with different neurobehavioral traits, gene structure and mutations, and what is known about its molecular and cellular function, as a first step in the elucidation of the mechanisms that relate dosage sensitive genes with abnormal neurobehavioral outcomes.
Copy Number Variation; dosage sensitive gene; neurobehavioral traits; Potocki-Lupski Syndrome; RAI1; Smith-Magenis Syndrome; transcription factor activity.
Due to the increased accuracy of Copy Number Variable region (CNV) break point mapping, it is now possible to say with a reasonable degree of confidence whether a gene (i) falls entirely within a CNV; (ii) overlaps the CNV or (iii) actually contains the CNV. We classify these as type I, II and III CNV genes respectively.
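The three gene classes defined above reduce to a comparison of two genomic intervals. The following is a minimal sketch, not the authors' implementation: the function name `classify_cnv_gene`, the use of closed coordinates on a single chromosome, and the tie-breaking rule for identical intervals (treated as type I) are all assumptions made for illustration.

```python
from typing import Optional


def classify_cnv_gene(gene_start: int, gene_end: int,
                      cnv_start: int, cnv_end: int) -> Optional[str]:
    """Classify a gene relative to a CNV on the same chromosome.

    Coordinates are assumed to be closed intervals with start <= end.
    Returns "I", "II", "III", or None when the two do not overlap.
    """
    # No shared base pair: the gene is not a CNV gene at all.
    if gene_end < cnv_start or cnv_end < gene_start:
        return None
    # Type I: the gene falls entirely within the CNV.
    # (Identical intervals are counted here; this is an assumption.)
    if cnv_start <= gene_start and gene_end <= cnv_end:
        return "I"
    # Type III: the gene contains the whole CNV.
    if gene_start <= cnv_start and cnv_end <= gene_end:
        return "III"
    # Type II: partial overlap across one CNV break point.
    return "II"


if __name__ == "__main__":
    print(classify_cnv_gene(100, 200, 50, 300))   # gene inside CNV
    print(classify_cnv_gene(100, 200, 150, 300))  # partial overlap
    print(classify_cnv_gene(100, 400, 150, 300))  # CNV inside gene
```

In practice the reliability of such a classification depends on how precisely the CNV break points are mapped, which is the point the passage above makes.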
Here we show that although type I genes vary in copy number along with the CNV, most of these type I genes have the same expression levels as wild type copy numbers of the gene. These genes must, therefore, be under homeostatic dosage compensation control. Looking into possible mechanisms for the regulation of gene expression we found that type I genes have a significant paucity of genes regulated by miRNAs and are not significantly enriched for monoallelically expressed genes. Type III genes, on the other hand, have a significant excess of genes regulated by miRNAs and are enriched for genes that are monoallelically expressed.
Many diseases and genomic disorders are associated with CNVs so a better understanding of the different ways genes are associated with normal CNVs will help focus on candidate genes in genome wide association studies.
Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. 1,447 copy number variable regions covering 360 megabases (12% of the genome) were identified in these populations; these CNV regions contained hundreds of genes, disease loci, functional elements and segmental duplications. Strikingly, these CNVs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal dramatic variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.
The development of genetic technologies has led to the identification of several copy number variations (CNVs) in the human genome. Genome rearrangements affect dosage-sensitive gene expression in normal brain development. There is strong evidence associating human psychiatric disorders, especially autism spectrum disorders (ASDs) and schizophrenia, with genetic risk factors and accumulated CNV risk loci. Deletions in 1q21, 3q29, 15q13, 17p12, and 22q11, as well as duplications in 16p11, 16p13, and 15q11-13, have been reported as recurrent CNVs in ASD and/or schizophrenia. Chromosome engineering can be a useful technology to reflect human diseases in animal models, especially CNV-based psychiatric disorders. This system, based on the Cre/loxP strategy, uses large chromosome rearrangements such as deletion, duplication, inversion, and translocation. Although it is hard to reflect human pathophysiology in animal models, some aspects of molecular pathways, brain anatomy, and cognitive and behavioral phenotypes can be addressed. Some groups have created animal models of psychiatric disorders, ASD, and schizophrenia, which are based on human CNVs. These mouse models display some brain anatomical and behavioral abnormalities, providing insight into human neuropsychiatric disorders that will contribute to novel drug screening for these devastating disorders.
The functional polymorphism that explains the established association of the androgen receptor (AR) with androgenetic alopecia (AGA) remains unidentified, but Copy Number Variation (CNV) might be relevant. CNV involves changes in copy number of large segments of DNA, leading to the altered dosage of gene regulators or genes themselves. Two recent reports indicate regions of CNV in and around AR, and these have not been studied in relation to AGA. The aim of this preliminary case-control study was to determine if AR CNV is associated with AGA, with the hypothesis that CNV is the functional AR variant contributing to this condition.
Multiplex Ligation-dependent Probe Amplification was used to screen for CNV in five AR exons and a conserved, non-coding region upstream of AR in 85 men carefully selected as cases and controls for maximal phenotypic contrast. There was no evidence of CNV in AR in any of the cases or controls, and thus no evidence of significant association between AGA and AR CNV.
The results suggest this form of genomic variation at the AR locus is unlikely to predispose to AGA.
Copy number variants (CNVs), defined as losses and gains of segments of genomic DNA, are a major source of genomic variation.
In this study, we identified over 2,000 human CNVs that overlap with orthologous chimpanzee or orthologous macaque CNVs. Of these, 170 CNVs overlap with both chimpanzee and macaque CNVs, and these were collapsed into 34 hotspot regions of CNV formation. Many of these hotspot regions of CNV formation are functionally relevant, with a bias toward genes involved in immune function, some of which were previously shown to evolve under balancing selection in humans. The genes in these primate CNV formation hotspots have significant differential expression levels between species and show evidence for positive selection, indicating that they have evolved under species-specific, directional selection.
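The abstract does not state how the 170 shared CNVs were collapsed into 34 hotspot regions. One common way to collapse overlapping calls is to merge intervals that share at least one position, and the sketch below illustrates that idea only. The function name `merge_into_regions`, the `(chromosome, start, end)` tuple layout, and the closed-coordinate convention are assumptions, not details taken from the study.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), closed coordinates


def merge_into_regions(cnvs: List[Interval]) -> List[Interval]:
    """Collapse overlapping CNV intervals into non-overlapping regions."""
    regions: List[Interval] = []
    # Sorting groups intervals by chromosome, then by start position.
    for chrom, start, end in sorted(cnvs):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            # Overlaps the current region: extend its end if needed.
            last_chrom, last_start, last_end = regions[-1]
            regions[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            # Starts a new region.
            regions.append((chrom, start, end))
    return regions


if __name__ == "__main__":
    calls = [("chr1", 100, 200), ("chr1", 150, 300),
             ("chr1", 400, 500), ("chr2", 100, 200)]
    print(merge_into_regions(calls))
```

Merging by simple overlap can chain many small CNVs into one large region, so a real analysis might also require a minimum reciprocal overlap before two calls are joined.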
These hotspots of primate CNV formation provide a novel perspective on divergence and selective pressures acting on these genomic regions.
Submicroscopic (less than 2 Mb) segmental DNA copy number changes are a recently recognized source of genetic variability between individuals. The biological consequences of copy number variants (CNVs) are largely undefined. In some cases, CNVs that cause gene dosage effects have been implicated in phenotypic variation. CNVs have been detected in diverse species, including mice and humans. Published studies in mice have been limited by resolution and strain selection. We chose to study 21 well-characterized inbred mouse strains that are the focus of an international effort to measure, catalog, and disseminate phenotype data. We performed comparative genomic hybridization using long oligomer arrays to characterize CNVs in these strains. This technique increased the resolution of CNV detection by more than an order of magnitude over previous methodologies. The CNVs range in size from 21 to 2,002 kb. Clustering strains by CNV profile recapitulates aspects of the known ancestry of these strains. Most of the CNVs (77.5%) contain annotated genes, and many (47.5%) colocalize with previously mapped segmental duplications in the mouse genome. We demonstrate that this technique can identify copy number differences associated with known polymorphic traits. The phenotype of previously uncharacterized strains can be predicted based on their copy number at these loci. Annotation of CNVs in the mouse genome combined with sequence-based analysis provides an important resource that will help define the genetic basis of complex traits.
A major goal of genetics and genomics is to understand how genetic differences between individuals (genotypes) translate into variation in disease susceptibility, behavior, and many other organism-level characteristics (phenotypes). While the sizes of genetic variants range from a single base to whole chromosomes, historically, only the extreme ends of this spectrum have been explored. DNA copy number variants (CNVs) lie between these two extremes, ranging in size from hundreds to millions of bases. The recent application of microarray technology to detect genetic variation in humans has led to the realization that CNVs are common. In fact, rough estimates indicate that CNVs and small-scale variants may constitute similar proportions of total genomic DNA. In this report, the authors characterize 80 CNVs across the genomes of 21 inbred strains of mice. The identification and characterization of mouse CNVs are important because inbred strains of mice are the most widely used model system to explore biomedical genetics. These CNVs are located near another class of genomic features, segmental duplications, more often than would be expected by chance, which supports the hypothesis that CNVs and segmental duplications are causally linked. Importantly, many of the CNVs contain known genes and thus may underlie both gene expression and phenotypic variation between strains.
Copy number variation (CNV) in terms of aneuploidies of both entire chromosomes and chromosomal segments is an important evolutionary driving force, but it is inevitably accompanied by potentially problematic variations in gene doses and genomic instability. Thus, a delicate balance must be maintained between mechanisms that compensate for variations in gene doses (and thus allow such genomic variability) and selection against destabilizing CNVs. In Drosophila, three known compensatory mechanisms have evolved: a general segmental aneuploidy-buffering system and two chromosome-specific systems. The two chromosome-specific systems are the male-specific lethal complex, which is important for dosage compensation of the male X chromosome, and Painting of fourth, which stimulates expression of the fourth chromosome. In this review, we discuss the origin and function of buffering and compensation using Drosophila as a model.
Genomic disorders are a clinically diverse group of conditions caused by gain, loss or re-orientation of a genomic region containing dosage-sensitive genes. One class of genomic disorder is caused by hemizygous deletions resulting in haploinsufficiency of a single or, more usually, several genes. For example, the heterozygous contiguous gene deletion on chromosome 22q11.2 causing DiGeorge syndrome involves at least 20-30 genes. Determining how copy number variation (CNV) affects human variation and contributes to the aetiology and progression of various genomic disorders represents an important question for the future. Here, I will discuss the functional significance of one form of CNV, haploinsufficiency (i.e. loss of a gene copy), of DNA damage response components and its association with certain genomic disorders. There is increasing evidence that haploinsufficiency for certain genes encoding key players in the cell's response to DNA damage, particularly those of the Ataxia Telangiectasia and Rad3-related (ATR)-pathway, has a functional impact. I will review this evidence and present examples of some well-known, clinically similar genomic disorders that have recently been shown to be defective in the ATR-dependent DNA damage response. Finally, I will discuss the potential implications of a haploinsufficiency-induced defective DNA damage response for the clinical management of certain human genomic disorders.
DNA damage response; ATR; haploinsufficiency; genomic disorders.
Multiplex ligation-dependent probe amplification (MLPA) was originally described as an efficient and reliable technique for gene dosage or DNA copy number variation (CNV) analysis. Due to its low cost, reliability, sensitivity, and relative simplicity, MLPA has rapidly gained acceptance in research and diagnostic laboratories, and fills the gap between genome-wide analysis and single gene analysis. A number of new applications have been developed shortly after the introduction of MLPA, including methylation-specific MLPA (MS-MLPA), the use of MLPA in SNP genotyping, copy number analysis in segmentally duplicated regions, etc. However, probe design is time consuming and error prone. Recently software has been developed to help human genomic MLPA probe selection and optimization. For other genomes and MS-MLPA, probe design remains a challenge.
This paper describes a number of new features added to the previous H-MAPD software, which include: 1) probe selection for MS-MLPA; 2) support of mouse and rat genomes; 3) a set of new stuffer sequences. In addition, a physical-chemical property verification tool was implemented to verify user defined probes.
MAPD is a web-based tool which is freely available to non-commercial users. The previous H-MAPD software has been used by about 200 users from more than 30 countries. With the new features, the author hopes MAPD will bring more convenience to the MLPA community.
Structural variation contributes to the rich genetic and phenotypic diversity of the modern domestic dog, Canis lupus familiaris, although compared to other organisms, catalogs of canine copy number variants (CNVs) are poorly defined. To this end, we developed a customized high-density tiling array across the canine genome and used it to discover CNVs in nine genetically diverse dogs and a gray wolf.
In total, we identified 403 CNVs that overlap 401 genes, which are enriched for defense/immunity, oxidoreductase, protease, receptor, signaling molecule and transporter genes. Furthermore, we performed detailed comparisons between CNVs located within versus outside of segmental duplications (SDs) and find that CNVs in SDs are enriched for gene content and complexity. Finally, we compiled all known dog CNV regions and genotyped them with a custom aCGH chip in 61 dogs from 12 diverse breeds. These data allowed us to perform the first population genetics analysis of canine structural variation and identify CNVs that potentially contribute to breed specific traits.
Our comprehensive analysis of canine CNVs will be an important resource in genetically dissecting canine phenotypic and behavioral variation.
Copy number variations (CNVs) can create new genes, change gene dosage, reshape gene structures, and modify elements regulating gene expression. As with all types of genetic variation, CNVs may influence phenotypic variation and gene expression. CNVs are thus considered major sources of genetic variation. Little is known, however, about their contribution to genetic variation in rice.
To detect CNVs, we used a set of NimbleGen whole-genome comparative genomic hybridization arrays containing 718,256 oligonucleotide probes with a median probe spacing of 500 bp. We compiled a high-resolution map of CNVs in the rice genome, showing 641 CNVs between the genomes of the rice cultivars 'Nipponbare' (from O. sativa ssp. japonica) and 'Guang-lu-ai 4' (from O. sativa ssp. indica). The CNVs identified vary in size from 1.1 kb to 180.7 kb, and encompass approximately 7.6 Mb of the rice genome. The largest regions showing copy gain and loss are of 37.4 kb on chromosome 4, and 180.7 kb on chromosome 8. In addition, 85 DNA segments were identified, including some genic sequences. Contracted genes greatly outnumbered duplicated ones. Many of the contracted genes corresponded to either the same genes or genes involved in the same biological processes; this was also the case for genes involved in disease and defense.
We detected CNVs in rice by array-based comparative genomic hybridization. These CNVs contain known genes. Further discussion of CNVs is important, as they are linked to variation among rice varieties, and are likely to contribute to subspecific characteristics.
We have established that human genome sequences encoding a novel protein domain, DUF1220, show a dramatically elevated copy number in the human lineage (>200 copies in humans vs. 1 in mouse/rat) and may be important to human evolutionary adaptation. Copy-number variations (CNVs) in the 1q21.1 region, where most DUF1220 sequences map, have now been implicated in numerous diseases associated with cognitive dysfunction, including autism, autism spectrum disorder, mental retardation, schizophrenia, microcephaly, and macrocephaly.
Although the data are only correlative at this point, we report here that these disease-related 1q21.1 CNVs either encompass or are directly flanked by DUF1220 sequences and exhibit a dosage-related correlation with human brain size. Microcephaly-producing 1q21.1 CNVs are deletions, whereas macrocephaly-producing 1q21.1 CNVs are duplications. Similarly, 1q21.1 deletions and smaller brain size are linked with schizophrenia, whereas 1q21.1 duplications and larger brain size are associated with autism. Interestingly, these two diseases are thought to be phenotypic opposites. These data suggest a model which proposes that (1) DUF1220 domain copy number may be involved in influencing human brain size and (2) the evolutionary advantage of rapidly increasing DUF1220 copy number in the human lineage has resulted in favoring retention of the high genomic instability of the 1q21.1 region, which, in turn, has precipitated a spectrum of recurrent human brain and developmental disorders.
Deletion or duplication of one copy of the human 16p11.2 interval is tightly associated with impaired brain function, including autism spectrum disorders (ASDs), intellectual disability disorder (IDD) and other phenotypes, indicating the importance of gene dosage in this copy number variant region (CNV). The core of this CNV includes 25 genes; however, the number of genes that contribute to these phenotypes is not known. Furthermore, genes whose functional levels change with deletion or duplication (termed ‘dosage sensors’), which can associate the CNV with pathologies, have not been identified in this region. Using the zebrafish as a tool, a set of 16p11.2 homologs was identified, primarily on chromosomes 3 and 12. Use of 11 phenotypic assays, spanning the first 5 days of development, demonstrated that this set of genes is highly active, such that 21 out of the 22 homologs tested showed loss-of-function phenotypes. Most genes in this region were required for nervous system development – impacting brain morphology, eye development, axonal density or organization, and motor response. In general, human genes were able to substitute for the fish homolog, demonstrating orthology and suggesting conserved molecular pathways. In a screen for 16p11.2 genes whose function is sensitive to hemizygosity, the aldolase a (aldoaa) and kinesin family member 22 (kif22) genes were identified as giving clear phenotypes when RNA levels were reduced by ∼50%, suggesting that these genes are deletion dosage sensors. This study leads to two major findings. The first is that the 16p11.2 region comprises a highly active set of genes, which could present a large genetic target and might explain why multiple brain function, and other, phenotypes are associated with this interval. The second major finding is that there are (at least) two genes with deletion dosage sensor properties among the 16p11.2 set, and these could link this CNV to brain disorders such as ASD and IDD.
Segmental copy-number variations (CNVs) may contribute to genetic variation in humans. Reports of the existence and characteristics of CNVs in a large Japanese cohort are quite limited. We report the data from a large Japanese population.
We conducted population screening of 213 unrelated Japanese individuals using comparative genomic hybridization based on a bacterial artificial chromosome microarray (BAC-aCGH). We summarize the data by focusing on highly polymorphic CNVs present in ≥5.0% of the individuals, since they may be informative for demonstrating the relationships between genotypes and their phenotypes. We found a total of 680 CNVs at 16 different BAC regions in the genome. The majority of the polymorphic CNVs were present on BAC clones that overlapped with regions of segmental duplication, and the majority of the polymorphic CNVs observed in this population had been previously reported in other publications.
Some of the CNVs contained genes which might be related to phenotypic heterogeneity among individuals.
One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies.
We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in the regions. By inferring CNV regions across many datasets we were able to identify 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in many important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments.
To our knowledge, this is the first effort to systematically identify and validate drivers for expression-based CNV regions in breast cancer. The framework, in which wavelet analysis of expression-based copy number alteration is coupled with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and to other novel types of cancer data, such as next-generation sequencing based expression (RNA-Seq) as well as CNV data.
breast cancer; copy number variation; gene regulatory networks; oncogenes
Copy number variations (CNVs) are deletions, insertions, duplications, and more complex variations ranging from 1 kb to sub-microscopic sizes. Recent advances in array technologies have enabled researchers to identify a number of CNVs from normal individuals. However, the identification of new CNVs has not yet reached saturation, and more CNVs from diverse populations remain to be discovered.
We identified 65 copy number variation regions (CNVRs) in 116 normal Korean individuals by analyzing Affymetrix 250 K Nsp whole-genome SNP data. Ten of these CNVRs were novel and not present in the Database of Genomic Variants (DGV). To increase the specificity of CNV detection, three algorithms, CNAG, dChip and GEMCA, were applied to the data set, and only those regions recognized by at least two algorithms were identified as CNVs. Most CNVRs identified in the Korean population were rare (<1%), occurring just once among the 116 individuals. When CNVs from the Korean population were compared with CNVs from the three HapMap ethnic groups (African, European, and Asian), our Korean population showed the highest degree of overlap with the Asian population, as expected. However, the overlap was less than 40%, implying that more CNVs remain to be discovered from the Asian population as well as from other populations. Genes in the novel CNVRs from the Korean population were enriched for genes involved in regulation and development processes.
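The rule of keeping only regions recognized by at least two of the three algorithms can be sketched as a simple vote. This is an illustration under stated assumptions rather than the pipeline used in the study: calls are represented as sets of hypothetical genomic bin indices per algorithm, and the function `consensus_bins` does not invoke CNAG, dChip or GEMCA themselves.

```python
from collections import Counter
from typing import Dict, Iterable, List


def consensus_bins(calls_by_algorithm: Dict[str, Iterable[int]],
                   min_support: int = 2) -> List[int]:
    """Return bins flagged as CNV by at least `min_support` algorithms.

    `calls_by_algorithm` maps an algorithm name to the genomic bins
    (hypothetical integer indices) that it flagged as copy-number variable.
    """
    votes: Counter = Counter()
    for bins in calls_by_algorithm.values():
        # set() guards against one algorithm voting twice for the same bin.
        votes.update(set(bins))
    return sorted(b for b, n in votes.items() if n >= min_support)


if __name__ == "__main__":
    calls = {
        "CNAG": {1, 2, 3},
        "dChip": {2, 3, 4},
        "GEMCA": {3, 5},
    }
    print(consensus_bins(calls))  # bins supported by two or more callers
```

Requiring agreement between callers trades sensitivity for specificity, which matches the stated aim of increasing the specificity of CNV detection.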
CNVs are recently-recognized structural variations among individuals, and more CNVs need to be identified from diverse populations. Until now, CNVs from Asian populations have been studied less than those from European or American populations. In this regard, our study of CNVs from the Korean population will contribute to the full cataloguing of structural variation among diverse human populations.
MicroRNAs (miRNAs) are important genetic elements that regulate the expression of thousands of human genes. Polymorphisms affecting miRNA biogenesis, dosage and target recognition may represent potentially functional variants. The functional consequences of single nucleotide polymorphisms (SNPs) within critical miRNA sequences and outside of miRNA genes were previously demonstrated using both experimental and computational methods. However, little is known about how copy number variations (CNVs) affect miRNA genes.
In this study, we analyzed the co-localization of all miRNA loci with known CNV regions. Using bioinformatic tools we identified and validated 209 copy number variable miRNA genes (CNV-miRNAs) in CNV regions deposited in the Database of Genomic Variants (DGV) and 11 CNV-miRNAs in two sets of CNVs defined as highly polymorphic. We propose potential mechanisms of CNV-mediated variation of functional copies of miRNAs (dosage) for different types of CNVs overlapping miRNA genes. We also showed that, consistent with their essential biological functions, miRNA loci are underrepresented in highly polymorphic and well-validated CNV regions.
We postulate that CNV-miRNAs are potential functional variants and should be considered high priority candidate variants in genotype-phenotype association studies.
Beta-defensins are a family of multifunctional genes with roles in defense against pathogens, reproduction, and pigmentation. In humans, six beta-defensin genes are clustered in a repeated region which is copy-number variable (CNV) as a block, with a diploid copy number between 1 and 12. The role in host defense makes the evolutionary history of this CNV particularly interesting, because morbidity due to infectious disease is likely to have been an important selective force in human evolution, and to have varied between geographical locations. Here, we show CNV of the beta-defensin region in chimpanzees, and identify a beta-defensin block in the human lineage that contains rapidly evolving noncoding regulatory sequences. We also show that variation at one of these rapidly evolving sequences affects expression levels and cytokine responsiveness of DEFB103, a key inhibitor of influenza virus fusion at the cell surface. A worldwide analysis of beta-defensin CNV in 67 populations shows an unusually high frequency of high-DEFB103-expressing copies in East Asia, the geographical origin of historical and modern influenza epidemics, possibly as a result of selection for increased resistance to influenza in this region. Hum Mutat 32:743–750, 2011. © 2011 Wiley-Liss, Inc.
CNV; defensin; antimicrobial; influenza; paralogue ratio test
To date, hundreds of thousands of copy-number variations (CNVs) have been reported using various platforms. The proportion of Asians in these data is, however, relatively small compared with that of other ethnic groups, such as Caucasians and Yorubas. Because of limitations in platform resolution and the high noise level in signal intensity, in most CNV studies (particularly those using single nucleotide polymorphism arrays), the average number of CNVs in an individual is less than the number of known CNVs. In this study, we ascertained reliable, common CNV regions (CNVRs) and identified actual frequency rates in the Korean population to provide more CNV information. We performed two-stage analyses for detecting structural variations with two platforms. We discovered 576 common CNVRs (88 CNV segments on average in an individual), and 87% (501 of 576) of these CNVRs overlapped by ≥1 bp with previously validated CNV events. Interestingly, from the frequency analysis of CNV profiles, 52 of 576 CNVRs had a frequency rate of <1% in the 8842 individuals. Compared with other common CNV studies, this study found six common CNVRs that were not reported in previous CNV studies. In conclusion, we propose a data-driven detection approach to discover common CNVRs, including those unreported in the previous Korean CNV study, while minimizing false positives. Through our approach, we successfully discovered more common CNVRs than the previous Korean CNV study and conducted frequency analysis. These results will be a valuable resource for the effective level of CNVs in the Korean population.
common copy-number variation; CNV profile; Asian CNV; structural variation
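The "overlapped by ≥1 bp" criterion and the reported 87% (501 of 576) can be illustrated with a short sketch. It assumes closed coordinates on a single chromosome and uses hypothetical function names (`overlaps`, `fraction_overlapping`); it is not the comparison code used in the study.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end), closed coordinates, one chromosome


def overlaps(a: Region, b: Region, min_bp: int = 1) -> bool:
    """True when regions a and b share at least `min_bp` base pairs."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    return shared >= min_bp


def fraction_overlapping(regions: List[Region], known: List[Region]) -> float:
    """Fraction of `regions` that overlap any entry of `known`."""
    hits = sum(any(overlaps(r, k) for k in known) for r in regions)
    return hits / len(regions)


if __name__ == "__main__":
    # Touching at a single base counts as an overlap under closed coordinates.
    print(overlaps((100, 200), (200, 300)))
    # Percentage reported in the abstract: 501 of 576 CNVRs.
    print(round(100 * 501 / 576))
```

A 1 bp threshold is permissive, so stricter comparisons often require a minimum reciprocal overlap instead; raising `min_bp` in the sketch shows how the count of matching regions would shrink.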
Motivation: Copy number variations (CNVs) are increasingly recognized as a substantial source of individual genetic variation, and hence there is a growing interest in investigating the evolutionary history of CNVs as well as their impact on complex disease susceptibility. CNV/SNP haplotypes are critical for this research, but although many methods have been proposed for inferring integer copy number, few have been designed for inferring CNV haplotypic phase and none of these are applicable at genome-wide scale. Here, we present a method for inferring missing CNV genotypes, predicting CNV allelic configuration and inferring CNV haplotypic phase from SNP/CNV genotype data. Our method, implemented in the software polyHap v2.0, is based on a hidden Markov model, which models the joint haplotype structure between CNVs and SNPs. Thus, the haplotypic phases of CNVs and SNPs are inferred simultaneously. A sampling algorithm is employed to obtain a measure of confidence/credibility of each estimate.
Results: We generated diploid phase-known CNV–SNP genotype datasets by pairing male X chromosome CNV–SNP haplotypes. We show that polyHap provides accurate estimates of missing CNV genotypes, allelic configuration and CNV haplotypic phase on these datasets. We applied our method to a non-simulated dataset—a region on Chromosome 2 encompassing a short deletion. The results confirm that polyHap's accuracy extends to real-life datasets.
Availability: Our method is implemented in version 2.0 of the polyHap software package and can be downloaded from http://www.imperial.ac.uk/medicine/people/l.coin
Supplementary information: Supplementary data are available at Bioinformatics online.
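The construction of phase-known diploid data described in the Results, pairing male X chromosome haplotypes, can be sketched as follows. This is an illustrative sketch only, not polyHap code: the random pairing scheme, the integer encoding of each site (copy number or allele count per chromosome), and the function name `pair_haplotypes` are all assumptions, since the abstract does not say how the pairs were formed.

```python
import random
from typing import List, Tuple

# Hypothetical encoding: one integer per site on a single chromosome.
Haplotype = List[int]


def pair_haplotypes(haps: List[Haplotype],
                    seed: int = 0) -> List[Tuple[List[int], Tuple[Haplotype, Haplotype]]]:
    """Randomly pair haplotypes into pseudo-diploid individuals.

    Returns one (unphased genotype, true phased pair) tuple per pair, so a
    phasing method can be run on the genotype and scored against the truth.
    An unpaired leftover haplotype is dropped.
    """
    rng = random.Random(seed)
    order = list(range(len(haps)))
    rng.shuffle(order)
    pairs = []
    for i in range(0, len(order) - 1, 2):
        hap_a, hap_b = haps[order[i]], haps[order[i + 1]]
        # The unphased genotype is the per-site sum of the two haplotypes.
        genotype = [x + y for x, y in zip(hap_a, hap_b)]
        pairs.append((genotype, (hap_a, hap_b)))
    return pairs


# Toy haplotypes, not real data.
toy_haps = [[0, 1, 1], [1, 1, 0], [2, 0, 1], [1, 0, 0], [0, 0, 1]]
for genotype, truth in pair_haplotypes(toy_haps):
    print(genotype, truth)
```

Because each male carries a single X chromosome, his genotype is already a haplotype, which is what makes the true phase of every synthetic pair known in advance.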
Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (∼55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that ∼50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.
Copy-number variants (CNVs) are deletions and duplications of DNA segments, responsible for most of the genome variation in mammals. To help elucidate the impact of CNVs on evolution and function, we provide a high-resolution CNV map of the largest gene superfamily in humans, i.e., the olfactory receptor (OR) gene superfamily. Our map reveals twice as many olfactory CNVs per person as previously reported, indicating considerable OR dosage variations in humans. In particular, our findings indicate that CNVs are specifically enriched among evolutionarily “young” ORs, some of which originated following the human-chimpanzee split, implying that CNVs may play an important role in the gene-birth and gene-loss processes that continuously shape the human OR repertoire. Furthermore, we describe 15 OR gene loci showing frequent human-specific deletion alleles. Additionally, we present evidence for a recent non-allelic homologous recombination event involving a pair of OR genes, forming a novel fusion OR that may harbor novel odorant-binding properties. Such events may potentially relate to individual functional “holes” in the human smell-detection repertoire, and future studies will address the specific chemosensory impact of our genomic variation map.
Copy number variation (CNV) represents another important source of genetic variation complementary to single nucleotide polymorphism (SNP). High-density SNP array data have been routinely used to detect human CNVs, many of which have significant functional effects on gene expression and human diseases. In the dairy industry, a large quantity of SNP genotyping results is becoming available and can be used for CNV discovery to understand and accelerate genetic improvement for complex traits.
We performed a systematic analysis of CNV using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the pedigree information, we identified 682 candidate CNV regions, which represent 139.8 megabases (~4.60%) of the genome. Selected CNVs were further experimentally validated and we found that copy number "gain" CNVs were predominantly clustered in tandem rather than existing as interspersed duplications. Many CNV regions (~56%) overlap with cattle genes (1,263), which are significantly enriched for immunity, lactation, reproduction and rumination. The overlap of this new dataset and other published CNV studies was less than 40%; however, our discovery of large, high frequency (> 5% of animals surveyed) CNV regions showed 90% agreement with other studies. These results highlight the differences and commonalities between technical platforms.
We present a comprehensive genomic analysis of cattle CNVs derived from SNP data, which will be a valuable genomic variation resource. Combined with SNP detection assays, gene-containing CNV regions may help identify genes undergoing artificial selection in domesticated animals.
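Summary figures such as "682 candidate CNV regions, which represent 139.8 megabases (~4.60%) of the genome" are typically obtained by collapsing individual CNV calls into non-redundant regions and dividing their total length by the genome length. The sketch below shows one common convention, merging calls that overlap or abut. It is an assumption that the study defined regions this way; the half-open coordinates and the names `merge_calls` and `genome_fraction` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical layout: (chromosome, start, end) with half-open [start, end).
Call = Tuple[str, int, int]


def merge_calls(calls: List[Call]) -> List[Call]:
    """Merge overlapping or directly adjacent calls into CNV regions."""
    regions: List[List] = []
    for chrom, start, end in sorted(calls):
        last = regions[-1] if regions else None
        if last is not None and last[0] == chrom and start <= last[2]:
            # Extend the current region rather than starting a new one.
            last[2] = max(last[2], end)
        else:
            regions.append([chrom, start, end])
    return [(c, s, e) for c, s, e in regions]


def genome_fraction(regions: List[Call], genome_length: int) -> float:
    """Fraction of the genome covered by non-overlapping regions."""
    covered = sum(end - start for _, start, end in regions)
    return covered / genome_length


# Toy calls and a toy genome length, not taken from the study.
toy_calls = [("chr1", 0, 100), ("chr1", 50, 150), ("chr2", 0, 10)]
toy_regions = merge_calls(toy_calls)
print(toy_regions, genome_fraction(toy_regions, 1600))
```

Merging before summing matters: adding the raw call lengths would count shared bases more than once and overstate the covered fraction.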
Copy number variants (CNVs) are genomic segments which are duplicated or deleted among different individuals. CNVs have been implicated in both Mendelian and complex traits, including immune and behavioral disorders, but the study of the mechanisms by which CNVs influence gene expression and clinical phenotypes in humans is complicated by the limited access to tissues and by population heterogeneity. We now report studies of the effect of 19 CNVs on gene expression and metabolic traits in a mouse intercross between strains C57BL/6J and C3H/HeJ. We found that 83% of genes predicted to occur within CNVs were differentially expressed. The expression of most CNV genes was correlated with copy number, but we also observed evidence that gene expression was altered in genes flanking CNVs, suggesting that CNVs may contain regulatory elements for these genes. Several CNVs mapped to hotspots, genomic regions influencing expression of tens or hundreds of genes. Several metabolic traits including cholesterol, triglycerides, glucose and body weight mapped to three CNVs in the genome, in mouse chromosomes 1, 4 and 17. Predicted CNV genes, such as Itlna, Defcr-1, Trim12 and Trim34 were highly correlated with these traits. Our results suggest that CNVs have a significant impact on gene expression and that CNVs may be playing a role in the mechanisms underlying metabolic traits in mice.
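The statement that expression of most CNV genes was correlated with copy number can be made concrete with a small correlation calculation. The sketch below uses the Pearson coefficient as an assumption, since the abstract does not name the statistic that was used; the function name `pearson_r` and all numbers are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Toy values for one gene across six animals, not real measurements.
copy_number = [1, 1, 2, 2, 3, 3]
expression = [0.9, 1.1, 2.0, 2.2, 2.9, 3.1]
print(pearson_r(copy_number, expression))
```

A coefficient near +1 for a gene would be consistent with a dosage effect, while a gene flanking a CNV could show correlation through a different route, such as a regulatory element inside the variant region.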