Copy-number variants (CNVs) reshape gene structure, modulate gene expression, and contribute to significant phenotypic variation. Previous studies have revealed CNV patterns in natural populations of Drosophila melanogaster and suggested that selection and mutational bias shape genomic patterns of CNV. Although previous CNV studies focused on heterogeneous strains, here, we established a number of second-chromosome substitution lines to uncover CNV characteristics when homozygous. The percentage of genes harboring CNVs is higher than found in previous studies. More CNVs are detected in homozygous than heterozygous substitution strains, suggesting the comparative genomic hybridization arrays underestimate CNV owing to heterozygous masking. We incorporated previous gene expression data collected from some of the same substitution lines to investigate relationships between CNV gene dosage and expression. Most genes present in CNVs show no evidence of increased or diminished transcription, and the fraction of such dosage-insensitive CNVs is greater in heterozygotes. More than 70% of the dosage-sensitive CNVs are recessive with undetectable effects on transcription in heterozygotes. A deficiency of singletons in recessive dosage-sensitive CNVs supports the hypothesis that most CNVs are subject to negative selection. On the other hand, relaxed purifying selection might account for the higher number of protein–protein interactions in dosage-insensitive CNVs than in dosage-sensitive CNVs. Dosage-sensitive CNVs that are upregulated and downregulated coincide with copy-number increases and decreases. Our results help clarify the relation between CNV dosage and gene expression in the D. melanogaster genome.
copy-number variation; gene expression; gene dosage sensitivity; recessive CNV; selection
The functional contribution of CNV to human biology and disease pathophysiology has undergone limited exploration. Recent observations in humans indicate a tentative link between CNV and weight regulation. Smith-Magenis syndrome (SMS), manifesting obesity and hypercholesterolemia, results from a deletion CNV at 17p11.2, but is sometimes due to haploinsufficiency of a single gene, RAI1. The reciprocal duplication in 17p11.2 causes Potocki-Lupski syndrome (PTLS). We previously constructed mouse strains with a deletion, Df(11)17, or duplication, Dp(11)17, of the mouse genomic interval syntenic to the SMS/PTLS region. We demonstrate that Dp(11)17 is obesity-opposing; it conveys a highly penetrant, strain-independent phenotype of reduced weight, leaner body composition, lower TC/LDL, and increased insulin sensitivity that is not due to alteration in food intake or activity level. When fed with a high-fat diet, Dp(11)17/+ mice display much less weight gain and metabolic change than WT mice, demonstrating that the Dp(11)17 CNV protects against metabolic syndrome. Reciprocally, Df(11)17/+ mice with the deletion CNV have increased weight, higher fat content, decreased HDL, and reduced insulin sensitivity, manifesting a bona fide metabolic syndrome. These observations in the deficiency animal model are supported by human data from 76 SMS subjects. Further, studies on knockout/transgenic mice showed that the metabolic consequences of Dp(11)17 and Df(11)17 CNVs are not only due to dosage alterations of Rai1, the predominant dosage-sensitive gene for SMS and likely also PTLS. Our experiments in chromosome-engineered mouse CNV models for human genomic disorders demonstrate that a CNV can be causative for weight/metabolic phenotypes. Furthermore, we explored the biology underlying the contribution of CNV to the physiology of weight control and energy metabolism. 
The high penetrance, strain independence, and resistance to dietary influences associated with the CNVs in this study are features distinct from most SNP–associated metabolic traits and further highlight the potential importance of CNV in the etiology of both obesity and MetS as well as in the protection from these traits.
Genetic factors play a large role in obesity. However, despite recent technical progress in the search for genetic variants, the identities of causative and contributory genetic factors remain largely unknown. Whereas nucleotide sequence variation has been studied extensively with respect to its potential contribution to obesity, copy number variations (CNV), in which genes exist in abnormal numbers of copies mostly due to duplication or deletion, have only more recently been observed to be associated with human obesity. In this report, we utilize chromosome engineered mouse strains harboring a deletion or duplication CNV to address the potential functional impact of CNVs on weight control and metabolism. We show that the duplication CNV leads to lower body weight; it is also metabolically advantageous and protects from diet-induced obesity and metabolic syndrome (MetS). The deletion CNV causes a “mirror” phenotype with increased body weight and MetS–like phenotypes. Importantly, these effects manifest regardless of the genetic background and do not appear to be attributable to any single gene. These findings demonstrate experimentally that CNV can be causative for weight and metabolic phenotypes and highlight the potential relevance and importance of CNV in the etiology of obesity/MetS and the protection from these traits.
Olfactory receptors (ORs), which are involved in odorant recognition, form the largest mammalian protein superfamily. The genomic content of OR genes is considerably reduced in humans, as reflected by the relatively small repertoire size and the high fraction (∼55%) of human pseudogenes. Since several recent low-resolution surveys suggested that OR genomic loci are frequently affected by copy-number variants (CNVs), we hypothesized that CNVs may play an important role in the evolution of the human olfactory repertoire. We used high-resolution oligonucleotide tiling microarrays to detect CNVs across 851 OR gene and pseudogene loci. Examining genomic DNA from 25 individuals with ancestry from three populations, we identified 93 OR gene loci and 151 pseudogene loci affected by CNVs, generating a mosaic of OR dosages across persons. Our data suggest that ∼50% of the CNVs involve more than one OR, with the largest CNV spanning 11 loci. In contrast to earlier reports, we observe that CNVs are more frequent among OR pseudogenes than among intact genes, presumably due to both selective constraints and CNV formation biases. Furthermore, our results show an enrichment of CNVs among ORs with a close human paralog or lacking a one-to-one ortholog in chimpanzee. Interestingly, among the latter we observed an enrichment in CNV losses over gains, a finding potentially related to the known diminution of the human OR repertoire. Quantitative PCR experiments performed for 122 sampled ORs agreed well with the microarray results and uncovered 23 additional CNVs. Importantly, these experiments allowed us to uncover nine common deletion alleles that affect 15 OR genes and five pseudogenes. Comparison to the chimpanzee reference genome revealed that all of the deletion alleles are human derived, therefore indicating a profound effect of human-specific deletions on the individual OR gene content. 
Furthermore, these deletion alleles may be used in future genetic association studies of olfactory inter-individual differences.
Copy-number variants (CNVs) are deletions and duplications of DNA segments, responsible for most of the genome variation in mammals. To help elucidate the impact of CNVs on evolution and function, we provide a high-resolution CNV map of the largest gene superfamily in humans, i.e., the olfactory receptor (OR) gene superfamily. Our map reveals twice as many olfactory CNVs per person as previously reported, indicating considerable OR dosage variations in humans. In particular, our findings indicate that CNVs are specifically enriched among evolutionarily “young” ORs, some of which originated following the human-chimpanzee split, implying that CNVs may play an important role in the gene-birth and gene-loss processes that continuously shape the human OR repertoire. Furthermore, we describe 15 OR gene loci showing frequent human-specific deletion alleles. Additionally, we present evidence for a recent non-allelic homologous recombination event involving a pair of OR genes, forming a novel fusion OR that may harbor novel odorant-binding properties. Such events may potentially relate to individual functional “holes” in the human smell-detection repertoire, and future studies will address the specific chemosensory impact of our genomic variation map.
Ultraconserved elements (UCEs) are strongly depleted from segmental duplications and copy number variations (CNVs) in the human genome, suggesting that deletion or duplication of a UCE can be deleterious to the mammalian cell. Here we address the process by which CNVs become depleted of UCEs. We begin by showing that depletion for UCEs characterizes the most recent large-scale human CNV datasets and then find that even newly formed de novo CNVs, which have passed through meiosis at most once, are significantly depleted for UCEs. In striking contrast, CNVs arising specifically in cancer cells are, as a rule, not depleted for UCEs and can even become significantly enriched. This observation raises the possibility that CNVs that arise somatically and are relatively newly formed are less likely to have established a CNV profile that is depleted for UCEs. Alternatively, lack of depletion for UCEs from cancer CNVs may reflect the diseased state. In support of this latter explanation, somatic CNVs that are not associated with disease are depleted for UCEs. Finally, we show that it is possible to observe the CNVs of induced pluripotent stem (iPS) cells become depleted of UCEs over time, suggesting that depletion may be established through selection against UCE-disrupting CNVs without the requirement for meiotic divisions.
Ultraconserved elements (UCEs) display a level of sequence conservation that has defied explanation. They are also dosage sensitive, being depleted from copy number variants (CNVs) in healthy cells. Here we address the process underlying this dosage sensitivity in order to gain insights into the way that UCE dosage affects cells. Our studies demonstrate that, in contrast to CNVs inherited by healthy individuals, cancer-specific CNVs are, as a rule, not depleted for UCEs and may even be enriched. Furthermore, by discovering that CNVs arising anew in the healthy, as opposed to diseased, body are depleted of UCEs, we obtain evidence that healthy cells may be responsive to changes in UCE dosage in a way that is disrupted in cancer cells. After examining CNVs over time in cell culture, we postulate that selection against UCE-disrupting CNVs in healthy cells acts rapidly, raising the surprising possibility of exploring in cell culture how UCE dosage sensitivity may explain ultraconservation. Our observations suggest that an understanding of the different responses of healthy and cancer cells to changes in UCE dosage could be harnessed to address genomic instabilities in cancer.
The era of whole-genome sequencing has revealed that gene copy-number changes caused by duplication and deletion events have important evolutionary, functional, and phenotypic consequences. Recent studies have therefore focused on revealing the extent of variation in copy-number within natural populations of humans and other species. These studies have found a large number of copy-number variants (CNVs) in humans, many of which have been shown to have clinical or evolutionary importance. For the most part, these studies have failed to detect an important class of gene copy-number polymorphism: gene duplications caused by retrotransposition, which result in a new intron-less copy of the parental gene being inserted into a random location in the genome. Here we describe a computational approach leveraging next-generation sequence data to detect gene copy-number variants caused by retrotransposition (retroCNVs), and we report the first genome-wide analysis of these variants in humans. We find that retroCNVs account for a substantial fraction of gene copy-number differences between any two individuals. Moreover, we show that these variants may often result in expressed chimeric transcripts, underscoring their potential for the evolution of novel gene functions. By locating the insertion sites of these duplicates, we are able to show that retroCNVs have had an important role in recent human adaptation, and we also uncover evidence that positive selection may currently be driving multiple retroCNVs toward fixation. Together these findings imply that retroCNVs are an especially important class of polymorphism, and that future studies of copy-number variation should search for these variants in order to illuminate their potential evolutionary and functional relevance.
Recent studies of human genetic variation have revealed that, in addition to differing at single nucleotide polymorphisms, individuals differ in copy-number at many regions of the genome. These copy-number variants (CNVs) are caused by duplication or deletion events and often affect functional sequences such as genes. Efforts to reveal the functional impact of CNVs have identified many variants increasing the risk of various disorders, and some that are adaptive. However, these studies mostly fail to detect gene duplications caused by retrotransposition, in which an mRNA transcript is reverse-transcribed and reinserted into the genome, yielding a new intron-less gene copy. Here we describe a method leveraging next-generation sequence data to accurately detect gene copy-number variants caused by retrotransposition, or retroCNVs, and apply this method to hundreds of whole-genome sequences from three different human subpopulations. We find that these variants account for a substantial number of gene copy-number differences between individuals, and that gene retrotransposition may often result in both deleterious and beneficial mutations. Indeed, we present evidence that two of these new gene duplications may be adaptive. These results imply that retroCNVs are an especially important class of CNV and should be included in future studies of human copy-number variation.
Submicroscopic (less than 2 Mb) segmental DNA copy number changes are a recently recognized source of genetic variability between individuals. The biological consequences of copy number variants (CNVs) are largely undefined. In some cases, CNVs that cause gene dosage effects have been implicated in phenotypic variation. CNVs have been detected in diverse species, including mice and humans. Published studies in mice have been limited by resolution and strain selection. We chose to study 21 well-characterized inbred mouse strains that are the focus of an international effort to measure, catalog, and disseminate phenotype data. We performed comparative genomic hybridization using long oligomer arrays to characterize CNVs in these strains. This technique increased the resolution of CNV detection by more than an order of magnitude over previous methodologies. The CNVs range in size from 21 to 2,002 kb. Clustering strains by CNV profile recapitulates aspects of the known ancestry of these strains. Most of the CNVs (77.5%) contain annotated genes, and many (47.5%) colocalize with previously mapped segmental duplications in the mouse genome. We demonstrate that this technique can identify copy number differences associated with known polymorphic traits. The phenotype of previously uncharacterized strains can be predicted based on their copy number at these loci. Annotation of CNVs in the mouse genome combined with sequence-based analysis provides an important resource that will help define the genetic basis of complex traits.
A major goal of genetics and genomics is to understand how genetic differences between individuals (genotypes) translate into variation in disease susceptibility, behavior, and many other organism-level characteristics (phenotypes). While the sizes of genetic variants range from a single base to whole chromosomes, historically, only the extreme ends of this spectrum have been explored. DNA copy number variants (CNVs) lie between these two extremes, ranging in size from hundreds to millions of bases. The recent application of microarray technology to detect genetic variation in humans has led to the realization that CNVs are common. In fact, rough estimates indicate that CNVs and small-scale variants may constitute similar proportions of total genomic DNA. In this report, the authors characterize 80 CNVs across the genomes of 21 inbred strains of mice. The identification and characterization of mouse CNVs are important because inbred strains of mice are the most widely used model system to explore biomedical genetics. These CNVs are located near another class of genomic features, segmental duplications, more often than would be expected by chance, which supports the hypothesis that CNVs and segmental duplications are causally linked. Importantly, many of the CNVs contain known genes and thus may underlie both gene expression and phenotypic variation between strains.
Duplicate genes emerge as copy-number variations (CNVs) at the population level, and remain copy-number polymorphic until they are fixed or lost. The successful establishment of such structural polymorphisms in the genome plays an important role in evolution by promoting genetic diversity, complexity and innovation. To characterize the early evolutionary stages of duplicate genes and their potential adaptive benefits, we combine comparative genomics with population genomics analyses to evaluate the distribution and impact of CNVs across natural populations of an eco-genomic model, the three-spined stickleback. With whole genome sequences of 66 individuals from populations inhabiting three distinct habitats, we find that CNVs generally occur at low frequencies and are often only found in one of the 11 populations surveyed. A subset of CNVs, however, displays copy-number differentiation between populations, showing elevated within-population frequencies consistent with local adaptation. By comparing teleost genomes to identify lineage-specific genes and duplications in sticklebacks, we highlight rampant gene content differences among individuals in which over 30% of young duplicate genes are CNVs. These CNV genes are evolving rapidly at the molecular level and are enriched with functional categories associated with environmental interactions, depicting the dynamic early copy-number polymorphic stage of genes during population differentiation.
After a locus is duplicated in a genome, individuals from a population instantaneously differ in the number of copies of this locus producing a copy-number variation (CNV). Over time, the joint effects of selection and other evolutionary forces will act to either eliminate the extra genetic copy or retain it. Depending on this evolutionary interplay, young duplications, including newly duplicated genes, can persist for millions of years as CNVs. CNVs may especially be prevalent between populations that have colonized and adapted to disparate environments in which selective pressures differ. Using whole genome sequences from several populations of three-spined sticklebacks that inhabit different environments, we find that a third of young duplicated genes are CNVs. These young CNV genes are enriched with environmental response functions and evolving rapidly at the molecular level, making them promising candidates for a role in the rapid ecological adaptation to novel environments.
Copy number variations (CNVs) can create new genes, change gene dosage, reshape gene structures, and modify elements regulating gene expression. As with all types of genetic variation, CNVs may influence phenotypic variation and gene expression. CNVs are thus considered major sources of genetic variation. Little is known, however, about their contribution to genetic variation in rice.
To detect CNVs, we used a set of NimbleGen whole-genome comparative genomic hybridization arrays containing 718,256 oligonucleotide probes with a median probe spacing of 500 bp. We compiled a high-resolution map of CNVs in the rice genome, showing 641 CNVs between the genomes of the rice cultivars 'Nipponbare' (from O. sativa ssp. japonica) and 'Guang-lu-ai 4' (from O. sativa ssp. indica). The CNVs identified vary in size from 1.1 kb to 180.7 kb and encompass approximately 7.6 Mb of the rice genome. The largest regions showing copy gain and loss are 37.4 kb on chromosome 4 and 180.7 kb on chromosome 8, respectively. In addition, 85 DNA segments were identified, including some genic sequences. Contracted genes greatly outnumbered duplicated ones. Many of the contracted genes corresponded to either the same genes or genes involved in the same biological processes; this was also the case for genes involved in disease and defense.
We detected CNVs in rice by array-based comparative genomic hybridization. These CNVs contain known genes. Further discussion of CNVs is important, as they are linked to variation among rice varieties, and are likely to contribute to subspecific characteristics.
Although large-scale copy-number variation is an important contributor to conspecific genomic diversity, whether these variants frequently contribute to human phenotype differences remains unknown. If they have few functional consequences, then copy-number variants (CNVs) might be expected both to be distributed uniformly throughout the human genome and to encode genes that are characteristic of the genome as a whole. We find that human CNVs are significantly overrepresented close to telomeres and centromeres and in simple tandem repeat sequences. Additionally, human CNVs were observed to be unusually enriched in those protein-coding genes that have experienced significantly elevated synonymous and nonsynonymous nucleotide substitution rates, estimated between single human and mouse orthologues. CNV genes encode disproportionately large numbers of secreted, olfactory, and immunity proteins, although they contain fewer than expected genes associated with Mendelian disease. Despite mouse CNVs also exhibiting a significant elevation in synonymous substitution rates, in most other respects they do not differ significantly from the genomic background. Nevertheless, they encode proteins that are depleted in olfactory function, and they exhibit significantly decreased amino acid sequence divergence. Natural selection appears to have acted discriminately among human CNV genes. The significant overabundance, within human CNVs, of genes associated with olfaction, immunity, protein secretion, and elevated coding sequence divergence, indicates that a subset may have been retained in the human population due to the adaptive benefit of increased gene dosage. By contrast, the functional characteristics of mouse CNVs either suggest that advantageous gene copies have been depleted during recent selective breeding of laboratory mouse strains or suggest that they were preferentially fixed as a consequence of the larger effective population size of wild mice. 
It thus appears that CNV differences among mouse strains do not provide an appropriate model for large-scale sequence variations in the human population.
Until recently, it was thought that most inherited human diversity results from genetic variation at single nucleotide sites. However, recent studies discovered many larger-scale differences, involving the duplication or deletion of thousands of bases. Do these large-scale differences contribute greatly to characteristics of human individuals, or are they of little consequence? For clues to solve this mystery the authors looked to the signatures of adaptive evolution written into the DNA. They reasoned that if large-scale DNA differences are beneficial, they should be enriched in genes, particularly those involved in fighting infection and sensing our environment. The authors discovered such enrichments indicating that some large-scale sequence differences have been advantageous during the last approximately 100,000 y of human history. By contrast, modern laboratory mice exhibit few signs of beneficial large-scale DNA differences, perhaps because advantageous sequences have swept rapidly through their ancestral populations. Some large-scale variations in human genomes thus appear to be a legacy of past evolutionary challenges to our species.
The human genome displays extensive copy-number variation (CNV). Recent discoveries have shown that large segments of DNA, ranging in size from hundreds to thousands of nucleotides, are either deleted or duplicated. This CNV may encompass genes, leading to a change in phenotype, including drug response phenotypes. Gemcitabine and 1-β-D-arabinofuranosylcytosine (AraC) are cytidine analogues used to treat a variety of cancers. Previous studies have shown that genetic variation may influence response to these drugs. In the present study, we set out to test the hypothesis that variation in copy number might contribute to variation in cytidine analogue response phenotypes.
We used a cell-based model system consisting of 197 ethnically-defined lymphoblastoid cell lines for which genome-wide SNP data were obtained using Illumina 550 and 650 K SNP arrays to study cytidine analogue cytotoxicity. A total of 775 CNVs with allele frequencies > 1% were identified in 102 regions across the genome. Of these 102 loci, 87 overlapped with previously identified regions of CNV. Association of CNVs with gemcitabine and AraC IC50 values identified 11 regions with permutation p-values < 0.05. Multiplex ligation-dependent probe amplification assays were performed to verify the 11 CNV regions that were associated with this phenotype, with false-positive and false-negative rates for the in-silico findings of 1.3% and 0.04%, respectively. We also had basal mRNA expression array data for these same 197 cell lines, which allowed us to quantify mRNA expression for 41 probesets in or near the CNV regions identified. We found that 7 of those 41 genes were highly expressed in our lymphoblastoid cell lines, and one of the seven genes (SMYD3) that was significant in the CNV association study was selected for further functional experiments. Those studies showed that knockdown of SMYD3 in pancreatic cancer cell lines increased gemcitabine and AraC resistance in cytotoxicity assays, consistent with the results of the association analysis.
These results suggest that CNVs may play a role in variation in cytidine analogue effect. Therefore, association studies of CNVs with drug response phenotypes in cell-based model systems, when paired with functional characterization, might help to identify CNVs that contribute to variation in drug response.
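The permutation p-values mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the test statistic (absolute Pearson correlation between copy number and IC50), the function names, and the toy values are all assumptions made for illustration. The idea is only that shuffling the phenotype across cell lines breaks any real association, so the shuffled statistics form a null distribution for the observed one.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def permutation_p_value(copy_numbers, ic50_values, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a copy-number/IC50 association."""
    rng = random.Random(seed)
    observed = abs(pearson_r(copy_numbers, ic50_values))
    shuffled = list(ic50_values)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between genotype and phenotype
        # Small tolerance guards against floating-point ties.
        if abs(pearson_r(copy_numbers, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add-one correction avoids reporting an impossible p-value of 0.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical toy data: copy number at one CNV region and IC50 per cell line.
toy_copy_numbers = [1, 1, 1, 1, 3, 3, 3, 3]
toy_ic50 = [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2]
p = permutation_p_value(toy_copy_numbers, toy_ic50, n_permutations=2000)
print(f"permutation p-value: {p:.4f}")
```

In a genome-wide setting the same procedure would be repeated per CNV region, and a region would be reported only if its permutation p-value fell below the chosen threshold (0.05 in the study above).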
MicroRNAs (miRNAs) are important genetic elements that regulate the expression of thousands of human genes. Polymorphisms affecting miRNA biogenesis, dosage and target recognition may represent potentially functional variants. The functional consequences of single nucleotide polymorphisms (SNPs) within critical miRNA sequences and outside of miRNA genes were previously demonstrated using both experimental and computational methods. However, little is known about how copy number variations (CNVs) affect miRNA genes.
In this study, we analyzed the co-localization of all miRNA loci with known CNV regions. Using bioinformatic tools, we identified and validated 209 copy number variable miRNA genes (CNV-miRNAs) in CNV regions deposited in the Database of Genomic Variants (DGV) and 11 CNV-miRNAs in two sets of CNVs defined as highly polymorphic. We propose potential mechanisms of CNV-mediated variation of functional copies of miRNAs (dosage) for different types of CNVs overlapping miRNA genes. We also showed that, consistent with their essential biological functions, miRNA loci are underrepresented in highly polymorphic and well-validated CNV regions.
We postulate that CNV-miRNAs are potential functional variants and should be considered high priority candidate variants in genotype-phenotype association studies.
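The co-localization analysis described in this abstract comes down to an interval-overlap query. The following is a minimal sketch under stated assumptions, not the tools the authors used: the function names, the closed-interval coordinate convention, and the toy loci are hypothetical. A miRNA is flagged as a CNV-miRNA if its locus shares at least one base with any CNV region on the same chromosome.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end

def mirnas_in_cnvs(mirna_loci, cnv_regions):
    """Return names of miRNA loci that overlap any CNV on the same chromosome.

    mirna_loci:  iterable of (name, chrom, start, end)
    cnv_regions: iterable of (chrom, start, end)
    """
    cnvs_by_chrom = {}
    for chrom, start, end in cnv_regions:
        cnvs_by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for name, chrom, start, end in mirna_loci:
        if any(overlaps(start, end, s, e) for s, e in cnvs_by_chrom.get(chrom, [])):
            hits.append(name)
    return hits

# Hypothetical toy coordinates, for illustration only.
toy_mirnas = [
    ("mir-A", "chr1", 1_000, 1_080),
    ("mir-B", "chr1", 5_000, 5_090),
    ("mir-C", "chr2", 300, 370),
]
toy_cnvs = [("chr1", 900, 2_000), ("chr2", 350, 10_000), ("chr3", 1, 500)]
cnv_mirnas = mirnas_in_cnvs(toy_mirnas, toy_cnvs)
print(cnv_mirnas)
```

The linear scan per chromosome is adequate for a few thousand loci; a sorted list or interval tree would be the usual choice for larger CNV catalogues.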
Structural genetic changes, especially copy number variants (CNVs), represent a major source of genetic variation contributing to human disease. Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital heart disease, but to date little is known about the role of CNVs in the etiology of TOF. Using high-resolution genome-wide microarrays and stringent calling methods, we investigated rare CNVs in a prospectively recruited cohort of 433 unrelated adults with TOF and/or pulmonary atresia at a single centre. We excluded those with recognized syndromes, including 22q11.2 deletion syndrome. We identified candidate genes for TOF based on converging evidence between rare CNVs that overlapped the same gene in unrelated individuals and from pathway analyses comparing rare CNVs in TOF cases to those in epidemiologic controls. Even after excluding the 53 (10.7%) subjects with 22q11.2 deletions, we found that adults with TOF had a greater burden of large rare genic CNVs compared to controls (8.82% vs. 4.33%, p = 0.0117). Six loci showed evidence for recurrence in TOF or related congenital heart disease, including typical 1q21.1 duplications in four (1.18%) of 340 Caucasian probands. The rare CNVs implicated novel candidate genes of interest for TOF, including PLXNA2, a gene involved in semaphorin signaling. Independent pathway analyses highlighted developmental processes as potential contributors to the pathogenesis of TOF. These results indicate that individually rare CNVs are collectively significant contributors to the genetic burden of TOF. Further, the data provide new evidence for dosage sensitive genes in PLXNA2-semaphorin signaling and related developmental processes in human cardiovascular development, consistent with previous animal models.
Congenital heart disease affects nearly 1% of all live births. Tetralogy of Fallot (TOF) is the most common form of cyanotic congenital heart disease. This condition is associated with hemizygous deletions of chromosome 22q11.2 and chromosomal trisomies, but little else is known about the genetic heterogeneity of this complex disease. We used high-resolution microarrays and stringent methods to study structural (copy number) variants in a systematically phenotyped cohort of unrelated adults with TOF. We found that individually rare genic copy number variants (CNVs) were collectively significant contributors to the genetic burden in TOF. Among CNVs that implicated candidate genes of interest were loss CNVs overlapping the PLXNA2 gene that codes for plexin A2. This is the first study to show a role for this semaphorin receptor in human congenital heart disease, consistent with a Plxna2 mouse knockout phenotype. Pathway analyses comparing rare exonic loss CNVs in the TOF sample to controls implicated other novel gene sets that suggest new pathogenetic mechanisms.
Genomic structural changes, such as gene copy number variations (CNVs), are extremely abundant in the human genome. An enormous effort is currently ongoing to recognize and catalogue human CNVs and their associations with abnormal phenotypic outcomes. Recently, several reports have linked neuropsychiatric diseases (e.g., autism spectrum disorders, schizophrenia, mental retardation, behavioral problems, epilepsy) to specific CNVs. Moreover, for some conditions, both the deletion and duplication of the same genomic segment are related to the phenotype. Syndromes associated with CNVs (microdeletion and microduplication) have long been known to display specific neurobehavioral traits. It is important to note that not every gene is susceptible to gene dosage changes and there are only a few dosage sensitive genes. Smith-Magenis (SMS) and Potocki-Lupski (PTLS) syndromes are associated with a reciprocal microdeletion and microduplication within chromosome 17p11.2 in humans. The dosage sensitive gene responsible for most phenotypes in SMS has been identified: Retinoic Acid Induced 1 (RAI1). Studies on mouse models and humans suggest that RAI1 is likely the dosage sensitive gene responsible for clinical features in PTLS. In addition, the human RAI1 gene has been implicated in several neurobehavioral traits such as spinocerebellar ataxia (SCA2), schizophrenia and non-syndromic autism. In this review we discuss the evidence of RAI1 as a dosage sensitive gene, its relationship with different neurobehavioral traits, gene structure and mutations, and what is known about its molecular and cellular function, as a first step in the elucidation of the mechanisms that relate dosage sensitive genes with abnormal neurobehavioral outcomes.
Copy Number Variation; dosage sensitive gene; neurobehavioral traits; Potocki-Lupski Syndrome; RAI1; Smith-Magenis Syndrome; transcription factor activity.
Human growth has an estimated heritability of about 80%–90%. Nevertheless, the underlying cause of shortness of stature remains unknown in the majority of individuals. Genome-wide association studies (GWAS) showed that both common single nucleotide polymorphisms and copy number variants (CNVs) contribute to height variation under a polygenic model, although they explain only a small fraction of overall genetic variability in the general population. Under the hypothesis that severe forms of growth retardation might also be caused by major gene effects, we searched for rare CNVs in 200 families, 92 sporadic and 108 familial, with idiopathic short stature compared to 820 control individuals. Although similar in number, the CNVs in patients were overall significantly larger (p-value<1×10−7). In a gene-based analysis of all non-polymorphic CNVs>50 kb for gene function, tissue expression, and murine knock-out phenotypes, we identified 10 duplications and 10 deletions ranging in size from 109 kb to 14 Mb, of which 7 were de novo (p<0.03) and 13 were inherited from the likewise affected parent but absent in controls. Patients with these 20 likely disease-causing CNVs were smaller than the remaining group (p<0.01). Eleven (55%) of these CNVs either overlapped with known microaberration syndromes associated with short stature or contained GWAS loci for height. Haploinsufficiency (HI) score and further expression profiling suggested dosage sensitivity of major growth-related genes at these loci. Overall, 10% of patients carried a disease-causing CNV, indicating that, as in neurodevelopmental disorders, rare CNVs are a frequent cause of severe growth retardation.
With a frequency of 3%, shortness of stature is a common medical concern. Although family studies have clearly shown that gene defects play a pivotal role in the development of short stature, the underlying genetic variants involved remain unknown in about 80% of cases. In contrast to recent studies which aimed at the identification of common genetic variants to explain minor differences in the height variation in the general population, we targeted rare genomic variants where we expected a major gene effect on growth. By examining 200 patients clinically evaluated for short stature, we show that rare structural chromosomal aberrations (CNVs) are associated with shortness of stature in 10% of the cases. The identified CNVs were either de novo or segregated with short stature in the families and include genes that are functionally involved in growth regulation in humans or mice. We furthermore demonstrate an overlap of these CNVs with known microdeletion syndromes. Interestingly, 3 CNVs contain positions of common variants and confirm the localization of major growth-related genes. These findings are particularly important for identification of biological pathways leading to short stature, but also for further therapeutic approaches.
MicroRNAs (miRNAs) and copy number variations (CNVs) represent two classes of newly discovered genomic elements that were shown to contribute to genome plasticity and evolution. Recent studies demonstrated that miRNAs and CNVs must have co-evolved and interacted in an attempt to maintain the balance of the dosage sensitive genes and at the same time increase the diversity of dosage non-sensitive genes, contributing to species evolution. It has been previously demonstrated that both the number of miRNAs that target genes found in CNV regions and the number of miRNA binding sites are significantly higher than those of genes found in non-CNV regions. These findings raise the possibility that miRNAs may have been created under evolutionary pressure, as a mechanism for increasing the tolerance to genome plasticity. In the current study, we aimed to explore the differences in miRNA-CNV functional interactions between human and seven other species. By performing in silico whole genome analysis in eight different species (human, chimpanzee, macaque, mouse, rat, chicken, dog and cow), we demonstrate that miRNAs targeting genes located within CNV regions in humans have special functional characteristics that provide an insight into the differences between humans and other species.
Copy number variation (CNV) represents another important source of genetic variation complementary to single nucleotide polymorphism (SNP). High-density SNP array data have been routinely used to detect human CNVs, many of which have significant functional effects on gene expression and human diseases. In the dairy industry, a large quantity of SNP genotyping results are becoming available and can be used for CNV discovery to understand and accelerate genetic improvement for complex traits.
We performed a systematic analysis of CNV using the Bovine HapMap SNP genotyping data, including 539 animals of 21 modern cattle breeds and 6 outgroups. After correcting genomic waves and considering the pedigree information, we identified 682 candidate CNV regions, which represent 139.8 megabases (~4.60%) of the genome. Selected CNVs were further experimentally validated and we found that copy number "gain" CNVs were predominantly clustered in tandem rather than existing as interspersed duplications. Many CNV regions (~56%) overlap with cattle genes (1,263), which are significantly enriched for immunity, lactation, reproduction and rumination. The overlap of this new dataset and other published CNV studies was less than 40%; however, our discovery of large, high frequency (> 5% of animals surveyed) CNV regions showed 90% agreement with other studies. These results highlight the differences and commonalities between technical platforms.
We present a comprehensive genomic analysis of cattle CNVs derived from SNP data which will be a valuable genomic variation resource. Combined with SNP detection assays, gene-containing CNV regions may help identify genes undergoing artificial selection in domesticated animals.
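As an illustration of how per-animal CNV calls are commonly collapsed into CNV regions and expressed as a fraction of the genome, the following is a minimal sketch and not the pipeline used in the study. It assumes half-open `(chrom, start, end)` coordinates; the function names `merge_cnv_calls` and `covered_fraction`, the toy calls, and the toy genome length are all hypothetical.

```python
from collections import defaultdict


def merge_cnv_calls(calls):
    """Merge overlapping (chrom, start, end) calls into non-overlapping CNV regions.

    Coordinates are assumed to be half-open: [start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in calls:
        by_chrom[chrom].append((start, end))

    regions = []
    for chrom in sorted(by_chrom):
        merged = []
        for start, end in sorted(by_chrom[chrom]):
            if merged and start <= merged[-1][1]:
                # Overlaps (or touches) the previous region: extend it.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        regions.extend((chrom, s, e) for s, e in merged)
    return regions


def covered_fraction(regions, genome_length):
    """Fraction of the genome covered by the (non-overlapping) regions."""
    return sum(end - start for _, start, end in regions) / genome_length


# Toy calls (hypothetical values, not data from the study).
calls = [
    ("chr1", 100, 500),
    ("chr1", 400, 900),
    ("chr1", 2000, 2500),
    ("chr2", 50, 150),
]
regions = merge_cnv_calls(calls)
fraction = covered_fraction(regions, genome_length=100_000)
print(regions)
print(f"{fraction:.3%}")
```

Merging before summing matters because overlapping calls from different animals would otherwise be counted more than once in the covered length.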
Copy number variants (CNVs) account for a substantial contribution to phenotypic diversity. In humans, as well as in other species, the effects of CNVs range from benign to directly disease-causing, which motivates the continued investigation of CNVs. Previous canine genome-wide screenings for CNVs have been performed using high-resolution comparative genomic hybridisation arrays, which have contributed a detailed catalogue of CNVs. Here, we present the first CNV investigation in dogs based on the recently reported CanineHD 170 K genotyping array. The hitherto largest dataset in canine CNV discovery was assessed, 351 dogs from 30 different breeds, enabling identification of novel CNVs and a thorough characterisation of breed-specific CNVs.
A stringent procedure identified 72 CNV regions, the smallest of which was 38 kb, and 38 of the 72 CNV regions overlapped 148 annotated genes. A total of 29 novel CNV regions were found, containing 44 genes. Furthermore, 15 breed-specific CNV regions were identified, of which 14 were novel, and some of them overlapped putative disease susceptibility genes. In addition, the human orthologs of 23 canine copy number variable genes identified herein have previously been suggested to be dosage-sensitive in humans.
The present study evaluated the performance of the CanineHD in detecting CNVs and extends the current catalogue of canine CNV regions with several dozens of novel CNV regions. These novel CNV regions, which harbour candidate genes that possibly contribute to phenotypic variation in dogs or to disease-susceptibility, are a rich resource for future investigations.
Copy number variation; CNV; SNP genotyping array; Dog genome; Deletion; Duplication; CanineHD
Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignments of integer locus copy-numbers (i.e., copy-number genotypes) that, for example, enable discrimination of homozygous from heterozygous CNVs, have remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can incorporate paired-end and breakpoint junction analysis based CNV-analysis approaches, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low-coverage. The assessed copy-number genotypes were highly concordant with our performed qPCR experiments (Pearson correlation coefficient 0.94), and with the published results of two microarray platforms (95–99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes in the CNV-enriched >800 olfactory receptor (OR) human gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci, and two to nine copies in others. Among genetic variants affecting OR loci we identified deleterious variants including CNVs and SNPs affecting ∼15% and ∼20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying a necessary revision of the OR repertoire for future functional studies. 
CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing.
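The basic idea behind depth-of-coverage genotyping can be sketched as follows. This is a deliberately simplified illustration and not CopySeq's statistical framework: it assumes that read depth scales linearly with copy number, that a set of windows known to be diploid is available as a baseline, and it ignores GC bias, mappability, and the paired-end and breakpoint evidence mentioned above. The function name `copy_number_genotype` and the depth values are hypothetical.

```python
from statistics import median


def copy_number_genotype(region_depths, diploid_depths, ploidy=2):
    """Estimate an integer copy number from read depth.

    region_depths:  per-window read depths inside the CNV region
    diploid_depths: per-window read depths in windows assumed to be diploid
    The depth ratio is scaled by the baseline ploidy and rounded.
    """
    baseline = median(diploid_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    ratio = median(region_depths) / baseline
    return max(0, round(ploidy * ratio))


# Hypothetical low-coverage depths (about 4x in diploid windows).
diploid = [4.1, 3.9, 4.0, 4.2, 3.8]
print(copy_number_genotype([2.1, 1.9, 2.0], diploid))  # heterozygous deletion
print(copy_number_genotype([0.0, 0.0, 0.1], diploid))  # homozygous deletion
print(copy_number_genotype([8.2, 7.9, 8.0], diploid))  # multi-copy gain
```

Medians are used instead of means so that a few windows with aberrant depth have less influence, which is one reason a real method at low coverage needs a proper statistical model rather than simple rounding.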
Human individual genome sequencing has recently become affordable, enabling highly detailed genetic sequence comparisons. While the identification and genotyping of single-nucleotide polymorphisms has already been successfully established for different sequencing platforms, the detection, quantification and genotyping of large-scale copy-number variants (CNVs), i.e., losses or gains of long genomic segments, has remained challenging. We present a computational approach that enables detecting CNVs in sequencing data and accurately identifies the actual copy-number at which DNA segments of interest occur in an individual genome. This approach enabled us to obtain novel insights into the largest human gene family – the olfactory receptors (ORs) – involved in smell perception. While previous studies reported an abundance of CNVs in ORs, our approach enabled us to globally identify absolute differences in OR gene counts that exist between humans. While several OR genes have very high gene counts, other ORs are found only once or are missing entirely in some individuals. The latter have a particularly high probability of influencing individual differences in the perception of smell, a question that future experimental efforts can now address. Furthermore, we observed differences in OR gene counts between populations, pointing at ORs that might contribute to population-specific differences in smell.
Psoriasis is a common inflammatory skin disease with an etiology based on both environmental and genetic factors. As is the case for many autoimmune diseases, its real cause remains poorly defined. However, it is known that genetic factors contribute to disease susceptibility. Linkage analysis has been used to identify multiple loci and alleles that confer risk of the disease. Some other studies have focused upon single nucleotide polymorphisms (SNPs) for mapping of probable causal variants. Other studies, using genome-wide analytical techniques, have tried to link the disease to copy number variants (CNVs), which are segments of DNA ranging in size from kilobases to megabases that vary in copy number. CNVs represent an important element of genomic polymorphism in humans, and those harboring dosage-sensitive genes may cause or predispose to a variety of human genetic diseases. The mechanisms giving rise to SNPs and CNVs can be considered as fundamental processes underlying gene duplications, deletions, insertions, inversions and complex combinations of rearrangements. Duplicated genes that result from 'successful' copies are fixed and maintained in the population. Conversely, many 'unsuccessful' duplicates remain in the genome as pseudogenes. There is another form of genetic variation, termed copy-neutral loss of heterozygosity (LOH), whose potential impact on complex diseases is less well understood. Additional studies would include associating gene expression variations with either SNPs or CNVs. Many genetic techniques, such as PCR, real-time PCR, microarray and restriction fragment length analysis, are now available for detecting genetic polymorphisms, gene mapping and estimation of gene expression. Recently, scientists have used these tools to define genetic signatures of disease, to understand genetic causes of disease and to characterize the effects of certain drugs on gene expression.
This review highlights the principles, technologies and applications of these approaches in psoriasis.
Psoriasis; Genes; Cytokine
A fundamental issue in molecular evolution is how to identify the evolutionary forces that determine the fate of duplicated genes. The dosage balance hypothesis has been invoked to explain gene duplication patterns at the genomic level under the premise that a dosage imbalance among protein-complex subunits or interacting partners is often deleterious. Here we examine this hypothesis by investigating the molecular basis of dosage sensitivity. We focus on the extent of protein wrapping, which indicates how strongly the structural integrity of a protein relies on its interactive context. From this perspective, we predict that the duplicates of a highly under-wrapped protein or protein subunit should (1) be more sensitive to dosage imbalance and be less likely to be retained and (2) be more likely to survive from a whole-genome duplication (WGD) than from a non-WGD because a WGD causes little or no dosage imbalance. Our under-wrapping analysis of more than 12,000 protein structures strongly supports these predictions and further reveals that the effect of dosage sensitivity on gene duplicability decreases with increasing organismal complexity.
A gene duplication provides an extra gene copy that can be free to accumulate mutations and gain a new function. Therefore, gene duplication plays a very important role in evolution. However, the presence of an additional gene copy can sometimes be deleterious because it can lead to an excessive dosage relative to those of its interacting partners. This dosage imbalance effect in turn influences the fate of duplicated genes in evolution. Our study gives the first description to our knowledge of the molecular/structural basis for the dosage imbalance effect. We study the relationships between gene family size and extent of protein under-wrapping, a molecular quantifier of the reliance of the protein on binding partnerships to maintain structural integrity, indicative of the extent of structure protection from disruptive hydration. Using more than 12,000 protein three-dimensional structures from six organisms that range from bacteria to human, we show an inverse relationship between extent of protein under-wrapping and family size. That is, a duplication is unlikely to be tolerated if the protein is highly under-wrapped (i.e., its structure requires substantial stabilizing interactions with other proteins). We also show that the effect of dosage imbalance is more apparent in unicellular organisms but is buffered to some extent in higher eukaryotes.
β-defensins are a family of important peptides of innate immunity, involved in host defense, immunomodulation, reproduction, and pigmentation. Genes encoding β-defensins show evidence of birth-and-death evolution, adaptation by amino acid sequence changes, and extensive copy number variation (CNV) within humans and other species. The role of CNV in the adaptation of β-defensins to new functions remains unclear, as does the adaptive role of CNV in general. Here, we fine-map CNV of a cluster of β-defensins in humans and rhesus macaques. Remarkably, we found that the structure of the CNV is different between primates, with distinct mutational origins and CNV boundaries defined by retroviral long terminal repeat elements. Although the human β-defensin CNV region is 322 kb and encompasses several genes, including β-defensins, a long noncoding RNA gene, and testes-specific zinc-finger transcription factors, the orthologous region in the rhesus macaque shows CNV of a 20-kb region, containing only a single gene, the ortholog of the human β-defensin-2 gene. Despite its independent origins, the range of gene copy numbers in the rhesus macaque is similar to humans. In addition, the rhesus macaque gene has been subject to divergent positive selection at the amino acid level following its initial duplication event between 3 and 9.5 Ma, suggesting adaptation of this gene as the macaque successfully colonized novel environments outside Africa. Therefore, the molecular phenotype of β-defensin-2 CNV has undergone convergent evolution, and this gene shows evidence of adaptation at the amino acid level in rhesus macaques.
defensin; copy number variation; macaque; genome structure; evolution
Copy number variations (CNVs) confer significant effects on genetic innovation and phenotypic variation. Previous CNV studies in swine seldom focused on in-depth characterization of global CNVs.
Using whole-genome assembly comparison (WGAC) and whole-genome shotgun sequence detection (WSSD) approaches based on next generation sequencing (NGS), we probed formation signatures of both segmental duplications (SDs) and individualized CNVs in an integrated fashion, building the finest resolution CNV and SD maps of pigs so far. We obtained copy number estimates of all protein-coding genes with copy number variation carried by individuals, and further confirmed two genes with high copy numbers in Meishan pigs in an enlarged population. We determined genome-wide CNV hotspots, which were significantly enriched in SD regions, suggesting that the evolution of CNV hotspots may be affected by ancestral SDs. Through systematic enrichment analyses based on simulations and bioinformatics analyses, we revealed that CNV-related genes are under different selective constraints from CNV-unrelated regions, and that CNVs may be associated with or affect pig health and production performance under recent selection.
Our studies lay out one way to characterize CNVs in the pig genome, provide insight into pig genome variation, and should prompt studies of CNV mechanisms when pigs are used as biomedical models for human diseases.
Copy number variations (CNVs); Segmental duplications (SDs); Next generation sequencing (NGS); Pigs
Copy-number variation (CNV), rather than complete loss of gene function, is increasingly implicated in human disease. Moreover, gene dosage is recognised as important in tumourigenesis, and there is an increasing realisation that CNVs may not be just symptomatic of the cancerous state but may, in fact, be causative. However, the identification of CNV-related phenotypes for mammalian genes is a slow process, due to the technical difficulty of constructing deletion mutants. Using the genome-wide deletion library for the model eukaryote, Saccharomyces cerevisiae, we have identified genes (termed haploproficient, HP) which, when one copy is deleted from a diploid cell, result in an increased rate of proliferation. Since haploproficiency under nutrient-sufficient conditions is a novel phenotype, we sought here to characterise a subset of the yeast haploproficient genes which seem particularly relevant to human cancers.
We show that, for a subset of HP genes, heterozygous deletion is sufficient to cause aberrant cell cycling and altered rates of apoptosis, phenotypes associated with cancer in mammalian cells. A majority of these yeast genes are the orthologs of mammalian cancer genes, and hence our studies suggest that CNV of these oncogenic orthologs may be sufficient to lead to tumourigenesis in human cells. Moreover, where not already implicated, this cluster of cancer-like phenotypes in this model eukaryote may be predictive of the involvement in cancer of the mammalian orthologs of these yeast HP genes. Using the yeast set as a model, we show that the response to a range of anti-cancer drugs is strongly dependent on gene dosage, such that intermediate concentrations of the drugs can actually increase a mutant’s growth rate.
The exploitation of data on the phenotypic impact of heterozygosis in Saccharomyces cerevisiae has permitted the prediction of CNVs affecting tumourigenesis in humans. Our yeast data also suggest that the identification of CNVs in tumour cells may assist both the selection of anti-cancer drugs and the dosages at which they should be administered if they are to be a beneficial, rather than a deleterious, therapy.
Copy-number variation; Model organism; Yeast; Cancer; Haploinsufficiency
Duplications of genes encoding highly connected and essential proteins are selected against in several species but not in human, where duplicated genes encode highly connected proteins. To understand when and how gene duplicability changed in evolution, we compare gene and network properties in four species (Escherichia coli, yeast, fly, and human) that are representative of the increase in evolutionary complexity, defined as progressive growth in the number of genes, cells, and cell types. We find that the origin and conservation of a gene significantly correlate with the properties of the encoded protein in the protein-protein interaction network. All four species preserve a core of singleton and central hubs that originated early in evolution, are highly conserved, and accomplish basic biological functions. Another group of hubs appeared in metazoans and duplicated in vertebrates, mostly through vertebrate-specific whole genome duplication. Such recent and duplicated hubs are frequently targets of microRNAs and show tissue-selective expression, suggesting that these are alternative mechanisms to control their dosage. Our study shows how networks were modified during evolution and contributes to explaining the occurrence of somatic genetic diseases, such as cancer, in terms of network perturbations.
Gene copy number is often tightly controlled because it directly affects gene dosage. In several species, including yeast, worm, and fly, genes that have a single gene copy (singleton genes) encode proteins with several connections in the protein interaction network (hubs) as well as essential proteins. Surprisingly, in mouse and human, essential proteins and hubs are encoded by genes with more than one copy in the genome (duplicated genes). Here we show that these two distinct groups of hubs were acquired at different times during the evolution of the protein interaction network and contribute in different ways to the life of the cell. Singleton hubs are ancestral genes that are conserved from prokaryotes to vertebrates and accomplish basic functions related to cell survival. Duplicated hubs were acquired mostly within metazoans and duplicated through vertebrate-specific whole genome duplication. These genes are involved in processes that are crucial for the organization of multicellularity. Although duplicated, recent hubs are also subject to gene dosage control through microRNAs and tissue-selective expression. The clarification of how the protein interaction network evolves enables us to understand the adaptation to the progressive increase in complexity and to better characterize the genes involved in diseases such as cancer.
Multiple myeloma (MM) is a malignant proliferation of plasma B cells. Based on recurrent aneuploidy such as copy number alterations (CNAs), myeloma is divided into two subtypes with different CNA patterns and patient survival outcomes. How aneuploidy events arise, and whether they contribute to cancer cell evolution, are being actively studied. The large number of transcriptomic changes resulting from CNAs (the dosage effect) poses major challenges for identifying the functional consequences of CNAs in myeloma in terms of specific driver genes and pathways. In this study, we hypothesize that the gene-wise dosage effect varies as a result of complex regulatory networks that translate the impact of CNAs into gene expression, and that studying this variation can provide insights into the functional effects of CNAs.
We propose a gene-wise dosage effect score and a genome-wide karyotype plot as tools to measure and visualize concordant copy number and expression changes across cancer samples. We find that the dosage effect in myeloma is widespread yet variable, and that it is correlated with gene expression level and CNA frequencies in different chromosomes. Our analysis suggests that, despite the enrichment of differentially expressed genes between hyperdiploid MM and non-hyperdiploid MM in the trisomy chromosomes, the chromosomal proportion of dosage-sensitive genes is higher in the non-trisomy chromosomes. Dosage-sensitive genes are enriched for genes with protein translation and localization functions, and dosage-resistant genes are enriched for apoptosis genes. These results point to future studies on differential dosage sensitivity and resistance of pro- and anti-proliferation pathways and their variation across patients as therapeutic targets and prognosis markers.
Our findings support the hypothesis that recurrent CNAs in myeloma are selected by their functional consequences. The novel dosage effect score defined in this work will facilitate integration of copy number and expression data for identifying driver genes in cancer genomics studies. The accompanying R code is available at http://www.canevolve.org/dosageEffect/.
Copy number alteration; Dosage effect; Multiple myeloma; Hyperdiploid; Integrative genomics
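One simple way to quantify concordance between copy number and expression for a gene across samples is a per-gene correlation, sketched below. This is only a stand-in to illustrate the idea: the dosage effect score defined by the authors is specified in their paper and accompanying R code and may be computed differently. The gene names, sample values, and the `pearson` helper are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one list entry per tumour sample.
copy_number = {
    "GENE_A": [2, 3, 3, 2, 4],
    "GENE_B": [2, 3, 3, 2, 4],
}
expression = {
    "GENE_A": [10.0, 15.0, 15.0, 10.0, 20.0],  # tracks copy number
    "GENE_B": [7.0, 7.1, 6.9, 7.0, 7.0],       # flat despite copy-number changes
}

scores = {gene: pearson(copy_number[gene], expression[gene]) for gene in copy_number}
for gene, score in scores.items():
    print(gene, round(score, 3))
```

Under this toy definition, a score near 1 marks a gene whose expression follows its copy number (dosage-sensitive in the sense used above), while a score near 0 marks a gene whose expression is buffered against copy-number change.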