CCL3L and CCL4L genes encode HIV-suppressive chemokines, colocalize on chromosome 17q12 and have copy number variation. Copy number variation of CCL3L associates with HIV-AIDS susceptibility. Here, we determined the influence of the combinatorial content of distinct CCL3L and CCL4L genes on HIV-AIDS susceptibility.
Using gene-specific assays that we designed, we determined the association of HIV-AIDS susceptibility with the doses of all CCL3L or CCL4L genes, and with the doses of their individual duplicated components (CCL3La/b and CCL4La/b), in 298 perinatally exposed Ukrainian children.
The odds of transmission were higher in children with fewer than two copies of CCL3L or CCL4L than in those with at least two copies. They were 10-fold higher when both mother and child had fewer than two CCL3L or CCL4L copies than in mother–child pairs with at least two copies. The pair-wise correlations between CCL3La, CCL3Lb, CCL4La and CCL4Lb copy numbers varied widely. CCL4L genes that transcribe a classical chemokine (CCL4La) were inversely correlated with those that produce aberrantly spliced transcripts (CCL4Lb). Children possessing only CCL4Lb progressed to AIDS four times faster than those possessing only CCL4La. A lower content of CCL3L and CCL4L genes that transcribe classical chemokines was associated with enhanced HIV-AIDS susceptibility.
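An odds ratio compares the odds of transmission between two copy-number groups. The sketch below computes one from a 2×2 table and adds a Woolf (log-scale) confidence interval. The abstract does not state the statistical model, and the study may have used logistic regression with covariates. The function name and the counts in the usage line are hypothetical.

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table.

    a: children with < 2 copies who acquired infection
    b: children with < 2 copies who did not
    c: children with >= 2 copies who acquired infection
    d: children with >= 2 copies who did not
    """
    if min(a, b, c, d) == 0:
        # Woolf's interval is undefined when any cell is empty.
        raise ValueError("empty cell: apply a continuity correction first")
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts, for illustration only.
print(odds_ratio(10, 20, 5, 100))
```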
Transmission risk is greatest when mother and offspring both have low CCL3L or CCL4L gene doses. The impact on HIV-AIDS susceptibility of the chemokine gene-rich locus on 17q12 is dependent on the balance between the doses of genes conferring protective (CCL3La and CCL4La) versus detrimental (CCL4Lb) effects. Hence, the combinatorial genomic content of distinct genes within a copy number variable region may determine disease susceptibility.
AIDS; CCL3L; CCL4L; HIV; transmission
Variation in genes underlying host immunity can lead to marked differences in susceptibility to HIV infection among humans. Despite heavy reliance on non-human primates as models for HIV/AIDS, little is known about which host factors are shared and which are unique to a given primate lineage. Here, we investigate whether copy number variation (CNV) at CCL3-like genes (CCL3L), a key genetic host factor for HIV/AIDS susceptibility and cell-mediated immune response in humans, is also a determinant of time until onset of simian-AIDS in rhesus macaques. Using a retrospective study of 57 rhesus macaques experimentally infected with SIVmac, we find that CCL3L CNV explains approximately 18% of the variance in time to simian-AIDS (p < 0.001), with lower CCL3L copy number associating with more rapid disease course. We also find that CCL3L copy number varies significantly (p < 10⁻⁶) among rhesus subpopulations, with Indian-origin macaques having, on average, half as many CCL3L gene copies as Chinese-origin macaques. Lastly, we confirm that CCL3L shows variable copy number in humans and chimpanzees and report on CCL3L CNV within and among three additional primate species. On the basis of our findings we suggest that (1) the difference in population level copy number may explain previously reported observations of longer post-infection survivorship of Chinese-origin rhesus macaques, (2) stratification by CCL3L copy number in rhesus SIV vaccine trials will increase power and reduce noise due to non-vaccine-related differences in survival, and (3) CCL3L CNV is an ancestral component of the primate immune response and, therefore, copy number variation has not been driven by HIV or SIV per se.
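The abstract reports that CNV "explains approximately 18% of the variance" but does not name the model. For a single predictor fitted by ordinary least squares, that fraction is R², the squared Pearson correlation. The sketch below assumes uncensored times and a linear relationship. A censored-survival analysis, such as Cox regression, would quantify the effect differently. The function name is hypothetical.

```python
def variance_explained(copy_number, time_to_aids):
    """R^2 of a one-predictor least-squares fit (squared Pearson correlation)."""
    n = len(copy_number)
    mx = sum(copy_number) / n
    my = sum(time_to_aids) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(copy_number, time_to_aids))
    sxx = sum((x - mx) ** 2 for x in copy_number)
    syy = sum((y - my) ** 2 for y in time_to_aids)
    return sxy * sxy / (sxx * syy)
```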
Development of vaccines for HIV/AIDS is a pressing global issue. The rhesus monkey remains the primary model for testing potential human vaccines; however, little is known about similarities and differences in host genes involved in HIV/AIDS response in humans and rhesus monkeys. Understanding these similarities and/or differences should allow more efficient testing of vaccines beneficial to humans. Here we describe the role that variation in the number of copies of CCL3-like genes (CCL3L) plays in SIV progression rates in rhesus monkeys. Copy number variation (CNV) of these genes has previously been shown to play a role in susceptibility and progression of HIV in humans. Our results suggest that individual monkeys with lower CCL3L copy number progress more rapidly. Accounting for CCL3L CNV in rhesus vaccine trials will improve researchers' abilities to interpret survival data.
CCL3L1 copy number variation has been implicated as a marker for susceptibility and immunity to human immunodeficiency virus (HIV)-1 infection and its pathogenic sequelae. Some of these findings have been confirmed in several, but not all, subsequent independent cohort studies. A three-fold risk for the development of HIV-associated dementia was reported in individuals possessing a CCL3L1 copy number below the ethnic group median combined with a detrimental CCR5 genotype. With the availability of antiretroviral therapy since 1996, there has been a significant decline in HIV-associated dementia, and milder forms of HIV-associated neurocognitive impairment (HAND) are now most prevalent. Moreover, patients are living longer with HIV-1 infection and it is recognized that aging may be a contributory factor to the development of cognitive disorder. Thus, the need for biomarkers that can be used in clinical practice to identify and provide optimal treatment for those at increased risk for HAND is great. HAND affects 20%–30% of HIV-infected individuals, and several genetic loci which have been shown to confer susceptibility to HIV infection may also modulate the development of neurocognitive disorder. The aim of this study was to determine whether CCL3L1 chemokine gene copy number in self-defined ethnic groups could differentiate HIV-infected individuals with and without HAND.
Genomic DNA was isolated from buccal swabs or peripheral blood mononuclear cells obtained from HIV-infected patients with or without a diagnosis of neurocognitive dysfunction in the Northeast AIDS Dementia Cohort and the National NeuroAIDS Tissue Consortium. To maintain a uniform standard, CCL3L1 copy number was measured with a quantitative polymerase chain reaction design similar to that of previous studies, using TaqMan probes and a fixed DNA input between 2 ng and 10 ng. Standard curves were generated from two-fold dilutions ranging from 25 ng to 1.56 ng. CCL3L1 copy number was determined in triplicate in 262 subjects using quantitative polymerase chain reaction and the relative quantitation method. Data were analyzed using analysis of variance with Bonferroni post hoc tests, and significance was defined as P < 0.05.
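The abstract names the relative quantitation method but does not give its arithmetic. One common form is the comparative Ct (ΔΔCt) calculation, shown below together with the two-fold dilution series used for the standard curve. Several details are assumptions rather than details from the study: the single-copy reference gene, the calibrator DNA assumed to carry two CCL3L1 copies, and the amplification efficiency of 2.0 (perfect doubling per cycle). The function names are hypothetical.

```python
def dilution_series(start_ng: float = 25.0, steps: int = 5) -> list[float]:
    """Two-fold serial dilution for a standard curve (25 ng down to ~1.56 ng)."""
    return [start_ng / 2 ** i for i in range(steps)]


def relative_copy_number(ct_target, ct_ref, cal_ct_target, cal_ct_ref,
                         cal_copies=2.0, efficiency=2.0):
    """Comparative-Ct (delta-delta-Ct) copy-number estimate.

    ct_target / ct_ref: mean Ct of triplicate wells for the CCL3L1 assay and
    for a reference gene in the test sample; cal_ct_*: the same values for a
    calibrator DNA assumed to carry `cal_copies` CCL3L1 copies.
    """
    ddct = (ct_target - ct_ref) - (cal_ct_target - cal_ct_ref)
    # One cycle earlier than the calibrator means `efficiency`-fold more template.
    return cal_copies * efficiency ** (-ddct)
```

When Ct is regressed on log10 input across the dilution series, efficiency can be estimated as E = 10^(−1/slope). For that reason, the standard curve is what justifies, or corrects, the `efficiency` argument.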
Analysis of variance showed significant differences in CCL3L1 copy number between African-Americans and Caucasians (P < 0.0001), highlighting ethnic group differences in the copy number of this gene. However, there were no differences in CCL3L1 copy number across neurocognitive groups within either ethnic group. The median CCL3L1 copy numbers in this study, two in African-Americans and one in Caucasians, were significantly lower than the previously reported ethnic group means of four and two copies, respectively. Abnormal cognition was more prevalent in African-Americans than in Caucasians, with a relative risk of four.
Based on this nested case-control study, CCL3L1 copy number alone may not be useful for distinguishing between individuals at risk for mild or severe neurocognitive disorder. Additional larger cohort studies are required to determine whether CCL3L1 copy number in combination with polymorphisms in other genes known to contribute to HIV risk will be useful in identifying those at increased risk for HAND.
neurological; HIV-associated dementia; HAND; chemokine; copy number; African-American; Caucasian
CCL3 is a ligand for the HIV-1 co-receptor CCR5. There have recently been conflicting reports in the literature concerning whether CCL3-like gene (CCL3L) copy number variation (CNV) is associated with resistance to HIV-1 acquisition, and with viral load and disease progression following HIV-1 infection. An association has also been reported between CCL3L CNV and clinical sequelae of simian immunodeficiency virus (SIV) infection in vivo in rhesus monkeys. The present study was initiated to explore a possible association of CCL3L CNV with the control of virus replication and AIDS progression in a carefully defined cohort of SIVmac251-infected, Indian-origin rhesus monkeys. CCL3L copy number varied extensively in this cohort. However, CCL3L CNV was not significantly associated with peak or set-point plasma SIV RNA levels when the MHC class I allele Mamu-A*01 was included in the models, nor with progression to AIDS. With 66 monkeys in the study, the tests had adequate power to detect a correlation of 0.34 between CCL3L copy number and peak plasma SIV RNA levels, and of 0.36 for set-point levels. These findings call into question the premise that CCL3L CNV is important in HIV/SIV pathogenesis.
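The power statement can be checked with the Fisher z approximation for a Pearson correlation. The sketch below assumes a two-sided α of 0.05, 80% power and an unadjusted correlation. The abstract states none of these, and the function names are hypothetical. Under those assumptions, 66 animals give a minimum detectable r of about 0.34, which matches the peak-load figure. The abstract does not say what produced the slightly larger set-point value.

```python
import math
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of rho = 0 using Fisher's z."""
    std_normal = NormalDist()
    crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = math.atanh(r) * math.sqrt(n - 3)  # expected z statistic
    return std_normal.cdf(shift - crit) + std_normal.cdf(-shift - crit)


def detectable_r(n: int, power: float = 0.8, alpha: float = 0.05) -> float:
    """Smallest correlation detectable with the requested power."""
    std_normal = NormalDist()
    needed = (std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)) / math.sqrt(n - 3)
    return math.tanh(needed)


print(round(detectable_r(66), 3), round(correlation_power(0.34, 66), 3))
```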
Host genetic factors are important in determining why some individuals maintain effective control of HIV-1 replication while others do not. Genes implicated in this control include some associated with the development of classical immune effector responses against HIV-1 infection and others associated with viral entry. CCL3, formerly known as macrophage inflammatory protein 1α (MIP-1α), is a ligand for the HIV-1 co-receptor CCR5 and a chemokine that inhibits HIV-1 replication. The present study was undertaken to assess the effect of CCL3L copy number on the control of virus replication and AIDS progression in SIVmac251-infected, Indian-origin rhesus monkeys. We found that CCL3L copy number varies in rhesus monkeys. However, copy number does not predict SIVmac251 containment following infection.
To dissect the haplotype structure of candidate genes for disease association studies, it is important to understand the nature of genetic variation at these loci in different populations. We present a survey of haplotype structure and linkage disequilibrium of chemokine and chemokine receptor genes in 11 geographically distinct population samples (n = 728). Chemokine proteins are involved in intercellular signalling and the immune response. These molecules are important modulators of human immunodeficiency virus (HIV)-1 infection and the progression of the acquired immune deficiency syndrome, tumour development and the metastatic process of cancer. To study the extent of genetic variation in this gene family, single nucleotide polymorphisms (SNPs) from 13 chemokine and chemokine receptor genes were genotyped using the 5' nuclease assay (TaqMan).
SNP haplotypes, estimated from unphased genotypes using the Expectation-Maximization (EM) algorithm, are described in a cluster of four CC-chemokine receptor genes (CCR3, CCR2, CCR5 and CCRL2) on chromosome 3p21, and in a cluster of three CC-chemokine genes [MPIF-1 (CCL23), PARC (CCL18) and MIP-1α (CCL3)] on chromosome 17q11-12. The 32-base-pair (bp) deletion in exon 4 of CCR5 was also included in the haplotype analysis of 3p21. A total of 87.5 per cent of the variation of 14 biallelic loci scattered over 150 kilobases of 3p21 is explained by 11 haplotypes, each with a frequency of at least 1 per cent in the total sample. An analysis of haplotype blocks in this region indicates recombination between CCR2 and CCR5, although long-range pairwise linkage disequilibrium across the region appears to remain intact on two common haplotypes. A reduced-median network demonstrates a clear relationship between 3p21 haplotypes, rooted by the putative ancestral haplotype determined by direct sequencing of four primate species. Analysis of six SNPs on 17q11-12 indicates that 97.5 per cent of the variation is explained by 15 haplotypes, each with a frequency of at least 1 per cent in the total sample. Additionally, a possible signature of selection at a non-synonymous coding SNP (M106V) in the MPIF-1 (CCL23) gene warrants further study. We anticipate that the results of this study of chemokine and chemokine receptor variation will be applicable to more extensive surveys of long-range haplotype structure in these gene regions and to association studies of HIV-1 disease and cancer.
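The EM algorithm is easiest to see for two biallelic SNPs. In that case only double heterozygotes have ambiguous phase. EM alternates between two steps:

1. Split each double heterozygote between the cis and trans phasings in proportion to the current haplotype frequencies.
2. Re-estimate the frequencies from the resulting counts.

The sketch below is an illustration of that idea, not the software used in the study. It assumes genotypes coded as the number of copies of allele 1 at each SNP, and the function names are hypothetical. Real tools handle many loci and missing data.

```python
from collections import Counter


def _alleles(g: int) -> list[int]:
    # Genotype coded as the count of allele 1 (0, 1 or 2).
    return [0] * (2 - g) + [1] * g


def em_two_snp(genotypes, iters=100, tol=1e-10):
    """Estimate two-SNP haplotype frequencies from unphased genotypes."""
    haps = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haps}
    n_chrom = 2 * len(genotypes)
    for _ in range(iters):
        counts = Counter()
        for g1, g2 in genotypes:
            if g1 == 1 and g2 == 1:
                # E-step: double heterozygote, either 00/11 (cis) or 01/10 (trans).
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += w
                counts[(1, 1)] += w
                counts[(0, 1)] += 1 - w
                counts[(1, 0)] += 1 - w
            else:
                # At most one heterozygous site, so phase is known.
                for h in zip(_alleles(g1), _alleles(g2)):
                    counts[h] += 1
        # M-step: expected counts divided by the number of chromosomes.
        new = {h: counts[h] / n_chrom for h in haps}
        converged = max(abs(new[h] - freq[h]) for h in haps) < tol
        freq = new
        if converged:
            break
    return freq
```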
chemokine; SNP; population genetics; variation; haplotype estimation; linkage disequilibrium
Immune response genes are enriched among genes subject to copy number variations (CNVs). However, there is limited understanding of their impact on susceptibility to human diseases. CC chemokine ligand 3-like 1 (CCL3L1) is a potent ligand for the HIV coreceptor CC chemokine receptor 5 (CCR5). We have previously demonstrated an association of CCL3L1 gene-containing segmental duplications and CCR5 polymorphisms with HIV/AIDS susceptibility. Here, we determined the association of these genetic variations with four outcomes: the risk of developing systemic lupus erythaematosus (SLE), differential recruitment of CD3+ and CD68+ leukocytes to the kidney, the clinical severity of SLE as reflected by autoantibody titres, and the risk of renal complications in SLE.
We genotyped 1084 subjects (469 cases of SLE and 615 matched controls with no autoimmune disease) from three geographically distinct cohorts for variations in CCL3L1 and CCR5.
Deviation from the average copy number of CCL3L1 found in European populations increased the risk of SLE and modified the SLE-influencing effects of CCR5 haplotypes. The CCR5 human haplogroup (HH)E and CCR5-Δ32-bearing HHG*2 haplotypes were associated with an increased risk of developing SLE. An individual’s CCL3L1–CCR5 genotype strongly predicted the overall risk of SLE, high autoantibody titres, and lupus nephritis as well as the differential recruitment of leukocytes in subjects with lupus nephritis. The CCR5 HHE/HHG*2 genotype was associated with the maximal risk of developing SLE.
CCR5 haplotypes HHE and HHG*2 strongly influence the risk of SLE. The copy number of CCL3L1 influences risk of SLE and modifies the SLE-influencing effects associated with CCR5 genotypes. These findings implicate a key role of the CCL3L1–CCR5 axis in the pathogenesis of SLE.
Submicroscopic (less than 2 Mb) segmental DNA copy number changes are a recently recognized source of genetic variability between individuals. The biological consequences of copy number variants (CNVs) are largely undefined. In some cases, CNVs that cause gene dosage effects have been implicated in phenotypic variation. CNVs have been detected in diverse species, including mice and humans. Published studies in mice have been limited by resolution and strain selection. We chose to study 21 well-characterized inbred mouse strains that are the focus of an international effort to measure, catalog, and disseminate phenotype data. We performed comparative genomic hybridization using long oligomer arrays to characterize CNVs in these strains. This technique increased the resolution of CNV detection by more than an order of magnitude over previous methodologies. The CNVs range in size from 21 to 2,002 kb. Clustering strains by CNV profile recapitulates aspects of the known ancestry of these strains. Most of the CNVs (77.5%) contain annotated genes, and many (47.5%) colocalize with previously mapped segmental duplications in the mouse genome. We demonstrate that this technique can identify copy number differences associated with known polymorphic traits. The phenotype of previously uncharacterized strains can be predicted based on their copy number at these loci. Annotation of CNVs in the mouse genome combined with sequence-based analysis provides an important resource that will help define the genetic basis of complex traits.
A major goal of genetics and genomics is to understand how genetic differences between individuals (genotypes) translate into variation in disease susceptibility, behavior, and many other organism-level characteristics (phenotypes). While the sizes of genetic variants range from a single base to whole chromosomes, historically, only the extreme ends of this spectrum have been explored. DNA copy number variants (CNVs) lie between these two extremes, ranging in size from hundreds to millions of bases. The recent application of microarray technology to detect genetic variation in humans has led to the realization that CNVs are common. In fact, rough estimates indicate that CNVs and small-scale variants may constitute similar proportions of total genomic DNA. In this report, the authors characterize 80 CNVs across the genomes of 21 inbred strains of mice. The identification and characterization of mouse CNVs are important because inbred strains of mice are the most widely used model system to explore biomedical genetics. These CNVs are located near another class of genomic features, segmental duplications, more often than would be expected by chance, which supports the hypothesis that CNVs and segmental duplications are causally linked. Importantly, many of the CNVs contain known genes and thus may underlie both gene expression and phenotypic variation between strains.
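The statement that CNVs lie near segmental duplications "more often than would be expected by chance" implies a comparison against a null distribution. One common way to build that null is a permutation test. Each CNV is randomly repositioned with its length kept fixed, the overlaps are recounted, and the test asks how often chance matches the observed count. The sketch below is a hypothetical single-chromosome version. This summary does not state the study's actual test or details such as the exclusion of assembly gaps.

```python
import random


def overlaps(a, b):
    # Half-open intervals (start, end) on the same chromosome.
    return a[0] < b[1] and b[0] < a[1]


def n_overlapping(cnvs, segdups):
    """Number of CNVs that overlap at least one segmental duplication."""
    return sum(any(overlaps(c, s) for s in segdups) for c in cnvs)


def permutation_p(cnvs, segdups, genome_len, n_perm=1000, seed=0):
    """Observed overlap count and an empirical one-sided P value."""
    rng = random.Random(seed)
    observed = n_overlapping(cnvs, segdups)
    hits = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in cnvs:
            size = end - start
            new_start = rng.randrange(0, genome_len - size + 1)
            shuffled.append((new_start, new_start + size))
        if n_overlapping(shuffled, segdups) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible P of zero.
    return observed, (hits + 1) / (n_perm + 1)
```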
Multiple sclerosis (MS) is a polygenic disease characterized by inflammation and demyelination in the central nervous system (CNS), which can be modeled in experimental autoimmune encephalomyelitis (EAE). The Eae18b locus on rat chromosome 10 has previously been linked to regulation of beta-chemokine expression and severity of EAE. Moreover, the homologous chemokine cluster in humans showed evidence of association with susceptibility to MS. We here established a congenic rat strain carrying the Eae18b locus, which contains a chemokine cluster (Ccl2, Ccl7, Ccl11, Ccl12 and Ccl1), from the EAE-resistant PVG rat strain on the susceptible DA background. We used myelin oligodendrocyte glycoprotein (MOG)-induced EAE in this strain to characterize the mechanisms underlying the genetic regulation. Congenic rats developed milder disease than the susceptible DA strain, reflected in decreased demyelination and reduced recruitment of inflammatory cells to the brain. The congenic strain also showed significantly increased Ccl11 mRNA expression in draining lymph nodes and spinal cord after EAE induction. In the lymph nodes, macrophages were the main producers of CCL11, whereas macrophages and lymphocytes expressed the main CCL11 receptor, namely CCR3. Accordingly, the congenic strain also showed significantly increased Ccr3 mRNA expression in lymph nodes. In the CNS, the main producers of CCL11 were neurons, whereas CCR3 was detected on neurons and CSF-producing ependymal cells. This corresponded to increased levels of CCL11 protein in the cerebrospinal fluid of the congenic rats. Increased intrathecal production of CCL11 in congenic rats was accompanied by a tighter blood-brain barrier, reflected by more occludin+ blood vessels. In addition, the congenic strain showed a reduced antigen-specific response and a predominant anti-inflammatory Th2 phenotype. These results indicate novel mechanisms in the genetic regulation of neuroinflammation.
Chemokine signals and their cell-surface receptors are important modulators of HIV-1 disease and cancer. To aid future case/control association studies, the authors aim to further characterise the haplotype structure of variation in chemokine and chemokine receptor genes. In a population-based association study without family information or laboratory methods to establish phase, haplotypes must be estimated. Here, the authors test the accuracy of estimated haplotype frequencies and linkage disequilibrium. They compare haplotypes estimated with the expectation maximisation (EM) algorithm with haplotypes determined from Centre d'Etude du Polymorphisme Humain (CEPH) pedigree data. To do this, they characterised haplotypes in two regions:

- alleles at 11 biallelic loci in four chemokine receptor genes (CCR3, CCR2, CCR5 and CCRL2), which span 150 kb on chromosome 3p21;
- alleles at nine biallelic loci in six chemokine genes [MCP-1 (CCL2), Eotaxin (CCL11), RANTES (CCL5), MPIF-1 (CCL23), PARC (CCL18) and MIP-1α (CCL3)] on chromosome 17q11-12.

Forty multi-generation CEPH families, totalling 489 individuals, were genotyped by the TaqMan 5'-nuclease assay. Phased haplotypes and haplotypes estimated from unphased genotypes were compared in 103 grandparents who were assumed to have mated at random.
For the 3p21 single nucleotide polymorphism (SNP) data, haplotypes determined by pedigree analysis and haplotypes generated by the EM algorithm were nearly identical. Linkage disequilibrium, measured by the D' statistic, was nearly maximal across the 150 kb region, with complete disequilibrium maintained at the extremes between CCR3-Y17Y and CCRL2-1243V. D'-values calculated from estimated haplotypes on 3p21 had high concordance with pairwise comparisons between pedigree-phased chromosomes. Conversely, there was less agreement between analyses of haplotype frequencies and linkage disequilibrium using estimated haplotypes when compared with pedigree-phased haplotypes of SNPs on chromosome 17q11-12. These results suggest that, while estimations of haplotype frequency and linkage disequilibrium may be relatively simple in the 3p21 chemokine receptor cluster in population samples, the more complex environment on chromosome 17q11-12 will require a higher resolution haplotype analysis.
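D' is Lewontin's normalisation of D = p_AB − p_A·p_B by the largest value D could take given the allele frequencies. A |D'| of 1 means complete disequilibrium, as reported between CCR3-Y17Y and CCRL2-1243V. The sketch takes haplotype frequencies as input, so the same function can score either pedigree-phased or EM-estimated haplotypes and the two results can be compared pair by pair. It returns the signed value, whereas many reports quote |D'|. The function name is hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of alleles A and B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # Monomorphic loci leave D' undefined; report 0 in that case.
    return d / d_max if d_max > 0 else 0.0
```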
chemokine; SNP; haplotype estimation; pedigree analysis; linkage disequilibrium
CCL20 is currently the only known chemokine ligand for the receptor CCR6 and is a mucosal chemokine involved in normal and pathological immune responses. Although ccl20 and ccr6 nucleotide sequences are available from multiple species, the ferret sequences had not been determined. To increase our understanding of immune function in ferret models of infection and vaccination, we used RT-PCR to obtain the ferret ccl20 and ccr6 cDNA sequences and functionally characterized the encoded proteins. The open reading frames of both genes were highly conserved across species and most closely related to canine sequences. For functional analyses, we took three steps. We generated single-cell clones expressing ferret CCR6, produced a ferret CCL20/mouse IgG2a fusion protein (fCCL20-mIgG2a), and chemically synthesized fCCL20. Cell clones expressing ferret CCR6 responded chemotactically to both the fCCL20-mIgG2a fusion protein and synthetic ferret CCL20. Chemotaxis inhibition studies identified the polyphenol epigallocatechin-3-gallate (EGCG) and the murine γ-herpesvirus 68 M3 protein as inhibitors of fCCL20. Surface plasmon resonance studies revealed that EGCG bound directly to fCCL20. These results provide molecular characterization of previously unreported ferret immune gene sequences. They also identify, for the first time, a broad-spectrum small-molecule inhibitor of CCL20 and reveal CCL20 as a target for the herpesviral M3 protein.
chemokine; CCL20; CCR6; ferret; EGCG; M3 protein
Leprosy is characterized by polar clinical, histologic and immunological presentations. Previous immunologic studies of leprosy polarity were limited by the repertoire of cytokines known at the time.
We used a candidate gene approach to measure mRNA levels in skin biopsies from leprosy lesions. mRNA levels of 24 chemokines and cytokines and 6 immune cell-type markers were measured in 85 Nepalese leprosy subjects. Selected findings were confirmed with immunohistochemistry.
Expression of three soluble mediators (CCL18, CCL17 and IL-10) and one macrophage cell-type marker (CD14) was significantly elevated in lepromatous (CCL18, IL-10 and CD14) or tuberculoid (CCL17) lesions. Immunohistochemistry showed higher CCL18 protein expression in lepromatous lesions, and serum CCL18 showed a trend toward higher levels in lepromatous subjects. After correction for multiple comparisons, no cytokines were associated with erythema nodosum leprosum or Type I reversal reaction. Hierarchical clustering suggested that CCL18 was correlated with the cell markers CD209 and CD14. Neither CCL17 nor CCL18 was highly correlated with classical TH1 and TH2 cytokines.
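The abstract does not say which multiple-comparison correction was applied. For a panel of 24 mediators and 6 cell markers, the Benjamini–Hochberg false-discovery-rate procedure is one common choice. The sketch below illustrates that procedure and is not necessarily the study's method. Bonferroni, the stricter alternative, simply multiplies each P value by the number of tests. The function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```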
Our findings suggest that CCL17 and CCL18 dermal expression is associated with leprosy polarity.
Leprosy presents with a polarized spectrum. Lepromatous leprosy has high bacillary numbers and TH2 dermal cytokines, whereas tuberculoid leprosy shows very few bacilli and TH1 cytokines. The mechanism underlying this polarized presentation is largely unknown. In this study, we isolated mRNA from skin biopsies from 85 individuals with leprosy and measured the expression of a panel of 24 cytokines and 6 cell markers. We found that three soluble mediators (CCL17, CCL18 and IL10) and one cell marker (CD14) were differentially expressed in leprosy dermal lesions. CCL18, IL10 and CD14 were more highly expressed within lepromatous lesions, and CCL17 was more highly expressed within tuberculoid lesions. In addition, CCL18 protein expression was confirmed by immunostaining. CCL17 and CCL18 were more strongly associated with leprosy polarity than traditional TH1 and TH2 cytokines. These data suggest that newer soluble chemokines may be important in leprosy pathogenesis. They also uncover a molecular signature of the two polar phenotypes of leprosy, which may be useful in future diagnostics.
Background: Proteases responsible for a CCL15-(25–92) product have not been elucidated.
Results: All 14 CC monocyte chemoattractants, including CCL15, are processed by multiple MMPs.
Conclusion: MMP processing alters the functional activity of CCL15, CCL23, and CCL16.
Significance: This is the first study showing that MMPs can activate CC chemokines and thereby enhance monocyte chemoattraction, with potential to propagate inflammation.
Leukocyte migration and activation are orchestrated by chemokines, whose cleavage modulates their activity and glycosaminoglycan binding and thus their roles in inflammation and immunity. Early research identified proteolysis as a means of both activating and inactivating CXC chemokines and of inactivating CC chemokines. Recent evidence has shown activating cleavages of the monocyte chemoattractants CCL15 and CCL23 upon incubation with synovial fluid, although the responsible proteases could not be identified. Herein we show that CCL15 is processed in human synovial fluid by matrix metalloproteinases (MMPs) and serine proteases. Furthermore, a family-wide investigation of MMP processing of all 14 monocyte-directed CC chemokines revealed that each is precisely cleaved by one or more MMPs. By MALDI-TOF-MS, 149 cleavage sites were sequenced, including the first reported instances of CCL1, CCL16, and CCL17 proteolysis. Full-length CCL15-(1–92) and CCL23-(1–99) were cleaved within their unique 31- and 32-residue extended amino termini, respectively. Other CCL chemokines lose activity and become receptor antagonists upon MMP cleavage. Unlike them, the prominent MMP-processed products CCL15-(25–92, 28–92) and CCL23-(26–99) are stronger agonists in calcium flux assays and in Transwell migration assays using CC receptor transfectants and monocytic THP-1 cells. MMP processing of CCL16-(1–97) in its extended carboxyl terminus yields two products, CCL16-(8–77) and CCL16-(8–85), both showing unexpectedly enhanced glycosaminoglycan binding. Hence, our study reveals for the first time that MMPs activate the long amino-terminal chemokines CCL15 and CCL23 to potent forms that have the potential to increase monocyte recruitment during inflammation.
Arthritis; Chemokines; Chemotaxis; Inflammation; Mass Spectrometry (MS); Matrix Metalloproteinase (MMP); Monocytes; Protease
We constructed a 400K whole-genome (WG) tiling oligoarray for the horse and applied it to the discovery of copy number variations (CNVs) in 38 normal horses of 16 diverse breeds and in the Przewalski horse. Probes on the array represented 18,763 autosomal and X-linked genes, and intergenic, sub-telomeric and chrY sequences. We identified 258 CNV regions (CNVRs) across all autosomes, chrX and chrUn, but none in chrY. CNVs comprised 1.3% of the horse genome, with chr12 the most enriched. Relative to the Thoroughbred reference, American Miniature horses had the highest and American Quarter Horses the lowest number of CNVs. The Przewalski horse was similar to native ponies and draft breeds. The majority of CNVRs involved genes, while 20% were located in intergenic regions. Similar to previous studies in horses and other mammals, molecular functions of CNV-associated genes were predominantly in sensory perception, immunity and reproduction. The findings were integrated with previous studies to generate a composite genome-wide dataset of 1476 CNVRs. Of these, 301 CNVRs were shared between studies, while 1174 were novel and require further validation. Integrated data revealed that, to date, 41 of more than 400 domestic horse breeds have been analyzed for CNVs, including 11 breeds added in this study. Finally, the composite CNV dataset was applied in a pilot study for the discovery of CNVs in 6 horses with XY disorders of sexual development. A homozygous deletion involving the AKR1C gene cluster on chr29 in two affected horses was considered possibly causative because of the known role of AKR1C genes in testicular androgen synthesis and sexual development. The findings improve and integrate the knowledge of CNVs in horses. They also show that effective discovery of variants of biomedical importance requires analyzing more breeds and individuals with comparable methodological approaches.
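A CNV region (CNVR) is commonly defined as the union of overlapping CNV calls across individuals. This is how per-animal calls can be collapsed into a count such as 258 CNVRs. The sketch assumes that convention with simple any-overlap merging. The abstract does not give the study's exact merging rule, which could use a reciprocal-overlap threshold, for example. The call-tuple format and function name are hypothetical.

```python
def merge_cnvrs(calls):
    """Merge overlapping CNV calls into CNV regions (CNVRs).

    calls: iterable of (chrom, start, end, sample) with half-open coordinates.
    Returns (chrom, start, end, samples) tuples, where samples is the set of
    individuals contributing at least one call to the region.
    """
    regions = []
    # Sorting by chromosome, then start, means overlaps are always adjacent.
    for chrom, start, end, sample in sorted(calls):
        if regions and regions[-1][0] == chrom and start < regions[-1][2]:
            last = regions[-1]
            last[2] = max(last[2], end)
            last[3].add(sample)
        else:
            regions.append([chrom, start, end, {sample}])
    return [tuple(r) for r in regions]
```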
Genomes of individuals in a species vary in many ways, one of which is DNA copy number variation (CNV). This includes deletions, duplications, and complex rearrangements typically larger than 50 base pairs. CNVs are part of normal genetic variation contributing to phenotypic diversity but can also be pathogenic and associated with diseases and disorders. Distinguishing between the two requires detailed knowledge about CNVs in the species of interest. Here we studied the genomes of 38 normal horses of 16 diverse breeds and identified 258 CNV regions (CNVRs). We integrated our findings with previously published horse CNVs and generated a composite dataset of ∼1400 CNVRs. Despite this large number, our analysis shows that CNV research in horses needs further improvement. The current data are based on 10% of horse breeds, and most CNVRs are study-specific and require validation. Finally, we analyzed CNVs in horses with disorders of sexual development. In two male pseudohermaphrodites, we found a large deletion disrupting a group of genes involved in sex hormone metabolism and sexual differentiation. The findings underline the possible role of CNVs in complex disorders of development and reproduction.
There is considerable evidence that human genetic variation influences gene expression. Genome-wide studies have revealed that mRNA levels are associated with genetic variation in or close to the gene coding for those mRNA transcripts (cis effects) and elsewhere in the genome (trans effects). The role of genetic variation in determining protein levels has not been systematically assessed. Using a genome-wide association approach, we show that common genetic variation influences levels of clinically relevant proteins in human serum and plasma. We evaluated the role of 496,032 polymorphisms on levels of 42 proteins measured in 1200 fasting individuals from the population-based InCHIANTI study. Proteins included insulin, several interleukins, adipokines, chemokines, and liver function markers that are implicated in many common diseases, including metabolic, inflammatory, and infectious conditions. We identified eight cis effects, including variants in or near the IL6R (p = 1.8×10⁻⁵⁷), CCL4L1 (p = 3.9×10⁻²¹), IL18 (p = 6.8×10⁻¹³), LPA (p = 4.4×10⁻¹⁰), GGT1 (p = 1.5×10⁻⁷), SHBG (p = 3.1×10⁻⁷), CRP (p = 6.4×10⁻⁶) and IL1RN (p = 7.3×10⁻⁶) genes. All were associated with their respective protein products, with effect sizes ranging from 0.19 to 0.69 standard deviations per allele. Mechanisms implicated include altered rates of cleavage of bound to unbound soluble receptor (IL6R), altered secretion rates of different-sized proteins (LPA), variation in gene copy number (CCL4L1) and altered transcription (GGT1). We identified one novel trans effect, an association between ABO blood group and tumour necrosis factor alpha (TNF-alpha) levels (p = 6.8×10⁻⁴⁰). However, this association was absent when TNF-alpha was measured using a different assay or in a second study, suggesting an assay-specific association. Our results show that protein levels share some of the features of the genetics of gene expression, including the presence of strong genetic effects in cis locations.
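Effect sizes "in standard deviations per allele" correspond to an additive genetic model. The protein level is standardised to mean 0 and SD 1 and then regressed on the number of copies (0, 1 or 2) of the tested allele. The sketch shows only that core calculation. It does not model the study's covariate adjustment or any transformation of protein levels, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def per_allele_effect_sd(dosages, levels):
    """Additive-model slope of a standardised protein level on allele dosage.

    dosages: 0, 1 or 2 copies of the tested allele per individual.
    levels: protein measurements; they are z-scored first, so the slope is
    expressed in standard deviations per allele.
    """
    mu, sd = mean(levels), pstdev(levels)
    z = [(y - mu) / sd for y in levels]
    mx = mean(dosages)
    # z already has mean 0, so it does not need re-centring here.
    sxy = sum((x - mx) * zy for x, zy in zip(dosages, z))
    sxx = sum((x - mx) ** 2 for x in dosages)
    return sxy / sxx
```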
The identification of protein quantitative trait loci (pQTLs) may be a powerful complementary method of improving our understanding of disease pathways.
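Effect sizes "in standard deviations per allele" come from an additive genetic model. A minimal sketch, assuming the protein level is standardized to mean 0 and SD 1, each genotype is coded as the count of the effect allele (0, 1 or 2), and the slope is fitted by ordinary least squares; the study's covariates and any transformations are not reproduced, and the simulated data are purely illustrative.

```python
import numpy as np


def per_allele_effect_sd(dosage: np.ndarray, protein: np.ndarray) -> float:
    """OLS slope of standardized protein level on effect-allele dosage.

    dosage: effect-allele counts (0, 1, 2) under an additive model.
    protein: raw protein measurements, standardized here to mean 0 and SD 1,
    so the returned slope is in standard deviations per allele.
    """
    z = (protein - protein.mean()) / protein.std(ddof=1)
    # Design matrix: intercept column plus allele dosage
    x = np.column_stack([np.ones_like(dosage, dtype=float), dosage.astype(float)])
    beta, *_ = np.linalg.lstsq(x, z, rcond=None)
    return float(beta[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1200                                      # same order as the study sample
    dosage = rng.binomial(2, 0.3, size=n)         # simulated genotypes, allele frequency 0.3
    protein = 0.5 * dosage + rng.normal(size=n)   # simulated protein levels
    print(round(per_allele_effect_sd(dosage, protein), 2))
```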
A central principle of molecular genetics is that DNA is transcribed to RNA, which is translated to protein, and alterations to proteins can influence human diseases. Genome-wide association studies have recently revealed many new DNA variants that influence human diseases. To complement these efforts, several genome-wide studies have established that DNA variation influences mRNA expression levels. Loci influencing mRNA levels have been termed “eQTLs”. In this study we have performed the first genome-wide association study of the third piece in this jigsaw – the role of DNA variation in relation to protein levels, or “pQTLs”. We analysed 42 proteins measured in blood fractions from the InCHIANTI study. We identified eight cis effects including common variants in or near the IL6R, CCL4, IL18, LPA, GGT1, SHBG, CRP and IL1RN genes, all associated with blood levels of their respective protein products. Mechanisms implicated included altered transcription (GGT1) but also rates of cleavage of bound to unbound soluble receptor (IL6R), altered secretion rates of different sized proteins (LPA) and variation in gene copy number (CCL4). Blood levels of many of these proteins are correlated with human diseases, and the identification of “pQTLs” may in turn help our understanding of disease.
HuR regulates the turnover or translation of inflammatory mRNAs by binding to adenylate-uridylate–rich elements (AREs) and related motifs present in the 3′ untranslated region (UTR) of mRNAs. We postulate that HuR critically regulates the epithelial response by associating with multiple ARE-bearing, functionally related inflammatory transcripts. We aimed to identify HuR targets in the human airway epithelial cell line BEAS-2B challenged with TNF-α plus IFN-γ, a strong stimulus for inflammatory epithelial responses. Ribonucleoprotein complexes from resting and cytokine-treated cells were immunoprecipitated using anti-HuR and isotype-control Ab, and eluted mRNAs were reverse-transcribed and hybridized to an inflammatory-focused gene array. The chemokines CCL2, CCL8, CXCL1, and CXCL2 ranked highest among 27 signaling and inflammatory genes significantly enriched in the HuR ribonucleoprotein immunoprecipitation (RNP-IP) from stimulated cells over the control immunoprecipitation. Among these, 20 displayed published HuR binding motifs. Association of HuR with the four endogenous chemokine mRNAs was validated by single-gene RNP-IP and shown to be 3′ UTR-dependent by biotin pull-down assay. Cytokine treatment increased mRNA stability only for CCL2 and CCL8, and transient silencing and overexpression of HuR affected only CCL2 and CCL8 expression in primary and transformed epithelial cells. Cytokine-induced CCL2 mRNA was predominantly cytoplasmic. Conversely, CXCL1 mRNA remained mostly nuclear and, like CXCL2, was unaffected by changes in HuR levels. The increase in cytoplasmic HuR and HuR target expression partially relied on the inhibition of AMP-dependent kinase, a negative regulator of HuR nucleocytoplasmic shuttling. HuR-mediated regulation in airway epithelium appears broader than previously appreciated, coordinating numerous inflammatory genes through multiple posttranscriptional mechanisms.
Infiltration of colorectal carcinomas (CRC) with T-cells has been associated with good prognosis. There are some indications that chemokines could be involved in T-cell infiltration of tumors. Selective modulation of chemokine activity at the tumor site could attract immune cells, resulting in tumor growth inhibition. In mouse tumor model systems, gene therapy with chemokines or administration of antibody (Ab)-chemokine fusion proteins has provided potent immune-mediated tumor rejection, which was mediated by infiltrating T cells at the tumor site. To develop such immunotherapeutic strategies for cancer patients, one must identify chemokines and their receptors involved in T-cell migration toward tumor cells.
To identify chemokines and chemokine receptors involved in T-cell migration toward CRC cells, we used our previously published three-dimensional organotypic CRC culture system. Organotypic culture was initiated with a layer of fetal fibroblast cells mixed with collagen matrix in a 24-well tissue culture plate. A layer of CRC cells was placed on top of the fibroblast-collagen layer, followed by a separating layer of fibroblasts in collagen matrix. Anti-CRC specific cytotoxic T lymphocytes (CTLs) mixed with fibroblasts in collagen matrix were placed on top of the separating layer. Excess chemokine ligand (CCL) or Abs to chemokine or chemokine receptor (CCR) were used in migration inhibition assays to identify the chemokine and the receptor involved in CTL migration.
Inclusion of excess CCL2 in the T-cell layer or of Ab to CCL2 in the separating layer of collagen and fibroblasts blocked the migration of CTLs toward tumor cells and in turn significantly inhibited tumor cell apoptosis. Similarly, Ab to CCR2 in the separating layer of collagen and fibroblasts blocked the migration of CTLs toward tumor cells and subsequently inhibited tumor cell apoptosis. Expression of CCR2 on tumor-infiltrating lymphocytes isolated from four additional CRC patients suggests that CCR2 also has a role in migration in other CRC patients.
Our data suggest that CCL2 secreted by tumor cells and CCR2 receptors on CTLs are involved in migration of CTLs towards tumor. Gene therapy of tumor cells with CCL2 or CCL2/anti-tumor Ab fusion proteins may attract CTLs that potentially could inhibit tumor growth.
Fc gamma receptors (FcγRs) play a crucial role in immunity by linking IgG antibody-mediated responses with cellular effector and regulatory functions. Genetic variants in these receptors have been previously identified as risk factors for several chronic inflammatory conditions. The present study aimed to investigate the presence of copy number variations (CNVs) in the FCGR3B gene and its potential association with the autoimmune disease rheumatoid arthritis (RA).
CNV of the FCGR3B gene was studied using Multiplex Ligation Dependent Probe Amplification (MLPA) in 518 Dutch RA patients and 304 healthy controls. Surprisingly, three independent MLPA probes targeting the FCGR3B promoter measured different CNV frequencies, with probe#1 and #2 measuring 0 to 5 gene copies and probe#3 showing little evidence of CNV. Quantitative PCR correlated with the copy number results from MLPA probe#2, which detected low copy number (1 copy) in 6.7% and high copy number (≥3 copies) in 9.4% of the control population. No significant difference was observed between RA patients and healthy controls in either the low or the high copy number group (p-values = 0.36 and 0.71, respectively). Sequencing of the FCGR3B promoter region revealed an insertion/deletion (indel) that explained the disparate CNV results of MLPA probe#1. Finally, a non-significant trend was found between the novel -256A>TG indel and RA (40.7% in healthy controls versus 35.9% in RA patients; P = 0.08).
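MLPA copy number is commonly inferred from a dosage quotient: the target probe's peak signal is normalized to reference probes in the same sample, then divided by the same ratio in control samples assumed to carry two copies. The sketch below shows that arithmetic; the function names and peak values are hypothetical and not taken from the study. It also makes the probe#1 finding intuitive: a sequence variant under a probe that lowers its signal would shrink the quotient and could be read as a copy loss.

```python
from statistics import median
from typing import Sequence


def dosage_quotient(target: float, refs: Sequence[float],
                    ctrl_targets: Sequence[float],
                    ctrl_refs: Sequence[Sequence[float]]) -> float:
    """Ratio of the sample's normalized target signal to that of 2-copy controls."""
    # Normalize the target probe to the median reference-probe signal in the sample
    sample_ratio = target / median(refs)
    # Apply the same normalization to each control sample and take the median
    control_ratio = median(t / median(r) for t, r in zip(ctrl_targets, ctrl_refs))
    return sample_ratio / control_ratio


def mlpa_copy_number(dq: float) -> int:
    """Convert a dosage quotient to an integer copy number (controls assumed to carry 2)."""
    return round(2 * dq)


if __name__ == "__main__":
    ctrl_targets = [1000.0, 1050.0]                       # hypothetical control peaks
    ctrl_refs = [[1000.0, 1100.0, 900.0], [1050.0, 1000.0, 1100.0]]
    for target in (500.0, 1000.0, 1500.0):
        dq = dosage_quotient(target, [1000.0, 1000.0, 1000.0], ctrl_targets, ctrl_refs)
        print(target, round(dq, 2), mlpa_copy_number(dq))
```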
The current study highlights the complexity and poor characterization of the FCGR3B gene sequence, indicating that the design and interpretation of genotyping assays based on specific probe sequences must be performed with caution. Nonetheless, we confirmed the presence of CNV and identified novel polymorphisms in the FCGR3B gene in the Dutch population. Although no association was found between RA and FCGR3B CNV, the possible protective effect of the -256A>TG indel polymorphism must be addressed in larger studies.
Copy number variants (CNVs) account for substantial variation between genomes and are a major source of normal and pathogenic phenotypic differences. The dog is an ideal model to investigate mutational mechanisms that generate CNVs as its genome lacks a functional ortholog of the PRDM9 gene implicated in recombination and CNV formation in humans. Here we comprehensively assay CNVs using high-density array comparative genomic hybridization in 50 dogs from 17 dog breeds and 3 gray wolves.
We use a stringent new method to identify a total of 430 high-confidence CNV loci, which range in size from 9 kb to 1.6 Mb and span 26.4 Mb, or 1.08%, of the assayed dog genome, overlapping 413 annotated genes. Of CNVs observed in each breed, 98% are also observed in multiple breeds. CNVs predicted to disrupt gene function are significantly less common than expected by chance. We identify a significant overrepresentation of peaks of GC content, previously shown to be enriched in dog recombination hotspots, in the vicinity of CNV breakpoints.
A number of the CNVs identified by this study are candidates for generating breed-specific phenotypes. Purifying selection seems to be a major factor shaping structural variation in the dog genome, suggesting that many CNVs are deleterious. Localized peaks of GC content appear to be novel sites of CNV formation in the dog genome by non-allelic homologous recombination, potentially activated by the loss of PRDM9. These sequence features may have driven genome instability and chromosomal rearrangements throughout canid evolution.
Recent studies in humans have highlighted the importance of the monocyte chemotactic proteins (MCP) in leukocyte trafficking and their effects in inflammatory processes, tumor progression, and HIV-1 infection. In the European rabbit (Oryctolagus cuniculus), one of the prime MCP targets, the chemokine receptor CCR5, underwent a unique structural alteration. Until now, no homologue of the MCP-2/CCL8, MCP-3/CCL7 or MCP-4/CCL13 genes has been reported for this species. This is interesting, because at least the first two genes are expressed in most, if not all, mammals studied, and appear to be implicated in a variety of important chemokine ligand-receptor interactions. By assessing the Rabbit Whole Genome Sequence (WGS) data we searched for orthologs of the mammalian genes of the MCP-Eotaxin cluster.
We have localized the orthologs of these chemokine genes in the genome of the European rabbit and compared them to those of leporid genera which do (i.e. Oryctolagus and Bunolagus) or do not share the CCR5 alteration with the European rabbit (i.e. Lepus and Sylvilagus). Of the rabbit orthologs of the CCL8, CCL7, and CCL13 genes, only the last two were potentially functional, although they showed some structural anomalies at the protein level. The ortholog of MCP-2/CCL8 appeared to be pseudogenized by deleterious nucleotide substitutions affecting exon1 and exon2. By analyzing both genomic and cDNA products, these studies were extended to wild specimens of four genera of the Leporidae family: Oryctolagus, Bunolagus, Lepus, and Sylvilagus. The anomalies of the MCP-3/CCL7 and MCP-4/CCL13 proteins appeared to be shared among the different species of leporids. In contrast, whereas MCP-2/CCL8 was pseudogenized in every studied specimen of the Oryctolagus - Bunolagus lineage, this gene was intact in species of the Lepus - Sylvilagus lineage, and was, at least in Lepus, correctly transcribed.
The biological function of a gene is often revealed in situations of dysfunction or gene loss. Infections with Myxoma virus (MYXV) tend to be fatal in the European rabbit (genus Oryctolagus), while being harmless in hares (genus Lepus) and benign in cottontail rabbits (genus Sylvilagus), the natural hosts of the virus. This communication should stimulate research on a possible role of MCP-2/CCL8 in poxvirus-related pathogenicity.
Chemokines; Monocyte chemotactic protein; Pseudogene; Poxvirus; Myxomatosis; Oryctolagus; Bunolagus; Sylvilagus; Lepus
► We model, for the first time, CCL3L1 copy number variation and susceptibility to malaria. ► Association analysis was performed using family-based methods in a Tanzanian population. ► We question whether malaria has shaped the current spectrum of variation observed at CCL3L1. ► We identify a weak association between CCL3L1 copy number and haemoglobin concentration.
Copy number variation can contribute to the variation observed in susceptibility to complex diseases. Here we present the first study to investigate copy number variation of the chemokine gene CCL3L1 with susceptibility to malaria. We present a family-based genetic analysis of a Tanzanian population (n = 922), using parasite load, mean number of clinical infections of malaria and haemoglobin levels as phenotypes. Copy number of CCL3L1 was measured using the paralogue ratio test (PRT), and the dataset exhibited copy numbers ranging between 1 and 10 copies per diploid genome (pdg). Association between copy number and phenotypes was assessed. Furthermore, we were able to identify copy number haplotypes in some families, using microsatellites within the copy variable region, for transmission disequilibrium testing. We identified a high level of copy number haplotype diversity and found some evidence for an association of low CCL3L1 copy number with protection from anaemia.
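Two calculations underlie this design. In the paralogue ratio test, one primer pair co-amplifies the copy-variable locus and a reference paralogue, so the ratio of the two products scales with the test copy number; a calibration sample of known copy number converts that ratio into copies per diploid genome. The transmission disequilibrium test then compares how often heterozygous parents transmit one allele (here, a copy-number haplotype versus all others) rather than the other. A minimal sketch of both, with hypothetical function names and peak areas; real PRT analyses typically combine several ratios per sample, which this does not attempt.

```python
import math


def prt_copy_number(test_area: float, ref_area: float,
                    cal_test_area: float, cal_ref_area: float,
                    cal_copies: int) -> float:
    """Copies per diploid genome from PRT peak areas.

    The calibration sample, with known copy number cal_copies, maps the
    test/reference area ratio onto copy number.
    """
    ratio = test_area / ref_area
    cal_ratio = cal_test_area / cal_ref_area
    return ratio / cal_ratio * cal_copies


def tdt(b: int, c: int) -> tuple:
    """Biallelic transmission disequilibrium test.

    b: transmissions of the tested allele (or haplotype) from heterozygous parents.
    c: transmissions of the alternative allele(s) from those parents.
    Returns the chi-square statistic (1 df) and its p-value.
    """
    chi2 = (b - c) ** 2 / (b + c)
    # Chi-square survival function with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


if __name__ == "__main__":
    # Calibration sample with 2 copies gives ratio 0.8; test sample gives ratio 2.0
    print(round(prt_copy_number(2000.0, 1000.0, 800.0, 1000.0, 2)))
    print(tdt(30, 15))
```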
CCL3L1; MIP-1α; Malaria; Haplotype
It is well-documented that both chemokine (C-C motif) ligand 19 (CCL19) and 21 (CCL21) mediate cell migration and angiogenesis in many diseases. However, these ligands’ precise pathological role in ankylosing spondylitis (AS) has not been elucidated. The objective of this study was to examine the expression of CCL19 and CCL21 (CCL19/CCL21) in AS hip ligament tissue (LT) and determine their pathological functions.
The expression levels of CCL19, CCL21 and their receptor CCR7 in AS (n = 31) and osteoarthritis (OA, n = 21) LT were analyzed via real-time polymerase chain reaction (RT-PCR) and immunohistochemistry (IHC). The expression of CCL19, CCL21 and CCR7 in AS ligament fibroblasts was also detected. The proliferation of ligament fibroblasts was measured via a cell counting kit-8 (CCK8) assay after exogenous CCL19/CCL21 treatment. Additionally, the role of CCL19/CCL21 in osteogenesis was evaluated via RT-PCR and enzyme-linked immunosorbent assay (ELISA) in individual AS fibroblast cultures. Furthermore, the expression of the bone markers alkaline phosphatase (ALP), osteocalcin (OCN), type I collagen (COL1) and integrin-binding sialoprotein (IBSP) and of the key regulators runt-related transcription factor-2 (Runx-2) and osterix was investigated. Moreover, the CCL19/CCL21 levels in serum and LT were measured via ELISA.
The mRNA levels of CCL19/CCL21 in AS hip LT were significantly higher than those in OA LT, and IHC analysis revealed a similar result. Exogenous CCL19/CCL21 treatment did not affect the proliferation of ligament fibroblasts but significantly up-regulated the expression of bone markers, including ALP and OCN, and the key regulators Runx-2 and osterix. In addition, serum levels of CCL19/CCL21 were elevated in AS patients compared to healthy controls (HC), and the expression of the two chemokines correlated significantly in AS patients.
CCL19 and CCL21, whose serum levels are significantly correlated, may act synergistically in AS pathogenesis and may function as promoters of ligament ossification in AS patients.
CCL19; CCL21; Ankylosing spondylitis; Fibroblast; Ossification
The functional contribution of CNV to human biology and disease pathophysiology has undergone limited exploration. Recent observations in humans indicate a tentative link between CNV and weight regulation. Smith-Magenis syndrome (SMS), manifesting obesity and hypercholesterolemia, results from a deletion CNV at 17p11.2, but is sometimes due to haploinsufficiency of a single gene, RAI1. The reciprocal duplication in 17p11.2 causes Potocki-Lupski syndrome (PTLS). We previously constructed mouse strains with a deletion, Df(11)17, or duplication, Dp(11)17, of the mouse genomic interval syntenic to the SMS/PTLS region. We demonstrate that Dp(11)17 is obesity-opposing; it conveys a highly penetrant, strain-independent phenotype of reduced weight, leaner body composition, lower total cholesterol (TC) and LDL, and increased insulin sensitivity that is not due to alteration in food intake or activity level. When fed a high-fat diet, Dp(11)17/+ mice display much less weight gain and metabolic change than WT mice, demonstrating that the Dp(11)17 CNV protects against metabolic syndrome. Reciprocally, Df(11)17/+ mice with the deletion CNV have increased weight, higher fat content, decreased HDL, and reduced insulin sensitivity, manifesting a bona fide metabolic syndrome. These observations in the deficiency animal model are supported by human data from 76 SMS subjects. Further, studies on knockout/transgenic mice showed that the metabolic consequences of the Dp(11)17 and Df(11)17 CNVs are not due solely to dosage alterations of Rai1, the predominant dosage-sensitive gene for SMS and likely also PTLS. Our experiments in chromosome-engineered mouse CNV models for human genomic disorders demonstrate that a CNV can be causative for weight/metabolic phenotypes. Furthermore, we explored the biology underlying the contribution of CNV to the physiology of weight control and energy metabolism.
The high penetrance, strain independence, and resistance to dietary influences associated with the CNVs in this study are features distinct from most SNP-associated metabolic traits and further highlight the potential importance of CNV in the etiology of both obesity and metabolic syndrome (MetS), as well as in protection from these traits.
Genetic factors play a large role in obesity. However, despite recent technical progress in the search for genetic variants, the identities of causative and contributory genetic factors remain largely unknown. Whereas nucleotide sequence variation has been studied extensively with respect to its potential contribution to obesity, copy number variations (CNV), in which genes exist in abnormal numbers of copies mostly due to duplication or deletion, have only more recently been observed to be associated with human obesity. In this report, we utilize chromosome engineered mouse strains harboring a deletion or duplication CNV to address the potential functional impact of CNVs on weight control and metabolism. We show that the duplication CNV leads to lower body weight; it is also metabolically advantageous and protects from diet-induced obesity and metabolic syndrome (MetS). The deletion CNV causes a “mirror” phenotype with increased body weight and MetS–like phenotypes. Importantly, these effects manifest regardless of the genetic background and do not appear to be attributable to any single gene. These findings demonstrate experimentally that CNV can be causative for weight and metabolic phenotypes and highlight the potential relevance and importance of CNV in the etiology of obesity/MetS and the protection from these traits.
The human genome displays extensive copy-number variation (CNV). Recent discoveries have shown that large segments of DNA, ranging in size from hundreds to thousands of nucleotides, are either deleted or duplicated. This CNV may encompass genes, leading to a change in phenotype, including drug response phenotypes. Gemcitabine and 1-β-D-arabinofuranosylcytosine (AraC) are cytidine analogues used to treat a variety of cancers. Previous studies have shown that genetic variation may influence response to these drugs. In the present study, we set out to test the hypothesis that variation in copy number might contribute to variation in cytidine analogue response phenotypes.
We used a cell-based model system consisting of 197 ethnically defined lymphoblastoid cell lines, for which genome-wide SNP data were obtained using Illumina 550K and 650K SNP arrays, to study cytidine analogue cytotoxicity. We identified 775 CNVs with allele frequencies > 1% in 102 regions across the genome, 87 of which overlapped with previously identified regions of CNV. Association of CNVs with gemcitabine and AraC IC50 values identified 11 regions with permutation p-values < 0.05. Multiplex ligation-dependent probe amplification assays were performed to verify these 11 CNV regions, yielding false positive and false negative rates for the in silico findings of 1.3% and 0.04%, respectively. We also had basal mRNA expression array data for these same 197 cell lines, which allowed us to quantify mRNA expression for 41 probesets in or near the CNV regions identified. Seven of those 41 genes were highly expressed in our lymphoblastoid cell lines, and one of the seven (SMYD3), which was significant in the CNV association study, was selected for further functional experiments. Those studies showed that knockdown of SMYD3 in pancreatic cancer cell lines increased gemcitabine and AraC resistance in cytotoxicity assays, consistent with the results of the association analysis.
These results suggest that CNVs may play a role in variation in cytidine analogue effect. Therefore, association studies of CNVs with drug response phenotypes in cell-based model systems, when paired with functional characterization, might help to identify CNVs that contribute to variation in drug response.
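A permutation p-value is obtained by repeatedly shuffling the phenotype across cell lines and recomputing the association statistic, so the null distribution reflects the data's own structure rather than a parametric assumption. A minimal sketch, assuming the statistic is the absolute Pearson correlation between copy number and log IC50; the abstract does not state the study's actual test statistic, and the function name and simulated data are hypothetical.

```python
import numpy as np


def permutation_pvalue(copy_number, log_ic50, n_perm: int = 10_000, seed: int = 0) -> float:
    """Two-sided permutation p-value for |Pearson r| between copy number and log IC50."""
    cn = np.asarray(copy_number, dtype=float)
    y = np.asarray(log_ic50, dtype=float)
    observed = abs(np.corrcoef(cn, y)[0, 1])
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffling the phenotype breaks any true copy number-phenotype link
        null[i] = abs(np.corrcoef(cn, rng.permutation(y))[0, 1])
    # Add-one correction keeps the p-value strictly above zero
    return float((1 + np.sum(null >= observed)) / (n_perm + 1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cn = rng.integers(1, 5, size=197)            # simulated copy numbers for 197 cell lines
    log_ic50 = 0.2 * cn + rng.normal(size=197)   # simulated phenotype
    print(permutation_pvalue(cn, log_ic50, n_perm=2000))
```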
The detection of copy number variants (CNVs) and the results of CNV-disease association studies rely on how CNVs are defined, and because array-based technologies can only infer CNVs, CNV-calling algorithms can produce vastly different findings. Several authors have noted the large-scale variability between CNV-detection methods, as well as the substantial false positive and false negative rates associated with those methods. In this study, we use variations of four common algorithms for CNV detection (PennCNV, QuantiSNP, HMMSeg, and cnvPartition) and two definitions of overlap (any overlap and an overlap of at least 40% of the smaller CNV) to illustrate the effects of varying algorithms and definitions of overlap on CNV discovery.
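The two overlap definitions can be stated precisely: "any overlap" means two CNVs on the same chromosome share at least one base, while the stricter rule requires the shared length to be at least 40% of the smaller CNV. A minimal sketch, assuming half-open [start, end) coordinates; the function name is hypothetical.

```python
def cnvs_overlap(a_start: int, a_end: int, b_start: int, b_end: int,
                 min_fraction_of_smaller: float = 0.0) -> bool:
    """Return True if two CNVs on the same chromosome overlap.

    min_fraction_of_smaller = 0.0 gives the "any overlap" definition;
    0.4 requires the shared length to cover at least 40% of the smaller CNV.
    """
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return False
    smaller = min(a_end - a_start, b_end - b_start)
    return shared >= min_fraction_of_smaller * smaller


if __name__ == "__main__":
    # A 100 kb CNV and a 30 kb CNV sharing 10 kb
    print(cnvs_overlap(0, 100_000, 90_000, 120_000))       # any overlap
    print(cnvs_overlap(0, 100_000, 90_000, 120_000, 0.4))  # 40% of the smaller CNV
```

Under the first rule these two calls count as the same CNV; under the second they do not, which is one way the choice of definition changes CNV counts and novelty calls.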
Methodology and Principal Findings
We used a 56 K Illumina genotyping array enriched for CNV regions to generate hybridization intensities and allele frequencies for 48 Caucasian schizophrenia cases and 48 age-, ethnicity-, and gender-matched control subjects. No algorithm found a difference in CNV burden between the two groups. However, the total number of CNVs called ranged from 102 to 3,765 across algorithms. The mean CNV size ranged from 46 kb to 787 kb, and the average number of CNVs per subject ranged from 1 to 39. The number of novel CNVs not previously reported in normal subjects ranged from 0 to 212.
Conclusions and Significance
Motivated by the availability of multiple public genome-wide SNP arrays, investigators are conducting numerous analyses to identify putative additional CNVs in complex genetic disorders. However, the number of CNVs identified in array-based studies, and whether these CNVs are novel or valid, will depend on the algorithm(s) used. Thus, given the variety of methods in use, there will be many false positives and false negatives. Both guidelines for the identification of CNVs inferred from high-density arrays and a gold standard for validating CNVs are needed.
Copy-number variations (CNVs) are widespread in the human genome, but comprehensive assignments of integer locus copy-numbers (i.e., copy-number genotypes), which, for example, enable discrimination of homozygous from heterozygous CNVs, have remained challenging. Here we present CopySeq, a novel computational approach with an underlying statistical framework that analyzes the depth-of-coverage of high-throughput DNA sequencing reads, and can incorporate paired-end and breakpoint junction analysis based CNV-analysis approaches, to infer locus copy-number genotypes. We benchmarked CopySeq by genotyping 500 chromosome 1 CNV regions in 150 personal genomes sequenced at low coverage. The assessed copy-number genotypes were highly concordant with our qPCR experiments (Pearson correlation coefficient 0.94) and with the published results of two microarray platforms (95–99% concordance). We further demonstrated the utility of CopySeq for analyzing gene regions enriched for segmental duplications by comprehensively inferring copy-number genotypes across the >800 CNV-enriched human olfactory receptor (OR) gene and pseudogene loci. CopySeq revealed that OR loci display an extensive range of locus copy-numbers across individuals, with zero to two copies in some OR loci, and two to nine copies in others. Among genetic variants affecting OR loci we identified deleterious variants including CNVs and SNPs affecting ∼15% and ∼20% of the human OR gene repertoire, respectively, implying that genetic variants with a possible impact on smell perception are widespread. Finally, we found that for several OR loci the reference genome appears to represent a minor-frequency variant, implying a necessary revision of the OR repertoire for future functional studies.
CopySeq can ascertain genomic structural variation in specific gene families as well as at a genome-wide scale, where it may enable the quantitative evaluation of CNVs in genome-wide association studies involving high-throughput sequencing.
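Depth-of-coverage genotyping rests on the idea that the number of reads mapping to a region scales with its copy number. A minimal sketch of that idea, assuming reads in a region follow a Poisson distribution whose mean is (copy number / 2) times the depth expected for a diploid region, and picking the integer copy number with the highest likelihood. This is a simplified illustration, not CopySeq's actual statistical framework; the function names, the 10-copy ceiling and the small floor for zero-copy regions (standing in for stray mismapped reads) are all assumptions.

```python
import math


def poisson_logpmf(k: int, mu: float) -> float:
    """Log-probability of observing k reads when mu are expected."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def copy_number_genotype(read_count: int, diploid_mean: float,
                         max_copies: int = 10, zero_copy_floor: float = 0.05) -> int:
    """Maximum-likelihood integer copy number for a region.

    diploid_mean: reads expected in this region at 2 copies.
    zero_copy_floor: small pseudo copy number so a 0-copy region still has a
    nonzero expected count (hypothetical allowance for mismapped reads).
    """
    best_cn, best_ll = 0, -math.inf
    for cn in range(max_copies + 1):
        mu = max(cn, zero_copy_floor) / 2 * diploid_mean
        ll = poisson_logpmf(read_count, mu)
        if ll > best_ll:
            best_cn, best_ll = cn, ll
    return best_cn


if __name__ == "__main__":
    for reads in (0, 48, 100, 250, 450):
        print(reads, copy_number_genotype(reads, diploid_mean=100.0))
```

A homozygous deletion (0 copies) and a heterozygous one (1 copy) produce clearly separated expected counts here, which is the distinction between copy-number genotypes that the abstract emphasizes.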
Human individual genome sequencing has recently become affordable, enabling highly detailed genetic sequence comparisons. While the identification and genotyping of single-nucleotide polymorphisms has already been successfully established for different sequencing platforms, the detection, quantification and genotyping of large-scale copy-number variants (CNVs), i.e., losses or gains of long genomic segments, has remained challenging. We present a computational approach that enables detecting CNVs in sequencing data and accurately identifies the actual copy-number at which DNA segments of interest occur in an individual genome. This approach enabled us to obtain novel insights into the largest human gene family – the olfactory receptors (ORs) – involved in smell perception. While previous studies reported an abundance of CNVs in ORs, our approach enabled us to globally identify absolute differences in OR gene counts that exist between humans. While several OR genes have very high gene counts, other ORs are found only once or are missing entirely in some individuals. The latter have a particularly high probability of influencing individual differences in the perception of smell, a question that future experimental efforts can now address. Furthermore, we observed differences in OR gene counts between populations, pointing at ORs that might contribute to population-specific differences in smell.