Copy number variation (CNV) is an important source of genomic diversity in humans, and influences disease susceptibility. The immunoglobulin-receptor genes FCGR3A and FCGR3B on chromosome 1q23.3 show CNV, and CNV of the FCGR3B gene is associated with glomerulonephritis in systemic lupus erythematosus and organ-specific autoimmunity. Large-scale case-control association studies of CNV require technologies that are amenable to high-throughput analysis with low error rates. Here we propose an integrated suite of five assays, four of them duplexed to reduce DNA usage, that measure copy number variation at FCGR3A and FCGR3B and genotype the polymorphic neutrophil antigen HNA1. We show how a maximum-likelihood approach to combining the results from these five assays allows estimation of statistical confidence for each individual copy number, and therefore allows an appropriate significance threshold to be set, controlling the error rate. This approach results in a high-throughput copy number genotyping system, with demonstrable precision and accuracy, that can be applied to large case-control cohort studies. We demonstrate Mendelian inheritance of this CNV, variation in frequency between Europeans and East Asians, and a lack of strong association between the CNV and flanking SNP genotypes, with important consequences for genome-wide association studies.
Fc receptor; copy number variation; lupus; paralogue ratio test
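The idea of combining several assays by maximum likelihood to obtain a per-sample confidence can be sketched as follows. This is a minimal illustration, not the authors' published model: it assumes each assay returns a raw copy number estimate that is normally distributed around the true integer copy number with a known standard deviation, that assays are independent, and that all candidate copy numbers are equally likely a priori. The function name and all numeric values are hypothetical.

```python
import math

def call_copy_number(estimates, sds, candidates=range(0, 7)):
    """Return (most likely integer copy number, its normalised likelihood).

    estimates: raw copy number estimates, one per assay (hypothetical values).
    sds: assumed measurement standard deviation of each assay.
    """
    candidates = list(candidates)
    log_liks = []
    for c in candidates:
        # Sum Gaussian log-likelihoods across assays (assumed independent).
        ll = sum(
            -0.5 * ((x - c) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
            for x, s in zip(estimates, sds)
        )
        log_liks.append(ll)
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_liks)
    weights = [math.exp(ll - top) for ll in log_liks]
    total = sum(weights)
    best = max(range(len(candidates)), key=lambda i: weights[i])
    return candidates[best], weights[best] / total

# Five hypothetical assay results for one sample.
copy_number, confidence = call_copy_number(
    estimates=[2.1, 1.9, 2.2, 2.0, 1.8],
    sds=[0.2, 0.2, 0.25, 0.2, 0.3],
)
print(copy_number, confidence)
```

Under these assumptions, a sample whose normalised likelihood falls below a chosen threshold could be flagged for retyping rather than assigned an integer copy number.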
Haptoglobin, coded by the HP gene, is a plasma protein that acts as a scavenger for free heme, and haptoglobin-related protein (coded by the HPR gene) forms part of the trypanolytic factor TLF-1, together with apolipoprotein L1 (ApoL1). We analyse the polymorphic small intragenic duplication of the HP gene, with alleles Hp1 and Hp2, in 52 populations, and find no evidence for natural selection either from extended haplotype analysis or from correlation with pathogen richness matrices. Using fiber-FISH, the paralog ratio test, and array-CGH data, we also confirm that the HPR gene is copy number variable, with duplication of the whole HPR gene at polymorphic frequencies in west and central Africa, up to an allele frequency of 15 %. The geographical distribution of the HPR duplication allele overlaps the region where the pathogen causing chronic human African trypanosomiasis, Trypanosoma brucei gambiense, is endemic. The HPR duplication has occurred on one SNP haplotype, but there is no strong evidence of extended homozygosity, a characteristic of recent natural selection. The HPR duplication shows a slight, non-significant undertransmission to human African trypanosomiasis-affected children of unaffected parents in the Democratic Republic of Congo. However, taken together with alleles of APOL1, there is an overall significant undertransmission of putative protective alleles to human African trypanosomiasis-affected children.
Electronic supplementary material
The online version of this article (doi:10.1007/s00439-013-1352-x) contains supplementary material, which is available to authorized users.
A glycosylated polypeptide, β-defensin 126 (DEFB126), derived from the epididymis and adsorbed onto the sperm surface, has been implicated in immunoprotection and efficient movement of sperm in mucosal fluids of the female reproductive tract. Here, we report a sequence variant in DEFB126 that has a 2-nucleotide deletion in the open reading frame, which generates a non-stop mRNA. The allele frequency of this variant sequence is high in both a European (0.47) and a Chinese (0.45) population cohort. Binding of the Agaricus bisporus lectin to the sperm surface glycocalyx was significantly lower in men with the homozygous variant (del/del) genotype than in those with either a del/wt or wt/wt genotype, suggesting an altered sperm glycocalyx with fewer O-linked oligosaccharides in del/del men. Moreover, sperm from the del/del donors exhibited an 84% reduction in the rate of penetration of a hyaluronic acid (HA) gel, a surrogate for cervical mucus, compared to the other genotypes. This reduction in sperm performance in HA gels was not a result of decreased progressive motility (average curvilinear velocity) or morphological deficits. However, DEFB126 genotype and lectin binding were highly correlated with performance in the penetration assays. In a prospective cohort study of newly married couples who were trying to conceive by natural means, couples were less likely to become pregnant and took longer to achieve a live birth if the male partner was homozygous for the variant sequence. This common sequence variation in DEFB126, and its apparent cause of impaired reproductive function, provides an opportunity to better understand, clinically evaluate, and possibly treat human infertility.
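As a back-of-envelope illustration of what the reported allele frequencies imply, the sketch below computes expected genotype proportions under Hardy-Weinberg equilibrium. Equilibrium is an assumption made here purely for illustration; the abstract does not report genotype counts, and the function name is hypothetical.

```python
def hardy_weinberg(del_freq):
    """Expected genotype proportions for a biallelic locus under HWE."""
    wt_freq = 1.0 - del_freq
    return {
        "del/del": del_freq ** 2,
        "del/wt": 2 * del_freq * wt_freq,
        "wt/wt": wt_freq ** 2,
    }

# Allele frequencies of the deletion variant as reported in the abstract.
for label, freq in [("European", 0.47), ("Chinese", 0.45)]:
    expected = hardy_weinberg(freq)
    print(label, {genotype: round(p, 3) for genotype, p in expected.items()})
```

If the equilibrium assumption holds, roughly one man in five in either cohort would be expected to carry the del/del genotype.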
Motivation: Genomic copy number variation (CNV) can influence susceptibility to common diseases. High-throughput measurement of gene copy number on large numbers of samples is a challenging, yet critical, stage in confirming observations from sequencing or array Comparative Genome Hybridization (CGH). The paralogue ratio test (PRT) is a simple, cost-effective method of accurately determining copy number by quantifying the amplification ratio between a target and reference amplicon. PRT has been successfully applied to several studies analyzing common CNV. However, its use has not been widespread because of difficulties in assay design.
Results: We present PRTPrimer (www.prtprimer.org) software for automated PRT assay design. In addition to stand-alone software, the web site includes a database of pre-designed assays for the human genome at an average spacing of 6 kb and a web interface for custom assay design. Other reference genomes can also be analyzed through local installation of the software. The usefulness of PRTPrimer was tested within known CNV, and showed reproducible quantification. This software and database provide assays that can rapidly genotype CNV, cost-effectively, on a large number of samples and will enable the widespread adoption of PRT.
Availability: PRTPrimer is available in two forms: a Perl script (version 5.14 and higher) that can be run from the command line on Linux systems and as a service on the PRTPrimer web site (www.prtprimer.org).
Supplementary data are available at Bioinformatics online.
The complement C4 locus is in the class III region of the MHC, and exhibits copy number variation. Complement C4 null alleles have shown association with a number of diseases including systemic lupus erythematosus (SLE). However, most studies to date have used protein immunophenotyping and not direct interrogation of the genome to determine C4 null allele status. Moreover, a lack of accurate C4 gene copy number (GCN) estimation and tight linkage disequilibrium across the disease-associated MHC haplotypes have confounded attempts to establish whether or not these associations are causal. We have therefore developed a high-throughput paralog ratio test (PRT), used together with two restriction enzyme digest variant ratio tests (REDVRs), to determine total C4 GCN, C4A GCN, and C4B GCN. In the densely genotyped CEU cohort we show that this method is accurate and reproducible when compared to gold standard Southern blot copy number estimation, with a discrepancy rate of 9%. We find a broad range of C4 GCNs in the CEU and the 1958 British Birth Cohort populations under study. In addition, SNP-C4 CNV analyses show only moderate levels of correlation and therefore do not support the use of SNP genotypes as proxies for complement C4 GCN.
complement C4; CNV; lupus; paralog ratio test
The evolutionary history of variation in the human Rh blood group system, determined by variants in the RHD and RHCE genes, has long been an unresolved puzzle in human genetics. Prior to medical treatments and interventions developed in the last century, the D-positive children of D-negative women were at risk for hemolytic disease of the newborn, if the mother produced anti-D antibodies following sensitization to the blood of a previous D-positive child. Given the deleterious fitness consequences of this disease, the appreciable frequencies in European populations of the responsible RHD gene deletion variant (for example, 0.43 in our study) seem surprising. In this study, we used new molecular and genomic data generated from four HapMap population samples to test the idea that positive selection for an as-of-yet unknown fitness benefit of the RHD deletion may have offset the otherwise negative fitness effects of hemolytic disease of the newborn. We found no evidence that positive natural selection affected the frequency of the RHD deletion. Thus, the initial rise to intermediate frequency of the RHD deletion in European populations may simply be explained by genetic drift/ founder effect, or by an older or more complex sweep that we are insufficiently powered to detect. However, our simulations recapitulate previous findings that selection on the RHD deletion is frequency dependent, and weak or absent near 0.5. Therefore, once such a frequency was achieved, it could have been maintained by a relatively small amount of genetic drift. We unexpectedly observed evidence for positive selection on the C allele of RHCE in non-African populations (on chromosomes with intact copies of the RHD gene) in the form of an unusually high FST value and the high frequency of a single haplotype carrying the C allele. 
RhCE function is not well understood, but the C/c antigenic variant is clinically relevant and can result in hemolytic disease of the newborn, albeit much less commonly and severely than that related to the D-negative blood type. Therefore, the potential fitness benefits of the RHCE C allele are currently unknown but merit further exploration.
Blood group polymorphism; copy number variation; human evolution; balancing selection
Lung function measures are heritable, predict mortality and are relevant in diagnosis of chronic obstructive pulmonary disease (COPD). COPD and asthma are diseases of the airways with major public health impacts, and each has a heritable component. Genome-wide association studies of SNPs have revealed novel genetic associations with both diseases but only account for a small proportion of the heritability. Complex copy number variation may account for some of the missing heritability. A well-characterised genomic region of complex copy number variation contains beta-defensin genes (DEFB103, DEFB104 and DEFB4), which have a role in the innate immune response. Previous studies have implicated these and related genes as being associated with asthma or COPD. We hypothesised that copy number variation of these genes may play a role in lung function in the general population and in COPD and asthma risk. We undertook copy number typing of this locus in 1149 adults and 689 children using a paralogue ratio test and investigated association with COPD, asthma and lung function. Replication of findings was assessed in a larger independent sample of COPD cases and smoking controls. We found evidence for an association of beta-defensin copy number with COPD in the adult cohort (OR = 1.4, 95% CI: 1.02–1.92, P = 0.039) but this finding, and findings from a previous study, were not replicated in a larger follow-up sample (OR = 0.89, 95% CI: 0.72–1.07, P = 0.217). No robust evidence of association with asthma in children was observed. We found no evidence for association between beta-defensin copy number and lung function in the general populations. Our findings suggest that previous reports of association of beta-defensin copy number with COPD should be viewed with caution. Suboptimal measurement of copy number can lead to spurious associations. Further beta-defensin copy number measurement in larger samples of COPD cases and children with asthma is needed.
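To see how the reported odds ratios relate to their confidence intervals and P values, the following sketch back-calculates the standard error of log(OR) from a 95% interval and derives an approximate two-sided Wald P value. It assumes a normal approximation on the log-odds scale, which may not match the model the authors actually fitted, so small differences from the reported P values are to be expected. The function name is hypothetical.

```python
import math

def wald_p_from_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Approximate z statistic and two-sided P value from an OR and its 95% CI."""
    # The CI spans 2 * z_crit standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values as reported in the abstract.
z_discovery, p_discovery = wald_p_from_ci(1.4, 1.02, 1.92)
z_followup, p_followup = wald_p_from_ci(0.89, 0.72, 1.07)
print(f"adult cohort: z = {z_discovery:.2f}, P = {p_discovery:.3f}")
print(f"follow-up:    z = {z_followup:.2f}, P = {p_followup:.3f}")
```

The point of the check is qualitative: an interval whose lower bound sits just above 1 corresponds to a P value only just below 0.05, whereas the follow-up interval spans 1 and is consistent with no effect.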
The role of copy number variation of the CCL3L1 gene, encoding MIP1α, in contributing to the host variation in susceptibility and response to HIV infection is controversial. Here we analyse a sub-Saharan African cohort from Tanzania and Ethiopia, two countries with a high prevalence of HIV-1 and a high co-morbidity of HIV with tuberculosis.
We use a form of quantitative PCR called the paralogue ratio test to determine CCL3L1 gene copy number in 1134 individuals and validate our copy number typing using array comparative genomic hybridisation and fiber-FISH.
We find no significant association of CCL3L1 gene copy number with HIV load in antiretroviral-naïve patients prior to initiation of combination highly active anti-retroviral therapy. However, we find a significant association of low CCL3L1 gene copy number with improved immune reconstitution following initiation of highly active anti-retroviral therapy (p = 0.012), replicating a previous study.
Our work supports a role for CCL3L1 copy number in immune reconstitution following antiretroviral therapy in HIV, and suggests that the MIP1α -CCR5 axis might be targeted to aid immune reconstitution.
AIDS, caused by the retrovirus HIV, remains the largest cause of morbidity in sub-Saharan Africa, yet almost all genetic studies have focused on cohorts from Western countries. HIV shows high co-morbidity with tuberculosis (TB), as HIV stimulates the reactivation of latent TB. Recent clinical trials suggest that an effective anti-HIV response correlates with non-neutralising antibodies. Given that Fcγ receptors are critical in mediating the non-neutralising effects of antibodies, analysis of the extensive variation at Fcγ receptor genes is important. Single nucleotide variation and copy number variation (CNV) of Fcγ receptor genes affect the expression profile, activatory/inhibitory balance, and IgG affinity of the Fcγ receptor repertoire of each individual. In this study we investigated whether CNV of FCGR2C, FCGR3A and FCGR3B, as well as the HNA1 allotype of FCGR3B, is associated with HIV load, response to highly-active antiretroviral therapy (HAART) and co-infection with TB. We confirmed an effect of TB co-infection status on HIV load and response to HAART, but found no conclusive effect of the genetic variants we tested. We observed a small effect, in Ethiopians, of FCGR3B copy number, where deletion was more frequent in HIV-TB co-infected patients than in those infected with HIV alone.
Previous studies have extensively documented antimicrobial and chemotactic activities of beta-defensins. Human beta-defensin-2 (hBD-2) is strongly expressed in lesional psoriatic epidermis, and recently we have shown that high beta-defensin genomic copy number is associated with psoriasis susceptibility. It is not known, however, if biologically and pathophysiologically relevant concentrations of hBD-2 protein are present in vivo, which could support an antimicrobial and proinflammatory role of beta-defensins in lesional psoriatic epidermis.
We found that systemic levels of hBD-2 showed a weak but significant correlation with beta defensin copy number in healthy controls but not in psoriasis patients with active disease. In psoriasis patients but not in atopic dermatitis patients, we found high systemic hBD-2 levels that strongly correlated with disease activity as assessed by the PASI score. Our findings suggest that systemic levels in psoriasis are largely determined by secretion from involved skin and not by genomic copy number. Modelling of the in vivo epidermal hBD-2 concentration based on the secretion rate in a reconstructed skin model for psoriatic epidermis provides evidence that epidermal hBD-2 levels in vivo are probably well above the concentrations required for in vitro antimicrobial and chemokine-like effects.
Serum hBD-2 appears to be a useful surrogate marker for disease activity in psoriasis. The discrepancy between hBD-2 levels in psoriasis and atopic dermatitis could explain the well known differences in infection rate between these two diseases.
Psoriasis is a common inflammatory skin disease with a strong genetic component. We have analysed the genomic copy number polymorphism of the beta-defensin region on human chromosome 8 in 179 Dutch psoriasis patients and 272 controls, and in 319 German psoriasis patients and 305 controls. Comparisons in both cohorts show a significant association between higher genomic copy number for beta-defensin genes and the risk of psoriasis.
In primates, infection is an important force driving gene evolution, and this is reflected in the importance of infectious disease in human morbidity today. The beta-defensins are key components of the innate immune system, with antimicrobial and cell signalling roles, but also reproductive functions. Here we examine evolution of beta-defensins in catarrhine primates and variation within different human populations.
We show that five beta-defensin genes that do not show copy number variation in humans show evidence of positive selection in catarrhine primates, and identify specific codons that have been under selective pressure. Direct haplotyping of DEFB127 in humans suggests long-term balancing selection: there are two highly diverged haplotype clades carrying different variants of a codon that, in primates, is positively selected. For DEFB132, we show that extensive diversity, including a four-state amino acid polymorphism (valine, isoleucine, alanine and threonine at position 93), is present in hunter-gatherer populations, both African and non-African, but not found in samples from agricultural populations.
Some, but not all, beta-defensin genes show positive selection in catarrhine primates. There is suggestive evidence of different selective pressures on these genes in humans, but the nature of the selective pressure remains unclear and is likely to differ between populations.
Recent work has demonstrated an unexpected prevalence of copy number variation in the human genome, and has highlighted the part this variation may play in predisposition to common phenotypes. Some important genes vary in number over a high range (e.g. DEFB4, which commonly varies between two and seven copies), and have posed formidable technical challenges for accurate copy number typing, so that there are no simple, cheap, high-throughput approaches suitable for large-scale screening. We have developed a simple comparative PCR method based on dispersed repeat sequences, using a single pair of precisely designed primers to amplify products simultaneously from both test and reference loci, which are subsequently distinguished and quantified via internal sequence differences. We have validated the method for the measurement of copy number at DEFB4 by comparison of results from >800 DNA samples with copy number measurements by MAPH/REDVR, MLPA and array-CGH. The new Paralogue Ratio Test (PRT) method can require as little as 10 ng genomic DNA, appears to be comparable in accuracy to the other methods, and for the first time provides a rapid, simple and inexpensive method for copy number analysis, suitable for application to typing thousands of samples in large case-control association studies.
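The step from a measured test/reference ratio to a copy number can be sketched as follows. This is one simple scheme, assumed here for illustration and not taken from the paper: peak-area ratios from control samples of known copy number are used to fit a least-squares line, which then converts each unknown sample's ratio to an estimated copy number. The control values and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical controls: test/reference ratios and their known copy numbers.
control_ratios = [1.02, 1.49, 2.05, 2.51]
control_copies = [2, 3, 4, 5]
slope, intercept = fit_line(control_ratios, control_copies)

def estimate_copy_number(ratio):
    """Return (raw calibrated estimate, nearest integer copy number)."""
    raw = slope * ratio + intercept
    return raw, round(raw)

# An unknown sample with a hypothetical measured ratio.
raw, call = estimate_copy_number(1.98)
print(raw, call)
```

Keeping the raw (unrounded) estimate alongside the integer call makes it possible to see how tightly samples cluster around integer values, which is one informal indicator of assay quality.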
Human beta-defensin 2 (DEFB4, also known as DEFB2 or hBD-2) is a salt-sensitive antimicrobial protein that is expressed in lung epithelia. Previous work has shown that it is encoded in a cluster of beta-defensin genes at 8p23.1, which varies in copy number between 2 and 12 in different individuals. We determined the copy number of this locus in 355 patients with cystic fibrosis (CF), and tested for correlation between beta-defensin cluster genomic copy number and lung disease associated with CF. No significant association was found.
Cryptic structural abnormalities within the subtelomeric regions of chromosomes have been the focus of much recent research because of their discovery in a percentage of people with mental retardation (UK terminology: learning disability). These studies focused on subjects (largely children) with various severities of intellectual impairment, with or without additional physical clinical features such as dysmorphisms. However, it is well established that the prevalence of schizophrenia is around three times greater in those with mild mental retardation. The rates of bipolar disorder and major depressive disorder have also been reported as increased in people with mental retardation. We describe here a screen for telomeric abnormalities in a cohort of 69 patients in whom mental retardation co-exists with severe psychiatric illness.
We have applied two techniques, subtelomeric fluorescence in situ hybridisation (FISH) and multiplex amplifiable probe hybridisation (MAPH) to detect abnormalities in the patient group.
A subtelomeric deletion was discovered involving loss of 4q in a patient with co-morbid schizoaffective disorder and mental retardation.
The precise region of loss has been defined allowing us to identify genes that may contribute to the clinical phenotype through hemizygosity. Interestingly, the region of 4q loss exactly matches that linked to bipolar affective disorder in a large multiply affected Australian kindred.
Beta-defensins are a family of multifunctional genes with roles in defense against pathogens, reproduction, and pigmentation. In humans, six beta-defensin genes are clustered in a repeated region which is copy-number variable (CNV) as a block, with a diploid copy number between 1 and 12. The role in host defense makes the evolutionary history of this CNV particularly interesting, because morbidity due to infectious disease is likely to have been an important selective force in human evolution, and to have varied between geographical locations. Here, we show CNV of the beta-defensin region in chimpanzees, and identify a beta-defensin block in the human lineage that contains rapidly evolving noncoding regulatory sequences. We also show that variation at one of these rapidly evolving sequences affects expression levels and cytokine responsiveness of DEFB103, a key inhibitor of influenza virus fusion at the cell surface. A worldwide analysis of beta-defensin CNV in 67 populations shows an unusually high frequency of high-DEFB103-expressing copies in East Asia, the geographical origin of historical and modern influenza epidemics, possibly as a result of selection for increased resistance to influenza in this region. Hum Mutat 32:743–750, 2011. © 2011 Wiley-Liss, Inc.
CNV; defensin; antimicrobial; influenza; paralogue ratio test
There have been conflicting reports in the literature on association of gene copy number with disease, including CCL3L1 and HIV susceptibility, and β-defensins and Crohn's disease. Quantification of precise gene copy numbers is important in order to define any association of gene copy number with disease. At present, real-time quantitative PCR (QPCR) is the most commonly used method to determine gene copy number; however, the Paralogue Ratio Test (PRT) is being used in an increasing number of laboratories.
In this study we compare a Pyrosequencing-based Paralogue Ratio Test (PPRT) for determining beta-defensin gene copy number with two currently used methods for gene copy number determination, QPCR and triplex PRT, by typing five different cohorts (UK, Danish, Portuguese, Ghanaian and Czech) of DNA from a total of 576 healthy individuals. We found a systematic measurement bias between DNA cohorts revealed by QPCR, but not by the PRT-based methods. Using PRT, copy number ranged from 2 to 9 copies, with a modal copy number of 4 in all populations.
QPCR is very sensitive to the quality of the template DNA, generating systematic biases that could produce false-positive or false-negative disease associations. Neither triplex PRT nor PPRT shows this systematic bias, and both type copy number within the correct range, although triplex PRT appears to be the more precise and accurate method for typing beta-defensin copy number.