Common variable immunodeficiency (CVID) is a heterogeneous immune defect characterized by hypogammaglobulinemia, failure of specific antibody production, susceptibility to infections, and an array of comorbidities.
To address the underlying immunopathogenesis of CVID and comorbidities, we conducted the first genome-wide association and gene copy number variation (CNV) study in patients with CVID.
Three hundred sixty-three patients with CVID from 4 study sites were genotyped at 610,000 single nucleotide polymorphisms (SNPs). Patients were divided into a discovery cohort of 179 cases compared with 1,917 control subjects and a replication cohort of 109 cases and 1,114 control subjects.
Our analyses detected strong association with the MHC region and association with a disintegrin and metalloproteinase (ADAM) genes (P combined = 1.96 × 10⁻⁷) that replicated in the independent cohort. CNV analysis defined 16 disease-associated deletions and duplications. These included a duplication of origin recognition complex 4L (ORC4L) that was unique to 15 cases (P = 8.66 × 10⁻¹⁶), as well as numerous unique rare intraexonic deletions and duplications, suggesting multiple novel genetic causes of CVID. Furthermore, the 1,000 most significant SNPs were strongly predictive of the CVID phenotype in a Support Vector Machine algorithm, with positive and negative predictive values of 1.0 and 0.957, respectively.
Our integrative genome-wide analysis of SNP genotypes and CNVs has uncovered multiple novel susceptibility loci for CVID, both common and rare, a finding consistent with the highly heterogeneous nature of CVID. These results provide new mechanistic insights into immunopathogenesis based on these unique genetic variations. They might also allow for improved diagnosis of CVID through accurate prediction of CVID clinical phenotypes with our Support Vector Machine model.
Common variable immunodeficiency (CVID) is manifested by insufficient quantity and quality of immunoglobulin, leading to susceptibility to bacterial infections.1 CVID is considered a primary immunodeficiency disease because it is believed to result from intrinsic defects in immunologic function. CVID is heterogeneous, can present early or later in life, and is associated with specific comorbidities.2 Efforts to subcategorize CVID to predict outcomes and comorbid conditions, both clinically and on the basis of immunologic phenotypes, are ongoing.3 B cell–activating factor of the TNF family receptor,4 transmembrane activator and calcium modulator and cyclophilin ligand interactor (TACI),5–7 and certain HLA haplotypes8,9 have been identified as potential candidate genes for susceptibility to CVID. Inducible costimulator,10,11 CD81,12 CD19,13,14 and CD20,15 harbor disease-causing mutations that presently explain only a small percentage of cases.16 The heterogeneous presentations of patients with CVID complicate clinical management, and searches for novel genetic predictors of disease or comorbidity have had limited success.
Genome-wide single nucleotide polymorphism (SNP) arrays have enabled high-throughput genotyping of genomic DNA, tagging the whole genome on the basis of linkage disequilibrium. Because these platforms report both signal intensity and genotype, copy number variations (CNVs) can also be detected with the same genome-wide association study (GWAS) arrays. CNVs are predicted to be responsible for the generation of rare phenotypes. We genotyped 363 CVID cases and 3031 healthy control subjects on the Illumina Infinium Human Hap610K BeadChip (Illumina, San Diego, Calif). We sought to associate SNPs and CNVs with CVID, as well as with the characteristic clinical subphenotypes of this syndrome, to address disease heterogeneity. Our hypothesis was that although CVID is complex, it is sufficiently genetically distinct from the healthy population to allow genetic prediction of this uncommon but clinically relevant disease.
The diagnosis of CVID was established in concordance with existing diagnostic criteria.16,17 All patients were enrolled in institutionally approved research protocols to enable genetic analysis and collection of clinical data. Subsets of the patients reported here have been previously included in published studies.2,18,19
We performed high-throughput, genome-wide SNP genotyping with the Infinium II HumanHap610 BeadChip technology at the Center for Applied Genomics at the Children’s Hospital of Philadelphia. Together, the genotype and intensity data provided by the genotyping array give high confidence in CNV calls. Importantly, analyzing intensity and genotype data simultaneously in the same experimental setting allows a highly accurate definition of the normal diploid state and of any deviation from it. To call CNVs, we used the PennCNV algorithm,20 which combines multiple sources of information. These include the log R ratio and B-allele frequency at each SNP marker, along with SNP spacing and the population frequency of the B allele. Rare recurrent CNVs were the focus of our study.
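The following Python sketch makes the idea of combining the two signals concrete. It decodes copy number with a toy three-state hidden Markov model that uses log R ratio (LRR) and B-allele frequency (BAF). It is an illustrative simplification, not PennCNV: it ignores SNP spacing and population B-allele frequencies. Every parameter value (state LRR means, noise SDs, the 0.99 stay probability) is an assumption chosen for illustration.

```python
import math

# Illustrative copy-number states: 1 = single-copy deletion, 2 = normal diploid,
# 3 = single-copy duplication. All parameter values below are assumptions for
# this sketch and are NOT PennCNV's trained parameters.
STATES = (1, 2, 3)
LRR_MEAN = {1: -0.45, 2: 0.0, 3: 0.3}  # assumed mean log R ratio per state
LRR_SD = 0.2                           # assumed log R ratio noise
BAF_SD = 0.05                          # assumed B-allele frequency noise
# Expected BAF cluster centres for each copy number (eg, A, AB, B genotypes).
BAF_CENTRES = {1: (0.0, 1.0), 2: (0.0, 0.5, 1.0), 3: (0.0, 1 / 3, 2 / 3, 1.0)}
STAY = 0.99                            # assumed probability of keeping the same state


def normal_logpdf(x, mean, sd):
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_sum_exp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def emission(state, lrr, baf):
    """Log-likelihood of one marker's (LRR, BAF) pair under a copy-number state."""
    centres = BAF_CENTRES[state]
    # BAF is modelled as an equal-weight Gaussian mixture over genotype clusters.
    baf_ll = log_sum_exp([normal_logpdf(baf, c, BAF_SD) for c in centres]) - math.log(len(centres))
    return normal_logpdf(lrr, LRR_MEAN[state], LRR_SD) + baf_ll


def viterbi(markers):
    """Most likely copy-number state for each (lrr, baf) marker, in genomic order."""
    log_stay = math.log(STAY)
    log_switch = math.log((1 - STAY) / (len(STATES) - 1))

    def transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform initial state distribution.
    scores = {s: -math.log(len(STATES)) + emission(s, *markers[0]) for s in STATES}
    backpointers = []
    for lrr, baf in markers[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: scores[p] + transition(p, s))
            new_scores[s] = scores[best_prev] + transition(best_prev, s) + emission(s, lrr, baf)
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best-scoring final state.
    state = max(STATES, key=scores.get)
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With these assumed parameters, consider a stretch of markers with LRR near −0.45 and no heterozygous calls (no BAF near 0.5), flanked by normal markers. That stretch decodes as state 1, a deletion. The absence of heterozygous BAF values helps distinguish a genuine single-copy deletion from intensity noise alone.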
We calculated quality control measures on our HumanHap610 GWAS data based on statistical distributions to exclude poor-quality DNA samples and false-positive CNVs. The first threshold was the percentage of attempted SNPs that were successfully genotyped: only samples with call rates of greater than 98% were included. Because the genome-wide intensity signal must have as little noise as possible, only samples with an SD of normalized intensity (log R ratio) of less than 0.35 were included. Only samples of white ancestry, as determined by principal components analysis, were retained; all other samples were excluded. Furthermore, case and control matching was ensured by calculating a genomic inflation factor between groups. Wave artifacts that roughly correlate with guanine-cytosine content arise from hybridization bias when the quantity of full-length DNA is low, and they are known to interfere with accurate inference of CNVs. Only samples with a guanine-cytosine-corrected wave factor of the log R ratio between −0.02 and 0.02 were accepted. A count of more than 100 PennCNV calls suggests poor DNA quality, so only samples with fewer than 100 CNV calls were included. Duplicate samples (eg, monozygotic twins or repeated samples from the same patient) were identified, and as a result, 1 sample was excluded.
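The per-sample thresholds above can be written as a single filter. This Python sketch assumes the metrics have already been computed (for example, the wave factor and the CNV call count from the CNV-calling step). The record type and field names are hypothetical, and duplicate detection is not modelled.

```python
from dataclasses import dataclass


@dataclass
class SampleQC:
    """Per-sample QC metrics; the record and its field names are hypothetical."""
    sample_id: str
    call_rate: float         # fraction of attempted SNPs successfully genotyped
    lrr_sd: float            # SD of the normalized intensity (log R ratio)
    gc_wave_factor: float    # GC-corrected wave factor of the log R ratio
    cnv_call_count: int      # number of PennCNV calls for this sample
    european_ancestry: bool  # assigned by principal components analysis


def passes_qc(s: SampleQC) -> bool:
    """True if the sample meets every per-sample threshold stated in the text."""
    return (
        s.call_rate > 0.98
        and s.lrr_sd < 0.35
        and -0.02 < s.gc_wave_factor < 0.02
        and s.cnv_call_count < 100
        and s.european_ancestry
    )


def filter_samples(samples):
    """Keep samples that pass QC; duplicate detection is not modelled here."""
    return [s for s in samples if passes_qc(s)]
```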
CNV frequency between cases and control subjects was evaluated at each SNP by using the Fisher exact test. We only considered loci that met three conditions. First, they were nominally significant between cases and control subjects (P < .05). Second, the same variations were either observed in patients from multiple cohorts or absent from all control subjects. Third, they were validated with an independent method. We report statistical local minimums to narrow the association within each region of nominal significance, including SNPs residing within 1 Mb of each other. Resulting nominally significant CNVs were excluded if they met any of the following criteria: (1) residing on telomere- or centromere-proximal cytobands; (2) arising in a “peninsula” of common CNVs that results from variation in boundary truncation during CNV calling; (3) residing in genomic regions with extreme GC content, which produces hybridization bias; or (4) contributing to multiple CNVs. Three lines of evidence establish statistical significance: an independent replication P value of less than .05, permutation of observations, and the absence of loci with control-enriched significance. We used the Database for Annotation, Visualization, and Integrated Discovery (DAVID) to assess the significance of functional annotation clustering of independently associated results into InterPro categories.
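The Fisher exact test on CNV carrier counts can be sketched in pure Python as follows. This is a generic two-sided test, written without external libraries and not taken from the study's pipeline. It sums the probabilities of all tables with the observed margins that are no more likely than the observed table. As a usage example, take the ORC4L duplication reported in the CNV results: 15 of 311 cases and 0 of 2766 control subjects. Assuming the test is applied to carrier counts, `fisher_exact_two_sided(15, 296, 0, 2766)` gives a P value of roughly 9 × 10⁻¹⁶, close to the reported value.

```python
from math import comb


def fisher_exact_two_sided(case_with, case_without, ctrl_with, ctrl_without):
    """Two-sided Fisher exact test on a 2x2 table of CNV carriers vs noncarriers.

    Sums the hypergeometric probabilities of all tables with the observed
    margins that are no more probable than the observed table.
    """
    n_cases = case_with + case_without
    n_controls = ctrl_with + ctrl_without
    n_carriers = case_with + ctrl_with
    denom = comb(n_cases + n_controls, n_carriers)

    def prob(k):
        # Probability that exactly k of the carriers are cases (hypergeometric).
        return comb(n_cases, k) * comb(n_controls, n_carriers - k) / denom

    p_obs = prob(case_with)
    k_min = max(0, n_carriers - n_controls)
    k_max = min(n_cases, n_carriers)
    probs = [prob(k) for k in range(k_min, k_max + 1)]
    # The small relative tolerance treats floating-point near-ties as ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))
```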
Our CVID case cohort was composed of 223 patients from Mount Sinai School of Medicine, 76 patients from the University of Oxford, 37 patients from the Children’s Hospital of Philadelphia, and 27 patients from the University of South Florida. The diagnosis in each case was validated against the European Society for Immunodeficiencies/Pan-American Group for Immunodeficiency diagnostic criteria.16
We first evaluated the quality and suitability of the data for a case-control study. Seven samples had call rates of less than 98% and were excluded. Based on principal components analysis with EIGENSTRAT software,21 the cases were stratified into 3 clusters. Of these, 288 cases were of confirmed European ancestry, whereas 30 samples were significant outliers from the white cluster and were omitted from this analysis. A random subset of 179 cases was matched with 1917 white control subjects. We identified an additional 1114 control samples that clustered well with the remaining 109 white cases, which were used for replication. We excluded subjects with observed cryptic relatedness of genotypes. For CNV calling, samples with high noise (SD of log R ratio >0.35) in the intensity signal were additionally removed.
We first performed a GWAS using the SNP genotype data. Genotype frequencies were compared between cases and control subjects with a χ² test statistic in PLINK22 for SNPs with a call rate of at least 90% and a minor allele frequency of at least 1%. The discovery cohort of 179 cases and 1917 control subjects yielded a low genomic inflation factor of 1.02783 (Fig 1, A). The strongest association was with the MHC, where the most significant SNP, rs3117426, had a P value of 8.62 × 10⁻¹⁰ (Table I). The second-best association was with a locus on 8p21.2 harboring a disintegrin and metalloproteinase 28 (ADAM28), ADAM7, ADAMDEC1, and STC1. However, this locus did not meet genome-wide significance criteria (P = 6.24 × 10⁻⁶ for rs4872262), which is in keeping with the modest power of the relatively small sample sizes.
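The sketch below illustrates the two statistics in this paragraph. It computes a Pearson χ² statistic (1 df, no continuity correction) from a 2 × 2 table of allele counts. It also computes the genomic inflation factor λ: the median observed χ² divided by the median of the χ² distribution with 1 df (about 0.455). Treating the test as allelic is an assumption; PLINK offers several test variants, and this code does not call PLINK. Values of λ near 1, such as those reported here, indicate little genome-wide inflation of the test statistics.

```python
import statistics

MEDIAN_CHI2_1DF = 0.4549364  # median of the chi-square distribution with 1 df


def allelic_chi2(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 allele-count table.

    Assumes no row or column of the table sums to zero.
    """
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    total = case_a + case_b + ctrl_a + ctrl_b
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def genomic_inflation(chi2_stats):
    """Lambda: median observed chi-square divided by its expected median under the null."""
    return statistics.median(chi2_stats) / MEDIAN_CHI2_1DF
```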
In an attempt to replicate the 8p21.2 locus, we examined the direction of effects in our independent CVID cohort of 109 cases and 1114 control subjects (genomic inflation factor, 1.04789; Fig 1, B). Interestingly, 4 SNPs (rs11207520, rs1194849, rs17790790, and rs4872262) displayed significance (P < .05) and allele frequency differences in the same direction as in the discovery cohort (Table I). We then queried all SNPs contributing to the discovery region in the replication cohort, rather than just the most significant SNP. A total of 8 SNPs displayed significance (P < .05) and allele frequency differences in the same direction as in discovery (Table I).
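The replication criterion used here reduces to a P value check plus a sign comparison. It requires P < .05 in the replication cohort and a case-control allele-frequency difference in the same direction as in discovery. Below is a minimal Python sketch; the tuple layout and the SNP identifiers in any example input are hypothetical.

```python
def replicates(disc_case_af, disc_ctrl_af, rep_case_af, rep_ctrl_af, rep_p, alpha=0.05):
    """True when the replication P value is below alpha and the case-control
    allele-frequency difference has the same sign as in the discovery cohort."""
    same_direction = (disc_case_af - disc_ctrl_af) * (rep_case_af - rep_ctrl_af) > 0
    return rep_p < alpha and same_direction


def replicated_snps(records, alpha=0.05):
    """records: iterable of (snp_id, disc_case_af, disc_ctrl_af,
    rep_case_af, rep_ctrl_af, rep_p) tuples (a hypothetical layout)."""
    return [r[0] for r in records if replicates(*r[1:], alpha=alpha)]
```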
We next addressed the heterogeneity of the CVID phenotype by assessing the various CVID subphenotypes. On the basis of recent phenotypic characterization,18 we established 16 distinct clinical subgroups within the CVID cohort: cancer, lymphoma, lymphadenopathy, nodular regenerative hyperplasia of the liver, lymphoid interstitial pneumonitis (LIP), bronchiectasis, biopsy-proved granuloma, gastrointestinal enteropathy, malabsorption, splenectomy, cytopenias, organ-specific autoimmunity, low IgM level (<50 mg/dL), low IgA level (<10 mg/dL), low B cell number (CD19+ cells <1%), and young age at symptom onset (<10 years) (see Fig E1 and Table E1 in this article’s Online Repository at www.jacionline.org).
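The four immunologic and age-based subgroups above are defined by simple thresholds, which could be encoded as follows. The record type and field names are hypothetical. We assume the CD19+ percentage is expressed relative to total lymphocytes. The clinical subgroups (eg, lymphoma or bronchiectasis) are clinical diagnoses and are not modelled.

```python
from dataclasses import dataclass


@dataclass
class ImmunologicProfile:
    """Hypothetical per-patient record; units follow the thresholds in the text."""
    igm_mg_dl: float        # serum IgM, mg/dL
    iga_mg_dl: float        # serum IgA, mg/dL
    cd19_percent: float     # CD19+ B cells, % (assumed: of total lymphocytes)
    onset_age_years: float  # age at symptom onset, years


def immunologic_subgroups(p: ImmunologicProfile) -> set:
    """Return the threshold-defined subgroups this patient belongs to."""
    groups = set()
    if p.igm_mg_dl < 50:
        groups.add("low IgM")
    if p.iga_mg_dl < 10:
        groups.add("low IgA")
    if p.cd19_percent < 1:
        groups.add("low B cell number")
    if p.onset_age_years < 10:
        groups.add("young age at onset")
    return groups
```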
Differences in genotype frequencies were tested for significance, and regions with multiple significant SNPs were then scored. Significant SNP associations were found for all subsets (Table II and Fig 1, C). The most significant association was observed between nodular regenerative hyperplasia of the liver and AK096081-AK124028 (P = 2.29 × 10⁻¹⁰). Lymphoma was associated with KIAA0834, PFTK1, and HAVCR1, with P values ranging from 1.69 × 10⁻⁸ to 3.62 × 10⁻⁸. LIP was associated with FGF14 and ZNF81, with P values of 5.76 × 10⁻⁸ and 7.70 × 10⁻⁸, respectively. Organ-specific autoimmunity was associated with SNX31 (P = 6.89 × 10⁻⁸), and low IgM levels (<50 mg/dL) in case subjects were associated with LDLRAP1 (P = 6.02 × 10⁻⁸).
Finally, the subphenotype association results for patients with CVID with enteropathy were compared with those for a previously reported cohort of patients with inflammatory bowel disease.23 The aim was to assess whether specific loci contributed to a common cause. The SNP rs12889533 was significant in both the CVID cases with enteropathy and the inflammatory bowel disease cohort, with allele frequency differences in the same direction (see Table E2 in this article’s Online Repository at www.jacionline.org).
We next performed a CNV analysis, in which rare (<1%) CNVs were called by using PennCNV.20 We analyzed deletion and duplication CNV frequencies in 311 cases and 2766 control subjects (see Fig E2 in this article’s Online Repository at www.jacionline.org). We found 5 deletions and 11 duplications that were recurrent and significantly enriched in the CVID cases compared with the control subjects (Tables III and IV). In addition, 15 regions were exclusive to CVID cases. The most noteworthy duplication resided at 2q23.1 and affected multiple exons of origin recognition complex 4L (ORC4L). It was present exclusively in 15 cases and was not identified in any control subject (P = 8.66 × 10⁻¹⁶; see Fig E3 in this article’s Online Repository at www.jacionline.org).
We additionally observed large CNVs, captured by 10 or more SNPs, that were exclusive to CVID cases, totaling 84 deletions and 98 duplications (see Table E3 in this article’s Online Repository at www.jacionline.org). No significant overrepresentation was observed in the control subjects in this frequency range. Of clinical relevance, 2 patients were newly identified with deletions in the 22q11 region, suggesting an alternative presentation of the well-characterized congenital immunodeficiency associated with this microdeletion. Interestingly, 10 regions of homozygous deletion were observed exclusively in CVID cases (see Table E4 in this article’s Online Repository at www.jacionline.org).
We next examined the overall CNV burden in cases compared with control subjects, both for exonic CNVs and for large (100 kb) and rare (<1%) CNVs. Deletions were significantly enriched in the CVID cases when CNV observations at all loci genome wide were considered (see Table E5 in this article’s Online Repository at www.jacionline.org). All but 5 of the 182 CNVs exclusive to the CVID cohort were intraexonic. To assess the reliability of our CNV detection method, we reviewed the B-allele frequency (genotype) and log R ratio (intensity) values of our Illumina data. We also experimentally validated all the significant CNVs with an independent method, the Affymetrix Cytogenetics Whole-Genome 2.7M Array (Affymetrix, Santa Clara, Calif). This array provides high resolution with 400,103 SNP and 2,387,595 CN probes (see Fig E4 in this article’s Online Repository at www.jacionline.org).
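The filters in the two preceding paragraphs can be sketched as below: case-exclusive CNVs spanning 10 or more SNPs, and the per-sample burden of large, rare CNVs. The text does not state how overlap between calls was defined or whether the 100 kb size bound is inclusive. The any-overlap rule, the inclusive bound, and the record fields are therefore all assumptions of this Python sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CNVCall:
    """Hypothetical CNV call record."""
    sample_id: str
    is_case: bool
    chrom: str
    start: int   # start position, bp
    end: int     # end position, bp
    kind: str    # "del" or "dup"
    n_snps: int  # number of SNP markers spanned by the call


def overlaps(a: CNVCall, b: CNVCall) -> bool:
    """Same chromosome, same CNV type, and at least one shared base pair (assumed rule)."""
    return a.chrom == b.chrom and a.kind == b.kind and a.start <= b.end and b.start <= a.end


def case_exclusive(calls, min_snps=10):
    """Case calls spanning >= min_snps markers that overlap no control call of the same type."""
    controls = [c for c in calls if not c.is_case]
    return [
        c for c in calls
        if c.is_case and c.n_snps >= min_snps and not any(overlaps(c, k) for k in controls)
    ]


def carrier_frequency(call, calls, n_total):
    """Fraction of all genotyped samples carrying an overlapping call of the same type."""
    carriers = {c.sample_id for c in calls if overlaps(call, c)}
    return len(carriers) / n_total


def rare_large_burden(calls, group_is_case, n_group, n_total, min_length=100_000, max_freq=0.01):
    """Mean number of rare (frequency < max_freq across all n_total samples) and
    large (length >= min_length bp) calls per sample in the chosen group."""
    hits = [
        c for c in calls
        if c.is_case == group_is_case
        and c.end - c.start >= min_length
        and carrier_frequency(c, calls, n_total) < max_freq
    ]
    return len(hits) / n_group
```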
We used a Support Vector Machine (SVM) algorithm to determine how well we could predict the CVID phenotype in a pool of “unknown” samples (see the Methods section in this article’s Online Repository at www.jacionline.org). The SVM algorithm was trained on the discovery cohort of 179 CVID cases and 1917 control subjects. We identified 1000 SNPs from the association analysis that separated the CVID cases from the control subjects with 98.7% overall accuracy. When the model was applied to the independent cohort of 109 cases and 1114 control subjects, the positive and negative predictive values were 1.0 and 0.957, respectively (Fig 2). These results suggest that these markers robustly distinguish the genetic architecture of CVID cases from that of healthy control subjects.
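The SVM configuration used in the study is described in the Online Repository and is not reproduced here. As a stand-in, the Python sketch below works in three steps. It selects the top-k SNPs by association P value, trains a linear SVM with the Pegasos subgradient method (our choice, not necessarily the authors'), and computes positive and negative predictive values on held-out data. The function names and the allele-dosage (0/1/2) encoding are hypothetical.

```python
def select_top_snps(p_values, k):
    """Indices of the k SNPs with the smallest association P values."""
    return sorted(range(len(p_values)), key=lambda i: p_values[i])[:k]


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient training of a linear SVM (hinge loss + L2 penalty).

    X: feature vectors (eg, allele dosages 0/1/2 for the selected SNPs);
    y: +1 for cases, -1 for controls. A constant 1.0 feature serves as the bias
    (and, for simplicity, is regularized too). Samples are visited in a fixed
    order, so training is deterministic.
    """
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1:  # hinge loss active: step toward this sample's label
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score > 0 else -1


def predictive_values(y_true, y_pred):
    """Positive and negative predictive values (PPV, NPV); NaN when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == -1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv
```

Consistent with the text, SNP selection and training would use only the discovery cohort, and predictive values would be computed on the independent cohort. Selecting SNPs on the combined data would inflate the apparent accuracy.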
CVID was described more than 50 years ago. Aside from a small number of recessively inherited genes in a few families and the more prevalent but poorly understood contribution of mutations in TNFRSF13B,5,6,7,24 other causes have remained obscure. CVID has thus been hypothesized to represent a diverse collection of genetic lesions resulting in a similar immunologic phenotype. The MHC region has been associated with a myriad of complex diseases,25 including immune-related conditions26 and CVID.8,9 The MHC was robustly associated with CVID in our study, further validating prior work that was not performed at the genome-wide level.
The SNP genotype association analysis presented here revealed a suggestive association outside of the MHC region with a locus encompassing ADAM28, ADAM7, ADAMDEC1, and STC1 (P = 6.24 × 10⁻⁶). We also replicated this association in an independent case-control cohort (P value range = 2.25 × 10⁻⁸ to .0314). The ADAM family proteins are zinc metalloproteases involved in diverse biologic processes, including immune responses. Interestingly, the metallopeptidase MMP27 was also nominally associated with CVID (P = 1.69 × 10⁻⁴). Proteins of the matrix metalloproteinase family are involved in the breakdown of extracellular matrix in routine physiological processes but might also facilitate disease pathogenesis. Related genes have demonstrated immunologic functions in the regulation of cytokine release, TH2 immune responses, and specific inflammatory processes. ADAM28, also known as the lymphocyte metalloprotease MDC-L, is expressed on the lymphocyte cell surface.27 As such, it has been defined as a ligand for α4β1 integrin, which enables the adhesion of other leukocytes expressing this integrin. STC1, which is located in the same region, encodes stanniocalcin 1, a protein involved in calcium regulation, including in the antioxidant pathways of macrophages.28
In this light, aberrant calcium regulation within immune cells has previously been identified in samples from certain patients with CVID.29 UBXN10 might be a compelling candidate for involvement in the immunopathogenesis of primary antibody failure because it encodes a ubiquitin-like protein, which, in addition to phosphorylation, has been shown to regulate nuclear factor κB activity.30 Sphingosine-dependent protein kinase 1 (SDK1) is important in the survival of alveolar macrophages.31 DEPDC6 is a negative regulator of mammalian target of rapamycin signaling pathways. Its RNA expression levels were found to differ significantly between mice resistant and mice susceptible to H5N1 influenza virus.32 No significant associations in the region of TACI were observed, and subjects with TACI mutations were not separately identified. We reviewed the reported amino acid changes and looked up their corresponding nonsynonymous SNP IDs (see Table E6 in this article’s Online Repository at www.jacionline.org). No imputation reference files that included these SNPs were available to infer genotypes.
We additionally performed SNP genotype association analyses for the particular features of CVID. The goal was to define potential common mechanistic links among the specific clinical and immunologic variants and potentially to enable prediction of CVID clinical phenotypes. Significant individual associations were made with all CVID variables studied (Table II). In this regard, PFTK1 is a member of the CDC2-related protein kinase family that is constitutively expressed at high levels in B-cell lymphomas,33 and it was associated with lymphoma in the subjects with this complication. Interestingly, HAVCR1, which was also associated with lymphoma in the same subjects (P = 3.62 × 10⁻⁸), plays a role in TH cell development and the regulation of asthma and allergic diseases.34 FGF14, which is associated with LIP and low IgA levels, is a member of the fibroblast growth factor family. This family has crucial roles in embryonic development, cell growth, morphogenesis, tissue repair, tumor growth, and invasion.35 SNX31, which is associated with organ-specific autoimmunity, is a sorting nexin that might be involved in protein trafficking; SNX family proteins are subunits required for CD28-mediated T-cell costimulation.36
Although these SNP association results are not powered for definitive conclusions, the CNV association analysis uncovered several novel genes that were significantly associated with CVID. In fact, 84 CNV deletions and 98 duplications were identified in 1 or more patients with CVID but were not found in any of the 3031 control subjects. Most were intraexonic and thus likely to disrupt the genes involved. Some of the genes potentially affected by the identified CNVs were also discovered in the GWAS part of this study. Many others have direct or potential relevance to the immune system, and many were unique to individual patients. This underscores the great mechanistic diversity that is likely to underlie this collection of disorders, which is also reflected in the variability of clinical presentation and disease natural history.1 Among these findings, we noted a highly significant number of subjects with duplications in ORC4L, a gene previously associated with B-cell lymphoproliferative disorders.37 This gene is essential for the initiation of DNA replication, potentially including in rapidly proliferating immune cells.
In addition to the potential of these findings to provide mechanistic insight into how CVID and its subphenotypes arise, we used SVM to exploit the overall genetic uniqueness of patients with CVID compared with healthy control subjects. These results suggest that the genetic signatures of the disease might also allow for accurate prediction of CVID based on genetic profiling. This has the potential to greatly improve the clinical management of CVID and to address a major unmet need in the field of primary immunodeficiency. Under current diagnostic criteria, diagnosis can be delayed by multiple years,38 in part because of the evolving immunoglobulin phenotype.38 This can lead to repeated infection and irreparable organ damage during the period of diagnostic delay. The ability to determine an increased likelihood of CVID at the time of initial clinical suspicion could allow for improved intervention and reduced morbidity.
This study represents the first genome-wide population-based study of CVID. The relatively large cohorts of this rare disorder assembled here were essential both to discover and to confirm the findings. They also demonstrate the potential of genome-wide association in complex polygenic rare diseases. This type of unbiased study has discovered many novel loci that might underlie the development of CVID. It can provide clues to the pathogenesis of the heterogeneous clinical complications and subtypes of CVID. It also provides a solid foundation for further studies of the mechanism, interplay, and clinical manifestations of CVID, with the immediate potential of improving clinical management through enhanced diagnosis. Finally, the great diversity of genetic findings with regard to unique CNVs substantiates the hypothesis that CVID comprises a collection of diverse mechanisms leading to complex phenotypes.
Genome-wide analysis defines CVID as a genetically heterogeneous condition. Patients with CVID can be distinguished from healthy subjects by using multiparameter predictive algorithms that could prove useful in facilitating diagnosis.
Children’s Hospital of Philadelphia support was from the Children’s Hospital of Philadelphia Institutional Development Award to the Center for Applied Genomics, which funded all genotyping (to H.H.); a Research Development Award from the Cotswold Foundation (to H.H.); the Jeffrey Modell Foundation (to J.S.O.); and National Institutes of Health (NIH) grant AI-079731 (to J.S.O.). Oxford support was from the NIHR Oxford Biomedical Research Centre, Baxter Healthcare (general support to the department not specific to this project), Talecris (general support to the department not specific to this project), and the Jeffrey Modell Foundation for unrestricted gifts; the Primary Immunodeficiency Association for the Centre of Excellence award; and the European Commission for EU 7th FP EURO-PADnet number 201549. University of South Florida support was from NIH grant 5R03AI083904 (to E.E.P.). Mount Sinai support was from NIH grants AI-101093, AI-467320, AI-48693, NIAID Contract 03-22, and the David S Gottesman Immunology Chair (all to C.C.-R.).
J. S. Orange has consultant arrangements with Talecris Biotherapeutics, Baxter Health, CSL Behring, and IBT Reference Labs; is a speaker for Baxter Health; receives research support from the National Institutes of Health (NIH)/National Institute of Allergy and Infectious Diseases (NIAID); is an Elected Officer of the American Academy of Allergy, Asthma & Immunology (AAAAI); and is on the advisory board for the Immune Deficiency Foundation. K. E. Sullivan receives research support from the NIH and is a consultant for the Immune Deficiency Foundation. J. W. Sleasman receives research support from the NIH, the Florida Department of Health, and the National Oceanic and Atmospheric Administration. E. E. Perez has consultant arrangements with Baxter and CSL Behring.
We thank the clinical immunology and laboratory immunology staff in Oxford and Li Zhang, who prepared DNA at Mount Sinai Medical Center.
Disclosure of potential conflict of interest: The rest of the authors have declared that they have no conflict of interest.