|Home | About | Journals | Submit | Contact Us | Français|
Sjögren’s syndrome is a common autoimmune disease (~0.7% of European Americans) typically presenting as keratoconjunctivitis sicca and xerostomia. In addition to strong association within the HLA region at 6p21 (Pmeta=7.65×10−114), we establish associations with IRF5-TNPO3 (Pmeta=2.73×10−19), STAT4 (Pmeta=6.80×10−15), IL12A (Pmeta =1.17×10−10), FAM167A-BLK (Pmeta=4.97×10−10), DDX6-CXCR5 (Pmeta=1.10×10−8), and TNIP1 (Pmeta=3.30×10−8). Suggestive associations with Pmeta<5×10−5 were observed with 29 regions including TNFAIP3, PTTG1, PRDM1, DGKQ, FCGR2A, IRAK1BP1, ITSN2, and PHIP amongst others. These results highlight the importance of genes involved in both innate and adaptive immunity in Sjögren’s syndrome.
Sjögren’s syndrome is a common (~0.7% of European Americans1), chronic, autoimmune condition characterized by exocrine gland dysfunction resulting in substantial morbidity. Females are affected at a rate of approximately 10 times more than males2,3. Although patients are identified clinically by oral and ocular manifestations, the full disease spectrum may encompass a complex myriad of systemic features, making diagnosis a significant clinical challenge4. Current American-European Consensus Group classification criteria require either the presence of anti-Ro and/or -La autoantibodies or histologic evidence of inflammation within salivary glands plus other subjective and objective measures of keratoconjunctivitis sicca and xerostomia5. Therapeutic options are primarily palliative and do little to prevent irreversible exocrine gland damage6.
Sjögren’s syndrome pathophysiology includes concurrent dysregulation of innate and adaptive immune pathways involving both cell-mediated and humoral disease processes that are incompletely understood7. Labial salivary gland and peripheral blood gene expression microarray studies have demonstrated dysregulation of type I interferon-inducible genes8,9. Previous work in Sjögren’s syndrome genetics is limited to candidate gene studies, yet strong association with several human leukocyte antigen (HLA) molecules, including HLA-DR, HLA-DQB1, and HLA-DQA1, has been established10,11. Several non-HLA regions have been implicated in the disease, such as IRF5 and STAT412-15; however, none of these associations have exceeded the genome-wide significant (GWS) threshold of P = 5×10−8 (Supplementary Table 1). In this study, genome-wide association and large-scale replication approaches were used to identify new risk loci in a case-control cohort of primary Sjögren’s syndrome patients.
To maximize the power of our study, we formed the Sjögren’s Genetics Network (SGENE), a collaborative research effort comprising multiple international sites (Supplementary Figure 1 and Supplementary Table 2). Dataset 1 (DS1) included 395 cases and 1975 population controls of European descent after implementing quality control (QC; see Online Methods and Supplementary Table 2). Genotyping was performed using the Illumina Omni1-Quad array. Dataset 2 (DS2) consisted of 1243 cases and 4779 population controls genotyped on the Illumina ImmunoChip, and Dataset 3 (DS3) included 1158 cases and 3071 population controls genotyped on a supplemental custom array designed to include variants not present on the ImmunoChip (all of European descent and after QC; Supplementary Figure 1b). Dataset 4 (DS4) evaluated 1541 cases and 2634 population controls genotyped on the Illumina ImmunoChip focusing on variants exclusive to DS4 and not common between DS1 and DS2 (Supplementary Figure 1b). Imputation was performed in regions with P<5×10−5 to increase the number of overlapping variants available for analysis (Supplementary Figure 1c and 1d).
The HLA region at 6p21 was the only association that exceeded the GWS threshold of P<5×10−8 in DS1 (Supplementary Figure 2). After imputation, we found the peak effect at rs112357081 near HLA-DRA with PDS1=1.01×10−32 (Table 1; Supplementary Figure 3). No inflation was observed in the test statistic (λGC=1.0) and no significant deviation from the expected distribution of P-values was detected after removing the HLA region (Supplementary Figure 4). After meta-analysis between the imputed DS1 and DS2 (Figures 1 and and2;2; Supplementary Figure 1), strong association spanned a region that included the HLA Class I, III, and II loci (Figure 2a; Supplementary Table 3), with peak association observed 5′ of HLA-DQB1 in the HLA Class II region at rs115575857 with Pmeta=7.65×10−114 (Table 1 and Figure 2a). A second set of variants not in linkage disequilibrium (LD) with rs115575857 was also identified peaking at rs116232857 (Pmeta=1.33×10−96; Table 1 and Figure 2a).
Further dissection through logistic regression analysis in DS2 adjusting for rs115575857 identified an ~5 Mb haplotype encompassing the extended HLA region (Supplementary Figure 5). In addition, a narrow region of residual association tagged by rs116232857 (Presidual=7.00×10−26) was observed in the HLA-DRA through HLA-DQA1 regions (Supplementary Figure 5). Logistic regression adjusting for rs115575857 and rs116232857 accounted for most of the association within the HLA region (the residual association in the region at rs115146037 went from 10−80 to 10−9; Supplementary Figure 5), thus identifying the presence of at least two independent effects.
Classical HLA alleles were imputed into our dataset to better understand the relationship between the HLA region variants tested in this study to those reported previously. The top associated classical allele was HLA-DQB1*0201 (Pmeta=1.38×10−95) followed closely by HLA-DQA1*0501 (Pmeta=8.50×10−94; Supplementary Table 4 and Supplementary Figure 6). We used logistic regression adjusting for both the top associated classical HLA alleles and the variants identified through the meta-analysis of DS1 and DS2. As previously described16, our data shows that HLA-DQB1*0201, HLA-DQA1*0501, and HLA-DRB1*0301 are in strong LD (r2>0.97), and adjusted logistic regression using any of the three accounted for the observed association. Moreover, each of these classical alleles accounted for the association at rs115575857. Likewise, rs115575857 explained the association at each of the three classical alleles (Supplementary Figure 6). Adjusting for rs116232857 could account for the association at HLA-DQB1*0501, a previously reported protective allele in Sjögren’s syndrome11. However, HLA-DQB1*0501 could not account for the association observed at rs116232857, suggesting that rs116232857 better tags the functional effect(s) than the classical allele.
Enrichment analysis using all variants with Pmeta<5×10−5 identified a statistically significant association of disease-associated variants within 100 bp of regions found to be cross-linked to the transcription factor (TF) RFX5 (P=1.53×10−14) by ENCODE ChIP-seq studies17. In total, 161 variants contributed to the enrichment in our dataset with all but one located in the HLA region. To further evaluate the influence of the extensive LD in the HLA, we tested for enrichment of Sjögren’s syndrome-associated variants in RFX5 ChIP-seq peaks using only the HLA region variants present in DS1 as the background and continued to observe significant enrichment (P=9.24×10−7; Supplementary Table 5).
To evaluate the functional mechanism of this RFX5 enrichment within the HLA the variants near RFX5 ChIP-seq peaks were tested as expression quantitative trail loci (eQTL), resulting in several statistically significant results (Figure 3a-e); Supplementary Table 6 and Supplementary Figure 7). The top overall eQTL was observed with rs114846898 and HLA-DRB6 (PANOVA=1.03×10−42; Figure 3a). A modest eQTL was observed for rs116232857 and HLA-DRB6 (PANOVA=1.77×10−12); however, we found that rs116232857 and rs114846898 were nearly in perfect LD with each other (D’=0.98), suggesting that the association at rs116232857 might be due to rs114846898. Looking at the adjusted analysis results, we found that rs115575857 and rs116232857 could explain rs114846898, but adjusting for rs115575857 and rs114846898 left residual association (Presidual=9.44×10−7) at rs116232857, suggesting that rs116232857 likely tags additional functional variant(s).
The most statistically significant association outside the HLA region after meta-analysis was observed in the region of IRF5. In total, 67 SNPs exceeded the Pmeta=5×10−8 threshold, with the peak effect observed in the IRF5 promoter region (rs3757387, Pmeta=2.73×10−19; Figure 2b; Supplementary Table 7 and Supplementary Figure 8). A group of variants, tagged by rs17339836 (Pmeta=2.43×10−16) and spanning IRF5 and TNPO3, were not correlated with the promoter variants (Figure 2b).
Logistic regression adjusting for rs3757387 explained all of the promoter region association, while residual association persisted with variants spanning the IRF5-TNPO3 genes tagged by rs17339836 (Presidual=2.86×10−4; Supplementary Figure 9). Moreover, a group of variants tagged by rs6948875 that were not associated in the unadjusted single marker analysis became statistically significant when adjusting for rs3757387. Adjusting for both rs3757387 and rs17339836 again resulted in residual association of the region tagged by rs6948875 and rs10954213 (Supplementary Figure 9). Since rs10954213 was statistically significant in the single marker analysis (Supplementary Table 7), this SNP was added to the model in lieu of rs6948875. Logistic regression models adjusting for rs3757387, rs17339836, and rs10954213 resulted in no residual association in the regions, indicating the likelihood of three independent effects. Bioinformatics database mining of this region indicated that the associated variants were located within a variety of biologically relevant features encompassing a wide range of potential functional mechanisms (Supplementary Figure 10).
Statistically significant association exceeding GWS was observed between Sjögren’s syndrome and 12 variants, 8 of which fall within the third intron of the STAT4 locus, with peak association at an insertion-deletion (indel) polymorphism (rs10553577; Pmeta=6.80×10−15; Figure 2c; Supplementary Table 8 and Supplementary Figures 11 and 12). As expected, the indel rs10553577 accounted for the association within the region of STAT4; however, residual association was observed with rs3024918 (Presidual=2.70×10−4) in the promoter region of STAT1, but did not surpass our required threshold (P<1×10−4) after adjustment for an independent effect (Supplementary Figure 12). Bioinformatics databases revealed a large number of TFs bound to both regions (Supplementary Figure 13); however, neither effect resulted in a statistically significant eQTL (data not shown).
Seven variants spanning the entire region from the promoter demonstrated strong association, with peak association occurring 5.3kb 3′ of IL12A at rs485497 (Pmeta=1.17×10−10; Figure 2d; Supplementary Table 9 and Supplementary Figure 14). Logistic regression analysis adjusting for rs485497 indicated this variant accounted for all the observed association (Supplementary Figure 15). To determine the functional role of the variants associated between IL12A and Sjögren’s syndrome, eQTL analysis was done for all variants showing evidence of association in the region. Of the variants exceeding GWS, only rs485497 was associated with IL12A transcript expression; however, seven variants 3′ of IL12A influenced expression more significantly, with the eQTL peaking at rs4680536 (Figure 3f; Supplementary Table 6). Haplotype analysis of the IL12A associated variants indicates the presence of only one risk haplotype and the possibility that rs485497 may tag functional variants located on either side of this SNP (Supplementary Figure 15), which is supported by the presence of multiple transcription factors and other regulatory elements located in the flanking LD blocks (Supplementary Figure 16).
A total of 30 variants on chromosome 8 exceeded GWS in the region of FAM167A and BLK (Figure 2e; Supplementary Table 10 and Supplementary Figure 17). Maximum association after meta-analysis was observed in the shared promoter region of FAM167A-BLK at rs2736345 (Pmeta=4.97×10−10); however, this SNP was not statistically significant in DS1 (Table 2). The second most significant variant after meta-analysis (significant in both DS1 and DS2) was rs2729935 with Pmeta=6.85×10−10, which is located in the first intron of BLK. Logistic regression analysis adjusting for rs2729935 (as this variant was significant in both datasets) indicated residual association (Presidual=1.75×10−3) present at rs2736345. However, adjusting for rs2736345 in the logistic model showed the likely presence of only one effect accounting for all the association in the region (Supplementary Figure 18).
Using expression data for BLK, eQTL analysis of the associated variants revealed several with the potential to influence transcript expression, including the top overall SNP in the region of BLK, rs2736345 (Supplementary Table 6). The most statistically significant eQTL was located in the first intron of BLK at rs2409781 (Figure 3g), a variant loosely correlated with rs2736345 and rs2729935 (r2≈0.5). Evaluation of FAM167A probes with the variants associated with Sjögren’s syndrome yielded greater statistical significance than those with BLK (Figure 3h). Bioinformatics analysis indicated the presence of many transcriptionally relevant markers not only in the shared promoter region, but also in the first intron of BLK, suggesting the possibility that association in this region impacts the transcriptional regulation of BLK and/or FAM167A (Supplementary Figure 19).
Association with a variant ~16 kb 5′ of the coding region of CXCR5 was observed at rs7119038 (Pmeta=1.10×10−8; Figure 2f; Supplementary Table 11 and Supplementary Figure 20). Three additional variants also exceeded the GWS threshold extending to rs11217033 located as far as ~90 kb 5′ of CXCR5 and as close to ~12 kb 5′ of the neighboring gene, DEAD (Asp-Glu-Ala-Asp) box polypeptide 6 (DDX6) (Figure 2f). Adjusting for rs7119038 explained all the association signals in this region (Supplementary Figure 21). Although no statistically significant eQTLs were present in the region for CXCR5, bioinformatics analysis indicated the presence of many regulatory elements throughout the region (Supplementary Figure 22). Assessment of eQTLs for DDX6 was not possible as the probe failed QC in the gene expression study.
In the region of TNIP1, one SNP, rs6579837, exceeded GWS with Pmeta=3.30×10−8; however, 29 additional variants showed evidence of association with Pmeta<5×10−5 (Figure 2g; Supplementary Table 12 and Supplementary Figure 23). Logistic regression analysis adjusting for rs6579837 accounted for all association observed in the region (Supplementary Figure 24). Several of the variants associated with Sjögren’s syndrome were also eQTLs for the TNIP1 probe, including rs6579837 (Supplementary Table 6). However, the most statistically significant eQTL was observed at rs73272842 located in TNIP1, where multiple regulatory elements have been identified (Figure 3i; Supplementary Figure 25).
We identified 8 loci that were significant in meta-analysis (Pmeta<5×10−5) but did not surpass the overall GWS threshold (Supplementary Figure 1b and Table 3). Twenty-one additional regions surpassed suggestive thresholds for significance (Pmeta<5×10−5) using DS4 (evaluated variants were exclusive to DS4; Supplementary Figure 1b; Supplementary Table 13). Further replication of these regions is needed to determine if they are bona fide risk loci.
To assess the potential effects of associated variants on the underlying biology, we utilized two bioinformatics tools: the Disease Association Protein-Protein Link Evaluator (DAPPLE)18 and Genomatix Pathway System (GePS)19. DAPPLE defines genomic regions in LD with variants achieving GWS and suggestive thresholds to determine any potential protein-protein interaction (PPI) that could elucidate the underlying biological perturbations in Sjögren’s syndrome. In assessing the GWS variants outside the HLA region only, we observed no direct PPI, and while indirect PPI was observed, it was not more likely than expected by chance (Supplementary Figure 26). However, assessing both GWS and suggestive variants outside the HLA region, we observed a direct PPI network consisting of 9 proteins with 6 direct interactions (P=0.0007) and mean associated protein direct connectivity of 1.33 (expected=0.63, P=0.048) in addition to an indirect network that exhibited a mean associated protein indirect connectivity trending toward significance (P=0.082) (Supplementary Figures 27 and 28). Pathway analysis in GePS for the 6 GWS loci outside HLA found enrichment in immune signal transduction pathways (3/269 genes; P=1.54×10−3) and IL-12 signaling pathways (2/7 genes; P=3.80×10−5), primarily due to the influence of IL12A and STAT4. Adding the remaining 23 loci demonstrating suggestive association yielded similar results, with dominance of IL-12- and STAT4-dependent signaling (P=1.63×10−6). Assessing gene involvement in disease processes using Medical Subject Headings (MeSH) showed significant enrichment in multiple immunologically-mediated diseases, including non-Hodgkin lymphoma (16/2910 genes; P=1.7×10−6), systemic lupus erythematosus (SLE) (12/1691 genes; P=4.74×10−6), Epstein-Barr virus infections (11/1421 genes; P=6.18×10−6), dry eye syndrome (8/868 genes; P=4.99×10−5), and Sjögren’s syndrome (6/756 genes; P=1.15×10−3).
In this study, associations previously identified in the HLA region with Sjögren’s syndrome were replicated and represent the strongest genetic risk factors with OR ≈ 3.5. Our results are consistent with a trans-racial meta-analysis of 23 case-control studies evaluating Class II loci in Sjögren’s syndrome, which also identified HLA-DQA1*0501, HLA-DQB1*0201, and HLA-DRB*0301 as disease risk factors11. Associated variants in this study with a Pmeta<5×10−5 were found to be enriched within 100 bp of RFX5 ChIP-seq peaks identified by ENCODE17. Mutations in RFX5 can lead to the inability to transcribe Class II HLA molecules and bare lymphocyte syndrome20. In Sjögren’s syndrome, several risk alleles in or near the RFX5 ChIP-Seq peaks in the Class II region were eQTLs (Figure 3a-e), indicating that this TF may play a critical pathogenic role. Moreover, eQTLs were identified in the Class I region with additional HLA molecules, including HLA-C, HLA-A, HLA-H, and HLA-G, and previous studies have found that RFX5 regulates the expression of these loci21. Extensive LD between variants in the HLA region may influence the enrichment analysis performed in GenomeRunner when using all variants as background; however, we continued to observe enrichment when focusing only on the HLA region. Although we cannot completely rule out the influence of LD on these results, the observation that several of these variants are eQTLs further indicates biological relevance.
Although HLA Class II molecules have classically been implicated in Sjögren’s syndrome risk, HLA Class I loci have also been reported22. This suggests that risk attributed to the HLA region may involve altered expression of multiple genes residing in this region and important to the immune system through altered binding of RFX5 to DNA motifs and/or accessory molecules. However, it is likely that other mechanisms, such as amino-acid changes similar to those identified by Raychaudhuri et al. in rheumatoid arthritis (RA)23, are involved in the genetic pathophysiology of this region. Comparing the effect sizes for the two primary HLA associations with other autoimmune diseases, we find that effect sizes in Sjögren’s syndrome appear stronger than those in SLE and RA, similar to those in MS, and weaker than those in celiac disease, type I diabetes, and psoriasis (Supplementary Figure 29).
This study also established IRF5-TNPO3, STAT4, IL12A, FAM167A-BLK, DDX6-CXCR5, and TNIP1 as risk loci. IRF5 is a transcription factor mediating type I interferon responses in monocytes, dendritic cells, and B cells24 that induces the transcription of interferon-α genes and the production of pro-inflammatory cytokines (including IL-12, p40, IL-6, and TNF-α)25 upon viral infection. IRF5 is an established risk locus in SLE, RA, ulcerative colitis, primary biliary cirrhosis (PBC), and systemic sclerosis (SSc)26-33. In SLE, association with three independent functional alleles was observed34, similar to the effects described in this study. Previous studies in Sjögren’s syndrome have also implicated IRF5 as a risk locus, but only ~15 variants have been tested, with a CGGGG indel polymorphism yielding the most statistically significant result and appearing to be responsible for the promoter region association (Supplementary Table 1)13,15,35,36. While we were unable to impute the CGGGG indel, the variant rs2004640, which is correlated (r2=0.7) with the CGGGG indel36, demonstrated association (Pmeta=8.86×10−15). Moreover, a three variant haplotype, consisting of rs3757385, rs2004640, and rs10954213, perfectly tags the CGGGG indel37,38. Adjusting logistic regression models for these three variants showed significant residual association (P=1.05×10−7) at rs3757387. After adding rs3757387, residual association (P=1.64×10−4) was observed at rs17339836, suggesting that adjusting for the 3 SNP haplotype tagging the CGGGG indel was equivalent to adjusting for only rs10954213. Additional studies are necessary to determine the precise causal variant in the IRF5 promoter region.
STAT4 is critical for cellular responses initiated by type I interferons and is subsequently induced by IL-12 in lymphocytes, leading to the transcription of interferon-γ39. The association of variants in the third intron of STAT4 has been well established in other inflammatory diseases32,40-42 and has been implicated in Sjögren’s syndrome (Supplementary Table 1)12,14,15,43. The variants reported in these studies are highly correlated (r2>0.95) with the associated variant rs10553577 identified in this study.
IL12A encodes the p35 subunit that forms the IL-12 heterodimer with the p40 subunit encoded by IL12B44. IL-12, an immunomodulatory cytokine primarily secreted by monocytes and dendritic cells, plays a critical role in T-helper 1 cell differentiation and the production of interferon-γ by T cells and NK cells45. Although no studies in Sjögren’s syndrome report association to the IL12A region, several studies in related conditions have identified variants near IL12A that increase disease risk. Association of variants in the 3′ end of IL12A have been reported for PBC, while 5′ effects have been described for celiac disease46-48. Interestingly, the variants reported in PBC showed weak correlations (r2<0.35) with the top SNP in Sjögren’s syndrome, and analyses adjusting for variants identified in related diseases continued to exhibit residual association (Presidual<0.05) at rs485497. Moreover, rs574808, rs495499, and rs6441286, the most statistically significant variants in PBC42,46, are highly correlated with the eQTL polymorphism reported in this study. Previous studies observed only suggestive association with the block encompassing the coding region of this gene, which includes the top overall associated variant, rs485497, in this study. Thus, the effect in the region of IL12A in Sjögren’s syndrome appears to be at least partially distinct from PBC.
IRF5, IL12A, and STAT4, all participate in type I interferon pathway signaling. A role for type I interferon pathway dysregulation in Sjögren’s syndrome has been described through mRNA gene expression microarray studies of labial salivary glands and peripheral blood8,9,49,50. Moreover, Emamian et al. has shown that overexpression of type I interferon inducible genes correlates with titers of the classic Sjögren’s syndrome autoantibodies anti-Ro and anti-La9. Genetic effects in Sjögren’s syndrome potentially influence both subunits of IL-12. IRF5 can initiate transcription of IL12B while variants in the region of IL12A have also been identified as risk factors in this study. Once secreted from dendritic or NK cells, IL-12 binds T cell receptors, thereby initiating a signaling cascade through STAT4 phosphorylation44.
BLK is a non-receptor src family tyrosine kinase involved in B cell receptor signaling and B cell development. B cell receptor signaling is essential for proper immune function, deletion of autoreactive B cells, and subsequent receptor editing51,52. Variants in the BLK locus have been reported to influence both FAM167A and BLK mRNA and protein expression and are associated with related diseases, such as SLE53,54. Previous studies have implicated this region in Sjögren’s syndrome risk with peak association observed within FAM167A at rs1254979614, while the current study identifies the shared promoter region and the first intron of BLK as the most statistically significant associations (Figure 2e).
CXCR5, a membrane-bound protein present on some memory B cells and on follicular helper T cells, acts as a receptor for B-lymphocyte chemoattractant (BLC). Variants in the region of CXCR5 have been associated with multiple sclerosis and PBC42,55. Although no genetic associations between Sjögren’s syndrome and CXCR5 have been reported, several studies have found CXCR5 to be dysregulated in B cells in salivary gland tissues and the periphery49,56. Interestingly, IL-12 alone can induce CXCR5 expression, leading to the early commitment of naïve CD4+ T cells to the T follicular helper cell lineage57.
Although the exact function of TNIP1 has not been defined, TNIP1 does bind TNFAIP3, which suppresses TLR-induced apoptosis by negatively regulating NF-κB58. Variants in the region of TNIP1 have been found to increase risk for RA, SLE, psoriasis, and SSc59-63. In SLE, two independent risk haplotypes have been identified, with the H2 haplotype defined by Adrianto et al.62 corresponding to the association identified in this study. Similar to the findings reported above, the H2 haplotype in SLE also shows allele-specific differential expression62.
Considered jointly in the logistic regression model, the genome-wide and suggestively associated variants could explain a significant proportion of the heritable risk (C statistic = 0.811, Supplementary Table 14). This value is higher than that of the SLE model in women of European ancestry (C statistic = 0.67)33. Consideration was also given to the effect size of Sjögren’s syndrome-associated variants in LD with previously established loci achieving GWS in other autoimmune diseases. Of note, the OR and 95% CI for Sjögren’s syndrome variants outside the HLA region (Supplementary Table 15 and Supplementary Figure 30) are comparable to those in related autoimmune diseases, particularly with respect to SLE, with much greater variability in effect size and CI when considering the variants in the HLA region (Supplementary Table 16 and Supplementary Figure 29).
Using bioinformatics tools, the combination of GWS and suggestively associated variants provides some evidence of significant direct and indirect PPI and enrichment of genes involved in immune signaling processes, particularly with respect to IL-12 and STAT4 and many autoimmune-related diseases. The importance of type I interferons has been previously recognized in the pathogenesis of Sjögren’s syndrome. However, the type II interferon pathway, acting through interferon-γ downstream of IL-12 and STAT4, is now clearly implicated by this study. In addition, multiple genes in the NF-κB pathway, including TNIP1 and TNFAIP3, support a role for this pathway in Sjögren’s syndrome pathogenesis. Furthermore, establishing association of several genes important in the adaptive immune response, including CXCR5 and BLK, as well as suggestive associations in loci such as FCGR2A, provides insight into the complex genetic architecture for Sjögren’s syndrome and will facilitate further studies to understand the precise functional consequences of these risk variants. Taken together, this work provides significant progress towards explaining the underlying genetic pathophysiology of Sjögren’s syndrome, which will ultimately provide important new therapeutic targets and potential diagnostic markers, both of which are lacking.
In summary, we performed a large-scale genetic study in Sjögren’s syndrome and established IRF5-TNPO3, STAT4, IL12A, FAM167A-BLK, DDX6-CXCR5, and TNIP1 as risk loci. We have determined that there are likely multiple independent effects within the HLA and IRF5 regions. Our results also strongly implicate 29 suggestive regions in Sjögren’s syndrome etiology warranting further study. Together, these results highlight the importance of the innate and adaptive immune systems in Sjögren’s syndrome etiology. Future work is needed to replicate these additional candidate associations and to characterize the causal variant(s) for the regions established in this study.
The samples used in this study were collected through the Sjögren’s Genetics Network (SGENE) and organized at the Oklahoma Medical Research Foundation (OMRF; Supplementary Table 2). In Dataset 1, 438 Sjögren’s syndrome cases and 3917 population controls of European descent were subjected to quality control measures outlined below. Each Sjögren’s syndrome case was genetically matched to five population controls in DS1 using the identity-by-state (IBS) to assess allele sharing as implemented in PLINK64, and the remaining controls were used in DS2 and DS3. In the next phase, 1521 Sjögren’s syndrome cases and 6626 population controls were evaluated in DS2 only for variants that overlapped with DS1; 1169 Sjögren’s syndrome cases and 3078 population controls were evaluated in DS3; and 1897 cases and 3039 population controls were evaluated in DS4 (313 cases overlapped with DS1). Cases fulfilled the American-European Consensus Group (AECG) criteria for primary Sjögren’s syndrome5, and all population controls were evaluated within their respective study. All cases were evaluated by expert clinicians for possible overlap of clinical features or concurrent diagnosis with other autoimmune diseases. Written informed consent was obtained by each participant following protocols approved by the Institutional Review Boards of each institution where the samples were collected.
Genotypes for DS1 were obtained using the Illumina Omni1-Quad array using Infinium chemistry at OMRF following the manufacturer’s protocol (Illumina, Inc., San Diego, CA). Similarly, DS2/DS4 genotyping was performed with the ImmunoChip (an Illumina iSelect custom array designed by the ImmunoChip Consortium)65. DS3 was genotyped on iSelect arrays containing 1536 variants overlapped with DS1 that were not available on the ImmunoChip. Strict quality control procedures were implemented before SNPs were used in this analysis, including requirements for: well-defined cluster scatter plots; minor allele frequency >1%; SNP call rate >95%; sample call rate >95%; Hardy-Weinberg proportion (HWP) test with a P > 0.001 in controls; and P > 0.001 for differential missingness between cases and controls.
Samples with <95% call rate and excessive increased heterozygosity (>5 standard deviations from the mean) were excluded from the analysis. Relatedness within the remaining samples was determined using identity-by-descent (IBD) estimates as determined by PLINK v1.0764. One individual from each pair was removed if the proportion of the alleles shared identity by descent was >0.4, with preference towards cases and/or the individual with the highest call rate. Gender was estimated genetically, and discrepancies with self-report were scrutinized. Males were required to be heterozygous at rs2557523 (since the G allele for this SNP is only observed on the Y chromosome and the A allele appears only on the X chromosome) and to have chromosome X heterozygosity ≤10%, while females were required to be homozygous for the A allele at rs2557523 and have chromosome X heterozygosity >10%. After quality control, 648,937 SNPs were available to test for association across all datasets for DS1. Only SNPs present on both DS1 and DS2 arrays (20,055 after quality control) were used for meta-analysis and fine-mapping (after imputation, see below for more details). All base pair positions are given according to the hg19 version of the human genome reference sequence.
Genetic outliers were removed from further analysis as determined by principal components analysis (Supplementary Figure 31)66,67. Population substructure was identified within the sample set using EIGENSTRAT66 with 40,073 (DS1) and 2,827 (DS2/DS4) independent markers (r2<0.20 between variants). The resulting Eigenvectors were able to distinguish the four continental ancestral populations in the following HapMap samples: Africans (ASW, LWK, MKK, and YRI), Europeans (CEU and TSI), Hispanic and East Indians (MEX and GIH), and Asians (CHB, CHD, and JPT); Supplementary Figure 30)67,68. We utilized principal components to identify samples output by EIGENSTRAT that were not of European ancestry. Samples >6 standard deviations from the mean were eliminated from further analysis. Samples were then plotted by PC1 and PC2 for only the cases and controls used in DS1, DS2, or DS4 (Supplementary Figure 31). We further scrutinized and removed outliers that resided outside the main cluster of cases and controls. In addition, each Sjögren’s syndrome case was genetically matched to 5 out-of-study population controls for DS1 using the cluster feature in PLINK v1.0764. After quality control, 395 Sjögren’s syndrome cases and 1975 population controls in DS1, 1243 cases and 4779 controls in DS2, 1158 cases and 3071 controls on DS3, and 1541 case and 2634 controls in DS4 were available to test for association (Supplementary Figure 1 and Supplementary Table 2).
To test for SNP-Sjögren’s syndrome association, logistic regression models were computed as implemented in PLINK v1.0764. The additive genetic model was calculated for chromosomes 1-22 while adjusting for the first three principal components (as these accounted for >80% of the variation in the dataset after quality control) and gender. For chromosome X, only females were tested for association with Sjögren’s syndrome. Meta-analysis of the SNPs observed in both DS1 and DS2 or DS3 were calculated using a weighted Z-score in METAL69. Each dataset group was weighted by the square root of its sample size to control for differences between the two phases. For a locus to be considered it must have been significant in both datasets and the Pmeta must have exceeded the suggestive threshold of Pmeta<5×10−5.
To test for meta-analysis heterogeneity, we utilized both the Cochran’s Q test statistic70 and I2 index71 (reported in the Supplementary Tables 3, 7-12). A P < 0.05 was considered significant evidence of heterogeneity for Cochran’s Q test. The I2 index ranges between 0% and 100%, where I2 equal 0% to 25%, 26% to 50%, 51% to 75%, and 76% to 100%, indicating low, moderate, high, and very high heterogeneity, respectively. These data are provided in the Supplementary Tables for each region (Supplementary Tables 3, 7-12).
Linkage disequilibrium and probable haplotypes were determined using HAPLOVIEW ver. 4.272. Haplotype blocks were calculated using the solid-spine of LD algorithms with minimum r2 values of 0.872. To determine the number of independent effects in each region, we used step-wise logistic regression adjusting for the most significant variant with P<0.0001 in the both the adjusted and unadjusted analysis to be considered. Adjusted logistic regression analysis results for each Sjögren’s syndrome-associated region were plotted using LocusZoom73.
We computed the C statistic of the multiple logistic regression model using “rms” package in R. The C statistic is the area under the receiver operator characteristic (ROC) curve that provides a measure on how well the model can discriminate between cases and population controls.
To increase informativeness, imputation was conducted in subjects from both the DS1 and DS2 over a 100 kb interval spanning loci that met criteria for genome-wide significance or select loci demonstrating suggestive significance. Imputation of both datasets was performed across 12 regions: FCGR2A (chr1: 161,425,205-161,539,360), STAT4 (chr2: 191,844,302-192,066,322), IL12A (chr3: 159,356,623-159,763,806), DGKQ (chr4: 923,459-1,001,272), TNIP1 (chr5: 150,359,504-150,517,221), PTTG1 (chr5: 159,528,253-159,902,455), HLA (chr6: 28,400,000-33,400,000), TNFAIP3 (chr6: 137,938,325-138,254,451), IRF5 (chr7: 128.527,994-128,740,088), BLK (chr8: 11,251,521-11,472,108), PRDM1 (chr6: 106,484,195-106607,814), and DDX6-CXCR5 (chr11: 118,548,473-118,816,980). Imputation was performed using IMPUTE2 and the European Impute2 1000 Genomes Phase 1 April 2012 reference panel74-76. Imputed genotypes had to meet or exceed a probability threshold of 0.9, an information measure of >0.5, and the quality control criteria described above to be included in the analyses.
The HLA classical allele imputation was performed using HLA Genotype Imputation with Attribute Bagging (HIBAG)77. HIBAG is a highly accurate, computationally tractable package for imputing HLA types using SNP data without a training dataset in various populations, including European population. By using HLA SNPs after QC, we imputed the HLA classical alleles in seven HLA genes in both DS1 and DS2 (Supplementary Figure 6). Individuals with imputation score < 0.8 and alleles with frequency < 0.01 were removed for analysis.
Gene expression profile analysis: We performed global gene expression profiling (GEP) analysis using peripheral blood on 48,803 probes representing 37,805 loci from the Illumina HumanWG-6 v3.0 BeadChip in 258 subjects: 182 primary Sjögren’s syndrome cases meeting AECG 2002 criteria5 and 76 population controls. Analyses were performed in the R Bioconductor suite. The GEP data were obtained from two separate microarray experiments. Quality control (QC) was applied in each dataset to filter out transcripts that were expressed in less than 10% of the subjects (detection call threshold: P<0.05) and probes with differential missing rates (P<0.001 by Fisher’s exact test) between the two datasets. Re-annotation and Mapping of Oligonucleotide Array Technologies (ReMOAT) was used to determine the quality of Illumina BeadArray probes78. Each dataset was then normalized independently using Robust Multiarray Average (RMA)79 followed by log2 transformation and quantile normalization. ComBat was subsequently applied to the combined dataset to adjust for non-biological experimental variation (batch effect)80. After QC and normalization, 15,063 probes (in 12,248 genes) were kept for further statistical analysis.
Cis-expression quantitative trait loci analysis: Expression levels of the Sjögren’s syndrome-associated genes identified in the meta-analysis of DS1 and DS2 exceeding GWS were selected for use as phenotypic traits in cis-eQTL analyses in 222 European subjects (190 cases and 32 controls). Variants associated with Sjögren’s syndrome (P<5×10−5) in the meta-analysis flanking the target genes were evaluated for genetic association with gene expression levels using both a linear model in Matrix-eQTL81 (using gender and disease status as covariates) and Analysis of Variance (ANOVA) models in Prism 6. An eQTL was considered significant if a false discovery rate (FDR) adjusted P-value, accounting for the number of tests performed in each region, was P<0.05.
GenomeRunner82 was used to search for statistically significant enrichment of the set of 9,394 SNPs showing suggestive (P<5×10−5) or lower association with Sjögren’s syndrome with annotated genomic features. Genome annotations include genes and gene prediction tracks, mRNA and EST tracks, regulation related tracks from ENCODE project, histone modification marks, comparative genomics tracks, structural variation and repeats tracks. GenomeRunner evaluates the null hypothesis that co-localization of a set of Sjögren’s syndrome-associated SNPs with genome annotation features is not statistically significantly different from what would be observed for a random set of SNPs of the same size. Briefly, the Sjögren’s syndrome-associated set of SNPs was tested for co-localization with each of the genome annotation tracks obtained from the UCSC genome database83 for the human GRCh37/hg19 genome assembly. The whole set of 219,958 SNPs after imputation of select regions present on the ImmunoChip array was used to randomly select an equally sized group.
Separate analyses included identification of overrepresented experimentally validated TFBSs transcription factor binding sites, a total of 148 of them extracted from the “Transcription Factor ChIP-seq from ENCODE” track (wgEncodeRegTfbsClustered). Regions of Chromatin State Segmentation by HMM from ENCODE/Broad (wgEncodeBroadHmmGm12878HMM) were also used to assess overrepresentation of defined chromatin states containing Sjögren’s syndrome-associated SNPs. Tracks of Histone Modifications obtained by ChIP-seq from ENCODE/Broad Institute, from ENCODE/Stanford/Yale/USC/Harvard, and from ENCODE/University of Washington were used for assessment of epigenetic marks. Unless otherwise noted, data from the GM12878 lymphoblastoid cell line were used, as the majority of the best quality genome annotation data was obtained from this cell line. The number of Sjögren’s syndrome-associated SNPs co-localized with a genome annotation track was compared with that of obtained from 1,000 random simulations using Chi-square test. Additional analyses of random sets of SNPs showed no statistically significant enrichments (data not shown). For an enrichment to be considered statistically significant, we set a Bonferroni correction threshold for 6000 independent tests at P<8.33×10−6.
The HLA region was defined by genomic coordinates chr6:28400295-33398620 (GRCh37/hg19 human genome assembly). The number of significant SNPs in this region overlapping with RFX5 binding sites was calculated using BedTools84. The number of all genome-wide associated SNPs in the HLA region overlapping RFX5 binding sites was calculated and compared with the SNPs significantly associated with Sjögren’s syndrome using Fisher’s exact test (scipy.stats.fisher_exact function).
The Disease Association Protein-Protein Link Evaluator (DAPPLE)18 was used to assess the potential effects of associated variants on the underlying biology. For DAPPLE, seed variants were converted to genes based on variant overlap and LD structure within the region of interest. Analyses were performed using the following settings: 10,000 permutations; a common interactor binding degree cutoff of 2; gene regulatory region 50kb upstream and downstream of transcription start and end; simplification of the indirect network; and iteration to prioritize genes with a Bonferroni-corrected score of P<0.05. A total of 14 GWS variants and 114 suggestive variants were present in the DAPPLE databases; where variants were not found to be present, attempts to find proxies in high LD were made.
Pathway analysis was carried out using the Genomatix Pathway System (GePS)19. This system utilizes information obtained from public and proprietary annotation databases to characterize input gene lists, determine the representation of these genes in canonical pathways, and create gene networks based on co-citation of genes from the literature. The public annotation data sets incorporated in this software include gene ontology (GO) and medical subject headings (MeSH) in addition to others. GePS also provides a probability statistic using Fisher’s exact test to determine the likelihood of finding the number of co-represented genes with a particular annotation when considering the length of the input gene list. Input gene sets were generated using the gene locus nearest the association signals, which were required to demonstrate suggestive or genome-wide significance, and the pathway analysis was carried out using the gene symbols of the input set.
We are grateful to all the individuals with Sjögren’s syndrome and those serving as healthy controls who participated in this study. We would like to thank the following individuals for their help in the collection and ascertainment of the samples used in this study: Erin Rothrock, Judy Harris, Sharon Johnson, Sara Cioli, Nicole Weber, Dominique Williams, Wes Daniels, Cherilyn Pritchett-Frazee, Kylia Crouch, Laura Battiest, Justin Rodgers, James Robertson, Thuan Nguyen, Amanda Crosbie, Ellen James, Carolyn Meyer, Amber McElroy, Eshrat Emamian, Julie Ermer, Kristine Rohlf, Joanlise Leon, Anita Petersen, Danielle Hartle, Jill Novizke, Ward Ortman, Carl Espy, Beth Cobb, Gudlaug Kristjansdottir, Marianne Eidsheim and Joelle Benessiano Centre de Ressources Biologiques, Hôpital Bichat, Paris, France. We would also like to thank Stuart Glenn and Jared Ning for their ongoing assistance in developing and maintaining the computational infrastructure used to perform this study. We would like to thank the following funding agencies for their support: This publication was made possible by grants P50 AR0608040 (K.L.S., C.J.L., R.H.S., and A.D.F.), 5R01 DE015223 (K.L.S. and J.B.H.), 5RC2 AR058959 (P.M.G.), 5P01 AR049084-10 (J.B.H), 5P30 AR053483 (J.A.J. and J.M.G.), 5U19 AI082714 (K.L.S., J.A.J., and C.J.L.), 1R01 DE018209-02 (K.L.S and J.B.H.), 5R01 DE018209 (K.L.S.), 8P20 GM103456 (P.M.G., C.J.L., J.D.W, and I.A.), P20 GM103636 (M.G.D. and J.D.W.), 5R37 AI024717-25 (J.B.H), 5P01 AI083194-03 (K.L.S. and J.B.H.), 7S10 RR027190-02 (J.B.H.), 1U01 AI101934 (J.A.J. and J.M.G.), 1RC1 AR058554 (J.A.J. and J.M.G.), and 5P30 GM103510 (J.A.J. AND J.M.G.) from the NIH. The contents are the sole responsibility of the authors and do not necessarily represent the official views of the NIH. Additional funding was obtained from Intramural Research Program of the National Institute of Dental and Craniofacial Research (G.G.I.), US Department of Veterans Affairs IMMA 9 (J.B.H.), USA Department of Defense PR094002 (J.B.H), American College of Rheumatology Research and Education Foundation/Abbott Health Professional Graduate Student Preceptorship Award 2009 (C.J.L. and K.L.S.), Oklahoma Medical Research Foundation (C.J.L. and K.L.S.), Sjögren’s Syndrome Foundation (K.L.S.), Phileona Foundation (K.L.S.), French ministry of health: PHRC N°2006-AOM06133 and French ministry of research: ANR-2010-BLAN-1133 (X.M. and C.M.R.), The Strategic Research Program at Helse Bergen, Western Norway Regional Health Authority (L.G.G., J.G.B., and R.J.), The Broegelmann Foundation (J.G.B. and R.J.), Norwegian Foundation for Health and Rehabilitation (E.H.), KFO 250 TP03, WI 1031/6-1 (T.W.), KFO 250, Z1 (T.W.), Medical Research Council, UK G0800629 (W.F.N. and S.B.), Northumberland, Tyne & Wear CLRN (W.F.N.), The Swedish Research Council (M.W.H. and L.R.), The King Gustaf the V-th 80-year Foundation (M.W.H.), Knut and Alice Wallenberg Foundation (L.R.), and The Swedish Rheumatism Association (M.W.H., G.N., L.R., and P.E.). This study made use of genotypes available through dbGAP, with acknowledgment provided in the supplement.
HLA Genotype Imputation with Attribute Bagging http://cran.r-project.org/web/packages/HIBAG/
R Bioconductor suite http://www.R-project.org
Genomatix Pathway System http://www.genomatix.de
Accession Numbers: Genes and Blood Clotting Study dataset(s) obtained through dbGaP: phs000304.v1.p1
GWAS of Ischemic Stroke dataset(s) obtained through dbGaP: phs000292.v1.p1
CIDR:NGRC Parkinson’s disease study dataset(s) obtained through dbGaP: phs000196.v2.p1
High Density SNP Association Analysis of Melanoma: Case-Control and Outcomes Investigation dataset(s) obtained through dbGaP: phs000187.v1.p1
Competing Financial Interests: The authors declare no competing financial interests.