Systemic lupus erythematosus (SLE) is the prototypic systemic autoimmune disorder with complex etiology and a strong genetic component. Recently, gene products involved in the interferon pathway have been under intense investigation in SLE pathogenesis. STAT1 and STAT4 are transcription factors that play key roles in the interferon and Th1 signaling pathways, making them attractive candidates for SLE susceptibility.
Fifty-six single-nucleotide polymorphisms (SNPs) across the STAT1 and STAT4 genes on chromosome 2 were genotyped on the Illumina platform as part of an extensive association study in a large collection of 9923 lupus cases and controls from different racial groups. DNA from patients and controls was obtained from peripheral blood. Principal component analyses and population-based case-control association analyses were performed, and the p values, FDR q values and odds ratios with 95% confidence intervals (95% CIs) were calculated.
We observed strong genetic associations between SLE and multiple SNPs located within the STAT4 gene in different ethnicities (Fisher combined p=7.02×10−25). In addition to strong confirmation of the previously reported association in the 3rd intronic region of this gene, we identified additional haplotypic association across the STAT4 gene, in particular a common risk haplotype that is found in multiple racial groups. In contrast, only a relatively weak suggestive association was observed with STAT1, probably due to its proximity to STAT4.
Our findings indicate that the STAT4 gene is likely to be a crucial component in SLE pathogenesis among multiple racial groups. The functional effects of this association, when revealed, might improve our understanding of the disease and provide new therapeutic targets.
Systemic lupus erythematosus (SLE) is a complex multi-organ autoimmune disorder with a strong genetic component characterized by breakdown of self-tolerance, which results in a wide range of immunological abnormalities including pathogenic immune complex formation, T and B lymphocyte dysregulation, and defective clearance of apoptotic materials.
The roles of various cytokines and their signaling molecules have gained importance in understanding the pathogenesis of SLE. Among these, the interferons, both type I and type II, have received particular attention. Accordingly, peripheral blood mononuclear cells from SLE patients show a pattern of upregulated IFN-induced genes (1–3) and this “interferon signature” correlates with disease severity markers (4). IFN-α treatment of individuals with viral infections or malignancies might result in SLE-like manifestations (5, 6). Increased serum IFN activity was found to be a heritable trait in families with SLE (7). Furthermore, IRF5, a component of the interferon pathway, has been established as an SLE susceptibility gene (8–10). Similarly, the participation of IFN-γ (the only type II IFN) has been inferred in human SLE and confirmed in lupus mice (11–13).
Signal transmission from the interferons involves STAT1 and STAT4, which are members of the signal transducer and activator of transcription (STAT) family of transcription factors. These proteins are involved in essential cellular events such as differentiation, proliferation, and apoptosis following cytokine and growth factor signaling (14).
By binding to their receptors, interferons and other cytokines trigger Jak kinases to phosphorylate and activate STAT proteins (15). Before activation, STAT proteins are cytosolic and activation by tyrosine phosphorylation results in their homo- and hetero-dimerization through interactions involving their SH2 domains; STAT dimers then translocate to the nucleus, where they either directly bind to DNA or act together with other DNA-binding proteins in multiprotein transcription complexes to direct transcription of a large variety of gene products (14, 15).
The human STAT genes have been identified in three chromosomal clusters: STAT1 and STAT4 on human chromosome 2 (q12-33), STAT2 and STAT6 on chromosome 12 (q13-14) and STAT3, STAT5a, and 5b on chromosome 17 (q11.2-22) (16).
STAT1 is activated both by IFNα/β and by IFN-γ signaling (17), which plays an important role in the activation of macrophages and in the defense response to pathogenic agents (18, 19). STAT1 targets genes that can promote inflammation and induce apoptosis (17).
STAT4, identified through its homology to STAT1, was found to lie adjacent to the STAT1 gene at 2q32.2-2q32.3, containing 24 exons and spanning 122 Kb. STAT4 is activated by several cytokines including IL-12, IL-23 and IFNα, and stimulates the transcription of specific genes including IFNγ (20). Previous genome scans in SLE have revealed linkage to the 2q33 region (21, 22). This fact, together with the extensive involvement of type I and type II IFNs in the pathogenesis of SLE, made the cluster of STAT1 and STAT4 on chromosome 2q an obvious candidate region for genetic predisposition to this autoimmune disease. Recently, Remmers et al. (23) showed genetic association between STAT4 and rheumatoid arthritis (RA), and also association of one SNP (rs7574865) with SLE in Europeans. Furthermore, two recently published genome-wide association studies of SLE in European ancestry populations confirm the association with the STAT4 gene (24, 25).
In this report, we describe the results of a fine mapping study in which we evaluated 56 single nucleotide polymorphisms (SNPs) spanning the STAT1 and STAT4 genes on chromosome 2 in a large collection of 9923 lupus cases and controls from different racial groups. This study is the largest study of these genes in SLE and the first to investigate the associations in populations with higher prevalence including African Americans and Hispanics. Our results confirm and significantly extend the previous association in multiple racial groups.
The present study included 9923 participants (4771 SLE cases and 5,152 controls) enrolled in the Lupus Genetics Studies at OMRF as described (26), in the Lupus Genetic Study Group at USC as described (27), in the PROFILE Study Group at UAB (28), and from additional collaborators to the studies. The demographics and numbers of samples are provided in Table 1. Among SLE cases, 769 independent cases were defined as childhood-onset according to the criterion that the diagnosis of SLE was made before the age of 13 by at least one pediatric rheumatologist participating in the study. All protocols were approved by the Institutional Review Boards at each respective institution. All patients met the revised 1997 ACR criteria for the classification of SLE (29). Ethnicity was self-reported and verified by parental and grandparental ethnicity, when known. Blood samples were collected from each participant, and genomic DNA was isolated and stored using standard methods.
Genotyping was performed using Illumina iSelect™ Infinium II Assays on the BeadStation™ 500GX system (Illumina, San Diego, CA) at the Lupus Genetics Studies unit of the Oklahoma Medical Research Foundation and at the University of Texas Southwestern DNA Microarray Core facility. Genotype data were only used from samples with a call rate greater than 90% of the SNPs screened (98.05% of the samples). The average call rate for all samples was 97.18%. For analysis, only genotype data from SNPs with a call frequency greater than 90% in the samples tested and an Illumina GenTrain score greater than 0.7 were used. GenTrain scores measure the reliability of SNP detection based on the distribution of genotypic classes. In order to minimize sample misidentification, data from 91 SNPs that had been previously genotyped on 42.12% of the samples were used to verify sample identity. In addition, at least one sample previously genotyped was randomly placed on each Illumina Infinium BeadChip and used to track samples throughout the genotyping process.
Testing for association was completed using the freely available programs SNPGWA (http://www.phs.wfubmc.edu/web/public_bios/sec_gene/downloads.cfm) and PLINK (30). For each SNP, missing data proportions for cases and controls, minor allele frequency and exact tests for departures from Hardy-Weinberg expectations were calculated. In addition to the allelic test of association, the additive genetic model was used as the primary hypothesis of statistical inference. If the lack-of-fit (LOF) test for the additive model was significant (LOF p<0.05), then the minimum p value from the dominant, additive or recessive models is reported. For recessive models, at least 30 individuals homozygous for the minor allele were required. Haploview version 4.0 (31) was used to estimate the linkage disequilibrium (LD) between markers and the haplotype structures in different ethnicities. The deviation of the observed frequency of a haplotype from the frequency expected under independence is a quantity called the linkage disequilibrium and is commonly denoted by a capital D. D has the disadvantage of depending on the frequencies of the alleles. The so-called D′ is a common normalized measure, obtained by dividing D by its theoretical maximum for the observed allele frequencies.
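The definitions of D and D′ given above can be illustrated with a minimal Python sketch. It is not the Haploview implementation: the function name `linkage_disequilibrium` and the example frequencies are hypothetical, haplotype frequencies are assumed to be known (in practice they are estimated from unphased genotypes), and the r2 measure quoted later in the text is included only for comparison.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, D', r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A at locus 1
           and allele B at locus 2 (assumed known here)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    # D: observed haplotype frequency minus the frequency expected
    # if the two loci were independent.
    d = p_ab - p_a * p_b

    # Theoretical maximum of |D| for these allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    # D' is reported as an absolute value between 0 and 1.
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r2: squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies: allele B occurs only on haplotypes carrying A.
d, d_prime, r2 = linkage_disequilibrium(p_ab=0.2, p_a=0.5, p_b=0.2)
print(f"D={d:.3f}  D'={d_prime:.2f}  r2={r2:.2f}")
```

With these made-up inputs D′ reaches 1 while r2 stays well below 1, which is why a pair of SNPs can show high D′ without being perfect proxies for each other.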
Conditional haplotype analyses were conducted using the WHAP program version 2.09, and conditional logistic regression analyses for clinical or serological criteria were conducted using PLINK. African-American and Gullah populations (the Gullah being a relatively more homogeneous group of African Americans who live in the Low Country of South Carolina) were analyzed separately. Combined p values were calculated from the per-ethnicity p values using the Fisher method. Q values were calculated using the qvalue package (available from http://cran.r-project.org), which implements the q value correction of the False Discovery Rate (FDR) (32). Q values estimate the proportion of false positives among the results called significant. Thus, a q value below 0.05 signifies fewer than 5% false positives and is taken as a measure of significance.
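As an illustration of the Fisher method for combining per-ethnicity p values, the following Python sketch computes the combined p value from the statistic X = −2 Σ ln p_i, which follows a χ2 distribution with 2k degrees of freedom under the null hypothesis when the k tests are independent. The function name and the example p values are hypothetical and are not taken from the study data; the closed form used for the χ2 tail probability holds only for even degrees of freedom, which is always the case here.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method."""
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p values must lie in (0, 1]")

    # half_stat is X/2, where X = -2 * sum(ln p_i).
    half_stat = -sum(math.log(p) for p in p_values)

    # Upper tail of a chi-square with 2k degrees of freedom:
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical per-population p values for a single SNP.
print(fisher_combined_p([1e-6, 3e-4, 0.02]))
```

Because the populations analyzed are separate samples, the independence assumption behind this combination is plausible across ethnicities, although not across correlated SNPs within one population.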
To account for potential confounding substructure or admixture in these samples, principal component analyses (PCA) were performed (33) using all SNPs (numbering 20,506 genotyped on these subjects as part of a large effort to determine the genetic susceptibility in SLE) except those within the HLA region and known associations from the published genome scan (24). Four principal components were identified that explained a total of ~60% of the observed genetic variation. The PCA scores were used to identify individuals who were genetically distant from the other samples and prone to introducing admixture bias. A total of 252 controls and 165 cases were so identified and removed from further analysis (European American: 124 controls and 89 cases; African American: 88 controls and 38 cases; Hispanic American: 35 controls and 30 cases; Korean: 1 control; Gullah: 4 controls and 8 cases). After removing these genetic outliers, duplicates and related samples, 4374 independent SLE cases and 4860 controls remained for analysis. All these subjects were also independent from the SLEGEN study (24). We then performed genomic control analysis to calculate the inflation factor λ (lambda) using all SNPs minus the HLA region and previously identified genes (18,446 SNPs, 92% of the original SNPs), which produced λ=1.13 in European samples, λ=1.03 in Hispanics, λ=1.08 in African Americans, λ=1.04 in Koreans and λ=1.02 in Gullah. The inflation factor is a measure that quantifies the degree to which population stratification increases the χ2 test statistic.
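The inflation factor can be sketched in Python using its usual definition, namely the median of the observed 1-degree-of-freedom χ2 association statistics divided by the median expected under the null hypothesis (about 0.455). This is a minimal illustration under that assumption and not the code used in the study; the function name and the input statistics are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
# A chi-square(1) variable is the square of a standard normal, so its
# median is the squared 75th percentile of the standard normal (~0.4549).
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation_factor(chi2_stats):
    """Return lambda: observed median chi-square over the null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


# Hypothetical 1-df chi-square statistics from presumably null SNPs.
example_stats = [0.02, 0.15, 0.31, 0.47, 0.52, 0.90, 1.40, 2.75, 3.10]
print(round(genomic_inflation_factor(example_stats), 2))
```

A value of λ close to 1 indicates little residual inflation of the test statistics, whereas values clearly above 1 suggest remaining stratification or other systematic bias.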
Only the Hispanic sample required PCA as covariates in the logistic regression model to remove the final source of confounding via admixture to obtain the above inflation factor.
To determine if STAT variants associate with SLE, we genotyped 59 SNPs that span the STAT1 and STAT4 genes in our subjects. Fifty-six of these SNPs passed quality control standards and were subsequently used for analyses. The SNPs were evaluated in multiple racial groups (Table 1). To address the population stratification and admixture effect, all outliers have been removed, and results have been corrected based on principal component analyses.
Childhood-onset SLE presents a unique subgroup of patients for genetic study because an earlier disease onset, a more severe disease course, a greater frequency of family history of SLE, and a lesser effect of sex hormones in disease development (34, 35) may imply involvement of different genetic factors relative to adult-onset disease. Therefore, we initially analyzed these two groups separately. We had a total of 769 samples from childhood-onset cases. As shown in Fig. 1, the p values for SNPs in the STAT4 gene are well correlated between adult-onset and childhood-onset cases, with an overall correlation coefficient r=0.84, justifying the joint analysis of the two groups, as presented in all following results. We detected significant associations (10−15<p<10−5) in the STAT4 gene. As shown in Table 2, the greatest significance was observed with rs10168266 (p=1.38×10−15) in the Europeans, with six other SNPs at p<10−9 (rs7568275: 4.26×10−15, rs7582694: 7.67×10−15, rs10181656: 1.16×10−14, rs3024886: 2.71×10−13, rs10174238: 3.30×10−13, rs3821236: 3.41×10−11). Furthermore, all significant SNPs observed in the European population were also strongly significant in the Asian-Korean population, with the strongest significance observed with rs10168266 (4.00×10−10) (Table 2). Several SNPs were also found to be associated less strongly (10−5<p<10−3) in the Hispanic and African populations (Supplementary Table 1 and Fig. 2). In genotype-based analyses, the best model of association for almost all of the SNPs was the additive model (Supplementary Table 1). Seventeen SNPs were genotyped in the STAT1 gene, but they produced only suggestive results (0.0005<p<0.05) (Fig. 2 and Supplementary Table 1).
The classical Bonferroni correction for multiple testing is both too strict and inappropriate in studies such as the present one because it assumes that each test is independent, whereas in actuality a complex and unknown mutual dependence is present among SNPs of the same gene. Therefore, for multiple test correction we calculated the false discovery rate (FDR) q values (32) (Table 2, and supplementary Table 1).
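To make the contrast with the Bonferroni correction concrete, the sketch below computes Benjamini-Hochberg FDR-adjusted p values in Python. Note that this is a simpler stand-in and not the q value procedure of reference (32) used in this study, which additionally estimates the proportion of true null hypotheses; the function name and example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p values for four SNPs.
p = [0.01, 0.04, 0.03, 0.005]
bonferroni = [min(1.0, x * len(p)) for x in p]
print(benjamini_hochberg(p))
print(bonferroni)
```

For every test the Benjamini-Hochberg adjusted value is no larger than the Bonferroni-adjusted one, which reflects the less conservative nature of FDR control.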
Haplotype analyses in different racial groups identified multiple significant haplotypes (Fig. 3 and Table 3). In Europeans in particular, three major significant haplotypes were detected spanning 73 Kb from the 3rd intron to exon 17 of the STAT4 gene (Fig. 3 and Table 3). These three risk haplotypes are: block 1 (13 Kb) AAAG, spanning from exon 17 to intron 14, with p=9.24×10−14; block 2 (18 kb) CATTTAAA, spanning from intron 14 to exon 4, with p=4.25×10−14; and block 3 (32 Kb) GGCGAGCG, located mostly in the 3rd intron of the STAT4 gene, with p=1.69×10−15. Parts of these three major European haplotypes were also significant, with the same sequence, in other ethnic backgrounds (Table 3). In particular, the eight-marker haplotype spanning 18 kb (block 2) was strongly significant in the Korean-Asian and Hispanic populations (Table 3). The frequency of this conserved haplotype was 39% in Korean-Asian, 23% in European and 40% in Hispanic patients (Table 3). Conditional analyses on this haplotype showed that SNP rs10168266 explained the whole association in this haplotype. In addition, in African Americans, part of this haplotype (11 kb) was also significant, with a frequency of 70% in lupus cases compared to 65% in controls (p=7.90×10−3) (Table 3). An eight-marker haplotype in intron 3 of STAT4 (block 3, GGCGAGCG) was also significantly associated in the Europeans (p=1.69×10−15) (Table 3). In this haplotype no single SNP could explain the whole association, mainly because of the high LD between the most significantly associated SNPs located in it (rs7568275, rs10181656, rs7582694, and rs10174238) (Fig. 3). In fact, conditional logistic regression showed that the GAGCG haplotype as a unit explained the association in this haplotype. Part of this haplotype was also significant in the Hispanic and Asian populations, but in African Americans a different haplotype in this block was significant (Table 3).
Because of the relatively high LD (D′=0.87) between the two haplotypes CATTTAAA and GGCGAGCG (blocks 2 and 3), an extended haplotype consisting of these two was reconstructed (CATTTAAAGGCGAGCG) and reevaluated in the European, Korean and Hispanic samples. Table 3 shows the frequency of this extended haplotype in cases and controls; it remained significant in multiple ethnicities (in Europeans, best p=4.32×10−14). Conditional analyses in this extended haplotype suggest that rs10168266 is the single SNP that best explains the entire omnibus result. Furthermore, when only the responsible variants or units in each haplotype were combined (i.e. SNPs rs10168266, rs7568275, rs10181656, rs7582694 and rs10174238), SNP rs10168266 still had an independent effect after controlling for all the others (p=0.009).
As expected, the block structure in the STAT4 region generally was less cohesive across the entire region in the African American samples than in the Korean-Asian or the European and therefore the extended haplotype block cannot be reconstructed in the African population (Fig. 3).
Finally, one additional haplotype that was less significantly associated in European (p=5.70×10−6) and Korean-Asian population (p=6.61×10−8) was located at 3′ end of the STAT4 gene (C-terminal domain). This haplotype is almost 25 Kb removed from the highly significant associated SNPs and manifests a weak LD (D′= 0.54) with the first block in Figure 3. Conditional analysis with the block 1 haplotype suggests that this association is unlikely to have an independent effect (p=0.63).
We performed stratification analyses by gender, age of onset, the 11 ACR criteria, and presence of autoantibodies (anti-Ro, anti-La, anti-RNP, anti-dsDNA, anti-Sm) for the SNPs that most likely explain the haplotypic association. After corrections for multiple comparisons, such analyses did not improve the significance of the results (p values) beyond what we have shown in Table 3. However, the presence of anti-Ro antibodies in Europeans did improve the odds ratio to 1.74 (1.36–2.21) for the best associated SNP (rs10168266), while in Korean-Asians early age of onset (less than 13 years old) improved the odds ratio for this SNP to 2.44 (1.66–3.60).
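For readers unfamiliar with how an odds ratio and its 95% confidence interval are reported, the Python sketch below derives both from a 2×2 table of allele counts using the Woolf (log-scale) approximation. This is an assumption made for illustration only: the text does not state which interval method was used, and the function name and allele counts are hypothetical rather than study data.

```python
import math


def odds_ratio_ci(case_risk, case_other, ctrl_risk, ctrl_other, z=1.959964):
    """Odds ratio and Woolf confidence interval from allele counts.

    case_risk / case_other : risk and non-risk allele counts in cases
    ctrl_risk / ctrl_other : risk and non-risk allele counts in controls
    z                      : normal quantile (1.959964 gives a 95% CI)
    """
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    # Standard error of ln(OR) under the Woolf approximation.
    se_log_or = math.sqrt(
        1 / case_risk + 1 / case_other + 1 / ctrl_risk + 1 / ctrl_other
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts, not taken from the study.
or_, lo, hi = odds_ratio_ci(300, 700, 200, 800)
print(f"OR={or_:.2f} ({lo:.2f}-{hi:.2f})")
```

An interval that excludes 1 corresponds to a nominally significant association at the 5% level; the interval is symmetric around the odds ratio on the log scale rather than on the natural scale.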
We have identified robust associations between SLE and multiple SNPs in the STAT4 (but not the STAT1) gene in a large study of SLE cases and controls from different racial backgrounds (best combined p=7.02×10−25). To our knowledge this is the first study that targets both the STAT4 and STAT1 genes with high-resolution SNP genotyping in an extensive case-control study that includes high-risk minority African-American, Hispanic, and Korean-Asian populations and a large cohort of childhood-onset SLE.
We found that the observed association with STAT1 is orders of magnitude weaker than that of STAT4 with respect to both p values and number of significant SNPs. This suggestive association with STAT1 most likely reflects LD with STAT4, since STAT1 and STAT4 are only 25 kb apart. Indeed, among Europeans we observed strong LD (D′=0.96) between rs10199181 in STAT1, which produced the best suggestive result in this gene (p=1.40×10−3), and rs3024896, the nearest marker at the 3′ end of the STAT4 gene (p=6.13×10−6). However, this suggestive result disappeared completely (p=0.60) upon conditional analyses. Although the genetic association between SLE and STAT1 cannot be formally excluded, our collection of almost 10,000 samples, which has 99% power to find effects with odds ratio (OR)=1.3 at p=10−8, and 84% power to find effects with OR=1.2 at p=10−6 for SNPs with a minor allele frequency of 0.3 and D′=1, makes it highly unlikely that STAT1 is a susceptibility gene for SLE.
Remmers et al. (23) have demonstrated association of one STAT4 SNP (rs7574865) with SLE in Europeans. Although we did not directly type rs7574865 in our samples, the HapMap CEU data indicate that rs7574865 is a perfect proxy of rs7568275 in European Americans (r2=1). The SNP rs7568275 was one of the top associated SNPs in our data, with Fisher combined p=1.08×10−22 (Table 2 and Supplementary Table 1). Imputation based on the HapMap CEU data supports this proxy association for SNP rs7574865 in Europeans, with p=6.41×10−15 in our data. Both of these SNPs are located in the third intron of STAT4.
Lee et al. (36) replicated the association of STAT4 with RA in European and Korean patients. Three SNPs (rs10181656, rs13017460, and rs1517352) that were significantly associated with rheumatoid arthritis in Korean patients were also significant in both the European and Asian cases among our SLE subjects, with the same associated alleles (Table 2 and Supplementary Table 1). In addition, the minor allele frequencies observed in our study and in the studies by Remmers et al. and Lee et al. are similar.
Also, Korman et al. (37) reported an association with rs7574865 and primary Sjogren’s syndrome (PSS) in a study of 124 Caucasian PSS subjects and 1143 controls (p=0.01). PSS and SLE share overlapping autoantibody profiles (such as anti-Ro) and B lymphocyte hyperactivity, supporting the notion that related autoimmune diseases share common risk variants in STAT4.
Using a relatively dense map of SNPs within the STAT4 gene enabled us to construct haplotypes in the various racial groups. While we identified and strongly confirmed multiple SNP associations in the 3rd intron of the STAT4 gene, dense SNP genotyping also allowed us to detect two additional haplotypes adjacent to this intron, spanning from exon 4 to exon 17 of this gene. Most importantly, one of these haplotypes, an eight-marker haplotype spanning 18 kb from exon 4 to exon 14 of the STAT4 gene, was significant in multiple ethnicities. Conditional analyses on this haplotype suggest that rs10168266 can explain the whole association in this haplotype and, therefore, the genotyping data for this SNP can predict the risk or protective haplotype. Indeed, this SNP produced the best results in the European, Korean-Asian, and Hispanic samples, with a combined Fisher p value of 1.12×10−24 in our study (Table 2 and Supplementary Table 1). The best model of association for this SNP (and most other highly significant markers) was an additive model (European p=7.80×10−16) (Supplementary Table 1). This intronic SNP is located between exons 5 and 6 of the STAT4 gene and is 28 kb away from the published rs7574865 in the 3rd intron, with an estimated LD=0.90 and r2=0.62 between them in the European population. However, as mentioned previously, they are located on different but adjacent haplotypes (block 2 and block 3, respectively) (Figure 3).
Functionally, STAT4 is the main transcriptional regulatory molecule for IL-12 and, as such, is pivotal to the development of a fully functioning Th1 immune response (38). Polarization of the immune response to Th1 vs Th2 has profound in vivo consequences. In general, Th1-type immune responses are characteristic of cell-mediated immunity, while Th2-type responses are associated with help for B-cell antibody production (humoral immunity) and allergic phenomena (39). Indeed, STAT4-deficient mice are protected from the effects of Th1-cell mediated autoimmune diseases. In models of experimental allergic encephalomyelitis (EAE) (40), experimental arthritis (41), colitis (42), myocarditis (43), and diabetes in the NOD mouse (44), STAT4-deficient mice display less disease, decreased parameters of inflammation, and reduced secretion of IFNγ. Although IFNγ deficiency does mimic STAT4 deficiency in an arthritis model (41), IFNγ-deficient mice are not protected from EAE (45), myocarditis (43), or colitis (42). Thus, although IFNγ is an important STAT4-induced immune mediator, STAT4 regulates other genes independent of IFNγ that are crucial for the development of such diseases. By contrast, STAT4-deficient mice are not protected from autoimmune diseases that are not Th1-mediated, such as myasthenia gravis or Graves’ disease (46, 47). Moreover, STAT4 deficiency caused more severe SLE-like disease in NZM 2328 and NZM 2410 mice than in corresponding wild-type control mice (48, 49).
Since other cytokines (IL-23, IFNα) in addition to IL-12, can activate STAT4 (20), the phenotype of STAT4-deficient mice, in the different mouse models, may actually be a composite effect of defects in IL-12, IL-23, and IFNα signaling, in addition to other yet unidentified cytokine pathways (20). It should also be stressed that the different effects of STAT4 deficiency in animal models of arthritis compared to those in SLE point to the real possibility that the causal STAT4 polymorphisms in SLE and RA differ from each other.
A relatively small replication study in Swedish SLE patients suggests a significant correlation between the European risk allele and production of anti-dsDNA autoantibodies (50), while a second study in North Americans of European ancestry suggests a strong correlation between SNP rs7574865 and anti-dsDNA Abs and an even stronger association with nephritis (51). These subphenotype associations should be interpreted with caution, given that the same SNP alleles are associated with RA in patients who do not have anti-dsDNA autoAbs and do not develop nephritis. In this regard, the in vivo findings that STAT4-deficient lupus mice develop accelerated nephritis despite decreased levels of anti-dsDNA autoAbs (48, 49) might be highly relevant. Subphenotype analyses of our data, in a much larger collection of SLE subjects than these previous studies, do not support a stronger association between STAT4 risk alleles and autoAb levels or presence of nephritis. For example, the result for rs10168266, which produces the best association with nephritis in our European cases (OR=1.59), was not statistically significantly different from the result obtained with the same SNP in Europeans without nephritis (OR=1.46).
Based on the nature of the SNPs implicated in the present study, it is premature to suggest a molecular mechanism that would explain the gene’s association with SLE. Although we used a relatively dense SNP map, the possibility remains that other polymorphisms are responsible for the exact disease mechanism; resolving this must await a complete resequencing of the STAT4 gene in SLE patients. Nevertheless, our findings here should significantly advance our understanding and establish new key steps in the pathogenesis of the disease. Furthermore, the unambiguous establishment of STAT4 as a susceptibility gene for SLE provides justification for the development of therapeutic approaches targeting this molecule or other molecules within its biochemical pathway.
The cooperation of patients and normal control individuals involved in this study is gratefully acknowledged. This work was supported in part by NIH grant RO1AR445650 and ALR grant 52104 to COJ and by the USC FCE. At the Oklahoma Medical Research Foundation (OMRF), the work was supported by the NIH (AR42460, RR015577, AI31584, AR12253, AR48940, DE015223, RR020143, AI062629, AI24717, AI07633, and AR62277), Lupus Foundation of America, the Alliance for Lupus Research, and the U.S. Department of Veterans Affairs. At the University of Alabama at Birmingham, the work was supported by NIH grants P01-AR49084, P60-AR48095, T32-AR07450, M01-RR00032. Members of the PROFILE Study Group include GS Alarcon, E Brown, JC Edberg, BJ Fessler, RP Kimberly, G McGwin Jr, M Petri, R Ramsey-Goldman, J Reveille, LM Vila.