Rheumatoid arthritis is a chronic inflammatory disease with a substantial genetic component. Susceptibility to disease has been linked with a region on chromosome 2q.
We tested single-nucleotide polymorphisms (SNPs) in and around 13 candidate genes within the previously linked chromosome 2q region for association with rheumatoid arthritis. We then performed fine mapping of the STAT1-STAT4 region in a total of 1620 case patients with established rheumatoid arthritis and 2635 controls, all from North America. Implicated SNPs were further tested in an independent case-control series of 1529 patients with early rheumatoid arthritis and 881 controls, all from Sweden, and in a total of 1039 case patients and 1248 controls from three series of patients with systemic lupus erythematosus.
A SNP haplotype in the third intron of STAT4 was associated with susceptibility to both rheumatoid arthritis and systemic lupus erythematosus. The minor alleles of the haplotype-defining SNPs were present in 27% of chromosomes of patients with established rheumatoid arthritis, as compared with 22% of those of controls (for the SNP rs7574865, P = 2.81×10⁻⁷; odds ratio for having the risk allele in chromosomes of patients vs. those of controls, 1.32). The association was replicated in Swedish patients with recent-onset rheumatoid arthritis (P = 0.02) and matched controls. The haplotype marked by rs7574865 was strongly associated with lupus, being present on 31% of chromosomes of case patients and 22% of those of controls (P = 1.87×10⁻⁹; odds ratio for having the risk allele in chromosomes of patients vs. those of controls, 1.55). Homozygosity of the risk allele, as compared with absence of the allele, was associated with a more than doubled risk for lupus and a 60% increased risk for rheumatoid arthritis.
A haplotype of STAT4 is associated with increased risk for both rheumatoid arthritis and systemic lupus erythematosus, suggesting a shared pathway for these illnesses.
Rheumatoid arthritis is the most common cause of adult inflammatory arthritis and is associated with considerable disability and early mortality.1 Studies of twins clearly show a genetic contribution to disease susceptibility,2 and the siblings of patients with seropositive, erosive rheumatoid arthritis have an estimated risk of developing the disease of between 5 and 10 times that of the general population.3 The highly polymorphic HLA region is a major contributor to genetic risk of rheumatoid arthritis.4 Several other genes associated with more modest risks have recently been identified, including the Arg620→Trp variant of the intracellular phosphatase gene PTPN22.5,6 However, the definitive identification of additional risk genes outside the HLA region has been challenging.
We recently described a linkage peak with nearly genomewide significance on the long (q) arm of chromosome 2 in 642 families of European ancestry7 collected by the North American Rheumatoid Arthritis Consortium (NARAC).8 The region encompasses more than 50 million base pairs (Mb) of genomic DNA and has also been implicated in previous meta-analyses of linkage-study data.9,10 In the current study, we undertook a large case-control disease-association analysis of 13 selected candidate genes within the chromosome 2q linkage region.
The NARAC case-control series included one affected member (the proband, if a DNA sample from the proband was available) of each family of European descent from the NARAC collection of affected sibling pairs7,8 and unrelated controls of self-identified European ancestry from the New York Cancer Project11 (www.amdec.org/amdec_initiatives/nycp.html). The rheumatoid arthritis replication series consisted of singleton case patients with rheumatoid arthritis who were positive for anti-cyclic citrullinated peptide antibody, obtained through the Wichita Rheumatic Disease Data Bank,12 the National Inception Cohort of Rheumatoid Arthritis Patients,13 and the Study of New Onset Rheumatoid Arthritis,14 and additional unrelated controls of European descent obtained from the New York Cancer Project. The Swedish case-control series included case patients and controls from the Epidemiological Investigation of Rheumatoid Arthritis Swedish inception cohort.15,16
Case patients with lupus were obtained from three sources: the University of California at San Francisco (UCSF) patients were participants in the UCSF Lupus Genetics Project17 and were recruited from UCSF Arthritis Clinics or from private rheumatology practices in northern California or by means of nationwide outreach. Medical records were reviewed to confirm that subjects met the criteria of the American College of Rheumatology (ACR) for lupus.18 The Autoimmune Biomarkers Collaborative Network (ABCoN) patients were recruited from the Hopkins Lupus Cohort19 under the auspices of ABCoN20 and also met the criteria of the ACR for lupus. Data from the Hopkins historical database were used to determine fulfillment of ACR criteria. The Multiple Autoimmune Diseases Genetics Consortium (MADGC) patients were part of the MADGC collection.21 The diagnosis of lupus based on ACR criteria was confirmed either by the treating physician or by the review of medical records. These three series included only case patients of self-described European descent from the aforementioned collections. The controls were additional subjects of self-reported European ancestry from the New York Cancer Project.
The numbers of case patients and controls in the three rheumatoid arthritis and the three lupus case-control series are listed in Table 1. The institutional review boards of all investigative institutions approved these studies, and all participants provided written informed consent.
We selected candidate genes from within a linkage region of 52 Mb on chromosome 2q (the 2-LOD support interval) that we defined previously.7 For each selected gene, we initially used HapMap Phase I data to identify tag single-nucleotide polymorphisms (SNPs) that captured the majority of the then-known common SNP variation (i.e., that present on ≥5% of chromosomes) in the gene (defined as the sequence ranging from 10 kb upstream of the coding sequence to 10 kb downstream) using an r² threshold of 0.8 or greater. (The r² correlation coefficient is a measure of linkage disequilibrium determined by the allelic correlation between SNPs; if r² = 1, the markers are perfect predictors of one another.) Some SNPs were not genotyped directly; rather, their imputed genotypes were inferred from multimarker combinations22 (see Fig. S1 in the Supplementary Appendix, available with the full text of this article at www.nejm.org).
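The r² measure described above can be illustrated with a minimal Python sketch. The function name and the example frequencies are hypothetical and chosen only for illustration; this is the standard textbook definition of r² from haplotype and allele frequencies, not necessarily the exact computation performed by the tagging software used in the study.

```python
def ld_r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a:  frequency of allele A at SNP 1
    p_b:  frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)  # product of the allelic variances
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical frequencies, for illustration only.
perfect = ld_r_squared(p_ab=0.3, p_a=0.3, p_b=0.3)       # alleles always occur together
independent = ld_r_squared(p_ab=0.06, p_a=0.3, p_b=0.2)  # p_ab equals p_a * p_b
print(f"perfect LD: r^2 = {perfect:.2f}; no LD: r^2 = {independent:.2f}")
```

Under this definition, a tag SNP with r² of 0.8 or greater to an untyped SNP serves as a close proxy for it, which is why such a threshold is used when choosing tags.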
For fine mapping of the STAT1-STAT4 region, we used HapMap Phase II data to select additional SNPs that captured the majority of the known common variation in the region and that had a pairwise r² of more than 0.8. We also included all nonsynonymous coding SNPs reported in dbSNP, the SNP database of the National Center for Biotechnology Information; all SNPs within motifs conserved across species that were identified with the use of the University of California at Santa Cruz (UCSC) genome browser phastCons (conserved-elements track) (http://genome.ucsc.edu); and SNPs disrupting putative transcription-factor binding sites, identified with the use of the UCSC human-mouse-rat conserved transcription-factor binding sites track (http://genome.ucsc.edu).
DNA samples were obtained from all subjects. The samples were genotyped by means of one of two methods: a custom, highly multiplexed, bead-based array method, GoldenGate Genotyping (Illumina), and a multiplexed primer-extension method (Sequenom) (for details, see the Supplementary Appendix).
All SNPs were tested for significant deviation from Hardy-Weinberg equilibrium in controls. Those with P values of less than 0.005 were removed from the analysis. We also removed all SNPs with a minor allele frequency of less than 0.01, because of the reduced power to detect associations for rare SNPs. The remaining SNPs (including the imputed SNPs) were analyzed for an association with disease by means of comparison of the minor allele frequency in case patients and controls, with significance determined by means of a chi-square test. Odds ratios, and their 95% confidence intervals, for having the risk allele in chromosomes of case patients as compared with those of controls were also determined for selected SNPs.23 When combining data from different case-control series, we used a Mantel-Haenszel test (SPSS software, version 12.0.0; www.spss.com) to summarize the stratum-specific estimates.
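The quality-control filters and the allelic test described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' pipeline: all function names and counts are hypothetical, the Hardy-Weinberg test is assumed to be a 1-degree-of-freedom chi-square goodness-of-fit test, the minor allele frequency filter is assumed to be applied to controls, and the allelic test is assumed to be an uncorrected chi-square test on the 2×2 table of allele counts.

```python
import math


def chi2_1df_p(chi2: float) -> float:
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def hwe_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square test for deviation from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2_1df_p(chi2)


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    p = (2 * n_aa + n_ab) / (2 * (n_aa + n_ab + n_bb))
    return min(p, 1 - p)


def passes_qc(control_genotypes, hwe_cutoff=0.005, maf_cutoff=0.01) -> bool:
    """Keep a SNP only if controls show no strong HWE deviation and it is not rare."""
    return (hwe_p(*control_genotypes) >= hwe_cutoff
            and minor_allele_frequency(*control_genotypes) >= maf_cutoff)


def allelic_chi2(case_minor, case_major, ctrl_minor, ctrl_major):
    """Uncorrected chi-square test on the 2x2 table of allele counts."""
    a, b, c, d = case_minor, case_major, ctrl_minor, ctrl_major
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, chi2_1df_p(chi2)


# Hypothetical genotype counts in controls (aa, ab, bb), for illustration only.
print(passes_qc((50, 340, 610)))   # close to Hardy-Weinberg proportions
print(passes_qc((200, 40, 760)))   # large heterozygote deficit

# Hypothetical allele counts: 500 cases and 500 controls, 2 chromosomes each.
chi2, p = allelic_chi2(270, 730, 220, 780)
print(f"chi2 = {chi2:.2f}, P = {p:.4f}")
```

Because the allelic test counts chromosomes rather than people, each subject contributes two observations; this is one reason the Hardy-Weinberg check on controls is applied first.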
Linkage-disequilibrium patterns in the STAT1-STAT4 region were determined with the use of Haploview software, version 3.32.24 The genotypes of 768 SNPs informative about European ancestry25 were used to adjust for the possibility of unmatched population structure in the case patients and the controls; STRAT software was used for structured association26 and EIGENSTRAT software for the correction of association-study results according to a method based on principal-components analysis.27
We examined the 2-LOD support interval (Fig. 1A) of the previously identified7 linkage peak on chromosome 2 for the presence of genes that might influence rheumatoid arthritis. We evaluated 13 candidate genes (Fig. 1B; for further description, see Table S1 in the Supplementary Appendix). Association results for 82 tag or imputed SNPs within the selected candidate genes from an initial set of 525 independent case patients with rheumatoid arthritis and 1165 unrelated controls are shown in Figure 1B. In addition to a known association with a SNP in CTLA428 (rs3087243, P = 0.008), we found an association with an unlinked SNP (located 15 Mb away; r² = 0) in STAT4 (rs7574865, P = 0.002).
The most significantly (P = 0.002) associated SNP in the region, rs7574865, is in a linkage-disequilibrium block that extends from the middle of the STAT4 locus to the 3′ end of the gene (Fig. 2). There was, however, some evidence of longer-range disequilibrium that extended into STAT1 from the 3′ end of STAT4. We therefore included both genes in the fine mapping and in further analyses.
To map the location of the association with rheumatoid arthritis, we successfully genotyped the case patients and controls in the NARAC series for 63 SNPs located within the 209-kb STAT1-STAT4 region (average density, one SNP per 3.1 kb). These 63 SNPs captured 87% of the common variation (defined as a minor allele frequency of ≥0.05) in the HapMap Phase II data in the region, with an r² value of more than 0.8. Four SNPs located within the large third intron of STAT4 had associations with rheumatoid arthritis with P values of less than 0.001. The most significant P value, 8.29×10⁻⁵, was found for rs7574865 (Table 2). The four disease-associated SNPs were in strong linkage disequilibrium (r²>0.97), and all had a minor allele frequency of 0.28 in the NARAC case patients with rheumatoid arthritis, as compared with 0.22 in the unrelated controls. Results for the complete set of 63 SNPs are given in Table S2 in the Supplementary Appendix.
To confirm the associations found in the STAT1-STAT4 region, we genotyped subjects in the rheumatoid arthritis replication series for the same 63 SNPs. Among the case patients, we genotyped only those who were positive for anti-cyclic citrullinated peptide antibody, to minimize disease heterogeneity in this singleton case series. Four variants within intron 3 of STAT4, the same four identified in our initial findings, were strongly associated with rheumatoid arthritis (e.g., rs7574865, P = 6.26×10⁻⁴) (Table 2). The complete results for the rheumatoid arthritis replication series are listed in Table S3 in the Supplementary Appendix.
We also performed analyses of the 63 SNPs in the combined NARAC and rheumatoid arthritis replication series (Fig. 3). In a combined Mantel-Haenszel analysis, the SNP most strongly associated with rheumatoid arthritis in the NARAC series, rs7574865, had a minor allele frequency of 0.27 in case patients and 0.22 in controls (P = 2.81×10⁻⁷; odds ratio for having the risk allele in chromosomes of patients vs. those of controls, 1.32; 95% confidence interval [CI], 1.19 to 1.46).
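A Mantel-Haenszel analysis of this kind pools stratum-specific 2×2 tables of allele counts rather than simply adding the counts together. The Python sketch below shows the standard pooled odds-ratio estimator and the Cochran-Mantel-Haenszel chi-square statistic (without continuity correction). The function name and the allele counts are hypothetical and are not taken from the study, which used SPSS for this step.

```python
import math


def mantel_haenszel(strata):
    """Pool 2x2 allele-count tables across case-control series.

    Each stratum is (a, b, c, d):
      a = risk alleles in cases,    b = other alleles in cases,
      c = risk alleles in controls, d = other alleles in controls.
    Returns (pooled odds ratio, CMH chi-square, P value with 1 df).
    """
    num = den = 0.0
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    odds_ratio = num / den
    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 degree of freedom
    return odds_ratio, chi2, p_value


# Hypothetical allele counts for two series, for illustration only.
series = [
    (270, 730, 220, 780),
    (140, 360, 230, 770),
]
pooled_or, cmh_chi2, cmh_p = mantel_haenszel(series)
print(f"pooled OR = {pooled_or:.2f}, chi2 = {cmh_chi2:.2f}, P = {cmh_p:.4f}")
```

Stratifying by series keeps differences in case-to-control ratio or baseline allele frequency between series from distorting the combined estimate.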
We genotyped the most significantly associated SNP from the NARAC case-control series, rs7574865, in 1529 case patients with recent-onset rheumatoid arthritis and in 881 controls from the Swedish Epidemiological Investigation of Rheumatoid Arthritis series. In this independent series, the minor allele frequency of rs7574865 was significantly greater in the case patients than in the controls (P = 0.02) (Table 2). The minor allele frequency for rs7574865 was lower in the Swedish patients with early rheumatoid arthritis (0.25) than in the North American patients with established rheumatoid arthritis (0.27), whereas the frequency in the controls was the same in both series (0.22).
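As a rough consistency check, a crude allelic odds ratio can be computed directly from the rounded minor allele frequencies quoted above. The sketch below is only a back-of-the-envelope calculation: the function name is hypothetical, the inputs are rounded to two decimals, and the result is unstratified, so it will not exactly match odds ratios estimated from the actual allele counts.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Odds of carrying the minor allele on a case chromosome vs. a control chromosome."""
    return (freq_cases / (1 - freq_cases)) / (freq_controls / (1 - freq_controls))


# Rounded minor allele frequencies of rs7574865 as reported in the text.
north_american = allelic_odds_ratio(0.27, 0.22)  # established rheumatoid arthritis
swedish = allelic_odds_ratio(0.25, 0.22)         # early rheumatoid arthritis

print(f"North American series: crude OR ~ {north_american:.2f}")
print(f"Swedish series:        crude OR ~ {swedish:.2f}")
```

The North American value is close to the Mantel-Haenszel estimate of 1.32 reported for the combined series, and the smaller Swedish value reflects the lower case allele frequency in that series.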
A meta-analysis of the three independent case-control series for rheumatoid arthritis yielded strong evidence of an association of the minor allele of rs7574865 with disease susceptibility (P = 4.64×10⁻⁸). The odds ratio for having the risk allele in chromosomes of case patients as compared with those of controls was 1.27 (Table 2). Genotypic odds ratios for patients as compared with controls were 1.61 (95% CI, 1.28 to 2.03) for homozygotes and 1.27 (95% CI, 1.14 to 1.41) for heterozygotes.
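Genotypic odds ratios compare carriers of two copies or one copy of the risk allele with subjects who carry none. The Python sketch below shows one common way to compute them from genotype counts, with a Woolf (log-based) 95% confidence interval. The choice of the Woolf interval is an assumption made for illustration, as are the function names and all counts; the study's own estimates came from a meta-analysis of three series.

```python
import math


def odds_ratio_ci(exp_cases, ref_cases, exp_ctrls, ref_ctrls, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval."""
    odds_ratio = (exp_cases * ref_ctrls) / (ref_cases * exp_ctrls)
    se = math.sqrt(1 / exp_cases + 1 / ref_cases + 1 / exp_ctrls + 1 / ref_ctrls)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def genotypic_odds_ratios(cases, controls):
    """cases and controls are (risk homozygotes, heterozygotes, non-carriers)."""
    return {
        "homozygote": odds_ratio_ci(cases[0], cases[2], controls[0], controls[2]),
        "heterozygote": odds_ratio_ci(cases[1], cases[2], controls[1], controls[2]),
    }


# Hypothetical genotype counts, for illustration only.
results = genotypic_odds_ratios(cases=(80, 390, 530), controls=(50, 340, 610))
for genotype, (odds_ratio, lower, upper) in results.items():
    print(f"{genotype}: OR = {odds_ratio:.2f} (95% CI, {lower:.2f} to {upper:.2f})")
```

Both comparisons share the same reference group (non-carriers), so the two odds ratios are directly comparable.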
In the NARAC patients with rheumatoid arthritis, of whom 81% were positive for anti-cyclic citrullinated peptide antibody, the rs7574865 minor allele frequency did not differ significantly (P>0.05) between the subgroup that was positive for the antibody (0.28) and the subgroup that was negative for the antibody (0.27). Logistic-regression analysis after accounting for the rs7574865 genotype in the combined NARAC and rheumatoid arthritis replication series showed that this one SNP could explain the signal across the STAT1-STAT4 region (data not shown). Furthermore, after accounting for the CTLA4 SNP associated with rheumatoid arthritis (rs3087243), the result for the STAT4 rs7574865 remained significant. Thus, we concluded that the STAT4 SNP, or a variant in tight linkage disequilibrium with it, confers increased susceptibility to the development of rheumatoid arthritis.
To address the possibility that case-control analyses may yield spurious associations due to undetected differences in population admixture or population substructure between case patients and controls, we genotyped the rheumatoid arthritis replication series for 768 SNPs informative about European ancestry, located throughout the genome. There was still strong evidence of association according to a structured association analysis (with STRAT software) (P = 5×10⁻⁵) and an analysis using EIGENSTRAT software with correction for the four most significant principal components (P = 2×10⁻⁵). Furthermore, when the genotypes of the SNPs informative about European ancestry were used to distinguish controls of predominantly northern European ancestry from those of predominantly southern European ancestry, we found an rs7574865 minor allele frequency of 0.22 in both groups, indicating that this allele frequency does not vary significantly between these subgroups (P>0.05).
Since STAT4 lies within linkage peaks that have also been reported in patients with lupus,29-31 three lupus series of case and control subjects of European ancestry were also genotyped. We found that the minor allele frequency for rs7574865 was significantly increased in all three series among patients (0.29 to 0.31) as compared with controls (0.22 to 0.23) (P = 9.56×10⁻⁶ to P = 0.03) (Table 3). In a meta-analysis of the three series, we found strong evidence of association of the rs7574865 minor allele with lupus (P = 1.87×10⁻⁹). The odds ratio for having the allele associated with lupus in chromosomes of patients as compared with those of controls was 1.55 (Table 3). Genotypic odds ratios were 2.41 (95% CI, 1.66 to 3.49) for homozygotes and 1.56 (95% CI, 1.30 to 1.88) for heterozygotes.
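One way to read the reported genotypic odds ratios is to compare them with what a multiplicative (log-additive) model would predict, under which the homozygote odds ratio equals the square of the heterozygote odds ratio. The short Python sketch below does this arithmetic with the point estimates quoted in the text for both diseases. It is an observation about the published estimates, not a model the authors state they fitted, and the variable and function names are hypothetical.

```python
# Genotypic odds ratios reported in the text (point estimates only).
reported = {
    "rheumatoid arthritis": {"heterozygote": 1.27, "homozygote": 1.61},
    "lupus": {"heterozygote": 1.56, "homozygote": 2.41},
}


def multiplicative_homozygote_or(heterozygote_or: float) -> float:
    """Homozygote odds ratio implied by a multiplicative (per-allele) model."""
    return heterozygote_or ** 2


expected = {
    disease: multiplicative_homozygote_or(values["heterozygote"])
    for disease, values in reported.items()
}

for disease, values in reported.items():
    print(f"{disease}: reported homozygote OR = {values['homozygote']:.2f}, "
          f"multiplicative expectation = {expected[disease]:.2f}")
```

For both diseases the squared heterozygote estimate lies close to the reported homozygote estimate and well inside its 95% confidence interval, which is compatible with each copy of the risk allele contributing a roughly constant multiplicative increase in odds.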
We have shown that a variant allele of STAT4 confers an increased risk for both rheumatoid arthritis and systemic lupus erythematosus. This finding provides support for the evolving concept that common risk genes underlie multiple autoimmune disorders and suggests the involvement of common pathways of pathogenesis among these different diseases.32
STAT4 encodes a transcription factor that transmits signals induced by several key cytokines, including interleukin-12 and type 1 interferons, as well as interleukin-23.33 STAT4 is a latent cytosolic factor that, after activation by cytokines, is phosphorylated and accumulates in the nucleus. Activated STAT4 stimulates transcription of specific genes including interferon-γ, a key indicator of T-cell differentiation into type 1 helper T (Th1) cells. Therefore, STAT4-dependent signaling by interleukin-12 receptors plays a critical role in the development of a Th1-type T-cell response.34,35
STAT4 has also been implicated in the optimal differentiation of a newly defined CD4+ T-cell lineage, designated Th17 cells. Dependent in part on the activity of interleukin-23, a cytokine related to interleukin-12,36 proinflammatory Th17 cells can play an important, if not predominant, role in chronic inflammatory disorders.37 Indeed, experiments that have targeted Th1 cells in models of autoimmune disease have often unwittingly targeted the Th17 lineage, because the key cytokines of the two lineages, interleukin-12 and interleukin-23, and their receptors share common subunits.33
STAT4, a central player in both lineages, has proved to play a crucial role in experimental models of autoimmunity. STAT4-deficient mice are generally resistant to models of autoimmune disease, including arthritis.38 Furthermore, specific targeting of STAT4 by inhibitory oligodeoxynucleotides or antisense oligonucleotides can ameliorate disease in arthritis models,39,40 suggesting the utility of STAT4 as a therapeutic target.
Recent genetic data have shown that interleukin-23-receptor variants are associated with susceptibility to both Crohn's disease41 and psoriasis42; interleukin-12β polymorphisms have also been associated with a risk of psoriasis.42,43 Since both interleukin-12 and interleukin-23 act through STAT4, these data imply that a complex pattern of alterations in related pathways can lead to various forms of autoimmunity and chronic inflammation. STAT4 is also required for signaling in mature dendritic cells in response to type 1 interferons.44,45 Thus, there may be multiple mechanisms by which genetic variation in STAT4 can influence immune responses and predispose persons to autoimmunity. Indeed, in a murine model of lupus, STAT4 deficiency is associated with accelerated nephritis and increased mortality,46 in contrast to the protective effects in arthritis models.38
Several family-based genome scans have revealed linkage of the chromosome 2q region with lupus, as well as with rheumatoid arthritis.29-31 We therefore extended our association studies to three independent lupus case-control series and found strong evidence that the STAT4 variant associated with rheumatoid arthritis was also associated with lupus. The identification of STAT4 as a common predisposition gene for both lupus and rheumatoid arthritis is similar to reported findings of broad associations of the intracellular phosphatase PTPN22 with these and other autoimmune diseases,6 such as type 1 diabetes mellitus,47 autoimmune thyroid disease,21 and myasthenia gravis.48 Clearly, the role of STAT4 in these other disorders should be examined. In addition, the influence of allelic variation on subgroups, manifestations, and outcomes of disease may shed further light on disease mechanisms. The majority of the North American patients with rheumatoid arthritis in our study had long-standing erosive disease, whereas the Swedish patients generally had disease of more recent onset. This may explain the somewhat weaker STAT4 association in the Swedish series. Given that lupus is a highly heterogeneous disorder, it will be important to study STAT4 polymorphisms in clinical subgroups, and in view of the knockout mouse data, this is particularly true with regard to the development of nephritis.
Genetic case-control studies such as ours must be carried out with careful attention to the possibility of false positive results, as has been emphasized elsewhere.49 First, multiple replication is essential for certainty about the basic findings. In addition, studies using unrelated case-control series run the risk of yielding spurious associations if there are unidentified differences in population structure between case patients and controls.25,50 To address this possibility, we genotyped the 1013 case patients and the 1326 controls in the rheumatoid arthritis replication series for 768 ancestry-informative SNPs that were selected for reflecting differences in allele frequency among European subgroups25 and used two methods to control for such stratification. These methods did not reduce the significance of the association with disease. The association was probably robust to this correction because the allele frequency of the disease-associated STAT4 SNP does not vary among European subgroups.
Association studies cannot distinguish among multiple variants in strong linkage disequilibrium with one another, and a haplotype containing several variants could be required to confer a biologic effect. SNPs known to be in strong linkage disequilibrium with rs7574865, on the basis of HapMap CEU data, are listed in Table S4 in the Supplementary Appendix. All these variants are located in the third intron of the STAT4 gene, suggesting that splice variation or regulatory effects may explain the gene's association with disease. Studies are under way to investigate these possibilities in various types of cells, including T cells, monocytes, macrophages, and dendritic cells. In addition, a complete resequencing of the STAT4 gene may yet reveal additional risk alleles. Nevertheless, even in the absence of precise molecular mechanisms, the discovery of these new disease associations with STAT4 should generate a variety of new hypotheses about the pathogenesis of autoimmunity.
Note added in proof: The association of STAT4 with rheumatoid arthritis has now been replicated in a Korean population.51
Supported by grants from the National Institutes of Health (N01-AR-1-2256, N01-AI95386, R01 AR44422, and N01-AR-2-2263, to Dr. Gregersen; R01 AR050267, to Dr. Seldin; and R01 AR052300 and R01 AR-44804, to Dr. Criswell), the Rosalind Russell Medical Research Center for Arthritis and the Kirkland Scholar Award (to Dr. Criswell), the Arthritis Foundation, the Boas Family, the Eileen Ludwig Greenland Center for Rheumatoid Arthritis, and the Intramural Research Program of the National Institute of Arthritis and Musculoskeletal and Skin Diseases. The work was carried out in part at the General Clinical Research Center, Moffitt Hospital, University of California at San Francisco (UCSF), and at the General Clinical Research Center, Feinstein Institute for Medical Research (FIMR), with grants from the National Center for Research Resources, Public Health Service (5-M01-RR-00079, to UCSF, and M01 RR018535).
Dr. Plenge reports receiving consulting fees from Biogen Idec and lecture fees from Genentech. Drs. Hom and Behrens report being employees of Genentech; and Dr. Carulli, an employee of Biogen Idec. Dr. Criswell reports receiving consulting fees from Celera Diagnostics. Dr. Gregersen reports serving on the Abbott Scholar Award Advisory Committee and receiving honoraria from Biogen Idec, Genentech, and Roche Pharmaceuticals. No other potential conflict of interest relevant to this article was reported.
We thank the large number of investigators, practicing physicians, and research nurses who contributed data from their patients to the various collections used in our studies, including Dr. Michelle Petri for the Autoimmune Biomarkers Collaborative Network (ABCoN); Drs. Elena Massarotti, Claire Bombardier, and Michael Weisman for the Study of New Onset Rheumatoid Arthritis (SONORA); the Multiple Autoimmune Diseases Genetics Consortium (MADGC); Marlena Kern, R.N., for the North American Rheumatoid Arthritis Consortium (NARAC); and Dr. Frederick Wolfe for the National Data Bank for Rheumatic Diseases, Wichita, Kansas, and the National Inception Cohort of Rheumatoid Arthritis patients. We also thank Dr. John O'Shea for thoughtful comments on an earlier version of this manuscript.
Drs. Remmers and Plenge contributed equally to this article.