|Home | About | Journals | Submit | Contact Us | Français|
CIA coordinated the design of the study, carried out and supervised all of the statistical analyses, and contributed to the writing of the manuscript.
ATL performed GWAS on all U.S. samples and organized samples for distribution to carry out replication studies.
EL carried out statistical analysis
EFM contributed to study design, carried out replication genotyping and participated in preparation of the manuscript.
DLK contributed to study design and participated in review of the manuscript
MFS contributed to study design and data analysis, and participated in manuscript preparation
LAC contributed to study design, contributed patients for study, and participated in manuscript preparation
RMP contributed to study design and participated in manuscript preparation
VMH, TM, TS, SLB, LWM contribute patient populations for study and contributed to manuscript preparation.
GX carried out genotyping and sample preparation of Canadian samples
ABB contributed samples for study, carried out replication genotyping and participated in manuscript preparation.
KAS participated in study design and organization, supervised all genotyping of Canadian samples and participated in preparation of the manuscript.
A genome-wide association study of rheumatoid arthritis in 2418 cases and 4504 controls from North America identified an association at the REL locus, encoding c-Rel, on chromosome 2p13 (rs13031237, p=6.01 × 10-10). Replication in independent case-control datasets comprising 2604 cases and 2882 controls confirmed this association, yielding an allelic OR=1.25 (95%CI 1.177-1.318, p = 3.08 × 10-14) for marker rs13031237 and an allelic OR= 1.21 (95%CI 1.150-1.282, p = 2.60 × 10-11) for marker rs13017599 in the combined dataset. The combined dataset also provides definitive support for associations at both CTLA4 (rs231735; OR=0.85, 95% CI 0.81-0.90; p= 6.25 × 10-9) and BLK (rs2736340; OR= 1.19, 95% CI 1.125-1.268; p=5.69 × 10-9 ). C-Rel is an NF-kappaB family member with distinct functional properties in hematopoietic cells, and its association with RA suggests disease pathways that involve other recently identified RA susceptibility genes including CD40, TRAF1, TNFAIP3, and PRKCQ1,2.
Rheumatoid arthritis [RA (MIM 180300)] is a common autoimmune disorder affecting approximately 1% of populations of European origin and whose predominant manifestation is inflammation with bone and cartilage destruction in diarthrodial joints. The genetic basis for RA is complex, with at least six genes generally accepted as associated with disease in populations of European origin, including HLA-DRB1, PTPN22, STAT4, TRAF1, and TNFAIP31. A number of additional loci have recently been reported as a result of expanded genome-wide association studies3 and metanalyses2. Many of these are likely to reflect true associations, although a convincing demonstration often requires very large samples sizes, given that many of the associations at these loci are quite modest. In most cases the causative allele(s) have not been identified, and therefore the actual contribution to disease risk at these loci is unknown.
The diagnosis of rheumatoid arthritis is based on clinical criteria established over two decades ago4. However, these criteria do not yet include antibody reactivity to cyclic citrullinated peptides (CCP), the presence of which is a highly sensitive and specific marker for the diagnosis of RA; between 50 and 80% of patients meeting standard criteria for RA exhibit anti-CCP antibodies5. Remarkably, the classical HLA-DRB1 associations with RA are entirely restricted to this phenotypic subgroup6, as are many of the other reported genetic associations. In addition to this phenotypic heterogeneity, there is evidence for genetic heterogeneity in risk for RA among different racial groups7, and this has complicated efforts at replication. Given these considerations, it is apparent that additional risk genes for RA remain to be discovered. For these reasons we undertook an expansion of our previous genome-wide association study of rheumatoid arthritis8, restricted to North American cases of European origin that were overwhelmingly (~95%) anti-CCP antibody positive. We also assembled a large case-control population for replication studies.
All new genotyping of case samples for this study was performed on Illumina HapMap370 BeadArray typing platforms, and after quality control filtering (see Supplementary Methods), a combined dataset of 2418 RA cases and 4504 controls was available for WGA analysis that had been genotyped on 278502 SNPs that passed all quality control filters applied to each set of data. The cases are derived in part from affected sibling pair families of the North American Rheumatoid Arthritis Consortium (NARAC) previously reported8 (one case per family), as well as new collections from both the U.S. and Canada (see Supplementary Table 1A). The genome-wide lambda value inflation of chi-square values was calculated to be 1.21, allowing for the large sample size. Structured association was therefore applied to correct for population stratification by matching homogeneous clusters of cases and controls, as implemented in PLINK. The genome-wide lambda value after adjustment was calculated to be 1.06.
Figure 1 displays a graphical summary of the results of the genome-wide analysis as implemented in PLINK, after conditioning on clusters. As noted in previous studies, the largest association signal is in an extended region within the MHC achieving a maximal level significance of p=9.50 × 10-104 in this study. (The y axis is truncated at - log p=27 in Figure 1). In addition, the previously reported associations8 are confirmed at PTPN22 (rs2476601, p= 1.62 × 10-21) and the TRAF1/C5 regions (maximal association at rs881375, p= 4.09×10-8). However, after PTPN22 the second most significant association is observed on chromosome 2 in the region of REL, where two markers provide highly significant evidence of association (rs13031237, allele specific p=6.01 × 10-10 and rs13017599, allele specific p=9.05 × 10-9), as shown in Table 1. This is a novel finding, since REL has not been brought forward as a candidate region for RA risk by any of the previous association studies in RA, although a recent publication has reported an association with Coeliac disease9. At lower levels of significance, we also noted evidence of association (Table 1) with CTLA4 (rs 6748358, p= 8.24 × 10-5) and a SNP marker in the region of the BLK locus (rs2736340, p= 6.06 × 10-7). A complete list of results at significance level p<0.01 for the entire dataset is provided (Supplementary Table 2).
In order to establish that these results are robust to various analytic approaches, we also performed analysis using Eigenstrat. Before performing the PCA, we removed markers in the region of chromosome 8p that shows inversions in Northern Europeans (8.135-11.936 Mb) and markers in the centromeric region of chromosome 17q21.31 (40-43 Mb) that is polymorphic in European origin populations10. We also removed markers around the HLA region that are related to European ancestry and also rheumatoid arthritis risk (from 24-36 Mb). These HLA-related markers were also removed for analyses that estimate inflation of the genome-wide inflation of test statistics that can arise from differences in population ancestry, since the HLA region includes many hundreds of markers that are associated with RA risk6. The association tests with all markers (including those on chromosomes 6p, 8p, and 17p) were performed adjusting for the eigenvectors derived using Eigenstrat to remove population admixture effects. The lambda value was 1.20 prior to PCA correction and this value reduced to a lambda value of 1.06 after PCA correction. Results from Eigenstrat analysis along with trend tests from PLINK are presented in supplementary table 3. Despite some genome-wide excess in the expected number of positive results from tests for association, the specific findings for REL did not appear to be influenced substantially by population structure. As shown in supplementary table 3, we found that without and with adjustment for population structure the p-values and odds ratios associating the SNPs we queried were quite similar. For example, for rs13031237, the odds ratio and p-value without adjustment for population structure in the combined genome-wide association analysis was 1.278 (p=5.2 × 10-11) versus an odds ratio of 1.268 with adjustment (p=6.0 × 10-10).
In order to confirm the associations with REL, we carried out a replication study on independent sets of 2604 cases and 2882 controls from the U.S. and Canada (Supplementary Table 1B). While complete serologic data were not available for all subjects, the majority of cases were seropositive (either rheumatoid factor or CCP+) in the replication datasets. In addition to selected SNPs from the REL locus, we also included candidate SNPs from the CTLA4 and BLK regions. For technical reasons, slightly different SNP panels were utilized for the replication studies in the Canadian and U.S. samples, based on tagging LD provided in the HapMap, and the results from the separate datasets are provided in Supplementary Table 2. The combined results with common SNPs markers across all the datasets are shown in Table 1 for REL, CTLA4 and BLK. Two highly associated SNPs at the REL locus are observed in the combined data using a Cochrane-Mantel-Haenszel analysis to allow for stratification among populations (rs13031237, OR=1.24, p = 3.08 × 10-14 and rs13017599, OR=1.21, p= 2.06 × 10-12). A graphical representation of the association results across the REL locus is shown in Figure 2. In addition, the data in Table 1 (and supplemental Table 2) provide definitive evidence for the previously suggested association with CTLA4 (rs 231735, OR=0.86, p=6.25 × 10-9). The data also support BLK as a new RA risk locus (rs2736340, OR 1.19, p=5.69 × 10-9), a finding of some interest given the recent association of this locus with systemic lupus11,12. Supplementary table 4 presents genotype-specific results, which show codominance for all the loci except CTLA4, which is nearly dominant.
The nuclear factor-κB (NF-κB)/REL family of transcription factors contain five members including c-Rel, p65/Rel-A, Rel-B, p50/NFkB-1 and p52/NFkB-2. These factors have a central role in coordinating the expression of a wide variety of genes that control immune responses and autoimmunity13. Therefore, the identification of REL, encoding c-Rel, as a new risk locus for RA has provoked us to consider how this observation may fit in with pathways suggested by the complex emerging landscape of genetic susceptibility for RA. While the various NF-κB subunits have complex overlapping functions, current data suggest some distinct roles for c-Rel. The production of IL12 and IL23 subunits by macrophages and dendritic cells are critically dependent on c-Rel14,15. Thus, c-Rel knockout animals exhibit deficiencies in Th1-type immune responses, although intrinsic T cell defects may also contribute to this phenotype16. Intriguingly, both c-Rel and another recently identified RA risk gene, PRKCQ, are specifically involved in the survival of activated CD8 cells, at least in part through the regulation of IL2 production by these cells. In addition, a variety of genes in T cells are regulated by c-Rel, including CD40 and TNFAIP3, both of which are now accepted RA susceptibility loci17.
In addition to effects on T cell and antigen presenting cell function, c-Rel has been shown to have a role in B cell proliferation and survival that particularly involves CD40 signaling pathways. Specifically, c-rel deficient B cells are susceptible to BCR induced apoptosis that cannot be prevented by activation through CD4018. This is due to reduced expression of the anti-apoptotic protein Bcl-XL, a gene that is known to be regulated by c-Rel. Interestingly, rescue from Fas induced apoptosis is normal in these cells, demonstrating the existence of distinct CD40 signaling pathways that are at least in part distinguished by the involvement of c-rel. A similar c-rel associated difference in CD40 signaling has been seen in patients with ectodermal dysplasia and hyperIgM syndrome due to mutations in the NF-κB essential modulator, NEMO, where c-Rel dependent IL4 responses are also impaired19. C-Rel is the only NF-kappaB family member with oncogenic activity and the gene is amplified in some B cell lymphomas. Intriguingly, in both tumors and normal B cells it has recently been reported that the CD40 and c-Rel proteins can physically interact and form a heterodimer that is translocated to the nucleus20 and has transcriptional regulatory activity for known c-rel target genes, including CD154, BLyS/BAFF, and Bfl-1/A1.
The associations of CD402, REL, TRAF18 and TNFAIP3 21are consistent with an important role for CD40 signaling pathways in RA pathogenesis. Indeed, CD40-CD40 ligand interactions have previously been identified as a potential target for therapy in autoimmune disease22, and clinical trials in lupus have shown promise23. Unfortunately, the clinical development of monoclonal antibody inhibitors of CD40L was cut short by adverse effects on the platelet function associated with the development of thromboembolic complications. Nevertheless, this pathway remains viable as a therapeutic target, and the current results mandate a thorough analysis of all the genes in this pathway to search for additional susceptibility alleles
The cases and controls utilized in this analysis were taken from a variety of collections in both the U.S. and Canada, as detailed below. The genome-wide association data from 868 RA cases and 1194 controls from North America have been previously reported in a study of susceptibility loci for rheumatoid arthritis8; all the additional genome-wide association data (1550 cases and 3310 controls) and all the replication sample data (2604 cases and 2882 controls) have not been previously reported in a RA association study. Informed consent was obtained from all subjects and these studies were approved by the Institutional Review Board of the North Shore LIJ Health System.
For our WGA analysis, we included WGA data on 868 North American RA cases samples reported previously8. We refer to these samples as “NARAC I”. All cases included in the current analysis are anti-CCP positive.
Details for patient collections used to compile the “NARAC II” dataset (N=952) used for final analysis on the Illumina HapMap370 chip are as follows.
Controls for the replication WGA were taken from control datasets that are publicly available in the Ilumina iControl datasbase. (www.illumina.com/iControlDB, Illumina, San Diego, CA). Additional control genotypes were from the neurodevelopmental control group obtained from the NIH Laboratory of Neurogenetics (http://neurogenetics.nia.nih.gov/paperdata/public/). These control genotypes were selected from the entire set of European American genotypes available from these resources based on the following data filters: 1) > 90% complete genotyping data; 2) HW > 10-4; and 3) >90% European continental ancestry. The European continental ancestry was determined using ancestry informative markers as previously described11. In addition we utilized control derived from a recent GWA study of lung cancer26 None of these controls overlap with controls utilized for the previously reported NARAC WGA study8. A third set of 1137 controls samples from M.D. Anderson Cancer Center Lung Cancer Study were used. These healthy individuals were seen for routine care at Kelsey-Seybold Clinics in the Houston Metropolitan area; all control subjects were required to have been current or former smokers.
The “NARAC II Replication” dataset contained RA patients from the following sources.
Control subjects for replication “NARAC II- replication studies were taken in part from the New York Cancer Project 30 collection using subjects with self reported European ancestry (N=1163). Random matching of these controls (Supplementary Table 1B) with the TEAR and OPA3 replication datasets simply reflects the logistics of how replication genotyping was batched and has been used as a convenient way of grouping the analysis, shown in Supplementary Table 2. Additional controls were taken from the Controls (N=474) for the GCI collection are described above.
The NARAC-II and Canadian case control collections were genotyped using 373,400 SNPs on the Illumina HapMap370 BeadChip. Genotyping of the NARAC-II dataset was carried out at the Feinstein Institute for Medical Research while the Canadian samples were genotyped at Illumina in San Diego, CA. In order to organize the data to permit integration of results among studies, all genotypes were called using BeadStudio Software according to Top designation (http://www.illumina.com/downloads/TOPBOT_technote27Jun06.pdf). Each data set was subjected to quality control filtering based on SNP genotype call rates (>95% completeness), minor allele frequency (>0.01), and the Hardy-Weinberg equilibrium (P>1×10-4). Subjects with more than 5% missing genotype data or showing evidence of non-European ancestry were excluded. In addition samples showing evidence of relatedness to another sample, or possible DNA contamination, were also excluded. After filtering, genotypes derived from SNP markers that were common to both datasets were merged into a single file for analysis. A summary of all samples utilized for analysis after data cleaning is given in Supplementary Table 1. These filters were applied to each data set independently and SNPs that passed quality control filters in each data set were then merged. In all a total of 278502 SNPs passed all filters
Replication genotyping of selected markers of interest was carried out using Sequenom iPlex technology at the Analytic Genetics Technology Centre in Toronto (Canadian samples) or within the Genetics and Genomics Branch of the National Institutes of Arthritis and Musculoskeletal and Skin Diseases (NARAC II-Replication dataset). The GCI cohort was genotyped using a kinetic PCR assay at Celera Diagnostics. Slightly different panels of SNPs were selected for replication in each dataset, based on technical considerations for efficient multiplexing of markers. The patterns of linkage disequilibrium reported in the HapMap informed the selection of replacement markers to develop marker sets for the replication studies. For rs13031237 and rs2736340, the PCR reaction provided results on the alternate strand (T/G for A/C and T/C for A/G, respectively for rs13031237 and rs2736340).
To allow for potential effects of population substructure and heterogeneity among populations we employed several techniques. In initial analyses of genome-wide data, we combined marker genotypes of all individuals and performed analyses to identify clusters of individuals who had similar genotypes using the program PLINK, with the clustering criterion set at PPC set at 0.0001. We also applied the program Eigenstrat to perform association analysis adjusting for correlations among the subjects according to marker similarities. For both clustering analyses in PLINK and construction of Eigenvectors using Eigenstrat we first removed SNPs on chromosomes 6p near the HLA region, 8p, and 17p that contain large polymorphic inversions. We removed these SNPs as the HLA region contains many SNPs relating to case-control status and including them could reduce our power to detect true associations. On the other hand the SNPs in chromosomes 8p and 17p show different LD patterns compared with other SNPs throughout the genome and we removed these regions to avoid influencing the results due to the potential chance variations caused by effects from many markers in these regions that are not of interest for correcting for population structure but that may show stronger correlations among subjects than expected.
To check for gene-gene interactions we evaluated all possible interactions between SNPs in CTLA4, Rel, and BLK were performed for the 157 most significantly associated non-HLA region SNPs with rheumatoid arthritis. Results showed no interaction effects with significance levels below p< 0.001. Given that 471 tests were performed, no significant interaction effects were identified, after correcting for multiple testing.
Lambda values indicating population structure varied among the populations being studied and according to the test conducted. For NARAC, the lambda value after correcting to a sample of 1000 cases and 1000 controls using PLINK was 1.148 and for Eigenstrat lambda was 1.029. For the Canadian sample, lambdas were 0.996 for PLINK and 1.010 for Eigenstrat. For the combined sample, the lambda value was 1.077 for PLINK and 1.023 for Eigenstrat.
This work was supported by grants from the National Institutes of Health NO1-AR-2-2263 (PKG), RO1 AR44422 (PKG) and by the Eileen Ludwig Greenland Center for Rheumatoid Arthritis and the Muriel Fusfeld Foundation. The work was also supported in part by the Intramural Research Program of the National Institute of Arthritis and Musculoskeletal and Skin Diseases and by by grants from the Canadian institutes for Health Research (MOP79321) and Ontario Research Fund (RE01061)and a Canada Research Chair to KAS.