Dr. Ding had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study design. Ding, Padyukov, Gregersen, Alfredsson, Klareskog.
Acquisition of data. Ding, Padyukov, Lundström, Seielstad, Gregersen, Alfredsson, Klareskog.
Analysis and interpretation of data. Ding, Padyukov, Lundström, Seielstad, Plenge, Oksenberg, Gregersen, Alfredsson, Klareskog.
Manuscript preparation. Ding, Padyukov, Lundström, Seielstad, Plenge, Oksenberg, Gregersen, Alfredsson, Klareskog.
Statistical analysis. Ding, Padyukov, Alfredsson.
To identify additional variants in the major histocompatibility complex (MHC) region that independently contribute to risk in 2 disease subsets of rheumatoid arthritis (RA) defined according to the presence or absence of antibodies to citrullinated protein antigens (ACPAs).
In a multistep analytical strategy using unmatched as well as matched analyses to adjust for HLA–DRB1 genotype, we analyzed 2,221 single-nucleotide polymorphisms (SNPs) spanning 10.7 Mb, from 6p22.2 to 6p21.31, across the MHC. For ACPA-positive RA, we analyzed samples from the Swedish Epidemiological Investigation of Rheumatoid Arthritis (EIRA) and the North American Rheumatoid Arthritis Consortium (NARAC) studies (totaling 1,255 cases and 1,719 controls). For ACPA-negative RA, we used samples from the EIRA study (640 cases and 670 controls). Plink and SAS statistical packages were used to conduct all statistical analyses.
A total of 299 SNPs reached locus-wide significance (P < 2.3 × 10⁻⁵) for ACPA-positive RA, whereas surprisingly, no SNPs reached this significance for ACPA-negative RA. For ACPA-positive RA, we adjusted for known DRB1 risk alleles and identified additional independent associations with SNPs near HLA–DPB1 (rs3117213; odds ratio 1.42 [95% confidence interval 1.17–1.73], Pcombined = 0.0003 for the strongest association).
There are distinct genetic patterns of MHC associations in the 2 disease subsets of RA defined according to ACPA status. HLA–DPB1 is an independent risk locus for ACPA-positive RA. We did not identify any associations with SNPs within the MHC for ACPA-negative RA.
Rheumatoid arthritis (RA) is a complex disease that is influenced by both genetic and environmental factors (1,2). It can be divided into 2 major clinical subtypes that are defined by the presence or absence of antibodies to citrullinated protein antigens (ACPA) (3,4). The major histocompatibility complex (MHC) locus has long been known to contain genes that confer substantial risk of the disease (5). Much of the genetic signal maps to class II HLA–DRB1 alleles (DRB1*01, DRB1*04, and DRB1*10, which are known as shared epitope [SE] alleles), and this information has been crucial to the working hypothesis concerning the role of class II MHC–dependent immune activation in the pathogenesis of RA (6). Previous studies have also suggested the presence of other MHC risk factors (7–10), although their precise localization has been hampered by linkage disequilibrium and insufficient single-nucleotide polymorphism (SNP) genotyping over the entire region.
It was recently demonstrated that the SE alleles are associated only with the risk of ACPA-positive RA but not with ACPA-negative disease (11–13). Two studies showed that HLA–DRB1*03 is associated with ACPA-negative RA (14,15), thus emphasizing the need to consider the subdivision of RA according to ACPA status in genetic studies. A crucial question for future understanding of the immune pathogenesis of RA is whether polymorphisms in other MHC loci are associated with only the ACPA-positive form of RA or whether different MHC loci are associated with different disease variants.
This study was approved by ethics review boards at the Karolinska Institutet, and informed consent was obtained from all participating subjects. EIRA is a population-based case–control study of incident cases of RA defined according to the American College of Rheumatology (ACR; formerly, the American Rheumatism Association) 1987 criteria (16), which enrolls 85% of patients within 1 year after initial arthritis symptoms. Most subjects were born in Sweden, and 97% report a white ancestry (for details, see ref. 17).
Cases consisted of RA patients of self-reported white ancestry who were randomly drawn from 4 different collections of patients (the NARAC population, the National Data Bank for Rheumatic Diseases, the National Inception Cohort of Rheumatoid Arthritis, and the Study of New-Onset Rheumatoid Arthritis) (for details, see ref. 18). All patients selected for the present study were ACPA-positive and met the ACR 1987 criteria for RA.
Control subjects were selected based on similar self-reported ancestry from among 20,000 persons who were part of the New York Cancer Project (18).
Genotyping was performed as previously described (18). Briefly, the Illumina Human Hap300 version 1.0 chip (Illumina, San Diego, CA) containing probes for 317,503 SNPs was used. Samples included in the analysis had call rates >96%, and more than 93% of samples had a SNP call rate >99% (mean 99.7% for all completed samples). Forty-one samples were genotyped twice, with a mean concordance prior to any SNP quality control filtering of 99.96% (median 99.98% [range 99.24–100%]).
Genotyping was also performed as described by the International MHC and Autoimmunity Genetics Network (IMAGEN) Consortium (submitted for publication). Briefly, the Illumina GoldenGate assay was used to genotype a panel of 1,230 SNPs. Replication genotyping was performed by iPlex Gold chemistry (Sequenom, San Diego, CA).
HLA–DRB1 type was determined by sequence-specific primer–polymerase chain reaction analysis using low-resolution HLA–DR and DR4 kits from Olerup SSP (Saltsjöbaden, Sweden). High-resolution (4 digits) genotype data were used to define the categories of SE. HLA–DRB1*01, *0401, *0404, *0405, *0408, and *10 were defined as SE alleles. Any genotype with a combination of 2 of these alleles was considered to be a double SE.
The rationale for our classification of SE alleles into different groups was to adjust the associations between RA risk and non-DRB1 genetic characteristics for confounding by DRB1 as much as possible, especially confounding by the DRB1 SE alleles. We tried to achieve this in different ways. We used 3 different classifications of SE alleles (models 1, 2, and 3). The traditional SE grouping (model 1, consisting of 3 categories) was derived from the number of copies of SE alleles (0, 1, and 2 copies). The other groupings (model 2 and model 3) were created based on our data. We used the odds ratio (OR) for each SE allele combination in our study as the basis for grouping and tried to 1) group SE allele combinations that had similar ORs, 2) group the less-frequent SE alleles, and 3) separate the DRB1*01 allele and the DRB1*10 allele from the DRB1*04 allele. In addition, we categorized the SE alleles as proposed by Tezenas du Montcel et al (19).
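The traditional grouping (model 1) can be illustrated with a minimal Python sketch. It assumes alleles are supplied as strings such as "0401" or "DRB1*01"; the function names are hypothetical and are not taken from the study's analysis code. Models 2 and 3 were derived from the study's own ORs and are not reproduced here.

```python
# Shared epitope (SE) alleles as listed in the text, written without the "DRB1*" prefix.
SE_FOUR_DIGIT = {"0401", "0404", "0405", "0408"}
SE_ALLELE_GROUPS = {"01", "10"}  # whole 2-digit groups treated as SE


def is_se_allele(allele: str) -> bool:
    """Return True if a DRB1 allele (e.g. "0401", "DRB1*01", "04:04") is an SE allele."""
    code = allele.upper().replace("DRB1*", "").replace(":", "")
    # DRB1*04 subtypes need 4-digit resolution; a bare "04" is not counted as SE.
    return code in SE_FOUR_DIGIT or code[:2] in SE_ALLELE_GROUPS


def se_copy_number(allele_a: str, allele_b: str) -> int:
    """Model 1 category: number of SE alleles carried (0, 1, or 2)."""
    return int(is_se_allele(allele_a)) + int(is_se_allele(allele_b))
```

For example, a DRB1*0401/DRB1*0404 genotype falls into the 2-copy ("double SE") category, whereas DRB1*0402/DRB1*15 falls into the 0-copy category.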
For the EIRA study, high-resolution DRB1*04 (4 digits) typing and low-resolution (2 digits) typing for other allelic groups were used to perform case–control matching at a ratio of 1:1. For the NARAC study, high-resolution DRB1*01 (4 digits), DRB1*04 (4 digits), DRB1*09 (4 digits), DRB1*15 (4 digits), and DRB1*16 (4 digits) typing and low-resolution (2 digits) typing for other allelic groups were used. Using this stringent selection strategy, we identified 358 pairs of cases and controls from the EIRA and 264 pairs from the NARAC study.
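The 1:1 matching on DRB1 genotype can be sketched as follows. This is an illustration only: the text does not say how a control was chosen when several shared a case's genotype, so the sketch assumes the first available control in input order is used. The function names and the tuple layout are hypothetical.

```python
from collections import defaultdict


def genotype_key(allele_a, allele_b):
    """Unordered DRB1 genotype: both alleles, independent of the order reported."""
    return tuple(sorted((allele_a, allele_b)))


def match_pairs(cases, controls):
    """Pair each case with one unused control carrying the identical DRB1 genotype.

    cases and controls are lists of (subject_id, allele_a, allele_b).
    Returns a list of (case_id, control_id); unmatched subjects are dropped.
    """
    pool = defaultdict(list)
    for control_id, a, b in controls:
        pool[genotype_key(a, b)].append(control_id)

    pairs = []
    for case_id, a, b in cases:
        candidates = pool.get(genotype_key(a, b))
        if candidates:
            # Assumption: take the first remaining control; each control is used once.
            pairs.append((case_id, candidates.pop(0)))
    return pairs
```

Because unmatched subjects are discarded, the number of usable pairs is bounded by the rarer group within each genotype, which is why matched analyses typically trade statistical power for tighter control of confounding.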
All genotype data were processed using the statistical software package Plink (20). We combined data from the MHC region in the data sets from the GWAS and IMAGEN analyses and then applied quality control filters. To quantify and control for population stratification, we used a principal components analysis approach implemented in the EigenStrat software (21). Using the genome-wide SNP data (18), EigenStrat identified 53 significant outliers (σ = 4; iterations = 5, with no outliers identified after the third iteration) from the ACPA-positive RA patient and control association analysis, and 59 significant outliers from the ACPA-negative RA patient and control association analysis. These outliers and subjects with self-reported non-Swedish ancestry were removed from the final analysis.
Data sets were filtered as follows: SNPs with >5% missing data (n = 8 for the ACPA-positive and n = 10 for the ACPA-negative subgroups), deviation from Hardy-Weinberg equilibrium in controls at P < 2.251 × 10⁻⁵ (n = 9), or minor allele frequency <0.01 (n = 85 for the ACPA-positive and n = 74 for the ACPA-negative subgroup) were excluded. We found no individuals with >5% missing genotypes in either the ACPA-positive or the ACPA-negative subgroup. There were 2,122 SNPs that passed quality control filters in the ACPA-positive group and 2,131 in the ACPA-negative group.
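The three SNP-level filters can be sketched in Python. This is not the Plink implementation: a 1-degree-of-freedom chi-square test stands in for Plink's Hardy-Weinberg test, and the minor allele frequency is computed in cases and controls combined, which is an assumption because the text does not specify the sample used. Function names are hypothetical.

```python
import math

N_SNPS = 2221
HWE_P_MIN = 0.05 / N_SNPS  # about 2.251e-5, the threshold quoted in the text
MAX_MISSING = 0.05
MIN_MAF = 0.01


def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    chi2 = 0.0
    for obs, exp in zip((n_aa, n_ab, n_bb), (n * p * p, 2 * n * p * q, n * q * q)):
        if exp > 0:
            chi2 += (obs - exp) ** 2 / exp
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 0.0
    p = (2 * n_aa + n_ab) / (2 * n)
    return min(p, 1.0 - p)


def passes_qc(control_counts, case_counts, n_missing):
    """Apply the missingness, control-HWE, and MAF filters to one SNP."""
    all_counts = tuple(c + k for c, k in zip(control_counts, case_counts))
    n_total = sum(all_counts) + n_missing
    if n_total == 0 or n_missing / n_total > MAX_MISSING:
        return False
    if hwe_p_value(*control_counts) < HWE_P_MIN:
        return False
    return minor_allele_frequency(*all_counts) >= MIN_MAF
```

The Hardy-Weinberg threshold here is simply 0.05 divided by the 2,221 SNPs under analysis, which reproduces the value of 2.251 × 10⁻⁵ given in the text.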
We used the Armitage trend test for the initial univariate test of association for both ACPA-positive and ACPA-negative subsets implemented in the package Plink (20). P values less than 0.05 after Bonferroni correction were considered statistically significant for the univariate analysis. Unconditional logistic regression and conditional logistic regression were conducted using the SAS statistical package (version 9.1.3; SAS Institute, Cary, NC). Raw genotypes were recoded as a score variable (0, 1, and 2), counting the number of common alleles using Plink. The genotype variable was entered into the logistic regression models. Associations are reported as ORs and 95% confidence intervals (95% CIs), which were calculated from the models.
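For readers unfamiliar with the univariate test, the following Python sketch implements the standard Cochran-Armitage trend test on a 2 × 3 genotype table with additive weights (0, 1, 2), together with the Bonferroni threshold. It is a textbook formulation using a normal approximation, not the Plink code used in the study, and the function names are hypothetical.

```python
import math


def armitage_trend_test(case_counts, control_counts, weights=(0, 1, 2)):
    """Cochran-Armitage trend test; returns (z, two-sided P value).

    case_counts and control_counts hold genotype counts ordered by allele dose.
    """
    col_totals = [r + s for r, s in zip(case_counts, control_counts)]
    n_cases, n_controls = sum(case_counts), sum(control_counts)
    n = n_cases + n_controls
    if n == 0:
        raise ValueError("empty table")

    # Trend statistic: weighted difference between case and control distributions.
    t = sum(w * (r * n_controls - s * n_cases)
            for w, r, s in zip(weights, case_counts, control_counts))

    # Variance of the statistic under the null hypothesis of no association.
    var = sum(w * w * c * (n - c) for w, c in zip(weights, col_totals))
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            var -= 2 * weights[i] * weights[j] * col_totals[i] * col_totals[j]
    var *= n_cases * n_controls / n
    if var <= 0:
        raise ValueError("test undefined: no variation in genotype or status")

    z = t / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests
```

With alpha = 0.05 and 2,221 SNPs, `bonferroni_threshold` gives roughly 2.25 × 10⁻⁵, consistent with the locus-wide significance level used in this study.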
In order to identify variants in the MHC region that might contribute to risk of the 2 forms of RA that are defined by presence and absence of ACPA, we selected tag SNPs to capture common genetic variation across the MHC region, using both a set of 1,230 SNPs selected for a combined analysis of 7 different inflammatory diseases (the IMAGEN study) (IMAGEN Consortium: submitted for publication) and a set of 1,298 additional SNPs covering the MHC region that were included in a GWAS (18). There were 307 SNPs that overlapped between the IMAGEN and the GWAS data, leaving a total of 2,221 SNPs for analysis.
For the exploratory analysis, we genotyped a total of 1,291 RA patients (cases), who were selected equally from the 2 major RA subsets (651 ACPA-positive and 640 ACPA-negative), and 670 controls; all of these study subjects were from the EIRA population (18,22). For replication, we used data from the NARAC study (18). In the NARAC population, all RA cases were ACPA-positive; therefore, we used these data for replication and extension of the findings in the ACPA-positive RA cases.
The analytical strategy is illustrated in Figure 1. Due to the availability of equal numbers of samples from ACPA-positive and ACPA-negative RA patients as well as matched controls in the EIRA population, we performed the initial analyses in this group. In the initial univariate analysis of EIRA cases and controls, 299 SNPs reached locus-wide significance (defined here as P < 2.3 × 10⁻⁵) when the ACPA-positive RA cases were compared with the controls (Figure 2A). In contrast, no single SNP was found to be statistically significant at this level when the ACPA-negative RA cases were compared with the controls (Figure 2B), despite similar statistical power for the 2 subsets of RA cases in the EIRA study. This provides strong evidence of genetically distinct etiologies behind these 2 forms of RA. Subsequently, we analyzed only ACPA-positive RA cases and controls.
To adjust for the influence of the SE alleles, we initially used unconditional logistic regression analyses, including all ACPA-positive cases and controls in the EIRA study. We first investigated which HLA–DR genotypes were dependent on known HLA–DRB1 SE alleles, as typed by conventional HLA–DR typing (high resolution 4-digit genotype) (Table 1).
We confirmed that different DRB1 SE genotypes confer different levels of risk of developing RA (23). We tried 3 different models to adjust for DRB1 SE alleles, based on categorization of the various DRB1 SE variants into 3, 5, or 7 different categories (Table 1), in order to analyze whether any of the initially identified 299 SNPs conferred risk of ACPA-positive RA independently of the known DRB1 SE risk alleles. In addition, we also included the classification of DRB1 alleles according to the method described by Tezenas du Montcel et al (19). We found that our third model, which included 7 DRB1 SE categories, gave the smallest number of SNPs (70 SNPs) at the α = 0.05 level and that among the 70 significant SNPs, 61 were common among these 4 classification methods. In further analyses, we chose model 3 to control for the effect of DRB1 SE alleles and found that 70 of the 299 SNPs that were associated with ACPA-positive RA were independent of DRB1 SE alleles.
In order to replicate these findings, we used data from 604 ACPA-positive RA cases and 1,049 controls in the NARAC study. Here, we had access to information on DRB1 type and SNP genotypes for 43 of the 70 SNPs that had been selected from the initial EIRA analysis. We adjusted for DRB1 SE alleles in the same way, and only SNPs that in the NARAC sample were statistically significantly associated with ACPA-positive RA at P < 0.05 and had the same direction of association were considered to be replicated. Using these criteria, 11 of the 43 SNPs were replicated (Table 2), representing 4 different loci: HLA–DPB1, C2-DOM3Z, MICA, and HLA–DQA1. The position and linkage disequilibrium structure of these 11 SNPs are shown in Figure 3.
There may still exist residual confounding by HLA–DRB1 SE alleles, since the categorization into 7 groups might not entirely capture all the complexity of the DRB1 locus. Therefore, we analyzed the replicated 11 SNPs using a data set with pairwise matching of cases and controls on DRB1 genotypes (both alleles). A total of 358 case–control pairs with identical DRB1 genotypes were identified in the EIRA sample and used in the conditional logistic regression analysis. The results showed that associations with markers at 10 of the 11 SNPs were statistically significant at P < 0.05 (MICA was marginally significant at P = 0.0497) and that the HLA–DQA1 locus was not statistically significant (Table 3).
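For 1:1 matched pairs, conditional logistic regression with a single genotype score reduces to a one-parameter likelihood in the within-pair difference d = (case score) − (control score), namely the product over pairs of 1 / (1 + exp(−βd)). The Python sketch below fits it by Newton's method. It is an illustration under that simplification, not the SAS procedure used in the study, and the function name is hypothetical.

```python
import math


def conditional_logit_pairs(case_scores, control_scores, max_iter=50, tol=1e-10):
    """Fit a 1:1 matched-pair conditional logistic model with one covariate.

    Returns (odds_ratio, ci_low, ci_high) per unit increase in the score,
    with a Wald 95% confidence interval.
    """
    diffs = [x1 - x0 for x1, x0 in zip(case_scores, control_scores)]

    def score_and_information(beta):
        grad, info = 0.0, 0.0
        for d in diffs:
            s = 1.0 / (1.0 + math.exp(-beta * d))
            grad += d * (1.0 - s)        # derivative of the log-likelihood
            info += d * d * s * (1.0 - s)  # negative second derivative
        return grad, info

    beta = 0.0
    for _ in range(max_iter):
        grad, info = score_and_information(beta)
        if info == 0.0:
            raise ValueError("no informative (discordant) pairs")
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    else:
        raise RuntimeError("Newton iterations did not converge")

    _, info = score_and_information(beta)
    se = 1.0 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
```

Pairs in which the case and control carry the same score (d = 0) contribute nothing to the estimate, so only discordant pairs are informative; for a binary exposure the estimate equals the ratio of the two kinds of discordant pairs.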
We replicated our findings by including 264 case–control pairs matched for DRB1 genotype from the NARAC sample in the conditional logistic regression model and confirmed associations with 6 HLA–DPB1 SNPs (Table 3). Analysis adjusting for study by combining the pairwise-matched EIRA and NARAC data sets (622 matched case–control pairs) showed statistically significant associations with all 8 of the HLA–DPB1 SNPs and the C2-DOM3Z SNP (rs544167; Pcombined = 0.01, OR 1.75 [95% CI 1.13–2.70]) (Table 3). The strongest association in DPB1 was seen for SNP rs3117213 (Pcombined = 0.0003, OR 1.42 [95% CI 1.17–1.73]). The 8 DPB1 SNPs were in strong linkage disequilibrium with one another and were independent of HLA–DRB1 and HLA–DQA1. The C2-DOM3Z SNP was independent of all DPB1 SNPs (Figure 3).
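As a rough consistency check on reported estimates, the standard error and Wald statistic can be backed out from an OR and its 95% CI. The Python sketch below assumes a symmetric Wald interval on the log-odds scale, which the text does not state explicitly, so the recovered P values are approximations rather than a reproduction of the study's calculations. The function name is hypothetical.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Recover (standard error, z, two-sided P) from an OR and its 95% CI.

    Assumes the interval is exp(beta -/+ z_crit * SE) on the log-odds scale.
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values reported in the text for rs3117213 and rs544167.
se_dpb1, z_dpb1, p_dpb1 = wald_from_ci(1.42, 1.17, 1.73)
se_c2, z_c2, p_c2 = wald_from_ci(1.75, 1.13, 2.70)
```

Under this assumption, the rs3117213 estimate corresponds to a z of about 3.5 and the rs544167 estimate to a z of about 2.5, which is of the same order as the reported Pcombined values of 0.0003 and 0.01; small differences are expected from rounding of the published interval limits.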
We did not find any SNPs that associated with ACPA-negative RA when the data were adjusted for multiple testing for the 2,131 SNPs. SNP rs2040410, which tags the HLA–DRB1*03 allele, had a P value of only 0.17 (Armitage test for trend). Since there have been previous reports on an association between DRB1*03 and ACPA-negative RA (14,15), we further performed a separate analysis regarding risk of ACPA-negative RA, using the EIRA data concerning an association with DRB1*03 when using conventional DRB1 genotyping. This analysis showed an OR of 1.15 (95% CI 0.54–2.44) for DRB1*03/DRB1*03 (double-dose DRB1*03), an OR of 1.16 (95% CI 0.89–1.52) for DRB1*03/x (single-dose DRB1*03), and an OR of 1.16 (95% CI 0.90–1.51) for DRB1*03/DRB1*03 or DRB1*03/x, using the x/x (no DRB1*03 alleles) as the reference group and adjusting for age, sex, and geographic location. Thus, there was a trend toward an association between DRB1*03 and ACPA-negative RA also in our data, but the association did not reach statistical significance.
Our study suggests that at least 1 locus in the MHC region that is independent of the classic HLA–DRB1 locus (HLA–DPB1) contributes significantly to the risk of ACPA-positive RA. A similar finding related to the DPB1 locus in ACPA-positive RA was made independently of the present study in a North American Caucasian population (24). Our study provides an indication that there may also be an independent effect of HLA–DQ, but this requires further study in even larger populations. The C2-DOM3Z locus showed suggestive evidence of association, although this was not yet fully replicated in the NARAC sample using the matching technique. Finally, with regard to clinical subgroups of RA, we did not identify any associations with SNPs within the MHC for ACPA-negative RA.
Two studies have previously reported an association of DRB1*03 with ACPA-negative RA (14,15). Since the SNPs used in our study do not identify specific HLA–DRB1 alleles (such as DRB1*03), we performed an analysis to investigate the relationship between DRB1*03 and ACPA-negative RA using the DRB1 allele data in the EIRA study. Despite the observed trend for an association between DRB1*03 and the risk of ACPA-negative RA, this association was not significant in our data. We must acknowledge, however, that our data, as well as the previous data showing a positive association, are still of limited size, and even larger studies will be needed to resolve the question of whether and in which populations DRB1*03 is associated with RA and, in particular, with ACPA-negative RA. More detailed analyses and data concerning DRB1 allele distribution and association with the 2 disease subsets of RA using the EIRA data have been described in a separate article (Lundström E, et al: submitted for publication).
To test the robustness of our findings, we also analyzed the relationship between all the primarily identified 299 SNPs and the risk of RA in a matched manner, with case–control pairs matched according to DRB1 genotype. In terms of bias, the matched analysis is the optimal way to adjust for confounding by DRB1 genotype, but it will often be at the expense of power, since the number of useful subject pairs in the analysis often is relatively small. Using this strategy in the EIRA sample, we found 47 SNPs that were statistically significant at P < 0.05. The NARAC study had access to data on 24 of these 47 SNPs, and the EIRA results were confirmed in 6 of these. All 6 of these SNPs were identical to those identified from our first strategy using unconditional logistic regression. We considered the possibility that other non-SE alleles of DRB1 might be in linkage disequilibrium with HLA–DPB1 alleles and might potentially explain the association between DPB1 alleles and susceptibility to ACPA-positive RA. The pairwise matching process, however, eliminated this risk and demonstrated that DPB1 variations are independently associated with the risk of ACPA-positive RA.
In order to remove the confounding by DRB1 SE alleles as much as possible without losing too much statistical power, we classified the DRB1 alleles into different groups (defined here as models 2 and 3) based on a statistically oriented approach, in which the OR was used as the basis for grouping the DRB1 alleles (see Patients and Methods and Table 1 for details). Another approach to classifying DRB1 alleles according to their amino acid sequences at positions 70–74 was proposed by Tezenas du Montcel and colleagues (19). The Tezenas du Montcel model has been cross-validated (25,26) and shown to be superior to other classification systems for SE alleles in predicting radiologic progression to erosive disease and in supporting the identification of a protective effect on disease progression (27,28). We compared this approach with our models and found that it yielded results that were very similar to those of our best model, model 3, with 71 significant SNPs for the Tezenas du Montcel model versus 70 significant SNPs for our model 3; among these 70 SNPs, 61 were in common with those in our model 3, as compared with 62 SNPs in common among our 3 models. Therefore, we think our model 3 is, in the present context, similar to the Tezenas du Montcel model in terms of controlling for confounding by HLA–DRB1 risk alleles. For exploratory purposes, we also performed an unconditional logistic regression analysis using recursive partitioning, an alternative risk-categorization method of grouping (29), in the EIRA data, which replicated our findings very well.
These findings support a model in which the role of MHC-dependent adaptive immunity in a broad manner is restricted only or mainly to ACPA-positive RA and in which at least 2, and possibly more, different class II MHC loci (HLA–DR, HLA–DQ, and HLA–DP) may be involved in the pathogenesis of the ACPA-positive RA. Our observations may help to open up the field for new immunologic studies aiming to determine how different MHC-restricted immune reactions, tentatively directed toward citrullinated peptides and proteins, may be involved in the pathogenesis of a serologically defined subset of RA (11,30).
We thank the RA patients and controls for participating in the study; Ingeli Andréasson (Landvetter), Eva Baecklund (Akademiska Hospital), Ann Bengtsson and Thomas Skogh (Linköping Hospital), Birgitta Nordmark, Johan Bratt, and Ingiäld Hafström (Karolinska University Hospital), Kjell Huddénius (Rheumatology Clinic in Stockholm City), Shirani Jayawardene (Bollnäs Hospital), Ann Knight (Hudiksvall Hospital and Uppsala University Hospital), Ido Leden (Kristianstad Hospital), Göran Lindahl (Danderyd Hospital), Bengt Lindell (Kalmar Hospital), Christin Lindström and Gun Sandahl (Sophiahemmet), Björn Löfström (Katrineholm Hospital), Ingmar Petersson (Spenshult Hospital), Christoffer Schaufelberger (Sahlgrenska University Hospital), Patrik Stolt (Västerås Hospital), Berit Sverdrup (Eskilstuna Hospital), Olle Svernell (Västervik Hospital), and Tomas Weitoft (Gävle Hospital) for recruiting patients; Marie-Louise Serra, Camilla Bengtsson, Eva Jemseby, and Lena Nise for invaluable contributions to the collection of data and maintenance of the database; and Annette Lee and Wentian Li for contributions to the generation and analysis of the NARAC data.
Supported by grants from the Swedish Medical Research Council, the Swedish Council for Working Life and Social Research, King Gustaf V’s 80-Year Foundation, the Swedish Rheumatic Foundation, the Stockholm County Council, the Swedish insurer AFA, and the European Union Sixth Framework Programme (project Auto-Cure). The Agency for Science Technology and Research, Singapore, supported some of the genotyping and data analysis (the Illumina-based study); the National Institute of Allergy and Infectious Diseases supported the IMAGEN program (grant U19-AI-067152), in which the rest of the dense SNP typing was performed at the Broad Institute of MIT and Harvard. Some of the NARAC genotyping was supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases (grant R01-AR-44422). Dr. Plenge’s work was supported by the NIH (grant K08-AI-55314-3), the Research and Education Foundation of the American College of Rheumatology, the Burroughs Wellcome Fund (Career Awards for Medical Scientists), and the William Randolph Hearst Fund of Harvard University.