Rheumatoid arthritis is a systemic autoimmune disease characterized by intra-articular inflammation [1]. About 70% of patients have antibodies against cyclic citrullinated peptide (CCP) [2]. Until now, the strong association of the MHC to anti-CCP disease [3,4] has been explained by the presence of consensus amino acid sequences (QRRAA, RRRAA and QKRAA) spanning positions 70 through 74 in the β1 subunit of the HLA-DR molecule. The classical HLA-DRB1 haplotypes carrying these sequences define the “shared epitope” alleles [5]. The shared epitope association was historically defined by exploring structural differences between HLA-DRB1*04 alleles with allospecific T cell recognition [6,7]. These reagents focused attention on sequence determinants on the exposed alpha helical rim of the HLA-DR molecule where the shared epitope is located, but left allelic differences at the inaccessible base of the binding groove largely unexplored.
Despite serving as the foundation for rheumatoid arthritis genetic studies, the shared epitope hypothesis does not fully explain the association at DRB1; studies have suggested additional independent associations within the MHC outside DRB1 [3,8–11]. However, pinpointing those loci has been challenging, in part due to the complexity and cost of complete HLA genotyping and the broad linkage disequilibrium (LD) characteristic of the MHC [12].
To define the association across the region and identify functional and potentially causal variants, we obtained SNP genotype data for 19,992 anti-CCP positive rheumatoid arthritis cases and controls of European descent from six independent genome-wide data sets (Supplementary Table 1). We used a large reference panel of 2,767 individuals of European descent [14] to impute classical allele genotypes for HLA-A, [...], and DRB1, their corresponding amino acid sequences, and SNPs within the MHC [15]. In total, we tested 99 classical 2-digit alleles, 164 classical 4-digit alleles, 372 polymorphic amino acid positions, and 3,117 SNPs across the region for association with logistic regression. To control for population stratification, we included as covariates the first five principal components from genome-wide SNP genotypes for each of the six data sets [16] (λgc = 1.06, see Supplementary Note).
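A genomic-control inflation factor (λgc) of this kind is conventionally computed as the median of the observed 1-degree-of-freedom association chi-square statistics divided by the median of the null chi-square distribution (about 0.4549). The Python sketch below illustrates that standard definition only; the function name and the toy statistics are hypothetical, and the study's own procedure is the one given in its Supplementary Note.

```python
from statistics import median

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_inflation(chi2_stats):
    """Return lambda_gc: observed median 1-df chi-square / null median."""
    stats = list(chi2_stats)
    if not stats:
        raise ValueError("need at least one test statistic")
    return median(stats) / CHI2_1DF_MEDIAN

# Toy, made-up statistics (not data from the study).
toy_stats = [0.02, 0.10, 0.45, 0.48, 1.30, 2.70, 3.90]
print(round(genomic_inflation(toy_stats), 3))
```

Values close to 1 suggest that the test statistics are not systematically inflated, for example by residual population stratification.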
First, to assess imputation accuracy, we compared imputed DRB1 classical alleles to genotyped alleles for a subset of 1,403 individuals from two data sets genotyped to 4-digit resolution (Supplementary Table 2A). Imputations were 95.8% accurate for alleles at 2-digit resolution and 84.0% at 4-digit resolution (see Supplementary Note). We observed high accuracy in frequency estimates and imputation quality for alleles with >2.5% frequency in the reference set (Supplementary Figure 1A). We observed similar accuracy at four other classical loci in a subset of 1958 Birth Cohort samples that were part of the WTCCC controls (Supplementary Table 2B,C). We note that the WTCCC samples have the sparsest SNP coverage across the MHC and that these accuracies probably represent a lower bound (Supplementary Figure 1B).
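One simple way to score imputed classical alleles against genotyped ones is an allele-level concordance, evaluated at both 4-digit and 2-digit resolution. The Python sketch below is an illustration under that assumption; the exact accuracy metric used in the study is defined in its Supplementary Note, and the function names and toy calls here are hypothetical.

```python
from collections import Counter

def to_two_digit(allele):
    """Truncate a 4-digit allele name such as '04:01' to its 2-digit group '04'."""
    return allele.split(":")[0]

def allele_concordance(imputed, genotyped, two_digit=False):
    """Fraction of genotyped alleles matched by the imputed call.

    Each argument is a list of (allele1, allele2) pairs, one pair per
    individual, in the same order. Alleles are compared without regard
    to phase, so ('04:01', '01:01') matches ('01:01', '04:01').
    """
    if len(imputed) != len(genotyped) or not genotyped:
        raise ValueError("inputs must be non-empty and cover the same individuals")
    matched = total = 0
    for imp, gen in zip(imputed, genotyped):
        if two_digit:
            imp = [to_two_digit(a) for a in imp]
            gen = [to_two_digit(a) for a in gen]
        overlap = Counter(imp) & Counter(gen)  # multiset intersection
        matched += sum(overlap.values())
        total += len(gen)
    return matched / total

# Toy calls, made up for illustration.
imputed = [("04:01", "01:01"), ("04:04", "15:01"), ("03:01", "03:01")]
genotyped = [("01:01", "04:01"), ("04:01", "15:01"), ("03:01", "07:01")]
print(allele_concordance(imputed, genotyped))                  # 4-digit
print(allele_concordance(imputed, genotyped, two_digit=True))  # 2-digit
```

Because a 2-digit match only requires the allele group to agree, concordance at 2-digit resolution can never be lower than at 4-digit resolution for the same calls.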
Next, we compared allelic odds ratios of imputed DRB1 haplotypes in our data with recently reported allelic odds ratios for DRB1 haplotypes in a large study of anti-CCP positive rheumatoid arthritis [17]. Except for the rare *11:02/*11:03 haplotype (<1% frequency), effect sizes from our study were entirely consistent for each of the DRB1 classical haplotypes (Supplementary Figure 2, Supplementary Table 3).
Having demonstrated the validity of our analytic approach, we tested SNPs and HLA alleles across the MHC for association with rheumatoid arthritis. The most significant allele was the A nucleotide at rs17878703, a quadrallelic SNP in the second nucleotide of DRB1 codon 11 (odds ratio (OR) = 3.7, p<10^−526; Supplementary Table 4). This allele codes for Val-11 or Leu-11 in DRβ1. Thus, the strongest MHC signal mapped to amino acid 11 of DRβ1, and not to any of the shared epitope positions (amino acids 70-74).
Association tests within the MHC to rheumatoid arthritis
We then tested each of the amino acid positions within DRβ1 for association by grouping classical DRB1 haplotypes according to the specific amino acid carried at each position (Supplementary Table 5). Amino acid position 11 demonstrated the strongest association (p<10^−581). Of the six possible amino acids at this position, the aliphatic residues Val-11 (OR=3.8) and Leu-11 (OR=1.3) confer high risk, whereas other residues confer less risk (Supplementary Table 4). In fact, the polar Ser-11 residue is highly protective against disease (OR=0.38). Amino acid position 13 was similarly statistically significant (p<10^−574); its six alleles are in tight LD with those at position 11. Conditioning on position 11 eliminated the effect of position 13 (p=0.57), but conditioning on position 13 did not eliminate the effect of position 11 (p=3.5 × 10^−8). While these results favor position 11 over 13, the tight LD between them makes it difficult to unambiguously assign causality to one position to the exclusion of the other. After conditioning on the shared epitope haplotypes, amino acid positions 11 and 13 remained highly significantly associated (p<10^−70 respectively), and more strongly associated than other amino acid positions.
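Conditional tests of this kind compare nested logistic regression models. As a sketch of one standard formulation (assumed here for illustration; the symbols are not the paper's notation, and the study itself describes a conditional haplotype analysis), testing position 13 while conditioning on position 11 can be written as:

```latex
\begin{aligned}
H_0:\ \operatorname{logit} P(\text{case}_i) &= \beta_0 + \sum_{a} \beta_a\, x^{(11)}_{i,a} + \sum_{k=1}^{5} \theta_k\, \mathrm{PC}_{i,k} \\
H_1:\ \operatorname{logit} P(\text{case}_i) &= \beta_0 + \sum_{a} \beta_a\, x^{(11)}_{i,a} + \sum_{b} \gamma_b\, x^{(13)}_{i,b} + \sum_{k=1}^{5} \theta_k\, \mathrm{PC}_{i,k} \\
\Delta D &= D_{H_0} - D_{H_1} \ \sim\ \chi^2_{\mathrm{df}}, \qquad D = -2 \log L
\end{aligned}
```

Here $x^{(11)}_{i,a}$ counts the copies of residue $a$ at position 11 carried by individual $i$ (with one residue as reference), $\mathrm{PC}_{i,k}$ are the principal-component covariates, and df is the number of additional estimable parameters in $H_1$. A large p-value for $\Delta D$, such as the p=0.57 reported for position 13 given position 11, indicates that the added position does not improve the fit.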
Association results for amino acids in HLA-DRβ1
Effect of individual amino acids within HLA proteins
Effect Estimates for the Five Amino Acids Associated with RA Risk.
To replicate these DRβ1 effects without imputed genotypes, we analyzed an independent South Korean data set of 616 anti-CCP positive cases and 675 controls with genome-wide SNP data [18] and sequencing-based classical HLA-DRB1 genotypes at 4-digit resolution [19]. We used the first five principal components as covariates to correct for population stratification (λgc = 1.01). Of all amino acids tested in HLA-DRβ1, the strongest associations mapped to amino acid positions 11 (p=6.1×10^−36) and 13 (p=3.1×10^−36), with statistically indistinguishable effects (p>0.08; Supplementary Table 3, Supplementary Table 6). Thus, amino acids 11 and 13 in DRβ1 are the strongest associations in two different continental populations.
Given the polymorphic nature of HLA-DRB1, we evaluated whether a similarly significant result could emerge by chance, simply by “tagging” classical alleles of differential risk. To test this possibility, we preserved classical HLA genotypes and case-control status in all samples, and permuted the amino acid sequence defined by each classical HLA-DRB1 allele 10,000 times. We found that a single amino acid position in the permuted data only rarely resulted in a better model goodness-of-fit (measured by the deviance) than amino acid position 11 in the actual data (p=0.0002). Therefore, the degree to which the six alleles at amino acid position 11 divide the classical alleles of HLA-DRB1 into differential risk groups is extremely unlikely to occur by chance.
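The permutation strategy can be sketched in Python as follows. This is a simplified illustration rather than the study's implementation: it works on allele-level case/control chromosome counts, omits the principal-component covariates (so the deviance of a one-factor logistic model has a closed form), and uses invented allele labels, sequences and counts. All function and variable names are hypothetical.

```python
import math
import random

def group_deviance(allele_counts, allele_seq, pos):
    """Deviance of a logistic model with one risk level per residue at `pos`.

    With a single categorical predictor and no covariates, the fitted risk
    in each group is cases / total, so the deviance has a closed form.
    """
    groups = {}
    for allele, (cases, controls) in allele_counts.items():
        g = groups.setdefault(allele_seq[allele][pos], [0, 0])
        g[0] += cases
        g[1] += controls
    dev = 0.0
    for cases, controls in groups.values():
        n = cases + controls
        for k in (cases, controls):
            if k > 0:  # 0 * log(0) is taken as 0
                dev -= 2.0 * k * math.log(k / n)
    return dev

def permutation_p(allele_counts, allele_seq, target_pos, n_perm=1000, seed=0):
    """Share of permutations in which some position fits at least as well
    as `target_pos` does in the real data."""
    rng = random.Random(seed)
    alleles = list(allele_seq)
    seqs = [allele_seq[a] for a in alleles]
    n_pos = len(seqs[0])
    observed = group_deviance(allele_counts, allele_seq, target_pos)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(seqs)  # reassign whole sequences to alleles
        permuted = dict(zip(alleles, seqs))
        best = min(group_deviance(allele_counts, permuted, p) for p in range(n_pos))
        if best <= observed:
            hits += 1
    return hits / n_perm

# Invented toy data: (case, control) chromosome counts and 3-residue sequences.
toy_counts = {"A1": (300, 100), "A2": (280, 120), "A3": (100, 300),
              "A4": (120, 280), "A5": (200, 200), "A6": (190, 210)}
toy_seq = {"A1": "VKA", "A2": "VRA", "A3": "SKR",
           "A4": "SEA", "A5": "LRA", "A6": "LEQ"}
print(permutation_p(toy_counts, toy_seq, target_pos=0))
```

Counts and case-control status stay attached to each classical allele; only the sequence assigned to each allele is shuffled, which preserves the risk structure among alleles while breaking any real link between residues and risk. With only six toy alleles the permutation p-value cannot become very small; the real analysis draws its power from the much larger number of classical alleles.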
After accounting for the amino acid 11 effects in DRβ1 with conditional haplotype analysis, we observed an independent association at position 71 (p<10^−37; Supplementary Table 5A). We tested all possible pairs of polymorphic amino acid positions in DRβ1; of the 1,275 pairs of amino acids tested, none achieved a better goodness-of-fit than positions 11 and 71 (p~4×10^−615). Using the same permutation strategy described above, we found that the degree to which amino acid positions 11 and 71 divide the classical alleles of HLA-DRB1 into differential risk groups is unlikely to occur by chance (p=0.0002). At HLA-DRβ1 position 71, the positively charged Lys-71 and Arg-71 residues confer greater odds of disease (OR=2.0 and 0.97, respectively) than the small aliphatic Ala-71 (OR=0.59); the negatively charged Glu-71 confers the lowest odds of disease (OR=0.32).
Conditioning on positions 11 and 71 revealed an additional association at position 74 (p=1.5×10^−11; Supplementary Table 5A). When we tested all possible combinations of three amino acid positions in DRβ1, we found that only one combination of amino acid sites (37, 67 and 74, p=2×10^−624) out of 20,825 tested outperformed the combination of amino acid sites 11, 71 and 74 (p=1.6×10^−622). However, even that combination did not outperform the 11, 71 and 74 combination by a statistically significant margin (p>0.01). As before, we permuted amino acid sequences, and only rarely were we able to pick three amino acid positions that obtained a better goodness-of-fit in the permuted data than positions 11, 71 and 74 in the actual data (p=0.004). Addition of each of these three amino acid positions yielded improved model fit, even after accounting for the increased number of parameters (Supplementary Table 5B). No residual association was observed at other DRβ1 amino acids after conditioning on positions 11, 71 and 74 (p>8×10^−4; Supplementary Table 5A).
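The numbers of pairs (1,275) and triples (20,825) tested are exactly what exhaustive enumeration over 51 positions would give. The figure of 51 is an inference from this arithmetic, not a number stated in the text. A short Python check, with placeholder position labels:

```python
from itertools import combinations
from math import comb

# Find the number of positions n for which "n choose 2" equals 1,275.
n_positions = next(n for n in range(2, 1000) if comb(n, 2) == 1275)
print(n_positions)           # 51
print(comb(n_positions, 3))  # 20825

# Exhaustive enumeration of candidate triples (labels are placeholders,
# not real DRB1 position numbers).
positions = range(n_positions)
n_triples = sum(1 for _ in combinations(positions, 3))
print(n_triples)             # 20825
```

In an exhaustive search of this kind, each enumerated set of positions would be fitted as its own model and the goodness-of-fit values compared across all sets.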
The amino acids at positions 11, 71 and 74 in DRβ1 define 16 haplotypes. In fact, the individual disease risks predicted by a full model, in which each classical DRB1 allele confers its own unique risk, and by a simpler model, in which risk is defined by amino acid positions 11, 71 and 74, are nearly perfectly correlated (r=0.994). Hence, the model based on the amino acid residues at positions 11, 71 and 74 provides a parsimonious explanation for the effects of the classical DRB1 haplotypes, and suggests an important role for these amino acids in DRβ1 function in rheumatoid arthritis etiology. This is underscored by their central location in the peptide-binding groove of the HLA-DR structure. Positions 11 and 13 are located on the beta-sheet floor with their side chains oriented into the peptide-binding groove. Positions 71 and 74 are separated by a single turn along the α-helix, and their side chains are spatially close to those of positions 11 and 13.
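Collapsing classical alleles into haplotypes defined by a few amino acid positions amounts to grouping alleles by the residues they carry at those positions. The Python sketch below shows the idea with placeholder allele names and invented residue assignments; it is not the study's allele table.

```python
from collections import defaultdict

def haplotypes_by_positions(allele_residues, positions=(11, 71, 74)):
    """Map each distinct residue combination to the alleles that carry it."""
    groups = defaultdict(list)
    for allele, residues in allele_residues.items():
        key = tuple(residues[p] for p in positions)
        groups[key].append(allele)
    return dict(groups)

# Placeholder alleles and residues, invented for illustration only.
toy_alleles = {
    "allele_1": {11: "Val", 71: "Lys", 74: "Ala"},
    "allele_2": {11: "Val", 71: "Lys", 74: "Ala"},
    "allele_3": {11: "Val", 71: "Arg", 74: "Ala"},
    "allele_4": {11: "Ser", 71: "Lys", 74: "Arg"},
    "allele_5": {11: "Leu", 71: "Arg", 74: "Ala"},
}

groups = haplotypes_by_positions(toy_alleles)
for key, members in groups.items():
    print("-".join(key), members)
```

Alleles that fall into the same group receive the same risk under the simpler amino acid model, which is why it needs far fewer parameters than a model with one risk per classical allele.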
Three-dimensional ribbon models for the HLA-DR, HLA-B and HLA-DP proteins
To assess whether there were other independent MHC associations outside of HLA-DRB1, we conditioned on DRβ1 amino acids 11, 71 and 74 and tested all MHC SNPs and HLA alleles. We observed the most significant association at HLA-B in the class I region (p<2×10^−37). This association maps to Asp-9 in HLA-B (OR=2.12 relative to His-9 or Tyr-9; Supplementary Table 4), although we could not statistically distinguish this effect from the classical B*08 allele (p>0.68). Like positions 11, 71 and 74 in DRβ1, position 9 in HLA-B is also located within the binding groove. Many of the previously described associations across the MHC, including markers in the TNF region, are in LD with Asp-9 [10].
Since previously observed B*08 associations to autoimmune diseases, including rheumatoid arthritis, have been attributed specifically to the long ancestral 8.1 haplotype, which carries B*08 together with DRB1*03, we tested whether the B*08/Asp-9 effect is general to all DRB1 backgrounds. Since B*08 and DRB1*03 are not in perfect LD and both are seen independent of the 8.1 haplotype, we were able to apply conditional haplotype analysis to demonstrate that B*08/Asp-9 increases risk roughly two-fold regardless of DRB1 background. Therefore, this risk effect is not restricted to the 8.1 haplotype. Risk alleles for HLA-B and HLA-DRB1 contribute risk additively (on a log-odds scale) even though they are in strong (but incomplete) LD.
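Additivity on the log-odds scale means that the per-locus odds ratios multiply. As a sketch in assumed notation, for a haplotype carrying a risk allele at both loci:

```latex
\log \mathrm{OR}_{B,\,DRB1} = \log \mathrm{OR}_{B} + \log \mathrm{OR}_{DRB1}
\quad\Longleftrightarrow\quad
\mathrm{OR}_{B,\,DRB1} = \mathrm{OR}_{B} \times \mathrm{OR}_{DRB1}
```

For instance, hypothetical per-locus odds ratios of 2 and 3 would combine to an odds ratio of 6, not 5; these numbers are illustrative and are not estimates from the study.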
Conditioning on the HLA-DRB1 and HLA-B effects, we observed the most significant association at HLA-DPB1 in the class II region (p<10^−20), which corresponds to Phe-9 in DPβ1 (OR=1.40 relative to His-9 and Tyr-9; Supplementary Table 4). This effect is significantly stronger than that of any 2- or 4-digit HLA-DPB1 classical allele, but is in LD with and indistinguishable from the Val-8 allele. Amino acid position 9 is within the binding groove of HLA-DP.
We observed no residual signals across the MHC after conditioning on the DRB1, B*08/Asp-9 in B, and Phe-9 in DPβ1 effects (p>3×10^−6). Nor did we observe any evidence of epistatic interactions between known risk loci [13,20,21] and any of the HLA alleles described here (p>0.0003, see Supplementary Note). These results are consistent with a disease model where classical HLA genes/proteins are the dominant factors in rheumatoid arthritis pathogenesis with only a minor contribution from non-HLA loci in the MHC.
A key finding of this study is the major influence of amino acids 11 and 13, which lie within DRβ1 but outside of the well-described shared epitope region. It is possible that one position is driving the effect and the other is in tight LD. Alternatively, there may be a joint effect involving both amino acids, driven by combined selection. This is plausible given the important role of natural selection [22] in the MHC and the physical proximity of these two positions. To disentangle these effects, larger studies including multiple ethnicities, and many more examples of alleles in which the residues at positions 11 and 13 are discordant, will be necessary. Alternatively, if candidate rheumatoid arthritis auto-antigens can be determined, then these effects might be disentangled by comparing T-cell responses to these antigens presented in the context of DRB1 molecules engineered to contain distinct combinations of amino acids at positions 11 and 13.
This study implicates three amino acid positions in HLA-DRβ1, and two additional amino acid positions in HLA-B and HLA-DP, in conferring rheumatoid arthritis risk. These variants account for 12.7% of the phenotypic variance, whereas common validated alleles outside the MHC explain ~4% [13] (see Supplementary Note). The location of these positions within the peptide-binding grooves implies a functional impact on antigenic peptide presentation to T-cells, either during early thymic development or peripheral immune responses. The presence of class I and II alleles implicates both CD8+ cytotoxic and CD4+ helper T-cells in pathogenesis. Besides rheumatoid arthritis, type 1 diabetes has also been shown to have strong HLA class I and II associations [23]. We also note that the HLA-B*08 allele, carrying Asp-9, has been documented in many autoimmune diseases, including myasthenia gravis, immunoglobulin-A deficiency, and systemic lupus erythematosus [24].
The pathogenic auto-antigens in most autoimmune disorders remain controversial. For rheumatoid arthritis, these results could facilitate evaluation of specific citrullinated polypeptides with molecular modeling and binding assays, and in doing so could guide our understanding of how HLA risk alleles influence the immune repertoire and disease susceptibility.