As highlighted above, recent progress has identified a large number of RA risk alleles. Growing international collaboration, improving genotyping technologies, and larger patient sample collections will further enhance these discoveries. However, specific challenges remain in understanding how these variants cause disease and how these discoveries can be used to enhance patient care.
The major histocompatibility complex
The largest contribution to the genetic variation has been attributed to HLA-DRB1 risk alleles in the MHC region, discovered in the 1970s [5] and subsequently organized into the shared epitope alleles by Gregersen et al. Estimates of the contribution of the shared epitope alleles to the total genetic variability of RA have ranged from 18 to 37% [23••]. Outside the shared epitope alleles within the HLA-DRB1 gene locus, additional risk alleles may exist within the MHC, but they remain to be pinpointed precisely [61••]. The MHC is a highly complex region with extended linkage disequilibrium and complex structural features that reduce the effectiveness of standard SNP-based genotyping strategies and analytical approaches. In particular, converting SNP genotypes into HLA alleles is difficult. Although SNP GWA data are available for a large number of samples worldwide, direct HLA genotyping is expensive and in most cases unavailable. In the coming years, one of the challenges will be to use genotyped SNPs in GWAS to estimate HLA alleles and then to identify additional risk alleles within the MHC itself. Recent advances in using a panel of approximately 100 SNP genotypes across the MHC to estimate HLA genotypes could have a tremendous impact in this area and help investigators clarify the genetics of the MHC and its impact on RA [63•].
Recently, new array-based and sequencing technologies have allowed investigators to examine the genome for structural variants [65], such as regions that are deleted or duplicated. Case–control studies have demonstrated association of structural variants with multiple diseases, including autoimmune diseases. In one striking example, McCarroll et al. examined a SNP strongly associated with Crohn’s disease and recognized that it correlated perfectly with a deletion in the promoter region of the IRGM gene that affected gene expression. Another compelling example is the β-defensin gene cluster; duplications of β-defensin have been shown to be associated with greater risk of psoriasis [69]. Rare variants can also now be detected with current technologies, and the role of rare or single-event deletions and duplications has been demonstrated in neuropsychiatric diseases [70]. No convincing examples of common or rare structural variants conferring disease risk have been recognized for RA yet. However, there is mounting evidence in the literature that a microdeletion in the CCR5 gene, the CCR5Δ32 polymorphism, protects individuals from RA; this variant has also been demonstrated to play a role in HIV disease progression [72]. There is also published evidence suggesting that duplications of the CCL3L1 gene may increase RA risk [73].
Implicating pathways to rheumatoid arthritis risk
One of the key goals of genetics is to identify biological pathways and processes that predispose to risk. In the figure below, I have used GRAIL to demonstrate the compelling functional connections across genes near RA-associated SNPs; these connections strongly suggest that common pathways are present across multiple RA risk loci.
Gene relationships across rheumatoid arthritis-associated loci
For example, risk alleles highlight genes involved in T-cell activation by antigen-presenting cells (class II MHC region, PTPN22). More recent discoveries have demonstrated the role of the CD40 signaling pathway and downstream activation of the nuclear factor-κB (NF-κB) signaling pathway (CD40, TRAF6, TNFSF14). Recently, Gregersen et al. conducted an unbiased GWAS that definitively identified an associated risk allele at c-REL, which encodes one of the five NF-κB family proteins.
Recent associations have also now clearly implicated the signaling pathway of IL2, a critical cytokine involved in T-cell activation and proliferation. Follow-up genotyping of nominally associated SNPs (P) in the WTCCC RA GWAS demonstrated evidence of association at IL2RA that was subsequently confirmed by an independent group [37]. Similarly, follow-up genotyping of nominally associated SNPs (P) demonstrated evidence of association at IL2RB that was subsequently confirmed by an independent group [29••]. Finally, Zhernakova et al. demonstrated association of a SNP implicating the IL2/IL21 locus, which was subsequently confirmed in an independent study by Barton et al. Functional connections across the genes within these loci are clearly highlighted in the figure on gene relationships across RA-associated loci.
The majority of the discoveries presented here have been identified with predominantly seropositive (anti-CCP or rheumatoid factor positive) samples. Many studies explicitly exclude seronegative samples to assure diagnostic certainty and homogeneity. The differences in the genetics of anti-CCP-positive and anti-CCP-negative RA have been demonstrated most strikingly, in multiple studies, in the role that the shared epitope alleles play [23••]: in seropositive disease, shared epitope alleles are the strongest risk factor, whereas in seronegative disease they seem to play a much more modest role, if any at all. More recently, Ding et al. conducted a GWA study, examined the association of MHC SNPs in seronegative cases, and observed no significant association of any SNP. Outside the MHC, however, additional studies are necessary to demonstrate similarities and differences between the risk loci for seronegative and seropositive disease.
Predicting disease risk
One of the goals of genetic studies is to be able to predict individual risk for patients. For such predictive strategies to be clinically applicable, they will need to achieve a high degree of specificity.
Based on twin studies, it is unlikely that full ascertainment of the genome alone will result in highly accurate clinical risk prediction in RA. Concordance rates among monozygotic twins offer a sense of the upper bound on the predictive power of a genetic test. As monozygotic twins have identical genotypes, a hypothetical predictive method that uses only genetic information will assign the same prediction to both twins. For monozygotic twin pairs with at least one affected twin, RA has a maximum of 15% concordance. Any method that is 100% sensitive when applied to probands will accurately classify all of them as affected. But the same method, when applied to their co-twins, will also classify all of them as affected, so the positive predictive value will be at most 15%. On the other hand, a 100% specific method will accurately classify as unaffected the at least 85% of co-twins who are unaffected; but the same method, when applied to the affected probands, will then be at most 15% sensitive. A specific and clinically applicable method that uses only genetic information can, therefore, identify at most 15% of affected patients in advance of clinical symptoms.
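The arithmetic behind this bound can be checked with a small sketch. It assumes a hypothetical cohort of 100 monozygotic pairs, each with an affected proband, and takes the 15% concordance quoted above at face value; the function name `twin_bounds` and the cohort size are illustrative choices, not part of any cited study.

```python
from fractions import Fraction


def twin_bounds(n_pairs, n_concordant):
    """Return (PPV of a fully sensitive test, sensitivity of a fully specific test).

    A genotype-only test must give both monozygotic twins the same call,
    so one call per pair is enough to describe it.
    """
    # Every pair has an affected proband; the co-twin is affected only in
    # the concordant pairs.
    cotwin_affected = [i < n_concordant for i in range(n_pairs)]

    # Scenario A: 100% sensitive on probands, so every pair is called positive.
    calls_a = [True] * n_pairs
    true_pos = sum(call and affected for call, affected in zip(calls_a, cotwin_affected))
    ppv = Fraction(true_pos, sum(calls_a))

    # Scenario B: 100% specific on co-twins, so no pair with an unaffected
    # co-twin may be called positive. Best case: all concordant pairs are.
    calls_b = list(cotwin_affected)
    sensitivity = Fraction(sum(calls_b), n_pairs)  # all probands are affected
    return ppv, sensitivity


ppv, sensitivity = twin_bounds(n_pairs=100, n_concordant=15)
print(f"PPV in co-twins: {float(ppv):.2f}; sensitivity in probands: {float(sensitivity):.2f}")
```

Both quantities come out equal to the concordance rate, which is why the concordance caps what any genotype-only method can achieve in this setting.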
Certainly, the inclusion of additional factors such as epigenetic information, biomarkers, clinical predictors, and environmental factors will improve predictive power and may result in a more useful approach.
However, current SNP associations and MHC alleles in aggregate can offer insight into risk of RA for individuals within a population, assuming representative ethnic background. Karlson et al. used published RA risk SNP alleles and their odds ratios, along with MHC risk alleles, to stratify patients into seven genetic risk categories; they showed that the risk in the highest category was three-fold that of the population baseline and six-fold that of the lowest category.
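One common way to aggregate risk alleles in this fashion is a weighted score in which each risk-allele count is multiplied by the natural log of its odds ratio. The sketch below illustrates that general idea only; it is not a reconstruction of the method used by Karlson et al. The locus names, odds ratios, and category cutoffs are hypothetical placeholders, not published estimates, and the log-additive weighting is itself an assumption.

```python
import math

# Hypothetical per-allele odds ratios: placeholders, NOT published estimates.
ODDS_RATIOS = {"locus_a": 1.8, "locus_b": 1.3, "locus_c": 1.1}


def weighted_risk_score(genotype):
    """Sum risk-allele counts (0, 1 or 2 per locus) weighted by ln(odds ratio)."""
    score = 0.0
    for locus, count in genotype.items():
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {locus} must be 0, 1 or 2")
        score += count * math.log(ODDS_RATIOS[locus])
    return score


def relative_odds(genotype, reference):
    """Odds of one genotype relative to another under the log-additive assumption."""
    return math.exp(weighted_risk_score(genotype) - weighted_risk_score(reference))


def risk_category(score, cutoffs):
    """Category index: the number of cutoffs that the score meets or exceeds."""
    return sum(score >= cutoff for cutoff in cutoffs)


no_risk_alleles = {"locus_a": 0, "locus_b": 0, "locus_c": 0}
carrier = {"locus_a": 2, "locus_b": 1, "locus_c": 0}
cutoffs = [0.25, 0.75, 1.25]  # hypothetical boundaries between four categories

print(risk_category(weighted_risk_score(carrier), cutoffs))
print(relative_odds(carrier, no_risk_alleles))
```

In practice the cutoffs would be derived from the score distribution in a reference population, which is where the assumption of a representative ethnic background enters.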