To review the recently discovered genetic risk loci in rheumatoid arthritis (RA), the pathways they implicate, and the genetic architecture of RA.
Since 2008, investigators have identified many common genetic variants that confer disease risk through single nucleotide polymorphism genotyping studies; the list of variants will no doubt continue to expand at a rapid rate as genotyping technologies evolve and case–control sample collections continue to grow. In aggregate, these variants implicate pathways leading to NF-κB (nuclear factor kappa-light-chain-enhancer of activated B cells) activation, the interleukin-2 signaling pathway, and T-cell activation.
Although the effect of any individual variant is modest and even in aggregate considerably less than that of the major histocompatibility complex, the recently discovered risk variants point to immunological processes that are involved in disease pathogenesis.
Rheumatoid arthritis (RA) is a complex disease involving many different pathways within the innate and the adaptive immune system; in addition to an inflammatory polyarthritis that predominantly affects the hands, RA can affect multiple other tissues, including the lungs. The possibility that RA might have a genetic component was considered as long ago as 1806 by William Heberden in his book Commentaries on the History and Cure of Diseases, in which he asked ‘Is [rheumatism] not in some degree hereditary?’. Today, it is well established that there are key genetic risk factors, such as the shared epitope human leukocyte antigen (HLA) alleles [4,5], and environmental risk factors, such as smoking. Within the past 2 years, genetic studies have produced a rapidly growing list of risk loci for RA and other autoimmune diseases, and investigators are struggling to understand how these loci augment our knowledge about human disease and help identify individuals at risk. Currently known risk alleles, however, account for only a fraction of the heritability of RA, and as genotyping technology evolves and case–control collections enlarge, the number of known risk variants will undoubtedly continue to expand.
Mouse models of arthritis have helped to elucidate the key genetic factors and highlight potential therapeutic interventions [7–10], but do not always correlate perfectly with human disease. Genetic studies can help us identify the key genes and pathways that are involved at the inception of human disease. As genetic variants are present at birth and precede disease, the convincing association of a specific variant with disease demonstrates that either the variant or a highly correlated nearby variant plays a key role in disease pathogenesis. Consequently, pathways that these genes are involved in may be targeted effectively by therapeutics in early disease and furthermore are potentially promising for curative therapy.
In the present review, I survey recent advances in RA genetics. First, I will summarize the available data demonstrating that RA is a heritable disease. Second, I will describe the RA risk loci identified so far and the strategies used to discover them. Third, I will discuss some of the biological insights that might be gleaned from the current list of RA risk loci. Finally, I will discuss the application of these loci to predicting RA risk.
Overwhelming evidence from familial studies has long implicated genetic factors in RA. Most notably, familial studies have demonstrated an increased prevalence of disease in first-degree relatives. Twin studies have shown an increased rate of disease in monozygotic (identical) versus dizygotic (fraternal) twins.
One common approach to assess and quantify the possible role of genetic factors is to measure the concordance of disease between related individuals. A highly heritable disease will have an increased prevalence in individuals related to affected probands compared with the general population. For example, for a rare, highly penetrant monogenic dominant disease, about 50% of a proband’s siblings will be affected, compared with a population prevalence of less than 1%. The increased incidence of RA among individuals related to probands with disease is a key observation that suggests the heritability of disease. Of course, this observation could be explained by shared environmental factors as well; for example, family members living together close to heavy traffic might all be exposed to the same risk. A key descriptive parameter is λR, the ratio of the prevalence of a disease among first-degree relatives (parents, children, and siblings) to its prevalence in the general population. For highly penetrant monogenic diseases, λR can be very high; for example, λR is approximately 500 for cystic fibrosis. Complex diseases have more modest λR values; for example, it is approximately 15 for type I diabetes.
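The definition of λR above is a simple ratio of two prevalences, which the following minimal sketch makes explicit. The function name and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from any study cited here.

```python
def recurrence_risk_ratio(affected_relatives, total_relatives, population_prevalence):
    """Return lambda_R: prevalence among first-degree relatives of probands
    divided by prevalence in the general population."""
    if total_relatives <= 0:
        raise ValueError("total_relatives must be positive")
    if not 0 < population_prevalence <= 1:
        raise ValueError("population_prevalence must be in (0, 1]")
    relative_prevalence = affected_relatives / total_relatives
    return relative_prevalence / population_prevalence


# Hypothetical example: 30 affected among 1000 first-degree relatives,
# against an assumed population prevalence of 1%.
lambda_r = recurrence_risk_ratio(30, 1000, 0.01)
print(f"lambda_R = {lambda_r:.1f}")
```

Because the ratio does not separate shared genes from shared environment, a value above 1 is suggestive of heritability rather than proof of it.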
RA is commonly cited as having a λR of 2–17, though more careful estimates are often more modest. The earliest evidence of familial segregation was reported in 1950 by the Empire Rheumatism Council. They noted that fathers, mothers, and siblings of affected individuals were each about twice as likely to be affected as age-matched and sex-matched controls. This finding was reproduced in many additional studies in the same time-frame, as reviewed by Lawrence. Several large studies that have carefully ascertained controls to match or adjust for age and sex have shown that the relative risk of RA for first-degree relatives is about 2–4 [15–17]. Jones et al. looked at 134 RA probands from the Norfolk Arthritis Registry compared with age-matched and sex-matched controls. When they compared the prevalence of RA among 418 first-degree relatives of probands with that among relatives of controls, they noted a 2.2-fold increased relative risk; however, this increase was not statistically significant. Del Junco et al. examined 1631 first-degree relatives of 78 RA probands in Rochester, Minnesota and noted a significantly increased relative rate of 1.7 compared with age-adjusted and sex-adjusted local rates. A recent large study using the Multigeneration Register in Sweden used diagnostic codes from hospital discharges to demonstrate an increased relative rate of disease of 4.6 in siblings of affected individuals and of 3.0 in children. These values are likely dependent on the severity of the disease, the serologic status of patients, and the age and sex of the proband [14,18]. Additionally, hospital-based cohorts seem to suggest greater heritability than community-based ones.
Twin studies offer a powerful strategy to estimate the heritability of disease. Specifically, an increased rate of disease among monozygotic twins compared with dizygotic twins offers evidence that inherited factors predispose to disease risk, potentially above and beyond common environmental factors. To date, the most compelling analysis of twin studies suggests RA is approximately 65% heritable, based on data from a Finnish and a British cohort. These estimates are based on a total of only 23 concordant monozygotic twins and 10 concordant dizygotic twins, and estimates of heritability are, therefore, potentially sensitive to modest biases. For example, a smaller Danish RA twin study demonstrated no heritability for RA whatsoever. On the other hand, earlier studies of twin concordance had demonstrated higher degrees of heritability. Possibly, diagnostic criteria and study designs have changed over time, so that the more recent concordance rates are more modest.
In a recent extension of the British twin study, investigators genotyped twins for shared epitope alleles and tested them for anticyclic citrullinated peptide (anti-CCP) antibody [23••]. They found that seropositive and seronegative RA were equally heritable, with approximately 65% heritability for both, as previously noted. However, shared epitope alleles account for only 2.4% of the heritability of seronegative RA, compared with 18% for seropositive RA.
Over the last year, a rapidly growing list of RA risk loci has emerged in the scientific literature as case–control sample collections grow and genotyping technologies develop (see Table 1, Fig. 1) [4,24,25••,26,27,28••,29••,30,31,32••,33,34,35••,36••,37,38,39••,40,41]. Different genetic strategies have been effectively applied to identify risk loci in RA. These strategies can be broadly divided into unbiased genome-wide investigations and targeted investigations to examine specific alleles or genes for which there is increased prior probability of association based on previous functional or genetic information.
The single nucleotide polymorphism (SNP) has become the standard genetic marker to identify associated alleles [42–44]. Most confirmed SNP associations in RA are common SNPs, that is, SNPs in which the rarer nucleotide form is present in more than 5% of the population of chromosomes. Although these SNPs are associated with disease, in most cases the altered nucleotides do not necessarily cause disease themselves. Instead, they are likely to be correlated with common mutations within the region that effect a functional change in nearby genes and thereby cause disease. Efforts to look directly for rare variation, such as de-novo mutations and rare structural variants, are ongoing; they have not yet yielded confirmed RA risk variants, but have been successfully applied to other phenotypes.
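The 5% cutoff for a common SNP refers to the frequency of the rarer allele among chromosomes, not among individuals. A minimal sketch of that calculation from genotype counts follows; the function names and the counts are hypothetical.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele among chromosomes, given counts of
    individuals with genotypes AA, AB, and BB (two chromosomes each)."""
    total_chromosomes = 2 * (n_aa + n_ab + n_bb)
    if total_chromosomes == 0:
        raise ValueError("at least one genotyped individual is required")
    freq_a = (2 * n_aa + n_ab) / total_chromosomes
    return min(freq_a, 1.0 - freq_a)


def is_common_snp(maf, threshold=0.05):
    """A SNP is treated as common when its minor allele frequency exceeds 5%."""
    return maf > threshold


# Hypothetical genotype counts for one SNP in 1000 individuals.
maf = minor_allele_frequency(n_aa=810, n_ab=180, n_bb=10)
print(f"MAF = {maf:.3f}, common = {is_common_snp(maf)}")
```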
High throughput SNP genotyping technologies [45,46] have allowed investigators to genotype common variation across 100 000–1 000 000 SNPs efficiently. These SNPs typically capture up to 65–70% of common variation across the human genome [47,48] and can be used to conduct an unbiased investigation for novel risk loci for a phenotype of interest. Genome-wide association studies (GWASs) combine this technology with large case–control collections to allow investigators to rapidly identify associated alleles. Additionally, genome-wide genotyping offers investigators further analytical advantages, such as checking for duplicated or related individuals, imputation across the genome to determine the genotypes of ungenotyped SNPs, and application of strategies to check and correct for case–control stratification.
One key caveat for such studies is that, as they assess variation across a large number of loci, very high levels of significance are necessary to convincingly rule out the possibility that an association is a false positive. Today, investigators typically require a risk allele to demonstrate association at a P value of less than about 5 × 10⁻⁸ before they accept it as truly associated. To achieve this level of stringency, effective genome-wide studies have required large-scale international collaborations between investigators worldwide.
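One common rationale for a threshold of this size, which the text above does not spell out, is a Bonferroni correction: a conventional 5% false-positive rate divided across roughly one million independent tests. The sketch below shows that arithmetic; the figure of one million tests is an assumption used for illustration.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise
    false-positive rate at or below alpha across n_tests tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value, alpha=0.05, n_tests=1_000_000):
    """Compare a single SNP's P value against the corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"threshold = {threshold:.1e}")
print(is_genome_wide_significant(1e-9), is_genome_wide_significant(1e-6))
```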
One of the first effective applications of the GWAS approach in RA was the discovery of the association at TNFAIP3. Plenge et al. identified an associated SNP (rs10499194) in that region using a collection of Boston-based cases and controls. A separate and independent TNFAIP3-associated SNP (rs6920220) was identified by Thomson et al. in a follow-up study of the Wellcome Trust Case Control Consortium (WTCCC) RA GWAS. Subsequent investigation by Orozco et al. [25••] identified a third independent SNP association in the same region in a fine mapping study. The TNFAIP3 locus highlights the potential complexity that can be present at a given locus, involving multiple alleles marked by SNPs with independent effects (see Fig. 2). Other discoveries by the GWAS approach include associations at TRAF1 [26,27], REL, and BLK [28••].
Another powerful aspect of genome-wide SNP data is that it can be used to conduct genome-wide meta-analyses across multiple studies, even if the same SNPs have not been genotyped in all studies. Powerful statistical techniques can be used to estimate or ‘impute’ missing genotypes prior to conducting meta-analyses [50,51•]. We conducted the most recent large-scale meta-analysis of three GWA scans [39••], including scans from the Epidemiological Investigation in RA (EIRA) consortium, the WTCCC, and The North American RA Consortium (NARAC; see Fig. 3). As the different studies used different genotyping platforms, we used imputation strategies to estimate SNP genotypes that were missing in individual studies. That study in conjunction with follow-up genotyping in independent cases and controls identified evidence of association at six separate loci. Of those, five associations at CCL21, KIF5A, CD40, PRKCQ, and TNFRSF14 have been independently described [29••,52•]. Imputation and meta-analysis techniques have become a standard tool to combine results of multiple GWA studies and have now been widely applied to many phenotypes, including Crohn’s disease, height, and diabetes.
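Once every study has a genotyped or imputed estimate for the same SNP, the per-study results can be pooled. The text does not specify which pooling method the cited meta-analysis used; the sketch below assumes a fixed-effect, inverse-variance weighted combination of per-study log odds ratios, which is one widely used choice. The effect sizes and standard errors are hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect meta-analysis of per-study effect estimates
    (for example, log odds ratios) for a single SNP.

    Returns the pooled effect, its standard error, the z score,
    and a two-sided P value under a normal approximation."""
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p_value


# Hypothetical log odds ratios and standard errors from three scans.
beta, se, z, p = inverse_variance_meta([0.18, 0.22, 0.15], [0.06, 0.08, 0.05])
print(f"pooled OR = {math.exp(beta):.2f}, z = {z:.2f}, P = {p:.2e}")
```

Pooling in this way lets a SNP with only nominal evidence in each individual scan reach a much smaller combined P value, which is why meta-analysis can reveal loci that no single study detects.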
One effective approach to discovering risk alleles in RA has been the candidate SNP approach. In this approach, investigators select specific SNPs to investigate in RA, based on their known function, association with other autoimmune diseases, or prior nominal evidence of association in linkage or case–control data.
Known risk alleles in type I diabetes have served as valuable candidates for investigation in RA. Investigators have tested multiple type I diabetes risk alleles for association with RA. Some of these alleles have now been convincingly validated in RA, including CTLA4, 4q27, and AFF3 [32••]. Similarly, Zhernakova et al. demonstrated preliminary evidence of association at SH2B3 by genotyping a known celiac disease risk allele in RA samples. These alleles highlight a very important theme in autoimmunity, namely that common SNP loci seem to predispose to autoimmune diseases in general. This theme was apparent even with the earliest discovery of a risk variant outside the major histocompatibility complex (MHC) region, in the PTPN22 gene, which is associated with RA, type I diabetes, lupus, and other autoimmune diseases [33,58].
At least two known RA associations have been driven by following up on regions with nominal evidence of association. Remmers et al. identified a promising candidate gene in a known RA linkage peak; subsequent follow-up genotyping in a large set of independent cases and controls demonstrated association at the STAT4 locus. More recently, Suzuki et al. [35••] identified nominal evidence of association in a linkage disequilibrium block containing multiple signaling lymphocyte activation molecule (or SLAM) genes and mapped the signal to the rs3766379 SNP in Japanese patients. This SNP turned out to be a functional SNP that affected CD244 gene expression.
Pathway-based approaches can also be used to discover novel variants. The central assumption of these approaches is that disease-associated SNPs should implicate a common pathway. We described one novel approach that leverages both genome-wide data and information about gene function. In order to identify whether SNP associations implicated a set of genes that are functionally interrelated, we devised the GRAIL (gene relationships across implicated loci) algorithm [59•]. GRAIL applies computational text mining to PubMed abstracts to assess whether genes across multiple loci are functionally connected and are, therefore, likely to represent a common pathway. We used this approach to help identify novel RA risk loci. Based on the aforementioned meta-analysis, we identified 22 SNPs with nominal evidence of association (P < 0.001), which were also near genes that were connected to genes already associated with RA. We genotyped these SNPs in independent samples and identified seven that replicated, including associations at CD2/CD58, CD28, PRDM1, TAGAP, PTPRC, TRAF6/RAG1, and FCGR2A [36••].
As highlighted above, recent progress has identified a large number of RA risk alleles. Growing international collaboration, improving genotyping technologies, and enlarging patient sample collections will further enhance these discoveries. However, specific challenges remain to clearly understand how these variants cause disease and how these discoveries can be used to enhance patient care.
The largest contribution to the genetic variation has been attributed to HLA-DRB1 risk alleles in the MHC region, discovered in the 1970s and subsequently organized into the shared epitope alleles by Gregersen et al. Estimates of the contribution of the shared epitope alleles to the total genetic variability of RA have ranged from 18 to 37% [23••,60]. Beyond the shared epitope alleles within the HLA-DRB1 gene locus, additional risk alleles may exist within the MHC, but these alleles remain to be pinpointed precisely [61••,62••]. The MHC region is a highly complex region with extended linkage disequilibrium and complex structural features that reduce the effectiveness of standard SNP-based genotyping strategies and analytical approaches. In particular, converting SNP genotypes into HLA alleles is difficult. Although SNP GWA data are available for a large number of samples worldwide, in most cases HLA genotyping is expensive and unavailable. In the coming years, one of the challenges will be to use genotyped SNPs in GWAS to estimate HLA alleles and to then identify additional risk alleles within the MHC itself. Recent advances in using a panel of approximately 100 SNP genotypes across the MHC to estimate HLA genotypes could have a tremendous impact in this area and help investigators to clarify the genetics of the MHC and its impact on RA [63•,64].
Recently, new array-based and sequencing technologies have allowed investigators to examine the genome for structural variants [65–67], such as regions that are deleted and duplicated. Case–control studies have demonstrated association of structural variants with multiple diseases, including autoimmune diseases. In one striking example, McCarroll et al. examined a SNP strongly associated with Crohn’s disease and recognized that it correlated perfectly with the presence of a deletion in the promoter region of the IRGM gene and therefore affected gene expression. Another compelling example is the β-defensin gene cluster; duplications of β-defensin have been shown to be associated with greater risk of psoriasis. Rare variants can also now be detected with current technologies, and the role of rare or single-event deletions and duplications has been demonstrated in neuropsychiatric diseases [70,71]. No convincing examples of common or rare structural variants conferring disease risk have been recognized for RA yet. However, there is mounting evidence in the literature that a microdeletion in the CCR5 gene, the CCR5Δ32 polymorphism, protects individuals from disease; this variant has been demonstrated to play a role in HIV disease progression. There has also been published evidence suggesting that duplications of the CCL3L1 gene may increase RA risk.
One of the key goals of genetics is to identify biological pathways and processes that predispose to risk. In Fig. 4, I have used GRAIL to demonstrate the compelling functional connections across genes near RA-associated SNPs; these connections strongly suggest that common pathways are present across multiple RA risk loci.
For example, risk alleles highlight genes involved in T-cell activation by antigen presenting cells (class II MHC region, PTPN22, STAT4, CD28, and CTLA4). More recent discoveries have demonstrated the role of the CD40 signaling pathway and downstream activation of the nuclear factor-κB (NF-κB) signaling pathway (CD40, TRAF1, TRAF6, TNFRSF14, TNFAIP3). Recently, Gregersen et al. [28••] conducted an unbiased GWAS to definitively identify an associated risk allele at c-REL, one of the five NF-κB family proteins.
Recent associations have also now clearly implicated the IL2 signaling pathway; IL2 is a critical cytokine involved in T-cell activation and proliferation. Follow-up genotyping of nominally associated SNPs (P < 10⁻⁵) in the WTCCC RA GWAS demonstrated evidence of association at IL2RA that was subsequently confirmed by an independent group [37,38]. Follow-up genotyping of nominally associated SNPs (P < 10⁻⁴) demonstrated evidence of association at IL2RB that was subsequently confirmed by an independent group [29••,38]. Finally, Zhernakova et al. demonstrated association of a SNP implicating the IL2/IL21 locus, which was subsequently confirmed in an independent study by Barton et al. [32••]. Functional connections across the genes within these loci are clearly highlighted in Fig. 4.
The majority of the discoveries presented here have been identified with predominantly seropositive (anti-CCP or rheumatoid factor positive) samples. Many studies explicitly exclude seronegative samples to assure diagnostic certainty and homogeneity. The differences in the genetics of anti-CCP-positive and anti-CCP-negative RA have been demonstrated most strikingly in the role of the shared epitope alleles across multiple studies [23••,74]: whereas in seropositive disease shared epitope alleles are the strongest risk factor, they seem to play a much more modest role, if any at all, in seronegative disease. More recently, Ding et al. [62••] conducted a GWA study and examined the association of MHC SNPs in seronegative cases and observed no significant association of any SNP. However, outside the MHC, additional studies are necessary to demonstrate similarities and differences between the risk loci for seronegative and seropositive disease.
One of the goals of genetic studies is to be able to predict individual risk for patients. For such predictive strategies to be clinically applicable, they will need to achieve a high degree of specificity.
Based on twin studies, it is unlikely that full ascertainment of the genome alone will result in highly accurate clinical risk prediction in RA. Concordance rates among monozygotic twins offer a sense of the upper bound for the predictive power of a genetic test. As monozygotic twins have identical genotypes, a hypothetical predictive method that uses only genetic information will assign the same prediction to both twins. For monozygotic twin pairs with at least one affected twin, RA has a maximum of 15% concordance. Any method that is 100% sensitive when applied to probands will accurately classify them as affected. However, the same sensitive method when applied to the related twins will also classify all of them as affected, and the positive predictive value will be at most 15%. On the other hand, a 100% specific method will accurately classify at least 85% of the related twins as unaffected; but the same method when applied to the affected probands will be at most 15% sensitive. A specific and clinically applicable method that uses only genetic information can, therefore, identify at most 15% of affected patients in advance of clinical symptoms.
Certainly, the incorporation of additional factors such as epigenetic information, biomarkers, clinical predictors, and environmental factors will improve predictive power and may result in a more useful approach.
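The bound described above can be checked with a small worked example. The sketch assumes 100 hypothetical monozygotic pairs, each with an affected proband, and a co-twin concordance of 15%; because both twins share a genotype, a genetics-only test gives one prediction per pair. The function and variable names are illustrative only.

```python
def evaluate_pair_predictions(cotwin_affected, pair_flagged):
    """Score a genetics-only test on monozygotic pairs with an affected proband.

    cotwin_affected[i]: whether the co-twin in pair i has disease.
    pair_flagged[i]: the single prediction shared by both twins in pair i.
    Returns (positive predictive value among co-twins, or None if no pair
    is flagged; sensitivity among probands; specificity among co-twins,
    or None if no co-twin is unaffected)."""
    flagged = [aff for aff, pred in zip(cotwin_affected, pair_flagged) if pred]
    ppv = sum(flagged) / len(flagged) if flagged else None
    # Every proband is affected, so sensitivity is the fraction of pairs flagged.
    sensitivity = sum(pair_flagged) / len(pair_flagged)
    unaffected = [pred for aff, pred in zip(cotwin_affected, pair_flagged) if not aff]
    specificity = 1 - sum(unaffected) / len(unaffected) if unaffected else None
    return ppv, sensitivity, specificity


# 100 hypothetical pairs at 15% concordance.
cotwin_affected = [True] * 15 + [False] * 85

# A fully sensitive test flags every pair.
print(evaluate_pair_predictions(cotwin_affected, [True] * 100))

# A fully specific test can flag only pairs whose co-twin is also affected.
print(evaluate_pair_predictions(cotwin_affected, list(cotwin_affected)))
```

The first test reaches 100% sensitivity but only 15% positive predictive value in co-twins; the second reaches 100% specificity but only 15% sensitivity in probands, matching the argument in the text.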
However, current SNP associations and MHC alleles in aggregate can offer insight into risk of RA for individuals within a population, assuming representative ethnic background. Karlson et al. [75•] used published RA risk SNP alleles and their odds ratios along with MHC risk alleles to demonstrate that they could be used to stratify patients into seven genetic risk categories; they showed that the highest risk category was at three-fold the risk of the population baseline and six-fold the lowest risk category.
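The exact scoring scheme used in the study above is not described here. One common way to combine published risk alleles and odds ratios, shown below as an assumption, is a weighted score in which each risk allele carried contributes the natural log of its odds ratio. The SNP names and odds ratios in the example are hypothetical placeholders, not values from the study.

```python
import math


def weighted_genetic_risk_score(risk_allele_counts, odds_ratios):
    """Sum of risk-allele counts (0, 1, or 2 per SNP) weighted by log odds ratio.

    Exponentiating the score gives the combined odds relative to a person
    carrying no risk alleles, assuming the alleles act independently and
    multiplicatively."""
    score = 0.0
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"risk allele count for {snp} must be 0, 1, or 2")
        score += count * math.log(odds_ratios[snp])
    return score


# Hypothetical odds ratios and one individual's genotype.
odds_ratios = {"snp_a": 1.8, "snp_b": 1.2, "snp_c": 1.1}
genotype = {"snp_a": 1, "snp_b": 2, "snp_c": 0}

score = weighted_genetic_risk_score(genotype, odds_ratios)
print(f"score = {score:.3f}, combined odds multiplier = {math.exp(score):.2f}")
```

Individuals can then be binned into risk categories by score; the category boundaries are a separate design choice that this sketch does not address.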
In the last year, investigators have identified multiple additional novel RA risk loci; but these loci likely represent early steps in understanding RA genetics. Much additional work remains to understand the complexity of each of these loci, the functional role that each locus plays in the pathogenesis of RA and of autoimmune disease in general, and to discover the complete list of risk loci. Ultimately, these risk loci will continue to provide key insights into the pathogenesis of RA.
S.R. is supported by an NIH Career Development Award (1K08AR055688).
Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest