Investigators have made key advances in rheumatoid arthritis (RA) genetics in the past 10 years. Although genetic studies have had limited influence on clinical practice and drug discovery, they are currently generating testable hypotheses to explain disease pathogenesis. Firstly, we review here the major advances in identifying RA genetic susceptibility markers both within and outside of the MHC. Understanding how genetic variants translate into pathogenic mechanisms and ultimately into phenotypes remains a mystery for most of the polymorphisms that confer susceptibility to RA, but functional data are emerging. Interplay between environmental and genetic factors is poorly understood and in need of further investigation. Secondly, we review current knowledge of the role of epigenetics in RA susceptibility. Differences in the epigenome could represent one of the ways in which environmental exposures translate into phenotypic outcomes. The best understood epigenetic phenomena include post-translational histone modifications and DNA methylation events, both of which have critical roles in gene regulation. Epigenetic studies in RA represent a new area of research with the potential to answer unsolved questions.
That genetic and environmental factors participate in mechanisms of rheumatoid arthritis (RA) pathogenesis1 is well established. The overall contribution of genetic factors to RA development has historically been investigated through analysis of family pedigrees. For example, familial clustering—greater disease occurrence in relatives of probands than of healthy controls—has been a consistent observation in RA;2 the relative risk of RA development in first degree relatives of affected individuals is estimated at ~2 or greater.3–5 In addition, disease discordance in monozygotic compared with dizygotic twins suggests that the genetic contribution to RA, or disease heritability, approaches 65%;6 these estimates are, however, based on a relatively small number of twins (23 monozygotic and 10 dizygotic disease-concordant twin pairs). Heritability is the proportion of phenotypic variance that can be attributed to genetic, rather than environmental, causes. Thus, although RA clearly has a considerable genetic component, and few known environmental triggers (such as cigarette smoke),7 many environmental factors remain largely unknown and their contribution to RA aetiology is likely substantial.
Mechanisms that underlie the observed sex-bias (3:1 female to male ratio) in the incidence of RA are also unknown. Investigators have suggested a range of hypotheses including potential roles for sex hormones.8,9 The sex chromosomes have been underinvestigated in genetic studies in RA. An Immunochip study published in 2012,10 described in the ‘Immunochip’ subsection of this manuscript, shows for the first time an association with an X chromosome locus in RA, in this case IRAK1 (encoding interleukin-1 receptor-associated kinase 1). This locus has been shown to escape X-inactivation in humans,11 pointing to a possible epigenetic mechanism underlying the sex bias in RA.
In 1969, researchers noticed that peripheral blood lymphocytes from patients with RA were non-reactive in so-called mixed lymphocyte cultures to cells of the same type from other patients with RA.12 Investigators demonstrated in 1976 that patients with RA tend to share the same HLA genes,13 thus explaining the lack of reactivity in mixed cultures. Serotyping experiments subsequently identified an increased proportion of patients with RA who were positive for the HLA allele HLA-DRw4, in comparison with healthy controls,14 establishing the HLA region as a genetic contributor to RA susceptibility. A decade later, further characterization of the HLA locus identified multiple RA risk alleles within HLA-DRB1, and showed that the molecules they encoded shared a conserved amino acid sequence; this finding led to the ‘shared epitope’ hypothesis.15 HLA molecules that contain this 5-amino-acid sequence, which is encoded by shared epitope alleles and is arranged around the antigen-binding groove, are associated with the development of anti-citrullinated protein antibodies (ACPA) and, for the most part, with ACPA-positive RA. Although this feature is thought to influence the affinity of binding to citrullinated peptides, and to modulate T-cell responses, the precise biological implications of the shared epitope are not yet clear;1 as we discuss in this manuscript, new associations with ACPA-negative RA complicate the shared epitope theory.
Outside of the MHC region, candidate gene studies performed prior to 2007 had identified only a handful of RA susceptibility loci, including PTPN22 (encoding tyrosine-protein phosphatase non-receptor type 22),16 protein-arginine deiminase type 4 (PADI4)17 and cytotoxic T-lymphocyte protein 4 (CTLA-4).18 By 2007, genome-wide association studies (GWAS) had become possible due to several major preceding advances, including the completion of the Human Genome Project in 200119 and the initial release of the International HapMap project data in 2003.20 These initiatives enabled the design of single-nucleotide polymorphism (SNP) chips with good coverage of variations that occur across the entire genome. In the past 5 years, these technologies have escalated the rate of discovery of disease-associated variants, and around 60 risk loci for RA are now known in European and Asian populations.10,21,22 This advance has been aided by the attainment of large, well-characterized, and homogeneous (that is, ACPA-positive) collections of samples from patients.
By design, GWAS are powered to detect associations with variants that are common in the population (minor allele frequency >5%). Most of the variants identified to date in GWAS in RA, and in other complex diseases, have modest effect sizes, with odds ratios of 1.5 or less.22,23 These associations are potentially caused by causal variants that are in tight linkage disequilibrium with the observed variants. Given that their effect sizes are so modest, each of these alleles individually explains a small fraction of the genetic contribution to RA susceptibility. Currently, all RA genetic risk factors taken together only explain ~16% of the total susceptibility (heritable and environmental).22,24,25 Hundreds of common risk alleles are likely to exist but remain undiscovered to date owing to the limited power of current GWAS. A recent analysis by our group suggests that hundreds of uncharacterized SNP associations throughout the genome, taken together with known risk alleles, in aggregate explain ~36% of RA disease risk.25 SNP associations and known risk alleles therefore account for only about half of the estimated 65% of RA risk that is thought to be heritable. Sequencing experiments in the coming years have the potential to identify causal variants across the entire allele frequency range, including low frequency variants.
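The proportions quoted above can be checked with a few lines of arithmetic. The sketch below divides each explained fraction by the twin-based heritability estimate; the variable and function names are illustrative only and are not taken from the cited analyses.

```python
# Figures quoted in the text, expressed as fractions of total RA susceptibility.
HERITABLE = 0.65        # twin-based heritability estimate
KNOWN_LOCI = 0.16       # all validated RA genetic risk factors together
AGGREGATE_SNPS = 0.36   # known alleles plus uncharacterized SNP associations


def share_of_heritability(explained: float, heritable_fraction: float) -> float:
    """Fraction of the heritable component accounted for by `explained`."""
    return explained / heritable_fraction


print(f"validated loci: {share_of_heritability(KNOWN_LOCI, HERITABLE):.0%} of heritable risk")
print(f"aggregate SNPs: {share_of_heritability(AGGREGATE_SNPS, HERITABLE):.0%} of heritable risk")
```

The second ratio is slightly above one half, which is the basis for the statement that SNP associations and known risk alleles account for only about half of the heritable risk.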
Of prime importance for future genetic studies is stratification of samples by distinct phenotypic subgroups of RA. Although the clinical presentation at disease initiation is very similar between patients with ACPA-positive and ACPA-negative RA, disease course, possibly disease pathogenesis,26 and genetic susceptibility27,28 are different. The association of RA with the shared epitope is, as we have mentioned, different between the two serotype subsets.27 Now, it further seems that non-HLA SNPs associated with RA susceptibility are only partially shared between ACPA-positive and ACPA-negative patients with RA,28 confirming the hypothesis that ACPA-positive and ACPA-negative RA are two genetically different diseases.29 Although patients with ACPA-negative RA included in genetic studies satisfy the 1987 American College of Rheumatology classification criteria for RA,30 concerns remain about misclassification in this subgroup of patients.26,29 Nevertheless, even if ACPA-negative RA represents a heterogeneous disease group, the overall contribution of genetic factors to disease susceptibility in ACPA-negative RA seems to be as high as for ACPA-positive RA.31 Interestingly, we could show that the pattern of association of ACPA-positive susceptibility SNPs with ACPA-negative RA (in terms of effect size or presence or absence of an association) cannot be explained solely by contamination with erroneously characterized ACPA-positive samples, because the ratios of the effect sizes between ACPA-positive and ACPA-negative RA vary widely for different genetic markers.28 Nevertheless, ACPA-negative RA is likely to be subclassified in the future on the basis of further types of autoantibody. 
In 2011, antibodies against carbamylated proteins (anti-CarP antibodies) were shown to be present in around 20% of patients with ACPA-negative RA.32 Furthermore, anti-CarP antibodies were associated with more severe joint damage in this group.32 Twin studies have also established important differences between ACPA-positive and ACPA-negative disease.31 Although heritability estimates remain similar in both serological strata, the contribution of the HLA-DRB1 shared epitope alleles differs markedly, explaining 18% and 2.4% of RA heritability in ACPA-positive and ACPA-negative patients, respectively.31
One of the biggest challenges for the future will be to elucidate the biological mechanisms in which risk alleles operate. We are already beginning to understand which cells are central to RA pathogenesis. For example, CD4+ effector memory T cells specifically express many of the genes located within RA-associated loci.33 Also, certain pathways seem to be critical for disease pathogenesis; for example, multiple genes within RA loci are involved in signalling downstream of the CD40 molecule (also known as TNF receptor superfamily member 5).34
Although discoveries from GWAS in RA have not yet led to the direct identification of therapeutic targets, some existing therapies target genes and/or pathways that have been highlighted by such studies. For example, abatacept is a fusion protein made up of cytotoxic T-lymphocyte protein 4 (CTLA-4) and immunoglobulin. CTLA-4, together with other transmembrane receptors expressed on T cells (CD28 and inducible T-cell costimulator [ICOS]), has a crucial role in T-cell co-stimulation, and CTLA-436 (in addition to CD28)37 polymorphisms are associated with RA risk. Nevertheless, the translation of genetic findings into clinical applications is often more challenging than originally postulated.38,39 As we discuss in detail below, an association study published in 201235 examining the entire MHC region could not confirm the frequently-reported association of TNF polymorphisms with disease susceptibility, although anti-TNF treatment has substantially improved quality of life for patients with RA.
The discovery of a strong association between HLA-DRB1 and RA was initially made using antibodies to specific MHC class II proteins and thus serotyping individuals according to the surface expression of antigenic molecules on their circulating B cells. Stastny13,14 found that substantially more patients with RA were positive for the B-cell alloantigen DRw4 (later renamed DR4) than were healthy individuals. Subsequently, investigators used cloning and sequencing experiments to characterize different alleles at that gene locus (now called HLA-DRB1). According to the current nomenclature,39 HLA-DRB1*04 denominates the allele group corresponding roughly to the archaic serotypic classification DRw4, while the next appendant digit set defines a specific allele; for example, HLA-DRB1*0401. A decade after Stastny’s discoveries, Gregersen et al.15 showed that molecules encoded by RA-associated HLA-DRB1 alleles share a common amino acid sequence in the third hyper-variable region of the DRβ1 chain—the shared epitope. A T-cell epitope is, by definition, a three-dimensional structure recognized by a paratope (the T-cell receptor [TCR]) and constituted in part by the MHC molecule and in part by the antigenic peptide bound in the groove. The term ‘shared epitope’, therefore, suggests the existence of an autoantigenic peptide that has not unequivocally been identified after two decades of research. As a result, the ‘arthritogenic peptide hypothesis’15,40,41 remains controversial42 and, although shared epitope alleles are established genetic risk factors in RA, the immunological implications of their expression remain uncertain.
Between 1998 and 2003, five genome-wide linkage scans in family-based cohorts of people with or without RA demonstrated strong and significant linkage of the disease with the MHC region, but not consistently with any other region in the genome.43–47 As we have mentioned, linkage with the MHC applied only to ACPA-positive, not ACPA-negative, RA.27 Initially, MHC associations with ACPA-positive RA were attributed to HLA genes; however, shared epitope alleles at the HLA-DRB1 locus do not fully explain the association of the MHC region with RA—several studies in which the HLA-DRB1 effect was controlled for have suggested additional independent associations within the MHC.48–50
Until 2012, MHC alleles were thought to be exclusively associated with ACPA-positive RA. Now, several reports from well-powered studies have identified and confirmed the association of the shared epitope with ACPA-negative RA.10,28,51 The role of this association and its possible restriction to specific serotypes or subtypes of ACPA-negative RA remain to be determined.
Despite advances in high-throughput SNP genotyping technologies, the application of probe-based genotyping to query HLA genes within the MHC has been limited, owing to the highly polymorphic nature of these genes. Historically, investigation of HLA genes required direct PCR-based genotyping. Twenty-first century advances in statistical genetics have now facilitated imputation of HLA alleles based on SNP data.52,53 Imputation employs a large reference data set from individuals genotyped for classical HLA alleles and HLA SNPs to determine the most likely HLA alleles in individuals for whom SNP data over the HLA region, but not direct HLA genotyping, are available.
In 2012, we applied this imputation approach to SNP data from the 2010 GWAS meta-analysis by Stahl et al.22 (Figure 1), and demonstrated that the risk of RA associated with the HLA-DRB1 gene correlates most strongly with the amino acid residue in position 11, located at the bottom of the DRβ1 antigen-binding groove.38 Amino acids 71 and 74, whose sidechains constitute the surface of the antigen-binding groove, also correlated independently with susceptibility to RA. In addition, we found independent RA risk alleles in HLA-B and HLA-DPB1; in both cases, signals from these regions were best explained by a variation in a single amino acid site at the bottom of their respective antigen-binding grooves. No further signal of an association with RA was found within the MHC when we controlled for the independent effects mentioned here. That is, these genetic variants in HLA-B, HLA-DRB1 and HLA-DPB1—affecting a total of 5 amino acid positions—almost completely explained the variance in RA risk caused by the MHC region. Although other SNP associations are indeed possible within the MHC and other HLA genes, such variants are likely to have comparatively weak effects in conferring susceptibility to RA. Importantly, no association signal with RA was identified within the TNF region, indicating that frequently reported associations between TNF promoter polymorphisms and RA susceptibility were probably confounded by nearby HLA-B, HLA-DRB1 and HLA-DPB1 gene variants.
Interestingly, when haplotypes of alleles for the 3 RA-associated amino acid positions within the locus were studied and coded using the classical 4-digit HLA-DRB1 allele nomenclature, the hierarchy of HLA alleles associated with risk of, and protection from, RA was consistent with previous classification systems or studies:51,54–60 for example, shared epitope alleles were associated with greatest susceptibility to RA, whereas HLA-DRB1*130154 was part of the most protective haplotype (Table 1).38 A large European meta-analysis in 2010 confirmed HLA-DRB1*1301 as a protective allele for RA.54
The fine-mapping of MHC polymorphisms that we describe above confirms that HLA-DRB1 modulates susceptibility to RA, and defines a few amino acids, including positions 71 and 74 originally highlighted in the shared epitope hypothesis, as determining the effect.38 Further, the data extend associations with RA to HLA-B and HLA-DPB1. Most interestingly in biological terms, sidechains of amino acids in the positions that alter susceptibility to RA all point towards the peptide-binding groove, revitalizing the ‘arthritogenic peptide hypothesis’. Lack of identification of an ‘RA antigen’ to date might, therefore, be related more to technical challenges than to its non-existence. Indeed, the well-established effect-size hierarchy of classical shared epitope alleles (Table 1 and reviewed elsewhere)51,55 might correlate with the HLA-binding affinity of an antigenic peptide61 and, ultimately, with its immunogenicity.62,63 The next step in characterizing this potential pathogenic mechanism consists of identifying T-cell autoantigens in ACPA-positive RA. New structural information regarding the peptide-binding groove38 and the importance of citrullination with regard to binding affinity will help in the selection of peptides from putative target proteins for reverse engineering experiments.64
Since 2000, high-throughput SNP genotyping has successfully facilitated case–control association studies in RA to test putative links with genetic variants outside the MHC.24,65–67 In 2003, Suzuki et al.68 identified a SNP in the third intron of PADI4 that contributed to the risk of RA in Japanese populations (Figure 2). A year later, Begovich et al.16 identified a non-synonymous SNP in PTPN22 as a risk variant in white individuals in the USA; this variant remains the most strongly RA-associated SNP identified to date with an odds ratio of 1.8 for ACPA-positive RA. Subsequent case–control studies investigating other candidate gene associations have suggested only a handful of additional susceptibility loci (among them CTLA-4,18 TRAF169 and FCRL370); most loci identified in candidate-gene studies were not reproducible in independent studies.18 In 2007, investigators in RA genetics published three separate GWAS in RA,71–73 including one within the multi-disease Wellcome Trust Case–Control Consortium study.73 Several of the many new RA susceptibility SNPs (reviewed elsewhere)37 identified in these GWAS71–73 and subsequent studies are described in this section. GWAS have now been used to identify risk factors for RA in populations of European and Asian descent.21,22,68–71,74–77
Imputation techniques, which have become standard tools to determine the genotype of ungenotyped SNPs,78 facilitate powerful meta-analyses of GWAS data originating from different genotyping platforms. Two large RA GWAS meta-analyses have independently examined different populations: Stahl et al.22 analysed data from people of European descent in 2010 (initially 5,539 patients with ACPA-positive RA and 20,169 controls, replicated using data from a further 6,768 patients and 8,806 controls), whereas Okada et al.21 used data from Japanese individuals (initially 4,074 with RA and 16,891 controls, then a further 5,277 patients and 21,684 controls) in 2012—these analyses identified 7 and 9 novel RA risk alleles, respectively.
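For readers unfamiliar with how per-study effect sizes are pooled, the following is a minimal sketch of a fixed-effect, inverse-variance meta-analysis of log odds ratios. It is one common approach and is not necessarily the method used by Stahl et al. or Okada et al.; the allele counts and function names are invented for illustration.

```python
import math


def log_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (log OR, standard error) from allele counts in a 2x2 table."""
    log_or = math.log((case_alt * ctrl_ref) / (case_ref * ctrl_alt))
    # Woolf's approximation for the standard error of the log odds ratio.
    se = math.sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    return log_or, se


def fixed_effect_meta(estimates):
    """Inverse-variance weighted pooling of (log OR, SE) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical risk-allele counts for one SNP in two independent cohorts.
study_a = log_odds_ratio(300, 700, 250, 750)
study_b = log_odds_ratio(620, 1380, 510, 1490)
pooled, pooled_se = fixed_effect_meta([study_a, study_b])
print(f"pooled OR = {math.exp(pooled):.2f}, SE(log OR) = {pooled_se:.3f}")
```

Larger studies have smaller standard errors and therefore dominate the pooled estimate, which is why combining cohorts increases power to detect modest effects.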
Another approach to investigating the genetic basis of susceptibility to RA is to examine shared genetic bases between it and other autoimmune diseases, or across different ethnicities; genes, including ubiquitin-conjugating enzyme E2 L3 (UBE2L3)79, DEAD (Asp-Glu-Ala-Asp) box helicase 6 (DDX6; encoding probable ATP-dependent RNA helicase DDX6)80, and IKAROS family zinc finger 3 (Aiolos) (IKZF3, encoding zinc finger protein Aiolos),81 have been thus implicated. Okada et al.21 conducted a multi-ancestry comparative analysis of 46 risk loci between the Japanese data we have mentioned and data from individuals of European descent (5,539 patients with RA and 20,169 controls). Six of these sites were monomorphic in Japanese people (that is, all Japanese individuals have the same genotype at that locus), but all were polymorphic in individuals of European descent. Significant associations with RA (false discovery rate <0.05, P <0.0030) were found at 22 loci in Japanese people and at 36 loci in those of European descent; 14 of these signals were shared. Indeed, a comparison of all tested SNPs across the two populations showed a positive correlation of odds ratios for a large proportion of SNPs between cohorts of individuals of European descent and Japanese cohorts, indicating shared genetic susceptibility alleles.21 Ethnogenetic heterogeneity in RA has been reviewed previously in this journal.82
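A false discovery rate threshold translates into a data-dependent P-value cutoff. The sketch below uses the Benjamini-Hochberg step-up procedure to show how this works; the text does not state which procedure the cited analysis applied, so this choice is an assumption, and the P values are made up for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected at FDR `alpha`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Find the largest rank k whose sorted P value satisfies p_(k) <= alpha * k / m.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    # Every hypothesis ranked at or below the cutoff is rejected.
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-locus P values.
example = [0.0004, 0.0021, 0.019, 0.041, 0.27, 0.63]
print(benjamini_hochberg(example))
```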
Of note, the focus of meta-analyses to date has been almost exclusively on ACPA-positive RA. Genetic architecture differs between ACPA-positive and ACPA-negative RA,48 and RA susceptibility loci are only partially shared between the two serotypes.10,28
Immunochip, a custom SNP array, facilitates dense genotyping and fine-mapping at 186 genetic loci, including confirmed autoimmune loci and other alleles with nominal GWAS-based evidence of an association with an autoimmune disease. Collaborating with investigators worldwide, our group genotyped 11,475 patients of European descent with RA and 15,870 controls at 130,000 markers using Immunochip, identifying 14 novel RA risk loci.10 Furthermore, we refined to single genes the association signals of 19 previously identified loci. Secondary independent effects (defined as a remaining association at P <5 × 10⁻⁴ after conditioning on the most associated SNP of the region) were identified at 6 loci, and non-synonymous exonic SNPs or SNPs located within an essential splice site suggested putative causality at 7 loci. Interestingly, PADI4 polymorphisms, unequivocally associated with RA in Asian populations in previous studies, were significantly associated (genome-wide; P <5 × 10⁻⁸) with RA in patients of European descent in this study.10 Although a PADI4 variant was historically the first RA-associated polymorphism to be identified outside the HLA, its association has been controversial in populations of European descent.
Few genetic markers of RA susceptibility have been experimentally linked to functions, and many different molecular mechanisms are implicated. Indeed, the first molecular steps by which a SNP influences phenotype might involve alterations in transcriptional activity, epigenetic modifications, microRNA regulation, splicing, mRNA or protein stability, translation, protein activity or post-translational modifications. In this section, we review RA susceptibility loci for which roles have been investigated experimentally after their discovery in GWAS.
The most studied polymorphism in RA to date is the PTPN22 non-synonymous Arg620Trp SNP rs2476601. Tyrosine-protein phosphatase non-receptor type 22 (known as PTPN22 and encoded by PTPN22) down-regulates TCR signalling by dephosphorylating Src family kinases, such as Lck or Fyn (Figure 3). Although evidence indicates that the PTPN22 risk allele affects the enzymatic activity of the encoded phosphatase,83 the influence of the Arg620Trp variant on the immune response has been controversial: in 2005, a gain-of-function consequence was reported,84 but further functional studies have been inconsistent. In 2011, Zhang et al.85 showed that rs2476601 is a loss-of-function allele that mediates its effect by destabilizing PTPN22 (or its mouse homolog). The variant phosphatase is targeted for degradation both by calpain proteases and through ubiquitin-mediated proteasomal degradation. Reduced levels of the protein correlate with increased number, activation and thymic positive selection of T cells, and with dendritic-cell and B-cell activation. In 2012, the function of PTPN22 was linked to the thymic development of regulatory T cells,86,87 and alternative molecular mechanisms as heterogeneous as imbalance in the expression of PTPN22 splice variants88 and differential allelic expression89 have been suggested.
PADI4 mediates post-translational conversion of arginine residues to citrulline. Originally,68 an RA susceptibility haplotype was shown to increase the stability of PADI4 mRNA transcripts and was associated with ACPA positivity in patients with RA. Citrullinated peptides bind with higher affinity to HLA-DRβ1 shared epitope molecules,61,90 are naturally processed,91 and are immunogenic.62 Thus, it seems that increased translation of variant PADI4 mRNA boosts production of citrullinated peptides, which act as autoantigens and elicit profound adaptive immune responses. Whereas many other risk loci seem to be connected to multiple autoimmune diseases, the PADI4 locus is specific to RA.
CCR6 encodes a chemokine receptor expressed by CD4+ type 17 T helper (TH17) cells. A polymorphism in CCR6 correlated with expression level of CCR6 mRNA and with the presence of IL-17 in the sera of patients with RA, highlighting the importance of the TH17 pathway in RA pathogenesis.74
Other than PTPN22, PADI4 and CCR6, few other risk loci have been investigated functionally in RA. Nevertheless, knowledge has been gained from studies in healthy individuals or in the context of other auto-immune diseases.
Autoimmunity-associated SNPs located in non-coding genomic regions in the vicinity of IL2RA (encoding IL-2 receptor subunit α) have been shown to correlate with IL2RA mRNA and surface protein expression levels in monocytes, CD4+ naive T cells and memory T cells, but not in other cell types tested.94 According to the quantal theory of immunity, T-cell responses depend on a critical number of stimuli mediated by TCR and IL-2R,95 which could explain different activation thresholds in the T-cell compartment of individuals polymorphic at the IL2RA locus.
TNFAIP3 encodes TNF-induced protein 3 (TNFAIP3), a ubiquitin-modifying enzyme that is a key regulator of nuclear factor κB activity (Figure 3). Three SNPs within the TNFAIP3 locus are independently associated with RA susceptibility.96 In patients with systemic lupus erythematosus (SLE), a polymorphism located in a highly conserved region of TNFAIP3 has been shown to reduce mRNA and protein expression of the gene, seemingly by reducing the avidity with which a nuclear protein complex of NF κB subunits binds to it.97 Indicating the importance of cell-type specific expression, mice with conditional knockout of Tnfaip3 expression in dendritic cells develop an SLE-like phenotype,98 whereas mice lacking Tnfaip3 in myeloid cells develop an RA-like phenotype.99
A biologically pragmatic way to define a pathway is to consider it as a chronological succession of molecular interactions occurring between cells or within a cell, starting with a signal (input) and ultimately resulting in a response (output). This linear definition of pathways conveniently allows experimental testing, as responses to signals can be measured. Several bioinformatic techniques have been proposed to analyse post-GWAS data as a whole and to identify RA-specific mechanisms of disease progression.98 In this section, we describe how bioinformatic techniques, such as pathways and networks analyses and integrative systematic approaches, are applied to functional analysis of putative RA risk alleles. Importantly, bioinformatic definitions of pathways are often non-linear and do not lend themselves to experimental validation; an important challenge for the future will be how to biologically validate integrated bioinformatic analysis approaches.
Representing the TCR intracellular signalling pathway as a schematic linear series of molecular events provides an example of how RA susceptibility loci can be matched with potential roles in a biological model of pathology, and yet also illustrates how complex such efforts are. Indeed, many of the known loci associated with RA are involved in the TCR signalling pathway, but intricate interactions link these components with other signalling pathways in even a simplified depiction (Figure 3). Thus, although plausible functional explanations for how genetic variants confer RA risk can be generated, it remains to be experimentally demonstrated that RA risk alleles in aggregate alter, for example, the efficiency of TCR engagement (input), subsequent signal transduction events, and consequent production of IL-2 (output).
The generic term ‘pathway analysis’ is loosely defined in the literature and has been used to refer broadly to systematic analyses examining sets of genes for common functional properties. In some instances, this broad definition, instead of a clear linear one, can be misleading; for example, the ‘cellular compartment’ ontology (or pathway) in the Gene Ontology (GO) classification does not describe a biological pathway, rather it describes the specific cellular locations where a protein localizes.99
Several studies have analysed GWAS data for enrichment in genes belonging to specific biological pathways, as defined by pathway classification tools such as GO, Kyoto Encyclopedia of Genes and Genomes (KEGG), Reactome, PantherDB and BioCarta.100–103 Biological notions of pathways, as well as detail and quality of annotation, differ between pathway databases.104–106 As a result, outcomes of database-driven pathway analyses depend on the pathway ontologies used.107 Nevertheless, such analyses have confirmed broad statements about the aetiology of RA by showing enrichment of RA susceptibility loci in pathways related to immune functions,100,101 T-cell activation and/or differentiation,101,102 JAK-STAT pathway signalling,102 and TNF signalling.101 Novel approaches are required to gain a more differentiated picture of causal molecular events.
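Database-driven enrichment analyses of this kind often reduce to an over-representation test. The following minimal sketch computes a one-sided hypergeometric P value for the overlap between a pathway gene set and a list of genes at risk loci. The cited studies may have used different statistics, and all counts and names below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, pathway_genes, risk_genes, overlap):
    """P(X >= overlap) when `risk_genes` are drawn at random, without
    replacement, from `total_genes` of which `pathway_genes` are annotated."""
    upper = min(pathway_genes, risk_genes)
    denom = comb(total_genes, risk_genes)
    tail = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, risk_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


# Hypothetical example: 20,000 annotated genes, 150 in a T-cell activation
# pathway, 60 genes at risk loci, 6 of which fall in the pathway.
p = hypergeom_enrichment_p(20_000, 150, 60, 6)
print(f"enrichment P = {p:.2e}")
```

Because the result depends directly on which genes a database assigns to a pathway, the same risk-gene list can yield different P values under different ontologies, as noted above.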
Another way to investigate the function of RA susceptibility loci in biological pathways is through bioinformatic identification of proteins that their products physically interact with. Such approaches commonly include protein–protein interaction network analyses. Protein–protein interaction databases such as the Human Protein Reference Database108 or text mining techniques such as GRAIL35 can be used to construct networks. Other emerging techniques integrate pathway and network-oriented analysis.109,110 Several proteins encoded by RA susceptibility genes are consequently thought to interact or bind with each other (Figure 4).108,109
Cell-specific expression analysis can, as a proxy for gene function, be particularly useful in identifying pathways, cell types, and regulatory programmes relevant to RA. Our group recently mapped RA susceptibility markers to specific cell types.33 As a comprehensive and unbiased catalogue of gene functions is not available, we used a compendium of gene expression data as an objective proxy for tissue-specific gene function. We observed that CD4+ effector memory T cells were highly enriched for the specific expression of genes within RA risk loci.
A 2012 systematic review of putative pathogenic mechanisms in SLE illustrates how experimental data from various sources can be integrated into a single disease model: SLE pathology is hypothesized to result from type I interferon (IFN) misregulation.111 Data from single-gene disorders, GWAS, gene expression micro-arrays, and serologic studies tend to converge towards a linear disease model: immune complexes bind to Toll-like receptors (TLRs) on plasmacytoid dendritic cells; type I IFN production by dendritic cells is triggered; IFN binds to its receptor on target cells; JAK-STAT signalling is activated and the expression of hundreds of genes—the ‘IFN signature’—is altered, leading to disease manifestation. Type I IFN levels and its signature can both be directly measured in peripheral blood. Of 47 loci associated with SLE susceptibility, 27 (57%) are involved in type I IFN production or signalling. Evidence supporting the direct involvement of type I IFN in SLE pathogenesis has paved the way for new therapeutic approaches targeting type I IFN.111 No such evidence of a clear disease pathway is available yet for RA.
Results from twin studies support a substantial role for environmental triggers in determining RA risk, as evidenced by high discordance rates between monozygotic twins.6 However, the identities of non-shared environmental exposures remain largely elusive. One of the ways in which individuals may respond to an environmental exposure is through changes in their epigenome. The best-understood epigenetic phenomena include post-translational histone modifications and DNA methylation, both of which have a profound influence on gene expression.112
Methylation of DNA cytosine residues at the carbon 5 position, generating 5-methylcytosine, occurs primarily in the context of cytosine-guanine dinucleotides (CpGs). An unexpected feature of the human genome is the relative paucity of CpGs due to the frequent mutation of 5-methylcytosine to thymine.113 Regions of the genome with high CpG content, termed CpG islands (CGI),114 are often hypomethylated and are associated with the promoter regions of actively transcribed genes.115 Methylation in regions up to 2 kb away from CGIs (termed CpG island shores) can also strongly influence gene expression.116
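The notion of CpG depletion and CpG-rich islands can be made concrete with a short calculation. The sketch below computes GC content and the observed/expected CpG ratio for a sequence, and applies one commonly quoted set of thresholds (length of at least 200 bp, GC content above 50%, observed/expected ratio above 0.6). These thresholds and the function names are assumptions chosen for illustration; they are not necessarily the criteria used in the studies cited above.

```python
def cpg_stats(seq: str) -> dict:
    """Return length, GC fraction and observed/expected CpG ratio of a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    # "CG" cannot overlap itself, so a non-overlapping count is exact.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count if C and G were placed independently: (C * G) / N.
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "cpg_obs_exp": obs_exp}


def looks_like_cpg_island(seq: str,
                          min_len: int = 200,       # assumed threshold
                          min_gc: float = 0.5,      # assumed threshold
                          min_obs_exp: float = 0.6  # assumed threshold
                          ) -> bool:
    """Apply illustrative CpG island thresholds to a single sequence window."""
    s = cpg_stats(seq)
    return (s["length"] >= min_len
            and s["gc_fraction"] > min_gc
            and s["cpg_obs_exp"] > min_obs_exp)


if __name__ == "__main__":
    cpg_rich = "CG" * 100    # toy sequence, every dinucleotide step is CpG or GpC
    cpg_poor = "CATG" * 50   # toy sequence with no CpG at all
    print(cpg_stats(cpg_rich), looks_like_cpg_island(cpg_rich))
    print(cpg_stats(cpg_poor), looks_like_cpg_island(cpg_poor))
```

An observed/expected ratio well below 1 across bulk genomic sequence is what the paucity of CpGs described above would look like in such a calculation, whereas windows inside CGIs score much higher.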
N-terminal tails of histone proteins are subject to a wide range of different modifications including acetylation, methylation, phosphorylation and ubiquitylation. More than 60 different histone modification sites have been described.117 A mechanistic connection clearly exists between histone modifications and DNA methylation.118 For example, the presence of DNA methylation reportedly promotes deacetylation of histone 4 and dimethylation of histone 3 at lysine 9, as well as inhibiting methylation of histone 3 at lysine 4, all of which are important changes that promote gene repression.119
The epigenome has sufficient plasticity to react to the internal and external environment; a range of environmental exposures (such as xenobiotic chemicals and behavioural cues)120 can alter the epigenome. For example, DNA methylation levels at the F2RL3 locus (the gene for proteinase-activated receptor 4) are significantly lower in individuals exposed to cigarette smoke.121 In a related study, F2RL3 methylation status was reported to mediate smoking-associated mortality in patients with stable coronary heart disease.122 Induced epigenetic changes can be inherited during cell division, thereby maintaining the acquired phenotype in daughter cells.120 In addition, stochastic epigenetic instability may accumulate over time in multiple cell types in the absence of obvious environmental stimuli. For example, methylation patterns are more poorly conserved than DNA sequences during mitosis. The error rate for the maintenance of methylation is approximately 10⁻³ per base pair, whereas the error rate for DNA sequence is approximately 10⁻⁶ per base pair.123 Phenotypic differences in genetically identical siblings could conceivably be determined more by this stochastic variation in the epigenome than by epigenetic differences due to non-shared environmental effects.124
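To give a feel for the gap between these two error rates, the arithmetic can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that each rate applies independently per site and per cell division and that a miscopied site never reverts; the constant and function names are hypothetical.

```python
# Approximate error rates quoted in the text (per base pair).
# Assumption for this sketch: each rate applies per site, per cell division.
METHYLATION_ERROR_RATE = 1e-3
SEQUENCE_ERROR_RATE = 1e-6


def fold_difference() -> float:
    """How many times more error-prone methylation maintenance is than sequence copying."""
    return METHYLATION_ERROR_RATE / SEQUENCE_ERROR_RATE


def fraction_never_miscopied(error_rate: float, divisions: int) -> float:
    """Probability that a single site is copied faithfully at every one of `divisions` divisions."""
    return (1.0 - error_rate) ** divisions


if __name__ == "__main__":
    print(f"fold difference: {fold_difference():.0f}")
    for n in (1, 10, 100):
        meth = fraction_never_miscopied(METHYLATION_ERROR_RATE, n)
        dna = fraction_never_miscopied(SEQUENCE_ERROR_RATE, n)
        print(f"{n:>3} divisions: methylation {meth:.4f}, sequence {dna:.6f}")
```

Under these simplifying assumptions, roughly 10% of methylation sites would have been miscopied at least once after 100 divisions, compared with about 0.01% of sequence positions, which illustrates how stochastic epigenetic drift could accumulate without any environmental trigger.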
The pluripotency of cells decreases during cellular differentiation as gene expression programmes become more restricted.125 This process, which results in the acquisition of cell-type specific features, is controlled epigenetically and is characterized by a specific set of histone modifications and DNA methylation patterns. For example, histone acetylation at the IFN-γ promoter occurs during differentiation of naive T helper (TH0) cells into cells with the TH1 phenotype.126 This modification reduces the affinity between the histone and DNA, increasing access for transcription factors. When investigating epigenetic alterations in the context of disease pathogenesis it is therefore essential to focus on a pure or enriched cell type that is relevant to the disease under investigation. This requirement is particularly challenging in RA, wherein the most relevant cell subsets are not immediately obvious.
Data for epigenetic phenomena in RA are currently limited, especially in terms of study scale and power.127 However, some interesting observations from studies of DNA methylation patterns are beginning to emerge. For example, analysis of DNA methylation in T cells has revealed global hypomethylation in cells derived from patients with RA compared with those from healthy controls.128 DNA hypomethylation has also been observed in RA fibroblast-like synoviocytes (FLS), as compared with normal FLS, derived from small joint post-trauma biopsy samples.129 In 2013, Nakano et al.130 published a genome-wide evaluation of FLS derived from patients with RA or osteoarthritis (OA), reporting that as many as 1,859 loci, relevant to cell movement, adhesion and trafficking, were differentially methylated in RA (732 hypomethylated and 1,127 hypermethylated). This study was performed using the latest Illumina HumanMethylation450 BeadChip, which provides comprehensive gene region (for example, promoter, exon 1, gene body, 5′ and 3′ untranslated regions) coverage of over 96% of NCBI Reference Sequence genes.
In a gene-targeted approach, Nile et al.131 investigated DNA methylation patterns in the promoter region of IL6 in peripheral blood mononuclear cells (PBMC) derived from patients with RA (n = 8) and healthy controls (n = 5). This study identified a single CpG motif 1,099 base pairs upstream of the IL6 transcription start site that was less methylated in patients with RA than in controls. Electrophoretic mobility-shift assay (EMSA) experiments supported this finding in that reduced methylation at the –1,099 locus was reported to correlate with increased binding of nuclear proteins to the genomic DNA.131 However, experiments in isolated B cells will be needed to corroborate these interesting data.
Representing a new class of modulators of gene expression, miRNAs base-pair with the 3′-untranslated region of target mRNAs, leading to mRNA degradation or inhibition of translation.132 Increased expression of miRNA-115133 and miRNA-203134 has been observed in RA FLS (compared with OA FLS) and this increase correlates with elevated levels of matrix metalloproteinase-1 (MMP-1) and IL-6. It is important to note that expression of miRNA-115133 and of miRNA-203134 is inversely correlated with levels of DNA methylation.
Correct study design will be critical if our understanding of the role of epigenetic alterations in RA is to expand. Retrospective case–control studies are possible and may also include GWAS information, but care must be taken to ensure that observed differences reflect true epigenetic differences and not variance in, for example, cell-type composition. Retrospective studies are limited in that they cannot determine whether an observed epigenetic mark is causal or consequential (secondary, for example, to therapeutic intervention or the inflammatory response). Investigations including disease discordant monozygotic twins are useful as they control for differences due to germline sequence variation and gender; however, unless samples are collected longitudinally, which would be very difficult in a late onset disease such as RA, cause will not be distinguished from consequence. Longitudinal cohorts of people initially free from disease (for example, the 1958 birth cohort in the UK)135 would avoid confounding due to differences in recruitment of cases and controls, and avoid bias due to case-control differences in the measurement of non-genetic risk factors. Longitudinal cohorts would be essential for establishing the temporal origin of deleterious events and distinguishing causal from consequential effects.136 Important considerations in designing epigenetic studies include sample throughput methods and genome coverage and resolution. For studies of DNA methylation, array-based approaches involving bisulphite conversion are currently the most powerful, but a shift towards whole-genome bisulphite sequencing is likely in the future.137
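The principle behind bisulphite-based methylation assays can be illustrated with a minimal in-silico sketch. Bisulphite treatment converts unmethylated cytosines to uracil (read as thymine after amplification) while methylated cytosines are protected, so the methylation level at a site can be estimated from the proportion of reads that still show a cytosine. The function names and toy inputs below are hypothetical, and the sketch ignores incomplete conversion, strand effects and sequencing error.

```python
from typing import AbstractSet


def bisulphite_convert(seq: str, methylated_positions: AbstractSet[int]) -> str:
    """Simulate bisulphite conversion of one DNA strand.

    Unmethylated C is read as T; C at a 0-based index listed in
    `methylated_positions` is protected and stays C.
    """
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def methylation_fraction(c_reads: int, t_reads: int) -> float:
    """Estimate methylation at one cytosine from counts of C and T reads."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return c_reads / total


if __name__ == "__main__":
    # Toy strand with a methylated cytosine at index 1 only.
    print(bisulphite_convert("ACGTCCGA", {1}))
    # Toy read counts at a single CpG site.
    print(methylation_fraction(c_reads=30, t_reads=10))
```

Array-based and sequencing-based platforms differ mainly in how many cytosines are interrogated and at what depth; the per-site estimate rests on the same C-versus-T comparison.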
Epigenetic alterations might prove useful in the clinical setting as markers of disease progression or response to treatment. Furthermore, epigenetic alterations provide new and important targets for the development of therapeutics in RA. Histone deacetylase inhibitors (HDACIs) are currently the best-studied epigenetic therapeutic agents.138 The anti-inflammatory properties of HDACIs include reductions in the levels of cytokines such as TNF, IL-6 and IFN-γ.139,140 HDACIs could represent, in the future, a suitable therapeutic option for the treatment of autoimmune diseases such as RA as they are well tolerated at low doses and orally active.141
Future challenges in understanding and leveraging RA genetics and epigenetics include further identification of causal genetic variants and their functional characterization, investigation of the role of epigenetic modifications in RA pathogenesis, and translation of fundamental discoveries into clinical practice. Accurate risk prediction in susceptible individuals142 will allow preventive intervention; in patients, individual predictions of disease outcome143 and treatment response144 will pave the way to personalized medicine and allow more efficient patient care. A better understanding of RA molecular pathogenesis will enable the development of new intervention strategies.
Those future tasks are likely to be achieved by the use of new technologies and innovative research strategies. Next-generation sequencing will facilitate whole-exome and whole-genome investigations, in particular studies of the role of rare (minor allele frequency <0.5%) genetic variants in large cohorts. Rare variants might explain a certain proportion of the missing heritability of RA, and growing evidence indicates that such alleles are functionally important, penetrant, and have larger effect sizes than common variants.145,146 Deep re-sequencing, which identified new and independent effects in other autoimmune diseases,147,148 could be applied to RA.
New technologies will also drive epigenetic studies.137 Epigenetic modifications are potentially influenced by genetic factors as well as by environmental signals including those, such as cigarette smoke, that are known to influence RA risk.
Despite the large number of RA susceptibility loci identified in recent years, genetic risk prediction of RA cannot be performed with sufficient accuracy to enter clinical practice.142,149 Nevertheless, as a result of GWAS and related studies, new pathogenic pathways have been revealed, and mechanisms of some existing drugs are becoming clearer. Genetic sequence variants are unlikely to explain all of the variation in gene function that underpins RA. Correct gene function also depends on appropriate epigenetic programming, which differs between cell and tissue types, and between different stages of cellular development.
Little is currently known about the extent of epigenetic burden in RA; however, epigenetic data in this disease are beginning to accumulate. It will be important for future epigenetic studies in RA to focus on the correct cell types and, in targeted approaches, the correct biological pathways.
Genetic testing has already entered clinical practice in oncology and predicting drug response is currently part of everyday practice in some oncologic subspecialities. Although the genetic architecture of disease susceptibility, severity and treatment response differs significantly between cancers and autoimmune diseases, genetic testing is likely to enter clinical practice in rheumatology in the next decade.
The PubMed database was searched using the following terms: “genetics AND (rheumatoid OR arthritis)”, “epigenetics AND (rheumatoid OR arthritis)” for full papers and abstracts published online and/or in print in English up to June 2012. References to be included were selected by the authors according to their opinion of their relevance to the scope of this Review, and further papers were identified from the reference lists of relevant publications. Some reports published after June 2012 and identified during revisions to this manuscript have also been included. Pathways presented in Figure 3 were curated manually from the literature; only well established interactions were considered.
S. Viatte’s research activities are supported by a grant from the Swiss Foundation for Medical-Biological Scholarships (SSMBS), managed by the Swiss National Science Foundation and financed by a donation from Novartis (PASMP3 134380). The work of S. Raychaudhuri is supported by grants from the National Institutes of Health (5K08AR055688 and 1R01AR062886) and an Arthritis Foundation Innovator Award. This manuscript was also funded by a core programme grant from Arthritis Research UK (17552).
The authors declare no competing interests.
Author contributions
All authors contributed equally to researching data for the article, writing the article, discussions of the content, and review and/or editing of the article before submission.