We undertook a large-scale, multi-tiered association study of RA using a panel of putative functional SNPs that has been successfully applied to case-control studies of other disease phenotypes. The initial step of this study, individual genotyping of 87 prioritized SNPs to evaluate DNA quality prior to constructing DNA pools for our scan, led to the identification of the PTPN22 R620W SNP. This SNP has since been widely replicated and associated with multiple autoimmune diseases.
The present study focuses on variants in the chr 9q33.2 region that were also convincingly correlated with RA status. In particular, three groups of SNPs, represented by rs2239657 (Group 1), rs7021049 (Group 2) and rs7021880, were highly significant and showed an effect localized to a 70 kb region extending from rs10985070, in intron 3 of PHF19, across TRAF1 to rs2900180 in the TRAF1-C5 intergenic region, but excluding the C5 coding region (LD Block 1). Examination of the CEU HapMap data identified 16 additional SNPs that were highly correlated (r² > 0.95) with either the Group 1 or Group 2 SNPs genotyped in this study, and all 16 fall within this 70 kb region (no such SNPs were found for rs7021880). Across sample sets, the evidence for association at these sites was stronger, maintaining statistical significance after correction for multiple testing, and more consistent than at sites in neighboring regions. Additional analyses further buttressed the statistical support for these conclusions: (i) a haplotype sliding window analysis of all SNPs genotyped in the chr 9q33.2 region demonstrated strong statistical evidence that the TRAF1 region harbors RA risk variants (Pcomb = 4.15E-08) and (ii) haplotype analysis of SNPs within the 70 kb LD Block 1 identified a common protective haplotype (Pcomb = 3.08E-08) and a less frequent risk haplotype (Pcomb = 8.00E-09). The three representative SNPs (rs2239657, rs7021049 and rs7021880) were strongly associated with RF-positive disease and trended towards association in RF-negative disease, although the small number of RF-negative patients in our study precludes drawing statistically meaningful conclusions about the role of these SNPs in this patient population.
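For readers less familiar with the LD measure behind the r² > 0.95 threshold, a minimal sketch follows. It assumes that haplotype and allele frequencies are already available (for example, from phased HapMap-style data); the function names `ld_r2` and `is_proxy` are hypothetical and are not taken from the analysis pipeline used in this study.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation (r^2) between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A (SNP 1) and allele B (SNP 2)
    """
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / denom


def is_proxy(p_ab: float, p_a: float, p_b: float, threshold: float = 0.95) -> bool:
    """True if the SNP pair exceeds the r^2 threshold (0.95, as in the text above)."""
    return ld_r2(p_ab, p_a, p_b) > threshold
```

Two SNPs whose alleles always travel on the same haplotype give r² = 1, whereas statistically independent SNPs give r² = 0; only pairs above the threshold would be treated as near-interchangeable proxies.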
To tease apart association signals from LD patterns, we used logistic regression. The pairwise analyses of the combined datasets suggest there may be two independent statistical signals of association with RA at chr 9q33.2: one in the TRAF1 region represented by rs7021049 and one in the GSN region represented by rs10985196; however, analyses of the individual sample sets showed that rs10985196 was independently associated with disease risk in Sample Set 2 only, while rs7021049 showed consistent association across all three sample sets (data not presented). Consequently, additional samples are needed to determine whether the GSN region truly contains RA-predisposing effects.
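One common way to set up such a pairwise analysis is to ask whether one SNP still improves a logistic model that already contains the other. The sketch below illustrates that idea with a 1-df likelihood-ratio test in plain Python; it is an assumption about the general technique, not a reproduction of the models fitted in this study. The function names (`max_loglik`, `conditional_p`) and the numeric genotype coding are hypothetical.

```python
import math


def _solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def max_loglik(rows, y, n_iter=25):
    """Maximized log-likelihood of a logistic model (intercept added), via Newton-Raphson."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    beta = [0.0] * k

    def prob(x):
        return 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))

    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, yi in zip(xs, y):
            p = prob(x)
            for i in range(k):
                grad[i] += (yi - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, _solve(hess, grad))]
    return sum(math.log(prob(x)) if yi else math.log(1.0 - prob(x))
               for x, yi in zip(xs, y))


def conditional_p(test_snp, adjust_snp, y):
    """Likelihood-ratio P value (1 df) for test_snp after adjusting for adjust_snp.

    test_snp, adjust_snp: per-individual numeric genotype codes (assumed coding)
    y: case (1) / control (0) status
    """
    ll_null = max_loglik([[a] for a in adjust_snp], y)
    ll_full = max_loglik([[a, t] for a, t in zip(adjust_snp, test_snp)], y)
    stat = max(2.0 * (ll_full - ll_null), 0.0)
    return math.erfc(math.sqrt(stat / 2.0))  # chi-square survival function, 1 df
```

Running `conditional_p` in both directions for a pair of SNPs indicates whether each retains signal once the other is in the model; a SNP that is associated only through LD with its partner would be expected to lose significance after adjustment.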
To explore more complex models and assess whether SNPs outside of LD Block 1 were incorporated into these models, we used both forward and backward stepwise logistic regression. The sets of SNPs included in the models chosen by the stepwise procedures were inconsistent, indicating that the observed association in the region is not clearly explained by a single SNP or set of SNPs included in the tested models.
Independently, Plenge and colleagues, using a whole genome association (WGA) study, and Kurreeman and coworkers, using a candidate gene approach, have also shown that this chr 9q33.2 region is associated with RA risk in whites of European descent. Although a partially overlapping subset of samples was used in all three studies (see footnotes), each study employed a unique experimental design and analyses and presented different facets of the 9q33.2-RA association. Plenge and colleagues identified rs3761847, a TRAF1/C5 intergenic SNP, as one of two non-MHC SNPs reaching genome-wide significance in their WGA study; not surprisingly, the other significant non-MHC SNP was in PTPN22. Their follow-up fine mapping of the chr 9q33.2 region with nine haplotype tag SNPs in four RA sample sets (2519 cases / 3627 controls) localized the region of interest to 100 kb extending from PHF19. rs3761847, which is a Group 2 SNP in LD Block 1, remained the most significant SNP in their combined analysis (P = 4.00E-14), followed by rs2900180 (P = 8.00E-14), a member of the Group 1 SNPs in LD Block 1.
Taking a candidate gene approach, Kurreeman and colleagues studied 40 SNPs in a 300 kb region surrounding C5 (from the 3′ UTR of PHF19 to intron 25 of CEP110) in a staged approach in four sample sets (2719 cases / 1999 controls) and concluded that rs10818488, another TRAF1-C5 intergenic SNP and member of the LD Block 1 Group 2 SNPs, was the SNP most significantly associated with a diagnosis of RA in their study. Association of a Group 1 SNP, rs2416806, was moderately significant in a combined analysis of three of their sample sets (P = 0.015). Neither the Plenge et al. nor the Kurreeman et al. study included rs7021880.
Our analyses, which included more SNPs and incorporated HapMap information for all SNPs highly correlated with SNPs genotyped in our study, permitted a comprehensive analysis of the genetic architecture of the 9q33.2 region. This allowed us to localize the RA-susceptibility effects to a 70 kb region (LD Block 1) that includes a portion of PHF19, all of TRAF1 and the majority of the TRAF1-C5 intergenic region, but excludes the C5 coding region, thereby narrowing the region of interest. Our data, however, did not allow us to identify a single SNP or group of highly correlated SNPs (r² > 0.95) in this 70 kb region that unambiguously explained the association signal in all three independent sample sets. Other sample sets with different patterns of LD, or functional studies, will be required to resolve this issue.
Interestingly, Potter and colleagues, who studied 23 haplotype-tagging SNPs from the six TRAF genes, including three from TRAF1, in a UK case-control study (351 RA cases / 368 controls), failed to see association with either a Group 1 (rs1468671) or a Group 2 (rs4836834) SNP. Using a sample set overlapping that of the Potter et al. study, the recent Wellcome Trust Case Control Consortium genome-wide association study of RA (1860 cases / 2930 controls) also failed to identify RA-risk variants in this region. However, a more recent follow-up study from the same group of an independent RA sample set from the UK (3418 cases / 3337 controls) confirmed association with four LD Block 1 Group 2 SNPs, although the effect size was smaller (meta-analysis OR 1.08, 95% CI 1.03–1.14).
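A pooled odds ratio of the kind quoted above is typically obtained by inverse-variance weighting of per-study log odds ratios. The sketch below assumes a fixed-effect model and recovers each standard error from the reported 95% CI; whether the cited study pooled its data this way is not stated here, and the function name `pool_fixed_effect` is hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% normal quantile


def pool_fixed_effect(studies):
    """Fixed-effect, inverse-variance pooling of odds ratios.

    studies: iterable of (odds_ratio, ci_low, ci_high), each with a 95% CI.
    Returns (pooled_or, pooled_ci_low, pooled_ci_high).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, ci_low, ci_high in studies:
        # Standard error of log(OR), recovered from the CI width on the log scale
        se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * Z_95)
        weight = 1.0 / (se * se)
        weighted_sum += weight * math.log(odds_ratio)
        total_weight += weight
    log_or = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (math.exp(log_or),
            math.exp(log_or - Z_95 * pooled_se),
            math.exp(log_or + Z_95 * pooled_se))
```

Because each study is weighted by the inverse of its variance, large sample sets dominate the pooled estimate, which is how a modest effect such as OR 1.08 can still yield a confidence interval that excludes 1.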
The original RA-associated 9q33.2 SNP identified in our genome-wide scan, rs1953126, is located within LD Block 1, 1 kb upstream of the 5′ end of PHF19, the human homologue of the Drosophila polycomblike protein (PCL) gene. In Drosophila, the protein encoded by this gene is part of the 1 MDa extra sex combs and enhancer of zeste [ESC-E(Z)] complex, which is thought to mediate transcriptional repression by modulating the chromatin environment of many developmental regulatory genes such as homeobox genes. While the exact function of this gene in humans remains unknown, it encodes two nuclear proteins that appear to be upregulated in multiple cancers, and preliminary evidence suggests that deregulation of these genes may play a role in tumor progression.
TRAF1 encodes a protein that is a member of the TNF receptor (TNFR) associated factor (TRAF) protein family, which associates with, and mediates signal transduction from, various receptors including a subset of the TNFR superfamily. There are six members of this family of adaptor proteins; however, TRAF1 is unique in that, while it contains the hallmark carboxyl-terminal TRAF domain, it has a single zinc finger in the amino-terminal part and lacks the N-terminal RING finger domain required for NF-κB activation. TRAF1 appears to have both anti-apoptotic and anti-proliferative effects. In addition, this protein has been found to be elevated in malignancies of the B cell lineage. This observation is interesting given that the risk of lymphoma, particularly diffuse large B cell lymphoma, appears to be increased in the subset of RA patients with very severe disease, independent of treatment. Although the precise mechanism of TRAF1 action in various signaling pathways has not been fully elucidated, it is clear that this molecule plays an important role in immune cell homeostasis, making it an excellent candidate gene for RA. In fact, in vitro work suggests that TNFα-mediated synovial hyperplasia, a major pathophysiologic feature of RA, may be correlated with upregulation of TRAF molecules, particularly TRAF1. Given that TNF blockade has proved a highly effective therapy for RA and that response to TNF antagonists among RA patients is known to vary, investigation of whether the TRAF1 variants identified in this study play a role in this differential response may be a fruitful pharmacogenetic avenue to pursue.
C5 is also an excellent RA candidate gene, and although our analyses allowed us to exclude the C5 coding region, SNPs in LD Block 1 could differentially regulate the expression of this gene. C5 encodes a zymogen that is involved in all three pathways of complement activation. Traditionally, the complement system has been viewed as a central part of the innate immune system in host defenses against invading pathogens and in clearance of potentially damaging cell debris; however, complement activation has also recently been implicated in the pathogenesis of many inflammatory and immunological diseases. Proteolytic cleavage of C5 results in C5a, one of the most potent inflammatory peptides, and C5b, a component of the membrane attack complex (MAC) that can cause lysis of cells and bacteria. Genetic studies in various mouse models of RA, including collagen-induced arthritis (CIA) and the K/BxN T cell receptor transgenic mouse model of inflammatory arthritis, have provided evidence that C5, or a variant in strong LD, plays a role in disease. More striking is the observation that anti-C5 monoclonal antibody therapy can prevent and ameliorate disease in both mouse models.
In summary, we have independently identified a region on chr 9q33.2 as a risk locus for RA. Although the evidence from the SNPs genotyped in our sample sets most strongly points towards TRAF1 variants as being the most highly consistent with a disease model, the high LD that extends from the 5′ end of PHF19 through TRAF1 and into the TRAF1-C5 intergenic region precludes conclusively determining causative genes or functional motifs through genetic means in these samples. Mapping studies in additional sample sets with a different LD architecture and/or functional studies will be required to resolve the molecular relevance of these findings.
Aside from the possibility of developing targeted therapies with knowledge of predisposing variants underlying the onset of RA, the identification of RA susceptibility alleles may encourage earlier monitoring and provide an avenue for intervention in advance of significant joint erosion. Our initial analysis of the three known genetic risk factors, HLA-SE, PTPN22 and the chr 9q33.2 variants described here, suggests a >45-fold difference in RA risk depending on an individual's genotype at these three loci. As additional markers are identified, the ability to accurately predict which individuals are at increased risk of developing RA, particularly within families with a history of RA, may prove useful. Finally, differential risk variants may prove to be drug response markers.
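To illustrate how a fold difference in risk across multi-locus genotypes can arise, the sketch below combines per-genotype odds ratios across loci and compares the highest- and lowest-risk combinations. It assumes independent, multiplicative effects between loci, which is a simplification that may not match the model behind the >45-fold figure, and the function name `risk_spread` is hypothetical. Any odds ratios supplied to it would need to come from actual estimates; none are given here.

```python
from itertools import product


def risk_spread(locus_ors):
    """Ratio of highest to lowest combined odds ratio across multi-locus genotypes.

    locus_ors: dict mapping locus name -> dict mapping genotype label -> odds ratio
               (relative to that locus's reference genotype).
    Assumes effects multiply across loci (no interaction), which is an assumption.
    Returns (fold_difference, lowest_risk_genotypes, highest_risk_genotypes).
    """
    loci = list(locus_ors)
    combos = []
    for choice in product(*(locus_ors[locus].items() for locus in loci)):
        combined = 1.0
        for _, odds_ratio in choice:
            combined *= odds_ratio
        combos.append((combined, tuple(genotype for genotype, _ in choice)))
    lowest = min(combos)
    highest = max(combos)
    return highest[0] / lowest[0], lowest[1], highest[1]
```

Under this multiplicative assumption, even moderate per-locus effects compound quickly: two loci with maximum odds ratios of 4.0 and 1.5 relative to their lowest-risk genotypes would already give a six-fold spread.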