The genetic association of the major histocompatibility complex (MHC) to rheumatoid arthritis risk has commonly been attributed to HLA-DRB1 alleles. Yet controversy persists about the causal variants in HLA-DRB1 and the presence of independent effects elsewhere in the MHC. Using existing genome-wide SNP data in 5,018 seropositive cases and 14,974 controls, we imputed and tested classical alleles and amino acid polymorphisms for HLA-A, B, C, DPA1, DPB1, DQA1, DQB1, and DRB1 along with 3,117 SNPs across the MHC. Conditional and haplotype analyses reveal that three amino acid positions (11, 71 and 74) in HLA-DRβ1, and single amino acid polymorphisms in HLA-B (position 9) and HLA-DPβ1 (position 9), all located in the peptide-binding grooves, almost completely explain the MHC association to disease risk. This study illustrates how imputation of functional variation from large reference panels can help fine-map association signals in the MHC.
To examine the association of previously identified autoimmune disease susceptibility loci with granulomatosis with polyangiitis (GPA, formerly known as Wegener’s granulomatosis), and determine whether genetic susceptibility profiles of other autoimmune diseases are associated with GPA.
Genetic data from two cohorts were meta-analyzed. Genotypes for 168 previously identified single nucleotide polymorphisms (SNPs) associated with susceptibility to different autoimmune diseases were ascertained for a total of 880 GPA cases and 1969 controls of European descent. Single marker associations were identified using additive logistic regression models. Multi-SNP associations with GPA were assessed using genetic risk scores based on susceptibility loci for Crohn’s disease, type 1 diabetes, systemic lupus erythematosus, rheumatoid arthritis, celiac disease, and ulcerative colitis. Adjustment for population substructure was performed in all analyses using ancestry informative markers and principal components analysis.
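The abstract does not say how the two cohorts were combined. As a minimal sketch, assuming a fixed-effect inverse-variance meta-analysis of per-cohort log odds ratios (a common choice, not confirmed by the source), the pooling step could look like the following. The function name and the input values are hypothetical.

```python
import math

def fixed_effect_meta(log_odds_ratios, standard_errors):
    """Pool per-cohort log odds ratios with inverse-variance weights.

    Assumption: a fixed-effect model; the source does not name its method.
    Returns the pooled log odds ratio and its standard error.
    """
    if len(log_odds_ratios) != len(standard_errors):
        raise ValueError("need one standard error per log odds ratio")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical per-cohort estimates for one SNP (not taken from the study).
pooled_log_or, pooled_se = fixed_effect_meta([0.2, 0.4], [0.1, 0.1])
pooled_or = math.exp(pooled_log_or)
```

With equal standard errors the pooled estimate is the simple mean of the two log odds ratios, which makes the weighting easy to check by hand.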
Genetic polymorphisms in CTLA4 were significantly associated with GPA in the single-marker meta-analysis (OR 0.79, 95% CI 0.70–0.89, p=9.8×10−5). A genetic risk score based on rheumatoid arthritis susceptibility markers was significantly associated with GPA (OR 1.05 per 1-unit increase in genetic risk score, 95% CI 1.02–1.08, p=5.1×10−5).
Rheumatoid arthritis and GPA may arise from a similar genetic predisposition. Aside from CTLA4, other loci previously found to be associated with common autoimmune diseases were not statistically associated with GPA in this study.
genetics; vasculitis; granulomatosis with polyangiitis; rheumatoid arthritis; CTLA4
We investigated the prevalence of xenotropic murine leukemia virus-related virus (XMRV) in 293 participants seen at academic hospitals in Boston, Massachusetts. Participants were recruited from five groups of patients: chronic fatigue syndrome (CFS, n = 32), HIV infection (n = 43), rheumatoid arthritis (RA, n = 97), hematopoietic stem-cell or solid organ transplant (n = 26), or a general cohort of patients presenting for medical care (n = 95). XMRV DNA was not detected in any participant samples. We found no association between XMRV and patients with CFS or chronic immunomodulatory conditions.
XMRV; chronic fatigue syndrome; HIV infection; rheumatoid arthritis; hematopoietic stem-cell transplantation; solid organ transplantation
We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record.
Materials and Methods
The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values.
Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers.
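For readers unfamiliar with the metric, the AUC for a binary comparison can be computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below illustrates that definition only; it is not the evaluation code used in the study, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for the positive class, 0 for the negative class.
    scores: classifier outputs, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical example: three of the four positive/negative pairs are ordered correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

The pairwise form is quadratic in the number of notes, which is fine for illustration; a rank-based implementation would be preferable for large test sets.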
Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies.
Cumulative genetic profiles can help identify individuals at high-risk for developing RA. We examined the impact of 39 validated genetic risk alleles on the risk of RA phenotypes characterized by serologic and erosive status.
We evaluated single nucleotide polymorphisms at 31 validated RA risk loci and 8 Human Leukocyte Antigen alleles among 542 Caucasian RA cases and 551 Caucasian controls from Nurses' Health Study and Nurses' Health Study II. We created a weighted genetic risk score (GRS) and evaluated it as 7 ordinal groups using logistic regression (adjusting for age and smoking) to assess the relationship between GRS group and odds of developing seronegative (RF− and CCP−), seropositive (RF+ or CCP+), erosive, and seropositive, erosive RA phenotypes. In separate case only analyses, we assessed the relationships between GRS and age of symptom onset.
In 542 RA cases, 317 (58%) were seropositive, 163 (30%) had erosions and 105 (19%) were seropositive with erosions. Comparing the highest GRS risk group to the median group, we found an OR of 1.2 (95% CI = 0.8–2.1) for seronegative RA, 3.0 (95% CI = 1.9–4.7) for seropositive RA, 3.2 (95% CI = 1.8–5.6) for erosive RA, and 7.6 (95% CI = 3.6–16.3) for seropositive, erosive RA. No significant relationship was seen between GRS and age of onset.
Results suggest that seronegative and seropositive/erosive RA have different genetic architecture and support the importance of considering RA phenotypes in RA genetic studies.
Electronic medical records (EMRs) are a rich data source for discovery research but are underutilized due to the difficulty of extracting highly accurate clinical data. We assessed whether a classification algorithm incorporating narrative EMR data (typed physician notes) classifies subjects with rheumatoid arthritis (RA) more accurately than an algorithm using codified EMR data alone.
Subjects with ≥1 ICD9 RA code (714.xx) or who had anti-CCP checked in the EMR of two large academic centers were included into an ‘RA Mart’ (n=29,432). For all 29,432 subjects, we extracted narrative (using natural language processing) and codified RA clinical information. In a training set of 96 RA and 404 non-RA cases from the RA Mart classified by medical record review, we used narrative and codified data to develop classification algorithms using logistic regression. These algorithms were applied to the entire RA Mart. We calculated and compared the positive predictive value (PPV) of these algorithms by reviewing records of an additional 400 subjects classified as RA by the algorithms.
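The PPV calculation described above reduces to the fraction of algorithm-classified RA subjects that chart review confirms. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = confirmed cases / all subjects the algorithm labeled as cases."""
    total_called_positive = true_positives + false_positives
    if total_called_positive == 0:
        raise ValueError("no subjects were classified as positive")
    return true_positives / total_called_positive

# Hypothetical review outcome: 90 of 100 algorithm-classified subjects confirmed.
example_ppv = positive_predictive_value(true_positives=90, false_positives=10)
```

Because PPV depends on how common the disease is in the population being classified, a value estimated within the RA Mart would not necessarily carry over to an unselected EMR population.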
A complete algorithm (narrative and codified data) classified RA subjects with a significantly higher PPV (94%) than an algorithm with codified data alone (PPV 88%). Characteristics of the RA cohort identified by the complete algorithm were comparable to existing RA cohorts (80% female, 63% anti-CCP+, 59% erosion+).
We demonstrate the ability to utilize complete EMR data to define an RA cohort with a PPV of 94%, which was superior to an algorithm using codified data alone.
Recent discoveries of risk alleles have made it possible to define genetic risk profiles for patients with rheumatoid arthritis (RA). We examined whether a cumulative score based on 22 validated genetic risk alleles for seropositive RA would identify high-risk, asymptomatic individuals who might benefit from preventive interventions.
We genotyped 14 single nucleotide polymorphisms (SNPs) at 13 validated RA risk loci and 8 HLA alleles among (1) 289 Caucasian seropositive cases and 481 controls from the US Nurses' Health Studies (NHS), and (2) 629 Caucasian CCP antibody positive cases and 623 controls from the Swedish Epidemiologic Investigation of RA (EIRA). We created a weighted genetic risk score (GRS), where the weight for each risk allele is the log of the published odds ratio. We used logistic regression to study associations with incident RA. We compared AUCs from a clinical-only model and clinical + genetic model in each cohort.
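The weighted GRS described above can be sketched as a sum over risk alleles, each allele count multiplied by the log of its published odds ratio. The sketch assumes the natural logarithm and a count of 0, 1, or 2 copies per allele; the function name, allele counts, and odds ratios are hypothetical and are not the study's values.

```python
import math

def weighted_genetic_risk_score(risk_allele_counts, published_odds_ratios):
    """Sum of risk-allele counts weighted by ln(published odds ratio).

    Assumption: natural log weights; the source says only "the log".
    """
    if len(risk_allele_counts) != len(published_odds_ratios):
        raise ValueError("need one odds ratio per risk allele")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(risk_allele_counts, published_odds_ratios)
    )

# Hypothetical individual: two copies, one copy, and zero copies at three loci.
example_counts = [2, 1, 0]
example_odds_ratios = [1.5, 1.2, 2.0]
example_score = weighted_genetic_risk_score(example_counts, example_odds_ratios)
```

Weighting by the log odds ratio means alleles with larger published effects contribute more to the score than an unweighted count of risk alleles would allow.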
Patients with GRS > 1.25 standard deviations of the mean had a significantly higher OR of seropositive RA in both NHS (OR=2.9, 95%CI 1.8–4.6) and EIRA (OR=3.4, 95% CI 2.3–5.0) referent to the population average. In NHS, the AUC for a clinical model was 0.57 and for a clinical + genetic model was 0.66, and in EIRA was 0.63 and 0.75, respectively.
The combination of 22 risk alleles into a weighted genetic risk score significantly stratifies individuals for RA risk beyond clinical risk factors alone. However, given the low incidence of RA, the clinical utility of a weighted genetic risk score is limited in the general population.
rheumatoid arthritis; polymorphism; autoantibodies; anti-CCP; smoking
HLA-DRB1 shared epitope (HLA-SE), PTPN22 and CTLA4 alleles are associated with CCP+ RA.
We examined associations between HLA-SE, PTPN22, CTLA4 genotypes and RA phenotypes in a large cohort to (a) replicate prior associations with CCP status, and (b) determine associations with radiographic erosions and age of diagnosis.
689 RA patients from the Brigham RA Sequential Study (BRASS) were genotyped for HLA-SE, PTPN22 (rs2476601) and CTLA4 (rs3087243). Associations between genotypes and CCP, RF, and erosive phenotypes and age at diagnosis were assessed with multivariable models adjusting for age, sex, and disease duration. Novel causal pathway analysis was used to test the hypothesis that genetic risk factors and CCP are in the causal pathway for predicting erosions.
In multivariable analysis, presence of any HLA-SE was strongly associated with CCP+ (OR 3.05 (2.18–4.25)), and RF+ (OR 2.53 (1.83–3.5)) phenotypes; presence of any PTPN22 T allele was associated with CCP+ (OR 1.81 (1.24–2.66)) and RF+ phenotypes (OR 1.84 (1.27–2.66)). CTLA4 was not associated with CCP or RF phenotypes. While HLA-SE was associated with erosive RA phenotype (OR 1.52 (1.01–2.17)), this was no longer significant after conditioning on CCP. PTPN22 and CTLA4 were not associated with erosive phenotype. Presence of any HLA-SE was associated with on average 3.6 years earlier diagnosis compared with absence of HLA-SE (41.3 vs. 44.9 years, p=0.003) and PTPN22 was associated with 4.2 years earlier age of diagnosis (39.5 vs. 43.6 years, p=0.002). CTLA4 genotypes were not associated with age at diagnosis of RA.
In this large clinical cohort, we replicated the association of HLA-SE and PTPN22, but not CTLA4, with CCP+ and RF+ phenotypes. We also found evidence for associations of HLA-SE and PTPN22 with earlier age at diagnosis. Since HLA-SE is associated with erosive phenotype in unconditional analysis, but is not significant after conditioning on CCP, this suggests that CCP is in the causal pathway for predicting erosive phenotype.
rheumatoid arthritis; age at diagnosis; PTPN22; HLA; CCP
Electronic health records (EHR) can allow for the generation of large cohorts of individuals with given diseases for clinical and genomic research. A rate-limiting step is the development of electronic phenotype selection algorithms to find such cohorts. This study evaluated the portability of a published phenotype algorithm to identify rheumatoid arthritis (RA) patients from EHR records at three institutions with different EHR systems.
Materials and Methods
Physicians reviewed charts from three institutions to identify patients with RA. Each institution compiled attributes from various sources in the EHR, including codified data and clinical narratives, which were searched using one of two natural language processing (NLP) systems. The performance of the published model was compared with locally retrained models.
Applying the previously published model from Partners Healthcare to datasets from Northwestern and Vanderbilt Universities, the area under the receiver operating characteristic curve was found to be 92% for Northwestern and 95% for Vanderbilt, compared with 97% at Partners. At a specificity of 97%, retraining the model improved the average sensitivity from the original 65% to 72%. Both the original logistic regression models and locally retrained models were superior to simple billing code count thresholds.
These results show that a previously published algorithm for RA is portable to two external hospitals using different EHR systems, different NLP systems, and different target NLP vocabularies. Retraining the algorithm primarily increased the sensitivity at each site.
Electronic phenotype algorithms allow rapid identification of case populations in multiple sites with little retraining.
Automated learning; biomedical informatics; discovery and text and data mining methods; electronic health record; genetic; improving the education and skills training of health professionals; infection control; knowledge representations; linking the genotype and phenotype; medical informatics; natural language processing; other methods of information extraction; phenotype algorithms DNA databank machine learning; phenotype identification; phenotyping; rheumatoid arthritis; rheumatology; translational research – application of biological knowledge to clinical care
To identify additional variants in the major histocompatibility complex (MHC) region that independently contribute to risk in 2 disease subsets of rheumatoid arthritis (RA) defined according to the presence or absence of antibodies to citrullinated protein antigens (ACPAs).
In a multistep analytical strategy using unmatched as well as matched analyses to adjust for HLA–DRB1 genotype, we analyzed 2,221 single-nucleotide polymorphisms (SNPs) spanning 10.7 Mb, from 6p22.2 to 6p21.31, across the MHC. For ACPA-positive RA, we analyzed samples from the Swedish Epidemiological Investigation of Rheumatoid Arthritis (EIRA) and the North American Rheumatoid Arthritis Consortium (NARAC) studies (totaling 1,255 cases and 1,719 controls). For ACPA-negative RA, we used samples from the EIRA study (640 cases and 670 controls). Plink and SAS statistical packages were used to conduct all statistical analyses.
A total of 299 SNPs reached locus-wide significance (P < 2.3 × 10−5) for ACPA-positive RA, whereas surprisingly, no SNPs reached this significance for ACPA-negative RA. For ACPA-positive RA, we adjusted for known DRB1 risk alleles and identified additional independent associations with SNPs near HLA–DPB1 (rs3117213; odds ratio 1.42 [95% confidence interval 1.17–1.73], Pcombined = 0.0003 for the strongest association).
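The locus-wide threshold quoted above is consistent with a Bonferroni correction for the 2,221 SNPs analyzed, although the abstract does not name the correction. A short check, assuming a family-wise alpha of 0.05 (an assumption, not stated in the source):

```python
# Number of SNPs analyzed across the MHC, as reported in the methods.
n_snps_tested = 2221

# Assumed family-wise error rate; the abstract does not state it.
family_wise_alpha = 0.05

# Bonferroni-corrected per-SNP threshold, about 2.25e-05.
locus_wide_threshold = family_wise_alpha / n_snps_tested
```

Rounded to two significant figures this gives 2.3 × 10−5, matching the reported cutoff.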
There are distinct genetic patterns of MHC associations in the 2 disease subsets of RA defined according to ACPA status. HLA–DPB1 is an independent risk locus for ACPA-positive RA. We did not identify any associations with SNPs within the MHC for ACPA-negative RA.
Next-generation DNA sequencing reveals rare alleles protective from type 1 diabetes.
A large population study using ultra-high-throughput DNA sequencing to re-sequence a genetic locus associated with type 1 diabetes reveals rare protective alleles.
The co-occurrence of autoimmune diseases such as rheumatoid arthritis (RA) and type 1 diabetes (T1D) has been reported in individuals and families. We studied the strength and nature of this association at the population level.
We conducted a case-control study of 1419 incident RA cases and 1674 controls between 1996 and 2003. Subjects were recruited from university, public and private rheumatology units throughout Sweden. Blood samples were tested for the presence of antibodies to cyclic citrullinated peptide (anti-CCP), rheumatoid factor (RF) and the presence or absence of the 620W PTPN22 allele. Information on history of diabetes was obtained by questionnaire, telephone interview, and medical record review. The prevalence of T1D and type 2 diabetes (T2D) was compared between incident RA cases and controls and further stratified by anti-CCP, RF status, and the presence of the PTPN22 risk allele.
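Comparing the prevalence of an exposure such as T1D between cases and controls is commonly summarized as an odds ratio with a 95% confidence interval. The sketch below uses the unadjusted 2×2 form with a Woolf (log-scale) interval; it is an illustration only, since the study's estimates may come from adjusted models, and the counts are hypothetical.

```python
import math

def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval from a 2x2 table.

    z=1.96 gives an approximate 95% interval. All four cells must be nonzero.
    """
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls
    )
    # Standard error of ln(OR) is the root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(sum(1.0 / cell for cell in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, not taken from the study.
example_or, example_lower, example_upper = odds_ratio_with_ci(20, 80, 10, 90)
```

With rare exposures the cell counts are small and the interval becomes wide, which is one reason sparse-data odds ratios should be read with caution.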
T1D was associated with an increased risk of RA, OR 4.9 (95% CI 1.8–13.1), and was specific for anti-CCP+ RA, OR 7.3 (95% CI 2.7–20.0), but not anti-CCP negative RA. Further adjustment for PTPN22 attenuated the odds ratio for anti-CCP+ RA in individuals with T1D to 5.3 (95% CI 1.5–18.7). No association was observed between RA and T2D.
The association between T1D and RA is specific for a particular RA subset, anti-CCP+ RA. The risk of type 1 diabetics developing RA later in life may be attributed in part to the presence of the 620W PTPN22 allele, suggesting a common pathway for the pathogenesis of these two diseases.
Although genetic and non-genetic studies in mouse and human implicate the CD40 pathway in rheumatoid arthritis (RA), there are no approved drugs that inhibit CD40 signaling for clinical care in RA or any other disease. Here, we sought to understand the biological consequences of a CD40 risk variant in RA discovered by a previous genome-wide association study (GWAS) and to perform a high-throughput drug screen for modulators of CD40 signaling based on human genetic findings. First, we fine-map the CD40 risk locus in 7,222 seropositive RA patients and 15,870 controls, together with deep sequencing of CD40 coding exons in 500 RA cases and 650 controls, to identify a single SNP that explains the entire signal of association (rs4810485, P = 1.4×10−9). Second, we demonstrate that subjects homozygous for the RA risk allele have ∼33% more CD40 on the surface of primary human CD19+ B lymphocytes than subjects homozygous for the non-risk allele (P = 10−9), a finding corroborated by expression quantitative trait loci (eQTL) analysis in peripheral blood mononuclear cells from 1,469 healthy control individuals. Third, we use retroviral shRNA infection to perturb the amount of CD40 on the surface of a human B lymphocyte cell line (BL2) and observe a direct correlation between amount of CD40 protein and phosphorylation of RelA (p65), a subunit of the NF-κB transcription factor. Finally, we develop a high-throughput NF-κB luciferase reporter assay in BL2 cells activated with trimerized CD40 ligand (tCD40L) and conduct an HTS of 1,982 chemical compounds and FDA–approved drugs. After a series of counter-screens and testing in primary human CD19+ B cells, we identify 2 novel chemical inhibitors not previously implicated in inflammation or CD40-mediated NF-κB signaling. 
Our study demonstrates proof-of-concept that human genetics can be used to guide the development of phenotype-based, high-throughput small-molecule screens to identify potential novel therapies in complex traits such as RA.
A current challenge in human genetics is to follow-up “hits” from genome-wide association studies (GWAS) to guide drug discovery for complex traits. Previously, we identified a common variant in the CD40 locus as associated with risk of rheumatoid arthritis (RA). Here, we fine-map the CD40 signal of association through a combination of dense genotyping and exonic sequencing in large patient collections. Further, we demonstrate that the RA risk allele is a gain-of-function allele that increases the amount of CD40 on the surface of primary human B lymphocyte cells from healthy control individuals. Based on these observations, we develop a high-throughput assay to recapitulate the biology of the RA risk allele in a system suitable for a small molecule drug screen. After a series of primary screens and counter screens, we identify small molecules that inhibit CD40-mediated NF-kB signaling in human B cells. While this is only the first step towards a more comprehensive effort to identify CD40-specific inhibitors that may be used to treat RA, our study demonstrates a successful strategy to progress from a GWAS to a drug screen for complex traits such as RA.
The single nucleotide polymorphism (SNP) rs11761231 on chromosome 7q has been reported as a sexually dimorphic marker for rheumatoid arthritis susceptibility in a British population. We sought to replicate this finding and better characterize susceptibility alleles in the region in a North American population.
DNA from two North American collections of RA patients and controls (1605 cases and 2640 controls) was genotyped for rs11761231 and 16 additional chromosome 7q tag SNPs using Sequenom iPlex assays. Association tests were performed for each collection and also separately contrasting male cases versus male controls and female cases versus female controls. Principal components analysis (EIGENSTRAT) was used to determine association with RA before and after adjusting for population stratification in the subset of the samples (772 cases and 1213 controls) with whole genome SNP data.
We failed to replicate association of the 7q region with rheumatoid arthritis. Initially, rs11761231 showed evidence for association with RA in the NARAC collection (p=0.0076) and rs11765576 showed association with RA in both the NARAC (p = 0.019) and RA replication (p = 0.0013) collections. These markers also exhibited sexual differentiation. However, in the whole genome subset, neither SNP showed significant association with RA after correction for population stratification.
While two SNPs on chromosome 7q appeared to be associated with RA in a North American cohort, the significance of this finding did not withstand correction for population substructure. Our results emphasize the need to carefully account for population structure to avoid false positive disease associations.
For Genetic Analysis Workshop 16 Problem 1, we provided data for genome-wide association analysis of rheumatoid arthritis. Single-nucleotide polymorphism (SNP) genotype data were provided for 868 cases and 1194 controls that had been assayed using an Illumina 550 k platform. In addition, phenotypic data were provided from genotyping DRB1 alleles, which were classified according to the rheumatoid arthritis shared epitope, levels of anti-cyclic citrullinated peptide, and levels of rheumatoid factor IgM. Several questions could be addressed using the data, including analysis of genetic associations using single SNPs or haplotypes, as well as gene-gene and genetic analysis of SNPs for qualitative and quantitative factors.
SLE is an autoimmune disease influenced by genetic and environmental components. We performed a genome-wide association scan (GWAS) and observed novel association evidence with a variant in TNFAIP3 (rs5029939, P = 2.89×10−12, OR = 2.29). We also found evidence of two independent signals of association to SLE risk, including one described in Rheumatoid Arthritis. These results establish that genetic variation in TNFAIP3 contributes to differential risk for SLE and RA.
Translating a set of disease regions into insight about pathogenic mechanisms requires not only the ability to identify the key disease genes within them, but also the biological relationships among those key genes. Here we describe a statistical method, Gene Relationships Across Implicated Loci (GRAIL), that takes a list of disease regions and automatically assesses the degree of relatedness of implicated genes using 250,000 PubMed abstracts. We first evaluated GRAIL by assessing its ability to identify subsets of highly related genes in common pathways from validated lipid and height SNP associations from recent genome-wide studies. We then tested GRAIL by assessing its ability to separate true disease regions from many false positive disease regions in two separate practical applications in human genetics. First, we took 74 nominally associated Crohn's disease SNPs and applied GRAIL to identify a subset of 13 SNPs with highly related genes. Of these, ten convincingly validated in follow-up genotyping; genotyping results for the remaining three were inconclusive. Next, we applied GRAIL to 165 rare deletion events seen in schizophrenia cases (less than one-third of which are contributing to disease risk). We demonstrate that GRAIL is able to identify a subset of 16 deletions containing highly related genes; many of these genes are expressed in the central nervous system and play a role in neuronal synapses. GRAIL offers a statistically robust approach to identifying functionally related genes from across multiple disease regions that likely represent key disease pathways. An online version of this method is available for public use (http://www.broad.mit.edu/mpg/grail/).
Modern genetic studies, including genome-wide surveys for disease-associated loci and copy number variation, provide a list of critical genomic regions that play an important role in predisposition to disease. Using these regions to understand disease pathogenesis requires the ability to first distinguish causal genes from other nearby genes spuriously contained within these regions. To do this we must identify the key pathways suggested by those causal genes. In this manuscript we describe a statistical approach, Gene Relationships Across Implicated Loci (GRAIL), to achieve this task. It starts with genomic regions and identifies related subsets of genes involved in similar biological processes—these genes highlight the likely causal genes and the key pathways. GRAIL uses abstracts from the entirety of the published scientific literature about the genes to look for potential relationships between genes. We apply GRAIL to four very different phenotypes. In each case we identify a subset of highly related genes; in cases where false positive regions are present, GRAIL is able to separate out likely true positives. GRAIL therefore offers the potential to translate disease genomic regions from unbiased genomic surveys into the key processes that may be critical to the disease.
We carried out a genome-wide association study of genetic predictors of anti-cyclic citrullinated peptide antibody (anti-CCP) level in 531 self-reported non-Hispanic Caucasian Rheumatoid Arthritis (RA) patients enrolled in the Brigham Rheumatoid Arthritis Sequential Study (BRASS). For replication, we then analyzed 289 single nucleotide polymorphisms (SNPs) with P < 0.001 in BRASS in an independent population of 849 RA patients from the North American Rheumatoid Arthritis Consortium (NARAC). BRASS and NARAC samples were genotyped using the Affymetrix 100K and Illumina 550K platforms respectively. Association between SNPs and anti-CCP titer was tested using general linear models. The five most significant SNPs from BRASS all were within the major histocompatibility complex (MHC) region (P ≤ 3.5 × 10−6). After controlling for the human leukocyte antigen shared epitope (HLA-SE), the top SNPs still yielded P values < 0.0002. In NARAC, a single SNP from the MHC region near BTNL2 and HLA-DRA, rs1980493 (r2 = 0.85 with the top five SNPs from BRASS), was associated significantly with CCP titer (P = 6.1 × 10−5) even after adjustment for the HLA-SE (P = 0.0002). The top SNPs found in BRASS and NARAC had r2 = 0.46 and 0.64, respectively, to HLA-DRB1 DR3 alleles. These results confirm that the most significant genome region affecting anti-CCP titers in RA is the MHC region. We identified a SNP in moderate linkage disequilibrium (LD) with HLA-DR3, which may influence anti-CCP titer independently of the HLA-SE.
To identify susceptibility alleles associated with rheumatoid arthritis, we genotyped 397 individuals with rheumatoid arthritis for 116,204 SNPs and carried out an association analysis in comparison to publicly available genotype data for 1,211 related individuals from the Framingham Heart Study. After evaluating and adjusting for technical and population biases, we identified a SNP at 6q23 (rs10499194, ∼150 kb from TNFAIP3 and OLIG3) that was reproducibly associated with rheumatoid arthritis both in the genome-wide association (GWA) scan and in 5,541 additional case-control samples (P = 10−3, GWA scan; P < 10−6, replication; P = 10−9, combined). In a concurrent study, the Wellcome Trust Case Control Consortium (WTCCC) has reported strong association of rheumatoid arthritis susceptibility to a different SNP located 3.8 kb from rs10499194 (rs6920220; P = 5 × 10−6 in WTCCC). We show that these two SNP associations are statistically independent, are each reproducible in the comparison of our data and WTCCC data, and define risk and protective haplotypes for rheumatoid arthritis at 6q23.
Rheumatoid arthritis has a complex mode of inheritance. Although HLA-DRB1 and PTPN22 are well-established susceptibility loci, other genes that confer a modest level of risk have been identified recently. We carried out a genomewide association analysis to identify additional genetic loci associated with an increased risk of rheumatoid arthritis.
We genotyped 317,503 single-nucleotide polymorphisms (SNPs) in a combined case-control study of 1522 case subjects with rheumatoid arthritis and 1850 matched control subjects. The patients were seropositive for autoantibodies against cyclic citrullinated peptide (CCP). We obtained samples from two data sets, the North American Rheumatoid Arthritis Consortium (NARAC) and the Swedish Epidemiological Investigation of Rheumatoid Arthritis (EIRA). Results from NARAC and EIRA for 297,086 SNPs that passed quality-control filters were combined with the use of Cochran-Mantel-Haenszel stratified analysis. SNPs showing a significant association with disease (P<1×10−8) were genotyped in an independent set of case subjects with anti-CCP-positive rheumatoid arthritis (485 from NARAC and 512 from EIRA) and in control subjects (1282 from NARAC and 495 from EIRA).
We observed associations between disease and variants in the major-histocompatibility-complex locus, in PTPN22, and in a SNP (rs3761847) on chromosome 9 for all samples tested, the latter with an odds ratio of 1.32 (95% confidence interval, 1.23 to 1.42; P = 4×10−14). The SNP is in linkage disequilibrium with two genes relevant to chronic inflammation: TRAF1 (encoding tumor necrosis factor receptor-associated factor 1) and C5 (encoding complement component 5).
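As a rough illustration of how a Cochran-Mantel-Haenszel analysis combines strata such as NARAC and EIRA, the sketch below computes the Mantel-Haenszel pooled odds ratio and the CMH chi-square statistic (1 degree of freedom, no continuity correction) from one 2×2 allele-count table per stratum. The function name, table layout, and counts are hypothetical; the study's actual pipeline is not described in the abstract.

```python
def cochran_mantel_haenszel(strata):
    """Pooled odds ratio and CMH chi-square across 2x2 tables.

    strata: iterable of (a, b, c, d) per stratum, where
      a = risk alleles in cases,    b = other alleles in cases,
      c = risk alleles in controls, d = other alleles in controls.
    """
    or_numerator = 0.0
    or_denominator = 0.0
    observed_minus_expected = 0.0
    variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        or_numerator += a * d / n
        or_denominator += b * c / n
        # Expected count and variance of cell a under no association.
        observed_minus_expected += a - (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    pooled_odds_ratio = or_numerator / or_denominator
    chi_square = observed_minus_expected ** 2 / variance
    return pooled_odds_ratio, chi_square

# Hypothetical single-stratum table, used here only to check the arithmetic.
example_or, example_chi2 = cochran_mantel_haenszel([(10, 20, 5, 25)])
```

Stratifying by collection keeps differences in allele frequency between the North American and Swedish samples from being mistaken for case-control differences.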
A common genetic variant at the TRAF1-C5 locus on chromosome 9 is associated with an increased risk of anti-CCP-positive rheumatoid arthritis.
Rheumatoid arthritis is a chronic inflammatory disease with a substantial genetic component. Susceptibility to disease has been linked with a region on chromosome 2q.
We tested single-nucleotide polymorphisms (SNPs) in and around 13 candidate genes within the previously linked chromosome 2q region for association with rheumatoid arthritis. We then performed fine mapping of the STAT1-STAT4 region in a total of 1620 case patients with established rheumatoid arthritis and 2635 controls, all from North America. Implicated SNPs were further tested in an independent case-control series of 1529 patients with early rheumatoid arthritis and 881 controls, all from Sweden, and in a total of 1039 case patients and 1248 controls from three series of patients with systemic lupus erythematosus.
A SNP haplotype in the third intron of STAT4 was associated with susceptibility to both rheumatoid arthritis and systemic lupus erythematosus. The minor alleles of the haplotype-defining SNPs were present in 27% of chromosomes of patients with established rheumatoid arthritis, as compared with 22% of those of controls (for the SNP rs7574865, P = 2.81×10−7; odds ratio for having the risk allele in chromosomes of patients vs. those of controls, 1.32). The association was replicated in Swedish patients with recent-onset rheumatoid arthritis (P = 0.02) and matched controls. The haplotype marked by rs7574865 was strongly associated with lupus, being present on 31% of chromosomes of case patients and 22% of those of controls (P = 1.87×10−9; odds ratio for having the risk allele in chromosomes of patients vs. those of controls, 1.55). Homozygosity of the risk allele, as compared with absence of the allele, was associated with a more than doubled risk for lupus and a 60% increased risk for rheumatoid arthritis.
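The allelic odds ratios above can be approximated from the reported chromosome frequencies alone. The sketch below does this from the rounded percentages, so its results only approximate the published values (1.32 and 1.55), which were presumably estimated from the underlying counts and, for lupus, possibly across several series. The function name is hypothetical.

```python
def allelic_odds_ratio(case_allele_frequency, control_allele_frequency):
    """Odds of carrying the risk allele on a case chromosome vs. a control chromosome."""
    case_odds = case_allele_frequency / (1.0 - case_allele_frequency)
    control_odds = control_allele_frequency / (1.0 - control_allele_frequency)
    return case_odds / control_odds

# Rounded frequencies from the text: 27% vs. 22% (RA) and 31% vs. 22% (lupus).
approx_ra_or = allelic_odds_ratio(0.27, 0.22)     # roughly 1.31
approx_lupus_or = allelic_odds_ratio(0.31, 0.22)  # roughly 1.59
```

The small gaps between these approximations and the published odds ratios show how sensitive the ratio is to rounding in the input frequencies.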
A haplotype of STAT4 is associated with increased risk for both rheumatoid arthritis and systemic lupus erythematosus, suggesting a shared pathway for these illnesses.
Lymphoblastoid cell lines (LCLs), originally collected as renewable sources of DNA, are now being used as a model system to study genotype–phenotype relationships in human cells, including searches for QTLs influencing levels of individual mRNAs and responses to drugs and radiation. In the course of attempting to map genes for drug response using 269 LCLs from the International HapMap Project, we evaluated the extent to which biological noise and non-genetic confounders contribute to trait variability in LCLs. While drug responses could be technically well measured on a given day, we observed significant day-to-day variability and substantial correlation to non-genetic confounders, such as baseline growth rates and metabolic state in culture. After correcting for these confounders, we were unable to detect any QTLs with genome-wide significance for drug response. A much higher proportion of variance in mRNA levels may be attributed to non-genetic factors (intra-individual variance—i.e., biological noise, levels of the EBV virus used to transform the cells, ATP levels) than to detectable eQTLs. Finally, in an attempt to improve power, we focused analysis on those genes that had both detectable eQTLs and correlation to drug response; we were unable to detect evidence that eQTL SNPs are convincingly associated with drug response in the model. While LCLs are a promising model for pharmacogenetic experiments, biological noise and in vitro artifacts may reduce power and have the potential to create spurious association due to confounding.
The use of lymphoblastoid cell lines (LCLs) has evolved from a renewable source of DNA to an in vitro model system to study the genetics of gene expression, drug response, and other traits in a controlled laboratory setting. While convincing relationships between SNPs and mRNA levels (eQTLs) have been described, the degree to which non-genetic variables also influence phenotypes in LCLs is less well characterized. In the course of attempting to map genes for drug responses in vitro, we evaluated the reproducibility of in vitro traits across replicates, the impact of the EBV virus used to transform B cells into cell lines, and the effect of in vitro culture conditions. We found that responses to at least some drugs and levels of many mRNAs can be technically well measured, but vary both across experiments and with non-genetic confounders such as growth rates, EBV levels, and ATP levels. The influence of such non-genetic factors can both decrease power to detect true relationships between DNA variation and traits and create the potential for non-genetic confounding and spurious associations between DNA variants and traits.