Infectious and inflammatory diseases have repeatedly shown strong genetic associations within the major histocompatibility complex (MHC); however, the basis for these associations remains elusive. To define host genetic effects on the outcome of a chronic viral infection, we performed genome-wide association analysis in a multiethnic cohort of HIV-1 controllers and progressors, and we analyzed the effects of individual amino acids within the classical human leukocyte antigen (HLA) proteins. We identified >300 genome-wide significant single-nucleotide polymorphisms (SNPs) within the MHC and none elsewhere. Specific amino acids in the HLA-B peptide binding groove, as well as an independent HLA-C effect, explain the SNP associations and reconcile both protective and risk HLA alleles. These results implicate the nature of the HLA–viral peptide interaction as the major factor modulating durable control of HIV infection.
Understanding the implications of genome-wide association studies (GWAS) for disease biology requires both identification of causal variants and definition of how these variants alter gene function. The non-coding triallelic dinucleotide polymorphism CCR6DNP is associated with risk for rheumatoid arthritis, and is considered likely causal because allelic variation correlates with expression of the chemokine receptor CCR6. Using transcription activator-like effector nuclease (TALEN) gene editing, we confirmed that CCR6DNP regulates CCR6. To identify the associated transcription factor, we applied a novel assay, Flanking Restriction Enhanced Pulldown (FREP), to identify specific association of poly (ADP-ribose) polymerase 1 (PARP-1) with CCR6DNP consistent with the established allelic risk hierarchy. Correspondingly, manipulation of PARP-1 expression or activity impaired CCR6 expression in several lineages. These findings show that CCR6DNP is a causal variant through which PARP-1 regulates CCR6, and introduce a highly efficient approach to interrogate non-coding genetic polymorphisms associated with human disease.
Genome-wide association studies (GWAS) identify loci associated with human disease risk, but bridging the gap between locus and mechanism has proven particularly difficult in cases where associated variants do not alter coding. We aimed to develop a generalizable approach to this problem. Previously, a dual nucleotide polymorphism within the first intron of CCR6 (termed the CCR6DNP) had been associated with risk for rheumatoid arthritis, but the pathway by which this variant altered gene expression could not be determined. Here, we employed sequence perturbation to confirm a regulatory role for the CCR6DNP. Next, using a new technique termed Flanking Restriction Enhanced Pulldown (FREP), we identified PARP-1 as the protein that regulates CCR6 expression through allelic association with the CCR6DNP, a finding confirmed by chromatin immunoprecipitation and functional assays. These findings reveal an unexpected regulatory pathway for CCR6 implicated in rheumatoid arthritis and other disease by human genetics, and more generally introduce a novel approach to identifying regulatory protein-DNA interactions.
Background: A long-standing epidemiological puzzle is the reduced rate of rheumatoid arthritis (RA) in those with schizophrenia (SZ) and vice versa. Traditional epidemiological approaches to determine if this negative association is underpinned by genetic factors would test for reduced rates of one disorder in relatives of the other, but sufficiently powered data sets are difficult to achieve. The genomics era presents an alternative paradigm for investigating the genetic relationship between two uncommon disorders.
Methods: We use genome-wide common single nucleotide polymorphism (SNP) data from independently collected SZ and RA case-control cohorts to estimate the SNP correlation between the disorders. We test a genotype X environment (GxE) hypothesis for SZ with environment defined as winter- vs summer-born.
Results: We estimate a small but significant negative SNP-genetic correlation between SZ and RA (−0.046, s.e. 0.026, P = 0.036). The negative correlation was stronger for the SNP set attributed to coding or regulatory regions (−0.174, s.e. 0.071, P = 0.0075). Our analyses led us to hypothesize a gene-environment interaction for SZ in the form of immune challenge. We used month of birth as a proxy for environmental immune challenge and estimated the genetic correlation between winter-born and non-winter born SZ to be significantly less than 1 for coding/regulatory region SNPs (0.56, s.e. 0.14, P = 0.00090).
Conclusions: Our results are consistent with epidemiological observations of a negative relationship between SZ and RA reflecting, at least in part, genetic factors. Results of the month of birth analysis are consistent with pleiotropic effects of genetic variants dependent on environmental context.
Schizophrenia; rheumatoid arthritis; genetic relationship; pleiotropy
Genome-wide association studies in human autoimmune disorders have provided a long list of alleles with rather modest degrees of risk. A large fraction of these associations are likely due to either quantitative differences in gene expression or amino acid changes that regulate quantitative aspects of the immune response. While functional studies are still lacking for most of these associations, we present examples of autoimmune disease risk alleles that influence quantitative changes in lymphocyte activation, cytokine signaling and dendritic cell function. The analysis of immune quantitative traits associated with autoimmune loci is clearly going to be an important component of understanding the pathogenesis of autoimmunity. This will require both new and more efficient ways of characterizing the normal immune system, as well as large population resources with which genotype-phenotype correlations can be convincingly demonstrated. Future development of new therapies will depend on understanding the mechanistic underpinnings of immune regulation by these new risk loci.
A highly polygenic etiology and high degree of allele-sharing between ancestries have been well-elucidated in genetic studies of rheumatoid arthritis. Recently, the high-density genotyping array Immunochip for immune disease loci identified 14 new rheumatoid arthritis risk loci among individuals of European ancestry. Here, we aimed to identify new rheumatoid arthritis risk loci using Korean-specific Immunochip data.
We analyzed Korean rheumatoid arthritis case-control samples using the Immunochip and GWAS array to search for new risk alleles of rheumatoid arthritis with anti-citrullinated peptide antibodies. To increase power, we performed a meta-analysis of Korean data with previously published European Immunochip and GWAS data, for a total sample size of 9,299 Korean and 45,790 European case-control samples.
We identified 8 new rheumatoid arthritis susceptibility loci (TNFSF4, LBH, EOMES, ETS1–FLI1, COG6, RAD51B, UBASH3A and SYNGR1) that passed a genome-wide significance threshold (p<5×10−8), with evidence for three independent risk alleles at 1q25/TNFSF4. The risk alleles from the 7 new loci except for the TNFSF4 locus (monomorphic in Koreans), together with risk alleles from previously established RA risk loci, exhibited a high correlation of effect sizes between ancestries. Further, we refined the number of SNPs that represent potentially causal variants through a trans-ethnic comparison of densely genotyped SNPs.
This study demonstrates the advantage of dense-mapping and trans-ancestral analysis for identification of potentially causal SNPs. In addition, our findings support the importance of T cells in pathogenesis and the frequent overlap of risk loci among diverse autoimmune diseases.
Rheumatoid arthritis; Gene polymorphism; Anti-CCP
To evaluate African American rheumatoid arthritis HLA-DRB1 genetic risk by three validated allele classification systems, and by amino acid position and residue. To compare the genetic risk between African American and European ancestries.
Four-digit HLA-DRB1 genotyping was performed on 561 autoantibody-positive African American cases and 776 African American controls. Association analysis was performed on Tezenas du Montcel (TdM); de Vries (DV); and Mattey classification system alleles and separately by amino acid position and individual residues.
TdM S2 and S3P alleles were associated with RA (odds ratios (95% CI) 2.8 (2.0, 3.9) and 2.1 (1.7, 2.7), respectively). The DV (P-value = 3.2×10−12) and Mattey (P-value = 6.5×10−13) system alleles were both protective in African Americans. Amino acid position 11 (permutation P-value < 0.00001) accounted for nearly all variability explained by HLA-DRB1, although conditional analysis demonstrated that position 57 was also significant (0.01 ≤ permutation P-value ≤ 0.05). The valine and aspartic acid residues at position 11 conferred the highest risk for RA in African Americans.
With some exceptions, the genetic risk conferred by HLA-DRB1 in African Americans is similar to European ancestry at multiple levels: classification system (e.g., TdM), amino acid position (e.g. 11) and residue (Val 11). Unlike that reported from European ancestry, amino acid position 57 was associated with RA in African Americans, but positions 71 and 74 were not. Asp11 (OR = 1 in European ancestry) corresponds to the four digit classical allele, *09:01, also a risk allele for RA in Koreans.
A large portion of common variant loci associated with genetic risk for schizophrenia reside within non-coding sequence of unknown function. Here, we demonstrate promoter and enhancer enrichment in schizophrenia variants associated with expression quantitative trait loci (eQTL). The enrichment is greater when functional annotations derived from human brain are used relative to peripheral tissues. Regulatory trait concordance analysis ranked genes within schizophrenia genome-wide significant loci for a potential functional role, based on co-localization of a risk SNP, eQTL and regulatory element sequence. We identified potential physical interactions of non-contiguous proximal and distal regulatory elements. This was verified in prefrontal cortex and induced pluripotent stem cell-derived neurons for the L-type calcium channel (CACNA1C) risk locus. Our findings point to a functional link between schizophrenia-associated non-coding SNPs and 3-dimensional genome architecture associated with chromosomal loopings and transcriptional regulation in the brain.
A major challenge in human genetics is to devise a systematic strategy to integrate disease-associated variants with diverse genomic and biological datasets to provide insight into disease pathogenesis and guide drug discovery for complex traits such as rheumatoid arthritis (RA) [1]. Here, we performed a genome-wide association study (GWAS) meta-analysis in a total of >100,000 subjects of European and Asian ancestries (29,880 RA cases and 73,758 controls), by evaluating ~10 million single nucleotide polymorphisms (SNPs). We discovered 42 novel RA risk loci at a genome-wide level of significance, bringing the total to 101 [2–4]. We devised an in-silico pipeline using established bioinformatics methods based on functional annotation [5], cis-acting expression quantitative trait loci (cis-eQTL) [6], and pathway analyses [7–9] – as well as novel methods based on genetic overlap with human primary immunodeficiency (PID), hematological cancer somatic mutations and knock-out mouse phenotypes – to identify 98 biological candidate genes at these 101 risk loci. We demonstrate that these genes are the targets of approved therapies for RA, and further suggest that drugs approved for other indications may be repurposed for the treatment of RA. Together, this comprehensive genetic study sheds light on fundamental genes, pathways and cell types that contribute to RA pathogenesis, and provides empirical evidence that the genetics of RA can provide important information for drug discovery.
Every person carries a vast repertoire of CD4+ T-helper cells and CD8+ cytotoxic T cells for a healthy immune system. Somatic VDJ recombination at genomic loci that encode the T-cell receptor (TCR) is a key step during T-cell development, but how a single T cell commits to become either CD4+ or CD8+ is poorly understood. To evaluate the influence of TCR sequence variation on CD4+/CD8+ lineage commitment, we sequenced rearranged TCRs for both α and β chains in naïve T cells isolated from healthy donors and investigated gene segment usage and recombination patterns in CD4+ and CD8+ T-cell subsets. Our data demonstrate that most V and J gene segments are strongly biased in the naïve CD4+ and CD8+ subsets with some segments increasing the odds of being CD4+ (or CD8+) up to five-fold. These V and J gene associations are highly reproducible across individuals and independent of classical HLA genotype, explaining ~11% of the observed variance in the CD4+ vs. CD8+ propensity. In addition, we identified a strong independent association of the electrostatic charge of the complementarity determining region 3 (CDR3) in both α and β chains, where a positively charged CDR3 is associated with CD4+ lineage and a negatively charged CDR3 with CD8+ lineage. Our findings suggest that somatic variation in different parts of the TCR influences T-cell lineage commitment in a predominantly additive fashion. This notion can help delineate how certain structural features of the TCR-peptide-HLA complex influence thymic selection.
Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; (2) to study the impact of adding narrative data extracted using natural language processing (NLP) in the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study.
Methods and Results
We studied 3 established EMR-based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g. ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining a PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) with CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors.
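The performance measures reported above (sensitivity, specificity and PPV) are all ratios of confusion-matrix counts obtained by comparing the algorithm's classification with chart review. The following is a minimal sketch of those definitions only; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and PPV from confusion-matrix counts.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives against a gold standard.
    """
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true cases detected
        "specificity": tn / (tn + fp),  # fraction of non-cases correctly rejected
        "ppv": tp / (tp + fp),          # fraction of flagged subjects who are true cases
    }


# Hypothetical chart-review counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, tn=290, fn=30)
print(metrics)
```

Because PPV depends on prevalence, an algorithm tuned to a fixed PPV may show different sensitivity in cohorts where the outcome is rare, which is consistent with the gain from NLP being largest in the low-prevalence cohorts.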
We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP into the CAD algorithm improved the sensitivity of the algorithm, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM.
Despite the success of genome-wide association studies (GWAS) in detecting a large number of loci for complex phenotypes such as rheumatoid arthritis (RA) susceptibility, the lack of information on the causal genes leaves important challenges to interpret GWAS results in the context of the disease biology. Here, we genetically fine-map the RA risk locus at 19p13 to define causal variants, and explore the pleiotropic effects of these same variants in other complex traits. First, we combined Immunochip dense genotyping (n = 23,092 case/control samples), Exomechip genotyping (n = 18,409 case/control samples) and targeted exon-sequencing (n = 2,236 case/control samples) to demonstrate that three protein-coding variants in TYK2 (tyrosine kinase 2) independently protect against RA: P1104A (rs34536443, OR = 0.66, P = 2.3×10−21), A928V (rs35018800, OR = 0.53, P = 1.2×10−9), and I684S (rs12720356, OR = 0.86, P = 4.6×10−7). Second, we show that the same three TYK2 variants protect against systemic lupus erythematosus (SLE, omnibus P = 6×10−18), and provide suggestive evidence that two of the TYK2 variants (P1104A and A928V) may also protect against inflammatory bowel disease (IBD; omnibus P = 0.005). Finally, in a phenome-wide association study (PheWAS) assessing >500 phenotypes using electronic medical records (EMR) in >29,000 subjects, we found no convincing evidence for association of P1104A and A928V with complex phenotypes other than autoimmune diseases such as RA, SLE and IBD. Together, our results demonstrate the role of TYK2 in the pathogenesis of RA, SLE and IBD, and provide supporting evidence for TYK2 as a promising drug target for the treatment of autoimmune diseases.
Background. The aim of our work was to replicate, in a Southern European population, the association reported in Northern populations between the PTPRC locus and response to anti-tumor necrosis factor (anti-TNF) treatment in rheumatoid arthritis (RA). We also looked at associations between five RA risk alleles and treatment response. Methods. We evaluated associations between anti-TNF treatment responses assessed by DAS28 change and by EULAR response at six months in 383 Portuguese patients. Univariate and multivariate linear and logistic regression analyses were performed. In a second step to confirm our findings, we pooled our population with 265 Spanish patients. Results. No association was found between the PTPRC rs10919563 allele and anti-TNF treatment response, either in the Portuguese patients after modeling for several clinical variables or in the overall population combining Portuguese and Spanish patients. The minor allele for RA susceptibility, rs3761847 SNP in the TRAF1/C5 region, was associated with a poor response in linear and logistic univariate and multivariate regression analyses. No association was observed with the other allelic variants. Results were confirmed in the pooled analysis. Conclusion. This study did not replicate the association between PTPRC and the response to anti-TNF treatment in our Southern European population. We found that TRAF1/C5 risk RA variants potentially influence anti-TNF treatment response.
Differences in lipid levels associated with cardiovascular (CV) risk between rheumatoid arthritis (RA) and the general population remain unclear. Determining these differences is important in understanding the role of lipids in CV risk in RA.
We studied 2,005 RA subjects from two large academic medical centers. We extracted electronic medical record (EMR) data on the first low density lipoprotein (LDL), total cholesterol (TChol) and high density lipoprotein (HDL) within 1 year of the LDL. Subjects with an electronic statin prescription prior to the first LDL were excluded.
We compared lipid levels in RA to levels from the general United States population (Carroll, et al., JAMA 2012), using the t-test and stratifying by published parameters, i.e. 2007–2010, women. We determined lipid trends using separate linear regression models for TChol, LDL and HDL, testing the association between year of measurement (1989–2010) and lipid level, adjusted by age and gender. Lipid trends were qualitatively compared to those reported in Carroll, et al.
Women with RA had a significantly lower TChol (186 vs 200mg/dL, p=0.002) and LDL (105 vs 118mg/dL, p=0.001) compared to the general population (2007–2010). HDL was not significantly different in the two groups. In the RA cohort, TChol and LDL significantly decreased each year, while HDL increased (all with p<0.0001), consistent with overall trends observed in Carroll, et al.
RA patients appear to have an overall lower TChol and LDL than the general population, despite the overall increased risk of CVD in RA reported in observational studies.
To identify novel genetic risk factors for rheumatoid arthritis (RA), we conducted a genome-wide association study (GWAS) meta-analysis of 5,539 autoantibody positive RA cases and 20,169 controls of European descent, followed by replication in an independent set of 6,768 RA cases and 8,806 controls. Of 34 SNPs selected for replication, 7 novel RA risk alleles were identified at genome-wide significance (P<5×10−8) in analysis of all 41,282 samples. The associated SNPs are near genes of known immune function, including IL6ST, SPRED2, RBPJ, CCR6, IRF5, and PXK. We also refined the risk alleles at two established RA risk loci (IL2RA and CCL21) and confirmed the association at AFF3. These new associations bring the total number of confirmed RA risk loci to 31 among individuals of European ancestry. An additional 11 SNPs replicated at P<0.05, many of which are validated autoimmune risk alleles, suggesting that most represent bona fide RA risk alleles.
Anti–tumor necrosis factor α (anti-TNF) therapy is a mainstay of treatment in rheumatoid arthritis (RA). The aim of the present study was to test established RA genetic risk factors to determine whether the same alleles also influence the response to anti-TNF therapy.
A total of 1,283 RA patients receiving etanercept, infliximab, or adalimumab therapy were studied from among an international collaborative consortium of 9 different RA cohorts. The primary end point compared RA patients with a good treatment response according to the European League Against Rheumatism (EULAR) response criteria (n = 505) with RA patients considered to be nonresponders (n = 316). The secondary end point was the change from baseline in the level of disease activity according to the Disease Activity Score in 28 joints (ΔDAS28). Clinical factors such as age, sex, and concomitant medications were tested as possible correlates of treatment response. Thirty-one single-nucleotide polymorphisms (SNPs) associated with the risk of RA were genotyped and tested for any association with treatment response, using univariate and multivariate logistic regression models.
Of the 31 RA-associated risk alleles, a SNP at the PTPRC (also known as CD45) gene locus (rs10919563) was associated with the primary end point, a EULAR good response versus no response (odds ratio [OR] 0.55, P = 0.0001 in the multivariate model). Similar results were obtained using the secondary end point, the ΔDAS28 (P = 0.0002). There was suggestive evidence of a stronger association in autoantibody-positive patients with RA (OR 0.55, 95% confidence interval [95% CI] 0.39–0.76) as compared with autoantibody-negative patients (OR 0.90, 95% CI 0.41–1.99).
Statistically significant associations were observed between the response to anti-TNF therapy and an RA risk allele at the PTPRC gene locus. Additional studies will be required to replicate this finding in additional patient collections.
Treatment strategies blocking tumor necrosis factor (anti-TNF) have proven very successful in patients with rheumatoid arthritis (RA). However, a significant subset of patients does not respond for unknown reasons. Currently there are no means of identifying these patients prior to treatment. This study was aimed at identifying genetic factors predicting anti-TNF treatment outcome in patients with RA using a genome-wide association approach.
We conducted a multi-stage, genome-wide association study with a primary analysis of 2,557,253 single nucleotide polymorphisms (SNPs) in 882 RA patients receiving anti-TNF therapy included through the Dutch Rheumatoid Arthritis Monitoring (DREAM) registry and the database of Apotheekzorg. Linear regression analysis of changes in the Disease Activity Score in 28 joints after 14 weeks of treatment was performed using an additive model. Markers with a p<10−3 were selected for replication in 1,821 RA patients from three independent cohorts. Pathway analysis including all SNPs with a p-value < 10−3 was performed using Ingenuity.
Seven hundred seventy two markers demonstrated evidence of association with treatment outcome in the initial stage. Eight genetic loci showed improved p-value in the overall meta-analysis compared to the first stage, three of which (rs1568885, rs1813443 and rs4411591) showed directional consistency over all four studied cohorts. We were unable to replicate markers previously reported to be associated with anti-TNF outcome. Network analysis indicated strong involvement of biological processes underlying inflammatory response and cell morphology.
Using a multi-stage strategy, we have identified 8 genetic loci associated with response to anti-TNF treatment. Further studies are required to validate these findings in additional patient collections.
anti-TNF; gene polymorphism; pharmacogenetics; rheumatoid arthritis; genome-wide association study
To study genetic factors that influence quantitative anti-cyclic citrullinated peptide (anti-CCP) antibody levels in RA patients.
We carried out a genome wide association study (GWAS) meta-analysis using 1,975 anti-CCP+ RA patients from 3 large cohorts, the Brigham Rheumatoid Arthritis Sequential Study (BRASS), North American Rheumatoid Arthritis Consortium (NARAC), and the Epidemiological Investigation of RA (EIRA). We also carried out a genome-wide complex trait analysis (GCTA) to estimate the heritability of anti-CCP levels.
GWAS meta-analysis showed that anti-CCP levels were most strongly associated with the human leukocyte antigen (HLA) region with a p-value of 2×10−11 for rs1980493. There were 112 SNPs in this region that exceeded the genome-wide significance threshold of 5×10−8, and all were in linkage disequilibrium (LD) with the HLA-DRB1*03 allele with LD r² in the range of 0.25–0.88. Suggestive novel associations outside of the HLA region were also observed for rs8063248 (near the GP2 gene) with a p-value of 3×10−7. None of the known RA risk alleles (~52 loci) were associated with anti-CCP level. Heritability analysis estimated that 44% of anti-CCP variation was attributable to genetic factors captured by GWAS variants.
Anti-CCP level is a heritable trait. HLA-DR3 and GP2 are associated with lower anti-CCP levels.
RA; GWAS; anti-CCP; heritability
Vitamin D may have an immunological role in Crohn’s disease (CD) and ulcerative colitis (UC). Retrospective studies suggested a weak association between vitamin D status and disease activity but have significant limitations.
Using a multi-institution inflammatory bowel disease (IBD) cohort, we identified all CD and UC patients who had at least one measured plasma 25-hydroxy vitamin D [25(OH)D]. Plasma 25(OH)D was considered sufficient at levels ≥ 30ng/mL. Logistic regression models adjusting for potential confounders were used to identify impact of measured plasma 25(OH)D on subsequent risk of IBD-related surgery or hospitalization. In a subset of patients where multiple measures of 25(OH)D were available, we examined impact of normalization of vitamin D status on study outcomes.
Our study included 3,217 patients (55% CD, mean age 49 yrs). The median lowest plasma 25(OH)D was 26ng/ml (IQR 17–35ng/ml). In CD, on multivariable analysis, plasma 25(OH)D < 20ng/ml was associated with an increased risk of surgery (OR 1.76, 95% CI 1.24 – 2.51) and IBD-related hospitalization (OR 2.07, 95% CI 1.59 – 2.68) compared to those with 25(OH)D ≥ 30ng/ml. Similar estimates were also seen for UC. Furthermore, CD patients who had initial levels < 30ng/ml but subsequently normalized their 25(OH)D had a reduced likelihood of surgery (OR 0.56, 95% CI 0.32 – 0.98) compared to those who remained deficient.
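For readers less familiar with how an odds ratio and its 95% confidence interval relate, the sketch below computes an unadjusted OR from a 2×2 table with a Wald (Woolf) interval on the log scale. This is a simpler technique than the multivariable logistic regression used in the study, which adjusts for confounders; the function name and the counts are hypothetical and do not come from the cohort.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a: exposed with event      b: exposed without event
    c: unexposed with event    d: unexposed without event
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: e.g. low vs sufficient 25(OH)D, surgery vs no surgery.
odds_ratio, lower, upper = odds_ratio_ci(a=40, b=60, c=20, d=80)
print(f"OR {odds_ratio:.2f}, 95% CI {lower:.2f} - {upper:.2f}")
```

An interval that excludes 1, as in the reported estimates, corresponds to a nominally significant association at the 5% level.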
Low plasma 25(OH)D is associated with increased risk of surgery and hospitalizations in both CD and UC and normalization of 25(OH)D status is associated with a reduction in the risk of CD-related surgery.
Crohn’s disease; ulcerative colitis; vitamin D; surgery; hospitalization
Prior studies identifying patients with inflammatory bowel disease (IBD) utilizing administrative codes have yielded inconsistent results. Our objective was to develop a robust electronic medical record (EMR) based model for classification of IBD leveraging the combination of codified data and information from clinical text notes using natural language processing (NLP).
Using the EMR of 2 large academic centers, we created data marts for Crohn’s disease (CD) and ulcerative colitis (UC) comprising patients with ≥ 1 ICD-9 code for each disease. We utilized codified (i.e. ICD9 codes, electronic prescriptions) and narrative data from clinical notes to develop our classification model. Model development and validation were performed in a training set of 600 randomly selected patients for each disease with medical record review as the gold standard. Logistic regression with the adaptive LASSO penalty was used to select informative variables.
We confirmed 399 (67%) CD cases in the CD training set and 378 (63%) UC cases in the UC training set. For both, a combined model including narrative and codified data had better accuracy (area under the curve (AUC) for CD 0.95; UC 0.94) than models utilizing only disease ICD-9 codes (AUC 0.89 for CD; 0.86 for UC). Addition of NLP narrative terms to our final model resulted in classification of 6–12% more subjects with the same accuracy.
Inclusion of narrative concepts identified using NLP improves the accuracy of EMR case-definition for CD and UC while simultaneously identifying more subjects compared to models using codified data alone.
Crohn’s disease; ulcerative colitis; disease cohort; natural language processing; informatics
While accurate measures of heritability are needed to understand the pharmacogenetic basis of drug treatment response, these are generally not available, since it is unfeasible to give medications to individuals for which treatment is not indicated. Using a polygenic linear mixed modeling approach, we estimated lower-bounds on asthma heritability and the heritability of two related drug-response phenotypes, bronchodilator response and airway hyperreactivity, using genome-wide SNP data from existing asthma cohorts. Our estimate of the heritability for bronchodilator response is 28.5% (se 16%, p = 0.043) and airway hyperresponsiveness is 51.1% (se 34%, p = 0.064), while we estimate asthma genetic liability at 61.5% (se 16%, p < 0.001). Our results agree with previously published estimates of the heritability of these traits, suggesting that the LMM method is useful for computing the heritability of other pharmacogenetic traits. Furthermore, our results indicate that multiple SNP main-effects, including SNPs as yet unidentified by GWAS methods, together explain a sizable portion of the heritability of these traits.
Asthma; Pharmacogenetics; Heritability; Bronchodilator Response; Airway Hyperresponsiveness
While genetic determinants of LDL cholesterol levels are well characterized in the general population, they are understudied in rheumatoid arthritis (RA). Our objective was to determine the association of established LDL and RA genetic alleles with LDL levels in RA cases compared to non-RA controls.
Using electronic medical records (EMR) data, we linked validated RA cases and non-RA controls to discarded blood samples. For each individual, we extracted data on: 1st LDL measurement, age, gender, and year of LDL measurement. We genotyped subjects for 11 LDL and 44 non-HLA RA alleles, and calculated RA and LDL genetic risk scores (GRS). We tested the association between each GRS and LDL level using multivariate linear regression models adjusted by age, gender, year of LDL measurement, and RA status.
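A genetic risk score of the kind described above is commonly computed as a weighted sum of risk-allele counts across the genotyped SNPs. The sketch below illustrates that general idea only: the abstract does not state the weighting scheme, so the log odds ratio weights, the SNP identifiers and the function name are all assumptions made for illustration.

```python
import math


def genetic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts.

    dosages: SNP id -> number of risk alleles carried (0, 1 or 2)
    weights: SNP id -> per-allele weight (here, an assumed log odds ratio)
    """
    if dosages.keys() != weights.keys():
        raise ValueError("dosages and weights must cover the same SNPs")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical SNPs and odds ratios, not taken from the study.
weights = {"snp_a": math.log(1.2), "snp_b": math.log(1.5)}
dosages = {"snp_a": 2, "snp_b": 1}

score = genetic_risk_score(dosages, weights)
print(round(score, 4))
```

With every weight set to 1, the same function reduces to an unweighted count of risk alleles, which is another common convention.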
Among 567 RA cases and 979 controls, 80% were female and the mean age at 1st LDL measurement was 55 years. RA cases had significantly lower mean LDL levels than controls (117.2 vs. 125.6mg/dL, respectively, p<0.0001). Each unit increase in LDL GRS was associated with 0.8mg/dL higher LDL levels in both RA cases and controls (p=3.0×10−7). Each unit increase in RA GRS was associated with 4.3mg/dL lower LDL levels in both groups (p=0.01).
LDL alleles were associated with higher LDL levels in RA. RA alleles were associated with lower LDL levels in both RA cases and controls. Since RA cases carry more RA alleles, these findings suggest a genetic basis for epidemiologic observations of lower LDL levels in RA.
Rheumatoid arthritis; low density lipoprotein; genetics; human leukocyte antigen
Psychiatric co-morbidity is common in Crohn’s disease (CD) and ulcerative colitis (UC). IBD-related surgery or hospitalizations represent major events in the natural history of disease. Whether there is a difference in risk of psychiatric co-morbidity following surgery in CD and UC has not been examined previously.
We used a multi-institution cohort of IBD patients without a diagnosis code for anxiety or depression preceding their IBD-related surgery or hospitalization. Demographic, disease-related, and treatment-related variables were retrieved. Multivariate logistic regression analysis was performed to identify risk factors for depression and for anxiety separately.
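As a simplified illustration of the odds ratios such models report, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2×2 table. This is what a logistic regression with a single binary predictor would yield; the study's models adjust for further covariates, which this sketch does not. The function name `odds_ratio_wald_ci` and the counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(exposed_cases: int, exposed_noncases: int,
                       unexposed_cases: int, unexposed_noncases: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    The interval is the Wald interval on the log-odds scale; z = 1.96
    corresponds to an approximate 95% confidence level.
    """
    cells = (exposed_cases, exposed_noncases,
             unexposed_cases, unexposed_noncases)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases)
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Made-up counts: 20 of 100 exposed and 10 of 100 unexposed develop the outcome.
toy_or, toy_lower, toy_upper = odds_ratio_wald_ci(20, 80, 10, 90)
```

An interval that spans 1, as in this toy table, is read as no detectable difference in odds between the two groups.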
Our study included a total of 707 CD and 530 UC patients who underwent bowel resection surgery and did not have depression prior to surgery. The risk of depression 5 years after surgery was 16% and 11% in CD and UC, respectively. We found no difference in the risk of depression following surgery in CD and UC patients (adjusted OR 1.11, 95% CI 0.84 – 1.47). Female gender, co-morbidity, immunosuppressant use, perianal disease, stoma surgery, and early surgery within 3 years of care predicted depression after CD surgery; only female gender and co-morbidity predicted depression in UC. Only 12% of the CD cohort had ≥ 4 risk factors for depression, but among them nearly 44% subsequently received a diagnosis code for depression.
IBD-related surgery or hospitalization is associated with a significant risk for depression and anxiety with a similar magnitude of risk in both diseases.
Crohn’s disease; depression; anxiety; surgery; hospitalization
The significance of non-RA autoantibodies in patients with rheumatoid arthritis (RA) is unclear. We studied associations between autoimmune risk alleles and autoantibodies in RA cases and non-RA controls, and between autoantibodies and clinical diagnoses from the electronic medical records (EMR).
We studied 1,290 RA cases and 1,236 non-RA controls of European genetic ancestry identified from the EMR of two large academic centers. We measured antibodies to citrullinated peptides (ACPA), anti-nuclear antibodies (ANA), antibodies to tissue transglutaminase (anti-tTG), and antibodies to thyroid peroxidase (anti-TPO). We genotyped subjects for autoimmune risk alleles, and studied the association between the number of autoimmune risk alleles and the number of types of autoantibodies present. We conducted a phenome-wide association study (PheWAS) to study potential associations between autoantibodies and clinical diagnoses among RA cases and controls.
Mean age was 60.7 years in RA cases and 64.6 years in controls, and both groups were 79% female. The prevalence of ACPA and ANA was higher in RA cases compared to controls (p<0.0001, both); we observed no difference in anti-TPO and anti-tTG. Carriage of higher numbers of autoimmune risk alleles was associated with increasing types of autoantibodies in RA cases (p=4.4×10^-6) and controls (p=0.002). From the PheWAS, ANA was significantly associated with Sjögren’s/sicca in RA cases.
The increased frequency of autoantibodies in RA cases and controls was associated with the number of autoimmune risk alleles carried by an individual. PheWAS analyses within the EMR linked to blood samples provide a novel method to test for the clinical significance of biomarkers in disease.