Psychiatric co-morbidity, in particular major depression and anxiety, is common in patients with Crohn’s disease (CD) and ulcerative colitis (UC). Prior studies examining this may be confounded by the co-existence of functional bowel symptoms. Limited data exist on the association between depression or anxiety and disease-specific endpoints such as bowel surgery.
Using a multi-institution cohort of patients with CD and UC, we identified those who also had co-existing psychiatric co-morbidity (major depressive disorder or generalized anxiety). After excluding those diagnosed with such co-morbidity for the first time following surgery, we used multivariate logistic regression to examine the independent effect of psychiatric co-morbidity on IBD-related surgery and hospitalization. To account for confounding by disease severity, we adjusted for a propensity score estimating likelihood of psychiatric co-morbidity influenced by severity of disease in our models.
A total of 5,405 CD and 5,429 UC patients were included in this study; one-fifth had either major depressive disorder or generalized anxiety. In multivariate analysis, adjusting for potential confounders and the propensity score, presence of mood or anxiety co-morbidity was associated with a 28% increase in risk of surgery in CD (OR 1.28, 95% CI 1.03 – 1.57) but not UC (OR 1.01, 95% CI 0.80 – 1.28). Psychiatric co-morbidity was associated with increased healthcare utilization.
Depressive disorder or generalized anxiety is associated with a modestly increased risk of surgery in patients with CD. Interventions addressing this may improve patient outcomes.
Crohn’s disease; ulcerative colitis; depression; surgery; hospitalization
Recent work has shown that much of the missing heritability of complex traits can be resolved by estimates of heritability explained by all genotyped SNPs. However, it is currently unknown how much heritability is missing due to poor tagging or additional causal variants at known GWAS loci. Here, we use variance components to quantify the heritability explained by all SNPs at known GWAS loci in nine diseases from WTCCC1 and WTCCC2. After accounting for expectation, we observed all SNPs at known GWAS loci to explain more heritability than GWAS-associated SNPs on average (). For some diseases, this increase was individually significant: for Multiple Sclerosis (MS) () and for Crohn's Disease (CD) (); all analyses of autoimmune diseases excluded the well-studied MHC region. Additionally, we found that GWAS loci from other related traits also explained significant heritability. The union of all autoimmune disease loci explained more MS heritability than known MS SNPs () and more CD heritability than known CD SNPs (), with an analogous increase for all autoimmune diseases analyzed. We also observed significant increases in an analysis of Rheumatoid Arthritis (RA) samples typed on ImmunoChip, with more heritability from all SNPs at GWAS loci () and more heritability from all autoimmune disease loci () compared to known RA SNPs (including those identified in this cohort). Our methods adjust for LD between SNPs, which can bias standard estimates of heritability from SNPs even if all causal variants are typed. By comparing adjusted estimates, we hypothesize that the genome-wide distribution of causal variants is enriched for low-frequency alleles, but that causal variants at known GWAS loci are skewed towards common alleles. These findings have important ramifications for fine-mapping study design and our understanding of complex disease architecture.
Heritable diseases have an unknown underlying “genetic architecture” that defines the distribution of effect-sizes for disease-causing mutations. Understanding this genetic architecture is an important first step in designing disease-mapping studies, and many theories have been developed on the nature of this distribution. Here, we evaluate the hypothesis that additional heritable variation lies at previously known associated loci but is not fully explained by the single most associated marker. We develop methods based on variance-components analysis to quantify this type of “local” heritability, demonstrate that estimates from standard strategies can be falsely inflated or deflated due to correlation between neighboring markers, and propose a robust adjustment. In an analysis of nine common diseases we find a significant average increase of local heritability, consistent with multiple common causal variants at an average locus. Intriguingly, for autoimmune diseases we also observe significant local heritability in loci not associated with the specific disease but with other autoimmune diseases, implying a highly correlated underlying disease architecture. These findings have important implications for the design of future studies and our general understanding of common disease.
Autoimmune disease results from a loss of tolerance to self-antigens in genetically susceptible individuals. Completely understanding this process requires that targeted antigens be identified, and so a number of techniques have been developed to determine immune receptor specificities. We previously reported the construction of a phage-displayed synthetic human peptidome and a proof-of-principle analysis of antibodies from three patients with neurological autoimmunity. Here we present data from a large-scale screen of 298 independent antibody repertoires, including those from 73 healthy sera, using phage immunoprecipitation sequencing. The resulting database of peptide-antibody interactions characterizes each individual’s unique autoantibody fingerprint, and includes specificities found to occur frequently in the general population as well as those associated with disease. Screening type 1 diabetes (T1D) patients revealed a prematurely polyautoreactive phenotype compared with their matched controls. A collection of cerebrospinal fluids and sera from 63 multiple sclerosis patients uncovered novel, as well as previously reported antibody-peptide interactions. Finally, a screen of synovial fluids and sera from 64 rheumatoid arthritis patients revealed novel disease-associated antibody specificities that were independent of seropositivity status. This work demonstrates the utility of performing PhIP-Seq screens on large numbers of individuals and is another step toward defining the full complement of autoimmunoreactivities in health and disease.
autoantigen discovery; high throughput screening; PhIP-Seq; proteomics
Psychiatric co-morbidity is common in Crohn’s disease (CD) and ulcerative colitis (UC). IBD-related surgery or hospitalizations represent major events in the natural history of disease. Whether there is a difference in risk of psychiatric co-morbidity following surgery in CD and UC has not been examined previously.
We used a multi-institution cohort of IBD patients without a diagnosis code for anxiety or depression preceding their IBD-related surgery or hospitalization. Demographic, disease, and treatment related variables were retrieved. Multivariate logistic regression analysis was performed to individually identify risk factors for depression and anxiety.
Our study included a total of 707 CD and 530 UC patients who underwent bowel resection surgery and did not have depression prior to surgery. The risk of depression 5 years after surgery was 16% and 11% in CD and UC respectively. We found no difference in the risk of depression following surgery in CD and UC patients (adjusted OR 1.11, 95% CI 0.84 – 1.47). Female gender, co-morbidity, immunosuppressant use, perianal disease, stoma surgery, and early surgery within 3 years of care predicted depression after CD-surgery; only female gender and co-morbidity predicted depression in UC. Only 12% of the CD cohort had ≥ 4 risk factors for depression, but among them nearly 44% subsequently received a diagnosis code for depression.
IBD-related surgery or hospitalization is associated with a significant risk for depression and anxiety with a similar magnitude of risk in both diseases.
Crohn’s disease; depression; anxiety; surgery; hospitalization
The significance of non-RA autoantibodies in patients with rheumatoid arthritis (RA) is unclear. We studied associations between autoimmune risk alleles and autoantibodies in RA cases and non-RA controls, and autoantibodies and clinical diagnoses from the electronic medical records (EMR).
We studied 1,290 RA cases and 1,236 non-RA controls of European genetic ancestry from the EMR from two large academic centers. We measured antibodies to citrullinated peptides (ACPA), anti-nuclear antibodies (ANA), antibodies to tissue transglutaminase (anti-tTG), and antibodies to thyroid peroxidase (anti-TPO). We genotyped subjects for autoimmune risk alleles, and studied the association between number of autoimmune risk alleles and number of types of autoantibodies present. We conducted a phenome-wide association study (PheWAS) to study potential associations between autoantibodies and clinical diagnoses among RA cases and controls.
Mean age was 60.7 years in RA cases and 64.6 years in controls, and both groups were 79% female. The prevalence of ACPA and ANA was higher in RA cases compared to controls (p<0.0001, both); we observed no difference in anti-TPO and anti-tTG. Carriage of higher numbers of autoimmune risk alleles was associated with increasing types of autoantibodies in RA cases (p=4.4×10−6) and controls (p=0.002). From the PheWAS, ANA was significantly associated with Sjogren’s/sicca in RA cases.
The increased frequency of autoantibodies in RA cases and controls was associated with the number of autoimmune risk alleles carried by an individual. PheWAS analyses within the EMR linked to blood samples provide a novel method to test for the clinical significance of biomarkers in disease.
The genetic association of the major histocompatibility complex (MHC) to rheumatoid arthritis risk has commonly been attributed to HLA-DRB1 alleles. Yet controversy persists about the causal variants in HLA-DRB1 and the presence of independent effects elsewhere in the MHC. Using existing genome-wide SNP data in 5,018 seropositive cases and 14,974 controls, we imputed and tested classical alleles and amino acid polymorphisms for HLA-A, B, C, DPA1, DPB1, DQA1, DQB1, and DRB1 along with 3,117 SNPs across the MHC. Conditional and haplotype analyses reveal that three amino acid positions (11, 71 and 74) in HLA-DRβ1, and single amino acid polymorphisms in HLA-B (position 9) and HLA-DPβ1 (position 9), all located in the peptide-binding grooves, almost completely explain the MHC association to disease risk. This study illustrates how imputation of functional variation from large reference panels can help fine-map association signals in the MHC.
We investigated the prevalence of xenotropic murine leukemia virus-related virus (XMRV) in 293 participants seen at academic hospitals in Boston, Massachusetts. Participants were recruited from five groups of patients: chronic fatigue syndrome (CFS, n = 32), HIV infection (n = 43), rheumatoid arthritis (RA, n = 97), hematopoietic stem-cell or solid organ transplant (n = 26), or a general cohort of patients presenting for medical care (n = 95). XMRV DNA was not detected in any participant samples. We found no association between XMRV and patients with CFS or chronic immunomodulatory conditions.
XMRV; chronic fatigue syndrome; HIV infection; rheumatoid arthritis; hematopoietic stem-cell transplantation; solid organ transplantation
Cumulative genetic profiles can help identify individuals at high-risk for developing RA. We examined the impact of 39 validated genetic risk alleles on the risk of RA phenotypes characterized by serologic and erosive status.
We evaluated single nucleotide polymorphisms at 31 validated RA risk loci and 8 Human Leukocyte Antigen alleles among 542 Caucasian RA cases and 551 Caucasian controls from Nurses' Health Study and Nurses' Health Study II. We created a weighted genetic risk score (GRS) and evaluated it as 7 ordinal groups using logistic regression (adjusting for age and smoking) to assess the relationship between GRS group and odds of developing seronegative (RF− and CCP−), seropositive (RF+ or CCP+), erosive, and seropositive, erosive RA phenotypes. In separate case only analyses, we assessed the relationships between GRS and age of symptom onset.
In 542 RA cases, 317 (58%) were seropositive, 163 (30%) had erosions and 105 (19%) were seropositive with erosions. Comparing the highest GRS risk group to the median group, we found an OR of 1.2 (95% CI = 0.8–2.1) for seronegative RA, 3.0 (95% CI = 1.9–4.7) for seropositive RA, 3.2 (95% CI = 1.8–5.6) for erosive RA, and 7.6 (95% CI = 3.6–16.3) for seropositive, erosive RA. No significant relationship was seen between GRS and age of onset.
Results suggest that seronegative and seropositive/erosive RA have different genetic architecture and support the importance of considering RA phenotypes in RA genetic studies.
To optimally leverage the scalability and unique features of the electronic health records (EHR) for research that would ultimately improve patient care, we need to accurately identify patients and extract clinically meaningful measures. Using multiple sclerosis (MS) as a proof of principle, we showcased how to leverage routinely collected EHR data to identify patients with a complex neurological disorder and derive an important surrogate measure of disease severity heretofore only available in research settings.
In a cross-sectional observational study, 5,495 MS patients were identified from the EHR systems of two major referral hospitals using an algorithm that includes codified and narrative information extracted using natural language processing. In the subset of patients who receive neurological care at an MS Center where disease measures have been collected, we used routinely collected EHR data to extract two clinically relevant aggregate indicators of MS severity: multiple sclerosis severity score (MSSS) and brain parenchymal fraction (BPF, a measure of whole brain volume).
The EHR algorithm that identifies MS patients has an area under the curve of 0.958, 83% sensitivity, 92% positive predictive value, and 89% negative predictive value when a 95% specificity threshold is used. The correlation between EHR-derived and true MSSS has a mean R2 = 0.38±0.05, and that between EHR-derived and true BPF has a mean R2 = 0.22±0.08. To illustrate its clinical relevance, derived MSSS captures the expected difference in disease severity between relapsing-remitting and progressive MS patients after adjusting for sex, age of symptom onset and disease duration (p = 1.56×10−12).
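The sensitivity, specificity, and predictive values quoted above all derive from a confusion matrix of algorithm calls against chart-review labels. A minimal sketch of those definitions follows; the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return standard accuracy measures from confusion-matrix counts.

    tp/fn: true cases the algorithm did / did not flag.
    tn/fp: non-cases the algorithm did not / did flag.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases that were flagged
        "specificity": tn / (tn + fp),  # share of non-cases left unflagged
        "ppv": tp / (tp + fp),          # share of flagged patients who are true cases
        "npv": tn / (tn + fn),          # share of unflagged patients who are non-cases
    }

# Hypothetical counts chosen for round numbers (assumption, not study data).
metrics = classification_metrics(tp=40, fp=10, tn=90, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike sensitivity and specificity, PPV and NPV depend on how common the disease is in the screened population, so they do not transfer directly between cohorts with different prevalence.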
Incorporation of sophisticated codified and narrative EHR data accurately identifies MS patients and provides estimation of a well-accepted indicator of MS severity that is widely used in research settings but not part of the routine medical records. Similar approaches could be applied to other complex neurological disorders.
Electronic medical records (EMRs) are a rich data source for discovery research but are underutilized due to the difficulty of extracting highly accurate clinical data. We assessed whether a classification algorithm incorporating narrative EMR data (typed physician notes), more accurately classifies subjects with rheumatoid arthritis (RA) compared to an algorithm using codified EMR data alone.
Subjects with ≥1 ICD9 RA code (714.xx) or who had anti-CCP checked in the EMR of two large academic centers were included into an ‘RA Mart’ (n=29,432). For all 29,432 subjects, we extracted narrative (using natural language processing) and codified RA clinical information. In a training set of 96 RA and 404 non-RA cases from the RA Mart classified by medical record review, we used narrative and codified data to develop classification algorithms using logistic regression. These algorithms were applied to the entire RA Mart. We calculated and compared the positive predictive value (PPV) of these algorithms by reviewing records of an additional 400 subjects classified as RA by the algorithms.
A complete algorithm (narrative and codified data) classified RA subjects with a significantly higher PPV (94%) than an algorithm with codified data alone (PPV 88%). Characteristics of the RA cohort identified by the complete algorithm were comparable to existing RA cohorts (80% female, 63% anti-CCP+, 59% erosion+).
We demonstrate the ability to utilize complete EMR data to define an RA cohort with a PPV of 94%, which was superior to an algorithm using codified data alone.
Recent discoveries of risk alleles have made it possible to define genetic risk profiles for patients with rheumatoid arthritis (RA). We examined whether a cumulative score based on 22 validated genetic risk alleles for seropositive RA would identify high-risk, asymptomatic individuals who might benefit from preventive interventions.
We genotyped 14 single nucleotide polymorphisms (SNPs) at 13 validated RA risk loci and 8 HLA alleles among (1) 289 Caucasian seropositive cases and 481 controls from the US Nurses' Health Studies (NHS), and (2) 629 Caucasian CCP antibody positive cases and 623 controls from the Swedish Epidemiologic Investigation of RA (EIRA). We created a weighted genetic risk score (GRS), where the weight for each risk allele is the log of the published odds ratio. We used logistic regression to study associations with incident RA. We compared AUCs from a clinical-only model and clinical + genetic model in each cohort.
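The weighted score described above can be sketched as a sum of risk-allele counts, each multiplied by the log of its published odds ratio. The sketch assumes the natural logarithm; the function name, the odds ratios, and the genotype are hypothetical and are not values from either cohort.

```python
import math

def weighted_grs(allele_counts, odds_ratios):
    """Sum risk-allele counts weighted by the natural log of each odds ratio."""
    if len(allele_counts) != len(odds_ratios):
        raise ValueError("need exactly one odds ratio per risk allele")
    return sum(count * math.log(odds_ratio)
               for count, odds_ratio in zip(allele_counts, odds_ratios))

# Hypothetical published odds ratios for three risk alleles (assumption).
example_ors = [1.5, 1.2, 2.0]
# Copies of each risk allele (0, 1, or 2) carried by one hypothetical person.
example_counts = [2, 0, 1]

score = weighted_grs(example_counts, example_ors)
print(round(score, 3))
```

Weighting by the log odds ratio puts each allele on the same additive scale that logistic regression uses, so alleles with larger published effects contribute more to the score.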
Patients with GRS > 1.25 standard deviations of the mean had a significantly higher OR of seropositive RA in both NHS (OR=2.9, 95%CI 1.8–4.6) and EIRA (OR=3.4, 95% CI 2.3–5.0) referent to the population average. In NHS, the AUC for a clinical model was 0.57 and for a clinical + genetic model was 0.66, and in EIRA was 0.63 and 0.75, respectively.
The combination of 22 risk alleles into a weighted genetic risk score significantly stratifies individuals for RA risk beyond clinical risk factors alone. However, given the low incidence of RA, the clinical utility of a weighted genetic risk score is limited in the general population.
rheumatoid arthritis; polymorphism; autoantibodies; anti-CCP; smoking
To examine the association of previously identified autoimmune disease susceptibility loci with granulomatosis with polyangiitis (GPA, formerly known as Wegener’s granulomatosis), and determine whether genetic susceptibility profiles of other autoimmune diseases are associated with GPA.
Genetic data from two cohorts were meta-analyzed. Genotypes for 168 previously identified single nucleotide polymorphisms (SNPs) associated with susceptibility to different autoimmune diseases were ascertained for a total of 880 GPA cases and 1969 controls of European descent. Single marker associations were identified using additive logistic regression models. Multi-SNP associations with GPA were assessed using genetic risk scores based on susceptibility loci for Crohn’s disease, type 1 diabetes, systemic lupus erythematosus, rheumatoid arthritis, celiac disease, and ulcerative colitis. Adjustment for population substructure was performed in all analyses using ancestry informative markers and principal components analysis.
Genetic polymorphisms in CTLA4 were significantly associated with GPA in the single-marker meta-analysis (OR 0.79, 95% CI 0.70–0.89, p=9.8×10−5). A genetic risk score based on rheumatoid arthritis susceptibility markers was significantly associated with GPA (OR 1.05 per 1-unit increase in genetic risk score, 95% CI 1.02–1.08, p=5.1×10−5).
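A per-unit odds ratio compounds multiplicatively when the score enters a logistic model linearly, which is assumed here. As an illustration only, with the 10-unit difference chosen arbitrarily and not taken from the study:

```latex
\mathrm{OR}(k\ \text{units}) = \mathrm{OR}_{\text{per unit}}^{\,k},
\qquad
\mathrm{OR}(10\ \text{units}) = 1.05^{10} \approx 1.63
```

Under that assumption, a small per-unit effect can still separate individuals at opposite ends of the score distribution.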
Rheumatoid arthritis and GPA may arise from a similar genetic predisposition. Aside from CTLA4, other loci previously found to be associated with common autoimmune diseases were not statistically associated with GPA in this study.
genetics; vasculitis; granulomatosis with polyangiitis; rheumatoid arthritis; CTLA4
HLA-DRB1 shared epitope (HLA-SE), PTPN22 and CTLA4 alleles are associated with CCP+ RA.
We examined associations between HLA-SE, PTPN22, CTLA4 genotypes and RA phenotypes in a large cohort to (a) replicate prior associations with CCP status, and (b) determine associations with radiographic erosions and age of diagnosis.
A total of 689 RA patients from the Brigham RA Sequential Study (BRASS) were genotyped for HLA-SE, PTPN22 (rs2476601) and CTLA4 (rs3087243). Associations between genotypes and CCP, RF, and erosive phenotypes and age at diagnosis were assessed with multivariable models adjusting for age, sex, and disease duration. Novel causal pathway analysis was used to test the hypothesis that genetic risk factors and CCP are in the causal pathway for predicting erosions.
In multivariable analysis, presence of any HLA-SE was strongly associated with CCP+ (OR 3.05 (2.18–4.25)) and RF+ (OR 2.53 (1.83–3.5)) phenotypes; presence of any PTPN22 T allele was associated with CCP+ (OR 1.81 (1.24–2.66)) and RF+ (OR 1.84 (1.27–2.66)) phenotypes. CTLA4 was not associated with CCP or RF phenotypes. While HLA-SE was associated with erosive RA phenotype (OR 1.52 (1.01–2.17)), this was no longer significant after conditioning on CCP. PTPN22 and CTLA4 were not associated with erosive phenotype. Presence of any HLA-SE was associated with on average 3.6 years earlier diagnosis compared with absence of HLA-SE (41.3 vs. 44.9 years, p=0.003) and PTPN22 was associated with 4.2 years earlier age of diagnosis (39.5 vs. 43.6 years, p=0.002). CTLA4 genotypes were not associated with age at diagnosis of RA.
In this large clinical cohort, we replicated the association of HLA-SE and PTPN22, but not CTLA4, with CCP+ and RF+ phenotypes. We also found evidence for associations of HLA-SE and PTPN22 with earlier age at diagnosis. Because HLA-SE is associated with erosive phenotype in unconditional analysis but is not significant after conditioning on CCP, CCP appears to be in the causal pathway for predicting erosive phenotype.
rheumatoid arthritis; age at diagnosis; PTPN22; HLA; CCP
To identify additional variants in the major histocompatibility complex (MHC) region that independently contribute to risk in 2 disease subsets of rheumatoid arthritis (RA) defined according to the presence or absence of antibodies to citrullinated protein antigens (ACPAs).
In a multistep analytical strategy using unmatched as well as matched analyses to adjust for HLA–DRB1 genotype, we analyzed 2,221 single-nucleotide polymorphisms (SNPs) spanning 10.7 Mb, from 6p22.2 to 6p21.31, across the MHC. For ACPA-positive RA, we analyzed samples from the Swedish Epidemiological Investigation of Rheumatoid Arthritis (EIRA) and the North American Rheumatoid Arthritis Consortium (NARAC) studies (totaling 1,255 cases and 1,719 controls). For ACPA-negative RA, we used samples from the EIRA study (640 cases and 670 controls). Plink and SAS statistical packages were used to conduct all statistical analyses.
A total of 299 SNPs reached locus-wide significance (P < 2.3 × 10−5) for ACPA-positive RA, whereas surprisingly, no SNPs reached this significance for ACPA-negative RA. For ACPA-positive RA, we adjusted for known DRB1 risk alleles and identified additional independent associations with SNPs near HLA–DPB1 (rs3117213; odds ratio 1.42 [95% confidence interval 1.17–1.73], Pcombined = 0.0003 for the strongest association).
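The abstract does not state how the locus-wide threshold was derived. One reading, offered here as an assumption, is a Bonferroni correction of a 0.05 significance level over the 2,221 SNPs analyzed across the MHC:

```latex
\alpha_{\text{locus}} = \frac{0.05}{2221} \approx 2.25 \times 10^{-5}
```

This agrees with the reported threshold of 2.3 × 10−5 to two significant figures.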
There are distinct genetic patterns of MHC associations in the 2 disease subsets of RA defined according to ACPA status. HLA–DPB1 is an independent risk locus for ACPA-positive RA. We did not identify any associations with SNPs within the MHC for ACPA-negative RA.
Next-generation DNA sequencing reveals rare alleles protective from type 1 diabetes.
A large population study using ultra-high-throughput DNA sequencing to re-sequence a genetic locus associated with type 1 diabetes reveals rare protective alleles.
The co-occurrence of autoimmune diseases such as rheumatoid arthritis (RA) and type 1 diabetes (T1D) has been reported in individuals and families. We studied the strength and nature of this association at the population level.
We conducted a case-control study of 1419 incident RA cases and 1674 controls between 1996 and 2003. Subjects were recruited from university, public and private rheumatology units throughout Sweden. Blood samples were tested for the presence of antibodies to cyclic citrullinated peptide (anti-CCP), rheumatoid factor (RF) and the presence or absence of the 620W PTPN22 allele. Information on history of diabetes was obtained by questionnaire, telephone interview, and medical record review. The prevalence of T1D and type 2 diabetes (T2D) was compared between incident RA cases and controls and further stratified by anti-CCP, RF status, and the presence of the PTPN22 risk allele.
T1D was associated with an increased risk of RA, OR 4.9 (95% CI 1.8–13.1), and was specific for anti-CCP+ RA, OR 7.3 (95% CI 2.7–20.0), but not anti-CCP negative RA. Further adjustment for PTPN22 attenuated the odds ratio for anti-CCP+ RA in individuals with T1D to 5.3 (95% CI 1.5–18.7). No association was observed between RA and T2D.
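Odds ratios with 95% confidence intervals such as those above can be illustrated from a 2×2 table of exposure by case status. The sketch below computes an unadjusted odds ratio with a Woolf (log-scale) interval; the function name and counts are hypothetical, and the study's own estimates may come from adjusted models rather than this calculation.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf confidence interval from a 2x2 table.

    a: exposed cases      b: unexposed cases
    c: exposed controls   d: unexposed controls
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts (assumption, not study data).
or_est, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=10, d=90)
print(f"OR {or_est:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

When an exposure is rare, as T1D is, the cell counts are small and the interval becomes wide, which is consistent with the broad intervals reported above.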
The association between T1D and RA is specific for a particular RA subset, anti-CCP+ RA. The risk of type 1 diabetics developing RA later in life may be attributed in part to the presence of the 620W PTPN22 allele, suggesting a common pathway for the pathogenesis of these two diseases.
The single nucleotide polymorphism (SNP) rs11761231 on chromosome 7q has been reported as a sexually dimorphic marker for rheumatoid arthritis susceptibility in a British population. We sought to replicate this finding and better characterize susceptibility alleles in the region in a North American population.
DNA from two North American collections of RA patients and controls (1605 cases and 2640 controls) was genotyped for rs11761231 and 16 additional chromosome 7q tag SNPs using Sequenom iPlex assays. Association tests were performed for each collection and also separately contrasting male cases versus male controls and female cases versus female controls. Principal components analysis (EIGENSTRAT) was used to determine association with RA before and after adjusting for population stratification in the subset of the samples (772 cases and 1213 controls) with whole genome SNP data.
We failed to replicate association of the 7q region with rheumatoid arthritis. Initially, rs11761231 showed evidence for association with RA in the NARAC collection (p=0.0076) and rs11765576 showed association with RA in both the NARAC (p = 0.019) and RA replication (p = 0.0013) collections. These markers also exhibited sexual differentiation. However, in the whole genome subset, neither SNP showed significant association with RA after correction for population stratification.
While two SNPs on chromosome 7q appeared to be associated with RA in a North American cohort, the significance of this finding did not withstand correction for population substructure. Our results emphasize the need to carefully account for population structure to avoid false positive disease associations.
For Genetic Analysis Workshop 16 Problem 1, we provided data for genome-wide association analysis of rheumatoid arthritis. Single-nucleotide polymorphism (SNP) genotype data were provided for 868 cases and 1194 controls that had been assayed using an Illumina 550 k platform. In addition, phenotypic data were provided from genotyping DRB1 alleles, which were classified according to the rheumatoid arthritis shared epitope, levels of anti-cyclic citrullinated peptide, and levels of rheumatoid factor IgM. Several questions could be addressed using the data, including analysis of genetic associations using single SNPs or haplotypes, as well as gene-gene and genetic analysis of SNPs for qualitative and quantitative factors.
SLE is an autoimmune disease influenced by genetic and environmental components. We performed a genome-wide association scan (GWAS) and observed novel association evidence with a variant in TNFAIP3 (rs5029939, P = 2.89×10−12, OR = 2.29). We also found evidence of two independent signals of association to SLE risk, including one described in Rheumatoid Arthritis. These results establish that genetic variation in TNFAIP3 contributes to differential risk for SLE and RA.
Translating a set of disease regions into insight about pathogenic mechanisms requires not only the ability to identify the key disease genes within them, but also the biological relationships among those key genes. Here we describe a statistical method, Gene Relationships Among Implicated Loci (GRAIL), that takes a list of disease regions and automatically assesses the degree of relatedness of implicated genes using 250,000 PubMed abstracts. We first evaluated GRAIL by assessing its ability to identify subsets of highly related genes in common pathways from validated lipid and height SNP associations from recent genome-wide studies. We then tested GRAIL, by assessing its ability to separate true disease regions from many false positive disease regions in two separate practical applications in human genetics. First, we took 74 nominally associated Crohn's disease SNPs and applied GRAIL to identify a subset of 13 SNPs with highly related genes. Of these, ten convincingly validated in follow-up genotyping; genotyping results for the remaining three were inconclusive. Next, we applied GRAIL to 165 rare deletion events seen in schizophrenia cases (less than one-third of which are contributing to disease risk). We demonstrate that GRAIL is able to identify a subset of 16 deletions containing highly related genes; many of these genes are expressed in the central nervous system and play a role in neuronal synapses. GRAIL offers a statistically robust approach to identifying functionally related genes from across multiple disease regions—that likely represent key disease pathways. An online version of this method is available for public use (http://www.broad.mit.edu/mpg/grail/).
Modern genetic studies, including genome-wide surveys for disease-associated loci and copy number variation, provide a list of critical genomic regions that play an important role in predisposition to disease. Using these regions to understand disease pathogenesis requires the ability to first distinguish causal genes from other nearby genes spuriously contained within these regions. To do this we must identify the key pathways suggested by those causal genes. In this manuscript we describe a statistical approach, Gene Relationships Across Implicated Loci (GRAIL), to achieve this task. It starts with genomic regions and identifies related subsets of genes involved in similar biological processes—these genes highlight the likely causal genes and the key pathways. GRAIL uses abstracts from the entirety of the published scientific literature about the genes to look for potential relationships between genes. We apply GRAIL to four very different phenotypes. In each case we identify a subset of highly related genes; in cases where false positive regions are present, GRAIL is able to separate out likely true positives. GRAIL therefore offers the potential to translate disease genomic regions from unbiased genomic surveys into the key processes that may be critical to the disease.
We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record.
Materials and Methods
The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values.
Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers.
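The AUC reported above can be read as the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. A minimal sketch of that definition for a two-class problem follows; the function name and scores are hypothetical, and how the study handled its four activity categories is not specified here.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    wins = 0.0
    for pos in pos_scores:
        for neg in neg_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores (assumption, not study data).
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2]

auc = auc_from_scores(positives, negatives)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of notes, which is acceptable for a sketch; a rank-based computation gives the same value more efficiently on large test sets.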
Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies.
We carried out a genome-wide association study of genetic predictors of anti-cyclic citrullinated peptide antibody (anti-CCP) level in 531 self-reported non-Hispanic Caucasian Rheumatoid Arthritis (RA) patients enrolled in the Brigham Rheumatoid Arthritis Sequential Study (BRASS). For replication, we then analyzed 289 single nucleotide polymorphisms (SNPs) with P < 0.001 in BRASS in an independent population of 849 RA patients from the North American Rheumatoid Arthritis Consortium (NARAC). BRASS and NARAC samples were genotyped using the Affymetrix 100K and Illumina 550K platforms respectively. Association between SNPs and anti-CCP titer was tested using general linear models. The five most significant SNPs from BRASS all were within the major histocompatibility complex (MHC) region (P ≤ 3.5 × 10−6). After controlling for the human leukocyte antigen shared epitope (HLA-SE), the top SNPs still yielded P values < 0.0002. In NARAC, a single SNP from the MHC region near BTNL2 and HLA-DRA, rs1980493 (r2 = 0.85 with the top five SNPs from BRASS), was associated significantly with CCP titer (P = 6.1 × 10−5) even after adjustment for the HLA-SE (P = 0.0002). The top SNPs found in BRASS and NARAC had r2 = 0.46 and 0.64, respectively, to HLA-DRB1 DR3 alleles. These results confirm that the most significant genome region affecting anti-CCP titers in RA is the MHC region. We identified a SNP in moderate linkage disequilibrium (LD) with HLA-DR3, which may influence anti-CCP titer independently of the HLA-SE.