Acute severe ulcerative colitis (UC) remains a significant clinical challenge, and the ability to predict, at an early stage, which individuals are at risk of colectomy for medically refractory UC (MR-UC) would be a major clinical advance. The aim of this study was to use a genome-wide association study (GWAS) in a well-characterized cohort of UC patients to identify genetic variation that contributes to MR-UC.
A GWAS comparing 324 MR-UC patients with 537 Non-MR-UC patients was analyzed using logistic regression and Cox proportional hazards methods. In addition, the MR-UC patients were compared with 2601 healthy controls.
MR-UC was associated with more extensive disease (p= 2.7×10^-6) and a positive family history of UC (p= 0.004). A risk score based on the combination of 46 SNPs associated with MR-UC explained 48% of the variance for colectomy risk in our cohort. Risk scores divided into quarters showed the risk of colectomy to be 0%, 17%, 74% and 100% in the four groups. Comparison of the MR-UC subjects with healthy controls confirmed the contribution of the major histocompatibility complex to severe UC (peak association: rs17207986, p= 1.4×10^-16) and provided genome-wide suggestive evidence of association at the TNFSF15 (TL1A) locus (peak association: rs11554257, p= 1.4×10^-6).
A SNP-based risk scoring system, identified here by GWAS analyses, may provide a useful adjunct to clinical parameters for predicting natural history in UC. Furthermore, discovery of genetic processes underlying disease severity may help to identify pathways for novel therapeutic intervention in severe UC.
Genome-wide association studies (GWAS) have advanced our knowledge of the genetic contribution to ulcerative colitis (UC) and have led to the identification of approximately 20 UC-associated loci, including the major histocompatibility complex (MHC) 1–6. These advances in UC, together with those made in Crohn's disease (CD), have significantly increased our understanding of the underlying pathogenic processes that lead to chronic mucosal inflammation. Yet despite the identification of approximately 50 inflammatory bowel disease (IBD) susceptibility loci, there has been minimal clinical benefit for IBD patients or clinicians 4, 7.
Patients with UC are heterogeneous: age of onset, disease extent, natural history, response to medical therapies, and need for surgery all vary between individuals. Medically refractory ulcerative colitis (MR-UC) requiring colectomy remains a significant challenge in the management of IBD. In one five-year study, more than 12% of ~1100 UC patients developed toxic, fulminant or severe colitis 8; in a population-based study in Norway, 7.5% of approximately 450 UC patients followed up for five years required colectomy 9; and in another study from Copenhagen, more than 30% of UC patients had required colectomy by 18 years after diagnosis 10. While a number of prognostic models have been developed for predicting outcomes in acute severe UC, it would be a significant advance if patients at high risk for severe disease could be identified earlier in their course of disease, as they may benefit from an earlier introduction of more intensive therapy and monitoring. Previous studies have shown that severe UC is more prevalent in non-smokers and ex-smokers and in those with extensive disease 11, 12, and that it may also be more common in Caucasians than in Hispanics and Asians 13, 14. However, none of these parameters alone demonstrates adequate clinical utility.
A number of studies have examined the role of genetic variation in the development of extensive and severe UC and have implicated loci including the MHC and the multidrug resistance gene 1 (MDR1/ABCB1) 15–23. These data support the hypothesis that genetic variation contributes not only to UC susceptibility, but also to clinical phenotype and natural history. Furthermore, recent advances in IBD genetics have confirmed that genetically complex conditions such as IBD are caused by multiple genetic variants. A number of studies combining information from multiple loci have shown some utility in other complex conditions such as diabetes 24–26. Investigators in CD, including our own group, have begun using these approaches to predict both the natural history of CD and its response to anti-TNF therapy 27, 28. In this study, we aimed to uncover genetic associations with MR-UC, both to assess their utility in identifying patients at increased risk of colectomy and to identify potential novel therapeutic targets for the treatment of severe UC.
Ulcerative colitis (UC) subjects (n= 929) were recruited at the Cedars-Sinai Medical Center Inflammatory Bowel Disease Center following informed consent after approval by the Institutional Review Board. UC diagnosis was based on standard criteria 29. UC subjects requiring colectomy for severe disease refractory to medical therapies were classified as medically refractory UC (MR-UC). Approximately 66% of the MR-UC patients had been treated with cyclosporine and 24% with biologic therapies prior to colectomy. Subjects who underwent colectomy for cancer or dysplasia, as well as subjects not requiring colectomy, were classified as Non-MR-UC. Subjects who required colectomy for MR-UC and were subsequently found to have evidence of dysplasia or carcinoma in the resected colon were classified as MR-UC (n= 3). For the MR-UC cohort, the time from diagnosis to colectomy was collected; the time from diagnosis to last follow-up visit was obtained for the Non-MR-UC cohort. Samples that did not genotype successfully (n= 16), exhibited gender mismatch (n= 9) or cryptic relatedness (n= 13), or were considered outliers by principal components analysis (n= 30; see below) were excluded. Following these measures, 861 UC subjects (MR-UC n= 324; Non-MR-UC n= 537) were included in the analyses.
Controls were obtained from the Cardiovascular Health Study (CHS), a population-based cohort study of risk factors for cardiovascular disease and stroke in adults 65 years of age or older, recruited at four field centers 30, 31. 5,201 predominantly Caucasian individuals were recruited in 1989–1990 from random samples of Medicare eligibility lists, followed by an additional 687 African-Americans recruited in 1992–1993 (total n= 5,888). CHS was approved by the Institutional Review Board at each recruitment site, and subjects provided informed consent for the use of their genetic information. A total of 2,601 Caucasian non-IBD control subjects who underwent GWAS were included in these analyses. African-American CHS participants were excluded from analysis due to insufficient number of ethnically-matched cases.
All genotyping was performed at the Medical Genetics Institute at Cedars-Sinai Medical Center using Infinium technology (Illumina, San Diego, CA) 32, 33. UC cases were genotyped with either the HumanCNV370-Quad or the Human610-Quad platform; controls were genotyped with the HumanCNV370-Duo platform. Identity-by-descent was used to exclude related individuals (Pi-hat scores >0.5; PLINK) 34. The average genotyping rate among cases and controls retained in the analysis was >99.8% and >99.2%, respectively. Single nucleotide polymorphisms (SNPs) were excluded based on: Hardy-Weinberg Equilibrium test p <10^-3; SNP failure rate >10%; MAF <3%; or absence from dbSNP Build 129. A total of 313,720 SNPs passed quality control and were present in all data sets.
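The SNP-level exclusion rules above amount to a simple filter. The Python sketch below only restates those thresholds; the `SnpQC` record, its field names, and the example rsIDs are hypothetical, and the study applied these filters with its own genotyping software rather than with code like this.

```python
from dataclasses import dataclass


@dataclass
class SnpQC:
    """Hypothetical per-SNP summary used only for this illustration."""
    rsid: str
    hwe_p: float          # Hardy-Weinberg equilibrium test p-value
    failure_rate: float   # fraction of samples with a missing genotype
    maf: float            # minor allele frequency
    in_dbsnp_129: bool    # present in dbSNP Build 129


def passes_qc(snp: SnpQC) -> bool:
    """Return True if the SNP survives every exclusion rule in the Methods."""
    return (
        snp.hwe_p >= 1e-3              # exclude HWE p < 10^-3
        and snp.failure_rate <= 0.10   # exclude failure rate > 10%
        and snp.maf >= 0.03            # exclude MAF < 3%
        and snp.in_dbsnp_129           # exclude SNPs absent from dbSNP 129
    )


snps = [
    SnpQC("rs_example_1", hwe_p=0.42, failure_rate=0.01, maf=0.25, in_dbsnp_129=True),
    SnpQC("rs_example_2", hwe_p=1e-5, failure_rate=0.01, maf=0.25, in_dbsnp_129=True),
    SnpQC("rs_example_3", hwe_p=0.42, failure_rate=0.01, maf=0.01, in_dbsnp_129=True),
]
kept = [s.rsid for s in snps if passes_qc(s)]
print(kept)  # ['rs_example_1']
```

The second example fails the Hardy-Weinberg rule and the third fails the MAF rule; values exactly at a threshold are retained, since only values beyond it are listed as exclusions.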
Principal components analysis (Eigenstrat as implemented in Helix Tree; Golden Helix, Bozeman, MT) was conducted to examine population stratification 35. Extreme outliers, defined as subjects more than two standard deviations from the distribution of the rest of the samples for any component, were removed. All African-American participants identified by principal components analysis were excluded from these analyses. Residual inflation of the test statistics following correction for population sub-structure was low, with estimated genomic inflation factors (λGC) of 1.04 and 1.06 for the MR-UC vs. Non-MR-UC and the MR-UC vs. Non-IBD control analyses, respectively 36.
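The genomic inflation factor is conventionally the median observed 1-df chi-square association statistic divided by its expected median under the null (about 0.455). The sketch below computes it from a list of p-values using only the Python standard library; the function names are illustrative, and the λGC values reported above came from the study's own software, not from this code.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom:
# the square of the standard-normal 75th percentile (~0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def chi2_from_p(p: float) -> float:
    """Convert a two-sided 1-df p-value to its chi-square statistic.

    Adequate for moderate p-values; for extremely small p, 1 - p/2
    rounds to 1.0 in floating point and inv_cdf raises an error.
    """
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_inflation(p_values):
    """Median-based genomic control factor (lambda_GC)."""
    stats = [chi2_from_p(p) for p in p_values]
    return median(stats) / CHI2_1DF_MEDIAN


print(genomic_inflation([0.1, 0.5, 0.9]))  # 1.0: median p of 0.5 is exactly null
```

A value near 1.0, as with the 1.04 and 1.06 reported here, indicates little residual inflation from population structure; values well above 1 would suggest uncorrected stratification.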
Single marker association analysis of MR-UC vs. Non-MR-UC (analysis-I) was performed using a logistic regression model correcting for population stratification using 20 principal components as covariates (PLINK v1.06) 34. Association between medically refractory disease (MR-UC) and the top 100 SNPs from analysis-I (as determined by the lowest corrected p-values), considered together, was tested using a stepwise logistic regression model. The selected SNPs were further analyzed by Cox proportional hazards regression using the time-to-event information described above for UC cases (using the step and glm functions, and the coxph function, respectively, in R v2.9.0) 37. The 37 SNPs with logistic regression p <0.05 and Cox proportional hazards p <0.1 were retained in the risk model. The 100 SNPs (p <3×10^-4) evaluated from analysis-I are listed in Supplementary Table S1. A genome-wide Cox proportional hazards regression analysis (analysis-II) was then performed on a subset of our UC cohort (MR-UC subjects with colectomy <60 months, n= 187; Non-MR-UC followed up >60 months, n= 328), correcting for population stratification using two principal components as covariates (PLINK). The top 65 SNPs (8 of which overlap with the 100 SNPs from analysis-I above) were tested together (using the coxph function in R). The 65 SNPs (p <1×10^-4) from analysis-II are listed in Supplementary Table S2. From these 65 SNPs, 9 SNPs were identified (p <3×10^-4) and combined with the 37 SNPs from analysis-I to form a final risk model consisting of 46 SNPs (see Figure 1 for schematic). A genetic risk score was calculated as the total number of risk alleles (0, 1, or 2 per SNP) across all 46 risk SNPs (theoretical range: 0–92). The risk score (observed range: 28–60) was divided into quarters: scores 28–38 (risk-A); scores 39–45 (risk-B); scores 46–52 (risk-C); and scores 53–60 (risk-D).
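As described, the score is an unweighted allele count: each of the 46 SNPs contributes 0, 1 or 2 risk alleles regardless of its individual odds ratio. A minimal Python sketch of the scoring and banding follows, assuming genotypes are already coded as risk-allele counts; the function names and the genotype vector are hypothetical.

```python
# Score bands for the four risk categories, as given in the Methods.
RISK_BANDS = [
    (28, 38, "risk-A"),
    (39, 45, "risk-B"),
    (46, 52, "risk-C"),
    (53, 60, "risk-D"),
]


def genetic_risk_score(risk_allele_counts, n_snps=46):
    """Unweighted score: total count of risk alleles across the model SNPs."""
    counts = list(risk_allele_counts)
    if len(counts) != n_snps:
        raise ValueError(f"expected {n_snps} genotypes, got {len(counts)}")
    if any(c not in (0, 1, 2) for c in counts):
        raise ValueError("each genotype must carry 0, 1 or 2 risk alleles")
    return sum(counts)


def risk_category(score):
    """Map a score to its category; scores outside 28-60 were not observed."""
    for low, high, label in RISK_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the observed range 28-60")


# Hypothetical genotypes: 10 homozygous-risk, 20 heterozygous, 16 non-risk SNPs.
genotypes = [2] * 10 + [1] * 20 + [0] * 16
score = genetic_risk_score(genotypes)
print(score, risk_category(score))  # 40 risk-B
```

Raising an error outside 28–60 is a design choice for the sketch: the bands were defined on the observed scores, so a score outside that range has no category in the source.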
Receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) were calculated using R v2.9.0 with the survival and survivalROC packages 37–39. Sensitivity and specificity curves, positive and negative predictive values, and the positive (sensitivity/(1−specificity)) and negative ((1−sensitivity)/specificity) likelihood ratios were calculated using the R package ROCR 40. Ten-fold cross-validation, repeated 1000 times, was used to validate the fitted logistic regression model, and mean sensitivity and specificity were re-calculated across the 1000 replicates. Bootstrap resampling with 1000 replicates was used to estimate the variability of the hazard ratio from the Cox regression model 41, 42. In survival analysis, the hazard ratio quantifies the effect of an explanatory variable on the hazard, or instantaneous risk, of an event.
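The likelihood-ratio formulas above, and the prevalence dependence of predictive values, can be made concrete with a few lines of arithmetic. This Python sketch is illustrative only (the study used ROCR); the function names are hypothetical. It is applied to the fitted-model sensitivity (0.793) and specificity (0.858) reported in the Results; these are single-threshold values, not the score-specific ratios in Figure S1.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV for a given prevalence (pre-test probability)."""
    tp = sensitivity * prevalence                 # true positives
    fp = (1.0 - specificity) * (1.0 - prevalence) # false positives
    fn = (1.0 - sensitivity) * prevalence         # false negatives
    tn = specificity * (1.0 - prevalence)         # true negatives
    return tp / (tp + fp), tn / (tn + fn)


lr_pos, lr_neg = likelihood_ratios(0.793, 0.858)
print(round(lr_pos, 2), round(lr_neg, 2))  # 5.58 0.24
```

Unlike the likelihood ratios, PPV and NPV change with the prevalence passed in, which is why the ratios transfer more readily between populations with different pre-test probabilities.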
Single marker analysis of genome-wide data for MR-UC cases vs. Non-IBD Caucasian controls from CHS (analysis-III) was performed as above, using logistic regression correcting for 20 principal components (PLINK).
Complete temporal data were available for 861 UC subjects (MR-UC n= 324; Non-MR-UC n= 537). The demographic data for our cohort are summarized in Table 1. We observed no differences in gender, median age of disease onset, or smoking status between our MR-UC and Non-MR-UC subjects. There was a significant difference in median disease duration (p= 7.4×10^-9), with the time from diagnosis to last follow-up in the Non-MR-UC cohort nearly double the time from diagnosis to colectomy in our MR-UC subjects. Additionally, disease extending proximal to the splenic flexure was significantly more frequent (p= 2.7×10^-6) in the MR-UC group than in the Non-MR-UC group, consistent with previously published data 43. We identified a novel association between a family history (first- or second-degree relative) of UC and the development of MR-UC (p= 0.004).
We performed a GWAS on 324 MR-UC and 537 Non-MR-UC subjects. Results of this analysis (analysis-I) are given in Table S1 and discussed below. Following identification of single markers associated with MR-UC, we proceeded to a multivariate approach. Beginning with the top 100 results from analysis-I (p <3×10^-4), we performed a stepwise logistic regression and identified 64 SNPs (all with p <0.05) that together were associated with MR-UC; these were carried forward to survival analysis. Of these 64 SNPs, 37 remained (Cox proportional hazards regression p <0.1; OR 1.2–1.8), which together explained 40% of the variance for MR-UC (Table 2, non-bolded SNPs). To maximize discrimination, i.e. the percentage of variance explained, we further performed a genome-wide Cox proportional hazards regression analysis (analysis-II) on a subset of our UC cohort (see Methods) to identify SNPs associated with earlier progression to colectomy. Testing together the top 65 SNPs from this analysis (Table S2, p <1×10^-4), we identified nine SNPs with Cox proportional hazards p <3×10^-4 (individual OR 1.4–1.6), which explained 17% of the variance (Table 2, bolded SNPs). These 9 SNPs were then added sequentially to the 37-SNP model, resulting in a final risk model of 46 SNPs (OR for MR-UC for each individual SNP 1.2–1.9) that explained 48% of the variance for colectomy in our cohort (Table 2).
We calculated a genetic risk score from the total number of risk alleles across all 46 risk SNPs (theoretical range: 0–92). The observed risk score ranged from 28 to 60 and was significantly associated with MR-UC (logistic regression and Cox proportional hazards p-values <10^-16). An ROC curve using this risk score gave an AUC of 0.91. The sensitivity of the fitted model for MR-UC was 0.793, with a specificity of 0.858. Across 1000 replicates of 10-fold cross-validation, we obtained a mean sensitivity of 0.789 (SD= 0.0067) and a mean specificity of 0.859 (SD= 0.002; Table 3). This indicates that the fitted model was robust, with only ~0.4% over-fitting (the fall in sensitivity from 0.793 to 0.789). The hazard ratio estimated from the Cox regression model was 1.313; 1000 bootstrap replicates gave an estimated hazard ratio of 1.314 (SD= 0.017; Table 3).
Based on the genetic risk scores, we grouped our UC cohort into four risk categories (see Methods). Less than 1% of subjects in the lowest risk category (risk-A) were MR-UC, and the percentage of MR-UC increased to ~17%, ~74% and 100% in the risk-B, -C and -D groups, respectively (Figure 2A; χ² test for trend p <2.2×10^-16). The median time to colectomy for the risk-C and -D categories was 72 months and 23 months, respectively (given the low incidence of MR-UC in groups -A and -B, it was not possible to calculate the median time to colectomy for these two groups). Progression to colectomy within 2 and 5 years of diagnosis may be more clinically relevant. No individuals in the risk-A category had undergone colectomy at either 2 or 5 years after diagnosis, whereas the incidence of MR-UC for risk groups -B, -C and -D was 3.1%, 19.1% and 62%, respectively, at 2 years, and 8.3%, 50% and 80%, respectively, at 5 years (Figure 2B). At five years from diagnosis, both the total risk score (AUC 0.86) and the risk category (AUC 0.82) predicted which patients would require surgery. The operating characteristics of the risk score system are shown in Supplementary Figure 1. Cut-off scores of 44 (Figure S1-C) and 47 (Figure S1-D) yield a sensitivity (to rule out colectomy) and a specificity (to rule in colectomy), respectively, of over 90%.
Loci corresponding to the 46 SNPs in our risk model include several compelling candidate genes for UC severity and suggest potential biological pathways for further study (shown in Table 2 and discussed below). As each risk SNP contributes only modestly to the overall risk of MR-UC (OR 1.2–1.9), this work supports the paradigm that a group of SNPs identified by GWAS and combined together may account for a large proportion of the genetic contribution to a complex phenotype (48% of the variance for risk in this study) and so provide a risk score with clinical utility.
Association analyses between 324 UC subjects with MR-UC and 2,601 population-matched controls confirmed a major contribution of the major histocompatibility complex (MHC) on chromosome 6p to the development of severe UC (analysis-III, Figure 3). Ten SNPs in the MHC reached the a priori defined level of genome-wide significance (p≤5×10^-7; 82 SNPs with p <1×10^-3), with peak association at rs17207986 (p= 1.4×10^-16; Figure 4A and Supplementary Table S3). Three SNPs on chromosome 9q32, a locus that contains the known IBD susceptibility gene TNFSF15 (TL1A), achieved genome-wide suggestive significance (p <5×10^-5), with the most significant association at rs11554257 (p= 1.4×10^-6; Figure 4B). In addition, we observed association with several known and putative UC loci, including interleukin (IL)-10 (1q32.1), IL-12B (5q33.3), 12q15 (IFNG/IL-26), ZFP90 (16q22.1) and GSDML/ORMDL3 (17q12), when comparing MR-UC subjects and controls 1, 3, 4, 44, 45. An association with ZFP90 was also observed in our analysis of MR-UC vs. Non-MR-UC. Furthermore, we confirmed association with a newly identified UC locus, KIF1A (2q37.3), in our association analysis of MR-UC vs. Non-MR-UC 44 (Table 4).
Using a GWAS approach in a well-characterized UC cohort and a large healthy control group, we confirmed the contribution of the MHC to severe UC at a genome-wide level of significance, and our data suggest that there is more than one severity-associated 'signal' from this locus. We also implicated TNFSF15 (TL1A) in UC severity, with potential therapeutic implications 46. We confirmed that extensive disease, and identified that a family history of UC, are associated with the need for surgery (Table 1), supporting our hypothesis that genetic variation contributes to the natural history of UC. The 46-SNP model discriminates patients at risk of MR-UC and explains approximately 50% of the variance in the risk of surgery in our cohort. Higher risk score categories had an elevated percentage of MR-UC subjects (p <2.2×10^-16; Figure 2A) and predicted earlier colectomy (Figure 2B).
The predictive power of diagnostic tests can be evaluated by the area under the curve (AUC), an ROC summary index equal to the probability that, given a randomly chosen pair of one affected and one unaffected individual, the test correctly ranks the affected individual higher. A perfect test has an AUC of 1.0, while random chance gives an AUC of 0.5 39, 47. Screening programs attempting to identify high-risk groups generally have an AUC of ~0.80 48. The genetic risk score reported here yielded an AUC of 0.91. Furthermore, we calculated operating characteristics for our model (see Methods; Table 3 and Figure S1); scores of 44 and 47 (observed range 28–60; Figure S1-C and D) yield a sensitivity and a specificity, respectively, of over 90%. The fitted model was robust, given the comparable mean sensitivity and specificity following cross-validation (Table 3). In addition, likelihood ratios can be combined with differing pre-test probabilities to calculate the relevant post-test probabilities, and are therefore more generalizable than predictive values, which depend on disease prevalence. The Cochrane Collaboration has suggested that positive likelihood ratios greater than 10 and negative likelihood ratios less than 0.1 are likely to have a significant impact on health care (reviewed in 49). In our model, these thresholds are met at risk scores of 47 and 43, respectively (Figure S1-A and B).
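The pairwise definition of the AUC can be computed directly by comparing every affected-unaffected pair (the Mann-Whitney formulation), counting ties as one half. This O(n·m) Python sketch uses hypothetical risk scores; the study computed its ROC curves with R packages, not with this code.

```python
def pairwise_auc(case_scores, control_scores):
    """AUC as P(random case scores above random control), ties counted as 1/2."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical risk scores, for illustration only.
cases = [47, 52, 55, 44]
controls = [33, 40, 44, 38, 46]
print(pairwise_auc(cases, controls))  # 0.925
```

Here 18.5 of the 20 case-control pairs are ordered correctly (the 44-44 pair is a tie), giving 0.925. Ties matter for a discrete score such as an allele count, where many subjects share the same value.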
The utility of a model with this discriminatory ability can be illustrated with the following example: a newly diagnosed patient with UC is estimated to have a pre-test probability of colectomy for MR-UC of approximately 20% (based on epidemiological and clinical data) and has a genetic risk score of 47 (positive likelihood ratio of approximately 10); applying Bayes' theorem, this equates to a post-test probability of colectomy of approximately 75%. If patients at high risk for colectomy could be identified early in their course of disease, clinicians might consider earlier introduction of more potent medication and more intensive monitoring for these patients. Furthermore, appropriate counseling regarding the risk of colectomy (e.g. early introduction to stoma therapists), the importance of medication compliance, and the need to protect the anal sphincter in obstetric management may also be appropriate for these patients.
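The Bayesian step in this example works in odds: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. A short Python sketch with a hypothetical function name follows. With a ratio of exactly 10 it gives about 71%, while a post-test probability of 75% corresponds to a ratio of about 12, so the example's figures are best read as approximate.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Bayes' theorem in odds form: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


print(round(post_test_probability(0.20, 10), 3))  # 0.714
print(round(post_test_probability(0.20, 12), 3))  # 0.75
```

Because the pre-test probability enters only through its odds, the same score-specific likelihood ratio can be reused for patients whose clinical pre-test risk differs.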
Recent advances in IBD genetics have identified novel pathways involved in disease pathogenesis; similarly, genes associated with more severe disease may highlight new therapeutic targets. A number of interesting genes are highlighted in our analysis comparing MR-UC and Non-MR-UC (see Table S1 and Table 2). Bicaudal-D1 (BICD1), implicated in transport from the Golgi apparatus to the endoplasmic reticulum, has been shown to localize to Chlamydia trachomatis inclusions, suggesting a potential role for BICD1 at the bacterial-host interface 50, 51. Recently, our own group and others have implicated genes encoding proteins involved in epithelial barrier integrity, an important component of innate immunity, in UC pathogenesis 45, 53–55. The identification of MAGI1, a scaffolding protein at cell-cell junctions that localizes at the tight junctions of intestinal epithelial cells, provides further evidence for a role of barrier function 52. Retinoid-related orphan receptors (RORs) are a family of nuclear hormone receptors with an established role in autoimmune disease 56; in particular, ROR-α (associated with MR-UC) plays a critical role in the cellular stress response, is highly expressed in Th17 cells, and upregulates IL-17 and IL-17F expression 57. Furthermore, the association between FGFBP2 (KSP37) and MR-UC supports a role for cytotoxic lymphocyte-mediated immunity 58; interestingly, increased FGFBP2 expression has also been found in asthma 59.
We identified additional genes of interest when comparing the 'extreme phenotypes' of MR-UC and healthy individuals, including variants in the MHC, TNFSF15 (TL1A), PLCB1 and CTSF. The association of the MHC with severe UC confirms earlier reports 2–5, 15–20, 60, 61, with genome-wide significance for ten SNPs (p≤5×10^-7; Figure 3 and Table S3) and the suggestion of independent signals encompassing both the HLA-E and BTNL2/HLA-DRA regions (Figure 4A). TNFSF15 (TL1A) is also implicated in severe UC (17 SNPs associated with MR-UC; p <1×10^-3), with three SNPs showing suggestive evidence of genome-wide significance (p <5×10^-5; Figure 4B). TNFSF15 variants are associated with both CD and UC in a broad spectrum of populations 62–65, and increased TNFSF15 mRNA and protein expression have been demonstrated in inflamed mucosa of the colon and small bowel in CD patients 66, 67. In the mouse, increased TNFSF15 expression also correlates with the severity of ileal and colonic inflammation, supporting our finding of an association with severe disease 46, 66. Administration of neutralizing TNFSF15 antibodies both prevents chronic intestinal inflammation and treats established disease, by suppressing the production of interferon-gamma and IL-17/IL-6 46. Taken together, these data suggest that therapies directed against TNFSF15 may prove useful in the setting of acute severe UC.
Also implicated in severe UC is phospholipase C-β1 (PLCB1; rs6039206, p= 1.5×10^-5), an intracellular transducer of extracellular signals, including IL-1β and TNF-α in rheumatoid arthritis 68. Furthermore, PLCB1 has been implicated as a potential asthma modifier gene 69. We also observed a cluster of SNPs at the CTSF locus (association peak at rs540874, p= 4.5×10^-5). CTSF (cathepsin F) is a major component of the lysosomal proteolytic system, a critical determinant of the timing and location of MHC class II peptide loading in macrophages, further underscoring the role that the MHC plays in disease severity 70.
We have confirmed the association between the MHC and disease severity in UC, and our data suggest that there may be more than one 'signal' from this locus. Furthermore, we have implicated a realistic therapeutic target and known IBD locus, TNFSF15 (TL1A), suggesting that interference with this pathway is a worthy avenue for study in severe UC. In addition, we have demonstrated the utility of a model based on GWAS data for predicting the need for surgery in UC, although these findings clearly require replication in independent cohorts. Nevertheless, these data suggest that even though the individual effect of any severity-associated variant is likely to be small, cumulatively these variants may provide adequate discriminatory power for clinical use. If reproduced, these findings would allow a more tailored approach to the management of UC patients, could provide a 'blueprint' for the use of large-scale genomic data in clinical practice, and may identify additional targets for early therapeutic intervention in more aggressive UC.
Supplementary Figure 1. The clinical utility of the MR-UC risk score is demonstrated by positive (A) and negative (B) likelihood ratios; sensitivity (C) and specificity (D); and positive (E) and negative (F) predictive values. Scores of 44 (C) and 47 (D) yield a test with a sensitivity and a specificity, respectively, of over 90%. The recommended (see Discussion) thresholds of a positive likelihood ratio >10 and a negative likelihood ratio <0.1 are met at scores of 47 (A) and 43 (B), respectively.
Grant Support: This study was supported in part by NCRR grant M01-RR00425 to the Cedars-Sinai General Research Center Genotyping core; NIH/NIDDK grant P01-DK046763; the Southern California Diabetes Endocrinology Research Center grant, DK063491; Cedars-Sinai Medical Center Inflammatory Bowel Disease Research Funds; The Feintech Family Chair in IBD (S.R.T.); The Abe and Claire Levine Chair in Pediatric IBD (M.D.) and The Cedars-Sinai Board of Governors' Chair in Medical Genetics (J.I.R.). Additional funding was provided through grants DK76984 (M.D.) and DK084554 (M.D. and D.P.B.M). The CHS research reported in this article was supported by contract numbers N01-HC-85079 through N01-HC-85086, N01-HC-35129, N01 HC-15103, N01 HC-55222, N01-HC-75150, N01-HC-45133; grant numbers U01 HL080295 and R01 HL087652 from the NHLBI, with additional contribution from the National Institute of Neurological Disorders and Stroke. A full list of principal CHS investigators and institutions can be found at http://www.chs-nhlbi.org/pi.htm.
Disclosures: Authors have no disclosures to declare. No conflicts of interest exist.