Primary sclerosing cholangitis (PSC) is a chronic cholestatic liver disease characterized by inflammation and fibrosis of the bile ducts. Both environmental and genetic factors contribute to its pathogenesis. To further clarify its genetic background, we investigated susceptibility loci recently identified for ulcerative colitis (UC) in a large cohort of 1186 PSC patients and 1748 controls. Single nucleotide polymorphisms (SNPs) tagging 13 UC susceptibility loci were initially genotyped in 854 PSC patients and 1491 controls from the Benelux (331 cases, 735 controls), Germany (265 cases, 368 controls) and Scandinavia (258 cases, 388 controls). Subsequently, a joint analysis was performed with an independent second Scandinavian cohort (332 cases, 257 controls). SNPs at chromosomes 2p16 (p value 4.12×10−4), 4q27 (p value 4.10×10−5) and 9q34 (p value 8.41×10−4) were associated with PSC in the joint analysis after correcting for multiple testing. In PSC patients without inflammatory bowel disease (IBD), SNPs at 4q27 and 9q34 were nominally associated (p<0.05). We applied additional in silico analyses to identify likely candidate genes at PSC susceptibility loci. To identify non-random, evidence-based links, we used GRAIL analysis, which showed interconnectivity between genes in six of the nine PSC-associated regions. Expression quantitative trait analysis from 1469 Dutch and UK individuals demonstrated that five of the nine SNPs had an effect on cis-gene expression. These analyses prioritized IL2, CARD9 and REL as novel candidates.
We have identified three UC susceptibility loci to be associated with PSC, harboring the putative candidate genes REL, IL2 and CARD9. These results add to the scarce knowledge on the genetic background of PSC and imply an important role for both innate and adaptive immunological factors.
Primary sclerosing cholangitis (PSC) is a chronic cholestatic liver disease characterized by inflammation and fibrosis of both intrahepatic and extrahepatic bile ducts.(1) Currently there is no curative therapy available and, in most cases, PSC ultimately leads to liver cirrhosis and liver failure requiring liver transplantation.(2, 3) Population-based studies from the USA and Norway show an incidence of 0.9 and 1.3 per 100,000/year and prevalence of 14.2 and 8.5 per 100,000, respectively.(4, 5) About 50 to 80 percent of the patients with PSC also have inflammatory bowel disease (IBD), compatible with a diagnosis of ulcerative colitis (UC) in approximately 80 percent of these cases and of Crohn’s disease (CD) or unclassified IBD in the remaining 20 percent.(6)
PSC is a complex disease, with both environmental and genetic factors likely to determine its development and course. Siblings of PSC patients are 9 to 39 times more likely to develop PSC than the overall population. They also have an eight times increased risk of developing UC, suggesting a shared genetic component between both diseases.(7) Very little is known about the pathogenesis of PSC, but its high concordance with other immune-mediated diseases and its known genetic associations within the HLA complex suggest that PSC is an immune-mediated disease.(8)
Genome-wide association scans and candidate gene studies have identified more than 30 risk loci for UC in the past decade. Eighteen of these loci reached the threshold of genome-wide significance.(9–16) Identifying these loci has led to a better understanding of the biological pathways involved and has implicated Th1 and Th17 responses and the epithelial barrier in UC pathogenesis.(17) However, little is known about the genetic architecture of PSC. Since the discovery of associations between the HLA variants DR3 and HLA-B8 and PSC in 1982,(18) many studies have confirmed genetic associations between variants in the HLA complex and PSC.(19) Recently, two genome-wide association scans identified five PSC risk loci outside the HLA complex, including 2q13, 2q35, 3p21, 10p15 and 13q31.(20, 21) Of these loci, 2q35 and 3p21 are also associated with UC.(10–12) The causal genes for these regions still need to be determined, but the locus on chromosome 2q35 harbors a compelling candidate gene (the bile acid receptor TGR5).(22)
BCL2L11 is an interesting candidate for the association at 2q13 because of its role in maintaining immunological tolerance, and IL2RA is of interest in PSC at 10p15 since IL2ra −/− mice spontaneously develop intestinal and biliary inflammation.(21)
Our aim was to further clarify the genetic background of PSC. Because of the clinical overlap between UC and PSC, we first investigated the association of 13 recently identified UC susceptibility loci in a cohort of 1186 PSC patients and 1748 healthy controls. We further analyzed whether associations were limited to PSC patients with concomitant IBD or to PSC patients without IBD. Finally, we used eQTL mapping and Gene Relationships Across Implicated Loci (GRAIL) analysis (23) to propose likely causal genes from the nine known PSC-associated loci, including the three new ones.
We included 1186 primary sclerosing cholangitis (PSC) patients and 1748 healthy controls in our cohorts. The initial three cohorts comprised 331 Benelux, 265 German and 258 Scandinavian patients, and 735 Benelux, 368 German and 388 Scandinavian healthy individuals. A second Scandinavian cohort included in the joint analysis consisted of 332 patients and 257 controls. Benelux patients were recruited at the Academic Medical Center in Amsterdam and the University Medical Center Groningen, the Netherlands, and at the University Hospital Leuven, Belgium. The German patients were recruited through the Northern German biobank popgen (http://www.popgen.de), and comprise patients from the University Medical Center Hamburg-Eppendorf, the Hannover Medical School and the Christian-Albrechts-University Hospital Kiel. The PSC patients in the Scandinavian cohorts were recruited at the Rikshospitalet, Oslo University Hospital, Oslo, Norway, the Huddinge University Hospital, Stockholm, Sweden and the Sahlgrenska University Hospital, Gothenburg, Sweden. A description of the clinical characteristics of the PSC cases is provided in Supplementary table 1.
Healthy controls from the Benelux were blood donors recruited from donor centers in Groningen, the Netherlands, and from healthy volunteers recruited via the University Hospital Leuven, Belgium. DNA from the German controls was obtained from blood donors through the Northern German biobank popgen. The Scandinavian controls were randomly selected from the Norwegian Bone Marrow Donor Registry. Although blood bank donors and bone marrow donors are a selected group compared to population controls, they are regularly and confidently being used in genetic association studies.(24)
The diagnosis of primary sclerosing cholangitis was based on standard clinical, biochemical, histological and cholangiographic criteria.(25) The diagnosis of IBD was based on accepted clinical, radiologic, endoscopic and histopathologic criteria.(26) Phenotype information on IBD status was available for 984 cases, of which 739 cases had concurrent IBD and 245 cases did not. Written informed consent was obtained from all participants. Recruitment of the participants was approved by the ethics committees at each of the recruiting hospitals.
At the start of the study all known UC loci identified through GWAS and large candidate gene studies were considered. In total we included 13 single nucleotide polymorphisms (SNPs) tagging 13 recently identified UC susceptibility loci (14–16, 27, 28) (Table 1). Fifteen other UC susceptibility variants had been tested before as part of a GWAS in PSC in the Scandinavian PSC population and were therefore not included in the current study.(20)
Genotyping of the initial three cohorts (Benelux, German, and Scandinavian I) was performed using TaqMan technology from Applied Biosystems (Foster City, California, USA) according to the manufacturer’s recommendations. The patient and control DNA samples were processed in 384-well plates, with each plate containing 16 genotyping controls (four duplicates of four Centre d’Etude du Polymorphisme Humain (CEPH) DNA). The results from the genotyping assays were analyzed using SDS 2.3 software compatible with the TaqMan system. Genotype data for the Scandinavian replication panel (Scandinavian II) were extracted from a previous PSC GWAS. The patients and controls in this study were genotyped with the Affymetrix® Genome-Wide Human SNP Array 6.0 (Affymetrix, Santa Clara, CA, USA). Details on quality control and imputation of additional genotypes (using MACH software version 1.0.16) have been described previously.(21, 29) Individuals overlapping with the TaqMan-genotyped Scandinavian study panel were excluded from the GWAS data (n=5 controls).
For the directly genotyped SNPs, genotyping cluster plots were manually inspected to ensure high genotyping quality, while for the imputed SNPs good imputation quality was required (r2 > 0.3).
Statistical analysis of the genotyping data from the initial three cohorts genotyped with TaqMan assays was performed using PLINK software version 1.06.(30) Quality control excluded samples with a low genotyping success rate (missing one or more SNPs) and SNPs with a genotyping success rate below 95%. SNPs showing deviation from Hardy-Weinberg equilibrium (HWE) in the controls (P < 0.004) were to be discarded from further analysis. All 13 SNPs were in HWE; therefore no SNPs were discarded. To correct for multiple testing for the 13 SNPs, we set the significance threshold at P < 0.05/13, i.e. P < 0.004.
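The two thresholds described above can be illustrated with a minimal Python sketch. It is not the PLINK implementation: the function names are hypothetical, the genotype counts are invented for illustration, and the Hardy-Weinberg check uses a simple 1-degree-of-freedom χ2 goodness-of-fit test as an assumption, since the text does not state which HWE test was applied.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P value of a 1-df chi-square goodness-of-fit test for HWE.

    Assumes both alleles are observed (allele frequency strictly
    between 0 and 1); otherwise an expected count would be zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    return math.erfc(math.sqrt(chi2 / 2.0))


threshold = bonferroni_threshold(0.05, 13)  # 13 SNPs tested
# Hypothetical control genotype counts, not taken from the study.
p_hwe = hwe_chi2_pvalue(620, 680, 145)
print(f"threshold = {threshold:.4f}, HWE P = {p_hwe:.3f}, "
      f"discard = {p_hwe < threshold}")
```

Using the same cut-off for the HWE filter and for the association tests mirrors the text, where both are given as P < 0.004.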
Differences in allele frequencies between cases and controls were analyzed in the three individual cohorts with a χ2 test. Association analysis for the combined German, Benelux and Scandinavian I cohort was performed with a Cochran-Mantel-Haenszel (CMH) test. A Breslow-Day test for heterogeneity of odds ratios was also performed for the combined cohort.
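As an illustration of the stratified test named above, the following Python sketch computes the Cochran-Mantel-Haenszel χ2 statistic and the Mantel-Haenszel common odds ratio from one 2×2 allele-count table per cohort. The function name and the allele counts are hypothetical, and the sketch assumes the form without continuity correction; it is not a reproduction of the PLINK routine.

```python
import math
from typing import Iterable, Tuple

# One stratum: (case_A, case_B, control_A, control_B) allele counts.
Table = Tuple[int, int, int, int]


def cmh_test(tables: Iterable[Table]) -> Tuple[float, float, float]:
    """Return (chi-square, P value, Mantel-Haenszel common odds ratio)."""
    diff_sum = 0.0  # sum of observed minus expected case_A counts
    var_sum = 0.0   # sum of hypergeometric variances
    or_num = 0.0    # numerator of the Mantel-Haenszel odds ratio
    or_den = 0.0    # denominator of the Mantel-Haenszel odds ratio
    for a, b, c, d in tables:
        n = a + b + c + d
        expected = (a + b) * (a + c) / n
        variance = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected
        var_sum += variance
        or_num += a * d / n
        or_den += b * c / n
    chi2 = diff_sum ** 2 / var_sum  # no continuity correction (assumption)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square with 1 df
    return chi2, p_value, or_num / or_den


# Hypothetical allele counts for three cohorts, not taken from the study.
cohorts = [(90, 560, 260, 1180), (70, 450, 130, 590), (75, 430, 150, 610)]
chi2, p_value, odds_ratio = cmh_test(cohorts)
print(f"CMH chi2 = {chi2:.2f}, P = {p_value:.2e}, OR = {odds_ratio:.2f}")
```

Stratifying by cohort in this way guards against spurious association caused by allele-frequency differences between the populations.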
Association analysis for the Scandinavian II cohort was performed as follows. To take account of uncertainty from the imputation, allele dosages were used for all calculations except for Hardy-Weinberg equilibrium, which was evaluated using the best-guess genotypes in PLINK v1.06.(30) For one SNP (rs6017342) the controls showed deviation from HWE; therefore this SNP was excluded from further analysis. The SNPs were tested for association using binary logistic regression as implemented in the R statistical package version 2.11.1.
The combined analysis of the TaqMan-genotyped data and the GWAS data was conducted using weighted Z-scores.(31) To take account of unequal numbers of cases and controls in the study panels, the effective sample size was used for the weighting of the Z-scores.(32) To check for heterogeneity of the associations between the four different cohorts, a Breslow-Day test was performed in the R statistical package. For the Scandinavian II cohort, 2×2 tables with allele frequencies in cases and controls were recalculated using allele frequencies based on the allele dosages from the GWAS.
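The weighted Z-score combination can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the effective sample size is taken as 4 / (1/N_cases + 1/N_controls), a commonly used definition that the text does not spell out, each cohort is weighted by the square root of that quantity, and the per-cohort Z-scores shown are invented. Only the cohort sizes come from the study description.

```python
import math
from typing import Sequence


def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size of an unbalanced case-control panel.

    Assumed definition: 4 / (1/n_cases + 1/n_controls).
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def combine_z(z_scores: Sequence[float], n_eff: Sequence[float]) -> float:
    """Weighted Z-score, with weights equal to sqrt(effective size)."""
    weights = [math.sqrt(n) for n in n_eff]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


def two_sided_p(z: float) -> float:
    """Two-sided P value of a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Cohort sizes (cases, controls) as reported for the four panels.
panels = [(331, 735), (265, 368), (258, 388), (332, 257)]
n_eff = [effective_sample_size(ca, co) for ca, co in panels]
# Hypothetical per-cohort Z-scores for one SNP, not taken from the study.
z_per_cohort = [-2.1, -1.6, -1.9, -2.4]
z_joint = combine_z(z_per_cohort, n_eff)
print(f"joint Z = {z_joint:.2f}, P = {two_sided_p(z_joint):.2e}")
```

Because the signs of the Z-scores are retained, cohorts with effects in opposite directions partly cancel, which would not happen if P values were combined directly.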
To obtain insight into the functional relation of the PSC risk loci, we performed a Gene Relationships Across Implicated Loci (GRAIL) pathway analysis (http://www.broadinstitute.org/mpg/grail). GRAIL is a statistical tool that uses text mining of PubMed abstracts to annotate candidate genes in loci associated with disease risk.(23) We included all six regions that showed an association in previous studies,(20, 21) in combination with the three new loci we identified. Specifically, GRAIL evaluated each gene in a PSC-associated locus for non-random correlation with genes in the other eight loci, via word usage in PubMed abstracts related to the gene. We used HG17 and December 2006 PubMed datasets, default settings for SNP rsNumber submission, and all nine PSC loci as query and seed. To assess the accuracy of the statistical significance of the set of connections, we conducted simulations in which we selected 1000 sets of nine SNPs. Since PSC-associated SNPs were in genomic regions implicating 100 genes, we selected SNP sets using rejection sampling that implicated 100 genes ± 5%. Each of these 1000 sets was also scored with GRAIL.
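The rejection-sampling step for building matched null SNP sets can be sketched as follows. This is an illustrative Python outline only: the SNP pool and its per-SNP gene counts are invented, the function name is hypothetical, and the sketch assumes that the genes implicated by different SNPs do not overlap, so that the total is a simple sum.

```python
import random
from typing import Dict, List


def sample_matched_sets(genes_per_snp: Dict[str, int], n_sets: int,
                        set_size: int = 9, target: int = 100,
                        tolerance: float = 0.05, seed: int = 0,
                        max_tries: int = 1_000_000) -> List[List[str]]:
    """Draw random SNP sets, keeping only those within target +/- tolerance."""
    rng = random.Random(seed)
    snps = sorted(genes_per_snp)
    low = target * (1.0 - tolerance)
    high = target * (1.0 + tolerance)
    accepted: List[List[str]] = []
    for _ in range(max_tries):
        if len(accepted) == n_sets:
            break
        candidate = rng.sample(snps, set_size)
        total = sum(genes_per_snp[s] for s in candidate)
        if low <= total <= high:  # reject sets outside the gene-count window
            accepted.append(candidate)
    return accepted


# Hypothetical pool: 500 SNPs implicating between 1 and 23 genes each.
pool = {f"snp{i}": 1 + (i * 7) % 23 for i in range(500)}
null_sets = sample_matched_sets(pool, n_sets=1000)
print(len(null_sets), "matched sets drawn")
```

Matching the null sets on the number of implicated genes keeps the simulated lists comparable to the real one, since regions with more genes offer more chances for a text-based connection.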
We used genetical genomics data of 1469 peripheral blood DNA and RNA (PAXgene) samples from Dutch and UK individuals, described in detail by Dubois et al.,(33) to perform an expression quantitative trait locus (eQTL) analysis. All samples had been genotyped using either an Illumina Hap370 or 610-Quad platform. Imputation for ungenotyped SNPs was performed using IMPUTE software (https://mathgen.stats.ox.ac.uk/impute/impute.html). RNA from these individuals was hybridized either to an Illumina HumanRef-8 v2 array (229 samples, Ref-8 v2, Gene Expression Omnibus (GEO) accession GSE20332) or to an Illumina HumanHT-12 array (1,240 samples, HT-12, GEO accession GSE20142), and raw probe intensity was extracted using BeadStudio. The Ref-8 v2 samples were jointly quantile normalized and log2 transformed, as were the HT-12 samples. Expression variation due to batch and technical effects was removed by applying principal component analysis (PCA) and removing 50 principal components. Subsequent analyses were conducted separately for both datasets, up to the eventual eQTL mapping, which used a meta-analysis framework combining eQTL results from both arrays. We applied a window of 500 kb around each SNP (250 kb on each side).
To prevent spurious associations due to outliers, a non-parametric Spearman’s rank correlation analysis was performed. When a particular probe-SNP pair was present in both the HT-12 and Ref-8 v2 datasets, an overall joint P value was calculated using a weighted Z-method (weights equal to the square root of each dataset’s sample number). To correct for multiple testing, we controlled the false-discovery rate (FDR). The distribution of observed P values was used to calculate the FDR, by permuting expression phenotypes relative to genotypes 25 times within the HT-12 and Ref-8 v2 datasets. cis-eQTLs were considered statistically significant with a Spearman P < 0.0028, corresponding to a 5% FDR. Finally, we removed any probes from the analysis that contained a known SNP (1000 Genomes CEU (Utah residents with European ancestry) SNP data, April 2009 release) to prevent false-positive associations, assuming the “probe SNP” caused differential hybridization. Six regions that showed an association in previous studies, along with the three loci identified in our study, were included in the eQTL analysis.
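To show why a rank-based correlation protects against outliers, the following Python sketch computes Spearman’s rho between genotype and expression values, with tied values sharing their mean rank. The function names and the example data are hypothetical, and the sketch covers only the correlation itself, not the permutation-based FDR or the meta-analysis across arrays.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical genotypes (minor-allele counts) and expression values;
# the last expression value is an extreme outlier.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [5.1, 5.3, 6.0, 6.2, 6.9, 250.0]
print(f"Spearman rho = {spearman_rho(genotypes, expression):.3f}")
```

Because only the ordering of the expression values enters the calculation, the outlier contributes no more than any other top-ranked sample.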
Initially, we genotyped 13 SNPs in 854 PSC patients and 1491 controls. After applying quality control measures, 823 cases (96.4%) and 1445 controls (96.9%) were available for further analysis. Next, a joint analysis was performed with an independent Scandinavian cohort consisting of 332 cases and 257 controls. Thus, in total 1155 PSC patients and 1702 controls were analyzed. In neither the initial analysis nor the joint analysis did we observe any heterogeneity of the ORs between the cohorts (Breslow-Day test p>0.004).
Combined analysis of the initial three cohorts revealed three loci associated with PSC. The strongest association was observed for SNP rs6822844 at locus 4q27 with a P-value of 6.87×10−4 (OR for the T allele 0.73; 95% CI 0.61–0.88). SNP rs4077515 at 9q34 was associated with a P-value in the combined analysis of 2.18×10−3 (OR for the T allele 1.21; 95% CI 1.07–1.37) and rs6706689 at 2p16 showed a P-value of 2.62×10−3 (OR for the A allele 0.82; 95% CI 0.73–0.93). All the results are shown in Table 1. Joint analysis with an independent Scandinavian cohort of 332 cases and 257 controls further strengthened the association signal (p=4.10×10−5 for rs6822844, p=8.41×10−4 for rs4077515 and p=4.12×10−4 for rs6706689, Table 2). All currently identified allelic association signals for PSC are in the same direction as the originally identified associations for ulcerative colitis.
Of the 984 PSC cases for which IBD phenotype information was available, there were 739 cases with concurrent IBD and 245 cases without it. After quality control, 732 PSC cases with IBD and 241 PSC cases without IBD were available for further analysis. Two of the three SNPs associated in the entire PSC cohort were also nominally associated in PSC patients without IBD: rs6822844 at 4q27 (P = 1.54×10−2) and rs4077515 at chromosome 9q34 (P = 1.62×10−2), but they were not statistically significant after correcting for multiple testing (Table 3).
For both the GRAIL and eQTL analyses, we took the three new risk loci (rs6822844, rs4077515 and rs6706689) and added the six previously reported loci, including the HLA locus (rs6720394, rs12612347, rs3197999, rs3134792, rs10905718 and rs9524260). GRAIL shows significant interconnectivity between genes at six out of the nine PSC associated regions. Results of the GRAIL analysis are shown in Figure 1.
The mean number of P <0.05 hits in a simulated list was 0.291, with a range in the 1000 sets from zero to five. The likelihood of observing six hits with P <0.05 is therefore less than 0.1%.
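The reasoning behind the "less than 0.1%" statement is an empirical P value: if none of the 1000 simulated sets reaches the observed number of hits, the probability is below 1 in 1000. A minimal Python sketch follows; the function name is hypothetical and the simulated hit counts are invented to resemble the description (range zero to five), not the actual simulation output.

```python
from typing import Sequence, Tuple


def empirical_p(simulated_hits: Sequence[int],
                observed_hits: int) -> Tuple[float, float]:
    """Return (fraction of simulations >= observed, reportable upper bound).

    When no simulation reaches the observed value, the fraction is 0 and
    the result can only be reported as below 1 / number of simulations.
    """
    n = len(simulated_hits)
    n_extreme = sum(1 for h in simulated_hits if h >= observed_hits)
    return n_extreme / n, max(n_extreme, 1) / n


# Hypothetical hit counts for 1000 simulated SNP sets (maximum of five).
simulated = [0] * 800 + [1] * 150 + [2] * 40 + [5] * 10
fraction, bound = empirical_p(simulated, observed_hits=6)
print(f"fraction = {fraction}, reported as P < {bound}")
```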
Five out of the nine SNPs had a significant effect (Spearman P <0.0028; FDR 0.05) on expression of one or more nearby genes. The results of both the GRAIL and eQTL analyses are summarized in Table 4. Detailed results of the eQTL analysis for each locus are given in the online supplementary materials (Supp. Table 2 and Supp. Figs 1A-P).
In a large, northern European cohort consisting of 1186 PSC cases and 1748 controls we identified three UC susceptibility loci on chromosomes 2p16, 4q27 and 9q34 to be associated with PSC. In silico analysis using GRAIL and eQTL analyses prioritized REL, IL2, IL21 and CARD9 as putative candidate genes from these associated loci. These results add to the scarce knowledge about the genetic background of PSC, and candidate genes residing at these and previously identified loci highlight an important role for both innate and adaptive immunological factors in disease pathogenesis.
At the 4q27 locus, SNP rs6822844 resides in a non-coding region upstream of IL21 and downstream of IL2. The region shows extensive linkage disequilibrium (LD) and harbours four genes: KIAA1109, TENR, IL2 and IL21. Due to the LD structure, defining the causal variant by conventional genetic testing is difficult. From a functional perspective and from secondary interconnectivity analysis, IL2 and IL21 are the most compelling candidate genes. Furthermore, IL2RA, encoding the alpha subunit of the IL2 receptor, was found to be associated with PSC in a recent genome-wide analysis, suggesting an important role for the IL2 pathway in disease pathogenesis.(21) The IL2-IL21 region appears to be a general risk locus for many immune-mediated diseases since it is associated with multiple diseases such as celiac disease, UC, type 1 diabetes, systemic lupus erythematosus, rheumatoid arthritis and psoriasis.(27, 34–38) IL2 serves as a T-cell growth factor, augments natural killer (NK) cell cytolytic activity and promotes immunoglobulin production by B cells. IL2 is also involved in the development of regulatory T cells, playing a role in T-cell tolerance.(39) The other possible candidate gene from the 4q27 locus is IL21, which has a broad function, including promotion of differentiation of B cells to plasma cells, driving the expansion of CD8+ T cells, and acting as a pro-apoptotic factor for NK cells and incompletely activated B cells.(39)
The 9q34 locus constitutes an extended haplotype block spanning 120 kb and including multiple genes (among others CARD9, GPSM1, SNAPC4, SDCCAG3, PMPCA, INPP5E, and KIAA0310). Although a number of genes located in this LD block are attractive candidates for association with PSC, previous reports on IBD prioritized CARD9 as the most likely candidate gene.(37) Furthermore, the associated SNP rs4077515 exerts a very strong effect on the expression of CARD9 (cis-eQTL p-value 5.81×10−125) in our current analysis. In addition, CARD9 is interesting from a functional point of view as the protein encoded by CARD9 plays an essential role in stimulating innate immune signaling by intracellular and extracellular pathogens. After binding of fungi to the Dectin-1 receptor or bacteria to the Toll-like receptor (TLR), CARD9 stimulates T cells to differentiate into type 17 helper T cells (TH17 cells) and also promotes pro-inflammatory cytokine production.(40, 41) TH17 cells are increasingly recognized as being involved in host defense and inducing autoimmunity and tissue inflammation.(42, 43) On chromosome 2p16 the REL gene is a possible candidate for the association with rs6706689. REL is a key mediator in NF-κB inflammatory signaling. Besides UC, this locus is also associated with celiac disease (44) and rheumatoid arthritis.(45)
From the previously identified PSC loci, TGR5, the G protein-coupled bile acid receptor 1 (GPBAR1), is a compelling candidate gene at chromosome 2q35 because of its function in bile duct homeostasis and inflammation; recent re-sequencing and functional studies have provided insight into the function of TGR5. Due to the strong linkage disequilibrium (LD), other genes at the 2q35 locus cannot be excluded as PSC-associated genes,(21) and our analyses showed that SLC11A1 (NRAMP1) might also be a candidate gene. SLC11A1 encodes a multi-pass membrane protein and is involved in iron metabolism and host resistance to certain pathogens,(46, 47) and the locus is also implicated in other immune-related diseases such as rheumatoid arthritis.(48)
As PSC is strongly associated with IBD, and in particular with UC, it is difficult to identify loci specifically associated with PSC (Supp. Table 3). In the literature it is debated whether IBD in PSC patients is indeed UC or Crohn’s Disease (CD) or whether it represents a distinct entity of IBD. Disease in patients with the PSC-IBD phenotype is, for instance, more frequently characterized by rectal sparing and backwash ileitis compared to chronic UC patients.(49) Furthermore, previous candidate gene studies investigating IBD genes in PSC did not yield any associations with PSC.(50) It is compelling that two out of three loci were associated in the subgroup of PSC patients without IBD despite the low power of this subgroup analysis. These associations do not pass stringent correction for multiple testing, but the fact that the association is in the same direction as for the whole cohort suggests association irrespective of concomitant IBD. However, it is important to realize that the group of PSC patients without IBD is difficult to define precisely since the IBD in PSC can present either before the onset of PSC or up to several years after the PSC diagnosis,(1) and even sometimes after liver transplantation.(51)
Since only five non-HLA genetic risk loci for PSC had been identified before, this study provides a major addition to the current knowledge on the genetic background of PSC. The PSC susceptibility loci we identified in our study were not found in the two recently published GWAS.(20, 21) A probable reason for this is that the number of samples in the discovery sets of both GWAS was small (285 and 715 cases, respectively). Moreover, due to the general design of GWAS, many true positive findings are discarded and not followed up because of stringent thresholds for multiple testing, thereby ignoring large amounts of data. In the current study, the associations reported do not reach genome-wide significance but are robust to correction for multiple testing using a conservative Bonferroni approach. Although the current cohort is large, not reaching genome-wide significance might be due to the lack of power to identify genes with smaller effect sizes than the previously reported associations.(20, 21) Since PSC is a relatively rare condition, large-scale international collaborations will be needed to identify additional genetic loci with even smaller effect sizes.
In conclusion, we have identified three UC susceptibility loci to be associated with PSC. Likely candidate genes at these and other PSC susceptibility loci highlight an important role for the innate immune system (REL, CARD9) and the adaptive immune system (IL2-IL2R pathway). Large international collaborations, using custom array-based technologies, have started fine-mapping studies aimed at identifying causal variants within the PSC-associated loci. These studies will hopefully lead to a better understanding of PSC pathogenesis and identify possible targets for therapy.
This study was supported by a clinical fellowship grant (90.700.281) from the Netherlands Organization for Scientific Research (NWO) to RKW. SR is supported by an NIH Career Development Award (1K08AR055688). The study was supported by The Norwegian PSC research center (http://www.rikshospitalet.no/nopsc) and the German Ministry of Education and Research (BMBF) through the National Genome Research Network (NGFN) and the Integrated Research and Treatment Center Transplantation (reference number: 01EO0802). This study received infrastructure support through the Research Computing Services at the University of Oslo. L.H.v.d.B. and R.A.O. acknowledge funding from the Amyotrophic Lateral Sclerosis Association. R.A.O. acknowledges funding from the National Institute of Neurological Disorders and Stroke (NINDS).
We thank Jackie Senior for editing the manuscript. Dr. Benedicte A. Lie and the Norwegian Bone Marrow Donor Registry at Oslo University Hospital, Rikshospitalet are acknowledged for contributing the healthy Norwegian control population.
The authors have no financial, professional or personal conflicts of interest to declare.