Systemic lupus erythematosus (SLE) is a chronic multisystem genetically complex autoimmune disease characterised by the production of autoantibodies to nuclear and cellular antigens, tissue inflammation and organ damage. Genome-wide association studies have shown that variants within the major histocompatibility complex (MHC) region on chromosome 6 confer the greatest genetic risk for SLE in European and Chinese populations. However, the causal variants remain elusive due to tight linkage disequilibrium across disease-associated MHC haplotypes, the highly polymorphic nature of many MHC genes and the heterogeneity of the SLE phenotype.
A high-density case-control single nucleotide polymorphism (SNP) study of the MHC region was undertaken in SLE cohorts of Spanish and Filipino ancestry using a custom Illumina chip in order to fine-map association signals in these haplotypically diverse populations. In addition, comparative analyses were performed between these two datasets and a northern European UK SLE cohort. A total of 1433 cases and 1458 matched controls were examined.
Using this transancestral SNP mapping approach, novel independent loci were identified within the MHC region in UK, Spanish and Filipino patients with SLE with some evidence of interaction. These loci include HLA-DPB1, HLA-G and MSH5 which are independent of each other and HLA-DRB1 alleles. Furthermore, the established SLE-associated HLA-DRB1*15 signal was refined to an interval encompassing HLA-DRB1 and HLA-DQA1. Increased frequencies of MHC region risk alleles and haplotypes were found in the Filipino population compared with Europeans, suggesting that the greater disease burden in non-European SLE may be due in part to this phenomenon.
These data highlight the usefulness of mapping disease susceptibility loci using a transancestral approach, particularly in a region as complex as the MHC, and offer a springboard for further fine-mapping, resequencing and transcriptomic analysis.
Systemic lupus erythematosus (SLE) is a chronic multisystem autoimmune disease characterised by the production of autoantibodies to nuclear and cellular antigens, tissue inflammation and organ damage. There is a strong but complex genetic component to SLE susceptibility, whereby many polymorphisms each with a small or modest effect contribute to disease susceptibility. Genome-wide association studies have shown that variants within the major histocompatibility complex (MHC) region on chromosome 6 confer the greatest genetic risk for SLE in European and Chinese populations.1–3 The extended MHC spans almost 8 Mb and is divided into five subregions: extended class I (telomeric), class I, class III, class II and extended class II (centromeric). One of the most complex regions of the genome, this locus harbours two copy variable regions (the HLA-DRB genes in class II and the RCCX module containing complement component C4 in class III), some of the most polymorphic genes in the genome and conserved haplotypes where linkage disequilibrium (LD) extends over 2 Mb in some instances. The region has been the subject of extensive study given the importance of MHC alleles in the pathogenesis of tissue incompatibility, drug sensitivity, autoimmune, infectious and inflammatory diseases.
In European SLE cohorts, well-established associations are observed with highly conserved and extended haplotypes bearing the class II alleles HLA-DRB1*03:01 and HLA-DRB1*15:01.4 More recent high-density single nucleotide polymorphism (SNP) genotyping studies have demonstrated multiple independent signals across the MHC in northern European cohorts.5 6 However, the causal variants remain elusive due to tight LD across disease-associated MHC haplotypes, the highly polymorphic nature of associated variants and the heterogeneity of the SLE phenotype. Fine-mapping studies across the MHC region in other European and non-European SLE populations are lacking. The haplotypic diversity consequent on differing ancestry and environment demonstrated by these populations at the MHC region as well as non-MHC loci should allow further refinement of known association signals together with the identification of novel susceptibility variants. Given these known haplotypic differences, we undertook a high-density case-control SNP study of the MHC region in SLE cohorts of Spanish and Filipino ancestry using a custom Illumina chip in order to fine-map established association signals and potentially uncover novel susceptibility loci. In addition, we have performed comparative analyses between these two datasets and a northern European UK SLE cohort. In total we examined 1433 cases and 1458 matched controls.
The Spanish cohort comprised 464 cases and 468 controls. All cases were recruited from rheumatology clinics throughout Spain. Control samples were obtained from the Blood Bank Units of the hospitals where the cases originated.
The Filipino cohort comprised 335 SLE probands and 247 unrelated controls. We also included 26 trios (father, mother and affected child) to allow checks for Mendelian inheritance. All probands attended the Rheumatology and Clinical Immunology Clinics at the University of Santo Tomas Hospital, Manila, Philippines. Unrelated controls were recruited from spouses and acquaintances of the probands.
The UK cohort comprised 632 SLE probands and 742 unrelated controls from a previous study.5
All SLE probands fulfilled the American College of Rheumatology criteria for the classification of SLE.7 Written consent was obtained from all study participants.
DNA was obtained from whole blood using phenol-chloroform extraction. Native genomic DNA was used for the Spanish study. For the Filipino cohort, 100 ng (5 μl at 20 ng/μl) native DNA was whole genome amplified using the Qiagen REPLI-g Midi Kit (Cat. No. 150045) according to the manufacturer's written instructions.
The samples were genotyped at the Feinstein Institute, USA using a custom Illumina iSelect chip comprising 10 788 SNPs: 6045 SNPs within the MHC region (29–33.5 Mb) and 4743 SNPs informative for major and European ancestry (see online supplement for SNP selection criteria).8 9
All HLA typing was performed using Luminex One Lambda SSO. Four-digit genotyping for HLA-B, HLA-DRB1 and HLA-DQB1 was performed in 82%, 99% and 44% of the Spanish cohort, respectively, following quality control (QC) at Hospital Virgen del Rocío, Seville, Spain and Hospital Virgen de las Nieves, Granada, Spain.
In order to assess LD relationships, four-digit HLA-DRB1 typing was performed in a subset of the Filipino cohort of known genotype for the top SNP, rs9271366, where DNA was available (n=89). Four-digit HLA-DRB1 typing was performed in 606 of 632 UK cases of SLE (96%). The Filipino and UK typing was performed at the Anthony Nolan Trust, London, UK. Two-digit HLA-DRB1 data were obtained for 694 of the 742 UK controls (92%) from the 1958 British birth cohort.
All QC analyses except principal components analyses were performed using PLINK.10 Samples and SNPs were put forward for analysis if they met the following quality control filters. SNPs required >95% genotyping efficiency, minor allele frequency (MAF) >1% (failed Spanish n=271, Filipino n=758) and non-deviation from Hardy-Weinberg equilibrium in controls on the basis of a false discovery rate of 0.05 (failed Spanish n=61, Filipino n=21). SNPs were excluded if they showed >10% Mendel error rate in the post-QC Filipino trios (n=7). Samples required >95% genotyping efficiency (failed Spanish n=50, Filipino n=50) and PI-Hat scores <0.2 on identity-by-descent analysis using ancestry informative markers (AIMs) in order to exclude cryptic relatedness and duplicate samples (failed Spanish n=27, Filipino n=3). In order to correct for population stratification, samples were excluded if they were outliers on principal components analysis using post-QC AIMs (performed using EIGENSTRAT and defined as >4 SDs from the mean)11 (failed Spanish n=62, Filipino n=88). The genomic inflation factor (λGC) was calculated using the post-QC AIMs after correction for population stratification (Spanish λGC=1.04 and Filipino λGC=1.09).
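The SNP-level filters and the genomic inflation factor described above can be illustrated with a minimal Python sketch. This is not the PLINK or EIGENSTRAT implementation used in the study: the function names are hypothetical, the Hardy-Weinberg test is assumed here to be a simple 1-df chi-square test, and the false discovery rate is assumed to be controlled with the Benjamini-Hochberg procedure, which the text does not specify.

```python
import math
from statistics import median


def call_rate(genotypes):
    """Fraction of non-missing genotype calls; None marks a missing call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from genotypes coded as 0, 1 or 2 copies of one allele."""
    called = [g for g in genotypes if g is not None]
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """1-df chi-square test of Hardy-Weinberg proportions (assumed test, not PLINK's)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    if p == 0 or q == 0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def benjamini_hochberg_reject(pvalues, fdr=0.05):
    """Indices of tests rejected at the given false discovery rate (assumed procedure)."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    m = len(pvalues)
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff = rank
    return set(order[:cutoff])


def genomic_inflation(chi2_stats):
    """lambda_GC: median observed 1-df chi-square divided by its null median (~0.4549)."""
    return median(chi2_stats) / 0.4549364
```

In this sketch a SNP would be retained when `call_rate` exceeds 0.95, `minor_allele_frequency` exceeds 0.01 and its control-only Hardy-Weinberg p value is not among the indices returned by `benjamini_hochberg_reject`.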
For the UK SLE cohort, genotypes were imputed using IMPUTE12 on the initial set of directly genotyped SNPs (n=1230) up to the SNP set of the Wellcome Trust Case-Control Consortium 2 (WTCCC2) study (n=7119). The WTCCC2 data were used as reference genotypes in the imputation, with dbSNP build 126 defining the genome map.13 No reference haplotypes were used in the imputation. Of the 7119 imputed SNPs in the UK SLE cohort, 3314 overlapped with the 6045 MHC SNPs genotyped in the Spanish and Filipino cohorts in this study and were used for analysis.
Single marker association analyses using logistic regression and stepwise logistic regression (SLR) analyses were performed using PLINK and SNPTEST.14 We took the genotypes for the most associated SNP as a covariate and conditioned on this in the search for other independently associated SNPs in each dataset. If this analysis yielded further SNPs that passed our threshold of significance (see below), we added the top SNP to further stepwise logistic regression models and continued the process until no further SNPs passed our threshold of significance. Haplotypic association analyses were performed using PLINK and the R statistical package. Data for SNP rs409558 in the Spanish and Filipino cohorts were meta-analysed using the standard inverse variance method. We performed tests of heterogeneity using the Breslow-Day test in PLINK. p values are presented following adjustment for the first principal component in each dataset or following adjustment for the first principal component and additional SNPs as covariates in SLR analyses in each dataset. A significance threshold of p=7×10−5 was set, given that genome-wide significance thresholds based on haplotype structure are typically in the range of 5–7×10−8 and that the MHC region constitutes approximately 1/1000th of the genome. The LD structure of the MHC region has been shown to be similar to that of the genome in general, but there appears to be greater LD between haplotype blocks in the MHC region so our significance threshold is likely to be conservative. In the Spanish cohort, separate logistic regression and conditional logistic regression analyses were performed for HLA-DRB1 alleles in order to assess relative predispositional effects. In order to account for multiple testing, Bonferroni-corrected p values were used as follows: HLA-DRB1, p=0.0023 (0.05/22 alleles tested).
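The threshold arithmetic and the conditioning loop described above can be sketched in Python as follows. This is an illustration only, not the PLINK or SNPTEST procedure: `stepwise_conditioning` and `assoc_pvalue` are hypothetical names, the association test is supplied by the caller as a callback, and the p values in `toy_pvalue` are invented purely to exercise the loop.

```python
# Locus-wide threshold: a genome-wide threshold scaled by the MHC's share of the genome.
GENOME_WIDE_THRESHOLD = 7e-8   # upper end of the 5-7e-8 range quoted in the text
MHC_GENOME_FRACTION = 1 / 1000  # approximate fraction of the genome, per the text
MHC_THRESHOLD = GENOME_WIDE_THRESHOLD / MHC_GENOME_FRACTION  # about 7e-5

# Bonferroni correction for the 22 HLA-DRB1 alleles tested.
HLA_DRB1_THRESHOLD = 0.05 / 22  # about 0.0023


def stepwise_conditioning(snps, assoc_pvalue, threshold=MHC_THRESHOLD):
    """Forward conditioning: repeatedly add the top SNP as a covariate.

    assoc_pvalue(snp, covariates) is a caller-supplied (hypothetical) function
    returning the association p value of `snp` given a tuple of covariate SNPs.
    """
    covariates = []
    remaining = sorted(snps)  # sorted for a deterministic scan order
    while remaining:
        pvals = {s: assoc_pvalue(s, tuple(covariates)) for s in remaining}
        best = min(remaining, key=lambda s: pvals[s])
        if pvals[best] >= threshold:
            break  # no further SNP passes the significance threshold
        covariates.append(best)
        remaining.remove(best)
    return covariates


def toy_pvalue(snp, covariates):
    """Invented p values: B loses its signal once A is a covariate, C does not."""
    table = {
        ("A", ()): 1e-8, ("B", ()): 1e-6, ("C", ()): 1e-3,
        ("B", ("A",)): 0.2, ("C", ("A",)): 1e-5,
        ("B", ("A", "C")): 0.3,
    }
    return table[(snp, covariates)]


independent = stepwise_conditioning(["A", "B", "C"], toy_pvalue)
```

In the toy example, SNP B stands in for a variant that is merely in LD with the top SNP, whereas C represents a signal that persists after conditioning.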
We examined LD relationships between SNPs and HLA alleles in each cohort by calculating the correlation coefficient (r2) using the Tagger algorithm in Haploview.15
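For readers unfamiliar with the statistic, the following minimal sketch computes r2 between two biallelic markers (for example a SNP and the presence or absence of an HLA allele). It assumes phased haplotypes are already available, which is a simplification: Haploview estimates haplotype frequencies from unphased genotypes, and the function name `ld_r2` is hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci.

    `haplotypes` is a list of (a, b) pairs, one per phased chromosome,
    where a and b are 0/1 indicators of the tracked allele at each locus.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    return d * d / denom
```

A value of 1 means that each allele at one marker always travels with the same allele at the other, so one marker is a perfect proxy for the other; a value of 0 means the two are statistically independent.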
In all three datasets under study there was significant SNP association across the entire MHC region (figure 1). The UK SLE data confirmed previously published reports in northern European cohorts demonstrating principal SNP association within the class II and class III regions of the MHC.5 6 16 The most significantly associated SNP was rs1269852, located in the intergenic region between TNXB and ATF6B in the class III region of the MHC. Stepwise logistic regression demonstrated independent association at additional MHC loci including SNPs tagging the HLA-DRB1*15:01 haplotype in class II, as well as class I SNPs located between HLA-B and HLA-C and 5′ of PSORS1C1 (table 1).
In the Spanish cohort, 399 cases and 394 controls were put forward for analysis following QC measures. Logistic regression analysis of 4924 post-QC SNPs showed that the peak signals in this southern European SLE cohort also arise from the class II and class III regions of the MHC (figure 1, table 1 and table S1 in online supplement). The most significantly associated SNP, rs9268832, was located in the class II pseudogene HLA-DRB9 (OR 1.80, CI 1.45 to 2.23, p=7.64×10−8) and showed moderate/weak LD with HLA-DRB1 alleles (figure S1 in online supplement). Serial stepwise logistic regression revealed a number of independent signals around HLA-DPB1 (best SNP rs3117213) as well as risk and protective signals in and surrounding MSH5 (best risk SNP rs3130490; best protective SNP rs409558). Interestingly, the SNPs with the best OR in this Spanish dataset were the aforementioned variants in and around the class III genes MSH5/C6orf27. The most associated of these SNPs was rs3130490 (OR 3.08, CI 2.03 to 4.66, p=1.04×10−7) (table 1, figure 2 and figure S2 in online supplement). This SNP showed strong LD with the top UK MHC SNP rs1269852 (r2=0.97). Thus, the primary MHC signal in Spanish SLE replicated that observed in the previously published UK dataset.5 In contrast to the UK data where the SNP rs3130490 showed strong LD with HLA-DRB1*03:01 (r2=0.71), the Spanish signal showed only moderate LD with HLA-DRB1*03:01 (r2=0.23), suggesting that variants in the class III region of the MHC may play a more important role than previously recognised. Conditioning on rs3130490 also revealed a number of potentially independent signals in the Spanish cohort, the best of which was the class II SNP rs3129768 located between HLA-DRB1 and HLA-DQA1 (OR 1.91, CI 1.44 to 2.53, p=7.57×10−6). This SNP showed moderate LD with HLA-DRB1*15:01 (r2=0.62).
Again this contrasts with our northern European data where one of the main secondary association signals was observed with variants in strong LD with HLA-DRB1*15:01 (r2=0.93). Further stepwise logistic regression revealed association with the previously mentioned SNPs rs3117213 (HLA-DPB1) and rs409558 (MSH5) (table 1). LD analysis revealed that the association underlying rs9268832 probably represents a composite effect of rs3130490 and rs3129768, resulting in its greater statistical significance (figure S1 in online supplement).
Analysis of HLA-DRB1 alleles alone demonstrated principal association with HLA-DRB1*03:01 (OR 1.89, CI 1.43 to 2.48, p=5.53×10−6) (table S2 in online supplement). Conditioning on HLA-DRB1*03:01 in order to assess relative predispositional effects, we found association with HLA-DRB1*15:01 (OR 1.83, CI 1.31 to 2.55; p=0.00045). Conditioning on these top two HLA-DRB1 alleles, we found association with HLA-DRB1*08:01 (OR 3.52, CI 1.55 to 8.01; p=0.0027). No other HLA-DRB1 alleles showed significant disease association following further stepwise logistic regression. Next we used the three principally associated HLA-DRB1 alleles as covariates in a serial stepwise logistic regression in the entire SNP dataset. We found that all the aforementioned SNPs showed some evidence of association independent of HLA-DRB1 alleles except rs3129768 (table S3 in online supplement). Similar results were obtained when conditioning the UK SNP data for HLA-DRB1*03:01 and HLA-DRB1*15:01 (table S4 in online supplement).
Following QC measures, 275 cases and 166 controls were put forward for analysis in the Filipino SLE cohort. The overall pattern of association showed that, of the 3704 post-QC SNPs, the major signal arises from the class II region of the MHC and therefore differs from that observed in European SLE cohorts where principal associations are seen in class II and class III (figure 1, table 1 and table S5 in online supplement). The top SNP, rs9271366, was located between HLA-DRB1 and HLA-DQA1 (OR 2.46, CI 1.83 to 3.30, p=1.97×10−9). Furthermore, HLA-DRB1 typing in a subset of this cohort showed that the most highly associated SNP, rs9271366, was a perfect proxy for HLA-DRB1*15:02 in the Filipino population (r2=1) and suggests a major role for variants on this haplotype in Filipino SLE (table S6 in online supplement). These data are consistent with the known high allele frequency of HLA-DRB1*15:02 in Filipino reference and other Pacific rim populations where the allele frequency ranges from 37% to 48%.17 18 Furthermore, the association of HLA-DRB1*15:01 is well established in East Asian SLE cohorts from Japan and Korea, while the association of HLA-DRB1*15:02 with SLE has been reported in South East Asians from Thailand.19–21
Stepwise logistic regression conditioning on the top SNP (rs9271366) revealed independent signals in the class I region of the MHC between HLA-G and HLA-A (best SNP rs2571391: OR 0.36, CI 0.22 to 0.59, p=6.06×10−5). Further stepwise logistic regression revealed additional independent signals that replicate those observed in the Spanish cohort: MSH5 (best SNP rs409558) and HLA-DPB1 (best SNP rs2071351) (table 1 and figure S3 in online supplement). Meta-analysis of Spanish and Filipino data for the MSH5 SNP rs409558 revealed a locus-wide significance level at p=1.92×10−5 (ORmeta 0.58, CImeta 0.33 to 0.83).
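As an illustration of the standard inverse variance method named in the methods, the sketch below pools per-cohort odds ratios on the log scale under a fixed-effect model. It is a generic example rather than the authors' code; the function names are hypothetical, and the per-cohort inputs in the usage lines are placeholders, not the study's estimates for rs409558.

```python
import math


def se_from_ci(lower, upper, z=1.96):
    """Standard error of log(OR) recovered from a 95% CI given on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def inverse_variance_meta(estimates):
    """Fixed-effect pooling of (odds_ratio, se_of_log_or) pairs.

    Returns the pooled OR, its 95% CI and a two-sided p value.
    """
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled_log_or = sum(
        w * math.log(odds_ratio) for w, (odds_ratio, _) in zip(weights, estimates)
    ) / total
    pooled_se = math.sqrt(1 / total)
    z = pooled_log_or / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    ci = (
        math.exp(pooled_log_or - 1.96 * pooled_se),
        math.exp(pooled_log_or + 1.96 * pooled_se),
    )
    return math.exp(pooled_log_or), ci, p_value


# Placeholder inputs for two cohorts (hypothetical values, not from the study).
cohort_a = (0.55, se_from_ci(0.38, 0.80))
cohort_b = (0.62, se_from_ci(0.40, 0.96))
pooled_or, pooled_ci, pooled_p = inverse_variance_meta([cohort_a, cohort_b])
```

Each cohort is weighted by the inverse of the variance of its log OR, so the more precisely estimated cohort contributes more to the pooled estimate.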
The most associated Filipino MHC SNP, rs9271366, which acts as a surrogate marker for HLA-DRB1*15:02 in this population, tags HLA-DRB1*15:01 in populations of European ancestry (r2=0.94 (UK controls); r2=0.77 (Spanish controls)) and, as such, shows disease association in European SLE with ORs of approximately 1.4 (table S7 in online supplement). The frequency of HLA-DRB1*15:02 is low in European populations (1–2%). The SNP rs9271366 also tags the SNPs demonstrating disease association following primary conditional analysis in the UK (r2 with rs3129868=0.98) and Spanish (r2 with rs3129768=0.77) cohorts because these SNPs (rs3129868 and rs3129768) are also in LD with HLA-DRB1*15:01 (table 1). As the effect size of the SNP rs9271366 is significantly greater in the Filipino SLE cohort than in the Europeans (Filipino OR 2.46, European OR ~1.4, Breslow-Day p=5.25×10−4, table S7 in online supplement), it is interesting to speculate that this genetic variant or variants in LD may predispose to a more severe disease phenotype such as renal involvement, as is often observed in non-European populations.22–24
Previous attempts to fine-map the SLE-associated HLA-DRB1*15:01 haplotypic signal could only delimit the region to approximately 500 kb of the MHC class II region in European-Americans using microsatellite typing.25 When the SNP rs9271366 was used as a surrogate marker for HLA-DRB1*15 haplotypes, we found that the LD surrounding this SNP varied in the different populations studied from a 375 kb region in UK SLE to a 182 kb region in Spanish SLE and to an 87 kb region encompassing HLA-DRB1 and the intergenic interval between HLA-DRB1 and HLA-DQA1 in Filipino SLE (figure 3). Hence, the transancestral mapping approach used in this study allowed refinement of the SLE-associated HLA-DRB1*15 signal.
Haplotypic analyses were performed on the top three independent SNPs from each cohort to look for evidence of interaction. The top SNP was chosen from each cohort, together with the next two independently associated SNPs which were selected following stepwise logistic regression (table 2). Despite the relatively modest cohort sizes, these analyses suggested evidence of genetic interaction (non-additive effects) in all three populations studied (table S8 in online supplement). For example, in the Spanish cohort, a multiple logistic regression model was fitted using the top three independent SNPs (rs3130490, rs3129768 and rs3117213) as explanatory variables and interaction terms were tested for. Interestingly, the best model (difference in Akaike Information Criterion=4.8, difference in Bayesian Information Criterion=6.8) had an rs3130490*rs3129768 interaction where the effect on the OR was positive (5.1 (95% CI 1.12 to 23.08), p=0.03), plus an independent additive term for rs3117213 (see table S9 and figure S4 in online supplement).
Next we examined haplotypic frequency and association using the top three independent SNPs in all three cohorts studied (table 2). We found that the haplotype harbouring the risk alleles of the top three SNPs was rare in European SLE cohorts. However, in Filipino SLE, the risk haplotype was common while the protective haplotype was rare (risk OR 3.45, CI 2.24 to 5.33, p=5.69×10−11; protective OR 0.002, CI 1×10−4 to 0.05, p=2.10×10−4). Thus, the population frequency of the top ranked risk alleles and risk haplotypes increases from the UK and Spain to the Philippines (risk haplotype frequency in cases 0, 0.004 and 0.273, respectively), suggesting that the greater disease burden in non-European SLE populations may be due in part to this phenomenon. Furthermore, the frequency of protective haplotypes in each population decreases through the same gradient (protective haplotype frequency in cases 0.525, 0.505 and 0.005, respectively).
We present the results of the first high-density transancestral mapping study of the MHC region in SLE using cohorts from the UK, Spain and the Philippines. Despite the modest sample sizes, we have identified and replicated new independent loci with evidence of interaction across this complex region, some of which appear to be SLE-specific (MSH5) while others suggest shared mechanisms across autoimmune/inflammatory diseases (HLA-DPB1, HLA-G, HLA-B/C) (box 1). In particular, we were able to demonstrate a considerable effect from MHC variants in Filipino SLE using single marker and haplotypic analyses due to the high frequency of disease-associated variants in this cohort. There are no accurate prevalence data for SLE in the Philippines and most parts of Asia, even in the most recent literature. In general, published prevalence rates for SLE in Asia are broadly similar to those observed in Europeans and range between 30 and 50 per 100 000.22 24 26 Interestingly, prevalence rates appear to be higher in Asian migrant populations.24
The primary class III risk haplotype tagged by rs1269852 and rs3130490 in Europeans displays extended LD such that it encompasses most of the MHC class III region. Stepwise logistic regression analyses demonstrated an identical protective haplotype encompassing the class III gene MSH5 alone in Filipino and Spanish SLE. These data suggest that dysregulation of MSH5 may underlie some of the risk attributable to the conserved class III risk haplotype. Further fine-mapping will be required to elucidate the nature of this signal. The class III variants that confer the greatest risk in SLE cohorts of European ancestry are rare in Filipino and Han Chinese SLE where MAF are approximately 0.001. These data suggest that either these variants are not important in SLE cohorts of south-east Asian ancestry or that different class III SNPs that are uncommon in Europeans and hence not typed in this study show association in these populations.
Two recent genome-wide association scans in SLE case-control cohorts of Chinese ancestry have shown that the most highly associated SNPs were located in the class II region of the MHC, between HLA-DRA and HLA-DQA2.2 3 We observed a similar pattern of association in the Filipino cohort under study. The most highly associated SNP in the Filipino cohort, rs9271366, is a surrogate marker for HLA-DRB1*15:02. This SNP showed the greatest overall association in the Hong Kong Chinese genome-wide association study and is ranked 8 of the top 13 MHC SNPs in the Han Chinese genome-wide association study in SLE, implicating HLA-DRB1*15:02 haplotypes in SLE susceptibility in these populations as well.
Using transancestral SNP mapping of the MHC region in SLE, we have successfully refined the established SLE-associated HLA-DRB1*15 signal to an interval encompassing HLA-DRB1 and HLA-DQA1. We have identified and replicated association in the genes MSH5 and HLA-DPB1 in Filipino and Spanish SLE cohorts, and also demonstrated association at HLA-G in Filipino SLE. These signals are independent of each other and HLA-DRB1 alleles and show some evidence of genetic interaction. These data highlight the usefulness of mapping disease susceptibility loci using a transancestral approach, particularly in a region as complex as the MHC, and offer a springboard for further fine-mapping, resequencing and transcriptomic analysis.
The authors would like to thank all study participants.
International MHC and Autoimmunity Genetics Network (IMAGEN): John D Rioux, Philippe Goyette, Timothy J Vyse, Lennart Hammarström, Michelle M A Fernando, Todd Green, Philip L De Jager, Sylvain Foisy, Joanne Wang, Paul I W de Bakker, Stephen Leslie, Gilean McVean, Leonid Padyukov, Lars Alfredsson, Vito Annese, David A Hafler, Qiang Pan-Hammarström, Ritva Matell, Stephen J Sawcer, Alastair D Compston, Bruce A C Cree, Daniel B Mirel, Mark J Daly, Tim W Behrens, Lars Klareskog, Peter K Gregersen, Jorge R Oksenberg and Stephen L Hauser.
Clinicians who provided access to SLE samples: Norberto Ortego Centeno, Department of Internal Medicine, Hospital San Cecilio, Granada; Juan Jimenez Alonso, Department of Internal Medicine, Hospital Virgen de las Nieves, Granada; Enrique de Ramon Garrido, Department of Internal Medicine, Hospital Carlos Haya, Malaga; Maria Teresa Camps Garcia, Department of Internal Medicine, Hospital Carlos Haya, Malaga; Julio Sanchez Roman, Department of Internal Medicine, Hospital Virgen del Rocio, Seville, Spain.
A full list of the investigators who contributed to the generation of the WTCCC data is available from www.wtccc.org.uk.
Funding MMAF and LB were funded through an Arthritis Research UK grant (18239). The IMAGEN consortium was supported by grant AI067152 from the National Institute of Allergy and Infectious Diseases. This study makes use of data generated by the Wellcome Trust Case-Control Consortium. Funding for the project was provided by the Wellcome Trust under awards 076113 and 085475. We acknowledge the use of DNA from the British 1958 Birth Cohort collection (D Strachan, S Ring, W McArdle and M Pembrey) funded by the Medical Research Council grant G0000934 and Wellcome Trust grant 068545/Z/02.
Competing interests None.
Ethics approval This study was approved by the London Research Ethics Committee, UK (Ref: 06/MRE02/9) and the Comité de Ética del CSIC, Granada, Spain.
Provenance and peer review Not commissioned; externally peer reviewed.