This work is licensed under the Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 Licence. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
Systemic lupus erythematosus (SLE) is a complex autoimmune disease. Genome-wide linkage studies implicated a region containing the adhesion molecule P-Selectin. This family-based study revealed two regions of association within P-Selectin. The strongest signal, from a 21.4-kb risk haplotype, stretched from the promoter into the first two consensus repeat (CR) regions (P=8 × 10−4), with a second association from a 14.6-kb protective haplotype covering CR 2–9 (P=0.0198). The risk haplotype is tagged by the rare C allele of rs3753306, which disrupts the binding site of the trans-activating transcription factor HNF-1. One other variant (rs3917687) on the risk haplotype was significant after permutation (P10000<1 × 10−5), replicated in independent pseudo case-control analysis and was significant by meta-analysis (P=4.37 × 10−6). A third associated variant on the risk haplotype (rs3917657) replicated in 306 US SLE families and was significant in a joint UK–US SLE data set after permutation. The protective haplotype is tagged by rs6133, a non-synonymous variant in CR8 (P=9.00 × 10−4), which also shows association in the pseudo case-control analysis (P=1.09 × 10−3) and may contribute to another signal in P-Selectin. We propose that polymorphism in the upstream region may reduce expression of P-Selectin. The mechanism by which this promotes autoimmunity is unknown, although it may reduce the production of regulatory T cells.
Systemic lupus erythematosus (SLE) is a generalized autoimmune disease that behaves as a complex genetic trait. A number of groups have published genome-wide and targeted linkage analyses.1, 2, 3, 4, 5 There is considerable heterogeneity within these linkage studies, no doubt reflecting small study sizes, genetic heterogeneity and clinical heterogeneity in SLE. The most consistently mapped lupus susceptibility loci reside in the following regions: 1q23, 1q25-31, 1q41-42, 2q35-37, 4p16-15.2, 6p11-21 (MHC) and 16q12; all of which have been corroborated at least once in an independent cohort.6, 7 The 1q22-25 region, which contains SELP (P-Selectin), has been identified as part of two genome-wide linkage scans in humans.3, 4 The gene itself, alternately known as P-Selectin, CD62 or Granulocyte membrane protein (GMP140), is over 50kb long and contains 17 exons,8 most of which encode discrete domain structures, as described in the legend to Figure 1.
P-Selectin is a member of the selectin gene family, which includes L-Selectin and E-Selectin. We have already studied these two genes and found no evidence of association with SLE, although two variants in the EGF domain were associated with the levels of soluble L-Selectin.9 All three selectin genes are located in a gene cluster on chromosome 1. The selectin molecules begin with an N-terminal ‘lectin’ domain followed by an ‘EGF’ domain, a series of tandem consensus repeats similar to those in complement-binding proteins, a transmembrane domain and a cytoplasmic tail. SELP is longer than the other selectin genes because it contains nine consensus repeats (CRs) compared with six in E-Selectin and two in L-Selectin.10
We were interested in the role of P-Selectin as a candidate gene for lupus because, in addition to the linkage data, it also makes a good biological candidate for a susceptibility gene. P-Selectin is primarily expressed on the cell membrane of platelets and endothelial cells after cellular activation by molecules such as histamine or thrombin, which cause release of the protein from α-granules (platelets) or Weibel–Palade bodies (endothelial cells).11, 12, 13, 14 The protein functions as an adhesion molecule for a variety of leukocytes, including neutrophils, monocytes, T cells, eosinophils, basophils, platelets and some malignant cells,15 thereby bringing them to sites of inflammation and into contact with a range of cytokines and chemokines expressed by endothelial cells. There are two receptors for P-Selectin: the major leukocyte receptor, P-selectin glycoprotein ligand 1 (PSGL1), which is the high affinity receptor on both myeloid cells and stimulated T lymphocytes,16, 17 and CD24, which is expressed on a number of cells, including B-lineage cells and T lymphocytes.18 Furthermore, the 57 V allele of CD24 has recently been described as a susceptibility allele for lupus in a Spanish population18 and for both multiple sclerosis19 and rheumatoid arthritis.20
In addition to the genetic and biological reasons for investigating the potential role for P-Selectin, there is evidence from an animal model to support a role for the gene in autoimmune glomerulonephritis. In this model, P-Selectin-deficient mice show several defects in leukocyte function, including almost absent leukocyte rolling in mesenteric venules and delayed recruitment of neutrophils to the peritoneal cavity after laboratory-induced inflammation using intra-peritoneal thioglycollate.21
The work presented in this paper describes a comprehensive family-based association study to look for potential susceptibility alleles, in a collection of UK and US (Minnesota) SLE families.
Our genotyping strategy was to type the haplotype-tagging single nucleotide polymorphisms (SNPs) across SELP and also to include the synonymous/non-synonymous coding variants in SELP, with minor allele frequencies above 0.05 in Caucasian populations. We therefore typed a total of 52 SNPs across the SELP locus (Figure 1a). The final list of 29 markers used for association analysis is shown in Table 4, following the quality control checks described in the Materials and methods.
The results of the transmission disequilibrium test (TDT) analysis using GENEHUNTER in the UK SLE families are presented in Table 1. These data show that there are two association signals in SELP in two different regions of the gene. The first significantly associated variant, SNP 13 (rs3917687), is located in intron 1 (P=8.00 × 10–6) and the second variant, SNP 36 (rs6133), is a non-synonymous variation in the CR region of the gene (CR8) (P=9.00 × 10–4). After permutation analysis, using 10,000 permutations, both of these variants remain associated: rs3917687 highly significantly (P10000 <1 × 10–5) and rs6133 significantly (P10000=3.30 × 10−2). In our UK families, there is no linkage disequilibrium (LD) between these two variants (D′=0.039, r2=0.001).
We defined the boundaries of the haplotype blocks across P-Selectin using Haploview (Figure 1b). There are three haplotype blocks across the gene: block 1, stretching 21.4kb across the promoter, the signaling domains and into the first three CR regions; block 2, which covers 14.6kb, stretching over most of the CR; and block 3, which is an 11.2-kb region including the transmembrane domain and the 3′ flanking region. Block 1 contains a single risk haplotype, haplotype 5, which is tagged by the C allele of SNP 3 (rs3753306). Haplotype 5 also remains significantly associated after permutation analysis with 10000 permutations (P10000=3.30 × 10−2). All the other haplotypes in block 1 carry one of the under-transmitted minor alleles. The second region containing a significant single-SNP association is block 2. Haplotype block 2 contains an under-transmitted haplotype, haplotype 11 (P=0.0198). This protective haplotype is uniquely tagged by the under-transmitted T allele of SNP 36 (rs6133). The breakdown in LD between blocks 1 and 2, as shown by an inter-block D′ score of 0.21 (Figure 1b), provides further indication that there may be multiple independent association signals within P-Selectin.
To seek replication of the UK association in SELP, we genotyped selected variants in an independent collection of 306 Minnesota SLE parental-affected trios. We have earlier used this population as a replication cohort for our work on OX40L.22 We used two strategies to confirm that the Minnesota samples also made a suitable replication cohort for P-Selectin. We tested for differences in parental allele frequencies between the UK and Minnesota populations (Supplementary Table 1) (P-value >0.05), and for differences in the pattern of LD across the gene (data not shown). The results of the TDT analysis for individual SNPs revealed associations in the Minnesota samples with SNP 6 (rs3917657) (P=3.70 × 10−2, OR=0.627, 95% CI=0.403–0.976) and SNP 21 (rs6131) (P=0.0219, OR=0.732, 95% CI=0.560–0.957), presented in Table 1. The two over-transmitted alleles of these two SNPs, the C allele of SNP 6 and the G allele of SNP 21 (rs6131), are carried on the UK risk haplotype.
We wanted to carry out a joint UK–US analysis across P-Selectin. Tests for stratification were carried out on the assumption that the two populations were indeed homogeneous across P-Selectin: when employing Pearson's χ2-test, we did not reject this assumption with respect to the transmission ratios (P>0.001) (Supplementary Table 2) and using the Breslow–Day test, we did not reject the assumption with respect to the odds ratios (P>0.001) (Supplementary Table 3). These results gave us the confidence to carry out a joint analysis of the two populations. The results of this joint analysis, presented in Table 2, show a stronger association for SNP 3 (rs3753306), SNP 6 (rs3917657) and SNP 21 (rs6131) in the joint UK–US data set, compared with either of the two independent UK or US data sets. The risk alleles of all three variants are carried on the risk haplotype defined in the UK SLE families (Figure 1b). Furthermore, the association with SNP 6 remains significant after permutation analysis with 10000 permutations (P=7.20 × 10−3). Haplotype–TDT analysis in the joint data set adds further evidence for the over-transmission of a SNP 3(C)-SNP 6(C)-SNP 21(G) haplotype (P=0.0239). These alleles are all carried on the more densely mapped UK SLE risk haplotype. An under-transmitted T-T-A haplotype was also seen in the combined data set. This SNP 3(T)-SNP 6(T)-SNP 21(A) haplotype (P=3.10 × 10−3) is carried on UK SLE haplotypes 6 and 7, both of which show a trend for under-transmission (P<0.05). These data serve to confirm the importance of the promoter region of P-Selectin in the association signal.
To extend our analysis, we used imputation to test un-genotyped variants within the chromosomal ranges of our existing UK genotyping. We then sought to replicate the associations observed in our UK trio collection by carrying out an imputation-based ‘pseudo case-control’ analysis with a set of 270 cases independent from the 263 UK SLE families used for our TDT association study. Figure 2a shows a plot of –log (P-value) for all the SNPs used in the pseudo case-control association study against chromosomal position. The P-values for the variants with the strongest association are presented in Table 3. These data reveal independent replications of the association seen by TDT analysis in our UK families for two variants within the bounds of the upstream risk haplotype: SNP 12 (rs3917683) (P-value=0.0270) and SNP 13 (rs3917687) (P-value=0.0330), with a third borderline association for SNP 21 (rs6131) (P-value=0.053), which also lies on the risk haplotype. Interestingly, when we repeated the imputation using both the 263 cases from the trios and the additional 270 independent cases and then repeated the pseudo case-control analysis, there was a highly significant association for SNP 13 (rs3917687) (P=2.52 × 10–5) and a significant association for the non-synonymous variant SNP 36 (rs6133) (P=1.09 × 10−3). The imputed SNPs in the independent cases data were not significant, whereas several SNPs showed marginal levels of significance in the combined data set (Figure 2b).
To maximize the numbers of samples for the association analysis, we combined the P-values for SNPs on the UK risk haplotype, which showed association (SNP 13—rs3917687 and SNP 12—rs3917683) or a trend for association (SNP 21—rs6131) in the pseudo case-control analysis and/or showed association in either the 263 UK parental-proband trios (rs3917687) or the 306 Minnesota trios (rs6131) and in the imputation analysis. These results revealed a highly significant P-value for SNP 13 (P=4.37 × 10–6) and a significant association for SNP 21 (P=9.56 × 10−3).
P-Selectin is located in the SLE linkage region on human chromosome 1 (1q23). Linkage studies in humans and in animal models of lupus have suggested an important role for the gene in SLE. The current family-based association testing of this candidate gene has provided significant evidence for the association to SLE in two independent populations. Specifically, we are confident that we have discovered a risk haplotype stretching across the promoter of P-Selectin, and we have additional evidence to support a second signal in the CR region of the gene.
This is the first report of an association for P-Selectin in the upstream region of the gene. In our 263 UK SLE families, a single risk haplotype, haplotype 5, is responsible for the signal from the upstream region of SELP (haplotype block 1 in Figure 1b). This haplotype remains significant following permutation analysis (P10000=0.0330). No other haplotypes in this region show association. The presence of this single associated haplotype in block 1 may be because of the pattern of association for the associated SNPs within the block, namely the allele frequencies of the over-transmitted alleles. In this respect, SNP 3 (rs3753306) is unique, in that it has an over-transmitted rare allele carried on the risk haplotype. For all the remaining associated polymorphisms in this region, SNPs 6, 8, 11 and 13, it is the common allele that is the risk allele. The higher frequency of these major risk alleles means that they are present on multiple different haplotypes. In these situations, pinpointing the ‘etiological haplotype’ will require a greater density of typing to allow discovery of possible recombination events. However, we believe that there are associated variants in the upstream region of P-Selectin because we have replicated the association for two alleles that are carried on the risk haplotype (SNPs 6 and 21) in the Minnesota SLE families. Furthermore, we showed not only that the association for SNP 6 is stronger in the joint UK–US data set than in the individual populations, but also that the association is still significant following permutation analysis (P10000=7.20 × 10−3) (Table 2).
To determine whether we could localize the causal alleles for association to the upstream region of P-Selectin, we carried out imputation analysis for variants across P-Selectin in an independent collection of 270 UK SLE cases. The strongest result was the independent replication of SNP 13 (rs3917687) (P=0.0339) (Table 3). We confirmed that it was the same (C) allele of this variant that was both under-represented in SLE cases (Table 3) and under-transmitted in the 263 UK trios (Table 1). The case for an association in the upstream region of P-Selectin was further strengthened by the finding that the same variant, SNP 13, was highly significantly associated following pseudo case-control analysis, using both the cases from the 263 trios and the 270 independent UK cases (Figure 2b) (P=2.52 × 10–5). Furthermore, we have also shown that the association from SNP 13 remained highly significant following permutation analysis in our 263 UK trios (P10000 <1 × 10–5).
Having established that variants in the upstream region of P-Selectin may contribute to disease susceptibility, we went on to look for the potential functional significance of any of the associated SNPs on the risk haplotype. We discovered that the risk C allele of rs3753306 disrupts the binding site for HNF-1, a trans-activating transcription factor, expression of which is predominantly restricted to the liver and kidney. It is possible that this loss-of-function risk allele in the promoter may reduce production of P-Selectin, thereby reducing recruitment of pro-inflammatory leukocytes. However, in a mouse model, loss of P-Selectin in MRLfas mice by a gene-targeted deletion leads to increased glomerulonephritis.23 In another mouse model of glomerulonephritis, P-Selectin-deficient mice show increased proteinuria, glomerular damage and mortality.24 Although these animal models have the potential to provide a link between decreased production of P-Selectin and increased risk of renal disease, there may be a complication in the interpretation of the results. This complication was first identified by Bygrave et al., who discovered that the 129-derived chromosome 1 segment used for gene-targeted deletions may itself produce symptoms of autoimmunity. Therefore, the observed increase in renal disease noted in the ‘P-Selectin-deficient’ mice may be the result of the effect of a 129-derived chromosome 1 segment rather than a primary effect from P-Selectin deficiency.25 However, in human renal disease, a reduction of P-Selectin transcription has been reported.
No P-Selectin mRNA transcript was detected in the glomerulus of renal cases, despite an increased expression of glomerular P-Selectin.26 The increase in protein expression may reflect the release of stored P-Selectin from storage granules rather than increased gene expression, as P-Selectin is constitutively produced in the secretory granules of platelets (α-granules) and endothelial cells (Weibel–Palade bodies), but is not expressed on the cell surface until an appropriate stimulus is provided.14
Given the evidence presented above, it is intriguing to hypothesize that variants in P-Selectin may have a role in renal lupus, but we do not have sufficient power within the UK SLE cohort used for this paper to assess whether the association in P-Selectin is stronger for UK SLE families in which the affected offspring has renal disease.
An alternative functional consequence of reduced expression of P-Selectin is a potential reduction in the numbers of regulatory T cells. A recent publication by Urzainqui et al.27 reported that the engagement of P-Selectin ligand on dendritic cells with P-Selectin may increase the ability of the dendritic cells to generate CD4+ CD25+ Foxp3+ regulatory T cells. With less P-Selectin available, fewer regulatory T cells may be produced, which may in turn lead to an increased incidence of autoimmune disease.
However, SNP 3 (rs3753306) is not the only susceptibility allele in the upstream region of P-Selectin because although this variant is located <1kb upstream from the 5′ UTR, it is carried on a risk haplotype stretching into the CR regions. There are several other protein domains within the bounds of this haplotype, which may contribute to disease pathogenesis. The region between SNP 3 and SNP 6 (found <1kb downstream of the 5′ UTR) contains the proximal promoter, a 249-bp section of upstream sequence, which has been shown to promote a high level of gene expression in cultured endothelial cells.28 The risk haplotype also includes the lectin and EGF domains, which are necessary for P-Selectin to bind to its ligand PSGL.29, 30 Changes in the affinity of the binding between P-Selectin and its ligand will affect the recruitment of leukocytes to the sites of inflammation. Although the two SNPs we typed in these domains failed the quality control, it is probable that there are other unknown variants, which may be important in disease etiology. All these data would suggest that either SNP 3, SNP 6 and/or another variant(s) in LD with them play an important role in increasing the risk of SLE. Nevertheless, to identify the exact causal variants in P-Selectin, it will be necessary to undertake extensive re-sequencing of the 21.4-kb risk haplotype.
Establishing a convincing association within the CR region of the gene proved to be more challenging. As there were many variants in the coding region of the gene, around the nine CRs, it was an attractive hypothesis that the core association came from these sequences. However, as shown in Table 4, many of these variants failed to generate reliable assays or were of low minor allele frequency. This may be because of the high level of sequence similarity between the separate CRs, which affects genotyping efficiency. Although our study is the first to report an association for P-Selectin arising from the upstream region of the gene, there has been one earlier report of an association in SELP with lupus from an SNP in the coding region.31 The variant in question is SNP 38 (rs3917815), a Ser/Asn non-synonymous polymorphism located in E12. We genotyped SNP 38 as part of our current study, but the variant proved to be monomorphic, and therefore was excluded from our analyses. These data agree with the genotype data on the CEU samples from the HAPMAP database. Nevertheless, in the same exon as the SNP described by Jacob, we find significant association from a Leu/Val non-synonymous variant in CR 8 (SNP 36, rs6133). The associated allele was the under-transmitted T (Val) allele, which tags the protective haplotype. We went on to show that the association from rs6133 remains significant after permutation analysis in our 263 UK families (P10000=0.0250) (Table 2). The consensus repeat domains are thought to be important, but not essential, in binding P-Selectin to its ligand on leukocytes. It is possible that the over-transmitted G (Leu) allele of SNP 36 results in stronger ligand binding and, if so, may contribute to an increase in inflammatory activity resulting from greater recruitment of leukocytes to sites of inflammation.
We also know that the same G (Leu) allele of this variant has earlier been reported as being associated with thromboembolic stroke32 and atopy.33 Because, in our UK SLE families, the T allele of SNP 36 (rs6133) is unique to the under-transmitted haplotype in block 2 (Figure 1b) and we observed no LD with the risk haplotype in block 1, we think that rs6133 may tag a second signal within P-Selectin.
The collection of the UK Caucasian SLE families consisted of a total of 263 complete trios and 270 independent UK families. All probands conformed to the American College of Rheumatology criteria for SLE,34 with a diagnosis of SLE being established by telephone interview, health questionnaire and details from clinical notes. Written consent was obtained from all participants, including relatives. In the United Kingdom, ethical approval was obtained from the Multi-Centre Research Ethics Committee. Genomic DNA from the UK samples was isolated from anti-coagulated whole blood by a standard phenol–chloroform extraction.
Corroboration of the UK association study in SELP was investigated using a total of 306 US parental-affected trios, taken from the Minnesota SLE collection. These studies were approved by the Human Subject Institutional Review Boards at the University of Minnesota, and informed consent was obtained from all the patients.
In this study a total of 52 markers were selected from across P-Selectin, taken from the public databases dbSNP (http://www.ncbi.nlm.nih.gov/SNP/index.html) and HAPMAP (http://www.hapmap.org/). A single SNP (A1969G) was identified by re-sequencing 60 UK SLE probands. Preliminary genotyping was carried out in 263 UK trios, with markers being excluded from the analysis if they showed a genotyping success of <75%, had >5% of families with Mendelian errors as identified by PEDCHECK and/or had a Hardy–Weinberg P-value in the parental samples of <0.001 (Table 4). Three markers were removed because of low genotyping frequency, ten markers were monomorphic, six markers had low minor allele frequency and three other markers were removed because they had a Hardy–Weinberg (HWE) P-value of <0.001 in the founder chromosomes. A further marker failed to generate a viable assay with Sequenom methodology. In the Minnesota samples, we sought replication for several of the variants from our UK families. None of the seven markers typed in the Minnesota samples failed the quality control.
The genotyping in the UK samples was carried out using MALDI-TOF mass spectrometry (Sequenom, San Diego, CA, USA)35 and analysis of the raw genotype data was carried out using the MassArray Typer v3.4 software (Sequenom). After visual inspection of the clusters, manual adjustments were made for some of the assays by the Sequenom iPlex system (Sequenom, San Diego, CA, USA).
All sample genotype and phenotype data were managed by, and analysis files were generated with BC/SNPmax and BC/CLIN software (Biocomputing Platforms Ltd, Espoo, Finland). A comparison of the parental allele frequencies between the UK and US collections was made using χ2 analysis in a 2 × 2 contingency table, with the level of significance being presented as a P-value with 1 d.f. Pearson's test was used to compare the ratio of transmitted to untransmitted (T:U) families for each marker and the Breslow–Day test in PLINK was used to test homogeneity of odds ratios between the two populations.
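As an illustration of the 2 × 2 comparison of parental allele frequencies described above, the following is a minimal sketch of Pearson's χ2 statistic with 1 d.f. It is not the software used in the study; the function name and the allele counts are hypothetical, and no continuity correction is applied.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are assumed to be the two cohorts, columns the two alleles.
    Returns the statistic and its P-value on 1 d.f.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical parental allele counts (minor, major) in two cohorts.
stat, p = chi2_2x2(120, 880, 150, 1070)
print(f"chi2={stat:.3f}, P={p:.3f}")
```

A P-value above the chosen threshold would be read, as in the text, as no evidence of a difference in allele frequency between the two collections.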
Association of alleles to SLE for both individual and multiple SNPs was tested by the TDT, which compares the observed and expected transmission of alleles from heterozygous parents with affected offspring. This analysis was carried out using PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/).
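The single-SNP TDT can be sketched as follows. This is a minimal illustration of the standard McNemar-type statistic, not the PLINK implementation; the function names and the counts are hypothetical. The odds ratio is taken as the ratio of transmitted to untransmitted counts, with an approximate confidence interval computed on the log scale, which is an assumption about how such intervals are commonly obtained.

```python
import math

def tdt(transmitted, untransmitted):
    """TDT for one allele, using counts from heterozygous parents only.

    transmitted / untransmitted: number of times heterozygous parents did
    or did not pass the allele to an affected child.
    """
    t, u = transmitted, untransmitted
    if t + u == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (t - u) ** 2 / (t + u)               # McNemar-type statistic, 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival, 1 d.f.
    return chi2, p_value

def tdt_odds_ratio(t, u, z=1.96):
    """T:U ratio with an approximate 95% CI (normal approximation on log scale)."""
    if t <= 0 or u <= 0:
        raise ValueError("both counts must be positive")
    odds = t / u
    se = math.sqrt(1.0 / t + 1.0 / u)
    return odds, odds * math.exp(-z * se), odds * math.exp(z * se)

# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
print(tdt(60, 40))
print(tdt_odds_ratio(60, 40))
```

An odds ratio below 1, as reported for some SNPs in Table 1, corresponds to an allele that is transmitted less often than expected.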
Haplotype–TDT was carried out using Haploview. For SNPs in each gene, haplotype patterns were generated using haplotype block definitions on the basis of the solid-spine algorithm, described on the Haploview website, with an LD cut-off at D′=0.6. This program constructs haplotypes on the basis of the D′ measure of LD,36 together with a logarithm of the odds (LOD) score as a measure of significance and 95% confidence intervals to state the accuracy of the P-value. The pairwise LD for SNPs across each gene was confirmed by the r2 values.37, 38 Only markers having a minor allele frequency of >5% were included in the haplotype constructions and haplotypes with a frequency of >3.0% were included in the LD diagrams.
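For reference, the two pairwise LD measures used above can be computed from haplotype and allele frequencies as in the sketch below. This illustrates the textbook definitions of D′ and r2 only; it does not reproduce Haploview's haplotype estimation or block definition, and the function name and frequencies are hypothetical.

```python
def ld_from_haplotypes(p_ab, p_a, p_b):
    """Return D, D' and r^2 for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b
    # D' scales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Hypothetical example: a rare allele found only on haplotypes carrying B.
print(ld_from_haplotypes(0.1, 0.1, 0.5))
```

The example shows why both measures are reported: a rare allele can give D′=1 while r2 stays low, so D′ alone can overstate how well one SNP predicts another.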
Imputation was carried out on the cases and pseudo-controls using IMPUTE, by the method described by Marchini et al.39 We imputed all SNPs in the HAPMAP within the range of our data plus 20kb either side—giving us 141 SNPs (110 imputed) for our case-control analysis.
The output from IMPUTE gives probabilities for each genotype, rather than point estimates. The use of probabilities allowed us to account for the uncertainty in imputation within the case-control analysis using SNPTEST.
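To make the distinction between genotype probabilities and point estimates concrete, the sketch below converts one triple of probabilities into an expected allele dosage and, alternatively, into a thresholded best-guess genotype. This is only one common way of summarising imputed genotypes and is not a description of the internal calculations of SNPTEST; the function names, the 0.9 threshold and the tolerance are assumptions.

```python
def expected_dosage(p_aa, p_ab, p_bb, tol=0.01):
    """Expected count of the B allele given probabilities for AA, AB and BB."""
    probs = (p_aa, p_ab, p_bb)
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > tol:
        raise ValueError("probabilities must be non-negative and sum to 1")
    return p_ab + 2.0 * p_bb

def best_guess(p_aa, p_ab, p_bb, threshold=0.9):
    """Most likely genotype (0, 1 or 2 copies of B), or None if too uncertain."""
    probs = (p_aa, p_ab, p_bb)
    top = max(range(3), key=lambda i: probs[i])
    return top if probs[top] >= threshold else None

# Hypothetical imputed genotype: probably BB, with some weight on AB.
print(expected_dosage(0.0, 0.2, 0.8))   # dosage between 1 and 2
print(best_guess(0.0, 0.2, 0.8))        # None: no genotype reaches the threshold
```

The dosage keeps the uncertainty that a hard call discards, which is the motivation given in the text for working with probabilities.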
For each affected child in the study, we created a matched control using the untransmitted alleles from their parents. As phasing the data was possible, we could have matched each case to all haplotypes that could have been passed on from the parents. However, this would not have been possible for the imputed SNPs, therefore we only matched to genotypes made up from the untransmitted alleles.
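The construction of a pseudo-control genotype from untransmitted alleles can be sketched for a single biallelic SNP as follows. The sketch assumes genotypes are coded as counts of one allele (0, 1 or 2), in which case the untransmitted count is the parental total minus the child's count; the function name and coding are assumptions, not the authors' code.

```python
def pseudo_control(father, mother, child):
    """Allele count (0, 1 or 2) of the pseudo-control for one SNP.

    Each argument is the number of copies of the counted allele. The two
    parents carry father + mother copies in total, the child received
    `child` of them, and the remainder are the untransmitted alleles.
    """
    for g in (father, mother, child):
        if g not in (0, 1, 2):
            raise ValueError("genotypes must be coded as 0, 1 or 2")
    # A homozygous carrier must transmit the allele; a non-carrier cannot.
    lowest = (father == 2) + (mother == 2)
    highest = (father >= 1) + (mother >= 1)
    if not lowest <= child <= highest:
        raise ValueError("Mendelian inconsistency in trio")
    return father + mother - child

# Two heterozygous parents and a homozygous child: the pseudo-control
# receives the two alleles that were not passed on.
print(pseudo_control(1, 1, 2))  # 0
```

Because only allele counts are used, no phasing is needed, which matches the reason given in the text for matching on genotypes rather than haplotypes.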
The data were analyzed using classical methods, which returned P-values. Logistic regression additive models were fitted with SLE as the outcome.
We carried out a meta-analysis by combining the P-values from the United Kingdom and Minnesota parental-affected trios and the UK-independent pseudo case-control study using Fisher's method.40
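Fisher's method combines k independent P-values through the statistic −2 Σ ln(P), which follows a χ2 distribution with 2k d.f. under the null hypothesis. The sketch below is a minimal standard-library illustration; the function name and the input P-values are hypothetical and are not the values from this study.

```python
import math

def fisher_combined(p_values):
    """Combine independent P-values with Fisher's method.

    Returns the chi-square statistic (2k d.f.) and the combined P-value.
    """
    k = len(p_values)
    if k == 0 or any(not 0 < p <= 1 for p in p_values):
        raise ValueError("need at least one P-value, each in (0, 1]")
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # For even d.f. 2k the chi-square survival function has a closed form:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total

# Hypothetical P-values from three independent analyses of one SNP.
stat, p = fisher_combined([0.03, 0.05, 0.20])
print(f"chi2={stat:.2f} on 6 d.f., combined P={p:.4f}")
```

The method assumes that the contributing P-values come from independent samples, so overlapping sample sets would need a different approach.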
The authors declare no conflict of interest.
This work was funded through a Senior Fellowship Award from the Wellcome Trust to Timothy J Vyse and we are grateful for the support from the NIHR Biomedical Research Centre funding scheme at Imperial College. We acknowledge the work of Andrew Wong in recruiting patients and families into the study and we would like to thank our clinical colleagues for helping us recruit study participants. Our thanks and appreciation are extended to all the patients and their relatives for generously donating blood samples and to all the general practitioners and practice nurses for collecting them.