In a large collection of SLE cases and controls, we investigated the relationship of 22 risk alleles, considered individually and as cumulative genetic risk scores, with SLE susceptibility and specific SLE manifestations. It is important to understand the etiology of SLE subphenotypes, since different subphenotypes of SLE have differential morbidity and mortality, and appear likely to have different underlying etiologies as well. We believe that a clearer understanding of which genes, if any, affect each subphenotype may lead to a better understanding of SLE disease mechanisms.
We defined a genetic risk score, the GRS, as a summation of SLE risk alleles with each allele unit multiplied by the SLE OR for that allele. This is similar to the weighted “wGRS” defined by Karlson et al [11] for rheumatoid arthritis, except that we use the OR directly rather than its logarithm, so that the score is on a scale more similar to the number of risk alleles; the use of 22 risk alleles in both is coincidental. While the number of risk alleles is more intuitive and easier to visualize, the GRS has a wider range and variance and a stronger correlation with SLE susceptibility and subphenotypes. When applied to subphenotypes, the GRS may lose power due to unassociated or improperly weighted SNPs. For this reason we also modeled subphenotype-specific genetic risk scores (sub-GRS) with subsets of SNPs determined using a discovery-replication approach. While the association of these scores in our overall dataset was likely to be inflated, since a substantial subset of the data was used to determine the ranking and weighting of the composite SNPs, the odds ratios in our replication set were similar to or slightly higher than those for the SLE GRS.
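The scoring scheme described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: the SNP names, odds ratios, and genotype counts are hypothetical placeholders, and the `log_scale` option is included only to show how the log-OR weighting of the wGRS would differ.

```python
import math

def risk_allele_count(genotypes):
    """Unweighted score: total number of risk alleles carried (0, 1, or 2 per SNP)."""
    return sum(genotypes.values())

def weighted_grs(genotypes, odds_ratios, log_scale=False):
    """Weighted score: each risk-allele count multiplied by that allele's OR.

    With log_scale=True the weight is log(OR) instead, as in a wGRS-style score.
    """
    total = 0.0
    for snp, count in genotypes.items():
        weight = odds_ratios[snp]
        if log_scale:
            weight = math.log(weight)
        total += count * weight
    return total

# Hypothetical values for illustration only; not taken from the study.
example_odds_ratios = {"snp_a": 1.5, "snp_b": 1.2, "snp_c": 2.0}
example_genotypes = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

example_count = risk_allele_count(example_genotypes)
example_grs = weighted_grs(example_genotypes, example_odds_ratios)
example_log_grs = weighted_grs(example_genotypes, example_odds_ratios, log_scale=True)
```

Because every OR for a risk allele exceeds 1, the OR-weighted score stays close in magnitude to the allele count, whereas the log-OR version is compressed toward zero.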
It should be noted that many of the SLE risk alleles were discovered using subjects in our study; thus our odds ratios may be an overestimate of the actual odds ratios (“winner's curse”) resulting in over-weighting in the GRS for some SNPs. On the other hand, it is likely that many of these SNPs are not the causal variants but markers in LD. In that case, their effect sizes for SLE susceptibility and/or subphenotype associations would be underestimated, causing the GRS and/or sub-GRS scores to be underweighted and under-associated. Also, in some cases we were not able to use directly-genotyped SNPs at exactly the risk locus previously identified in the literature. Three SNPs were imputed in the SLEGEN dataset (Illumina 317K versus 550K, see Table S1), and for 6 SNPs we used a proxy. Use of proxy and/or imputed SNPs may have given us lower power to detect associations if those SNPs were not as accurate or highly associated; however we believe accuracy was assured by high thresholds for imputation inclusion (see Methods) and proxy SNP selection (r² ≥ 0.8). Also, while multiple signals have been implicated in the TNFAIP3 region [5], [17], we were only able to include one locus with a suitable match in our data. Another potential limitation of the GRS is lack of modeling interactions between SNPs. We tested for all 2×2 interactions between the 22 SNPs in our data with no results being significant after multiple-testing correction; however we may have lacked the statistical power to detect such interactions given our sample size.
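To give a sense of the multiple-testing burden, the sketch below counts the SNP pairs, reading "2×2 interactions" as pairwise interactions. The Bonferroni threshold at a nominal alpha of 0.05 is an assumption made for illustration; the text does not state which correction was applied, and the SNP names are placeholders.

```python
from itertools import combinations

n_snps = 22
snp_names = [f"snp_{i + 1}" for i in range(n_snps)]  # placeholder names

# Every unordered pair of distinct SNPs is one interaction test.
snp_pairs = list(combinations(snp_names, 2))
n_tests = len(snp_pairs)

# Assumed correction for illustration: Bonferroni at a nominal alpha of 0.05.
alpha = 0.05
bonferroni_threshold = alpha / n_tests

print(f"{n_tests} pairwise tests; per-test threshold {bonferroni_threshold:.2e}")
```

With 231 tests, each interaction would need a p-value of roughly 2 in 10,000 or smaller under this assumed correction, which illustrates why limited power is a plausible explanation for the null result.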
Our analyses used HLA-DRB1 tagging SNPs for the DRB1*0301 (DR3) and DRB1*1501 (DR2) alleles rather than direct HLA-DRB1 genotyping data. Our resulting ORs were lower than those in the literature and therefore may underestimate the GRS. We performed sensitivity analyses with a subset of our cases having 4-digit HLA-DRB1 typing (n = 716) and a subset of controls having mixed 2- and 4-digit typing (n = 1414). Removing ambiguous 2-digit types, there was 98.9% agreement of the DR3 classification (as 0/1/2 alleles) and 98.2% agreement for DR2. We were not able to assess case-control ORs using this data due to the differential typing; however we tested our DR3 associations with anti-dsDNA production and renal subphenotypes, and observed nearly identical ORs and significance compared to the tag SNPs using the same subset of subjects (data not shown).
We have shown that a subset of SLE clinical manifestations – immunological disorder including anti-dsDNA production, renal disease, age at diagnosis, hematologic disorder, and oral ulcers – are strongly associated with the number of risk alleles and the GRS. For most of these, the GRS was much more highly associated than any single locus, with the exception of renal disease and the HLA-DRB1*0301 (DR3) allele, which is stronger than the GRS signal (and equivalent to the sub-GRS as it had only a single allele). For arthritis, there was no association with the GRS, but there is evidence for a protective effect of the ITGAM locus. For other manifestations, such as malar rash and serositis, there were no significant associations with either the GRS, sub-GRS, or with single loci. This led to our categorization of SLE manifestations into those that are: a) influenced by cumulative effects of multiple known genes, b) influenced primarily by a single gene out of the currently-established risk loci, and c) thus far not appearing to be strongly influenced by genetics. Anti-nuclear antibody production was not included in this characterization as it was present in almost all SLE patients (95.9% of our subjects, Table S4); it is also possible that some associations were not evident due to lack of power for less-frequent manifestations, such as discoid rash and neurologic disorder.
Strengths of this study include the large sample size and availability of clinical data for the SLE cases. Although there are potential issues of differing clinical evaluation at different sites and comprehensive follow-up after DNA collection, we expect the standardized ACR criteria to be highly consistent; furthermore we expect that any misclassification would be random with respect to genotype and therefore bias our results towards the null. One related issue was the large number of cases lacking data for disease duration. In general, we took a conservative approach and did not include observations that did not have disease duration information when disease duration was found to be associated with subphenotypes; for a subset of analyses, we also utilized single and/or multiple imputation on the entire dataset and observed similar results.
A limitation of this and most other recent studies of SLE genetics is that it contains only subjects of European ancestry, and primarily northern European. The GRS was strongly associated with the first principal component of whole-genome SNPs, which reflects ancestry along the northwest-to-southeast European cline. This is likely to be at least somewhat if not largely due to the fact that these risk alleles have been discovered using mostly subjects of northern European ancestry, and additional risk alleles for other populations have yet to be discovered.
While the GRS was very highly associated with SLE susceptibility, the predictive capability was somewhat modest (AUC for ROC curve 68.9%). For subphenotypes associated with the GRS and sub-GRS, these scores significantly improve prediction over disease duration and gender, but the AUC for these subphenotypes is even more modest (56.0%–65.7%). For renal disease, the GRS did not improve prediction over clinical variables. It will be very interesting to see how such measures will be improved as we obtain additional information on SLE risk. In particular we anticipate that new susceptibility loci will be found as non-northern-Europeans are studied in greater detail. We also anticipate that the locations of current risk loci will be determined more precisely with regional fine mapping, re-sequencing, and functional studies.
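For readers less familiar with the AUC, it can be read as the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it that way on made-up scores; it is an illustration of the measure, not the procedure used to obtain the values reported above.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the fraction of case-control pairs in which the case scores higher.

    Tied pairs contribute 0.5, so a score unrelated to case status gives about 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical scores for illustration only; not taken from the study.
example_cases = [3, 4, 5]
example_controls = [1, 2, 4]
example_auc = auc_from_scores(example_cases, example_controls)
```

On this reading, an AUC of 68.9% means that the GRS ranks a random case above a random control about 69 times out of 100, which is well above chance but far from clean separation.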