In our previous study, we utilized a Bayesian design to probe the association of ~1,000 genes (~10,000 SNPs) with SLE on a moderate number of trios of parents and children with SLE. Two genes associated with SLE with a multitest corrected False Discovery Rate (FDR) of <0.05 were identified, and a number of noteworthy genes with FDR of <0.8 were also found, pointing out a future direction for the study. In the present report, using a large population of controls and adult- or childhood-onset SLE cases, we have extended the previous investigation to explore the SLE association of ten of these noteworthy genes (109 SNPs). We have found that seven of these genes exhibit significant (FDR < 0.05) association with SLE, both confirming some genes that have previously been found to be associated with SLE (PTPN22 and IRF5) and revealing novel genes (KLRG1, IL-16, PTPRT, TLR8 and CASP10) which have not been previously reported. The results signify that the two-step candidate pathway design is an efficient way to study the genetic foundations of complex diseases. Furthermore, the novel genes identified in this study point to new directions in both the diagnosis and the eventual treatment of this debilitating disease.
In the past 3 years, genome-wide association (GWA) studies have become extremely popular because they permit the interrogation of the entire human genome, both at levels of resolution previously unattainable and in thousands of unrelated individuals, while remaining unconstrained by prior hypotheses regarding genetic association with the disease. While an alternative to GWA studies, pedigree-based linkage analysis, has found disease susceptibility variants, these variants tend to have large relative risks. Furthermore, they have little effect on disease risk at a population level due to their rarity. This argument suggests that more common genetic variants, despite having more moderate relative risk, may be far more important in terms of public health simply because they are more common. GWA studies rely, therefore, on the “common disease, common variant” (CDCV) hypothesis, which suggests that the influences of genetics on many common diseases will be at least partly attributable to a limited number of allelic variants present in more than 1% to 5% of the population (1, 2). But there also exist examples of rare variants influencing common disease (3, 4). If multiple rare genetic variants were the primary cause of common complex disease, GWA studies would have little power to detect them, particularly if allelic heterogeneity existed. Ironically, given the recent huge financial and scientific investment in GWA, there is not a great deal of evidence in support of the CDCV hypothesis (5).
Furthermore, the GWA approach is also problematic because the massive number of statistical tests performed presents an unprecedented potential for false-positive results, leading to multiple test correction to properly control levels of statistical significance, coupled with the increased need for replication of findings (6). If performed appropriately, correction for multiple testing will render most of the findings insignificant due to the large number of tests (≥300,000, typically).
Given that the case-control samples for GWA usually number in the thousands, it might be expected that such studies are well-powered. However, several authors have shown that, given the strict genome-wide significance criteria that studies must fulfill, the power of such studies is much less than might be naively imagined (7, 8).
There is also a limit to how large population-based studies can get due to constraints such as budget, time, and the physical number of cases in the population, so there may be a further class of variants that are too rare to be captured by GWA but are not sufficiently high risk to be captured by population-based linkage (for examples, see ref. 9). Alternative approaches are needed to find these variants.
To counteract these shortcomings of GWA, we have adopted a Bayesian approach, which concentrates on a collection of candidate pathways rather than on specific candidate genes (or the whole genome). Using these pathways, we have taken advantage of the accumulated data from pre-existing association studies of adult SLE families, candidate gene investigations, information gained from genetics of mouse models of lupus and the gene expression profiling data of human SLE to identify sets of genes and regions containing genes which have a higher prior likelihood of association with SLE.
To implement this approach we have developed a set of programs which embody a combination of automated and manual approaches to maximize the power of gene association studies using prior information to select and prioritize genes, both to decrease the size of the problem, and to increase the likelihood of discovering reproducible associations (10).
Utilizing this bioinformatics-driven design, we selected ~10,000 SNPs derived from ~1000 genes on a custom-made platform to genotype a modest sample of 753 subjects corresponding to 251 childhood-onset SLE trios (SLE patient and both parents) (11). Family-based Transmission Disequilibrium Test (TDT) and multi-test correction analysis identified SELP and IRAK1 as novel SLE-associated genes with a high degree of significance after correction for multiple tests (False Discovery Rate, FDR, less than 0.05). Importantly, the original study had also identified a number of genes which, although not significant by the accepted criteria, were considered to be noteworthy for further investigation (0.05 < FDR < 0.8). We present here the results on a group of ten such genes (109 SNPs), obtained in a case-control study with a large number of subjects.
In the present study we explored the SLE association of ten promising genes, each of which showed a False Discovery Rate (FDR) of <0.8 in our previous TDT-based study. The candidate genes evaluated in the present study are BCL6, CASP10, IL-16, IRF5, KIR2DS4, KLRG1, PRL, PTPN22, PTPRT and TLR8. The present case-control study included an independent cohort of 769 childhood-onset SLE cases, 5337 adult-onset SLE cases and 5317 healthy controls, each group being composed of four ethnicities, as detailed in Supplementary Table 1.
An important component of our approach was the deliberate recruitment and usage of childhood-onset SLE cases. They present a unique subgroup of patients for genetic study because their earlier disease onset, more severe disease course, greater frequency of family history of SLE, and lesser effect of sex hormones in disease development (12, 13) imply a higher genetic load or a more penetrant expression of this genetic load. However, because childhood-onset SLE may also show the involvement of different genetic factors relative to adult-onset disease, we analyzed childhood-onset and adult-onset groups of SLE patients separately.
In order to account for any potential confounding substructure or admixture we performed principal component analyses (PCA) (14) as detailed in Patients and Methods. Excluding the outliers identified by PCA resulted in low inflation factors in all ethnicities except Hispanic Americans, with only the latter requiring additional PC correction.
As we are performing tests of multiple related hypotheses, controlling for study-wide significance is an important concern to avoid promulgating false positives due to the multiple testing. A classical correction for multiple testing is the Bonferroni correction (or similar family-wise error rate corrections). Unfortunately, it is both too strict and inappropriate in studies such as the present one because of the underlying assumption that each test is independent, whereas in actuality a complex and unknown interdependence is present among SNPs in linkage disequilibrium (11, 15). In light of this, we have instead calculated an estimate of the false discovery rate (FDR), which measures the proportion of false positives (Type I errors) we would have to accept among the results considered true discoveries (rejections of the null hypothesis), using the Benjamini and Hochberg procedure (16), considering the total number of SNPs tested and the four different ethnic groups (Supplementary Table 1). Combined p values were calculated from the per-ethnicity p values using the Fisher method. Table 1 shows that 28 SNPs from 7 genes out of the 10 tested have significant combined association with SLE in adult- or childhood-onset subgroups after correction for multiple testing. The complete data on all SNPs tested in this study are given in Supplementary Table 2. Importantly, these genes include the previously associated genes PTPN22 and IRF5 but also several novel genes that have not yet been associated with SLE. We did not find significant association with the SNPs genotyped in KIR2DS4, PRL and BCL6 in either childhood- or adult-onset SLE. With the exception of rs2476601 in PTPN22, none of the SNPs which we found to be significant code for amino acid changes; only rs11073001 in IL-16 is in an exon, but this variant does not encode a different amino acid.
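The Fisher combination of per-ethnicity p values mentioned above can be illustrated with a minimal sketch. It assumes the per-ethnicity tests are independent and uses the closed-form chi-square tail for an even number of degrees of freedom; the function name and the four example p values are hypothetical and are not taken from the study's tables.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null, where k = len(p_values).
    For even degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (X/2)^0 / 0!
    total = 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical p values for one SNP in four ethnic groups.
    per_ethnicity = [0.004, 0.03, 0.20, 0.01]
    print(fisher_combined_p(per_ethnicity))
```

With a single input the function returns that p value unchanged, which is a quick sanity check on the tail formula.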
The most significant SNP found was rs4728142 in IRF5, with a combined p value in adults on the order of 10^-29, and a corresponding FDR on the order of 10^-27.
Figure 1 shows the association of SNPs from four novel genes (KLRG1, IL-16, PTPRT and TLR8) with SLE in four ethnic groups (European Americans [EA], African Americans [AA], Asian Americans [AsA], and Hispanic Americans [HA]) in childhood- and adult-onset SLE cases. It is noteworthy that the majority of the significantly associated SNPs show significance in multiple ethnicities both in adult-onset and in childhood-onset SLE. Nevertheless, it is also important to notice cases where SLE association is strongly ethnicity-dependent. For example, the SNPs around exon 1 of TLR8 are not significant in AsA but are significant in HA both in children and adults. These graphs also show the distribution of significant SNPs in the genes. For example, the significant SNPs in IL-16 are concentrated around exon 18.
Next we performed haplotype analyses in different ethnic groups, children and adults separately (Tables 2, 4, 5 and Supplementary Tables 3-5). Supplementary Table 3 depicts the significant haplotype blocks in KLRG1, which are noticeably different in the various ethnicities. Interestingly, no significant haplotype blocks were found in adult EA. The significant haplotype blocks in IL-16 are limited to childhood-onset HA (Supplementary Table 4). As shown in Supplementary Table 5, the significant haplotype blocks in PTPRT were limited to AsA and a smaller block in childhood-onset HA.
IRF5 has a large number of significant haplotype blocks which are similar in the various ethnicities except in AA (Table 2). Comparing our results with the previously published data on IRF5 association with SLE, we found that the rs729302 SNP was reported to be associated with SLE in an EA population with a p-value of 4×10^-4 (17) or a p-value of 5.2×10^-7 (18), in a Swedish cohort with a p-value of 2.7×10^-4 (not corrected for multiple testing) (19) and in family trios (uncorrected p-value of 5.0×10^-4) (20). We confirmed these findings on an EA cohort with a multitest corrected FDR of 3.4×10^-9 in adults and 1.8×10^-8 in childhood-onset cases as shown in Table 3. Furthermore, we found significant association of this rs729302 SNP with SLE in HA adults (q value of 8.0×10^-3; Table 3) and combined children (FDR of 1.6×10^-5; Table 1), but not in AA or AsA cohorts in either adult- or childhood-onset SLE (Table 3). The previously reported association of rs4728142 in a Swedish cohort (19) and family trios (20) was confirmed by us and extended to all four ethnicities in adult-onset disease, and to all ethnicities except AA in childhood-onset disease (Table 3). We have also confirmed the involvement of rs2004640 in EA (17-19), African Americans (18), Chinese (21) and family trios (20), and in both childhood- and adult-onset SLE in each ethnicity, except for childhood-onset HA (Table 3). Association with rs752637 in Europeans was demonstrated by some previous investigators (18, 19) but not by others (17). Our studies found a strong association of this SNP with SLE in EA adults (FDR 1.4×10^-10), but not as strong in adult HA, AsA or children EA cohorts (Table 3).
We have confirmed the previous association of rs3807306 with SLE demonstrated in a European cohort (19) and in EA and AA (18) and extended this association to HA and AsA (Table 3). The association of rs3807306 in AA was not significant in our study (q value 0.09), but with an uncorrected p-value of 0.03, it does not contradict the results of a previous study (18). Also, in agreement with Sigurdsson et al. (19), we did not detect association of rs1874328 with SLE (Supplementary Table 2). Underscoring the ethnic dependence of many SNP associations, rs3807135, previously found to be SLE-associated in a family trio study (20), was found by us to be associated in adults only in EA and HA, but not in AsA or AA, with a clearly non-significant q value of 0.51 for adult-onset AA (Table 3). We have also confirmed the SLE-associated haplotype block in the same region as reported previously (17, 18), extended this block along the chromosome region, and detected its SLE association in other ethnicities (Table 2).
Table 4 shows significant haplotype blocks in TLR8 which are distributed throughout the gene, though childhood-onset EA has a haplotype with much higher significance located in the 5′UTR of TLR8. In addition, Figure 2 shows the LD between SNPs that compose the haplotype blocks of TLR8 in adult-onset AA (panel A) and childhood-onset AA (panel B) and EA (panel C). These panels illustrate the differences in LD structure which lead to distinct haplotype blocks observed in different ethnicities.
Although no SNPs found in BCL6 survived the multitest correction, the haplotype analysis indicated that a haplotype block in childhood-onset AsA is significantly associated with the disease (Table 5) underlining the utility of haplotype analysis even in the absence of singly significant SNPs. The LD structure which led to this haplotype block is depicted in Figure 3.
Using a large cohort of adult- or childhood-onset SLE cases in four different ethnicities and comparable numbers of relevant controls, we show in the present study that seven genes exhibit significant (FDR < 0.05) association with SLE, both confirming some genes that were previously found to be associated with SLE (PTPN22 and IRF5) and revealing novel genes (KLRG1, IL-16, PTPRT, TLR8 and CASP10) that were not previously reported. Furthermore, although none of the SNPs within the BCL6 gene achieved significant association with SLE after multitest correction, a haplotype block within BCL6 shows significant association with the disease as well. These genes are additional to IRAK1 and SELP, which we found to be significantly associated with SLE in our first-step study (11), and their follow-up studies are being reported separately.
The presented results demonstrate the powerful potential of using a two-step Bayesian approach utilizing up-to-date biotechnology and bioinformatic methods for discovering novel genes. This methodology yielded more novel significant results than the much more expensive GWA approach. Indeed, Iles (5) re-examined the results of 54 studies across 22 different relatively common complex diseases, most of which were GWA studies with some GWA follow-up studies. Only 45 disease-associated SNPs found initially in GWA studies could be conclusively ascertained as significant. Furthermore, for several diseases, such as Parkinson’s disease (22), bipolar disorder (23, 24) and hypertension (23), no new replicable variant has yet been found using GWA (as of February 2008). Compared to approximately 2 new genes identified per disease using GWA, our Bayesian approach yielded many more novel genes. As a case in point, several of the genes discovered using the Bayesian design were missed by the GWA studies performed to date in SLE (25-28).
GWA studies have been praised because of their unbiased nature, namely they are “unbiased by prior assumptions about the DNA alterations responsible” (29). However, it does not make much sense to ignore the whole universe of valuable information collected about the pathogenesis of a common disease which has been studied for decades by excellent investigators. The previous studies, although often not resulting in (or not even designed for) the identification of SLE-associated genes, did provide a well-documented background to reveal SLE-associated physiological pathways. Accordingly, our design took advantage of the vast literature on SLE in humans and in mouse models of lupus. The results obtained demonstrate that using prior available information as a primary guide allows one to identify novel SLE-associated genes with high confidence.
For each of these novel genes there is much biological information from which to form hypotheses as to their involvement in the genetic predisposition to SLE. Thus, the association of the KLRG1 (killer cell lectin-like receptor G1) gene (mapped at 12p12) implicates the involvement of NK cells in the genetic predisposition to SLE. KLRG1 is expressed on NK cells and on subsets of activated T cells. KLRG1-expressing NK cells show decreased proliferative activity (30). SLE patients, including childhood-onset cases, have quantitative and qualitative alterations in NK cells (31-33). The association of SLE with KLRG1 demonstrated in our studies, coupled to previous findings that first-degree relatives of SLE patients (33) and healthy monozygotic co-twins of SLE patients (34) display reduced numbers and activity of NK cells, suggests that this latter phenotype might be involved in disease causation rather than being simply a consequence of the disease process.
More recent work has shown that KLRG1 expression defines a novel and distinctive subset of CD4+ Treg cells that depend on IL-2 and express FoxP3 but only partially overlap with the CD4+ CD25+ Treg subset (35). Interestingly, the cytokine IL-16, shown to be elevated in SLE subjects (36, 37), is a natural ligand of the CD4 molecule and induces CD4 T cell anergy (38, 39). IL-16 may also induce or recruit CD4+ FoxP3+ Tregs in the tissue (40). Thus, the involvement of IL-16 in the genetic predisposition to SLE as shown here might be in the same pathway as KLRG1.
SLE is characterized by the production of autoantibodies to certain cellular macromolecules, such as the small nuclear ribonucleoprotein particles (snRNPs) (41) and by the increased expression of type I interferon (IFNA) (42, 43). Conserved RNA sequences within snRNPs can stimulate Toll-like receptors (TLRs) 7 and 8 as well as activate innate immune cells, such as plasmacytoid dendritic cells (pDCs), which respond by secreting high levels of IFNA. Possibly, SLE patients’ sera containing autoantibodies to snRNPs form immune complexes that are taken up through the Fc receptor gammaRII and efficiently stimulate pDCs to secrete IFNAs. Thus, a prototype autoantigen, the snRNP, can directly stimulate innate immunity, suggesting that autoantibodies against snRNP may initiate the autoimmune response by stimulating TLR7/8 (41). IFNA, via inducing genes such as IRF5, can exert major effects on the immune system, including inducing a Th1 response and maintaining T cell activation, while also lowering the threshold for B-cell activation and promoting B-cell survival and differentiation (44). It is likely that genetic variants that change IRF5 activity could result in a prolonged pro-inflammatory response, and/or potentially break immunological tolerance. It is therefore possible that the genetic involvement of the TLR8 gene may at least partially overlap with that of the IFNA-induced gene IRF5 in predisposing to SLE.
Importantly, IRF5 signaling has also been shown to play a role in the regulation of cell cycle and apoptosis (45), raising the possibility that susceptibility variants of IRF5 may affect SLE pathogenesis at the level of the apoptosis pathway as well (44). Indeed, the involvement of defective apoptosis in the predisposition to SLE is well documented (46, 47). The association of CASP10 with SLE demonstrated in the present manuscript may further emphasize the importance of apoptosis pathways in the genetic predisposition to SLE. The CASP10 gene locus at 2q23 is mutated in human autoimmune lymphoproliferative syndrome (ALPS) type II (48). Patients with ALPS II exhibit prominent non-malignant lymphadenopathy, hepatosplenomegaly, hypergammaglobulinemia with multiple autoantibodies, autoimmune hemolytic anemia and lymphocytosis with accumulation of normally rare CD4-/CD8- T cells (48), as in the lupus-prone MRL/lpr/lpr mice. Importantly, CASP10 is not only involved in Fas signaling but is essential for apoptosis signaling via multiple death receptors (49).
Although further studies will be necessary to prove the involvement of BCL6 in the pathogenesis of SLE, a significant haplotype block within this gene is an important first step in incriminating this transcriptional repressor. BCL6, a frequently translocated oncogene in diffuse large B-cell lymphoma, also has an important function in regulating the differentiation of B cells, T cells, and myeloid cells (50). More specifically, BCL6 is required for germinal center formation and is also a critical inhibitor of Th2 responses and inflammation (51, 52).
The association of protein tyrosine phosphatase (PTP) receptor type T (PTPRT), together with that of the previously associated PTPN22 (25, 53-55), underscores the importance of PTPs in the pathogenesis of SLE. PTPRT has been characterized as a key inhibitor of STAT-3 (56) which, in turn, mediates transcriptional activation in response to several cytokines involved in the inflammatory response, such as IL-6.
Interestingly, PTPRT is a genetic locus that was suggested to be associated with rheumatoid arthritis in three independent GWA studies (23, 57, 58). In each of these GWA studies the SNPs within PTPRT lost their significance after multitest correction, further exemplifying the problems of GWA studies.
In summary, the extensive involvement of these candidate genes in the regulation of the immune response makes their association with SLE potentially very important and justifies subsequent genetic and functional studies.
Subjects were enrolled in the Lupus Genetic Study Groups at USC and OMRF, in the PROFILE Study Group at UAB and from additional collaborators using identical protocols. All patients met the revised 1997 ACR criteria for the classification of SLE (59). Ethnicity was self-reported and verified by parental and grandparental ethnicity, when known. Blood samples were collected from each participant, and genomic DNA was isolated and stored using standard methods. Cases were defined as childhood-onset according to the criterion that the diagnosis of SLE was made before the age of 13 by at least one pediatric rheumatologist participating in the study. All protocols were approved by the Institutional Review Boards at the respective institutions.
Genotyping was performed using Illumina iSelect™ Infinium II Assays on the BeadStation™ 500GX system (Illumina, San Diego, CA). For analysis, only genotype data from SNPs with a call frequency greater than 90% in the samples tested and an Illumina GenTrain score greater than 0.7 were used. GenTrain scores measure the reliability of the SNP detection based on the distribution of genotypic classes. The average SNP call rate for all samples was 97.18%. In order to minimize sample misidentification, data from 91 SNPs that had been previously genotyped on 42.12% of the samples were used to verify sample identity. In addition, at least one sample previously genotyped was randomly placed on each Illumina Infinium BeadChip and used to track samples throughout the genotyping process.
Testing for association was completed using the freely available R package SNPassoc (60) and PLINK (61). For each SNP, missing data proportions for cases and controls, minor allele frequency and exact tests for departures from Hardy-Weinberg expectations were calculated. In addition to the allelic test of association, the additive genetic model was used as the primary hypothesis of statistical inference. Haploview version 4.0 (62) and the R package genetics (available from http://cran.r-project.org/web/packages/genetics/index.html) were used to estimate the linkage disequilibrium (LD) between markers and haplotype structures in different ethnicities.
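As an illustration of the allelic test of association named above, the following minimal sketch computes a Pearson chi-square statistic (1 degree of freedom, no continuity correction) on a 2×2 table of allele counts, together with the allelic odds ratio. It is not the SNPassoc or PLINK implementation, it does not cover the additive model used for primary inference, and the function name and example counts are hypothetical.

```python
import math


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns (chi2, p_value, odds_ratio). Counts are alleles, not subjects,
    so each genotyped individual contributes two alleles.
    """
    a, b, c, d = case_minor, case_major, control_minor, control_major
    n = a + b + c + d
    row_cases, row_controls = a + b, c + d
    col_minor, col_major = a + c, b + d
    if min(row_cases, row_controls, col_minor, col_major) == 0:
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (
        row_cases * row_controls * col_minor * col_major
    )
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical allele counts for one SNP in one ethnic group.
    chi2, p, odds = allelic_association(30, 70, 20, 80)
    print(f"chi2={chi2:.3f} p={p:.4f} OR={odds:.2f}")
```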
Combined p values were calculated from the per-ethnicity p values using the Fisher method. FDR estimates using q values were calculated for the different ethnicities using the qvalue package (available from http://cran.r-project.org), which implements the q value extension of the False Discovery Rate (FDR) (63). The FDR for the combined results was estimated using the Benjamini and Hochberg procedure (16), as the proportion of correctly rejected null hypotheses was possibly overestimated when using the q value extension, and this procedure provides a more conservative estimation of FDR (but with less power). The FDR corresponds to the proportion of false positives among the results. Thus, an estimate of FDR less than 0.05 signifies that less than 5% of the results accepted as true are expected to be false positives, and is taken as a measure of significance.
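The Benjamini and Hochberg step can be sketched as follows. This is a generic textbook version of the step-up adjustment, not the authors' script: each p value is multiplied by m/rank and monotonicity is enforced from the largest p value downward. The function name and the example p values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    For the p value of rank r (1 = smallest) among m tests the raw
    adjustment is p * m / r; a running minimum taken from the largest
    rank down keeps the adjusted values monotone and capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical combined p values for four SNPs.
    combined = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(combined, benjamini_hochberg(combined)):
        flag = "significant" if q < 0.05 else "not significant"
        print(f"p={p:.3f} FDR={q:.3f} {flag}")
```

A SNP would then be reported as significant when its adjusted value falls below the 0.05 threshold described above.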
To account for potential confounding substructure or admixture in these samples, principal component analyses (PCA) were performed (14) using a large set of SNPs (18,446, which were genotyped on these subjects as part of a larger effort). Four principal components were identified that explained a total of ~60% of the observed genetic variation. These were used to identify individuals who were genetically distant from other samples in the same ethnic subset, and thus capable of introducing admixture bias. A total of 378 controls, 569 adult-onset SLE cases and 80 childhood-onset SLE cases were identified in this fashion and removed from further analysis as detailed in Supplementary Table 1. After removing these genetic outliers, duplicates, and related samples, 5457 independent SLE cases and 4939 controls remained for analysis. We then performed genomic control analysis to calculate the inflation factor λ using the same set of SNPs. This yielded a λ of 1.13 in European American samples, 1.03 in Hispanic Americans, 1.08 in African Americans, and 1.04 in Asian Americans. Only the Hispanic subpopulation required a PCA correction to remove the final source of confounding via admixture to obtain the inflation factor given above.
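For reference, the genomic control inflation factor is commonly estimated as the median of the observed 1-degree-of-freedom association statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.4549). The sketch below assumes that standard definition; the paper does not spell out its exact computation, and the function name and example statistics are hypothetical.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Estimate lambda as median(observed chi-square) / expected median.

    A value near 1 suggests little residual stratification; values
    clearly above 1 suggest inflated test statistics.
    """
    if not chi2_stats:
        raise ValueError("at least one test statistic is required")
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


if __name__ == "__main__":
    # Hypothetical 1-df association statistics from a set of null SNPs.
    stats = [0.02, 0.15, 0.48, 0.51, 0.90, 1.70, 3.10]
    print(round(genomic_inflation_factor(stats), 3))
```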
This work was supported in part by NIH grant RO1AR445650 to COJ and ALR grant 52104 to COJ and RZ. Work at OMRF was supported by the National Institutes of Health (AI063622, RR020143, AR053483, AR049084, AI24717, AR42460, AR048940, AR445650, AR043274), the Alliance for Lupus Research, and the US Department of Veterans Affairs. The work at UAB was supported by NIH grants P01-AR49084 and P60-AR48095. SCB was supported by Ministry for Health, Welfare and Family Affairs, Republic of Korea grants A010252 and A080588.