Multiple sclerosis is a debilitating neuroimmunological and neurodegenerative disease affecting more than 400,000 individuals in the United States. Population and family-based studies have suggested that there is a strong genetic component. Numerous genomic linkage screens have identified regions of interest for MS loci. Our own second-generation genome-wide linkage study identified a handful of non-MHC regions with suggestive linkage. Several of these regions were further examined using single-nucleotide polymorphisms (SNPs) with average spacing between SNPs of approximately 1.0 Mb in a dataset of 173 multiplex families. The results of that study provided further evidence for the involvement of the chromosome 1q43 region. This region is of particular interest given linkage evidence in studies of other autoimmune and inflammatory diseases including rheumatoid arthritis and systemic lupus erythematosus. In this follow-up study, we saturated the region with ~700 SNPs (average spacing of approximately 10 kb) in search of disease-associated variation. We found preliminary evidence to suggest that common variation within the RGS7 locus may be involved in disease susceptibility.
Multiple sclerosis (MS [MIM 126200]) is a neurodegenerative autoimmune disease characterized by demyelination within the central nervous system (CNS). Demyelination and the resulting formation of scar tissue in the CNS impair the saltatory conduction along axons that is necessary for normal functioning of nerve impulses. Though little is known about the underlying etiology of the disease, MS is a heterogeneous disorder with several characteristics common to autoimmune disorders, including polygenic inheritance, evidence of environmental exposure, and partial susceptibility conferred by a human leukocyte antigen (HLA)-associated gene (Barcellos, et al. 2002). There is also a greater prevalence of the disease in women than in men. This inflammatory disorder results from an autoimmune response directed against CNS antigens, particularly myelin proteins. Class II major histocompatibility complex (MHC) molecules, such as HLA, normally function to bind and present peptide antigens to antigen-specific T cells. It is thought that the dysregulation of this process in MS results in damage to the myelin sheath, producing the pathophysiological phenotype seen in the disease. Despite the degenerative nature of the disease, an affected individual’s lifespan is rarely shortened.
The clinical heterogeneity and complex etiology of MS have been confounding factors for genetic studies of the disease. Yet despite these complexities, it is clear that genes play a vital role in the susceptibility to MS. Numerous research groups, including our own, have conducted genomic linkage screens for MS in an attempt to identify regions that harbor disease loci (Akesson, et al. 2002, Ban, et al. 2002, Broadley, et al. 2001, Coraddu, et al. 2001, Dyment, et al. 2004, Ebers, et al. 1996, Eraksoy, et al. 2003, Haines, et al. 1996, Hensiek, et al. 2003, Kenealy, et al. 2004, Kuokkanen, et al. 1997, Sawcer, et al. 1996, Sawcer, et al. 2005). Over 70 genomic regions have been investigated, revealing varying levels of support. However, the lack of replication of results from these studies has been problematic for using linkage to detect loci involved in MS.
The strongest and most consistent finding for linkage to MS is chromosome 6p21.3, the location of the MHC containing HLA. Until recently, the MHC was the only region clearly and consistently demonstrating linkage and association with MS (Haines, et al. 1996, Sawcer, et al. 1996, Sawcer, et al. 2005, Yaouanq, et al. 1997). However, the MHC has been estimated to account for only 10–50% of the genetic component of MS susceptibility, at least in the Caucasian populations of northern European descent (Dyment, et al. 2004, Haines, et al. 1998). It appears that the association with the HLA-DRB1*1501 allele explains this linkage signal (Barcellos, et al. 2002, Haines, et al. 1998), although this issue has been debated (Ligers, et al. 2001). The exact mechanism by which a gene or genes in the MHC increase disease risk has yet to be determined, although recent work has sought to provide the tools needed to explore this region in greater detail (de Bakker, et al. 2006, Horton, et al. 2008). Recent work by our group has identified a strongly associated single-nucleotide polymorphism (SNP) (rs6897932) in the interleukin-7 receptor α gene (IL7RA), a gene located on chromosome 5p13. This SNP introduces a coding change (T244I) that alters the ratio of soluble to bound protein (Gregory, et al. 2007). This result was further replicated in a genome-wide association study (GWAS) performed by a broader collaborative group of investigators (The International Multiple Sclerosis Genetics Consortium (IMSGC)). The GWAS also identified additional replicated associations with variations in other genes (including CD25/IL2RA and CD58) (International Multiple Sclerosis Genetics Consortium, et al. 2007). The involvement of these genes plus several others (e.g. CLEC16A) is now being tested and confirmed in multiple international studies (International Multiple Sclerosis Genetics Consortium (IMSGC). 2008, Rubio, et al. 2008, Weber, et al. 2008).
Although genome-wide association studies have now begun, dense SNP follow-up studies of narrowed linkage regions are still underway because of their potential for identifying additional disease susceptibility genes. The American-French Multiple Sclerosis Genetics Group published one of the largest MS genomic linkage screens conducted (Kenealy, et al. 2004). Follow-up of candidate regions identified in this study entailed genotyping SNPs at approximately 1.0 Mb intervals flanking 10 Mb on each side of peak screen SNPs in an expanded U.S. dataset (Kenealy, et al. 2006). The 1q43 region not only continued to demonstrate evidence for linkage in the follow-up analysis (non-parametric multipoint LOD = 2.99), but also generated a narrowed linkage interval when using the ordered subset analysis (OSA) method to condition on the other loci (Ghosh, et al. 2000, Hauser and Boehnke. 1998, Hauser, et al. 2001, Kenealy, et al. 2006). This narrowed interval is ~7.0 Mb for a LOD score cut-off of 1.8 (corresponding to a –2.0 LOD score confidence interval from the 3.8 LOD score HLA-based peak) (see Figure 1). Chromosome 1q43 was selected for further detailed investigation based on several additional lines of evidence. SNPs in the 1q43 region have also demonstrated suggestive linkage and/or association in several other MS screens conducted in a variety of study populations (Ban, et al. 2002, Broadley, et al. 2001, Goedde, et al. 2002, Laaksonen, et al. 2003, Sawcer, et al. 2002). None of these earlier studies, which lack the power and marker density of the current generation of genome-wide association studies, demonstrate significant evidence for either linkage or association. However, they do provide support from independent sample populations of an effect within a common region of interest. 
Another compelling piece of evidence for 1q is linkage to this region in studies of the autoimmune diseases rheumatoid arthritis and systemic lupus erythematosus, suggesting the potential presence of a gene common to autoimmune processes in this region (Gaffney, et al. 1998, Gaffney, et al. 2000, Jawaheer, et al. 2001, Jawaheer, et al. 2003, MacKay, et al. 2002, Shai, et al. 1999). Following these lines of evidence, we set out to test whether common variation within this region is involved in susceptibility to multiple sclerosis.
After quality control procedures, 578 SNPs were included in our initial analysis. Using the Tagger implementation in the Haploview program, we selected 268 of the 578 SNPs, with pairwise r² < 0.2, for inclusion in a multipoint linkage analysis of the narrow interval. These results continue to demonstrate considerable evidence for linkage to this region (peak LOD score of 3.42 at ~238 Mb) (Figure 2a). As in the initial follow-up (Kenealy, et al. 2006), we examined this narrowed interval using the ordered subset analysis (OSA) method to condition on the same loci (namely HLA-DRB1*1501 and Chr2) in an attempt to further resolve this effect (Figure 2a). The goal of this analysis was to determine whether other previously suggested linkage regions within our initial dataset have a significant effect on the 1q43 region, such that we might be able to identify specific subsets of our sample that account for the effect within this region. Neither of these conditional analyses demonstrated significant differences in LOD scores from the overall multipoint analysis, suggesting that neither of these two loci can be used to predict which families will demonstrate strong evidence for a genetic effect within the 1q43 region.
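The idea behind thinning markers to a low-r² subset can be illustrated with a simplified greedy sketch. This is not the Tagger algorithm as implemented in Haploview; it assumes genotypes coded as 0/1/2 minor-allele counts and uses the squared Pearson correlation of those counts as a stand-in for r². The function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic marker carries no correlation information.
        return 0.0
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def greedy_prune(genotypes: List[Sequence[float]], threshold: float = 0.2) -> List[int]:
    """Keep a marker only if its r^2 with every already-kept marker is below threshold."""
    kept: List[int] = []
    for i, g in enumerate(genotypes):
        if all(r_squared(g, genotypes[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Toy data (hypothetical): marker 1 duplicates marker 0, marker 2 is uncorrelated.
toy = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],
    [0, 2, 1, 1, 2, 0],
]
print(greedy_prune(toy, threshold=0.2))
```

A greedy pass of this kind depends on marker order; dedicated tagging software uses more careful selection criteria, so the sketch should be read only as an illustration of the r² threshold.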
Finally, as recent genome-wide association studies have found genetic overlap (by way of identifying common loci, which are definitively associated with multiple autoimmune-related disorders) within autoimmune disorders, it is feasible that a gene (or genes) within this region of chromosome 1q43 is affecting multiple autoimmune phenotypes (Maier and Hafler. 2008). Given the evidence for linkage in other autoimmune diseases, we split our dataset into a) families with evidence of additional autoimmune disease (n=100); b) families with no evidence of additional autoimmune diseases (n=53); note there were 20 families for which a decision could not be made given our available information. While the lod scores are generally higher for the families with evidence for multiple autoimmune diseases, this difference is not significant (Figure 2b).
Subsequent to the multipoint linkage analysis, we conducted tests of association across the 578 SNPs. Twelve SNPs demonstrated nominally significant p-values (p ≤ 0.05) (Table 1). We then chose an additional set of 51 SNPs to be genotyped (using Taqman) in our extended dataset of 831 multiplex and trio families. These SNPs provide a higher density across regions demonstrating evidence for either linkage (based on two-point LOD scores) or association. Analysis of these additional SNPs pointed to a region of interest within the RGS7 gene. Two adjacent SNPs within the follow-up demonstrated the highest two-point LOD score (rs4660010, LOD = 3.24) and the most significant association (rs261809, PDT p = 0.01), respectively. While none of these results survive any sort of multiple comparisons correction, they suggest the need for further examination. We then examined the data from the recent IMSGC genome-wide association study to determine if there was any evidence for association across this region (International Multiple Sclerosis Genetics Consortium, et al. 2007). Several of these SNPs demonstrate moderately significant p-values within this same region (Table 2). It should be noted that 301 trios in our follow-up dataset (n = 831 families) overlap with the 931 trios within the GWAS. While the p-values listed in Table 2 represent independent results, the single exception is rs1380304, for which data are presented for our follow-up, the non-overlapping families in the GWAS, and the GWAS result itself. These preliminary findings further suggest that there may be a common variant within this region that is involved in MS.
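The statement about multiple comparisons can be made concrete with a short calculation. The sketch below uses a Bonferroni correction purely as an example; the text does not specify which correction was considered, and treating the SNPs as independent tests is an assumption that is conservative for markers in linkage disequilibrium. The function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def survives_correction(p_value: float, alpha: float, n_tests: int) -> bool:
    """True if p_value is at or below the Bonferroni-adjusted threshold."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


ALPHA = 0.05
BEST_P = 0.01  # most significant PDT p-value reported in the text

# Test counts taken from the text: 578 first-stage SNPs, 51 follow-up SNPs.
for n_tests in (578, 51):
    threshold = bonferroni_threshold(ALPHA, n_tests)
    print(n_tests, f"{threshold:.2e}", survives_correction(BEST_P, ALPHA, n_tests))
```

Under these assumptions, p = 0.01 falls short of the adjusted threshold whether the correction is applied over the 578 first-stage SNPs or only over the 51 follow-up SNPs.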
Despite the success of previous studies in narrowing the linkage peak on chromosome 1q43, the search for a susceptibility locus in this region remains a challenging task. Even in this narrowed interval there are over 20 known or predicted genes currently listed in the public databases, several of which could serve as candidate genes due to their proposed function (e.g. involvement in autoimmunity, viral susceptibility, oxidative stress/mitochondrial function, or neuronal processes) or due to their tissue expression patterns (Table 3). In addition, many of the remaining genes in this region have undergone little or no functional characterization, likely resulting in failure of these genes to be selected for investigation in a candidate gene study despite their potential involvement in disease. These problems are not unique to studies of MS—studies of many complex diseases are hindered by flawed assumptions and lack of information concerning genes in a given candidate region. New methods that address these confounding factors are warranted. With this in mind we devised (and previously described) a process that prioritizes annotated SNPs for genotyping studies based on their location within Multi-species Conserved Sequences (MCSs) (Margulies, et al. 2003) and used this process to select SNPs across this ~7.0 Mb region of interest (McCauley, et al. 2007).
The linkage signal in this region remains strong, although our results indicate that it is independent of HLA status, a putative chromosome 2 locus, and the presence of multiple autoimmune disorders within a family. One possible explanation is that multiple rare variants are present in this region, leading to the consistent detection of linkage but making it difficult to detect strong association with a common variant.
Interestingly, the previous reports demonstrating linkage in the MS, rheumatoid arthritis, and systemic lupus erythematosus screens are within a few Mb of the peak SNP from the MSGG genomic screen and follow-up. However, genome-wide association studies (GWAS) across multiple autoimmune disorders have failed to convincingly identify common variation within this region predisposing to any or all of these autoimmune diseases. This is based on the current literature detailing the findings of these studies in numerous other autoimmune diseases. This does not exclude the possibility that the effects of common variation within this region are too small within these other diseases to be detected with the currently available datasets. When examining the IMSGC GWAS results (International Multiple Sclerosis Genetics Consortium, et al. 2007) across this region, we find further potential evidence for association with SNPs within RGS7. Future examination of this region will undoubtedly require examination of copy-number variants (CNVs) and possible resequencing to identify a potential functional variant predisposing to MS.
Although the results of this study provide modest evidence that a common variant within the RGS7 gene is associated with MS, additional well-powered studies are needed to confirm these findings. RGS7 is a member of the regulators of G protein signaling (RGS) family of proteins, and it has been shown that altering RGS7 function could impair the normal dampening of the inflammatory response (Hausmann, et al. 2002). Hausmann and colleagues found that expression of RGS7 in microglia and/or invading peripheral macrophages was induced by the inflammatory response created by experimental spinal cord injury in rats. It may follow that impaired regulation of RGS7 expression has detrimental effects on the normal inflammatory cascade and that these effects play a role in multiple sclerosis.
The dataset used for the first stage of this follow-up study consisted of 173 multiplex families with 405 total affected individuals (115 male, 290 female). Of these initial multiplex families, 91 families had previous evidence for positive linkage to 1q43 (Kenealy, et al. 2006). The second stage (or extended) follow-up dataset included 831 multiplex and trio families with 1077 total affected individuals (266 male, 811 female). All families were ascertained at the University of California at San Francisco (UCSF) according to well established diagnostic and inclusion criteria.
All protocols were approved by the appropriate Institutional Review Boards and all individuals provided informed consent before participating in the study. Positive family histories were investigated by direct contact with other family members, request for medical records, and by clinical examination, laboratory testing, or paraclinical studies (MRI scanning and evoked-response testing). Individuals were placed into one of four categories: definite MS, probable MS, possible MS, and no evidence of MS. Consistent and stringent clinical criteria were applied as described elsewhere (Goodkin, et al. 1991, Haines, et al. 1996) and all clinically definite MS cases met the Poser criteria (Poser, et al. 1983). Only definite MS cases were considered as affected in the analyses.
For the first stage of follow-up, 768 SNPs were genotyped across this region in 173 multiplex families. All genotyping was performed by the Duke Genomics Resource Laboratory Core using the Illumina BeadArray™ platform, which was chosen for rapid and accurate SNP genotyping in our dataset. This system provides automated outputs that ease transfer between data generation and the PEDIGENE® database system used in our statistical analyses (Haynes, et al. 1995). Due to our own SNP density requirements and the nature of the Illumina system, we chose 768 SNPs, which completed two Illumina oligo pool assays (OPAs). The SNP selection process for this project has been described in detail previously (McCauley, et al. 2007). In general, SNPs were selected by considering informativeness, validation status, location/density, and putative function. Average spacing of the 768 SNPs in the ~7.0 Mb region on 1q43–1q44 was < 10 kb, allowing for coverage of the region that is appropriate for observed patterns of linkage disequilibrium in Caucasian populations (Gabriel, et al. 2002). Most SNPs fell within intronic and intergenic areas.
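The quoted marker density follows from simple arithmetic. The sketch below approximates average spacing as region length divided by marker count, which assumes markers are spread across the whole ~7.0 Mb interval; the function name is hypothetical.

```python
def mean_spacing_kb(region_mb: float, n_snps: int) -> float:
    """Approximate average inter-marker spacing in kb (region length / marker count)."""
    return region_mb * 1000.0 / n_snps


# Values from the text: ~7.0 Mb region, 768 genotyped SNPs.
print(f"{mean_spacing_kb(7.0, 768):.1f} kb")
```

For the 768 genotyped SNPs this gives roughly 9.1 kb, consistent with the stated spacing of < 10 kb.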
We examined the Illumina panel of SNPs using standard measures of quality control, dropping 190 SNPs from downstream analysis. Of the 190 SNPs that failed: 167 SNPs were monomorphic in our dataset; 22 SNPs demonstrated poor assay performance as measured by call rate (<95%), cluster separation, or other assay failure; and 1 SNP mapped to multiple locations in the genome. For stage one of the follow-up, eleven samples were dropped (including 1 affected male, 3 affected females) due to low call rates (<95%) and an excess of Mendelian errors.
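As an illustration of how marker-level exclusions of this kind might be applied, the sketch below classifies SNPs by the three failure categories named above and checks that the reported counts are internally consistent. The record layout, field names, and toy markers are hypothetical, and only the call-rate component of assay performance is modelled (cluster separation and other assay failures are not).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SnpQc:
    name: str
    call_rate: float          # fraction of samples with a genotype call
    minor_allele_count: int   # 0 means monomorphic in the dataset
    n_map_locations: int = 1  # number of genomic positions the assay maps to


def qc_failure(snp: SnpQc, min_call_rate: float = 0.95) -> Optional[str]:
    """Return the reason a SNP fails QC, or None if it passes."""
    if snp.minor_allele_count == 0:
        return "monomorphic"
    if snp.call_rate < min_call_rate:
        return "low_call_rate"
    if snp.n_map_locations > 1:
        return "multi_mapping"
    return None


def partition(snps: List[SnpQc]) -> Tuple[List[SnpQc], Dict[str, str]]:
    """Split SNPs into those passing QC and a name -> reason map for failures."""
    passed: List[SnpQc] = []
    failed: Dict[str, str] = {}
    for snp in snps:
        reason = qc_failure(snp)
        if reason is None:
            passed.append(snp)
        else:
            failed[snp.name] = reason
    return passed, failed


# Counts reported in the text: 768 genotyped, 190 dropped in three categories.
REPORTED_FAILURES = {"monomorphic": 167, "poor_assay": 22, "multi_mapping": 1}
N_GENOTYPED = 768
n_failed = sum(REPORTED_FAILURES.values())
print(n_failed, N_GENOTYPED - n_failed)

# Toy markers (hypothetical) exercising each rule.
toy = [
    SnpQc("snp_a", 0.99, 120),
    SnpQc("snp_b", 1.00, 0),
    SnpQc("snp_c", 0.90, 50),
    SnpQc("snp_d", 0.99, 80, n_map_locations=2),
]
passed, failed = partition(toy)
print([s.name for s in passed], failed)
```

The reported categories sum to 190 failures, leaving the 578 SNPs carried into the analysis.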
Subsequent to the initial linkage and association analysis of the 173 multiplex family dataset, more detailed association analysis was performed within the extended dataset of 831 total families. SNPs, within the Illumina panel, demonstrating evidence for association as well as additional novel SNPs (n=51) within high priority regions based on linkage and association results were selected for further genotyping and analysis using Taqman assays from Applied Biosystems (Foster City, CA). The final set of the additional 51 SNPs met the design criteria for Taqman, provided an additional level of marker density in key regions, and had relatively high minor allele frequencies.
Following completion of genotyping, quality control (QC) procedures were conducted to ensure the accuracy of all genotype data. These included SNP checks for call rate and assay reliability, as well as sample checks for call rate and Mendelian inconsistencies. Thresholds used to exclude data are discussed in the results.
After assessing the quality of both SNPs and samples, we conducted a battery of analyses to localize and identify common susceptibility variants underlying our narrowed linkage peak. Multipoint linkage analysis was performed using MERLIN (Abecasis, et al. 2002). Two-point LOD scores were calculated in FASTLINK (Cottingham, et al. 1993, Schaffer, et al. 1994), with two-point heterogeneity LOD (HLOD) scores being calculated through HOMOG (Ott. 1986). Ordered subset analyses (OSA) were performed using the FLOSS software (Browning. 2006). SNPs included in both multipoint linkage analysis and FLOSS were chosen using the implementation of Tagger within the Haploview software package (Barrett, et al. 2005, de Bakker, et al. 2005). Association analyses for this study were conducted using the PDT statistic (Martin, et al. 2000).
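The PDT is related to the simpler transmission/disequilibrium test (TDT) for parent-offspring trios. As an illustration of the family-based association idea only, the sketch below computes the basic TDT statistic, not the PDT itself, which additionally handles extended pedigrees and discordant sibships. The transmission counts and function name are hypothetical.

```python
import math
from typing import Tuple


def tdt(transmitted: int, untransmitted: int) -> Tuple[float, float]:
    """Basic TDT for one allele.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not pass the allele of interest to an affected child.
    Returns (chi-square statistic with 1 df, two-sided p-value).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parental transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times.
chi2, p = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under the null hypothesis of no linkage or association, heterozygous parents transmit each allele with equal probability, so a marked excess of transmissions in either direction yields a large statistic.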
This work was funded through NIH grants NS051695 and NS32830 and a post-doctoral fellowship (FG 1718-A-1) to JLM from the National Multiple Sclerosis Society (NMSS). This project was performed through the use of the Vanderbilt Center for Human Genetics Research Core (CHGR) facilities (DNA Resources Core and Bioinformatics Core), as well as the Duke Center for Human Genetics Molecular Genetics Core. We also recognize the International Multiple Sclerosis Genetics Consortium (IMSGC) for the use of data generated by the efforts and support of this broader collaboration. Most notably, we thank the patients and families that contributed to this study.