PLoS One. 2012; 7(10): e46730.
Published online 2012 October 18. doi: 10.1371/journal.pone.0046730
PMCID: PMC3475710

Protein-Protein Interaction Analysis Highlights Additional Loci of Interest for Multiple Sclerosis

Joseph Najbauer, Editor

Abstract

Genetic factors play an important role in determining the risk of multiple sclerosis (MS). The strongest genetic association in MS is located within the class II region of the major histocompatibility complex (MHC), but more than 50 MS loci of modest effect located outside the MHC have now been identified. However, the candidate genes that underlie these associations and their functions are largely unknown. We conducted a protein-protein interaction (PPI) analysis of gene products coded in loci recently reported to be associated with MS at the genome-wide significance level and in loci suggestive of MS association. Our aim was to identify which suggestive regions are more likely to be truly associated, which genes are most implicated in the PPI network, and their expression profiles. SNPs were taken from three recent independent association studies and divided into significant and suggestive according to the strength of the statistical association. Using the Disease Association Protein-Protein Link Evaluator (DAPPLE) tool, we found that direct interactions among gene products were significantly more frequent than expected by chance, both when considering significant regions alone (p<0.0002) and when considering significant plus suggestive regions (p<0.007). Forty-three genes were prioritized in the network, i.e. participated in it more than expected by chance. Of these, 23 were located within suggestive regions, and many of them directly interacted with proteins coded within significant regions. These included genes such as SYK, IL-6, CSF2RB, FCRL3, EIF4EBP2 and CHST12. Using the gene portal BioGPS, we examined the expression of these genes in 24 different tissues and cell types and found the highest values among immune-related cells compared with non-immune tissues (p<0.001). A gene ontology analysis confirmed the immune-related functions of these genes.
In conclusion, loci currently suggestive of MS association interact with, and have expression profiles and functions similar to, those significantly associated, suggesting that additional common variants associated with MS remain to be identified.

Introduction

Multiple sclerosis (MS) is the most common inflammatory disease of the central nervous system (CNS) affecting young adults [1]. It is widely acknowledged that genetic factors play an important role in determining the risk of MS [2]. Several epidemiological studies have demonstrated an increased frequency of MS among biological relatives of affected individuals [3], [4]. Family-based and association studies have shown that the strongest genetic association in MS is located within the class II region of the major histocompatibility complex (MHC) [5]. In particular, the HLA-DRB1*1501 allele confers an odds ratio of approximately 3 [6]. However, during the last few years, genome-wide association studies (GWAS) have identified many other MS-associated loci of modest effect located outside the MHC (now more than 50) [7]–[11].

Despite recent advances in understanding the genetic architecture of MS, several questions remain unanswered. For example, because of stringent multiple-testing correction, many genetic variants fail to reach genome-wide significance but can still be considered suggestive of genetic association. Furthermore, once a SNP is found to be associated with a particular disease, the candidate gene (or genes) mediating that association is usually unknown.

Analysis of protein-protein interaction (PPI) networks is increasingly recognized as an important tool for characterizing the biology underlying genes associated with complex diseases, in particular immune-mediated ones [12], [13]. It is reasonable to hypothesize that genes truly associated with the same trait will be involved in similar biological processes. For example, Rossin et al. found that proteins encoded in genomic regions associated with rheumatoid arthritis and Crohn's disease physically interact more than would be expected by chance, and that the genes encoding these proteins are highly expressed in immune tissues [12]. Studying such interactions can ultimately help determine which suggestive regions are more likely to be truly associated and greatly aid the identification of the genes that mediate GWAS findings.

We conducted a PPI analysis of gene products coded in loci recently reported to be associated with MS or suggestive of MS association. Our aim was to identify which suggestive regions are more likely to be truly associated, which genes are most implicated in the MS PPI network, and their expression profiles and functions.

Methods

Three recent independent association studies were considered for our analysis [14]–[16]. In Sawcer et al. and Patsopoulos et al., SNPs were divided into significant and suggestive according to the strength of the statistical association [14], [15]. From Sawcer et al., we defined as suggestive those SNPs with a discovery-phase p value below 1×10⁻⁴, and as significant those that either replicated previous GWAS findings or had a replication p<0.05 and a combined p<5×10⁻⁷ [14]. In Patsopoulos et al., significant SNPs were defined as those with p<5×10⁻⁸ or replicating previously identified associated SNPs, and suggestive SNPs as those with p values between 5×10⁻⁸ and 1×10⁻⁶ [15]. We also included in the analyses the top 82 SNPs (−log10 p > 4.91) from Wang et al. [16]. All SNPs from this study were considered suggestive, because the study was not designed to meet currently accepted criteria for genome-wide significance. After removing duplicate SNPs, 67 significant and 133 suggestive SNPs were obtained.
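The selection rules above can be summarised as a small classifier. The sketch below is a hypothetical illustration, not the authors' code. Several details are assumptions: the `SnpRecord` fields and study keys, reading Wang et al.'s threshold as −log10(p) > 4.91, treating the upper bound of 1×10⁻⁶ as strict, and keeping a SNP as significant after deduplication if it is significant in any study.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class SnpRecord:
    rsid: str
    study: str                             # hypothetical keys: "sawcer", "patsopoulos", "wang"
    p: Optional[float] = None              # discovery (Sawcer) or meta-analysis (Patsopoulos) p value
    p_replication: Optional[float] = None  # Sawcer et al. replication p value
    p_combined: Optional[float] = None     # Sawcer et al. combined p value
    known_hit: bool = False                # replicates a previously reported GWAS association
    neg_log10_p: Optional[float] = None    # Wang et al. score, ASSUMED to be -log10(p)


def classify(s: SnpRecord) -> Optional[str]:
    """Return 'significant', 'suggestive' or None using the thresholds in the Methods."""
    if s.study == "sawcer":
        replicated = (s.p_replication is not None and s.p_replication < 0.05
                      and s.p_combined is not None and s.p_combined < 5e-7)
        if s.known_hit or replicated:
            return "significant"
        if s.p is not None and s.p < 1e-4:
            return "suggestive"
        return None
    if s.study == "patsopoulos":
        if s.known_hit or (s.p is not None and s.p < 5e-8):
            return "significant"
        if s.p is not None and s.p < 1e-6:  # reached only when 5e-8 <= p < 1e-6
            return "suggestive"
        return None
    if s.study == "wang":
        # Not designed for genome-wide significance, so suggestive at most.
        if s.neg_log10_p is not None and s.neg_log10_p > 4.91:
            return "suggestive"
        return None
    raise ValueError(f"unknown study: {s.study}")


def merge(records: Iterable[SnpRecord]) -> Dict[str, str]:
    """Deduplicate by rsID, keeping the strongest label seen in any study (an assumption)."""
    strength = {"suggestive": 1, "significant": 2}
    merged: Dict[str, str] = {}
    for rec in records:
        label = classify(rec)
        if label is None:
            continue
        current = merged.get(rec.rsid)
        if current is None or strength[label] > strength[current]:
            merged[rec.rsid] = label
    return merged


# Made-up rsIDs and p values, for illustration only.
records = [
    SnpRecord("rs0000001", "sawcer", p=3e-5),
    SnpRecord("rs0000001", "patsopoulos", p=1e-9),
    SnpRecord("rs0000002", "wang", neg_log10_p=5.2),
    SnpRecord("rs0000003", "patsopoulos", p=2e-7),
    SnpRecord("rs0000004", "sawcer", p=2e-5, p_replication=0.01, p_combined=1e-8),
    SnpRecord("rs0000005", "patsopoulos", p=5e-6),
]
merged = merge(records)
labels = list(merged.values())
print(labels.count("significant"), "significant,", labels.count("suggestive"), "suggestive")
```

A SNP that appears in more than one study is counted once. This mirrors the "after removing duplicate SNPs" step, but the precedence rule is a guess.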

Protein-protein interaction assessment was conducted using the Disease Association Protein-Protein Link Evaluator (DAPPLE) tool [12]. This bioinformatics tool investigates physical interactions among gene products encoded within specified genomic regions by building a PPI network. Interactions are extracted from the InWeb database, which combines data from a variety of public PPI sources including MINT, BIND, IntAct and KEGG and defines high-confidence interactions as those observed in multiple independent experiments. The region around a given SNP is extended to the genomic interval defined by SNPs in moderate linkage disequilibrium with it (r² ≥ 0.5), and then to the nearest recombination hotspots [12]. Connections can be direct (two proteins physically bind each other) or indirect (the interaction is mediated by a common interactor). The extent of the PPI network is assessed using the following parameters: the number of direct interactions between proteins from different loci; the mean associated protein direct and indirect connectivities (the mean number of distinct loci to which a protein is directly or indirectly connected); and the mean common interactor connectivity (the average number of proteins in separate loci bound by common interactors) [12]. The non-randomness of the network and the significance of these parameters are tested using a permutation method. This method compares the original network with thousands of networks created by randomly reassigning protein names while preserving the overall structure (size and number of interactions) of the original network. Genes that participate in the network more than expected by chance are defined as genes to prioritize (corrected p<0.05) [12]. Expression data were gathered from BioGPS, an online gene annotation portal that reports individual gene expression levels for a number of human tissues and cell types [17].
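The following simplified Python sketch makes two of the network parameters and the permutation idea concrete. It is not DAPPLE's implementation. The function names and the toy network are invented, only the direct-interaction count and the mean direct connectivity are computed, and the permutation simply moves the locus labels onto randomly chosen nodes of the same fixed network. As we understand it, DAPPLE additionally matches the connectivity (degree) of relabelled nodes. That refinement is omitted here.

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def cross_locus_edges(edges: List[Edge], locus_of: Dict[str, str]) -> int:
    """Number of direct interactions between seed proteins encoded in different loci."""
    return sum(
        1 for a, b in edges
        if a in locus_of and b in locus_of and locus_of[a] != locus_of[b]
    )


def mean_direct_connectivity(edges: List[Edge], locus_of: Dict[str, str]) -> float:
    """Mean number of distinct other loci that each seed protein binds directly."""
    partners = {protein: set() for protein in locus_of}
    for a, b in edges:
        if a in locus_of and b in locus_of and locus_of[a] != locus_of[b]:
            partners[a].add(locus_of[b])
            partners[b].add(locus_of[a])
    return sum(len(s) for s in partners.values()) / len(partners)


def permutation_p(edges: List[Edge], locus_of: Dict[str, str],
                  n_perm: int = 10_000, seed: int = 0) -> Tuple[int, float]:
    """Empirical one-sided p value for cross_locus_edges.

    Simplification: locus labels are placed on randomly chosen nodes of the
    same fixed network, so its size and edges are preserved; node degree is
    NOT matched.
    """
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge} | set(locus_of))
    labels = list(locus_of.values())
    observed = cross_locus_edges(edges, locus_of)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = dict(zip(rng.sample(nodes, len(labels)), labels))
        if cross_locus_edges(edges, shuffled) >= observed:
            at_least_as_extreme += 1
    # +1 so the observed labelling counts as one of the permutations
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Toy network: A1, A2, B1 and C1 are hypothetical seed proteins, X* are other proteins.
edges = [("A1", "B1"), ("A1", "C1"), ("B1", "C1"), ("A2", "X1"), ("X1", "X2"),
         ("X2", "X3"), ("X3", "B1"), ("X4", "X5"), ("X5", "X6")]
locus_of = {"A1": "locus_A", "A2": "locus_A", "B1": "locus_B", "C1": "locus_C"}
observed, p = permutation_p(edges, locus_of, n_perm=2000)
print(observed, mean_direct_connectivity(edges, locus_of), p)
```

Interactions between two proteins from the same locus (here A1 and A2, if they were linked) are deliberately not counted. This follows the text's definition of direct interactions "between proteins from different loci".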
Differences in expression were assessed using non-parametric tests (Kruskal-Wallis and Mann-Whitney tests). Gene ontology terms were investigated using the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7, an online tool that identifies the functional categories and biological processes most represented within a list of genes [18], [19].
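For illustration, both rank-based tests can be written in a few lines of standard-library Python. This is a sketch with our own function names, not the software used in the study, whose exact settings are not stated. It uses the large-sample normal approximation for Mann-Whitney, without continuity correction, and returns only the Kruskal-Wallis H statistic. In practice a library such as SciPy (`scipy.stats.mannwhitneyu`, `scipy.stats.kruskal`) would normally be used.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def tie_term(values: Sequence[float]) -> float:
    """Sum of t^3 - t over groups of t tied values (0 when there are no ties)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return float(sum(t ** 3 - t for t in counts.values()))


def mann_whitney(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Two-sided Mann-Whitney U test: returns (U for x, z, p), normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    u1 = sum(average_ranks(pooled)[:n1]) - n1 * (n1 + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    return u1, z, math.erfc(abs(z) / math.sqrt(2))


def kruskal_wallis_h(*groups: Sequence[float]) -> float:
    """Tie-corrected Kruskal-Wallis H; refer it to a chi-square with k - 1 df.

    Undefined (division by zero) if every pooled value is identical.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        total += r * r / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h / (1 - tie_term(pooled) / (n ** 3 - n))


# Hypothetical z-scores, for illustration only.
immune = [1.2, 0.8, 1.5, 2.0, 1.1]
non_immune = [-0.5, -0.2, 0.1, -1.0, 0.0]
u, z, p = mann_whitney(immune, non_immune)
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6])
print(u, round(z, 3), round(p, 4), round(h, 4))
```

Mann-Whitney compares two groups (e.g. immune versus non-immune tissues). Kruskal-Wallis extends the same rank logic to many groups (e.g. all 24 tissues at once).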

Results

DAPPLE analysis of significant SNPs

Our first aim was to assess the extent of PPIs among genes located within genomic regions definitively associated with MS susceptibility. We therefore submitted to DAPPLE the 67 SNPs with genome-wide significant association with MS risk. A total of 75 proteins participated in the direct network, with 104 direct interactions (expected direct interactions = 61, p<0.0002) (Table 1, Figure 1 and Table S1). The mean associated protein direct connectivity was 2.7 (expected = 1.7, p<0.0002). The mean associated protein indirect connectivity was 52.2 (expected = 43.8, p = 0.04) and the mean common interactor connectivity was 4.5 (expected = 3.9, p = 0.0002). The total number of genes implicated in the network was 215 (Table S1). Twenty-two genes had more connections than expected by chance (genes to prioritize). These included previously proposed candidate genes such as IL-12A, SOCS-1, CBLB, MALT-1, IL-22RA, MAPK-1 and IL-7R.

Figure 1
Direct connections among gene products from MS significant regions.
Table 1
Summary of DAPPLE analysis of significant and significant plus suggestive SNPs.

DAPPLE analysis of significant plus suggestive SNPs

When suggestive SNPs were included in the analysis, the number of proteins participating in the network increased from 75 to 189, and the number of direct interactions from 104 to 281 (expected direct interactions = 242, p<0.007) (Table 1, Figure 2 and Table S2). The mean associated protein direct connectivity was also higher than expected (observed = 2.9, expected = 2.4, p = 0.0008). The mean associated protein indirect connectivity was 93 (expected = 91, p = 0.34). The mean common interactor connectivity was 5.05 (expected = 4.8, p = 0.05). The total number of genes analyzed was 445 (Table S2), of which 43 were genes to prioritize; 23 of these were located within suggestive regions. They included genes such as SYK, IL-6, CSF2RB, FCRL3, EIF4EBP2 and CHST12 (Table 2).

Figure 2
Direct connections among gene products from MS significant plus suggestive regions.
Table 2
List of candidate genes (genes to prioritize) obtained from DAPPLE analysis of significant plus suggestive SNPs.

Tissue-specific expression and gene ontology terms of candidate genes

To further investigate the nature of our findings, we assessed the tissues in which these genes were most highly expressed. We used the gene portal BioGPS, which contains gene expression data for a variety of human tissues and cell types [17]. For our analysis we considered 10 immune cell types and 14 non-immune tissues. We submitted the full list of candidate genes (n = 43) obtained from the significant plus suggestive DAPPLE analysis and, for each gene, obtained an expression value in every tissue or cell type tested. Because background characteristics differ between probe sets, expression values cannot be compared directly across genes. We therefore standardized the expression values of each gene across tissues and used the resulting z-scores for all subsequent analyses. Figure 3 shows the standardized expression values in the 24 tissues and cell types tested. Expression appeared particularly high in whole blood and in most immune-related cell types (in particular B cells, plasmacytoid dendritic cells (pDCs), natural killer (NK) cells, and CD4+ and CD8+ T cells). An independent-samples Kruskal-Wallis test confirmed that gene expression differed significantly across tissues (p<0.001). When tissues were divided into immune and non-immune groups, expression differed significantly between the two (p<0.001) (Figure 4). Compared with average expression across tissues, candidate genes were significantly overexpressed in B-lymphoblasts, pDCs, monocytes, B cells, NK cells, CD4+ T cells (p<0.001), CD34+ hematopoietic cells (p = 0.001) and CD8+ T cells (p = 0.003). Expression patterns were similar for significantly and suggestively associated loci.
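The per-gene standardization step can be sketched as follows. The gene names, tissue names and values are placeholders, not BioGPS data. The use of the sample standard deviation (n − 1) is an assumption, since the text does not say which estimator was used. The pooled immune and non-immune z-scores produced at the end are the kind of input a Mann-Whitney comparison, as described in the Methods, would be run on.

```python
from statistics import mean, stdev
from typing import Dict, List, Set, Tuple

Expression = Dict[str, Dict[str, float]]  # gene -> tissue -> expression value


def standardize_per_gene(expr: Expression) -> Expression:
    """Convert each gene's values to z-scores across tissues: z = (x - mean) / sd.

    Uses the sample standard deviation (n - 1); genes with zero variance are skipped.
    """
    z: Expression = {}
    for gene, by_tissue in expr.items():
        values = list(by_tissue.values())
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue
        z[gene] = {tissue: (v - mu) / sd for tissue, v in by_tissue.items()}
    return z


def split_by_group(z: Expression, immune: Set[str]) -> Tuple[List[float], List[float]]:
    """Pool the z-scores of all genes into immune and non-immune groups."""
    in_immune: List[float] = []
    in_other: List[float] = []
    for by_tissue in z.values():
        for tissue, value in by_tissue.items():
            (in_immune if tissue in immune else in_other).append(value)
    return in_immune, in_other


# Hypothetical values; GENE_X and GENE_Y are placeholders, not real genes.
expr = {
    "GENE_X": {"B cells": 900.0, "NK cells": 700.0, "liver": 100.0, "heart": 100.0},
    "GENE_Y": {"B cells": 50.0, "NK cells": 60.0, "liver": 10.0, "heart": 20.0},
}
z = standardize_per_gene(expr)
immune_z, other_z = split_by_group(z, immune={"B cells", "NK cells"})
print(round(mean(immune_z), 3), round(mean(other_z), 3))
```

After standardization, each gene's z-scores have mean 0 and standard deviation 1 across tissues. A gene with high absolute expression everywhere therefore no longer dominates the comparison; only its relative tissue profile matters.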

Figure 3
Expression values of candidate genes (genes to prioritize) in all 24 tissues and cell types tested.
Figure 4
Expression values of candidate genes (genes to prioritize) in immune and non-immune tissues.

We further confirmed the immunological nature of these candidate genes using DAVID [18], [19], a bioinformatics tool that identifies the biological processes in which a group of genes is involved. Candidate genes were significantly enriched for immune-related processes such as regulation of leukocyte activation (p = 3.10×10⁻⁸), regulation of T cell proliferation (p = 3.25×10⁻⁸), positive regulation of immune system processes (p = 7.7×10⁻⁷), regulation of protein kinase cascade (p = 5.46×10⁻⁴) and regulation of cytokine production (p = 1.459×10⁻³) (see Table S3 for the full list). GO enrichment was similar for significantly and suggestively associated loci.
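Over-representation of a GO term in a gene list is commonly assessed with a one-sided hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The sketch below shows that test with hypothetical counts; the function names and the background size are invented. DAVID reports a modified, more conservative version of this p value (the EASE score). This sketch does not reproduce that modification, so its p values would not match Table S3.

```python
from math import comb


def hypergeom_sf(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population, annotated, drawn).

    Equals the one-sided Fisher's exact p value for over-representation.
    """
    upper = min(annotated, drawn)
    hits = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
               for i in range(k, upper + 1))
    return hits / comb(population, drawn)


def enrichment(list_hits: int, list_size: int, term_size: int, population: int):
    """Fold enrichment and over-representation p value for a single GO term."""
    fold = (list_hits / list_size) / (term_size / population)
    p = hypergeom_sf(list_hits, population, term_size, list_size)
    return fold, p


# Hypothetical counts: 8 of 43 candidate genes carry a term that annotates
# 150 genes in an assumed background of 20,000 genes.
fold, p = enrichment(list_hits=8, list_size=43, term_size=150, population=20_000)
print(round(fold, 1), f"{p:.2e}")
```

The choice of background (all genes on the array, all annotated genes, etc.) changes both the fold enrichment and the p value. With many terms tested, some multiple-testing correction is also normally applied.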

Discussion

We showed that gene products coded in loci strongly associated with MS risk interact with each other substantially. Both direct and indirect interactions were significantly more frequent than expected by chance alone. When the PPI analysis was extended to suggestive SNPs, we found an increased number of proteins participating in the network and of direct interactions (Figures 1 and 2). The only parameter that did not reach significance was the mean associated protein indirect connectivity. This finding could be explained by a lack of true MS association for several suggestive SNPs.

However, including suggestive SNPs in the PPI analysis increased the number of genes to prioritize from 22 to 43. Interestingly, more than half of these genes (n = 23) were located within suggestive regions, and many of them directly interacted with proteins coded within significant regions (e.g. CSF2RB-CBLB, IL6-IL2RA, MAP3K14-NFKB1, SYK-STAT3; see Table S2). Taken together, the suggestive statistical evidence of genetic association and the functional evidence of protein-protein interaction support the hypothesis that these genes could play an important role in the pathogenesis of MS.

We further supported our results by examining the tissue-specific expression of these candidate genes. Using the BioGPS database, we showed that the suggestively associated genes identified by DAPPLE were predominantly expressed in immune cells compared with other tissues. A gene ontology analysis also confirmed the immune-related functions of these genes. More generally, these findings provide additional support for the immunological nature of MS [20]. Notably, candidate gene expression was particularly high among CD8+ and CD4+ T cells, B cells, NK cells and pDCs. Interestingly, all these cell types have been implicated in the pathogenesis of MS.

Several immune-specific genes are located within MS suggestive regions. For example, a SNP located near the gene encoding spleen tyrosine kinase (SYK) was found to be suggestive of association by Sawcer et al. [14]. Notably, SYK was particularly highly expressed in B cells, DCs, monocytes, CD33+ myeloid cells and NK cells. This protein has a central role in adaptive immune receptor signalling through immunoreceptor tyrosine-based activation motifs (ITAMs) [21]. SYK-mediated ITAM signalling activates intermediates such as NF-κB, JNK and PYK2, ultimately leading to lymphocyte activation [22]. ITAM signals mediated by SYK can also induce expansion of NK cells [23]. Interestingly, the SYK inhibitor R788 (fostamatinib) showed beneficial effects compared with placebo in patients with rheumatoid arthritis [24].

CSF2RB is another gene particularly highly expressed in B cells, DCs, monocytes, CD33+ myeloid cells and NK cells. It encodes the common β-subunit (βc) shared by the granulocyte-macrophage colony-stimulating factor (GM-CSF), IL-3 and IL-5 receptors, which are expressed by peripheral leucocytes and blood DCs [25]. This gene appears to play an important role in allergic inflammation [26]. Interestingly, associations between CSF2RB and schizophrenia [27] and bipolar disorder [28] have recently been reported.

EIF4EBP2 encodes eukaryotic translation initiation factor 4E binding protein 2. Members of this protein family (4E-BPs) inhibit translation initiation by binding eIF4E [29]. 4E-BPs regulate cell proliferation downstream of the mTORC1 pathway [30]. In addition, 4E-BP-deficient mice showed overproduction of type I IFN in pDCs [31]. We found overexpression of EIF4EBP2 in pDCs, CD4+ T cells, CD8+ T cells and NK cells. CHST12 encodes chondroitin 4-O-sulfotransferase 2, a Golgi membrane protein implicated in the synthesis of chondroitin sulfate and dermatan sulfate (DS) in different tissues [32]. DS proteoglycans participate in various biological events such as extracellular matrix assembly, cell adhesion, migration and proliferation [33]. We found high expression of CHST12 in pDCs, CD4+ T cells, CD8+ T cells and NK cells.

To conclude, a number of proteins coded by genes located within MS-associated genomic regions participate in the same PPI networks. The extent of this interaction increases substantially when genomic regions with suggestive evidence of association are included in the analysis. This suggests that at least some of these suggestive GWAS hits represent truly associated loci, and that additional common variants associated with MS remain to be identified. Finally, we further confirmed the immunological nature of MS and showed that no single cell type can explain the complexity of this disease. Future functional studies should investigate how, and in which cell types, the suggestive candidate genes act. This will improve our knowledge of this complex disease and, hopefully, provide future strategies for disease prevention and treatment.

Supporting Information

Table S1

Direct connections and list of genes from DAPPLE analysis of significant SNPs.

(XLSX)

Table S2

Direct connections and list of genes from DAPPLE analysis of significant plus suggestive SNPs.

(XLSX)

Table S3

Results of DAVID gene ontology.

(XLSX)

Funding Statement

This work was funded by the Medical Research Council (grant number G0801976); the Wellcome Trust (090532/Z/09/Z); and a research fellowship from FISM, Fondazione Italiana Sclerosi Multipla (Cod. 2010/B/5 to GD). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

1. Ramagopalan SV, Dobson R, Meier UC, Giovannoni G (2010) Multiple sclerosis: risk factors, prodromes, and potential causal pathways. Lancet Neurol 9(7):727–39.
2. Dyment DA, Ebers GC, Sadovnick AD (2004) Genetics of multiple sclerosis. Lancet Neurol 3(2):104–10.
3. Ebers GC, Sadovnick AD, Risch NJ (1995) A genetic basis for familial aggregation in multiple sclerosis. Canadian Collaborative Study Group. Nature 377(6545):150–1.
4. Ebers GC, Sadovnick AD, Dyment DA, Yee IM, Willer CJ, et al. (2004) Parent-of-origin effect in multiple sclerosis: observations in half-siblings. Lancet 363(9423):1773–4.
5. Dyment DA, Herrera BM, Cader MZ, Willer CJ, Lincoln MR, et al. (2005) Complex interactions among MHC haplotypes in multiple sclerosis: susceptibility and resistance. Hum Mol Genet 14(14):2019–26.
6. Ramagopalan SV, Ebers GC (2009) Multiple sclerosis: major histocompatibility complexity and antigen presentation. Genome Med 1(11):105.
7. Hafler DA, Compston A, Sawcer S, Lander ES, Daly MJ, et al. (2007) Risk alleles for multiple sclerosis identified by a genomewide study. N Engl J Med 357(9):851–62.
8. Baranzini SE, Wang J, Gibson RA, Galwey N, Naegelin Y, et al. (2009) Genome-wide association analysis of susceptibility and clinical phenotype in multiple sclerosis. Hum Mol Genet 18(4):767–78.
9. Australia and New Zealand Multiple Sclerosis Genetics Consortium (ANZgene) (2009) Genome-wide association study identifies new multiple sclerosis susceptibility loci on chromosomes 12 and 20. Nat Genet 41(7):824–8.
10. De Jager PL, Jia X, Wang J, de Bakker PI, Ottoboni L, et al. (2009) Meta-analysis of genome scans and replication identify CD6, IRF8 and TNFRSF1A as new multiple sclerosis susceptibility loci. Nat Genet 41(7):776–82.
11. Sanna S, Pitzalis M, Zoledziewska M, Zara I, Sidore C, et al. (2010) Variants within the immunoregulatory CBLB gene are associated with multiple sclerosis. Nat Genet 42(6):495–7.
12. Rossin EJ, Lage K, Raychaudhuri S, Xavier RJ, Tatar D, et al. (2011) Proteins encoded in genomic regions associated with immune-mediated disease physically interact and suggest underlying biology. PLoS Genet 7(1):e1001273.
13. O'Roak BJ, Vives L, Girirajan S, Karakoc E, Krumm N, et al. (2012) Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations. Nature 485(7397):246–50.
14. Sawcer S, Hellenthal G, Pirinen M, Spencer CC, Patsopoulos NA, et al. (2011) Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature 476(7359):214–9.
15. Patsopoulos NA, Esposito F, Reischl J, Lehr S, Bauer D, et al. (2011) Genome-wide meta-analysis identifies novel multiple sclerosis susceptibility loci. Ann Neurol 70(6):897–912.
16. Wang JH, Pappas D, De Jager PL, Pelletier D, de Bakker PI, et al. (2011) Modeling the cumulative genetic risk for multiple sclerosis from genome-wide association data. Genome Med 3(1):3.
17. Wu C, Orozco C, Boyer J, Leglise M, Goodale J, et al. (2009) BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources. Genome Biol 10(11):R130.
18. Huang da W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 4(1):44–57.
19. Huang da W, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37(1):1–13.
20. Kasper LH, Shoemaker J (2010) Multiple sclerosis immunology: The healthy immune system vs the MS immune system. Neurology 74(Suppl 1):S2–8.
21. Kerrigan AM, Brown GD (2011) Syk-coupled C-type lectins in immunity. Trends Immunol 32(4):151–6.
22. Mocsai A, Ruland J, Tybulewicz VL (2010) The SYK tyrosine kinase: a crucial player in diverse biological functions. Nat Rev Immunol 10(6):387–402.
23. Hesslein DG, Palacios EH, Sun JC, Beilke JN, Watson SR, et al. (2011) Differential requirements for CD45 in NK-cell function reveal distinct roles for Syk-family kinases. Blood 117(11):3087–95.
24. Weinblatt ME, Kavanaugh A, Genovese MC, Musser TK, Grossbard EB, et al. (2010) An oral spleen tyrosine kinase (Syk) inhibitor for rheumatoid arthritis. N Engl J Med 363(14):1303–12.
25. Yamada T, Sun Q, Zeibecoglou K, Bungre J, North J, et al. (1998) IL-3, IL-5, granulocyte-macrophage colony-stimulating factor receptor alpha-subunit, and common beta-subunit expression by peripheral leukocytes and blood dendritic cells. J Allergy Clin Immunol 101(5):677–82.
26. Asquith KL, Ramshaw HS, Hansbro PM, Beagley KW, Lopez AF, et al. (2008) The IL-3/IL-5/GM-CSF common receptor plays a pivotal role in the regulation of Th2 immunity and allergic airway inflammation. J Immunol 180(2):1199–206.
27. Chen Q, Wang X, O'Neill FA, Walsh D, Fanous A, et al. (2008) Association study of CSF2RB with schizophrenia in Irish family and case-control samples. Mol Psychiatry 13(10):930–8.
28. Moskvina V, Craddock N, Holmans P, Nikolov I, Pahwa JS, et al. (2009) Gene-wide analyses of genome-wide association data sets: evidence for multiple common risk alleles for schizophrenia and bipolar disorder and for overlap in genetic risk. Mol Psychiatry 14(3):252–60.
29. Mader S, Lee H, Pause A, Sonenberg N (1995) The translation initiation factor eIF-4E binds to a common motif shared by the translation factor eIF-4 gamma and the translational repressors 4E-binding proteins. Mol Cell Biol 15(9):4990–7.
30. Dowling RJ, Topisirovic I, Alain T, Bidinosti M, Fonseca BD, et al. (2010) mTORC1-mediated cell proliferation, but not cell growth, controlled by the 4E-BPs. Science 328(5982):1172–6.
31. Colina R, Costa-Mattioli M, Dowling RJ, Jaramillo M, Tai LH, et al. (2008) Translational control of the innate immune response through IRF-7. Nature 452(7185):323–8.
32. Hiraoka N, Nakagawa H, Ong E, Akama TO, Fukuda MN, et al. (2000) Molecular cloning and expression of two distinct human chondroitin 4-O-sulfotransferases that belong to the HNK-1 sulfotransferase gene family. J Biol Chem 275(26):20188–96.
33. Mikami T, Mizumoto S, Kago N, Kitagawa H, Sugahara K (2003) Specificities of three distinct human chondroitin/dermatan N-acetylgalactosamine 4-O-sulfotransferases demonstrated using partially desulfated dermatan sulfate as an acceptor: implication of differential roles in dermatan sulfate biosynthesis. J Biol Chem 278(38):36115–27.
