Glutamate levels were measured in vivo in both the normal-appearing white and grey matter of 382 patients with multiple sclerosis using ¹H magnetic resonance spectroscopy imaging. Although at the time of the study most patients had a diagnosis of relapsing–remitting multiple sclerosis (n = 279) or clinically isolated syndrome (n = 58), all types of the disease were represented in this analysis. The demographic details of the cohort are tabulated alongside other radiological and clinical parameters. Approximately 500 000 genotypes were available for each study participant (Baranzini et al.) and were used to perform a genome-wide association analysis using brain glutamate concentrations as an endophenotypic continuous trait (Fig. 1). The top associated marker was rs794185 (P < 6.44 × 10⁻⁷), a SNP in chromosome 3p26.2 that maps to intron 6 of the gene coding for sulphatase modifying factor 1 (SUMF1). Mutations in SUMF1 lead to multiple sulphatase deficiency (MIM 272200), a lysosomal storage disorder. DNA variants in this gene may indirectly regulate extracellular glutamate by altering the activity of steroid sulphatases (Shirakawa et al.; Gibbs et al.; Valenzuela et al.). A region spanning ~4 Mb in chromosome 7 and containing 11 SNPs in HDAC9, 5 SNPs in CDCA7L and 6 SNPs in DRCTNNB was found to be modestly associated (P-values between 10⁻⁴). The top 20 associated SNPs are listed in Supplementary Table 1.
Figure 1 Genome-wide scan for allele frequency differences related to in vivo glutamate concentration in multiple sclerosis brains. (A) P-values from the linear regression with glutamate concentration, controlled by disease duration, age of onset and DRB1 status.
To maximize the probability of identifying true associated genes, we used a protein interaction network (PIN)-based analysis as previously described (Baranzini et al.). In this method, evidence of genetic association is combined with evidence of physical interaction of the respective gene products. Thirty-four modules were found to be significantly associated with in vivo glutamate concentration. Owing to the nature of the searches, significant overlap in the composition of modules is expected. To assess the relative importance of these modules, several criteria were considered. First, a literature search was performed to determine the relevance of each of the component genes to glutamate biology. Next, their association with related phenotypes such as N-acetylaspartate (NAA) decline and brain atrophy change was measured. Module 14 was the top scoring module across all these criteria. This module was composed of 70 genes and included three ionotropic glutamate receptors (GRID2), 17 anchoring proteins required for glutamate receptor and transporter organization (AKAP5, DLG2, DLG4, SHANK2, PRKCA, LRRC7, PKP4, CTNND2, CDH2, CDH5, DSC3, ARVCF, NLGN4X, DLGAP1, CASK, CASKIN, ACTN2), two axon guidance molecules (DAB1) and three key regulators of glutamatergic synaptic activity (ERBB4, PTK2B and PARK2) (Fig. 2). In addition, seven members of the TGF-β signalling pathway (SMAD1, SMAD2, SMAD3, SMAD6, SMURF1, ERBB2IP and ACVR1) were members of this network.
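The idea of combining association evidence with physical interaction can be illustrated with a minimal sketch. It assumes a Stouffer-style aggregate of per-gene z-scores over a connected set of genes, which is a common choice for this kind of search but is not confirmed by the text as the scoring used here. The gene names, P-values and edges are placeholders invented for illustration.

```python
from math import sqrt
from statistics import NormalDist


def gene_z(p_value):
    """Convert a gene-level association P-value (0 < P < 1) into a z-score."""
    return NormalDist().inv_cdf(1.0 - p_value)


def is_connected(genes, edges):
    """Return True if `genes` form a single connected component using `edges`."""
    genes = set(genes)
    if not genes:
        return False
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        if a in genes and b in genes:
            neighbours[a].add(b)
            neighbours[b].add(a)
    start = next(iter(genes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == genes


def module_score(genes, p_values, edges):
    """Aggregate z-score of a candidate module, or None if it is not connected."""
    if not is_connected(genes, edges):
        return None
    z = [gene_z(p_values[g]) for g in genes]
    return sum(z) / sqrt(len(z))


# Hypothetical inputs: three genes, one physical interaction.
p_values = {"GENE_A": 0.001, "GENE_B": 0.01, "GENE_C": 0.5}
edges = [("GENE_A", "GENE_B")]
score_ab = module_score(["GENE_A", "GENE_B"], p_values, edges)
score_ac = module_score(["GENE_A", "GENE_C"], p_values, edges)  # not connected
```

Under this assumed scoring, two modestly associated genes that interact physically can outscore either gene alone, which is the intuition behind searching for modules rather than single markers.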
Figure 2 Module 14. A graphical representation of the overall highest scoring module from the protein interaction network. Circles represent proteins and lines represent interactions among them. Proteins are coloured according to their relationship to glutamate.
To assess the biological relevance of the reported modules, a text-mining strategy was implemented by performing automated PubMed searches combining each of the component genes of all identified modules with the terms ‘glutamate’ OR ‘glutamic acid’. The aggregate number of articles found for all genes in a given module was recorded as its domain knowledge score (DKS). The top associated genes (by P-value) found in the PIN (whether they interact or not) had higher DKS than the top associated genes not found in the PIN, perhaps indicating that genes in the PIN are better annotated overall. However, the DKS of Module 14 (and that of all other modules, data not shown) was significantly higher than that of both the top genes in the PIN and the overall top associated genes (Fig. 3). These results support our hypothesis that the network-based approach identifies biologically related genes.
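The scoring step can be sketched as follows. This is an illustration only: the exact query syntax is an assumption, and `count_articles` is a hypothetical caller-supplied function standing in for a real literature search, here replaced by a lookup table with invented counts.

```python
from statistics import mean


def build_query(gene):
    """Search string pairing a gene symbol with either glutamate term (assumed syntax)."""
    return f'{gene} AND ("glutamate" OR "glutamic acid")'


def domain_knowledge_scores(genes, count_articles):
    """Map each gene to the number of articles returned for its query.

    `count_articles` is a hypothetical function that takes a query string
    and returns an integer hit count.
    """
    return {gene: count_articles(build_query(gene)) for gene in genes}


def summarize_module(genes, count_articles):
    """Return (aggregate DKS, mean DKS per gene) for one module."""
    per_gene = domain_knowledge_scores(genes, count_articles)
    return sum(per_gene.values()), mean(per_gene.values())


# Stand-in for a real search; the hit counts below are invented.
FAKE_HITS = {"DLG4": 120, "GRID2": 45, "SMAD1": 0}


def fake_count(query):
    gene = query.split(" AND ")[0]
    return FAKE_HITS.get(gene, 0)


total, average = summarize_module(["DLG4", "GRID2", "SMAD1"], fake_count)
```

Reporting the mean alongside the aggregate makes modules of different sizes comparable, since a raw sum grows with the number of genes.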
Figure 3 Domain knowledge scores. Mean DKSs were calculated for genes in Module 14 (black bar), and for the top associated genes from the same protein network (whether they interact or not) (grey bar). Also, the mean DKS of the top associated genes from the original ...
We then tested the association of each module with selected relevant MRI-based metrics (Fig. 4). To this end, a module-specific genetic score was computed for each patient. The score was derived from the number of risk alleles carried at each gene represented in the module. We reasoned that if glutamate concentration was affected by genomic variants, then individuals with the highest glutamate levels would show the highest number of associated alleles in the module. As predicted, patients with the highest glutamate levels in grey matter were more likely to display the highest genetic scores (for Module 14: R² = 42%) (Fig. 4A). While this correlation was expected because genetic scores were derived from the regression with the trait, the highly significant P-value (2.58 × 10⁻²⁹) indicates that most, if not all, of the 70 genes in the module contribute to the phenotype. Interestingly, we observed a significant correlation between the rate of NAA decline in grey matter over the first year after glutamate measurements and the module-specific genetic score (R² = 6.3%, P < 10⁻⁴) (Fig. 4B). Although a relationship between these two variables was anticipated (expected R² = 4.4%) owing to the existing correlation between glutamate and NAA levels (R² = 10.4%, P < 10⁻⁷), the correlation between genetic scores and NAA decline is higher than expected. Simulations with artificial data sets and conditional regression analysis were conducted to assess the statistical significance of these additional correlations but were not conclusive, possibly reflecting lack of power due to a moderate sample size and the relatively short follow-up time. Similar results were obtained when NAA decline in white matter was considered (R² = 3.2%, P < 0.007). Finally, we also observed a significant correlation between brain atrophy over the first year and the module-specific genetic score (observed: R² = 1.2%, P < 0.05; expected: R² = 0.14%) (Fig. 4C).
Correlations between glutamate-based genetic score and the multiple sclerosis severity score (computed over 2 and 3 years) did not reach significance.
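A minimal sketch of the genetic score and of one way to reason about the expected correlations is given below. It assumes an unweighted count of risk alleles (the text does not say whether alleles were weighted) and assumes that the expected R² is the product of the two intermediate R² values, which holds if the score relates to the outcome only through glutamate. The text does not state how the expected values were obtained, although this product does reproduce the 4.4% quoted for NAA decline. Genotypes and concentrations are invented.

```python
def genetic_score(risk_allele_counts):
    """Sum the risk alleles (0, 1 or 2 per gene) a patient carries across a module."""
    return sum(risk_allele_counts)


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def expected_r2_via_trait(r2_score_trait, r2_trait_outcome):
    """Expected R^2 between score and outcome if the score acts only through the trait.

    Assumption of this sketch, not a method stated in the text.
    """
    return r2_score_trait * r2_trait_outcome


# Invented genotypes: rows are patients, columns are genes in the module.
genotypes = [[0, 1, 0], [1, 1, 0], [1, 2, 1], [2, 2, 2]]
glutamate = [5.1, 5.9, 7.2, 8.8]  # invented concentrations, arbitrary units

scores = [genetic_score(row) for row in genotypes]
r2_toy = r_squared(scores, glutamate)

# R^2 values reported in the text: score vs glutamate, glutamate vs NAA.
expected_naa = expected_r2_via_trait(0.42, 0.104)
```

With the reported values, `expected_naa` is about 0.044, so the observed 6.3% exceeds what glutamate alone would account for under this assumption.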
Figure 4 Correlation between glutamate genetic score and relevant variables. (A) Correlation of glutamate genetic scores with grey matter glutamate concentration. (B) Correlation of genetic scores with NAA change over 1 year. Genetic scores explain more variance ...
As described above, the data set included patients of all disease subtypes and presumably with variable degrees of neurodegeneration. It is conceivable that the DNA variants associated with glutamate concentration have a stronger effect in patients with more neurodegeneration, as evidenced by MRI. To evaluate this hypothesis, we stratified patients based on brain atrophy (i.e. a surrogate of neurodegeneration) and repeated the entire analysis in each group. Patients showing at least 0.2% decline (percent whole brain volume change of −0.2% or lower) on structural image evaluation using normalization of atrophy (SIENA) at two or more times during a 3-year follow-up period were considered as the group with ‘high’ neurodegeneration (n = 250), while the remaining 132 individuals were defined as the group with ‘low’ neurodegeneration. A decline of 0.2% or more was observed in normal ageing from 48 healthy controls scanned annually using the same 3T scanner (D. Pelletier, unpublished data). The top 20 associated variants in the high neurodegeneration group are shown in Supplementary Table 2. It is noteworthy that the top hit from the original GWAS performed in all patients (rs794185 in SUMF1, P < 6.44 × 10⁻⁷) was the second most significant marker in the new analysis (P < 9.92 × 10⁻⁶). In comparison, the same SNP ranked 329 987th in the GWAS with the low neurodegeneration group (the top 20 variants are shown in Supplementary Table 3), suggesting that most of the statistical significance of the SUMF1 association in the full data set derives from the group of patients with high neurodegeneration.
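The stratification rule can be written compactly. The sketch below assumes each patient has a list of percent whole brain volume change values, one per follow-up scan; the patient identifiers and values are invented, and the function and parameter names are hypothetical.

```python
def neurodegeneration_group(pbvc_by_visit, threshold=-0.2, min_visits=2):
    """Label a patient 'high' if the change is at or below `threshold` at `min_visits` or more visits."""
    declines = sum(1 for change in pbvc_by_visit if change <= threshold)
    return "high" if declines >= min_visits else "low"


def stratify(cohort):
    """Split {patient_id: [percent volume change, ...]} into 'high' and 'low' id lists."""
    groups = {"high": [], "low": []}
    for patient_id, changes in cohort.items():
        groups[neurodegeneration_group(changes)].append(patient_id)
    return groups


# Invented percent whole brain volume changes over three follow-up scans.
cohort = {
    "P001": [-0.35, -0.10, -0.42],
    "P002": [-0.05, -0.15, 0.02],
    "P003": [-0.20, -0.25, -0.01],
}
groups = stratify(cohort)
```

Keeping the threshold and the minimum number of visits as parameters makes it easy to check how sensitive the group sizes are to the cut-off.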
We then performed the network-based analysis to search for modules enriched in modestly associated variants within functionally related genes. As in the original GWAS, several networks (i.e. 23) with different degrees of overlap were significantly enriched in genes associated with glutamate concentrations in the group with high neurodegeneration. Of these, eight networks were also functionally related to glutamate as determined by their DKS. Whereas a similar number of significant modules was found for the group with low neurodegeneration (i.e. 30), only one of them was functionally related to glutamate.
The top associated module in the group with high neurodegeneration was composed of 55 genes (Supplementary Fig. 2) significantly enriched in glutamate biology (Supplementary Fig. 3). Although computing the overlap between any two networks is currently a subject of much debate in graph theory (networks could be compared on the basis of shared nodes, edges or half a dozen network parameters such as connectivity distribution, clustering coefficient, etc.), a rough measure is to count the number of nodes in common. When compared with the network obtained with data from the original GWAS, we identified 13 genes in common, a remarkable overlap considering that the expected number of shared genes between two random networks of comparable sizes is virtually 0.
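The node-count comparison, and a rough feel for what "virtually 0" means, can be sketched as below. The expected overlap treats the two modules as gene sets drawn at random from a common pool, which ignores network topology, and the pool size of 10 000 genes is an assumption of this sketch rather than a figure from the text. The toy networks use placeholder gene names.

```python
def shared_nodes(network_a, network_b):
    """Return the set of genes present in both networks."""
    return set(network_a) & set(network_b)


def jaccard(network_a, network_b):
    """Shared nodes divided by the total number of distinct nodes."""
    a, b = set(network_a), set(network_b)
    return len(a & b) / len(a | b)


def expected_shared(size_a, size_b, universe_size):
    """Mean overlap of two gene sets drawn at random from a universe (hypergeometric mean)."""
    return size_a * size_b / universe_size


# Toy networks with placeholder gene names.
toy_a = ["G1", "G2", "G3", "G4"]
toy_b = ["G3", "G4", "G5"]
common = shared_nodes(toy_a, toy_b)

# Module sizes from the text; the universe size is an assumption.
ASSUMED_UNIVERSE = 10_000
expected = expected_shared(70, 55, ASSUMED_UNIVERSE)
```

Under these assumptions the expected overlap is well below one gene, which puts the 13 shared genes reported above in context.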