Biological processes are carried out by coordinated modules of interacting molecules. Because clustering studies show that genes with similar expression profiles are more likely to belong to a common functional module, networks of coexpressed genes provide one framework for assigning gene function. This observation underlies the guilt-by-association (GBA) heuristic, which is widely invoked in functional genomics. Yet although the idea of GBA is accepted, the breadth of its applicability is uncertain.
We developed methods to systematically explore the breadth of GBA across a large and varied corpus of expression data to answer the following question: To what extent is the GBA heuristic broadly applicable to the transcriptome and, conversely, how broadly is GBA captured by a priori knowledge represented in the Gene Ontology (GO)? Our study investigates the functional organization of five coexpression networks using data from three mammalian organisms. Our method calculates a probabilistic score between each gene and each GO category that reflects coexpression enrichment of a GO module. For each GO category, we use Receiver Operating Characteristic (ROC) curves to assess whether these probabilistic scores reflect GBA. Applied to five different coexpression networks, this methodology demonstrates that the signature of guilt-by-association is ubiquitous and reproducible and that the GBA heuristic is broadly applicable across the population of nine hundred GO categories. We also demonstrate the existence of highly reproducible patterns of coexpression between some pairs of GO categories.
We conclude that GBA has universal value and that transcriptional control may be more modular than previously realized. Our analyses also suggest that methodologies combining coexpression measurements across multiple genes in a biologically-defined module can aid in characterizing gene function or in characterizing whether pairs of functions operate together.
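The per-category evaluation described above can be illustrated with a small sketch. The abstract does not give the form of the probabilistic score. The code below therefore substitutes a simple neighbor-voting score: a gene's summed coexpression weight to genes already annotated to the category. It then computes the area under the ROC curve from the Mann-Whitney statistic. The network format and function names are assumptions for illustration.

```python
# Sketch: per-category guilt-by-association (GBA) test on a coexpression network.
# Assumptions (not the study's exact method): the network is a symmetric
# dict-of-dicts of coexpression weights, and a gene's score for a category is
# the summed weight of its edges to annotated genes ("neighbor voting").


def neighbor_vote(network, gene, positives):
    """Summed edge weight from `gene` to genes annotated to the category.

    Self-edges are ignored, so a gene's own label never contributes to its
    score; scoring every gene this way is already a leave-one-out evaluation.
    """
    return sum(w for nbr, w in network.get(gene, {}).items()
               if nbr in positives and nbr != gene)


def auroc(pos_scores, neg_scores):
    """ROC AUC via the Mann-Whitney statistic; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def category_auroc(network, genes, members):
    """AUROC for one category: do its members score higher than non-members?"""
    members = set(members)
    pos = [neighbor_vote(network, g, members) for g in genes if g in members]
    neg = [neighbor_vote(network, g, members) for g in genes if g not in members]
    return auroc(pos, neg)


# Toy network: a, b and c are tightly coexpressed; d and e sit apart.
net = {
    "a": {"b": 0.9, "c": 0.8},
    "b": {"a": 0.9, "c": 0.7},
    "c": {"a": 0.8, "b": 0.7, "d": 0.1},
    "d": {"c": 0.1, "e": 0.6},
    "e": {"d": 0.6},
}
genes = ["a", "b", "c", "d", "e"]
print(category_auroc(net, genes, {"a", "b", "c"}))  # → 1.0
```

An AUROC near 0.5 means a category's members are no more connected to each other than to the rest of the network. Values approaching 1.0 are the GBA signature.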
Significant effort has been invested in network-based gene function prediction algorithms based on the guilt-by-association (GBA) principle. Existing approaches for assessing prediction performance typically compute evaluation metrics either averaged across all functions being considered or strictly from properties of the network. Since the success of GBA algorithms depends on the specific function being predicted, evaluation metrics should instead be computed for each function. We describe a novel method for computing the usefulness of a network by measuring its impact on cross-validated gene function prediction performance for each gene function. We have implemented this in software called Network Assessor, and describe its use in the GeneMANIA (GM) quality control system. Network Assessor is part of the GM command line tools.
network inference; function prediction; cross validation; network biology; machine learning
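A minimal sketch of per-function evaluation follows. It makes three assumptions: networks are stored as dict-of-dicts, they are merged by simply summing edge weights, and genes are scored by neighbor voting. GeneMANIA's own network weighting and Network Assessor's actual interface are not reproduced here, and all function names are hypothetical. A network's impact on a function is taken to be the change in mean k-fold cross-validated AUROC when that network is left out.

```python
import random

# Sketch: per-function cross-validated impact of one network.
# Assumptions (not Network Assessor's documented algorithm): networks are
# merged by summing edge weights, genes are scored by neighbor voting, and a
# network's impact on a function is the change in mean k-fold AUROC when that
# network is left out of the merge.


def combine(networks):
    """Merge dict-of-dicts networks by summing edge weights."""
    merged = {}
    for net in networks:
        for u, nbrs in net.items():
            row = merged.setdefault(u, {})
            for v, w in nbrs.items():
                row[v] = row.get(v, 0.0) + w
    return merged


def auroc(pos, neg):
    """ROC AUC via the Mann-Whitney statistic; ties count as half a win."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cv_auroc(network, genes, members, folds=3, seed=0):
    """Mean AUROC over k folds; held-out members are scored from the other labels."""
    members = sorted(members)
    random.Random(seed).shuffle(members)
    folds = min(folds, len(members))
    member_set = set(members)
    negatives = [g for g in genes if g not in member_set]
    aucs = []
    for i in range(folds):
        held_out = members[i::folds]
        training = member_set - set(held_out)

        def score(g):
            return sum(w for n, w in network.get(g, {}).items() if n in training)

        aucs.append(auroc([score(g) for g in held_out],
                          [score(g) for g in negatives]))
    return sum(aucs) / len(aucs)


def network_impact(networks, name, genes, functions, folds=3):
    """Per-function AUROC gain from including network `name`, keyed by function."""
    with_all = combine(networks.values())
    without = combine(v for k, v in networks.items() if k != name)
    return {f: cv_auroc(with_all, genes, m, folds) - cv_auroc(without, genes, m, folds)
            for f, m in functions.items()}


networks = {
    "coexp": {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0, "c": 1.0},
              "c": {"a": 1.0, "b": 1.0}},
    "noise": {"a": {"d": 1.0}, "d": {"a": 1.0}, "e": {"f": 1.0}, "f": {"e": 1.0}},
}
genes = list("abcdef")
functions = {"F1": {"a", "b", "c"}}
for name in networks:
    impact = network_impact(networks, name, genes, functions)
    print(name, {f: round(d, 3) for f, d in impact.items()})
```

The result stays a per-function dictionary rather than an average. This preserves the point made above: a network can matter a great deal for some functions and not at all for others.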
In an opinion published in 2012, we reviewed and discussed our studies of how gene network-based guilt-by-association (GBA) is impacted by confounds related to gene multifunctionality. We found such confounds account for a significant part of the GBA signal, and as a result meaningfully evaluating and applying computationally-guided GBA is more challenging than generally appreciated. We proposed that effort currently spent on incrementally improving algorithms would be better spent in identifying the features of data that do yield novel functional insights. We also suggested that part of the problem is the reliance by computational biologists on gold standard annotations such as the Gene Ontology. In the year since, there has been continued heavy activity in GBA-based research, including work that contributes to our understanding of the issues we raised. Here we review some of the most relevant recent work, including studies that point to new areas of progress and challenges.
Expressed Sequence Tag-based gene expression profiling can be used to discover functionally associated genes on a large scale. Currently available web servers and tools focus on finding differentially expressed genes in different samples or tissues rather than finding co-expressed genes. To fill this gap, we have developed a web server that implements the GBA (Guilt-by-Association) co-expression algorithm, which has been successfully used in finding disease-related genes. We have also annotated UniGene clusters with links to several important databases such as GO, KEGG, OMIM, Gene, IPI and HomoloGene. The GBA server can be accessed and downloaded at .
Mutations in the glucocerebrosidase gene (GBA) result in Gaucher disease and can be associated with a phenotype characterized by adult-onset progressive neurologic deterioration and parkinsonism.
To define the clinical and neurologic spectrum of parkinsonian manifestations associated with GBA mutations.
Design, Setting, and Patients
A prospective case series of 10 patients (7 men and 3 women) with parkinsonism and GBA mutations evaluated at the National Institutes of Health Clinical Center.
Main Outcome Measures
The GBA genotypes were identified by means of DNA sequencing. Tests evaluating neurologic, motor, cognitive, ocular, and olfactory functions were performed and the results were analyzed by a single team.
Genotyping identified GBA mutations N370S, L444P, and c.84dupG and recombinant alleles. The mean age at onset of parkinsonian manifestations was 49 years (range, 39–65 years), disease duration was 7.8 years (range, 1.2–16.0 years), and Unified Parkinson Disease Rating Scale part III score was 26.3 (range, 13–38). Half of the patients reported cognitive changes later in the disease course. Six patients were diagnosed as having Parkinson disease, 3 as having Lewy body dementia, and 1 as having a “Parkinson plus” syndrome. The most frequent nonmotor finding was olfactory dysfunction. Atypical manifestations included myoclonus, electroencephalographic abnormalities, and seizures.
In the homozygous and heterozygous states, GBA mutations are associated with a spectrum of parkinsonian phenotypes ranging from Parkinson disease, mostly of the akinetic type, to a less common phenotype characteristic of Lewy body dementia.
We propose a new feature selection algorithm, Guilt-By-Association (GBA), which uses hierarchical clustering based on feature correlations to eliminate redundant features. GBA can be used in conjunction with other algorithms to produce a feature selection routine that explicitly considers both the similarities between features and their individual discriminatory powers. In this preliminary study, a simple form of GBA was investigated on simulated proteomic data.
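The abstract leaves three choices unspecified: the linkage rule, the cut level and the measure of discriminatory power. The sketch below assumes single-linkage clustering cut at an absolute correlation threshold, which is equivalent to taking connected components of the thresholded correlation graph. It also assumes a hypothetical separation score: the absolute difference of class means divided by the average within-class standard deviation. From each cluster it keeps the most discriminative feature.

```python
from statistics import mean, pstdev

# Sketch of correlation-based redundancy removal in the spirit of GBA feature
# selection. Assumptions (hypothetical): single-linkage clustering cut at
# |r| >= threshold, a mean-difference/spread "separation" score, binary labels,
# and features that are not constant.


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)


def separation(values, labels):
    """|difference of class means| / average within-class standard deviation."""
    pos = [v for v, lab in zip(values, labels) if lab]
    neg = [v for v, lab in zip(values, labels) if not lab]
    spread = (pstdev(pos) + pstdev(neg)) / 2 or 1e-12  # guard against zero spread
    return abs(mean(pos) - mean(neg)) / spread


def gba_select(features, labels, threshold=0.9):
    """Cluster correlated features (single linkage), keep the best of each cluster."""
    names = list(features)
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(features[a], features[b])) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters
    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(max(group, key=lambda n: separation(features[n], labels))
                  for group in clusters.values())


labels = [0, 0, 0, 1, 1, 1]
features = {
    "f1": [1, 2, 3, 7, 8, 9],  # separates the classes cleanly
    "f2": [1, 2, 4, 6, 8, 9],  # redundant with f1 (r ≈ 0.98) but less discriminative
    "f3": [5, 1, 4, 2, 6, 3],  # unrelated to f1 and f2
}
print(gba_select(features, labels))  # → ['f1', 'f3']
```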
To determine the frequency of mutations responsible for Gaucher's disease, we systematically sequenced the GBA1 gene as part of a molecular characterization of 73 adult patients in the United Kingdom. Five hitherto unknown pathogenic variants were identified, one of which is a splice site change; the others are novel missense mutations. Given that GBA1 gene mutations are an important risk factor for the development of Parkinson's disease, we contend that a complete analysis and molecular characterization of both the known and novel GBA1 variants will be needed before the biochemical processes underlying this genetic association can be fully understood.
► We report a comprehensive genotypic analysis of GBA1 in 73 Type I GD patients. ► We identified 5 new mutations in the GBA1 gene. ► The mutations we report here are clearly loss-of-function alleles.
Parkinson's disease; Genetics; Gaucher's disease; Glucocerebrosidase; GBA1 gene
Gaucher disease (GD) is the most common inherited lysosomal storage disorder in humans, caused by mutations in the gene encoding the lysosomal enzyme glucocerebrosidase (GBA1). GD is clinically heterogeneous and although the type of GBA1 mutation plays a role in determining the type of GD, it does not explain the clinical variability seen among patients. Cumulative evidence from recent studies suggests that GBA2 could play a role in the pathogenesis of GD and potentially interacts with GBA1.
We used a framework of functional and genetic approaches in order to further characterize a potential role of GBA2 in GD. Glucosylceramide (GlcCer) levels in spleen, liver and brain of GBA2-deficient mice and mRNA and protein expression of GBA2 in GBA1-deficient murine fibroblasts were analyzed. Furthermore, we crossed GBA2-deficient mice with conditional Gba1 knockout mice in order to quantify the interaction between GBA1 and GBA2. Finally, a genetic approach was used to test whether genetic variation in GBA2 is associated with GD and/or acts as a modifier in Gaucher patients. We tested 22 SNPs in the GBA2 and GBA1 genes in 98 type 1 and 60 type 2/3 Gaucher patients for single- and multi-marker association with GD.
We found a significant accumulation of GlcCer in all three organs studied in GBA2-deficient mice compared with wild-type controls. In addition, a significant increase of Gba2 protein and mRNA levels in GBA1-deficient murine fibroblasts was observed. GlcCer levels in the spleen from Gba1/Gba2 knockout mice were much higher than the sum of the single knockouts, indicating cross-talk between the two glucosylceramidases and suggesting a partial compensation of the loss of one enzyme by the other. In the genetic approach, no significant association with severity of GD was found for SNPs at the GBA2 locus. However, in the multi-marker analyses a significant result was detected for p.L444P (GBA1) and rs4878628 (GBA2), using a model that does not take marginal effects into account.
Taken together, our observations make GBA2 a likely candidate to be involved in GD etiology. Furthermore, they point to GBA2 as a plausible modifier for GBA1 in patients with GD.
Guilt is a core emotion governing social behavior by promoting compliance with social norms or self-imposed standards. The goal of this study was to contrast guilty responses to actions that affect self versus others, since actions with social consequences are hypothesized to yield greater guilty feelings due to adopting the perspective and subjective emotional experience of others. Sixteen participants were presented with brief hypothetical scenarios in which the participant’s actions resulted in harmful consequences to self (guilt-self) or to others (guilt-other) during functional MRI. Participants felt more intense guilt for guilt-other than guilt-self and guilt-neutral scenarios. Guilt scenarios revealed distinct regions of activity correlated with intensity of guilt, social consequences of actions, and the interaction of guilt by social consequence. Guilt intensity was associated with activation of the dorsomedial PFC, superior frontal gyrus, supramarginal gyrus, and anterior inferior frontal gyrus. Guilt accompanied by social consequences was associated with greater activation than without social consequences in the ventromedial and dorsomedial PFC, precuneus, posterior cingulate, and posterior superior temporal sulcus. Finally, the interaction analysis highlighted select regions that were more strongly correlated with guilt intensity as a function of social consequence, including the left anterior inferior frontal gyrus, left ventromedial PFC, and left anterior inferior parietal cortex. Our results suggest these regions intensify guilt where harm to others may incur a greater social cost.
guilt; empathy; perspective taking; social emotions; functional magnetic resonance imaging
Mutations in the glucocerebrosidase gene (GBA) are associated with Gaucher's disease, the most common lysosomal storage disorder. Parkinsonism is an established feature of Gaucher's disease, and an increased frequency of GBA mutations has been reported in several different ethnic series with sporadic Parkinson's disease. In this study, we evaluated the frequency of GBA mutations in British patients affected by Parkinson's disease. We screened the GBA gene for mutations in DNA from 790 patients and 257 controls matched for age and ethnicity. Clinical data on all identified GBA mutation carriers were reviewed and analysed. Additionally, in all cases where brain material was available, a neuropathological evaluation was performed and compared with sporadic Parkinson's disease without GBA mutations. The frequency of GBA mutations among the British patients (33/790 = 4.18%) was significantly higher (P = 0.01; odds ratio = 3.7; 95% confidence interval = 1.12–12.14) than in the control group (3/257 = 1.17%). Fourteen different GBA mutations were identified, including three previously undescribed mutations, K7E, D443N and G193E. Pathological examination revealed widespread and abundant α-synuclein pathology in all 17 GBA mutation carriers, who were graded as Braak stage 5–6 and had McKeith's limbic or diffuse neocortical Lewy body-type pathology. Diffuse neocortical Lewy body-type pathology tended to occur more frequently in the group with GBA mutations than in matched Parkinson's disease controls. Clinical features comprised an early onset of the disease, hallucinations in 45% (14/31) of patients, and symptoms of cognitive decline or dementia in 48% (15/31). This study demonstrates that GBA mutations are found in British subjects at a higher frequency than mutations in any other known Parkinson's disease gene.
This is the largest study to date on a non-Jewish patient sample with detailed genotype/phenotype/pathological analyses. It strengthens the hypothesis that GBA mutations represent a significant risk factor for the development of Parkinson's disease and suggests that, to date, they are the most common genetic factor identified for the disease.
Parkinson's disease; GBA; Gaucher's disease; neuropathology
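The odds ratio reported for the British cohort can be checked directly from the carrier counts. The abstract does not say how the interval was computed. The sketch assumes Woolf's log-scale method, which reproduces the reported values (3.7; 1.12–12.14). The P value is not recomputed.

```python
import math


def odds_ratio_woolf(case_carriers, n_cases, ctrl_carriers, n_controls, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval from a 2x2 table."""
    a, b = case_carriers, n_cases - case_carriers     # cases: carriers, non-carriers
    c, d = ctrl_carriers, n_controls - ctrl_carriers  # controls: carriers, non-carriers
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# British cohort: 33 of 790 patients vs 3 of 257 controls carried a GBA mutation.
or_, lo, hi = odds_ratio_woolf(33, 790, 3, 257)
print(f"OR = {or_:.1f}, 95% CI = {lo:.2f} to {hi:.2f}")  # → OR = 3.7, 95% CI = 1.12 to 12.14
```

The interval is wide because only three controls carried a mutation: the 1/c term dominates the standard error.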
Gamma Band Activity (GBA) is increasingly studied for its relation with attention, change detection, maintenance of working memory and the processing of sensory stimuli. Activity around the gamma range has also been linked with early visual processing, although the relationship between this activity and the low-frequency visual evoked potential (VEP) remains unclear. This study examined the ability of blind and semi-blind source separation techniques to extract sources specifically related to the VEP and GBA in order to shed light on the relationship between them. Blind (Independent Component Analysis, ICA) and semi-blind (Functional Source Separation, FSS) methods were applied to dense array EEG data recorded during checkerboard stimulation. FSS was performed with both temporal and spectral constraints to identify specifically the generators of the main peak of the VEP (P100) and of the GBA. Source localisation and time-frequency analyses were then used to investigate the properties and co-dependencies between VEP/P100 and GBA. Analysis of the VEP extracted using the different methods demonstrated very similar morphology and localisation of the generators. Single-trial time-frequency analysis showed higher GBA when a larger amplitude VEP/P100 occurred. Further examination indicated that the evoked (phase-locked) component of the GBA was more related to the P100, whilst the induced component correlated with the VEP as a whole. The results suggest that the VEP and GBA may be generated by the same neuronal populations, and implicate this relationship as a potential mediator of the correlation between the VEP and the Blood Oxygenation Level Dependent (BOLD) effect measured with fMRI.
► ICA and FSS are able to extract sources specifically related to the VEP/P100 and GBA. ► Localisation and frequency analyses show co-dependencies between VEP/P100 and GBA. ► Trial by trial induced GBA covaries with VEP amplitude. ► VEP and GBA may be generated by the same neuronal populations. ► VEP and GBA relationship may underlie the relationship between VEP and BOLD.
Visual Evoked Potential (VEP); Electroencephalography (EEG); Independent Component Analysis (ICA); Functional Source Separation (FSS); Induced Visual Gamma (IVG); Gamma Band Activity (GBA)
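The distinction drawn above between evoked (phase-locked) and induced gamma activity is conventionally operationalized as follows. Evoked power is the power of the trial-averaged response. Induced power is the mean single-trial power left after subtracting that average. The sketch assumes synthetic unit-amplitude 40 Hz sinusoids and uses a single DFT bin as a stand-in for the gamma band. It does not reproduce the ICA or FSS source extraction.

```python
import math
import random

# Sketch: evoked vs induced power at one gamma frequency.
# Assumptions: a single DFT bin at 40 Hz stands in for the gamma band, and the
# trials are synthetic sinusoids, not EEG sources extracted by ICA or FSS.


def power_at(signal, freq, fs):
    """Power of the `freq` component of `signal` (one DFT bin, normalized by n^2)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * t / fs) for t, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * t / fs) for t, x in enumerate(signal))
    return (re ** 2 + im ** 2) / n ** 2


def evoked_and_induced(trials, freq, fs):
    """Evoked = power of the trial average; induced = mean power after removing it."""
    n = len(trials[0])
    erp = [sum(trial[t] for trial in trials) / len(trials) for t in range(n)]
    evoked = power_at(erp, freq, fs)
    induced = sum(power_at([x - e for x, e in zip(trial, erp)], freq, fs)
                  for trial in trials) / len(trials)
    return evoked, induced


def make_trials(n_trials, phase_locked, freq=40.0, fs=500.0, n=250, seed=0):
    """Unit-amplitude sinusoids whose phase is fixed (evoked) or random (induced)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        phase = 0.0 if phase_locked else rng.uniform(0.0, 2 * math.pi)
        trials.append([math.sin(2 * math.pi * freq * t / fs + phase) for t in range(n)])
    return trials


locked = evoked_and_induced(make_trials(30, True), 40.0, 500.0)
jittered = evoked_and_induced(make_trials(30, False), 40.0, 500.0)
print("phase-locked: evoked=%.3f induced=%.3f" % locked)
print("random phase: evoked=%.3f induced=%.3f" % jittered)
```

Phase-locked trials put all of their power into the evoked component. Trials with random phase put almost all of it into the induced component. For sinusoids like these, the two components sum to the single-trial power.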
While mutations in glucocerebrosidase (GBA1) are associated with an increased risk for Parkinson disease (PD), it is important to establish whether such mutations are also a common risk factor for other Lewy body disorders.
To establish whether GBA1 mutations are a risk factor for dementia with Lewy bodies (DLB).
We compared genotype data on patients and controls from 11 centers. Data concerning demographics, age at onset, disease duration, and clinical and pathological features were collected when available. We conducted pooled analyses using logistic regression to investigate GBA1 mutation carrier status as predicting DLB or PD with dementia status, using common control subjects as a reference group. Random-effects meta-analyses were conducted to account for additional heterogeneity.
Eleven centers from sites around the world performing genotyping.
Seven hundred twenty-one cases met diagnostic criteria for DLB and 151 had PD with dementia. We compared these cases with 1962 controls from the same centers matched for age, sex, and ethnicity.
Main Outcome Measures
Frequency of GBA1 mutations in cases and controls.
We found a significant association between GBA1 mutation carrier status and DLB, with an odds ratio of 8.28 (95% CI, 4.78–14.88). The odds ratio for PD with dementia was 6.48 (95% CI, 2.53–15.37). The mean age at diagnosis of DLB was earlier in GBA1 mutation carriers than in noncarriers (63.5 vs 68.9 years; P<.001), with higher disease severity scores.
Conclusions and Relevance
Mutations in GBA1 are a significant risk factor for DLB. GBA1 mutations likely play an even larger role in the genetic etiology of DLB than in PD, providing insight into the role of glucocerebrosidase in Lewy body disease.
We conducted 3 studies to test the idea that guilt is a key affective component of Conscientiousness and that it can account for the relation between Conscientiousness and negative affect. Study 1 used meta-analysis to show that Conscientiousness was associated with specific emotions and overall negative affect but was most strongly associated with guilt. Conscientiousness was negatively related to guilt experience but positively related to guilt proneness. Also, guilt experience mediated the relation between Conscientiousness and negative affect. Study 2 (N = 142) examined the relation between facets of Conscientiousness and guilt. We replicated results from Study 1 and showed that the relation between Conscientiousness and guilt was not due to overlap with Extraversion and Neuroticism. Study 3 (n = 176) examined the interplay between Conscientiousness and guilt on grades in a short-term longitudinal study. These studies showed that Conscientiousness is primarily related to guilt and highlighted the importance of examining the emotional substrate of Conscientiousness.
Co-expression based Cancer Modules (CMs) are sets of genes that act in concert to carry out specific functions in different cancer types, and are constructed by exploiting gene expression profiles related to specific clinical conditions or expression signatures associated with specific processes altered in cancer. Unfortunately, genes involved in cancer are not always detectable using only expression signatures or co-expressed sets of genes, and in principle other types of functional interactions should be exploited to obtain a comprehensive picture of the molecular mechanisms underlying the onset and progression of cancer.
We propose a novel semi-supervised method to rank genes with respect to CMs using networks constructed from different sources of functional information, not limited to gene expression data. On the one hand, it exploits local learning strategies through score functions that extend the guilt-by-association approach; on the other hand, it exploits global learning strategies through graph kernels embedded in the score functions, which take into account the overall topology of the network. The proposed kernelized score functions compare favorably with other state-of-the-art semi-supervised machine learning methods for gene ranking in biological networks and scale well with the number of genes, thus allowing fast processing of very large gene networks.
The modular nature of kernelized score functions provides an algorithmic scheme from which different gene ranking algorithms can be derived. The results show that using integrated functional networks we can successfully predict CMs defined mainly through expression signatures obtained from gene expression data profiling. A preliminary analysis of top-ranked "false positive" genes shows that our approach could eventually be applied to discover novel genes involved in the onset and progression of tumors related to specific CMs.
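A kernelized score function combines a local guilt-by-association rule with a global graph kernel. The sketch below pairs an average score (mean kernel similarity to the known module members) with the p-step random walk kernel K = (aI - L)^p, where L is the normalized graph Laplacian. This is one standard graph kernel, chosen here as an assumption. The study's exact score functions, kernels and parameters may differ.

```python
import math

# Sketch: a kernelized "average score" for ranking genes against a module.
# Assumptions: an undirected weighted network given as an adjacency matrix, and
# the p-step random walk kernel K = (a*I - L)^p with L the normalized Laplacian
# (one common graph kernel; the paper's exact kernels and parameters may differ).


def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def random_walk_kernel(W, a=2.0, p=1):
    """K = (a*I - L)^p with L = I - D^-1/2 W D^-1/2, i.e. a*I - L = (a-1)*I + D^-1/2 W D^-1/2."""
    n = len(W)
    deg = [sum(row) for row in W]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    base = [[(a - 1.0 if i == j else 0.0) + inv_sqrt[i] * W[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]
    K = base
    for _ in range(p - 1):
        K = matmul(K, base)
    return K


def average_score(K, positives):
    """Score of each node: mean kernel similarity to the module's known members."""
    return [sum(K[v][x] for x in positives) / len(positives) for v in range(len(K))]


def rank_candidates(K, positives):
    """Unlabeled nodes ordered from most to least likely module member."""
    scores = average_score(K, positives)
    return sorted((v for v in range(len(K)) if v not in positives),
                  key=lambda v: scores[v], reverse=True)


# Path graph 0-1-2-3-4-5 with known module members {0, 1}.
W = [[0.0] * 6 for _ in range(6)]
for u in range(5):
    W[u][u + 1] = W[u + 1][u] = 1.0
K = random_walk_kernel(W, a=2.0, p=2)
print(rank_candidates(K, {0, 1}))  # → [2, 3, 4, 5]
```

With p = 2, node 3, two steps from the module, receives a nonzero score that a one-step neighbor vote would not give it. This is the global, topology-aware component the abstract describes.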
To assess the cognitive phenotype of glucocerebrosidase (GBA) mutation carriers with early-onset Parkinson disease (PD).
We administered a neuropsychological battery and the University of Pennsylvania Smell Identification Test (UPSIT) to participants in the CORE-PD study who were tested for mutations in PARKIN, LRRK2, and GBA. Participants included 33 GBA mutation carriers and 60 noncarriers of any genetic mutation. Primary analyses were performed on 26 GBA heterozygous mutation carriers without additional mutations and 39 age- and PD duration–matched noncarriers. Five cognitive domains (psychomotor speed, attention, memory, visuospatial function, and executive function) were created from transformed z scores of individual neuropsychological tests. Clinical diagnoses (normal, mild cognitive impairment [MCI], dementia) were assigned blind to genotype based on neuropsychological performance and functional impairment as assessed by the Clinical Dementia Rating (CDR) score. The association between GBA mutation status and neuropsychological performance, CDR, and clinical diagnoses was assessed.
Demographics, UPSIT, and Unified Parkinson's Disease Rating Scale–III performance did not differ between GBA carriers and noncarriers. GBA mutation carriers performed more poorly than noncarriers on the Mini-Mental State Examination (p = 0.035), and on the memory (p = 0.017) and visuospatial (p = 0.028) domains. The most prominent differences were observed in nonverbal memory performance (p < 0.001). Carriers were more likely to receive scores of 0.5 or higher on the CDR (p < 0.001), and a clinical diagnosis of either MCI or dementia (p = 0.004).
GBA mutation status may be an independent risk factor for cognitive impairment in patients with PD.
To characterize sequence variation within the glucocerebrosidase (GBA) gene in a select subset of our sample of patients with familial Parkinson disease (PD) and then to test in our full sample whether these sequence variants increased the risk for PD and were associated with an earlier onset of disease.
We performed a comprehensive study of all GBA exons in one patient with PD from each of 96 PD families, selected based on the family-specific lod scores at the GBA locus. Identified GBA variants were subsequently screened in all 1325 PD cases from 566 multiplex PD families and in 359 controls.
Nine different GBA variants, five previously reported, were identified in 21 of the 96 PD cases sequenced. Screening for these variants in the full sample identified 161 variant carriers (12.2%) in 99 different PD families. An unbiased estimate of the frequency of the five previously reported GBA variants in the familial PD sample was 12.6% and in the control sample was 5.3% (odds ratio 2.6; 95% confidence interval 1.5–4.4). Presence of a GBA variant was associated with an earlier age at onset (p = 0.0001). On average, those patients carrying a GBA variant had onset with PD 6.04 years earlier than those without a GBA variant.
This study suggests that GBA is a susceptibility gene for familial Parkinson disease (PD) and patients with GBA variants have an earlier age at onset than patients with PD without GBA variants.
CI = confidence interval;
GD = Gaucher disease;
GDS = Geriatric Depression Scale;
MMSE = Mini-Mental State Examination;
NCRAD = National Cell Repository for Alzheimer’s Disease;
NPL = nonparametric lod;
OR = odds ratio;
PD = Parkinson disease;
UPDRS = Unified Parkinson’s Disease Rating Scale.
Mutations in the glucocerebrosidase (GBA) gene are associated with Lewy body (LB) disorders.
To determine the relationship of GBA mutations and APOE4 genotype to LB and Alzheimer disease (AD) pathological findings.
The 187 subjects included patients with primary neuropathological diagnoses of LB disorders with or without AD changes (95 cases), randomly selected patients with AD (without significant LB pathological findings; 60 cases), and controls with neither LB nor AD pathological findings (32 cases).
Main Outcome Measures
GBA mutation status, APOE4 genotype, LB pathological findings (assessed according to the third report of the Dementia With Lewy Body Consortium), and Alzheimer plaque and tangle pathological findings (rated by criteria of Braak and Braak, the Consortium to Establish a Registry for Alzheimer Disease, and the National Institute on Aging–Reagan Institute).
GBA mutations were found in 18% (34 of 187) of all subjects, including 28% (27 of 95) of those with primary LB pathological findings compared with 10% (6 of 60) of those with AD pathological findings and 3% (1 of 32) of those without AD or LB pathological findings (P=.001). GBA mutation status was significantly associated with the presence of cortical LBs (odds ratio, 6.48; 95% confidence interval, 2.45–17.16; P<.001), after adjusting for sex, age at death, and presence of APOE4. GBA mutation carriers were significantly less likely to meet AD pathological diagnostic (National Institute on Aging–Reagan Institute intermediate or high likelihood) criteria (odds ratio, 0.35; 95% confidence interval, 0.15–0.79; P=.01) after adjustment for sex, age at death, and APOE4.
GBA mutations may be associated with pathologically “purer” LB disorders, characterized by more extensive (cortical) LBs and less severe AD pathological findings, and may be a useful marker for LB disorders.
Lewy body disease (LBD) development is enhanced by mutations in the GBA gene coding for glucocerebrosidase (GCase). The mechanism of this association is thought to involve an abnormal lysosomal system and we therefore sought to evaluate if lysosomal changes contribute to the pathogenesis of idiopathic LBD. Analysis of post-mortem frontal cortex tissue from 7 GBA mutation carriers with LBD, 5 GBA mutation carriers with no signs of neurological disease and human neural stem cells exposed to a GCase inhibitor was used to determine how GBA mutation contributes to LBD. GBA mutation carriers demonstrated a significantly reduced level of GCase protein and enzyme activity and retention of glucocerebrosidase isoforms within the endoplasmic reticulum (ER). This was associated with enhanced expression of the lysosomal markers LAMP1 and LAMP2, though the expression of ATP13A2 and Cathepsin D was reduced, along with the decreased activity of Cathepsin D. The ER unfolded protein response (UPR) regulator BiP/GRP78 was reduced by GBA mutation and this was a general phenomenon in LBD. Despite elevation of GRP94 in LBD, individuals with GBA mutations showed reduced GRP94 expression, suggesting an inadequate UPR. Finally, human neural stem cell cultures showed that inhibition of GCase causes acute reduction of BiP, indicating that the UPR is affected by reduced glucocerebrosidase activity. The results indicate that mutation in GBA leads to additional lysosomal abnormalities, enhanced by an impaired UPR, potentially causing α-synuclein accumulation.
dementia with Lewy bodies; endoplasmic reticulum; glucocerebrosidase; Lewy body disease; lysosome; Parkinson’s disease
Gene networks are commonly interpreted as encoding functional information in their connections. An extensively validated principle called guilt by association states that genes which are associated or interacting are more likely to share function. Guilt by association provides the central top-down principle for analyzing gene networks in functional terms or assessing their quality in encoding functional information. In this work, we show that functional information within gene networks is typically concentrated in only a very few interactions whose properties cannot be reliably related to the rest of the network. In effect, the apparent encoding of function within networks has been largely driven by outliers whose behaviour cannot even be generalized to individual genes, let alone to the network at large. While experimentalist-driven analysis of interactions may use prior expert knowledge to focus on the small fraction of critically important data, large-scale computational analyses have typically assumed that high-performance cross-validation in a network is due to a generalizable encoding of function. Because we find that gene function is not systemically encoded in networks, but dependent on specific and critical interactions, we conclude it is necessary to focus on the details of how networks encode function and what information computational analyses use to extract functional meaning. We explore a number of consequences of this and find that network structure itself provides clues as to which connections are critical and that systemic properties, such as scale-free-like behaviour, do not map onto the functional connectivity within networks.
The analysis of gene function and gene networks is a major theme of post-genome biomedical research. Historically, many attempts to understand gene function leverage a biological principle known as “guilt by association” (GBA). GBA states that genes with related functions tend to share properties such as genetic or physical interactions. In the past ten years, GBA has been scaled up for application to large gene networks, becoming a favored way to grapple with the complex interdependencies of gene functions in the face of floods of genomics and proteomics data. However, there is a growing realization that scaled-up GBA is not a panacea. In this study, we report a precise identification of the limits of GBA and show that it cannot provide a way to understand gene networks in a way that is simultaneously general and useful. Our findings indicate that the assumptions underlying the high-throughput use of gene networks to interpret function are fundamentally flawed, with wide-ranging implications for the interpretation of genome-wide data.
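One way to probe the claim that functional information is concentrated in a few interactions is to delete each edge in turn. For each deletion, the sketch records how much a category's neighbor-voting AUROC drops. It runs on a toy edge list; the edge format and helper names are hypothetical. It illustrates the idea rather than the authors' analysis.

```python
# Sketch: how much does one category's GBA performance hinge on single edges?
# Assumptions (hypothetical helpers): undirected weighted edges as (u, v, w)
# tuples, neighbor-voting scores, and AUROC as the performance measure.


def build(edges):
    """Symmetric dict-of-dicts adjacency from an edge list."""
    net = {}
    for u, v, w in edges:
        net.setdefault(u, {})[v] = w
        net.setdefault(v, {})[u] = w
    return net


def auroc(pos, neg):
    """ROC AUC via the Mann-Whitney statistic; ties count as half a win."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gba_auroc(edges, genes, members):
    """Neighbor-voting AUROC: summed edge weight to members, members vs non-members."""
    net = build(edges)

    def score(g):
        return sum(w for n, w in net.get(g, {}).items() if n in members and n != g)

    return auroc([score(g) for g in genes if g in members],
                 [score(g) for g in genes if g not in members])


def edge_criticality(edges, genes, members):
    """Drop in AUROC caused by deleting each edge alone, largest drop first."""
    baseline = gba_auroc(edges, genes, members)
    drops = []
    for i, (u, v, _) in enumerate(edges):
        remaining = edges[:i] + edges[i + 1:]
        drops.append(((u, v), baseline - gba_auroc(remaining, genes, members)))
    return sorted(drops, key=lambda item: item[1], reverse=True)


edges = [("a", "b", 1.0), ("c", "d", 1.0), ("d", "e", 1.0)]
genes = ["a", "b", "c", "d", "e"]
print(edge_criticality(edges, genes, {"a", "b"}))
# → [(('a', 'b'), 0.5), (('c', 'd'), 0.0), (('d', 'e'), 0.0)]
```

In the toy example a single edge carries the entire signal. This is the situation the abstract describes: cross-validation performance that looks systemic but rests on one critical interaction.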
β-Glucosidase 1 (GBA1; lysosomal glucocerebrosidase) and β-glucosidase 2 (GBA2; non-lysosomal glucocerebrosidase) both have glucosylceramide as a main natural substrate. The enzyme-deficient conditions with glucosylceramide accumulation are Gaucher disease (GBA1–/– in humans), modelled by the Gba1–/– mouse, and the syndrome with male infertility in the Gba2–/– mouse, respectively. Before the leading role of glucosylceramide was recognised for both deficient conditions, bile acid-3-O-β-glucoside (BG), another natural substrate, was viewed as the main substrate of GBA2. Given that GBA2 hydrolyses both BG and glucosylceramide, we asked whether, conversely, GBA1 hydrolyses BG as well as glucosylceramide. Here we show that GBA1 also hydrolyses BG. We compared the residual BG-hydrolysing activities in the GBA1–/– and Gba1–/– conditions (where GBA2 is almost the only active β-glucosidase) and in the Gba2–/– condition (where GBA1 is active) with wild-type activities, and we also used the GBA1 inhibitor isofagomine. GBA1 and GBA2 activities differed characteristically between the fibroblast, liver and brain samples studied. Independently, hydrolysis of BG by pure recombinant GBA1 was demonstrated. The fact that both GBA1 and GBA2 are glucocerebrosidases as well as bile acid β-glucosidases raises the question of why lysosomal accumulation of glucosylceramide in GBA1 deficiency, and extra-lysosomal accumulation in GBA2 deficiency, are not associated with an accumulation of BG in either condition.
β-Glucosidase 1 (GBA1); β-Glucosidase 2 (GBA2); Bile acid β-glucosidases; Glucosylceramide lipidosis; β-Glucosidase null mice; Isofagomine
Plasmodium falciparum is the main causative agent of malaria. Of the 5,484 predicted genes of P. falciparum, about 57% lack sufficient sequence similarity to characterized genes in other species to warrant functional assignments. Non-homology methods are therefore needed to obtain functional clues for these uncharacterized genes. In recent years, gene expression data have been widely used to aid intra-species functional annotation via the so-called Guilt By Association (GBA) principle.
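The intra-species GBA principle can be illustrated with a minimal neighbor-voting sketch in Python. It is not the method of any study summarized here: it assumes Pearson correlation as the coexpression measure, ranks all other genes by their correlation with a query gene, and scores each annotation by the fraction of the k most coexpressed genes that carry it. The function names (`pearson`, `gba_scores`), the gene names and the toy data are hypothetical.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # A flat profile has no variance, so treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def gba_scores(query, expression, annotations, k=10):
    """Hypothetical neighbor voting: score each annotation by the fraction
    of the query's k most coexpressed genes that carry it."""
    others = [g for g in expression if g != query]
    others.sort(key=lambda g: pearson(expression[query], expression[g]), reverse=True)
    neighbors = others[:k]
    votes = Counter(t for g in neighbors for t in annotations.get(g, ()))
    return {term: count / len(neighbors) for term, count in votes.items()}


# Toy data (hypothetical): "u" is the uncharacterized query gene.
expression = {
    "u": [1, 2, 3, 4],
    "a": [2, 4, 6, 8],  # perfectly correlated with u
    "b": [1, 3, 3, 5],  # strongly correlated with u
    "c": [4, 3, 2, 1],  # anti-correlated with u
}
annotations = {"a": {"ribosome"}, "b": {"ribosome", "transport"}, "c": {"transport"}}
print(gba_scores("u", expression, annotations, k=2))  # {'ribosome': 1.0, 'transport': 0.5}
```

Both nearest neighbors of "u" are annotated with "ribosome", so that term receives the highest vote; in practice k, the correlation measure and any score normalization are design choices that strongly affect results.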
We propose a new method that uses gene expression data to assess inter-species annotation transfers. Our approach starts from a set of likely orthologs between a reference species (here S. cerevisiae and D. melanogaster) and a query species (P. falciparum). It aims to identify clusters of coexpressed genes in the query species whose coexpression is conserved in the reference species. These conserved clusters of coexpressed genes are then used to assess annotation transfers between genes with low sequence similarity, enabling reliable transfer of annotations from the reference to the query species. We applied the approach to transcriptomic data sets of P. falciparum, S. cerevisiae and D. melanogaster, which enabled us to propose, with high confidence, new or refined annotations for several dozen hypothetical/putative P. falciparum genes. Notably, we revised the annotation of genes related to ribosomal proteins and to ribosome biogenesis and assembly, thereby highlighting several potential drug targets.
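The core idea of keeping only coexpression that is conserved across species can be sketched in Python. This is an illustrative simplification, not the authors' algorithm: it assumes a one-to-one ortholog map, Pearson correlation with a fixed threshold (0.8, chosen arbitrarily) in both species, and defines clusters as connected components of conserved coexpression edges. All function names, gene identifiers and data are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def conserved_clusters(query_expr, ref_expr, orthologs, threshold=0.8):
    """Hypothetical sketch: link two query genes only if they are coexpressed
    AND their reference-species orthologs are coexpressed too, then return
    the connected components of those conserved links."""
    genes = [g for g in orthologs if g in query_expr and orthologs[g] in ref_expr]
    adj = {g: set() for g in genes}
    for g, h in combinations(genes, 2):
        if (pearson(query_expr[g], query_expr[h]) >= threshold
                and pearson(ref_expr[orthologs[g]], ref_expr[orthologs[h]]) >= threshold):
            adj[g].add(h)
            adj[h].add(g)
    clusters, seen = [], set()
    for g in genes:
        if g in seen or not adj[g]:
            continue  # already clustered, or no conserved partner
        stack, comp = [g], set()
        while stack:  # depth-first traversal of one component
            cur = stack.pop()
            if cur not in comp:
                comp.add(cur)
                stack.extend(adj[cur] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters


# Toy data (hypothetical): q3 is coexpressed with q1 and q2 in the query
# species, but its ortholog r3 is not coexpressed with r1 or r2.
query_expr = {"q1": [1, 2, 3, 4], "q2": [2, 4, 6, 8], "q3": [1, 2, 3, 4]}
ref_expr = {"r1": [5, 6, 7, 8], "r2": [1, 2, 3, 5], "r3": [4, 3, 2, 1]}
orthologs = {"q1": "r1", "q2": "r2", "q3": "r3"}
print(conserved_clusters(query_expr, ref_expr, orthologs))
```

Only q1 and q2 form a conserved cluster here; q3 is dropped because its coexpression is not preserved between the orthologs, which is the kind of filtering that makes a subsequent annotation transfer more trustworthy.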
Our approach uses both sequence similarity and gene expression data to support inter-species gene annotation transfers. Experiments show that this strategy improves on the accuracy achieved with sequence similarity alone and is more accurate than the GBA approach. In addition, our experiments with P. falciparum show that it can infer a function for numerous hypothetical genes.
Individual researchers are struggling to keep up with the accelerating emergence of high-throughput biological data and to extract information relevant to their specific questions. Integrating accumulated evidence should allow researchers to form fewer, more accurate hypotheses for further study through experimentation.
Here a method previously used to predict Gene Ontology (GO) terms for Saccharomyces cerevisiae (Tian et al.: Combining guilt-by-association and guilt-by-profiling to predict Saccharomyces cerevisiae gene function. Genome Biol 2008, 9(Suppl 1):S7) is applied to predict GO terms and phenotypes for 21,603 Mus musculus genes, using a diverse collection of integrated data sources (including expression, interaction, and sequence-based data). This combined 'guilt-by-profiling' and 'guilt-by-association' approach optimizes the combination of two inference methodologies. Predictions at all levels of confidence are evaluated by examining genes not used in training, and top predictions are examined manually using available literature and knowledge base resources.
We assigned a confidence score to each gene/term combination. Prediction performance was high, with nearly every GO term achieving greater than 40% precision at 1% recall. Of the 36 novel GO term predictions and 40 novel phenotype predictions that were studied manually, >80% and >40%, respectively, were identified as accurate. We also show that combining 'guilt-by-profiling' and 'guilt-by-association' outperforms either approach alone when applied to M. musculus.
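The evaluation criterion quoted above (precision at 1% recall for a GO term) can be computed from a ranked list of confidence scores. The following is a minimal, uninterpolated Python sketch assuming one score and one true/false label per gene for a single term; the study may have computed the metric differently (for example with interpolation), and the function name and toy data are hypothetical.

```python
def precision_at_recall(scores, labels, target_recall=0.01):
    """Precision at the first rank where recall reaches target_recall.

    scores: one confidence value per gene for a single GO term.
    labels: True if the gene is annotated with that term (held-out truth).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive examples for this term")
    # Rank genes from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_pos = 0
    for rank, (_, is_pos) in enumerate(ranked, start=1):
        true_pos += is_pos
        if true_pos / total_pos >= target_recall:
            return true_pos / rank
    raise ValueError("target_recall must not exceed 1")


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [True, False, True, False, True]
print(precision_at_recall(scores, labels, 0.5))  # 0.6666666666666666
print(precision_at_recall(scores, labels))       # 1.0
```

With three positives, recall first reaches 50% at rank 3, where two of the three top-ranked genes are correct; at 1% recall only the single top prediction matters, which is why this criterion emphasizes the highest-confidence predictions.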
Global meta-analysis (GMA) of microarray data to identify genes with highly similar co-expression profiles is emerging as an accurate method to predict gene function and phenotype, even in the absence of published data on the gene(s) being analyzed. With a third of human genes still uncharacterized, this approach is a promising way to direct experiments and rapidly understand the biological roles of genes. To predict the function of a gene of interest, GMA relies on a guilt-by-association approach to identify sets of genes with known functions that are consistently co-expressed with that gene across different experimental conditions, suggesting coordinated regulation for a specific biological purpose. Our goal here is to define how sample size, dataset size and ranking parameters affect prediction performance.
Thirteen thousand human one-color microarrays were downloaded from GEO for GMA analysis. Prediction performance was benchmarked by calculating the distance within the Gene Ontology (GO) tree between predicted and annotated functions for sets of 100 randomly selected genes. We find that the number of newly predicted functions rises as more datasets are added but begins to saturate at a sample size of approximately 2,000 experiments. For the set of co-expressed genes used to predict function, smaller sets give higher precision but correspondingly poor recall; as set size increases, recall and F-measure tend to increase at the cost of precision.
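One simple way to measure the distance within the GO tree between a predicted and an annotated term is the number of edges on the shortest path between them when parent-child links are treated as undirected. The abstract does not give its exact distance definition, so this Python sketch is an assumption; the toy ontology uses hypothetical term names rather than real GO identifiers, and `go_distance` is a hypothetical helper.

```python
from collections import deque


def go_distance(parents, a, b):
    """Edges on the shortest path between terms a and b, ignoring the
    direction of parent links; None if the terms are not connected."""
    neighbors = {}
    for child, ps in parents.items():
        for p in ps:
            neighbors.setdefault(child, set()).add(p)
            neighbors.setdefault(p, set()).add(child)
    # Breadth-first search visits terms in order of increasing distance.
    dist = {a: 0}
    queue = deque([a])
    while queue:
        cur = queue.popleft()
        if cur == b:
            return dist[cur]
        for nxt in neighbors.get(cur, ()):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None


# Hypothetical toy ontology: each term maps to its parent terms.
parents = {"A": ["root"], "B": ["root"], "A1": ["A"], "B1": ["B"]}
print(go_distance(parents, "A1", "B1"))  # 4
print(go_distance(parents, "A1", "A"))   # 1
```

A prediction of the parent of the annotated term scores 1, while a term in an unrelated branch scores higher; since GO is a directed acyclic graph in which terms can have several parents, breadth-first search on the undirected graph still returns the shortest route.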
Of the 20,813 genes expressed in 50 or more experiments, at least one predicted GO category was found for 72.5%. Of the 5,720 genes without GO annotation, 4,189 had at least one predicted ontology when the top 40 co-expressed genes were used for prediction analysis. Of the remaining 1,531 genes without GO predictions or annotations, ~17% (257 genes) had sufficient co-expression data yet no statistically significant overrepresentation of any ontology, suggesting that their regulation may be more complex.
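The step of finding significantly overrepresented ontologies among a gene's top 40 co-expressed genes can be sketched with a one-sided hypergeometric test. The abstract does not name its test or multiple-testing correction; the hypergeometric test, the Bonferroni correction over the terms tested, the 0.01 cutoff, and all function names and data below are assumptions for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)


def enriched_terms(top_genes, annotations, universe_size, alpha=0.01):
    """Hypothetical sketch: terms overrepresented among top_genes (e.g. the
    top 40 genes co-expressed with a query gene), using a one-sided
    hypergeometric test Bonferroni-corrected over terms with >= 1 hit."""
    n = len(top_genes)
    overlaps = {term: (len(members & top_genes), len(members))
                for term, members in annotations.items()
                if members & top_genes}
    results = {}
    for term, (k, K) in overlaps.items():
        p = hypergeom_sf(k, universe_size, K, n)
        if p * len(overlaps) <= alpha:  # Bonferroni correction
            results[term] = p
    return results


# Toy data (hypothetical): 1,000 genes in total, 40 top co-expressed genes.
top40 = {f"g{i}" for i in range(40)}
annotations = {
    # 10 of 20 members fall in the top 40 (about 0.8 expected by chance).
    "termA": {f"g{i}" for i in range(10)} | {f"x{i}" for i in range(10)},
    # 4 of 100 members fall in the top 40 (about 4 expected by chance).
    "termB": {f"g{i}" for i in range(4)} | {f"y{i}" for i in range(96)},
}
print(sorted(enriched_terms(top40, annotations, universe_size=1000)))  # ['termA']
```

A gene whose top co-expressed partners yield an empty result here would correspond to the case described above: enough co-expression data, but no significantly overrepresented ontology.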
β-Glucosidase 2 (GBA2) is a resident enzyme of the endoplasmic reticulum thought to play a role in the metabolism of bile acid–glucose conjugates. To gain insight into the biological function of this enzyme and its substrates, we generated mice deficient in GBA2 and found that these animals had normal bile acid metabolism. Knockout males exhibited impaired fertility. Microscopic examination of sperm revealed large round heads (globozoospermia), abnormal acrosomes, and defective motility. Glycolipids, identified as glucosylceramides by mass spectrometry, accumulated in the testes, brains, and livers of the knockout mice but did not cause obvious neurological symptoms, organomegaly, or a reduction in lifespan. Recombinant GBA2 hydrolyzed glucosylceramide to glucose and ceramide, the same reaction catalyzed by acid β-glucosidase 1 (GBA1), which is defective in subjects with Gaucher’s disease, a lysosomal storage disease. We conclude that GBA2 is a glucosylceramidase whose loss causes accumulation of glycolipids and an endoplasmic reticulum storage disease.
Gaucher's disease is an autosomal recessive lysosomal storage disease caused by mutations of the β-glucocerebrosidase gene (GBA). There is increasing evidence that GBA mutations are a genetic risk factor for the development of Parkinson's disease (PD). We report a Korean family with parkinsonism associated with GBA mutations.
A 44-year-old woman who had experienced slowness and paresthesia of the left arm for the previous 1.5 years visited our hospital for management of a known invasive ductal carcinoma. During the preoperative evaluation, she was diagnosed with Gaucher's disease and found to carry two GBA mutations, S271G and R359X. Parkinsonian features, including low-amplitude postural tremor, rigidity, bradykinesia and shuffling gait, were observed. Genetic analysis also revealed that her older sister, who had been diagnosed with PD and had been taking dopaminergic drugs for 8 years, carried a heterozygous R359X mutation in GBA. 18F-fluoropropylcarbomethoxyiodophenylnortropane positron-emission tomography in both patients revealed decreased dopamine transporter uptake in the posterior portion of the bilateral putamen.
This case report describes familial PD in Korean patients carrying a heterozygous GBA mutation, further supporting the association between PD and GBA mutations.
Gaucher's disease; glucocerebroside; Parkinson's disease