Whole genome sequencing of matched tumor-normal sample pairs is becoming routine in cancer research. However, analysis of somatic copy-number changes from sequencing data is still challenging because of insufficient sequencing coverage, unknown tumor sample purity and subclonal heterogeneity. Here we describe a computational framework, named SomatiCA, which explicitly accounts for tumor purity and subclonality in the analysis of somatic copy-number profiles. Taking read depths (RD) and lesser allele frequencies (LAF) as input, SomatiCA will output 1) admixture rate for each tumor sample, 2) somatic allelic copy-number for each genomic segment, 3) fraction of tumor cells with subclonal change in each somatic copy number aberration (SCNA), and 4) a list of substantial genomic aberration events including gain, loss and LOH. SomatiCA is available as a Bioconductor R package at http://www.bioconductor.org/packages/2.13/bioc/html/SomatiCA.html.
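The way tumor purity enters the two input signals can be illustrated with a standard two-component mixture of tumor and normal cells. The sketch below is an illustration under stated assumptions, not SomatiCA's actual model or API: the function names are hypothetical, normal cells are assumed diploid, and subclonality and ploidy normalization are ignored.

```python
def expected_depth_ratio(purity, total_cn):
    """Expected tumor/normal read-depth ratio for one segment.

    Assumption: normal cells carry 2 copies and depth scales linearly
    with the average copy number of the admixed sample.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    return (purity * total_cn + 2.0 * (1.0 - purity)) / 2.0


def expected_laf(purity, total_cn, minor_cn):
    """Expected lesser allele frequency at a germline heterozygous site.

    minor_cn is the tumor copy number of the less abundant allele.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    if not 0 <= minor_cn <= total_cn - minor_cn:
        raise ValueError("minor_cn must be between 0 and total_cn / 2")
    minor = purity * minor_cn + (1.0 - purity) * 1.0
    total = purity * total_cn + 2.0 * (1.0 - purity)
    if total == 0:
        return float("nan")  # pure tumor with a homozygous deletion
    return minor / total


if __name__ == "__main__":
    # Hemizygous loss vs. copy-neutral LOH, both at 50% purity.
    for label, cn, minor in [("loss", 1, 0), ("copy-neutral LOH", 2, 0)]:
        print(label, expected_depth_ratio(0.5, cn), expected_laf(0.5, cn, minor))
```

Under these assumptions, copy-neutral LOH leaves the depth ratio at 1 and shows up only in the LAF, which is one reason to use both signals together.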
Genetic factors influence the risk for posttraumatic stress disorder (PTSD), a potentially chronic and disabling psychiatric disorder that can arise after exposure to trauma. Candidate gene association studies have identified few genetic variants that contribute to PTSD risk.
We conducted genome-wide association analyses in 1578 European Americans (EAs), including 300 PTSD cases, and 2766 African Americans, including 444 PTSD cases, to find novel common risk alleles for PTSD. We used the Illumina Omni1-Quad microarray, which yielded approximately 870,000 single nucleotide polymorphisms (SNPs) suitable for analysis.
In EAs, we observed that one SNP on chromosome 7p12, rs406001, exceeded genome-wide significance (p = 3.97×10−8). A SNP that maps to the first intron of the Tolloid-Like 1 gene (TLL1) showed the second strongest evidence of association, although no SNPs at this locus reached genome-wide significance. We then tested six SNPs in an independent sample of nearly 2000 EAs and successfully replicated the association findings for two SNPs in the first intron of TLL1, rs6812849 and rs7691872, with p values of 6.3×10−6 and 2.3×10−4, respectively. In the combined sample, rs6812849 had a p value of 3.1×10−9. No significant signals were observed in the African American part of the sample. Genome-wide association study analyses restricted to trauma-exposed individuals yielded very similar results.
This study identified TLL1 as a new susceptibility gene for PTSD.
American populations; genome-wide association study; posttraumatic stress disorder; TLL1
Motivation: It is well recognized that drugs do more than target individual proteins; they influence the complex interactions among many relevant biological pathways. Genome-wide expression profiling before and after drug treatment has become a powerful approach for capturing a global snapshot of cellular response to drugs and for understanding drugs’ mechanisms of action. Therefore, it is of great interest to analyze this type of transcriptomic profiling data for the identification of pathways responsive to different drugs. However, few computational tools exist for this task.
Results: We have developed FacPad, a Bayesian sparse factor model, for the inference of pathways responsive to drug treatments. This model represents biological pathways as latent factors and aims to describe the variation among drug-induced gene expression alterations in terms of a much smaller number of latent factors. We applied this model to the Connectivity Map data set (build 02) and demonstrated that FacPad is able to identify many drug–pathway associations, some of which have been validated in the literature. Although this method was originally designed for the analysis of drug-induced transcriptional alteration data, it can be naturally applied to many other settings beyond polypharmacology.
Availability and implementation: The R package ‘FacPad’ is publicly available at http://cran.open-source-solution.org/web/packages/FacPad/
Supplementary data are available at Bioinformatics online.
Supravalvular aortic stenosis (SVAS) is caused by mutations in the elastin (ELN) gene and is characterized by abnormal proliferation of vascular smooth muscle cells (SMCs) that can lead to narrowing or blockage of the ascending aorta and other arterial vessels. Availability of patient-specific SMCs may facilitate studying disease mechanisms and developing novel therapeutic interventions.
Methods and Results
Here, we report the development of a human induced pluripotent stem cell (iPSC) line from a patient with SVAS caused by the premature termination in exon 10 of the ELN gene due to an exon 9 4-nucleotide insertion. We showed that SVAS iPSC-derived SMCs (iPSC-SMCs) had significantly fewer organized networks of smooth muscle alpha actin (SM α-actin) filament bundles, a hallmark of mature contractile SMCs, compared to control iPSC-SMCs. Addition of elastin recombinant protein or enhancement of small GTPase RhoA signaling was able to rescue the formation of SM α-actin filament bundles in SVAS iPSC-SMCs. Cell counts and BrdU analysis revealed a significantly higher proliferation rate in SVAS iPSC-SMCs than control iPSC-SMCs. Furthermore, SVAS iPSC-SMCs migrated at a markedly higher rate to the chemotactic agent platelet-derived growth factor (PDGF) in comparison with the control iPSC-SMCs. We also provided evidence that elevated activity of extracellular signal-regulated kinase 1/2 (ERK1/2) is required for hyper-proliferation of SVAS iPSC-SMCs. The phenotype was confirmed in iPSC-SMCs generated from a patient with deletion of elastin due to Williams-Beuren syndrome (WBS).
Thus, SVAS iPSC-SMCs recapitulate key pathological features of patients with SVAS and may provide a promising strategy to study disease mechanisms and to develop novel therapies.
elastin; induced pluripotent stem cells; smooth muscle alpha actin filament bundle; smooth muscle cells; supravalvular aortic stenosis
Cortical layer 5 pyramidal neurons and spinal cord motor neurons are selectively vulnerable to degeneration after loss of the autophagy gene Epg5.
The molecular mechanism underlying the selective vulnerability of certain neuronal populations associated with neurodegenerative diseases remains poorly understood. Basal autophagy is important for maintaining axonal homeostasis and preventing neurodegeneration. In this paper, we demonstrate that mice deficient in the metazoan-specific autophagy gene Epg5/epg-5 exhibit selective damage of cortical layer 5 pyramidal neurons and spinal cord motor neurons. Pathologically, Epg5 knockout mice suffered muscle denervation, myofiber atrophy, late-onset progressive hindquarter paralysis, and dramatically reduced survival, recapitulating key features of amyotrophic lateral sclerosis (ALS). Epg5 deficiency impaired autophagic flux by blocking the maturation of autophagosomes into degradative autolysosomes, leading to accumulation of p62 aggregates and ubiquitin-positive inclusions in neurons and glial cells. Epg5 knockdown also impaired endocytic trafficking. Our study establishes Epg5-deficient mice as a model for investigating the pathogenesis of ALS and indicates that dysfunction of the autophagic–endolysosomal system causes selective damage of neurons associated with neurodegenerative diseases.
Fibroblast growth factor (Fgf) and Wnt signaling are necessary for the intertwined processes of tail elongation, mesodermal development and somitogenesis. Here, we use pharmacological modifiers and time-resolved quantitative analysis of both nascent transcription and protein phosphorylation in the tailbud, to distinguish early effects of signal perturbation from later consequences related to cell fate changes. We demonstrate that Fgf activity elevates Wnt signaling by inhibiting transcription of the Wnt antagonists dkk1 and notum1a. PI3 kinase signaling also increases Wnt signaling via phosphorylation of Gsk3β. Conversely, Wnt can increase signaling within the Mapk branch of the Fgf pathway as Gsk3β phosphorylation elevates phosphorylation levels of Erk. Despite the reciprocal positive regulation between Fgf and Wnt, the two pathways generally have opposing effects on the transcription of co-regulated genes. This opposing regulation of target genes may represent a rudimentary relationship that manifests as out-of-phase oscillation of Fgf and Wnt target genes in the mouse and chick tailbud. In summary, these data suggest that Fgf and Wnt signaling are tightly integrated to maintain proportional levels of activity in the zebrafish tailbud, and this balance is important for axis elongation, cell fate specification and somitogenesis.
Fgf signaling; Wnt signaling; paraxial mesoderm; tailbud; axis elongation
Most approaches for analyzing ChIP-Seq data are focused on inferring exact protein binding sites from a single library. However, multiple ChIP-Seq libraries derived from differing cell lines or tissue types from the same individual are frequently available. In such a situation, a separate analysis for each tissue or cell line may be inefficient. Here, we describe a novel method to analyze such data that intelligently uses the joint information from multiple related ChIP-Seq libraries. We present our method as a two-stage procedure. First, separate single cell line analysis is performed for each cell line. Here, we use a novel mixture regression approach to infer the subset of genes that are most likely to be involved in protein binding in each cell line. In the second step, we combine the separate single cell line analyses using an Empirical Bayes algorithm that implicitly incorporates inter-cell line correlation. We demonstrate the usefulness of our method using both simulated data and real H3K4me3 and H3K27me3 histone methylation libraries.
Empirical Bayes; EM; ChIP-Seq; histone methylation
DNA copy number variation (CNV) accounts for a large proportion of genetic variation. One commonly used approach to detecting CNVs is array-based comparative genomic hybridization (aCGH). Although many methods have been proposed to analyze aCGH data, it is not clear how to combine information from multiple samples to improve CNV detection. In this paper, we propose to use a matrix to approximate the multisample aCGH data and minimize the total variation of each sample as well as the nuclear norm of the whole matrix. In this way, we can make use of the smoothness property of each sample and the correlation among multiple samples simultaneously in a convex optimization framework. We also developed an efficient and scalable algorithm to handle large-scale data. Experiments demonstrate that the proposed method outperforms the state-of-the-art techniques under a wide range of scenarios and it is capable of processing large data sets with millions of probes.
CNV; aCGH; total variation; spectral regularization; convex optimization
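The optimization problem described in this abstract can be written schematically as follows. This is a sketch under assumptions: the squared Frobenius data-fit term, the symbols Y and X, and the tuning parameters \lambda_1 and \lambda_2 are notation introduced here for illustration rather than taken from the paper.

```latex
\min_{X \in \mathbb{R}^{n \times p}} \;
\frac{1}{2}\,\lVert Y - X \rVert_F^2
\;+\; \lambda_1 \sum_{i=1}^{n} \sum_{j=1}^{p-1} \lvert X_{i,j+1} - X_{i,j} \rvert
\;+\; \lambda_2\, \lVert X \rVert_*
```

Here Y holds the observed aCGH log-ratios for n samples (rows) and p probes (columns), and X is its approximation. The middle term is the total variation of each sample's profile, which favors piecewise-constant segments. The nuclear norm \lVert X \rVert_* (the sum of the singular values of X) favors low rank, that is, patterns shared across samples. Each term is convex, so the whole problem is convex.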
The effects of alleles in many genes are believed to contribute to common complex diseases such as hypertension. Whether risk alleles comprise a small number of common variants or many rare independent mutations at trait loci is largely unknown. We screened members of the Framingham Heart Study (FHS) for variation in three genes -SLC12A3 (NCCT), SLC12A1 (NKCC2) and KCNJ1 (ROMK)- causing rare recessive diseases featuring large reductions in blood pressure. Using comparative genomics, genetics, and biochemistry, we identified subjects with mutations proven or inferred to be functional. These mutations, all heterozygous and rare, produce clinically significant blood pressure reduction and protect from development of hypertension. Our findings implicate many rare alleles that alter renal salt handling in blood pressure variation in the general population, and identify alleles with health benefit that are nonetheless under purifying selection. These findings have implications for the genetic architecture of hypertension and other common complex traits.
Many statistical methods for microarray data analysis consider one gene at a time, and they may miss subtle changes at the single gene level. This limitation may be overcome by considering a set of genes simultaneously where the gene sets are derived from prior biological knowledge. Limited work has been carried out in the regression setting to study the effects of clinical covariates and expression levels of genes in a pathway either on a continuous or on a binary clinical outcome. Hence, we propose a Bayesian approach for identifying pathways related to both types of outcomes. We compare our Bayesian approaches with a likelihood-based approach that was developed by relating a least squares kernel machine for nonparametric pathway effect with a restricted maximum likelihood for variance components. Unlike the likelihood-based approach, the Bayesian approach allows us to directly estimate all parameters and pathway effects. It can incorporate prior knowledge into Bayesian hierarchical model formulation and makes inference by using the posterior samples without asymptotic theory. We consider several kernels (Gaussian, polynomial, and neural network kernels) to characterize gene expression effects in a pathway on clinical outcomes. Our simulation results suggest that the Bayesian approach has more accurate coverage probability than the likelihood-based approach, and this is especially so when the sample size is small compared with the number of genes being studied in a pathway. We demonstrate the usefulness of our approaches through their application to a type II diabetes mellitus data set. Our approaches can also be applied to other settings where a large number of strongly correlated predictors are present.
Gaussian random process; kernel machine; pathway
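The three kernel families named in this abstract can be sketched in a few lines. The parametrizations below (the scale `rho`, the polynomial `degree` and `offset`, and the tanh form of the neural network kernel) are common textbook choices assumed here for illustration; the paper's exact definitions may differ, and all function names are hypothetical.

```python
import math


def _dot(x, z):
    """Inner product of two equal-length expression vectors."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, z))


def gaussian_kernel(x, z, rho=1.0):
    """exp(-||x - z||^2 / rho); rho > 0 is an assumed scale parameter."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same length")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / rho)


def polynomial_kernel(x, z, degree=2, offset=1.0):
    """(x . z + offset) ** degree."""
    return (_dot(x, z) + offset) ** degree


def neural_network_kernel(x, z, scale=1.0, offset=0.0):
    """tanh(scale * x . z + offset), the sigmoid-type kernel."""
    return math.tanh(scale * _dot(x, z) + offset)


def gram_matrix(samples, kernel):
    """Pairwise kernel matrix over subjects (one expression vector each)."""
    return [[kernel(a, b) for b in samples] for a in samples]


if __name__ == "__main__":
    # Hypothetical expression values for three subjects and two genes.
    subjects = [[0.2, 1.1], [0.3, 0.9], [2.0, -0.5]]
    for row in gram_matrix(subjects, gaussian_kernel):
        print([round(v, 3) for v in row])
```

The Gram matrix is what a kernel machine uses in place of the raw gene expression values, so swapping the kernel changes how similarity between subjects' pathway profiles is measured.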
Next generation sequencing is widely used to study complex diseases because of its ability to identify both common and rare variants without prior single nucleotide polymorphism (SNP) information. Pooled sequencing of implicated target regions can lower costs and allow more samples to be analyzed, thus improving statistical power for disease-associated variant detection. Several methods for disease association tests of pooled data and for optimal pooling designs have been developed under certain assumptions of the pooling process, e.g. equal/unequal contributions to the pool, sequencing depth variation, and error rate. However, these simplified assumptions may not portray the many factors affecting pooled sequencing data quality, such as PCR amplification during target capture and sequencing, reference allele preferential bias, and others. As a result, the properties of the observed data may differ substantially from those expected under the simplified assumptions. Here, we use real datasets from targeted sequencing of pooled samples, together with microarray SNP genotypes of the same subjects, to identify and quantify factors (biases and errors) affecting the observed sequencing data. Through simulations, we find that these factors have a significant impact on the accuracy of allele frequency estimation and the power of association tests. Furthermore, we develop a workflow protocol to incorporate these factors in data analysis to reduce the potential biases and errors in pooled sequencing data and to gain better estimation of allele frequencies. The workflow, Psafe, is available at http://bioinformatics.med.yale.edu/group/.
pooled sequencing; allele frequency estimation; next-generation sequencing; disease association tests
Pre-clinical and clinical studies have implicated changes in cytokine and innate immune gene-expression in both the development of and end-organ damage resulting from alcohol dependence. However, these changes have not been systematically assessed on the basis of alcohol consumption in human subjects.
Illumina Sentrix Beadchip (Human-6v2) microarrays were used to measure levels of gene-expression in peripheral blood in 3 groups of subjects: those with alcohol dependence (AD, n=12), heavy drinkers (HD, defined as regular alcohol use over the past year of at least 8 standard drinks/week for women and at least 15 standard drinks/week for men, n=13), and moderate drinkers (MD, defined as up to 7 standard drinks/week for women and 14 standard drinks/week for men, n=17).
436 genes were differentially expressed among the three groups of subjects (FDR corrected p-value < 0.05). 291 genes differed between AD and MD subjects, 240 differed between AD and HD subjects, but only 6 differed between HD and MD subjects. Pathway analysis using DAVID and GeneGO Metacore software showed that the most affected pathways were those related to T-cell receptor and JAK-Stat (Janus kinase-Signal transducer and activator of transcription) signaling.
These results suggest the transition from heavy alcohol use to dependence is accompanied by changes in the expression of genes involved in regulation of the innate immune response. Such changes may underlie some of the previously described changes in immune function associated with chronic alcohol abuse. Early detection of these changes may allow individuals at high risk for dependence to be identified.
alcohol dependence; IL-15; IL-21; Janus kinase; Signal transducer and activator of transcription; microarray
Hair bundles of the inner ear have a unique structure and protein composition that underlies their sensitivity to mechanical stimulation. Using mass spectrometry, we identified and quantified >1100 proteins, present from a few to 400,000 copies per stereocilium, from purified chick bundles; 336 of these were significantly enriched in bundles. Bundle proteins that we detected have been shown to regulate cytoskeleton structure and dynamics, energy metabolism, phospholipid synthesis, and cell signaling. Three-dimensional imaging using electron tomography allowed us to count the number of actin-actin crosslinkers and actin-membrane connectors; these values compared well to those obtained from mass spectrometry. Network analysis revealed several hub proteins, including RDX (radixin) and SLC9A3R2 (NHERF2), which interact with many bundle proteins and may perform functions essential for bundle structure and function. The quantitative mass spectrometry of bundle proteins reported here establishes a framework for future characterization of dynamic processes that shape bundle structure and function.
For more fruitful discoveries of genetic variants associated with diseases in genome-wide association studies, it is important to know whether joint analysis of multiple markers is more powerful than the commonly used single-marker analysis, especially in the presence of gene-gene interactions. This article provides a statistical framework to rigorously address this question through analytical power calculations for common model search strategies to detect binary trait loci: marginal search, exhaustive search, forward search, and two-stage screening search. Our approach incorporates linkage disequilibrium, random genotypes, and correlations among score test statistics of logistic regressions. We derive analytical results under two power definitions: the power of finding all the associated markers and the power of finding at least one associated marker. We also consider two types of error controls: the discovery number control and the Bonferroni type I error rate control. After demonstrating the accuracy of our analytical results by simulations, we apply them to consider a broad genetic model space to investigate the relative performances of different model search strategies. Our analytical study provides rapid computation as well as insights into the statistical mechanism of capturing genetic signals under different genetic models including gene-gene interactions. Even though we focus on genetic association analysis, our results on the power of model selection procedures are clearly very general and applicable to other studies.
model selection; statistical power; random predictor; genome-wide association studies; gene-gene interaction
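The two power definitions in this abstract can be made concrete with a deliberately simplified calculation. The sketch below assumes that each associated marker is detected independently with a known probability. That independence is an assumption made here only for illustration: the article explicitly models correlations among test statistics, so these products would not hold in general. Function names are hypothetical.

```python
from math import prod


def _check(detect_probs):
    """Validate that every entry is a probability."""
    if any(not 0.0 <= p <= 1.0 for p in detect_probs):
        raise ValueError("detection probabilities must lie in [0, 1]")


def power_all(detect_probs):
    """Power of finding all associated markers (independent detections)."""
    _check(detect_probs)
    return prod(detect_probs)


def power_at_least_one(detect_probs):
    """Power of finding at least one associated marker (independent detections)."""
    _check(detect_probs)
    return 1.0 - prod(1.0 - p for p in detect_probs)


if __name__ == "__main__":
    # Hypothetical per-marker detection probabilities for two causal loci.
    probs = [0.8, 0.6]
    print(power_all(probs), power_at_least_one(probs))
```

Even in this toy setting the two definitions can differ widely, which is why a search strategy may rank differently depending on which definition is used.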
Recent studies of 16S rRNA sequences through next-generation sequencing have revolutionized our understanding of microbial community composition and structure. One common approach in using these data to explore the genetic diversity in a microbial community is to cluster the 16S rRNA sequences into Operational Taxonomic Units (OTUs) based on sequence similarities. The inferred OTUs can then be used to estimate species diversity, composition, and richness. Although a number of methods have been developed and are commonly used to cluster the sequences into OTUs, relatively little guidance is available on their relative performance and the choice of key parameters for each method. In this study, we conducted a comprehensive evaluation of ten existing OTU inference methods. We found that the appropriate dissimilarity value for defining distinct OTUs depends not only on the specific method but also on the sample complexity. For data sets with low complexity, all the algorithms need a higher dissimilarity threshold to define OTUs. Some methods, such as CROP and SLP, are more robust to the specific choice of the threshold than other methods, especially for shorter reads. For high-complexity data sets, hierarchical clustering methods need a stricter dissimilarity threshold to define OTUs because the commonly used dissimilarity threshold of 3% often leads to an under-estimation of the number of OTUs. In general, hierarchical clustering methods perform better at lower dissimilarity thresholds. Our results show that sequence abundance plays an important role in OTU inference. We conclude that care is needed to choose both a dissimilarity threshold and an abundance threshold for OTU inference.
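How strongly the dissimilarity threshold drives the number of inferred OTUs can be seen with a toy greedy clustering. This sketch is not one of the ten methods evaluated in the study: the seed-based greedy rule, the mismatch-fraction distance on pre-aligned reads, and the example sequences are all assumptions made for illustration.

```python
def dissimilarity(a, b):
    """Fraction of mismatched positions between two aligned, equal-length reads."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold):
    """Assign each read to the first OTU whose seed is within `threshold`.

    A read that matches no existing seed starts a new OTU and becomes its seed.
    Returns a list of (seed, members) pairs.
    """
    otus = []
    for s in seqs:
        for seed, members in otus:
            if dissimilarity(seed, s) <= threshold:
                members.append(s)
                break
        else:
            otus.append((s, [s]))
    return otus


if __name__ == "__main__":
    # Hypothetical 20-base reads: 1, 3, and 15 mismatches from the first read.
    reads = [
        "ACGTACGTACGTACGTACGT",
        "ACGTACGTACGTACGTACGA",
        "ACGTACGTACGTACGTAAAA",
        "TTTTTTTTTTTTTTTTTTTT",
    ]
    for t in (0.03, 0.06, 0.20):
        print(t, len(greedy_otus(reads, t)))
```

On these four reads the OTU count falls as the threshold is relaxed, which is the kind of sensitivity that makes the choice of threshold matter.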
The alcohol dehydrogenase 1C (ADH1C) subunit is an important member of the alcohol dehydrogenase family, a set of genes that plays a major role in the catabolism of ethanol. Numerous association studies have provided compelling evidence that ADH1C gene variation (formerly ADH3) is associated with altered genetic susceptibility to alcoholism and alcohol-related liver disease, cirrhosis, or pancreatitis. However, the results have been inconsistent, partially because each study involved a limited number of subjects, and some were underpowered. Using cumulative data over the past two decades, this meta-analysis (6,796 cases and 6,938 controls) considered samples of Asian, European, African, and Native American origins to examine whether the aggregate genotype provides statistically significant evidence of association. The results showed strong evidence of association between ADH1C Ile350Val (rs698, formerly ADH1C *1/*2) and alcohol dependence (AD) and abuse in the combined studies. The overall allelic (Val vs. Ile or *2 vs. *1) P value was 1×10−8 and Odds Ratio (OR) was 1.51 (1.31, 1.73). The Asian populations produced stronger evidence of association with an allelic P value of 4×10−33 (OR = 2.14 (1.89, 2.43)) with no evidence of heterogeneity, and the dominant and recessive models revealed even stronger effect sizes. The strong evidence remained when stricter criteria and sub-group analyses were applied, while Asians always showed stronger associations than other populations. Our findings support that ADH1C Ile may lower the risk of AD and alcohol abuse as well as alcohol-related cirrhosis in pooled populations, with the strongest and most consistent effects in Asians.
Meta-analysis; Association; Ethanol Oxidation; Addiction; ADH1C
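An allelic odds ratio with a confidence interval, of the kind reported in this abstract, can be computed from a single 2×2 table of allele counts as sketched below. This uses the standard log odds ratio (Woolf) interval for one table; it does not reproduce the pooling across studies that a meta-analysis performs, and the counts in the example are hypothetical, not the study's data.

```python
import math


def allelic_odds_ratio(case_alt, case_ref, control_alt, control_ref, z=1.96):
    """Odds ratio and Wald confidence interval from allele counts.

    z = 1.96 gives an approximate 95% interval.
    Returns (odds_ratio, lower, upper).
    """
    counts = (case_alt, case_ref, control_alt, control_ref)
    if min(counts) <= 0:
        raise ValueError("all four counts must be positive")
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    # Standard error of log(OR) is the root of the summed reciprocal counts.
    se = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical allele counts: cases (alt, ref), then controls (alt, ref).
    print(allelic_odds_ratio(20, 10, 10, 20))
```

An interval that excludes 1 corresponds to a nominally significant allelic association at the chosen level.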
Hepatitis C virus (HCV) is the most common chronic blood-borne infection in the United States, with the majority of patients becoming chronically infected and a subset (20%) progressing to cirrhosis and hepatocellular carcinoma. Individual variations in immune responses may help define successful resistance to infection with HCV. We have compared the immune response in primary macrophages from patients who have spontaneously cleared HCV (viral load negative [VL−], n = 37) to that of primary macrophages from HCV genotype 1 chronically infected (VL+) subjects (n = 32) and found that macrophages from VL− subjects have an elevated baseline expression of Toll-like receptor 3 (TLR3). Macrophages from HCV patients were stimulated ex vivo through the TLR3 pathway and assessed using gene expression arrays and pathway analysis. We found elevated TLR3 response genes and pathway activity from VL− subjects. Furthermore, macrophages from VL− subjects showed higher production of beta interferon (IFN-β) and related IFN response genes by quantitative PCR (Q-PCR) and increased phosphorylation of STAT-1 by immunoblotting. Analysis of polymorphisms in TLR3 revealed a significant association of intronic TLR3 polymorphism (rs13126816) with the clearance of HCV and the expression of TLR3. Of note, peripheral blood mononuclear cells (PBMCs) from the same donors showed opposite changes in gene expression, suggesting ongoing inflammatory responses in PBMCs from VL+ HCV patients. Our results suggest that an elevated innate immune response enhances HCV clearance mechanisms and may offer a potential therapeutic approach to increase viral clearance.
As a technique that allows simultaneous quantitation of proteins in multiple samples, iTRAQ (isobaric Tags for Relative and Absolute Quantitation) has gained increased interest and applications in proteomics research. Despite its success, iTRAQ data present a number of statistical challenges even after the proteins and peptides are identified and the peak areas of the reported ions are estimated for peptide intensities. In this article, we review recent studies on the analysis of iTRAQ data, the computation problems involved and the nonrandom missingness in the iTRAQ data.
iTRAQ; ANOVA; Nonrandom missing; Bayesian hierarchical model; Mass spectrometry
Motivation: Pathway-based drug discovery considers the therapeutic effects of compounds in the global physiological environment. This approach has been gaining popularity in recent years because the target pathways and mechanism of action for many compounds are still unknown, and there are also some unexpected off-target effects. Therefore, the inference of drug-pathway associations is a crucial step to fully realize the potential of system-based pharmacological research. Transcriptome data offer valuable information on drug-pathway targets because the pathway activities may be reflected through gene expression levels. Hence, it is of great interest to jointly analyze the drug sensitivity and gene expression data from the same set of samples to investigate the gene-pathway–drug-pathway associations.
Results: We have developed iFad, a Bayesian sparse factor analysis model to jointly analyze the paired gene expression and drug sensitivity datasets measured across the same panel of samples. The model enables direct incorporation of prior knowledge regarding gene-pathway and/or drug-pathway associations to aid the discovery of new association relationships. We use a collapsed Gibbs sampling algorithm for inference. Satisfactory performance of the proposed model was found for both simulated datasets and real data collected on the NCI-60 cell lines. Our results suggest that iFad is a promising approach for the identification of drug targets. This model also provides a general statistical framework for pathway-based integrative analysis of other types of -omics data.
Availability: The R package ‘iFad’ and real NCI-60 dataset used are available at http://bioinformatics.med.yale.edu/group/.
Supplementary data are available at Bioinformatics online.
The West Nile virus (WNV) is an emerging infection of biodefense concern and there are no available treatments or vaccines. Here we used a high-throughput method based on a novel gene expression analysis, RNA-Seq, to give a global picture of differential gene expression by primary human macrophages of 10 healthy donors infected in vitro with WNV. From a total of 28 million reads per sample, we identified 1,514 transcripts that were differentially expressed after infection. Both predicted and novel gene changes were detected, as were gene isoforms, and while many of the genes were expressed by all donors, some were unique. Knock-down of genes not previously known to be associated with WNV resistance identified their critical role in control of viral infection. Our study distinguishes both common gene pathways as well as novel cellular responses. Such analyses will be valuable for translational studies of susceptible and resistant individuals—and for targeting therapeutics—in multiple biological settings.
anti-viral gene expression; immune response; macrophage; RNA-Seq; West Nile virus
Recent genome-wide association studies have identified many genetic variants affecting complex human diseases. It is of great interest to build disease risk prediction models based on these data. In this article, we first discuss statistical challenges in using genome-wide association data for risk predictions, and then review the findings from the literature on this topic. We also demonstrate the performance of different methods through both simulation studies and application to real-world data.
Complex traits; Genome-wide association studies; High-dimensional data; Risk prediction; Single-nucleotide polymorphism
Mutations in the GBA1 gene result in defective acid β-glucosidase and the complex phenotype of Gaucher disease (GD) related to the accumulation of glucosylceramide-laden macrophages. The phenotype is highly variable even among patients harboring identical GBA1 mutations. We hypothesized that modifier gene(s) underlie phenotypic diversity in GD and performed a GWAS in Ashkenazi Jewish patients with type 1 GD (GD1), homozygous for the N370S mutation. Patients were assigned to mild, moderate or severe disease category using composite disease severity scoring systems. Whole-genome genotyping for >500,000 SNPs was performed to search for associations using the OQLS algorithm in 139 eligible patients. Several SNPs in linkage disequilibrium within the CLN8 gene locus were associated with GD1 severity: SNP rs11986414 was associated with GD1 severity at a p value of 1.26 × 10−6. Compared to mild disease, risk allele A at rs11986414 conferred an odds ratio of 3.72 for moderate/severe disease. Loss-of-function mutations in CLN8 cause neuronal ceroid-lipofuscinosis, but our results indicate that its increased expression may protect against severe GD1. In cultured skin fibroblasts, the relative expression of CLN8 was higher in mild GD compared to severely affected patients in whom CLN8 risk alleles were over-represented. In an in vitro cell model of GD, CLN8 expression was increased, and it was further enhanced in the presence of the bioactive substrate glucosylsphingosine. Taken together, CLN8 is a candidate modifier gene for GD1 that may function as a protective sphingolipid sensor and/or in glycosphingolipid trafficking. Future studies should explore the role of CLN8 in the pathophysiology of GD.
Gaucher disease; GWAS; genotype/phenotype correlations; phenotypic diversity; modifier genes; CLN8; N370S; GBA mutations
We have reported that, in addition to recapitulating the classical human Gaucher disease (GD1) phenotype, deletion of the glucocerebrosidase (GBA1) gene in mice results in the dysfunction of a diverse population of immune cells. Most of the immune-related, non-classical features of GD1, including gammopathies and autoimmune diathesis, are resistant to macrophage-directed therapies. This has prompted a search for newer agents for human GD1. Here, we used high-density microarray on splenic and liver cells from affected GBA1−/− mice to establish a gene “signature”, which was then utilized to interrogate the Broad Institute database, CMAP. Computational connectivity mapping of disease and drug pairs through CMAP revealed several highly enriched, non-null, mimic and anti-mimic hits. Most notably, two compounds with anti-helminthic properties, namely albendazole and oxamniquine, were identified; these are particularly relevant for future testing as the expression of chitinases is enhanced in GD1.
The increased vulnerability to alcohol dependence (AD) seen in individuals with childhood adversity (CA) may result in part from CA-induced epigenetic changes. To examine CA-associated DNA methylation changes in AD patients, we examined peripheral blood DNA methylation levels of 384 CpGs in promoter regions of 82 candidate genes in 279 African Americans [AAs; 88 with CA (70.5% with AD) and 191 without CA (38.2% with AD)] and 239 European Americans [EAs; 61 with CA (86.9% with AD) and 178 without CA (46.6% with AD)] using Illumina GoldenGate Methylation Array assays. The effect of CA on methylation of individual CpGs and overall methylation in promoter regions of genes was evaluated using a linear regression analysis (with consideration of sex, age, and ancestry proportion of subjects) and a principal components-based analysis, respectively. In EAs, hypermethylation of 10 CpGs in seven genes (ALDH1A1, CART, CHRNA5, HTR1B, OPRL1, PENK, and RGS19) was cross-validated in AD patients and healthy controls who were exposed to CA. P values of two CpGs survived Bonferroni correction when all EA samples were analyzed together to increase statistical power [CHRNA5_cg17108064: Padjust = 2.54×10−5; HTR1B_cg06031989: Padjust = 8.98×10−5]. Moreover, overall methylation levels in the promoter regions of three genes (ALDH1A1, OPRL1 and RGS19) were elevated in both EA case and control subjects who were exposed to CA. However, in AAs, CA-associated DNA methylation changes in AD patients were not validated in healthy controls. Our findings suggest that CA could induce population-specific methylation alterations in the promoter regions of specific genes, thus leading to changes in gene transcription and an increased risk for AD and other disorders.
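The Bonferroni correction mentioned in this abstract can be sketched as a one-line adjustment of raw p values. The number of tests the authors corrected for is not stated here; using 384 (the number of CpGs assayed) in the example is an assumption, the raw p values are hypothetical, and the function name is illustrative.

```python
def bonferroni_adjust(p_values, n_tests=None):
    """Multiply each raw p value by the number of tests, capping at 1.

    n_tests defaults to the number of p values supplied; pass it explicitly
    when only a subset of the tested hypotheses is being reported.
    """
    m = len(p_values) if n_tests is None else n_tests
    if m < len(p_values):
        raise ValueError("n_tests cannot be smaller than the number of p values")
    if any(not 0.0 <= p <= 1.0 for p in p_values):
        raise ValueError("p values must lie in [0, 1]")
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical raw p values for two CpGs out of an assumed 384 tests.
    print(bonferroni_adjust([1e-4, 1e-2], n_tests=384))
```

A CpG is declared significant when its adjusted p value stays below the chosen level, for example 0.05.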