Little is known about how changes in DNA methylation mediate risk for human diseases including dementia. Analysis of genome-wide methylation patterns in patients with two forms of tau-related dementia – progressive supranuclear palsy (PSP) and frontotemporal dementia (FTD) – revealed significant differentially methylated probes (DMPs) in patients versus unaffected controls. Remarkably, DMPs in PSP were clustered within the 17q21.31 region, previously known to harbor the major genetic risk factor for PSP. We identified and replicated a dose-dependent effect of the risk-associated H1 haplotype on methylation levels within the region in blood and brain. These data reveal that the H1 haplotype increases risk for tauopathy via differential methylation at that locus, indicating a mediating role for methylation in dementia pathophysiology.
Progressive supranuclear palsy (PSP) and frontotemporal dementia (FTD) are two neurodegenerative diseases linked, at the pathologic and genetic level, to the microtubule associated protein tau. We studied epigenetic changes (DNA methylation levels) in peripheral blood from patients with PSP, FTD, and unaffected controls. Analysis of genome-wide methylation patterns revealed significant differentially methylated probes in patients versus unaffected controls. Remarkably, differentially methylated probes in PSP vs. controls were preferentially clustered within the 17q21.31 region, previously known to harbor the major genetic risk factor for PSP. We identified and replicated a dose-dependent effect of the risk-associated H1 haplotype on methylation levels within the region in independent datasets in blood and brain. These data reveal that the H1 haplotype increases risk for tauopathy via differential methylation, indicating a mediating role for methylation in dementia pathophysiology.
The combination of expression patterns of AGR2 and CD10 by prostate cancer provided four phenotypes that correlated with clinical outcome. Based on immunophenotyping, CD10lowAGR2high, CD10highAGR2high, CD10lowAGR2low, and CD10highAGR2low were distinguished. AGR2+ tumors were associated with longer recurrence-free survival and CD10+ tumors with shorter recurrence-free survival. In high-stage cases, the CD10lowAGR2high phenotype was associated with a 9-fold higher recurrence-free survival than the CD10highAGR2low phenotype. The CD10highAGR2high and CD10lowAGR2low phenotypes were intermediate. The CD10highAGR2low phenotype was most frequent in high-grade primary tumors. Conversely, bone and other soft tissue metastases, and derivative xenografts, expressed more AGR2 and less CD10. AGR2 protein was readily detected in tumor metastases. The CD10highAGR2low phenotype in primary tumors is predictive of poor outcome; however, the CD10lowAGR2high phenotype is more common in metastases. It appears that AGR2 has a protective function in primary tumors but may have a role in the distal spread of tumor cells.
Prostate cancer; AGR2; CD10; cancer cell phenotypes; patient stratification; bone and soft tissue metastases; xenografts
Abnormalities of the intestinal microbiota are implicated in the pathogenesis of Crohn's disease (CD) and ulcerative colitis (UC), two spectra of inflammatory bowel disease (IBD). However, the high complexity and low inter-individual overlap of intestinal microbial composition are formidable barriers to identifying microbial taxa representing this dysbiosis. These difficulties might be overcome by an ecologic analytic strategy to identify modules of interacting bacteria (rather than individual bacteria) as quantitative reproducible features of microbial composition in normal and IBD mucosa. We sequenced 16S ribosomal RNA genes from 179 endoscopic lavage samples from different intestinal regions in 64 subjects (32 controls, 16 CD and 16 UC patients in clinical remission). CD and UC patients showed a reduction in phylogenetic diversity and shifts in microbial composition, comparable to previous studies using conventional mucosal biopsies. A weighted co-occurrence network analysis revealed 5 microbial modules. These modules were unprecedented, as they were detectable in all individuals, and their composition and abundance were recapitulated in an independent, biopsy-based mucosal dataset. Two modules were associated with healthy, CD, or UC disease states. Imputed metagenome analysis indicated that these modules displayed distinct metabolic functionality, specifically the enrichment of oxidative response and glycan metabolism pathways relevant to host-pathogen interaction in the disease-associated modules. The highly preserved microbial modules accurately classified IBD status of individual patients during disease quiescence, suggesting that microbial dysbiosis in IBD may be an underlying disorder independent of disease activity. Microbial modules thus provide an integrative view of microbial ecology relevant to IBD.
It is not yet known whether DNA methylation levels can be used to accurately predict age across a broad spectrum of human tissues and cell types, nor whether the resulting age prediction is a biologically meaningful measure.
I developed a multi-tissue predictor of age that allows one to estimate the DNA methylation age of most tissues and cell types. The predictor, which is freely available, was developed using 8,000 samples from 82 Illumina DNA methylation array datasets, encompassing 51 healthy tissues and cell types. I found that DNA methylation age has the following properties: first, it is close to zero for embryonic and induced pluripotent stem cells; second, it correlates with cell passage number; third, it gives rise to a highly heritable measure of age acceleration; and, fourth, it is applicable to chimpanzee tissues. Analysis of 6,000 cancer samples from 32 datasets showed that all 20 cancer types considered exhibit significant age acceleration, with an average of 36 years. Low age acceleration of cancer tissue is associated with a high number of somatic mutations and TP53 mutations, while mutations in steroid receptors greatly accelerate DNA methylation age in breast cancer. Finally, I characterize the 353 CpG sites that together form an aging clock in terms of chromatin states and tissue variance.
I propose that DNA methylation age measures the cumulative effect of an epigenetic maintenance system. This novel epigenetic clock can be used to address a host of questions in developmental biology, cancer and aging research.
Rationale and Objective
In this Emerging Science Review, we discuss a systems genetics strategy, which we call Gene Module Association Study (GMAS), as a novel approach complementing Genome Wide Association Studies (GWAS), to understand complex diseases by focusing on how genes work together in groups rather than singly.
The first step is to characterize phenotypic differences among a genetically diverse population. The second step is to use gene expression microarray (or other high throughput) data from the population to construct gene co-expression networks. Co-expression analysis typically groups 20,000 genes into 20–30 modules containing tens to hundreds of genes, whose aggregate behavior can be represented by the module’s “eigengene.” The third step is to correlate expression patterns with phenotype, as in GWAS, but applied to eigengenes instead of SNPs.
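The second and third steps can be sketched in a few lines. This is only an illustration under stated assumptions: the eigengene is taken to be the first principal component of a module's standardized expression profiles (the usual convention in co-expression analysis), it is found here by simple power iteration, and the function names and toy numbers are hypothetical rather than taken from the review.

```python
import math

def standardize(values):
    """Center to mean 0 and scale to unit sample standard deviation.
    Assumes the profile is not constant (otherwise sd would be 0)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def pearson(x, y):
    """Pearson correlation of two equal-length profiles."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

def module_eigengene(expr, n_iter=200):
    """expr: list of gene profiles (one list of sample values per gene).
    Returns the first principal component across samples (unit length)."""
    z = [standardize(gene) for gene in expr]
    n = len(z[0])
    # Sample-by-sample Gram matrix of the standardized module.
    gram = [[sum(g[s] * g[t] for g in z) for t in range(n)] for s in range(n)]
    # Start from one gene profile; a constant start vector would map to zero.
    v = z[0][:]
    for _ in range(n_iter):
        w = [sum(gram[s][t] * v[t] for t in range(n)) for s in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the eigengene so it tracks the module's average expression.
    mean_profile = [sum(g[s] for g in z) / len(z) for s in range(n)]
    if sum(a * b for a, b in zip(v, mean_profile)) < 0:
        v = [-x for x in v]
    return v

# Hypothetical toy module (3 genes, 6 individuals) and a quantitative trait.
toy_module = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 5.0, 8.0, 10.0, 12.0],
    [1.1, 1.9, 3.2, 3.8, 5.1, 6.2],
]
toy_trait = [10.0, 12.0, 13.0, 15.0, 18.0, 19.0]
eigengene = module_eigengene(toy_module)
print("eigengene-trait correlation:", round(pearson(eigengene, toy_trait), 3))
```

In a real analysis each of the 20–30 module eigengenes would be tested against the phenotype, so the multiple-testing burden is far smaller than when testing SNPs or genes individually.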
Results and Conclusions
The goal of the GMAS approach is to identify groups of co-regulated genes that explain complex traits from a systems perspective. From an evolutionary standpoint, we hypothesize that variability in eigengene patterns reflects the “good enough solution” concept, that biological systems are sufficiently complex so that many possible combinations of the same elements (in this case eigengenes) can produce an equivalent output, i.e. a “good enough solution” to accomplish normal biological functions. However, when faced with environmental stresses, some “good enough solutions” adapt better than others, explaining individual variability in disease and drug susceptibility. If validated, GMAS may imply that common polygenic diseases are related as much to group interactions between normal genes as to multiple gene mutations.
systems genetics; genetics of complex diseases; scale-free networks; hybrid mouse diversity panel; computational biology
We used affinity-purification mass spectrometry to identify 747 candidate proteins that are complexed with Huntingtin (Htt) in distinct brain regions and ages in Huntington’s disease (HD) and wildtype mouse brains. To gain a systems-level view of the Htt interactome, we applied Weighted Gene Correlation Network Analysis (WGCNA) to the entire proteomic dataset to unveil a verifiable rank of Htt-correlated proteins and a network of Htt-interacting protein modules, with each module highlighting distinct aspects of Htt biology. Importantly, the Htt-containing module is highly enriched with proteins involved in 14-3-3 signaling, microtubule-based transport, and proteostasis. Top-ranked proteins in this module were validated as novel Htt interactors and genetic modifiers in an HD Drosophila model. Together, our study provides a compendium of spatiotemporal Htt-interacting proteins in the mammalian brain, and presents a conceptually novel approach to analyze proteomic interactome datasets to build in vivo protein networks in complex tissues such as the brain.
Consistent compositional shifts in the gut microbiota are observed in IBD and other chronic intestinal disorders and may contribute to pathogenesis. The identities of microbial biomolecular mechanisms and metabolic products responsible for disease phenotypes remain to be determined, as do the means by which such microbial functions may be therapeutically modified.
The composition of the microbiota and metabolites in gut microbiome samples from 47 subjects was determined. Samples were obtained by endoscopic mucosal lavage from the cecum and sigmoid colon regions, and each sample was sequenced using the 16S rRNA gene V4 region (Illumina-HiSeq 2000 platform) and assessed by UPLC mass spectrometry. Spearman correlations were used to identify widespread, statistically significant microbial-metabolite relationships. Metagenomes for identified microbial OTUs were imputed using PICRUSt, and KEGG metabolic pathway modules for imputed genes were assigned using HUMAnN. The resulting metabolic pathway abundances were mostly concordant with metabolite data. Analysis of the metabolome-driven distribution of OTU phylogeny and function revealed clusters of clades that were both metabolically and metagenomically similar.
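The Spearman step can be illustrated with a short sketch. It assumes one abundance value per OTU and one intensity value per metabolite for each sample; the OTU and metabolite names and all numbers are hypothetical, and the significance testing and multiple-comparison correction used in the study are not reproduced here.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data.
    Assumes neither input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical relative abundances and metabolite intensities in 5 samples.
otus = {"OTU_a": [0.10, 0.25, 0.05, 0.40, 0.30],
        "OTU_b": [0.50, 0.20, 0.60, 0.05, 0.10]}
metabolites = {"met_1": [120.0, 300.0, 80.0, 900.0, 450.0],
               "met_2": [7.0, 3.0, 9.0, 1.0, 2.0]}

# Every OTU-metabolite pair gets one rank correlation.
rho = {(o, m): spearman(ov, mv)
       for o, ov in otus.items() for m, mv in metabolites.items()}
for pair, value in sorted(rho.items()):
    print(pair, round(value, 2))
```

Rank correlation is a natural choice here because abundance and intensity data are typically skewed, and only the ordering of samples enters the calculation.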
The results suggest that microbes are syntropic with mucosal metabolome composition and therefore may be the source of and/or dependent upon gut epithelial metabolites. The consistent relationship between inferred metagenomic function and assayed metabolites suggests that metagenomic composition is predictive to a reasonable degree of microbial community metabolite pools. The finding that certain metabolites strongly correlate with microbial community structure raises the possibility of targeting metabolites for monitoring and/or therapeutically manipulating microbial community function in IBD and other chronic diseases.
Microbiome; Metabolome; Inter-omic analysis
Transcriptional studies suggest Alzheimer's disease (AD) involves dysfunction of many cellular pathways, including synaptic transmission, cytoskeletal dynamics, energetics, and apoptosis. Despite known progression of AD pathologies, it is unclear how such striking regional vulnerability occurs, or which genes play causative roles in disease progression.
To address these issues, we performed a large-scale transcriptional analysis in the CA1 and relatively less vulnerable CA3 brain regions of individuals with advanced AD and nondemented controls. In our study, we assessed differential gene expression across region and disease status, compared our results to previous studies of similar design, and performed an unbiased co-expression analysis using weighted gene co-expression network analysis (WGCNA). Several disease genes were identified and validated using qRT-PCR.
We find disease signatures consistent with several previous microarray studies, then extend these results to show a relationship between disease status and brain region. Specifically, genes showing decreased expression with AD progression tend to show enrichment in CA3 (and vice versa), suggesting transcription levels may reflect a region's vulnerability to disease. Additionally, we find several candidate vulnerability (ABCA1, MT1H, PDK4, RHOBTB3) and protection (FAM13A1, LINGO2, UNC13C) genes based on expression patterns. Finally, we use a systems-biology approach based on WGCNA to uncover disease-relevant expression patterns for major cell types, including pathways consistent with a key role for early microglial activation in AD.
These results paint a picture of AD as a multifaceted disease involving slight transcriptional changes in many genes between regions, coupled with a systemic immune response, gliosis, and neurodegeneration. Despite this complexity, we find that a consistent picture of gene expression in AD is emerging.
Activation of the epidermal growth factor receptor (EGFR) in glioblastoma (GBM) occurs through mutations or deletions in the extracellular (EC) domain. Unlike lung cancers with EGFR kinase domain (KD) mutations, GBMs respond poorly to the EGFR inhibitor erlotinib. Using RNAi, we show that GBM cells carrying EGFR EC mutations display EGFR addiction. In contrast to KD mutants found in lung cancer, glioma-specific EGFR EC mutants are poorly inhibited by EGFR inhibitors that target the active kinase conformation (e.g., erlotinib). Inhibitors that bind to the inactive EGFR conformation, on the other hand, potently inhibit EGFR EC mutants and induce cell death in EGFR mutant GBM cells. Our results provide the first evidence for single kinase addiction in GBM, and suggest that the disappointing clinical activity of first-generation EGFR inhibitors in GBM versus lung cancer may be attributed to the different conformational requirements of mutant EGFR in these two cancer types.
Many network analyses of fMRI data begin by defining a set of regions, extracting the mean signal from each region and then analyzing the correlations between regions. One essential question that has not been addressed in the literature is how to best define the network neighborhoods over which a signal is combined for network analyses. Here we present a novel unsupervised method for the identification of tightly interconnected voxels, or modules, from fMRI data. This approach, weighted voxel coactivation network analysis (WVCNA), is based on a method that was originally developed to find modules of genes in gene networks. This approach differs from many of the standard network approaches in fMRI in that connections between voxels are described by a continuous measure, whereas typically voxels are considered to be either connected or not connected depending on whether the correlation between the two voxels survives a hard threshold value. Additionally, instead of simply using pairwise correlations to describe the connection between two voxels, WVCNA relies on a measure of topological overlap, which not only compares how correlated two voxels are, but also the degree to which the pair of voxels is highly correlated with the same other voxels. We demonstrate the use of WVCNA to parcellate the brain into a set of modules that are reliably detected across data within the same subject and across subjects. In addition, we compare WVCNA to ICA and show that the WVCNA modules have some of the same structure as the ICA components, but tend to be more spatially focused. We also demonstrate the use of some of the WVCNA network metrics for assessing a voxel’s membership in a module and also how that voxel relates to other modules. Last, we illustrate how WVCNA modules can be used in a network analysis to find connections between regions of the brain and show that it produces reasonable results.
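The two ideas described above, a continuous connection strength and topological overlap, can be sketched as follows. The formulas are the standard unsigned soft-threshold adjacency and topological overlap measure from the weighted gene network literature; that WVCNA uses exactly this form, the exponent value, and the toy correlation matrix are assumptions made for illustration.

```python
def soft_adjacency(cor, beta=6):
    """Continuous connection strength a_ij = |cor_ij| ** beta (no hard cutoff).
    The diagonal is set to 0 so a voxel is not its own neighbor."""
    n = len(cor)
    return [[0.0 if i == j else abs(cor[i][j]) ** beta for j in range(n)]
            for i in range(n)]

def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij),
    where k_i is the connectivity (row sum) of node i and u runs over
    all nodes other than i and j."""
    n = len(adj)
    k = [sum(row) for row in adj]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            w = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = w
    return tom

# Hypothetical correlations among 4 voxels: voxels 0-2 co-activate, 3 does not.
toy_cor = [
    [1.00, 0.80, 0.90, 0.10],
    [0.80, 1.00, 0.85, 0.10],
    [0.90, 0.85, 1.00, 0.05],
    [0.10, 0.10, 0.05, 1.00],
]
toy_tom = topological_overlap(soft_adjacency(toy_cor, beta=2))
print("overlap of voxels 0 and 1:", round(toy_tom[0][1], 3))
print("overlap of voxels 0 and 3:", round(toy_tom[0][3], 3))
```

Voxels 0 and 1 receive a high overlap because they are directly correlated and also share voxel 2 as a strong neighbor, whereas voxel 3 shares almost no neighbors with either.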
Functional Magnetic Resonance Imaging; Functional Connectivity; Graph Theory; Small World Networks
Since hub nodes have been found to play important roles in many networks, highly connected hub genes are expected to play an important role in biology as well. However, the empirical evidence remains ambiguous. An open question is whether (or when) hub gene selection leads to more meaningful gene lists than a standard statistical analysis based on significance testing when analyzing genomic data sets (e.g., gene expression or DNA methylation data). Here we address this question for the special case when multiple genomic data sets are available. This is of great practical importance since for many research questions multiple data sets are publicly available. In this case, the data analyst can decide between a standard statistical approach (e.g., based on meta-analysis) and a co-expression network analysis approach that selects intramodular hubs in consensus modules. We assess the performance of these two types of approaches according to two criteria. The first criterion evaluates the biological insights gained and is relevant in basic research. The second criterion evaluates the validation success (reproducibility) in independent data sets and often applies in clinical diagnostic or prognostic applications. We compare meta-analysis with consensus network analysis based on weighted correlation network analysis (WGCNA) in three comprehensive and unbiased empirical studies: (1) finding genes predictive of lung cancer survival, (2) finding methylation markers related to age, and (3) finding mouse genes related to total cholesterol. The results demonstrate that intramodular hub gene status with respect to consensus modules is more useful than a meta-analysis p-value when identifying biologically meaningful gene lists (reflecting criterion 1). However, standard meta-analysis methods perform as well as (if not better than) a consensus network approach in terms of validation success (criterion 2).
The article also reports a comparison of meta-analysis techniques applied to gene expression data and presents novel R functions for carrying out consensus network analysis, network-based screening, and meta-analysis.
Autism spectrum disorder (ASD) is a common, highly heritable neuro-developmental condition characterized by marked genetic heterogeneity1–3. Thus, a fundamental question is whether autism represents an etiologically heterogeneous disorder in which the myriad genetic or environmental risk factors perturb common underlying molecular pathways in the brain4. Here, we demonstrate consistent differences in transcriptome organization between autistic and normal brain by gene co-expression network analysis. Remarkably, regional patterns of gene expression that typically distinguish frontal and temporal cortex are significantly attenuated in the ASD brain, suggesting abnormalities in cortical patterning. We further identify discrete modules of co-expressed genes associated with autism: a neuronal module enriched for known autism susceptibility genes, including the neuronal specific splicing factor A2BP1/FOX1, and a module enriched for immune genes and glial markers. Using high-throughput RNA-sequencing we demonstrate dysregulated splicing of A2BP1-dependent alternative exons in ASD brain. Moreover, using a published autism GWAS dataset, we show that the neuronal module is enriched for genetically associated variants, providing independent support for the causal involvement of these genes in autism. In contrast, the immune-glial module showed no enrichment for autism GWAS signals, indicating a non-genetic etiology for this process. Collectively, our results provide strong evidence for convergent molecular abnormalities in ASD, and implicate transcriptional and splicing dysregulation as underlying mechanisms of neuronal dysfunction in this disorder.
The models in this article generalize current models for both correlation networks and multigraph networks. Correlation networks are widely applied in genomics research. In contrast to general networks, it is straightforward to test the statistical significance of an edge in a correlation network. It is also easy to decompose the underlying correlation matrix and generate informative network statistics such as the module eigenvector. However, correlation networks only capture the connections between numeric variables. An open question is whether one can find suitable decompositions of the similarity measures employed in constructing general networks. Multigraph networks are attractive because they support likelihood based inference. Unfortunately, it is unclear how to adjust current statistical methods to detect the clusters inherent in many data sets.
Here we present an intuitive and parsimonious parametrization of a general similarity measure such as a network adjacency matrix. The cluster and propensity based approximation (CPBA) of a network not only generalizes correlation network methods but also multigraph methods. In particular, it gives rise to a novel and more realistic multigraph model that accounts for clustering and provides likelihood based tests for assessing the significance of an edge after controlling for clustering. We present a novel Majorization-Minimization (MM) algorithm for estimating the parameters of the CPBA. To illustrate the practical utility of the CPBA of a network, we apply it to gene expression data and to a bi-partite network model for diseases and disease genes from the Online Mendelian Inheritance in Man (OMIM).
The CPBA of a network is theoretically appealing since a) it generalizes correlation and multigraph network methods, b) it improves likelihood based significance tests for edge counts, c) it directly models higher-order relationships between clusters, and d) it suggests novel clustering algorithms. The CPBA of a network is implemented in Fortran 95 and bundled in the freely available R package PropClust.
Network decomposition; Model-based clustering; MM algorithm; Propensity; Network conformity
We report a systems genetics analysis of high density lipoproteins (HDL) levels in an F2 intercross between inbred strains CAST/EiJ and C57BL/6J. We previously showed that there are dramatic differences in HDL metabolism in a cross between these strains, and we now report co-expression network analysis of HDL that integrates global expression data from liver and adipose with relevant metabolic traits. Using data from a total of 293 F2 intercross mice, we constructed weighted gene co-expression networks and identified modules (subnetworks) associated with HDL and clinical traits. These were examined for genes implicated in HDL levels based on large human genome-wide associations studies (GWAS) and examined with respect to conservation between tissue and sexes in a total of 9 data sets. We identify genes that are consistently ranked high by association with HDL across the 9 data sets. We focus in particular on two genes, Wfdc2 and Hdac3, that are located in close proximity to HDL QTL peaks where causal testing indicates that they may affect HDL. Our results provide a rich resource for studies of complex metabolic interactions involving HDL.
Human Immunodeficiency Virus-1 (HIV) infection frequently results in neurocognitive impairment. While the cause remains unclear, recent gene expression studies have identified genes whose transcription is dysregulated in individuals with HIV-associated neurocognitive disorder (HAND). However, the methods for interpretation of such data have lagged behind the technical advances allowing the decoding of genetic material. Here, we employ systems biology methods novel to the field of NeuroAIDS to further interrogate extant transcriptome data derived from brains of HIV+ patients in order to further elucidate the neuropathogenesis of HAND. Additionally, we compare these data to those derived from brains of individuals with Alzheimer’s disease (AD) in order to identify common pathways of neuropathogenesis.
In Study 1, using data from three brain regions in 6 HIV-seronegative and 15 HIV+ cases, we first employed weighted gene co-expression network analysis (WGCNA) to further explore transcriptome networks specific to HAND with HIV-encephalitis (HIVE) and HAND without HIVE. We then used a symptomatic approach, employing standard expression analysis and WGCNA to identify networks associated with neurocognitive impairment (NCI), regardless of HIVE or HAND diagnosis. Finally, we examined the association between the CNS penetration effectiveness (CPE) of antiretroviral regimens and brain transcriptome. In Study 2, we identified common gene networks associated with NCI in both HIV and AD by correlating gene expression with pre-mortem neurocognitive functioning.
Study 1: WGCNA largely corroborated findings from standard differential gene expression analyses, but also identified possible meta-networks composed of multiple gene ontology categories and oligodendrocyte dysfunction. Differential expression analysis identified hub genes highly correlated with NCI, including genes implicated in gliosis, inflammation, and dopaminergic tone. Enrichment analysis identified gene ontology categories that varied across the three brain regions, the most notable being downregulation of genes involved in mitochondrial functioning. Finally, WGCNA identified dysregulated networks associated with NCI, including oligodendrocyte and mitochondrial functioning. Study 2: Common gene networks dysregulated in relation to NCI in AD and HIV included mitochondrial genes, whereas upregulation of various cancer-related genes was found.
While under-powered, this study identified possible biologically-relevant networks correlated with NCI in HIV, and common networks shared with AD, opening new avenues for inquiry in the investigation of HAND neuropathogenesis. These results suggest that further interrogation of existing transcriptome data using systems biology methods can yield important information.
HIV encephalitis; HIV-associated dementia; HIV-associated neurocognitive disorder; Weighted gene coexpression network analysis; WGCNA; CNS penetration effectiveness; National neuroAIDS tissue consortium; Coexpression module
Similarities between speech and birdsong make songbirds advantageous for investigating the neurogenetics of learned vocal communication, a complex phenotype likely supported by ensembles of interacting genes in cortico-basal ganglia pathways of both species. To date, only FoxP2 has been identified as critical to both speech and birdsong. We performed weighted gene co-expression network analysis on microarray data from singing zebra finches to discover gene ensembles regulated during vocal behavior. We found ~2,000 singing-regulated genes comprising 3 co-expression groups unique to area X, the basal ganglia subregion dedicated to learned vocalizations. These contained known targets of human FOXP2 and potential avian targets. We validated novel biological pathways for vocalization. Higher order gene co-expression patterns, rather than expression levels, molecularly distinguish area X from the ventral striato-pallidum during singing. The previously unknown structure of singing-driven networks enables prioritization of molecular interactors that likely bear on human motor disorders, especially those affecting speech.
Ensemble predictors such as the random forest are known to have superior accuracy, but their black-box predictions are difficult to interpret. In contrast, a generalized linear model (GLM) is highly interpretable, especially when forward feature selection is used to construct the model. However, forward feature selection tends to overfit the data and leads to low predictive accuracy. Therefore, it remains an important research goal to combine the advantages of ensemble predictors (high accuracy) with the advantages of forward regression modeling (interpretability). To address this goal, several articles have explored GLM based ensemble predictors. Since limited evaluations suggested that these ensemble predictors were less accurate than alternative predictors, they have received little attention in the literature.
Comprehensive evaluations involving hundreds of genomic data sets, the UCI machine learning benchmark data, and simulations are used to give GLM based ensemble predictors a new and careful look. A novel bootstrap aggregated (bagged) GLM predictor that incorporates several elements of randomness and instability (random subspace method, optional interaction terms, forward variable selection) often outperforms a host of alternative prediction methods including random forests and penalized regression models (ridge regression, elastic net, lasso). This random generalized linear model (RGLM) predictor provides variable importance measures that can be used to define a “thinned” ensemble predictor (involving few features) that retains excellent predictive accuracy.
RGLM is a state of the art predictor that shares the advantages of a random forest (excellent predictive accuracy, feature importance measures, out-of-bag estimates of accuracy) with those of a forward selected generalized linear model (interpretability). These methods are implemented in the freely available R software package randomGLM.
Co-expression measures are often used to define networks among genes. Mutual information (MI) is often used as a generalized correlation measure. It is not clear how much MI adds beyond standard (robust) correlation measures or regression model based association measures. Further, it is important to assess what transformations of these and other co-expression measures lead to biologically meaningful modules (clusters of genes).
We provide a comprehensive comparison between mutual information and several correlation measures in 8 empirical data sets and in simulations. We also study different approaches for transforming an adjacency matrix, e.g. using the topological overlap measure. Overall, we confirm close relationships between MI and correlation in all data sets which reflects the fact that most gene pairs satisfy linear or monotonic relationships. We discuss rare situations when the two measures disagree. We also compare correlation and MI based approaches when it comes to defining co-expression network modules. We show that a robust measure of correlation (the biweight midcorrelation transformed via the topological overlap transformation) leads to modules that are superior to MI based modules and maximal information coefficient (MIC) based modules in terms of gene ontology enrichment. We present a function that relates correlation to mutual information which can be used to approximate the mutual information from the corresponding correlation coefficient. We propose the use of polynomial or spline regression models as an alternative to MI for capturing non-linear relationships between quantitative variables.
The biweight midcorrelation outperforms MI in terms of elucidating gene pairwise relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched co-expression modules. Spline and polynomial networks form attractive alternatives to MI in case of non-linear relationships. Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data.
A high serum triglyceride (TG) level is an established risk factor for coronary heart disease (CHD). Fat is stored in the form of TGs in human adipose tissue. We hypothesized that gene co-expression networks in human adipose tissue may be correlated with serum TG levels and help reveal novel genes involved in TG regulation.
Gene co-expression networks were constructed from two Finnish and one Mexican study sample using the blockwiseModules R function in Weighted Gene Co-expression Network Analysis (WGCNA). Overlap between TG-associated networks from each of the three study samples was calculated using Fisher’s exact test. Gene ontology was used to determine known pathways enriched in each TG-associated network.
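The overlap test can be sketched as a one-sided Fisher's exact test, which is equivalent to the upper tail of a hypergeometric distribution. The sketch assumes both modules are drawn from the same gene universe; the universe size and the module sizes below are hypothetical and are not the values from this study.

```python
from math import comb

def overlap_pvalue(universe_size, size_a, size_b, overlap):
    """One-sided Fisher's exact test for gene-set overlap.

    Probability of observing at least `overlap` shared genes when a set of
    `size_b` genes is drawn at random from `universe_size` genes, of which
    `size_a` belong to the other set (hypergeometric upper tail)."""
    total = comb(universe_size, size_b)
    tail = sum(comb(size_a, k) * comb(universe_size - size_a, size_b - k)
               for k in range(overlap, min(size_a, size_b) + 1))
    return tail / total

# Hypothetical example: two modules of 100 and 120 genes, drawn from a
# universe of 15,000 expressed genes, sharing 30 genes.
p = overlap_pvalue(universe_size=15000, size_a=100, size_b=120, overlap=30)
print("overlap p-value:", p)
```

With three study samples the test would be applied to each pair of TG-associated modules, using only the genes measured in both samples as the universe.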
We measured gene expression in adipose samples from two Finnish and one Mexican study sample. In each study sample, we observed a gene co-expression network that was significantly associated with serum TG levels. The TG modules observed in Finns and Mexicans significantly overlapped and shared 34 genes. Seven of the 34 genes (ARHGAP30, CCR1, CXCL16, FERMT3, HCST, RNASET2, SELPG) were identified as the key hub genes of all three TG modules. Furthermore, two of the 34 genes (ARHGAP9, LST1) reside in previous TG GWAS regions, suggesting them as the regional candidates underlying the GWAS signals.
This study presents a novel adipose gene co-expression network with 34 genes significantly correlated with serum TG across populations.
Mexicans; Finns; RNA sequencing; Triglycerides; Adipose tissue; Weighted gene co-expression network analysis
The predominant model for regulation of gene expression through DNA methylation is an inverse association in which increased methylation results in decreased gene expression levels. However, recent studies suggest that the relationship between genetic variation, DNA methylation and expression is more complex.
Systems genetic approaches for examining relationships between gene expression and methylation array data were used to find both negative and positive associations between these levels. A weighted correlation network analysis revealed that i) both transcriptome and methylome are organized in modules, ii) co-expression modules are generally not preserved in the methylation data and vice-versa, and iii) highly significant correlations exist between co-expression and co-methylation modules, suggesting the existence of factors that affect expression and methylation of different modules (i.e., trans effects at the level of modules). We observed that methylation probes associated with expression in cis were more likely to be located outside CpG islands, whereas specificity for CpG island shores was present when methylation, associated with expression, was under local genetic control. A structural equation model based analysis found strong support in particular for a traditional causal model in which gene expression is regulated by genetic variation via DNA methylation instead of gene expression affecting DNA methylation levels.
Our results provide new insights into the complex relationships among genetic markers, epigenetic mechanisms and gene expression. We find strong support for the classical model of genetic variants regulating methylation, which in turn regulates gene expression. Moreover, we show that, although the methylation and expression modules differ, they are highly correlated.
DNA methylation; Gene expression; Association; Epigenetics; WGCNA
Both avian and mammalian basal ganglia are involved in voluntary motor control. In birds, such movements include hopping, perching and flying. Two organizational features that distinguish the songbird basal ganglia are that striatal and pallidal neurons are intermingled, and that neurons dedicated to vocal-motor function are clustered together in a dense cell group known as area X that sits within the surrounding striato-pallidum. This specification allowed us to perform molecular profiling of two striato-pallidal subregions, comparing transcriptional patterns in tissue dedicated to vocal-motor function (area X) to those in tissue that contains similar cell types but supports non-vocal behaviors: the striato-pallidum ventral to area X (VSP), our focus here. Since any behavior is likely underpinned by the coordinated actions of many molecules, we constructed gene co-expression networks from microarray data to study large-scale transcriptional patterns in both subregions. Our goal was to investigate any relationship between VSP network structure and singing and identify gene co-expression groups, or modules, found in the VSP but not area X. We observed mild, but surprising, relationships between VSP modules and song spectral features, and found a group of four VSP modules that were highly specific to the region. These modules were unrelated to singing, but were composed of genes involved in many of the same biological processes as those we previously observed in area X-specific singing-related modules. The VSP-specific modules were also enriched for processes disrupted in Parkinson's and Huntington's Diseases. Our results suggest that the activation/inhibition of a single pathway is not sufficient to functionally specify area X versus the VSP and support the notion that molecular processes are not in and of themselves specialized for behavior. 
Instead, unique interactions between molecular pathways create functional specificity in particular brain regions during distinct behavioral states.
Understanding how gene transcription relates to behavior is challenging. Learned vocal-motor behavior is a complex trait that represents the output of multiple converging genes, pathways, and patterns of neural activity. Here, we applied a systems analytical approach to determine how thousands of genes change their expression levels simultaneously in a region of the vertebrate brain important for vocal-motor function, the basal ganglia, during a specific vocal-motor behavior, singing. We used the zebra finch species of songbird based on similarities between song learning/production and speech, and because they possess a set of brain subregions dedicated to singing. Microarrays were used to measure gene expression levels in one such song-dedicated region and in an adjacent motor area that is not thought to play a role in vocal function. This allowed us to address the question of whether distinct gene co-expression patterns could be found in each area. We found that each area contained unique patterns of transcriptional co-activity, but there were also unexpected overlaps. We conclude that the particular behaviors (singing versus non-vocal behaviors) supported by these subregions depend on the particular sets of interactions between molecular pathways that occur in each subregion.
Estrogen signaling pathways may play a significant role in the pathogenesis of non-small cell lung cancers (NSCLC), as evidenced by the expression of aromatase and estrogen receptors (ERα and ERβ) in many of these tumors. Here we examine whether ERα and ERβ levels in conjunction with aromatase define patient groups with respect to survival outcomes and possible treatment regimens. Immunohistochemistry was performed on a high-density tissue microarray with resulting data and clinical information available for 377 patients. Patients were subdivided by gender, age and tumor histology, and survival was analyzed using the Cox proportional hazards model and Kaplan-Meier curves. Neither ERα nor ERβ alone was a predictor of survival in NSCLC. However, when coupled with aromatase expression, higher ERβ levels predicted worse survival in patients whose tumors expressed higher levels of aromatase. Although this finding was present in patients of both genders, it was especially pronounced in women ≥ 65 years old, where higher expression of both ERβ and aromatase indicated a markedly worse survival rate than that determined by aromatase alone. Conclusion: Expression of ERβ together with aromatase has predictive value for survival in different gender and age subgroups of NSCLC patients. This predictive value is stronger than that of either marker alone. Our results suggest treatment with aromatase inhibitors alone or combined with estrogen receptor modulators may be of benefit in some subpopulations of these patients.
NSCLC; tissue microarray; aromatase; estrogen receptor; immunohistochemistry; prognosis
It has been debated whether human induced pluripotent stem cells (iPSCs) and embryonic stem cells (ESCs) express distinctive transcriptomes. By using the method of weighted gene co-expression network analysis, we showed here that iPSCs exhibit altered functional modules compared with ESCs. Notably, iPSCs and ESCs differentially express 17 modules that primarily function in transcription, metabolism, development, and immune response. These module activations (up- and downregulation) are highly conserved in a variety of iPSCs, and genes in each module are coherently co-expressed. Furthermore, the activation levels of these modular genes can be used as quantitative variables to discriminate iPSCs and ESCs with high accuracy (96%). Thus, differential activations of these functional modules are the conserved features distinguishing iPSCs from ESCs. Strikingly, the overall activation level of these modules is inversely correlated with the DNA methylation level, suggesting that DNA methylation may be one mechanism regulating the module differences. Overall, we conclude that human iPSCs and ESCs exhibit distinct gene expression networks, which are likely associated with different epigenetic reprogramming events during the derivation of iPSCs and ESCs.
Primary Sjögren's syndrome (pSS) is a chronic autoimmune disease with complex etiopathogenesis. Despite extensive studies to understand the disease process utilizing human and mouse models, the intersection between these species remains elusive. To address this gap, we utilized a novel systems biology approach to identify disease-related gene modules and signaling pathways that overlap between humans and mice.
Parotid gland tissues were harvested from 24 pSS and 16 non-pSS sicca patients and 25 controls. For mouse studies, salivary glands were harvested from C57BL/6.NOD-Aec1Aec2 mice at various times during development of pSS-like disease. RNA was analyzed with Affymetrix HG U133+2.0 arrays for human samples and with MOE430+2.0 arrays for mouse samples. The images were processed with Affymetrix software. Weighted gene co-expression network analysis was used to identify disease-related and functional pathways.
Nineteen co-expression modules were identified in human parotid tissue, of which four were significantly upregulated and three were downregulated in pSS patients compared with non-pSS sicca patients and controls. Notably, one of the human disease-related modules was highly preserved in the mouse model, and was enriched with genes involved in immune and inflammatory responses. Further comparison between these two species led to the identification of genes associated with leukocyte recruitment and germinal center formation.
Our systems biology analysis of genome-wide expression data from salivary gland tissue of pSS patients and from a pSS mouse model identified common dysregulated biological pathways and molecular targets underlying critical molecular alterations in pSS pathogenesis.
Many high-throughput biological data analyses require the calculation of large correlation matrices and/or clustering of a large number of objects. The standard R function for calculating Pearson correlation handles data without missing values efficiently, but is inefficient when applied to data sets with a relatively small number of missing values. We present an implementation of Pearson correlation calculation that can lead to substantial speedup on data with a relatively small number of missing entries. Further, we parallelize all calculations and thus achieve further speedup on systems where parallel processing is available. A robust correlation measure, the biweight midcorrelation, is implemented in a similar manner and provides comparable speed. The functions cor and bicor for fast Pearson and biweight midcorrelation, respectively, are part of the updated, freely available R package WGCNA.
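The general idea of treating complete and incomplete columns differently can be sketched in plain Python. This is an illustrative assumption about one way to organize such a calculation, not the algorithm used in the WGCNA package: columns without missing values are standardized once so that each pairwise correlation is a single dot product, and only pairs involving a column with missing entries take the slower pairwise-complete path. The name `cor_matrix` is hypothetical, missing values are represented as `None` or NaN, and constant columns are not handled.

```python
import math


def _is_missing(v):
    return v is None or (isinstance(v, float) and math.isnan(v))


def _standardize(col):
    """Center a complete column and scale it to unit length."""
    mean = sum(col) / len(col)
    centered = [v - mean for v in col]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def cor_matrix(columns):
    """Pearson correlation matrix using pairwise-complete observations."""
    p = len(columns)
    complete = [not any(_is_missing(v) for v in col) for col in columns]
    # Fast path preparation: standardize each complete column only once.
    z = [_standardize(col) if ok else None for col, ok in zip(columns, complete)]
    result = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1, p):
            if complete[i] and complete[j]:
                r = sum(a * b for a, b in zip(z[i], z[j]))
            else:
                # Slow path: keep only rows observed in both columns.
                pairs = [
                    (a, b)
                    for a, b in zip(columns[i], columns[j])
                    if not (_is_missing(a) or _is_missing(b))
                ]
                za = _standardize([a for a, _ in pairs])
                zb = _standardize([b for _, b in pairs])
                r = sum(a * b for a, b in zip(za, zb))
            result[i][j] = result[j][i] = r
    return result


if __name__ == "__main__":
    data = [[1, 2, 3, 4], [2, 4, 6, 8], [4, None, 2, 1]]
    for row in cor_matrix(data):
        print(row)
```

When only a few columns contain missing entries, almost all pairs take the fast path, which is the situation the abstract describes.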
The hierarchical clustering algorithm implemented in R function hclust is an order n^3 (n is the number of clustered objects) version of a publicly available clustering algorithm (Murtagh 2012). We present the package flashClust that implements the original algorithm which in practice achieves order approximately n^2, leading to substantial time savings when clustering large data sets.
Pearson correlation; robust correlation; hierarchical clustering; R