Plasma levels of branched-chain amino acids (BCAA) are consistently elevated in obesity and type 2 diabetes (T2D) and can also prospectively predict T2D. However, the role of BCAA in the pathogenesis of insulin resistance and T2D remains unclear.
To identify pathways related to insulin resistance, we performed comprehensive gene expression and metabolomics analyses in skeletal muscle from 41 humans with normal glucose tolerance and 11 with T2D across a range of insulin sensitivity (SI, 0.49 to 14.28). We studied both cultured cells and mice heterozygous for the BCAA enzyme methylmalonyl-CoA mutase (Mut) and assessed the effects of altered BCAA flux on lipid and glucose homeostasis.
Our data demonstrate perturbed BCAA metabolism and fatty acid oxidation in muscle from insulin resistant humans. Experimental alterations in BCAA flux in cultured cells similarly modulate fatty acid oxidation. Mut heterozygosity in mice alters muscle lipid metabolism in vivo, resulting in increased muscle triglyceride accumulation, increased plasma glucose, hyperinsulinemia, and increased body weight after high-fat feeding.
Our data indicate that impaired muscle BCAA catabolism may contribute to the development of insulin resistance by perturbing both amino acid and fatty acid metabolism and suggest that targeting BCAA metabolism may hold promise for prevention or treatment of T2D.
•Human insulin resistance is associated with perturbed muscle BCAA metabolism.
•Experimental modulation of BCAA metabolic flux alters fatty acid oxidation in vitro.
•Mut heterozygosity leads to increased body weight and muscle TAG accumulation in mice.
Insulin sensitivity; BCAA; Fatty acid oxidation; TCA cycle
The COMBREX database (COMBREX-DB; combrex.bu.edu) is an online repository of information related to (i) experimentally determined protein function, (ii) predicted protein function, and (iii) relationships among proteins of unknown function and various types of experimental data, including molecular function, protein structure, and associated phenotypes. The database was created as part of the novel COMBREX (COMputational BRidges to EXperiments) effort aimed at accelerating the rate of gene function validation. It currently holds information on ∼3.3 million known and predicted proteins from over 1000 completely sequenced bacterial and archaeal genomes. The database also contains a prototype recommendation system that helps users identify the proteins whose experimentally determined function would be most informative for predicting the function of other proteins within protein families. The emphasis on documenting experimental evidence for function predictions and the prioritization of uncharacterized proteins for experimental testing distinguish COMBREX from other publicly available microbial genomics resources. This article describes updates to COMBREX-DB since an initial description in the 2011 NAR Database Issue.
Developmental fate decisions are dictated by master transcription factors (TFs) that interact with cis-regulatory elements to direct transcriptional programs. Certain malignant tumors may also depend on cellular hierarchies reminiscent of normal development but superimposed on underlying genetic aberrations. In glioblastoma (GBM), a subset of stem-like tumor-propagating cells (TPCs) appears to drive tumor progression and underlie therapeutic resistance, yet remain poorly understood. Here, we identify a core set of neurodevelopmental TFs (POU3F2, SOX2, SALL2, OLIG2) essential for GBM propagation. These TFs coordinately bind and activate TPC-specific regulatory elements, and are sufficient to fully reprogram differentiated GBM cells to ‘induced’ TPCs, recapitulating the epigenetic landscape and phenotype of native TPCs. We reconstruct a network model that highlights critical interactions and identifies novel therapeutic targets for eliminating TPCs. Our study establishes the epigenetic basis of a developmental hierarchy in GBM, provides detailed insight into underlying gene regulatory programs, and suggests attendant therapeutic strategies.
cis-regulatory elements; enhancers; chromatin; epigenetic states; glioblastoma; stem cells; cellular hierarchy; cellular reprogramming; cancer
Copy number variations (CNVs) are increasingly recognized as significant disease susceptibility markers in many complex disorders including cancer. The availability of a large number of chromosomal copy number profiles in both malignant and normal tissues in cancer patients presents an opportunity to characterize not only somatic alterations but also germline CNVs, which may confer increased risk for cancer.
We explored the germline CNVs in five cancer cohorts from the Cancer Genome Atlas (TCGA) consisting of 351 brain, 336 breast, 342 colorectal, 370 renal, and 314 ovarian cancers, genotyped on Affymetrix SNP6.0 arrays. Comparing these to ~3000 normal controls from another study, our case–control association study revealed 39 genomic loci (9 brain, 3 breast, 4 colorectal, 11 renal, and 12 ovarian cancers) as candidate tumor susceptibility loci. Many of these loci are new and in some cases are associated with a substantial increase in disease risk. The majority of the observed loci do not overlap with coding sequences; however, several overlap with known cancer genes including RET in brain cancers, ERBB2 in renal cell carcinomas, and DCC in ovarian cancers, none of which has previously been associated with germline changes in cancer.
This large-scale genome-wide association study for CNVs across multiple cancer types identified several novel rare germline CNVs as cancer predisposing genomic loci. These loci can potentially serve as clinically useful markers conferring increased cancer risk.
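As an illustration of the kind of case–control comparison described above, the following minimal sketch computes an odds ratio and a Wald confidence interval from carrier counts at a single CNV locus. It is not the statistical procedure used in the study, which is not detailed here; the function name, the continuity correction, and any counts passed to it are assumptions made for illustration only.

```python
import math


def odds_ratio_with_ci(case_carriers, case_total,
                       control_carriers, control_total, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 carrier table.

    Hypothetical helper: 0.5 is added to every cell (Haldane-Anscombe
    correction) so that rare CNVs with an empty cell still give a
    finite estimate.
    """
    a = case_carriers + 0.5                      # cases carrying the CNV
    b = case_total - case_carriers + 0.5         # cases without the CNV
    c = control_carriers + 0.5                   # controls carrying the CNV
    d = control_total - control_carriers + 0.5   # controls without the CNV

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

A locus would be a candidate susceptibility marker under this sketch only if the lower bound of the interval stays above 1; with rare variants, an exact test and multiple-testing correction would normally be preferred.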
Electronic supplementary material
The online version of this article (doi:10.1186/s12943-015-0292-6) contains supplementary material, which is available to authorized users.
Array CGH; DNA copy number; CNV association study; Cancer susceptibility
Dysregulated muscle metabolism is a cardinal feature of human insulin resistance (IR) and associated diseases, including type 2 diabetes (T2D). However, specific reactions contributing to abnormal energetics and metabolic inflexibility in IR are unknown.
We utilize flux balance computational modeling to develop the first systems-level analysis of IR metabolism in fasted and fed states, and varying nutrient conditions. We systematically perturb the metabolic network to identify reactions that reproduce key features of IR-linked metabolism.
While reduced glucose uptake is a major hallmark of IR, model-based reductions in either extracellular glucose availability or uptake do not alter metabolic flexibility, and thus are not sufficient to fully recapitulate IR-linked metabolism. Moreover, experimentally reduced flux through single reactions does not reproduce key features of IR-linked metabolism. However, knockdown of pyruvate dehydrogenase (PDH) in combination with reduced lipid uptake or reduced lipid/amino acid oxidation (ETFDH) does reduce ATP synthesis, TCA cycle flux, and metabolic flexibility. Experimental validation demonstrates a robust impact of dual PDH/ETFDH knockdown on cellular energetics and TCA cycle flux in cultured myocytes. Parallel analysis of transcriptomic and metabolomic data in humans with IR and T2D demonstrates downregulation of PDH subunits and upregulation of its inhibitory kinase PDK4, both of which would be predicted to decrease PDH flux, concordant with the model.
Our results indicate that complex interactions between multiple biochemical reactions contribute to metabolic perturbations observed in human IR, and that the PDH complex plays a key role in these metabolic phenotypes.
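The steady-state constraint at the core of flux balance modeling can be sketched on a toy network. The example below is an assumption-laden illustration, not the muscle model used in the study: the three-reaction linear pathway, the matrix `TOY_S`, and the helper names are all hypothetical. It checks the mass-balance condition S·v = 0 and shows how lowering one upper bound (a "knockdown") limits the achievable flux in a strictly linear pathway.

```python
# Hypothetical toy pathway:  (uptake) -> A -> B -> (export)
# Rows are metabolites A and B; columns are reactions v1, v2, v3.
TOY_S = [
    [1, -1, 0],   # A: produced by v1, consumed by v2
    [0, 1, -1],   # B: produced by v2, consumed by v3
]


def mass_balance(S, v):
    """Return S.v, the net production rate of each metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True when no metabolite accumulates or is depleted (S.v = 0)."""
    return all(abs(rate) <= tol for rate in mass_balance(S, v))


def max_chain_flux(upper_bounds):
    """Maximal steady-state flux through a strictly linear pathway.

    In a linear chain every reaction carries the same flux at steady
    state, so the optimum is simply the tightest upper bound. General
    networks need a linear-programming solver instead.
    """
    return min(upper_bounds)
```

For instance, `max_chain_flux([10, 4, 8])` is limited by the second reaction; reducing a different bound below 4 would shift the bottleneck, which is the sense in which single versus combined knockdowns can have different effects in a larger network.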
Schematic of targeted knockdowns that model phenotypes of insulin resistance. Knockdowns (KD; green) of fatty acid transporter (FAT), pyruvate dehydrogenase (PDH), and electron transfer flavoprotein (ETFDH) recapitulated the metabolic phenotypes of insulin resistance (red), defined as reduced ATP + P-Cr synthesis, TCA flux, and metabolic flexibility (RQ). Additional metabolic alterations identified are depicted by orange arrows.
Muscle insulin resistance; Muscle metabolism; Flux balance analysis; Computational modeling
experimental validation; hypothetical proteins; crowdsourcing; high-throughput; traceability
Glioblastoma (GBM) is thought to be driven by a sub-population of cancer stem cells (CSCs) that self-renew and recapitulate tumor heterogeneity, yet remain poorly understood. Here we present a comparative histone modification analysis of GBM CSCs that reveals widespread activation of genes normally held in check by Polycomb repressors. These activated targets include a large set of developmental transcription factors (TFs) whose coordinated activation is unique to the CSCs. We demonstrate that a critical factor in the set, ASCL1, activates Wnt signaling by repressing the negative regulator DKK1. We show that ASCL1 is essential for maintenance and in vivo tumorigenicity of GBM CSCs. Genomewide binding profiles for ASCL1 and the Wnt effector LEF1 provide mechanistic insight and suggest widespread interactions between the TF module and the signaling pathway. Our findings demonstrate regulatory connections between ASCL1, Wnt signaling and collaborating TFs that are essential for the maintenance and tumorigenicity of GBM CSCs.
Experimental data exist for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.
Flux balance analysis and constraint based modeling have been successfully used in the past to elucidate the metabolism of single cellular organisms. However, limited work has been done with multicellular organisms and even less with humans. The focus of this paper is to present a novel use of this technique by investigating human nutrition, a challenging field of study. Specifically, we present a steady state constraint based model of skeletal muscle tissue to investigate amino acid supplementation's effect on protein synthesis. We implement several in silico supplementation strategies to study whether amino acid supplementation might be beneficial for increasing muscle contractile protein synthesis. Concurrent with published data on amino acid supplementation's effect on protein synthesis in a post resistance exercise state, our results suggest that increasing bioavailability of methionine, arginine, and the branched-chain amino acids can increase the flux of contractile protein synthesis. The study also suggests that a common commercial supplement, glutamine, is not an effective supplement in the context of increasing protein synthesis and thus, muscle mass. Similar to any study in a model organism, the computational modeling of this research has some limitations. Thus, this paper introduces the prospect of using systems biology as a framework to formally investigate how supplementation and nutrition can affect human metabolism and physiology.
The functional characterization of Open Reading Frames (ORFs) from sequenced genomes remains a bottleneck in our effort to understand microbial biology. In particular, the functional characterization of proteins with only remote sequence homology to known proteins can be challenging, as there may be few clues to guide initial experiments. Affinity enrichment of proteins from cell lysates, combined with the global perspective of protein function provided by COMBREX, affords an approach to this problem. We present here the biochemical analysis of six proteins from Helicobacter pylori ATCC 26695, a focus organism in COMBREX. Initial hypotheses were based upon affinity capture of proteins from total cellular lysate using derivatized nano-particles, and subsequent identification by mass spectrometry. Candidate genes encoding these proteins were cloned and expressed in Escherichia coli, and the recombinant proteins were purified and characterized biochemically, and their biochemical parameters were compared with those of the native proteins. The characterized proteins include a guanosine triphosphate (GTP) cyclohydrolase (HP0959), an ATPase (HP1079), an adenosine deaminase (HP0267), a phosphodiesterase (HP1042), and an aminopeptidase (HP1037); in addition, new substrates were characterized for a peptidoglycan deacetylase (HP0310). Generally, characterized enzymes were active at acidic to neutral pH (4.0–7.5) with temperature optima ranging from 35 to 55°C, although some exhibited outstanding characteristics.
The dramatic reduction in the cost of sequencing has allowed many researchers to join in the effort of sequencing and annotating prokaryotic genomes. Annotation methods vary considerably and may fail to identify some genes. Here we draw attention to a large number of likely genes missing from annotations using common tools such as Glimmer and BLAST.
By analyzing 1,474 prokaryotic genome annotations in GenBank, we identify 13,602 likely missed genes that are homologs to non-hypothetical proteins, and 11,792 likely missed genes that are homologs only to hypothetical proteins, yet have supporting evidence of their protein-coding nature from COMBREX, a newly created gene function database. We also estimate the likelihood that each potential missing gene found is a genuine protein-coding gene using COMBREX.
Our analysis of the causes of missed genes suggests that larger annotation centers tend to produce annotations with fewer missed genes than smaller centers, and many of the missed genes are short genes <300 bp. Over 1,000 of the likely missed genes could be associated with phenotype information available in COMBREX. 359 of these genes, found in pathogenic organisms, may be potential targets for pharmaceutical research. The newly identified genes are available on COMBREX’s website.
This article was reviewed by Daniel Haft, Arcady Mushegian, and M. Pilar Francino (nominated by David Ardell).
The oral microbiome, the complex ecosystem of microbes inhabiting the human mouth, harbors several thousands of bacterial types. The proliferation of pathogenic bacteria within the mouth gives rise to periodontitis, an inflammatory disease known to also constitute a risk factor for cardiovascular disease. While much is known about individual species associated with pathogenesis, the system-level mechanisms underlying the transition from health to disease are still poorly understood. Through the sequencing of the 16S rRNA gene and of whole community DNA we provide a glimpse at the global genetic, metabolic, and ecological changes associated with periodontitis in 15 subgingival plaque samples, four from each of two periodontitis patients, and the remaining samples from three healthy individuals. We also demonstrate the power of whole-metagenome sequencing approaches in characterizing the genomes of key players in the oral microbiome, including an unculturable TM7 organism. We reveal the disease microbiome to be enriched in virulence factors, and adapted to a parasitic lifestyle that takes advantage of the disrupted host homeostasis. Furthermore, diseased samples share a common structure that was not found in completely healthy samples, suggesting that the disease state may occupy a narrow region within the space of possible configurations of the oral microbiome. Our pilot study demonstrates the power of high-throughput sequencing as a tool for understanding the role of the oral microbiome in periodontal disease. Despite a modest level of sequencing (∼2 lanes Illumina 76 bp PE) and high human DNA contamination (up to ∼90%) we were able to partially reconstruct several oral microbes and to preliminarily characterize some systems-level differences between the healthy and diseased oral microbiomes.
Type 2 diabetes and obesity are increasingly affecting human populations around the world. Our goal was to identify early molecular signatures predicting genetic risk to these metabolic diseases using two strains of mice that differ greatly in disease susceptibility.
RESEARCH DESIGN AND METHODS
We integrated metabolic characterization, gene expression, protein-protein interaction networks, RT-PCR, and flow cytometry analyses of adipose, skeletal muscle, and liver tissue of diabetes-prone C57BL/6NTac (B6) mice and diabetes-resistant 129S6/SvEvTac (129) mice at 6 weeks and 6 months of age.
At 6 weeks of age, B6 mice were metabolically indistinguishable from 129 mice; however, adipose tissue showed a consistent gene expression signature that differentiated between the strains. In particular, immune system gene networks and inflammatory biomarkers were upregulated in adipose tissue of B6 mice, despite a low normal fat mass. This was accompanied by increased T-cell and macrophage infiltration. The expression of the same networks and biomarkers, particularly those related to T-cells, further increased in adipose tissue of B6 mice, but only minimally in 129 mice, in response to weight gain promoted by age or high-fat diet, further exacerbating the differences between strains.
Insulin resistance in mice with differential susceptibility to diabetes and metabolic syndrome is preceded by differences in the inflammatory response of adipose tissue. This phenomenon may serve as an early indicator of disease and contribute to disease susceptibility and progression.
COMBREX (http://combrex.bu.edu) is a project to increase the speed of the functional annotation of new bacterial and archaeal genomes. It consists of a database of functional predictions produced by computational biologists and a mechanism for experimental biochemists to bid for the validation of those predictions. Small grants are available to support successful bids.
Complete and accurate annotation of gene function is an essential starting point for genome interpretation and a host of systems and synthetic biology endeavors. Detecting errors in existing annotation now has an important new tool.
Methylthiotransferases (MTTases) are a closely related family of proteins that perform both radical-S-adenosylmethionine (SAM) mediated sulfur insertion and SAM-dependent methylation to modify nucleic acid or protein targets with a methyl thioether group (–SCH3). Members of two of the four known subgroups of MTTases have been characterized, typified by MiaB, which modifies N6-isopentenyladenosine (i6A) to 2-methylthio-N6-isopentenyladenosine (ms2i6A) in tRNA, and RimO, which modifies a specific aspartate residue in ribosomal protein S12. In this work, we have characterized the two MTTases encoded by Bacillus subtilis 168 and find that, consistent with bioinformatic predictions, ymcB is required for ms2i6A formation (MiaB activity), and yqeV is required for modification of N6-threonylcarbamoyladenosine (t6A) to 2-methylthio-N6-threonylcarbamoyladenosine (ms2t6A) in tRNA. The enzyme responsible for the latter activity belongs to a third MTTase subgroup, no member of which has previously been characterized. We performed domain-swapping experiments between YmcB and YqeV to narrow down the protein domain(s) responsible for distinguishing i6A from t6A and found that the C-terminal TRAM domain, putatively involved with RNA binding, is likely not involved with this discrimination. Finally, we performed a computational analysis to identify candidate residues outside the TRAM domain that may be involved with substrate recognition. These residues represent interesting targets for further analysis.
To characterize the hormonal milieu and adipose gene expression in response to catch-up growth (CUG), a growth pattern associated with obesity and diabetes risk, in a mouse model of low birth weight (LBW).
RESEARCH DESIGN AND METHODS
ICR mice were food restricted by 50% from gestational days 12.5–18.5, reducing offspring birth weight by 25%. During the suckling period, dams were either fed ad libitum, permitting CUG in offspring, or food restricted, preventing CUG. Offspring were killed at age 3 weeks, and gonadal fat was removed for RNA extraction, array analysis, RT-PCR, and evaluation of cell size and number. Serum insulin, thyroxine (T4), corticosterone, and adipokines were measured.
At age 3 weeks, LBW mice with CUG (designated U-C) had body weight comparable with controls (designated C-C); weight was reduced by 49% in LBW mice without CUG (designated U-U). Adiposity was altered by postnatal nutrition, with gonadal fat increased by 50% in U-C and decreased by 58% in U-U mice (P < 0.05 vs. C-C mice). Adipose expression of the lipogenic genes Fasn, AccI, Lpin1, and Srebf1 was significantly increased in U-C compared with both C-C and U-U mice (P < 0.05). Mitochondrial DNA copy number was reduced by >50% in U-C versus U-U mice (P = 0.014). Although cell numbers did not differ, mean adipocyte diameter was increased in U-C and reduced in U-U mice (P < 0.01).
CUG results in increased adipose tissue lipogenic gene expression and adipocyte diameter but not increased cellularity, suggesting that catch-up fat is primarily associated with lipogenesis rather than adipogenesis in this murine model.
Aberrant activation of signaling pathways drives many of the fundamental biological processes that accompany tumor initiation and progression. Inappropriate phosphorylation of intermediates in these signaling pathways is a frequently observed molecular lesion that accompanies the undesirable activation or repression of pro- and anti-oncogenic pathways. Therefore, methods that directly query signaling pathway activation via phosphorylation assays in individual cancer biopsies are expected to provide important insights into the molecular “logic” that distinguishes cancer and normal tissue on one hand, and enables personalized intervention strategies on the other.
We first document the largest available set of tyrosine phosphorylation sites that are, individually, differentially phosphorylated in lung cancer, thus providing an immediate set of drug targets. Next, we develop a novel computational methodology to identify pathways whose phosphorylation activity is strongly correlated with the lung cancer phenotype. Finally, we demonstrate the feasibility of classifying lung cancers based on multi-variate phosphorylation signatures.
Highly predictive and biologically transparent phosphorylation signatures of lung cancer provide evidence for the existence of a robust set of phosphorylation mechanisms (captured by the signatures) present in the majority of lung cancers, and that reliably distinguish each lung cancer from normal. This approach should improve our understanding of cancer and help guide its treatment, since the phosphorylation signatures highlight proteins and pathways whose phosphorylation should be inhibited in order to prevent unregulated proliferation.
Motivation: Type 2 diabetes is a chronic metabolic disease that involves both environmental and genetic factors. To understand the genetics of type 2 diabetes and insulin resistance, the DIabetes Genome Anatomy Project (DGAP) was launched to profile gene expression in a variety of related animal models and human subjects. We asked whether these heterogeneous models can be integrated to provide consistent and robust biological insights into the biology of insulin resistance.
Results: We perform integrative analysis of the 16 DGAP data sets that span multiple tissues, conditions, array types, laboratories, species, genetic backgrounds and study designs. For each data set, we identify differentially expressed genes compared with control. Then, for the combined data, we rank genes according to the frequency with which they were found to be statistically significant across data sets. This analysis reveals RetSat as a widely shared component of mechanisms involved in insulin resistance and sensitivity and adds to the growing importance of the retinol pathway in diabetes, adipogenesis and insulin resistance. Top candidates obtained from our analysis have been confirmed in recent laboratory studies.
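The ranking step described above, ordering genes by how often they are significant across data sets, can be sketched as follows. This is a minimal illustration under stated assumptions: each data set is assumed to have already been reduced to a set of significant gene identifiers, and the function name and tie-breaking rule are hypothetical rather than taken from the study.

```python
from collections import Counter


def rank_by_recurrence(significant_sets):
    """Rank genes by the number of data sets in which they are significant.

    `significant_sets` is an iterable of collections of gene identifiers,
    one per data set. A gene is counted at most once per data set.
    Ties are broken alphabetically so the output is deterministic.
    """
    counts = Counter()
    for genes in significant_sets:
        counts.update(set(genes))  # de-duplicate within a data set
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Counting recurrence rather than pooling effect sizes sidesteps the differences in array type, species and study design between data sets, at the cost of ignoring the magnitude and direction of each change.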
Motivation: There is a growing interest in improving the cluster analysis of expression data by incorporating into it prior knowledge, such as the Gene Ontology (GO) annotations of genes, in order to improve the biological relevance of the clusters that are subjected to subsequent scrutiny. The structure of the GO is another source of background knowledge that can be exploited through the use of semantic similarity.
Results: We propose here a novel algorithm that integrates semantic similarities (derived from the ontology structure) into the procedure of deriving clusters from the dendrogram constructed during expression-based hierarchical clustering. Our approach can handle the multiple annotations, from different levels of the GO hierarchy, which most genes have. Moreover, it treats annotated and unannotated genes in a uniform manner. Consequently, the clusters obtained by our algorithm are characterized by significantly enriched annotations. In both cross-validation tests and when using an external index such as protein–protein interactions, our algorithm performs better than previous approaches. When applied to human cancer expression data, our algorithm identifies, among others, clusters of genes related to immune response and glucose metabolism. These clusters are also supported by protein–protein interaction data.
Supplementary information: Supplementary data are available at Bioinformatics online.
The traditional approach to studying complex biological networks is based on the identification of interactions between internal components of signaling or metabolic pathways. By comparison, little is known about interactions between higher order biological systems, such as biological pathways and processes.
We propose a methodology for gleaning patterns of interactions between biological processes by analyzing protein-protein interactions, transcriptional co-expression and genetic interactions. At the heart of the methodology are the concept of Linked Processes and the resultant network of biological processes, the Process Linkage Network (PLN).
We construct, catalogue, and analyze different types of PLNs derived from different data sources and different species. When applied to the Gene Ontology, many of the resulting links connect processes that are distant from each other in the hierarchy, even though the connection makes eminent sense biologically. Some others, however, carry an element of surprise and may reflect mechanisms that are unique to the organism under investigation. In this aspect our method complements the link structure between processes inherent in the Gene Ontology, which by its very nature is species-independent.
As a practical application of the linkage of processes we demonstrate that it can be effectively used in protein function prediction, having the power to increase both the coverage and the accuracy of predictions, when carefully integrated into prediction methods.
Our approach constitutes a promising new direction towards understanding the higher levels of organization of the cell as a system which should help current efforts to re-engineer ontologies and improve our ability to predict which proteins are involved in specific biological processes.
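One simple way to picture a link between two processes is to count the protein-protein interactions that bridge them. The sketch below does only that; it is an assumption made for illustration, not the definition of Linked Processes used in the methodology above, which also draws on co-expression and genetic interactions and presumably on a significance criterion that is not reproduced here. The function and argument names are hypothetical.

```python
from collections import Counter


def count_process_links(interactions, annotations):
    """Count interactions that connect each pair of distinct processes.

    `interactions` is an iterable of (protein, protein) pairs and
    `annotations` maps a protein to the processes it is annotated with.
    Each interaction contributes at most once to a given process pair,
    and unannotated proteins contribute nothing.
    """
    links = Counter()
    for protein_a, protein_b in interactions:
        processes_a = annotations.get(protein_a, ())
        processes_b = annotations.get(protein_b, ())
        pairs = {
            tuple(sorted((pa, pb)))
            for pa in processes_a
            for pb in processes_b
            if pa != pb
        }
        links.update(pairs)
    return links
```

Raw counts of this kind favor large processes, so a real analysis would compare them against a null model (for example, randomized annotations) before declaring two processes linked.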
Single nucleotide polymorphisms (SNPs) have been used extensively in genetics and epidemiology studies. Traditionally, SNPs that did not pass the Hardy-Weinberg equilibrium (HWE) test were excluded from these analyses. Many investigators have addressed possible causes for departure from HWE, including genotyping errors, population admixture and segmental duplication. Recent large-scale surveys have revealed abundant structural variations in the human genome, including copy number variations (CNVs). This suggests that a significant number of SNPs must be within these regions, which may cause deviation from HWE.
We performed a Bayesian analysis on the potential effect of copy number variation, segmental duplication and genotyping errors on the behavior of SNPs. Our results suggest that copy number variation is a major cause of HWE violation for SNPs with a small minor allele frequency, when the sample size is large and the genotyping error rate is 0–1%.
Our study provides the posterior probability that a SNP falls in a CNV or a segmental duplication, given the observed allele frequency of the SNP, sample size and the significance level of HWE testing.
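For readers unfamiliar with the HWE test referred to above, a minimal sketch of the standard one-degree-of-freedom chi-square version follows. It illustrates only the test itself, not the Bayesian analysis of the study; the function name is hypothetical, and exact tests are usually preferred when genotype counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic SNP.

    Takes observed genotype counts (AA, AB, BB) and returns the
    chi-square statistic and its p-value (1 degree of freedom).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")

    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expectation (monomorphic SNPs)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A SNP lying in a copy number variant can show an apparent excess or deficit of heterozygotes, which inflates this statistic even when genotyping is accurate.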
In embryonic stem (ES) cells, bivalent chromatin domains with overlapping repressive (H3 lysine 27 tri-methylation) and activating (H3 lysine 4 tri-methylation) histone modifications mark the promoters of more than 2,000 genes. To gain insight into the structure and function of bivalent domains, we mapped key histone modifications and subunits of Polycomb-repressive complexes 1 and 2 (PRC1 and PRC2) genomewide in human and mouse ES cells by chromatin immunoprecipitation, followed by ultra high-throughput sequencing. We find that bivalent domains can be segregated into two classes—the first occupied by both PRC2 and PRC1 (PRC1-positive) and the second specifically bound by PRC2 (PRC2-only). PRC1-positive bivalent domains appear functionally distinct as they more efficiently retain lysine 27 tri-methylation upon differentiation, show stringent conservation of chromatin state, and associate with an overwhelming number of developmental regulator gene promoters. We also used computational genomics to search for sequence determinants of Polycomb binding. This analysis revealed that the genomewide locations of PRC2 and PRC1 can be largely predicted from the locations, sizes, and underlying motif contents of CpG islands. We propose that large CpG islands depleted of activating motifs confer epigenetic memory by recruiting the full repertoire of Polycomb complexes in pluripotent cells.
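Because the prediction above rests on properties of CpG islands, it may help to show how CpG enrichment of a sequence is commonly quantified. The sketch below computes GC content and the observed/expected CpG ratio; it is a generic illustration, not the predictor used in the study, and the function name is hypothetical. Widely used criteria (Gardiner-Garden and Frommer) call a region a CpG island when it is at least 200 bp long with GC content above 50% and an observed/expected ratio above 0.6.

```python
def cpg_stats(sequence):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence.

    The expected CpG count assumes C and G occur independently:
    expected = (#C * #G) / length.
    """
    seq = sequence.upper()
    length = len(seq)
    if length == 0:
        raise ValueError("sequence must not be empty")

    n_c = seq.count("C")
    n_g = seq.count("G")
    n_cpg = seq.count("CG")  # CpG dinucleotides on this strand

    gc_content = (n_c + n_g) / length
    if n_c == 0 or n_g == 0:
        return gc_content, 0.0
    obs_exp = (n_cpg * length) / (n_c * n_g)
    return gc_content, obs_exp
```

Island size and motif content, the other two features named above, would have to be derived separately, for example by scanning windows that satisfy these criteria and searching them for transcription factor motifs.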
Polycomb-group (PcG) proteins play essential roles in the epigenetic regulation of gene expression during development. PcG proteins are repressors that catalyze lysine 27 tri-methylation on histone H3. They are antagonized by trithorax-group proteins that catalyze lysine 4 tri-methylation. Recent studies of ES cells revealed a novel chromatin pattern consisting of overlapping lysine 27 and lysine 4 tri-methylation. Genomic regions with these opposing modifications were termed “bivalent domains” and proposed to silence developmental regulators while keeping them “poised” for alternate fates. However, our understanding of PcG regulation and bivalent domains remains limited. For instance, bivalent domains affect over 2,000 promoters with diverse functions, which suggests that they may function in diverse cellular processes. Moreover, the mechanisms that underlie the targeting of PcG complexes to specific genomic regions remain completely unknown. To gain insight into these issues, we used ultra high-throughput sequencing to map PcG complexes and related modifications genomewide in human and mouse ES cells. The data identify two classes of bivalent domains with distinct regulatory properties. They also reveal striking relationships between genome sequence and chromatin state that suggest a prominent role for the DNA sequence in dictating the genomewide localization of PcG complexes and, consequently, bivalent domains in ES cells.
In the current climate of high-throughput computational biology, the inference of a protein's function from related measurements, such as protein-protein interaction relations, has become a canonical task. Most existing technologies pursue this task as a classification problem, on a term-by-term basis, for each term in a database, such as the Gene Ontology (GO) database, a popular rigorous vocabulary for biological functions. However, ontology structures are essentially hierarchies, with certain top-to-bottom annotation rules that protein function predictions should in principle follow. Currently, the most common approach to imposing these hierarchical constraints on network-based classifiers is to apply transitive closure to the predictions.
We propose a probabilistic framework to integrate information in relational data, in the form of a protein-protein interaction network, and a hierarchically structured database of terms, in the form of the GO database, for the purpose of protein function prediction. At the heart of our framework is a factorization of local neighborhood information in the protein-protein interaction network across successive ancestral terms in the GO hierarchy. We introduce a classifier within this framework, with computationally efficient implementation, that produces GO-term predictions that naturally obey a hierarchical 'true-path' consistency from root to leaves, without the need for further post-processing.
A cross-validation study, using data from the yeast Saccharomyces cerevisiae, shows our method offers substantial improvements over both standard 'guilt-by-association' (i.e., Nearest-Neighbor) and more refined Markov random field methods, whether in their original form or when post-processed to artificially impose 'true-path' consistency. Further analysis of the results indicates that these improvements are associated with increased predictive capabilities (i.e., increased positive predictive value), and that this increase is uniformly consistent across GO-term depths. Additional in silico validation on a collection of new annotations recently added to GO confirms the advantages suggested by the cross-validation study. Taken as a whole, our results show that a hierarchical approach to network-based protein function prediction that exploits the ontological structure of protein annotation databases in a principled manner can offer substantial advantages over the successive application of 'flat' network-based methods.
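The 'true-path' rule and the transitive-closure post-processing used as a baseline above can be sketched in a few lines. This illustrates the post-processing step only, not the probabilistic classifier proposed in the work; the helper names and the representation of the ontology as a child-to-parents mapping are assumptions for illustration.

```python
def ancestors(term, parents):
    """All ancestors of `term` in a hierarchy given as {child: [parents]}."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def enforce_true_path(predicted_terms, parents):
    """Transitive closure: add every ancestor of each predicted term."""
    closed = set(predicted_terms)
    for term in predicted_terms:
        closed |= ancestors(term, parents)
    return closed


def is_true_path_consistent(predicted_terms, parents):
    """True when every predicted term comes with all of its ancestors."""
    predicted = set(predicted_terms)
    return enforce_true_path(predicted, parents) == predicted
```

Closure of this kind guarantees consistency but propagates any false positive at a leaf all the way up to the root, which is one motivation for building the hierarchy into the classifier itself.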
A systematic analysis of the relationship between the neoplastic and developmental transcriptome provides an outline of global trends in cancer gene expression.
In recent years, the molecular underpinnings of the long-observed resemblance between neoplastic and immature tissue have begun to emerge. Genome-wide transcriptional profiling has revealed similar gene expression signatures in several tumor types and early developmental stages of their tissue of origin. However, it remains unclear whether such a relationship is a universal feature of malignancy, whether heterogeneities exist in the developmental component of different tumor types and to which degree the resemblance between cancer and development is a tissue-specific phenomenon.
We defined a developmental landscape by summarizing the main features of ten developmental time courses and projected gene expression from a variety of human tumor types onto this landscape. This comparison demonstrates a clear imprint of developmental gene expression in a wide range of tumors and with respect to different, even non-cognate developmental backgrounds. Our analysis reveals three classes of cancers with developmentally distinct transcriptional patterns. We characterize the biological processes dominating these classes and validate the class distinction with respect to a new time series of murine embryonic lung development. Finally, we identify a set of genes that are upregulated in most cancers and we show that this signature is active in early development.
This systematic and quantitative overview of the relationship between the neoplastic and developmental transcriptome spanning dozens of tissues provides a reliable outline of global trends in cancer gene expression, reveals potentially clinically relevant differences in the gene expression of different cancer types and represents a reference framework for interpretation of smaller-scale functional studies.