Chromothripsis is a recently discovered phenomenon of genomic rearrangement, possibly arising during a single genome-shattering event. This could provide an alternative paradigm in cancer development, replacing the gradual accumulation of genomic changes with a “one-off” catastrophic event. However, the term has been used with varying operational definitions, with the minimal consensus being a large number of locally clustered copy number aberrations. The mechanisms underlying these chromothripsis-like patterns (CTLP) and their specific impact on tumorigenesis are still poorly understood.
Here, we identified CTLP in 918 cancer samples, from a dataset of more than 22,000 oncogenomic arrays covering 132 cancer types. Fragmentation hotspots were found to be located on chromosomes 8, 11, 12 and 17. Among the various cancer types, soft-tissue tumors exhibited particularly high CTLP frequencies. Genomic context analysis revealed that CTLP rearrangements frequently occurred in genomes that additionally harbored multiple copy number aberrations (CNAs). An investigation into the affected chromosomal regions showed a large proportion of arm-level pulverization and telomere-related events, which would be compatible with a number of underlying mechanisms. We also report evidence that these genomic events may be correlated with patient age, stage and survival rate.
Through a large-scale analysis of oncogenomic array data sets, this study characterized features associated with genomic aberration patterns compatible with the spectrum of “chromothripsis” definitions previously used. While quantifying clustered genomic copy number aberrations in cancer samples, our data indicate an underlying biological heterogeneity behind these chromothripsis-like patterns, beyond a well-defined “chromothripsis” phenomenon.
Chromothripsis; Human cancer; Array comparative genomic hybridization; SNP array
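The minimal consensus definition mentioned above, a large number of locally clustered copy number aberrations, can be illustrated with a small sketch. This is not the detection procedure used in the study; the segment representation, the function names, and the `window_bp` and `min_switches` thresholds are hypothetical choices made only for illustration.

```python
from typing import List, Tuple

# (start_bp, end_bp, copy_number_state) for one chromosome; hypothetical layout
Segment = Tuple[int, int, int]


def breakpoints(segments: List[Segment]) -> List[int]:
    """Positions where the copy-number state changes between adjacent segments."""
    ordered = sorted(segments)
    return [
        nxt[0]
        for prev, nxt in zip(ordered, ordered[1:])
        if prev[2] != nxt[2]
    ]


def max_clustered_switches(segments: List[Segment], window_bp: int) -> int:
    """Largest number of state switches that fit inside any window of window_bp."""
    points = breakpoints(segments)
    best = 0
    left = 0
    for right in range(len(points)):
        # Shrink the window from the left until it spans at most window_bp.
        while points[right] - points[left] > window_bp:
            left += 1
        best = max(best, right - left + 1)
    return best


def looks_ctlp(segments: List[Segment],
               window_bp: int = 50_000_000,   # assumed window size
               min_switches: int = 10) -> bool:  # assumed threshold
    """Flag a chromosome whose state switches are numerous and locally clustered."""
    return max_clustered_switches(segments, window_bp) >= min_switches
```

A real analysis would additionally need to test whether the clustering is stronger than expected from the genome-wide aberration background, which this sketch does not attempt.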
With the increasing availability of various ‘omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping pace with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
STITCH is a database of protein–chemical interactions that integrates many sources of experimental and manually curated evidence with text-mining information and interaction predictions. Available at http://stitch.embl.de, the resulting interaction network includes 390 000 chemicals and 3.6 million proteins from 1133 organisms. Compared with the previous version, the number of high-confidence protein–chemical interactions in human has increased by 45%, to 367 000. In this version, we added features for users to upload their own data to STITCH in the form of internal identifiers, chemical structures or quantitative data. For example, a user can now upload a spreadsheet with screening hits to easily check which interactions are already known. To increase the coverage of STITCH, we expanded the text mining to include full-text articles and added a prediction method based on chemical structures. We further changed our scheme for transferring interactions between species to rely on orthology rather than protein similarity. This improves the performance within protein families, where scores are now transferred only to orthologous proteins, but not to paralogous proteins. STITCH can be accessed with a web-interface, an API and downloadable files.
Motivation: Nucleotide sequence data are being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis—intended to reduce redundancy, define gene families or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are desirable because heuristic shortcuts taken during clustering might have unintended consequences in later analysis steps.
Results: Here we present HPC-CLUST, a highly optimized software pipeline that can cluster large numbers of pre-aligned DNA sequences by running on distributed computing hardware. It allocates both memory and computing resources efficiently, and can process more than a million sequences in a few hours on a small cluster.
Availability and implementation: Source code and binaries are freely available at http://meringlab.org/software/hpc-clust/; the pipeline is implemented in C++ and uses the Message Passing Interface (MPI) standard for distributed computing.
Supplementary Information: Supplementary data are available at Bioinformatics online.
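To make the notion of exact hierarchical clustering of pre-aligned sequences concrete, here is a deliberately naive single-machine sketch. It is not the HPC-CLUST implementation (which is written in C++ with MPI); the distance definition, the function names and the `threshold` parameter are assumptions for illustration, and the cubic running time shows why an optimized, distributed pipeline is needed for millions of sequences.

```python
from itertools import combinations
from typing import Callable, Iterable, List


def distance(a: str, b: str) -> float:
    """Fraction of mismatching alignment columns, skipping columns where both have gaps."""
    assert len(a) == len(b), "sequences must be pre-aligned to equal length"
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        mismatches += x != y
    return mismatches / compared if compared else 0.0


def cluster(seqs: List[str], threshold: float,
            linkage: Callable[[Iterable[float]], float] = max) -> List[List[int]]:
    """Agglomerative clustering; linkage=max is complete linkage, min is single linkage."""
    clusters = [[i] for i in range(len(seqs))]
    dist = {(i, j): distance(seqs[i], seqs[j])
            for i, j in combinations(range(len(seqs)), 2)}

    def link(c1: List[int], c2: List[int]) -> float:
        return linkage(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen linkage.
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[a], clusters[b]) > threshold:
            break  # no remaining pair is similar enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # a < b, so index a stays valid
    return clusters
```

Because every pairwise distance is computed and no heuristic shortcut is taken, the result depends only on the input, the linkage rule and the threshold.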
The above- and below-ground parts of rice plants create specific habitats for various microorganisms. In this study, we characterized the phyllosphere and rhizosphere microbiota of rice cultivars using a metaproteogenomic approach to get insight into the physiology of the bacteria and archaea that live in association with rice. The metaproteomic datasets gave rise to a total of about 4600 identified proteins and indicated the presence of one-carbon conversion processes in the rhizosphere as well as in the phyllosphere. Proteins involved in methanogenesis and methanotrophy were found in the rhizosphere, whereas methanol-based methylotrophy linked to the genus Methylobacterium dominated within the protein repertoire of the phyllosphere microbiota. Further, physiological traits of differential importance in phyllosphere versus rhizosphere bacteria included transport processes and stress responses, which were more conspicuous in the phyllosphere samples. In contrast, dinitrogenase reductase was exclusively identified in the rhizosphere, despite the presence of nifH genes also in diverse phyllosphere bacteria.
metaproteogenomics; phyllosphere; rhizosphere; Oryza sativa; microbial community; rice
The above-ground surfaces of terrestrial plants, the phyllosphere, comprise the main interface between the terrestrial biosphere and solar radiation. It is estimated to host up to 10^26 microbial cells that may intercept part of the photon flux impinging on the leaves. Based on 454-pyrosequencing-generated metagenome data, we report on the existence of diverse microbial rhodopsins in five distinct phyllospheres from tamarisk (Tamarix nilotica), soybean (Glycine max), Arabidopsis (Arabidopsis thaliana), clover (Trifolium repens) and rice (Oryza sativa). Our findings, for the first time describing microbial rhodopsins from non-aquatic habitats, point towards the potential coexistence of microbial rhodopsin-based phototrophy and plant chlorophyll-based photosynthesis, with the different pigments absorbing non-overlapping fractions of the light spectrum.
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made—particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database (http://string-db.org/) aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to the other; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.
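The kind of functional enrichment statistic mentioned in point (iii) can be illustrated with a one-sided hypergeometric test, a common choice for this task. The abstract does not state which test STRING uses, so the test itself, the function name and the parameter names below are assumptions for illustration only.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           network_size: int, hits: int) -> float:
    """P(X >= hits) when drawing network_size proteins without replacement
    from a population in which `annotated` proteins carry the term."""
    upper = min(annotated, network_size)
    total = comb(population, network_size)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, network_size - i)
        for i in range(hits, upper + 1)
    )
    return favorable / total
```

In practice many terms are tested per network, so the resulting p-values would need a multiple-testing correction, which is omitted here.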
Regional genomic copy number alterations (CNA) are observed in the vast majority of cancers. Besides specifically targeting well-known, canonical oncogenes, CNAs may also play more subtle roles in terms of modulating genetic potential and broad gene expression patterns of developing tumors. Any significant differences in the overall CNA patterns between different cancer types may thus point towards specific biological mechanisms acting in those cancers. In addition, differences among CNA profiles may prove valuable for cancer classifications beyond existing annotation systems.
We have analyzed molecular-cytogenetic data from 25579 tumor samples, which were classified into 160 cancer types according to the International Classification of Disease (ICD) coding system. When correcting for differences in the overall CNA frequencies between cancer types, related cancers were often found to cluster together according to similarities in their CNA profiles. Based on a randomization approach, distance measures from the cluster dendrograms were used to identify those specific genomic regions that contributed significantly to this signal. This approach identified 43 non-neutral genomic regions whose propensity for the occurrence of copy number alterations varied with the type of cancer at hand. Only a subset of these identified loci overlapped with previously implied, highly recurrent (hot-spot) cytogenetic imbalance regions.
Thus, for many genomic regions, a simple null-hypothesis of independence between cancer type and relative copy number alteration frequency can be rejected. Since a subset of these regions display relatively low overall CNA frequencies, they may point towards second-tier genomic targets that are adaptively relevant but not necessarily essential for cancer development.
Essential genes are absolutely required for the survival of an organism. The identification of essential genes, besides being one of the most fundamental questions in biology, is also of interest for the emerging science of synthetic biology and for the development of novel antimicrobials. New antimicrobial therapies are desperately needed to treat multidrug-resistant pathogens, such as members of the Burkholderia cepacia complex.
We hypothesize that essential genes may be highly conserved within a group of evolutionarily closely related organisms. Using a bioinformatics approach we determined that the core genome of the order Burkholderiales consists of 649 genes. All but two of these identified genes were located on chromosome 1 of Burkholderia cenocepacia. Although many of the 649 core genes of Burkholderiales have been shown to be essential in other bacteria, we were also able to identify a number of novel essential genes present mainly, or exclusively, within this order. The essentiality of some of the core genes, including the known essential genes infB, gyrB, ubiB, and valS, as well as the so far uncharacterized genes BCAL1882, BCAL2769, BCAL3142 and BCAL3369, has been confirmed experimentally in B. cenocepacia.
We report on the identification of essential genes using a novel bioinformatics strategy and provide bioinformatics and experimental evidence that the large majority of the identified genes are indeed essential. The essential genes identified here may represent valuable targets for the development of novel antimicrobials and their detailed study may shed new light on the functions required to support life.
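The core-genome idea used above can be sketched as a set intersection over orthologous groups. This is a simplification under stated assumptions, not the strategy used in the study: the input mapping, the genome names and the group identifiers are hypothetical, and a real analysis would have to tolerate incomplete assemblies and uncertain orthology calls.

```python
from typing import Dict, Iterable, Set


def core_genome(genome_to_groups: Dict[str, Iterable[str]]) -> Set[str]:
    """Orthologous groups that are present in every genome of the collection."""
    sets = [set(groups) for groups in genome_to_groups.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy input: genome name -> orthologous group identifiers
example = {
    "genome_A": ["OG1", "OG2", "OG3"],
    "genome_B": ["OG1", "OG3", "OG4"],
    "genome_C": ["OG1", "OG3"],
}
core = core_genome(example)  # groups shared by all three toy genomes
```

Candidate essential genes would then be the members of these shared groups in the organism of interest, subject to experimental confirmation.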
Many protein-protein interactions are mediated by domain-motif interaction, where a domain in one protein binds a short linear motif in its interacting partner. Such interactions are often involved in key cellular processes, necessitating their tight regulation. A common strategy of the cell to control protein function and interaction is by post-translational modifications of specific residues, especially phosphorylation. Indeed, there are motifs, such as SH2-binding motifs, in which motif phosphorylation is required for the domain-motif interaction. On the contrary, there are other examples where motif phosphorylation prevents the domain-motif interaction. Here we present a large-scale integrative analysis of experimental human data of domain-motif interactions and phosphorylation events, demonstrating an intriguing coupling between the two. We report such coupling for SH3, PDZ, SH2 and WW domains, where residue phosphorylation within or next to the motif is implied to be associated with switching on or off domain binding. For domains that require motif phosphorylation for binding, such as SH2 domains, we found coupled phosphorylation events other than the ones required for domain binding. Furthermore, we show that phosphorylation might function as a double switch, concurrently enabling interaction of the motif with one domain and disabling interaction with another domain. Evolutionary analysis shows that co-evolution of the motif and the proximal residues capable of phosphorylation predominates over other evolutionary scenarios, in which the motif appeared before the potentially phosphorylated residue, or vice versa. Our findings provide strengthening evidence for coupled interaction-regulation units, defined by a domain-binding motif and a phosphorylated residue.
Domain-motif interactions are instrumental for many central cellular processes, and are therefore tightly regulated. Phosphorylation events are known modulators of protein-protein interactions in general, including domain-motif interactions. Here, we addressed the association of phosphorylation and domain-motif interaction taking a motif-centred view. We integrated human domain-motif interaction and phosphorylation data for four representative domains (SH2, WW, SH3 and PDZ), and showed that the adjacency between phosphorylation and domain-motif interactions is extensive, suggesting interesting functional links between them that extend the classical and widely studied phospho-regulation of SH2 or WW domain-motif interactions. Furthermore, we show that such interaction-regulation units may function as double switches, concurrently enabling interaction of the motif with one domain and disabling interaction with another domain. These latter interaction-regulation units are more conserved in evolution than the individual units comprising them. Assuming that the four analyzed domain-motif interaction types are reliable representatives of such interactions, our results support the existence of units comprising motifs and associated phosphorylation sites, in which the regulation of domain-motif interaction is inherent.
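The motif-centred view described above, in which phosphorylation sites within or next to a binding motif are considered coupled to it, can be sketched as follows. The simplified `P..P` pattern (a loose stand-in for proline-rich SH3-binding motifs), the `flank` window and the function name are illustrative assumptions and do not reproduce the motif definitions or datasets used in the study.

```python
import re
from typing import List, Tuple


def coupled_sites(sequence: str, motif_regex: str,
                  phosphosites: List[int], flank: int
                  ) -> List[Tuple[int, str, List[int]]]:
    """For each motif match, list phosphosites (0-based positions) that lie
    inside the motif or within `flank` residues on either side of it."""
    hits = []
    # Note: re.finditer reports non-overlapping matches only.
    for match in re.finditer(motif_regex, sequence):
        lo = match.start() - flank
        hi = match.end() - 1 + flank
        near = [p for p in phosphosites if lo <= p <= hi]
        hits.append((match.start(), match.group(), near))
    return hits


# Hypothetical toy protein and phosphosite positions
toy_sequence = "MAPTSPKS" + "A" * 12 + "S"
toy_hits = coupled_sites(toy_sequence, r"P..P", [4, 7, 20], flank=3)
```

Whether a nearby phosphosite switches binding on or off cannot be read from adjacency alone; the sketch only identifies candidate interaction-regulation units.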
Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database (http://eggnog.embl.de) contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721 801 orthologous groups, encompassing a total of 4 396 591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101 208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees and broad functional descriptions are provided for 450 904 orthologous groups (62.5%).
To facilitate the study of interactions between proteins and chemicals, we have created STITCH, an aggregated database of interactions connecting over 300 000 chemicals and 2.6 million proteins from 1133 organisms. Compared to the previous version, the number of chemicals with interactions and the number of high-confidence interactions both increase 4-fold. The database can be accessed interactively through a web interface, displaying interactions in an integrated network view. It is also available for computational studies through downloadable files and an API. As an extension in the current version, we offer the option to switch between two levels of detail, namely whether stereoisomers of a given compound are shown as a merged entity or as separate entities. Separate display of stereoisomers is necessary, for example, for carbohydrates and chiral drugs. Combining the isomers increases the coverage, as interaction databases and publications found through text mining will often refer to compounds without specifying the stereoisomer. The database is accessible at http://stitch.embl.de/.
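The merged versus separate display of stereoisomers can be illustrated with standard InChIKeys, whose first 14-character block hashes the molecular skeleton while stereochemistry and isotopic layers are hashed into the second block. Whether STITCH merges stereoisomers in exactly this way is an assumption made here; the function names are hypothetical, and the keys and protein names in the example are placeholder strings in InChIKey format, not real compounds.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def connectivity_block(inchikey: str) -> str:
    """Return the first InChIKey block, shared by compounds with the same skeleton."""
    parts = inchikey.split("-")
    if len(parts) != 3 or len(parts[0]) != 14:
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    return parts[0]


def merge_stereoisomers(interactions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Pool protein partners of all compounds that share a connectivity block."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for inchikey, protein in interactions:
        merged[connectivity_block(inchikey)].add(protein)
    return dict(merged)


# Placeholder keys (hypothetical): two stereoisomers plus an unrelated compound
toy = [
    ("AAAAAAAAAAAAAA-BBBBBBBBSA-N", "proteinX"),
    ("AAAAAAAAAAAAAA-CCCCCCCCSA-N", "proteinY"),
    ("DDDDDDDDDDDDDD-UHFFFAOYSA-N", "proteinZ"),
]
merged_view = merge_stereoisomers(toy)
```

The merged view gains coverage because evidence reported without a specified stereoisomer can be pooled, at the cost of hiding differences that matter for chiral drugs and carbohydrates.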
Plants of the carnivorous genus Nepenthes have pitchers that trap insects and digest them in the contained fluid to gain nutrients. A distinctive character of the pitcher fluid is the digestive enzyme activity that may be derived from plants and dwelling microbes. However, little is known about in situ digestive enzymes in the fluid. Here we examined the pitcher fluid from four species of Nepenthes. High bacterial density was observed within the fluids, ranging from 7×10^6 to 2.2×10^8 cells ml^−1. We measured the activity of three common enzymes in the fluid: acid phosphatases, β-d-glucosidases, and β-d-glucosaminidases. All the tested enzymes detected in the liquid of all the pitcher species showed activity that considerably exceeded that observed in aquatic environments such as freshwater, seawater, and sediment. Our results indicate that high enzyme activity within a pitcher could assist in the rapid decomposition of prey to maximize efficient nutrient use. In addition, we filtered the fluid to distinguish between dissolved enzyme activity and particle-bound activity. Filtration significantly decreased the activity of all enzymes, whereas pH and Nepenthes species did not affect enzyme activity. This suggests that enzymes bound to bacteria and other organic particles contribute significantly to the total enzyme activity of the fluid. Since organic particles are themselves usually colonized by attached and highly active bacteria, it is possible that microbe-derived enzymes also play an important role in nutrient recycling within the fluid and affect the metabolism of the Nepenthes pitcher plant.
The phosphorylation and dephosphorylation of proteins by kinases and phosphatases constitute an essential regulatory network in eukaryotic cells. This network supports the flow of information from sensors through signaling systems to effector molecules, and ultimately drives the phenotype and function of cells, tissues, and organisms. Dysregulation of this process has severe consequences and is one of the main factors in the emergence and progression of diseases, including cancer. Thus, major efforts have been invested in developing specific inhibitors that modulate the activity of individual kinases or phosphatases; however, it has been difficult to assess how such pharmacological interventions would affect the cellular signaling network as a whole. Here, we used label-free, quantitative phosphoproteomics in a systematically perturbed model organism (Saccharomyces cerevisiae) to determine the relationships between 97 kinases, 27 phosphatases, and more than 1000 phosphoproteins. We identified 8814 regulated phosphorylation events, describing the first system-wide protein phosphorylation network in vivo. Our results show that, at steady state, inactivation of most kinases and phosphatases affected large parts of the phosphorylation-modulated signal transduction machinery, and not only the immediate downstream targets. The observed cellular growth phenotype was often well maintained despite the perturbations, arguing for considerable robustness in the system. Our results serve to constrain future models of cellular signaling and reinforce the idea that simple linear representations of signaling pathways might be insufficient for drug development and for describing organismal homeostasis.
Non-intermingling, adjacent populations of cells define compartment boundaries; such boundaries are often essential for the positioning and the maintenance of tissue-organizers during growth. In the developing wing primordium of Drosophila melanogaster, signaling by the secreted protein Hedgehog (Hh) is required for compartment boundary maintenance. However, the precise mechanism of Hh input remains poorly understood. Here, we combine experimental observations of perturbed Hh signaling with computer simulations of cellular behavior, and connect physical properties of cells to their Hh signaling status. We find that experimental disruption of Hh signaling has observable effects on cell sorting surprisingly far from the compartment boundary, which is in contrast to a previous model that confines Hh influence to the compartment boundary itself. We have recapitulated our experimental observations by simulations of Hh diffusion and transduction coupled to mechanical tension along cell-to-cell contact surfaces. Intriguingly, the best results were obtained under the assumption that Hh signaling cannot alter the overall tension force of the cell, but will merely re-distribute it locally inside the cell, relative to the signaling status of neighboring cells. Our results suggest a scenario in which homotypic interactions of a putative Hh target molecule at the cell surface are converted into a mechanical force. Such a scenario could explain why the mechanical output of Hh signaling appears to be confined to the compartment boundary, despite the longer range of the Hh molecule itself. Our study is the first to couple a cellular vertex model describing mechanical properties of cells in a growing tissue, to an explicit model of an entire signaling pathway, including a freely diffusible component. We discuss potential applications and challenges of such an approach.
In developing animal tissues, cells can often re-arrange locally and mix relatively freely. However, in some stereotypic and crucially important instances during body development, cells will strictly not intermingle, and instead form sharp boundaries along which they will sort out from each other. This mechanism helps organisms to establish signaling centers and to maintain distinct cellular identities. Often, cells at such boundaries will remain in close physical contact and are morphologically alike. Thus, the boundary itself can be difficult to observe unless the expression status of specific marker genes is monitored experimentally. How are these ‘compartment boundaries’ established? Here we devise a computational model that aims to describe one such boundary in a well-studied animal tissue: the developing wing primordium of Drosophila melanogaster. We model the production, diffusion and local sensing of an essential signaling molecule, the Hedgehog protein. We reveal one possible mechanism by which Hedgehog sensing can influence the mechanical properties of cells, and compare the simulated outcome to observations in experimentally perturbed, actual wing discs. Our relatively simple model suffices to establish a straight and stable
An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein–protein interaction information can be error-prone and requires considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental and predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms; the resource can be reached at http://string-db.org.
Shotgun sequencing of environmental DNA is an essential technique for characterizing uncultivated microbes in situ. However, the taxonomic and functional assignment of the obtained sequence fragments remains a pressing problem.
Existing algorithms are largely optimized for speed and coverage; in contrast, we present here a software framework that focuses on a restricted set of informative gene families, using Maximum Likelihood to assign these with the best possible accuracy. This framework ('MLTreeMap'; http://mltreemap.org/) uses raw nucleotide sequences as input, and includes hand-curated, extensible reference information.
We discuss how we validated our pipeline using complete genomes as well as simulated and actual environmental sequences.
Improving the ability to reverse engineer biochemical networks is a major goal of systems biology. Lesions in signaling networks lead to alterations in gene expression, which in principle should allow network reconstruction. However, the information about the activity levels of signaling proteins conveyed in overall gene expression is limited by the complexity of gene expression dynamics and of regulatory network topology. Two observations provide the basis for overcoming this limitation: (a) genes induced without de novo protein synthesis (early genes) show a linear accumulation of product in the first hour after the change in the cell's state; (b) the signaling components in the network largely function in the linear range of their stimulus-response curves. Therefore, unlike most genes or most time points, expression profiles of early genes at an early time point provide direct biochemical assays that represent the activity levels of upstream signaling components. Such expression data provide the basis for an efficient algorithm (Plato's Cave algorithm; PLACA) to reverse engineer functional signaling networks. Unlike conventional reverse engineering algorithms that use steady state values, PLACA uses stimulated early gene expression measurements associated with systematic perturbations of signaling components, without measuring the signaling components themselves. Besides the reverse engineered network, PLACA also identifies the genes detecting the functional interaction, thereby facilitating validation of the predicted functional network. Using simulated datasets, the algorithm is shown to be robust to experimental noise. Using experimental data obtained from gonadotropes, PLACA reverse engineered the interaction network of six perturbed signaling components. The network recapitulated many known interactions and identified novel functional interactions that were validated by further experiments.
PLACA uses the results of experiments that are feasible for any signaling network to predict the functional topology of the network and to identify novel relationships.
Elucidating the biochemical interactions in living cells is essential to understanding their behavior under various external conditions. Some of these interactions occur between signaling components with many active states, and their activity levels may be difficult to measure directly. However, most methods to reverse engineer interaction networks rely on measuring gene activity at steady state under various cellular stimuli. Such gene measurements therefore ignore the intermediate effects of signaling components, and cannot reliably convey the interactions between the signaling components themselves. We propose using the changes in activity of early genes shortly after the stimulus to infer the functional interactions between the unmeasured signaling components. The change in expression in such genes at these times is directly and linearly affected by the signaling components, since there is insufficient time for other genes to be transcribed and interfere with the early genes' expression. We present an algorithm that uses such measurements to reverse engineer the functional interaction network between signaling components, and also provides a means for testing these predictions. The algorithm therefore uses feasible experiments to reconstruct functional networks. We applied the algorithm to experimental measurements and uncovered known interactions, as well as novel interactions that were then confirmed experimentally.
In recent years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug–target relationships and binding affinities. In STITCH 2, the number of relevant interactions is increased by incorporation of BindingDB, PharmGKB and the Comparative Toxicogenomics Database. The resulting network can be explored interactively or used as the basis for large-scale analyses. To facilitate links to other chemical databases, we adopt InChIKeys that allow identification of chemicals with a short, checksum-like string. STITCH 2.0 connects proteins from 630 organisms to over 74 000 different chemicals, including 2200 drugs. STITCH can be accessed at http://stitch.embl.de/.
The Hedgehog signaling pathway plays a crucial role in development and disease. Its putative origins in an ancient system involved in regulating bacterial lipid transport and homeostasis offer clues about how the pathway might work today.
Although functionally related proteins can be reliably predicted from phylogenetic profiles, many functional modules do not seem to evolve cohesively according to case studies and systematic analyses in prokaryotes. In this study we quantify the extent of evolutionary cohesiveness of functional modules in eukaryotes and probe the biological and methodological factors influencing our estimates. We have collected various datasets of protein complexes and pathways in Saccharomyces cerevisiae. We define orthologous groups on 34 eukaryotic genomes and measure the extent of cohesive evolution of sets of orthologous groups whose members constitute a known complex or pathway. Within this framework it appears that most functional modules evolve flexibly rather than cohesively. Even after correcting for uncertain module definitions and potentially problematic orthologous groups, only 46% of pathways and complexes evolve more cohesively than random modules. This flexibility seems partly coupled to the nature of the functional module because biochemical pathways are generally more cohesively evolving than complexes.
Components of a protein complex or a metabolic pathway strongly cooperate to perform a specific function. Because of this functional interdependence, proteins that form a complex or pathway are expected to be present and absent together in different species. Phylogenetic profiling methods, in which proteins with similar presence and absence patterns are inferred to be functionally linked, are based on this assumption. In this report, we quantify to what extent proteins that together constitute a complex or pathway (a functional module) in yeast are present and absent together (evolve cohesively) in other eukaryotic species. We find that more than half of all complexes and pathways are only partially present in a number of species. It appears that evolution of functional modules is very flexible; components are not indispensable; they can be replaced or reused in a different functional context. This places a limit on how well phylogenetic profiling methods can detect functionally related proteins. Functional modules that evolve cohesively are typically involved in biological processes such as translation and amino acid metabolism.
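The notion of cohesive evolution can be made concrete with a small sketch. It assumes each protein of a module has a binary presence/absence profile across the same ordered list of species, and it counts in how many species the module is completely present, completely absent or only partially present. The toy profiles and the function name are invented for illustration; this is not the dataset or the scoring scheme used in the study.

```python
from typing import Dict, List


def module_presence_states(profiles: Dict[str, List[int]]) -> Dict[str, int]:
    """Count species in which a module is complete, absent or partial.

    `profiles` maps each protein to a list of 0/1 values, one per species,
    with all lists following the same species order.
    """
    lengths = {len(profile) for profile in profiles.values()}
    if len(lengths) != 1:
        raise ValueError("all profiles must cover the same species")

    counts = {"complete": 0, "absent": 0, "partial": 0}
    # zip(*...) walks over species, yielding one column of 0/1 values each.
    for column in zip(*profiles.values()):
        present = sum(column)
        if present == len(column):
            counts["complete"] += 1
        elif present == 0:
            counts["absent"] += 1
        else:
            counts["partial"] += 1
    return counts


# Hypothetical three-protein module profiled over five species.
toy_module = {
    "protein_A": [1, 1, 0, 1, 0],
    "protein_B": [1, 1, 0, 0, 0],
    "protein_C": [1, 0, 0, 1, 0],
}
states = module_presence_states(toy_module)
```

In this toy case the module is complete in one species, absent in two and partial in two. A perfectly cohesive module would have no partial species, which is the situation phylogenetic profiling implicitly assumes.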
Functional partnerships between proteins are at the core of complex cellular phenotypes, and the networks formed by interacting proteins provide researchers with crucial scaffolds for modeling, data reduction and annotation. STRING is a database and web resource dedicated to protein–protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins. The most important new developments in STRING 8 over previous releases include a URL-based programming interface, which can be used to query STRING from other resources, improved interaction prediction via genomic neighborhood in prokaryotes, and the inclusion of protein structures. Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms, providing the most comprehensive view on protein–protein interactions currently available. STRING can be reached at http://string-db.org/.
To investigate the extent of genetic stratification in structured microbial communities, we compared the metagenomes of 10 successive layers of a phylogenetically complex hypersaline mat from Guerrero Negro, Mexico. We found pronounced millimeter-scale genetic gradients that were consistent with the physicochemical profile of the mat. Despite these gradients, all layers displayed near-identical and acid-shifted isoelectric point profiles due to a molecular convergence of amino-acid usage, indicating that hypersalinity enforces an overriding selective pressure on the mat community.
metagenomics; hypersalinity; microbial ecology; fine-scale; salt-in
Knowledge about interactions between proteins and small molecules is essential for the understanding of molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to this data, STITCH (‘search tool for interactions of chemicals’) integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug–target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict relations between chemicals. STITCH further allows exploring the network of chemical relations, also in the context of associated binding proteins. Each proposed interaction can be traced back to the original data sources. Our database contains interaction information for over 68 000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes and their interactions contained in the STRING database. STITCH is available at http://stitch.embl.de/.
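As an illustration of how chemical structure similarity can suggest relations between chemicals, the sketch below computes the Tanimoto coefficient between two sets of fingerprint bits. The abstract does not state which similarity measure or fingerprints STITCH uses, so the measure, the function name and the toy fingerprints are all assumptions.

```python
from typing import Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    union = bits_a | bits_b
    if not union:
        # Two empty fingerprints carry no information; report no similarity.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Hypothetical fingerprints given as the indices of the bits that are set.
fingerprint_x = {1, 4, 7, 9}
fingerprint_y = {1, 4, 8, 9, 12}
similarity = tanimoto(fingerprint_x, fingerprint_y)
```

Here three bits are shared out of six set in total, giving a similarity of 0.5. In practice a threshold on such a score would decide whether two chemicals are reported as related, and the choice of threshold is itself an assumption that would need calibration.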
Metagenomic analysis of termite gut flora reveals a diversity of wood-degrading enzymes.
Termites eat and digest wood, but how do they do it? Combining advanced genomics and proteomics techniques, researchers have now shown that microbes found in the termites' hindguts possess just the right tools.