1.  Ecological Consistency of SSU rRNA-Based Operational Taxonomic Units at a Global Scale 
PLoS Computational Biology  2014;10(4):e1003594.
Operational Taxonomic Units (OTUs), usually defined as clusters of similar 16S/18S rRNA sequences, are the most widely used basic diversity units in large-scale characterizations of microbial communities. However, it remains unclear how well the various proposed OTU clustering algorithms approximate ‘true’ microbial taxa. Here, we explore the ecological consistency of OTUs – based on the assumption that, like true microbial taxa, they should show measurable habitat preferences (niche conservatism). In a global and comprehensive survey of available microbial sequence data, we systematically parse sequence annotations to obtain broad ecological descriptions of sampling sites. Based on these, we observe that sequence-based microbial OTUs generally show high levels of ecological consistency. However, different OTU clustering methods result in marked differences in the strength of this signal. Assuming that ecological consistency can serve as an objective external benchmark for cluster quality, we conclude that hierarchical complete linkage clustering, which provided the most ecologically consistent partitions, should be the default choice for OTU clustering. To our knowledge, this is the first approach to assess cluster quality using an external, biologically meaningful parameter as a benchmark, on a global scale.
Author Summary
To characterize the composition of microbial communities, researchers often sequence and quantify specific marker genes, particularly the SSU (‘small subunit’) ribosomal RNA gene. One crucial step in such studies is the clustering of sequences into Operational Taxonomic Units (OTUs) of closely related organisms. However, this practice has repeatedly been called into question, arguing that the use of OTUs is not backed by microbial speciation theory. Here, we explore whether OTUs group ecologically similar organisms and show that indeed, OTUs are generally ecologically consistent. Moreover, we show how ecological consistency can be used as a measure of OTU ‘quality’ and compare different widely used OTU clustering methods. Our findings should help in the design and interpretation of SSU-based microbial ecology studies, in a research field that is only beginning to unfold its full potential to help understand life at the smallest scales.
PMCID: PMC3998914  PMID: 24763141
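The hierarchical complete linkage clustering named in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the gap-ignoring mismatch distance, the function names `p_distance` and `complete_linkage`, the toy sequences and the `threshold` parameter are all assumptions made for illustration. Under complete linkage, two clusters are merged only if their most dissimilar pair of members is still within the threshold.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of mismatching columns between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are ignored
    (an assumption made for this sketch).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def complete_linkage(seqs, threshold):
    """Naive agglomerative clustering with complete linkage.

    Returns a list of clusters, each a list of indices into `seqs`.
    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (hypothetical parameter).
    """
    n = len(seqs)
    dist = {(i, j): p_distance(seqs[i], seqs[j])
            for i, j in combinations(range(n), 2)}
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Complete linkage: distance of the most dissimilar member pair.
        return max(dist[min(i, j), max(i, j)] for i in c1 for j in c2)

    while len(clusters) > 1:
        a, b = min(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        # a < b always holds here, so deleting index b is safe after merging.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy, made-up aligned sequences (not real rRNA data).
toy_seqs = ["ACGTACGTAC", "ACGTACGTAT", "TTGTACGAAC", "TTGTACGAAT"]
otus = complete_linkage(toy_seqs, threshold=0.15)
```

This naive version recomputes linkages on every pass and is only practical for small inputs; it is meant to show the merge criterion, not to scale.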
2.  Chromothripsis-like patterns are recurring but heterogeneously distributed features in a survey of 22,347 cancer genome screens 
BMC Genomics  2014;15:82.
Chromothripsis is a recently discovered phenomenon of genomic rearrangement, possibly arising during a single genome-shattering event. This could provide an alternative paradigm in cancer development, replacing the gradual accumulation of genomic changes with a “one-off” catastrophic event. However, the term has been used with varying operational definitions, with the minimal consensus being a large number of locally clustered copy number aberrations. The mechanisms underlying these chromothripsis-like patterns (CTLP) and their specific impact on tumorigenesis are still poorly understood.
Here, we identified CTLP in 918 cancer samples, from a dataset of more than 22,000 oncogenomic arrays covering 132 cancer types. Fragmentation hotspots were found to be located on chromosomes 8, 11, 12 and 17. Among the various cancer types, soft-tissue tumors exhibited particularly high CTLP frequencies. Genomic context analysis revealed that CTLP rearrangements frequently occurred in genomes that additionally harbored multiple copy number aberrations (CNAs). An investigation into the affected chromosomal regions showed a large proportion of arm-level pulverization and telomere-related events, which would be compatible with a number of underlying mechanisms. We also report evidence that these genomic events may be correlated with patient age, stage and survival rate.
Through a large-scale analysis of oncogenomic array data sets, this study characterized features associated with genomic aberration patterns, compatible with the spectrum of “chromothripsis” definitions as previously used. While quantifying clustered genomic copy number aberrations in cancer samples, our data indicate an underlying biological heterogeneity behind these chromothripsis-like patterns, beyond a well-defined “chromothripsis” phenomenon.
PMCID: PMC3909908  PMID: 24476156
Chromothripsis; Human cancer; Array comparative genomic hybridization; SNP array
3.  eggNOG v4.0: nested orthology inference across 3686 organisms 
Nucleic Acids Research  2013;42(D1):D231-D239.
With the increasing availability of various ‘omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database, which derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping pace with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
PMCID: PMC3964997  PMID: 24297252
4.  STITCH 4: integration of protein–chemical interactions with user data 
Nucleic Acids Research  2013;42(D1):D401-D407.
STITCH is a database of protein–chemical interactions that integrates many sources of experimental and manually curated evidence with text-mining information and interaction predictions. The resulting interaction network includes 390 000 chemicals and 3.6 million proteins from 1133 organisms. Compared with the previous version, the number of high-confidence protein–chemical interactions in human has increased by 45%, to 367 000. In this version, we added features for users to upload their own data to STITCH in the form of internal identifiers, chemical structures or quantitative data. For example, a user can now upload a spreadsheet with screening hits to easily check which interactions are already known. To increase the coverage of STITCH, we expanded the text mining to include full-text articles and added a prediction method based on chemical structures. We further changed our scheme for transferring interactions between species to rely on orthology rather than protein similarity. This improves the performance within protein families, where scores are now transferred only to orthologous proteins, but not to paralogous proteins. STITCH can be accessed through a web interface, an API and downloadable files.
PMCID: PMC3964996  PMID: 24293645
5.  HPC-CLUST: distributed hierarchical clustering for large sets of nucleotide sequences 
Bioinformatics  2013;30(2):287-288.
Motivation: Nucleotide sequence data are being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis—intended to reduce redundancy, define gene families or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are desirable because heuristic shortcuts taken during clustering might have unintended consequences in later analysis steps.
Results: Here we present HPC-CLUST, a highly optimized software pipeline that can cluster large numbers of pre-aligned DNA sequences by running on distributed computing hardware. It allocates both memory and computing resources efficiently, and can process more than a million sequences in a few hours on a small cluster.
Availability and implementation: Source code and binaries are freely available; the pipeline is implemented in C++ and uses the Message Passing Interface (MPI) standard for distributed computing.
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC3892691  PMID: 24215029
6.  Metaproteogenomic analysis of microbial communities in the phyllosphere and rhizosphere of rice 
The ISME Journal  2011;6(7):1378-1390.
The above- and below-ground parts of rice plants create specific habitats for various microorganisms. In this study, we characterized the phyllosphere and rhizosphere microbiota of rice cultivars using a metaproteogenomic approach to get insight into the physiology of the bacteria and archaea that live in association with rice. The metaproteomic datasets gave rise to a total of about 4600 identified proteins and indicated the presence of one-carbon conversion processes in the rhizosphere as well as in the phyllosphere. Proteins involved in methanogenesis and methanotrophy were found in the rhizosphere, whereas methanol-based methylotrophy linked to the genus Methylobacterium dominated within the protein repertoire of the phyllosphere microbiota. Further, physiological traits of differential importance in phyllosphere versus rhizosphere bacteria included transport processes and stress responses, which were more conspicuous in the phyllosphere samples. In contrast, dinitrogenase reductase was exclusively identified in the rhizosphere, despite the presence of nifH genes also in diverse phyllosphere bacteria.
PMCID: PMC3379629  PMID: 22189496
metaproteogenomics; phyllosphere; rhizosphere; Oryza sativa; microbial community; rice
7.  Microbial rhodopsins on leaf surfaces of terrestrial plants 
Environmental microbiology  2011;14(1):140-146.
The above-ground surfaces of terrestrial plants, the phyllosphere, comprise the main interface between the terrestrial biosphere and solar radiation. The phyllosphere is estimated to host up to 10^26 microbial cells that may intercept part of the photon flux impinging on the leaves. Based on 454-pyrosequencing-generated metagenome data, we report on the existence of diverse microbial rhodopsins in five distinct phyllospheres from tamarisk (Tamarix nilotica), soybean (Glycine max), Arabidopsis (Arabidopsis thaliana), clover (Trifolium repens) and rice (Oryza sativa). Our findings, for the first time describing microbial rhodopsins from non-aquatic habitats, point towards the potential coexistence of microbial rhodopsin-based phototrophy and plant chlorophyll-based photosynthesis, with the different pigments absorbing non-overlapping fractions of the light spectrum.
PMCID: PMC3608849  PMID: 21883799
8.  STRING v9.1: protein-protein interaction networks, with increased coverage and integration 
Nucleic Acids Research  2012;41(D1):D808-D815.
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made, particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lower-quality data and/or computational predictions. The STRING database aims to provide such a global perspective for as many organisms as feasible. Known and predicted associations are scored and integrated, resulting in comprehensive protein networks covering >1100 organisms. Here, we describe the update to version 9.1 of STRING, introducing several improvements: (i) we extend the automated mining of scientific texts for interaction information, to now also include full-text articles; (ii) we entirely re-designed the algorithm for transferring interactions from one model organism to the other; and (iii) we provide users with statistical information on any functional enrichment observed in their networks.
PMCID: PMC3531103  PMID: 23203871
9.  Specific Genomic Regions Are Differentially Affected by Copy Number Alterations across Distinct Cancer Types, in Aggregated Cytogenetic Data 
PLoS ONE  2012;7(8):e43689.
Regional genomic copy number alterations (CNA) are observed in the vast majority of cancers. Besides specifically targeting well-known, canonical oncogenes, CNAs may also play more subtle roles in terms of modulating genetic potential and broad gene expression patterns of developing tumors. Any significant differences in the overall CNA patterns between different cancer types may thus point towards specific biological mechanisms acting in those cancers. In addition, differences among CNA profiles may prove valuable for cancer classifications beyond existing annotation systems.
Principal Findings
We have analyzed molecular-cytogenetic data from 25579 tumor samples, which were classified into 160 cancer types according to the International Classification of Diseases (ICD) coding system. When correcting for differences in the overall CNA frequencies between cancer types, related cancers were often found to cluster together according to similarities in their CNA profiles. Based on a randomization approach, distance measures from the cluster dendrograms were used to identify those specific genomic regions that contributed significantly to this signal. This approach identified 43 non-neutral genomic regions whose propensity for the occurrence of copy number alterations varied with the type of cancer at hand. Only a subset of these identified loci overlapped with previously implied, highly recurrent (hot-spot) cytogenetic imbalance regions.
Thus, for many genomic regions, a simple null-hypothesis of independence between cancer type and relative copy number alteration frequency can be rejected. Since a subset of these regions display relatively low overall CNA frequencies, they may point towards second-tier genomic targets that are adaptively relevant but not necessarily essential for cancer development.
PMCID: PMC3427184  PMID: 22937079
10.  High Confidence Prediction of Essential Genes in Burkholderia Cenocepacia 
PLoS ONE  2012;7(6):e40064.
Essential genes are absolutely required for the survival of an organism. The identification of essential genes, besides being one of the most fundamental questions in biology, is also of interest for the emerging science of synthetic biology and for the development of novel antimicrobials. New antimicrobial therapies are desperately needed to treat multidrug-resistant pathogens, such as members of the Burkholderia cepacia complex.
Methodology/Principal Findings
We hypothesize that essential genes may be highly conserved within a group of evolutionarily closely related organisms. Using a bioinformatics approach, we determined that the core genome of the order Burkholderiales consists of 649 genes. All but two of these identified genes were located on chromosome 1 of Burkholderia cenocepacia. Although many of the 649 core genes of Burkholderiales have been shown to be essential in other bacteria, we were also able to identify a number of novel essential genes present mainly, or exclusively, within this order. The essentiality of some of the core genes, including the known essential genes infB, gyrB, ubiB, and valS, as well as the so far uncharacterized genes BCAL1882, BCAL2769, BCAL3142 and BCAL3369, has been confirmed experimentally in B. cenocepacia.
We report on the identification of essential genes using a novel bioinformatics strategy and provide bioinformatics and experimental evidence that the large majority of the identified genes are indeed essential. The essential genes identified here may represent valuable targets for the development of novel antimicrobials and their detailed study may shed new light on the functions required to support life.
PMCID: PMC3386938  PMID: 22768221
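The core-genome idea in the abstract above, genes conserved across all members of a group of related organisms, can be expressed as a set intersection. The sketch below is a simplified illustration, not the authors' method: it assumes every gene has already been assigned to an orthologous group, and the function name `core_genome`, the genome labels and the `toy1`/`toy2` group IDs are hypothetical. The gene names infB, gyrB and valS are borrowed from the abstract purely as labels.

```python
def core_genome(groups_by_genome):
    """Return the orthologous groups present in every genome.

    `groups_by_genome` maps a genome name to an iterable of
    orthologous-group identifiers found in that genome.
    """
    group_sets = [set(groups) for groups in groups_by_genome.values()]
    if not group_sets:
        return set()
    return set.intersection(*group_sets)


# Made-up toy input; genome names and 'toy*' IDs are placeholders.
toy_genomes = {
    "genome_A": {"infB", "gyrB", "valS", "toy1"},
    "genome_B": {"infB", "gyrB", "valS", "toy2"},
    "genome_C": {"infB", "gyrB", "valS"},
}
core = core_genome(toy_genomes)
print(sorted(core))  # ['gyrB', 'infB', 'valS']
```

A strict intersection is sensitive to incomplete or misannotated genomes, so real analyses may relax the criterion; that choice is not described in the abstract and is left out here.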
11.  eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges 
Nucleic Acids Research  2011;40(D1):D284-D289.
Orthologous relationships form the basis of most comparative genomic and metagenomic studies and are essential for proper phylogenetic and functional analyses. The third version of the eggNOG database contains non-supervised orthologous groups constructed from 1133 organisms, doubling the number of genes with orthology assignment compared to eggNOG v2. The new release is the result of a number of improvements and expansions: (i) the underlying homology searches are now based on the SIMAP database; (ii) the orthologous groups have been extended to 41 levels of selected taxonomic ranges enabling much more fine-grained orthology assignments; and (iii) the newly designed web page is considerably faster with more functionality. In total, eggNOG v3 contains 721 801 orthologous groups, encompassing a total of 4 396 591 genes. Additionally, we updated 4873 and 4850 original COGs and KOGs, respectively, to include all 1133 organisms. At the universal level, covering all three domains of life, 101 208 orthologous groups are available, while the others are applicable at 40 more limited taxonomic ranges. Each group is amended by multiple sequence alignments and maximum-likelihood trees, and broad functional descriptions are provided for 450 904 orthologous groups (62.5%).
PMCID: PMC3245133  PMID: 22096231
12.  STITCH 3: zooming in on protein–chemical interactions 
Nucleic Acids Research  2011;40(D1):D876-D880.
To facilitate the study of interactions between proteins and chemicals, we have created STITCH, an aggregated database of interactions connecting over 300 000 chemicals and 2.6 million proteins from 1133 organisms. Compared to the previous version, the number of chemicals with interactions and the number of high-confidence interactions both increase 4-fold. The database can be accessed interactively through a web interface, displaying interactions in an integrated network view. It is also available for computational studies through downloadable files and an API. As an extension in the current version, we offer the option to switch between two levels of detail, namely whether stereoisomers of a given compound are shown as a merged entity or as separate entities. Separate display of stereoisomers is necessary, for example, for carbohydrates and chiral drugs. Combining the isomers increases the coverage, as interaction databases and publications found through text mining will often refer to compounds without specifying the stereoisomer.
PMCID: PMC3245073  PMID: 22075997
13.  In Situ Enzyme Activity in the Dissolved and Particulate Fraction of the Fluid from Four Pitcher Plant Species of the Genus Nepenthes 
PLoS ONE  2011;6(9):e25144.
Plants of the carnivorous genus Nepenthes use pitchers to trap insects and digest them in the contained fluid to gain nutrients. A distinctive character of the pitcher fluid is the digestive enzyme activity that may be derived from plants and dwelling microbes. However, little is known about in situ digestive enzymes in the fluid. Here we examined the pitcher fluid from four species of Nepenthes. High bacterial density was observed within the fluids, ranging from 7×10^6 to 2.2×10^8 cells ml^-1. We measured the activity of three common enzymes in the fluid: acid phosphatases, β-d-glucosidases, and β-d-glucosaminidases. All the tested enzymes detected in the liquid of all the pitcher species showed activity that considerably exceeded that observed in aquatic environments such as freshwater, seawater, and sediment. Our results indicate that high enzyme activity within a pitcher could assist in the rapid decomposition of prey to maximize efficient nutrient use. In addition, we filtered the fluid to distinguish between dissolved enzyme activity and particle-bound activity. Filtration significantly decreased the activity of all enzymes, while pH value and Nepenthes species did not affect the enzyme activity. This suggests that enzymes bound to bacteria and other organic particles also contribute significantly to the total enzyme activity of the fluid. Since organic particles are themselves usually colonized by attached and highly active bacteria, it is possible that microbe-derived enzymes also play an important role in nutrient recycling within the fluid and affect the metabolism of the Nepenthes pitcher plant.
PMCID: PMC3174996  PMID: 21949872
14.  Phosphoproteomic Analysis Reveals Interconnected System-Wide Responses to Perturbations of Kinases and Phosphatases in Yeast 
Science signaling  2010;3(153):rs4.
The phosphorylation and dephosphorylation of proteins by kinases and phosphatases constitute an essential regulatory network in eukaryotic cells. This network supports the flow of information from sensors through signaling systems to effector molecules, and ultimately drives the phenotype and function of cells, tissues, and organisms. Dysregulation of this process has severe consequences and is one of the main factors in the emergence and progression of diseases, including cancer. Thus, major efforts have been invested in developing specific inhibitors that modulate the activity of individual kinases or phosphatases; however, it has been difficult to assess how such pharmacological interventions would affect the cellular signaling network as a whole. Here, we used label-free, quantitative phosphoproteomics in a systematically perturbed model organism (Saccharomyces cerevisiae) to determine the relationships between 97 kinases, 27 phosphatases, and more than 1000 phosphoproteins. We identified 8814 regulated phosphorylation events, describing the first system-wide protein phosphorylation network in vivo. Our results show that, at steady state, inactivation of most kinases and phosphatases affected large parts of the phosphorylation-modulated signal transduction machinery, and not only the immediate downstream targets. The observed cellular growth phenotype was often well maintained despite the perturbations, arguing for considerable robustness in the system. Our results serve to constrain future models of cellular signaling and reinforce the idea that simple linear representations of signaling pathways might be insufficient for drug development and for describing organismal homeostasis.
PMCID: PMC3072779  PMID: 21177495
15.  Cell-Sorting at the A/P Boundary in the Drosophila Wing Primordium: A Computational Model to Consolidate Observed Non-Local Effects of Hh Signaling 
PLoS Computational Biology  2011;7(4):e1002025.
Non-intermingling, adjacent populations of cells define compartment boundaries; such boundaries are often essential for the positioning and the maintenance of tissue-organizers during growth. In the developing wing primordium of Drosophila melanogaster, signaling by the secreted protein Hedgehog (Hh) is required for compartment boundary maintenance. However, the precise mechanism of Hh input remains poorly understood. Here, we combine experimental observations of perturbed Hh signaling with computer simulations of cellular behavior, and connect physical properties of cells to their Hh signaling status. We find that experimental disruption of Hh signaling has observable effects on cell sorting surprisingly far from the compartment boundary, which is in contrast to a previous model that confines Hh influence to the compartment boundary itself. We have recapitulated our experimental observations by simulations of Hh diffusion and transduction coupled to mechanical tension along cell-to-cell contact surfaces. Intriguingly, the best results were obtained under the assumption that Hh signaling cannot alter the overall tension force of the cell, but will merely re-distribute it locally inside the cell, relative to the signaling status of neighboring cells. Our results suggest a scenario in which homotypic interactions of a putative Hh target molecule at the cell surface are converted into a mechanical force. Such a scenario could explain why the mechanical output of Hh signaling appears to be confined to the compartment boundary, despite the longer range of the Hh molecule itself. Our study is the first to couple a cellular vertex model describing mechanical properties of cells in a growing tissue, to an explicit model of an entire signaling pathway, including a freely diffusible component. We discuss potential applications and challenges of such an approach.
Author Summary
In developing animal tissues, cells can often re-arrange locally and mix relatively freely. However, in some stereotypic and crucially important instances during body development, cells will strictly not intermingle, and instead form sharp boundaries along which they will sort out from each other. This mechanism helps organisms to establish signaling centers and to maintain distinct cellular identities. Often, cells at such boundaries will remain in close physical contact and are morphologically alike. Thus, the boundary itself can be difficult to observe unless the expression status of specific marker genes is monitored experimentally. How are these ‘compartment boundaries’ established? Here we devise a computational model that aims to describe one such boundary in a well-studied animal tissue: the developing wing primordium of Drosophila melanogaster. We model the production, diffusion and local sensing of an essential signaling molecule, the Hedgehog protein. We reveal one possible mechanism by which Hedgehog sensing can influence the mechanical properties of cells, and compare the simulated outcome to observations in experimentally perturbed, actual wing discs. Our relatively simple model suffices to establish a straight and stable compartment boundary.
PMCID: PMC3072364  PMID: 21490725
16.  The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored 
Nucleic Acids Research  2010;39(Database issue):D561-D568.
An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein–protein interaction information can be error-prone and require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage and ease of access to both experimental as well as predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space. New features in STRING include an interactive network viewer that can cluster networks on demand, updated on-screen previews of structural information including homology models, extensive data updates and strongly improved connectivity and integration with third-party resources. Version 9.0 of STRING covers more than 1100 completely sequenced organisms.
PMCID: PMC3013807  PMID: 21045058
17.  The Microbiota Mediates Pathogen Clearance from the Gut Lumen after Non-Typhoidal Salmonella Diarrhea 
PLoS Pathogens  2010;6(9):e1001097.
Many enteropathogenic bacteria target the mammalian gut. The mechanisms protecting the host from infection are poorly understood. We have studied the protective functions of secretory antibodies (sIgA) and the microbiota, using a mouse model for S. typhimurium diarrhea. This pathogen is a common cause of diarrhea in humans world-wide. S. typhimurium (S. tm^att, sseD) causes a self-limiting gut infection in streptomycin-treated mice. After 40 days, all animals had overcome the disease, developed a sIgA response, and most had cleared the pathogen from the gut lumen. sIgA limited pathogen access to the mucosal surface and protected from gut inflammation in challenge infections. This protection was O-antigen specific, as demonstrated with pathogens lacking the S. typhimurium O-antigen (wbaP, S. enteritidis) and sIgA-deficient mice (TCRβ−/−δ−/−, JH−/−, IgA−/−, pIgR−/−). Surprisingly, sIgA-deficiency did not affect the kinetics of pathogen clearance from the gut lumen. Instead, this was mediated by the microbiota. This was confirmed using ‘L-mice’ which harbor a low complexity gut flora, lack colonization resistance and develop a normal sIgA response, but fail to clear S. tm^att from the gut lumen. In these mice, pathogen clearance was achieved by transferring a normal complex microbiota. Thus, besides colonization resistance (= pathogen blockage by an intact microbiota), the microbiota mediates a second, novel protective function, i.e. pathogen clearance. Here, the normal microbiota re-grows from a state of depletion and disturbed composition and gradually clears even very high pathogen loads from the gut lumen, a site inaccessible to most “classical” immune effector mechanisms. In conclusion, sIgA and microbiota serve complementary protective functions. The microbiota confers colonization resistance and mediates pathogen clearance in primary infections, while sIgA protects from disease if the host re-encounters the same pathogen. This has implications for curing S. typhimurium diarrhea and for preventing transmission.
Author Summary
Numerous pathogens infect the gut. Protection against these infections is mediated by mucosal immune defenses including secreted IgA as well as by the competing intestinal microbiota. However, so far the relative importance of these two different defense mechanisms remains unclear. We addressed this question using the example of non-typhoidal Salmonella (NTS) gut infections which can be spread in stool of infected patients over long periods of time. We used a mouse model to reveal that the intestinal microbiota and the adaptive immune system hold different but complementary functions in fighting NTS infections. A primary Salmonella infection disrupts the normal microbiota and elicits Salmonella-specific sIgA. sIgA prevents disease when the animal is infected with NTS for a second time. However, sIgA was dispensable for pathogen clearance from the gut. Instead, this was mediated by the microbiota. By re-establishing its normal density and composition, the microbiota was necessary and sufficient for terminating long-term fecal Salmonella excretion. This establishes a novel paradigm: The microbiota clears the pathogen from the gut lumen, while sIgA protects from disease upon re-infection with the same pathogen. This has implications for the evolutionary role of sIgA responses as well as for developing microbiota-based therapies for curing infected patients.
PMCID: PMC2936549  PMID: 20844578
18.  MLTreeMap - accurate Maximum Likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies 
BMC Genomics  2010;11:461.
Shotgun sequencing of environmental DNA is an essential technique for characterizing uncultivated microbes in situ. However, the taxonomic and functional assignment of the obtained sequence fragments remains a pressing problem.
Existing algorithms are largely optimized for speed and coverage; in contrast, we present here a software framework that focuses on a restricted set of informative gene families, using Maximum Likelihood to assign these with the best possible accuracy. This framework ('MLTreeMap') uses raw nucleotide sequences as input, and includes hand-curated, extensible reference information.
We discuss how we validated our pipeline using complete genomes as well as simulated and actual environmental sequences.
PMCID: PMC3091657  PMID: 20687950
19.  Like Will to Like: Abundances of Closely Related Species Can Predict Susceptibility to Intestinal Colonization by Pathogenic and Commensal Bacteria 
PLoS Pathogens  2010;6(1):e1000711.
The intestinal ecosystem is formed by a complex, yet highly characteristic microbial community. The parameters defining whether this community permits invasion of a new bacterial species are unclear. In particular, inhibition of enteropathogen infection by the gut microbiota ( = colonization resistance) is poorly understood. To analyze the mechanisms of microbiota-mediated protection from Salmonella enterica induced enterocolitis, we used a mouse infection model and large scale high-throughput pyrosequencing. In contrast to conventional mice (CON), mice with a gut microbiota of low complexity (LCM) were highly susceptible to S. enterica induced colonization and enterocolitis. Colonization resistance was partially restored in LCM-animals by co-housing with conventional mice for 21 days (LCMcon21). 16S rRNA sequence analysis comparing LCM, LCMcon21 and CON gut microbiota revealed that gut microbiota complexity increased upon conventionalization and correlated with increased resistance to S. enterica infection. Comparative microbiota analysis of mice with varying degrees of colonization resistance allowed us to identify intestinal ecosystem characteristics associated with susceptibility to S. enterica infection. Moreover, this system enabled us to gain further insights into the general principles of gut ecosystem invasion by non-pathogenic, commensal bacteria. Mice harboring high commensal E. coli densities were more susceptible to S. enterica induced gut inflammation. Similarly, mice with high titers of Lactobacilli were more efficiently colonized by a commensal Lactobacillus reuteri RR strain after oral inoculation. Upon examination of 16S rRNA sequence data from 9 CON mice we found that closely related phylotypes generally display significantly correlated abundances (co-occurrence), more so than distantly related phylotypes. 
Thus, in essence, the presence of closely related species can increase the chance of invasion of newly incoming species into the gut ecosystem. We provide evidence that this principle might be of general validity for invasion of bacteria in preformed gut ecosystems. This might be of relevance for human enteropathogen infections as well as therapeutic use of probiotic commensal bacteria.
Author Summary
The commensal microbiota, populating the intestinal tract to high levels, is fundamental to human health. It exerts beneficial effects on the immune system and contributes to protection against gastrointestinal infections ( = colonization resistance) by largely unknown mechanisms. Here, we reveal characteristics of the commensal microbiota indicative of a high or low degree of colonization resistance. Using a mouse model for Salmonella enterica induced gut inflammation and microbiota analysis by 454 amplicon sequencing, we show that mice having different types of microbiota exhibit differential susceptibility to pathogen infection. In addition, our data lead to the description of a new concept in gut ecosystem biology: the intrusion-success of an extrinsic bacterial species into an established gut ecosystem is related to the abundance of closely related bacteria already present in this gut ecosystem. We show that this principle applies not only to enteropathogen infection but also to inoculation with beneficial gut bacteria. Humans can display largely different degrees of susceptibility to enteric infections. Similarly, the effectiveness of probiotic therapy varies greatly from person to person. Our data might explain these differences and could be used for increasing the efficacy of probiotic therapy and for identifying patients at risk of developing enteric infections.
PMCID: PMC2796170  PMID: 20062525
20.  STITCH 2: an interaction network database for small molecules and proteins 
Nucleic Acids Research  2009;38(Database issue):D552-D556.
Over the last years, the publicly available knowledge on interactions between small molecules and proteins has been steadily increasing. To create a network of interactions, STITCH aims to integrate the data dispersed over the literature and various databases of biological pathways, drug–target relationships and binding affinities. In STITCH 2, the number of relevant interactions is increased by incorporation of BindingDB, PharmGKB and the Comparative Toxicogenomics Database. The resulting network can be explored interactively or used as the basis for large-scale analyses. To facilitate links to other chemical databases, we adopt InChIKeys that allow identification of chemicals with a short, checksum-like string. STITCH 2.0 connects proteins from 630 organisms to over 74 000 different chemicals, including 2200 drugs. STITCH can be accessed at
PMCID: PMC2808890  PMID: 19897548
21.  The Hedgehog Signaling Pathway: Where Did It Come From? 
PLoS Biology  2009;7(6):e1000146.
The Hedgehog signaling pathway plays a crucial role in development and disease. Its putative origins in an ancient system involved in regulating bacterial lipid transport and homeostasis offer clues about how the pathway might work today.
PMCID: PMC2698682  PMID: 19564910
22.  Comparative Functional Analysis of the Caenorhabditis elegans and Drosophila melanogaster Proteomes 
PLoS Biology  2009;7(3):e1000048.
The nematode Caenorhabditis elegans is a popular model system in genetics, not least because a majority of human disease genes are conserved in C. elegans. To generate a comprehensive inventory of its expressed proteome, we performed extensive shotgun proteomics and identified more than half of all predicted C. elegans proteins. This allowed us to confirm and extend genome annotations, characterize the role of operons in C. elegans, and semiquantitatively infer abundance levels for thousands of proteins. Furthermore, for the first time to our knowledge, we were able to compare two animal proteomes (C. elegans and Drosophila melanogaster). We found that the abundances of orthologous proteins in metazoans correlate remarkably well, better than protein abundance versus transcript abundance within each organism or transcript abundances across organisms; this suggests that changes in transcript abundance may have been partially offset during evolution by opposing changes in protein abundance.
Author Summary
Proteins are the active players that execute the genetic program of a cell, and their levels and interactions are precisely controlled. Routinely monitoring thousands of proteins is difficult, as they can be present at vastly different abundances, come with various sizes, shapes, and charge, and have a more complex alphabet of twenty “letters,” in contrast to the four letters of the genome itself. Here, we used mass spectrometry to extensively characterize the proteins of a popular model organism, the nematode Caenorhabditis elegans. Together with previous data from the fruit fly Drosophila melanogaster, this allows us to compare the protein levels of two animals on a global scale. Surprisingly, we find that individual protein abundance is highly conserved between the two species. So, although worms and flies look very different, they need similar amounts of each conserved, orthologous protein. Because many C. elegans and D. melanogaster proteins also have counterparts in humans, our results suggest that similar rules may apply to our own proteins.
A quantitative comparison of two animal proteomes shows a striking correlation of protein abundance levels, a better correlation than transcript levels. Are the latter more variable during evolution?
PMCID: PMC2650730  PMID: 19260763
23.  STRING 8—a global view on proteins and their functional interactions in 630 organisms 
Nucleic Acids Research  2008;37(Database issue):D412-D416.
Functional partnerships between proteins are at the core of complex cellular phenotypes, and the networks formed by interacting proteins provide researchers with crucial scaffolds for modeling, data reduction and annotation. STRING is a database and web resource dedicated to protein–protein interactions, including both physical and functional interactions. It weights and integrates information from numerous sources, including experimental repositories, computational prediction methods and public text collections, thus acting as a meta-database that maps all interaction evidence onto a common set of genomes and proteins. The most important new developments in STRING 8 over previous releases include a URL-based programming interface, which can be used to query STRING from other resources, improved interaction prediction via genomic neighborhood in prokaryotes, and the inclusion of protein structures. Version 8.0 of STRING covers about 2.5 million proteins from 630 organisms, providing the most comprehensive view on protein–protein interactions currently available. STRING can be reached at
PMCID: PMC2686466  PMID: 18940858
24.  Millimeter-scale genetic gradients and community-level molecular convergence in a hypersaline microbial mat 
To investigate the extent of genetic stratification in structured microbial communities, we compared the metagenomes of 10 successive layers of a phylogenetically complex hypersaline mat from Guerrero Negro, Mexico. We found pronounced millimeter-scale genetic gradients that were consistent with the physicochemical profile of the mat. Despite these gradients, all layers displayed near-identical and acid-shifted isoelectric point profiles due to a molecular convergence of amino-acid usage, indicating that hypersalinity enforces an overriding selective pressure on the mat community.
PMCID: PMC2483411  PMID: 18523433
metagenomics; hypersalinity; microbial ecology; fine-scale; salt-in
25.  STITCH: interaction networks of chemicals and proteins 
Nucleic Acids Research  2007;36(Database issue):D684-D688.
The knowledge about interactions between proteins and small molecules is essential for the understanding of molecular and cellular functions. However, information on such interactions is widely dispersed across numerous databases and the literature. To facilitate access to this data, STITCH (‘search tool for interactions of chemicals’) integrates information about interactions from metabolic pathways, crystal structures, binding experiments and drug–target relationships. Inferred information from phenotypic effects, text mining and chemical structure similarity is used to predict relations between chemicals. STITCH further allows exploring the network of chemical relations, also in the context of associated binding proteins. Each proposed interaction can be traced back to the original data sources. Our database contains interaction information for over 68 000 different chemicals, including 2200 drugs, and connects them to 1.5 million genes across 373 genomes and their interactions contained in the STRING database. STITCH is available at
PMCID: PMC2238848  PMID: 18084021
