Gram-positive bacteria can transport molecules necessary for their survival through holes in their cell wall. The holes in cell walls need to be large enough to let critical nutrients pass through. However, the cell wall must also function to prevent the bacterium's membrane from protruding through a large hole into the environment and lysing the cell. As such, we hypothesize that there exists a range of cell wall hole sizes that allow for molecule transport but prevent membrane protrusion. Here, we develop and analyse a biophysical theory of the response of a Gram-positive cell's membrane to the formation of a hole in the cell wall. We predict a critical hole size in the range of 15–24 nm beyond which lysis occurs. To test our theory, we measured hole sizes in Streptococcus pyogenes cells undergoing enzymatic lysis via transmission electron microscopy. The measured hole sizes are in strong agreement with our theoretical prediction. Together, the theory and experiments provide a means to quantify the mechanisms of death of Gram-positive cells via enzymatically mediated lysis and provide insights into the range of cell wall hole sizes compatible with bacterial homeostasis.
doi:10.1098/rsif.2012.0892
PMCID: PMC3565739
PMID: 23303219
enzybiotic; biophysics; membrane dynamics; microbiology
Bacteriophages are the most abundant biological life forms on Earth. However, relatively little is known regarding which bacteriophages infect and exploit which bacteria. A recent meta-analysis showed that empirically measured phage-bacteria infection networks are often significantly nested, on average, and not modular. A perfectly nested network is one in which phages can be ordered from specialist to generalist such that the host range of a given phage is a subset of the host range of the subsequent phage in the ordering. The same meta-analysis hypothesized that modularity, in which groups of phages specialize on distinct groups of hosts, should emerge at larger geographic and/or taxonomic scales. In this paper, we evaluate the largest known phage-bacteria interaction data set, representing the interaction of 215 phage types with 286 host types sampled from geographically separated sites in the Atlantic Ocean. We find that this interaction network is highly modular. In addition, some of the modules identified in this data set are nested or contain submodules, indicating the presence of multi-scale structure, as hypothesized in the earlier meta-analysis. We examine the role of geography in driving these patterns and find evidence that the host range of phages and the phage permissibility of bacteria are driven, in part, by geographic separation. We conclude by discussing approaches to disentangle the roles of ecology and evolution in driving complex patterns of interaction between phages and bacteria.
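The definition of perfect nestedness above can be checked directly. The sketch below assumes each phage's host range is given as a set of host identifiers (the names are hypothetical). It only tests for perfect nestedness; it does not compute a graded nestedness score or detect modules.

```python
from typing import Dict, List, Set


def is_perfectly_nested(host_range: Dict[str, Set[str]]) -> bool:
    """Return True if phages can be ordered from specialist to generalist
    so that each host range is a subset of the next one in the ordering."""
    # In a nested chain, sizes never decrease, and sets of equal size must be
    # identical, so sorting by host-range size yields a valid ordering if one exists.
    ordered: List[Set[str]] = sorted(host_range.values(), key=len)
    return all(smaller <= larger for smaller, larger in zip(ordered, ordered[1:]))
```

For example, host ranges {h1}, {h1, h2} and {h1, h2, h3} are perfectly nested. Two phages that infect disjoint sets of hosts, such as {h1, h2} and {h3, h4}, are not nested and instead resemble two modules.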
doi:10.1038/ismej.2012.135
PMCID: PMC3578562
PMID: 23178671
microbial ecology; viruses; biogeography; networks
The direct “metagenomic” sequencing of genomic material from complex assemblages of bacteria, archaea, viruses and microeukaryotes has yielded new insights into the structure of microbial communities. For example, analysis of metagenomic data has revealed the existence of previously unknown microbial taxa whose spatial distributions are limited by environmental conditions, ecological competition, and dispersal mechanisms. However, differences in genotypes that might lead biologists to designate two microbes as taxonomically distinct need not necessarily imply differences in ecological function. Hence, there is a growing need for large-scale analysis of the distribution of microbial function across habitats. Here, we present a framework for investigating the biogeography of microbial function by analyzing the distribution of protein families inferred from environmental sequence data across a global collection of sites. We map over 6,000,000 protein sequences from unassembled reads from the Global Ocean Survey dataset to protein families, generating a protein family relative abundance matrix that describes the distribution of each protein family across sites. We then use non-negative matrix factorization (NMF) to approximate these protein family profiles as linear combinations of a small number of ecological components. Each component has a characteristic functional profile and site profile. Our approach identifies common functional signatures within several of the components. We use our method as a filter to estimate functional distance between sites, and find that an NMF-filtered measure of functional distance is more strongly correlated with environmental distance than a comparable PCA-filtered measure. We also find that functional distance is more strongly correlated with environmental distance than with geographic distance, in agreement with prior studies. 
We identify similar protein functions in several components and suggest that functional co-occurrence across metagenomic samples could lead to future methods for de-novo functional prediction. We conclude by discussing how NMF, and other dimension reduction methods, can help enable a macroscopic functional description of marine ecosystems.
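The factorization step described above can be sketched as follows. This version assumes a protein family × site relative abundance matrix V. It uses the Lee-Seung multiplicative updates for the Frobenius-norm objective, which is one standard NMF algorithm and not necessarily the variant used in the study. The function name and parameters are illustrative.

```python
import numpy as np


def nmf(V: np.ndarray, k: int, n_iter: int = 500, seed: int = 0, eps: float = 1e-9):
    """Approximate a non-negative matrix V (families x sites) as W @ H.

    W (families x k): functional profile of each component.
    H (k x sites): site profile of each component.
    """
    if np.any(V < 0):
        raise ValueError("V must be non-negative")
    rng = np.random.default_rng(seed)
    n_families, n_sites = V.shape
    W = rng.random((n_families, k))
    H = rng.random((k, n_sites))
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative; eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Each column of W can then be read as a component's functional profile and each row of H as that component's site profile. Distances between sites could, for instance, be computed on the columns of H rather than on the raw profiles.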
doi:10.1371/journal.pone.0043866
PMCID: PMC3445553
PMID: 23049741
Viruses are the most abundant life forms on Earth, with an estimated 10³¹ total viruses globally. The majority of these viruses infect microbes, whether bacteria, archaea or microeukaryotes. Given the importance of microbes in driving global biogeochemical cycles, it would seem, based on numerical abundances alone, that viruses also play an important role in the global cycling of carbon and nutrients. However, the importance of viruses in controlling host populations and ecosystem functions, such as the regeneration, storage and export of carbon and other nutrients, remains unresolved. Here, we report on advances in the study of ecological effects of viruses of microbes. In doing so, we focus on an area of increasing importance: the role that ocean viruses play in shaping microbial population sizes as well as in regenerating carbon and other nutrients.
doi:10.3410/B4-17
PMCID: PMC3434959
PMID: 22991582
Galkovskyi, Taras | Mileyko, Yuriy | Bucksch, Alexander | Moore, Brad | Symonova, Olga | Price, Charles A | Topp, Christopher N | Iyer-Pascuzzi, Anjali S | Zurek, Paul R | Fang, Suqin | Harer, John | Benfey, Philip N | Weitz, Joshua S
Background
Characterizing root system architecture (RSA) is essential to understanding the development and function of vascular plants. Identifying RSA-associated genes also represents an underexplored opportunity for crop improvement. Software tools are needed to accelerate the pace at which quantitative traits of RSA are estimated from images of root networks.
Results
We have developed GiA Roots (General Image Analysis of Roots), a semi-automated software tool designed specifically for the high-throughput analysis of root system images. GiA Roots includes user-assisted algorithms to distinguish root from background and a fully automated pipeline that extracts dozens of root system phenotypes. Quantitative information on each phenotype, along with intermediate steps for full reproducibility, is returned to the end-user for downstream analysis. GiA Roots has a GUI front end and a command-line interface for interweaving the software into large-scale workflows. GiA Roots can also be extended to estimate novel phenotypes specified by the end-user.
Conclusions
We demonstrate the use of GiA Roots on a set of 2393 images of rice roots representing 12 genotypes from the species Oryza sativa. We validate trait measurements against prior analyses of this image set that demonstrated that RSA traits are likely heritable and associated with genotypic differences. Moreover, we demonstrate that GiA Roots is extensible and an end-user can add functionality so that GiA Roots can estimate novel RSA traits. In summary, we show that the software can function as an efficient tool as part of a workflow to move from large numbers of root images to downstream analysis.
doi:10.1186/1471-2229-12-116
PMCID: PMC3444351
PMID: 22834569
The structure of hierarchical networks in biological and physical systems has long been characterized using the Horton-Strahler ordering scheme. The scheme assigns an integer order to each edge in the network based on the topology of branching such that the order increases from distal parts of the network (e.g., mountain streams or capillaries) to the “root” of the network (e.g., the river outlet or the aorta). However, Horton-Strahler ordering cannot be applied to networks with loops because they create a contradiction in the edge ordering in terms of which edge precedes another in the hierarchy. Here, we present a generalization of the Horton-Strahler order to weighted planar reticular networks, where weights are assumed to correlate with the importance of network edges, e.g., weights estimated from edge widths may correlate with flow capacity. Our method assigns hierarchical levels not only to edges of the network, but also to its loops, and classifies the edges into reticular edges, which are responsible for loop formation, and tree edges. In addition, we perform a detailed and rigorous theoretical analysis of the sensitivity of the hierarchical levels to weight perturbations. In doing so, we show that the ordering of the reticular edges is more robust to noise in weight estimation than is the ordering of the tree edges. We discuss applications of this generalized Horton-Strahler ordering to the study of leaf venation and other biological networks.
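For reference, the classic Horton-Strahler ordering on a rooted tree can be sketched as below. This is the standard tree scheme that the paper generalizes, not the generalized method for loopy weighted networks. The tree is assumed to be given as a hypothetical child-list dictionary, and each node's order stands for the edge joining it to its parent.

```python
from typing import Dict, List


def strahler_order(children: Dict[str, List[str]], root: str) -> Dict[str, int]:
    """Return the Horton-Strahler order of every node in a rooted tree.

    A node's order is the order of the edge joining it to its parent, so the
    root's order is that of the outlet segment (e.g. the river mouth).
    """
    order: Dict[str, int] = {}

    def visit(node: str) -> int:
        kids = children.get(node, [])
        if not kids:
            # Distal tips (e.g. headwater streams) start at order 1.
            order[node] = 1
        else:
            child_orders = [visit(c) for c in kids]
            top = max(child_orders)
            # The order increases only where two or more branches of the
            # highest incoming order meet.
            order[node] = top + 1 if child_orders.count(top) >= 2 else top
        return order[node]

    visit(root)
    return order
```

Joining two order-1 branches yields order 2, but joining an order-2 branch with an order-1 branch leaves the order at 2. A loop makes "parent" ill-defined, which is why this recursion cannot be applied directly to reticular networks.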
doi:10.1371/journal.pone.0036715
PMCID: PMC3368924
PMID: 22701559
Background
The gene composition of bacteria of the same species can differ significantly between isolates. Variability in gene composition can be summarized in terms of gene frequency distributions, in which individual genes are ranked according to the frequency of genomes in which they appear. Empirical gene frequency distributions possess a U-shape, such that there are many rare genes, some genes of intermediate occurrence, and many common genes. It would seem that U-shaped gene frequency distributions can be used to infer the essentiality and/or importance of a gene to a species. Here, we ask: can U-shaped gene frequency distributions, instead, arise generically via neutral processes of genome evolution?
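A gene frequency distribution of the kind described above can be tabulated from gene presence/absence data. This sketch assumes genes have already been clustered into families and that each genome is given as a set of (hypothetical) family identifiers.

```python
from collections import Counter
from typing import Dict, List, Set


def gene_frequency_distribution(genomes: List[Set[str]]) -> Dict[int, int]:
    """Map k -> number of gene families found in exactly k of the genomes."""
    # Count in how many genomes each gene family occurs.
    occurrences = Counter(gene for genome in genomes for gene in genome)
    # Count how many families share each occurrence count.
    counts = Counter(occurrences.values())
    return {k: counts.get(k, 0) for k in range(1, len(genomes) + 1)}
```

Three genomes sharing genes a and b, each with one unique gene, give {1: 3, 2: 0, 3: 2}. Mass sits at both ends and none in the middle, which is a minimal instance of the U-shape.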
Results
We introduce a neutral model of genome evolution which combines birth-death processes at the organismal level with gene uptake and loss at the genomic level. This model predicts that gene frequency distributions possess a characteristic U-shape even in the absence of selective forces driving genome and population structure. We compare the model predictions to empirical gene frequency distributions from 6 multiply sequenced species of bacterial pathogens. We fit the model with constant population size to data, reproducing U-shaped distributions, albeit without matching all quantitative features of the distribution. We find stronger model fits in the case where we consider exponentially growing populations. We also show that two alternative models which contain a "rigid" and "flexible" core component of genomes provide strong fits to gene frequency distributions.
Conclusions
The analysis of neutral models of genome evolution suggests that U-shaped gene frequency distributions provide less information than previously suggested regarding gene essentiality. We discuss the need for additional theory and genomic level information to disentangle the roles of evolutionary mechanisms operating within and amongst individuals in driving the dynamics of gene distributions.
doi:10.1186/1471-2164-13-196
PMCID: PMC3386021
PMID: 22613814
Bacteria; Neutral model; Pan-genome; Population genomics; Selection
The processes responsible for the evolution of key innovations, whereby lineages acquire qualitatively new functions that expand their ecological opportunities, remain poorly understood. We examined how a virus, bacteriophage λ, evolved to infect its host, Escherichia coli, through a novel pathway. Natural selection promoted the fixation of mutations in the virus’s host-recognition protein, J, that improved fitness on the original receptor, LamB, and set the stage for other mutations that allowed infection through a new receptor, OmpF. These viral mutations arose only after the host evolved reduced expression of LamB, whereas certain other host mutations prevented the phage from evolving the new function. This study shows the complex interplay between genomic processes and ecological conditions that favor the emergence of evolutionary innovations.
doi:10.1126/science.1214449
PMCID: PMC3306806
PMID: 22282803
Cell fate determination is usually described as the result of the stochastic dynamics of gene regulatory networks (GRNs) reaching one of multiple steady-states each of which corresponds to a specific decision. However, the fate of a cell is determined in finite time, suggesting the importance of transient dynamics in cellular decision making. Here we consider cellular decision making as resulting from first passage processes of regulatory proteins and examine the effect of transient dynamics within the initial lysis-lysogeny switch of phage λ. Importantly, the fate of an infected cell depends, in part, on the number of coinfecting phages. Using a quantitative model of the phage λ GRN, we find that changes in the likelihood of lysis and lysogeny can be driven by changes in phage coinfection number regardless of whether or not there exists steady-state bistability within the GRN. Furthermore, two GRNs which yield qualitatively distinct steady state behaviors as a function of phage infection number can show similar transient responses, sufficient for alternative cell fate determination. We compare our model results to a recent experimental study of cell fate determination in single cell assays of multiply infected bacteria. Whereas the experimental study proposed a “quasi-independent” hypothesis for cell fate determination consistent with an observed data collapse, we demonstrate that observed cell fate results are compatible with an alternative form of data collapse consistent with a partial gene dosage compensation mechanism. We show that including partial gene dosage compensation at the mRNA level in our stochastic model of fate determination leads to the same data collapse observed in the single cell study. Our findings elucidate the importance of transient gene regulatory dynamics in fate determination, and present a novel alternative hypothesis to explain single-cell level heterogeneity within the phage λ lysis-lysogeny decision switch.
Author Summary
Multicellular organisms, single-celled organisms, and even viruses can exhibit alternative responses to various internal and environmental conditions. At the cellular level, alternative fate determination is usually described as the result of the inherent bistability of gene regulatory networks (GRNs). However, the fate of a cell is determined in finite time, suggesting the importance of transient dynamics to cellular decision making. Here, we present a quantitative gene regulatory model of how bacteriophages determine the fate of an infected bacterium. We find that increasing the number of infecting phages increases the chance of quiescent (i.e., lysogeny) vs. productive (i.e., lysis) viral growth, in agreement with prior studies. However, unlike previous theoretical studies, the bias in cell fate is a result of the transient divergence of stochastic gene expression dynamics. We compare and contrast our theoretical model with recent observations of cell fate measured at the single-cell level within multiply-infected cells. Predicted heterogeneity in cell fate is shown to agree with data when including a previously unidentified gene dosage compensation mechanism, which represents an alternative hypothesis to how multiple phages interact in influencing cell fate. Together, our results suggest the importance of quantitative details of transient gene regulation in driving stochastic fate determination.
doi:10.1371/journal.pcbi.1002006
PMCID: PMC3053317
PMID: 21423715
Background
The dual concepts of pan and core genomes have been widely adopted as means to assess the distribution of gene families within microbial species and genera. The core genome is the set of genes shared by a group of organisms; the pan genome is the set of all genes seen in any of these organisms. A variety of methods have provided drastically different estimates of the sizes of pan and core genomes from sequenced representatives of the same groups of bacteria.
Results
We use a combination of mathematical, statistical and computational methods to show that current predictions of pan and core genome sizes may have no correspondence to true values. Pan and core genome size estimates are problematic because they depend on the estimation of the occurrence of rare genes and genomes, respectively, which are difficult to estimate precisely because they are rare. Instead, we introduce and evaluate a robust metric, genomic fluidity, to categorize the gene-level similarity among groups of sequenced isolates. Genomic fluidity is a measure of the dissimilarity of genomes evaluated at the gene level.
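One way to compute genomic fluidity is as the average, over all pairs of genomes k and l, of (U_k + U_l) / (M_k + M_l). Here U_k is the number of gene families in genome k that are absent from genome l, and M_k is the number of gene families in genome k. Averaging over the N(N − 1)/2 pairs is equivalent to multiplying the sum by 2 / (N(N − 1)). This sketch assumes that pairwise form, non-empty genomes, and gene families already assigned by some homology-clustering step. It should be checked against the paper's own definition before use.

```python
from itertools import combinations
from typing import List, Set


def genomic_fluidity(genomes: List[Set[str]]) -> float:
    """Average fraction of unique gene families over all pairs of genomes."""
    if len(genomes) < 2:
        raise ValueError("need at least two genomes")
    total = 0.0
    pairs = 0
    for a, b in combinations(genomes, 2):
        unique = len(a - b) + len(b - a)  # U_k + U_l
        total += unique / (len(a) + len(b))  # divided by M_k + M_l
        pairs += 1
    return total / pairs
```

Identical genomes give 0 and genomes with no shared families give 1. Because the statistic is an average over pairs, it can be estimated from a modest number of genomes and does not depend on rare-gene extrapolation.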
Conclusions
The genomic fluidity of a population can be estimated accurately given a small number of sequenced genomes. Further, the genomic fluidity of groups of organisms can be compared robustly despite variation in algorithms used to identify genes and their homologs. As such, we recommend that genomic fluidity be used in place of pan and core genome size estimates when assessing gene diversity within genomes of a species or a group of closely related organisms.
doi:10.1186/1471-2164-12-32
PMCID: PMC3030549
PMID: 21232151
Background
The development of effective environmental shotgun sequence binning methods remains an ongoing challenge in algorithmic analysis of metagenomic data. While previous methods have focused primarily on supervised learning involving extrinsic data, a first-principles statistical model combined with a self-training fitting method has not yet been developed.
Results
We derive an unsupervised, maximum-likelihood formalism for clustering short sequences by their taxonomic origin on the basis of their k-mer distributions. The formalism is implemented using a Markov Chain Monte Carlo approach in a k-mer feature space. We introduce a space transformation that reduces the dimensionality of the feature space and a genomic fragment divergence measure that strongly correlates with the method's performance. Pairwise analysis of over 1000 completely sequenced genomes reveals that the vast majority of genomes have sufficient genomic fragment divergence to be amenable to binning using the present formalism. Using a high-performance implementation, the binner is able to classify fragments as short as 400 nt with accuracy over 90% in simulations of low-complexity communities of 2 to 10 species, given sufficient genomic fragment divergence. The method is available as an open source package called LikelyBin.
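The likelihood at the heart of k-mer-based binning can be sketched as follows. Each bin is modelled by its relative k-mer frequencies, and a read is scored by the multinomial log-likelihood of its overlapping k-mers. This is a simplified stand-in with hypothetical names. LikelyBin fits the bin models without labels via Markov Chain Monte Carlo, whereas here the bin models are simply given, and unseen k-mers receive an assumed small floor probability.

```python
import math
from collections import Counter
from typing import Dict, List


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(seqs: List[str], k: int) -> Dict[str, float]:
    """Relative k-mer frequencies pooled over a bin's sequences."""
    total: Counter = Counter()
    for s in seqs:
        total.update(kmer_counts(s, k))
    n = sum(total.values())
    return {kmer: c / n for kmer, c in total.items()}


def log_likelihood(read: str, freqs: Dict[str, float], k: int, floor: float = 1e-6) -> float:
    """Multinomial log-likelihood of a read's k-mers under one bin's model."""
    return sum(c * math.log(freqs.get(kmer, floor)) for kmer, c in kmer_counts(read, k).items())


def assign_bin(read: str, bins: Dict[str, Dict[str, float]], k: int) -> str:
    """Assign a read to the bin under which its k-mer composition is most likely."""
    return max(bins, key=lambda name: log_likelihood(read, bins[name], k))
```

In a self-training setting, one would alternate between assigning reads with `assign_bin` and re-estimating each bin's frequencies from the reads assigned to it.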
Conclusion
An unsupervised binning method based on statistical signatures of short environmental sequences is a viable stand-alone binning method for low complexity samples. For medium and high complexity samples, we discuss the possibility of combining the current method with other methods as part of an iterative process to enhance the resolving power of sorting reads into taxonomic and/or functional bins.
doi:10.1186/1471-2105-10-316
PMCID: PMC2765972
PMID: 19799776
Theoretical models for allometric relationships between organismal form and function are typically tested by comparing a single predicted relationship with empirical data. Several prominent models, however, predict more than one allometric relationship, and comparisons among alternative models have not taken this into account. Here we evaluate several different scaling models of plant morphology within a hierarchical Bayesian framework that simultaneously fits multiple scaling relationships to three large allometric datasets. The scaling models include: inflexible universal models derived from biophysical assumptions (e.g. elastic similarity or fractal networks), a flexible variation of a fractal network model, and a highly flexible model constrained only by basic algebraic relationships. We demonstrate that variation in intraspecific allometric scaling exponents is inconsistent with the universal models, and that more flexible approaches that allow for biological variability at the species level outperform universal models, even when accounting for relative increases in model complexity.
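For contrast, the conventional single-relationship test mentioned above usually estimates a scaling exponent b in trait = a · size^b by linear regression on log-transformed data. The sketch below does exactly that for one relationship with ordinary least squares. It is not the hierarchical Bayesian framework of the study, which fits multiple relationships jointly. The variable names are illustrative.

```python
import numpy as np


def fit_power_law(size, trait):
    """Fit trait = a * size**b by least squares on log-transformed data; return (a, b)."""
    x = np.log(np.asarray(size, dtype=float))
    y = np.log(np.asarray(trait, dtype=float))
    b, log_a = np.polyfit(x, y, 1)  # slope is the scaling exponent, intercept is log(a)
    return float(np.exp(log_a)), float(b)
```

A universal model would predict one fixed value of b for every species. The flexible models discussed above instead allow b to vary among species.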
doi:10.1111/j.1461-0248.2009.01316.x
PMCID: PMC2730548
PMID: 19453621
Allometry; elastic similarity; fractal; geometric similarity; hierarchical Bayes; leaves; scaling; stress similarity; trees
Shifting the perspective of the questions we ask will ensure that network theory continues to excite the network theorists, but more importantly, that it remains vital to progress in biological research.
doi:10.1371/journal.pbio.0050011
PMCID: PMC1769436
PMID: 17227140
Trade-offs have been put forward as essential to the generation and maintenance of diversity. However, variation in trade-offs is often determined at the molecular level, outside the scope of conventional ecological inquiry. In this study, we propose that understanding the intracellular basis for trade-offs in microbial systems can aid in predicting and interpreting patterns of diversity. First, we show how laboratory experiments and mathematical models have unveiled the hidden intracellular mechanisms underlying trade-offs key to microbial diversity: (i) metabolic and regulatory trade-offs in bacteria and yeast; (ii) life-history trade-offs in bacterial viruses. Next, we examine recent studies of marine microbes that have taken steps toward reconciling the molecular and the ecological views of trade-offs, despite the challenges in doing so in natural settings. Finally, we suggest avenues for research where mathematical modelling, experiments and studies of natural microbial communities provide a unique opportunity to integrate studies of diversity across multiple scales.
doi:10.1111/j.1461-0248.2010.01507.x
PMCID: PMC3069490
PMID: 20576029
Ecological genomics; experimental evolution; mathematical models; micro-organisms; metabolism; parasitism; trade-offs; viruses
The CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) system is a recently discovered type of adaptive immune defense in bacteria and archaea that functions via directed incorporation of viral and plasmid DNA into host genomes. Here, we introduce a multiscale model of dynamic coevolution between hosts and viruses in an ecological context that incorporates CRISPR immunity principles. We analyze the model to test whether and how CRISPR immunity induces host and viral diversification and the maintenance of many coexisting strains. We show that hosts and viruses coevolve to form highly diverse communities. We observe the punctuated replacement of existing strains, such that populations compared over long time intervals show very low similarity. However, in the short term, we observe evolutionary dynamics consistent with both incomplete selective sweeps of novel strains (as single strains and coalitions) and the recurrence of previously rare strains. Coalitions of multiple dominant host strains are predicted to arise because host strains can have nearly identical immune phenotypes mediated by CRISPR defense albeit with different genotypes. We close by discussing how our explicit eco-evolutionary model of CRISPR immunity can help guide efforts to understand the drivers of diversity seen in microbial communities where CRISPR systems are active.
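The point that different CRISPR genotypes can yield the same immune phenotype can be illustrated with a deliberately simplified immunity rule. Under this rule, a host resists a virus if at least one of its spacers exactly matches one of the virus's protospacers. The exact-match rule and the identifiers are assumptions for illustration, not the model's full rules.

```python
from typing import Dict, Set


def immunity_profile(spacers: Set[str], viruses: Dict[str, Set[str]]) -> Dict[str, bool]:
    """For each virus strain, report whether any host spacer matches one of its protospacers."""
    return {name: bool(spacers & protospacers) for name, protospacers in viruses.items()}
```

With viruses v1 = {p1, p2} and v2 = {p3}, a host with spacers {p1, p3} and a host with spacers {p2, p3} have different genotypes but the same profile (immune to both). Hosts like these could co-dominate as a coalition.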
doi:10.1111/j.1558-5646.2012.01595.x
PMCID: PMC3437473
PMID: 22759281
Evolutionary biology; host–parasite interactions; immune defense; microbial ecology; viral evolution