Cells of undomesticated strains of Bacillus subtilis frequently form complex colonies during spreading on agar surfaces. Given that menaquinone is involved in another form of coordinated behavior, namely, sporulation, we looked for a possible role for menaquinone in complex colony development (CCD) in the B. subtilis strain NCIB 3610. Here we show that inhibition of menaquinone biosynthesis in B. subtilis indeed abolished its ability to develop complex colonies. Additionally, some mutations of B. subtilis that confer defective CCD could be suppressed by menaquinone derivatives. Several such mutants mapped to the dhb operon encoding the genes responsible for the biosynthesis of the iron siderophore bacillibactin. Our results demonstrate that both menaquinone and iron are essential for CCD in B. subtilis.
Methanosphaera stadtmanae is a commensal methanogenic archaeon found in the human gut. As most of its niche-neighbors are bacteria, it is expected that lateral gene transfer (LGT) from bacteria might have contributed to the evolutionary history of this organism. We performed a phylogenomic survey of putative LGT events in M. stadtmanae, using a phylogenetic pipeline. Our analysis indicates that a substantial fraction of the proteins of M. stadtmanae are inferred to have been involved in inter-domain LGT. Laterally acquired genes have contributed substantially to surface functions by providing novel glycosyltransferase functions. In addition, several ABC transporters seem to be of bacterial origin, including the molybdate transporter. Thus, bacterial genes contributed to the adaptation of M. stadtmanae to a host-dependent lifestyle by allowing a larger variation in surface structures and increasing transport efficiency in the diverse and competitive gut niche.
horizontal gene transfer; microbial evolution; archaeal genomics; archaea; methanogens; human gut
KEOPS is an important cellular complex conserved in Eukarya, with some subunits conserved in Archaea and Bacteria. This complex was recently found to play an essential role in formation of the tRNA modification threonylcarbamoyladenosine (t6A), and was previously associated with telomere length maintenance and transcription. KEOPS subunits are conserved in Archaea, especially in the Euryarchaea, where they have been studied in vitro. Here we attempted to delete the genes encoding the four conserved subunits of the KEOPS complex in the euryarchaeote Haloferax volcanii and study their phenotypes in vivo. The fused kae1-bud32 gene was shown to be essential, as was cgi121, which is dispensable in yeast. In contrast, pcc1 (encoding the putative dimerizing unit of KEOPS) was not essential in H. volcanii. Deletion of pcc1 led to pleiotropic phenotypes, including decreased growth rate, reduced levels of t6A modification, and elevated levels of intracellular glycation products.
Microbial ecosystems are often assumed to be relatively stable over short periods of time, but this assumption is seldom tested. An urban stream affected by both flow and varying levels of anthropogenic influence is expected to have high temporal variability in microbial composition, and short-term ecological instability. Thus, we analyzed the bacterioplankton composition of a weir-fragmented urban stream using Automated rRNA Intergenic Spacer Analysis (ARISA). A total of 46 sequential samples were collected in July 2009 for 7 days, every 7 hours, from both the upstream side of the weir (stream water) and the downstream side of the weir (estuarine water). Bray-Curtis similarity-based analysis showed a clear division between upstream and downstream communities. A sudden pH drop induced change in both communities, but composition stability partially recovered in less than a day. Thus, our results show that microbial ecosystems can change rapidly, but re-establish a new equilibrium relatively quickly.
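The Bray-Curtis similarity used above to compare communities can be illustrated with a minimal sketch. It assumes each sample is represented as a vector of ARISA peak abundances over the same set of fragment lengths; the function name and the example profiles are hypothetical and are not taken from the study.

```python
def bray_curtis_similarity(profile_a, profile_b):
    """Return the Bray-Curtis similarity (1 - dissimilarity) of two abundance profiles.

    Both profiles must list non-negative abundances for the same features
    (here: hypothetical ARISA peaks) in the same order.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    total = sum(profile_a) + sum(profile_b)
    if total == 0:
        raise ValueError("at least one profile must be non-empty")
    # Dissimilarity: summed absolute differences over summed abundances.
    difference = sum(abs(a - b) for a, b in zip(profile_a, profile_b))
    return 1.0 - difference / total


# Hypothetical peak profiles for an upstream and a downstream sample.
upstream = [1, 2, 3]
downstream = [2, 2, 2]
similarity = bray_curtis_similarity(upstream, downstream)
```

A value of 1 means identical profiles and 0 means no shared peaks; pairwise values of this kind are what a clustering or ordination step would take as input.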
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci have been shown to provide prokaryotes with an adaptive immunity against viruses and plasmids. CRISPR arrays are transcribed and processed into small CRISPR RNA molecules, which base-pair with invading DNA or RNA and lead to its degradation by CRISPR-associated (Cas) protein complexes. New spacers can be acquired by active CRISPR/Cas systems, and thus the sequences of these spacers provide a record of the past “infection history” of the organism. Recently we used spacer sequences from archaeal genomes to infer gene exchange events among archaeal species and genera and to demonstrate that at least in this domain of life CRISPR indeed has an anti-viral role.
CRISPR; Lateral Gene Transfer; archaea; horizontal gene transfer; viruses
CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) loci provide prokaryotes with an adaptive immunity against viruses and other mobile genetic elements. CRISPR arrays can be transcribed and processed into small crRNA molecules, which are then used by the cell to target the foreign nucleic acid. Since spacers are accumulated by active CRISPR/Cas systems, the sequences of these spacers provide a record of the past "infection history" of the organism.
Here we analyzed all currently known spacers present in archaeal genomes and identified their source by DNA similarity. While nearly 50% of archaeal spacers matched mobile genetic elements, such as plasmids or viruses, several others matched chromosomal genes of other organisms, primarily other archaea. Thus, networks of gene exchange between archaeal species were revealed by the spacer analysis, including many cases of inter-genus and inter-species gene transfer events. Spacers that recognize viral sequences tend to be located further away from the leader sequence, implying that there exists a selective pressure for their retention.
CRISPR spacers provide direct evidence for extensive gene exchange in archaea, especially within genera, and support the current dogma where the primary role of the CRISPR/Cas system is anti-viral and anti-plasmid defense.
Open peer review
This article was reviewed by: Profs. W. Ford Doolittle, John van der Oost, Christa Schleper (nominated by board member Prof. J Peter Gogarten)
CRISPR; Lateral Gene transfer; Horizontal gene transfer; viruses; archaea; competence
Degradation of mRNA in bacteria is a regulatory mechanism, providing an efficient way to fine-tune protein abundance in response to environmental changes. While the mechanisms responsible for initiation and subsequent propagation of mRNA degradation are well studied, the mRNA features that affect its stability are yet to be elucidated. We calculated three properties for each mRNA in the E. coli transcriptome: G+C content, tRNA adaptation index (tAI) and folding energy. Each of these properties was then correlated with the experimentally measured half-life of each transcript, and significant correlations were detected. A sliding window analysis identified the regions that displayed the maximal signal. The correlation between transcript half-life and both G+C content and folding energy was strongest at the 5′ termini of the mRNAs. Partial correlations showed that each of the parameters contributes separately to mRNA half-life. Notably, mRNAs of recently acquired genes in the E. coli genome, which have a distinct nucleotide composition, tend to be highly stable. This high stability may aid the evolutionary fixation of horizontally acquired genes.
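A sliding window analysis of this kind can be sketched as follows for the G+C content property. This is only an illustration under stated assumptions: the abstract does not specify the window size, the step, or the correlation statistic, so the window parameters, the use of a Pearson coefficient, the function names and the toy data are all hypothetical.

```python
from math import sqrt


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def windowed_gc_correlation(seqs, half_lives, window, step):
    """Correlate per-window G+C content with half-life across transcripts.

    Windows are counted from the 5' end and limited to the shortest transcript.
    Returns a list of (window_start, correlation) pairs.
    """
    shortest = min(len(s) for s in seqs)
    results = []
    for start in range(0, shortest - window + 1, step):
        gc_values = [gc_fraction(s[start:start + window]) for s in seqs]
        results.append((start, pearson(gc_values, half_lives)))
    return results


# Toy transcripts and half-lives (minutes); values are made up for illustration.
toy_seqs = ["GGGGAAAA", "GGAAAAAA", "AAAAAAAA"]
toy_half_lives = [3.0, 2.0, 1.0]
profile = windowed_gc_correlation(toy_seqs, toy_half_lives, window=4, step=4)
```

The window with the largest absolute correlation would correspond to the region that displays the maximal signal; the same loop could be applied to any other per-window property.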
In recent years, both homing endonucleases (HEases) and zinc-finger nucleases (ZFNs) have been engineered and selected for the targeting of desired human loci for gene therapy. However, enzyme engineering is lengthy and expensive and the off-target effect of the manufactured endonucleases is difficult to predict. Moreover, enzymes selected to cleave a human DNA locus may not cleave the homologous locus in the genome of animal models because of sequence divergence, thus hampering attempts to assess the in vivo efficacy and safety of any engineered enzyme prior to its application in human trials. Here, we show that naturally occurring HEases that cleave desirable human targets can be found. Some of these enzymes are also shown to cleave the homologous sequence in the genome of animal models. In addition, the distribution of off-target effects may be more predictable for native HEases. Based on our experimental observations, we present the HomeBase algorithm, database and web server that allow a high-throughput computational search and assignment of HEases for the targeting of specific loci in the human and other genomes. We validate experimentally the predicted target specificity of candidate fungal, bacterial and archaeal HEases using cell-free, yeast and archaeal assays.
Horizontal gene transfer (HGT) is a major force in microbial evolution. Previous studies have suggested that a variety of factors, including restricted recombination and toxicity of foreign gene products, may act as barriers to the successful integration of horizontally transferred genes. This study identifies an additional central barrier to HGT—the lack of co-adaptation between the codon usage of the transferred gene and the tRNA pool of the recipient organism. Analyzing the genomic sequences of more than 190 microorganisms and the HGT events that have occurred between them, we show that the number of genes that were horizontally transferred between organisms is positively correlated with the similarity between their tRNA pools. Those genes that are better adapted to the tRNA pools of the target genomes tend to undergo more frequent HGT. At the community (or environment) level, organisms that share a common ecological niche tend to have similar tRNA pools. These results remain significant after controlling for diverse ecological and evolutionary parameters. Our analysis demonstrates that there are bi-directional associations between the similarity in the tRNA pools of organisms and the number of HGT events occurring between them. Similar tRNA pools between a donor and a host tend to increase the probability that a horizontally acquired gene will become fixed in its new genome. Our results also suggest that frequent HGT may be a homogenizing force that increases the similarity in the tRNA pools of organisms within the same community.
Inteins are parasitic genetic elements, analogous to introns, that excise themselves at the protein level by self-splicing, allowing the formation of functional non-disrupted proteins. Many inteins contain a homing endonuclease (HEN) gene, and rely on its activity for horizontal propagation. In the halophilic archaeon Haloferax volcanii, the gene encoding DNA polymerase B (polB) contains an intein with an annotated but uncharacterized HEN. Here we examine the activity of the polB HEN in vivo, within its natural archaeal host. We show that this HEN is highly active, and able to insert the intein into both a chromosomal target and an extra-chromosomal plasmid target, by gene conversion. We also demonstrate that the frequency of its incorporation depends on the length of the flanking homologous sequences around the target site, reflecting its dependence on the homologous recombination machinery. Although several evolutionary models predict that the presence of an intein involves a change in the fitness of the host organism, our results show that a strain deleted for the intein sequence shows no significant changes in growth rate compared to the wild type.
Amadori-modified proteins (AMPs) are the products of nonenzymatic glycation formed by reaction of reducing sugars with primary amine-containing amino acids and can develop into advanced glycated end products (AGEs), highly stable toxic compounds. AGEs are known to participate in many age-related human diseases, including cardiovascular, neurological, and liver diseases. The metabolism of these glycated proteins is not yet understood, and the mechanisms that reduce their accumulation are not known so far. Here, we show for Escherichia coli that a conserved glycopeptidase (Gcp, also called Kae1), which is encoded by nearly every sequenced genome in the three domains of life, prevents the accumulation of Amadori products and AGEs. Using mutants, we show that Gcp depletion results in accumulation of AMPs and eventually leads to the accumulation of AGEs. We demonstrate that Gcp binds to glycated proteins, including pyruvate dehydrogenase, previously shown to be a glycation-prone enzyme. Our experiments also show that the severe phenotype of Gcp depletion can be relieved under conditions of low intracellular glycation. As glycated proteins are ubiquitous, the involvement of Gcp in the metabolism of AMPs and AGEs is likely to have been conserved in evolution, suggesting a universal involvement of Gcp in cellular aging and explaining the essentiality of Gcp in many organisms.
Glycated proteins (Amadori-modified proteins [AMPs] and advanced glycated end products [AGEs]) are known to participate in many age-related diseases. Their existence in fast-growing organisms was considered unlikely, as their formation was assumed to be slow. Yet, recent evidence demonstrated their existence in bacteria, and our data suggest a bacterial mechanism that reduces their accumulation. We identify in Escherichia coli a protein, Gcp, which carries out this function. Gcp is conserved in all domains of life and is essential in many organisms. Although it was annotated as a chaperone protease, there were no experimental data to support this function. Our findings are compatible with the annotation and will open up studies of the bacterial metabolism of glycated proteins. Furthermore, the data from the bacterial systems may also be instrumental in understanding the metabolism of glycated proteins, including their toxicity in human health and disease.
We propose a method for deriving enzymatic signatures from short read metagenomic data of unknown species. The short read data are converted to six pseudo-peptide candidates. We search for occurrences of Specific Peptides (SPs) on the latter. SPs are peptides that are indicative of enzymatic function as defined by the Enzyme Commission (EC) nomenclature. The number of SP hits on an ensemble of short reads is counted and then converted to estimates of numbers of enzymatic genes associated with different EC categories in the studied metagenome. Relative amounts of different EC categories define the enzymatic spectrum, without the need to perform genomic assemblies of short reads.
The method is developed and tested on 22 bacteria for which there exist many EC annotations in UniProt. Enzymatic signatures are derived for 3 metagenomes, and their functional profiles are explored.
We extend the SP methodology to taxon-specific SPs (TSPs), allowing us to estimate taxonomic features of metagenomic data from short reads. Using recent Swiss-Prot data we obtain TSPs for different phyla of bacteria, and different classes of proteobacteria. These allow us to analyze the major taxonomic content of 4 different metagenomic data-sets.
The SP methodology can be successfully extended to applications on short read genomic and metagenomic data. This leads to direct derivation of enzymatic signatures from raw short reads. Furthermore, by employing TSPs, one obtains valuable taxonomic information.
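The conversion of a short read into six pseudo-peptide candidates, followed by a search for Specific Peptides, can be sketched as below. This is a minimal illustration, assuming the six candidates are the six reading-frame translations under the standard genetic code and that SP hits are counted as exact substring matches; the function names and the two-residue "SPs" in the example are hypothetical (real SPs are longer), and the step that turns hit counts into estimates of gene numbers is not shown.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(read):
    """Reverse complement of a DNA read."""
    return read.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translate(read):
    """Return the three forward and three reverse-strand translations of a read."""
    read = read.upper()
    strands = (read, reverse_complement(read))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def count_sp_hits(reads, specific_peptides):
    """Count non-overlapping exact occurrences of each SP over all frames of all reads."""
    counts = {sp: 0 for sp in specific_peptides}
    for read in reads:
        for peptide in six_frame_translate(read):
            for sp in specific_peptides:
                counts[sp] += peptide.count(sp)
    return counts


frames = six_frame_translate("ATGGCC")
hits = count_sp_hits(["ATGGCC"], ["MA", "GH"])
```

Because each read is scanned independently, this kind of counting needs no assembly of the short reads, which is the property the method relies on.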
Carcinoembryonic antigen-related cell adhesion molecule 1 (CEACAM1), an immunoglobulin (Ig)-related glycoprotein, serves as cellular receptor for a variety of Gram-negative bacterial pathogens associated with the human mucosa. In particular, Neisseria gonorrhoeae, N. meningitidis, Moraxella catarrhalis, and Haemophilus influenzae possess well-characterized CEACAM1-binding adhesins. CEACAM1 is typically involved in cell-cell attachment, epithelial differentiation, neovascularisation and regulation of T-cell proliferation, and is one of the few CEACAM family members with homologues in different mammalian lineages. However, it is unknown whether bacterial adhesins of human pathogens can recognize CEACAM1 orthologues from other mammals.
Sequence comparisons of the amino-terminal Ig-variable-like domain of CEACAM1 reveal that the highest sequence divergence between human, murine, canine and bovine orthologues is found in the β-strands comprising the bacteria-binding CC'FG-face of the Ig-fold. Using GFP-tagged, soluble amino-terminal domains of CEACAM1, we demonstrate that bacterial pathogens selectively associate with human, but not other mammalian CEACAM1 orthologues. Whereas full-length human CEACAM1 can mediate internalization of Neisseria gonorrhoeae in transfected cells, murine CEACAM1 fails to support bacterial internalization, demonstrating that the sequence divergence of CEACAM1 orthologues has functional consequences with regard to bacterial recognition and cellular invasion.
Our results establish the selective interaction of several human-restricted bacterial pathogens with human CEACAM1 and suggest that co-evolution of microbial adhesins with their corresponding receptors on mammalian cells contributes to the limited host range of these highly adapted infectious agents.
In their natural environments, microorganisms form complex systems of interactions. Understanding the structure and organization of bacterial communities is likely to have broad medical and ecological consequences, yet a comprehensive description of the network of environmental interactions is currently lacking. Here, we mine co-occurrences in the scientific literature to construct such a network and demonstrate an expected pattern of association between the species’ lifestyle and the recorded number of co-occurring partners. We further focus on the well-annotated gut community and show that most co-occurrence interactions of typical gut bacteria occur within this community. The network is then clustered into species-groups that significantly correspond with naturally occurring communities. The relationships between resource competition, metabolic yield and growth rate within the clusters correspond with the r/K selection theory. Overall, these results support the constructed clusters as a first approximation of a bacterial ecosystem model. This comprehensive collection of predicted communities forms a new data resource for further systematic characterization of the ecological design principles shaping communities. Here, we demonstrate its utility for predicting cooperation and inhibition within communities.
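The idea of building a network from literature co-occurrences can be sketched as follows. The abstract does not say how co-occurrence was scored, so this example simply assumes that two species co-occur when they are mentioned in the same document and counts such documents; the function names, the `min_support` threshold and the placeholder species labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered species pair, the documents mentioning both.

    `documents` is an iterable of collections of species names, one per document.
    """
    pair_counts = Counter()
    for species in documents:
        for a, b in combinations(sorted(set(species)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def partner_counts(pair_counts, min_support=1):
    """Number of co-occurring partners per species, keeping pairs seen at least min_support times."""
    partners = {}
    for (a, b), n in pair_counts.items():
        if n >= min_support:
            partners.setdefault(a, set()).add(b)
            partners.setdefault(b, set()).add(a)
    return {species: len(others) for species, others in partners.items()}


# Placeholder species mentions for three hypothetical documents.
docs = [{"A", "B"}, {"A", "B", "C"}, {"C"}]
pairs = cooccurrence_counts(docs)
degrees = partner_counts(pairs)
```

The resulting pair counts define weighted edges of a network, and the per-species partner counts are the quantity that could be compared against lifestyle annotations.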
The evolutionary origins of genetic robustness are still under debate: it may arise as a consequence of requirements imposed by varying environmental conditions, due to intrinsic factors such as metabolic requirements, or directly due to an adaptive selection in favor of genes that allow a species to endure genetic perturbations. Stratifying the individual effects of each origin requires one to study the pertaining evolutionary forces across many species under diverse conditions. Here we conduct the first large-scale computational study charting the level of robustness of metabolic networks of hundreds of bacterial species across many simulated growth environments. We provide evidence that variations among species in their level of robustness reflect ecological adaptations. We decouple metabolic robustness into two components and quantify the extents of each: the first, environmental-dependent, is responsible for at least 20% of the non-essential reactions and its extent is associated with the species' lifestyle (specialized/generalist); the second, environmental-independent, is associated (correlation = ∼0.6) with the intrinsic metabolic capacities of a species—higher robustness is observed in fast growers or in organisms with an extensive production of secondary metabolites. Finally, we identify reactions that are uniquely susceptible to perturbations in human pathogens, potentially serving as novel drug-targets.
When a species is grown under optimal conditions the single-knockout of most of its genes is not likely to affect its viability. The resilience of biological systems to mutations is termed genetic robustness and its extent across different species has not yet been systematically described. Since the deletion of a gene can have varying consequences depending on the environmental conditions, the extent of species' genetic robustness reflects both the range of conditions (or environments) in which it can survive as well as the availability of alternative cellular routes (compensating for a gene's loss of function). Here, we developed a computational model for estimating the essentiality of metabolic reactions across natural-like environments and applied it to chart species' level of genetic robustness, providing the first systematic description of genetic robustness across species. Studying robustness across a wide collection of natural-like environments enables one to stratify, for each species individually, the extent of environmental-dependent and environmental-independent robustness and hence advances our understanding of its evolutionary origins. Our main finding is that the level of environmental-dependent robustness is associated with the lifestyle of a species (i.e., specialists versus generalists), whereas the level of environmental-independent robustness is associated with its metabolic production capacities.
Thymidylate synthases (Thy) are key enzymes in the synthesis of deoxythymidylate, 1 of the 4 building blocks of DNA. As such, they are essential for all DNA-based forms of life and therefore implicated in the hypothesized transition from RNA genomes to DNA genomes. Two evolutionarily unrelated Thy enzymes, ThyA and ThyX, are known to catalyze the same biochemical reaction. Both enzymes are sporadically distributed within each of the 3 domains of life in a pattern that suggests multiple nonhomologous lateral gene transfer (LGT) events. We present a phylogenetic analysis of the evolution of the 2 enzymes, aimed at unraveling their entangled evolutionary history and tracing their origin back to early life. A novel probabilistic evolutionary model was developed, which allowed us to compute the posterior probabilities and the posterior expectation of the number of LGT events. Simulation studies were performed to validate the model's ability to accurately detect LGT events that have occurred throughout a large phylogeny. Applying the model to the Thy data revealed widespread nonhomologous LGT between and within all 3 domains of life. By reconstructing the ThyA and ThyX gene trees, the most likely donor of each LGT event was inferred. Finally, the role of viruses in LGT of Thy is discussed.
Evolutionary models; lateral gene transfer; thymidylate synthase
Probabilistic evolutionary models have revolutionized our capability to extract biological insights from sequence data. While these models accurately describe the stochastic processes of site-specific substitutions, single-base substitutions represent only a fraction of all the events that shape genomes. Specifically, in microbes, events in which entire genes are gained (e.g. via horizontal gene transfer) and lost play a pivotal evolutionary role. In this research, we present a novel likelihood-based evolutionary model for gene gains and losses, and use it to analyse genome-wide patterns of the presence and absence of gene families. The model assumes a Markovian stochastic process, where gains and losses are represented by transitions from absence to presence and from presence to absence, respectively, given an underlying phylogenetic tree. To account for differences in the rates of gain and loss of different gene families, we assume among-gene family rate variability, thus allowing for more accurate description of the data. Using the Bayesian approach, we estimated an evolutionary rate for each gene family. Simulation studies demonstrated that our methodology accurately infers these rates. Our methodology was applied to analyse a large corpus of data, consisting of 4873 gene families spanning 63 species and revealed novel insights regarding the evolutionary nature of genome-wide gain and loss dynamics.
phyletic pattern; probabilistic evolutionary models; genome evolution; gene gain and loss; horizontal gene transfer; gene content
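The Markovian presence/absence process described above can be illustrated with the textbook two-state continuous-time Markov chain. This is a sketch under assumptions, not the model as published: it uses a single constant gain rate and loss rate along one branch, omits the among-gene family rate variability and the phylogenetic likelihood computation, and the function and parameter names are hypothetical.

```python
from math import exp


def gain_loss_transition_probs(gain, loss, t):
    """Transition probabilities of a two-state gain/loss chain over branch length t.

    States: 0 = gene family absent, 1 = gene family present.
    Returns a 2x2 matrix P where P[i][j] is the probability of ending in
    state j given a start in state i.
    """
    total = gain + loss
    if total <= 0:
        raise ValueError("gain + loss must be positive")
    decay = exp(-total * t)
    # Stationary probabilities of presence and absence.
    pi_present = gain / total
    pi_absent = loss / total
    return [
        [pi_absent + pi_present * decay, pi_present * (1.0 - decay)],  # start absent
        [pi_absent * (1.0 - decay), pi_present + pi_absent * decay],   # start present
    ]


# Illustrative rates only: a family lost three times faster than it is gained.
P = gain_loss_transition_probs(gain=0.5, loss=1.5, t=0.2)
```

Here `P[0][1]` is the probability of a net gain along the branch and `P[1][0]` the probability of a net loss; a full likelihood would combine such matrices over all branches of the tree.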
Bacterial ecological strategies revealed by metabolic network analysis show that ecological diversity correlates with metabolic flexibility, faster growth rate and intense co-habitation.
The growth rate of an organism is an important phenotypic trait, directly affecting its ability to survive in a given environment. Here we present the first large-scale computational study of the association between ecological strategies and growth rate across 113 bacterial species, occupying a variety of metabolic habitats. Genomic data are used to reconstruct the species' metabolic networks and habitable metabolic environments. These reconstructions are then used to investigate the typical ecological strategies taken by organisms in terms of two basic species-specific measures: metabolic variability - the ability of a species to survive in a variety of different environments; and co-habitation score vector - the distribution of other species that co-inhabit each environment.
We find that growth rate is significantly correlated with metabolic variability and the level of co-habitation (that is, competition) encountered by an organism. Most bacterial organisms adopt one of two main ecological strategies: a specialized niche with little co-habitation, associated with a typically slow rate of growth; or ecological diversity with intense co-habitation, associated with a typically fast rate of growth.
The pattern observed suggests a universal principle where metabolic flexibility is associated with a need to grow fast, possibly in the face of competition. This new ability to produce a quantitative description of the growth rate-metabolism-community relationship lays a computational foundation for the study of a variety of aspects of the communal metabolic life.
Laterally transferred genes are shown to be less involved in protein-protein interactions, and essential genes that exhibit low duplicability and high connectivity do exhibit mostly vertical descent.
Lateral gene transfer is a major force in microbial evolution and a great source of genetic innovation in prokaryotes. Protein complexity has been claimed to be a barrier for gene transfer, due to either the inability of a new gene's encoded protein to become a subunit of an existing complex (lack of positive selection), or from a harmful effect exerted by the newcomer on native protein assemblages (negative selection).
We tested these scenarios using data from the model prokaryote Escherichia coli. Surprisingly, the data did not support an inverse link between membership in protein complexes and gene transfer. As the complexity hypothesis, in its strictest sense, seemed valid only for essential complexes, we broadened its scope to include connectivity in general. Transferred genes are found to be less involved in protein-protein interactions, outside stable complexes, and this is especially true for genes recently transferred to the E. coli genome. Thus, subsequent to transfer, new genes probably integrate slowly into existing protein-interaction networks. We show that a low duplicability of a gene is linked to a lower chance of being horizontally transferred. Notably, many essential genes in E. coli are conserved as singletons across multiple related genomes, have high connectivity and a highly vertical phylogenetic signal.
High complexity and connectivity generally do not impede gene transfer. However, essential genes that exhibit low duplicability and high connectivity do exhibit mostly vertical descent.
A leading hypothesis for the role of bacteria in inflammatory bowel diseases is that an imbalance in normal gut flora is a prerequisite for inflammation. Testing this hypothesis requires comparisons between the microbiota compositions of ulcerative colitis and Crohn's disease patients and those of healthy individuals. In this study, we obtained biopsy samples from patients with Crohn's disease and ulcerative colitis and from healthy controls. Bacterial DNA was extracted from the tissue samples, amplified using universal bacterial 16S rRNA gene primers, and cloned into a plasmid vector. Insert-containing colonies were picked for high-throughput sequencing, and sequence data were analyzed, yielding species-level phylogenetic data. The clone libraries yielded 3,305 sequenced clones, representing 151 operational taxonomic units. There was no significant difference between floras from inflamed and healthy tissues from within the same individual. Proteobacteria were significantly (P = 0.0007) increased in Crohn's disease patients, as were Bacteroidetes (P < 0.0001), while Clostridia were decreased in that group (P < 0.0001) in comparison with the healthy and ulcerative colitis groups, which displayed no significant differences. Thus, the bacterial flora composition of Crohn's patients appears to be significantly altered from that of healthy controls, unlike that of ulcerative colitis patients. Imbalance in flora in Crohn's disease is probably not sufficient to cause inflammation, since microbiotas from inflamed and noninflamed tissues were of similar compositions within the same individual.
The type III secretion system (T3SS) is an important virulence factor used by several gram-negative bacteria to deliver effector proteins which subvert host cellular processes. Enterohemorrhagic Escherichia coli O157 has a well-defined T3SS involved in attachment and effacement (ETT1) and critical for virulence. A gene cluster potentially encoding an additional T3SS (ETT2), which resembles the SPI-1 system in Salmonella enterica, was found in its genome sequence. The ETT2 gene cluster has since been found in many E. coli strains, but its in vivo role is not known. Many of the ETT2 gene clusters carry mutations and deletions, raising the possibility that they are not functional. Here we show the existence in septicemic E. coli strains of an ETT2 gene cluster, ETT2sepsis, which, although degenerate, contributes to pathogenesis. ETT2sepsis has several premature stop codons and a large (5 kb) deletion, which is conserved in 11 E. coli strains from cases of septicemia and newborn meningitis. A null mutant constructed to remove genes coding for the putative inner membrane ring of the secretion complex exhibited significantly reduced virulence. These results are the first demonstration of the importance of ETT2 for pathogenesis.
There are many ways to group completed genome sequences in hierarchical patterns (trees) reflecting relationships between their genes. Such groupings help us organize biological information and bear crucially on underlying processes of genome and organismal evolution. Genome trees make use of all comparable genes but can variously weight the contributions of these genes according to similarity, congruent patterns of similarity, or prevalence among genomes. Here we explore such possible weighting strategies, in an analysis of 142 prokaryotic and 5 eukaryotic genomes. We demonstrate that alternate weighting strategies have different advantages, and we propose that each may have its specific uses in systematic or evolutionary biology. Comparisons of results obtained with different methods can provide further clues to major events and processes in genome evolution.
Extraintestinal pathogenic Escherichia coli strains (ExPEC) are the cause of a diverse spectrum of invasive infections in humans and animals, and these infections often lead to septicemia. Strains of serogroups O2 and O78 of E. coli are involved in human urinary tract infections and newborn meningitis and also constitute the major serotypes involved in avian colisepticemia. In the present study we compared the unique genomic sequences of two such septicemic strains, strains O2-1772 and O78-9, obtained by suppression subtractive hybridization. Evaluation of the degree of similarity between these two strains, which cause the same disease, revealed a high degree of diversity, with only a few shared genes. Subsequently, additional strains of each serogroup of human and animal origin were screened by PCR, and the results provided further evidence for the existence of a high degree of genome plasticity. These results were unexpected, in view of data showing that the two O157:H7 strains that have been sequenced are nearly identical in terms of virulence factors. Furthermore, the data obtained for the septicemic strains suggest that each step in the infection can be mediated by a number of alternative virulence factors, indicating the existence of a mix-and-match combinatorial system. Although whole-genome comparisons of E. coli strains causing different diseases have shown great differences in gene contents, we show that such differences exist even within strains that cause the same disease and that target the same host tissues. Moreover, in addition to the high level of genome plasticity, we show that the large pool of virulence genes in the septicemic strains is independent of the host, implying a high degree of zoonotic risk.