|Home | About | Journals | Submit | Contact Us | Français|
Shuffling and disruption of operons and horizontal gene transfer are major contributions to the new, dynamic view of prokaryotic evolution. Under the 'selfish operon' hypothesis, operons are viewed as mobile genetic entities that are constantly disseminated via horizontal gene transfer, although their retention could be favored by the advantage of coregulation of functionally linked genes. Here we apply comparative genomics and phylogenetic analysis to examine horizontal transfer of entire operons versus displacement of individual genes within operons by horizontally acquired orthologs and independent assembly of the same or similar operons from genes with different phylogenetic affinities.
Since a substantial number of operons have been identified experimentally in only a few model bacteria, evolutionarily conserved gene strings were analyzed as surrogates of operons. The phylogenetic affinities within these predicted operons were assessed first by sequence similarity analysis and then by phylogenetic analysis, including statistical tests of tree topology. Numerous cases of apparent horizontal transfer of entire operons were detected. However, it was shown that apparent horizontal transfer of individual genes or arrays of genes within operons is not uncommon either and results in xenologous gene displacement in situ, that is, displacement of an ancestral gene by a horizontally transferred ortholog from a taxonomically distant organism without change of the local gene organization. On rarer occasions, operons might have evolved via independent assembly, in part from horizontally acquired genes.
The discovery of in situ gene displacement shows that combination of rampant horizontal gene transfer with selection for preservation of operon structure provides for events in prokaryotic evolution that, a priori, seem improbable. These findings also emphasize that not all aspects of operon evolution are selfish, with operon integrity maintained by purifying selection at the organism level.
Operons, clusters of co-transcribed genes that often encode functionally linked proteins, are the principal form of gene organization and regulation in prokaryotes [1,2]. Comparative analysis of bacterial and archaeal genomes has shown that only a few operons are conserved across large evolutionary distances. In general, gene order in prokaryotes is poorly conserved and prone to numerous rearrangements [3-6]. A detailed analysis of gene order conservation has shown that only 5-25% of the genes in bacterial and archaeal genomes belongs to gene strings (probable operons) shared by at least two distantly related species . The presence of identical or similarly organized operons and suboperons in phylogenetically distant bacterial or archaeal lineages may result from three distinct evolutionary processes. Firstly, inheritance from the respective common ancestor - the core of the ribosomal protein superoperon is a case in point, but such conservation of operon organization is relatively rare; secondly, independent origin of identical operons or suboperons in different lineages; and thirdly, emergence of operons in a single lineage with subsequent dissemination by horizontal transfer. The potential central role of horizontal transfer in the evolution of operon organization of prokaryotic genomes is embodied in the 'selfish operon model' (SOM) [8-10]. This model posits that "the physical proximity of genes in an operon provides no selective benefit to the individual organism but does enhance the fitness of the gene cluster itself, as clusters can be efficiently inherited horizontally as well as vertically" . Under SOM, operons are conceptually analogous to integrating viruses (phages), transposons and other mobile genetic elements, although coregulation of the genes in an operon could be an important selective factor that favors retention of operons during evolution.
Horizontal gene transfer (HGT) events have been classified into distinct categories of acquisition of new genes, acquisition of paralogs of existing genes and xenologous gene displacement whereby a gene is displaced by a horizontally transferred ortholog from another lineage (xenolog ). Each of these types of horizontal transfer is common among prokaryotes, but their relative contributions differ in different lineages . Comparative-genomic analyses by many groups have suggested that, on the whole, horizontal gene transfer had substantial effects, albeit uneven in different lineages, on the gene content of bacterial and archaeal genomes [13-19]. However, in spite of the considerable popularity of the selfish operon theory, we are unaware of systematic studies of horizontal gene transfer events at the level of operons. In part, this is likely to have been caused by the scarcity of experimental data on operon organization in any prokaryote other than Escherichia coli.
Recent phylogenetic analyses of ribosomal proteins revealed several instances of apparent xenologous gene displacement within a conserved operon, in which other genes have not been horizontally transferred; in other words, these operons appear to represent an evolutionary mosaic [20-22]. Another study demonstrated a complicated mosaic organization of the leukotoxin operon in bacteria of the genus Mannheimia (Pasteurella); the observed evolutionary pattern had to be explained through multiple gene transfer events, which led to the hypothesis that, in this case, frequent gene displacement conferred selective advantage onto the bacterium by maintaining antigenic variation . In earlier studies, evolution of operons from gene blocks with distinct evolutionary fates has been considered for rfb operons coding for lipopolysaccharide biosynthesis in enterobacteria .
To assess the role of horizontal gene transfer in the evolution of operons systematically, we undertook phylogenetic analysis of members of highly conserved gene neighborhoods that are predicted to constitute operons . We focused primarily on mosaic operons in which one or more of the genes apparently have been transferred from distantly related species such that the phylogeny of the transferred genes is obviously incongruent with the phylogeny of the remaining genes in the respective operons.
Experimental data on operons in organisms other than E. coli and, to a lesser extent, B. subtilis are scarce. Therefore we used conserved gene pairs and connected gene neighborhoods associated with them as an approximation of operon organization of genes in other prokaryotic genomes. Several studies have suggested strongly that all gene pairs that are conserved in multiple genomes belong to the same operon [7,25,26]. Here we used an extremely conservative threshold (conservation of a gene pair in 10 genomes) to ensure that only genuine operons were analyzed. BLASTP searches for potential horizontal gene transfer identified 729 candidate genes (9% of all genes comprising conserved neighborhoods in 41 analyzed genomes), that is, genes whose encoded protein sequences were more similar to homologs from phylogenetically distant taxa than to those from the reference taxon (it might be worth noting that, throughout this analysis, we treated genes as atomic units and did not consider the relatively unlikely possibility of HGT for portions of genes). Phylogenetic analysis of these genes and their neighbors revealed different types of evolutionary events, some of which involve whole operons, whereas others seem to reflect operon mosaicity.
Probable horizontal transfer of whole operons or large portions of operons, when phylogenetic trees for all genes in a predicted operon had the same topology (which, however, was incompatible with the species tree) was identified in 35 neighborhoods - approximately one third of all analyzed neighborhoods. These events were classified into three categories: acquisition of a new (for the given lineage) operon, paralogous operon acquisition and xenologous operon displacement . Examples of all these classes of apparent operon transfer events are given in Table Table1.1. These 35 neighborhoods generally represented functional classes of genes known to be prone to HGT: transporters, general metabolism-related genes and signal transduction systems [13,15,17]. This seems to be a relatively low level of horizontal transfer in view of the purported selfish behavior of operons [9,10]. However, the strict threshold, described above, on the detection of conserved gene pairs undoubtedly led to many horizontally transferred operons being missed. Thus, the present analysis gives a conservative low bound of operon transfer.
In addition, 19 predicted operons with different phylogenetic affinities of the constituent genes, that is, apparent mosaic operons, were identified (Table (Table2).2). Again, this is definitely a low bound - not only because of the high threshold set for the identification of conserved gene pairs, but also because this number includes only cases that were clearly resolved by phylogenetic tree analysis. In addition, we detected many uncertain cases where the different phylogenetic affinities of genes within an operon were not strongly supported (data not shown); at least some of these are probably also mosaic operons.
Below we describe in greater detail several case studies of putative mosaic operons; in each of these cases, in addition to the basic set of 41 species, we included in the analysis the apparent orthologs of the respective proteins from all prokaryotic species in which they were detected, in order to control for possible effects of taxon sampling. We found that, although the details of tree topology inevitably depended on the set of species analyzed, the conclusions regarding HGT were not affected by the inclusion of additional species.
In the previous study that prompted this work, we analyzed the phylogeny of several ribosomal proteins and found several cases of apparent horizontal transfer resulting in mosaic operon organization . Horizontal transfer "in the heart of the ribosome" also has been independently described by others [21,22]. Here we report another case of a ribosomal protein operon with apparent in situ gene displacement (that is, displacement without change of the local gene arrangement) via HGT. Figure Figure1a1a shows the highly conserved gene arrangement around the gene for the large subunit protein L29. The phylogenetic trees for the flanking L16 and S17 genes showed largely congruent topologies without any indications of HGT (Figure 1b,d). In contrast in the L29 tree, unexpected clustering is seen for Aquifex aeolicus and both Rickettsia: the Aquifex branch is within the archaeal cluster, whereas the Rickettsia group is with Chlamydia, rather than with the rest of alpha-proteobacteria: the taxon where Rickettsia belong (Figure (Figure1c).1c). In situ displacement is the most likely mechanism behind this observation given that the structure of this operon is conserved in the majority of bacteria. The nature of the selective advantages conferred by this gene substitution is unclear, but the apparent sources of the transferred genes suggest that the displacements indeed might be adaptive. Aquifex apparently acquired the L29 gene from archaea, which could be related to the adaptation to the hyperthermal conditions, whereas Rickettsia probably captured the gene from other parasitic bacteria, such as Chlamydia. However, these observations also allow a non-adaptationist interpretation, under which the apparent source of acquired genes simply reflects the increased likelihood of gene exchange between the respective organisms due to co-habitation, with chance fixation of some of the transferred genes.
The genes for Holliday junction resolvase subunits RuvA and RuvB form an operon that is conserved in most of the sequenced bacterial genomes (Figure (Figure2a).2a). In the phylogenetic trees for RuvA and RuvB, the branch that includes Ureaplasma and Mycoplasma occupies drastically different positions. In contrast to RuvA, which belongs to the Gram-positive clade as expected (Figure (Figure2b),2b), mycoplasmal RuvB clusters with the epsilon-proteobacteria (Helicobacter and Campylobacter) and the mycoplasma-epsilon-proteobacteria clade further joins alpha-proteobacteria (Figure (Figure2c).2c). This clustering is strongly supported by bootstrap analysis and was shown to be robust using statistical tests of tree topology (Table (Table3).3). Thus, the ruvB gene seems to have undergone xenologous displacement in situ after the divergence of the mycoplasmal branch from the rest of Gram-positive bacteria. Notably, the gene exchange seems to have occurred between phylogenetically distant parasitic bacteria.
In Rickettsia, the undecaprenyl pyrophosphate synthase gene (uppS), which belongs to a highly conserved doublet of lipid biosynthesis genes embedded in functionally diverse operons (Figure (Figure3a),3a), clusters with an unexpected assemblage of bacterial orthologs, including those from the spirochete Treponema pallidum and Fusobacterium nucleatum, but not with the 'native' taxon, alpha-proteobacteria (Figure 3b,c). Statistical testing of the tree topology showed that clustering of rickettsial uppS with those from other alpha-proteobacteria is highly unlikely (Table (Table3).3). The apparent in situ gene displacement of the uppS gene in Rickettsia was accompanied by a breakdown of the operon into three fragments (Figure (Figure3a).3a). The topology of the uppS tree suggests the possibility of multiple HGT events, although only the rickettsial genomes show evidence of gene displacement in situ. The emergence of gene displacement in bacterial parasites is noted here again.
Gene organization in the NADH:ubiquinone oxidoreductase operon is highly conserved in all sequenced archaeal genomes and those of several groups of bacteria (Figure (Figure4a).4a). The nuoI gene of Halobacterium sp. shows an unexpected phylogenetic affinity with proteobacteria (Figure (Figure4c),4c), whereas the neighboring genes have the regular archaeal affinities (Figure 4b,d). The unusual phylogeny of halobacterial NuoI, which was strongly supported by statistical tests (Table (Table3),3), suggests in situ displacement by a proteobacterial gene. Notably, all three NADH:ubiquinone oxidoreductase subunits of the cyanobacteria unexpectedly grouped within the archaeal clusters of the respective trees (Figure 4b-d). These observations point to a complex history of HGT for the genes encoding all subunits of NADH:ubiquinone oxidoreductase.
The genes of the lipopolysaccharide biosynthesis (rfbABCD) operon appear to have been extensively and independently shuffled in many prokaryotic genomes and might have undergone multiple horizontal transfers. This conclusion is supported both by examination of the operon organization (Figure (Figure5a)5a) and by phylogenetic tree analysis (Figure 5b-e). The trees showed a clear affinity between the rfbA, rfbB, rfbC genes of Methanothermobacter thermoautotrophicum and Clostridium acetobutylicum (Figure 5b-d), with Fusobacterium nucleatum and Listeria monocytogenes joining the cluster in the case of rfbB (Figure (Figure5b),5b), whereas M. thermoautotrophicum RfbD clustered with its archaeal orthologs as expected (Figure (Figure5e).5e). The genes of the rfbABCD operon in Methanothermobacter are shuffled compared to the probable ancestral order, which is found in many bacteria and C. acetobutylicum also shows a rearrangement (Figure (Figure5a).5a). One likely scenario in this case is that M. thermoautotrophicum acquired the rfbABCD operon with the typical gene order from a bacterium of the clostridial lineage, which was followed by displacement of three resident genes and loss of one of the invading genes, accompanied by operon rearrangement. An alternative scenario is that the rearrangement occurred in the source bacterium of the clostridial group and Methanothermobacter acquired only the rfbACB portion, which might have inserted head-to-tail downstream of the original operon, followed by elimination of the resident rfbABC (Figure (Figure5a5a).
Another interesting case of mosaic structure of the same operon is seen in Deinococcus radiodurans (Figure (Figure5a).5a). Deinococcus RfbA shows clear affinity with proteobacteria (Figure (Figure5d),5d), whereas RfbD is of archaeal descent (Figure (Figure5e),5e), with RELL analysis revealing no competing topologies (Table (Table3).3). The remaining two genes of this operon in Deinococcus, rfbB (DRA0041) and rfbC (DRA0043), have uncertain phylogenetic affinities (Figure 5b,5c). Thus, as in the case of M. thermoautotrophicus, this operon in Deinococcus was apparently formed through at least two events of xenologous gene displacement in situ and gene shuffling.
Perhaps the most prominent case of mosaic operon organization is the leucine/isoleucine biosynthesis operon of several bacteria and archaea, particularly Thermotoga maritima. This is the only known branched chain amino acid biosynthesis operon, and it is partly conserved in a wide range of bacteria (Figure (Figure6a).6a). Following initial indications from the analysis of taxon-specific BLAST hits, we constructed phylogenetic trees for each of the genes of this operon. Unlike other bacteria, Thermotoga has two leuA paralogs, which are adjacent in the operon. The proteins encoded by these paralogous genes show clearly distinct phylogenetic affinities: TM0552 belongs to a distinct clade within the archaeal domain, whereas TM0553 is part of a Gram-positive bacterial cluster (Figure (Figure6b).6b). This phylogenetic mosaic in Thermotoga extends further, with LeuB (TM0556) clustering with proteobacterial orthologs (Figure (Figure6c),6c), and LeuC (TM0554) and LeuD (TM0555) with archaeal orthologs (Figure 6d,e). All these affinities were strongly supported by two versions of bootstrap analysis (Table (Table3).3). The genes encoding LeuA, LeuC, and LeuD from Thermotoga, Clostridium, Aquifex and both Pyrococcus abyssi and P. furiosus belong to a well-defined clade, which also includes a medley of alpha-proteobacteria and cyanobacteria, within the archaeal domain in the respective trees (Figure 6b-e). Thus, this sub-operon apparently has been relatively recently horizontally spread among these organisms. Pyrococcus abyssi and P. furiosus probably acquired these genes after the divergence from the common ancestor with P. horikoshii because the latter has only the typical archaeal operon (Figure (Figure6a6a).
Given the apparent propensity of Thermotoga (and other hyperthermophilic bacteria) for acquisition of archaeal genes via HGT, it seems most likely that the archaeal version of the leuACD suboperon originally entered the bacterial domain via Thermotoga or a related thermophilic bacterium. Formally, in Thermotoga these events could be classified as a combination of paralogous (sub)operon acquisition (TM0554-TM0555 in addition to another paralogous archaeal gene pair TM0291-TM0292) and xenologous gene displacements (genes TM0553, TM0556). In Clostridium, xenologous operon displacement seems to have occurred because the ancestral operon of the Gram-positive type apparently had been lost. The subsequent evolution of this operon in the four organisms proceeded along different paths. Aquifex has lost the operon structure even for the two subunits of 3-isopropylmalate dehydratase (LeuB, LeuD). Different genes in the operons of P. abyssi and C. acetobutylicum have been translocated and several genes probably have been independently accrued (Figure (Figure6a).6a). In both P. abyssi and Thermotoga, the original leuA and leuB genes within the leuABDC core seem to have been independently displaced by bacterial orthologs without a clear affinity with any specific bacterial lineage (Figure (Figure6a).6a). The most likely scenario for evolution of this operon in Thermotoga is that it originated as a Gram-positive type operon and subsequently many genes (or sub-operons) have been displaced in situ through multiple horizontal transfers and a few additional genes have been inserted into the preexisting structure. The alternative but less likely hypothesis involves independent, de novo operon assembly from genes of different phylogenetic affinities. Several other apparent HGT events were detected during the analysis of the phylogenetic trees for leucine biosynthesis genes (DR1614 in LeuD tree, DR1610 in LeuC tree (Figure 6d,e)) but, in these cases, the acquired genes do not belong to conserved operons.
Intragenomic plasticity and inter-species horizontal mobility of operons are thought to be important facets of prokaryotic genome evolution. Indeed, the results presented here indicate that horizontal transfer of entire operons is the most likely explanation for most of the findings of co-localized 'alien' genes in a genome, which is generally consistent with SOM. However, a substantial fraction - approximately 35% - of operons with indications of horizontal transfer events appear to consist of genes with different phylogenetic affinities. Barring artifacts of phylogenetic analysis, which can never be ruled out completely, but appear unlikely given the strong statistical support for the anomalous placement of the genes in question in phylogenetic trees, two evolutionary scenarios for the origin of such mosaic operons are conceivable. The first involves de novo assembly of operons, in part from genes acquired via HGT, whereas the second one postulates in situ xenologous displacement of genes within a resident operon. Analysis of mosaic operons suggested that both scenarios might apply, but in situ displacement is likely to be more frequent. In several cases, in situ displacement seems to have occurred between genomes of distantly related parasitic bacteria that might have shared a host. A sequence of events that is often considered as an alternative to HGT is an ancient duplication with subsequent differential loss of paralogs. However, in the cases analyzed here, this seems to be a particularly remote possibility because a tandem duplication followed by lengthy evolution of both paralogs within the operon would be required to mimic in situ displacement. Tandem pairs of paralogs are uncommon in operons and such a 'smoking gun' was not observed in any of the suspected cases of in situ displacement.
At first glance, in situ gene displacement seems highly unlikely: given the vast evolutionary distance separating the donor and recipient genomes, homologous recombination is out of the question. In cases when the displacing gene(s) is located on the periphery of an operon (for example, Figure Figure5a),5a), a plausible mechanism could involve initial insertion of the invading gene in the vicinity of the resident operon, followed by deletion of intervening genes (provided these are non-essential). However, when the displacing gene is tucked between resident ones (for example, Figures Figures4a,4a, ,6a),6a), displacement must have occurred with surgical precision. The only conceivable explanation seems to be that HGT is extremely common in the evolution of prokaryotes and so is intragenomic recombination, which provides for rare chance occurrences of in situ displacement. Conceivably, a horizontally acquired gene that displaces the resident ortholog without disruption of operon organization would have its chances of evolutionary fixation greatly increased, hence the apparent disproportional survival of the displacing genes. This explanation does not refute SOM as the conceptual framework explaining the origin of operons but emphasizes the 'altruistic' aspect of the evolution of operons whereby the operon integrity is maintained by strong purifying selection at the organism level.
Amino acid sequences from 41 completely sequenced prokaryotic genomes were extracted from the Genome division of the Entrez retrieval system  and used as the master species set for this analysis. Bacterial species abbreviations: Aquifex aeolicus (Aae), Bacillus halodurans (Bha), Bacillus subtilis (Bsu), Streptococcus pyogenes (Spy), Staphylococcus aureus (Sau), Clostridium acetobutylicum (Cac), Borrelia burgdorferi (Bbu), Campylobacter jejunii (Cje), Chlamydia trachomatis (Ctr), Chlamydophila pneumoniae (Cpn), Deinococcus radiodurans (Dra), Escherichia coli (Eco), Haemophilus influenzae (Hin), Helicobacter pylori (Hpy), Lactococcus lactis (Lla), Mesorhizobium loti (Mlo), Mycoplasma genitalium (Mge), Mycoplasma pneumoniae (Mpn), Mycobacterium tuberculosis (Mtu), Mycobacterium leprae (Mle), Pasteurella multocida (Pmu), Neisseria meningitidis (Nme), Pseudomonas aeruginosa (Pae), Rickettsia prowazekii (Rpr), Rickettsia conorii (Rco), Synechocystis PCC6803 (Ssp), Thermotoga maritima (Tma), Treponema pallidum (Tpa), Vibrio cholerae (Vch), Xylella fastidiosa (Xfa), Buchnera sp. (Bsp), Caulobacter crescentus (Ccr), and Ureaplasma urealyticum (Uur). Archaeal species abbreviations: Aeropyrum pernix (Ape), Archaeoglobus fulgidus (Afu), Halobacterium sp. (Hsp), Methanothermobacter thermoautotrophicum (Mth), Methanococcus jannaschii (Mja), Pyrococcus horikoshii (Pho), Pyrococcus abyssi (Pab), Thermoplasma volcanium (Tvo), Thermoplasma acidophilum (Tac), Sulfolobus solfataricus (Sso). In addition, the following species were included in the case studies described in the text; bacteria: Agrobacterium tumefaciens (Atu), Bifidobacterium longum (Blo), Brucella melitensis (Rso), Chlorobium tepidum (Cte), Enterococcus faecalis (Efa), Fusobacterium nucleatum (Fnu), Lactobacillus plantarum (Lpl), Leptospira interrogans serovar (Lint), Listeria innocua (Lin), Listeria monocytogenes (Lmo), Nitrosomonas europaea (Neu), Nostoc sp. (Nsp), Oceanobacillus iheyensis (Oih), Ralstonia solanacearum (Rso), Sinorhizobium meliloti (Sme), Streptomyces coelicolor (Sco), Thermoanaerobacter tengcongensis (Tte), Thermosynechococcus elongatus (Tel), Xanthomonas campestris (Xca), Shewanella oneidensis (Son); archaea: Methanopyrus kandleri (Mka), Methanosarcina acetivorans (Mac), Pyrobaculum aerophilum (Pae), Pyrococcus furiosus (Pfu).
Gene neighborhoods for the 41 compared genomes were reconstructed as previously described . Briefly, the collection of clusters of orthologous groups of proteins from complete genomes (COGs)  was used as the source of information on orthologous relationships for detecting conserved gene pairs. For the purpose of this analysis only 'highly conserved' gene pairs were considered, that is, those formed by genes from two COGs that were present in the same orientation and separated by less than three genes in at least 10 of the compared genomes. This conservative approach was adopted in order to ensure that all analyzed gene pairs belong to the same operon. At the next step, overlapping gene pairs were joined in triplets; each triplet was required to exist in at least one genome. Overlapping triplets were used to construct gene arrays by run search in an oriented graph; a gene array may or may not be found in its entirety in any available genome. Finally, gene arrays that shared at least three COGs were clustered into neighborhoods by using a single-linkage clustering algorithm . Conserved gene pairs that did not belong to the reconstructed gene arrays were also analyzed.
The protein sequences encoded by the genes of each neighborhood were searched against the non-redundant protein sequence database (NCBI, NIH, Bethesda) using the BLASTP program. The BLAST hits were analyzed to identify their potential phylogenetic affinity. For each protein, the best hits were identified to the taxon to which the given species belongs (hereinafter, reference taxon) and to other major taxa; hits to closely related species were disregarded (see Table 1S in the additional data file). Proteins that had more significant (lower E-value) hits to a non-reference taxon than to the reference taxon were considered candidates for horizontal transfer and the respective orthologous protein clusters were subject to further phylogenetic analysis as described in the next section. If phylogenetic analysis indicated that a particular gene was likely to be horizontally transferred, phylogenetic trees were built also for the genes predicted to belong to the same operon. When different phylogenetic affinities were found for genes of the same predicted operon, this operon was considered to be 'mosaic'.
Multiple protein sequence alignments were constructed using the T-Coffee program  and positions containing >70% gaps were excluded. Distance trees were constructed by using the least-square method as implemented in the FITCH program of the PHYLIP package [30,31]. The least-square trees were subjected to maximum-likelihood local rearrangement using the ProtML program of the MOLPHY package, with the JTT-F model of amino acid substitutions [32,33]. The resulting trees are a surrogate for maximum-likelihood phylogenies; exhaustive maximum-likelihood tree construction is impractical for the number of species analyzed here. Bootstrap analysis was performed for each maximum-likelihood tree using the Resampling of Estimated Log-Likelihoods (RELL) method as implemented in MOLPHY [32-34]. Alternative placements of selected clades in maximum-likelihood trees were compared by using the rearrangement optimization (Kishino-Hasegawa) method as implemented in the ProtML program .
Additional data, including schematics of operon organization and phylogenetic trees for all gene clusters listed in Table 2
We thank Jeffrey Lawrence for critical reading of the manuscript. Marina V. Omelchenko is supported by a grant from the US Department of Energy (Office of Biological and Environmental Research, Office of Science) grants DE-FG02 01ER63220 from the Genomes to Life Program.