With billions of years of evolution behind it, Nature has continually expanded and optimized its biosynthetic capabilities. Chemically complex secondary metabolites continue to challenge and inspire today’s most talented synthetic chemists. Even a brief glance at these natural products, especially the substantial structural variation within a single class of compounds, shows that Nature has long played the role of medicinal chemist. The recent explosion in genome sequencing has expanded our appreciation of natural product space and of the vast uncharted territory that remains. One small corner of this chemical space is occupied by the recently dubbed thiazole/oxazole-modified microcins (TOMMs), ribosomally produced peptides bearing posttranslationally installed heterocycles derived from cysteine, serine and threonine residues. As with other classes of natural products, the genetic capacity to synthesize TOMMs has been widely disseminated among bacteria. Over evolutionary timescales, Nature has tested countless random mutations in TOMM biosynthetic gene clusters and selected for gains of function, yielding several privileged molecular scaffolds. Today, this burgeoning class encompasses a structurally and functionally diverse set of molecules (e.g. microcin B17, the cyanobactins, and the thiopeptides). TOMMs presumably provide their producers with an ecological advantage: known examples include chemical weapons wielded in the competition for nutrients, disease-promoting virulence factors, and compounds thought to benefit symbiosis. Despite this plethora of functions, many TOMMs await experimental interrogation. This review will focus on the biosynthesis and natural combinatorial diversity of the TOMM family.
One strategy that Nature employs to produce biologically active secondary metabolites is to use existing machinery to synthesize inactive precursor peptides. Upon posttranslational modification, these inactive precursors are structurally rigidified and endowed with function. The major advantages of this strategy are that (i) producer organisms need not start from scratch, as all organisms possess ribosomes; (ii) amino acid mutations, which provide chemical diversity, arise at the genomic level through simple codon changes; and (iii) conversion of inactive precursors to active products can be controlled temporally and spatially at the posttranslational level, allowing organisms to respond quickly to environmental changes. Two major classes of peptide-derived natural products follow this strategy: the lantipeptides and the TOMMs. The lantipeptides contain (methyl)lanthionine crosslinks, which are installed through the Michael-type addition of a cysteine thiol to a dehydrated serine or threonine. In the TOMMs, cysteine, serine, and threonine residues are instead heterocyclized, which conformationally restrains the peptide.
Indeed, thiazole and oxazole heterocycles are ubiquitous in bioactive molecules and show impressive functional versatility in TOMM, non-TOMM, and synthetic products. Examples include thiostrepton (50S ribosome inhibitor), trunkamide (anticancer compound), microcin B17 (DNA gyrase inhibitor), goadsporin (secondary metabolism inducer), yersiniabactin (siderophore), and ritonavir (HIV-1 protease inhibitor) (Figure 1b). It is therefore not surprising that Nature has devised two enzymatic solutions for constructing thiazol(in)es and oxazol(in)es: one operates on non-ribosomal peptides and the other on ribosomal peptides (TOMMs). In both systems, cyclodehydration converts amino acids bearing a beta-nucleophile (cysteine, serine, or threonine) into thiazoline or (methyl)oxazoline rings [8•,9]. Select azoline rings can then be oxidized to azoles, a step that, in characterized cases, is catalyzed by a flavin mononucleotide (FMN)-dependent dehydrogenase [8•,10]. The non-ribosomal peptide synthetase (NRPS) and TOMM enzymes that catalyze cyclodehydration share no detectable amino acid sequence similarity and thus present an example of convergent evolution in biosynthesis.
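Because cyclodehydration removes one molecule of water per ring and oxidation to the azole removes two further hydrogen atoms, each modification leaves a characteristic mass change that mass spectrometry can detect. A minimal sketch of that arithmetic, assuming monoisotopic masses; the function name `mass_shift` is a hypothetical illustration:

```python
# Monoisotopic masses (Da) of the small molecules lost during azol(in)e formation.
H2O = 2 * 1.007825 + 15.994915  # lost per cyclodehydration (Cys/Ser/Thr -> azoline)
H2 = 2 * 1.007825               # lost per oxidation (azoline -> azole)


def mass_shift(n_azolines: int, n_azoles: int) -> float:
    """Expected monoisotopic mass change (Da) of a peptide carrying
    n_azolines unoxidized rings and n_azoles oxidized rings.

    Every azole was first an azoline, so it costs one H2O plus one H2.
    """
    if n_azolines < 0 or n_azoles < 0:
        raise ValueError("ring counts must be non-negative")
    return -((n_azolines + n_azoles) * H2O + n_azoles * H2)


if __name__ == "__main__":
    print(f"{mass_shift(1, 0):.4f}")  # -18.0106: one thiazoline or (methyl)oxazoline
    print(f"{mass_shift(0, 1):.4f}")  # -20.0262: one thiazole or (methyl)oxazole
```

Counting rings this way only gives the net shift; locating which residues were modified requires fragmentation (tandem MS) data.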
In addition to the cyclodehydratase and optional dehydrogenase, all characterized TOMM clusters contain a ‘docking protein’. The precise role of the docking protein has been debated, but the available data suggest that it directly regulates cyclodehydratase activity and the assembly of an active synthetase complex [11,12]. Although its exact function remains unclear, Nature has hinted at its importance: in approximately half of all known TOMM clusters, the docking protein gene is fused to the C-terminus of the cyclodehydratase. In the remaining cases, the cyclodehydratase is encoded by a separate open reading frame but is expected to form a complex with the docking protein, as demonstrated in microcin B17 biosynthesis [8•]. Beyond the heterocyclization machinery, TOMM clusters can also encode other posttranslational modification enzymes, a topic discussed later in this review.
Like the other classes of ribosomal natural products, all characterized TOMM precursor peptides are bipartite. The N-terminal region, known as the leader peptide, contains key recognition motifs for the biosynthetic machinery [13••]. The C-terminal region, referred to as the core peptide, is rich in heterocyclizable residues and can be the site of numerous other posttranslational modifications [13••]. From the perspective of the producing organism, diversifying a TOMM is expected to be facile because the composition of the core peptide is relatively easy to alter. As long as key binding sites are present within the leader peptide, the promiscuous synthetase complex will heterocyclize highly variable core peptides [15•,16,17•]. By contrast, producing alternative NRPS-derived natural products is expected to require much more substantial genetic rearrangement, because each residue is typically selected and activated by a dedicated module of a large multidomain enzyme.
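The bipartite layout can be captured in a small data structure that keeps the leader and core separate and flags the residues a cyclodehydratase could act on. A minimal sketch, assuming one-letter amino acid codes; the class, its fields, and the toy sequences are hypothetical illustrations, not taken from any characterized TOMM:

```python
from dataclasses import dataclass

# Residues with a beta-nucleophile that TOMM cyclodehydratases can act on.
HETEROCYCLIZABLE = frozenset("CST")


@dataclass(frozen=True)
class Precursor:
    leader: str  # N-terminal recognition region, carries the binding motifs
    core: str    # C-terminal region that receives the modifications

    @property
    def sequence(self) -> str:
        """Full ribosomally encoded precursor (leader followed by core)."""
        return self.leader + self.core

    def candidate_sites(self) -> list[int]:
        """1-based core positions holding Cys, Ser or Thr.

        These are only candidates: whether a site is actually cyclized also
        depends on flanking residues and enzyme specificity.
        """
        return [i for i, aa in enumerate(self.core, start=1) if aa in HETEROCYCLIZABLE]


# Hypothetical toy sequences for illustration; not a real TOMM precursor.
toy = Precursor(leader="MEKLNNAQ", core="GCSGGTGAC")
print(toy.candidate_sites())  # [2, 3, 6, 9]
```

Keeping the leader as a separate field mirrors the division of labor described above: recognition lives in one part, chemistry happens in the other.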
In this review, we use ‘TOMM’ as an umbrella term to refer to all ribosomal peptides posttranslationally modified by proteins with identifiable amino acid similarity to known cyclodehydratases and docking proteins. Such natural products have appeared in several recent reviews [1,2,14,19–27]. This review, however, will focus on the biosynthesis and natural combinatorial diversity of the TOMM family.
The general biosynthetic route to microcin B17 (Figure 1b), a TOMM from select strains of Escherichia coli, was first reported in 1996 [8•]. This seminal paper showed that a trimeric synthetase complex composed of McbB (cyclodehydratase), McbC (dehydrogenase), and McbD (docking protein) installed thiazole and oxazole moieties onto the McbA precursor peptide. This first enzymatic insight into microcin B17 maturation prompted numerous additional studies. In one, site-directed mutagenesis was combined with western blotting and mass spectrometry (MS) to dissect how the first bisheterocycle site (Gly-Ser-Cys-Gly, boxed in Figure 1b) is formed. In vitro experiments suggested that the cysteine is cyclized first and the resulting monoheterocycle is then converted into the bisheterocycle. When this site was altered to Gly-Cys-Ser-Gly (Ser/Cys order reversed), it failed to mature to the bisheterocycle and predominantly formed a single thiazole. This observation was notable because a naturally occurring Gly-Cys-Ser-Gly sequence elsewhere in McbA is converted into a bisheterocycle (Figure 1b). Heterocycle formation therefore depends not only on the cyclizable residue but also on its flanking residues. The glycine preceding the first bisheterocycle site proved crucial, as mutating it to alanine precluded heterocycle formation. Replacing the C-terminal flanking glycine had a lesser effect: heterocycles still formed when it was mutated to alanine, valine, or asparagine, although the valine and asparagine mutants reacted approximately two-fold more slowly. Cysteine residues were also shown to cyclize 100-fold to 1000-fold faster than serine residues, depending on their location, presumably owing to the greater nucleophilicity of thiols relative to alcohols.
Heterocycle formation in microcin B17 also appeared to depend on distance. Shortening the naturally occurring 10-glycine linker between the leader sequence and the core domain (boldface/underlined in Figure 1b) decreased the rate of heterocyclization, whereas lengthening it to 11 glycines increased the rate two-fold (Figure 1b). The order of heterocycle formation was also determined for microcin B17 by tandem MS: heterocycles were installed directionally, from the N-terminus to the C-terminus. Furthermore, McbBCD released the McbA substrate after each heterocycle was formed, making microcin B17 biosynthesis a distributive process. Together, these biochemical studies indicated that the microcin B17 biosynthetic machinery acts regiospecifically. This interpretation is complicated, however, by the fact that four of the first five cyclizable residues encountered in the N-to-C-terminal direction are the inherently more reactive cysteines (Figure 1b).
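Directional, distributive processing sharply limits which partially modified species can appear, which is why tandem MS of intermediates can reveal the order of ring formation. The toy model below is a minimal sketch that ignores rates, skipped sites, and the Cys/Ser reactivity difference; it contrasts the n + 1 states allowed by strict N-to-C processing with the 2^n states possible without any ordering. The site positions and function names are hypothetical:

```python
from itertools import combinations


def directional_intermediates(sites: list[int]) -> list[tuple[int, ...]]:
    """Modification states expected if rings are installed strictly from the
    N- to the C-terminus, one ring per enzyme-substrate encounter.

    In a distributive process the enzyme releases the substrate after each
    ring, so every state in this list can accumulate as a free intermediate.
    """
    ordered = sorted(sites)
    return [tuple(ordered[:k]) for k in range(len(ordered) + 1)]


def all_possible_states(sites: list[int]) -> list[tuple[int, ...]]:
    """Every subset of sites: the states possible if no order were enforced."""
    ordered = sorted(sites)
    return [c for k in range(len(ordered) + 1) for c in combinations(ordered, k)]


sites = [9, 2, 6]  # hypothetical core positions of cyclizable residues
print(directional_intermediates(sites))  # [(), (2,), (2, 6), (2, 6, 9)]
print(len(all_possible_states(sites)))   # 8
```

Observing only prefix-shaped intermediates is consistent with directional processing; observing, say, a species modified at position 9 but not 2 would argue against it.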
The cyanobactins are a large family (>100 members) of macrocyclized TOMMs produced by cyanobacteria (e.g. trunkamide, Figure 1b) [17•,32,33]. A landmark discovery established that patellamides A and C are produced not by the sea squirt Lissoclinum patella, as had long been suspected, but by its cyanobacterial symbiont Prochloron didemni. Moreover, this report showed that these metabolites are of ribosomal origin [34••]. Since then, the biosynthetic gene clusters for many cyanobactins have been identified [17•,32,35,36], some of which appear to have been disseminated by horizontal gene transfer. Additional studies with patellamide and trunkamide showed that the heterocycles are installed in a manner similar to microcin B17, supporting the classification of the heterocycle-containing cyanobactins as TOMMs. One key difference is that, unlike in microcin B17, the hypervariable cyanobactin core peptides are flanked by highly conserved motifs. These motifs serve as proteolytic cleavage sites that ultimately lead to macrocyclization by a subtilisin-like enzyme.
The cyanobactins provide a remarkable example of natural combinatorial biosynthesis, with variation observed at every position of the core peptide (Figure 2a). The least variation occurs at the C-terminus of the core peptide, where a heterocycle is always found. Interestingly, the penultimate position never contains a heterocyclizable residue; it remains unclear whether such molecules would be biologically inactive or whether the biosynthetic machinery cannot carry out the transformation because of spatial constraints. A typical cyanobactin precursor peptide contains two core peptides, allowing multiple cyanobactins to be produced from a single polypeptide [13••]. This natural library displays remarkable chemical diversity and supports the notion that the biosynthetic enzymes are permissive in their substrate tolerance. Such flexibility allows variation to accumulate in the core peptide and can give rise to new products with enhanced activity or novel modes of action. If the substrate tolerance and specificity of each cyanobactin-modifying enzyme were known, substrates could be rationally designed to yield a desired product.
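The two-core precursor layout, the C-terminal trend, and head-to-tail macrocyclization can all be sketched on plain one-letter sequences. In this minimal sketch, the motif strings, the toy precursor, and the function names are hypothetical placeholders (the real conserved cleavage motifs are not reproduced here), heterocycles are not modeled, and the C-terminal check simply encodes the trend described above:

```python
import re

HETEROCYCLIZABLE = set("CST")


def excise_cores(precursor: str, n_motif: str, c_motif: str) -> list[str]:
    """Return every core lying between an N-terminal recognition motif and the
    next C-terminal recognition motif (one precursor can encode several cores)."""
    pattern = re.escape(n_motif) + "(.*?)" + re.escape(c_motif)
    return re.findall(pattern, precursor)


def fits_c_terminal_pattern(core: str) -> bool:
    """Observed trend: heterocyclizable last residue, non-heterocyclizable penultimate one."""
    return len(core) >= 2 and core[-1] in HETEROCYCLIZABLE and core[-2] not in HETEROCYCLIZABLE


def canonical_macrocycle(core: str) -> str:
    """A head-to-tail macrocycle has no ends, so represent it by its
    lexicographically smallest rotation (direction is kept: peptide bonds are polar)."""
    return min(core[i:] + core[:i] for i in range(len(core)))


# Hypothetical toy precursor; "QQQQ" and "WWWW" stand in for the real conserved motifs.
toy = "MKAQQQQVTAICWWWWLQQQQITFACWWWW"
cores = excise_cores(toy, "QQQQ", "WWWW")
print(cores)                                        # ['VTAIC', 'ITFAC']
print([fits_c_terminal_pattern(c) for c in cores])  # [True, True]
print([canonical_macrocycle(c) for c in cores])     # ['AICVT', 'ACITF']
```

The canonical rotation lets two cyclic products be compared regardless of where the ring was "opened" when written down; only rotations are considered because a cyclic peptide read backwards is a different molecule.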
As a first step toward the rational engineering of cyanobactins, preliminary biochemical work has been done on PatD and TruD. These heterocyclization enzymes are single-polypeptide fusions of the cyclodehydratase and docking protein (CD fusions) from the patellamide and trunkamide biosynthetic clusters, respectively [39,40]. The two proteins are 88% identical at the amino acid level, with essentially all of the differences occurring in the C-terminal docking protein domain. Given this similarity, it is surprising that PatD and TruD have different regiospecificities. After heterocyclization (and, for some members, prenylation), cyanobactins must be proteolytically cleaved from the precursor peptide and macrocyclized. In the patellamide biosynthetic cluster, PatA and PatG were shown to cleave at the N- and C-termini of the core peptide, respectively. PatG also catalyzes the final macrocyclization reaction through a proposed transamidation mechanism [38,41••]. PatG exhibits a broad substrate scope, cleaving and macrocyclizing synthetic peptides of varying length and composition, even those containing nonproteinogenic residues [41••]. An early demonstration of this enzymatic promiscuity was the engineered production of eptidemnamide, an entirely artificial cyanobactin resembling the antiplatelet drug eptifibatide [17•]. Again, the inherent substrate tolerance of the modifying enzymes permits rapid genetic alteration of the natural product to meet organismal demands when environmental challenges arise.
For over a century, scientists have sought to isolate and characterize the factor responsible for the classic β-hemolytic phenotype of Streptococcus pyogenes [42•]. Early efforts met with limited success because of the frustrating biophysical characteristics and non-antigenic nature of this factor, streptolysin S (SLS). A pioneering discovery in 1998 identified the first SLS-associated gene (sagA) and led to the discovery of the entire sag operon [43,44]. Further studies demonstrated that the biosynthetic gene cluster required for SLS production encodes a dehydrogenase (SagB), a cyclodehydratase (SagC), and a docking protein (SagD), which collectively install thiazole and oxazole heterocycles onto an inactive precursor peptide (SagA) [42•]. When heterocycles were installed in vitro with the reconstituted enzymes, SagA was converted into a broadly active cytolysin, providing the first molecular-level description of the factor responsible for the β-hemolytic phenotype of S. pyogenes [42•]. Because the biosynthetic route to SLS was highly reminiscent of that of microcin B17, McbA was subsequently tested as a SagBCD substrate, and heterocycles were formed on it, again highlighting the enzymatic promiscuity of the TOMM machinery [42•]. As in the early studies on SLS, the problematic amino acid composition of SagA thwarted numerous modern MS methods for identifying sites of posttranslational modification after reaction with SagBCD. However, a Herculean effort, involving derivatization of SagBCD-treated SagA, tandem MS, and extensive data analysis, demonstrated that SagBCD installs heterocycles onto the SagA core peptide.
Gene clusters similar to the sag operon provided genetic evidence that SLS-like toxins may promote the virulence of non-streptococcal human pathogens (Clostridium botulinum, Listeria monocytogenes, and Staphylococcus aureus). These toxins have since been experimentally validated as cytolytic using a combination of genetic deletions and in vitro reconstitution [16,45,46].
The promiscuity observed in cyanobactin biosynthesis is echoed in the cytolytic TOMMs. This was first illustrated by the ability of SagBCD to install heterocycles onto McbA and by the SagBCD-dependent conversion of the C. botulinum precursor peptide (ClosA) into a potent cytolysin [42•]. However, incubating the precursor peptides of S. aureus and L. monocytogenes (StaphA and ListA, respectively) with SagBCD, or complementing their genes into a ΔsagA S. pyogenes strain, did not yield cytolytic activity. Further study established that although the heterocycle-forming enzymes tolerate diverse substrates, specificity is achieved through distinct binding motifs in the leader peptides, which direct each precursor peptide to its biosynthetic machinery [13••,16,47]. The requirement for these binding motifs prevents the heterocyclization of all cellular peptides and proteins, which would presumably be lethal to the producing organism. Thus, StaphA and ListA were not converted into cytolysins because their leader peptides lack key binding motifs recognized by SagBCD [13••,16]. Consistent with this, chimeric precursor peptides containing the SagA leader peptide fused to the StaphA or ListA core peptides became cytolytic after treatment with SagBCD. The promiscuity of the SagBCD complex has also been illustrated by the processing of artificial substrates into cytolysins.
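The leader-swap logic behind the chimera experiments can be expressed as a small model in which recognition depends only on the leader peptide. A minimal sketch, assuming a single linear binding motif (real leader recognition is more complex); the motif, the sequences, and the function names are hypothetical:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Precursor:
    name: str
    leader: str  # carries the binding motif(s) recognized by the synthetase
    core: str    # substrate for heterocyclization


def make_chimera(leader_donor: Precursor, core_donor: Precursor) -> Precursor:
    """Fuse the leader peptide of one precursor to the core peptide of another."""
    return replace(
        core_donor,
        name=f"{leader_donor.name} leader + {core_donor.name} core",
        leader=leader_donor.leader,
    )


def is_processed(precursor: Precursor, binding_motif: str) -> bool:
    """Simplified rule: the synthetase engages only precursors whose leader
    contains its binding motif; the core sequence itself is not checked."""
    return binding_motif in precursor.leader


# Hypothetical toy sequences and motif, not the real SagA/StaphA peptides.
MOTIF = "FNLE"
native = Precursor("A", leader="MKFNLEVKG", core="CCSTGCC")
foreign = Precursor("B", leader="MTQKDLAPG", core="GTCSCAGC")
chimera = make_chimera(native, foreign)

print(is_processed(foreign, MOTIF))  # False
print(is_processed(chimera, MOTIF))  # True
```

Ignoring the core in `is_processed` is the deliberate simplification: it encodes "specific in binding, promiscuous in chemistry".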
Owing to the lack of structural information on SLS, inferences about the regiospecificity and chemospecificity of its biosynthetic machinery must be based on the cytolytic activity of site-directed mutants. As expected, some SagA residues were critical for cytolytic activity, as mutation to alanine abolished it. Exchanging a cysteine for a serine, or vice versa, at a critical location slightly reduced cytolytic activity in vitro and in vivo. Intriguingly, prolines were also tolerated at such positions. Because proline is itself a conformationally constrained, cyclic residue, these results suggest that heterocycles are likely present and play a largely structural role; however, since the unnatural heterocycle decreased cytolytic potency, electronic contributions cannot be ruled out.
The thiopeptides, defined by a central (tetrahydro)pyridine ring and at least one thiazole substituent (e.g. thiostrepton, Figure 1b), were first discovered in 1948. Although the thiopeptides have a diverse array of functions, their best-known activity is the inhibition of protein synthesis through interaction with the 50S ribosomal subunit or elongation factor Tu. Early feeding experiments with labeled amino acids demonstrated that these highly modified natural products are amino acid-derived [2,49]. However, the biosynthetic pathway to this class remained elusive for over half a century, until four independent research groups reported the biosynthetic genes for several thiopeptides within the span of a few months [50–53]. These initial studies have led to the discovery of a number of other thiopeptide biosynthetic gene clusters [54–58]. Thiopeptides feature an extraordinary array of posttranslational modifications. Thiostrepton, for example, contains four thiazoles, a thiazoline, a dehydrobutyrine, three dehydroalanines, a quinaldic acid moiety, a tetrahydropyridine ring, a dihydroxylated isoleucine, and a C-terminal carboxamide (Figure 1b).
It was hypothesized that two dehydroalanines underwent a [4+2] cycloaddition to produce the central 6-membered ring found in all thiopeptides, which was later supported by feeding experiments [49,59]. The ‘hetero-Diels-Alderase’ responsible for this remarkable transformation was recently identified (TclM in thiocillin biosynthesis), which confirmed the original dehydroalanine hypothesis [60••]. Other enzymes involved in the ancillary posttranslational modifications have also been elucidated through genetic and biochemical studies [50,54,61].
Although thiopeptide biosynthesis is a fascinating example of chemical and genetic complexity, the clinical use of these molecules has been hampered by poor water solubility. In an effort to increase water solubility, and thus bioavailability, initial structure-activity relationship studies have been conducted using the thiostrepton and thiocillin biosynthetic frameworks [62,63•,64]. Introducing a fosmid carrying the thiostrepton biosynthetic cluster into a precursor peptide-deficient strain of Streptomyces laurentii (the thiostrepton producer) allowed genetic manipulation of the precursor peptide. Alanines 2 and 4 of thiostrepton (numbered from N-terminus to C-terminus, Figure 1b) tolerated mutation. A parallel study of the thiocillin core peptide demonstrated that some positions tolerated substitution, while mutations at others either attenuated or abolished activity [63•,64]. Notably, threonine-3 is the only unmodified residue in thiocillin, and mutating it to serine caused a moderate loss of antibiotic activity (~50%). Attempts to increase water solubility by incorporating charged residues at the more tolerant positions dramatically decreased antibiotic activity [63•]. Thus, the quest to engineer more bioavailable thiopeptide analogs continues.
The aforementioned TOMMs cover only about half of the currently known landscape of this natural product class, as illustrated in Figure 3. Some light was recently shed on a subset of the uncharacterized TOMMs with the discovery of two new families of precursor peptides that share two traits: uncharacteristically long leader peptides that are homologous to known proteins, and hypervariable C-terminal core regions [15•]. One family shares homology with the alpha subunit of nitrile hydratase [15•]. These nitrile hydratase-related leader peptides (NHLPs) are present in a diverse set of organisms, with a distinct lineage in the order Burkholderiales. These peptides have been subjected to natural combinatorial biosynthesis, as evidenced by paralogous duplication within many genomes and by their hypervariable C-termini [13••,15•]. Thus, multiple NHLPs are predicted to be biosynthesized by a single organism, as exemplified by the 12 NHLPs found in Pelotomaculum thermopropionicum SI [15•]. The second family of recently discovered precursor peptides is homologous to the Nif11 protein family, which is found in nitrogen-fixing bacteria and is of unknown function. In many genomes encoding Nif11-related precursor peptides, the TOMM biosynthetic machinery is absent and is replaced by lanthionine biosynthetic machinery. Numerous Nif11-like peptides from Prochlorococcus marinus have been experimentally shown to be converted into mature lantipeptides. In this non-TOMM system, a single bifunctional LanM enzyme was shown to install (methyl)lanthionine rings on 17 different substrates and likely modifies all 29 genetically encoded substrates. Such catalytic promiscuity allows bacteria with compact genomes to produce an array of secondary metabolites with a minimal increase in genome size.
Another recently characterized TOMM, plantazolicin, from the plant growth-promoting bacterium Bacillus amyloliquefaciens FZB42, exhibits antibacterial activity toward select Gram-positive bacteria through an unknown mechanism. The structure of this natural product has yet to be established, but deletion studies indicate that the compound is extensively posttranslationally modified.
TOMMs can be diversified through multiple mechanisms (Figure 4). With regard to precursor diversification, the bipartite character of TOMM precursor peptides allows the biosynthetic enzymes to be specific in substrate binding yet promiscuous in installing posttranslational modifications. This allows an organism to assess the functional consequences of core peptide mutations, some of which may increase potency against the existing target or yield a new target altogether (Figure 4). Nature has iteratively performed this operation over the eons to arrive at numerous privileged scaffolds. Much as medicinal chemists diversify a promising lead compound, TOMM producers can acquire or evolve genes encoding enzymes that further elaborate existing TOMMs. One example is the ‘hetero-Diels-Alderase’ of the thiopeptides. Here, Nature took a thiopeptide ancestor, likely resembling goadsporin (Figure 1b), and developed an enzyme that joins two dehydroalanines into a tetrahydropyridine ring (Figure 4). This leap in biosynthetic capacity transformed the mode of action of the resulting molecule, making it a potent inhibitor of protein synthesis. Thus, a powerful weapon for niche competition was born, one that is now disseminated across many genera of soil bacteria. The full functional scope of this natural product family has probably not yet been realized.
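The "mutate the core, keep the leader" logic lends itself to enumerating a single-substitution library, for example when probing which positions tolerate change, as in the thiostrepton and thiocillin studies described earlier. A minimal sketch; the toy core and the choice of "tolerant" positions are hypothetical:

```python
from typing import Optional, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 proteinogenic amino acids


def point_variants(core: str, positions: Optional[Sequence[int]] = None) -> list[str]:
    """All single-residue substitutions of `core` at the given 1-based positions
    (every position when `positions` is None). Each site yields 19 variants."""
    if positions is None:
        positions = range(1, len(core) + 1)
    variants = []
    for pos in positions:
        wild_type = core[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                variants.append(core[: pos - 1] + aa + core[pos:])
    return variants


toy_core = "IASAVC"  # hypothetical core peptide, not a natural sequence
print(len(point_variants(toy_core)))          # 114 (6 positions x 19)
print(len(point_variants(toy_core, [2, 4])))  # 38 (2 positions x 19)
```

Library size grows as 19 per position for single mutants but as 19^k per chosen set of k positions for combinations, which is one reason restricting substitutions to tolerant positions matters in practice.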
By sifting through publicly available genome databases, bioinformatics-based approaches have identified over 300 TOMM biosynthetic gene clusters (Figure 3). The function of many of these can be inferred from close phylogenetic relationships to characterized TOMMs, a process we refer to as bioinformatics-guided chemotyping. However, a vast area of TOMM chemical and genetic space still awaits characterization. This unexplored area has grown linearly with microbial genome sequencing efforts, suggesting that new TOMM subfamilies will continue to be uncovered.
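The definition of a TOMM used in this review (similarity to cyclodehydratases and docking proteins, with the dehydrogenase optional and the first two sometimes fused) translates naturally into a rule for flagging candidate clusters once genes have been annotated. A minimal sketch; the annotation labels and the `classify_cluster` rule are hypothetical, and a real genome-mining pipeline would assign such labels through sequence-similarity searches:

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical annotation labels; in practice these would come from
# sequence-similarity searches against characterized TOMM enzymes.
CYCLODEHYDRATASE = "cyclodehydratase"
DOCKING = "docking"
CD_FUSION = "cyclodehydratase-docking fusion"
DEHYDROGENASE = "dehydrogenase"
PRECURSOR = "precursor"


@dataclass(frozen=True)
class TommCall:
    fused: bool              # cyclodehydratase and docking protein in one polypeptide
    has_dehydrogenase: bool  # can azolines be oxidized to azoles?
    precursor_found: bool    # small precursor genes are easy to miss


def classify_cluster(genes: Iterable[str]) -> Optional[TommCall]:
    """Call a gene cluster TOMM-like if it encodes the heterocyclization core:
    a cyclodehydratase plus a docking protein, either separate or fused."""
    labels = set(genes)
    fused = CD_FUSION in labels
    split = CYCLODEHYDRATASE in labels and DOCKING in labels
    if not (fused or split):
        return None
    return TommCall(
        fused=fused,
        has_dehydrogenase=DEHYDROGENASE in labels,
        precursor_found=PRECURSOR in labels,
    )


print(classify_cluster([PRECURSOR, DEHYDROGENASE, CYCLODEHYDRATASE, DOCKING]))
print(classify_cluster([PRECURSOR, CD_FUSION]))
print(classify_cluster([PRECURSOR, "lanM"]))  # None: lantipeptide-type machinery
```

The first call corresponds to a split layout with a dehydrogenase, as in microcin B17 biosynthesis; the second to a cluster in which the two proteins are fused, as in PatD and TruD. Tracking `precursor_found` separately reflects that short precursor genes are often missed by automated gene calling.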
Future studies, from both chemical and biological perspectives, will provide a more complete understanding of the enzymes responsible for the peptide modifications, as well as the order and specificity with which those modifications occur. In particular, the role of the docking protein in orchestrating TOMM biosynthesis, and specifically in heterocycle formation, remains unclear and should be a focus of future efforts. Biochemical knowledge of the critical and ancillary enzymes will enable the rational design and production of artificial TOMMs. Other studies will focus on producing and screening combinatorial libraries for increased potency and novel modes of action. Overall, the TOMM natural product family is a prolific source of chemical and functional diversity and represents one strategy Nature has adopted to synthesize complex natural products from a ribosomal template.
We would like to thank members of the Mitchell lab for critical reading of the manuscript. We also acknowledge the generous support from the Department of Chemistry and the Institute for Genomic Biology at the University of Illinois at Urbana-Champaign.
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest