A new compound, designated ML-449, structurally similar to the known 20-membered macrolactam BE-14106, was isolated from a marine sediment-derived Streptomyces sp. Cloning and sequencing of the 83-kb ML-449 biosynthetic gene cluster revealed its high level of similarity to the BE-14106 gene cluster. Comparison of the respective biosynthetic pathways indicated that the difference in the compounds' structures stems from the incorporation of one extra acetate unit during the synthesis of the acyl side chain. A phylogenetic analysis of the β-ketosynthase (KS) domains from polyketide synthases involved in the biosynthesis of macrolactams pointed to a common ancestry for the two clusters. Furthermore, the analysis demonstrated the formation of a macrolactam-specific subclade for the majority of the KS domains from several macrolactam-biosynthetic gene clusters, indicating a closer relationship between macrolactam clusters than with the macrolactone clusters included in the analysis. Some KS domains from the ML-449, BE-14106, and salinilactam gene clusters did, however, show a closer relationship with KS domains from the polyene macrolide clusters, suggesting potential acquisition rather than duplication of certain PKS genes. Comparison of the ML-449, BE-14106, vicenistatin, and salinilactam biosynthetic gene clusters indicated an evolutionary relationship between them and provided new insights into the processes governing the evolution of small-ring macrolactam biosynthesis.
Numerous natural products, including clinically important drugs such as erythromycin and vancomycin, are polyketides (PKs), nonribosomal peptides (NRPs), or hybrid PK-NRPs. Both PKs and NRPs are synthesized by enzyme complexes through condensation of specialized building blocks. For PK synthesis, malonyl- and methylmalonyl-coenzyme A (CoA) represent the most common building blocks, while nonribosomal peptide synthetases (NRPSs) utilize the common l and d amino acids as well as a wide range of nonproteinogenic amino acids and some carboxylic acids (9). The bacterial type I polyketide synthases (PKSs) and NRPSs have modular compositions, where each module is responsible for the incorporation of one building block unit. The minimal set of domains present in one module is usually the β-ketosynthase (KS), acyltransferase (AT), and acyl carrier protein (ACP) domains for PKSs and the condensation (C), adenylation (A), and peptidyl carrier protein (PCP) domains for NRPSs. However, there are many examples where all or a few modules may lack some of these core domains (6, 20, 37). PKS modules may also include ketoreductase (KR), dehydratase (DH), and enoyl reductase (ER) domains, which determine the oxidation state of the β-carbon, while accessory NRPS domains include, e.g., epimerization, heterocyclization, and methyltransferase domains (9).
The modular nature of PKSs and NRPSs has been exploited by scientists ever since the principle of how they operate was first discovered. Although the possibilities for designing new types of structures by combining different types of domains and modules seem endless, combinatorial biosynthesis in practice appears to be more complicated than first assumed, and as of today, only a few modules can efficiently be combined to produce novel structures (27). Interestingly, nature appears to be playing the same type of game, as studies have pointed to the existence of a natural version of biocombinatorics (8, 17, 34). Although phylogenetic analyses of the KS domains in the PKS enzymes point to generation of new modules within one pathway by duplication of a single ancestor module, the AT, KR, and (DH)-KR domains seem to be exchangeable through recombinational replacements occurring both within and between biosynthetic gene clusters (17). The same phenomenon appears to be true for NRPS adenylation domains (8). Furthermore, in addition to gene duplications, recombination, and gene loss, horizontal gene transfer may have played a central role in the evolution of bacterial type I PKSs (11, 16, 29). Understanding the processes that lead to the generation of new biosynthetic gene clusters in nature may be important for successful application of the same concept in the laboratory.
Several biosynthetic gene clusters and the corresponding pathways for macrolactam biosynthesis have been described in recent years. Leinamycin (Fig. 1) is synthesized by an AT-less NRPS-PKS hybrid system, using d-alanine as a starter (6, 39). Vicenistatin and salinilactam (Fig. 1) are synthesized by regular type I PKSs, but amino acids are utilized as starter units (30, 43). In the biosynthesis of vicenistatin, 3-methylaspartate is presumed to represent the starter, while lysine was suggested as the starter for the biosynthesis of salinilactam. We have recently sequenced the biosynthetic gene cluster for the macrolactam BE-14106 (Fig. 1) and proposed a biosynthetic pathway involving the synthesis of an aminoacyl starter through cooperation between a PKS and NRPS-related enzymes (19). Some of the enzymes presumed to be involved in the biosynthesis of BE-14106 have putative homologs in the other macrolactam gene clusters, suggesting common steps in the biosynthetic pathways. In this study, we present the cloning and sequencing of a new macrolactam biosynthetic gene cluster with high similarity to the BE-14106 gene cluster. Through phylogenetic analyses and comparison of the different macrolactam gene clusters, we establish the evolutionary relationship between the ML-449 and BE-14106 biosynthetic pathways and provide new insights into the evolution of the macrolactam gene clusters in general.
Sediment samples were collected from different sites in the Trondheimsfjord and used for isolation of marine actinomycetes as described by Bredholt et al. (3). The Streptomyces isolate MP39-85, producing the compound designated ML-449, was isolated from a sediment sample collected by use of a box corer at a 450-m depth at position 63°29′N, 10°18′E. The medium used for isolation of this strain was IM7 (3).
The collection of actinomycete isolates obtained from sediment samples was investigated for production of antifungal compounds by using a set of different solidified production media. For Streptomyces sp. MP39-85, growth at 25°C on medium PM4 (3) for 16 days resulted in production of a compound with strong antifungal activity. Dimethyl sulfoxide (DMSO) extracts of Streptomyces sp. MP39-85 were used in a robotic bioassay procedure with Candida albicans CCUG3943 and Candida glabrata CCUG3942 as indicator organisms as described by Jørgensen et al. (18). The latter strain has a high level of resistance against polyene antibiotics, while the C. albicans strain is sensitive to polyenes.
High-performance liquid chromatography (HPLC) fractionation and subsequent liquid chromatography-mass spectrometry-time of flight (LC-MS-TOF) characterization of active fractions were performed as described earlier (19). Molecular masses (10-ppm window) were submitted to the online version of the Dictionary of Natural Products (DNP) in order to search for previously characterized compounds with bioactivity.
Cultivation of Streptomyces sp. MP39-85 for production of ML-449 was performed with BPS-3-1 (oatmeal [30 g/liter], malt extract [5.0 g/liter], yeast extract [3.0 g/liter], MgSO4-7H2O [0.4 g/liter], NaCl [1.0 g/liter], CaCO3 [5.0 g/liter], and glucose [50.0 g/liter]) supplemented with 3.0 ml/liter trace mineral solution 1 (35). Fermentations were performed with Applikon 3-liter fermentors with 1.5 liters of medium. The production cultures were inoculated with 3% (vol/vol) from a 0.5× Trypticase soy broth (TSB)-glucose preculture cultivated as described previously (19). Fermentations were run for 6 days at 25°C with constant aeration (0.25% [vol/vol/min]) and agitation (1,000 rpm). The pH was controlled at 7.3 with 2 M NaOH. Production of ML-449 in shake flasks was performed with 0.3× BPS-3-1 supplemented with glucose as described previously (19). ML-449 was extracted from fermentation broth and purified by preparative HPLC essentially as described previously (19).
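The per-liter recipe above can be scaled to the 1.5-liter working volume with simple arithmetic. The following Python sketch uses only the concentrations stated in the text; the function and variable names are hypothetical and chosen for illustration.

```python
# Concentrations (g/liter) for BPS-3-1 as listed in the text.
BPS_3_1_G_PER_LITER = {
    "oatmeal": 30.0,
    "malt extract": 5.0,
    "yeast extract": 3.0,
    "MgSO4-7H2O": 0.4,
    "NaCl": 1.0,
    "CaCO3": 5.0,
    "glucose": 50.0,
}
TRACE_MINERAL_ML_PER_LITER = 3.0  # trace mineral solution 1
INOCULUM_FRACTION = 0.03          # 3% (vol/vol) preculture


def scale_batch(volume_liters):
    """Scale the per-liter recipe to a given medium volume (hypothetical helper)."""
    grams = {name: conc * volume_liters
             for name, conc in BPS_3_1_G_PER_LITER.items()}
    return {
        "grams": grams,
        "trace_mineral_ml": TRACE_MINERAL_ML_PER_LITER * volume_liters,
        "inoculum_ml": INOCULUM_FRACTION * volume_liters * 1000.0,
    }


# 1.5 liters of medium, as used in the 3-liter fermentors.
batch = scale_batch(1.5)
```

For 1.5 liters this corresponds to 45 g of oatmeal, 75 g of glucose, 4.5 ml of trace mineral solution, and a 45-ml inoculum.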
Quantitative and qualitative LC-MS analyses of ML-449 were performed on methanol extracts from culture pellets by using an Agilent 1100 HPLC system connected both to a diode array detector (DAD) and a TOF mass spectrometer. Electrospray ionization was performed in the negative (ESI−) mode as described previously (4, 19). Quantitative analysis of ML-449 in fermentation broth was performed using BE-14106 purified by preparative HPLC as a standard, assuming that the extinction coefficient of the two compounds is the same.
Antimicrobial activity of purified ML-449 was determined in a robotic bioassay procedure with the yeast strains C. albicans CCUG3943 and C. glabrata CCUG3942 and the bacterial strain Micrococcus luteus ATCC 9341. Serial dilutions (6 parallel wells) of the purified compound were prepared in a 384-well plate. Diluted samples were added to plates containing medium AM19B (19) and inoculated with the yeast strains specified above for assay of antifungal activity and to plates with medium AM1 (6 g/liter peptone [Oxoid], 4 g/liter tryptone [Difco], 3 g/liter yeast extract [Oxoid], 1.5 g/liter beef extract [Difco], 1 g/liter glucose [BDH], distilled water) inoculated with M. luteus for the assay of antibacterial activity. Growth in the presence of ML-449 at different concentrations was compared to growth in reference wells where only a solvent (DMSO) was added, and the concentration resulting in 50% or more reduction in growth was determined. The assay conditions were otherwise as described by Jørgensen et al. (18).
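The growth-reduction criterion described above can be illustrated with a short Python sketch. It assumes that growth is summarized as the mean of the parallel wells at each concentration and compared with the DMSO reference; the function name and all readings are invented for illustration and are not data from this study.

```python
def lowest_inhibitory_concentration(readings, reference_growth, reduction=0.5):
    """Return the lowest concentration whose mean growth is reduced by at
    least `reduction` relative to the solvent-only reference, or None.

    readings: dict mapping concentration -> list of growth values
              from the parallel wells (hypothetical input layout).
    """
    threshold = (1.0 - reduction) * reference_growth
    hits = []
    for concentration, wells in readings.items():
        mean_growth = sum(wells) / len(wells)
        if mean_growth <= threshold:
            hits.append(concentration)
    return min(hits) if hits else None


# Invented example: 6 parallel wells per dilution, arbitrary units.
example_readings = {
    16.0: [0.04, 0.05, 0.05, 0.06, 0.05, 0.05],
    8.0: [0.20, 0.22, 0.18, 0.21, 0.19, 0.20],
    4.0: [0.70, 0.72, 0.69, 0.71, 0.70, 0.68],
    2.0: [0.95, 0.97, 0.96, 0.94, 0.95, 0.96],
}
result = lowest_inhibitory_concentration(example_readings, reference_growth=1.0)
```

With these invented readings the function returns 8.0, the lowest dilution step at which mean growth falls to half of the reference or below.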
Samples for nuclear magnetic resonance (NMR) spectroscopy were prepared by dissolving the HPLC-purified and freeze-dried ML-449 in deuterated d6-DMSO to give a final concentration of 1 mM. The NMR sample conditions were maintained the same as those previously reported for NMR assignment of a similar macrolactam, BE-14106 (22). All NMR experiments were recorded at 298 K on a Bruker Avance 600-MHz spectrometer equipped with a 5-mm z-gradient TXI (H/C/N) cryogenic probe. Proton and carbon chemical shifts were referenced to the tetramethylsilyl (TMS) signal. To investigate the chemical structure of the investigated compound, both one-dimensional 1H and two-dimensional correlation spectroscopy (COSY), 1H-13C heteronuclear single quantum coherence (HSQC), and 1H-13C heteronuclear multiple bond correlation (HMBC) spectra were recorded.
Streptomyces sp. MP39-85 strains were grown on ISP2 agar medium (Difco), soy flour mannitol (SFM) medium, and mineral agar Gause 1 (soluble starch [20 g/liter], K2HPO4 [0.5 g/liter], MgSO4 [0.5 g/liter], KNO3 [1.0 g/liter], NaCl [0.5 g/liter], FeSO4 [0.01 g/liter], agar [20.0 g/liter], tap water) and in tryptone soy broth (Oxoid). Escherichia coli strains were grown in Luria-Bertani (LB) broth or on LB agar. DH5α was used for general cloning. EZ cells were used for vectors with blue-white selection. XL1-Blue MR was used for construction of the genomic library. ET12567 (pUZ8002) was used for intergeneric transfer of pSOK201-based constructs to Streptomyces sp. MP39-85. Antibiotics were supplemented to growth medium at the following concentrations: for ampicillin, 100 or 150 μg/ml; for apramycin, 50 or 100 μg/ml; for chloramphenicol, 20 μg/ml; for kanamycin, 50 μg/ml; and for nalidixic acid, 30 μg/ml.
Genomic DNA was isolated from Streptomyces sp. MP39-85 according to the Kirby mix procedure (21). The genomic library for Streptomyces sp. MP39-85 was constructed using a SuperCos 1 cosmid vector kit (Stratagene). The genomic DNA was partially digested with MboI and dephosphorylated before ligation with XbaI-, CIAP-, and BamHI-treated SuperCos 1. E. coli XL1-Blue MR (Stratagene) was used as a host for the construction of the library. A probe for the ML-449 gene cluster was generated using degenerate primers designed for amplification of the conserved KS domain-coding regions of PKS genes (15). PCR products were cloned using a Qiagen PCR cloning kit (Qiagen) and sequenced by MWG. Digoxigenin (DIG)-labeled probes were generated using a PCR DIG probe synthesis kit or a High Prime DNA labeling kit (both from Roche Applied Science). The genomic library was screened as described before (19), using the DIG-labeled probe. Cosmid DNA was isolated from positive clones by using the Wizard Plus SV Minipreps DNA purification system (Promega) and end sequenced using primers designed for the cosmid regions flanking the insert site (SuperCos_forw [5′ GGC CGC AAT TAA CCC TCA C 3′] and SuperCos_rev [5′ GGC CGC ATA ATA CGA CTC AC 3′]). A BigDye Terminator version 1.1 cycle sequencing kit (Applied Biosystems) was applied for the end sequencing. For complete sequencing of the cosmids, cosmid DNA was isolated using a Genopure Plasmid maxikit (Roche). Sequencing was performed by the Centre for Ecological and Evolutionary Synthesis, Department of Biology, University of Oslo.
A 3.63-kb BamHI-EcoRI fragment from one of the sequenced cosmids was ligated into pGEM3Zf(−) digested with BamHI-EcoRI. A 3.66-kb EcoRI-HindIII fragment from this construct was ligated into pSOK201 digested with EcoRI-HindIII, resulting in the mlaA1 disruption vector. The vector was introduced into ET12567(pUZ8002) and then used for conjugation with Streptomyces sp. MP39-85 by following the procedure described by Flett et al. (10), but with the donor cells grown to an optical density at 600 nm (OD600) of 0.4 to 0.5. Antibiotics were added after 27 h of incubation. Resulting transconjugants were subjected to a Southern blot analysis to verify correct integration of the vector. Production of ML-449 was tested by LC-MS of fermentation extracts as described above.
KS domains were assigned to the PKS sequences from the BE-14106 (GenBank accession number FJ872523) and ML-449 biosynthetic gene clusters as well as PKS sequences from 9 other PKS clusters retrieved from the NCBI database (http://www.ncbi.nlm.nih.gov/). PKS clusters for the biosynthesis of the following metabolites were chosen (the source organism and GenBank accession number are shown in parentheses): vicenistatin (Streptomyces halstedii; AB086653), salinilactam (Salinispora tropica; NC_009380), leinamycin (Streptomyces atroolivaceus; AF484556), nystatin (Streptomyces noursei; AF263912), pimaricin (Streptomyces natalensis; AJ278573), rapamycin (Streptomyces hygroscopicus; X86780), erythromycin (Saccharopolyspora erythraea; Y11199), spinosad (Saccharopolyspora spinosa; AY007564), and rifamycin (Amycolatopsis mediterranei; AF040570).
AT domains were assigned to some of the PKS sequences from the BE-14106, ML-449, and salinilactam biosynthetic gene clusters and aligned with MlaK, BecK, Strop_2773, and VinK.
Mla/BecJ and -L were aligned with A domains from the bleomycin (BlmX A1/A2 and BlmIX), virginiamycin (VirA), nikkomycin (NikP1), novobiocin (NovH), and yersiniabactin (HMWP2) biosyntheses as well as Strop_2774, Strop_2775, VinM, VinN, and LnmQ.
All protein sequences were aligned by CLUSTAL W (41) as implemented by MEGA 4.0.2 (38), using a gap opening penalty of 3.0 and a gap extension penalty of 1.8 as suggested by Hall (13). Alignments were manually edited, and regions that could not be unambiguously aligned were deleted. Sequence alignments were exported in the FASTA format and converted to the PHYLIP format (input file for PhyML) by using FasToPhyNex (13).
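The FASTA-to-PHYLIP conversion step can be sketched in a few lines of Python. This is not the FasToPhyNex tool cited above; it is a minimal illustration that assumes an already aligned FASTA file and writes the sequential PHYLIP layout with names padded or truncated to 10 characters. The function names and the toy sequences are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Keep only the first word of the header as the sequence name.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def to_phylip(records):
    """Write aligned records in sequential PHYLIP format."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    lines = [f" {len(records)} {lengths.pop()}"]
    for name, seq in records:
        # Strict PHYLIP allows 10 characters for the name field.
        lines.append(f"{name[:10]:<10} {seq}")
    return "\n".join(lines) + "\n"


# Toy alignment with invented sequences.
example_fasta = ">BecA_KS1\nMK-LA\n>MlaA1_KS1\nMKTLA\n"
phylip_text = to_phylip(read_fasta(example_fasta))
```

Because names are cut to 10 characters, labels that differ only after the tenth character would collide; renaming such sequences before conversion avoids ambiguous tree tips.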
The alignment of KS domains consisted of 151 sequences displaying 349 amino acid positions. Phylogenetic trees were inferred by the maximum likelihood (ML) method using PhyML version 2.4.5 (12) and the Jones-Taylor-Thornton (JTT) model with invariants and gamma distribution. Tree reliability was estimated by the approximate likelihood ratio test (aLRT) (1), using the SH-like option.
The alignment of AT domains consisted of 24 sequences displaying 281 amino acid positions. The phylogenetic tree was inferred by ML, using PhyML with the Whelan and Goldman (WAG) model with invariants and gamma distribution.
The alignment of A domains/AMP-dependent synthetases/ligases consisted of 16 sequences displaying 418 amino acid positions. The phylogenetic tree was inferred by ML using PhyML with the WAG model.
The DNA sequences for the ML-449 biosynthetic gene cluster and the 16S rRNA gene of Streptomyces sp. MP39-85 were deposited in GenBank under accession numbers FJ872525 and FJ872526, respectively.
Streptomyces sp. MP39-85 was isolated as a part of a screening effort aimed at identifying new antifungal compounds produced by marine actinomycetes (18). The screening resulted in isolation of Streptomyces sp. MP39-85, producing a compound with strong antifungal activity. DMSO extract from Streptomyces sp. MP39-85 was subjected to LC-fractionation, identification of bioactive fractions, and LC-MS-TOF analysis. The LC-MS-TOF analysis of the bioactive fractions revealed a significant chromatographic peak corresponding to a molecular mass of 449.2924 Da (see Fig. S1 in the supplemental material), and a search in the Dictionary of Natural Products (DNP) using a 10-ppm window returned no relevant hits for the observed molecular mass. The DAD-UV profile of the unknown compound was, however, remarkably similar to that of the previously characterized macrolactam BE-14106 (20). In addition, the observed molecular mass of the presumably new compound was within 2 ppm of the expected molecular mass of BE-14106 (see Fig. S2 in the supplemental material), but with the addition of one C2H2 group (calculated molecular mass, 449.293 Da). Taken together, these observations suggested that the identified compound might be represented by a novel macrocyclic lactam structurally similar to BE-14106. The antimicrobial activity of a purified sample of ML-449 was determined using the M. luteus bacterial strain and the C. albicans and C. glabrata yeast strains. It was found that the minimum concentrations resulting in at least 50% inhibition of growth (IC50) were approximately 3.0 μg/ml for all three strains.
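The mass comparison above rests on the relative mass error expressed in parts per million. The following Python sketch applies that standard formula to the observed (449.2924 Da) and calculated (449.293 Da) values quoted in the text; the function names are hypothetical.

```python
def ppm_error(observed, calculated):
    """Relative mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6


def within_window(observed, calculated, window_ppm):
    """True if the absolute mass error does not exceed the ppm window."""
    return abs(ppm_error(observed, calculated)) <= window_ppm


observed_mass = 449.2924    # Da, from the LC-MS-TOF analysis
calculated_mass = 449.293   # Da, BE-14106 plus one C2H2 group

error = ppm_error(observed_mass, calculated_mass)
matches_2ppm = within_window(observed_mass, calculated_mass, 2.0)
```

The error evaluates to roughly -1.3 ppm, consistent with the statement that the observed mass lies within 2 ppm of the calculated value.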
The chemical structure of ML-449 was determined using NMR spectroscopy. Full correspondence of the NMR experimental conditions used in this work with those previously reported for structure elucidation of BE-14106 (22) allowed for direct comparison of the NMR data acquired for ML-449 and BE-14106 and thus significantly simplified the assignment of the 1H and 13C NMR spectra for the former. A comparison of the 1H-13C HSQC spectra of ML-449 and BE-14106 (data not shown) revealed their close similarity. Upon comparison, correspondence for the vast majority of the observed cross peaks could easily be found, and the total spectral patterns were the same for both compounds. As the chemical shifts are very sensitive indicators of the nuclear chemical environment, this comparison indicated that the chemical structures of ML-449 and BE-14106 macrolactam rings were identical. The only significant difference observed between the ML-449 and BE-14106 HSQC spectra occurred in the ranges from 5 to 6.8 ppm and 122 to 144 ppm for proton and carbon resonances, respectively. The specified spectral regions for both compounds are shown in Fig. S3 in the supplemental material. The 1H-13C HSQC spectrum of ML-449 clearly shows the presence of two new cross peaks, while two other cross peaks are slightly shifted compared to the analogous spectral region for BE-14106 (see Fig. S3, panel A, in the supplemental material). Comparison of the extra cross peaks' chemical shifts with those assigned for BE-14106 indicated that ML-449 has one additional double bond with respect to BE-14106. This, together with the data on the accurate mass of ML-449 (and, later, information on the biosynthetic gene cluster), suggested that the C-19 acyl side chain of ML-449 contains 2 additional carbon atoms and one additional double bond, compared to what was found for BE-14106.
To prove this hypothesis, the full NMR assignment for the C-19 side chain of ML-449 was performed using two-dimensional COSY, 1H-13C HSQC, and 1H-13C HMBC experiments. The obtained NMR data allowed for assignment of all the chain's proton and carbon resonances, including those differing from the BE-14106 NMR signals (see Table S1 in the supplemental material). The results clearly show that the additional double bond is situated in the ML-449 side chain between C-23 and C-24 (Fig. 1A).
A genomic cosmid library was established for the ML-449 producer Streptomyces sp. MP39-85. A probe for the ML-449 biosynthetic gene cluster was obtained by PCR amplification of conserved KS domain-coding regions by using degenerate primers. Cloning and sequencing of these DNA fragments and subsequent BLAST analysis of the translated sequences revealed KS domains with high similarity to PKS sequences from the pyrrolomycin biosynthetic gene cluster (44) and to VinP1 from the vicenistatin biosynthetic gene cluster (30). As a DNA fragment encoding a KS domain with high similarity to VinP1 was successfully used as a probe for the BE-14106 biosynthetic gene cluster (19), the latter was chosen as a probe for identification of the ML-449 gene cluster. Screening of the MP39-85 library with this probe, followed by genome “walking” and partial DNA sequencing, yielded 4 cosmids that presumably encompassed a complete ML-449 biosynthetic gene cluster. These cosmids were fully sequenced, and analysis of the final, contiguous 83-kb sequence by use of Frameplot (14) and BLAST identified 20 open reading frames (Table 1) within the presumed ML-449 biosynthetic gene cluster (mla). The organization of the mla cluster appeared to be almost 100% identical to that of the BE-14106 biosynthetic gene cluster (bec) (Fig. 1C), with every gene in the bec cluster having a counterpart in the mla cluster (Table 1). The only difference was represented by the mlaA1 and mlaA2 PKS genes. The latter gene is not present in the BE-14106 gene cluster, while mlaA1 seemingly constitutes the complete version of the truncated becA. The deduced functional domains of the BecA, MlaA1, and MlaA2 proteins are shown in Fig. 2. The high level of similarity between the two clusters indicates that the biosynthesis must also be similar, and the structure of ML-449 points to the only difference being in the synthesis of the C-19-to-C-27 side chain.
In the biosynthesis of BE-14106 (19), BecA is presumed to supply the C-17-to-C-25 part of the molecule, including the C-19-to-C-25 side chain. The third module of BecA is incomplete and lacks the terminal ACP domain, but a second truncated PKS, BecC, is believed to constitute the missing part of the module and contains a KR domain and an ACP domain (19). BecA and BecC are thought to complete the synthesis of the acyl chain, which is subsequently modified into an aminoacyl. Such splitting of modules on separate proteins has been reported in several cases (7, 23, 32, 36). The structure of ML-449 indicates incorporation of one additional acetate unit compared to what is observed in BE-14106 biosynthesis. MlaA1 constitutes a PKS with three complete modules, while MlaA2 is truncated and lacks the KR and ACP domains. In analogy to BecA and BecC, MlaA2 and MlaC presumably represent the fourth module, incorporating the last acetate unit of the C-27-to-C-17 acyl chain (Fig. 3A). A conserved domain search for MlaA2 indicated the presence of a docking domain in the N-terminal region of the enzyme, and this prompted us to look for a C-terminal docking domain in MlaA1, which would support protein-protein interaction between the two enzymes. Alignments of the C-terminal region of MlaA1 and the N-terminal region of MlaA2 with docking domain sequences as described by Thattai et al. (40) revealed the presence of highly conserved docking domains belonging to the groups H1 and T1 (see Fig. S4 in the supplemental material). C-terminal docking domains from group H1 are assumed to interact with N-terminal docking domains from group T1 (40), and the presence of such domains in MlaA1 and MlaA2 indicates that there is an interaction between the enzymes.
It is therefore reasonable to assume that MlaA2 and MlaC represent the fourth module, which extends the unfinished acyl chain synthesized by MlaA1 and provides the C-19 carbonyl assumed to be necessary for amination and generation of an aminoacyl starter. Through feeding experiments with 15N-labeled amino acids, the nitrogen atom in the BE-14106 structure was shown to originate from glycine, whereas parallel experiments with 13C-labeled amino acids demonstrated that the carbon atoms from glycine were not incorporated into the structure to any extent (19). The enzyme most likely to catalyze the transfer of the amino group from glycine to the acyl chain generated by BecA and BecC is the putative glycine oxidase BecI, although the exact mechanism of transamination remains unclear (19). The biosynthesis of ML-449 is thought to proceed with complete analogy to BE-14106 biosynthesis, with release and subsequent activation of the resulting β-amino acid through ligation with coenzyme A, followed by loading on the first ACP domain of MlaB (Fig. 3A). The biosynthesis of the macrolactam then proceeds through all modules of the MlaB and MlaD to -G PKS enzymes, resulting in release and cyclization of the lactam by the terminal TE domain of MlaG (Fig. 3B). P450 monooxygenase MlaO presumably hydroxylates the released macrolactam at C-8.
To verify the involvement of the sequenced gene cluster in the biosynthesis of ML-449, a gene inactivation experiment was performed. To construct a suicide vector for gene disruption, a 3.6-kb fragment representing the part of mlaA1 encoding the KR2, ACP2, KS3, and AT3 domains was ligated with the part of the conjugative vector pSOK201 lacking the Streptomyces origin of replication (45) and transferred from Escherichia coli to Streptomyces sp. MP39-85 by conjugation. Resulting transconjugants were verified by Southern blot analysis to contain the right insertion (data not shown), and production of ML-449 was tested by LC-MS-TOF analysis of the culture extracts. The mutant was found to produce less than 1% of the wild-type level (see Fig. S5 in the supplemental material), confirming the involvement of the sequenced gene cluster in the biosynthesis of ML-449.
The organization of the genes in the mla cluster was identical to that of the bec cluster, with the exception of mlaA2 (Fig. 1C). The similarity at the protein level was quite striking, with percent identities in the range of 80 to 90% for most of the deduced protein sequences (Table 1), thus suggesting a common ancestor for the two clusters. Comparison of the gene clusters at the nucleotide level by use of the Artemis Comparison Tool (ACT) (5) showed that the DNA sequence is quite conserved between the clusters, as most regions displayed a percent identity within the range of 80 to 93% (see Fig. S5 in the supplemental material). Interestingly, the analysis disclosed that the 3′ end of becA is more similar to mlaA2 than to the corresponding part of mlaA1. A closer inspection of the sequences revealed that the KS3 domain of BecA is more similar to the KS3 domain of MlaA1 than the KS domain of MlaA2, while the opposite is true for the AT3 domain of BecA. This indicates that the becA gene is most likely a product of recombination between the mlaA1 and mlaA2 ancestor genes, so that BecA retained the KS3 of MlaA1 and gained a new AT3 domain from MlaA2 (Fig. 2).
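Domain-by-domain comparisons of this kind come down to computing percent identity over aligned sequence pairs. The Python sketch below shows one common convention, in which columns with a gap in either sequence are excluded from the denominator; other tools may count gaps differently, so this is an assumption and not necessarily the calculation behind Table 1. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned fragments with invented residues.
identity = percent_identity("MKT-LAV", "MKSALAV")
```

In this toy pair, six columns are compared and five match, giving about 83% identity. Applying the same function to the KS3 and AT3 domain alignments separately is how a mosaic pattern like the one described for BecA would show up numerically.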
The organization of the mla and bec gene clusters suggests that some of the PKS genes may have different origins. mla-becD, -E, -F, and -G are all transcribed in the same direction and may form part of an operon, and their translational products participate in macrolactam ring biosynthesis and are thus likely to have the same origin. mla-becB and C, although transcribed in the same direction as mla-becD to -G, are separated from the other PKS genes by genes encoding enzymes presumed to be involved in the activation of glycine and the aminoacyl starter as well as a regulator, an efflux pump, and a P450 monooxygenase. MlaB/BecB apparently participate in macrolactam ring biosynthesis, while MlaC/BecC are thought to complete the synthesis of the acyl side chain. It is possible that the ancestor gene for mlaC-becC was originally a part of the mlaA2-becA ancestor gene, since MlaC/BecC seemingly represent the C-terminal parts of MlaA2/BecA. The split module may have originated following a rearrangement of the cluster. Another possibility is that mlaC-becC represents the remnants of a separate, partially deleted PKS gene. mlaA1-mlaA2 and becA, on the other hand, are positioned at the left flank of their respective gene clusters and are transcribed in the opposite direction compared to the other PKS genes in the cluster. Moreover, their translational products synthesize the acyl side chain separately from the macrolactam ring, and in that respect, it does not seem unlikely that mlaA1-mlaA2-becA may be of a different origin than mla-becB to -G.
To address the origin of the individual PKS modules in the two gene clusters, a phylogenetic analysis for the KS domains was performed. Earlier analyses have demonstrated that KS domains (from cis-AT PKSs) belonging to a particular biosynthetic gene cluster usually form cluster-specific clades, indicating that the KS domains in individual pathways are generated by duplication of single ancestor modules (11, 17, 25, 42). For a gene cluster with PKS genes of different origins, one would thus expect a phylogenetic separation of the respective KS domains. The amino acid sequences of the mla and bec KS domains were compared with those from 9 other characterized PKS clusters, including 3 gene clusters involved in macrolactam biosynthesis (vicenistatin, salinilactam, and leinamycin), and a phylogenetic tree was reconstructed using the maximum likelihood method (Fig. 4). The KS domains from the leinamycin (lnm) pathway and the KSQ domains (represented by the “loading” KS domains of BecA, MlaA1, and SpnA) were found to be quite different from the rest of the KS domains, forming two separate clades. Leinamycin is synthesized by a PKS system where AT domains are not present in the individual modules and the acyl-CoA substrates are supplied in trans by a discrete acyltransferase (6). Piel et al. found that KS domains from such pathways form a distinct clade from PKS with cis-acting ATs, suggesting a single evolutionary origin for the AT-less systems (31). The separation of the lnm KS domains from the other KS domains is therefore consistent with earlier observations. KSQ domains are specialized loading module domains that lack condensation activity but retain the ability to decarboxylate dicarboxylic acid starters (2, 24). Several studies have found that KSQ domains consistently group with other KSQ domains rather than the KS domains from their respective PKS clusters (11, 28), and this was observed for the KSQ domains included in this study as well.
A second feature shown by the same studies is the clustering of KS domains preceded by NRPS modules in hybrid NRPS-PKS assembly lines (11, 28). The formation of a separate clade by such KS domains may be explained by the functional constraint imposed by the unusual amino acid/peptidyl substrate (11). The first KS domain of MlaB/BecB formed a group with the KS1 domain from VinP1 and Strop_2768, and since these KS domains are presumed to perform condensation of an acyl extender unit onto an amino acid derivative/aminoacyl, the formation of a separate clade by these KS domains seems to support the above-mentioned observation. However, Ginolhac et al. found that the hybrid KS domain clade was separated from the regular KS domains, while the KS domains from the hybrid systems in this study were localized among the regular KS domains (11).
The overall tree topology corresponded well with earlier studies, as most KS domains grouped within the expected cluster-specific clades (17, 42). A few KS domains were found to be separate from their respective clade (e.g., some KS domains from the oligomycin [olm] and pimaricin [pim] biosynthetic gene clusters). Such a split has been observed for the same KS domains in an earlier study (17). The KS domains from Mla/BecD to -G formed a subclade within a bigger macrolactam-specific (cis-AT) clade shared with all vicenistatin (vin) KS domains (except the amino acid-accepting KS1 domain from VinP1) and some salinilactam (slm) KS domains (from Strop_2768 and -2778), suggesting common ancestry for the mla, bec, and vin clusters and, partially, the slm cluster. Most of the remaining slm domains (from Strop_2778, -2780, and -2781) formed a separate group, except for the KS domain from Strop_2779, which was embedded into the polyene macrolide clade. The MlaA1/A2 and BecA KS domains were also found to group with the polyene clusters, forming a separate subclade together with the KS domain from Strop_2779. Polyene macrolide clusters have been shown to form a mixed group rather than cluster-specific clades (17, 42), and this analysis displays the same type of distribution for the nys and pim KS domains. The association of the MlaA1/A2 and BecA KS domains with the polyene macrolide-specific clade suggests that the becA-mlaA genes might have originated from polyene macrolide PKS genes.
Concerning the issue of a common ancestry for the mla and bec clusters, the phylogenetic tree reconstruction for the KS domains leaves little doubt, as the mla KS domains are always direct neighbors of their bec homologs from the corresponding modules.
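The "direct neighbor" criterion can be made concrete with a small sketch. The following Python example is illustrative only: the nested-tuple tree, its labels, and the helper names `cherries` and `are_sisters` are hypothetical and do not reproduce the tree in Fig. 4. It simply collects the two-leaf clades of a tree and tests whether a given pair of KS domains forms one.

```python
# Toy illustration only: this tree is hypothetical and is NOT the tree from Fig. 4.
# A tree is a nested tuple; a leaf is a string label.

def cherries(tree):
    """Return the set of leaf pairs that form a two-leaf clade (sister leaves)."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return  # a leaf has no children to inspect
        if len(node) == 2 and all(isinstance(child, str) for child in node):
            found.add(frozenset(node))
        for child in node:
            walk(child)

    walk(tree)
    return found


def are_sisters(tree, a, b):
    """True if leaves a and b are each other's only sibling in the tree."""
    return frozenset((a, b)) in cherries(tree)


# Hypothetical topology with mla/bec homologs placed as sister leaves.
toy_tree = (
    (("MlaD_KS", "BecD_KS"), ("MlaE_KS", "BecE_KS")),
    (("MlaA1_KS", "BecA_KS"), "Strop_2779_KS"),
)

print(are_sisters(toy_tree, "MlaD_KS", "BecD_KS"))
print(are_sisters(toy_tree, "MlaD_KS", "BecE_KS"))
```

On a real tree, the same test would be applied to every mla KS domain and its bec counterpart from the corresponding module.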
Mla/BecJ, Mla/BecK, Mla/BecS, and Mla/BecL represent a discrete AMP-dependent synthetase/ligase, acyltransferase, peptidyl carrier protein, and NRPS adenylation domain, respectively, and are presumed to be involved in activation of glycine and the aminoacyl starter (19). The four genes encoding these enzymes form a subcluster within the ML-449 and BE-14106 biosynthetic gene clusters. Putative homologs of all four genes are found in the vin and slm clusters, while mla-becJ and mla-becK homologs are missing from the lnm cluster (Fig. 5A). In the vin cluster, the four genes encoding these enzymes (vinK, vinL, vinM, and vinN) are all transcribed in the same direction, as are the two putative homologous genes present in the lnm cluster (lnmP and lnmQ). In the mla-bec clusters and slm cluster, these genes have different organizations (Fig. 5A). Another interesting feature is the location of vinP1, mla-becB, and strop_2768 in close proximity to these four genes, as the KS1 domains from VinP1, Mla/BecB, and Strop_2768 apparently accept the amino acid derivative/aminoacyl resulting from modifications/activation accomplished by these (and some other) enzymes. A phylogenetic analysis was undertaken for three of the enzymes and their putative homologs to establish the relationship between them. Amino acid sequences of the putative NRPS adenylation domains and AMP-dependent synthetases/ligases were aligned with a small collection of A domains from other NRPSs. A phylogenetic tree was reconstructed using the maximum likelihood method (Fig. 5B). The A domains and the AMP-dependent synthetases/ligases are presumed to be homologous, and they all contain the 10 core motifs of the adenylate-forming superfamily of enzymes (26, 33).
The analysis showed that the putative adenylation domains Mla/BecL formed a separate group together with Strop_2774, VinM, and LnmQ, suggesting a closer evolutionary relationship between these enzymes than with A domains from other NRPS clusters. The same clustering was observed for Mla/BecJ, Strop_2775, and VinN, indicating that the four enzymes are more closely related to each other than to the A domains included in the analysis. The putative acyltransferases MlaK and BecK were aligned with VinK, Strop_2773, and a random selection of AT domains from the mla, bec, and slm PKSs (both malonyl-CoA- and methylmalonyl-CoA-specific AT domains were included). A phylogenetic tree was reconstructed using the maximum likelihood method (Fig. 5C). As expected, malonyl-CoA-specific AT domains formed a separate clade from the methylmalonyl-CoA-specific AT domains (11). The discrete acyltransferases Mla/BecK, VinK, and Strop_2773 formed a separate clade from the other AT domains, indicating a closer relationship between these enzymes than with the other AT domains in the clusters. For comparison, a phylogenetic tree was also constructed for the 16S rRNA gene sequences of Streptomyces sp. DSM 21069 (BE-14106 producer), Streptomyces sp. MP39-85, S. halstedii (3 strains), S. atroolivaceus (1 type strain plus 1 additional strain), and Salinispora tropica CNB440 (see Fig. S6 in the supplemental material). 16S rRNA gene sequences for the vicenistatin producer S. halstedii HC34 and the leinamycin producer S. atroolivaceus S-140 could not be found in the Ribosomal Database Project (RDP) or GenBank, and sequences from 3 strains reported as S. halstedii and 2 strains assigned as S. atroolivaceus were included instead. The analysis confirmed the assumption that Streptomyces sp. MP39-85 and Streptomyces sp. DSM 21069 are quite closely related.
Judging from the organization of the genes in the mla and bec clusters; the high levels of similarity at both the nucleotide and the protein levels; and the phylogenetic trees reconstructed for the KS, AT, and A domains, there seems to be little doubt that the two gene clusters are very closely related. The 16S rRNA gene tree reconstruction is in agreement with the above-mentioned observations, and it is therefore reasonable to assume that Streptomyces sp. DSM 21069 and Streptomyces sp. MP39-85 share a recent common ancestor. The bec cluster is thought to have arisen following a deletion in the ancestor gene cluster.
As to the composition of the two gene clusters, the phylogenetic reconstruction for the KS domains points to a separate ancestry for the mlaA1-mlaA2-becA genes and the rest of the PKS genes, indicating a closer relationship with polyene macrolide clusters for the former. One possible origin of the mlaA-becA genes might be a separate polyene macrolide cluster present in the same strain. Alternatively, they might represent parts from a polyene macrolide gene cluster acquired through horizontal gene transfer.
The phylogenetic tree reconstructions for the KS, AT, and A domains also point to a common ancestry for the mla-bec clusters and the vin and slm clusters. KS domains from Mla/BecD to -G form a subclade with the KS domains from the vin cluster and some KS domains from the slm cluster. The discrete AT domains and A domains/AMP-dependent synthetases/ligases from the four clusters display a closer relationship to each other than to other AT and A domains. The colocalization of the genes encoding the discrete AT, A, and PCP domains and the AMP-dependent synthetases/ligases also points to a relationship between the clusters, although several rearrangements must have occurred during the evolution of the clusters. Considering the different amino acid substrates utilized in the different biosynthetic pathways, one might envision that these four enzymes represent a flexible platform for sequestering amino acids for modification and then subsequent activation and loading of the amino acid derivative on the PKS. The discrete adenylation domain presumably activates the amino acid and loads it on the PCP domain to facilitate modification by pathway-specific enzymes, such as the glycine oxidase from the BE-14106/ML-449 biosyntheses. Upon release from the PCP domain, the AMP-dependent synthetase/ligase activates the now modified amino acid (or aminoacyl in the case of BE-14106/ML-449) by ligation with CoA, making the amino acid derivative an acceptable substrate for the discrete acyltransferase. The acyltransferase subsequently loads the amino acid-CoA on a PKS loading module ACP domain, providing a starter for the biosynthesis of the macrolactam ring. The system can be viewed as flexible in the sense that different types of enzymes could potentially be recruited for modification of the amino acid or the amino acid itself could be exchanged by altering the substrate specificity of the discrete adenylation domain.
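The proposed order of events in this loading platform can be summarized as a schematic model. The Python sketch below is a bookkeeping illustration of the hypothesis only, not a biochemical simulation; the `Starter` class, the `load_starter` and `oxidize` functions, and the carrier labels are all hypothetical names introduced here. The pathway-specific modification is passed in as a function, which mirrors the idea that different tailoring enzymes or amino acids could be substituted.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Starter:
    """Hypothetical record of a starter unit and the carrier it is bound to."""
    residue: str
    carrier: str = "free"   # one of "free", "PCP", "CoA", "ACP"
    modified: bool = False


def load_starter(amino_acid: str, modify: Callable[[Starter], Starter]) -> Starter:
    """Walk a starter through the proposed steps, in the order given in the text."""
    s = Starter(amino_acid)
    s = replace(s, carrier="PCP")  # discrete A domain activates and loads onto the PCP
    s = modify(s)                  # pathway-specific modification while PCP bound
    s = replace(s, carrier="CoA")  # AMP-dependent synthetase/ligase: ligation with CoA
    s = replace(s, carrier="ACP")  # discrete AT loads onto the loading-module ACP
    return s


def oxidize(s: Starter) -> Starter:
    """Stand-in for a pathway-specific tailoring step (e.g., an oxidase)."""
    return replace(s, residue=s.residue + " (oxidized)", modified=True)


print(load_starter("glycine", oxidize))
```

Swapping `oxidize` for another function, or passing a different amino acid name, changes the starter without altering the four-step scaffold, which is the sense in which the text calls the system flexible.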
In addition to the genes mentioned above, there are also other putative homologs in the vin and slm clusters. Table 1 summarizes all potential homologs from the vin, slm, and lnm clusters. Particularly, the slm cluster appears to have much in common with the mla and bec clusters, as a total of 9 possible homologs are present in all three clusters (excluding the PKS genes). However, the overall organizations of the slm cluster and the mla-bec clusters are dissimilar. There are also several other genes within the vin and slm clusters that are not present in the other clusters, such as the genes encoding enzymes presumed to synthesize the vicenisamine sugar moiety of vicenistatin and some enzymes presumed to be involved in modification of the amino acid starter. Assuming a distant common ancestor for all of the four macrolactam gene clusters, several deletions, insertions, and rearrangements must be invoked to explain the differences. The mechanism behind these events remains unclear, although homologous recombination between different gene clusters may be the more likely explanation, as there is no trace of transfer-related elements, such as insertion sequence (IS) elements or transposons, in the mla-bec clusters. As domains within PKS and NRPS enzymes seem to be readily exchangeable (8, 17), it does not appear unreasonable to think that other parts of PKS/NRPS gene clusters may be exchangeable as well. All in all, this paints a picture for the evolution of such gene clusters occurring through a natural type of biocombinatorics, where individual genes, modules, and even domains can be recruited from other gene clusters or within the same cluster.
We thank H. Sletvold for help with searching for IS elements and transposons and I. Bakke for assistance with the phylogenetic analysis and for helpful comments on the manuscript. We are also thankful to H. Sletta and T. E. Ellingsen for discussions and comments.
This work was supported by the Research Council of Norway and the Norwegian University of Science and Technology.
Published ahead of print on 23 October 2009.
†Supplemental material for this article may be found at http://aem.asm.org/.