We have performed a comprehensive computational analysis of type II PKSs and their gene clusters in order to identify type II PKSs and predict polyketide chemotypes in actinobacterial genomes. Although subclasses of type II PKSs have been inferred from the chemical structures of aromatic polyketides, earlier studies have not explicitly defined subclasses within each type II PKS class on the basis of biosynthetic function and sequence pattern. We addressed this issue with a homology-based sequence clustering analysis of known type II PKSs. The analysis showed that several type II PKS classes, such as KR, ARO and CYC, can be separated into subclasses with different biosynthetic functions. Furthermore, we could identify domain subfamilies of type II PKSs by using the sequence patterns of these subclasses. These results imply that several type II PKS classes can be classified more finely into subclasses based on domain sequence patterns, and that different types of aromatic polyketides are synthesized by different biosynthetic pathways catalyzed by the type II PKS subclasses.
The identification of type II PKS subclasses enabled us to derive prediction rules that relate combinations of type II PKS domains to aromatic polyketide chemotypes. Aromatic polyketides are known to be synthesized through several biosynthetic processes, including starter unit selection, chain length determination, folding pattern determination, and chain tailoring such as methylation and glycosylation. Several previous studies have reported key factors by correlating individual type II PKS sequences with the chemical structures of aromatic polyketides [30]. Based on these reports, we tried to deduce general rules, applicable to our set of known type II PKSs, for the various biosynthetic processes of aromatic polyketide formation. However, for our known type II PKSs we could find a correlation only between the ARO/CYC domain combination and the carbon chain folding pattern.
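The idea of a prediction rule keyed on the ARO/CYC domain combination can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the subclass labels (`ARO_a`, `CYC_b`, ...) and chemotype names (`chemotype_1`, ...) are hypothetical placeholders, and the rule table is invented purely for illustration. The sketch only shows the lookup mechanism, including the case in which a domain combination matches no known rule.

```python
# Hypothetical rule table: every label below is a placeholder,
# not an actual subclass or chemotype from this study.
CHEMOTYPE_RULES = {
    frozenset({"ARO_a", "CYC_a"}): "chemotype_1",
    frozenset({"ARO_a", "ARO_b", "CYC_b"}): "chemotype_2",
}


def predict_chemotype(cluster_domains):
    """Return the chemotype for the ARO/CYC combination in a gene cluster.

    cluster_domains: iterable of domain subclass labels found in one cluster.
    Returns None when the ARO/CYC combination matches no known rule.
    """
    # Only ARO and CYC subclasses take part in the rule; other domains are ignored.
    key = frozenset(
        d for d in cluster_domains if d.startswith(("ARO_", "CYC_"))
    )
    return CHEMOTYPE_RULES.get(key)


known = predict_chemotype(["KS", "CLF", "ARO_a", "CYC_a"])
novel = predict_chemotype(["KS", "CLF", "ARO_c"])
```

Using an unordered set as the key reflects the assumption that only the presence of each ARO/CYC subclass matters, not gene order; a combination absent from the table yields `None` rather than a guess.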
The development of type II PKS domain classifiers and the derivation of prediction rules for aromatic polyketide chemotypes allowed us to identify and analyze type II PKS gene clusters. Predicting the aromatic polyketide chemotype from a type II PKS gene cluster is important for three reasons. First, the chemotype provides a framework for interpreting the gene cluster within the known biosynthetic pathways. Second, it suggests the potential function of individual type II PKSs in the polyketide biosynthesis pathway. Third, it opens the possibility of designing novel aromatic polyketides by engineering the biosynthetic pathway through substitution of type II PKSs.
The integration of the type II PKS domain classifiers with the chemotype prediction rules led to the development of PKMiner, which detects type II PKS gene clusters, provides functional annotation of type II PKSs and predicts the polyketide chemotype of the type II PKS product. Compared with the earlier software antiSMASH, these analysis functionalities are unique to PKMiner for the analysis of type II PKS gene clusters. Although antiSMASH provides various functionalities, such as gene cluster detection, function annotation, prediction of chemical structure, comparative gene cluster analysis and phylogenetic analysis, only some of them (gene cluster detection, comparative gene cluster analysis and phylogenetic analysis) are effective for type II PKS gene clusters, because antiSMASH lacks comprehensive type II PKS specific domain classifiers and an aromatic polyketide structure prediction module.
Genome analysis and literature-based validation showed that our method can be successfully applied to identify type II PKSs and to predict aromatic polyketide chemotypes by analyzing type II PKS gene clusters. In particular, pentangular polyphenol turned out to be the most abundant chemotype, predicted in the largest number of organisms. However, this approach has potential limitations in type II PKS domain identification and aromatic polyketide prediction. Because our domain classifiers and chemotype prediction rules depend on known type II PKS information and known domain organization, the method can miss entirely new PKS subclasses, or fail to predict the chemotype when a novel domain combination produces an existing or novel aromatic polyketide chemotype. For example, 9 potential type II PKSs in Streptomyces avermitilis MA-4680 were reported based on their general similarity to type II PKSs, but they showed no distinct sequence similarity to any of our type II PKS domains, and their PKS activities have not been validated experimentally [27]. We will consider including these type II PKSs in a separate domain subfamily group once their type II PKS activities have been demonstrated.
The results of the genome analysis reflect taxonomic characteristics of the microorganisms that carry type II PKS gene clusters. We therefore investigated the taxonomic distribution of these results in more detail. To estimate the relative abundance of type II PKS containing genomes across taxonomic groups, we calculated, for each group in the taxonomic hierarchy, the ratio of type II PKS containing genomes to the total number of sequenced genomes, and refer to it as the taxonomic group ratio. We chose the suborder as the reference taxon for calculating this ratio because microorganisms belonging to the order Actinomycetales are known to be highly diverse. Currently, 319 actinobacterial genomes are classified into 6 orders, 17 suborders and 41 families in the NCBI taxonomy. The table shows the taxonomic distribution of microorganisms with type II PKS gene clusters. For each suborder, it lists the total number of sequenced genomes, the number of type II PKS containing genomes and the taxonomic group ratio. As can be seen, type II PKS containing genomes exhibit a taxon-specific distribution. They are found only in the suborders Catenulisporineae, Frankineae, Micrococcineae, Micromonosporineae, Pseudonocardineae, Streptomycineae and Streptosporangineae. Interestingly, the taxonomic group ratio shows that the suborders Frankineae, Micromonosporineae, Streptomycineae and Streptosporangineae have a relatively high proportion of type II PKS containing genomes, whereas the suborders Actinomycineae, Corynebacterineae, Glycomycineae, Kineosporiineae and Propionibacterineae do not contain any type II PKS gene clusters. Remarkably, the suborder Streptomycineae, which includes the genus Streptomyces, known as a prolific taxon for polyketide synthesis, does not rank highest in taxonomic group ratio. This result suggests that prolific sources of aromatic polyketides exist besides Streptomycineae.
Taxonomical distribution of microorganisms with type II PKS gene clusters
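The taxonomic group ratio described above (type II PKS containing genomes divided by total sequenced genomes, per suborder) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name, the input format and the suborder labels and counts in the example are assumptions made for demonstration and are not data from this study.

```python
from collections import Counter


def taxonomic_group_ratios(genomes):
    """Compute the taxonomic group ratio for each suborder.

    genomes: iterable of (suborder, has_type2_pks) pairs, one per sequenced genome.
    Returns a dict mapping suborder -> (PKS-containing genomes / total genomes).
    """
    totals = Counter()
    positives = Counter()
    for suborder, has_type2_pks in genomes:
        totals[suborder] += 1
        if has_type2_pks:
            positives[suborder] += 1
    # Suborders with no PKS-containing genome get a ratio of 0.0.
    return {s: positives[s] / totals[s] for s in totals}


# Hypothetical input: placeholder suborders and made-up counts.
example_genomes = [
    ("suborder_A", True),
    ("suborder_A", False),
    ("suborder_A", False),
    ("suborder_A", False),
    ("suborder_B", False),
    ("suborder_B", False),
]
ratios = taxonomic_group_ratios(example_genomes)
```

Normalizing by the number of sequenced genomes in each suborder keeps heavily sequenced groups from appearing enriched simply because more of their genomes are available.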