Analysis of the scytonemin biosynthesis genomic region in N. punctiforme
In our previous study we proposed that the open reading frames (referred to herein as genes) NpR1276 to NpR1259 in the
N. punctiforme genome comprise a functional unit dedicated to scytonemin biosynthesis [
11]. Within this 18-gene cluster, there appears to be a functional separation between the upstream genes and those in the downstream region (Figure ). Although some of the genes in the upstream region had not been associated with any protein function, others had been preliminarily annotated. For example, NpR1276 is annotated in GenBank as an acetolactate synthase, which is a thiamine pyrophosphate (TPP)-requiring enzyme. Functionally, acetolactate synthase is able to condense two pyruvate molecules [
13] and is almost always found as part of the valine and isoleucine biosynthesis
ilvBN operon [
14]. NpR1276, on the other hand, is not found anywhere near
ilv- genes in the
N. punctiforme genome. It does, however, contain domains specific for a TPP-requiring enzyme [
15], and it has been shown to have a similar condensing activity on phenol- and indole-pyruvate moieties [
12]. This constitutes sufficient divergence to revisit the annotation and rename the gene
scyA. The next gene in the cluster, NpR1275, was annotated as leucine dehydrogenase (
gdhA). Even though the protein sequence has the necessary domain for glutamate and leucine dehydrogenase, both of which are structurally related NAD+-dependent oxidoreductases [
16], it shares only 48% similarity with the leucine dehydrogenase characterized from
Thermoactinomyces intermedius. The GdhA from
T. intermedius catalyzes the oxidative deamination of branched-chain amino acids [
17]. The product of NpR1275 has a similar activity, but catalyzes the oxidative deamination of aromatic rather than branched-chain amino acids [
12]. As in the case of
scyA, there are sufficient differentiating traits to rename the gene as
scyB. Even though a protein function cannot be readily predicted for the next four genes (NpR1274 to NpR1271), NpR1273 has experimentally been shown to prevent scytonemin production when inactivated through transposon insertion [
11]. For consistency, given the lack of alternatives, and in keeping with the continuity of
scyA and
scyB, we propose that these four genes encode unique proteins likely essential to scytonemin biosynthesis, and we refer to them as
scyC-F, respectively.
The predicted structural features found in some of these genes are also interesting and support a cellular compartmentalization of scytonemin biosynthesis. For example, ScyD, ScyE, and ScyF, none of which had been assigned a protein function by annotation, each contain a signal peptide export domain in their derived protein sequence. These N-terminal signature sequences are often associated with periplasmic proteins, suggesting that some stages of scytonemin biosynthesis may occur in the periplasm. Furthermore, the protein sequences of ScyA, TyrP, and NpR1259 all contain at least one transmembrane domain. The software program PSLpred [
18], which predicts the subcellular localization of bacterial proteins based on their protein sequences, suggests that TyrP may also function on the periplasmic side, while ScyA and NpR1259 likely function on the cytoplasmic side. While the protein sequence of NpR1268 does not have an N-terminal export domain, the fact that it resembles
dsbA, a dithiol-disulfide isomerase (oxidoreductase) that facilitates the formation of disulfide bridges in the folding of periplasmic proteins [
19], suggests that it may also localize to the periplasm. This leads us to speculate that a dithiol-disulfide isomerase of this kind could be important as an accessory to the other proteins predicted to be active in the periplasm. Thus, the upstream region of the cluster comprises novel genes likely involved directly in the assembly of scytonemin, with early condensing reactions occurring in the cytoplasm and later steps presumably localized to the periplasm.
Most of the genes located towards the downstream portion of the cluster are clearly associated by similarity with the biosynthesis of aromatic amino acids [
11,
20]. Furthermore, they do not contain structural motifs that predict their association with cellular membranes or their transport to the periplasmic space. In this region of the cluster are genes predicted to code for the first two enzymes of the shikimic acid pathway (
aroG,
aroB), leading to the formation of 5-dehydroquinate. All of the genes necessary for the biosynthesis of tryptophan from chorismate (
trpE, trpC, trpA, trpB, trpD) are also present, while only prephenate dehydrogenase (encoded by
tyrA) is present from the tyrosine biosynthesis pathway, thus ending that pathway at
p-hydroxyphenylpyruvate, one amination short of tyrosine [
21,
22]. In fact, on the basis of chemical structures [
7],
p-hydroxyphenylpyruvate is a theoretically more direct precursor for scytonemin than tyrosine.
One of the most significant observations regarding these aromatic amino acid genes is that there is at least one other copy of each of them elsewhere in the genome of
N. punctiforme at dispersed loci. Genes in this dispersed set find homologues in all other cyanobacteria sequenced so far and thus likely have a housekeeping function [
20]. The cluster of redundant copies of aromatic amino acid biosynthetic genes, by contrast, appears to be unique and always spatially associated with the scytonemin cluster in the few cyanobacterial genomes that have it. Therefore, it is reasonable to hypothesize that the downstream region of the scytonemin cluster is likely dedicated to supplying the building blocks for the biosynthesis of scytonemin, while the standard housekeeping copies remain important for central metabolism. This is supported by the differential up-regulation of these redundant genes along with the induction of scytonemin synthesis in
N. punctiforme, while the expression levels of the housekeeping genes remain unaltered [
23].
Two genes in the downstream region of the cluster have previously been assigned putative protein functions not related to aromatic amino acid biosynthesis. NpR1270 shows similarity to a putative glycosyltransferase, with 77% identity to a glycosyltransferase in
Nodularia. Interestingly, some glycosyltransferases in bacteria have been linked to exopolysaccharide biosynthesis [
24]. Specifically, in
Nostoc commune, the synthesis of scytonemin is coupled to the synthesis of the exopolysaccharide [
25]. The protein sequence of NpR1263 has a transmembrane domain and is annotated as a putative tyrosinase, TyrP, a copper monooxygenase that can hydroxylate monophenols and oxidize o-diphenols to o-quinones [
26]. Indeed, NpR1263 has the essential conserved residues for Cu2+ binding and is a putative tyrosinase-like protein. It is unique in that it has no cyanobacterial protein sequence homologs in GenBank, and it can be predicted to play an important role in scytonemin biosynthesis, as explained below. The other downstream gene is NpR1259, the last gene in this cluster. It has two putative transmembrane domains and was annotated as a hypothetical membrane protein, since it lacks significant homology to known genes.
Upstream from the gene cluster are two genes that might be involved in the regulation of scytonemin biosynthesis, given their high degree of conservation in sequence and location among distantly related strains (see below). These protein sequences reveal strong similarities to two-component signal transduction systems. These systems typically involve the autophosphorylation of a histidine kinase (in our case, NpF1277) and the subsequent transfer of the phosphate group to an aspartate on the protein. This phosphorylated aspartate then acts as a phospho-donor to a response regulator protein (in our case, NpF1278), which ultimately turns on the transcription of the genes the system regulates [
27,
28]. NpF1277 likely belongs to class II histidine kinases, which are characterized by the presence of PAS/PAC sensory domains that are generally sensitive to oxygen, redox, or light [
29]. NpF1278 is a class II response regulator (RR) [
30] predicted to be a positive transcriptional regulator [
31]. A working hypothesis is that NpF1277 and NpF1278 might regulate the adjacent genomic region (NpR1276 to NpR1259) associated with scytonemin biosynthesis.
Comparative genomics of the scytonemin gene cluster
The scytonemin-associated gene region was identified in three additional strains, belonging to the genera Anabaena, Lyngbya, and Nodularia, among all bacteria whose genomes have been completely sequenced. Genomic arrangements of homologous genes were similar to those of N. punctiforme (Figure ). The scytonemin core genes (scyA-F) are conserved in all four genomes; their orthologs are at least 42%, and most greater than 65%, similar to one another (Table ), and they are positioned near sets of redundant copies of aromatic amino acid biosynthesis genes. These redundant copies are orthologous to the exact same set found in the N. punctiforme genomic region. The only other gene in the cluster conserved across all four genomes was the response regulator, NpF1278 in N. punctiforme.
Table 1. Cyanobacterial orthologs to the scytonemin-associated genes of N. punctiforme.
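The ortholog comparisons above rest on pairwise comparisons of protein sequences. As a rough illustration only, the sketch below computes percent identity over a pre-aligned, gapped pair of sequences. It is a simplified stand-in: the values reported here are similarities, which would normally come from a standard alignment tool using a substitution matrix, and the function name and the two toy sequences are invented placeholders, not ScyA-F data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumes the two strings are already aligned (equal length, gaps as '-').
    This is identity, not substitution-matrix similarity.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment (invented residues, for illustration only).
toy_a = "MKT-AL"
toy_b = "MKSQAL"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identity over ungapped columns")
```

Excluding gapped columns from the denominator is one of several conventions; other definitions divide by the full alignment length or by the shorter sequence, and would give different percentages for the same pair.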
In the scytonemin gene cluster of Anabaena, Lyngbya, and Nodularia, there are five conserved genes downstream of scyF that are absent from the N. punctiforme cluster (shown in black in Figure ). On closer inspection, orthologs of these genes could readily be identified elsewhere on the N. punctiforme chromosome. There, they comprise a five-gene satellite cluster with all five genes oriented in the same transcriptional direction (NpF5232 to NpF5236). In N. punctiforme, NpF5232 and NpF5235 are annotated as unknown hypothetical proteins, while NpF5233, NpF5234, and NpF5236 are annotated as a putative metal-dependent hydrolase, prenyltransferase (ubiA), and type I phosphodiesterase, respectively. However, these annotations are based on weak similarity, and the orthologs of each of these genes are annotated as unknown hypothetical proteins in the Anabaena, Lyngbya, and Nodularia genomes. Given this ambiguity, we take a cautious approach and postpone a specific annotation for these genes.
In a previous study we determined that
Anabaena was unable to produce scytonemin [
11], even though it contained many of the genes in the scytonemin cluster, and interpreted this as a case of relic genetic information. It was thus important to test whether scytonemin was produced in the other strains used in the comparisons. We could not elicit the production of scytonemin in either Lyngbya or Nodularia upon exposing cultures of each strain to UVA radiation, which is the standard procedure to achieve biosynthetic induction (see Methods). It is possible that these strains had the ability to produce scytonemin at some point in their evolutionary history but have since lost it, since laboratory strains are rarely, if ever, exposed to the doses of UVA required for scytonemin biosynthesis. Furthermore, since scytonemin is a passive sunscreen, it is most effective in environments with pulsed resource availability, as explained above. Since
Anabaena and
Nodularia are planktonic [
32], their need for a passive sunscreen is not as crucial as it is for the
Nostoc and
Chlorogloeopsis strains of terrestrial habitats [
32]. Although some strains of
Lyngbya produce scytonemin,
Lyngbya PCC 8106 does not produce it. This may be because the marine intertidal zone from which it was isolated had varying degrees of resource availability and UV exposure, so this Lyngbya strain may not have needed a passive sunscreen.
Given these results, it seemed important to obtain sequences for the scytonemin-associated region from another scytonemin-producing strain besides
N. punctiforme.
Chlorogloeopsis sp. strain Cgs-O-89 [
1], a cyanobacterium known to produce scytonemin [
2], was selected for this purpose. Using targeted PCR based on primers designed from the
N. punctiforme genome, we were able to amplify and sequence several genes from the genomic region associated with scytonemin biosynthesis of
Chlorogloeopsis, and found that their genomic arrangement was very similar to that of
N. punctiforme (Figure ). Additionally, the five-gene satellite cluster from
N. punctiforme was found and sequenced in
Chlorogloeopsis as a continuous segment. As in
N. punctiforme, the
Chlorogloeopsis satellite gene cluster was not continuous with the scytonemin-associated gene cluster. Although we were unable to link all of the scytonemin-associated gene orthologs of
Chlorogloeopsis into a single contig, we could establish clear similarities between the
Chlorogloeopsis and
N. punctiforme gene clusters (Figure ).
Insights into the biosynthetic pathway and working model for scytonemin biosynthesis
Scytonemin is a symmetrical dimeric molecule, and it is expected that each monomer is synthesized separately before condensing to form the dimer. In theory, if tryptophan and tyrosine were used as building blocks, the biosynthesis of scytonemin could involve as few as four to six biosynthetic steps. In fact, structural, genetic, and preliminary radiotracer evidence indicates that the biosynthesis of scytonemin starts from aromatic amino acid (or related) precursors [
7,
8,
11]. Previously isolated natural products, with structural similarities to putative scytonemin subunits, also provide useful biosynthetic clues. Nostodione A (Figure ) has not only been isolated by ozonolysis of scytonemin [
7], but has also been isolated from
Nostoc commune and
Scytonema hofmanni [
33], two typical scytonemin-producing strains. It is thus logical to assume that nostodione A is the most likely monomeric intermediate of scytonemin. Prenostodione (Figure ), the methylated carboxylic acid precursor of nostodione A, has been reported from
Nostoc sp. TAU strain IL-235, further suggesting that the biosynthetic pathway of scytonemin originates from a condensation of tryptophan- and phenylpropanoid-derived subunits [
34]. Indeed, a recent study found that deaminated tryptophan and tyrosine (indole-3-pyruvic acid and
p-hydroxyphenylpyruvate, respectively) condense, through the action of ScyA and ScyB, to form an intermediate that is structurally similar to diolmycin A1 (Figure ) [
12]. Diolmycin A1 has been isolated from
Streptomyces sp. [
35] and is a plausible intermediate in the scytonemin biosynthetic pathway. Furthermore, oxidation of the tyrosine moiety appears to be essential for the biosynthesis of nostodione A, an essential precursor to scytonemin as mentioned above. We propose that this oxidation could be carried out by the tyrosinase-like TyrP encoded in the scytonemin gene cluster, since tyrosinases are known to promote monooxygenation in similar moieties [
26]. It is interesting to note that the only scytonemin-associated gene in common between
N. punctiforme and
Chlorogloeopsis (the two proven scytonemin producers), that is absent from the other three strains (which, in our hands, do not produce it), is
tyrP (putative tyrosinase). In fact, the gene appears to be absent from the genomes of these
Lyngbya, Anabaena,
and Nodularia strains altogether, as is the case for all other fully sequenced cyanobacterial genomes. We do note, however, that while the genome of
Anabaena is complete, the
Lyngbya and
Nodularia genome projects are only nearly complete, so we cannot determine with absolute certainty at the time of this publication whether
tyrP is absent from these genomes.
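The comparative argument above amounts to a presence/absence filter: look for genes found in every proven producer and in none of the strains that, in our hands, do not produce scytonemin. A minimal sketch of that logic follows. The dictionary holds only the presence calls stated in the text; the status of scyA-F and of the response regulator in Chlorogloeopsis is not specified here and is deliberately left out, and the function and variable names are illustrative assumptions.

```python
# Strains grouped by observed phenotype, as described in the text.
producers = {"N. punctiforme", "Chlorogloeopsis"}
non_producers = {"Anabaena", "Lyngbya", "Nodularia"}

four_genomes = {"N. punctiforme", "Anabaena", "Lyngbya", "Nodularia"}

# Presence calls restricted to what the text states. Chlorogloeopsis status
# for scyA-F and the response regulator is not given, so it is not listed.
gene_presence = {
    "scyA": set(four_genomes),
    "scyB": set(four_genomes),
    "scyC": set(four_genomes),
    "scyD": set(four_genomes),
    "scyE": set(four_genomes),
    "scyF": set(four_genomes),
    "response regulator": set(four_genomes),
    "tyrP": {"N. punctiforme", "Chlorogloeopsis"},
}


def producer_specific(presence, producing, non_producing):
    """Genes present in all producing strains and in no non-producing strain."""
    return sorted(
        gene
        for gene, strains in presence.items()
        if producing <= strains and not (strains & non_producing)
    )


candidates = producer_specific(gene_presence, producers, non_producers)
print(candidates)
```

Because any gene present in a non-producer is excluded regardless of its status in Chlorogloeopsis, the missing calls do not affect the outcome for the genes listed. The caveat about incomplete genome sequences applies equally to this filter: an absence call is only as reliable as the underlying assembly.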
A working model of the subcellular compartmentalization of scytonemin biosynthesis in the cell, based on the above genomic analyses, is provided in Figure . Following a UVA radiation cue, the redundant genomic copies of the
trp and
tyr genes are expressed, leading to the production of the tryptophan and
p-hydroxyphenylpyruvate monomers from chorismate. The production of chorismate from central metabolites is boosted by additional expression of the genes
aroG and
aroB, which code for the regulatory and rate-limiting enzymes in the shikimic acid pathway, respectively. These precursors are first processed by ScyA, ScyB, ScyC, and NpR1259 in the cytoplasm. The resulting intermediates are then exported to the periplasm via an unknown membrane transport mechanism, as no known mechanism is encoded within the scytonemin cluster. There, they are subject to reactions orchestrated by the periplasmic enzymes ScyD, ScyE, ScyF, DsbA, and TyrP to produce the reduced form of scytonemin. Once secreted to the extracellular matrix, reduced scytonemin auto-oxidizes and takes on its final yellow-brown appearance. Parallel studies suggest that a type IV secretion system, similar in mechanism to a bacterial conjugation system [
36], is used to secrete scytonemin to the extracellular matrix (Soule
et al., unpublished data). Once scytonemin is in the extracellular slime layer in sufficient quantities, it blocks the incoming UVA cue, thus returning the gene expression to background levels and halting the further synthesis of the sunscreen.