Summary: Gibberellic acids (GAs) are key plant hormones, regulating various aspects of growth and development, which have been at the center of the ‘green revolution’. GRAS family proteins, the primary players in GA signaling pathways, remain poorly understood. Using sequence-profile searches, structural comparisons and phylogenetic analysis, we establish that the GRAS family first emerged in bacteria and belongs to the Rossmann fold methyltransferase superfamily. All bacterial and a subset of plant GRAS proteins are likely to function as small-molecule methylases. The remaining plant versions have lost one or more AdoMet (SAM)-binding residues while preserving their substrate-binding residues. We predict that GRAS proteins might either modify or bind small molecules such as GAs or their derivatives.
Supplementary Material for this article is available at Bioinformatics online.
We provide a portrait of the bacterial transcription apparatus in light of the data emerging from structural studies, sequence analysis and comparative genomics to bring out important but underappreciated features. We first describe the key structural highlights and evolutionary implications emerging from comparison of the cellular RNA polymerase subunits with the RNA-dependent RNA polymerase involved in RNAi in eukaryotes and their homologs from newly identified bacterial selfish elements. We describe some previously unnoticed domains and the possible evolutionary stages leading to the RNA polymerases of extant life forms. We then present the case for the ancient orthology of the basal transcription factors, the sigma factor and TFIIB, in the bacterial and the archaeo-eukaryotic lineages. We also present a synopsis of the structural and architectural taxonomy of specific transcription factors and their genome-scale demography. In this context, we present certain notable deviations from the otherwise invariant proteome-wide trends in transcription factor distribution and use them to predict the presence of an unusual lineage-specifically expanded signaling system in certain firmicutes like Paenibacillus. We then discuss the intersection between functional properties of transcription factors and the organization of transcriptional networks. Finally, we present some of the interesting evolutionary conundrums posed by our newly gained understanding of the bacterial transcription apparatus and potential areas for future explorations.
RNA polymerase; Beta barrel; Two component system; Activators; Transcription factors; Mobile elements; ATPases
The bacterial SOS response is an elaborate program for DNA repair, cell cycle regulation and adaptive mutagenesis under stress conditions. Using sensitive sequence and structure analysis, combined with contextual information derived from comparative genomics and domain architectures, we identify two novel domain superfamilies in the SOS response system. We present evidence that one of these, the SOS response associated peptidase (SRAP; Pfam: DUF159) is a novel thiol autopeptidase. Given the involvement of other autopeptidases, such as LexA and UmuD, in the SOS response, this finding suggests that multiple structurally unrelated peptidases have been recruited to this process. The second of these, the ImuB-C superfamily, is linked to the Y-family DNA polymerase-related domain in ImuB, and also occurs as a standalone protein. We present evidence using gene neighborhood analysis that both these domains function with different mutagenic polymerases in bacteria, such as Pol IV (DinB), Pol V (UmuCD) and ImuA-ImuB-DnaE2 and also other repair systems, which either deploy Ku and an ATP-dependent ligase or a SplB-like radical SAM photolyase. We suggest that the SRAP superfamily domain functions as a DNA-associated autoproteolytic switch that recruits diverse repair enzymes upon DNA damage, whereas the ImuB-C domain performs a similar function albeit in a non-catalytic fashion. We propose that C3Orf37, the eukaryotic member of the SRAP superfamily, which has been recently shown to specifically bind DNA with 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxycytosine, is a sensor for these oxidized bases generated by the TET enzymes from methylcytosine. Hence, its autoproteolytic activity might help it act as a switch that recruits DNA repair enzymes to remove these oxidized methylcytosine species as part of the DNA demethylation pathway downstream of the TET enzymes.
This article was reviewed by RDS, RF and GJ.
Discovery of the TET/JBP family of dioxygenases that modify bases in DNA has sparked considerable interest in novel DNA base modifications and their biological roles. Using sensitive sequence and structure analyses combined with contextual information from comparative genomics, we computationally characterize over 12 novel biochemical systems for DNA modifications. We predict previously unidentified enzymes, such as the kinetoplastid J-base generating glycosyltransferase (and its homolog GREB1), the catalytic specificity of bacteriophage TET/JBP proteins and their role in complex DNA base modifications. We also predict the enzymes involved in synthesis of hypermodified bases such as alpha-glutamylthymine and alpha-putrescinylthymine that have remained enigmatic for several decades. Moreover, the current analysis suggests that bacteriophages and certain nucleo-cytoplasmic large DNA viruses contain an unexpectedly diverse range of DNA modification systems, in addition to those using previously characterized enzymes such as Dam, Dcm, TET/JBP, pyrimidine hydroxymethylases, Mom and glycosyltransferases. These include enzymes generating modified bases such as deazaguanines related to queuine and archaeosine, pyrimidines comparable with lysidine, those derived using modified S-adenosyl methionine derivatives and those using TET/JBP-generated hydroxymethyl pyrimidines as biosynthetic starting points. We present evidence that some of these modification systems are also widely dispersed across prokaryotes and certain eukaryotes such as basidiomycetes and chlorophyte and stramenopile algae, where they could serve as novel epigenetic marks for regulation or discrimination of self from non-self DNA. Our study extends the role of the PUA-like fold domains in recognition of modified nucleic acids and predicts versions of the ASCH and EVE domains to be novel ‘readers’ of modified bases in DNA.
These results open opportunities for the investigation of the biology of these systems and their use in biotechnology.
The PIWI module, found in the PIWI/AGO superfamily of proteins, is a critical component of several cellular pathways including germline maintenance, chromatin organization, regulation of splicing, RNA interference, and virus suppression. It binds a guide strand which helps it target complementary nucleic acid strands.
Here we report the discovery of two divergent, novel families of PIWI modules, the first such to be described since the initial discovery of the PIWI/AGO superfamily over a decade ago. Both families display conservation patterns consistent with the binding of oligonucleotide guide strands. The first family is bacterial in distribution and is typically encoded by a distinctive three-gene operon alongside genes for a restriction endonuclease fold enzyme and a helicase of the DinG family. The second family is found only in eukaryotes. It is the core conserved module of the Med13 protein, a subunit of the CDK8 subcomplex of the transcription regulatory Mediator complex.
Based on the presence of the DinG family helicase, which specifically acts on R-loops, we infer that the first family of PIWI modules is part of a novel RNA-dependent restriction system which could target invasive DNA from phages, plasmids or conjugative transposons. It is predicted to facilitate restriction of actively transcribed invading DNA by utilizing RNA guides. The PIWI family found in the eukaryotic Med13 proteins throws new light on the regulatory switch through which the CDK8 subcomplex modulates transcription at Mediator-bound promoters of highly transcribed genes. We propose that this involves recognition of small RNAs by the PIWI module in Med13 resulting in a conformational switch that propagates through the Mediator complex.
This article was reviewed by Sandor Pongor, Frank Eisenhaber and Balaji Santhanam.
The tripartite DENN module, composed of an N-terminal longin domain followed by DENN and d-DENN domains, is a GDP-GTP exchange factor (GEF) for Rab GTPases, which are regulators of practically all membrane trafficking events in eukaryotes. Using sequence and structure analysis we identify multiple novel homologs of the DENN module, many of which can be traced back to the ancestral eukaryote. These findings provide unexpected leads regarding key cellular processes such as autophagy, vesicle-vacuole interactions, chromosome segregation, and human disease. Of these, SMCR8, the folliculin interacting protein-1 and 2 (FNIP1 and FNIP2), nitrogen permease regulator 2 (NPR2), and NPR3 are proposed to function in recruiting Rab GTPases during different steps of autophagy, fusion of autophagosomes with the vacuole and regulation of cellular metabolism. Another novel DENN protein identified in this study is C9ORF72; expansions of the hexanucleotide GGGGCC in its first intron have been recently implicated in amyotrophic lateral sclerosis (ALS) and fronto-temporal dementia (FTD). While this mutation is proposed to cause an RNA-level defect, the identification of C9ORF72 as a potential DENN-type GEF raises the possibility that at least part of the pathology might relate to a specific Rab-dependent vesicular trafficking process, as has been observed in the case of some other neurological conditions with similar phenotypes. We present evidence that longin domains, such as those found in the DENN module, are likely to have been ultimately derived from the related domains found in prokaryotic GTPase-activating proteins of MglA-like GTPases. Thus, the origin of the longin domains from this ancient GTPase-interacting domain, concomitant with the radiation of GTPases, especially of the Rab clade, played an important role in the dynamics of eukaryotic intracellular membrane systems.
membrane trafficking; evolution; homology detection; DENN domain; longin domain; C9ORF72; ALS; FTD
Members of the Arabidopsis LSH1 and Oryza G1 (ALOG) family of proteins have been shown to function as key developmental regulators in land plants. However, their precise mode of action remains unclear. Using sensitive sequence and structure analysis, we show that the ALOG domains are a distinct version of the N-terminal DNA-binding domain shared by the XerC/D-like, protelomerase, topoisomerase-IA, and Flp tyrosine recombinases. ALOG domains are distinguished by the insertion of an additional zinc ribbon into this DNA-binding domain. In particular, we show that the ALOG domain is derived from the XerC/D-like recombinases of a novel class of DIRS-1-like retroposons. Copies of this element, which have been recently inactivated, are present in several marine metazoan lineages, whereas the stramenopile Ectocarpus retains an active copy of the same. Thus, we predict that ALOG domains help establish organ identity and differentiation by binding specific DNA sequences and acting as transcription factors or recruiters of repressive chromatin. They are also found in certain plant defense proteins, where they are predicted to function as DNA sensors. The evolutionary history of the ALOG domain represents a unique instance of a domain, otherwise exclusively found in retroelements, being recruited as a specific transcription factor in the streptophyte lineage of plants. Hence, they add to the growing evidence for derivation of DNA-binding domains of eukaryote-specific TFs from mobile and selfish elements.
DIRS1; Tyrosine recombinase; Plant development; DNA-binding; Retroposon; Transcription factor; Chromatin protein; Plant defense
Proteinaceous toxins are observed across all levels of inter-organismal and intra-genomic conflicts. These include recently discovered prokaryotic polymorphic toxin systems implicated in intra-specific conflicts. They are characterized by a remarkable diversity of C-terminal toxin domains generated by recombination with standalone toxin-coding cassettes. Prior analysis revealed a striking diversity of nuclease and deaminase domains among the toxin modules. We systematically investigated polymorphic toxin systems using comparative genomics, sequence and structure analysis.
Polymorphic toxin systems are distributed across all major bacterial lineages and are delivered by at least eight distinct secretory systems. In addition to type-II, these include type-V, VI, VII (ESX), and the poorly characterized “Photorhabdus virulence cassettes (PVC)”, PrsW-dependent and MuF phage-capsid-like systems. We present evidence that trafficking of these toxins is often accompanied by autoproteolytic processing catalyzed by HINT, ZU5, PrsW, caspase-like, papain-like, and a novel metallopeptidase associated with the PVC system. We identified over 150 distinct toxin domains in these systems. These span an extraordinary catalytic spectrum to include 23 distinct clades of peptidases, numerous previously unrecognized versions of nucleases and deaminases, ADP-ribosyltransferases, ADP ribosyl cyclases, RelA/SpoT-like nucleotidyltransferases, glycosyltransferases and other enzymes predicted to modify lipids and carbohydrates, and a pore-forming toxin domain. Several of these toxin domains are shared with host-directed effectors of pathogenic bacteria. Over 90 families of immunity proteins might neutralize anywhere from a single to at least 27 distinct types of toxin domains. In some organisms multiple tandem immunity genes or immunity protein domains are organized into polyimmunity loci or polyimmunity proteins. Gene-neighborhood analysis of polymorphic toxin systems predicts the presence of novel trafficking-related components, and also the organizational logic that allows toxin diversification through recombination. Domain architecture and protein-length analysis revealed that these toxins might be deployed as secreted factors, through directed injection, or via inter-cellular contact facilitated by filamentous structures formed by RHS/YD, filamentous hemagglutinin and other repeats.
Phyletic pattern and life-style analysis indicate that polymorphic toxins and polyimmunity loci participate in cooperative behavior and facultative ‘cheating’ in several ecosystems such as the human oral cavity and soil. Multiple domains from these systems have also been repeatedly transferred to eukaryotes and their viruses, such as the nucleo-cytoplasmic large DNA viruses.
Along with a comprehensive inventory of toxins and immunity proteins, we present several testable predictions regarding active sites and catalytic mechanisms of toxins, their processing and trafficking and their role in intra-specific and inter-specific interactions between bacteria. These systems provide insights regarding the emergence of key systems at different points in eukaryotic evolution, such as ADP ribosylation, interaction of myosin VI with cargo proteins, mediation of apoptosis, hyphal heteroincompatibility, hedgehog signaling, arthropod toxins, cell-cell interaction molecules like teneurins and different signaling messengers.
This article was reviewed by AM, FE and IZ.
Recent studies have shown that the ubiquitin system had its origins in ancient cofactor/amino acid biosynthesis pathways. Preliminary studies also indicated that conjugation systems for other peptide tags on proteins, such as pupylation, have evolutionary links to cofactor/amino acid biosynthesis pathways. Following up on these observations, we systematically investigated the non-ribosomal amidoligases of the ATP-grasp, glutamine synthetase-like and acetyltransferase folds by classifying the known members and identifying novel versions. We then established their contextual connections using information from domain architectures and conserved gene neighborhoods. This showed remarkable, previously uncharacterized functional links between diverse peptide ligases, several peptidases of unrelated folds and enzymes involved in synthesis of modified amino acids. Using the network of contextual connections we were able to predict numerous novel pathways for peptide synthesis and modification, amine-utilization, secondary metabolite synthesis and potential peptide-tagging systems. One potential peptide-tagging system, which is widely distributed in bacteria, involves an ATP-grasp domain and a glutamine synthetase-like ligase, both of which are circularly permuted, an NTN hydrolase fold peptidase and a novel alpha helical domain. Our analysis also elucidates key steps in the biosynthesis of antibiotics such as friulimicin, butirosin and bacilysin and cell surface structures such as capsular polymers and teichuronopeptides. We also report the discovery of several novel ribosomally synthesized bacterial peptide metabolites that are cyclized via amide and lactone linkages formed by ATP-grasp enzymes. We present an evolutionary scenario for the multiple convergent origins of peptide ligases in various folds and clarify the bacterial origin of eukaryotic peptide-tagging enzymes of the TTL family.
Human ASXL proteins, orthologs of Drosophila Additional sex combs, have been implicated in conjunction with TET2 as a major target for mutations and translocations, leading to a wide range of myeloid leukemias, related myelodysplastic conditions (ASXL1 and ASXL2) and Bohring-Opitz syndrome, a developmental disorder (ASXL1). Using sensitive sequence and structure comparison methods, we show that most animal ASXL proteins contain a novel N-terminal domain that is also found in several other eukaryotic chromatin proteins, diverse restriction endonucleases and DNA glycosylases, the RNA polymerase delta subunit of Gram-positive bacteria and certain bacterial proteins that combine features of the RNA polymerase α-subunit and sigma factors. This domain adopts the winged helix-turn-helix fold (wHTH) and is predicted to bind DNA. Based on its domain architectural contexts, we present evidence that this domain might play an important role both in eukaryotes and bacteria in the recruitment of diverse effector activities, including the deubiquitinase subunit of Polycomb repressive complexes, to DNA, depending on the state of epigenetic modifications such as 5-methylcytosine and its oxidized derivatives. In other eukaryotic chromatin proteins, the wHTH domain is fused to a region with three conserved motifs, which are also found in diverse eukaryotic chromatin proteins, such as the animal BAZ/WAL proteins, plant HB1 and MBD9, yeast Itc1p and Ioc3, RSF1, CECR2 and NURF1. Based on the crystal structure of Ioc3, we establish that these motifs in conjunction with the DDT motif constitute a structural determinant that is central to nucleosomal repositioning by the ISWI clade of SWI2/SNF2 ATPases.
We also show that the central domain of the ASXL proteins (ASXH domain) is conserved outside of animals in fungi and plants, where it is combined with other domains, suggesting that it might be an ancient module mediating interactions between chromatin-linked protein complexes and transcription factors and UCH37-like deubiquitinases via its conserved LXXLL motif. We present evidence that the C-terminal PHD finger of ASXL proteins has peculiar structural modifications that might allow it to recognize internal modified lysines other than those from the N terminus of histone H3, making it the mediator of previously unexpected interactions in chromatin.
DNA modifications; methylation; Tet2; nucleosomal spacing ISWI; restriction enzymes; polycomb repressive complex
The endosymbiotic origin of eukaryotes brought together two disparate genomes in the cell. Additionally, eukaryotic natural history has included other endosymbiotic events, phagotrophic consumption of organisms, and intimate interactions with viruses and endoparasites. These phenomena facilitated large-scale lateral gene transfer and biological conflicts. We synthesize information from nearly two decades of genomics to illustrate how the interplay between lateral gene transfer and biological conflicts has impacted the emergence of new adaptations in eukaryotes. Using apicomplexans as an example, we illustrate how lateral transfer from animals has contributed to unique parasite-host interfaces composed of adhesion- and O-linked glycosylation-related domains. Adaptations that emerged due to intense selection for diversity in the molecular participants in organismal and genomic conflicts were dispersed by lateral transfer and subsequently exapted for eukaryote-specific innovations. We illustrate this using examples relating to eukaryotic chromatin, RNAi and RNA-processing systems, signaling pathways, apoptosis and immunity. We highlight the major contributions from catalytic domains of bacterial toxin systems to the origin of signaling enzymes (e.g., ADP-ribosylation and small molecule messenger synthesis), mutagenic enzymes for immune receptor diversification and RNA-processing. Similarly, we discuss contributions of bacterial antibiotic/siderophore synthesis systems and intra-genomic and intra-cellular selfish elements (e.g., restriction-modification, mobile elements and lysogenic phages) in the emergence of chromatin remodeling/modifying enzymes and RNA-based regulation. We develop the concept that biological conflict systems served as evolutionary “nurseries” for innovations in the protein world, which were delivered to eukaryotes via lateral gene flow to spur key evolutionary innovations all the way from nucleogenesis to lineage-specific adaptations.
antibiotics; biological conflict; endosymbiosis; immunity proteins; restriction-modification; RNAi; selfish elements; toxins
Double-stranded DNA viruses display a great variety of proteins that interact with host chromatin. Using the wealth of available genomic and functional information, we have systematically surveyed chromatin-related proteins encoded by dsDNA viruses. The distribution of viral chromatin-related proteins is primarily influenced by viral genome size and the superkingdom to which the host of the virus belongs. Smaller viruses usually encode multifunctional proteins that mediate several distinct interactions with host chromatin proteins and viral or host DNA. Larger viruses additionally encode several enzymes, which catalyze manipulations of chromosome structure, chromatin remodeling and covalent modifications of proteins and DNA. Among these viruses, it is also common to encounter transcription factors and DNA-packaging proteins such as histones and IHF/HU derived from cellular genomes, which might play a role in constituting virus-specific chromatin states. Through all size ranges a subset of domains in viral chromatin proteins appear to have been derived from those found in host proteins. Examples include the Zn-finger domains of the E6 and E7 proteins of papillomaviruses, SET-domain methyltransferases and Jumonji-related demethylases in certain nucleocytoplasmic large DNA viruses and BEN domains in poxviruses and polydnaviruses. In other cases, chromatin-interacting modules, such as the LxCxE motif, appear to have been widely disseminated across distinct viral lineages, resulting in similar retinoblastoma targeting strategies. Viruses, especially those with large linear genomes, have evolved a number of mechanisms to manipulate viral chromosomes in the process of replication-associated recombination. These include topoisomerases, Rad50/SbcC-like ABC ATPases and a novel recombinase system in bacteriophages utilizing RecA and Rad52 homologs. 
Larger DNA viruses also encode SWI2/SNF2 and A18-like ATPases which appear to play specialized roles in transcription and recombination. Finally, it also appears that certain domains of viral provenance have given rise to key functions in eukaryotic chromatin such as a HEH domain of chromosome tethering proteins and the TET/JBP-like cytosine and thymine hydroxylases.
The deaminase-like fold includes, in addition to nucleic acid/nucleotide deaminases, several catalytic domains such as the JAB domain, and others involved in nucleotide and ADP-ribose metabolism. Using sensitive sequence and structural comparison methods, we develop a comprehensive natural classification of the deaminase-like fold and show that its ancestral version was likely to operate on nucleotides or nucleic acids. Consequently, we present evidence that a specific group of JAB domains are likely to possess a DNA repair function, distinct from the previously known deubiquitinating peptidase activity. We also identified numerous previously unknown clades of nucleic acid deaminases. Using inference based on contextual information, we suggest that most of these clades are toxin domains of two distinct classes of bacterial toxin systems, namely polymorphic toxins implicated in bacterial interstrain competition and those that target distantly related cells. Genome context information suggests that these toxins might be delivered via diverse secretory systems, such as Type V, Type VI, PVC and a novel PrsW-like intramembrane peptidase-dependent mechanism. We propose that certain deaminase toxins might be deployed by diverse extracellular and intracellular pathogens, as well as endosymbionts, as effectors targeting nucleic acids of host cells. Our analysis suggests that these toxin deaminases have been acquired by eukaryotes on several independent occasions and recruited as organellar or nucleo-cytoplasmic RNA modifiers, operating on tRNAs, mRNAs and short non-coding RNAs, and also as mutators of hyper-variable genes, viruses and selfish elements. This scenario potentially explains the origin of mutagenic AID/APOBEC-like deaminases, including novel versions from Caenorhabditis, Nematostella and diverse algae and a large class of fast-evolving fungal deaminases.
These observations greatly expand the distribution of possible unidentified mutagenic processes catalyzed by nucleic acid deaminases.
The use of nucleases as toxins for defense, offense or addiction of selfish elements is widely encountered across all life forms. Using sensitive sequence profile analysis methods, we characterize a novel superfamily (the SUKH superfamily) that unites a diverse group of proteins including Smi1/Knr4, PGs2, FBXO3, SKIP16, Syd, herpesviral US22, IRS1 and TRS1, and their bacterial homologs. Using contextual analysis we present evidence that the bacterial members of this superfamily are potential immunity proteins for a variety of toxin systems that also include the recently characterized contact-dependent inhibition (CDI) systems of proteobacteria. By analyzing the toxin proteins encoded in the neighborhood of the SUKH superfamily we predict that they possess domains belonging to diverse nuclease and nucleic acid deaminase families. These include at least eight distinct types of DNases belonging to the HNH/EndoVII and restriction endonuclease folds, and RNases of the EndoU-like and colicin E3-like cytotoxic RNase folds. The N-terminal domains of these toxins indicate that they are extruded by several distinct secretory mechanisms such as the two-partner system (shared with the CDI systems) in proteobacteria, ESAT-6/WXG-like ATP-dependent secretory systems in Gram-positive bacteria and the conventional Sec-dependent system in several bacterial lineages. The hedgehog-intein domain might also release a subset of toxic nuclease domains through auto-proteolytic action. Unlike classical colicin-like nuclease toxins, the overwhelming majority of toxin systems with the SUKH superfamily is chromosomally encoded and appears to have diversified through a recombination process combining different C-terminal nuclease domains to N-terminal secretion-related domains. Across the bacterial superkingdom these systems might participate in discriminating ‘self’ or kin from ‘non-self’ or non-kin strains.
Using structural analysis we demonstrate that the SUKH domain possesses a versatile scaffold that can be used to bind a wide range of protein partners. In eukaryotes it appears to have been recruited as an adaptor to regulate modification of proteins by ubiquitination or polyglutamylation. Similarly, another widespread immunity protein from these toxin systems, namely the suppressor of fused (SuFu) superfamily has been recruited for comparable roles in eukaryotes. In animal DNA viruses, such as herpesviruses, poxviruses, iridoviruses and adenoviruses, the ability of the SUKH domain to bind diverse targets has been deployed to counter diverse anti-viral responses by interacting with specific host proteins.
Modified bases in nucleic acids present a layer of information that directs biological function over and beyond the coding capacity of the conventional bases. While a large number of modified bases have been identified, many of the enzymes generating them still remain to be discovered. Recently, members of the 2-oxoglutarate- and iron(II)-dependent dioxygenase superfamily, which modify diverse substrates from small molecules to biopolymers, were predicted and subsequently confirmed to catalyze oxidative modification of bases in nucleic acids. Of these, two distinct families, namely the AlkB and the kinetoplastid base J binding proteins (JBP), catalyze in situ hydroxylation of bases in nucleic acids. Using sensitive computational analysis of sequences, structures and contextual information from genomic structure and protein domain architectures, we report five distinct families of 2-oxoglutarate- and iron(II)-dependent dioxygenases that we predict to be involved in nucleic acid modifications. Among the DNA-modifying families, we show that the dioxygenase domains of the kinetoplastid base J-binding proteins belong to a larger family that includes the Tet proteins, prototyped by the human oncogene Tet1, and proteins from basidiomycete fungi, chlorophyte algae, heterolobosean amoeboflagellates and bacteriophages. We present evidence that some of these proteins are likely to be involved in oxidative modification of the 5-methyl group of cytosine leading to the formation of 5-hydroxymethyl-cytosine. The Tet/JBP homologs from basidiomycete fungi such as Laccaria and Coprinopsis show large lineage-specific expansions and a tight linkage with genes encoding a novel and distinct family of predicted transposases, and a member of the Maelstrom-like HMG family. We propose that these fungal members are part of a mobile transposon.
To the best of our knowledge, this is the first report of a eukaryotic transposable element that encodes its own DNA-modification enzyme with a potential regulatory role. Through a wider analysis of other poorly characterized DNA-modifying enzymes we also show that the phage Mu Mom-like proteins, which catalyze the N6-carbamoylmethylation of adenines, are also linked to diverse families of bacterial transposases, suggesting that DNA modification by transposable elements might have a more general presence than previously appreciated. Among the other families of 2-oxoglutarate- and iron(II)-dependent dioxygenases identified in this study, one, which is found in algae, is predicted to mainly comprise RNA-modifying enzymes and shows a striking diversity in protein domain architectures, suggesting the presence of RNA modifications with possibly unique adaptive roles. The results presented here are likely to provide the means for future investigation of unexpected epigenetic modifications, such as hydroxymethyl cytosine, that could profoundly impact our understanding of gene regulation and processes such as DNA demethylation.
DNA methylation; dioxygenases; mom; transposons; bacteriophage; AlkB; hydroxymethylcytosine; demethylation; algae; RNA modification; CXXC domain
Recent studies point to a great diversity of non-ribosomal peptide synthesis systems with major roles in amino acid and co-factor biosynthesis, secondary metabolism, and post-translational modifications of proteins by peptide tags. The least studied of these systems are those utilizing tRNAs or aminoacyl-tRNA synthetases (AAtRS) in non-ribosomal peptide ligation.
Here we describe novel examples of AAtRS-related proteins that are likely to be involved in the synthesis of widely distributed peptide-derived metabolites. Using sensitive sequence profile methods we show that the cyclodipeptide synthases (CDPSs) are members of the HUP class of Rossmannoid domains and are likely to be highly derived versions of the class-I AAtRS catalytic domains. We also identify the first eukaryotic CDPSs in fungi and in animals; they might be involved in immune response in the latter organisms. We also identify a paralogous version of the methionyl-tRNA synthetase, which is widespread in bacteria, and present evidence using contextual information that it might function independently of protein synthesis as a peptide ligase in the formation of a peptide-derived secondary metabolite. This metabolite is likely to be heavily modified through multiple reactions catalyzed by a metal-binding cupin domain and a lysine N6 monooxygenase that are strictly associated with this paralogous methionyl-tRNA synthetase (MtRS). We further identify an analogous system wherein the MtRS has been replaced by more typical peptide ligases with the ATP-grasp or modular condensation domains.
The prevalence of these predicted biosynthetic pathways in phylogenetically distant, pathogenic or symbiotic bacteria suggests that metabolites synthesized by them might participate in interactions with the host. More generally, these findings point to a complete spectrum of recruitment of AAtRS to various non-ribosomal biosynthetic pathways, ranging from the conventional AAtRS, through closely related paralogous AAtRS dedicated to certain pathways, to highly derived versions of the class-I AAtRS catalytic domain like the CDPSs. Both the conventional AAtRS and their closely related paralogs often provide aminoacylated tRNAs for peptide ligations by MprF/Fem/MurM-type acetyltransferase fold ligases in the synthesis of peptidoglycan, N-end rule modifications of proteins, lipid aminoacylation or biosynthesis of antibiotics, such as valanimycin. Alternatively, they might supply aminoacylated tRNAs for other biosynthetic pathways like that for tetrapyrrole or directly function as peptide ligases as in the case of mycothiol and those identified here.
This article was reviewed by Andrei Osterman and Igor Zhulin.
Almost all known nucleic acid polymerases catalyze 5'-3' polymerization by mediating the attack on an incoming nucleotide 5' triphosphate by the 3' OH from the growing polynucleotide chain in a template-dependent or template-independent manner. The only known exception to this rule is the Thg1 RNA polymerase that catalyzes 3'-5' polymerization in vitro and also in vivo as a part of the maturation process of histidinyl tRNA. While the initial reaction catalyzed by Thg1 has been compared to adenylation catalyzed by the aminoacyl tRNA synthetases, the evolutionary relationships of Thg1 and the actual nature of the polymerase reaction catalyzed by it remain unclear.
Using sensitive profile-profile comparison and structure prediction methods we show that the catalytic domain of Thg1 contains an RRM (ferredoxin) fold palm domain, just like the viral RNA-dependent RNA polymerases, reverse transcriptases, family A and B DNA polymerases, adenylyl cyclases, diguanylate cyclases (GGDEF domain) and the predicted polymerase of the CRISPR system. We show that, just as in these polymerases, Thg1 possesses an active site with three acidic residues that chelate Mg2+ cations. Based on this we predict that Thg1 catalyzes polymerization similarly to the 5'-3' polymerases, but uses the incoming 3' OH to attack the 5' triphosphate generated at the end of the elongating polynucleotide. In addition we identify a distinct set of residues unique to Thg1 that we predict to comprise a second active site, which catalyzes the initial adenylation reaction to prime 3'-5' polymerization. Based on contextual information from conserved gene neighborhoods we show that Thg1 might function in conjunction with a polynucleotide kinase that generates an initial 5' phosphate substrate for it at the end of an RNA molecule. In addition to histidinyl tRNA maturation, Thg1 might have other RNA repair roles in representatives from all the three superkingdoms of life as well as certain large DNA viruses. We also present evidence that among the polymerase-like domains Thg1 is most closely related to the catalytic domains of the GGDEF and CRISPR polymerase proteins.
Based on this relationship and the phyletic patterns of these enzymes we infer that the Thg1 protein is likely to represent an archaeo-eukaryotic branch of the same clade of proteins that gave rise to the mobile CRISPR polymerases and in bacteria spawned the GGDEF domains. Thg1 is likely to be close to the ancestral version of this family of enzymes that might have played a role in RNA repair in the last universal common ancestor.
This article was reviewed by S. Balaji and V.V. Dolja.
The E1-like superfamily is central to ubiquitin (Ub) conjugation, biosynthesis of cysteine, thiamine and MoCo and several secondary metabolites. Yet, its functional diversity and evolutionary history are not well understood. We develop a natural classification of this superfamily and use it to decipher the major adaptive trends occurring in the evolution of the E1-like superfamily. Within the Rossmann fold, E1-like proteins are closest to NAD(P)/FAD-dependent dehydrogenases and S-AdoMet-dependent methyltransferases. Hence, their phosphotransfer activity is an independent catalytic “invention” with respect to such activities seen in other Rossmannoid folds. Sequence and structure analysis reveals a striking diversity of residues and structures involved in adenylation, sulfotransfer and substrate-binding between different E1-like families, allowing us to predict previously uncharacterized functional adaptations. E1-like proteins are fused to several previously undetected domains, such as a predicted sulfur transfer domain containing a novel superfamily of the TATA-binding protein fold, different types of catalytic domains, a novel winged helix-turn-helix domain and potential adaptor domains related to Ub conjugation. Based on these fusions we develop a generalized model for the linking of E1-catalyzed adenylation/thiolation with further downstream reactions. This is likely to involve a dynamic interplay between the E1 active sites and diverse fused C-terminal domains. We also predict participation of E1-like domains in previously uncharacterized bacterial secondary metabolism pathways, new cysteine biosynthesis systems, such as those associated with archaeal O-phosphoseryl tRNA, metal-sulfur cluster assembly (e.g. in nitrogen fixation) and Ub-conjugation.
Evolutionary reconstructions suggest that the last universal common ancestor (LUCA) contained a single E1-like domain possessing both phosphotransfer and thiolating activities and participating in multiple sulfotransfer reactions. The E1-like superfamily subsequently expanded to include 26 families clustering into three major radiations. These are broadly involved in ubiquitin activation, cofactor and cysteine biosynthesis, and biosynthesis of secondary metabolites. In light of this we present evidence that in eukaryotes other E1-like enzymes, such as Urm1, were independently recruited for Ubl conjugation, probably functioning without conventional E2-like enzymes.
Unlike classical 2-oxoglutarate and iron-dependent dioxygenases, which include several nucleic acid modifiers, the structurally similar jumonji-related dioxygenase superfamily was only known to catalyze peptide modifications. Using comparative genomics methods, we predict that a family of jumonji-related enzymes catalyzes wybutosine hydroxylation/peroxidation at position 37 of eukaryotic tRNAPhe. Identification of this enzyme raised questions regarding the emergence of protein- and nucleic acid-modifying activities among jumonji-related domains. We addressed these with a natural classification of DSBH domains and reconstructed the precursor of the dioxygenases as a sugar-binding domain. This precursor gave rise to sugar epimerases and metal-binding sugar isomerases. The sugar isomerase active site was exapted for catalysis of oxygenation, with a radiation of these enzymes in bacteria, probably due to impetus from the primary oxygenation event in Earth’s history. 2-Oxoglutarate-dependent versions appear to have further expanded with the rise of the tricarboxylic acid cycle. We identify previously under-appreciated aspects of their active site and multiple independent innovations of 2-oxoacid-binding basic residues among these superfamilies. We show that double-stranded β-helix dioxygenases diversified extensively in biosynthesis and modification of halogenated siderophores, antibiotics, peptide secondary metabolites and glycine-rich collagen-like proteins in bacteria. Jumonji-related domains diversified into three distinct lineages in bacterial secondary metabolism systems and these were precursors of the three major clades of eukaryotic enzymes. The specificity of wybutosine hydroxylase/peroxidase probably relates to the structural similarity of the modified moiety to the ancestral amino acid substrate of this superfamily.
Variable lymphocyte receptors (VLRs) are leucine-rich repeat (LRR) proteins that mediate adaptive immunity in jawless vertebrates. VLRs are fundamentally different from the antibodies of jawed vertebrates, which consist of immunoglobulin (Ig) domains. We determined the structure of an anti-hen egg white lysozyme (HEL) VLRB, isolated by yeast display, bound to HEL. The VLR, whose affinity resembles that of IgM antibodies, uses nearly all its concave surface to bind the protein, in addition to a loop that penetrates into the enzyme active site. The VLR-HEL structure, combined with sequence analysis, revealed an almost perfect match between ligand-contacting positions and positions with highest sequence diversity. Thus, we have defined the generalized antigen-binding site of VLRs. We further demonstrated that VLRs can be affinity-matured to affinities as high as those of IgG antibodies, making VLRs potential alternatives to antibodies for biotechnology applications.
We compared the inferred transcription regulatory interactions from three high-throughput methods. As these methods utilize different principles, they have few interactions in common, suggesting that they capture distinct facets of the actual transcription regulatory program. In addition, we show that these methods capture disparate biological phenomena, which include long-range interactions between telomeres and transcription factors, downstream effects of interference with ribosome biogenesis and a protein-aggregation response. Through a detailed analysis of the latter we predict components of the system responding to protein-aggregation stress.
The Anabaena sensory rhodopsin transducer (ASRT) is a small protein that has been claimed to function as a signaling molecule downstream of the cyanobacterial sensory rhodopsin. However, orthologs of ASRT have been detected in several bacteria that lack rhodopsin, raising questions about the generality of this function. Using sequence profile searches we show that ASRT defines a novel superfamily of β-sandwich fold domains. Through contextual inference based on domain architectures and predicted operons, together with structural analysis, we present strong evidence that these domains bind small molecules, most probably sugars. We propose that the intracellular versions like ASRT probably participate as sensors that regulate a diverse range of sugar metabolism operons or even the light sensory behavior in Anabaena by binding sugars or related metabolites. We also show that one of the extracellular versions defines a predicted sugar-binding structure in a novel cell-surface lipoprotein found across actinobacteria, including several pathogens such as Tropheryma, Actinomyces and Thermobifida. The analysis of this superfamily also provides new data to investigate the evolution of carbohydrate binding modes in β-sandwich domains with very different topologies.
Reviewers: This article was reviewed by M. Madan Babu and Mark A. Ragan.
DNA cytosine methylation is crucial for retrotransposon silencing and mammalian development. In a computational search for enzymes that could modify 5-methylcytosine (5mC), we identified TET proteins as mammalian homologs of the trypanosome proteins JBP1 and JBP2, which have been proposed to oxidize the 5-methyl group of thymine. We show here that TET1, a fusion partner of the MLL gene in acute myeloid leukemia, is a 2-oxoglutarate (2OG)- and Fe(II)-dependent enzyme that catalyzes conversion of 5mC to 5-hydroxymethylcytosine (hmC) in cultured cells and in vitro. hmC is present in the genome of mouse embryonic stem cells, and hmC levels decrease upon RNA interference–mediated depletion of TET1. Thus, TET proteins have potential roles in epigenetic regulation through modification of 5mC to hmC.
The configuration of the active site of E2 ligases, central enzymes in the ubiquitin/ubiquitin-like protein (Ub/Ubl) conjugation systems, has long puzzled researchers. Taking advantage of the wealth of newly available structures and sequences of E2s from diverse organisms, we performed a large-scale comparative analysis of these proteins. As a result we identified a previously under-appreciated diversity in the active site of these enzymes, in particular, the spatial location of the catalytic cysteine and a constellation of associated conserved residues that potentially contributes to catalysis. We observed structural innovations of differing magnitudes occurring in various families across the E2 fold that might correlate in part with differences in target interaction. A key finding was the independent emergence on multiple occasions of a polar residue, often a histidine, in the vicinity of the catalytic cysteine in different E2 families. We propose that these convergently emerging polar residues have a common function, such as in the stabilization of oxyanion holes during Ub/Ubl transfer and spatial localization of the Ub/Ubl tails in the active site. Thus, the E2 ligases represent a rare example in enzyme evolution of high structural diversity of the active site and position of the catalytic residue despite all characterized members catalyzing a similar reaction. Our studies also indicated certain evolutionarily conserved features in all active members of the E2 superfamily that stabilize the unusual flap-like structure in the fold. These features are likely to form a critical mechanical element of the fold required for catalysis. The results presented here could aid in new experiments to understand E2 catalysis.
A computational model of the yeast Ubiquitin system highlights interesting biological features including functional interactions between components and interplay with other regulatory mechanisms.
The ubiquitin system (Ub-system) can be defined as the ensemble of components including Ub/ubiquitin-like proteins, their conjugation and deconjugation apparatus, binding partners and the proteasomal system. While several studies have concentrated on structure-function relationships and evolution of individual components of the Ub-system, a study of the system as a whole is largely lacking.
Using numerous genome-scale datasets, we assemble for the first time a comprehensive reconstruction of the budding yeast Ub-system, revealing static and dynamic properties. We devised two novel representations, the rank plot to understand the functional diversification of different components and the clique-specific point-wise mutual-information network to identify significant interactions in the Ub-system.
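As a rough illustration of the point-wise mutual-information idea mentioned above, the following minimal sketch computes plain PMI, log2[p(a,b) / (p(a) p(b))], for pairs of components that co-occur in groups. It is not the clique-specific variant devised in the study, whose details are not given here; the function name `pmi_network`, the input format (a list of sets of component names) and the toy data are all assumptions made for illustration only.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_network(groups):
    """Return {(a, b): PMI} for every pair co-occurring in at least one group.

    Probabilities are estimated as simple frequencies over the groups
    (an illustrative assumption, not the authors' procedure).
    """
    n = len(groups)
    single = Counter()  # number of groups containing each component
    pair = Counter()    # number of groups containing each pair
    for g in groups:
        members = sorted(set(g))
        single.update(members)
        pair.update(combinations(members, 2))
    return {
        (a, b): math.log2((c / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), c in pair.items()
    }


# Hypothetical toy data: each set stands for one group (e.g. a clique)
# of co-occurring components; the labels are placeholders, not real proteins.
groups = [
    {"A", "B"},
    {"A", "B", "C"},
    {"C", "D"},
    {"D"},
]

pmi = pmi_network(groups)
for edge, score in sorted(pmi.items()):
    print(edge, round(score, 3))
```

Pairs with positive PMI co-occur more often than expected if the components were independent; in a network representation, such pairs would be the candidate edges to retain.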
Using these representations, evidence is provided for the functional diversification of components such as SUMO-dependent Ub-ligases. We also identify novel components of SCF (Skp1-cullin-F-box)-dependent complexes, receptors in the ERAD (endoplasmic reticulum associated degradation) system and a key role for Sus1 in coordinating multiple Ub-related processes in chromatin dynamics. We present evidence for a major impact of the Ub-system on large parts of the proteome via its interaction with the transcription regulatory network. Furthermore, the dynamics of the Ub-network suggests that Ub and SUMO modifications might function cooperatively with transcription control in regulating cell-cycle-stage-specific complexes and in reinforcing periodicities in gene expression. Combined with evolutionary information, the structure of this network helps in understanding the lineage-specific expansion of SCF complexes with a potential role in pathogen response and the origin of the ERAD and ESCRT systems.