1.  Novel autoproteolytic and DNA-damage sensing components in the bacterial SOS response and oxidized methylcytosine-induced eukaryotic DNA demethylation systems 
Biology Direct  2013;8:20.
The bacterial SOS response is an elaborate program for DNA repair, cell cycle regulation and adaptive mutagenesis under stress conditions. Using sensitive sequence and structure analysis, combined with contextual information derived from comparative genomics and domain architectures, we identify two novel domain superfamilies in the SOS response system. We present evidence that one of these, the SOS response associated peptidase (SRAP; Pfam: DUF159) is a novel thiol autopeptidase. Given the involvement of other autopeptidases, such as LexA and UmuD, in the SOS response, this finding suggests that multiple structurally unrelated peptidases have been recruited to this process. The second of these, the ImuB-C superfamily, is linked to the Y-family DNA polymerase-related domain in ImuB, and also occurs as a standalone protein. We present evidence using gene neighborhood analysis that both these domains function with different mutagenic polymerases in bacteria, such as Pol IV (DinB), Pol V (UmuCD) and ImuA-ImuB-DnaE2 and also other repair systems, which either deploy Ku and an ATP-dependent ligase or a SplB-like radical SAM photolyase. We suggest that the SRAP superfamily domain functions as a DNA-associated autoproteolytic switch that recruits diverse repair enzymes upon DNA damage, whereas the ImuB-C domain performs a similar function albeit in a non-catalytic fashion. We propose that C3Orf37, the eukaryotic member of the SRAP superfamily, which has been recently shown to specifically bind DNA with 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxycytosine, is a sensor for these oxidized bases generated by the TET enzymes from methylcytosine. Hence, its autoproteolytic activity might help it act as a switch that recruits DNA repair enzymes to remove these oxidized methylcytosine species as part of the DNA demethylation pathway downstream of the TET enzymes.
This article was reviewed by RDS, RF and GJ.
PMCID: PMC3765255  PMID: 23945014
2.  Comprehensive analysis of the HEPN superfamily: identification of novel roles in intra-genomic conflicts, defense, pathogenesis and RNA processing 
Biology Direct  2013;8:15.
The major role of enzymatic toxins that target nucleic acids in biological conflicts at all levels has become increasingly apparent thanks in large part to the advances of comparative genomics. Typically, toxins evolve rapidly hampering the identification of these proteins by sequence analysis. Here we analyze an unexpectedly widespread superfamily of toxin domains most of which possess RNase activity.
The HEPN superfamily comprises all-α-helical domains that were first identified as being associated with DNA polymerase β-type nucleotidyltransferases in prokaryotes and animal Sacsin proteins. Using sensitive sequence and structure comparison methods, we vastly extend the HEPN superfamily by identifying numerous novel families and by detecting diverged HEPN domains in several known protein families. The new HEPN families include the RNase LS and LsoA catalytic domains, KEN domains (e.g. RNaseL and Ire1) and the RNase domains of RloC and PrrC. The majority of HEPN domains contain conserved motifs that constitute a metal-independent endoRNase active site. Some HEPN domains lacking this motif probably function as non-catalytic RNA-binding domains, such as in the case of the mannitol repressor MtlR. Our analysis shows that HEPN domains function as toxins that are shared by numerous systems implicated in intra-genomic, inter-genomic and intra-organismal conflicts across the three domains of cellular life. In prokaryotes HEPN domains are essential components of numerous toxin-antitoxin (TA) and abortive infection (Abi) systems and in addition are tightly associated with many restriction-modification (R-M) and CRISPR-Cas systems, and occasionally with other defense systems such as Pgl and Ter. We present evidence of multiple modes of action of HEPN domains in these systems, which include direct attack on viral RNAs (e.g. LsoA and RNase LS) in conjunction with other RNase domains (e.g. a novel RNase H fold domain, NamA), suicidal or dormancy-inducing attack on self RNAs (R-M systems and possibly CRISPR-Cas systems), and suicidal attack coupled with direct interaction with phage components (Abi systems). These findings are compatible with the hypothesis on coupling of pathogen-targeting (immunity) and self-directed (programmed cell death and dormancy induction) responses in the evolution of robust antiviral strategies.
We propose that altruistic cell suicide mediated by HEPN domains and other functionally similar RNases was essential for the evolution of kin and group selection and cell cooperation. HEPN domains were repeatedly acquired by eukaryotes and incorporated into several core functions such as endonucleolytic processing of the 5.8S-25S/28S rRNA precursor (Las1), a novel ER membrane-associated RNA degradation system (C6orf70) and sensing of unprocessed transcripts at the nuclear periphery (Swt1). Multiple lines of evidence suggest that, similar to prokaryotes, HEPN proteins were recruited to antiviral, antitransposon and apoptotic systems or to the RNA-level response to unfolded proteins (Sacsin and KEN domains) in several groups of eukaryotes.
Extensive sequence and structure comparisons reveal unexpectedly broad presence of the HEPN domain in an enormous variety of defense and stress response systems across the tree of life. In addition, HEPN domains have been recruited to perform essential functions, in particular in eukaryotic rRNA processing. These findings are expected to stimulate experiments that could shed light on diverse cellular processes across the three domains of life.
This article was reviewed by Martijn Huynen, Igor Zhulin and Nick Grishin.
PMCID: PMC3710099  PMID: 23768067
3.  Two novel PIWI families: roles in inter-genomic conflicts in bacteria and Mediator-dependent modulation of transcription in eukaryotes 
Biology Direct  2013;8:13.
The PIWI module, found in the PIWI/AGO superfamily of proteins, is a critical component of several cellular pathways including germline maintenance, chromatin organization, regulation of splicing, RNA interference, and virus suppression. It binds a guide strand which helps it target complementary nucleic acid strands.
Here we report the discovery of two divergent, novel families of PIWI modules, the first such to be described since the initial discovery of the PIWI/AGO superfamily over a decade ago. Both families display conservation patterns consistent with the binding of oligonucleotide guide strands. The first family is bacterial in distribution and is typically encoded by a distinctive three-gene operon alongside genes for a restriction endonuclease fold enzyme and a helicase of the DinG family. The second family is found only in eukaryotes. It is the core conserved module of the Med13 protein, a subunit of the CDK8 subcomplex of the transcription regulatory Mediator complex.
Based on the presence of the DinG family helicase, which specifically acts on R-loops, we infer that the first family of PIWI modules is part of a novel RNA-dependent restriction system which could target invasive DNA from phages, plasmids or conjugative transposons. It is predicted to facilitate restriction of actively transcribed invading DNA by utilizing RNA guides. The PIWI family found in the eukaryotic Med13 proteins throws new light on the regulatory switch through which the CDK8 subcomplex modulates transcription at Mediator-bound promoters of highly transcribed genes. We propose that this involves recognition of small RNAs by the PIWI module in Med13 resulting in a conformational switch that propagates through the Mediator complex.
This article was reviewed by Sandor Pongor, Frank Eisenhaber and Balaji Santhanam.
PMCID: PMC3702460  PMID: 23758928
4.  Live virus-free or die: coupling of antivirus immunity and programmed suicide or dormancy in prokaryotes 
Biology Direct  2012;7:40.
The virus-host arms race is a major theater for evolutionary innovation. Archaea and bacteria have evolved diverse, elaborate antivirus defense systems that function on two general principles: i) immune systems that discriminate self DNA from nonself DNA and specifically destroy the foreign, in particular viral, genomes, whereas the host genome is protected, or ii) programmed cell suicide or dormancy induced by infection.
Presentation of the hypothesis
Almost all genomic loci encoding immunity systems such as CRISPR-Cas, restriction-modification and DNA phosphorothioation also encompass suicide genes, in particular those encoding known and predicted toxin nucleases, which do not appear to be directly involved in immunity. In contrast, the immunity systems do not appear to encode antitoxins found in typical toxin-antitoxin systems. This raises the possibility that components of the immunity system themselves act as reversible inhibitors of the associated toxin proteins or domains as has been demonstrated for the Escherichia coli anticodon nuclease PrrC that interacts with the PrrI restriction-modification system. We hypothesize that coupling of diverse immunity and suicide/dormancy systems in prokaryotes evolved under selective pressure to provide robustness to the antivirus response. We further propose that the involvement of suicide/dormancy systems in the coupled antivirus response could take two distinct forms:
1) induction of a dormancy-like state in the infected cell to ‘buy time’ for activation of adaptive immunity; 2) suicide or dormancy as the final recourse to prevent viral spread triggered by the failure of immunity.
Testing the hypothesis
This hypothesis entails many experimentally testable predictions. Specifically, we predict that the Cas2 protein present in all cas operons is an mRNA-cleaving nuclease (interferase) that might be activated at an early stage of virus infection to enable incorporation of virus-specific spacers into the CRISPR locus or to trigger cell suicide when the immune function of CRISPR-Cas systems fails. Similarly, toxin-like activity is predicted for components of numerous other defense loci.
Implications of the hypothesis
The hypothesis implies that antivirus response in prokaryotes involves key decision-making steps at which the cell chooses the path to follow by sensing the course of virus infection.
This article was reviewed by Arcady Mushegian, Etienne Joly and Nick Grishin. For complete reviews, go to the Reviewers’ reports section.
PMCID: PMC3506569  PMID: 23151069
5.  ALOG domains: provenance of plant homeotic and developmental regulators from the DNA-binding domain of a novel class of DIRS1-type retroposons 
Biology Direct  2012;7:39.
Members of the Arabidopsis LSH1 and Oryza G1 (ALOG) family of proteins have been shown to function as key developmental regulators in land plants. However, their precise mode of action remains unclear. Using sensitive sequence and structure analysis, we show that the ALOG domains are a distinct version of the N-terminal DNA-binding domain shared by the XerC/D-like, protelomerase, topoisomerase-IA, and Flp tyrosine recombinases. ALOG domains are distinguished by the insertion of an additional zinc ribbon into this DNA-binding domain. In particular, we show that the ALOG domain is derived from the XerC/D-like recombinases of a novel class of DIRS-1-like retroposons. Copies of this element, which have been recently inactivated, are present in several marine metazoan lineages, whereas the stramenopile Ectocarpus retains an active copy of the same. Thus, we predict that ALOG domains help establish organ identity and differentiation by binding specific DNA sequences and acting as transcription factors or recruiters of repressive chromatin. They are also found in certain plant defense proteins, where they are predicted to function as DNA sensors. The evolutionary history of the ALOG domain represents a unique instance of a domain, otherwise exclusively found in retroelements, being recruited as a specific transcription factor in the streptophyte lineage of plants. Hence, they add to the growing evidence for derivation of DNA-binding domains of eukaryotic specific TFs from mobile and selfish elements.
PMCID: PMC3537659  PMID: 23146749
DIRS1; Tyrosine recombinase; Plant development; DNA-binding; Retroposon; Transcription factor; Chromatin protein; Plant defense
6.  Polymorphic toxin systems: Comprehensive characterization of trafficking modes, processing, mechanisms of action, immunity and ecology using comparative genomics 
Biology Direct  2012;7:18.
Proteinaceous toxins are observed across all levels of inter-organismal and intra-genomic conflicts. These include recently discovered prokaryotic polymorphic toxin systems implicated in intra-specific conflicts. They are characterized by a remarkable diversity of C-terminal toxin domains generated by recombination with standalone toxin-coding cassettes. Prior analysis revealed a striking diversity of nuclease and deaminase domains among the toxin modules. We systematically investigated polymorphic toxin systems using comparative genomics, sequence and structure analysis.
Polymorphic toxin systems are distributed across all major bacterial lineages and are delivered by at least eight distinct secretory systems. In addition to type-II, these include type-V, VI, VII (ESX), and the poorly characterized “Photorhabdus virulence cassettes (PVC)”, PrsW-dependent and MuF phage-capsid-like systems. We present evidence that trafficking of these toxins is often accompanied by autoproteolytic processing catalyzed by HINT, ZU5, PrsW, caspase-like, papain-like, and a novel metallopeptidase associated with the PVC system. We identified over 150 distinct toxin domains in these systems. These span an extraordinary catalytic spectrum to include 23 distinct clades of peptidases, numerous previously unrecognized versions of nucleases and deaminases, ADP-ribosyltransferases, ADP ribosyl cyclases, RelA/SpoT-like nucleotidyltransferases, glycosyltransferases and other enzymes predicted to modify lipids and carbohydrates, and a pore-forming toxin domain. Several of these toxin domains are shared with host-directed effectors of pathogenic bacteria. Over 90 families of immunity proteins might neutralize anywhere from one to at least 27 distinct types of toxin domains. In some organisms multiple tandem immunity genes or immunity protein domains are organized into polyimmunity loci or polyimmunity proteins. Gene-neighborhood analysis of polymorphic toxin systems predicts the presence of novel trafficking-related components, and also the organizational logic that allows toxin diversification through recombination. Domain architecture and protein-length analysis revealed that these toxins might be deployed as secreted factors, through directed injection, or via inter-cellular contact facilitated by filamentous structures formed by RHS/YD, filamentous hemagglutinin and other repeats.
Phyletic pattern and life-style analysis indicate that polymorphic toxins and polyimmunity loci participate in cooperative behavior and facultative ‘cheating’ in several ecosystems such as the human oral cavity and soil. Multiple domains from these systems have also been repeatedly transferred to eukaryotes and their viruses, such as the nucleo-cytoplasmic large DNA viruses.
Along with a comprehensive inventory of toxins and immunity proteins, we present several testable predictions regarding active sites and catalytic mechanisms of toxins, their processing and trafficking and their role in intra-specific and inter-specific interactions between bacteria. These systems provide insights regarding the emergence of key systems at different points in eukaryotic evolution, such as ADP ribosylation, interaction of myosin VI with cargo proteins, mediation of apoptosis, hyphal heteroincompatibility, hedgehog signaling, arthropod toxins, cell-cell interaction molecules like teneurins and different signaling messengers.
This article was reviewed by AM, FE and IZ.
PMCID: PMC3482391  PMID: 22731697
7.  Unification of Cas protein families and a simple scenario for the origin and evolution of CRISPR-Cas systems 
Biology Direct  2011;6:38.
The CRISPR-Cas adaptive immunity systems that are present in most Archaea and many Bacteria function by incorporating fragments of alien genomes into specific genomic loci, transcribing the inserts and using the transcripts as guide RNAs to destroy the genome of the cognate virus or plasmid. This RNA interference-like immune response is mediated by numerous, diverse and rapidly evolving Cas (CRISPR-associated) proteins, several of which form the Cascade complex involved in the processing of CRISPR transcripts and cleavage of the target DNA. Comparative analysis of the Cas protein sequences and structures led to the classification of the CRISPR-Cas systems into three Types (I, II and III).
A detailed comparison of the available sequences and structures of Cas proteins revealed several unnoticed homologous relationships. The Repeat-Associated Mysterious Proteins (RAMPs) containing a distinct form of the RNA Recognition Motif (RRM) domain, which are major components of the CRISPR-Cas systems, were classified into three large groups, Cas5, Cas6 and Cas7. Each of these groups includes many previously uncharacterized proteins now shown to adopt the RAMP structure. Evidence is presented that large subunits contained in most of the CRISPR-Cas systems could be homologous to Cas10 proteins which contain a polymerase-like Palm domain and are predicted to be enzymatically active in Type III CRISPR-Cas systems but inactivated in Type I systems. These findings, the fact that the CRISPR polymerases, RAMPs and Cas2 all contain core RRM domains, and distinct gene arrangements in the three types of CRISPR-Cas systems together provide for a simple scenario for origin and evolution of the CRISPR-Cas machinery. Under this scenario, the CRISPR-Cas system originated in thermophilic Archaea and subsequently spread horizontally among prokaryotes.
Because of the extreme diversity of CRISPR-Cas systems, in-depth sequence and structure comparisons continue to reveal unexpected homologous relationships among Cas proteins. Unification of Cas protein families previously considered unrelated provides for improvement in the classification of CRISPR-Cas systems and a reconstruction of their evolution.
Open peer review
This article was reviewed by Malcolm White (nominated by Purificacion Lopez-Garcia), Frank Eisenhaber and Igor Zhulin. For the full reviews, see the Reviewers' Comments section.
PMCID: PMC3150331  PMID: 21756346
8.  Predicted class-I aminoacyl tRNA synthetase-like proteins in non-ribosomal peptide synthesis 
Biology Direct  2010;5:48.
Recent studies point to a great diversity of non-ribosomal peptide synthesis systems with major roles in amino acid and co-factor biosynthesis, secondary metabolism, and post-translational modifications of proteins by peptide tags. The least studied of these systems are those utilizing tRNAs or aminoacyl-tRNA synthetases (AAtRS) in non-ribosomal peptide ligation.
Here we describe novel examples of AAtRS-related proteins that are likely to be involved in the synthesis of widely distributed peptide-derived metabolites. Using sensitive sequence profile methods we show that the cyclodipeptide synthases (CDPSs) are members of the HUP class of Rossmannoid domains and are likely to be highly derived versions of the class-I AAtRS catalytic domains. We also identify the first eukaryotic CDPSs in fungi and in animals; they might be involved in immune response in the latter organisms. We also identify a paralogous version of the methionyl-tRNA synthetase, which is widespread in bacteria, and present evidence using contextual information that it might function independently of protein synthesis as a peptide ligase in the formation of a peptide-derived secondary metabolite. This metabolite is likely to be heavily modified through multiple reactions catalyzed by a metal-binding cupin domain and a lysine N6 monooxygenase that are strictly associated with this paralogous methionyl-tRNA synthetase (MtRS). We further identify an analogous system wherein the MtRS has been replaced by more typical peptide ligases with the ATP-grasp or modular condensation-domains.
The prevalence of these predicted biosynthetic pathways in phylogenetically distant, pathogenic or symbiotic bacteria suggests that metabolites synthesized by them might participate in interactions with the host. More generally, these findings point to a complete spectrum of recruitment of AAtRS to various non-ribosomal biosynthetic pathways, ranging from the conventional AAtRS, through closely related paralogous AAtRS dedicated to certain pathways, to highly derived versions of the class-I AAtRS catalytic domain like the CDPSs. Both the conventional AAtRS and their closely related paralogs often provide aminoacylated tRNAs for peptide ligations by MprF/Fem/MurM-type acetyltransferase fold ligases in the synthesis of peptidoglycan, N-end rule modifications of proteins, lipid aminoacylation or biosynthesis of antibiotics, such as valanimycin. Alternatively, they might supply aminoacylated tRNAs for other biosynthetic pathways like that for tetrapyrrole or directly function as peptide ligases as in the case of mycothiol and those identified here.
This article was reviewed by Andrei Osterman and Igor Zhulin.
PMCID: PMC2922099  PMID: 20678224
9.  Presence of a classical RRM-fold palm domain in Thg1-type 3'-5' nucleic acid polymerases and the origin of the GGDEF and CRISPR polymerase domains
Biology Direct  2010;5:43.
Almost all known nucleic acid polymerases catalyze 5'-3' polymerization by mediating the attack on an incoming nucleotide 5' triphosphate by the 3'OH from the growing polynucleotide chain in a template dependent or independent manner. The only known exception to this rule is the Thg1 RNA polymerase that catalyzes 3'-5' polymerization in vitro and also in vivo as a part of the maturation process of histidinyl tRNA. While the initial reaction catalyzed by Thg1 has been compared to adenylation catalyzed by the aminoacyl tRNA synthetases, the evolutionary relationships of Thg1 and the actual nature of the polymerase reaction catalyzed by it remain unclear.
Using sensitive profile-profile comparison and structure prediction methods we show that the catalytic domain of Thg1 contains an RRM (ferredoxin) fold palm domain, just like the viral RNA-dependent RNA polymerases, reverse transcriptases, family A and B DNA polymerases, adenylyl cyclases, diguanylate cyclases (GGDEF domain) and the predicted polymerase of the CRISPR system. We show that, just as in these polymerases, Thg1 possesses an active site with three acidic residues that chelate Mg2+ cations. Based on this we predict that Thg1 catalyzes polymerization similarly to the 5'-3' polymerases, but uses the incoming 3' OH to attack the 5' triphosphate generated at the end of the elongating polynucleotide. In addition we identify a distinct set of residues unique to Thg1 that we predict as comprising a second active site, which catalyzes the initial adenylation reaction to prime 3'-5' polymerization. Based on contextual information from conserved gene neighborhoods we show that Thg1 might function in conjunction with a polynucleotide kinase that generates an initial 5' phosphate substrate for it at the end of an RNA molecule. In addition to histidinyl tRNA maturation, Thg1 might have other RNA repair roles in representatives from all the three superkingdoms of life as well as certain large DNA viruses. We also present evidence that among the polymerase-like domains Thg1 is most closely related to the catalytic domains of the GGDEF and CRISPR polymerase proteins.
Based on this relationship and the phyletic patterns of these enzymes we infer that the Thg1 protein is likely to represent an archaeo-eukaryotic branch of the same clade of proteins that gave rise to the mobile CRISPR polymerases and in bacteria spawned the GGDEF domains. Thg1 is likely to be close to the ancestral version of this family of enzymes that might have played a role in RNA repair in the last universal common ancestor.
This article was reviewed by S. Balaji and V.V. Dolja.
PMCID: PMC2904730  PMID: 20591188
10.  OST-HTH: a novel predicted RNA-binding domain 
Biology Direct  2010;5:13.
The mechanism by which the arthropod Oskar and vertebrate TDRD5/TDRD7 proteins nucleate or organize structurally related ribonucleoprotein (RNP) complexes, the polar granule and nuage, is poorly understood. Using sequence profile searches we identify a novel domain in these proteins that is widely conserved across eukaryotes and bacteria.
Using contextual information from domain architectures, sequence-structure superpositions and available functional information we predict that this domain is likely to adopt the winged helix-turn-helix fold and bind RNA with a potential specificity for dsRNA. We show that in eukaryotes this domain is often combined in the same polypeptide with protein-protein- or lipid-interaction domains that might play a role in anchoring these proteins to specific cytoskeletal structures.
Thus, proteins with this domain might have a key role in the recognition and localization of dsRNA, including miRNAs, rasiRNAs and piRNAs hybridized to their targets. In other cases, this domain is fused to ubiquitin-binding, E3 ligase and ubiquitin-like domains indicating a previously under-appreciated role for ubiquitination in regulating the assembly and stability of nuage-like RNP complexes. Both bacteria and eukaryotes encode a conserved family of proteins that combines this predicted RNA-binding domain with a previously uncharacterized domain (DUF88). We present evidence that it is an RNase belonging to the superfamily that includes the 5'->3' nucleases, PIN and NYN domains and might be recruited to degrade certain RNAs.
This article was reviewed by Sandor Pongor and Arcady Mushegian.
PMCID: PMC2848206  PMID: 20302647
11.  Novel eukaryotic enzymes modifying cell-surface biopolymers 
Biology Direct  2010;5:1.
Eukaryotic extracellular matrices such as proteoglycans, sclerotinized structures, mucus, external tests, capsules, cell walls and waxes contain highly modified proteins, glycans and other composite biopolymers. Using comparative genomics and sequence profile analysis we identify several novel enzymes that could be potentially involved in the modification of cell-surface glycans or glycoproteins.
Using sequence analysis and conservation we define the acyltransferase domain prototyped by the fungal Cas1p proteins, identify its active site residues and unify them with the superfamily of classical 10TM acyltransferases (e.g. oatA). We also identify a novel family of esterases (prototyped by the previously uncharacterized N-terminal domain of Cas1p) that have a fold similar to that of the SGNH/GDSL esterases but differ from them in their conservation pattern.
We posit that the combined action of the acyltransferase and esterase domains plays an important role in controlling the acylation levels of glycans and thereby regulates their physico-chemical properties such as hygroscopicity, resistance to enzymatic hydrolysis and physical strength. We present evidence that the action of these novel enzymes on glycans might play an important role in host-pathogen interaction of plants, fungi and metazoans. We present evidence that in plants (e.g. PMR5 and ESK1) the regulation of carbohydrate acylation by these acylesterases might also play an important role in regulation of transpiration and stress resistance. We also identify a subfamily of these esterases in metazoans (e.g. C7orf58), which are fused to an ATP-grasp amino acid ligase domain that is predicted to catalyze, in certain animals, modification of cell surface polymers by amino acids or peptides.
This article was reviewed by Gaspar Jekely and Frank Eisenhaber.
PMCID: PMC2824669  PMID: 20056006
12.  The Anabaena sensory rhodopsin transducer defines a novel superfamily of prokaryotic small-molecule binding domains 
Biology Direct  2009;4:25.
The Anabaena sensory rhodopsin transducer (ASRT) is a small protein that has been claimed to function as a signaling molecule downstream of the cyanobacterial sensory rhodopsin. However, orthologs of ASRT have been detected in several bacteria that lack rhodopsin, raising questions about the generality of this function. Using sequence profile searches we show that ASRT defines a novel superfamily of β-sandwich fold domains. Through contextual inference based on domain architectures and predicted operons and structural analysis we present strong evidence that these domains bind small molecules, most probably sugars. We propose that the intracellular versions like ASRT probably participate as sensors that regulate a diverse range of sugar metabolism operons or even the light sensory behavior in Anabaena by binding sugars or related metabolites. We also show that one of the extracellular versions defines a predicted sugar-binding structure in a novel cell-surface lipoprotein found across actinobacteria, including several pathogens such as Tropheryma, Actinomyces and Thermobifida. The analysis of this superfamily also provides new data to investigate the evolution of carbohydrate binding modes in β-sandwich domains with very different topologies.
Reviewers: This article was reviewed by M. Madan Babu and Mark A. Ragan.
PMCID: PMC2739507  PMID: 19682383
13.  Unraveling the biochemistry and provenance of pupylation: a prokaryotic analog of ubiquitination 
Biology Direct  2008;3:45.
Recently Mycobacterium tuberculosis was shown to possess a novel protein modification, in which a small protein Pup is conjugated to the epsilon-amino groups of lysines in target proteins. Analogous to ubiquitin modification in eukaryotes, this remarkable modification recruits proteins for degradation via archaeal-type proteasomes found in mycobacteria and allied actinobacteria. While a mycobacterial protein named PafA was found to be required for this conjugation reaction, its biochemical mechanism has not been elucidated. Using sensitive sequence profile comparison methods we establish that the PafA family proteins are related to the γ-glutamyl-cysteine synthetase and glutamine synthetase. Hence, we predict that PafA is the Pup ligase, which catalyzes the ATP-dependent ligation of the terminal γ-carboxylate of glutamate to lysines, similar to the above enzymes. We further discovered that an ortholog of the eukaryotic PAC2 (e.g. cg2106) is often present in the vicinity of the actinobacterial Pup-proteasome gene neighborhoods and is likely to represent the ancestral proteasomal chaperone. Pup-conjugation is sporadically present outside the actinobacteria in certain lineages, such as verrucomicrobia, nitrospirae, deltaproteobacteria and planctomycetes, and in the latter two lineages it might modify membrane proteins.
This article was reviewed by M. Madan Babu and Andrei Osterman.
PMCID: PMC2588565  PMID: 18980670
14.  A new family of polymerases related to superfamily A DNA polymerases and T7-like DNA-dependent RNA polymerases 
Biology Direct  2008;3:39.
Using sequence profile methods and structural comparisons we characterize a previously unknown family of nucleic acid polymerases in a group of mobile elements from genomes of diverse bacteria, an algal plastid and certain DNA viruses, including the recently reported Sputnik virus. Using contextual information from domain architectures and gene-neighborhoods we present evidence that they are likely to possess both primase and DNA polymerase activity, comparable to the previously reported prim-pol proteins. These newly identified polymerases help in defining the minimal functional core of superfamily A DNA polymerases and related RNA polymerases. Thus, they provide a framework to understand the emergence of both DNA and RNA polymerization activity in this class of enzymes. They also provide evidence that enigmatic DNA viruses, such as Sputnik, might have emerged from mobile elements coding these polymerases.
This article was reviewed by Eugene Koonin and Mark Ragan.
PMCID: PMC2579912  PMID: 18834537
15.  MutL homologs in restriction-modification systems and the origin of eukaryotic MORC ATPases 
Biology Direct  2008;3:8.
The provenance and biochemical roles of eukaryotic MORC proteins have remained poorly understood since the discovery of their prototype MORC1, which is required for meiotic nuclear division in animals. The MORC family contains a combination of a gyrase, histidine kinase, and MutL (GHKL) and S5 domains that together constitute a catalytically active ATPase module. We identify the prokaryotic MORCs and establish that the MORC family belongs to a larger radiation of several families of GHKL proteins (paraMORCs) in prokaryotes. Using contextual information from conserved gene neighborhoods we show that these proteins primarily function in restriction-modification systems, in conjunction with diverse superfamily II DNA helicases and endonucleases. The common ancestor of these GHKL proteins, MutL and topoisomerase ATPase modules appears to have catalyzed structural reorganization of protein complexes and concomitant DNA-superstructure manipulations along with fused or standalone nuclease domains. Furthermore, contextual associations of the prokaryotic MORCs and their relatives suggest that their eukaryotic counterparts are likely to carry out chromatin remodeling by DNA superstructure manipulation in response to epigenetic signals such as histone and DNA methylation.
This article was reviewed by Arcady Mushegian and Gaspar Jekely.
PMCID: PMC2292703  PMID: 18346280
16.  Opening Pandora's Box: making biological discoveries through computational data exploration 
Biology Direct  2007;2:29.
PMCID: PMC2092420  PMID: 18028542
17.  Small but versatile: the extraordinary functional and structural diversity of the β-grasp fold 
Biology Direct  2007;2:18.
The β-grasp fold (β-GF), prototyped by ubiquitin (UB), has been recruited for a strikingly diverse range of biochemical functions. These functions include providing a scaffold for different enzymatic active sites (e.g. NUDIX phosphohydrolases) and iron-sulfur clusters, RNA-, soluble-ligand- and co-factor-binding, sulfur transfer, adaptor functions in signaling, assembly of macromolecular complexes and post-translational protein modification. To understand the basis for the functional versatility of this small fold we undertook a comprehensive sequence-structure analysis of the fold and developed a natural classification for its members.
As a result we were able to define the core distinguishing features of the fold and numerous elaborations, including several previously unrecognized variants. Systematic analysis of all known interactions of the fold showed that its manifold functional abilities arise primarily from the prominent β-sheet, which provides an exposed surface for diverse interactions or, additionally, forms open barrel-like structures. We show that in the β-GF both enzymatic activities and the binding of diverse co-factors (e.g. molybdopterin) have independently evolved on at least three occasions each, and iron-sulfur-cluster-binding on at least two independent occasions. Our analysis identified multiple previously unknown large monophyletic assemblages within the β-GF, including one which unifies versions found in the fasciclin-1 superfamily, the ribosomal protein L25, the phosphoribosyl AMP cyclohydrolase (HisI) and glutamine synthetase. We also uncovered several new groups of β-GF domains including a domain found in bacterial flagellar and fimbrial assembly components, and 5 new UB-like domains in the eukaryotes.
Evolutionary reconstruction indicates that the β-GF had differentiated into at least 7 distinct lineages by the time of the last universal common ancestor of all extant organisms, encompassing much of the structural diversity observed in extant versions of the fold. The earliest β-GF members were probably involved in RNA metabolism and subsequently radiated into various functional niches. Most of the structural diversification occurred in the prokaryotes, whereas the eukaryotic phase was mainly marked by a specific expansion of the ubiquitin-like β-GF members. The eukaryotic UB superfamily diversified into at least 67 distinct families, of which at least 19–20 families were already present in the eukaryotic common ancestor, including several protein and one lipid conjugated forms. Another key aspect of the eukaryotic phase of evolution of the β-GF was the dramatic increase in domain architectural complexity of proteins related to the expansion of UB-like domains in numerous adaptor roles.
This article was reviewed by Igor Zhulin, Arcady Mushegian and Frank Eisenhaber.
PMCID: PMC1949818  PMID: 17605815
18.  A novel superfamily containing the β-grasp fold involved in binding diverse soluble ligands 
Biology Direct  2007;2:4.
Domains containing the β-grasp fold are utilized in a great diversity of physiological functions but their role, if any, in soluble or small molecule ligand recognition is poorly studied.
Using sensitive sequence and structure similarity searches we identify a novel superfamily containing the β-grasp fold. They are found in a diverse set of proteins that include the animal vitamin B12 uptake proteins transcobalamin and intrinsic factor, the bacterial polysaccharide export proteins, the competence DNA receptor ComEA, the cob(I)alamin generating enzyme PduS and the Nqo1 subunit of the respiratory electron transport chain. We present evidence that members of this superfamily are likely to bind a range of soluble ligands, including B12. There are two major clades within this superfamily, namely the transcobalamin-like clade and the Nqo1-like clade. The former clade is typified by an insert of a β-hairpin after the helix of the β-grasp fold, whereas the latter clade is characterized by an insert between strands 4 and 5 of the core fold.
Members of both clades within this superfamily are predicted to interact with ligands in a similar spatial location, with their specific inserts playing a role in the process. Both clades are widely represented in bacteria suggesting that this superfamily was derived early in bacterial evolution. The animal lineage appears to have acquired the transcobalamin-like proteins from low GC Gram-positive bacteria, and this might be correlated with the emergence of the ability to utilize B12 produced by gut bacteria.
This article was reviewed by Andrei Osterman, Igor Zhulin, and Arcady Mushegian.
PMCID: PMC1796856  PMID: 17250770
19.  The signaling helix: a common functional theme in diverse signaling proteins 
Biology Direct  2006;1:25.
The mechanism by which signals are transmitted between receptor and effector domains in multi-domain signaling proteins is poorly understood.
Using sensitive sequence analysis methods we identify a conserved helical segment of around 40 residues in a wide range of signaling proteins, including numerous sensor histidine kinases such as Sln1p, and receptor guanylyl cyclases such as the atrial natriuretic peptide receptor and nitric oxide receptors. We term this helical segment the signaling (S)-helix and present evidence that it forms a novel parallel coiled-coil element, distinct from previously known helical segments in signaling proteins, such as the Dimerization-Histidine phosphotransfer module of histidine kinases, the intra-cellular domains of the chemotaxis receptors, inter-GAF domain helical linkers and the α-helical HAMP module. Analysis of domain architectures allowed us to reconstruct the domain-neighborhood graph for the S-helix, which showed that the S-helix almost always occurs between two signaling domains. Several striking patterns in the domain neighborhood of the S-helix also became evident from the graph. It most often separates diverse N-terminal sensory domains from various C-terminal catalytic signaling domains such as histidine kinases, cNMP cyclase, PP2C phosphatases, NtrC-like AAA+ ATPases and diguanylate cyclases. It might also occur between two sensory domains such as PAS domains and occasionally between a DNA-binding HTH domain and a sensory domain. The sequence conservation pattern of the S-helix revealed the presence of a unique constellation of polar residues in the dimer-interface positions within the central heptad of the coiled-coil formed by the S-helix.
Combining these observations with previously reported mutagenesis studies on different S-helix-containing proteins we suggest that it functions as a switch that prevents constitutive activation of linked downstream signaling domains. However, upon occurrence of specific conformational changes due to binding of ligand or other sensory inputs in a linked upstream domain it transmits the signal to the downstream domain. Thus, the S-helix represents one of the most prevalent functional themes involved in the flow of signals between modules in diverse prokaryote-type multi-domain signaling proteins.
This article was reviewed by Frank Eisenhaber, Arcady Mushegian and Sandor Pongor.
PMCID: PMC1592074  PMID: 16953892
