Ribosomal biogenesis has been extensively investigated, especially to identify the elusive nucleases and cofactors involved in the complex rRNA processing events in eukaryotes. Large-scale screens in yeast identified two biochemically uncharacterized proteins, TSR3 and TSR4, as being key players required for rRNA maturation. Using multiple computational approaches we identify the conserved domains comprising these proteins and establish sequence and structural features providing novel insights regarding their roles. TSR3 is unified with the DTW domain into a novel superfamily of predicted enzymatic domains, with the balance of the available evidence pointing toward an RNase role with the archaeo-eukaryotic TSR3 proteins processing rRNA and the bacterial versions potentially processing tRNA. TSR4, its other eukaryotic homologs PDCD2/rp-8, PDCD2L, Zfrp8, and trus, the predominantly bacterial DUF1963 proteins, and other uncharacterized proteins are unified into a new domain superfamily, which arose from an ancient duplication event of a strand-swapped, dimer-forming all-beta unit. We identify conserved features mediating protein-protein interactions (PPIs) and propose a potential chaperone-like function. While contextual evidence supports a conserved role in ribosome biogenesis for the eukaryotic TSR4-related proteins, there is no evidence for such a role for the bacterial versions. Whereas TSR3-related proteins can be traced to the last universal common ancestor (LUCA) with a well-supported archaeo-eukaryotic branch, TSR4-related proteins of eukaryotes are derived from within the bacterial radiation of this superfamily, with archaea entirely lacking them. This provides evidence for “systems admixture,” which followed the early endosymbiotic event, playing a key role in the emergence of the uniquely eukaryotic ribosome biogenesis process.
rRNA; TSR4; TSR3; 20S; 18S rRNA; tRNA; DTW domain; endosymbiosis
The complete genome sequence of the radiation resistant bacterium Deinococcus radiodurans R1 is composed of two chromosomes (2,648,615 and 412,340 basepairs), a megaplasmid (177,466 basepairs), and a small plasmid (45,702 basepairs) yielding a total genome of 3,284,123 basepairs. Multiple components distributed on the chromosomes and megaplasmid that contribute to the ability of D. radiodurans to survive under conditions of starvation, oxidative stress, and high levels of DNA-damage have been identified. D. radiodurans represents an organism in which all systems for DNA repair, DNA damage export, desiccation and starvation recovery, and genetic redundancy are present in one cell.
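The replicon sizes quoted above can be checked against the stated genome total with a few lines of arithmetic. The following is a minimal sketch, not part of the original work; the dictionary keys (including the "chromosome I" and "chromosome II" labels) and the variable names are illustrative assumptions, and only the base-pair figures come from the text.

```python
# Replicon sizes in base pairs, copied from the abstract above.
# The labels are hypothetical names chosen for this illustration.
replicons = {
    "chromosome I": 2_648_615,
    "chromosome II": 412_340,
    "megaplasmid": 177_466,
    "small plasmid": 45_702,
}

# Total genome size stated in the abstract.
stated_total_bp = 3_284_123

# Sum of the individual replicons.
total_bp = sum(replicons.values())

# Share of the genome carried by each replicon.
fractions = {name: size / total_bp for name, size in replicons.items()}

for name, size in replicons.items():
    print(f"{name}: {size:,} bp ({fractions[name]:.1%} of genome)")
print(f"sum of replicons: {total_bp:,} bp")
print(f"matches stated total: {total_bp == stated_total_bp}")
```

The sum of the four replicons agrees with the stated total of 3,284,123 basepairs.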
The mode of action of the bacterial ter cluster and TelA genes, implicated in natural resistance to tellurite and other xenobiotic toxic compounds, pore-forming colicins and several bacteriophages, has remained enigmatic for almost two decades. Using comparative genomics, sequence-profile searches and structural analysis, we present evidence that the ter gene products and their functional partners constitute previously underappreciated chemical stress response and anti-viral defense systems of bacteria. Based on contextual information from conserved gene neighborhoods and domain architectures, we show that the ter gene products and TelA lie at the center of membrane-linked metal recognition complexes with regulatory ramifications encompassing phosphorylation-dependent signal transduction, RNA-dependent regulation, biosynthesis of nucleoside-like metabolites and DNA processing. Our analysis suggests that the multiple metal-binding and non-binding TerD paralogs and TerC are likely to constitute a membrane-associated complex, which might also include TerB and TerY, and feature several distinct metal-binding sites. Versions of the TerB domain might also bind small molecule ligands and link the TerD paralog-TerC complex to biosynthetic modules comprised of phosphoribosyltransferases (PRTases), ATP grasp amidoligases, TIM-barrel carbon-carbon lyases, and HAD phosphoesterases, which are predicted to synthesize novel nucleoside-like molecules. One of the PRTases is also likely to interact with RNA by means of its Pelota/Ribosomal protein L7AE-like domain. The von Willebrand factor A domain protein, TerY, is predicted to be part of a distinct phosphorylation switch, coupling a protein kinase and a PP2C phosphatase.
We show, based on the evidence from numerous conserved gene neighborhoods and domain architectures, that both the TerB and TelA domains have been linked to diverse lipid-interaction domains, such as two novel PH-like and the Coq4 domains, in different bacteria and are likely to comprise membrane-associated sensory complexes that might additionally contain periplasmic binding-protein-II and OmpA domains. The TerD and TerB domains and the TerY-associated phosphorylation system are also functionally linked to distinct DNA-processing complexes, which contain proteins with SWI2/SNF2 and RecQ-like helicases, multiple AAA+ ATPases, McrC-N-terminal domain proteins, several restriction endonuclease fold DNases, DNA-binding domains and a type-VII/Esx-like system, which is at the center of a predicted DNA transfer apparatus. These DNA-processing modules and associated genes are predicted to be involved in restriction or suicidal action in response to phages and possibly repairing xenobiotic-induced DNA damage. In some eukaryotes, certain components of the ter system appear to have been recruited to function in conjunction with the ubiquitin system and calcium-signaling pathways.
In many organisms, the methylation of cytosine in DNA has a key role in silencing ‘parasitic’ DNA elements, regulating transcription and establishing cellular identity. The recent discovery that ten-eleven translocation (TET) proteins are 5-methylcytosine oxidases has provided several chemically plausible pathways for the reversal of DNA methylation, thus triggering a paradigm shift in our understanding of how changes in DNA methylation are coupled to cell differentiation, embryonic development and cancer.
TET (Ten-Eleven-Translocation) proteins are Fe(II) and α-ketoglutarate-dependent dioxygenases1-3 that modify the methylation status of DNA by successively oxidizing 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine and 5-carboxycytosine1,3-5, potential intermediates in the active erasure of DNA methylation marks5,6. We show here that IDAX/CXXC4, a player in the Wnt signaling pathway7 that has been implicated in malignant renal cell carcinoma8 and colonic villous adenoma9, functions as a negative regulator of TET2 protein expression. IDAX/CXXC4 was originally encoded within an ancestral TET2 gene that underwent a chromosomal gene inversion during evolution, thus separating the TET2 CXXC domain from the catalytic domain. The Idax CXXC domain binds DNA sequences containing unmethylated CpGs, localises to promoters and CpG islands in genomic DNA, and interacts directly with the catalytic domain of Tet2. Unexpectedly, Idax expression resulted in caspase activation and Tet2 protein downregulation, in a manner that depended on DNA-binding through the Idax CXXC domain. Idax depletion prevented Tet2 downregulation in differentiating mouse embryonic stem (ES) cells, and shRNA against IDAX increased TET2 protein expression in the human monocytic cell line U937. Notably, we find that the expression and activity of TET3 are also regulated through its CXXC domain. Taken together, these results establish the separate and linked CXXC domains of TET2 and TET3 respectively as novel regulators of caspase activation and TET enzymatic activity.
CRISPR-Cas adaptive immunity systems of bacteria and archaea insert fragments of virus or plasmid DNA as spacer sequences into CRISPR repeat loci. Processed transcripts encompassing these spacers guide the cleavage of the cognate foreign DNA or RNA. Most CRISPR-Cas loci, in addition to recognized cas genes, also include genes that are not directly implicated in spacer acquisition, CRISPR transcript processing or interference. Here we comprehensively analyze sequences, structures and genomic neighborhoods of one of the most widespread groups of such genes that encode proteins containing a predicted nucleotide-binding domain with a Rossmann-like fold, which we denote CARF (CRISPR-associated Rossmann fold). Several CARF protein structures have been determined but functional characterization of these proteins is lacking. The CARF domain is most frequently combined with a C-terminal winged helix-turn-helix DNA-binding domain and “effector” domains most of which are predicted to possess DNase or RNase activity. Divergent CARF domains are also found in RtcR proteins, sigma-54 dependent regulators of the rtc RNA repair operon. CARF genes frequently co-occur with those coding for proteins containing the WYL domain with the Sm-like SH3 β-barrel fold, which is also predicted to bind ligands. CRISPR-Cas and possibly other defense systems are predicted to be transcriptionally regulated by multiple ligand-binding proteins containing WYL and CARF domains which sense modified nucleotides and nucleotide derivatives generated during virus infection. We hypothesize that CARF domains also transmit the signal from the bound ligand to the fused effector domains which attack either alien or self nucleic acids, resulting, respectively, in immunity complementing the CRISPR-Cas action or in dormancy/programmed cell death.
CRISPR; Rossmann fold; beta barrel; DNA-binding proteins; phage defense
CA_C2195 from Clostridium acetobutylicum is a protein of unknown function. Sequence analysis predicted that part of the protein contained a metallopeptidase-related domain. There are over 200 homologs of similar size in large sequence databases such as UniProt, with pairwise sequence identities in the range of ~40-60%. CA_C2195 was chosen for crystal structure determination for structure-based function annotation of novel protein sequence space.
The structure confirmed that CA_C2195 contained an N-terminal metallopeptidase-like domain. The structure revealed two extra domains: an α+β domain inserted in the metallopeptidase-like domain and a C-terminal circularly permuted winged-helix-turn-helix domain.
Based on our sequence and structural analyses using the crystal structure of CA_C2195 we provide a view into the possible functions of the protein. From contextual information from gene-neighborhood analysis, we propose that rather than being a peptidase, CA_C2195 and its homologs might play a role in biosynthesis of a modified cell-surface carbohydrate in conjunction with several sugar-modification enzymes. These results provide the groundwork for the experimental verification of the function.
CA_C2195; Peptidase; DUF4910; DUF2172; HTH_47; Structural genomics
The variant antigen, Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1), expressed on the surface of P. falciparum infected Red Blood Cells (iRBCs) is a critical virulence factor for malaria1. Each parasite encodes 60 antigenically distinct var genes encoding PfEMP1s, but during infection the clonal parasite population expresses only one gene at a time before switching to the expression of a new variant antigen as an immune evasion mechanism to avoid the host’s antibody responses2,3. The mechanism by which 59 of the 60 var genes are silenced remains largely unknown4–7. Here we show that knocking out the P. falciparum variant-silencing SET gene (PfSETvs), which encodes an ortholog of Drosophila melanogaster ASH1 and controls histone H3 lysine 36 trimethylation (H3K36me3) on var genes, results in the transcription of virtually all var genes in the single parasite nuclei and their expression as proteins on the surface of individual iRBCs. PfSETvs-dependent H3K36me3 is present along the entire gene body including the transcription start site (TSS) to silence var genes. With low occupancy of PfSETvs at both the TSS of var genes and the intronic promoter, expression of var genes coincides with transcription of their corresponding antisense long non-coding RNA (lncRNA). These results uncover a novel role of the PfSETvs-dependent H3K36me3 in silencing var genes in P. falciparum that might provide a general mechanism by which orthologs of PfSETvs repress gene expression in other eukaryotes. PfSETvs knockout parasites expressing all PfEMP1s may also be applied to the development of a malaria vaccine.
Background. γ-irradiation is commonly used to create attenuation in Plasmodium parasites. However, there are no systematic studies on the survival, reversion of virulence, and molecular basis for γ-radiation–induced cell death in malaria parasites.
Methods. The effect of γ-irradiation on the growth of asexual Plasmodium falciparum was studied in erythrocyte cultures. Cellular and ultrastructural changes within the parasite were studied by fluorescence and electron microscopy, and genome-wide transcriptional profiling was performed to identify parasite biomarkers of attenuation and cell death.
Results. γ-radiation induced the death of P. falciparum in a dose-dependent manner. These parasites had defective mitosis, sparse cytoplasm, fewer ribosomes, disorganized and clumped organelles, and large vacuoles—observations consistent with “distressed” or dying parasites. A total of 185 parasite genes were transcriptionally altered in response to γ-irradiation (45.9% upregulated, 54.1% downregulated). Loss of parasite survival was correlated with the downregulation of genes encoding translation factors and with upregulation of genes associated with messenger RNA–sequestering stress granules. Genes pertaining to cell-surface interactions, host-cell remodeling, and secreted proteins were also altered.
Conclusions. These studies provide a framework to assess the safety of γ-irradiation attenuation and promising targets for genetic deletion to produce whole parasite-based attenuated vaccines.
NOD2; IL-17; Th17; T lymphocytes; ocular toxoplasmosis; Toxoplasma gondii
Group B Streptococcus invades human amniotic epithelial cells using a hemolytic pigment.
Microbial infection of the amniotic fluid is a significant cause of fetal injury, preterm birth, and newborn infections. Group B Streptococcus (GBS) is an important human bacterial pathogen associated with preterm birth, fetal injury, and neonatal mortality. Although GBS has been isolated from amniotic fluid of women in preterm labor, mechanisms of in utero infection remain unknown. Previous studies indicated that GBS are unable to invade human amniotic epithelial cells (hAECs), which represent the last barrier to the amniotic cavity and fetus. We show that GBS invades hAECs and strains lacking the hemolysin repressor CovR/S accelerate amniotic barrier failure and penetrate chorioamniotic membranes in a hemolysin-dependent manner. Clinical GBS isolates obtained from women in preterm labor are hyperhemolytic and some are associated with covR/S mutations. We demonstrate for the first time that hemolytic and cytolytic activity of GBS is due to the ornithine rhamnolipid pigment and not due to a pore-forming protein toxin. Our studies emphasize the importance of the hemolytic GBS pigment in ascending infection and fetal injury.
A novel highly conserved protein domain, DUF162 [Pfam: PF02589], can be mapped to two proteins: LutB and LutC. Both proteins are encoded by a highly conserved LutABC operon, which has been implicated in lactate utilization in bacteria. Based on our analysis of its sequence, structure, and recent experimental evidence reported by other groups, we hereby redefine DUF162 as the LUD domain family.
JCSG solved the first crystal structure [PDB:2G40] from the LUD domain family: LutC protein, encoded by ORF DR_1909, of Deinococcus radiodurans. LutC shares features with domains in the functionally diverse ISOCOT superfamily. We have observed that the LUD domain has an increased abundance in the human gut microbiome.
We propose a model for substrate and cofactor binding and regulation in the LUD domain. The significance of LUD-containing proteins in the human gut microbiome, and the implication of lactate metabolism in the radiation-resistance of Deinococcus radiodurans are discussed.
LUD; DUF162; LutB; LutC; Domain of unknown function; Deinococcus radiodurans
The NTF2-like superfamily is a versatile group of protein domains sharing a common fold. The sequences of these domains are very diverse and they share no common sequence motif. These domains serve a range of different functions within the proteins in which they are found, including both catalytic and non-catalytic versions. Clues to the function of protein domains belonging to such a diverse superfamily can be gleaned from analysis of the proteins and organisms in which they are found.
Here we describe three protein domains of unknown function found mainly in bacteria: DUF3828, DUF3887 and DUF4878. Structures of representatives of each of these domains, BT_3511 from Bacteroides thetaiotaomicron (strain VPI-5482) [PDB:3KZT], Cj0202c from Campylobacter jejuni subsp. jejuni serotype O:2 (strain NCTC 11168) [PDB:3K7C] and RUMGNA_01855 from Ruminococcus gnavus (strain ATCC 29149) [PDB:4HYZ], have been solved by X-ray crystallography. All three domains are similar in structure and all belong to the NTF2-like superfamily. Although the function of these domains remains unknown at present, our analysis enables us to present a hypothesis concerning their role.
Our analysis of these three protein domains suggests a potential non-catalytic ligand-binding role. This may regulate the activities of domains with which they are combined in the same polypeptide or via operonic linkages, such as signaling domains (e.g. serine/threonine protein kinase), peptidoglycan-processing hydrolases (e.g. NlpC/P60 peptidases) or nucleic acid binding domains (e.g. Zn-ribbons).
NTF2-like superfamily; Protein function prediction; Protein structure; Ligand-binding; JCSG; 3D structure; Protein family
Summary: Gibberellic acids (GAs) are key plant hormones, regulating various aspects of growth and development, which have been at the center of the ‘green revolution’. GRAS family proteins, the primary players in GA signaling pathways, remain poorly understood. Using sequence-profile searches, structural comparisons and phylogenetic analysis, we establish that the GRAS family first emerged in bacteria and belongs to the Rossmann fold methyltransferase superfamily. All bacterial and a subset of plant GRAS proteins are likely to function as small-molecule methylases. The remaining plant versions have lost one or more AdoMet (SAM)-binding residues while preserving their substrate-binding residues. We predict that GRAS proteins might either modify or bind small molecules such as GAs or their derivatives.
Supplementary Material for this article is available at Bioinformatics online.
We provide a portrait of the bacterial transcription apparatus in light of the data emerging from structural studies, sequence analysis and comparative genomics to bring out important but underappreciated features. We first describe the key structural highlights and evolutionary implications emerging from comparison of the cellular RNA polymerase subunits with the RNA-dependent RNA polymerase involved in RNAi in eukaryotes and their homologs from newly identified bacterial selfish elements. We describe some previously unnoticed domains and the possible evolutionary stages leading to the RNA polymerases of extant life forms. We then present the case for the ancient orthology of the basal transcription factors, the sigma factor and TFIIB, in the bacterial and the archaeo-eukaryotic lineages. We also present a synopsis of the structural and architectural taxonomy of specific transcription factors and their genome-scale demography. In this context, we present certain notable deviations from the otherwise invariant proteome-wide trends in transcription factor distribution and use them to predict the presence of an unusual lineage-specifically expanded signaling system in certain firmicutes like Paenibacillus. We then discuss the intersection between functional properties of transcription factors and the organization of transcriptional networks. Finally, we present some of the interesting evolutionary conundrums posed by our newly gained understanding of the bacterial transcription apparatus and potential areas for future explorations.
RNA polymerase; Beta barrel; Two component system; Activators; Transcription factors; Mobile elements; ATPases
Every genome contains a large number of uncharacterized proteins that may encode entirely novel biological systems. Many of these uncharacterized proteins fall into related sequence families. By applying sequence and structural analysis we hope to provide insight into novel biology.
We analyze a previously uncharacterized Pfam protein family called DUF4424 [Pfam:PF14415]. The recently solved three-dimensional structure of the protein lpg2210 from Legionella pneumophila provides the first structural information pertaining to this family. This protein additionally includes the first representative structure of another Pfam family called the YARHG domain [Pfam:PF13308]. The Pfam family DUF4424 adopts a 19-stranded beta-sandwich fold that shows similarity to the N-terminal domain of leukotriene A-4 hydrolase. The YARHG domain forms an all-helical domain at the C-terminus. Structure analysis allows us to recognize distant similarities between the DUF4424 domain and individual domains of M1 aminopeptidases and tricorn proteases, which form massive proteasome-like capsids in both archaea and bacteria.
Based on our analyses we hypothesize that the DUF4424 domain may have a role in forming large, multi-component enzyme complexes. We suggest that the YARHG domain may play a role in binding a moiety in proximity with peptidoglycan, such as a hydrophobic outer membrane lipid or lipopolysaccharide.
Domain of unknown function; Protein family; Protein structure; DUF4424; YARHG domain; Sequence analysis
The bacterial SOS response is an elaborate program for DNA repair, cell cycle regulation and adaptive mutagenesis under stress conditions. Using sensitive sequence and structure analysis, combined with contextual information derived from comparative genomics and domain architectures, we identify two novel domain superfamilies in the SOS response system. We present evidence that one of these, the SOS response associated peptidase (SRAP; Pfam: DUF159) is a novel thiol autopeptidase. Given the involvement of other autopeptidases, such as LexA and UmuD, in the SOS response, this finding suggests that multiple structurally unrelated peptidases have been recruited to this process. The second of these, the ImuB-C superfamily, is linked to the Y-family DNA polymerase-related domain in ImuB, and also occurs as a standalone protein. We present evidence using gene neighborhood analysis that both these domains function with different mutagenic polymerases in bacteria, such as Pol IV (DinB), Pol V (UmuCD) and ImuA-ImuB-DnaE2 and also other repair systems, which either deploy Ku and an ATP-dependent ligase or a SplB-like radical SAM photolyase. We suggest that the SRAP superfamily domain functions as a DNA-associated autoproteolytic switch that recruits diverse repair enzymes upon DNA damage, whereas the ImuB-C domain performs a similar function albeit in a non-catalytic fashion. We propose that C3Orf37, the eukaryotic member of the SRAP superfamily, which has been recently shown to specifically bind DNA with 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxycytosine, is a sensor for these oxidized bases generated by the TET enzymes from methylcytosine. Hence, its autoproteolytic activity might help it act as a switch that recruits DNA repair enzymes to remove these oxidized methylcytosine species as part of the DNA demethylation pathway downstream of the TET enzymes.
This article was reviewed by RDS, RF and GJ.
Discovery of the TET/JBP family of dioxygenases that modify bases in DNA has sparked considerable interest in novel DNA base modifications and their biological roles. Using sensitive sequence and structure analyses combined with contextual information from comparative genomics, we computationally characterize over 12 novel biochemical systems for DNA modifications. We predict previously unidentified enzymes, such as the kinetoplastid J-base generating glycosyltransferase (and its homolog GREB1), the catalytic specificity of bacteriophage TET/JBP proteins and their role in complex DNA base modifications. We also predict the enzymes involved in synthesis of hypermodified bases such as alpha-glutamylthymine and alpha-putrescinylthymine that have remained enigmatic for several decades. Moreover, the current analysis suggests that bacteriophages and certain nucleo-cytoplasmic large DNA viruses contain an unexpectedly diverse range of DNA modification systems, in addition to those using previously characterized enzymes such as Dam, Dcm, TET/JBP, pyrimidine hydroxymethylases, Mom and glycosyltransferases. These include enzymes generating modified bases such as deazaguanines related to queuine and archaeosine, pyrimidines comparable with lysidine, those derived using modified S-adenosyl methionine derivatives and those using TET/JBP-generated hydroxymethyl pyrimidines as biosynthetic starting points. We present evidence that some of these modification systems are also widely dispersed across prokaryotes and certain eukaryotes such as basidiomycetes, chlorophyte and stramenopile algae, where they could serve as novel epigenetic marks for regulation or discrimination of self from non-self DNA. Our study extends the role of the PUA-like fold domains in recognition of modified nucleic acids and predicts versions of the ASCH and EVE domains to be novel ‘readers’ of modified bases in DNA.
These results open opportunities for the investigation of the biology of these systems and their use in biotechnology.
The major role of enzymatic toxins that target nucleic acids in biological conflicts at all levels has become increasingly apparent thanks in large part to the advances of comparative genomics. Typically, toxins evolve rapidly hampering the identification of these proteins by sequence analysis. Here we analyze an unexpectedly widespread superfamily of toxin domains most of which possess RNase activity.
The HEPN superfamily comprises all-α-helical domains that were first identified as being associated with DNA polymerase β-type nucleotidyltransferases in prokaryotes and animal Sacsin proteins. Using sensitive sequence and structure comparison methods, we vastly extend the HEPN superfamily by identifying numerous novel families and by detecting diverged HEPN domains in several known protein families. The new HEPN families include the RNase LS and LsoA catalytic domains, KEN domains (e.g. RNaseL and Ire1) and the RNase domains of RloC and PrrC. The majority of HEPN domains contain conserved motifs that constitute a metal-independent endoRNase active site. Some HEPN domains lacking this motif probably function as non-catalytic RNA-binding domains, such as in the case of the mannitol repressor MtlR. Our analysis shows that HEPN domains function as toxins that are shared by numerous systems implicated in intra-genomic, inter-genomic and intra-organismal conflicts across the three domains of cellular life. In prokaryotes HEPN domains are essential components of numerous toxin-antitoxin (TA) and abortive infection (Abi) systems and in addition are tightly associated with many restriction-modification (R-M) and CRISPR-Cas systems, and occasionally with other defense systems such as Pgl and Ter. We present evidence of multiple modes of action of HEPN domains in these systems, which include direct attack on viral RNAs (e.g. LsoA and RNase LS) in conjunction with other RNase domains (e.g. a novel RNase H fold domain, NamA), suicidal or dormancy-inducing attack on self RNAs (R-M systems and possibly CRISPR-Cas systems), and suicidal attack coupled with direct interaction with phage components (Abi systems). These findings are compatible with the hypothesis on coupling of pathogen-targeting (immunity) and self-directed (programmed cell death and dormancy induction) responses in the evolution of robust antiviral strategies.
We propose that altruistic cell suicide mediated by HEPN domains and other functionally similar RNases was essential for the evolution of kin and group selection and cell cooperation. HEPN domains were repeatedly acquired by eukaryotes and incorporated into several core functions such as endonucleolytic processing of the 5.8S-25S/28S rRNA precursor (Las1), a novel ER membrane-associated RNA degradation system (C6orf70), and sensing of unprocessed transcripts at the nuclear periphery (Swt1). Multiple lines of evidence suggest that, similar to prokaryotes, HEPN proteins were recruited to antiviral, antitransposon, apoptotic systems or RNA-level response to unfolded proteins (Sacsin and KEN domains) in several groups of eukaryotes.
Extensive sequence and structure comparisons reveal unexpectedly broad presence of the HEPN domain in an enormous variety of defense and stress response systems across the tree of life. In addition, HEPN domains have been recruited to perform essential functions, in particular in eukaryotic rRNA processing. These findings are expected to stimulate experiments that could shed light on diverse cellular processes across the three domains of life.
This article was reviewed by Martijn Huynen, Igor Zhulin and Nick Grishin.
The PIWI module, found in the PIWI/AGO superfamily of proteins, is a critical component of several cellular pathways including germline maintenance, chromatin organization, regulation of splicing, RNA interference, and virus suppression. It binds a guide strand which helps it target complementary nucleic acid strands.
Here we report the discovery of two divergent, novel families of PIWI modules, the first such to be described since the initial discovery of the PIWI/AGO superfamily over a decade ago. Both families display conservation patterns consistent with the binding of oligonucleotide guide strands. The first family is bacterial in distribution and is typically encoded by a distinctive three-gene operon alongside genes for a restriction endonuclease fold enzyme and a helicase of the DinG family. The second family is found only in eukaryotes. It is the core conserved module of the Med13 protein, a subunit of the CDK8 subcomplex of the transcription regulatory Mediator complex.
Based on the presence of the DinG family helicase, which specifically acts on R-loops, we infer that the first family of PIWI modules is part of a novel RNA-dependent restriction system which could target invasive DNA from phages, plasmids or conjugative transposons. It is predicted to facilitate restriction of actively transcribed invading DNA by utilizing RNA guides. The PIWI family found in the eukaryotic Med13 proteins throws new light on the regulatory switch through which the CDK8 subcomplex modulates transcription at Mediator-bound promoters of highly transcribed genes. We propose that this involves recognition of small RNAs by the PIWI module in Med13 resulting in a conformational switch that propagates through the Mediator complex.
This article was reviewed by Sandor Pongor, Frank Eisenhaber and Balaji Santhanam.
Complex regulatory networks orchestrate most cellular processes in biological systems. Genes in such networks are subject to expression noise, resulting in isogenic cell populations exhibiting cell-to-cell variation in protein levels. Increasing evidence suggests that cells have evolved regulatory strategies to limit, tolerate, or amplify expression noise. In this context, fundamental questions arise: how can the architecture of gene regulatory networks generate, make use of, or be constrained by expression noise? Here, we discuss the interplay between expression noise and gene regulatory network at different levels of organization, ranging from a single regulatory interaction to entire regulatory networks. We then consider how this interplay impacts a variety of phenomena such as pathogenicity, disease, adaptation to changing environments, differential cell-fate outcome and incomplete or partial penetrance effects. Finally, we highlight recent technological developments that permit measurements at the single-cell level, and discuss directions for future research.
expression noise; gene regulatory network; persistence; phenotypic variation; single-cell analysis; differentiation and development
Memory B cells are generated during an individual's first encounter with a foreign antigen and respond to re-encounter with the same antigen through cell surface immunoglobulin G (IgG) B cell receptors (BCRs) resulting in rapid, high-titered IgG antibody responses. Despite a central role for IgG BCRs in B cell memory, our understanding of the molecular mechanism by which IgG BCRs enhance antibody responses is incomplete. Here, we showed that the conserved cytoplasmic tail of the IgG BCR, which contains a putative PDZ-binding motif, associated with synapse-associated protein 97 (SAP97), a member of the PDZ domain–containing, membrane-associated guanylate-kinase family of scaffolding molecules that play key roles in controlling receptor density and signal strength at neuronal synapses. We showed that SAP97 accumulated and bound to IgG BCRs in the immune synapses that formed in response to engagement of the B cell with antigen. Knocking down SAP97 in IgG-expressing B cells or mutating the putative PDZ-binding motif in the tail impaired immune synapse formation, the initiation of IgG BCR signaling, and downstream activation of p38 mitogen-activated protein kinase. Thus, heightened B cell memory responses are encoded, in part, by a mechanism that involves SAP97 serving as a scaffolding protein in the IgG BCR immune synapse.
Chromatin dynamics play a central role in maintaining genome integrity, but how this is achieved remains largely unknown. Here, we report that microrchidia CW-type zinc finger 2 (MORC2), an uncharacterized protein with a derived PHD finger domain and a conserved GHKL-type ATPase module, is a physiological substrate of p21-activated kinase 1 (PAK1), an important integrator of extracellular signals and nuclear processes. Following DNA damage, MORC2 is phosphorylated on serine 739 in a PAK1-dependent manner, and phosphorylated MORC2 regulates its DNA-dependent ATPase activity to facilitate chromatin remodeling. Moreover, MORC2 associates with chromatin and promotes gamma-H2AX induction in a PAK1 phosphorylation-dependent manner. Consequently, cells expressing the MORC2-S739A mutant displayed reduced DNA repair efficiency and were hypersensitive to DNA-damaging agents. These findings suggest that the PAK1-MORC2 axis is critical for orchestrating the interplay between chromatin dynamics and the maintenance of genomic integrity through sequentially integrating multiple essential enzymatic processes.
Chromatin remodeling; DNA damage response; Genomic stability; Modifier of radiosensitivity; MORC2
The tripartite DENN module, composed of an N-terminal longin domain followed by DENN and d-DENN domains, is a GDP-GTP exchange factor (GEF) for Rab GTPases, which are regulators of practically all membrane trafficking events in eukaryotes. Using sequence and structure analysis we identify multiple novel homologs of the DENN module, many of which can be traced back to the ancestral eukaryote. These findings provide unexpected leads regarding key cellular processes such as autophagy, vesicle-vacuole interactions, chromosome segregation, and human disease. Of these, SMCR8, the folliculin interacting protein-1 and 2 (FNIP1 and FNIP2), nitrogen permease regulator 2 (NPR2), and NPR3 are proposed to function in recruiting Rab GTPases during different steps of autophagy, fusion of autophagosomes with the vacuole and regulation of cellular metabolism. Another novel DENN protein identified in this study is C9ORF72; expansions of the hexanucleotide GGGGCC in its first intron have been recently implicated in amyotrophic lateral sclerosis (ALS) and fronto-temporal dementia (FTD). While this mutation is proposed to cause an RNA-level defect, the identification of C9ORF72 as a potential DENN-type GEF raises the possibility that at least part of the pathology might relate to a specific Rab-dependent vesicular trafficking process, as has been observed in the case of some other neurological conditions with similar phenotypes. We present evidence that the longin domains, such as those found in the DENN module, are likely to have been ultimately derived from the related domains found in prokaryotic GTPase-activating proteins of MglA-like GTPases. Thus, the origin of the longin domains from this ancient GTPase-interacting domain, concomitant with the radiation of GTPases, especially of the Rab clade, played an important role in the dynamics of eukaryotic intracellular membrane systems.
membrane trafficking; evolution; homology detection; DENN domain; longin domain; C9ORF72; ALS; FTD
The virus-host arms race is a major theater for evolutionary innovation. Archaea and bacteria have evolved diverse, elaborate antivirus defense systems that function on two general principles: i) immune systems that discriminate self DNA from nonself DNA and specifically destroy the foreign, in particular viral, genomes, whereas the host genome is protected, or ii) programmed cell suicide or dormancy induced by infection.
Presentation of the hypothesis
Almost all genomic loci encoding immunity systems such as CRISPR-Cas, restriction-modification and DNA phosphorothioation also encompass suicide genes, in particular those encoding known and predicted toxin nucleases, which do not appear to be directly involved in immunity. In contrast, the immunity systems do not appear to encode antitoxins found in typical toxin-antitoxin systems. This raises the possibility that components of the immunity system themselves act as reversible inhibitors of the associated toxin proteins or domains as has been demonstrated for the Escherichia coli anticodon nuclease PrrC that interacts with the PrrI restriction-modification system. We hypothesize that coupling of diverse immunity and suicide/dormancy systems in prokaryotes evolved under selective pressure to provide robustness to the antivirus response. We further propose that the involvement of suicide/dormancy systems in the coupled antivirus response could take two distinct forms:
1) induction of a dormancy-like state in the infected cell to ‘buy time’ for activation of adaptive immunity; 2) suicide or dormancy as the final recourse to prevent viral spread triggered by the failure of immunity.
Testing the hypothesis
This hypothesis entails many experimentally testable predictions. Specifically, we predict that the Cas2 protein, present in all cas operons, is an mRNA-cleaving nuclease (interferase) that might be activated at an early stage of virus infection to enable incorporation of virus-specific spacers into the CRISPR locus or to trigger cell suicide when the immune function of CRISPR-Cas systems fails. Similarly, toxin-like activity is predicted for components of numerous other defense loci.
Implications of the hypothesis
The hypothesis implies that the antivirus response in prokaryotes involves key decision-making steps at which the cell chooses the path to follow by sensing the course of virus infection.
This article was reviewed by Arcady Mushegian, Etienne Joly and Nick Grishin. For complete reviews, go to the Reviewers’ reports section.
Members of the Arabidopsis LSH1 and Oryza G1 (ALOG) family of proteins have been shown to function as key developmental regulators in land plants. However, their precise mode of action remains unclear. Using sensitive sequence and structure analysis, we show that the ALOG domains are a distinct version of the N-terminal DNA-binding domain shared by the XerC/D-like, protelomerase, topoisomerase-IA, and Flp tyrosine recombinases. ALOG domains are distinguished by the insertion of an additional zinc ribbon into this DNA-binding domain. In particular, we show that the ALOG domain is derived from the XerC/D-like recombinases of a novel class of DIRS-1-like retroposons. Copies of this element, which have been recently inactivated, are present in several marine metazoan lineages, whereas the stramenopile Ectocarpus retains an active copy. Thus, we predict that ALOG domains help establish organ identity and differentiation by binding specific DNA sequences and acting as transcription factors or recruiters of repressive chromatin. They are also found in certain plant defense proteins, where they are predicted to function as DNA sensors. The evolutionary history of the ALOG domain represents a unique instance of a domain, otherwise exclusively found in retroelements, being recruited as a specific transcription factor in the streptophyte lineage of plants. Hence, ALOG domains add to the growing evidence for derivation of DNA-binding domains of eukaryote-specific TFs from mobile and selfish elements.
DIRS1; Tyrosine recombinase; Plant development; DNA-binding; Retroposon; Transcription factor; Chromatin protein; Plant defense