In flowering plants, a number of genes have been identified that control the transition from the vegetative to the generative phase of the life cycle. In bryophytes, which represent the basal lineages of land plants, little is known about the mechanisms that control this transition. Two bryophyte species, the moss Physcomitrella patens and the liverwort Marchantia polymorpha, are the subject of advanced molecular and genetic research. The goal of our study was to identify genes associated with female gametophyte development and archegonium production in the dioecious liverwort Pellia endiviifolia species B, a representative of the most basal lineage of the simple thalloid liverworts.
Using the RDA-cDNA technique, we identified three genes specifically expressed in female individuals of P. endiviifolia: PenB_CYSP, coding for a cysteine protease, and PenB_MT2 and PenB_MT3, coding for Mysterious Transcripts 1 and 2, which contain ORFs of 143 and 177 amino acid residues, respectively. The exon-intron structure of all three genes was characterized and pre-mRNA processing was investigated. Interestingly, five mRNA isoforms are produced from the PenB_MT2 gene as a result of alternative splicing within the second and third exons. All observed splicing events take place within the 5′UTR and do not affect the coding sequence. All three genes are expressed exclusively in female individuals, regardless of whether the plants were cultured in vitro or collected from their natural habitat. Moreover, transcript levels of all three genes were ten-fold higher in archegonial tissue than in vegetative parts of the same female thalli grown in the natural habitat, suggesting a connection to archegonium development.
We have identified three genes that are specifically expressed in P. endiviifolia sp. B female gametophytes. Their expression is linked to female sex-organ differentiation and is developmentally regulated. These genes may therefore be crucial for successful sexual reproduction in liverworts.
Liverwort; Pellia; Archegonia development; Sexual reproduction; Dioecious gametophytes; Gene expression
• AZIN2, unlike ornithine decarboxylase, exists as a monomer in solution.
• Conserved residues among AZIN2 orthologs are critical for binding to antizymes (AZs).
• Substitution of the conserved residues affects the ability of AZIN2 to modulate polyamine levels.
• AZIN2 and AZs are extremely labile proteins that mutually stabilize each other.
• Proteolytic systems other than the 26S proteasome might be involved in AZIN2 degradation.
Ornithine decarboxylase (ODC) is the key enzyme in the polyamine biosynthetic pathway. Polyamines control ODC levels by inducing antizymes (AZs), small proteins that inhibit ODC and target it for proteasomal degradation without ubiquitination. Antizyme inhibitors (AZIN1 and AZIN2) are proteins homologous to ODC that bind to AZs and counteract their negative effect on ODC. Whereas ODC and AZIN1 are well-characterized proteins, little is known about the structure and stability of AZIN2, the most recently discovered member of this regulatory circuit. In this work, we first analyzed structural aspects of AZIN2 by combining biochemical and computational approaches. We demonstrated that AZIN2, in contrast to ODC, does not form homodimers, although the predicted tertiary structure of the AZIN2 monomer is similar to that of ODC. Furthermore, we identified conserved residues in the antizyme-binding element whose substitution drastically affected the capacity of AZIN2 to bind AZ1. We also found that AZIN2 is much more labile than ODC but is highly stabilized by binding to AZs. Interestingly, the proteasome inhibitor MG132 had different effects on the three AZ-binding proteins: it had no effect on ODC, prevented the degradation of AZIN1, but unexpectedly increased the degradation of AZIN2. Inhibitors of lysosomal function partially prevented the effect of MG132 on AZIN2. These results suggest that AZIN2 could also be degraded by a route other than the proteasome. These findings provide new information on this unique regulatory mechanism of polyamine metabolism.
AZ, antizyme; AZBE, antizyme-binding element; AZIN, antizyme inhibitor; ERGIC, endoplasmic reticulum-Golgi intermediate compartment; ODC, ornithine decarboxylase; GDT_TS, global distance test total score; HA, hemagglutinin; HEK, human embryonic kidney; PAGE, polyacrylamide gel electrophoresis; RMSD, root-mean-square deviation; TGN, trans-Golgi network; Antizyme; Antizyme-binding element; Homology modeling; Polyamines; Protein degradation; Proteasome inhibitors
R.MwoI is a Type II restriction endonuclease (REase) that specifically recognizes the interrupted palindromic DNA sequence 5′-GCNNNNNNNGC-3′ (where N indicates any nucleotide) and hydrolyzes the phosphodiester bond between the 7th and 8th base in both strands. R.MwoI exhibits remote sequence similarity to R.BglI, a REase of known structure that recognizes the interrupted palindromic target 5′-GCCNNNNNGGC-3′. We constructed a homology model of R.MwoI in complex with DNA and used it to predict functionally important amino acid residues, which were subsequently targeted by mutagenesis. The model, together with the supporting experimental data, revealed regions important for recognizing the bases common to the DNA sequences recognized by R.BglI and R.MwoI. Based on the bioinformatics analysis, we designed substitutions of residue S310 in R.MwoI to arginine or glutamic acid, which yielded enzyme variants with altered sequence selectivity compared with the wild-type enzyme. The S310R variant of R.MwoI preferred the 5′-GCCNNNNNGGC-3′ sequence as a target, similarly to R.BglI, whereas the S310E variant preferentially cleaved a subset of MwoI sites, depending on the identity of the 3rd and 9th nucleotide residues. Our results are a case study of altering REase sequence specificity with a single amino acid substitution designed from a theoretical model in the absence of a crystal structure.
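The recognition and cleavage rules stated above can be made concrete with a short site scanner. This is a minimal sketch under stated assumptions. The helper names `to_regex` and `find_cut_positions` are hypothetical, and only the top strand is scanned. The pattern strings and the "between the 7th and 8th base" rule come from the text.

```python
import re

# Recognition sites from the text, written 5'->3' on the top strand.
MWOI_SITE = "GC" + "N" * 7 + "GC"      # 5'-GCNNNNNNNGC-3' (11 bp)
BGLI_SITE = "GCC" + "N" * 5 + "GGC"    # 5'-GCCNNNNNGGC-3' (11 bp)


def to_regex(site: str) -> str:
    """Translate a site made of A/C/G/T/N into a regex ("N" = any base)."""
    return site.replace("N", "[ACGT]")


def find_cut_positions(seq: str, site: str, cut_offset: int = 7) -> list[int]:
    """Return 0-based top-strand cut positions (index of the base 3' of the cut).

    A zero-width lookahead is used so that overlapping sites are all reported.
    cut_offset=7 places the cut between the 7th and 8th base of the site.
    Because both sites are palindromic, the bottom-strand cut mirrors this one.
    """
    pattern = re.compile("(?=" + to_regex(site) + ")")
    return [m.start() + cut_offset for m in pattern.finditer(seq.upper())]
```

For example, `find_cut_positions("GCCAAAAAGGC", MWOI_SITE)` and `find_cut_positions("GCCAAAAAGGC", BGLI_SITE)` both return `[7]`. Every BglI site also matches the MwoI pattern, so a BglI-like preference, as in the S310R variant, selects a subset of MwoI sites.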
DNA is continuously exposed to many different damaging agents, such as environmental chemicals, UV light, ionizing radiation, and reactive cellular metabolites. DNA lesions can have phenotypic consequences ranging from cellular malfunction, cell death, and aging to a number of diseases, including cancer. To counteract the deleterious effects of DNA damage, cells have developed various repair systems. These include biochemical pathways that remove single-strand lesions, such as base excision repair (BER) and nucleotide excision repair (NER), and translesion synthesis (TLS), in which specialized polymerases temporarily take over from lesion-arrested DNA polymerases during S phase. Other DNA repair mechanisms include homologous recombination repair (HRR), nonhomologous end-joining repair (NHEJ), and the DNA damage response (DDR). This paper reviews bioinformatics resources specialized in disseminating information about DNA repair pathways, proteins involved in repair mechanisms, damaging agents, and DNA lesions.
The ribonuclease H-like (RNHL) superfamily, also called the retroviral integrase superfamily, groups together numerous enzymes involved in nucleic acid metabolism and implicated in many biological processes, including replication, homologous recombination, DNA repair, transposition, and RNA interference. Proteins of the RNHL superfamily show extensive divergence of sequences and structures. We conducted database searches to identify members of the RNHL superfamily, including previously unknown ones, yielding >60 000 unique domain sequences. Our analysis led to the identification of new RNHL superfamily members, such as RRXRR (PF14239), DUF460 (PF04312, COG2433), DUF3010 (PF11215), DUF429 (PF04250 and COG2410, COG4328, COG4923), DUF1092 (PF06485), COG5558, OrfB_IS605 (PF01385, COG0675), and Peptidase_A17 (PF05380). Based on clustering analysis, we grouped all identified RNHL domain sequences into 152 families. Phylogenetic studies revealed relationships between these families and suggested a possible evolutionary history of the RNHL fold and its active site. Our results revealed a clear division of the RNHL superfamily into exonucleases and endonucleases. Structural analyses of features characteristic of particular groups revealed a correlation between the orientation of the C-terminal helix and both the exonuclease/endonuclease function and the architecture of the active site. Our analysis provides a comprehensive picture of sequence-structure-function relationships in the RNHL superfamily that may guide functional studies of previously uncharacterized protein families.
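The abstract does not specify the clustering procedure used to group the domain sequences into families. As a hedged illustration, single-linkage clustering forms families as connected components of a graph in which edges join pairs of sequences judged similar, for example pairs below some BLAST E-value cutoff. The linkage rule, the edge criterion, and the function name `cluster_families` are all assumptions, not the authors' method.

```python
def cluster_families(n: int, similar_pairs: list[tuple[int, int]]) -> list[list[int]]:
    """Hypothetical single-linkage clustering of n sequences (indexed 0..n-1).

    Any chain of similar pairs places sequences in the same family; families
    are the connected components, found here with a union-find structure.
    """
    parent = list(range(n))

    def find(x: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra          # merge the two families

    families: dict[int, list[int]] = {}
    for i in range(n):
        families.setdefault(find(i), []).append(i)
    return sorted(families.values())
```

With `cluster_families(5, [(0, 1), (1, 2), (3, 4)])` the result is `[[0, 1, 2], [3, 4]]`. Sequences 0 and 2 share a family through sequence 1 even though they were never directly compared. This transitivity is what lets single linkage connect highly divergent superfamily members.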
The reovirus λ2 protein catalyzes mRNA capping, that is, addition of a guanosine to the 5' end of each transcript in a 5'-to-5' orientation, as well as transfer of a methyl group from S-adenosyl-L-methionine (AdoMet) to the N7 atom of the added guanosyl moiety and subsequently to the ribose 2'-O atom of the first template-encoded nucleotide. The structure of the human reovirus core has been solved at 3.6 Å resolution, revealing a series of domains that include a putative guanylyltransferase domain and two putative methyltransferase (MTase) domains. It has been suggested that the order of domains in the λ2 protein corresponds to the order of reactions in the pathway and that the m7G (cap 0) and the 2'-O-ribose (cap 1) MTase activities may be exerted by the MTase 1 and the MTase 2 domains, respectively.
We show that the reovirus MTase 1 domain shares a putative active site with the structurally characterized 2'-O-ribose MTases, including vaccinia virus cap 1 MTase, whereas the MTase 2 domain is structurally similar to glycine N-MTase.
On the basis of our analysis of the structural details we propose that the previously suggested functional assignments of the MTase 1 and MTase 2 domains should be swapped.
The RNA Bricks database (http://iimcb.genesilico.pl/rnabricks) stores information about recurrent RNA 3D motifs and their interactions found in experimentally determined RNA structures and in RNA–protein complexes. In contrast to other similar tools (RNA 3D Motif Atlas, RNA Frabase, Rloom), RNA motifs, i.e. ‘RNA bricks’, are presented in the molecular environment in which they were determined, including RNA, protein, metal ions, water molecules, and ligands. All nucleotide residues in RNA bricks are annotated with structural quality scores that describe the real-space correlation coefficient with the electron density data (if available), backbone geometry, and possible steric conflicts, which can be used to identify poorly modeled residues. The database is also equipped with an algorithm for 3D motif search and comparison. The algorithm compares the spatial positions of backbone atoms in the user-provided query structure and in the stored RNA motifs, without relying on sequence or secondary structure information. This enables the identification of local structural similarities among evolutionarily related and unrelated RNA molecules. In addition, the search utility can retrieve ‘RNA bricks’ by sequence similarity and can identify motifs with modified ribonucleotide residues at specific positions.
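The abstract states that the motif search compares spatial positions of backbone atoms but does not name the exact score. A standard way to compare two equally sized atom sets independently of sequence is the RMSD after optimal rigid-body superposition (the Kabsch algorithm). The sketch below assumes corresponding backbone atoms, for example P and C4′ per nucleotide, have already been paired. The function name `superposed_rmsd` is hypothetical, and we do not claim this is the RNA Bricks scoring function.

```python
import numpy as np


def superposed_rmsd(query: np.ndarray, motif: np.ndarray) -> float:
    """RMSD between two (n, 3) coordinate sets after optimal superposition.

    Implements the Kabsch algorithm: center both sets, find the rotation that
    minimizes the squared deviations via SVD, then measure the residual RMSD.
    """
    if query.shape != motif.shape:
        raise ValueError("coordinate sets must have the same shape")
    p = query - query.mean(axis=0)           # remove translation
    q = motif - motif.mean(axis=0)
    h = p.T @ q                               # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # -1 would indicate a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T  # proper rotation mapping p onto q
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

A rotated and translated copy of the same motif scores an RMSD of essentially zero. A scan over stored motifs would keep candidates below some RMSD cutoff, and that cutoff would be a further assumption.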
Mandibuloacral dysplasia (MAD) is a rare disease caused by mutations in the LMNA gene, which encodes lamins A and C. The most common mutation associated with this disease is a homozygous substitution of arginine 527 by histidine. We examined three female patients from two unrelated families from Northeast Egypt. Their growth was retarded, and they had microcephaly, widened cranial sutures, prominent eyes and cheeks, micrognathia, dental crowding, a hypoplastic mandible, acro-osteolysis of the distal phalanges, and joint contractures. In addition, they presented some progeroid features, such as a pinched nose, premature loss of teeth, loss of hair, scleroderma-like skin atrophy, spine rigidity, and a waddling gait. The clinical presentation of the disease differed between the patient from Family 1 and the patients from Family 2, suggesting that unknown, possibly epigenetic, factors modify the course of the disease. The first symptoms appeared at the age of 2.5 years (the girl from Family 1) and at 5 and 3 years (the girls from Family 2). All patients carried the same novel homozygous LMNA mutation, c.1580G>T, which replaces arginine 527 with leucine. Computational predictions of the effects of this substitution suggested that it might alter protein stability, increase the tendency for protein aggregation, and consequently influence interactions with other proteins. In addition, restriction fragment-length polymorphism analysis of 178 unrelated individuals showed that up to 1.12% of inhabitants of Northeast Egypt might be heterozygous carriers of this mutation, suggesting a founder effect in this area.
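The reported carrier estimate is consistent with two heterozygotes among the 178 individuals screened. This count is our inference from the percentage and is not stated in the abstract. One carrier would give about 0.56% and three about 1.69%.

```latex
\frac{2}{178} \approx 0.0112 = 1.12\%
```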
LMNA; lamin A/C; mutation; mandibuloacral dysplasia; progeria
Risk alleles for complex diseases are widely spread throughout human populations. However, little is known about the geographic distribution and frequencies of risk alleles, which may contribute to differences in disease susceptibility and prevalence among populations. Here, we focus on Crohn's disease (CD) as a model for the evolutionary study of complex disease alleles. Recent genome-wide association studies and classical linkage analyses have identified more than 70 susceptibility regions for CD in Europeans, but only a few have been confirmed in non-European populations. Our analysis of eight European-specific susceptibility genes using HapMap data shows that, at the NOD2 locus, the CD-risk alleles are linked to a CEU-specific haplotype at a frequency significantly higher than that observed across the genome as a whole. We subsequently examined nine global populations and found that the CD-risk alleles spread by hitchhiking with a high-frequency haplotype (H1) exclusive to Europeans. To examine the neutrality of NOD2, we performed phylogenetic network analyses, coalescent simulation, protein structural prediction, characterization of mutation patterns, and estimation of population growth and time to the most recent common ancestor (TMRCA). We found that, although H1 was significantly prevalent in European populations, the H1 TMRCA predated human migration out of Africa. H1 is likely to have undergone negative selection because 1) the root of the H1 genealogy is defined by a preexisting amino acid substitution that causes serious conformational changes in the NOD2 protein, 2) the haplotype has almost become extinct in Africa, and 3) the haplotype has not been affected by the recent European expansion reflected in the other haplotypes. Nevertheless, H1 has survived in European populations, suggesting that the haplotype is advantageous to this group.
We propose that several CD-risk alleles, which destabilize and disrupt the NOD2 protein, have been maintained by natural selection on standing variation because the deleterious haplotype of NOD2 is advantageous in diploid individuals due to heterozygote advantage and/or intergenic interactions.
Crohn's disease; NOD2; hitchhiking effect; natural selection; standing variation; mildly deleterious mutation
QA-RecombineIt provides a web interface to assess the quality of protein 3D structure models and to improve model accuracy by merging fragments of multiple input models. QA-RecombineIt has been developed for protein modelers who are working on difficult problems, have a set of different homology models and/or de novo models (from methods such as I-TASSER or ROSETTA), and would like to obtain one consensus model that incorporates the best parts into a single, internally coherent structure. An advanced mode is also available, in which users can modify the operation of the fragment recombination algorithm by manually identifying individual fragments or entire models to recombine. Our method produces up to 100 models that are expected to be, on average, more accurate than the starting models. The server may therefore be useful for crystallographic protein structure determination, where protein models are used in Molecular Replacement to solve the phase problem. To support this application, a special feature was added to the QA-RecombineIt server. The QA-RecombineIt server can be freely accessed at http://iimcb.genesilico.pl/qarecombineit/.
The continuously increasing amount of RNA sequence and experimentally determined 3D structure data drives the development of computational methods for exploring these data. Contemporary functional analysis of RNA molecules, such as ribozymes or riboswitches, covers various issues, among which tertiary structure modeling is becoming increasingly important. The growing number of tools to model and predict RNA structure calls for an evaluation of these tools and of the quality of the outcomes they produce. Reliable methods designed to meet this need are therefore relevant to RNA tertiary structure analysis and can strongly influence the quality and usefulness of RNA tertiary structure prediction in the near future. Here, we present RNAlyzer, a computational method for comparing RNA 3D models with the reference structure and for discriminating between correct and incorrect models. Our approach is based on the idea of a local neighborhood, defined as the set of atoms within a sphere centered on a user-defined atom. A unique feature of RNAlyzer is the simultaneous visualization of the model-reference structure distance at different levels of detail, from individual residues to entire molecules.
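The local-neighborhood idea can be sketched in a few lines. This is a minimal illustration under assumptions not stated in the abstract. The model and reference list the same atoms in the same order and are already in a common frame. The neighborhood is taken from the reference structure. Plain coordinate RMSD serves as the distance. The function names are hypothetical, and RNAlyzer's actual measure may differ.

```python
import numpy as np


def local_neighborhood(coords: np.ndarray, center: int, radius: float) -> np.ndarray:
    """Indices of atoms within `radius` of atom `center` (the center is included)."""
    dist = np.linalg.norm(coords - coords[center], axis=1)
    return np.flatnonzero(dist <= radius)


def neighborhood_rmsd(model: np.ndarray, reference: np.ndarray,
                      center: int, radius: float) -> float:
    """Model-reference RMSD restricted to the reference neighborhood of `center`.

    No superposition is performed; both (n, 3) arrays are assumed to share
    atom order and coordinate frame.
    """
    idx = local_neighborhood(reference, center, radius)
    diff = model[idx] - reference[idx]
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Varying `radius` is one way to move between levels of detail. A small sphere captures a residue's immediate surroundings, while a sphere large enough to enclose every atom reduces to a whole-molecule comparison.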
We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods, implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different sizes and with respect to different types of structure. According to our tests, the most accurate predictions obtained by a comparative approach are, on average, generated by CentroidAlifold, MXScarna, RNAalifold, and TurboFold. The most accurate predictions obtained by single-sequence analyses are, on average, generated by CentroidFold, ContextFold, and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in future editions of the CompaRNA benchmarks.
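Secondary structure benchmarks typically score a prediction by comparing its set of base pairs with the reference set. The sketch below computes sensitivity, positive predictive value (PPV), and the widely used approximation of the Matthews correlation coefficient as the geometric mean of the two. It is an illustration only. The abstract does not list CompaRNA's exact measures, and the parser handles only balanced, pseudoknot-free dot-bracket strings.

```python
import math


def pair_set(dot_bracket: str) -> set[tuple[int, int]]:
    """Parse a balanced, pseudoknot-free dot-bracket string into (i, j) base pairs."""
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))   # close the most recent open bracket
    return pairs


def pair_scores(predicted: str, reference: str) -> dict[str, float]:
    """Sensitivity, PPV and approximate MCC (sqrt(sensitivity * PPV))."""
    pred, ref = pair_set(predicted), pair_set(reference)
    tp = len(pred & ref)                  # correctly predicted base pairs
    sens = tp / len(ref) if ref else 0.0
    ppv = tp / len(pred) if pred else 0.0
    return {"sensitivity": sens, "ppv": ppv, "mcc": math.sqrt(sens * ppv)}
```

For instance, `pair_scores("(....)", "((..))")` yields sensitivity 0.5, PPV 1.0, and an approximate MCC of about 0.707. The one predicted pair is correct, but half of the reference pairs are missed.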
A key step in the proliferation of retroviruses is the conversion of their RNA genome to double-stranded DNA, a process catalysed by multifunctional reverse transcriptases (RTs). Dimeric and monomeric RTs have been described, the latter exemplified by the enzyme of Moloney murine leukaemia virus. However, structural information describing the substrate-binding mechanism of a monomeric RT has been lacking. We report here the first crystal structure of a complex between an RNA/DNA hybrid substrate and the polymerase-connection fragment of the single-subunit RT from xenotropic murine leukaemia virus-related virus, a close relative of Moloney murine leukaemia virus. A comparison with p66/p51 human immunodeficiency virus-1 RT shows that substrate binding around the polymerase active site is conserved but differs in the thumb and connection subdomains. Small-angle X-ray scattering was used to model full-length xenotropic murine leukaemia virus-related virus RT, demonstrating that its mobile RNase H domain becomes ordered in the presence of a substrate, a key difference between monomeric and dimeric RTs.
The structures of biological macromolecules provide a framework for studying their biological functions. Three-dimensional structures of proteins, nucleic acids, or their complexes are difficult to visualize in detail on flat surfaces, and algorithms for their spatial superposition and comparison are computationally costly. Molecular structures can, however, be represented as 2D maps of interactions between individual residues, which are easier to visualize and compare and which can be converted back to 3D structures with reasonable precision. There are many visualization tools for maps of protein structures, but few for nucleic acids.
We developed RNAmap2D, a platform-independent software tool for calculation, visualization, and analysis of contact and distance maps for nucleic acid molecules and their complexes with proteins or ligands. The program addresses the paucity of bioinformatics tools dedicated to analyzing RNA 2D maps, despite the growing number of experimentally solved RNA structures in the Protein Data Bank (PDB) repository and of tools for RNA 2D and 3D structure prediction. RNAmap2D allows for calculation and analysis of contacts and distances between various classes of atoms in nucleic acid, protein, and small ligand molecules. It also discriminates between different types of base pairing and stacking.
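A distance map is the all-against-all distance matrix between chosen atoms, and a contact map thresholds it at a cutoff. The minimal sketch below assumes one representative atom per residue (for example P or C4′) and an 8 Å cutoff. Both are illustrative choices, not RNAmap2D's documented defaults, and the function names are hypothetical.

```python
import numpy as np


def distance_map(coords: np.ndarray) -> np.ndarray:
    """All-against-all Euclidean distances for an (n, 3) array of atom coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]   # (n, n, 3) pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1))


def contact_map(coords: np.ndarray, cutoff: float = 8.0) -> np.ndarray:
    """Boolean (n, n) map: True where two atoms are closer than `cutoff` angstroms."""
    return distance_map(coords) < cutoff
```

Both maps are symmetric with a zero-distance diagonal. Comparing two structures of the same molecule then reduces to comparing two matrices, which is cheaper than a 3D superposition.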
RNAmap2D is an easy-to-use tool to visualize, analyze, and compare structures of nucleic acid molecules and their complexes with other molecules, such as proteins, ligands, and metal ions. Its special features make it particularly useful for analyzing RNA tertiary structures. RNAmap2D for Windows/Linux/MacOSX is freely available for academic users at http://iimcb.genesilico.pl/rnamap2d.html
Contact maps; Distance maps; RNA secondary structure; RNA base pairing; RNA stacking; Protein-RNA complex; Docking
Computational models of protein structures have proved useful as search models in Molecular Replacement (MR), a common method for solving the phase problem in macromolecular crystallography. The success of MR depends on the accuracy of the search model. Unfortunately, this accuracy remains unknown until the final structure of the target protein is determined. During the last few years, several Model Quality Assessment Programs (MQAPs) that predict the local accuracy of theoretical models have been developed. In this article, we analyze whether the application of MQAPs improves the utility of theoretical models in MR.
For our dataset of 615 search models, using the real local accuracy of a model increases the MR success ratio by 101% compared with the corresponding polyalanine templates. In contrast, when local model quality is not utilized in MR, the computational models solved only 4.5% more MR searches than polyalanine templates. For the same dataset of 615 models, a workflow combining MR with the predicted local accuracy of a model found 45% more correct solutions than polyalanine templates. Local accuracy was predicted with MetaMQAPclust, a “clustering MQAP”.
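The abstract does not say how predicted local accuracy was passed to the MR search. A widely used convention is to write each residue's expected coordinate error into the B-factor column, using B = 8π²⟨u²⟩ with the per-axis mean-square displacement ⟨u²⟩ = e²/3 for an isotropic 3D error e. The residue-trimming step and the 5 Å threshold below are hypothetical additions for illustration.

```python
import math


def error_to_bfactor(predicted_error: float) -> float:
    """Isotropic B-factor (A^2) for a predicted per-residue coordinate error e (A).

    B = 8 * pi^2 * <u^2>, with the per-axis mean-square displacement
    <u^2> = e^2 / 3 for an isotropic 3D error.
    """
    return 8.0 * math.pi ** 2 * predicted_error ** 2 / 3.0


def prepare_search_model(errors: list[float], max_error: float = 5.0):
    """Hypothetical preprocessing: drop residues predicted worse than max_error,
    assign B-factors to the rest. Returns (keep_mask, bfactors_of_kept)."""
    keep = [e <= max_error for e in errors]
    bfactors = [error_to_bfactor(e) for e, k in zip(errors, keep) if k]
    return keep, bfactors
```

A residue with a predicted error of 1 Å receives B ≈ 26.3 Å². Larger predicted errors are down-weighted quadratically, so unreliable regions contribute little to the MR likelihood without necessarily being deleted.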
Using comparative models only marginally increases the MR success ratio in comparison to polyalanine structures of templates. However, the situation changes dramatically once comparative models are used together with their predicted local accuracy. A new functionality was added to the GeneSilico Fold Prediction Metaserver in order to build models that are more useful for MR searches. Additionally, we have developed a simple method, AmIgoMR (Am I good for MR?), to predict if an MR search with a template-based model for a given template is likely to find the correct solution.
Molecular replacement; MR; MQAP; Model quality assessment; Protein structure prediction
MODOMICS is a database of RNA modifications that provides comprehensive information on the chemical structures of modified ribonucleosides, their biosynthetic pathways, RNA-modifying enzymes, and the location of modified residues in RNA sequences. In the current database version, accessible at http://modomics.genesilico.pl, we included new features: a census of human and yeast snoRNAs involved in RNA-guided RNA modification, a new section covering the 5′-end capping process, and a catalogue of ‘building blocks’ for chemical synthesis of a large variety of modified nucleosides. The MODOMICS collections of RNA modifications, RNA-modifying enzymes, and modified RNAs have also been updated. A number of newly identified modified ribonucleosides and more than one hundred functionally and structurally characterized proteins from various organisms have been added. In the RNA sequences section, snRNAs and snoRNAs with experimentally mapped modified nucleosides have been added, and the collection of rRNA and tRNA sequences has been substantially enlarged. To facilitate literature searches, each record in MODOMICS has been cross-referenced to other databases and to selected key publications. New options for database searching and querying have been implemented, including a BLAST search of protein sequences and a PARALIGN search of the collected nucleic acid sequences.
Cytoplasmic initiator tRNAs from plants and fungi are excluded from participating in translational elongation by the presence of a unique 2′-phosphoribosyl modification of purine 64, introduced posttranscriptionally by the enzyme Rit1p. Members of the Rit1p family show no obvious similarity to other proteins or domains, no structural information is available to guide experimental analyses, and the mechanism of action of this enzyme remains a mystery. Using protein fold recognition, we identified a phosphatase-like domain in the C-terminal part of Rit1p. We constructed a comparative model of the C-terminal domain and used it to predict the function of conserved residues and to propose a mechanism of action for Rit1p. The model will facilitate experimental analyses of Rit1p and its interactions with the initiator tRNA substrate.
fold recognition; homology modeling; tRNA modification; Rit1p; bioinformatics
Ribonucleases (RNases) are valuable tools for analyzing RNA sequence, structure, and function. Their substrate specificity is limited to the recognition of single bases or distinct secondary structures in the substrate. Currently, no RNases are available for purely sequence-dependent fragmentation of RNA. Here, we report the development of a new enzyme that cleaves the RNA strand in DNA–RNA hybrids 5 nt from a nonanucleotide recognition sequence. The enzyme was constructed by fusing two functionally independent domains: an RNase HI, which hydrolyzes RNA in DNA–RNA hybrids in a processive and sequence-independent manner, and a zinc finger that recognizes a sequence in DNA–RNA hybrids. Optimization of the fusion enzyme’s specificity was guided by a structural model of the protein-substrate complex and involved a number of steps, including site-directed mutagenesis of the RNase moiety and optimization of the interdomain linker length. Methods for engineering zinc finger domains with new sequence specificities are readily available, making it feasible to acquire a library of RNases that recognize and cleave a variety of sequences, much like the commercially available assortment of restriction enzymes. In addition to in vitro applications, zinc finger-RNase HI fusions could potentially be used in vivo for targeted RNA degradation.
The spliceosome is a molecular machine that excises introns from eukaryotic pre-mRNAs. In human cells, this macromolecular complex comprises five RNAs and over one hundred proteins. In recent years, many spliceosomal proteins have been found to exhibit intrinsic disorder, that is, to lack a stable native three-dimensional structure in solution. Building on previous proteomic, structural, and functional data, we carried out a systematic bioinformatics analysis of intrinsic disorder in the proteome of the human spliceosome. We discovered that almost half of the combined sequence of proteins abundant in the spliceosome is predicted to be intrinsically disordered, at least when the individual proteins are considered in isolation. Intrinsic order and disorder are unevenly distributed throughout the spliceosome, and this distribution is related to the various functions that intrinsic disorder performs in spliceosomal proteins within the complex. In particular, proteins involved in the secondary functions of the spliceosome, such as mRNA recognition, intron/exon definition, and spliceosomal assembly and dynamics, are more disordered than proteins directly involved in assisting splicing catalysis. Conserved disordered regions in spliceosomal proteins are evolutionarily younger and less widespread than the ordered domains of essential spliceosomal proteins at the core of the spliceosome, suggesting that disordered regions were added to a preexisting ordered functional core. Finally, the spliceosomal proteome contains much more intrinsic disorder predicted to lack secondary structure than the proteome of the ribosome, another large RNP machine. This result agrees with the different functions currently attributed to proteins in these two complexes.
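A disorder fraction over a "combined sequence" is length-weighted. It pools residues from all proteins rather than averaging per-protein fractions, so long proteins count more. The sketch below makes this distinction explicit. The encoding is a hypothetical assumption: one string per protein with 'D' marking a residue predicted disordered. It does not represent any specific disorder predictor's output format.

```python
def combined_disorder_fraction(predictions: dict[str, str]) -> float:
    """Fraction of disordered residues over the concatenated sequence of all proteins.

    `predictions` maps a protein name to a per-residue string in which 'D'
    marks a residue predicted disordered; any other character means ordered.
    """
    total = sum(len(p) for p in predictions.values())
    disordered = sum(p.count("D") for p in predictions.values())
    return disordered / total if total else 0.0
```

For two proteins annotated `"DDOO"` and `"DOOOOOOO"`, the combined fraction is 3/12 = 0.25. The mean of the per-protein fractions would instead be (0.5 + 0.125)/2 = 0.3125.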
In eukaryotic cells, introns are spliced out of protein-coding mRNAs by a highly dynamic and extraordinarily plastic molecular machine called the spliceosome. In recent years, multiple regions of intrinsic structural disorder have been found in spliceosomal proteins. Intrinsically disordered regions lack a stable native three-dimensional structure in solution, which makes them structurally flexible and/or able to switch between different conformations. Intrinsically disordered regions are therefore ideal candidates for mediating the spliceosome's plasticity. They are also frequently the sites of post-translational modifications, which have likewise been shown to be important in spliceosome dynamics. In this article, we describe the results of a structural bioinformatics analysis focused on intrinsic disorder in the spliceosomal proteome. We systematically analyzed all known human spliceosomal proteins with regard to the presence and type of intrinsic disorder. Almost half of the combined sequence of these spliceosomal proteins is predicted to be intrinsically disordered, and the type of intrinsic disorder in a protein varies with its function and its location in the spliceosome. The parts of the spliceosome that act earlier in the process are more disordered, which corresponds to their role in establishing a network of interactions, whereas the parts that act later are more ordered.
Dihydrouridine (D) is a modified base found in conserved positions in the D-loop of tRNA in Bacteria, Eukaryota, and some Archaea. Despite the abundant occurrence of D, little is known about its biochemical roles in mediating tRNA function. It is assumed that D may destabilize the structure of tRNA and thus enhance its conformational flexibility. D is generated post-transcriptionally by the reduction of the 5,6-double bond of a uridine residue in RNA transcripts. The reaction is carried out by dihydrouridine synthases (DUS). DUS constitute a conserved family of enzymes encoded by the orthologous gene family COG0042. In protein sequence databases, members of COG0042 are typically annotated as “predicted TIM-barrel enzymes, possibly dehydrogenases, nifR3 family”.
To elucidate sequence-structure-function relationships in the DUS family, a
comprehensive bioinformatic analysis was carried out. We performed extensive
database searches to identify all members of the currently known DUS family,
followed by clustering analysis to subdivide it into subfamilies of closely
related sequences. We analyzed phylogenetic distributions of all members of
the DUS family and inferred the evolutionary tree, which suggested a
scenario for the evolutionary origin of dihydrouridine-forming enzymes. We
also generated a homology model of hDus2, a human member of the DUS family
that has been suggested as a potential drug target in cancer. While this
article was under review, the crystal structure of a DUS family member was
published, giving us an opportunity to validate the model.
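The abstract does not say which clustering algorithm or similarity cutoff was used to split the DUS family into subfamilies. As a generic illustration only, the sketch below performs single-linkage clustering: any two sequences whose pairwise score reaches a threshold end up in the same group. The sequence names, scores, and the 0.5 threshold are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def single_linkage_clusters(
    names: Iterable[str],
    pair_scores: Dict[Tuple[str, str], float],
    threshold: float,
) -> List[List[str]]:
    """Group sequences connected by any pairwise score >= threshold.

    This is single-linkage clustering, implemented as connected components
    of the thresholded similarity graph via union-find.
    """
    parent = {n: n for n in names}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in parent:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical pairwise similarity scores (not real DUS data).
scores = {
    ("seq1", "seq2"): 0.82,
    ("seq2", "seq3"): 0.75,
    ("seq3", "seq4"): 0.21,
    ("seq4", "seq5"): 0.90,
}
print(single_linkage_clusters(["seq1", "seq2", "seq3", "seq4", "seq5"], scores, 0.5))
# [['seq1', 'seq2', 'seq3'], ['seq4', 'seq5']]
```

Single linkage chains sequences together (seq1 and seq3 join through seq2 even without a direct score), which is convenient for grouping remote homologs but can merge subfamilies if the threshold is set too low.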
We compared sequences and phylogenetic distributions of all members of the
DUS family and inferred a phylogenetic tree, which suggests a scenario for
the evolutionary origin of dihydrouridine formation. Our evolutionary and
structural classification of the DUS family provides a framework for
studying functional differences among these proteins and will help guide
experimental analyses.
Dihydrouridine synthases; Protein structure prediction; Fold recognition; Remote homology; RNA modification; Molecular evolution; Enzymes acting on RNA
Exonuclease VII (ExoVII) is a bacterial nuclease involved in DNA repair and recombination that hydrolyses single-stranded DNA. ExoVII is composed of two subunits: the large XseA and the small XseB. Until now, little was known about the molecular structure of ExoVII, the interactions between XseA and XseB, the architecture of the nuclease active site, or its mechanism of action. We used bioinformatics methods to predict the structure of XseA, which revealed four domains: an N-terminal OB-fold domain, a middle, putatively catalytic domain, a coiled-coil domain and a short C-terminal segment. Through a series of deletion and site-directed mutagenesis experiments on XseA from Escherichia coli, we determined that the OB-fold domain is responsible for DNA binding, that the coiled-coil domain is involved in binding multiple copies of the XseB subunit, and that residues D155, R205, H238 and D241 of the middle domain are important for catalytic activity but not for DNA binding. Altogether, we propose a model of sequence–structure–function relationships in ExoVII.
DNA methylation-dependent restriction enzymes have many applications in genetic engineering and in the analysis of the epigenetic state of eukaryotic genomes. Nevertheless, no high-resolution structures of these enzymes have yet been reported, and the mechanisms of DNA methylation-dependent cleavage therefore remain poorly understood. Here, we present a biochemical analysis and a high-resolution DNA co-crystal structure of the N6-methyladenine (m6A)-dependent restriction enzyme R.DpnI. Our data show that R.DpnI consists of an N-terminal catalytic PD-(D/E)XK domain and a C-terminal winged helix (wH) domain. Surprisingly, both domains bind DNA in a sequence- and methylation-sensitive manner. In the crystal, fully methylated target DNA is bound to the wH domain of R.DpnI but lies far from the catalytic domain. Independent readout of DNA sequence and methylation by the two domains might contribute to R.DpnI specificity or could help the monomeric enzyme to cut the second strand after introducing a nick.
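To make "methylation-dependent cleavage" concrete, the sketch below models the widely documented R.DpnI specificity: it recognizes GATC only when the adenine is N6-methylated and cuts between A and T (GA^TC), leaving blunt ends because the site is palindromic. It is a deliberate simplification that treats only sites flagged as fully methylated as cleavable and ignores the slower cleavage of hemimethylated DNA; the input format (a set of GATC start indices) is hypothetical.

```python
import re
from typing import List, Set


def dpni_cut_positions(seq: str, methylated_sites: Set[int]) -> List[int]:
    """Return top-strand cut positions in a simplified R.DpnI model.

    seq: top-strand DNA sequence, 5' to 3'.
    methylated_sites: start indices of GATC sites whose adenines are
        N6-methylated on both strands (hypothetical input format).
    A returned position i means cleavage between seq[i-1] and seq[i]
    (GA^TC); since GATC is palindromic, the double-strand cut is blunt.
    """
    cuts = []
    for m in re.finditer("GATC", seq.upper()):
        # Unmethylated GATC sites are skipped: this is the
        # methylation-dependent part of the model.
        if m.start() in methylated_sites:
            cuts.append(m.start() + 2)
    return cuts


seq = "AAGATCTTGGATCAA"  # GATC sites start at indices 2 and 9
cuts = dpni_cut_positions(seq, methylated_sites={9})
print(cuts)                              # [11]
print(seq[:cuts[0]], seq[cuts[0]:])      # AAGATCTTGGA TCAA
```

The unmethylated site at index 2 is left intact, which is the property that makes DpnI useful for distinguishing methylated from unmethylated DNA.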
Intrinsically unstructured proteins (IUPs) lack a well-defined three-dimensional structure. Some of them may assume a locally stable structure under specific conditions, e.g. upon interaction with another molecule, while others function in a permanently unstructured state. The discovery of IUPs challenged the traditional protein structure paradigm, which held that a specific, well-defined structure determines a protein's function. As of December 2011, approximately 60 methods for computational prediction of protein disorder from sequence have been made publicly available. They rely on different approaches, such as the use of evolutionary information, energy functions, and various statistical and machine learning techniques.
Given the diversity of existing intrinsic disorder prediction methods, we decided to test whether it is possible to combine them into a more accurate meta-prediction method. We developed a method based on 13 arbitrarily chosen disorder predictors, in which the final consensus was weighted by the accuracy of the individual methods. We also developed GSmetaDisorder3D, a disorder predictor that uses no third-party disorder predictors but instead infers potentially structured and unstructured regions from alignments to known protein structures reported by protein fold-recognition methods. Following the success of our disorder predictors in the CASP8 benchmark, we combined them into a meta-meta predictor called GSmetaDisorderMD, which was the top-scoring method in the subsequent CASP9 benchmark.
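An accuracy-weighted consensus can be sketched as a weighted average of per-residue disorder scores followed by a cutoff. The abstract does not specify how scores were normalized or which threshold was used, so this sketch assumes each predictor outputs a probability in [0, 1], uses each predictor's accuracy directly as its weight, and applies a 0.5 cutoff. The predictor names and values are hypothetical and do not correspond to the 13 methods used in MetaDisorder.

```python
from typing import Dict, List


def weighted_consensus(
    scores: Dict[str, List[float]],
    accuracy: Dict[str, float],
    threshold: float = 0.5,
) -> List[bool]:
    """Per-residue disorder calls from an accuracy-weighted average.

    scores: predictor name -> per-residue disorder probabilities.
    accuracy: predictor name -> weight (assumed here to be its accuracy).
    Returns True for residues whose weighted mean reaches the threshold.
    """
    lengths = {len(v) for v in scores.values()}
    if len(lengths) != 1:
        raise ValueError("all predictors must score the same residues")
    (n,) = lengths
    total_weight = sum(accuracy[name] for name in scores)
    consensus = []
    for i in range(n):
        s = sum(accuracy[name] * vals[i] for name, vals in scores.items())
        consensus.append(s / total_weight >= threshold)
    return consensus


# Hypothetical predictors scoring a three-residue fragment.
predictor_scores = {
    "predA": [0.9, 0.2, 0.9],
    "predB": [0.1, 0.3, 0.2],
    "predC": [0.7, 0.1, 0.3],
}
predictor_accuracy = {"predA": 0.8, "predB": 0.5, "predC": 0.7}
print(weighted_consensus(predictor_scores, predictor_accuracy))
# [True, False, True]
```

The weighting matters for the third residue: the weighted mean is 0.515, so it is called disordered, whereas an unweighted mean of the same scores (about 0.467) would call it ordered because the most accurate predictor is outvoted.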
A series of disorder predictors described in this article is available as a MetaDisorder web server at http://iimcb.genesilico.pl/metadisorder/. Results are presented both in an easily interpretable, interactive mode and in a simple text format suitable for machine processing.