2.  RNA regulons in Hox 5′UTRs confer ribosome specificity to gene regulation 
Nature  2014;517(7532):33-38.
Emerging evidence suggests a regulatory function of the ribosome in directing how the genome is translated in time and space. However, how this regulation is encoded in mRNA sequence remains largely unknown. Here we uncover unique RNA regulons embedded in Homeobox (Hox) 5′UTRs that confer ribosome-mediated control of gene expression. These structured RNA elements, resembling viral Internal Ribosome Entry Sites (IRESes), are found in subsets of Hox mRNAs. They facilitate ribosome recruitment and require Ribosomal Protein L38 for their activity. Despite numerous layers of Hox gene regulation, these IRES elements are essential for converting Hox transcripts into proteins to pattern the mammalian body plan. This specialized mode of IRES-dependent translation is enabled by a regulatory element, the Translational Inhibitory Element (TIE), which blocks cap-dependent translation of these transcripts. Together, these data uncover a new paradigm for ribosome-mediated control of gene expression and organismal development.
PMCID: PMC4353651  PMID: 25409156
3.  Consistent global structures of complex RNA states through multidimensional chemical mapping 
eLife  2015;4:e07600.
Accelerating discoveries of non-coding RNA (ncRNA) in myriad biological processes pose major challenges to structural and functional analysis. Despite progress in secondary structure modeling, high-throughput methods have generally failed to determine ncRNA tertiary structures, even at the 1-nm resolution that enables visualization of how helices and functional motifs are positioned in three dimensions. We report that integrating a new method called MOHCA-seq (Multiplexed •OH Cleavage Analysis with paired-end sequencing) with mutate-and-map secondary structure inference guides Rosetta 3D modeling to consistent 1-nm accuracy for intricately folded ncRNAs with lengths up to 188 nucleotides, including a blind RNA-puzzle challenge, the lariat-capping ribozyme. This multidimensional chemical mapping (MCM) pipeline resolves unexpected tertiary proximities for cyclic-di-GMP, glycine, and adenosylcobalamin riboswitch aptamers without their ligands and a loose structure for the recently discovered human HoxA9D internal ribosome entry site regulon. MCM offers a sequencing-based route to uncovering ncRNA 3D structure, applicable to functionally important but potentially heterogeneous states.
eLife digest
Our genetic material, in the form of molecules of DNA, provides instructions for many different processes in our cells. To issue these instructions, particular sections of DNA are copied to make a type of molecule called ribonucleic acid (RNA). Some of these RNA molecules contain instructions to make proteins, but others—known as non-coding RNAs—regulate the activity of genes in cells.
The genetic information within RNA is encoded by the sequence of four different chemical parts called ‘nucleotides’. RNA can exist as a single strand of nucleotides, but the nucleotides can also pair up in specific combinations to form sections of double-stranded RNA. Therefore, a single strand of non-coding RNA can fold into a complex three-dimensional shape that contains loops, twists, and bulges.
The three-dimensional structures of non-coding RNAs are crucial for their roles in cells, but the variety and complexity of shapes that they can form makes it technically difficult to study them. In 2008, researchers developed a new method called MOHCA that can map the positions of nucleotides that are close together in the three-dimensional structure. Highly reactive chemicals are attached to the nucleotides and these can react with, and damage, other nearby nucleotides. By detecting which nucleotides have been damaged, it is possible to map the positions of these nucleotides and decipher the structure of the RNA molecule using computer algorithms.
MOHCA is a promising approach, but the initial methods to find the damaged nucleotides were tedious and required specialized equipment. Now, Cheng, Das et al.—including some of the researchers involved in the 2008 work—have developed an improved version of MOHCA that uses readily available RNA sequencing techniques to find the damaged nucleotides. The RNA sequencing data are then analyzed by a new algorithm in the Rosetta computer modeling software.
Cheng, Das et al. used this newly developed ‘MOHCA-seq’ and Rosetta to reveal the structures of a human non-coding RNA and several other non-coding RNA molecules to a much higher level of detail than before. Together, MOHCA-seq and Rosetta provide a rapid method for researchers to decipher the three-dimensional structure of non-coding RNAs. This method is likely to speed up the analysis of the complex structures of non-coding RNAs. It will be useful in future efforts to work out what roles these RNAs play in cells, including their activity in cancer, neurodegeneration, and other diseases.
PMCID: PMC4495719  PMID: 26035425
non-coding RNA; riboswitches; ribozymes; structure prediction; next-generation sequencing; high-throughput
4.  Primerize: automated primer assembly for transcribing non-coding RNA domains 
Nucleic Acids Research  2015;43(Web Server issue):W522-W526.
Customized RNA synthesis is in demand for biological and biotechnological research. While chemical synthesis and gel or chromatographic purification of RNA is costly and difficult for sequences longer than tens of nucleotides, a pipeline of primer assembly of DNA templates, in vitro transcription by T7 RNA polymerase and kit-based purification provides a cost-effective and fast alternative for preparing RNA molecules. Nevertheless, designing template primers that optimize cost and avoid mispriming during polymerase chain reaction currently requires expert inspection, downloading specialized software or both. Online servers are currently not available or maintained for the task. We report here a server named Primerize that makes available an efficient algorithm for primer design developed and experimentally tested in our laboratory for RNA domains with lengths up to 300 nucleotides. Free access:
PMCID: PMC4489279  PMID: 25999345
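Primer assembly rests on a simple idea: the DNA template is tiled by primers that alternate between the top and bottom strands and overlap their neighbours, so that PCR can extend the overlaps into a full-length template. The Python sketch below shows only that tiling step, with hand-chosen split points. The function names are hypothetical, and the sketch does not reproduce Primerize's optimization of overlap placement, melting temperatures, or mispriming checks.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def split_into_primers(template: str, segments: list[tuple[int, int]]) -> list[str]:
    """Cut `template` into primers from (start, end) half-open intervals.

    Even-indexed primers are read from the top strand; odd-indexed primers
    are reverse-complemented so that they anneal to their neighbours.
    Adjacent intervals are expected to overlap.
    """
    primers = []
    for i, (start, end) in enumerate(segments):
        piece = template[start:end]
        primers.append(piece if i % 2 == 0 else reverse_complement(piece))
    return primers


def overlaps(template: str, segments: list[tuple[int, int]]) -> list[str]:
    """Return the top-strand sequence shared by each pair of adjacent primers."""
    return [template[segments[i + 1][0]:segments[i][1]]
            for i in range(len(segments) - 1)]
```

For a 24-nt toy template tiled as `[(0, 12), (8, 20), (16, 24)]`, the middle primer is the reverse complement of nucleotides 8 to 19, and adjacent primers share 4-nt overlaps.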
5.  RNA-Redesign: a web server for fixed-backbone 3D design of RNA 
Nucleic Acids Research  2015;43(Web Server issue):W498-W501.
RNA is rising in importance as a design medium for interrogating fundamental biology and for developing therapeutic and bioengineering applications. While there are several online servers for design of RNA secondary structure, there are no tools available for the rational design of 3D RNA structure. Here we present RNA-Redesign, an online 3D design tool for RNA. This resource utilizes fixed-backbone design to optimize the sequence identity and nucleobase conformations of an RNA to match a desired backbone, analogous to fundamental tools that underlie rational protein engineering. The resulting sequences suggest thermostabilizing mutations that can be experimentally verified. Further, sequence preferences that differ between natural and computationally designed sequences can suggest whether natural sequences possess functional constraints besides folding stability, such as cofactor binding or conformational switching. Finally, for biochemical studies, the designed sequences can suggest experimental tests of 3D models, including concomitant mutation of base triples. In addition to the designs generated, detailed graphical analysis is presented through an integrated and user-friendly environment.
PMCID: PMC4489241  PMID: 25964298
6.  Standardization of RNA Chemical Mapping Experiments 
Biochemistry  2014;53(19):3063-3065.
Chemical mapping experiments offer powerful information about RNA structure but currently involve ad hoc assumptions in data processing. We show that simple dilutions, referencing standards (GAGUA hairpins), and HiTRACE/MAPseeker analysis allow rigorous overmodification correction, background subtraction, and normalization for electrophoretic data and a ligation bias correction needed for accurate deep sequencing data. Comparisons across six noncoding RNAs stringently test the proposed standardization of dimethyl sulfate (DMS), 2′-OH acylation (SHAPE), and carbodiimide measurements. Identification of new signatures for extrahelical bulges and DMS “hot spot” pockets (including tRNA A58, methylated in vivo) illustrates the utility and necessity of standardization for quantitative RNA mapping.
PMCID: PMC4033625  PMID: 24766159
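As a rough illustration of background subtraction and normalization, the sketch below subtracts a no-modification control from a reactivity profile and rescales it so that designated reference nucleotides (for example, those of a GAGUA hairpin) average 1.0. The function name, the clipping of negative values to zero, and the choice of reference indices are assumptions of this sketch. It omits the overmodification and ligation-bias corrections described in the abstract and is not the HiTRACE/MAPseeker implementation.

```python
def normalize_reactivities(signal, background, reference_positions):
    """Background-subtract a chemical mapping profile and scale it to a reference.

    signal, background: per-nucleotide intensities of equal length
        (modified sample and no-modification control, respectively).
    reference_positions: indices of reference nucleotides (e.g. a GAGUA
        hairpin loop) whose mean corrected reactivity is scaled to 1.0.
    """
    if len(signal) != len(background):
        raise ValueError("signal and background must have the same length")
    # Background subtraction; negative values are clipped to zero (a simplification).
    corrected = [max(s - b, 0.0) for s, b in zip(signal, background)]
    ref_mean = sum(corrected[i] for i in reference_positions) / len(reference_positions)
    if ref_mean <= 0:
        raise ValueError("reference nucleotides show no reactivity above background")
    # Normalization: express every reactivity in units of the reference mean.
    return [c / ref_mean for c in corrected]
```

Scaling to an internal reference, rather than to the molecule's own most reactive positions, is what lets profiles from different RNAs or runs be compared on a common scale.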
7.  Understanding Nucleic Acid–Ion Interactions 
Annual review of biochemistry  2014;83:813-841.
Ions surround nucleic acids in what is referred to as an ion atmosphere. As a result, the folding and dynamics of RNA and DNA and their complexes with proteins and with each other cannot be understood without a reasonably sophisticated appreciation of these ions’ electrostatic interactions. However, the underlying behavior of the ion atmosphere follows physical rules that are distinct from the rules of site binding that biochemists are most familiar and comfortable with. The main goal of this review is to familiarize nucleic acid experimentalists with the physical concepts that underlie nucleic acid–ion interactions. Throughout, we provide practical strategies for interpreting and analyzing nucleic acid experiments that avoid pitfalls from oversimplified or incorrect models. We briefly review the status of theories that predict or simulate nucleic acid–ion interactions and experiments that test these theories. Finally, we describe opportunities for going beyond phenomenological fits to a next-generation, truly predictive understanding of nucleic acid–ion interactions.
PMCID: PMC4384882  PMID: 24606136
ions; RNA/DNA; electrostatics; Poisson–Boltzmann; Manning condensation; Hill equation; free energy
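One of the physical concepts in this area is Manning counterion condensation. Here is a minimal sketch, assuming the Manning limiting law, a uniform dielectric constant of 78.5 (water near 25 °C), and an axial charge spacing of about 1.7 Å for B-form DNA. The Manning parameter is ξ = l_B / b, where l_B is the Bjerrum length and b is the charge spacing. When zξ > 1, the fraction of each phosphate charge neutralized by condensed counterions of valence z is 1 − 1/(zξ). The abstract warns about pitfalls of oversimplified models, and this limiting law is one such simplification, so treat the output as a textbook estimate.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge, C
EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
K_B = 1.380649e-23           # Boltzmann constant, J/K


def bjerrum_length_angstrom(eps_r=78.5, temperature=298.15):
    """Distance (in angstroms) at which two unit charges interact with energy k_B*T."""
    lb_m = E_CHARGE ** 2 / (4 * math.pi * EPS0 * eps_r * K_B * temperature)
    return lb_m * 1e10


def manning_condensed_fraction(charge_spacing_angstrom, valence=1, **kwargs):
    """Fraction of each backbone charge neutralized by condensed counterions.

    Manning limiting law: theta = 1 - 1/(z * xi) with xi = l_B / b.
    Returns 0.0 when z * xi <= 1, where no condensation is predicted.
    Extra keyword arguments are passed to bjerrum_length_angstrom().
    """
    xi = bjerrum_length_angstrom(**kwargs) / charge_spacing_angstrom
    return max(0.0, 1.0 - 1.0 / (valence * xi))
```

With these defaults, l_B comes out near 7.1 Å and ξ ≈ 4.2. Condensed monovalent counterions then neutralize roughly 76% of each phosphate charge, and divalent ions neutralize about 88%.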
8.  The Mutate-and-Map Protocol for Inferring Base Pairs in Structured RNA 
Chemical mapping is a widespread technique for structural analysis of nucleic acids in which a molecule’s reactivity to different probes is quantified at single nucleotide resolution and used to constrain structural modeling. This experimental framework has been extensively revisited in the past decade with new strategies for high-throughput readouts, chemical modification, and rapid data analysis. Recently, we have coupled the technique to high-throughput mutagenesis. Point mutations of a base paired nucleotide can lead to exposure of not only that nucleotide but also its interaction partner. Systematically carrying out the mutation and mapping for the entire system gives an experimental approximation of the molecule’s “contact map.” Here, we give our in-house protocol for this “mutate-and-map” (M2) strategy, based on 96-well capillary electrophoresis, and we provide practical tips on interpreting the data to infer nucleic acid structure.
PMCID: PMC4080707  PMID: 24136598
RNA structure; Chemical mapping; Capillary sequencing; Systematic mutation
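The contact-map logic can be illustrated with a simple Z-score analysis. For each nucleotide, compare its reactivity in each mutant against its reactivity across all mutants, and flag mutant/nucleotide pairs that stand out in both directions. The sketch below is a hypothetical illustration. It assumes a square matrix in which mutation m targets nucleotide m and uses an arbitrary threshold of 1.5. The published protocol's data processing and statistics may differ.

```python
import statistics


def mutate_and_map_zscores(reactivity):
    """Column-wise Z-scores of a mutate-and-map reactivity matrix.

    reactivity[m][i] is the reactivity of nucleotide i in the construct
    carrying mutation m (assumed here to sit at nucleotide m). A large
    Z-score at (m, i) means nucleotide i became unusually exposed when
    m was mutated, a hint that m and i interact.
    """
    n_mut, n_nt = len(reactivity), len(reactivity[0])
    z = [[0.0] * n_nt for _ in range(n_mut)]
    for i in range(n_nt):
        column = [reactivity[m][i] for m in range(n_mut)]
        mean = statistics.mean(column)
        sd = statistics.pstdev(column)
        for m in range(n_mut):
            z[m][i] = (column[m] - mean) / sd if sd > 0 else 0.0
    return z


def candidate_pairs(z, threshold=1.5):
    """Pairs (m, i), m < i, whose Z-scores exceed `threshold` in both directions."""
    n = min(len(z), len(z[0]))
    return [(m, i) for m in range(n) for i in range(m + 1, n)
            if z[m][i] > threshold and z[i][m] > threshold]
```

Requiring the signal in both directions (mutating m exposes i, and mutating i exposes m) mirrors the symmetry of a base pair and suppresses one-sided noise.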
9.  Structure determination of noncanonical RNA motifs guided by 1H NMR chemical shifts 
Nature methods  2014;11(4):413-416.
Structured non-coding RNAs underlie fundamental cellular processes, but determining their 3D structures remains challenging. We demonstrate herein that integrating NMR 1H chemical shift data with Rosetta de novo modeling can consistently return high-resolution RNA structures. On a benchmark set of 23 noncanonical RNA motifs, including 11 blind targets, Chemical-Shift-ROSETTA for RNA (CS-ROSETTA-RNA) recovered the experimental structures with high accuracy (0.6 to 2.0 Å all-heavy-atom rmsd) in 18 cases.
PMCID: PMC3985481  PMID: 24584194
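The abstract quotes accuracy as all-heavy-atom RMSD. Over N matched atoms, RMSD = sqrt((1/N) Σ |a_i − b_i|²). The sketch below computes it in plain Python. It assumes the two models have already been optimally superimposed (for example, by a Kabsch fit) and list the same heavy atoms in the same order. The superposition step itself is omitted.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the structures are already superimposed and that atoms are
    listed in the same order in both lists.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

Because RMSD averages squared deviations, a few badly placed atoms (for example, a flipped base) can dominate the value. This is one reason results are often reported as a range across targets, as above.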
10.  Blind Predictions of DNA and RNA Tweezers Experiments with Force and Torque 
PLoS Computational Biology  2014;10(8):e1003756.
Single-molecule tweezers measurements of double-stranded nucleic acids (dsDNA and dsRNA) provide unprecedented opportunities to dissect how these fundamental molecules respond to forces and torques analogous to those applied by topoisomerases, viral capsids, and other biological partners. However, tweezers data are still most commonly interpreted post facto in the framework of simple analytical models. Testing falsifiable predictions of state-of-the-art nucleic acid models would be more illuminating but has not been performed. Here we describe a blind challenge in which numerical predictions of nucleic acid mechanical properties were compared to experimental data obtained recently for dsRNA under applied force and torque. The predictions were enabled by the HelixMC package, first presented in this paper. HelixMC advances crystallography-derived base-pair level models (BPLMs) to simulate kilobase-length dsDNAs and dsRNAs under external forces and torques, including their global linking numbers. These calculations recovered the experimental bending persistence length of dsRNA within the error of the simulations and accurately predicted that dsRNA's “spring-like” conformation would give a two-fold decrease of stretch modulus relative to dsDNA. Further blind predictions of helix torsional properties, however, exposed inaccuracies in current BPLM theory, including three-fold discrepancies in torsional persistence length at the high force limit and the incorrect sign of dsRNA link-extension (twist-stretch) coupling. Beyond these experiments, HelixMC predicted that ‘nucleosome-excluding’ poly(A)/poly(T) is at least two-fold stiffer than random-sequence dsDNA in bending, stretching, and torsional behaviors; that Z-DNA is at least three-fold stiffer than random-sequence dsDNA, with a near-zero link-extension coupling; and that base pair step correlations have non-negligible effects.
We propose that experimentally testing these predictions should be powerful next steps for understanding the flexibility of dsDNA and dsRNA in sequence contexts and under mechanical stresses relevant to their biology.
Author Summary
DNA and RNA are fundamental molecules in the central dogma of molecular biology. Many biological behaviors of double-stranded DNA and RNA, including transcription/translation by proteins and packaging into compact structures, depend on their ability to flex and twist. Single-molecule tweezers now provide accurate mechanical measurements of DNA and RNA helices under force and torque but have not been used to rigorously falsify and thereby advance computational models. Here we present the first such blind challenge, involving recent dsRNA tweezers data that were kept hidden from modelers and a new HelixMC toolkit that resolves challenges in simulating long double helices from base-pair level models. The predictions gave excellent agreement with bending and stretching measurements of dsRNA but failed to recover twisting properties, pinpointing a critical area of future investigation. HelixMC also predicted that poly(A)/poly(T) and Z-DNA, biologically important variants whose elastic responses have not been studied with tweezers, will have distinct mechanical properties. These results open a route to iteratively falsifying and refining computational models of long nucleic acid helices, as is necessary for attaining a predictive understanding of their biological behaviors.
PMCID: PMC4125081  PMID: 25102226
11.  Rosetta3: An Object-Oriented Software Suite for the Simulation and Design of Macromolecules 
Methods in enzymology  2011;487:545-574.
We have recently completed a full re-architecting of the Rosetta molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this re-architecting has been released as Rosetta3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This document describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
PMCID: PMC4083816  PMID: 21187238
12.  An RNA Mapping DataBase for curating RNA structure mapping experiments 
Bioinformatics  2012;28(22):3006-3008.
Summary: We have established an RNA mapping database (RMDB) to enable structural, thermodynamic and kinetic comparisons across single-nucleotide-resolution RNA structure mapping experiments. The volume of structure mapping data has greatly increased since the development of high-throughput sequencing techniques, accelerated software pipelines and large-scale mutagenesis. For scientists wishing to infer relationships between RNA sequence/structure and these mapping data, there is a need for a database that is curated, tagged with error estimates and interfaced with tools for sharing, visualization, search and meta-analysis. Through its on-line front-end, the RMDB allows users to explore single-nucleotide-resolution mapping data in heat-map, bar-graph and colored secondary structure graphics; to leverage these data to generate secondary structure hypotheses; and to download the data in standardized and computer-friendly files, including the RDAT and community-consensus SNRNASM formats. At the time of writing, the database houses 53 entries, describing more than 2848 experiments of 1098 RNA constructs in several solution conditions and is growing rapidly.
Availability: Freely available on the web at
Supplementary information: Supplementary data are available at Bioinformatics Online.
PMCID: PMC3496344  PMID: 22976082
13.  Atomic-Accuracy Prediction of Protein Loop Structures through an RNA-Inspired Ansatz 
PLoS ONE  2013;8(10):e74830.
Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable as loop lengths exceed 10 residues and if surrounding side-chain conformations are erased. Current approaches, such as the protein local optimization protocol or kinematic inversion closure (KIC) Monte Carlo, involve stages that coarse-grain proteins, simplifying modeling but precluding a systematic search of all-atom configurations. This article introduces an alternative modeling strategy based on a ‘stepwise ansatz’, recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA) protocol enables enumerative sampling of a 12 residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA achieves sub-Angstrom accuracy models, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth ‘RNA-puzzle’ competition. 
These results establish all-atom enumeration as an unusually systematic approach to ab initio protein structure modeling that can leverage high performance computing and physically realistic energy functions to more consistently achieve atomic accuracy.
PMCID: PMC3804535  PMID: 24204571
14.  Quantitative DMS mapping for automated RNA secondary structure inference 
Biochemistry  2012;51(36):7037-7039.
For decades, dimethyl sulfate (DMS) mapping has informed manual modeling of RNA structure in vitro and in vivo. Here, we incorporate DMS data into automated secondary structure inference using an energy minimization framework developed for 2′-OH acylation (SHAPE) mapping. On six non-coding RNAs with crystallographic models, DMS-guided modeling achieves overall false negative and false discovery rates of 9.5% and 11.6%, comparable to or better than SHAPE-guided modeling; and bootstrapping provides straightforward confidence estimates. Integrating DMS/SHAPE data and including CMCT reactivities give small additional improvements. These results establish DMS mapping, an already routine technique, as a quantitative tool for unbiased RNA structure modeling.
PMCID: PMC3448840  PMID: 22913637
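Energy-minimization frameworks for SHAPE-directed folding commonly convert each nucleotide's reactivity ρ(i) into a pseudo-free-energy term, ΔG(i) = m ln(ρ(i) + 1) + b, which is applied when that nucleotide is paired. The sketch assumes that form with m = 2.6 and b = −0.8 kcal/mol, values often cited as SHAPE defaults. The slope and intercept used for DMS in this work are not given in the abstract, so these numbers are placeholders. Clipping negative reactivities to zero is also a simplification of this sketch.

```python
import math


def pseudo_free_energy(reactivity, slope=2.6, intercept=-0.8):
    """Pseudo-energy term (kcal/mol) applied when a nucleotide is paired.

    Uses dG = slope * ln(reactivity + 1) + intercept. Default parameters
    are placeholder SHAPE-style values, not DMS-specific ones. Negative
    reactivities are clipped to 0 (a simplification for this sketch).
    """
    r = max(reactivity, 0.0)
    return slope * math.log(r + 1.0) + intercept
```

Unreactive nucleotides receive a small bonus for pairing (−0.8 kcal/mol with these placeholders). Highly reactive nucleotides receive a growing penalty, which steers the minimum-free-energy structure toward leaving them unpaired.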
15.  A two-dimensional mutate-and-map strategy for non-coding RNA structure 
Nature chemistry  2011;3(12):954-962.
Non-coding RNAs fold into precise base-pairing patterns to carry out critical roles in genetic regulation and protein synthesis, but determining RNA structure remains difficult. Here, we show that coupling systematic mutagenesis with high-throughput chemical mapping enables accurate base-pair inference of domains from ribosomal RNA, ribozymes and riboswitches. For a six-RNA benchmark that has challenged previous chemical/computational methods, this ‘mutate-and-map’ strategy gives secondary structures that are in agreement with crystallography (helix error rates, 2%), including a blind test on a double-glycine riboswitch. Through modelling of partially ordered states, the method enables the first test of an interdomain helix-swap hypothesis for ligand-binding cooperativity in a glycine riboswitch. Finally, the data report on tertiary contacts within non-coding RNAs, and coupling to the Rosetta/FARFAR algorithm gives nucleotide-resolution three-dimensional models (helix root-mean-squared deviation, 5.7 Å) of an adenine riboswitch. These results establish a promising two-dimensional chemical strategy for inferring the secondary and tertiary structures that underlie non-coding RNA behaviour.
PMCID: PMC3725140  PMID: 22109276
16.  Adding Diverse Noncanonical Backbones to Rosetta: Enabling Peptidomimetic Design 
PLoS ONE  2013;8(7):e67051.
Peptidomimetics are classes of molecules that mimic structural and functional attributes of polypeptides. Peptidomimetic oligomers can frequently be synthesized using efficient solid phase synthesis procedures similar to peptide synthesis. Conformationally ordered peptidomimetic oligomers are finding broad applications for molecular recognition and for inhibiting protein-protein interactions. One critical limitation is the limited set of design tools for identifying oligomer sequences that can adopt desired conformations. Here, we present expansions to the ROSETTA platform that enable structure prediction and design of five non-peptidic oligomer scaffolds (noncanonical backbones): oligooxopiperazines, oligo-peptoids, β-peptides, hydrogen bond surrogate helices, and oligosaccharides. This work is complementary to prior additions to model noncanonical protein side chains in ROSETTA. The main purpose of our manuscript is to give a detailed description to current and future developers of how each of these noncanonical backbones was implemented. Furthermore, we provide a general outline for implementation of new backbone types not discussed here. To illustrate the utility of this approach, we describe the first tests of the ROSETTA molecular mechanics energy function in the context of oligooxopiperazines, using quantum mechanical calculations as comparison points, scanning through backbone and side chain torsion angles for a model peptidomimetic. Finally, as an example of a novel design application, we describe the automated design of an oligooxopiperazine that inhibits the p53-MDM2 protein-protein interaction. For the general biological and bioengineering community, several noncanonical backbones have been incorporated into web applications that allow users to freely and rapidly test the presented protocols. This work helps address the peptidomimetic community's need for an automated and expandable modeling tool for noncanonical backbones.
PMCID: PMC3712014  PMID: 23869206
17.  Correcting pervasive errors in RNA crystallography through enumerative structure prediction 
Nature methods  2012;10(1):74-76.
Three-dimensional RNA models fitted into crystallographic density maps exhibit pervasive conformational ambiguities, geometric errors, and steric clashes. To address these problems, we present Enumerative Real-space Refinement ASsisted by Electron density under Rosetta (ERRASER), coupled to PHENIX (Python-based Hierarchical Environment for Integrated Xtallography) diffraction-based refinement. On 24 datasets, ERRASER automatically corrects the majority of MolProbity-assessed errors, improves average Rfree factor, resolves functionally important discrepancies in non-canonical structure, and refines low-resolution models to better match higher resolution models.
PMCID: PMC3531565  PMID: 23202432
18.  Structure prediction for CASP8 with all-atom refinement using Rosetta 
Proteins  2009;77(Suppl 9):89-99.
We describe predictions made using the Rosetta structure prediction methodology for the Eighth Critical Assessment of Techniques for Protein Structure Prediction. Aggressive sampling and all-atom refinement were carried out for nearly all targets. A combination of alignment methodologies was used to generate starting models from a range of templates, and the models were then subjected to Rosetta all-atom refinement. For 50 targets with readily identified templates, the best submitted model was better than the best alignment to the best template in the Protein Data Bank for 24 domains, and improved over the best starting model for 43 domains. For 13 targets where only very distant sequence relationships to proteins of known structure were detected, models were generated using the Rosetta de novo structure prediction methodology followed by all-atom refinement; in several cases the submitted models were better than those based on the available templates. Of the 12 refinement challenges, the best submitted model improved on the starting model in 7 cases. These improvements over the starting template-based models and refinement tests demonstrate the power of Rosetta structure refinement in improving model accuracy.
PMCID: PMC3688471  PMID: 19701941
19.  HiTRACE-Web: an online tool for robust analysis of high-throughput capillary electrophoresis 
Nucleic Acids Research  2013;41(Web Server issue):W492-W498.
To facilitate the analysis of large-scale high-throughput capillary electrophoresis data, we previously proposed a suite of efficient analysis software named HiTRACE (High Throughput Robust Analysis of Capillary Electrophoresis). HiTRACE has been used extensively for quantitating data from RNA and DNA structure mapping experiments, including mutate-and-map contact inference, chromatin footprinting, the Eterna RNA design project and other high-throughput applications. However, HiTRACE is based on a suite of command-line MATLAB scripts that requires nontrivial efforts to learn, use and extend. Here, we present HiTRACE-Web, an online version of HiTRACE that includes standard features previously available in the command-line version and additional features such as automated band annotation and flexible adjustment of annotations, all via a user-friendly environment. By making use of parallelization, the on-line workflow is also faster than software implementations available to most users on their local computers. Free access:
PMCID: PMC3692083  PMID: 23761448
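A basic step in analyzing many capillary electrophoresis traces together is bringing them into register. The sketch below finds the integer shift that maximizes the cross-correlation of one trace against a reference. It is a hypothetical, deliberately simple illustration: it applies a rigid shift only, with no stretching or band annotation. It is not HiTRACE's alignment algorithm.

```python
def best_shift(reference, trace, max_shift):
    """Integer offset s that best aligns `trace` onto `reference`.

    Each s in [-max_shift, max_shift] is scored by the un-normalized
    cross-correlation sum(reference[k] * trace[k + s]) over samples where
    both traces are defined; the highest-scoring s is returned (ties keep
    the most negative shift).
    """
    best_s, best_score = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        score = sum(reference[k] * trace[k + s]
                    for k in range(len(reference))
                    if 0 <= k + s < len(trace))
        if score > best_score:
            best_s, best_score = s, score
    return best_s
```

A positive result means features in `trace` appear that many samples later than in `reference`. Real electrophoretic traces also drift non-uniformly along the run, which a single rigid shift cannot capture.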
20.  Serverification of Molecular Modeling Applications: The Rosetta Online Server That Includes Everyone (ROSIE) 
PLoS ONE  2013;8(5):e63906.
The Rosetta molecular modeling software package provides experimentally tested and rapidly evolving tools for the 3D structure prediction and high-resolution design of proteins, nucleic acids, and a growing number of non-natural polymers. Despite its free availability to academic users and improving documentation, use of Rosetta has largely remained confined to developers and their immediate collaborators due to the code’s difficulty of use, the requirement for large computational resources, and the unavailability of servers for most of the Rosetta applications. Here, we present a unified web framework for Rosetta applications called ROSIE (Rosetta Online Server that Includes Everyone). ROSIE provides (a) a common user interface for Rosetta protocols, (b) a stable application programming interface for developers to add additional protocols, (c) a flexible back-end to allow leveraging of computer cluster resources shared by RosettaCommons member institutions, and (d) centralized administration by the RosettaCommons to ensure continuous maintenance. This paper describes the ROSIE server infrastructure, a step-by-step ‘serverification’ protocol for use by Rosetta developers, and the deployment of the first nine ROSIE applications by six separate developer teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance, Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated by the number and diversity of these applications, ROSIE offers a general and speedy paradigm for serverification of Rosetta applications that incurs negligible cost to developers and lowers barriers to Rosetta use for the broader biological community. ROSIE is available at
PMCID: PMC3661552  PMID: 23717507
21.  Are Protein Force Fields Getting Better? A Systematic Benchmark on 524 Diverse NMR Measurements 
Recent hardware and software advances have enabled simulation studies of protein systems on biophysically relevant timescales, often revealing the need for improved force fields. Although early force field development was limited by the lack of direct comparisons between simulation and experiment, recent work from several labs has demonstrated direct calculation of NMR observables from protein simulations. Here we quantitatively evaluate recent molecular dynamics force fields against a suite of 524 chemical shift and J coupling (³J(HN,Hα), ³J(HN,Cβ), ³J(Hα,C′), ³J(HN,C′), and ³J(Hα,N)) measurements on dipeptides, tripeptides, tetra-alanine, and ubiquitin. Of the force fields examined (ff96, ff99, ff03, ff03*, ff03w, ff99sb*, ff99sb-ildn, ff99sb-ildn-phi, ff99sb-ildn-nmr, CHARMM27, OPLS-AA), two force fields (ff99sb-ildn-phi, ff99sb-ildn-nmr) combining recent side chain and backbone torsion modifications achieve high accuracy in our benchmark. For the two optimal force fields, the calculation error is comparable to the uncertainty in the experimental comparison. This observation suggests that extracting additional force field improvements from NMR data may require increased accuracy in J coupling and chemical shift prediction. To further investigate the limitations of current force fields, we also consider conformational populations of dipeptides, which were recently estimated using vibrational spectroscopy.
PMCID: PMC3383641  PMID: 22754404
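Calculating J couplings from a simulation typically relies on a Karplus relation, J(θ) = A cos²θ + B cosθ + C. The sketch below evaluates ³J(HN,Hα) from the backbone angle φ with θ = φ − 60°. It uses A = 6.51, B = −1.76, and C = 1.60 Hz, one published parameterization that is assumed here rather than taken from this paper. Agreement with experiment is scored as a mean χ² using per-observable uncertainties σ.

```python
import math


def karplus_j_hnha(phi_deg, a=6.51, b=-1.76, c=1.60):
    """3J(HN,Ha) in Hz from backbone phi (degrees) via a Karplus relation.

    J = a*cos^2(theta) + b*cos(theta) + c, with theta = phi - 60 degrees.
    The default coefficients are one published parameter set (an assumption).
    """
    theta = math.radians(phi_deg - 60.0)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def chi_squared(calculated, experimental, sigma):
    """Mean of ((calc - exp) / sigma)^2 over a set of observables."""
    terms = [((c - e) / s) ** 2 for c, e, s in zip(calculated, experimental, sigma)]
    return sum(terms) / len(terms)
```

In practice, each coupling would be averaged over simulation frames before it is compared with experiment. The σ in the χ² should include the uncertainty of the Karplus parameters themselves, which is why the calculation error can limit how much a force field can be refined against these data.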
22.  Understanding the errors of SHAPE-directed RNA structure modeling 
Biochemistry  2011;50(37):8049-8056.
Single-nucleotide-resolution chemical mapping for structured RNA is being rapidly advanced by new chemistries, faster readouts, and coupling to computational algorithms. Recent tests have shown that selective 2′-hydroxyl acylation by primer extension (SHAPE) can give near-zero error rates (0–2%) in modeling the helices of RNA secondary structure. Here, we benchmark the method on six molecules for which crystallographic data are available: tRNA(phe) and 5S rRNA from E. coli; the P4-P6 domain of the Tetrahymena group I ribozyme; and ligand-bound domains from riboswitches for adenine, cyclic di-GMP, and glycine. SHAPE-directed modeling of these highly structured RNAs gave an overall false negative rate (FNR) of 17% and a false discovery rate (FDR) of 21%, with at least one helix prediction error in five of the six cases. Extensive variations of data processing, normalization, and modeling parameters did not significantly mitigate modeling errors. Only one variation, filtering out data collected with deoxyinosine triphosphate during primer extension, gave a modest improvement (FNR=12% and FDR=14%). The residual structure modeling errors are explained by insufficient information content of these RNAs’ SHAPE data, as evaluated by a nonparametric bootstrapping analysis inspired by approaches in phylogenetic inference. Beyond these benchmark cases, bootstrapping analysis suggests low confidence (<50%) in the majority of helices in a previously proposed SHAPE-directed model for the HIV-1 RNA genome. Thus, SHAPE-directed RNA modeling is not always unambiguous, and helix-by-helix confidence estimates, as described herein, may be critical for interpreting results from this powerful methodology.
PMCID: PMC3172344  PMID: 21842868
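The FNR and FDR figures and the bootstrap helix confidences in this abstract rest on standard definitions. The following is a minimal sketch, assuming structures are compared as sets of base pairs and that a helix counts as recovered in a replicate only if it matches exactly. Generating each bootstrap replicate (resampling the data and refolding) requires a folding engine and is not shown. All function names are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set, Tuple

BasePair = Tuple[int, int]   # (i, j), i < j, 1-based nucleotide positions
Helix = FrozenSet[BasePair]  # ASSUMPTION: a helix is represented by its set of base pairs


def fnr_fdr(predicted: Set[BasePair], reference: Set[BasePair]) -> Tuple[float, float]:
    """False negative rate and false discovery rate of a predicted structure.

    FNR = FN / (TP + FN): share of reference pairs the model missed.
    FDR = FP / (TP + FP): share of predicted pairs absent from the reference.
    """
    tp = len(predicted & reference)
    fn = len(reference - predicted)
    fp = len(predicted - reference)
    fnr = fn / (tp + fn) if reference else 0.0
    fdr = fp / (tp + fp) if predicted else 0.0
    return fnr, fdr


def bootstrap_helix_confidence(replicate_models: Iterable[Set[Helix]]) -> Dict[Helix, float]:
    """Fraction of bootstrap replicate models in which each helix appears."""
    models = [set(m) for m in replicate_models]
    if not models:
        raise ValueError("need at least one replicate model")
    counts = Counter(helix for model in models for helix in model)
    return {helix: n / len(models) for helix, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical helices and replicate models, for illustration only.
    h1 = frozenset({(1, 20), (2, 19), (3, 18)})
    h2 = frozenset({(5, 12), (6, 11)})
    conf = bootstrap_helix_confidence([{h1}, {h1, h2}, {h1}])
    low_confidence = [h for h, c in conf.items() if c < 0.5]  # contains only h2
    print(conf, low_confidence)
```

Exact helix matching is the strictest choice; an analysis could instead credit a helix when most of its base pairs reappear, which would raise the reported confidences.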
23.  Ultraviolet Shadowing of RNA Can Cause Significant Chemical Damage in Seconds 
Scientific Reports  2012;2:517.
Chemical purity of RNA samples is important for high-precision studies of RNA folding and catalytic behavior, but photodamage accrued during ultraviolet (UV) shadowing steps of sample preparation can reduce this purity. Here, we report the quantitation of UV-induced damage by using reverse transcription and single-nucleotide-resolution capillary electrophoresis. We found photolesions in a dozen natural and artificial RNAs, across multiple sequence contexts (predominantly, but not exclusively, at pyrimidine doublets), and with multiple lamps recommended for UV shadowing. Irradiation time courses revealed detectable damage within a few seconds of exposure for 254 nm lamps held 5 to 10 cm from 0.5-mm-thick gels. Under these conditions, 200-nucleotide RNAs subjected to 20 seconds of UV shadowing incurred damage to 16–27% of molecules. Owing to a ‘skin effect’, the molecule-by-molecule distribution of lesions showed 4-fold higher variance than a Poisson distribution. Thicker gels, longer-wavelength lamps, and shorter exposure times reduced but did not eliminate damage. These results suggest that RNA biophysical studies should report precautions taken to avoid artifactual heterogeneity from UV shadowing.
PMCID: PMC3399121  PMID: 22816040
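The reported damaged fractions can be related to a mean number of lesions per molecule under a Poisson model, since P(0 lesions) = e^(−λ) gives λ = −ln(1 − f). The excess variance attributed to the skin effect is measured by the variance-to-mean ratio, which equals 1 for a Poisson process. A minimal sketch, assuming independent lesions for the baseline. The function names are hypothetical, and because the measured distribution is overdispersed, the Poisson value is only a reference point, not a corrected estimate.

```python
import math
import statistics
from typing import Sequence


def poisson_mean_lesions(damaged_fraction: float) -> float:
    """Mean lesions per molecule implied by a damaged fraction IF lesions were Poisson.

    Under a Poisson model P(0 lesions) = exp(-lam), so lam = -ln(1 - f).
    """
    if not 0.0 <= damaged_fraction < 1.0:
        raise ValueError("damaged_fraction must be in [0, 1)")
    return -math.log(1.0 - damaged_fraction)


def index_of_dispersion(lesion_counts: Sequence[int]) -> float:
    """Variance-to-mean ratio of per-molecule lesion counts (1.0 for Poisson)."""
    mean = statistics.fmean(lesion_counts)
    if mean == 0:
        raise ValueError("mean lesion count is zero")
    return statistics.pvariance(lesion_counts) / mean


if __name__ == "__main__":
    for f in (0.16, 0.27):  # damaged fractions quoted for 20 s of shadowing
        lam = poisson_mean_lesions(f)
        print(f"{f:.2f} damaged -> {lam:.3f} lesions/molecule (Poisson baseline)")
```

A ratio well above 1, such as the roughly 4-fold excess reported here, indicates that lesions cluster on some molecules, so a few molecules carry several lesions while more remain undamaged than a Poisson model predicts.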
24.  Rosetta in CAPRI rounds 13–19 
Proteins  2010;78(15):3212-3218.
Modeling the conformational changes that occur upon binding of macromolecules is an unsolved challenge. In previous rounds of CAPRI it was demonstrated that the Rosetta approach to macromolecular modeling could capture side-chain conformational changes upon binding with high accuracy. In rounds 13–19 we tested the ability of various backbone remodeling strategies to capture the main-chain conformational changes observed during binding events. These approaches span a wide range of backbone motions, from limited refinement of loops to relieve clashes in homologous docking, through extensive remodeling of loop segments, to large-scale remodeling of RNA. While the results are encouraging, major improvements in sampling and energy evaluation are clearly required for consistent high-accuracy modeling. Analysis of our failures in the CAPRI challenges suggests that conformational sampling at the termini of exposed beta strands is a particularly pressing area for improvement.
PMCID: PMC2952713  PMID: 20597089
25.  Four Small Puzzles That Rosetta Doesn't Solve 
PLoS ONE  2011;6(5):e20044.
A complete macromolecular modeling package must be able to solve the simplest structure prediction problems. Despite recent successes in high-resolution structure modeling and design, the Rosetta software suite fares poorly on small protein and RNA puzzles, some as small as four residues. To illustrate these problems, this manuscript presents Rosetta results for four well-defined test cases: the 20-residue mini-protein Trp cage, an even smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies, several lines of evidence indicate that conformational sampling is not the major bottleneck in modeling these small systems. Instead, approximations and omissions in the Rosetta all-atom energy function currently preclude discriminating experimentally observed conformations from de novo models at atomic resolution. These molecular “puzzles” should serve as useful model systems for developers wishing to make foundational improvements to this powerful modeling suite.
PMCID: PMC3098862  PMID: 21625446
