Single-nucleotide-resolution chemical mapping for structured RNA is being rapidly advanced by new chemistries, faster readouts, and coupling to computational algorithms. Recent tests have shown that selective 2´-hydroxyl acylation by primer extension (SHAPE) can give near-zero error rates (0–2%) in modeling the helices of RNA secondary structure. Here, we benchmark the method on six molecules for which crystallographic data are available: tRNA(phe) and 5S rRNA from E. coli; the P4-P6 domain of the Tetrahymena group I ribozyme; and ligand-bound domains from riboswitches for adenine, cyclic di-GMP, and glycine. SHAPE-directed modeling of these highly structured RNAs gave an overall false negative rate (FNR) of 17% and a false discovery rate (FDR) of 21%, with at least one helix prediction error in five of the six cases. Extensive variations of data processing, normalization, and modeling parameters did not significantly mitigate modeling errors. Only one variation, filtering out data collected with deoxyinosine triphosphate during primer extension, gave a modest improvement (FNR=12% and FDR=14%). The residual structure modeling errors are explained by insufficient information content of these RNAs’ SHAPE data, as evaluated by a nonparametric bootstrapping analysis inspired by approaches in phylogenetic inference. Beyond these benchmark cases, bootstrapping analysis suggests low confidence (<50%) in the majority of helices in a previously proposed SHAPE-directed model for the HIV-1 RNA genome. Thus, SHAPE-directed RNA modeling is not always unambiguous, and helix-by-helix confidence estimates, as described herein, may be critical for interpreting results from this powerful methodology.
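The helix-level error rates and bootstrap confidence values quoted above can be illustrated with a minimal sketch. It assumes helices are represented as hashable labels and that a structure-modeling routine is supplied by the caller as `predict`; the function names, the `None` convention for unsampled positions, and the simplified resampling scheme are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter

def helix_error_rates(predicted, reference):
    """Return (FNR, FDR) for a set of predicted helices against a reference set.

    FNR = missed reference helices / all reference helices
    FDR = predicted helices absent from the reference / all predicted helices
    """
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    false_neg = len(reference - predicted)
    false_pos = len(predicted - reference)
    fnr = false_neg / (true_pos + false_neg) if reference else 0.0
    fdr = false_pos / (true_pos + false_pos) if predicted else 0.0
    return fnr, fdr

def bootstrap_support(reactivities, predict, n_replicates=100, seed=0):
    """Fraction of bootstrap replicates in which each helix is recovered.

    Positions are resampled with replacement; positions that are not drawn
    are passed to `predict` as None (hypothetical "no data" convention).
    `predict` is a caller-supplied function returning an iterable of helices.
    """
    rng = random.Random(seed)
    n = len(reactivities)
    counts = Counter()
    for _ in range(n_replicates):
        sampled = {rng.randrange(n) for _ in range(n)}
        replicate = [r if i in sampled else None
                     for i, r in enumerate(reactivities)]
        counts.update(set(predict(replicate)))
    return {helix: c / n_replicates for helix, c in counts.items()}
```

A helix with support below 0.5 in such an analysis would correspond to the "low confidence (<50%)" category mentioned in the abstract.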
Chemical purity of RNA samples is important for high-precision studies of RNA folding and catalytic behavior, but photodamage accrued during ultraviolet (UV) shadowing steps of sample preparation can reduce this purity. Here, we report the quantitation of UV-induced damage by using reverse transcription and single-nucleotide-resolution capillary electrophoresis. We found photolesions in a dozen natural and artificial RNAs; across multiple sequence contexts, dominantly at but not limited to pyrimidine doublets; and from multiple lamps recommended for UV shadowing. Irradiation time-courses revealed detectable damage within a few seconds of exposure for 254 nm lamps held at a distance of 5 to 10 cm from 0.5-mm thickness gels. Under these conditions, 200-nucleotide RNAs subjected to 20 seconds of UV shadowing incurred damage to 16-27% of molecules; and, due to a ‘skin effect’, the molecule-by-molecule distribution of lesions gave 4-fold higher variance than a Poisson distribution. Thicker gels, longer wavelength lamps, and shorter exposure times reduced but did not eliminate damage. These results suggest that RNA biophysical studies should report precautions taken to avoid artifactual heterogeneity from UV shadowing.
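As a reference point for the damage figures above, the following sketch converts a fraction of damaged molecules into a mean number of lesions per molecule under a pure Poisson model. This is an assumed baseline only: the abstract reports that the observed lesion distribution is overdispersed relative to Poisson, so these numbers should not be read as the measured means. The function names are hypothetical.

```python
import math

def mean_lesions_from_damaged_fraction(fraction_damaged):
    """Poisson baseline: P(no lesion) = exp(-mean), so mean = -ln(1 - f)."""
    if not 0.0 <= fraction_damaged < 1.0:
        raise ValueError("fraction_damaged must be in [0, 1)")
    return -math.log(1.0 - fraction_damaged)

def damaged_fraction_from_mean_lesions(mean_lesions):
    """Inverse relation: fraction of molecules carrying at least one lesion."""
    return 1.0 - math.exp(-mean_lesions)

# Range quoted in the abstract for 20 s of shadowing: 16% to 27% of molecules.
for f in (0.16, 0.27):
    print(f, round(mean_lesions_from_damaged_fraction(f), 3))
```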
Modeling the conformational changes that occur upon binding of macromolecules is an unsolved challenge. In previous rounds of CAPRI it was demonstrated that the Rosetta approach to macromolecular modeling could capture sidechain conformational changes upon binding with high accuracy. In rounds 13–19 we tested the ability of various backbone remodeling strategies to capture the main-chain conformational changes observed during binding events. These approaches span a wide range of backbone motions, from limited refinement of loops to relieve clashes in homologous docking, through extensive remodeling of loop segments, to large-scale remodeling of RNA. While the results are encouraging, major improvements in sampling and energy evaluation are clearly required for consistent high accuracy modeling. Analysis of our failures in the CAPRI challenges suggests that conformational sampling at the termini of exposed beta strands is a particularly pressing area for improvement.
A complete macromolecule modeling package must be able to solve the simplest structure prediction problems. Despite recent successes in high resolution structure modeling and design, the Rosetta software suite fares poorly on small protein and RNA puzzles, some as small as four residues. To illustrate these problems, this manuscript presents Rosetta results for four well-defined test cases: the 20-residue mini-protein Trp cage, an even smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies, several lines of evidence indicate that conformational sampling is not the major bottleneck in modeling these small systems. Instead, approximations and omissions in the Rosetta all-atom energy function currently preclude discriminating experimentally observed conformations from de novo models at atomic resolution. These molecular “puzzles” should serve as useful model systems for developers wishing to make foundational improvements to this powerful modeling suite.
We present a Rosetta full-atom framework for predicting and designing the non-canonical motifs that define RNA tertiary structure, called FARFAR (Fragment Assembly of RNA with Full Atom Refinement). For a test set of thirty-two 6-to-20-nucleotide motifs, the method recapitulated 50% of the experimental structures at near-atomic accuracy. Additionally, design calculations recovered the native sequence at the majority of RNA residues engaged in non-canonical interactions, and mutations predicted to stabilize a signal recognition particle domain were experimentally validated.
We describe the proceedings and conclusions from a “Workshop on Applications of Protein Models in Biomedical Research” that was held at University of California at San Francisco on 11 and 12 July, 2008. At the workshop, international scientists involved with structure modeling explored (i) how models are currently used in biomedical research, (ii) what the requirements and challenges for different applications are, and (iii) how the interaction between the computational and experimental research communities could be strengthened to advance the field.
DNA is thought to behave as a stiff elastic rod with respect to the ubiquitous mechanical deformations inherent to its biology. Here, we measure the mean and variance of end-to-end length for a series of DNA double helices in solution, using small-angle X-ray scattering interference between gold nanocrystal labels. The data rule out the conventional elastic rod model. Specifically, the variance in end-to-end length follows a quadratic dependence on the number of base pairs rather than the expected linear dependence. Absent applied tension, DNA is at least one order of magnitude softer than measured by single-molecule stretching experiments. Our observations indicate that DNA stretching is cooperative over more than two turns of the DNA double helix, and support the idea of long-range allosteric communication through DNA structure.
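The distinction between linear and quadratic scaling of the length variance can be made concrete with a short sketch. If per-base-pair length fluctuations were independent, variances would add and grow linearly with the number of base pairs; fully correlated fluctuations would instead grow quadratically. The code below estimates the scaling exponent by a log-log least-squares fit on synthetic values. The numbers are invented for illustration, and a pure power law with no constant offset is assumed, which real data (for example, with label contributions) may not satisfy.

```python
import math

def scaling_exponent(base_pairs, variances):
    """Least-squares slope of log(variance) versus log(number of base pairs)."""
    xs = [math.log(n) for n in base_pairs]
    ys = [math.log(v) for v in variances]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

lengths = [10, 15, 20, 25, 30, 35]          # synthetic duplex lengths (bp)
independent = [0.5 * n for n in lengths]     # variance adds per base pair
correlated = [0.04 * n ** 2 for n in lengths]  # fluctuations move together

print(scaling_exponent(lengths, independent))  # close to 1
print(scaling_exponent(lengths, correlated))   # close to 2
```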
We have developed protocols for rapidly quantifying the band intensities from nucleic acid chemical mapping gels at single-nucleotide resolution. These protocols are implemented in the software SAFA (Semi-Automated Footprinting Analysis), which can be downloaded without charge from http://safa.stanford.edu. The protocols implemented in SAFA have five steps: (1) lane identification, (2) gel rectification, (3) band assignment, (4) model fitting, and (5) band intensity normalization. SAFA enables the rapid quantitation of gel images containing thousands of discrete bands, thereby eliminating a bottleneck to the analysis of chemical mapping experiments. An experienced user of the software can quantify a gel image in approximately 15 minutes. Although SAFA was developed to analyze hydroxyl radical (·OH) footprints, it effectively quantifies the gel images obtained with other types of chemical mapping probes. We also present a series of tutorial movies that illustrate the best practices and different steps in the SAFA analysis as a supplement to this protocol.
Gel Electrophoresis; Quantification; Chemical Mapping; Nucleic Acid; Phosphorimaging; SAFA; Footprint
The prospect of phasing diffraction data sets ‘de novo’ for proteins with previously unseen folds is appealing but largely untested. In a first systematic exploration of phasing with Rosetta de novo models, it is shown that all-atom refinement of coarse-grained models significantly improves both the model quality and performance in molecular replacement with the Phaser software. 15 new cases of diffraction data sets that are unambiguously phased with de novo models are presented. These diffraction data sets represent nine space groups and span a large range of solvent contents (33–79%) and asymmetric unit copy numbers (1–4). No correlation is observed between the ease of phasing and the solvent content or asymmetric unit copy number. Instead, a weak correlation is found with the length of the modeled protein: larger proteins required somewhat less accurate models to give successful molecular replacement. Overall, the results of this survey suggest that de novo models can phase diffraction data for approximately one sixth of proteins with sizes of 100 residues or less. However, for many of these cases, ‘de novo phasing with de novo models’ requires significant investment of computational power, much greater than 10^3 CPU days per target. Improvements in conformational search methods will be necessary if molecular replacement with de novo models is to become a practical tool for targets without homology to previously solved protein structures.
structure prediction; molecular replacement; de novo phasing
We report a novel molecular ruler for measurement of distances and distance distributions with accurate external calibration. Using solution X-ray scattering we determine the scattering interference between two gold nanocrystal probes attached site-specifically to a macromolecule of interest. Fourier transformation of the interference pattern provides a model-independent probability distribution for the distances between the probe centers-of-mass. To test the approach, we measure end-to-end distances for a variety of DNA structures. We demonstrate that measurements with independently prepared samples and using different X-ray sources are highly reproducible, we demonstrate the quantitative accuracy of the first and second moments of the distance distributions, and we demonstrate that the technique recovers complex distribution shapes. Distances measured with the solution scattering-interference ruler match the corresponding crystallographic values, but differ from distances measured previously with alternate ruler techniques. The X-ray scattering interference ruler should be a powerful tool for relating crystal structures to solution structures and for studying molecular fluctuations.
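The step from interference pattern to distance distribution can be sketched for an idealized case. Assuming point-like probes and the convention s = 2 sin(θ)/λ, the interference term is I(s) = ∫ P(d) sin(2πsd)/(2πsd) dd, and a sine transform gives P(d) = 8πd ∫ s I(s) sin(2πsd) ds. The code below applies this with a simple trapezoid rule over a finite s range. It is a sketch under these assumptions, not the published analysis procedure, and truncation of the s range broadens the recovered peak.

```python
import math

def point_pair_interference(s, d):
    """Interference term for two point scatterers separated by distance d."""
    x = 2.0 * math.pi * s * d
    return 1.0 if x == 0.0 else math.sin(x) / x

def distance_distribution(s_values, intensities, d_values):
    """Sine-transform I(s) into an unnormalized P(d) on the given d grid."""
    result = []
    for d in d_values:
        f = [s * i * math.sin(2.0 * math.pi * s * d)
             for s, i in zip(s_values, intensities)]
        # Trapezoid rule over the sampled s range.
        integral = sum(0.5 * (f[k] + f[k + 1]) * (s_values[k + 1] - s_values[k])
                       for k in range(len(f) - 1))
        result.append(8.0 * math.pi * d * integral)
    return result

# Synthetic check: a single fixed probe separation of 50 (arbitrary length units).
s_grid = [k * 0.0005 for k in range(201)]   # s from 0 to 0.1 inverse units
pattern = [point_pair_interference(s, 50.0) for s in s_grid]
d_grid = list(range(10, 101))
p_of_d = distance_distribution(s_grid, pattern, d_grid)
peak_distance = d_grid[p_of_d.index(max(p_of_d))]
```

For a real sample, P(d) would be a broad distribution rather than a single spike, and its first and second moments correspond to the mean and variance discussed in the abstract.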
We describe a new approach to refining protein structure models that focuses sampling in regions most likely to contain errors while allowing the whole structure to relax in a physically realistic all-atom force field. In applications to models produced using NMR data and to comparative models based on distant structural homologues, the method can significantly improve the accuracy of the structures in terms of both the backbone conformations and the placement of core side chains. Further, the resulting models satisfy a particularly stringent test: they provide significantly better solutions to the X-ray crystallographic phase problem in molecular replacement trials. Finally, we show that all-atom refinement can produce de novo protein structure predictions that reach the high accuracy required for molecular replacement. Phases for diffraction data for a 112-residue protein have been determined without any experimental phase information and in the absence of any templates suitable for molecular replacement from the Protein Data Bank. These results suggest that the combination of high resolution structure prediction with state-of-the-art phasing tools may be unexpectedly powerful in phasing crystallographic data for which molecular replacement is hindered by the absence of sufficiently accurate prior models.
Riboswitches are complex folded RNA domains found in non-coding regions of mRNA that regulate gene expression upon small molecule binding. Recently, Breaker and coworkers reported a tandem aptamer riboswitch (VCI-II) that binds glycine cooperatively. Here, we use hydroxyl radical footprinting and small-angle X-ray scattering (SAXS) to study the conformations of this tandem aptamer as a function of Mg²⁺ and glycine concentration. We fit a simple three-state thermodynamic model that describes the energetic coupling between magnesium-induced folding and glycine binding. Furthermore, we characterize the structural conformations of each of the three states: In low salt with no magnesium present, the VCI-II construct has an extended overall conformation, presumably representing unfolded structures. Addition of millimolar concentrations of Mg²⁺ in the absence of glycine leads to a significant compaction and partial folding as judged by hydroxyl radical protections. In the presence of millimolar Mg²⁺ concentrations, the tandem aptamer binds glycine cooperatively. The glycine binding transition involves a further compaction, additional tertiary packing interactions and further uptake of magnesium ions relative to the state in high Mg²⁺ but no glycine. Employing density reconstruction algorithms, we obtain low resolution 3-D structures for all three states from the SAXS measurements. These data provide a first glimpse into the structural conformations of the VCI-II aptamer, establish rigorous constraints for further modeling, and provide a framework for future mechanistic studies.
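A generic three-state scheme of the kind described above can be sketched as unfolded, Mg-folded, and glycine-bound states with Hill-type statistical weights. The functional form, the parameter names (`k_fold`, `n_mg`, `k_bind`, `h_gly`), and the omission of the additional magnesium uptake on glycine binding are all simplifying assumptions for illustration; the abstract does not specify the fitted model.

```python
def three_state_populations(mg, gly, k_fold, n_mg, k_bind, h_gly):
    """Equilibrium populations of unfolded, folded, and glycine-bound states.

    mg, gly   : ligand concentrations (same units as k_fold and k_bind)
    k_fold    : Mg midpoint for folding (hypothetical parameter)
    n_mg      : Hill coefficient for Mg-induced folding (hypothetical)
    k_bind    : glycine midpoint for binding to the folded state (hypothetical)
    h_gly     : Hill coefficient for cooperative glycine binding (hypothetical)
    """
    w_unfolded = 1.0
    w_folded = (mg / k_fold) ** n_mg
    # Glycine binds only the folded state, which couples binding to folding.
    w_bound = w_folded * (gly / k_bind) ** h_gly
    z = w_unfolded + w_folded + w_bound
    return {
        "unfolded": w_unfolded / z,
        "folded": w_folded / z,
        "bound": w_bound / z,
    }

populations = three_state_populations(mg=1.0, gly=1.0, k_fold=1.0,
                                      n_mg=2.0, k_bind=1.0, h_gly=1.5)
```

Because the bound-state weight is proportional to the folded-state weight, raising the magnesium concentration increases the apparent glycine affinity in this scheme, which is one simple way energetic coupling can arise.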
Riboswitches; Small-angle X-ray Scattering; RNA folding; RNA aptamers
Modeling the conformational changes that occur on binding of macromolecules is an unsolved challenge. In previous rounds of the Critical Assessment of PRediction of Interactions (CAPRI), it was demonstrated that the Rosetta approach to macromolecular modeling could capture side chain conformational changes on binding with high accuracy. In rounds 13–19 we tested the ability of various backbone remodeling strategies to capture the main-chain conformational changes observed during binding events. These approaches span a wide range of backbone motions, from limited refinement of loops to relieve clashes in homologous docking, through extensive remodeling of loop segments, to large-scale remodeling of RNA. Although the results are encouraging, major improvements in sampling and energy evaluation are clearly required for consistent high accuracy modeling. Analysis of our failures in the CAPRI challenges suggests that conformational sampling at the termini of exposed beta strands is a particularly pressing area for improvement. Proteins 2010. © Wiley-Liss, Inc.
CAPRI; structure prediction; protein-protein interactions; RNA-protein interactions; Rosetta; flexible-backbone modeling; conformational changes; docking; backrub; fragment insertion