Dr. Das is a computational biochemist at Stanford University School of Medicine. His lab seeks a predictive understanding of how RNA and protein molecules code for complex biological machines. The lab's computer algorithms have consistently achieved leading predictions in worldwide structure prediction trials. Complementary to these computer methods, Dr. Das is developing high-throughput ‘multidimensional chemical mapping’ experiments to uncover three-dimensional structures and excited states of non-coding RNAs in their biological milieu, leading to discoveries of RNA regulons and influenza packaging signals that are critical for mammalian development and viral infection. Toward novel molecules of biomedical interest, Dr. Das leads the Eterna massive open laboratory, which couples a 100,000-player videogame to the lab’s massively parallel experimental tools and deep learning, the first such platform in citizen science. Dr. Das’s research has been recognized with the Burroughs Wellcome Fund Career Award at the Scientific Interface, a W. M. Keck Medical Research Program award, and the OpenEye/American Chemical Society Outstanding Junior Faculty Award. Dr. Das mentors students from the biochemistry, biophysics, biomedical informatics, and learning sciences Ph.D. programs.
Honors & Awards
British Marshall Scholar, Marshall Aid Commemoration Commission (1998-2000)
Jane Coffin Childs Foundation Fellowship, Jane Coffin Childs Foundation (2006-2008)
Career Award at the Scientific Interface, Burroughs Wellcome Fund (2008-present)
Keck Medical Research Grant award, W. M. Keck Foundation (2012)
OpenEye Outstanding Junior Faculty Award, American Chemical Society (2015)
Education
Ph.D., Stanford University, Physics (2005)
M.Res., University College London, Biocomplexity (2000)
M.Phil., Cambridge University, Physics (Radio Astronomy) (1999)
A.B., s.c.l., Harvard University, Physics (1998)
Current Research and Scholarly Interests
Our lab strives to predict and design how biopolymer sequences define and regulate biopolymer structure/function, focusing on medically important RNA and RNA/protein complexes.
We are exploring algorithms to predict the structures and energetics of RNAs and RNA/protein interfaces at high resolution, focusing initially on small building blocks. We test these ideas through community-wide blind trials and by solving molecular structures and structure ensembles with sparse chemical mapping, NMR, crystallographic, and cryo-electron microscopy data.
Complementary to this computational research, we are developing information-rich biochemical methods to model the myriad structures of non-coding RNAs that remain unknown. Current efforts focus on probing the extent of RNA structure and conformational change inside cells and viruses, in close collaboration with expert biologists at Stanford.
In addition to modeling RNAs, we aim to design new ones for basic science, diagnostics, and therapeutics. Our videogame project EteRNA seeks missing rules and novel molecules for medicine by giving citizen scientists access to high-throughput wet-lab experiments.
Courses
- Biological Macromolecules
BIOC 241, BIOE 241, BIOPHYS 241, SBIO 241 (Win)
- Development of Thesis Research
BIOC 350 (Aut)
Independent Studies (12)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Investigation
BIOE 392 (Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Directed Reading in Biochemistry
BIOC 299 (Aut, Win, Spr, Sum)
- Directed Reading in Biophysics
BIOPHYS 399 (Aut, Win, Spr, Sum)
- Graduate Research
BIOPHYS 300 (Aut, Win, Spr, Sum)
- Graduate Research and Special Advanced Work
BIOC 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOC 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
PHYSICS 490 (Aut, Spr)
- The Teaching of Biochemistry
BIOC 221 (Aut, Win, Spr, Sum)
- Undergraduate Research
BIOC 199 (Aut, Win, Spr, Sum)
Doctoral Dissertation Reader (AC)
Steve Bonilla, Rachel Braun-Hagey, Gun Woo Byeon, Naomi Genuth, Carlos Hernandez, Bjoern Erik Wulff
Postdoctoral Faculty Sponsor
Michael Gotrik, Feriel Melaine, Andrew Watkins, Joseph Yesselman
Doctoral Dissertation Advisor (AC)
Publications
Blind prediction of noncanonical RNA structure at atomic accuracy.
Science advances
2018; 4 (5): eaar5316
Prediction of RNA structure from nucleotide sequence remains an unsolved grand challenge of biochemistry and requires distinct concepts from protein structure prediction. Despite extensive algorithmic development in recent years, modeling of noncanonical base pairs of new RNA structural motifs has not been achieved in blind challenges. We report a stepwise Monte Carlo (SWM) method with a unique add-and-delete move set that enables predictions of noncanonical base pairs of complex RNA structures. A benchmark of 82 diverse motifs establishes the method's general ability to recover noncanonical pairs ab initio, including multistrand motifs that have been refractory to prior approaches. In a blind challenge, SWM models predicted nucleotide-resolution chemical mapping and compensatory mutagenesis experiments for three in vitro selected tetraloop/receptors with previously unsolved structures (C7.2, C7.10, and R1). As a final test, SWM blindly and correctly predicted all noncanonical pairs of a Zika virus double pseudoknot during a recent community-wide RNA-Puzzle. Stepwise structure formation, as encoded in the SWM method, enables modeling of noncanonical RNA structure in a variety of previously intractable problems.
View details for DOI 10.1126/sciadv.aar5316
View details for PubMedID 29806027
Allosteric mechanism of the V. vulnificus adenine riboswitch resolved by four-dimensional chemical mapping
eLife
The structural interconversions that mediate the gene regulatory functions of RNA molecules may be different from classic models of allostery, but the relevant structural correlations have remained elusive in even intensively studied systems. Here, we present a four-dimensional expansion of chemical mapping called lock-mutate-map-rescue (LM2R), which integrates multiple layers of mutation with nucleotide-resolution chemical mapping. This technique resolves the core mechanism of the adenine-responsive V. vulnificus add riboswitch, a paradigmatic system for which both Monod-Wyman-Changeux (MWC) conformational selection models and non-MWC alternatives have been proposed. To discriminate amongst these models, we locked each functionally important helix through designed mutations and assessed formation or depletion of other helices via compensatory rescue evaluated by chemical mapping. These LM2R measurements give strong support to the pre-existing correlations predicted by MWC models, disfavor alternative models, and suggest additional structural heterogeneities that may be general across ligand-free riboswitches.
View details for DOI 10.7554/eLife.29602
View details for Web of Science ID 000427104800001
View details for PubMedID 29446752
View details for PubMedCentralID PMC5847336
Updates to the RNA mapping database (RMDB), version 2
Nucleic acids research
2018; 46 (D1): D375–D379
An Activity Switch in Human Telomerase Based on RNA Conformation and Shaped by TCAB1.
Cell
Ribonucleoprotein enzymes require dynamic conformations of their RNA constituents for regulated catalysis. Human telomerase employs a non-coding RNA (hTR) with a bipartite arrangement of domains: a template-containing core and a distal three-way junction (CR4/5) that stimulates catalysis through unknown means. Here, we show that telomerase activity unexpectedly depends upon the holoenzyme protein TCAB1, which in turn controls conformation of CR4/5. Cells lacking TCAB1 exhibit a marked reduction in telomerase catalysis without affecting enzyme assembly. Instead, TCAB1 inactivation causes unfolding of CR4/5 helices that are required for catalysis and for association with the telomerase reverse-transcriptase (TERT). CR4/5 mutations derived from patients with telomere biology disorders provoke defects in catalysis and TERT binding similar to TCAB1 inactivation. These findings reveal a conformational "activity switch" in human telomerase RNA controlling catalysis and TERT engagement. The identification of two discrete catalytic states for telomerase suggests an intramolecular means for controlling telomerase in cancers and progenitor cells.
View details for DOI 10.1016/j.cell.2018.04.039
View details for PubMedID 29804836
Web-accessible molecular modeling with Rosetta: The Rosetta Online Server that Includes Everyone (ROSIE)
Protein science
2018; 27 (1): 259–68
The Rosetta molecular modeling software package provides a large number of experimentally validated tools for modeling and designing proteins, nucleic acids, and other biopolymers, with new protocols being added continually. While freely available to academic users, external usage is limited by the need for expertise in the Unix command line environment. To make Rosetta protocols available to a wider audience, we previously created a web server called Rosetta Online Server that Includes Everyone (ROSIE), which provides a common environment for hosting web-accessible Rosetta protocols. Here we describe a simplification of the ROSIE protocol specification format, one that permits easier implementation of Rosetta protocols. Whereas the previous format required creating multiple separate files in different locations, the new format allows specification of the protocol in a single file. This new, simplified protocol specification has more than doubled the number of Rosetta protocols available under ROSIE. These new applications include pKa determination, lipid accessibility calculation, ribonucleic acid redesign, protein-protein docking, protein-small molecule docking, symmetric docking, antibody docking, cyclic toxin docking, critical binding peptide determination, and mapping small molecule binding sites. ROSIE is freely available to academic users at http://rosie.rosettacommons.org.
View details for DOI 10.1002/pro.3313
View details for Web of Science ID 000418254300026
View details for PubMedID 28960691
View details for PubMedCentralID PMC5734271
The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design.
Journal of chemical theory and computation
Over the past decade, the Rosetta biomolecular modeling suite has informed diverse biological questions and engineering challenges ranging from interpretation of low-resolution structural data to design of nanomaterials, protein therapeutics, and vaccines. Central to Rosetta's success is the energy function: a model parametrized from small-molecule and X-ray crystal structure data used to approximate the energy associated with each biomolecule conformation. This paper describes the mathematical models and physical concepts that underlie the latest Rosetta energy function, called the Rosetta Energy Function 2015 (REF15). Applying these concepts, we explain how to use Rosetta energies to identify and analyze the features of biomolecular models. Finally, we discuss the latest advances in the energy function that extend its capabilities from soluble proteins to also include membrane proteins, peptides containing noncanonical amino acids, small molecules, carbohydrates, nucleic acids, and other macromolecules.
View details for DOI 10.1021/acs.jctc.7b00125
View details for PubMedID 28430426
Single-molecule FRET-Rosetta reveals RNA structural rearrangements during human telomerase catalysis
RNA (New York, N.Y.)
2017; 23 (2): 175-188
Maintenance of telomeres by telomerase permits continuous proliferation of rapidly dividing cells, including the majority of human cancers. Despite its direct biomedical significance, the architecture of the human telomerase complex remains unknown. Generating homogeneous telomerase samples has presented a significant barrier to developing improved structural models. Here we pair single-molecule Förster resonance energy transfer (smFRET) measurements with Rosetta modeling to map the conformations of the essential telomerase RNA core domain within the active ribonucleoprotein. FRET-guided modeling places the essential pseudoknot fold distal to the active site on a protein surface comprising the C-terminal element, a domain that shares structural homology with canonical polymerase thumb domains. An independently solved medium-resolution structure of Tetrahymena telomerase provides a blind test of our modeling methodology and sheds light on the structural homology of this domain across diverse organisms. Our smFRET-Rosetta models reveal nanometer-scale rearrangements within the RNA core domain during catalysis. Taken together, our FRET data and pseudoatomic molecular models permit us to propose a possible mechanism for how RNA core domain rearrangement is coupled to template hybrid elongation.
View details for DOI 10.1261/rna.058743.116
View details for Web of Science ID 000392883800007
View details for PubMedID 28096444
View details for PubMedCentralID PMC5238793
RNA-Puzzles Round III: 3D RNA structure prediction of five riboswitches and one ribozyme.
RNA (New York, N.Y.)
RNA-Puzzles is a collective experiment in blind 3D RNA structure prediction. We report here a third round of RNA-Puzzles. Six puzzles are included in this round of the experiment: five riboswitch aptamer structures (Puzzles 4, 8, 12, 13, and 14) and one ribozyme structure (Puzzle 7). The riboswitch structures include biological binding sites for small molecules (S-adenosyl methionine, cyclic diadenosine monophosphate, 5-amino 4-imidazole carboxamide riboside 5'-triphosphate, glutamine) and proteins (YbxF), and one set describes large conformational changes between ligand-free and ligand-bound states. The Varkud satellite ribozyme is the most recently solved structure of a known large ribozyme. All puzzles have established biological functions and require structural understanding to appreciate their molecular mechanisms. Through the use of fast-track experimental data, including multidimensional chemical mapping, and accurate prediction of RNA secondary structure, a large portion of the 3D contacts were predicted correctly, leading to similar topologies for the top-ranking predictions. Template-based and homology-derived predictions could predict structures to particularly high accuracies. However, achieving biological insights from de novo prediction of RNA 3D structures still depends on the size and complexity of the RNA. Blind computational predictions of RNA structures already appear to provide useful structural information in many cases. As in the previous RNA-Puzzles Round II experiment, the low accuracy of non-Watson-Crick interaction predictions and the observed high atomic clash scores reveal a notable need for algorithmic improvement. All prediction models and assessment results are available at http://ahsoka.u-strasbg.fr/rnapuzzles/.
View details for DOI 10.1261/rna.060368.116
View details for PubMedID 28138060
View details for PubMedCentralID PMC5393176
Controllable molecular motors engineered from myosin and RNA.
Nature nanotechnology
Engineering biomolecular motors can provide direct tests of structure-function relationships and customized components for controlling molecular transport in artificial systems [1] or in living cells [2]. Previously, synthetic nucleic acid motors [3-5] and modified natural protein motors [6-10] have been developed in separate complementary strategies to achieve tunable and controllable motor function. Integrating protein and nucleic-acid components to form engineered nucleoprotein motors may enable additional sophisticated functionalities. However, this potential has only begun to be explored in pioneering work harnessing DNA scaffolds to dictate the spacing, number and composition of tethered protein motors [11-15]. Here, we describe myosin motors that incorporate RNA lever arms, forming hybrid assemblies in which conformational changes in the protein motor domain are amplified and redirected by nucleic acid structures. The RNA lever arm geometry determines the speed and direction of motor transport and can be dynamically controlled using programmed transitions in the lever arm structure [7,9]. We have characterized the hybrid motors using in vitro motility assays, single-molecule tracking, cryo-electron microscopy and structural probing [16]. Our designs include nucleoprotein motors that reversibly change direction in response to oligonucleotides that drive strand-displacement reactions [17]. In multimeric assemblies, the controllable motors walk processively along actin filaments at speeds of 10–20 nm s⁻¹. Finally, to illustrate the potential for multiplexed addressable control, we demonstrate sequence-specific responses of RNA variants to oligonucleotide signals.
View details for DOI 10.1038/s41565-017-0005-y
View details for PubMedID 29109539
Functional 5' UTR mRNA structures in eukaryotic translation regulation and how to find them.
Nature reviews. Molecular cell biology
RNA molecules can fold into intricate shapes that can provide an additional layer of control of gene expression beyond that of their sequence. In this Review, we discuss the current mechanistic understanding of structures in 5' untranslated regions (UTRs) of eukaryotic mRNAs and the emerging methodologies used to explore them. These structures may regulate cap-dependent translation initiation through helicase-mediated remodelling of RNA structures and higher-order RNA interactions, as well as cap-independent translation initiation through internal ribosome entry sites (IRESs), mRNA modifications and other specialized translation pathways. We discuss known 5' UTR RNA structures and how new structure probing technologies coupled with prospective validation, particularly compensatory mutagenesis, are likely to identify classes of structured RNA elements that shape post-transcriptional control of gene expression and the development of multicellular organisms.
View details for DOI 10.1038/nrm.2017.103
View details for PubMedID 29165424
Blind tests of RNA nearest-neighbor energy prediction
Proceedings of the National Academy of Sciences of the United States of America
2016; 113 (30): 8430-8435
The predictive modeling and design of biologically active RNA molecules requires understanding the energetic balance among their basic components. Rapid developments in computer simulation promise increasingly accurate recovery of RNA's nearest-neighbor (NN) free-energy parameters, but these methods have not been tested in predictive trials or on nonstandard nucleotides. Here, we present, to our knowledge, the first such tests through a RECCES-Rosetta (reweighting of energy-function collection with conformational ensemble sampling in Rosetta) framework that rigorously models conformational entropy, predicts previously unmeasured NN parameters, and estimates these values' systematic uncertainties. RECCES-Rosetta recovers the 10 NN parameters for Watson-Crick stacked base pairs and 32 single-nucleotide dangling-end parameters with unprecedented accuracies: rmsd of 0.28 kcal/mol and 0.41 kcal/mol, respectively. For set-aside test sets, RECCES-Rosetta gives rmsd values of 0.32 kcal/mol on eight stacked pairs involving G-U wobble pairs and 0.99 kcal/mol on seven stacked pairs involving nonstandard isocytidine-isoguanosine pairs. To more rigorously assess RECCES-Rosetta, we carried out four blind predictions for stacked pairs involving 2,6-diaminopurine-U pairs, which achieved 0.64 kcal/mol rmsd accuracy when tested by subsequent experiments. Overall, these results establish that computational methods can now blindly predict energetics of basic RNA motifs, including chemically modified variants, with consistently better than 1 kcal/mol accuracy. Systematic tests indicate that resolving the remaining discrepancies will require energy function improvements beyond simply reweighting component terms, and we propose further blind trials to test such efforts.
View details for DOI 10.1073/pnas.1523335113
View details for Web of Science ID 000380346200043
View details for PubMedID 27402765
View details for PubMedCentralID PMC4968729
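The abstract above counts 10 Watson-Crick stacked-pair parameters and 32 single-nucleotide dangling-end parameters. As a small illustrative sketch (not part of RECCES-Rosetta), the Python below shows where those counts come from in the standard nearest-neighbor model: a stack and the same stack read from the opposite strand are one parameter. For the dangling ends, we assume each parameter is indexed by an oriented Watson-Crick closing pair, the dangling nucleotide, and whether it sits on the 3' or 5' side.

```python
from itertools import product

# Watson-Crick pairing partners in the RNA alphabet.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of an RNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def unique_wc_stacks() -> set[str]:
    """Distinct Watson-Crick nearest-neighbor stacks.

    The stack 5'-XY-3' paired to its complementary strand is the same
    physical object as the stack read from the other strand, revcomp(XY),
    so each stack is stored under a canonical (lexicographically smaller) key.
    """
    stacks = set()
    for x, y in product("ACGU", repeat=2):
        top = x + y
        stacks.add(min(top, revcomp(top)))
    return stacks


def dangling_end_params() -> list[tuple[str, str, str]]:
    """(closing pair, dangling nucleotide, side) combinations.

    Assumption: closing pairs are the four oriented Watson-Crick pairs, and a
    single unpaired nucleotide can dangle on either the 3' or the 5' side.
    """
    pairs = [b + COMPLEMENT[b] for b in "ACGU"]  # AU, CG, GC, UA
    return [(p, n, side) for p in pairs for n in "ACGU" for side in ("3'", "5'")]


if __name__ == "__main__":
    print(len(unique_wc_stacks()))     # 10
    print(len(dangling_end_params()))  # 32
```

Of the 16 ordered dinucleotides, four (AU, UA, CG, GC) are their own reverse complements and the remaining twelve collapse into six pairs, which gives the 10 stacking parameters.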
RNA structure through multidimensional chemical mapping
Quarterly reviews of biophysics
The discoveries of myriad non-coding RNA molecules, each transiting through multiple flexible states in cells or virions, present major challenges for structure determination. Advances in high-throughput chemical mapping give new routes for characterizing entire transcriptomes in vivo, but the resulting one-dimensional data generally remain too information-poor to allow accurate de novo structure determination. Multidimensional chemical mapping (MCM) methods seek to address this challenge. Mutate-and-map (M2), RNA interaction groups by mutational profiling (RING-MaP and MaP-2D analysis) and multiplexed •OH cleavage analysis (MOHCA) measure how the chemical reactivities of every nucleotide in an RNA molecule change in response to modifications at every other nucleotide. A growing body of in vitro blind tests and compensatory mutation/rescue experiments indicate that MCM methods give consistently accurate secondary structures and global tertiary structures for ribozymes, ribosomal domains and ligand-bound riboswitch aptamers up to 200 nucleotides in length. Importantly, MCM analyses provide detailed information on structurally heterogeneous RNA states, such as ligand-free riboswitches that are functionally important but difficult to resolve with other approaches. The sequencing requirements of currently available MCM protocols scale at least quadratically with RNA length, precluding general application to transcriptomes or viral genomes at present. We propose a modify-cross-link-map (MXM) expansion to overcome this and other current limitations to resolving the in vivo 'RNA structurome'.
View details for DOI 10.1017/S0033583516000020
View details for Web of Science ID 000375229800001
View details for PubMedID 27266715
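The quadratic scaling noted above follows from the mutate-and-map design: each single-nucleotide mutant yields a reactivity profile over every position. The sketch below is a back-of-the-envelope count. It makes the simplifying assumption that every measured reactivity needs a fixed, hypothetical number of reads (`reads_per_measurement`). Real protocols differ in sequencing depth and in which mutants they include.

```python
def mutate_and_map_measurements(length: int) -> int:
    """Reactivity values for a full single-mutant scan of an RNA.

    One profile for the wild type plus one per single-nucleotide mutant,
    each profile reading out all `length` positions.
    """
    return (length + 1) * length


def reads_required(length: int, reads_per_measurement: int) -> int:
    """Total reads, assuming a fixed (hypothetical) depth per reactivity."""
    return mutate_and_map_measurements(length) * reads_per_measurement


if __name__ == "__main__":
    for n in (100, 200, 10_000):
        print(n, mutate_and_map_measurements(n))
```

For a 100-nucleotide RNA this gives 10,100 measurements. Doubling the length to 200 nucleotides gives 40,200, roughly four times as many, and a 10,000-nucleotide viral genome would need about 10^8.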
Principles for Predicting RNA Secondary Structure Design Difficulty.
Journal of molecular biology
2016; 428 (5): 748-757
Designing RNAs that form specific secondary structures is enabling better understanding and control of living systems through RNA-guided silencing, genome editing and protein organization. Little is known, however, about which RNA secondary structures might be tractable for downstream sequence design, increasing the time and expense of design efforts due to inefficient secondary structure choices. Here, we present insights into specific structural features that increase the difficulty of finding sequences that fold into a target RNA secondary structure, summarizing the design efforts of tens of thousands of human participants and three automated algorithms (RNAInverse, INFO-RNA and RNA-SSD) in the Eterna massive open laboratory. Subsequent tests through three independent RNA design algorithms (NUPACK, DSS-Opt and MODENA) confirmed the hypothesized importance of several features in determining design difficulty, including sequence length, mean stem length, symmetry and specific difficult-to-design motifs such as zigzags. Based on these results, we have compiled an Eterna100 benchmark of 100 secondary structure design challenges that span a large range in design difficulty to help test future efforts. Our in silico results suggest new routes for improving computational RNA design methods and for extending these insights to assess "designability" of single RNA structures, as well as of switches for in vitro and in vivo applications.
View details for DOI 10.1016/j.jmb.2015.11.013
View details for PubMedID 26902426
View details for PubMedCentralID PMC4833017
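Two of the features named above, sequence length and mean stem length, can be computed directly from a target structure in dot-bracket notation. The sketch below assumes nested (pseudoknot-free) structures. It defines a stem as a maximal run of directly stacked base pairs, so a bulge or internal loop splits a helix into two stems. This definition is our assumption and may differ from the one used in the paper. Symmetry and motifs such as zigzags are not computed.

```python
from statistics import mean


def pair_table(dot_bracket: str) -> dict[int, int]:
    """Map each opening index to its closing partner (nested brackets only)."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def stem_lengths(dot_bracket: str) -> list[int]:
    """Lengths (in base pairs) of maximal runs of directly stacked pairs."""
    pairs = pair_table(dot_bracket)
    lengths = []
    for i, j in sorted(pairs.items()):
        # (i, j) continues an existing stem if (i - 1, j + 1) is also a pair.
        if pairs.get(i - 1) == j + 1:
            continue
        n = 1
        while pairs.get(i + n) == j - n:
            n += 1
        lengths.append(n)
    return lengths


def design_features(dot_bracket: str) -> dict[str, float]:
    """A few simple, hypothetical design-difficulty features."""
    stems = stem_lengths(dot_bracket)
    return {
        "length": len(dot_bracket),
        "num_stems": len(stems),
        "mean_stem_length": mean(stems) if stems else 0.0,
    }
```

For example, `design_features("((((...))))..((...))")` reports a length of 20, two stems, and a mean stem length of 3 base pairs.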
Modeling Small Noncanonical RNA Motifs with the Rosetta FARFAR Server.
Methods in molecular biology (Clifton, N.J.)
2016; 1490: 187-198
Noncanonical RNA motifs help define the vast complexity of RNA structure and function, and in many cases, these loops and junctions are on the order of only ten nucleotides in size. Unfortunately, despite their small size, there is no reliable method to determine the ensemble of lowest energy structures of junctions and loops at atomic accuracy. This chapter outlines straightforward protocols using a webserver for Rosetta Fragment Assembly of RNA with Full Atom Refinement (FARFAR) (http://rosie.rosettacommons.org/rna_denovo/submit) to model the 3D structure of small noncanonical RNA motifs for use in visualizing motifs and for further refinement or filtering with experimental data such as NMR chemical shifts.
View details for DOI 10.1007/978-1-4939-6433-8_12
View details for PubMedID 27665600
RNA Structure Refinement Using the ERRASER-Phenix Pipeline.
Methods in molecular biology (Clifton, N.J.)
2016; 1320: 269-282
The final step of RNA crystallography involves the fitting of coordinates into electron density maps. The large number of backbone atoms in RNA presents a difficult and tedious challenge, particularly when experimental density is poor. The ERRASER-Phenix pipeline can improve an initial set of RNA coordinates automatically based on a physically realistic model of atomic-level RNA interactions. The pipeline couples diffraction-based refinement in Phenix with the Rosetta-based real-space refinement protocol ERRASER (Enumerative Real-Space Refinement ASsisted by Electron density under Rosetta). The combination of ERRASER and Phenix can improve the geometrical quality of RNA crystallographic models while maintaining or improving the fit to the diffraction data (as measured by R_free). Here we present a complete tutorial for running ERRASER-Phenix through the Phenix GUI, from the command line, and via an application in the Rosetta Online Server that Includes Everyone (ROSIE).
View details for DOI 10.1007/978-1-4939-2763-0_17
View details for PubMedID 26227049
Rich RNA Structure Landscapes Revealed by Mutate-and-Map Analysis
PLOS computational biology
2015; 11 (11)
Automated band annotation for RNA structure probing experiments with numerous capillary electrophoresis profiles.
Bioinformatics
2015; 31 (17): 2808-2815
Capillary electrophoresis (CE) is a powerful approach for structural analysis of nucleic acids, with recent high-throughput variants enabling three-dimensional RNA modeling and the discovery of new rules for RNA structure design. Among the steps composing CE analysis, the process of finding each band in an electrophoretic trace and mapping it to a position in the nucleic acid sequence has required significant manual inspection and remains the most time-consuming and error-prone step. The few available tools seeking to automate this band annotation have achieved limited accuracy and have not taken advantage of information across dozens of profiles routinely acquired in high-throughput measurements.
We present a dynamic-programming-based approach to automate band annotation for high-throughput capillary electrophoresis. The approach is uniquely able to define and optimize a robust target function that takes into account multiple CE profiles (sequencing ladders, different chemical probes, different mutants) collected for the RNA. Over a large benchmark of multi-profile datasets for biological RNAs and designed RNAs from the EteRNA project, the method significantly outperforms prior tools (QuSHAPE and FAST) in accuracy, as judged against gold-standard manual annotations. The amount of computation required is reasonable at a few seconds per dataset. We also introduce an 'E-score' metric to automatically assess the reliability of the band annotation and show it to be practically useful in flagging uncertainties in band annotation for further inspection.
The implementation of the proposed algorithm is included in the HiTRACE software, freely available as an online server and for download. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btv282
View details for PubMedID 25943472
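The core of band annotation, matching an ordered list of observed peaks to an ordered list of expected band positions, is a natural fit for dynamic programming. The sketch below is a deliberately simplified, single-profile version. It assumes the expected band positions are already known (for example, from a hypothetical linear fit of mobility versus nucleotide position) and minimizes total squared deviation while allowing spurious peaks to be skipped. It is not the HiTRACE target function, which scores multiple profiles jointly.

```python
import math


def annotate_bands(peaks: list[float], expected: list[float]) -> list[int]:
    """Assign each expected band to one observed peak, preserving order.

    Returns, for each expected band, the index of the chosen peak. The total
    squared distance between expected and chosen peak positions is minimized;
    extra (spurious) peaks are left unassigned.
    """
    m, n = len(peaks), len(expected)
    if n > m:
        raise ValueError("need at least as many peaks as expected bands")

    def take_cost(i: int, j: int) -> float:
        # Cost of matching band i-1 to peak j-1 on top of the best (i-1, j-1) solution.
        return cost[i - 1][j - 1] + (expected[i - 1] - peaks[j - 1]) ** 2

    # cost[i][j]: best cost for the first i bands using only the first j peaks.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        cost[0][j] = 0.0
    for i in range(1, n + 1):
        for j in range(i, m + 1):
            skip = cost[i][j - 1]  # leave peak j-1 unassigned
            cost[i][j] = min(skip, take_cost(i, j))

    # Backtrack from the full table to recover the assignment.
    assignment = [0] * n
    i, j = n, m
    while i > 0:
        if cost[i][j] == take_cost(i, j):
            assignment[i - 1] = j - 1
            i -= 1
        j -= 1
    return assignment
```

With peaks at `[10.0, 19.5, 25.0, 30.2, 41.0]` and expected bands at `[10, 20, 30, 40]`, the function returns `[0, 1, 3, 4]`, leaving the spurious peak at 25.0 unassigned. The table has (n + 1) × (m + 1) entries, so the runtime is O(nm).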
RNA-Redesign: a web server for fixed-backbone 3D design of RNA.
Nucleic acids research
2015; 43 (W1): W498-501
RNA is rising in importance as a design medium for interrogating fundamental biology and for developing therapeutic and bioengineering applications. While there are several online servers for design of RNA secondary structure, there are no tools available for the rational design of 3D RNA structure. Here we present RNA-Redesign (http://rnaredesign.stanford.edu), an online 3D design tool for RNA. This resource utilizes fixed-backbone design to optimize the sequence identity and nucleobase conformations of an RNA to match a desired backbone, analogous to fundamental tools that underlie rational protein engineering. The resulting sequences suggest thermostabilizing mutations that can be experimentally verified. Further, sequence preferences that differ between natural and computationally designed sequences can suggest whether natural sequences possess functional constraints besides folding stability, such as cofactor binding or conformational switching. Finally, for biochemical studies, the designed sequences can suggest experimental tests of 3D models, including concomitant mutation of base triples. In addition to the designs generated, detailed graphical analysis is presented through an integrated and user-friendly environment.
View details for DOI 10.1093/nar/gkv465
View details for PubMedID 25964298
Primerize: automated primer assembly for transcribing non-coding RNA domains.
Nucleic acids research
2015; 43 (W1): W522-6
RNA-Puzzles Round II: assessment of RNA structure prediction programs applied to three large RNA structures
RNA (New York, N.Y.)
2015; 21 (6): 1066-1084
This paper is a report of a second round of RNA-Puzzles, a collective and blind experiment in three-dimensional (3D) RNA structure prediction. Three puzzles, Puzzles 5, 6, and 10, represented sequences of three large RNA structures with limited or no homology with previously solved RNA molecules. A lariat-capping ribozyme, as well as riboswitches complexed to adenosylcobalamin and tRNA, were predicted by seven groups using RNAComposer, ModeRNA/SimRNA, Vfold, Rosetta, DMD, MC-Fold, 3dRNA, and AMBER refinement. Some groups derived models using data from state-of-the-art chemical-mapping methods (SHAPE, DMS, CMCT, and mutate-and-map). The comparisons between the predictions and the three subsequently released crystallographic structures, solved at diffraction resolutions of 2.5-3.2 Å, were carried out automatically using various sets of quality indicators. The comparisons clearly demonstrate the state of present-day de novo prediction abilities as well as the limitations of these state-of-the-art methods. All of the best prediction models have similar topologies to the native structures, which suggests that computational methods for RNA structure prediction can already provide useful structural information for biological problems. However, the prediction accuracy for non-Watson-Crick interactions, key to proper folding of RNAs, is low and some predicted models had high Clash Scores. These two difficulties point to some of the continuing bottlenecks in RNA structure prediction. All submitted models are available for download at http://ahsoka.u-strasbg.fr/rnapuzzles/.
View details for DOI 10.1261/rna.049502.114
View details for Web of Science ID 000356316200002
View details for PubMedID 25883046
View details for PubMedCentralID PMC4436661
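Root-mean-square deviation (RMSD) after optimal superposition is one of the standard indicators used to compare predicted models with crystal structures in assessments such as RNA-Puzzles. A minimal NumPy sketch using the Kabsch algorithm is shown below. It assumes the two coordinate arrays are already matched atom for atom (for example, hypothetically, one phosphorus atom per nucleotide). It does not reproduce the full RNA-Puzzles assessment, which also considers non-Watson-Crick interaction accuracy and clash scores.

```python
import numpy as np


def kabsch_rmsd(model: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between two matched (N, 3) coordinate sets after optimal superposition."""
    if model.shape != reference.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centering both sets on their centroids.
    p = model - model.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix (Kabsch).
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

Because translation and rotation are removed before the deviation is measured, a model that differs from the reference only by a rigid-body motion scores an RMSD of zero.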
Modeling complex RNA tertiary folds with rosetta.
Methods in enzymology
2015; 553: 35-64
Reliable modeling of RNA tertiary structures is key both to understanding these structures' roles in complex biological machines and to eventually facilitating their design for molecular computing and robotics. In recent years, a concerted effort to improve computational prediction of RNA structure through the RNA-Puzzles blind prediction trials has accelerated advances in the field. Among other approaches, the versatile and expanding Rosetta molecular modeling software now permits modeling of RNAs in the 100-300 nucleotide size range at consistent subhelical (~1 nm) resolution. Our laboratory's current state-of-the-art methods for RNAs in this size range involve Fragment Assembly of RNA with Full-Atom Refinement (FARFAR), which optimizes RNA conformations in the context of a physically realistic energy function, as well as hybrid techniques that leverage experimental data to inform computational modeling. In this chapter, we give a practical guide to our current workflow for modeling RNA three-dimensional structures using FARFAR, including strategies for using data from multidimensional chemical mapping experiments to focus sampling and select accurate conformations.
View details for DOI 10.1016/bs.mie.2014.10.051
View details for PubMedID 25726460
Consistent global structures of complex RNA states through multidimensional chemical mapping.
Accelerating discoveries of non-coding RNA (ncRNA) in myriad biological processes pose major challenges to structural and functional analysis. Despite progress in secondary structure modeling, high-throughput methods have generally failed to determine ncRNA tertiary structures, even at the 1-nm resolution that enables visualization of how helices and functional motifs are positioned in three dimensions. We report that integrating a new method called MOHCA-seq (Multiplexed •OH Cleavage Analysis with paired-end sequencing) with mutate-and-map secondary structure inference guides Rosetta 3D modeling to consistent 1-nm accuracy for intricately folded ncRNAs with lengths up to 188 nucleotides, including a blind RNA-puzzle challenge, the lariat-capping ribozyme. This multidimensional chemical mapping (MCM) pipeline resolves unexpected tertiary proximities for cyclic-di-GMP, glycine, and adenosylcobalamin riboswitch aptamers without their ligands and a loose structure for the recently discovered human HoxA9D internal ribosome entry site regulon. MCM offers a sequencing-based route to uncovering ncRNA 3D structure, applicable to functionally important but potentially heterogeneous states.
View details for DOI 10.7554/eLife.07600
View details for PubMedID 26035425
RNA regulons in Hox 5' UTRs confer ribosome specificity to gene regulation.
2015; 517 (7532): 33-38
Emerging evidence suggests that the ribosome has a regulatory function in directing how the genome is translated in time and space. However, how this regulation is encoded in the messenger RNA sequence remains largely unknown. Here we uncover unique RNA regulons embedded in homeobox (Hox) 5' untranslated regions (UTRs) that confer ribosome-mediated control of gene expression. These structured RNA elements, resembling viral internal ribosome entry sites (IRESs), are found in subsets of Hox mRNAs. They facilitate ribosome recruitment and require the ribosomal protein RPL38 for their activity. Despite numerous layers of Hox gene regulation, these IRES elements are essential for converting Hox transcripts into proteins to pattern the mammalian body plan. This specialized mode of IRES-dependent translation is enabled by an additional regulatory element that we term the translation inhibitory element (TIE), which blocks cap-dependent translation of transcripts. Together, these data uncover a new paradigm for ribosome-mediated control of gene expression and organismal development.
View details for DOI 10.1038/nature14010
View details for PubMedID 25409156
Scientific rigor through videogames.
Trends in biochemical sciences
2014; 39 (11): 507-509
Hypothesis-driven experimentation (the scientific method) can be subverted by fraud, irreproducibility, and lack of rigorous predictive tests. A robust solution to these problems may be the 'massive open laboratory' model, recently embodied in the internet-scale videogame EteRNA. Deploying similar platforms throughout biology could enforce the scientific method more broadly.
View details for DOI 10.1016/j.tibs.2014.08.005
View details for PubMedID 25300714
- High-throughput mutate-map-rescue evaluates SHAPE-directed RNA structure and uncovers excited states RNA-A PUBLICATION OF THE RNA SOCIETY 2014; 20 (11): 1815-1826
Double-stranded RNA under force and torque: similarities to and striking differences from double-stranded DNA.
Proceedings of the National Academy of Sciences of the United States of America
2014; 111 (43): 15408-15413
RNA plays myriad roles in the transmission and regulation of genetic information that are fundamentally constrained by its mechanical properties, including the elasticity and conformational transitions of the double-stranded (dsRNA) form. Although double-stranded DNA (dsDNA) mechanics have been dissected with exquisite precision, much less is known about dsRNA. Here we present a comprehensive characterization of dsRNA under external forces and torques using magnetic tweezers. We find that dsRNA has a force-torque phase diagram similar to that of dsDNA, including plectoneme formation, melting of the double helix induced by torque, a highly overwound state termed "P-RNA," and a highly underwound, left-handed state denoted "L-RNA." Beyond these similarities, our experiments reveal two unexpected behaviors of dsRNA: Unlike dsDNA, dsRNA shortens upon overwinding, and its characteristic transition rate at the plectonemic buckling transition is two orders of magnitude slower than for dsDNA. Our results challenge current models of nucleic acid mechanics, provide a baseline for modeling RNAs in biological contexts, and pave the way for new classes of magnetic tweezers experiments to dissect the role of twist and torque for RNA-protein interactions at the single-molecule level.
View details for DOI 10.1073/pnas.1407197111
View details for PubMedID 25313077
Blind predictions of DNA and RNA tweezers experiments with force and torque.
PLoS computational biology
2014; 10 (8)
Single-molecule tweezers measurements of double-stranded nucleic acids (dsDNA and dsRNA) provide unprecedented opportunities to dissect how these fundamental molecules respond to forces and torques analogous to those applied by topoisomerases, viral capsids, and other biological partners. However, tweezers data are still most commonly interpreted post facto in the framework of simple analytical models. Testing falsifiable predictions of state-of-the-art nucleic acid models would be more illuminating but has not been performed. Here we describe a blind challenge in which numerical predictions of nucleic acid mechanical properties were compared to experimental data obtained recently for dsRNA under applied force and torque. The predictions were enabled by the HelixMC package, first presented in this paper. HelixMC advances crystallography-derived base-pair level models (BPLMs) to simulate kilobase-length dsDNAs and dsRNAs under external forces and torques, including their global linking numbers. These calculations recovered the experimental bending persistence length of dsRNA within the error of the simulations and accurately predicted that dsRNA's "spring-like" conformation would give a two-fold decrease of stretch modulus relative to dsDNA. Further blind predictions of helix torsional properties, however, exposed inaccuracies in current BPLM theory, including three-fold discrepancies in torsional persistence length at the high force limit and the incorrect sign of dsRNA link-extension (twist-stretch) coupling. Beyond these experiments, HelixMC predicted that 'nucleosome-excluding' poly(A)/poly(T) is at least two-fold stiffer than random-sequence dsDNA in bending, stretching, and torsional behaviors; that Z-DNA is at least three-fold stiffer than random-sequence dsDNA, with a near-zero link-extension coupling; and that base pair step correlations have non-negligible effects.
We propose that experimentally testing these predictions should be powerful next steps for understanding the flexibility of dsDNA and dsRNA in sequence contexts and under mechanical stresses relevant to their biology.
View details for DOI 10.1371/journal.pcbi.1003756
View details for PubMedID 25102226
Understanding nucleic acid-ion interactions.
Annual review of biochemistry
2014; 83: 813-841
Ions surround nucleic acids in what is referred to as an ion atmosphere. As a result, the folding and dynamics of RNA and DNA and their complexes with proteins and with each other cannot be understood without a reasonably sophisticated appreciation of these ions' electrostatic interactions. However, the underlying behavior of the ion atmosphere follows physical rules that are distinct from the rules of site binding that biochemists are most familiar and comfortable with. The main goal of this review is to familiarize nucleic acid experimentalists with the physical concepts that underlie nucleic acid-ion interactions. Throughout, we provide practical strategies for interpreting and analyzing nucleic acid experiments that avoid pitfalls from oversimplified or incorrect models. We briefly review the status of theories that predict or simulate nucleic acid-ion interactions and experiments that test these theories. Finally, we describe opportunities for going beyond phenomenological fits to a next-generation, truly predictive understanding of nucleic acid-ion interactions.
View details for DOI 10.1146/annurev-biochem-060409-092720
View details for PubMedID 24606136
Standardization of RNA chemical mapping experiments.
2014; 53 (19): 3063-3065
Chemical mapping experiments offer powerful information about RNA structure but currently involve ad hoc assumptions in data processing. We show that simple dilutions, referencing standards (GAGUA hairpins), and HiTRACE/MAPseeker analysis allow rigorous overmodification correction, background subtraction, and normalization for electrophoretic data and a ligation bias correction needed for accurate deep sequencing data. Comparisons across six noncoding RNAs stringently test the proposed standardization of dimethyl sulfate (DMS), 2'-OH acylation (SHAPE), and carbodiimide measurements. Identification of new signatures for extrahelical bulges and DMS "hot spot" pockets (including tRNA A58, methylated in vivo) illustrates the utility and necessity of standardization for quantitative RNA mapping.
View details for DOI 10.1021/bi5003426
View details for PubMedID 24766159
View details for PubMedCentralID PMC4033625
Structure determination of noncanonical RNA motifs guided by ¹H NMR chemical shifts.
2014; 11 (4): 413-416
Structured noncoding RNAs underlie fundamental cellular processes, but determining their three-dimensional structures remains challenging. We demonstrate that integrating ¹H NMR chemical shift data with Rosetta de novo modeling can be used to consistently determine high-resolution RNA structures. On a benchmark set of 23 noncanonical RNA motifs, including 11 'blind' targets, chemical-shift Rosetta for RNA (CS-Rosetta-RNA) recovered experimental structures with high accuracy (0.6-2.0 Å all-heavy-atom r.m.s. deviation) in 18 cases.
View details for DOI 10.1038/nmeth.2876
View details for PubMedID 24584194
View details for PubMedCentralID PMC3985481
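The CS-Rosetta-RNA accuracies above are all-heavy-atom r.m.s. deviations, measured after optimal superposition of model and experimental structure. The minimal NumPy sketch below computes that quantity with the Kabsch algorithm. It assumes the two coordinate arrays already list the same heavy atoms in the same order; matching atoms between model and reference is not shown. The function name is illustrative.

```python
import numpy as np


def superposed_rmsd(model: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition."""
    # Remove translation by centering both structures on their centroids.
    p = model - model.mean(axis=0)
    q = reference - reference.mean(axis=0)
    # Kabsch: the SVD of the 3x3 covariance matrix yields the optimal rotation.
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = p @ rotation.T
    return float(np.sqrt(((aligned - q) ** 2).sum(axis=1).mean()))


rng = np.random.default_rng(0)
ref = rng.normal(size=(50, 3))                      # stand-in heavy-atom coordinates
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = ref @ rot.T + np.array([5.0, -2.0, 1.0])    # rigidly rotated and translated copy
print(superposed_rmsd(moved, ref))                  # ~0 up to floating-point error
```

The reflection check matters because a mirror-image model should not score as a perfect match.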
Bayesian energy landscape tilting: towards concordant models of molecular ensembles.
2014; 106 (6): 1381-1390
Predicting biological structure has remained challenging for systems such as disordered proteins that take on myriad conformations. Hybrid simulation/experiment strategies have been undermined by difficulties in evaluating errors from computational model inaccuracies and data uncertainties. Building on recent proposals from maximum entropy theory and nonequilibrium thermodynamics, we address these issues through a Bayesian energy landscape tilting (BELT) scheme for computing Bayesian hyperensembles over conformational ensembles. BELT uses Markov chain Monte Carlo to directly sample maximum-entropy conformational ensembles consistent with a set of input experimental observables. To test this framework, we apply BELT to model trialanine, starting from disagreeing simulations with the force fields ff96, ff99, ff99sbnmr-ildn, CHARMM27, and OPLS-AA. BELT incorporation of limited chemical shift and ³J measurements gives convergent values of the peptide's α, β, and PPII conformational populations in all cases. As a test of predictive power, all five BELT hyperensembles recover set-aside measurements not used in the fitting and report accurate errors, even when starting from highly inaccurate simulations. BELT's principled framework thus enables practical predictions for complex biomolecular systems from discordant simulations and sparse data.
View details for DOI 10.1016/j.bpj.2014.02.009
View details for PubMedID 24655513
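The tilting idea behind BELT can be illustrated with a one-observable toy. Assume a set of simulated conformations, each with a computed observable value f_i. A maximum-entropy tilt reweights them as w_i ∝ exp(-λ f_i). The parameter λ is then treated as uncertain and sampled by Metropolis Monte Carlo against a Gaussian likelihood for the measurement. The Gaussian prior on λ, the error model, and all function names are illustrative assumptions. This is a sketch of the general technique, not the published BELT implementation.

```python
import numpy as np


def tilted_average(f: np.ndarray, lam: float) -> float:
    """Ensemble average of f after maximum-entropy tilting, w_i proportional to exp(-lam * f_i)."""
    logw = -lam * f
    logw -= logw.max()              # stabilise the exponentials
    w = np.exp(logw)
    w /= w.sum()
    return float(w @ f)


def sample_lambda(f, y_obs, sigma, prior_scale=10.0, n_steps=5000, step=0.1, seed=0):
    """Metropolis sampling of lam for one measurement y_obs with Gaussian error sigma.

    A broad Gaussian prior on lam (an assumption of this sketch) keeps the posterior proper.
    """
    rng = np.random.default_rng(seed)

    def log_post(lam):
        pred = tilted_average(f, lam)
        return -0.5 * ((pred - y_obs) / sigma) ** 2 - 0.5 * (lam / prior_scale) ** 2

    lam, lp = 0.0, log_post(0.0)
    chain = np.empty(n_steps)
    for k in range(n_steps):
        prop = lam + step * rng.normal()
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # Metropolis acceptance rule
            lam, lp = prop, lp_prop
        chain[k] = lam
    return chain


rng = np.random.default_rng(1)
f = rng.normal(0.0, 1.0, size=2000)          # observable computed on each simulated conformation
chain = sample_lambda(f, y_obs=0.5, sigma=0.05)
post = [tilted_average(f, lam) for lam in chain[1000:]]   # discard burn-in
print(np.mean(post))                          # close to the measured 0.5
```

Each sampled λ defines a whole reweighted ensemble, so the chain is a distribution over ensembles. This loosely mirrors the "hyperensemble" described in the abstract.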
RNA design rules from a massive open laboratory
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2014; 111 (6): 2122-2127
Self-assembling RNA molecules present compelling substrates for the rational interrogation and control of living systems. However, imperfect in silico models, even at the secondary structure level, hinder the design of new RNAs that function properly when synthesized. Here, we present a unique and potentially general approach to such empirical problems: the Massive Open Laboratory. The EteRNA project connects 37,000 enthusiasts to RNA design puzzles through an online interface. Uniquely, EteRNA participants not only manipulate simulated molecules but also control a remote experimental pipeline for high-throughput RNA synthesis and structure mapping. We show herein that the EteRNA community leveraged dozens of cycles of continuous wet laboratory feedback to learn strategies for solving in vitro RNA design problems on which automated methods fail. The top strategies, including several previously unrecognized negative design rules, were distilled by machine learning into an algorithm, EteRNABot. Over a rigorous 1-year testing phase, both the EteRNA community and EteRNABot significantly outperformed prior algorithms in a dozen RNA secondary structure design tests, including the creation of dendrimer-like structures and scaffolds for small molecule sensors. These results show that an online community can carry out large-scale experiments, hypothesis generation, and algorithm design to create practical advances in empirical science.
View details for DOI 10.1073/pnas.1313039111
View details for Web of Science ID 000330999600027
View details for PubMedID 24469816
Challenging the state of the art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2014; 82: 26-42
For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, more than 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this article, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict transmembrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin (IL)-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fiber protein gene product 17 from bacteriophage T7; the bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally, an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins.
View details for Web of Science ID 000331147900004
View details for PubMedID 24318984
The Mutate-and-Map Protocol for Inferring Base Pairs in Structured RNA.
Methods in molecular biology (Clifton, N.J.)
2014; 1086: 53-77
Chemical mapping is a widespread technique for structural analysis of nucleic acids in which a molecule's reactivity to different probes is quantified at single nucleotide resolution and used to constrain structural modeling. This experimental framework has been extensively revisited in the past decade with new strategies for high-throughput readouts, chemical modification, and rapid data analysis. Recently, we have coupled the technique to high-throughput mutagenesis. Point mutations of a base-paired nucleotide can lead to exposure of not only that nucleotide but also its interaction partner. Systematically carrying out the mutation and mapping for the entire system gives an experimental approximation of the molecule's "contact map." Here, we give our in-house protocol for this "mutate-and-map" (M2) strategy, based on 96-well capillary electrophoresis, and we provide practical tips on interpreting the data to infer nucleic acid structure.
View details for DOI 10.1007/978-1-62703-667-2_4
View details for PubMedID 24136598
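The mutate-and-map logic above can be illustrated with an idealized toy model. It assumes a perfect picture in which mutating a paired nucleotide releases exactly that nucleotide and its partner and nothing else. Real M2 data are noisy and can show larger rearrangements, which is why the protocol relies on careful data interpretation. The dot-bracket input, the function names, and the median-baseline rule are hypothetical choices for illustration, not part of the published protocol.

```python
import numpy as np


def pairs_from_dot_bracket(structure: str) -> dict:
    """Map each paired position to its partner for a pseudoknot-free dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def simulate_m2(structure: str) -> np.ndarray:
    """Idealized M2 matrix: row = mutated position, column = reactivity of each nucleotide."""
    n = len(structure)
    partner = pairs_from_dot_bracket(structure)
    base = np.array([0.0 if i in partner else 1.0 for i in range(n)])  # unpaired = reactive
    m2 = np.tile(base, (n, 1))
    for mut, part in partner.items():
        m2[mut, mut] = 1.0     # the mutated nucleotide loses its pairing
        m2[mut, part] = 1.0    # and its former partner becomes exposed too
    return m2


def infer_pairs(m2: np.ndarray, threshold: float = 0.5) -> set:
    """Read off pairs as positions whose reactivity rises above the per-column baseline."""
    baseline = np.median(m2, axis=0)    # most mutations leave a given column unchanged
    rise = m2 - baseline                # reactivity change caused by each mutation
    pairs = set()
    for mut in range(m2.shape[0]):
        for j in np.flatnonzero(rise[mut] > threshold):
            if j != mut:
                pairs.add((min(mut, int(j)), max(mut, int(j))))
    return pairs


print(sorted(infer_pairs(simulate_m2("((((....))))"))))  # [(0, 11), (1, 10), (2, 9), (3, 8)]
```

The off-diagonal signals in each row are what make the matrix act as an approximate contact map.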
Massively Parallel RNA Chemical Mapping with a Reduced Bias MAP-Seq Protocol.
Methods in molecular biology (Clifton, N.J.)
2014; 1086: 95-117
Chemical mapping methods probe RNA structure by revealing and leveraging correlations of a nucleotide's structural accessibility or flexibility with its reactivity to various chemical probes. Pioneering work by Lucks and colleagues has expanded this method to probe hundreds of molecules at once on an Illumina sequencing platform, obviating the use of slab gels or capillary electrophoresis on one molecule at a time. Here, we describe optimizations to this method from our lab, resulting in the MAP-seq protocol (Multiplexed Accessibility Probing read out through sequencing), version 1.0. The protocol permits the quantitative probing of thousands of RNAs at once, by several chemical modification reagents, on the time scale of a day using a tabletop Illumina machine. This method and a software package MAPseeker (http://simtk.org/home/map_seeker) address several potential sources of bias, by eliminating PCR steps, improving ligation efficiencies of ssDNA adapters, and avoiding problematic heuristics in prior algorithms. We hope that the step-by-step description of MAP-seq 1.0 will help other RNA mapping laboratories to transition from electrophoretic to next-generation sequencing methods and to further reduce the turnaround time and any remaining biases of the protocol.
View details for DOI 10.1007/978-1-62703-667-2_6
View details for PubMedID 24136600
Atomic-Accuracy Prediction of Protein Loop Structures through an RNA-Inspired Ansatz
2013; 8 (10)
Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable as loop lengths exceed 10 residues and if surrounding side-chain conformations are erased. Current approaches, such as the protein local optimization protocol or kinematic inversion closure (KIC) Monte Carlo, involve stages that coarse-grain proteins, simplifying modeling but precluding a systematic search of all-atom configurations. This article introduces an alternative modeling strategy based on a 'stepwise ansatz', recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA) protocol enables enumerative sampling of a 12 residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA achieves sub-Angstrom accuracy models, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth 'RNA-puzzle' competition. 
These results establish all-atom enumeration as an unusually systematic approach to ab initio protein structure modeling that can leverage high performance computing and physically realistic energy functions to more consistently achieve atomic accuracy.
View details for DOI 10.1371/journal.pone.0074830
View details for Web of Science ID 000326032600003
View details for PubMedID 24204571
Adding Diverse Noncanonical Backbones to Rosetta: Enabling Peptidomimetic Design
2013; 8 (7)
Peptidomimetics are classes of molecules that mimic structural and functional attributes of polypeptides. Peptidomimetic oligomers can frequently be synthesized using efficient solid phase synthesis procedures similar to peptide synthesis. Conformationally ordered peptidomimetic oligomers are finding broad applications for molecular recognition and for inhibiting protein-protein interactions. One critical limitation is the limited set of design tools for identifying oligomer sequences that can adopt desired conformations. Here, we present expansions to the ROSETTA platform that enable structure prediction and design of five non-peptidic oligomer scaffolds (noncanonical backbones): oligooxopiperazines, oligo-peptoids, β-peptides, hydrogen bond surrogate helices, and oligosaccharides. This work is complementary to prior additions to model noncanonical protein side chains in ROSETTA. The main purpose of our manuscript is to give current and future developers a detailed description of how each of these noncanonical backbones was implemented. Furthermore, we provide a general outline for implementation of new backbone types not discussed here. To illustrate the utility of this approach, we describe the first tests of the ROSETTA molecular mechanics energy function in the context of oligooxopiperazines, using quantum mechanical calculations as comparison points, scanning through backbone and side chain torsion angles for a model peptidomimetic. Finally, as an example of a novel design application, we describe the automated design of an oligooxopiperazine that inhibits the p53-MDM2 protein-protein interaction. For the general biological and bioengineering community, several noncanonical backbones have been incorporated into web applications that allow users to freely and rapidly test the presented protocols (http://rosie.rosettacommons.org).
This work helps address the peptidomimetic community's need for an automated and expandable modeling tool for noncanonical backbones.
View details for DOI 10.1371/journal.pone.0067051
View details for Web of Science ID 000323110600005
View details for PubMedID 23869206
View details for PubMedCentralID PMC3712014
- HiTRACE-Web: an online tool for robust analysis of high-throughput capillary electrophoresis NUCLEIC ACIDS RESEARCH 2013; 41 (W1): W492-W498
Serverification of Molecular Modeling Applications: The Rosetta Online Server That Includes Everyone (ROSIE)
2013; 8 (5)
The Rosetta molecular modeling software package provides experimentally tested and rapidly evolving tools for the 3D structure prediction and high-resolution design of proteins, nucleic acids, and a growing number of non-natural polymers. Despite its free availability to academic users and improving documentation, use of Rosetta has largely remained confined to developers and their immediate collaborators due to the code's difficulty of use, the requirement for large computational resources, and the unavailability of servers for most of the Rosetta applications. Here, we present a unified web framework for Rosetta applications called ROSIE (Rosetta Online Server that Includes Everyone). ROSIE provides (a) a common user interface for Rosetta protocols, (b) a stable application programming interface for developers to add additional protocols, (c) a flexible back-end to allow leveraging of computer cluster resources shared by RosettaCommons member institutions, and (d) centralized administration by the RosettaCommons to ensure continuous maintenance. This paper describes the ROSIE server infrastructure, a step-by-step 'serverification' protocol for use by Rosetta developers, and the deployment of the first nine ROSIE applications by six separate developer teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance, Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated by the number and diversity of these applications, ROSIE offers a general and speedy paradigm for serverification of Rosetta applications that incurs negligible cost to developers and lowers barriers to Rosetta use for the broader biological community. ROSIE is available at http://rosie.rosettacommons.org.
View details for DOI 10.1371/journal.pone.0063906
View details for Web of Science ID 000320362700078
View details for PubMedID 23717507
- Remodeling a beta-peptide bundle CHEMICAL SCIENCE 2013; 4 (1): 319-324
- Correcting pervasive errors in RNA crystallography through enumerative structure prediction NATURE METHODS 2013; 10 (1): 74-U105
Advances, Interactions, and Future Developments in the CNS, Phenix, and Rosetta Structural Biology Software Systems
ANNUAL REVIEW OF BIOPHYSICS, VOL 42
2013; 42: 265-287
Advances in our understanding of macromolecular structure come from experimental methods, such as X-ray crystallography, and also computational analysis of the growing number of atomic models obtained from such experiments. The latter analyses have made it possible to develop powerful tools for structure prediction and optimization in the absence of experimental data. In recent years, a synergy between these computational methods for crystallographic structure determination and structure prediction and optimization has begun to be exploited. We review some of the advances in the algorithms used for crystallographic structure determination in the Phenix and Crystallography & NMR System software packages and describe how methods from ab initio structure prediction and refinement in Rosetta have been applied to challenging crystallographic problems. The prospects for future improvement of these methods are discussed.
View details for DOI 10.1146/annurev-biophys-083012-130253
View details for Web of Science ID 000321695700013
View details for PubMedID 23451892
An RNA Mapping DataBase for curating RNA structure mapping experiments
2012; 28 (22): 3006-3008
We have established an RNA mapping database (RMDB) to enable structural, thermodynamic and kinetic comparisons across single-nucleotide-resolution RNA structure mapping experiments. The volume of structure mapping data has greatly increased since the development of high-throughput sequencing techniques, accelerated software pipelines and large-scale mutagenesis. For scientists wishing to infer relationships between RNA sequence/structure and these mapping data, there is a need for a database that is curated, tagged with error estimates and interfaced with tools for sharing, visualization, search and meta-analysis. Through its on-line front-end, the RMDB allows users to explore single-nucleotide-resolution mapping data in heat-map, bar-graph and colored secondary structure graphics; to leverage these data to generate secondary structure hypotheses; and to download the data in standardized and computer-friendly files, including the RDAT and community-consensus SNRNASM formats. At the time of writing, the database houses 53 entries, describing more than 2848 experiments of 1098 RNA constructs in several solution conditions, and is growing rapidly. The database is freely available on the web. Supplementary data are available at Bioinformatics Online.
View details for DOI 10.1093/bioinformatics/bts554
View details for Web of Science ID 000311303500028
View details for PubMedID 22976082
Quantitative Dimethyl Sulfate Mapping for Automated RNA Secondary Structure Inference
2012; 51 (36): 7037-7039
For decades, dimethyl sulfate (DMS) mapping has informed manual modeling of RNA structure in vitro and in vivo. Here, we incorporate DMS data into automated secondary structure inference using an energy minimization framework developed for 2'-OH acylation (SHAPE) mapping. On six noncoding RNAs with crystallographic models, DMS-guided modeling achieves overall false negative and false discovery rates of 9.5% and 11.6%, respectively, comparable to or better than those of SHAPE-guided modeling, and bootstrapping provides straightforward confidence estimates. Integrating DMS-SHAPE data and including 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMCT) reactivities provide small additional improvements. These results establish DMS mapping, an already routine technique, as a quantitative tool for unbiased RNA secondary structure modeling.
View details for DOI 10.1021/bi3008802
View details for Web of Science ID 000308833500001
View details for PubMedID 22913637
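The false negative and false discovery rates in the DMS paper above compare predicted base pairs against pairs in the crystallographic models. The minimal sketch below assumes pairs are given as (i, j) index tuples. Here the FNR is the fraction of reference pairs the model misses, and the FDR is the fraction of predicted pairs absent from the reference. Pooling counts across RNAs is one reasonable reading of "overall"; the paper's exact aggregation is not stated here. The bootstrap confidence estimates require re-running structure prediction on resampled data and are not shown. The function name is illustrative.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def pooled_error_rates(cases: Iterable[Tuple[Set[Pair], Set[Pair]]]) -> Tuple[float, float]:
    """Return (false negative rate, false discovery rate) pooled over (predicted, reference) cases."""
    tp = n_ref = n_pred = 0
    for predicted, reference in cases:
        tp += len(predicted & reference)   # correctly predicted pairs
        n_ref += len(reference)            # all pairs in the reference structures
        n_pred += len(predicted)           # all pairs the model proposed
    if n_ref == 0 or n_pred == 0:
        raise ValueError("need at least one reference pair and one predicted pair")
    return 1.0 - tp / n_ref, 1.0 - tp / n_pred


cases = [
    ({(0, 10), (1, 9), (2, 8)}, {(0, 10), (1, 9), (2, 8), (3, 7)}),  # one pair missed
    ({(0, 5), (1, 4)}, {(0, 5)}),                                    # one spurious pair
]
fnr, fdr = pooled_error_rates(cases)
print(f"FNR {fnr:.1%}, FDR {fdr:.1%}")  # FNR 20.0%, FDR 20.0%
```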
Squaring theory with practice in RNA design
CURRENT OPINION IN STRUCTURAL BIOLOGY
2012; 22 (4): 457-466
Ribonucleic acid (RNA) design offers unique opportunities for engineering genetic networks and nanostructures that self-assemble within living cells. Recent years have seen the creation of increasingly complex RNA devices, including proof-of-concept applications for in vivo three-dimensional scaffolding, imaging, computing, and control of biological behaviors. Expert intuition and simple design rules--the stability of double helices, the modularity of noncanonical RNA motifs, and geometric closure--have enabled these successful applications. Going beyond heuristics, emerging algorithms may enable automated design of RNAs with nucleotide-level accuracy but, as illustrated on a recent RNA square design, are not yet fully predictive. Looking ahead, technological advances in RNA synthesis and interrogation are poised to radically accelerate the discovery and stringent testing of design methods.
View details for DOI 10.1016/j.sbi.2012.06.003
View details for Web of Science ID 000308516800009
View details for PubMedID 22832174
Ultraviolet Shadowing of RNA Can Cause Significant Chemical Damage in Seconds
Chemical purity of RNA samples is important for high-precision studies of RNA folding and catalytic behavior, but photodamage accrued during ultraviolet (UV) shadowing steps of sample preparation can reduce this purity. Here, we report the quantitation of UV-induced damage by using reverse transcription and single-nucleotide-resolution capillary electrophoresis. We found photolesions in a dozen natural and artificial RNAs; across multiple sequence contexts, dominantly at but not limited to pyrimidine doublets; and from multiple lamps recommended for UV shadowing. Irradiation time-courses revealed detectable damage within a few seconds of exposure for 254 nm lamps held at a distance of 5 to 10 cm from 0.5-mm thickness gels. Under these conditions, 200-nucleotide RNAs subjected to 20 seconds of UV shadowing incurred damage to 16-27% of molecules; and, due to a 'skin effect', the molecule-by-molecule distribution of lesions gave 4-fold higher variance than a Poisson distribution. Thicker gels, longer wavelength lamps, and shorter exposure times reduced but did not eliminate damage. These results suggest that RNA biophysical studies should report precautions taken to avoid artifactual heterogeneity from UV shadowing.
View details for DOI 10.1038/srep00517
View details for Web of Science ID 000306707600001
View details for PubMedID 22816040
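The UV-shadowing damage fractions above can be read in a rough, back-of-the-envelope way. Suppose lesions struck molecules independently, following a Poisson model, which the abstract notes underestimates the real variance. Then a damaged fraction f corresponds to a mean of λ = -ln(1 - f) lesions per molecule. The sketch below converts the reported 16-27% range under that simplifying assumption; the function name is illustrative.

```python
import math


def mean_lesions_from_damaged_fraction(f: float) -> float:
    """Poisson mean lambda such that P(at least one lesion) = 1 - exp(-lambda) = f."""
    if not 0.0 <= f < 1.0:
        raise ValueError("damaged fraction must be in [0, 1)")
    return -math.log1p(-f)


for f in (0.16, 0.27):
    lam = mean_lesions_from_damaged_fraction(f)
    print(f"{f:.0%} damaged -> {lam:.2f} lesions per molecule on average")
# 16% damaged -> 0.17 lesions per molecule on average
# 27% damaged -> 0.31 lesions per molecule on average
```

Because the observed lesion distribution was about 4-fold more dispersed than Poisson (the 'skin effect'), these means are only a rough guide to how lesions are spread across molecules.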
Metal-ion rescue revisited: Biochemical detection of site-bound metal ions important for RNA folding
RNA-A PUBLICATION OF THE RNA SOCIETY
2012; 18 (6): 1123-1141
Within the three-dimensional architectures of RNA molecules, divalent metal ions populate specific locations, shedding their water molecules to form chelates. These interactions help the RNA adopt and maintain specific conformations and frequently make essential contributions to function. Defining the locations of these site-bound metal ions remains challenging despite the growing database of RNA structures. Metal-ion rescue experiments have provided a powerful approach to identify and distinguish catalytic metal ions within RNA active sites, but the ability of such experiments to identify metal ions that contribute to tertiary structure acquisition and structural stability is less developed and has been challenged. Herein, we use the well-defined P4-P6 RNA domain of the Tetrahymena group I intron to reevaluate prior evidence against the discriminatory power of metal-ion rescue experiments and to advance thermodynamic descriptions necessary for interpreting these experiments. The approach successfully identifies ligands within the RNA that occupy the inner coordination sphere of divalent metal ions and distinguishes them from ligands that occupy the outer coordination sphere. Our results underscore the importance of obtaining complete folding isotherms and establishing and evaluating thermodynamic models in order to draw conclusions from metal-ion rescue experiments. These results establish metal-ion rescue as a rigorous tool for identifying and dissecting energetically important metal-ion interactions in RNAs that are noncatalytic but critical for RNA tertiary structure.
View details for DOI 10.1261/rna.028738.111
View details for Web of Science ID 000304423000003
View details for PubMedID 22539523
View details for PubMedCentralID PMC3358636
RNA-Puzzles: A CASP-like evaluation of RNA three-dimensional structure prediction
RNA-A PUBLICATION OF THE RNA SOCIETY
2012; 18 (4): 610-625
We report the results of a first, collective, blind experiment in RNA three-dimensional (3D) structure prediction, encompassing three prediction puzzles. The goals are to assess the leading edge of RNA structure prediction techniques; compare existing methods and tools; and evaluate their relative strengths, weaknesses, and limitations in terms of sequence length and structural complexity. The results should give potential users insight into the suitability of available methods for different applications and support the RNA structure prediction community's ongoing efforts to improve prediction tools. We also report the creation of an automated evaluation pipeline to facilitate the analysis of future RNA structure prediction exercises.
View details for DOI 10.1261/rna.031054.111
View details for Web of Science ID 000301954600002
View details for PubMedID 22361291
View details for PubMedCentralID PMC3312550
Automated RNA Structure Prediction Uncovers a Kink-Turn Linker in Double Glycine Riboswitches
JOURNAL OF THE AMERICAN CHEMICAL SOCIETY
2012; 134 (3): 1404-1407
The tertiary structures of functional RNA molecules remain difficult to decipher. A new generation of automated RNA structure prediction methods may help address these challenges but have not yet been experimentally validated. Here we apply four prediction tools to a class of double glycine riboswitches that can bind two ligands cooperatively. A novel method (BPPalign), RMdetect, JAR3D, and Rosetta 3D modeling give consistent predictions for a new stem P0 and a kink-turn motif. These elements structure the linker between the RNAs' double aptamers. Chemical mapping on the Fusobacterium nucleatum riboswitch with N-methylisatoic anhydride, dimethyl sulfate and 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate probing, mutate-and-map studies, and mutation/rescue experiments all provide strong evidence for the structured linker. Under solution conditions that permit rigorous thermodynamic analysis, disrupting this helix-junction-helix structure gives 120- and 6-30-fold poorer dissociation constants for the RNA's two glycine-binding transitions, corresponding to an overall energetic impact of 4.3 ± 0.5 kcal/mol. Prior biochemical and crystallography studies did not include this critical element due to over-truncation of the RNA. We speculate that several further undiscovered elements are likely to exist in the flanking regions of this and other functional RNAs, and automated prediction tools can play a useful role in their detection and dissection.
View details for DOI 10.1021/ja2093508
View details for Web of Science ID 000301084400005
View details for PubMedID 22192063
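The glycine-riboswitch abstract above converts fold changes in dissociation constants into an energetic impact. A fold change f corresponds to a free-energy penalty of RT ln f. The sketch below assumes 25 °C and that the penalties of the two glycine-binding transitions simply add; the abstract states neither, so this is only a rough consistency check against the reported 4.3 ± 0.5 kcal/mol, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def ddg_from_fold(fold_change, temp_k=298.15):
    """Free-energy penalty (kcal/mol) for a fold change in a dissociation constant."""
    return R_KCAL * temp_k * math.log(fold_change)

# 120-fold for one transition, 6- to 30-fold for the other (assumed additive).
low = ddg_from_fold(120) + ddg_from_fold(6)
high = ddg_from_fold(120) + ddg_from_fold(30)
print(f"{low:.1f} to {high:.1f} kcal/mol")  # 3.9 to 4.9 kcal/mol
```

Under these assumptions the range of roughly 3.9 to 4.9 kcal/mol overlaps the reported value; the actual analysis may use a different temperature or thermodynamic model.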
An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2011; 108 (51): 20573-20578
Atomic-accuracy structure prediction of macromolecules should be achievable by optimizing a physically realistic energy function but is presently precluded by incomplete sampling of a biopolymer's many degrees of freedom. We present herein a working hypothesis, called the "stepwise ansatz," for recursively constructing well-packed atomic-detail models in small steps, enumerating several million conformations for each monomer, and covering all build-up paths. By making use of high-performance computing and the Rosetta framework, we provide first tests of this hypothesis on a benchmark of 15 RNA loop-modeling problems drawn from riboswitches, ribozymes, and the ribosome, including 10 cases that are not solvable by current knowledge-based modeling approaches. For each loop problem, this deterministic stepwise assembly method either reaches atomic accuracy or exposes flaws in Rosetta's all-atom energy function, indicating the resolution of the conformational sampling bottleneck. As a further rigorous test, we have carried out a blind all-atom prediction for a noncanonical RNA motif, the C7.2 tetraloop/receptor, and validated this model through nucleotide-resolution chemical mapping experiments. Stepwise assembly is an enumerative, ab initio build-up method that systematically outperforms existing Monte Carlo and knowledge-based methods for 3D structure prediction.
View details for DOI 10.1073/pnas.1106516108
View details for Web of Science ID 000298289400065
View details for PubMedID 22143768
A two-dimensional mutate-and-map strategy for non-coding RNA structure
2011; 3 (12): 954-962
Non-coding RNAs fold into precise base-pairing patterns to carry out critical roles in genetic regulation and protein synthesis, but determining RNA structure remains difficult. Here, we show that coupling systematic mutagenesis with high-throughput chemical mapping enables accurate base-pair inference of domains from ribosomal RNA, ribozymes and riboswitches. For a six-RNA benchmark that has challenged previous chemical/computational methods, this 'mutate-and-map' strategy gives secondary structures that are in agreement with crystallography (helix error rates, 2%), including a blind test on a double-glycine riboswitch. Through modelling of partially ordered states, the method enables the first test of an interdomain helix-swap hypothesis for ligand-binding cooperativity in a glycine riboswitch. Finally, the data report on tertiary contacts within non-coding RNAs, and coupling to the Rosetta/FARFAR algorithm gives nucleotide-resolution three-dimensional models (helix root-mean-squared deviation, 5.7 Å) of an adenine riboswitch. These results establish a promising two-dimensional chemical strategy for inferring the secondary and tertiary structures that underlie non-coding RNA behaviour.
View details for DOI 10.1038/NCHEM.1176
View details for Web of Science ID 000297685800014
View details for PubMedID 22109276
Understanding the Errors of SHAPE-Directed RNA Structure Modeling
2011; 50 (37): 8049-8056
Single-nucleotide-resolution chemical mapping for structured RNA is being rapidly advanced by new chemistries, faster readouts, and coupling to computational algorithms. Recent tests have shown that selective 2'-hydroxyl acylation by primer extension (SHAPE) can give near-zero error rates (0-2%) in modeling the helices of RNA secondary structure. Here, we benchmark the method using six molecules for which crystallographic data are available: tRNA(Phe) and 5S rRNA from Escherichia coli, the P4-P6 domain of the Tetrahymena group I ribozyme, and ligand-bound domains from riboswitches for adenine, cyclic di-GMP, and glycine. SHAPE-directed modeling of these highly structured RNAs gave an overall false negative rate (FNR) of 17% and a false discovery rate (FDR) of 21%, with at least one helix prediction error in five of the six cases. Extensive variations of data processing, normalization, and modeling parameters did not significantly mitigate modeling errors. Only one variation, filtering out data collected with deoxyinosine triphosphate during primer extension, gave a modest improvement (FNR = 12%, FDR = 14%). The residual structure modeling errors are explained by the insufficient information content of these RNAs' SHAPE data, as evaluated by a nonparametric bootstrapping analysis. Beyond these benchmark cases, bootstrapping suggests a low level of confidence (<50%) in the majority of helices in a previously proposed SHAPE-directed model for the HIV-1 RNA genome. Thus, SHAPE-directed RNA modeling is not always unambiguous, and helix-by-helix confidence estimates, as described herein, may be critical for interpreting results from this powerful methodology.
View details for DOI 10.1021/bi200524n
View details for Web of Science ID 000294791100021
View details for PubMedID 21842868
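The error rates in the SHAPE benchmark above follow the standard definitions FNR = FN / (TP + FN) and FDR = FP / (TP + FP). The sketch below scores individual base pairs rather than whole helices, which is a simplification of the helix-level scoring described in the abstract; the pairs and the function name are hypothetical.

```python
def fnr_fdr(predicted, reference):
    """False negative rate and false discovery rate for sets of (i, j) base pairs."""
    # Normalize so that (20, 1) and (1, 20) count as the same pair.
    predicted = {tuple(sorted(p)) for p in predicted}
    reference = {tuple(sorted(p)) for p in reference}
    tp = len(predicted & reference)
    fn = len(reference - predicted)
    fp = len(predicted - reference)
    fnr = fn / (tp + fn) if reference else 0.0
    fdr = fp / (tp + fp) if predicted else 0.0
    return fnr, fdr

# Toy case: the reference has 4 pairs; the model recovers 3 and adds 1 wrong pair.
reference = [(1, 20), (2, 19), (3, 18), (4, 17)]
predicted = [(1, 20), (2, 19), (3, 18), (5, 12)]
print(fnr_fdr(predicted, reference))  # (0.25, 0.25)
```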
Quantitative comparison of villin headpiece subdomain simulations and triplet-triplet energy transfer experiments
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2011; 108 (31): 12734-12739
As the fastest folding protein, the villin headpiece (HP35) serves as an important bridge between simulation and experimental studies of protein folding. Despite the simplicity of this system, experiments continue to reveal a number of surprises, including structure in the unfolded state and complex equilibrium dynamics near the native state. Using 2.5 ms of molecular dynamics and Markov state models, we connect to current experimental results in three ways. First, we present and validate a novel method for the quantitative prediction of triplet-triplet energy transfer experiments. Second, we construct a many-state model for HP35 that is consistent with previous experiments. Finally, we predict contact-formation time traces for all 1,225 possible triplet-triplet energy transfer experiments on HP35.
View details for DOI 10.1073/pnas.1010880108
View details for Web of Science ID 000293385700043
View details for PubMedID 21768345
HiTRACE: high-throughput robust analysis for capillary electrophoresis
2011; 27 (13): 1798-1805
Capillary electrophoresis (CE) of nucleic acids is a workhorse technology underlying high-throughput genome analysis and large-scale chemical mapping for nucleic acid structural inference. Despite the wide availability of CE-based instruments, there remain challenges in leveraging their full power for quantitative analysis of RNA and DNA structure, thermodynamics and kinetics. In particular, the slow rate and poor automation of available analysis tools have bottlenecked a new generation of studies involving hundreds of CE profiles per experiment. We propose a computational method called high-throughput robust analysis for capillary electrophoresis (HiTRACE) to automate the key tasks in large-scale nucleic acid CE analysis, including the profile alignment that has heretofore been a rate-limiting step in the highest throughput experiments. We illustrate the application of HiTRACE on 13 datasets representing 4 different RNAs, 3 chemical modification strategies and up to 480 single mutant variants; the largest datasets each include 87,360 bands. By applying a series of robust dynamic programming algorithms, HiTRACE outperforms prior tools in terms of alignment and fitting quality, as assessed by measures including the correlation between quantified band intensities between replicate datasets. Furthermore, while the smallest of these datasets required 7-10 h of manual intervention using prior approaches, HiTRACE quantitation of even the largest datasets herein was achieved in 3-12 min. The HiTRACE method, therefore, resolves a critical barrier to the efficient and accurate analysis of nucleic acid structure in experiments involving tens of thousands of electrophoretic bands.
View details for DOI 10.1093/bioinformatics/btr277
View details for Web of Science ID 000291752600058
View details for PubMedID 21561922
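The HiTRACE abstract above attributes its profile alignment to dynamic programming. HiTRACE's own algorithms are not reproduced here; as a generic illustration of the idea, the sketch below uses dynamic time warping, a standard dynamic-programming technique for aligning two 1-D profiles whose features are shifted or stretched. The toy traces and the function name are hypothetical.

```python
def dtw_align(a, b):
    """Dynamic time warping of two 1-D intensity profiles.

    Returns (total cost, warping path as a list of (i, j) index pairs).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Trace the optimal path back from the ends of both profiles.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path

a = [0, 1, 5, 1, 0, 0]
b = [0, 0, 1, 5, 1, 0]  # same band, shifted by one position
total, path = dtw_align(a, b)
print(total)  # 0.0
print(path)   # [(0, 0), (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 5)]
```

The quadratic table is fine for toy traces; real CE profiles with thousands of points per capillary would call for banded or otherwise constrained alignment.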
Sharing and archiving nucleic acid structure mapping data
RNA-A PUBLICATION OF THE RNA SOCIETY
2011; 17 (7): 1204-1212
Nucleic acids are particularly amenable to structural characterization using chemical and enzymatic probes. Each individual structure mapping experiment reveals specific information about the structure and/or dynamics of the nucleic acid. Currently, there is no simple approach for making these data publicly available in a standardized format. We therefore developed a standard for reporting the results of single nucleotide resolution nucleic acid structure mapping experiments, or SNRNASMs. We propose a schema for sharing nucleic acid chemical probing data that uses generic public servers for storing, retrieving, and searching the data. We have also developed a consistent nomenclature (ontology) within the Ontology for Biomedical Investigations (OBI), which provides unique identifiers (termed persistent URLs, or PURLs) for classifying the data. Links to standardized data sets shared using our proposed format along with a tutorial and links to templates can be found at http://snrnasm.bio.unc.edu.
View details for DOI 10.1261/rna.2753211
View details for Web of Science ID 000291683500002
View details for PubMedID 21610212
View details for PubMedCentralID PMC3138558
Four Small Puzzles That Rosetta Doesn't Solve
2011; 6 (5)
A complete macromolecule modeling package must be able to solve the simplest structure prediction problems. Despite recent successes in high resolution structure modeling and design, the Rosetta software suite fares poorly on small protein and RNA puzzles, some as small as four residues. To illustrate these problems, this manuscript presents Rosetta results for four well-defined test cases: the 20-residue mini-protein Trp cage, an even smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies, several lines of evidence indicate that conformational sampling is not the major bottleneck in modeling these small systems. Instead, approximations and omissions in the Rosetta all-atom energy function currently preclude discriminating experimentally observed conformations from de novo models at atomic resolution. These molecular "puzzles" should serve as useful model systems for developers wishing to make foundational improvements to this powerful modeling suite.
View details for DOI 10.1371/journal.pone.0020044
View details for Web of Science ID 000290793400036
View details for PubMedID 21625446
A mutate-and-map strategy accurately infers the base pairs of a 35-nucleotide model RNA
RNA-A PUBLICATION OF THE RNA SOCIETY
2011; 17 (3): 522-534
We present a rapid experimental strategy for inferring base pairs in structured RNAs via an information-rich extension of classic chemical mapping approaches. The mutate-and-map method, previously applied to a DNA/RNA helix, systematically searches for single mutations that enhance the chemical accessibility of base-pairing partners distant in sequence. To test this strategy for structured RNAs, we have carried out mutate-and-map measurements for a 35-nt hairpin, called the MedLoop RNA, embedded within an 80-nt sequence. We demonstrate the synthesis of all 105 single mutants of the MedLoop RNA sequence and present high-throughput DMS, CMCT, and SHAPE modification measurements for this library at single-nucleotide resolution. The resulting two-dimensional data reveal visually clear, punctate features corresponding to RNA base pair interactions as well as more complex features; these signals can be qualitatively rationalized by comparison to secondary structure predictions. Finally, we present an automated, sequence-blind analysis that permits the confident identification of nine of the 10 MedLoop RNA base pairs at single-nucleotide resolution, while discriminating against all 1460 false-positive base pairs. These results establish the accuracy and information content of the mutate-and-map strategy and support its feasibility for rapidly characterizing the base-pairing patterns of larger and more complex RNA systems.
View details for DOI 10.1261/rna.2516311
View details for Web of Science ID 000287195900014
View details for PubMedID 21239468
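The mutate-and-map design above calls for one construct per possible single substitution, so a 35-nt region yields 35 × 3 = 105 mutants, matching the library size in the abstract. The sketch below enumerates such a library; the placeholder sequence is hypothetical and is not the MedLoop RNA, and the function name is illustrative.

```python
def single_mutants(seq, alphabet="ACGU"):
    """Yield (label, mutant) for every single-nucleotide substitution of seq."""
    for i, wt in enumerate(seq):
        for nt in alphabet:
            if nt != wt:
                # Label as wild-type base, 1-based position, mutant base.
                yield f"{wt}{i + 1}{nt}", seq[:i] + nt + seq[i + 1:]

# Hypothetical 35-nt placeholder, NOT the MedLoop sequence.
target = "ACGU" * 8 + "ACG"
mutants = list(single_mutants(target))
print(len(mutants))   # 105
print(mutants[0][0])  # A1C
```

In an experiment like the one described, each mutant region would then be embedded in the same flanking sequence so that all constructs share primer-binding sites.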
ROSETTA3: AN OBJECT-ORIENTED SOFTWARE SUITE FOR THE SIMULATION AND DESIGN OF MACROMOLECULES
METHODS IN ENZYMOLOGY, VOL 487: COMPUTER METHODS, PT C
We have recently completed a full re-architecting of the ROSETTA molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this re-architecting has been released as ROSETTA3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This chapter describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
View details for DOI 10.1016/S0076-6879(11)87019-9
View details for Web of Science ID 000286532000019
View details for PubMedID 21187238
Rosetta in CAPRI rounds 13-19
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2010; 78 (15): 3212-3218
Modeling the conformational changes that occur on binding of macromolecules is an unsolved challenge. In previous rounds of the Critical Assessment of PRediction of Interactions (CAPRI), it was demonstrated that the Rosetta approach to macromolecular modeling could capture side chain conformational changes on binding with high accuracy. In rounds 13-19 we tested the ability of various backbone remodeling strategies to capture the main-chain conformational changes observed during binding events. These approaches span a wide range of backbone motions, from limited refinement of loops to relieve clashes in homologous docking, through extensive remodeling of loop segments, to large-scale remodeling of RNA. Although the results are encouraging, major improvements in sampling and energy evaluation are clearly required for consistent high accuracy modeling. Analysis of our failures in the CAPRI challenges suggests that conformational sampling at the termini of exposed beta strands is a particularly pressing area for improvement.
View details for DOI 10.1002/prot.22784
View details for Web of Science ID 000283565000020
View details for PubMedID 20597089
A Mutate-and-Map Strategy for Inferring Base Pairs in Structured Nucleic Acids: Proof of Concept on a DNA/RNA Helix
2010; 49 (35): 7414-7416
We propose a rapid chemical strategy for identifying base pairs in structured nucleic acid systems. The approach goes beyond traditional chemical mapping approaches by monitoring perturbations of each residue's chemical accessibility in response to systematic mutagenesis of residues that are distant in sequence but nearby in three dimensions. As a proof of concept, we present high-throughput dimethyl sulfate accessibility data for a chimeric DNA/RNA system in which every possible sequence variation and deletion in a 20 bp region has been synthesized and tested. The data demonstrate that 88% of the system's base pairs can be robustly inferred, with A/A and T/C DNA/RNA mismatches giving the strongest signals. These results point to the feasibility of rapid base pair inference in larger and more complex nucleic acid systems with unknown structure.
View details for DOI 10.1021/bi101123g
View details for Web of Science ID 000281305200002
View details for PubMedID 20677780
Atomic accuracy in predicting and designing noncanonical RNA structure
2010; 7 (4): 291-294
We present fragment assembly of RNA with full-atom refinement (FARFAR), a Rosetta framework for predicting and designing noncanonical motifs that define RNA tertiary structure. In a test set of thirty-two 6-20-nucleotide motifs, FARFAR recapitulated 50% of the experimental structures at near-atomic accuracy. Sequence redesign calculations recovered native bases at 65% of residues engaged in noncanonical interactions, and we experimentally validated mutations predicted to stabilize a signal recognition particle domain.
View details for DOI 10.1038/NMETH.1433
View details for Web of Science ID 000276150600018
View details for PubMedID 20190761
Simultaneous prediction of protein folding and docking at high resolution
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2009; 106 (45): 18978-18983
Interleaved dimers and higher-order symmetric oligomers are ubiquitous in biology but present a challenge to de novo structure prediction methodology: the structure adopted by a monomer can be stabilized largely by interactions with other monomers and hence may not be the lowest-energy state of a single chain. Building on the Rosetta framework, we present a general method to simultaneously model the folding and docking of multiple-chain interleaved homo-oligomers. For more than a third of the cases in a benchmark set of interleaved homo-oligomers, the method generates near-native models of large alpha-helical bundles, interlocking beta sandwiches, and interleaved alpha/beta motifs with an accuracy high enough for molecular replacement based phasing. With the incorporation of NMR chemical shift information, accurate models can be obtained consistently for symmetric complexes with as many as 192 total amino acids; a blind prediction was within 1 Å RMSD of the traditionally determined NMR structure and fit independently collected RDC data equally well. Together, these results show that the Rosetta "fold-and-dock" protocol can produce models of homo-oligomeric complexes with near-atomic-level accuracy and should be useful for crystallographic phasing and the rapid determination of the structures of multimers with limited NMR information.
View details for DOI 10.1073/pnas.0904407106
View details for Web of Science ID 000271637500021
View details for PubMedID 19864631
A robust peak detection method for RNA structure inference by high-throughput contact mapping
2009; 25 (9): 1137-1144
For high-throughput prediction of the helical arrangements of large RNA molecules, an innovative method termed multiplexed hydroxyl radical (•OH) cleavage analysis (MOHCA) has been proposed. A key step in this promising technique is to detect peaks accurately from noisy radioactivity profiles. Since manual peak finding is laborious and prone to error, an automated peak detection method is required to improve the accuracy and throughput of MOHCA. Existing methods were not applicable to MOHCA due to their high false-positive rates. We developed a two-step computational method that can detect peaks from MOHCA profiles in a robust manner. The first step exploits an ensemble of linear and non-linear signal processing techniques to find true peak candidates. In the second step, a binary classifier trained with the characteristics of true and false peaks is used to eliminate false peaks from the peak candidates. We tested the proposed approach with 2002 MOHCA cleavage profiles and obtained median recall, precision and F-measure values of 0.917, 0.750 and 0.830, respectively. Compared with the alternatives considered, the proposed method handled false peaks substantially better, resulting in 51.0-71.8% higher median values of precision and F-measure. The software and supplementary data are available at http://dna.korea.ac.kr/pub/mohca.
View details for DOI 10.1093/bioinformatics/btp110
View details for Web of Science ID 000265523300007
View details for PubMedID 19246511
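The MOHCA peak-detection abstract above reports recall, precision and F-measure, where F = 2PR / (P + R). The sketch below computes these for one profile, assuming a detected peak counts as correct if it falls within a small positional tolerance of an unmatched true peak; the tolerance, the greedy matching rule, the data and the function name are all assumptions, not the paper's protocol.

```python
def peak_scores(detected, true_peaks, tol=2):
    """Precision, recall and F-measure for detected peak positions.

    A detected peak is a true positive if it lies within `tol` positions
    of a not-yet-matched true peak (greedy one-to-one matching).
    """
    unmatched = sorted(true_peaks)
    tp = 0
    for pos in sorted(detected):
        hit = next((t for t in unmatched if abs(t - pos) <= tol), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(true_peaks) if true_peaks else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f

# Hypothetical profile: three true peaks, four detections (one spurious at 33).
scores = peak_scores([11, 24, 33, 41], [10, 25, 40])
print(tuple(round(x, 3) for x in scores))  # (0.75, 1.0, 0.857)
```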
Prospects for de novo phasing with de novo protein models
ACTA CRYSTALLOGRAPHICA SECTION D-BIOLOGICAL CRYSTALLOGRAPHY
2009; 65: 169-175
The prospect of phasing diffraction data sets 'de novo' for proteins with previously unseen folds is appealing but largely untested. In a first systematic exploration of phasing with Rosetta de novo models, it is shown that all-atom refinement of coarse-grained models significantly improves both the model quality and performance in molecular replacement with the Phaser software. Fifteen new cases of diffraction data sets that are unambiguously phased with de novo models are presented. These diffraction data sets represent nine space groups and span a large range of solvent contents (33-79%) and asymmetric unit copy numbers (1-4). No correlation is observed between the ease of phasing and the solvent content or asymmetric unit copy number. Instead, a weak correlation is found with the length of the modeled protein: larger proteins required somewhat less accurate models to give successful molecular replacement. Overall, the results of this survey suggest that de novo models can phase diffraction data for approximately one sixth of proteins with sizes of 100 residues or less. However, for many of these cases, 'de novo phasing with de novo models' requires significant investment of computational power, much greater than 10³ CPU days per target. Improvements in conformational search methods will be necessary if molecular replacement with de novo models is to become a practical tool for targets without homology to previously solved protein structures.
View details for DOI 10.1107/S0907444908020039
View details for Web of Science ID 000263557900009
View details for PubMedID 19171972
Structure prediction for CASP8 with all-atom refinement using Rosetta
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2009; 77: 89-99
We describe predictions made using the Rosetta structure prediction methodology for the Eighth Critical Assessment of Techniques for Protein Structure Prediction. Aggressive sampling and all-atom refinement were carried out for nearly all targets. A combination of alignment methodologies was used to generate starting models from a range of templates, and the models were then subjected to Rosetta all atom refinement. For the 64 domains with readily identified templates, the best submitted model was better than the best alignment to the best template in the Protein Data Bank for 24 cases, and improved over the best starting model for 43 cases. For 13 targets where only very distant sequence relationships to proteins of known structure were detected, models were generated using the Rosetta de novo structure prediction methodology followed by all-atom refinement; in several cases the submitted models were better than those based on the available templates. Of the 12 refinement challenges, the best submitted model improved on the starting model in seven cases. These improvements over the starting template-based models and refinement tests demonstrate the power of Rosetta structure refinement in improving model accuracy.
View details for DOI 10.1002/prot.22540
View details for Web of Science ID 000272244700009
View details for PubMedID 19701941
Remeasuring the double helix
2008; 322 (5900): 446-449
DNA is thought to behave as a stiff elastic rod with respect to the ubiquitous mechanical deformations inherent to its biology. To test this model at short DNA lengths, we measured the mean and variance of end-to-end length for a series of DNA double helices in solution, using small-angle x-ray scattering interference between gold nanocrystal labels. In the absence of applied tension, DNA is at least one order of magnitude softer than measured by single-molecule stretching experiments. Further, the data rule out the conventional elastic rod model. The variance in end-to-end length follows a quadratic dependence on the number of base pairs rather than the expected linear dependence, indicating that DNA stretching is cooperative over more than two turns of the DNA double helix. Our observations support the idea of long-range allosteric communication through DNA structure.
View details for DOI 10.1126/science.1158881
View details for Web of Science ID 000260094500048
View details for PubMedID 18927394
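The DNA-stretching abstract above contrasts linear and quadratic growth of end-to-end length variance with helix length. A toy model makes the reasoning explicit: if each base-pair step contributes a length fluctuation of variance σ² and every pair of distinct steps has correlation ρ, the total variance is Nσ² + N(N − 1)ρσ². Independent steps (ρ = 0) give linear growth; correlated steps make the quadratic term dominate. The uniform-correlation model and the function name are assumptions for illustration, not the paper's analysis.

```python
def end_to_end_variance(n_bp, step_var, rho):
    """Variance of a sum of n_bp per-step length fluctuations.

    Each step has variance step_var; every pair of distinct steps has
    correlation rho (0: independent steps, 1: fully cooperative).
    """
    return n_bp * step_var + n_bp * (n_bp - 1) * rho * step_var

for n in (10, 20, 40):
    # columns: length, independent-step variance, fully cooperative variance
    print(n, end_to_end_variance(n, 1.0, 0.0), end_to_end_variance(n, 1.0, 1.0))
# 10 10.0 100.0
# 20 20.0 400.0
# 40 40.0 1600.0
```

Doubling the length doubles the variance for independent steps but quadruples it for fully cooperative ones, which is why a quadratic dependence points to cooperativity; correlations that decay with distance would fall between these limits.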
Structural inference of native and partially folded RNA by high-throughput contact mapping
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2008; 105 (11): 4144-4149
The biological behaviors of ribozymes, riboswitches, and numerous other functional RNA molecules are critically dependent on their tertiary folding and their ability to sample multiple functional states. The conformational heterogeneity and partially folded nature of most of these states has rendered their characterization by high-resolution structural approaches difficult or even intractable. Here we introduce a method to rapidly infer the tertiary helical arrangements of large RNA molecules in their native and non-native solution states. Multiplexed hydroxyl radical (•OH) cleavage analysis (MOHCA) enables the high-throughput detection of numerous pairs of contacting residues via random incorporation of radical cleavage agents followed by two-dimensional gel electrophoresis. We validated this technology by recapitulating the unfolded and native states of a well-studied model RNA, the P4-P6 domain of the Tetrahymena ribozyme, at subhelical resolution. We then applied MOHCA to a recently discovered third state of the P4-P6 RNA that is stabilized by high concentrations of monovalent salt and whose partial order precludes conventional techniques for structure determination. The three-dimensional portrait of a compact, non-native RNA state reveals a well ordered subset of native tertiary contacts, in contrast to the dynamic but otherwise similar molten globule states of proteins. With its applicability to nearly any solution state, we expect MOHCA to be a powerful tool for illuminating the many functional structures of large RNA molecules and RNA/protein complexes.
View details for DOI 10.1073/pnas.0709032105
View details for Web of Science ID 000254263300015
View details for PubMedID 18322008
View details for PubMedCentralID PMC2393762
Macromolecular modeling with Rosetta
ANNUAL REVIEW OF BIOCHEMISTRY
2008; 77: 363-382
Advances over the past few years have begun to enable prediction and design of macromolecular structures at near-atomic accuracy. Progress has stemmed from the development of reasonably accurate and efficiently computed all-atom potential functions as well as effective conformational sampling strategies appropriate for searching a highly rugged energy landscape, both driven by feedback from structure prediction and design tests. A unified energetic and kinematic framework in the Rosetta program allows a wide range of molecular modeling problems, from fibril structure prediction to RNA folding to the design of new protein interfaces, to be readily investigated and highlights areas for improvement. The methodology enables the creation of novel molecules with useful functions and holds promise for accelerating experimental structural inference. Emerging connections to crystallographic phasing, NMR modeling, and lower-resolution approaches are described and critically assessed.
View details for DOI 10.1146/annurev.biochem.77.062906.171838
View details for Web of Science ID 000257596800016
View details for PubMedID 18410248
High-resolution structure prediction and the crystallographic phase problem
2007; 450 (7167): 259-U7
The energy-based refinement of low-resolution protein structure models to atomic-level accuracy is a major challenge for computational structural biology. Here we describe a new approach to refining protein structure models that focuses sampling in regions most likely to contain errors while allowing the whole structure to relax in a physically realistic all-atom force field. In applications to models produced using nuclear magnetic resonance data and to comparative models based on distant structural homologues, the method can significantly improve the accuracy of the structures in terms of both the backbone conformations and the placement of core side chains. Furthermore, the resulting models satisfy a particularly stringent test: they provide significantly better solutions to the X-ray crystallographic phase problem in molecular replacement trials. Finally, we show that all-atom refinement can produce de novo protein structure predictions that reach the high accuracy required for molecular replacement without any experimental phase information and in the absence of templates suitable for molecular replacement from the Protein Data Bank. These results suggest that the combination of high-resolution structure prediction with state-of-the-art phasing tools may be unexpectedly powerful in phasing crystallographic data for which molecular replacement is hindered by the absence of sufficiently accurate previous models.
DOI: 10.1038/nature06249
Web of Science ID: 000250746200052
PubMed ID: 17934447
Automated de novo prediction of native-like RNA tertiary structures
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2007; 104 (37): 14664-14669
RNA tertiary structure prediction has been based almost entirely on base-pairing constraints derived from phylogenetic covariation analysis. We describe here a complementary approach, inspired by the Rosetta low-resolution protein structure prediction method, that seeks the lowest energy tertiary structure for a given RNA sequence without using evolutionary information. In a benchmark test of 20 RNA sequences with known structure and lengths of approximately 30 nt, the new method reproduces better than 90% of Watson-Crick base pairs, comparable with the accuracy of secondary structure prediction methods. In more than half the cases, at least one of the top five models agrees with the native structure to better than 4 Å RMSD over the backbone. Most importantly, the method recapitulates more than one-third of non-Watson-Crick base pairs seen in the native structures. Tandem stacks of "sheared" base pairs, base triplets, and pseudoknots are among the noncanonical features reproduced in the models. In the cases in which none of the top five models were native-like, higher energy conformations similar to the native structures are still sampled frequently but not assigned low energies. These results suggest that modest improvements in the energy function, together with the incorporation of information from phylogenetic covariance, may allow confident and accurate structure prediction for larger and more complex RNA chains.
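The success criterion above (at least one of the top five models within 4 Å backbone RMSD of the native structure) can be made concrete with a short sketch. It is not the authors' benchmarking code: it assumes the model and native backbone atoms are already paired atom-for-atom and optimally superimposed (a real comparison would first compute that superposition, for example with the Kabsch algorithm), and that models are listed best-energy first. The function names `rmsd` and `any_native_like` are hypothetical; only the 4 Å cutoff and the top-five rule come from the abstract.

```python
import math

Coord = tuple[float, float, float]


def rmsd(model: list[Coord], native: list[Coord]) -> float:
    """Root-mean-square deviation between matched atom lists.

    Assumes the lists are paired atom-for-atom and already superimposed;
    no translation or rotation is fitted here.
    """
    if not model or len(model) != len(native):
        raise ValueError("model and native must be non-empty and the same length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, native):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


def any_native_like(models: list[list[Coord]], native: list[Coord],
                    cutoff: float = 4.0, top: int = 5) -> bool:
    """True if any of the first `top` models (assumed sorted best-energy first)
    lies within `cutoff` angstroms RMSD of the native coordinates."""
    return any(rmsd(m, native) < cutoff for m in models[:top])


native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in native]  # every atom moved 1 angstrom in x
print(rmsd(shifted, native))               # 1.0
print(any_native_like([shifted], native))  # True
```

The rigidly shifted toy model scores 1.0 Å here only because no superposition is performed; after optimal superposition its RMSD would be zero, which is why superposition has to precede the RMSD calculation in practice.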
DOI: 10.1073/pnas.0703836104
Web of Science ID: 000249513000023
PubMed ID: 17726102
Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2007; 69: 118-128
We describe predictions made using the Rosetta structure prediction methodology for both template-based modeling and free modeling categories in the Seventh Critical Assessment of Techniques for Protein Structure Prediction. For the first time, aggressive sampling and all-atom refinement could be carried out for the majority of targets, an advance enabled by the Rosetta@home distributed computing network. Template-based modeling predictions using an iterative refinement algorithm improved over the best existing templates for the majority of proteins with less than 200 residues. Free modeling methods gave near-atomic accuracy predictions for several targets under 100 residues from all secondary structure classes. These results indicate that refinement with an all-atom energy function, although computationally expensive, is a powerful method for obtaining accurate structure predictions.
PubMed ID: 17894356
- Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home. 7th Meeting on Critical Assessment of Techniques for Protein Structure Prediction. WILEY-BLACKWELL. 2007: 118-128
Determining the Mg2+ stoichiometry for folding an RNA metal ion core
JOURNAL OF THE AMERICAN CHEMICAL SOCIETY
2005; 127 (23): 8272-8273
The folding and catalytic function of RNA molecules depend on their interactions with divalent metal ions, such as magnesium. As with every molecular process, the most basic knowledge required for understanding the close relationship of an RNA with its metal ions is the stoichiometry of the interaction. Unfortunately, inventories of the numbers of divalent ions associated with unfolded and folded RNA states have been unattainable. A common approach has been to interpret Hill coefficients fit to folding equilibria as the number of metal ions bound upon folding. However, this approach is vitiated by the presence of diffusely associated divalent ions in a dynamic ion atmosphere and by the likelihood of multiple transitions along a folding pathway. We demonstrate that the use of molar concentrations of background monovalent salt can alleviate these complications. These simplifying solution conditions allow a precise determination of the stoichiometry of the magnesium ions involved in folding the metal ion core of the P4-P6 domain of the Tetrahymena group I ribozyme. Hill analysis of hydroxyl radical footprinting data suggests that the P4-P6 RNA core folds cooperatively upon the association of two metal ions. This unexpectedly small stoichiometry is strongly supported by counting magnesium ions associated with the P4-P6 RNA via fluorescence titration and atomic emission spectroscopy. By pinpointing the metal ion stoichiometry, these measurements provide a critical but previously missing step in the thermodynamic dissection of the coupling between metal ion binding and RNA folding.
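The abstract contrasts directly counting bound Mg2+ ions with the common practice of reading a Hill coefficient off a folding curve. As a minimal illustration of what such a fit involves (not the authors' analysis code), the sketch below assumes the standard Hill model, fraction folded θ = c^n / (K^n + c^n) with c the Mg2+ concentration, and recovers the apparent coefficient n as the slope of a Hill plot, ln(θ/(1 − θ)) versus ln c. The function names and the synthetic titration values are hypothetical.

```python
import math


def hill_fraction(conc: float, k_half: float, n: float) -> float:
    """Fraction folded predicted by the Hill model: c^n / (K^n + c^n)."""
    return conc**n / (k_half**n + conc**n)


def fit_hill_plot(concs: list[float], fractions: list[float]) -> tuple[float, float]:
    """Estimate (n, K) by linear least squares on a Hill plot.

    The Hill model linearizes as ln(theta / (1 - theta)) = n*ln(c) - n*ln(K),
    so the slope is the apparent Hill coefficient n. Points with theta at
    0 or 1 (or c <= 0) are skipped because the logarithms are undefined there.
    """
    xs: list[float] = []
    ys: list[float] = []
    for c, theta in zip(concs, fractions):
        if c > 0 and 0.0 < theta < 1.0:
            xs.append(math.log(c))
            ys.append(math.log(theta / (1.0 - theta)))
    if len(xs) < 2:
        raise ValueError("need at least two points with 0 < theta < 1")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct concentrations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx
    intercept = mean_y - n * mean_x  # equals -n * ln(K)
    k_half = math.exp(-intercept / n)
    return n, k_half


# Hypothetical noise-free titration (mM Mg2+) generated with n = 2, K = 1 mM.
mg_mM = [0.1, 0.3, 1.0, 3.0, 10.0]
theta = [hill_fraction(c, k_half=1.0, n=2.0) for c in mg_mM]
n_fit, k_fit = fit_hill_plot(mg_mM, theta)
print(f"apparent n = {n_fit:.2f}, K = {k_fit:.2f} mM")  # apparent n = 2.00, K = 1.00 mM
```

As the abstract cautions, diffusely associated ions and multistep folding can make the apparent n differ from the true number of ions taken up, so a slope from a fit like this is a phenomenological cooperativity parameter rather than a direct ion count; that is the motivation for the high-monovalent-salt conditions and the independent ion counting described above.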
DOI: 10.1021/ja051422h
Web of Science ID: 000229751100020
PubMed ID: 15941246
SAFA: Semi-automated footprinting analysis software for high-throughput quantification of nucleic acid footprinting experiments
RNA-A PUBLICATION OF THE RNA SOCIETY
2005; 11 (3): 344-354
Footprinting is a powerful and widely used tool for characterizing the structure, thermodynamics, and kinetics of nucleic acid folding and ligand binding reactions. However, quantitative analysis of the gel images produced by footprinting experiments is tedious and time-consuming, due to the absence of informatics tools specifically designed for footprinting analysis. We have developed SAFA, a semi-automated footprinting analysis software package that achieves accurate gel quantification while reducing the time to analyze a gel from several hours to 15 min or less. The increase in analysis speed is achieved through a graphical user interface that implements a novel methodology for lane and band assignment, called "gel rectification," and an optimized band deconvolution algorithm. The SAFA software yields results that are consistent with published methodologies and reduces the investigator-dependent variability compared to less automated methods. These software developments simplify the analysis procedure for a footprinting gel and can therefore facilitate the use of quantitative footprinting techniques in nucleic acid laboratories that otherwise might not have considered their use. Further, the increased throughput provided by SAFA may allow a more comprehensive understanding of molecular interactions. The software and documentation are freely available for download at http://safa.stanford.edu.
DOI: 10.1261/rna.7214405
Web of Science ID: 000227190000011
PubMed ID: 15701734
PubMed Central ID: PMC1262685