Bio


Dr. Das is a Professor of Biochemistry at Stanford University School of Medicine. After training in particle physics and cosmology at Harvard, Cambridge, University College London, and Stanford, Dr. Das did postdoctoral research in computational protein folding at the University of Washington with David Baker. On returning to Stanford, Dr. Das set up his lab to focus on computer modeling and design of RNA molecules, which underlie important molecular machines in biology and medicine. As a core part of this research, Dr. Das leads Eterna, an open science platform that crowdsources intractable RNA design problems to 250,000 players of an online videogame and provides scoring feedback based on actual wet-lab experiments. Dr. Das has been recognized by the Burroughs-Wellcome Career Award at the Scientific Interface, the Stanford Medicine Endowed Faculty Scholar award, and selection as an investigator of the Howard Hughes Medical Institute.

Honors & Awards


  • Gold Medal, Top US score, 2nd place worldwide, International Physics Olympiad (1995)
  • British Marshall Scholar, Marshall Aid Commemoration Commission (1998-2000)
  • Jane Coffin Childs Foundation Fellowship, Jane Coffin Childs Foundation (2006-2008)
  • Career Award at the Scientific Interface, Burroughs-Wellcome Foundation (2008-2015)
  • Keck Medical Research Grant award, W. M. Keck Foundation (2012)
  • OpenEye Outstanding Junior Faculty Award, American Chemical Society (2015)
  • Discovery Innovation Award, Stanford University School of Medicine (2016)
  • Stanford Medicine Endowed Faculty Scholar, Anonymous donor, Stanford School of Medicine (2020-2023)

Boards, Advisory Committees, Professional Organizations


  • Assessor, RNA Category, 15th and 16th Critical Assessments of Structure Prediction (CASP15 and CASP16) (2022 - Present)
  • COVID-19 Research Oversight Committee, Stanford University (2020)
  • Structural Biology Review Committee, SLAC Linac Coherent Light Source (2020)
  • Co-author, RNA Synthetic Biology Roadmap, Engineering Biology Research Consortium (2019 - Present)
  • Editorial Advisory Board, Biochemistry (2017 - 2020)

Professional Education


  • Ph.D., Stanford University, Physics (2005)
  • M.Res., University College London, Biocomplexity (2000)
  • M.Phil., Cambridge University, Physics (Radio Astronomy) (1999)
  • A.B., s.c.l., Harvard University, Physics (1998)

Current Research and Scholarly Interests


Our lab strives to predict and design how biopolymer sequences define and regulate biopolymer structure/function, focusing on medically important RNA and RNA/protein complexes.

We develop algorithms to predict the structures and energetics of RNAs and RNA/protein interfaces at high resolution, with an increasing focus on ribosomes and viruses. We test these ideas through community-wide blind trials and by solving molecule structures and structure ensembles with chemical mapping, NMR, crystallographic, and cryoelectron microscopy data. Notable achievements include top models in the majority of RNA-Puzzles blind structure prediction challenges and first experimental structures of several historically and biomedically important RNA molecules, such as the Tetrahymena ribozyme.

Complementary to this computational research, we are developing information-rich biochemical methods to model the myriad structures of non-coding RNAs that remain unknown. Current efforts focus on probing the extent and biological impact of RNA structure and conformational change in fundamental processes like splicing and mRNA transport in brain cells and viruses.

In addition to modeling RNAs, we aim to design new ones for basic science, diagnostics, therapeutics, and vaccines. Our videogame project Eterna seeks missing rules and novel molecules for medicine by giving citizen scientists access to high-throughput wet-lab experiments. Notable achievements include the first algorithm for automated 3D RNA design, development of the current community benchmark for RNA design, discovery of optimal 'zero-energy' switches made of RNA, and invention of RNA calculators for point-of-care diagnostics responsive to complex gene signatures for active tuberculosis. This project has also given rise to several firsts in citizen science, including the first papers written by videogame players as lead authors and as sole authors.

All Publications


  • Tertiary folds of the SL5 RNA from the 5' proximal region of SARS-CoV-2 and related coronaviruses. Proceedings of the National Academy of Sciences of the United States of America Kretsch, R. C., Xu, L., Zheludev, I. N., Zhou, X., Huang, R., Nye, G., Li, S., Zhang, K., Chiu, W., Das, R. 2024; 121 (10): e2320493121

    Abstract

    Coronavirus genomes sequester their start codons within stem-loop 5 (SL5), a structured, 5' genomic RNA element. In most alpha- and betacoronaviruses, the secondary structure of SL5 is predicted to contain a four-way junction of helical stems, some of which are capped with UUYYGU hexaloops. Here, using cryogenic electron microscopy (cryo-EM) and computational modeling with biochemically determined secondary structures, we present three-dimensional structures of SL5 from six coronaviruses. The SL5 domain of betacoronavirus severe-acute-respiratory-syndrome-related coronavirus 2 (SARS-CoV-2), resolved at 4.7 Å resolution, exhibits a T-shaped structure, with its UUYYGU hexaloops at opposing ends of a coaxial stack, the T's "arms." Further analysis of SL5 domains from SARS-CoV-1 and MERS (7.1 and 6.4 to 6.9 Å resolution, respectively) indicate that the junction geometry and inter-hexaloop distances are conserved features across these human-infecting betacoronaviruses. The MERS SL5 domain displays an additional tertiary interaction, which is also observed in the non-human-infecting betacoronavirus BtCoV-HKU5 (5.9 to 8.0 Å resolution). SL5s from human-infecting alphacoronaviruses, HCoV-229E and HCoV-NL63 (6.5 and 8.4 to 9.0 Å resolution, respectively), exhibit the same coaxial stacks, including the UUYYGU-capped arms, but with a phylogenetically distinct crossing angle, an X-shape. As such, all SL5 domains studied herein fold into stable tertiary structures with cross-genus similarities and notable differences, with implications for potential protein-binding modes and therapeutic targets.

    View details for DOI 10.1073/pnas.2320493121

    View details for PubMedID 38427602

  • Ribonanza: deep learning of RNA structure through dual crowdsourcing. bioRxiv : the preprint server for biology He, S., Huang, R., Townley, J., Kretsch, R. C., Karagianes, T. G., Cox, D. B., Blair, H., Penzar, D., Vyaltsev, V., Aristova, E., Zinkevich, A., Bakulin, A., Sohn, H., Krstevski, D., Fukui, T., Tatematsu, F., Uchida, Y., Jang, D., Lee, J. S., Shieh, R., Ma, T., Martynov, E., Shugaev, M. V., Bukhari, H. S., Fujikawa, K., Onodera, K., Henkel, C., Ron, S., Romano, J., Nicol, J. J., Nye, G. P., Wu, Y., Choe, C., Reade, W., Participants, E., Das, R. 2024

    Abstract

    Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza measurements enabled solicitation, training, and prospective evaluation of diverse deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.

    View details for DOI 10.1101/2024.02.24.581671

    View details for PubMedID 38464325

  • Compact RNA sensors for increasingly complex functions of multiple inputs. bioRxiv : the preprint server for biology Choe, C., Andreasson, J. O., Melaine, F., Kladwang, W., Wu, M. J., Portela, F., Wellington-Oguri, R., Nicol, J. J., Wayment-Steele, H. K., Gotrik, M., Participants, E., Khatri, P., Greenleaf, W. J., Das, R. 2024

    Abstract

    Designing single molecules that compute general functions of input molecular partners represents a major unsolved challenge in molecular design. Here, we demonstrate that high-throughput, iterative experimental testing of diverse RNA designs crowdsourced from Eterna yields sensors of increasingly complex functions of input oligonucleotide concentrations. After designing single-input RNA sensors with activation ratios beyond our detection limits, we created logic gates, including challenging XOR and XNOR gates, and sensors that respond to the ratio of two inputs. Finally, we describe the OpenTB challenge, which elicited 85-nucleotide sensors that compute a score for diagnosing active tuberculosis, based on the ratio of products of three gene segments. Building on OpenTB design strategies, we created an algorithm Nucleologic that produces similarly compact sensors for the three-gene score based on RNA and DNA. These results open new avenues for diverse applications of compact, single molecule sensors previously limited by design complexity.

    View details for DOI 10.1101/2024.01.04.572289

    View details for PubMedID 38260323

    View details for PubMedCentralID PMC10802310

  • Minimization of the E. coli ribosome, aided and optimized by community science. Nucleic acids research Tangpradabkul, T., Palo, M., Townley, J., Hsu, K. B., Participants, E., Smaga, S., Das, R., Schepartz, A. 2024

    Abstract

    The ribosome is a ribonucleoprotein complex found in all domains of life. Its role is to catalyze protein synthesis, the messenger RNA (mRNA)-templated formation of amide bonds between α-amino acid monomers. Amide bond formation occurs within a highly conserved region of the large ribosomal subunit known as the peptidyl transferase center (PTC). Here we describe the step-wise design and characterization of mini-PTC 1.1, a 284-nucleotide RNA that recapitulates many essential features of the Escherichia coli PTC. Mini-PTC 1.1 folds into a PTC-like structure under physiological conditions, even in the absence of r-proteins, and engages small molecule analogs of A- and P-site tRNAs. The sequence of mini-PTC 1.1 differs from the wild type E. coli ribosome at 12 nucleotides that were installed by a cohort of citizen scientists using the on-line video game Eterna. These base changes improve both the secondary structure and tertiary folding of mini-PTC 1.1 as well as its ability to bind small molecule substrate analogs. Here, the combined input from Eterna citizen-scientists and RNA structural analysis provides a robust workflow for the design of a minimal PTC that recapitulates many features of an intact ribosome.

    View details for DOI 10.1093/nar/gkad1254

    View details for PubMedID 38214230

  • CASP15 cryo-EM protein and RNA targets: Refinement and analysis using experimental maps. Proteins Mulvaney, T., Kretsch, R. C., Elliott, L., Beton, J. G., Kryshtafovych, A., Rigden, D. J., Das, R., Topf, M. 2023; 91 (12): 1935-1951

    Abstract

    CASP assessments primarily rely on comparing predicted coordinates with experimental reference structures. However, experimental structures by their nature are only models themselves; their construction involves a certain degree of subjectivity in interpreting density maps and translating them to atomic coordinates. Here, we directly utilized density maps to evaluate the predictions by employing a method for ranking the quality of protein chain predictions based on their fit into the experimental density. The fit-based ranking was found to correlate well with the CASP assessment scores. Overall, the evaluation against the density map indicated that the models are of high accuracy, and occasionally even better than the reference structure in some regions of the model. Local assessment of predicted side chains in a 1.52 Å resolution map showed that side-chains are sometimes poorly positioned. Additionally, the top 118 predictions associated with 9 protein target reference structures were selected for automated refinement, in addition to the top 40 predictions for 11 RNA targets. For both proteins and RNA, the refinement of CASP15 predictions resulted in structures that are close to the reference target structure. This refinement was successful despite large conformational changes often being required, showing that predictions from CASP-assessed methods could serve as a good starting point for building atomic models in cryo-EM maps for both proteins and RNA. Loop modeling continued to pose a challenge for predictors, and together with the lack of consensus amongst models in these regions suggests that modeling, in combination with model-fit to the density, holds the potential for identifying more flexible regions within the structure.

    View details for DOI 10.1002/prot.26644

    View details for PubMedID 37994556

  • Assessment of three-dimensional RNA structure prediction in CASP15. Proteins Das, R., Kretsch, R. C., Simpkin, A. J., Mulvaney, T., Pham, P., Rangan, R., Bu, F., Keegan, R. M., Topf, M., Rigden, D. J., Miao, Z., Westhof, E. 2023

    Abstract

    The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and x-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as noncanonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.

    View details for DOI 10.1002/prot.26602

    View details for PubMedID 37876231

  • RNA target highlights in CASP15: Evaluation of predicted models by structure providers. Proteins Kretsch, R. C., Andersen, E. S., Bujnicki, J. M., Chiu, W., Das, R., Luo, B., Masquida, B., McRae, E. K., Schroeder, G. M., Su, Z., Wedekind, J. E., Xu, L., Zhang, K., Zheludev, I. N., Moult, J., Kryshtafovych, A. 2023

    Abstract

    The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible because of the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA-only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their x-ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include: enhancing the accuracy in local interaction predictions and increased consideration of the experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA-protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.

    View details for DOI 10.1002/prot.26550

    View details for PubMedID 37466021

  • New prediction categories in CASP15. Proteins Kryshtafovych, A., Antczak, M., Szachniuk, M., Zok, T., Kretsch, R. C., Rangan, R., Pham, P., Das, R., Robin, X., Studer, G., Durairaj, J., Eberhardt, J., Sweeney, A., Topf, M., Schwede, T., Fidelis, K., Moult, J. 2023

    Abstract

    Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.

    View details for DOI 10.1002/prot.26515

    View details for PubMedID 37306011

  • Community science designed ribosomes with beneficial phenotypes. Nature communications Kruger, A., Watkins, A. M., Wellington-Oguri, R., Romano, J., Kofman, C., DeFoe, A., Kim, Y., Anderson-Lee, J., Fisker, E., Townley, J., Eterna Participants, d'Aquino, A. E., Das, R., Jewett, M. C. 2023; 14 (1): 961

    Abstract

    Functional design of ribosomes with mutant ribosomal RNA (rRNA) can expand opportunities for understanding molecular translation, building cells from the bottom-up, and engineering ribosomes with altered capabilities. However, such efforts are hampered by cell viability constraints, an enormous combinatorial sequence space, and limitations on large-scale, 3D design of RNA structures and functions. To address these challenges, we develop an integrated community science and experimental screening approach for rational design of ribosomes. This approach couples Eterna, an online video game that crowdsources RNA sequence design to community scientists in the form of puzzles, with in vitro ribosome synthesis, assembly, and translation in multiple design-build-test-learn cycles. We apply our framework to discover mutant rRNA sequences that improve protein synthesis in vitro and cell growth in vivo, relative to wild type ribosomes, under diverse environmental conditions. This work provides insights into rRNA sequence-function relationships and has implications for synthetic biology.

    View details for DOI 10.1038/s41467-023-35827-3

    View details for PubMedID 36810740

  • RNA 3D Modeling with FARFAR2, Online. Methods in molecular biology (Clifton, N.J.) Watkins, A. M., Das, R. 2023; 2586: 233-249

    Abstract

    Understanding the three-dimensional structure of an RNA molecule is often essential to understanding its function. Sampling algorithms and energy functions for RNA structure prediction are improving, due to the increasing diversity of structural data available for training statistical potentials and testing structural data, along with a steady supply of blind challenges through the RNA-Puzzles initiative. The recent FARFAR2 algorithm enables near-native structure predictions on fairly complex RNA structures, including automated selection of final candidate models and estimation of model accuracy. Here, we describe the use of a publicly available webserver for RNA modeling for realistic scenarios using FARFAR2, available at https://rosie.rosettacommons.org/farfar2 . We walk through two cases in some detail: a simple model pseudoknot from the frameshifting element of beet western yellows virus modeled using the "basic interface" to the webserver and a replication of RNA-Puzzle 20, a metagenomic twister sister ribozyme, using the "advanced interface." We also describe example runs of FARFAR2 modeling including two kinds of experimental data: a c-di-GMP riboswitch modeled with low-resolution restraints from MOHCA-seq experiments and a tandem GA motif modeled with 1H NMR chemical shifts.

    View details for DOI 10.1007/978-1-0716-2768-6_14

    View details for PubMedID 36705908

  • Auto-DRRAFTER: Automated RNA Modeling Based on Cryo-EM Density. Methods in molecular biology (Clifton, N.J.) Ma, H., Pham, P., Luo, B., Rangan, R., Kappel, K., Su, Z., Das, R. 2023; 2568: 193-211

    Abstract

    RNA three-dimensional structures provide rich and vital information for understanding their functions. Recent advances in cryogenic electron microscopy (cryo-EM) allow structure determination of RNAs and ribonucleoprotein (RNP) complexes. However, limited global and local resolutions of RNA cryo-EM maps pose great challenges in tracing RNA coordinates. The Rosetta-based "auto-DRRAFTER" method builds RNA models into moderate-resolution RNA cryo-EM density as part of the Ribosolve pipeline. Here, we describe a step-by-step protocol for auto-DRRAFTER using a glycine riboswitch from Fusobacterium nucleatum as an example. Successful implementation of this protocol allows automated RNA modeling into RNA cryo-EM density, accelerating our understanding of RNA structure-function relationships. Input and output files are being made available at https://github.com/auto-DRRAFTER/springer-chapter .

    View details for DOI 10.1007/978-1-0716-2687-0_13

    View details for PubMedID 36227570

  • Computationally-guided design and selection of high performing ribosomal active site mutants. Nucleic acids research Kofman, C., Watkins, A. M., Kim, D. S., Willi, J. A., Wooldredge, A. C., Karim, A. S., Das, R., Jewett, M. C. 2022

    Abstract

    Understanding how modifications to the ribosome affect function has implications for studying ribosome biogenesis, building minimal cells, and repurposing ribosomes for synthetic biology. However, efforts to design sequence-modified ribosomes have been limited because point mutations in the ribosomal RNA (rRNA), especially in the catalytic active site (peptidyl transferase center; PTC), are often functionally detrimental. Moreover, methods for directed evolution of rRNA are constrained by practical considerations (e.g. library size). Here, to address these limitations, we developed a computational rRNA design approach for screening guided libraries of mutant ribosomes. Our method includes in silico library design and selection using a Rosetta stepwise Monte Carlo method (SWM), library construction and in vitro testing of combined ribosomal assembly and translation activity, and functional characterization in vivo. As a model, we apply our method to making modified ribosomes with mutant PTCs. We engineer ribosomes with as many as 30 mutations in their PTCs, highlighting previously unidentified epistatic interactions, and show that SWM helps identify sequences with beneficial phenotypes as compared to random library sequences. We further demonstrate that some variants improve cell growth in vivo, relative to wild type ribosomes. We anticipate that SWM design and selection may serve as a powerful tool for rRNA engineering.

    View details for DOI 10.1093/nar/gkac1036

    View details for PubMedID 36484094

  • Learning RNA structure prediction from crowd-designed RNAs. Nature methods Wayment-Steele, H. K., Das, R. 2022: 1181-1182

    View details for DOI 10.1038/s41592-022-01607-y

    View details for Web of Science ID 000865223100004

    View details for PubMedID 36192465

    View details for PubMedCentralID PMC9528868

  • RNA secondary structure packages evaluated and improved by high-throughput experiments. Nature methods Wayment-Steele, H. K., Kladwang, W., Strom, A. I., Lee, J., Treuille, A., Becka, A., Eterna Participants, Das, R. 2022

    Abstract

    Despite the popularity of computer-aided study and design of RNA molecules, little is known about the accuracy of commonly used structure modeling packages in tasks sensitive to ensemble properties of RNA. Here, we demonstrate that the EternaBench dataset, a set of more than 20,000 synthetic RNA constructs designed on the RNA design platform Eterna, provides incisive discriminative power in evaluating current packages in ensemble-oriented structure prediction tasks. We find that CONTRAfold and RNAsoft, packages with parameters derived through statistical learning, achieve consistently higher accuracy than more widely used packages in their standard settings, which derive parameters primarily from thermodynamic experiments. We hypothesized that training a multitask model with the varied data types in EternaBench might improve inference on ensemble-based prediction tasks. Indeed, the resulting model, named EternaFold, demonstrated improved performance that generalizes to diverse external datasets including complete messenger RNAs, viral genomes probed in human cells and synthetic designs modeling mRNA vaccines.

    View details for DOI 10.1038/s41592-022-01605-0

    View details for PubMedID 36192461

  • Topological crossing in the misfolded Tetrahymena ribozyme resolved by cryo-EM. Proceedings of the National Academy of Sciences of the United States of America Li, S., Palo, M. Z., Pintilie, G., Zhang, X., Su, Z., Kappel, K., Chiu, W., Zhang, K., Das, R. 2022; 119 (37): e2209146119

    Abstract

    The Tetrahymena group I intron has been a key system in the understanding of RNA folding and misfolding. The molecule folds into a long-lived misfolded intermediate (M) in vitro, which has been known to form extensive native-like secondary and tertiary structures but is separated by an unknown kinetic barrier from the native state (N). Here, we used cryogenic electron microscopy (cryo-EM) to resolve misfolded structures of the Tetrahymena L-21 ScaI ribozyme. Maps of three M substates (M1, M2, M3) and one N state were achieved from a single specimen with overall resolutions of 3.5 Å, 3.8 Å, 4.0 Å, and 3.0 Å, respectively. Comparisons of the structures reveal that all the M substates are highly similar to N, except for rotation of a core helix P7 that harbors the ribozyme's guanosine binding site and the crossing of the strands J7/3 and J8/7 that connect P7 to the other elements in the ribozyme core. This topological difference between the M substates and N state explains the failure of 5'-splice site substrate docking in M, supports a topological isomer model for the slow refolding of M to N due to a trapped strand crossing, and suggests pathways for M-to-N refolding.

    View details for DOI 10.1073/pnas.2209146119

    View details for PubMedID 36067294

  • Programmable antivirals targeting critical conserved viral RNA secondary structures from influenza A virus and SARS-CoV-2. Nature medicine Hagey, R. J., Elazar, M., Pham, E. A., Tian, S., Ben-Avi, L., Bernardin-Souibgui, C., Yee, M. F., Moreira, F. R., Rabinovitch, M. V., Meganck, R. M., Fram, B., Beck, A., Gibson, S. A., Lam, G., Devera, J., Kladwang, W., Nguyen, K., Xiong, A., Schaffert, S., Avisar, T., Liu, P., Rustagi, A., Fichtenbaum, C. J., Pang, P. S., Khatri, P., Tseng, C., Taubenberger, J. K., Blish, C. A., Hurst, B. L., Sheahan, T. P., Das, R., Glenn, J. S. 2022

    Abstract

    Influenza A virus's (IAV's) frequent genetic changes challenge vaccine strategies and engender resistance to current drugs. We sought to identify conserved and essential RNA secondary structures within IAV's genome that are predicted to have greater constraints on mutation in response to therapeutic targeting. We identified and genetically validated an RNA structure (packaging stem-loop 2 (PSL2)) that mediates in vitro packaging and in vivo disease and is conserved across all known IAV isolates. A PSL2-targeting locked nucleic acid (LNA), administered 3d after, or 14d before, a lethal IAV inoculum provided 100% survival in mice, led to the development of strong immunity to rechallenge with a tenfold lethal inoculum, evaded attempts to select for resistance and retained full potency against neuraminidase inhibitor-resistant virus. Use of an analogous approach to target SARS-CoV-2, prophylactic administration of LNAs specific for highly conserved RNA structures in the viral genome, protected hamsters from efficient transmission of the SARS-CoV-2 USA_WA1/2020 variant. These findings highlight the potential applicability of this approach to any virus of interest via a process we term 'programmable antivirals', with implications for antiviral prophylaxis and post-exposure therapy.

    View details for DOI 10.1038/s41591-022-01908-x

    View details for PubMedID 35982307

  • Three-dimensional structure-guided evolution of a ribosome with tethered subunits. Nature chemical biology Kim, D. S., Watkins, A., Bidstrup, E., Lee, J., Topkar, V., Kofman, C., Schwarz, K. J., Liu, Y., Pintilie, G., Roney, E., Das, R., Jewett, M. C. 2022

    Abstract

    RNA-based macromolecular machines, such as the ribosome, have functional parts reliant on structural interactions spanning sequence-distant regions. These features limit evolutionary exploration of mutant libraries and confound three-dimensional structure-guided design. To address these challenges, we describe Evolink (evolution and linkage), a method that enables high-throughput evolution of sequence-distant regions in large macromolecular machines, and library design guided by computational RNA modeling to enable exploration of structurally stable designs. Using Evolink, we evolved a tethered ribosome with a 58% increased activity in orthogonal protein translation and a 97% improvement in doubling times in SQ171 cells compared to a previously developed tethered ribosome, and reveal new permissible sequences in a pair of ribosomal helices with previously explored biological function. The Evolink approach may enable enhanced engineering of macromolecular machines for new and improved functions for synthetic biology.

    View details for DOI 10.1038/s41589-022-01064-w

    View details for PubMedID 35836020

  • Crowdsourced RNA design discovers diverse, reversible, efficient, self-contained molecular switches. Proceedings of the National Academy of Sciences of the United States of America Andreasson, J. O., Gotrik, M. R., Wu, M. J., Wayment-Steele, H. K., Kladwang, W., Portela, F., Wellington-Oguri, R., Eterna Participants, Das, R., Greenleaf, W. J. 2022; 119 (18): e2112979119

    Abstract

    Significance: Our manuscript presents a paradigm for carrying out distributed science. We have harnessed an online RNA design game, Eterna, to challenge a large community of RNA designers to create diverse RNA sensors. RNA is an attractive, biocompatible substrate for the design and implementation of molecular sensors. We tasked the diverse Eterna community, comprising a global network of molecular design enthusiasts, to submit thousands to tens of thousands of "solutions" to these RNA sensor design challenges. Crucially, community designs were synthesized and tested experimentally in the real world using high-throughput methods for biochemical assays built on repurposed DNA sequencers. The best player-generated designs for RNA sensors approached the thermodynamic optimum.

    View details for DOI 10.1073/pnas.2112979119

    View details for PubMedID 35471911

  • Combinatorial optimization of mRNA structure, stability, and translation for RNA-based therapeutics. Nature communications Leppek, K., Byeon, G. W., Kladwang, W., Wayment-Steele, H. K., Kerr, C. H., Xu, A. F., Kim, D. S., Topkar, V. V., Choe, C., Rothschild, D., Tiu, G. C., Wellington-Oguri, R., Fujii, K., Sharma, E., Watkins, A. M., Nicol, J. J., Romano, J., Tunguz, B., Diaz, F., Cai, H., Guo, P., Wu, J., Meng, F., Shi, S., Participants, E., Dormitzer, P. R., Solorzano, A., Barna, M., Das, R. 2022; 13 (1): 1536

    Abstract

    Therapeutic mRNAs and vaccines are being developed for a broad range of human diseases, including COVID-19. However, their optimization is hindered by mRNA instability and inefficient protein expression. Here, we describe design principles that overcome these barriers. We develop an RNA sequencing-based platform called PERSIST-seq to systematically delineate in-cell mRNA stability, ribosome load, as well as in-solution stability of a library of diverse mRNAs. We find that, surprisingly, in-cell stability is a greater driver of protein output than high ribosome load. We further introduce a method called In-line-seq, applied to thousands of diverse RNAs, that reveals sequence and structure-based rules for mitigating hydrolytic degradation. Our findings show that highly structured "superfolder" mRNAs can be designed to improve both stability and expression with further enhancement through pseudouridine nucleoside modification. Together, our study demonstrates simultaneous improvement of mRNA stability and protein expression and provides a computational-experimental platform for the enhancement of mRNA medicines.

    View details for DOI 10.1038/s41467-022-28776-w

    View details for PubMedID 35318324

  • Deep learning models for predicting RNA degradation via dual crowdsourcing. Nature machine intelligence Wayment-Steele, H. K., Kladwang, W., Watkins, A. M., Kim, D. S., Tunguz, B., Reade, W., Demkin, M., Romano, J., Wellington-Oguri, R., Nicol, J. J., Gao, J., Onodera, K., Fujikawa, K., Mao, H., Vandewiele, G., Tinti, M., Steenwinckel, B., Ito, T., Noumi, T., He, S., Ishi, K., Lee, Y., Ozturk, F., Chiu, K. Y., Ozturk, E., Amer, K., Fares, M., Eterna Participants, Das, R. 2022; 4 (12): 1174-1184

    Abstract

    Medicines based on messenger RNA (mRNA) hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally limited by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition ('Stanford OpenVaccine') on Kaggle, involving single-nucleotide resolution measurements on 6,043 diverse 102-130-nucleotide RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504-1,588 nucleotides) with improved accuracy compared with previously published models. These results indicate that such models can represent in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for dataset creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.

    View details for DOI 10.1038/s42256-022-00571-8

    View details for PubMedID 36567960

  • RiboDraw: semiautomated two-dimensional drawing of RNA tertiary structure diagrams. NAR genomics and bioinformatics Das, R., Watkins, A. M. 2021; 3 (4): lqab091

    Abstract

    Publishing, discussing, envisioning, modeling, designing and experimentally determining RNA three-dimensional (3D) structures involve preparation of two-dimensional (2D) drawings that depict critical functional features of the subject molecules, such as noncanonical base pairs and protein contacts. Here, we describe RiboDraw, new software for crafting these drawings. We illustrate the features of RiboDraw by applying it to several RNAs, including the Escherichia coli tRNA-Phe, the P4-P6 domain of Tetrahymena ribozyme, a -1 ribosomal frameshift stimulation element from beet western yellows virus and the 5' untranslated region of SARS-CoV-2. We show secondary structure diagrams of the 23S and 16S subunits of the E. coli ribosome that reflect noncanonical base pairs, ribosomal proteins and structural motifs, and that convey the relative positions of these critical features in 3D space. This software is a MATLAB package freely available at https://github.com/DasLab/RiboDraw.

    View details for DOI 10.1093/nargab/lqab091

    View details for PubMedID 34661102

  • Ensuring scientific reproducibility in bio-macromolecular modeling via extensive, automated benchmarks. Nature communications Koehler Leman, J., Lyskov, S., Lewis, S. M., Adolf-Bryfogle, J., Alford, R. F., Barlow, K., Ben-Aharon, Z., Farrell, D., Fell, J., Hansen, W. A., Harmalkar, A., Jeliazkov, J., Kuenze, G., Krys, J. D., Ljubetic, A., Loshbaugh, A. L., Maguire, J., Moretti, R., Mulligan, V. K., Nance, M. L., Nguyen, P. T., O Conchuir, S., Roy Burman, S. S., Samanta, R., Smith, S. T., Teets, F., Tiemann, J. K., Watkins, A., Woods, H., Yachnin, B. J., Bahl, C. D., Bailey-Kellogg, C., Baker, D., Das, R., DiMaio, F., Khare, S. D., Kortemme, T., Labonte, J. W., Lindorff-Larsen, K., Meiler, J., Schief, W., Schueler-Furman, O., Siegel, J. B., Stein, A., Yarov-Yarovoy, V., Kuhlman, B., Leaver-Fay, A., Gront, D., Gray, J. J., Bonneau, R. 2021; 12 (1): 6947

    Abstract

    Each year vast international resources are wasted on irreproducible research. The scientific community has been slow to adopt standard software engineering practices, despite the increases in high-dimensional data, complexities of workflows, and computational environments. Here we show how scientific software applications can be created in a reproducible manner when simple design goals for reproducibility are met. We describe the implementation of a test server framework and 40 scientific benchmarks, covering numerous applications in Rosetta bio-macromolecular modeling. High performance computing cluster integration allows these benchmarks to run continuously and automatically. Detailed protocol captures are useful for developers and users of Rosetta and other macromolecular modeling tools. The framework and design concepts presented here are valuable for developers and users of any type of scientific software and for the scientific community to create reproducible methods. Specific examples highlight the utility of this framework, and the comprehensive documentation illustrates the ease of adding new tests in a matter of hours.

    View details for DOI 10.1038/s41467-021-27222-7

    View details for PubMedID 34845212

  • Theoretical basis for stabilizing messenger RNA through secondary structure design. Nucleic acids research Wayment-Steele, H. K., Kim, D. S., Choe, C. A., Nicol, J. J., Wellington-Oguri, R., Watkins, A. M., Parra Sperberg, R. A., Huang, P., Eterna Participants, Das, R. 2021

    Abstract

    RNA hydrolysis presents problems in manufacturing, long-term storage, world-wide delivery and in vivo stability of messenger RNA (mRNA)-based vaccines and therapeutics. A largely unexplored strategy to reduce mRNA hydrolysis is to redesign RNAs to form double-stranded regions, which are protected from in-line cleavage and enzymatic degradation, while coding for the same proteins. The amount of stabilization that this strategy can deliver and the most effective algorithmic approach to achieve stabilization remain poorly understood. Here, we present simple calculations for estimating RNA stability against hydrolysis, and a model that links the average unpaired probability of an mRNA, or AUP, to its overall hydrolysis rate. To characterize the stabilization achievable through structure design, we compare AUP optimization by conventional mRNA design methods to results from more computationally sophisticated algorithms and crowdsourcing through the OpenVaccine challenge on the Eterna platform. We find that rational design on Eterna and the more sophisticated algorithms lead to constructs with low AUP, which we term 'superfolder' mRNAs. These designs exhibit a wide diversity of sequence and structure features that may be desirable for translation, biophysical size, and immunogenicity. Furthermore, their folding is robust to temperature, computer modeling method, choice of flanking untranslated regions, and changes in target protein sequence, as illustrated by rapid redesign of superfolder mRNAs for B.1.351, P.1 and B.1.1.7 variants of the prefusion-stabilized SARS-CoV-2 spike protein. Increases in in vitro mRNA half-life by at least two-fold appear immediately achievable.

    View details for DOI 10.1093/nar/gkab764

    View details for PubMedID 34520542

  • How to Kinetically Dissect an RNA Machine. Biochemistry Das, R., Russell, R. 2021

    Abstract

    RNA-based machines are ubiquitous in Nature and increasingly important for medicines. They fold into complex, dynamic structures that process information and catalyze reactions, including reactions that generate new RNAs and proteins across biology. What are the experimental strategies and steps that are necessary to understand how these complex machines work? Two 1990 papers from Herschlag and Cech on "Catalysis of RNA Cleavage by the Tetrahymena thermophila Ribozyme" provide a master class in dissecting an RNA machine through kinetics approaches. By showing how to propose a kinetic framework, fill in the numbers, do cross-checks, and make comparisons across mutants and different RNA systems, the papers illustrate how to take a mechanistic approach and distill the results into general insights that are difficult to attain through other means.

    View details for DOI 10.1021/acs.biochem.1c00392

    View details for PubMedID 34492193

  • Geometric deep learning of RNA structure. Science (New York, N.Y.) Townshend, R. J., Eismann, S., Watkins, A. M., Rangan, R., Karelina, M., Das, R., Dror, R. O. 2021; 373 (6558): 1047-1051

    Abstract

    RNA molecules adopt three-dimensional structures that are critical to their function and of interest in drug discovery. Few RNA structures are known, however, and predicting them computationally has proven challenging. We introduce a machine learning approach that enables identification of accurate structural models without assumptions about their defining characteristics, despite being trained with only 18 known RNA structures. The resulting scoring function, the Atomic Rotationally Equivariant Scorer (ARES), substantially outperforms previous methods and consistently produces the best results in community-wide blind RNA structure prediction challenges. By learning effectively even from a small amount of data, our approach overcomes a major limitation of standard deep neural networks. Because it uses only atomic coordinates as inputs and incorporates no RNA-specific information, this approach is applicable to diverse problems in structural biology, chemistry, materials science, and beyond.

    View details for DOI 10.1126/science.abe5650

    View details for PubMedID 34446608

  • Cryo-EM and antisense targeting of the 28-kDa frameshift stimulation element from the SARS-CoV-2 RNA genome. Nature structural & molecular biology Zhang, K., Zheludev, I. N., Hagey, R. J., Haslecker, R., Hou, Y. J., Kretsch, R., Pintilie, G. D., Rangan, R., Kladwang, W., Li, S., Wu, M. T., Pham, E. A., Bernardin-Souibgui, C., Baric, R. S., Sheahan, T. P., D'Souza, V., Glenn, J. S., Chiu, W., Das, R. 2021

    Abstract

    Drug discovery campaigns against COVID-19 are beginning to target the SARS-CoV-2 RNA genome. The highly conserved frameshift stimulation element (FSE), required for balanced expression of viral proteins, is a particularly attractive SARS-CoV-2 RNA target. Here we present a 6.9 Å resolution cryo-EM structure of the FSE (88 nucleotides, ~28 kDa), validated through an RNA nanostructure tagging method. The tertiary structure presents a topologically complex fold in which the 5' end is threaded through a ring formed inside a three-stem pseudoknot. Guided by this structure, we develop antisense oligonucleotides that impair FSE function in frameshifting assays and knock down SARS-CoV-2 virus replication in A549-ACE2 cells at 100 nM concentration.

    View details for DOI 10.1038/s41594-021-00653-y

    View details for PubMedID 34426697

  • Interpretation of RNA cryo-EM maps of various resolutions. Kretsch, R., Das, R., Chiu, W. INT UNION CRYSTALLOGRAPHY. 2021: A217

  • RNA structure: a renaissance begins? Nature methods Das, R. 2021; 18 (5): 439

    View details for DOI 10.1038/s41592-021-01132-4

    View details for PubMedID 33963334

  • Structure of human telomerase holoenzyme with bound telomeric DNA. Nature Ghanim, G. E., Fountain, A. J., van Roon, A. M., Rangan, R., Das, R., Collins, K., Nguyen, T. H. 2021

    Abstract

    Telomerase adds telomeric repeats at chromosome ends to compensate for the telomere loss that is caused by incomplete genome end replication. In humans, telomerase is upregulated during embryogenesis and in cancers, and mutations that compromise the function of telomerase result in disease. A previous structure of human telomerase at a resolution of 8 Å revealed a vertebrate-specific composition and architecture, comprising a catalytic core that is flexibly tethered to an H and ACA (hereafter, H/ACA) box ribonucleoprotein (RNP) lobe by telomerase RNA. High-resolution structural information is necessary to develop treatments that can effectively modulate telomerase activity as a therapeutic approach against cancers and disease. Here we used cryo-electron microscopy to determine the structure of human telomerase holoenzyme bound to telomeric DNA at sub-4 Å resolution, which reveals crucial DNA- and RNA-binding interfaces in the active site of telomerase as well as the locations of mutations that alter telomerase activity. We identified a histone H2A-H2B dimer within the holoenzyme that was bound to an essential telomerase RNA motif, which suggests a role for histones in the folding and function of telomerase RNA. Furthermore, this structure of a eukaryotic H/ACA RNP reveals the molecular recognition of conserved RNA and protein motifs, as well as interactions that are crucial for understanding the molecular pathology of many mutations that cause disease. Our findings provide the structural details of the assembly and active site of human telomerase, which paves the way for the development of therapeutic agents that target this enzyme.

    View details for DOI 10.1038/s41586-021-03415-4

    View details for PubMedID 33883742

  • Functional and structural basis of extreme conservation in vertebrate 5' untranslated regions. Nature genetics Byeon, G. W., Cenik, E. S., Jiang, L., Tang, H., Das, R., Barna, M. 2021

    Abstract

    The lack of knowledge about extreme conservation in genomes remains a major gap in our understanding of the evolution of gene regulation. Here, we reveal an unexpected role of extremely conserved 5' untranslated regions (UTRs) in noncanonical translational regulation that is linked to the emergence of essential developmental features in vertebrate species. Endogenous deletion of conserved elements within these 5' UTRs decreased gene expression, and extremely conserved 5' UTRs possess cis-regulatory elements that promote cell-type-specific regulation of translation. We further developed in-cell mutate-and-map (icM2), a new methodology that maps RNA structure inside cells. Using icM2, we determined that an extremely conserved 5' UTR encodes multiple alternative structures and that each single nucleotide within the conserved element maintains the balance of alternative structures important to control the dynamic range of protein expression. These results explain how extreme sequence conservation can lead to RNA-level biological functions encoded in the untranslated regions of vertebrate genomes.

    View details for DOI 10.1038/s41588-021-00830-1

    View details for PubMedID 33821006

  • PyRosetta Jupyter Notebooks Teach Biomolecular Structure Prediction and Design. Biophysicist (Rockville, Md.) Le, K. H., Adolf-Bryfogle, J., Klima, J. C., Lyskov, S., Labonte, J., Bertolani, S., Burman, S. S., Leaver-Fay, A., Weitzner, B., Maguire, J., Rangan, R., Adrianowycz, M. A., Alford, R. F., Adal, A., Nance, M. L., Wu, Y., Willis, J., Kulp, D. W., Das, R., Dunbrack, R. L., Schief, W., Kuhlman, B., Siegel, J. B., Gray, J. J. 2021; 2 (1): 108-122

    Abstract

    Biomolecular structure drives function, and computational capabilities have progressed such that the prediction and computational design of biomolecular structures is increasingly feasible. Because computational biophysics attracts students from many different backgrounds and with different levels of resources, teaching the subject can be challenging. One strategy to teach diverse learners is with interactive multimedia material that promotes self-paced, active learning. We have created a hands-on education strategy with a set of sixteen modules that teach topics in biomolecular structure and design, from fundamentals of conformational sampling and energy evaluation to applications like protein docking, antibody design, and RNA structure prediction. Our modules are based on PyRosetta, a Python library that encapsulates all computational modules and methods in the Rosetta software package. The workshop-style modules are implemented as Jupyter Notebooks that can be executed in the Google Colaboratory, allowing learners access with just a web browser. The digital format of Jupyter Notebooks allows us to embed images, molecular visualization movies, and interactive coding exercises. This multimodal approach may better reach students from different disciplines and experience levels as well as attract more researchers from smaller labs and cognate backgrounds to leverage PyRosetta in their science and engineering research. All materials are freely available at https://github.com/RosettaCommons/PyRosetta.notebooks.

    View details for DOI 10.35459/tbp.2019.000147

    View details for PubMedID 35128343

  • De novo 3D models of SARS-CoV-2 RNA elements from consensus experimental secondary structures. Nucleic acids research Rangan, R., Watkins, A. M., Chacon, J., Kretsch, R., Kladwang, W., Zheludev, I. N., Townley, J., Rynge, M., Thain, G., Das, R. 2021

    Abstract

    The rapid spread of COVID-19 is motivating development of antivirals targeting conserved SARS-CoV-2 molecular machinery. The SARS-CoV-2 genome includes conserved RNA elements that offer potential small-molecule drug targets, but most of their 3D structures have not been experimentally characterized. Here, we provide a compilation of chemical mapping data from our and other labs, secondary structure models, and 3D model ensembles based on Rosetta's FARFAR2 algorithm for SARS-CoV-2 RNA regions including the individual stems SL1-8 in the extended 5' UTR; the reverse complement of the 5' UTR SL1-4; the frameshift stimulating element (FSE); and the extended pseudoknot, hypervariable region, and s2m of the 3' UTR. For eleven of these elements (the stems in SL1-8, reverse complement of SL1-4, FSE, s2m and 3' UTR pseudoknot), modeling convergence supports the accuracy of predicted low energy states; subsequent cryo-EM characterization of the FSE confirms modeling accuracy. To aid efforts to discover small molecule RNA binders guided by computational models, we provide a second set of similarly prepared models for RNA riboswitches that bind small molecules. Both datasets ('FARFAR2-SARS-CoV-2', https://github.com/DasLab/FARFAR2-SARS-CoV-2; and 'FARFAR2-Apo-Riboswitch', at https://github.com/DasLab/FARFAR2-Apo-Riboswitch') include up to 400 models for each RNA element, which may facilitate drug discovery approaches targeting dynamic ensembles of RNA molecules.

    View details for DOI 10.1093/nar/gkab119

    View details for PubMedID 33693814

  • Learning cis-regulatory principles of ADAR-based RNA editing from CRISPR-mediated mutagenesis. Nature communications Liu, X., Sun, T., Shcherbina, A., Li, Q., Jarmoskaite, I., Kappel, K., Ramaswami, G., Das, R., Kundaje, A., Li, J. B. 2021; 12 (1): 2165

    Abstract

    Adenosine-to-inosine (A-to-I) RNA editing catalyzed by ADAR enzymes occurs in double-stranded RNAs. Despite a compelling need towards predictive understanding of natural and engineered editing events, how the RNA sequence and structure determine the editing efficiency and specificity (i.e., cis-regulation) is poorly understood. We apply a CRISPR/Cas9-mediated saturation mutagenesis approach to generate libraries of mutations near three natural editing substrates at their endogenous genomic loci. We use machine learning to integrate diverse RNA sequence and structure features to model editing levels measured by deep sequencing. We confirm known features and identify new features important for RNA editing. Training and testing XGBoost algorithm within the same substrate yield models that explain 68 to 86 percent of substrate-specific variation in editing levels. However, the models do not generalize across substrates, suggesting complex and context-dependent regulation patterns. Our integrative approach can be applied to larger scale experiments towards deciphering the RNA editing code.

    View details for DOI 10.1038/s41467-021-22489-2

    View details for PubMedID 33846332

  • Cryo-EM structures of full-length Tetrahymena ribozyme at 3.1 Å resolution. Nature Su, Z., Zhang, K., Kappel, K., Li, S., Palo, M. Z., Pintilie, G. D., Rangan, R., Luo, B., Wei, Y., Das, R., Chiu, W. 2021

    Abstract

    Single-particle cryogenic electron microscopy (cryo-EM) has become a standard technique for determining protein structures at atomic resolution. However, cryo-EM studies of protein-free RNA are in their early days. The Tetrahymena thermophila group I self-splicing intron was the first ribozyme to be discovered and has been a prominent model system for the study of RNA catalysis and structure-function relationships, but its full structure remains unknown. Here we report cryo-EM structures of the full-length Tetrahymena ribozyme in substrate-free and bound states at a resolution of 3.1 Å. Newly resolved peripheral regions form two coaxially stacked helices; these are interconnected by two kissing loop pseudoknots that wrap around the catalytic core and include two previously unforeseen (to our knowledge) tertiary interactions. The global architecture is nearly identical in both states; only the internal guide sequence and guanosine binding site undergo a large conformational change and a localized shift, respectively, upon binding of RNA substrates. These results provide a long-sought structural view of a paradigmatic RNA enzyme and signal a new era for the cryo-EM-based study of structure-function relationships in ribozymes.

    View details for DOI 10.1038/s41586-021-03803-w

    View details for PubMedID 34381213

  • Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nature methods Leman, J. K., Weitzner, B. D., Lewis, S. M., Adolf-Bryfogle, J., Alam, N., Alford, R. F., Aprahamian, M., Baker, D., Barlow, K. A., Barth, P., Basanta, B., Bender, B. J., Blacklock, K., Bonet, J., Boyken, S. E., Bradley, P., Bystroff, C., Conway, P., Cooper, S., Correia, B. E., Coventry, B., Das, R., De Jong, R. M., DiMaio, F., Dsilva, L., Dunbrack, R., Ford, A. S., Frenz, B., Fu, D. Y., Geniesse, C., Goldschmidt, L., Gowthaman, R., Gray, J. J., Gront, D., Guffy, S., Horowitz, S., Huang, P., Huber, T., Jacobs, T. M., Jeliazkov, J. R., Johnson, D. K., Kappel, K., Karanicolas, J., Khakzad, H., Khar, K. R., Khare, S. D., Khatib, F., Khramushin, A., King, I. C., Kleffner, R., Koepnick, B., Kortemme, T., Kuenze, G., Kuhlman, B., Kuroda, D., Labonte, J. W., Lai, J. K., Lapidoth, G., Leaver-Fay, A., Lindert, S., Linsky, T., London, N., Lubin, J. H., Lyskov, S., Maguire, J., Malmstrom, L., Marcos, E., Marcu, O., Marze, N. A., Meiler, J., Moretti, R., Mulligan, V. K., Nerli, S., Norn, C., O'Conchuir, S., Ollikainen, N., Ovchinnikov, S., Pacella, M. S., Pan, X., Park, H., Pavlovicz, R. E., Pethe, M., Pierce, B. G., Pilla, K. B., Raveh, B., Renfrew, P. D., Burman, S. S., Rubenstein, A., Sauer, M. F., Scheck, A., Schief, W., Schueler-Furman, O., Sedan, Y., Sevy, A. M., Sgourakis, N. G., Shi, L., Siegel, J. B., Silva, D., Smith, S., Song, Y., Stein, A., Szegedy, M., Teets, F. D., Thyme, S. B., Wang, R. Y., Watkins, A., Zimmerman, L., Bonneau, R. 2020

    Abstract

    The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities. Here we review tools developed in the last 5 years, including over 80 methods. We discuss improvements to the score function, user interfaces and usability. Rosetta is available at http://www.rosettacommons.org.

    View details for DOI 10.1038/s41592-020-0848-2

    View details for PubMedID 32483333

  • Anomalous Reverse Transcription through Chemical Modifications in Polyadenosine Stretches. Biochemistry Kladwang, W., Topkar, V. V., Liu, B., Rangan, R., Hodges, T. L., Keane, S. C., Al-Hashimi, H., Das, R. 2020

    Abstract

    Thermostable reverse transcriptases are workhorse enzymes underlying nearly all modern techniques for RNA structure mapping and for the transcriptome-wide discovery of RNA chemical modifications. Despite their wide use, these enzymes' behaviors at chemically modified nucleotides remain poorly understood. Wellington-Oguri et al. recently reported an apparent loss of chemical modification within putatively unstructured polyadenosine stretches modified by dimethyl sulfate or 2' hydroxyl acylation, as probed by reverse transcription. Here, reanalysis of these and other publicly available data, capillary electrophoresis experiments on chemically modified RNAs, and nuclear magnetic resonance spectroscopy on (A)12 and variants show that this effect is unlikely to arise from an unusual structure of polyadenosine. Instead, tests of different reverse transcriptases on chemically modified RNAs and molecules synthesized with single 1-methyladenosines implicate a previously uncharacterized reverse transcriptase behavior: near-quantitative bypass through chemical modifications within polyadenosine stretches. All tested natural and engineered reverse transcriptases (MMLV; SuperScript II, III, and IV; TGIRT-III; and MarathonRT) exhibit this anomalous bypass behavior. Accurate DMS-guided structure modeling of the polyadenylated HIV-1 3' untranslated region requires taking into account this anomaly. Our results suggest that poly(rA-dT) hybrid duplexes can trigger an unexpectedly effective reverse transcriptase bypass and that chemical modifications in mRNA poly(A) tails may be generally undercounted.

    View details for DOI 10.1021/acs.biochem.0c00020

    View details for PubMedID 32407625

  • RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses: a first look. RNA (New York, N.Y.) Rangan, R., Zheludev, I. N., Das, R. 2020

    Abstract

    As the COVID-19 outbreak spreads, there is a growing need for a compilation of conserved RNA genome regions in the SARS-CoV-2 virus along with their structural propensities to guide development of antivirals and diagnostics. Here we present a first look at RNA sequence conservation and structural propensities in the SARS-CoV-2 genome. Using sequence alignments spanning a range of betacoronaviruses, we rank genomic regions by RNA sequence conservation, identifying 79 regions of length at least 15 nucleotides as exactly conserved over SARS-related complete genome sequences available near the beginning of the COVID-19 outbreak. We then confirm the conservation of the majority of these genome regions across 739 SARS-CoV-2 sequences subsequently reported from the COVID-19 outbreak, and we present a curated list of 30 'SARS-related-conserved' regions. We find that known RNA structured elements curated as Rfam families and in prior literature are enriched in these conserved genome regions, and we predict additional conserved, stable secondary structures across the viral genome. We provide 106 'SARS-CoV-2-conserved-structured' regions as potential targets for antivirals that bind to structured RNA. We further provide detailed secondary structure models for the extended 5' UTR, frame-shifting element, and 3' UTR. Last, we predict regions of the SARS-CoV-2 viral genome that have low propensity for RNA secondary structure and are conserved within SARS-CoV-2 strains. These 59 'SARS-CoV-2-conserved-unstructured' genomic regions may be most easily targeted in primer-based diagnostic and oligonucleotide-based therapeutic strategies.

    View details for DOI 10.1261/rna.076141.120

    View details for PubMedID 32398273

  • RNA-Puzzles Round IV: 3D structure predictions of four ribozymes and two aptamers. RNA (New York, N.Y.) Miao, Z., Adamiak, R. W., Antczak, M., Boniecki, M. J., Bujnicki, J. M., Chen, S., Cheng, C. Y., Cheng, Y., Chou, F., Das, R., Dokholyan, N. V., Ding, F., Geniesse, C., Jiang, Y., Joshi, A., Krokhotin, A., Magnus, M., Mailhot, O., Major, F., Mann, T. H., Piatkowski, P., Pluta, R., Popenda, M., Sarzynska, J., Sun, L., Szachniuk, M., Tian, S., Wang, J., Wang, J., Watkins, A. M., Wiedemann, J., Xiao, Y., Xu, X., Yesselman, J. D., Zhang, D., Zhang, Y., Zhang, Z., Zhao, C., Zhao, P., Zhou, Y., Zok, T., Zyla, A., Ren, A., Batey, R. T., Golden, B. L., Huang, L., Lilley, D. M., Liu, Y., Patel, D. J., Westhof, E. 2020

    Abstract

    RNA-Puzzles is a collective endeavor dedicated to the advancement and improvement of RNA 3D structure prediction. With agreement from crystallographers, the RNA structures are predicted by various groups before the publication of the crystal structures. We now report the prediction of six RNA sequences: four structures of nucleolytic ribozymes and two of riboswitches. Systematic protocols for comparing models and crystal structures are described and analyzed. In these six puzzles, we discuss a) the comparison between the automated web server and human experts; b) the prediction of coaxial stacking; c) the prediction of structural details and ligand binding; d) the development of novel prediction methods; and e) the potential improvements to be made. It is illustrated that correct coaxial stacking and tertiary contacts are key for the prediction of RNA architecture, while ligand binding modes can be only predicted with low resolution and accurate ligand binding prediction still remains out of reach. All the predicted models are available for the future development of force field parameters and the improvement of comparison and assessment tools.

    View details for DOI 10.1261/rna.075341.120

    View details for PubMedID 32371455

  • Transcription polymerase-catalyzed emergence of novel RNA replicons. Science (New York, N.Y.) Jain, N., Blauch, L. R., Szymanski, M. R., Das, R., Tang, S. K., Yin, Y. W., Fire, A. Z. 2020

    Abstract

    Transcription polymerases can exhibit an unusual mode of regenerating certain RNA templates from RNA, yielding systems that can replicate and evolve with RNA as information carrier. Two classes of pathogenic RNAs (Hepatitis delta virus in animals and viroids in plants) are copied by host transcription polymerases. Using in vitro RNA replication by the transcription polymerase of T7 bacteriophage as an experimental model, we identify hundreds of new replicating RNAs, define three mechanistic hallmarks of replication (subterminal de novo initiation, RNA shape-shifting and interrupted rolling circle synthesis) and describe emergence from DNA seeds as a mechanism for the origin of novel RNA replicons. These results inform models for the origins and replication of naturally occurring RNA genetic elements and suggest a means by which diverse RNA populations could be propagated as hereditary material in cellular contexts.

    View details for DOI 10.1126/science.aay0688

    View details for PubMedID 32217750

  • Folding heterogeneity in the essential human telomerase RNA three-way junction. RNA (New York, N.Y.) Palka, C., Forino, N., Hentschel, J., Das, R., Stone, M. D. 2020

    Abstract

    Telomeres safeguard the genome by suppressing illicit DNA damage responses at chromosome termini. In order to compensate for incomplete DNA replication at telomeres, most continually dividing cells, including many cancers, express the telomerase ribonucleoprotein (RNP) complex. Telomerase maintains telomere length by catalyzing de novo synthesis of short DNA repeats using an internal telomerase RNA (TR) template. TRs from diverse species harbor structurally conserved domains that contribute to RNP biogenesis and function. In vertebrate TRs, the conserved regions 4 and 5 (CR4/5) fold into a three-way junction (TWJ) that binds directly to the telomerase catalytic protein subunit and is required for telomerase function. We have analyzed the structural properties of the human TR (hTR) CR4/5 domain using a combination of in vitro chemical mapping, secondary structural modeling, and single-molecule structural analysis. Our data suggest the essential P6.1 stem loop within CR4/5 is not stably folded in the absence of the telomerase reverse transcriptase in vitro. Rather, the hTR CR4/5 domain adopts a heterogeneous ensemble of conformations. Finally, single-molecule FRET measurements of CR4/5 and a mutant designed to stabilize the P6.1 stem demonstrate that TERT-binding selects for a structural conformation of CR4/5 that is not the dominant state of the TERT-free in vitro RNA ensemble.

    View details for DOI 10.1261/rna.077255.120

    View details for PubMedID 32817241

  • FARFAR2: Improved De Novo Rosetta Prediction of Complex Global RNA Folds. Structure (London, England : 1993) Watkins, A. M., Rangan, R., Das, R. 2020

    Abstract

    Predicting RNA three-dimensional structures from sequence could accelerate understanding of the growing number of RNA molecules being discovered across biology. Rosetta's Fragment Assembly of RNA with Full-Atom Refinement (FARFAR) has shown promise in community-wide blind RNA-Puzzle trials, but lack of a systematic and automated benchmark has left unclear what limits FARFAR performance. Here, we benchmark FARFAR2, an algorithm integrating RNA-Puzzle-inspired innovations with updated fragment libraries and helix modeling. In 16 of 21 RNA-Puzzles revisited without experimental data or expert intervention, FARFAR2 recovers native-like structures more accurate than models submitted during the RNA-Puzzles trials. Remaining bottlenecks include conformational sampling for >80-nucleotide problems and scoring function limitations more generally. Supporting these conclusions, preregistered blind models for adenovirus VA-I RNA and five riboswitch complexes predicted native-like folds with 3- to 14 Å root-mean-square deviation accuracies. We present a FARFAR2 webserver and three large model archives (FARFAR2-Classics, FARFAR2-Motifs, and FARFAR2-Puzzles) to guide future applications and advances.

    View details for DOI 10.1016/j.str.2020.05.011

    View details for PubMedID 32531203

  • Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nature methods Kappel, K., Zhang, K., Su, Z., Watkins, A. M., Kladwang, W., Li, S., Pintilie, G., Topkar, V. V., Rangan, R., Zheludev, I. N., Yesselman, J. D., Chiu, W., Das, R. 2020; 17 (7): 699–707

    Abstract

    The discovery and design of biologically important RNA molecules is outpacing three-dimensional structural characterization. Here, we demonstrate that cryo-electron microscopy can routinely resolve maps of RNA-only systems and that these maps enable subnanometer-resolution coordinate estimation when complemented with multidimensional chemical mapping and Rosetta DRRAFTER computational modeling. This hybrid 'Ribosolve' pipeline detects and falsifies homologies and conformational rearrangements in 11 previously unknown 119- to 338-nucleotide protein-free RNA structures: full-length Tetrahymena ribozyme, hc16 ligase with and without substrate, full-length Vibrio cholerae and Fusobacterium nucleatum glycine riboswitch aptamers with and without glycine, Mycobacterium SAM-IV riboswitch with and without S-adenosylmethionine, and the computer-designed ATP-TTR-3 aptamer with and without AMP. Simulation benchmarks, blind challenges, compensatory mutagenesis, cross-RNA homologies and internal controls demonstrate that Ribosolve can accurately resolve the global architectures of RNA molecules but does not resolve atomic details. These tests offer guidelines for making inferences in future RNA structural studies with similarly accelerated throughput.

    View details for DOI 10.1038/s41592-020-0878-9

    View details for PubMedID 32616928

  • RNA 3D structure prediction guided by independent folding of homologous sequences. BMC bioinformatics Magnus, M., Kappel, K., Das, R., Bujnicki, J. M. 2019; 20 (1): 512

    Abstract

    BACKGROUND: The understanding of the importance of RNA has dramatically changed over recent years. As in the case of proteins, the function of an RNA molecule is encoded in its tertiary structure, which in turn is determined by the molecule's sequence. The prediction of tertiary structures of complex RNAs is still a challenging task. RESULTS: Using the observation that RNA sequences from the same RNA family fold into conserved structure, we test herein whether parallel modeling of RNA homologs can improve ab initio RNA structure prediction. EvoClustRNA is a multi-step modeling process, in which homologous sequences for the target sequence are selected using the Rfam database. Subsequently, independent folding simulations using Rosetta FARFAR and SimRNA are carried out. The model of the target sequence is selected based on the most common structural arrangement of the common helical fragments. As a test, on two blind RNA-Puzzles challenges, EvoClustRNA predictions ranked as the first of all submissions for the L-glutamine riboswitch and as the second for the ZMP riboswitch. Moreover, through a benchmark of known structures, we discovered several cases in which particular homologs were unusually amenable to structure recovery in folding simulations compared to the single original target sequence. CONCLUSION: This work, for the first time to our knowledge, demonstrates the importance of the selection of the target sequence from an alignment of an RNA family for the success of RNA 3D structure prediction. These observations prompt investigations into a new direction of research for checking 3D structure "foldability" or "predictability" of related RNA sequences to obtain accurate predictions. To support new research in this area, we provide all relevant scripts in a documented and ready-to-use form. By exploring new ideas and identifying limitations of the current RNA 3D structure prediction methods, this work is bringing us closer to the near-native computational RNA 3D models.

    View details for DOI 10.1186/s12859-019-3120-y

    View details for PubMedID 31640563

  • A unified mechanism for intron and exon definition and back-splicing. Nature Li, X., Liu, S., Zhang, L., Issaian, A., Hill, R. C., Espinosa, S., Shi, S., Cui, Y., Kappel, K., Das, R., Hansen, K. C., Zhou, Z. H., Zhao, R. 2019

    Abstract

    The molecular mechanisms of exon definition and back-splicing are fundamental unanswered questions in pre-mRNA splicing. Here we report cryo-electron microscopy structures of the yeast spliceosomal E complex assembled on introns, providing a view of the earliest event in the splicing cycle that commits pre-mRNAs to splicing. The E complex architecture suggests that the same spliceosome can assemble across an exon, and that it either remodels to span an intron for canonical linear splicing (typically on short exons) or catalyses back-splicing to generate circular RNA (on long exons). The model is supported by our experiments, which show that an E complex assembled on the middle exon of yeast EFM5 or HMRA1 can be chased into circular RNA when the exon is sufficiently long. This simple model unifies intron definition, exon definition, and back-splicing through the same spliceosome in all eukaryotes and should inspire experiments in many other systems to understand the mechanism and regulation of these processes.

    View details for DOI 10.1038/s41586-019-1523-6

    View details for PubMedID 31485080

  • A conserved RNA structural motif for organizing topology within picornaviral internal ribosome entry sites. Nature communications Koirala, D., Shao, Y., Koldobskaya, Y., Fuller, J. R., Watkins, A. M., Shelke, S. A., Pilipenko, E. V., Das, R., Rice, P. A., Piccirilli, J. A. 2019; 10 (1): 3629

    Abstract

    Picornaviral IRES elements are essential for initiating the cap-independent viral translation. However, three-dimensional structures of these elements remain elusive. Here, we report a 2.84-Å resolution crystal structure of hepatitis A virus IRES domain V (dV) in complex with a synthetic antibody fragment, a crystallization chaperone. The RNA adopts a three-way junction structure, topologically organized by an adenine-rich stem-loop motif. Despite no obvious sequence homology, the dV architecture shows a striking similarity to a circularly permuted form of encephalomyocarditis virus J-K domain, suggesting a conserved strategy for organizing the domain architecture. Recurrence of the motif led us to use homology modeling tools to compute a 3-dimensional structure of the corresponding domain of foot-and-mouth disease virus, revealing an analogous domain organizing motif. The topological conservation observed among these IRESs and other viral domains implicates a structured three-way junction as an architectural scaffold to pre-organize helical domains for recruiting the translation initiation machinery.

    View details for DOI 10.1038/s41467-019-11585-z

    View details for PubMedID 31399592

  • Automated Design of Diverse Stand-Alone Riboswitches ACS SYNTHETIC BIOLOGY Wu, M. J., Andreasson, J. L., Kladwang, W., Greenleaf, W., Das, R. 2019; 8 (8): 1838–46

    Abstract

    Riboswitches that couple binding of ligands to conformational changes offer sensors and control elements for RNA synthetic biology and medical biotechnology. However, design of these riboswitches has required expert intuition or software specialized to transcription or translation outputs; design has been particularly challenging for applications in which the riboswitch output cannot be amplified by other molecular machinery. We present a fully automated design method called RiboLogic for such "stand-alone" riboswitches and test it via high-throughput experiments on 2875 molecules using RNA-MaP (RNA on a massively parallel array) technology. These molecules consistently modulate their affinity to the MS2 bacteriophage coat protein upon binding of flavin mononucleotide, tryptophan, theophylline, and microRNA miR-208a, achieving activation ratios of up to 20 and significantly better performance than control designs. By encompassing a wide diversity of stand-alone switches and highly quantitative data, the resulting ribologic-solves experimental data set provides a rich resource for further improvement of riboswitch models and design methods.

    View details for DOI 10.1021/acssynbio.9b00142

    View details for Web of Science ID 000481979300016

    View details for PubMedID 31298841

    View details for PubMedCentralID PMC6703183

  • Scientific Discovery Games for Biomedical Research. Annual review of biomedical data science Das, R., Keep, B., Washington, P., Riedel-Kruse, I. H. 2019; 2 (1): 253-279

    Abstract

    Over the past decade, scientific discovery games (SDGs) have emerged as a viable approach for biomedical research, engaging hundreds of thousands of volunteer players and resulting in numerous scientific publications. After describing the origins of this novel research approach, we review the scientific output of SDGs across molecular modeling, sequence alignment, neuroscience, pathology, cellular biology, genomics, and human cognition. We find compelling results and technical innovations arising in problem-oriented games such as Foldit and Eterna and in data-oriented games such as EyeWire and Project Discovery. We discuss emergent properties of player communities shared across different projects, including the diversity of communities and the extraordinary contributions of some volunteers, such as paper writing. Finally, we highlight connections to artificial intelligence, biological cloud laboratories, new game genres, science education, and open science that may drive the next generation of SDGs.

    View details for DOI 10.1146/annurev-biodatasci-072018-021139

    View details for PubMedID 34308269

    View details for PubMedCentralID PMC8297398

  • Structure and ligand binding of the glutamine-II riboswitch. Nucleic acids research Huang, L., Wang, J., Watkins, A. M., Das, R., Lilley, D. M. 2019

    Abstract

    We have determined the structure of the glutamine-II riboswitch ligand binding domain using X-ray crystallography. The structure was solved using a novel combination of homology modeling and molecular replacement. The structure comprises three coaxial helical domains, the central one of which is a pseudoknot with partial triplex character. The major groove of this helix provides the binding site for L-glutamine, which is extensively hydrogen bonded to the RNA. Atomic mutation of the RNA at the ligand binding site leads to loss of binding shown by isothermal titration calorimetry, explaining the specificity of the riboswitch. A metal ion also plays an important role in ligand binding. This is directly bonded to a glutamine carboxylate oxygen atom, and its remaining inner-sphere water molecules make hydrogen bonding interactions with the RNA.

    View details for DOI 10.1093/nar/gkz539

    View details for PubMedID 31216023

  • A Quantitative and Predictive Model for RNA Binding by Human Pumilio Proteins MOLECULAR CELL Jarmoskaite, I., Denny, S. K., Vaidyanathan, P. P., Becker, W. R., Andreasson, J. L., Layton, C. J., Kappel, K., Shivashankar, V., Sreenivasan, R., Das, R., Greenleaf, W. J., Herschlag, D. 2019; 74 (5): 966-+
  • Blind tests of RNA-protein binding affinity prediction PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kappel, K., Jarmoskaite, I., Vaidyanathan, P. P., Greenleaf, W. J., Herschlag, D., Das, R. 2019; 116 (17): 8336–41
  • Spontaneous driving forces give rise to protein-RNA condensates with coexisting phases and complex material properties PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Boeynaems, S., Holehouse, A. S., Weinhardt, V., Kovacs, D., Van Lindt, J., Larabell, C., Van Den Bosch, L., Das, R., Tompa, P. S., Pappu, R., Gitler, A. D. 2019; 116 (16): 7889–98
  • Sampling Native-like Structures of RNA-Protein Complexes through Rosetta Folding and Docking STRUCTURE Kappel, K., Das, R. 2019; 27 (1): 140-+
  • Sequence-dependent RNA helix conformational preferences predictably impact tertiary structure formation. Proceedings of the National Academy of Sciences of the United States of America Yesselman, J. D., Denny, S. K., Bisaria, N., Herschlag, D., Greenleaf, W. J., Das, R. 2019

    Abstract

    Structured RNAs and RNA complexes underlie biological processes ranging from control of gene expression to protein translation. Approximately 50% of nucleotides within known structured RNAs are folded into Watson-Crick (WC) base pairs, and sequence changes that preserve these pairs are typically assumed to preserve higher-order RNA structure and binding of macromolecule partners. Here, we report that indirect effects of the helix sequence on RNA tertiary stability are, in fact, significant but are nevertheless predictable from a simple computational model called RNAMake-∆∆G. When tested through the RNA on a massively parallel array (RNA-MaP) experimental platform, blind predictions for >1500 variants of the tectoRNA heterodimer model system achieve high accuracy (rmsd 0.34 and 0.77 kcal/mol for sequence and length changes, respectively). Detailed comparison of predictions to experiments support a microscopic picture of how helix sequence changes subtly modulate conformational fluctuations at each base-pair step, which accumulate to impact RNA tertiary structure stability. Our study reveals a previously overlooked phenomenon in RNA structure formation and provides a framework of computation and experiment for understanding helix conformational preferences and their impact across biological RNA and RNA-protein assemblies.

    View details for DOI 10.1073/pnas.1901530116

    View details for PubMedID 31375637

  • Computational design of three-dimensional RNA structure and function. Nature nanotechnology Yesselman, J. D., Eiler, D., Carlson, E. D., Gotrik, M. R., d'Aquino, A. E., Ooms, A. N., Kladwang, W., Carlson, P. D., Shi, X., Costantino, D. A., Herschlag, D., Lucks, J. B., Jewett, M. C., Kieft, J. S., Das, R. 2019

    Abstract

    RNA nanotechnology seeks to create nanoscale machines by repurposing natural RNA modules. The field is slowed by the current need for human intuition during three-dimensional structural design. Here, we demonstrate that three distinct problems in RNA nanotechnology can be reduced to a pathfinding problem and automatically solved through an algorithm called RNAMake. First, RNAMake discovers highly stable single-chain solutions to the classic problem of aligning a tetraloop and its sequence-distal receptor, with experimental validation from chemical mapping, gel electrophoresis, solution X-ray scattering and crystallography with 2.55 Å resolution. Second, RNAMake automatically generates structured tethers that integrate 16S and 23S ribosomal RNAs into single-chain ribosomal RNAs that remain uncleaved by ribonucleases and assemble onto messenger RNA. Third, RNAMake enables the automated stabilization of small-molecule binding RNAs, with designed tertiary contacts that improve the binding affinity of the ATP aptamer and improve the fluorescence and stability of the Spinach RNA in cell extracts and in living Escherichia coli cells.

    View details for DOI 10.1038/s41565-019-0517-8

    View details for PubMedID 31427748

  • Using Rosetta for RNA homology modeling. Methods in enzymology Watkins, A. M., Rangan, R., Das, R. 2019; 623: 177–207

    Abstract

    The three-dimensional structures of RNA molecules provide rich and often critical information for understanding their functions, including how they recognize small molecule and protein partners. Computational modeling of RNA 3D structure is becoming increasingly accurate, particularly with the availability of growing numbers of template structures already solved experimentally and the development of sequence alignment and 3D modeling tools to take advantage of this database. For several recent "RNA puzzle" blind modeling challenges, we have successfully identified useful template structures and achieved accurate structure predictions through homology modeling tools developed in the Rosetta software suite. We describe our semi-automated methodology here and walk through two illustrative examples: an adenine riboswitch aptamer, modeled from a template guanine riboswitch structure, and a SAM I/IV riboswitch aptamer, modeled from a template SAM I riboswitch structure.

    View details for DOI 10.1016/bs.mie.2019.05.026

    View details for PubMedID 31239046

  • Evaluating riboswitch optimality. Methods in enzymology Wayment-Steele, H., Wu, M., Gotrik, M., Das, R. 2019; 623: 417–50

    Abstract

    Riboswitches are RNA elements that recognize diverse chemical and biomolecular inputs, and transduce this recognition process to genetic, fluorescent, and other engineered outputs using RNA conformational changes. These systems are pervasive in cellular biology and are a promising biotechnology with applications in genetic regulation and biosensing. Here, we derive a simple expression bounding the activation ratio (the proportion of RNA in the active vs. inactive states) for both ON and OFF riboswitches that operate near thermodynamic equilibrium: 1 + [I]/KdI, where [I] is the input ligand concentration and KdI is the intrinsic dissociation constant of the aptamer module toward the input ligand. A survey of published studies of natural and synthetic riboswitches confirms that the vast majority of empirically measured activation ratios have remained well below this thermodynamic limit. A few natural and synthetic riboswitches achieve activation ratios close to the limit, and these molecules highlight important principles for achieving high riboswitch performance. For several applications, including "light-up" fluorescent sensors and chemically-controlled CRISPR/Cas complexes, the thermodynamic limit has not yet been achieved, suggesting that current tools are operating at suboptimal efficiencies. Future riboswitch studies will benefit from comparing observed activation ratios to this simple expression for the optimal activation ratio. We present experimental and computational suggestions for how to make these quantitative comparisons and suggest new molecular mechanisms that may allow non-equilibrium riboswitches to surpass the derived limit.

    View details for DOI 10.1016/bs.mie.2019.05.028

    View details for PubMedID 31239056
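
The bound quoted in the abstract above, 1 + [I]/KdI, is simple enough to evaluate directly. The following is a minimal sketch, not code from the paper: the function names `max_activation_ratio` and `fraction_of_limit` and the numeric values in the example are hypothetical, chosen only to illustrate how an observed activation ratio might be compared against the limit.

```python
def max_activation_ratio(ligand_conc: float, kd_intrinsic: float) -> float:
    """Return the bound 1 + [I]/KdI on the activation ratio.

    ligand_conc  -- input ligand concentration [I]
    kd_intrinsic -- intrinsic aptamer dissociation constant KdI
    Both arguments must be expressed in the same concentration units.
    """
    if ligand_conc < 0:
        raise ValueError("ligand concentration must be non-negative")
    if kd_intrinsic <= 0:
        raise ValueError("dissociation constant must be positive")
    return 1.0 + ligand_conc / kd_intrinsic


def fraction_of_limit(observed_ratio: float, ligand_conc: float,
                      kd_intrinsic: float) -> float:
    """Return the observed activation ratio as a fraction of the bound."""
    return observed_ratio / max_activation_ratio(ligand_conc, kd_intrinsic)


if __name__ == "__main__":
    # Hypothetical numbers: 100 uM ligand, 1 uM intrinsic Kd, observed ratio 20.
    limit = max_activation_ratio(100.0, 1.0)
    print(f"bound on activation ratio: {limit:.1f}")
    print(f"fraction of bound achieved: {fraction_of_limit(20.0, 100.0, 1.0):.2f}")
```

With no ligand present the bound reduces to 1, meaning no switching is possible, which is a quick sanity check on the expression.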

  • Cryo-EM structure of a 40 kDa SAM-IV riboswitch RNA at 3.7 Å resolution. Nature communications Zhang, K., Li, S., Kappel, K., Pintilie, G., Su, Z., Mou, T. C., Schmid, M. F., Das, R., Chiu, W. 2019; 10 (1): 5511

    Abstract

    Specimens below 50 kDa have generally been considered too small to be analyzed by single-particle cryo-electron microscopy (cryo-EM). The high flexibility of pure RNAs makes it difficult to obtain high-resolution structures by cryo-EM. In bacteria, riboswitches regulate sulfur metabolism through binding to the S-adenosylmethionine (SAM) ligand and offer compelling targets for new antibiotics. SAM-I, SAM-I/IV, and SAM-IV are the three most commonly found SAM riboswitches, but the structure of SAM-IV is still unknown. Here, we report the structures of apo and SAM-bound SAM-IV riboswitches (119-nt, ~40 kDa) to 3.7 Å and 4.1 Å resolution, respectively, using cryo-EM. The structures illustrate homologies in the ligand-binding core but distinct peripheral tertiary contacts in SAM-IV compared to SAM-I and SAM-I/IV. Our results demonstrate the feasibility of resolving small RNAs with enough detail to enable detection of their ligand-binding pockets and suggest that cryo-EM could play a role in structure-assisted drug design for RNA.

    View details for DOI 10.1038/s41467-019-13494-7

    View details for PubMedID 31796736

  • Scientific Discovery Games for Biomedical Research ANNUAL REVIEW OF BIOMEDICAL DATA SCIENCE, VOL 2, 2019 Das, R., Keep, B., Washington, P., Riedel-Kruse, I. H., Altman, R. B., Levitt, M. 2019; 2: 253–79
  • EternaBrain: Automated RNA design through move sets and strategies from an Internet-scale RNA videogame. PLoS computational biology Koodli, R. V., Keep, B., Coppess, K. R., Portela, F., Das, R. 2019; 15 (6): e1007059

    Abstract

    Emerging RNA-based approaches to disease detection and gene therapy require RNA sequences that fold into specific base-pairing patterns, but computational algorithms generally remain inadequate for these secondary structure design tasks. The Eterna project has crowdsourced RNA design to human video game players in the form of puzzles that reach extraordinary difficulty. Here, we demonstrate that Eterna participants' moves and strategies can be leveraged to improve automated computational RNA design. We present an eternamoves-large repository consisting of 1.8 million of player moves on 12 of the most-played Eterna puzzles as well as an eternamoves-select repository of 30,477 moves from the top 72 players on a select set of more advanced puzzles. On eternamoves-select, we present a multilayer convolutional neural network (CNN) EternaBrain that achieves test accuracies of 51% and 34% in base prediction and location prediction, respectively, suggesting that top players' moves are partially stereotyped. Pipelining this CNN's move predictions with single-action-playout (SAP) of six strategies compiled by human players solves 61 out of 100 independent puzzles in the Eterna100 benchmark. EternaBrain-SAP outperforms previously published RNA design algorithms and achieves similar or better performance than a newer generation of deep learning methods, while being largely orthogonal to these other methods. Our study provides useful lessons for future efforts to achieve human-competitive performance with automated RNA design algorithms.

    View details for DOI 10.1371/journal.pcbi.1007059

    View details for PubMedID 31247029

  • Ribosome-induced RNA conformational changes in a viral 3'-UTR sense and regulate translation levels NATURE COMMUNICATIONS Hartwick, E. W., Costantino, D. A., MacFadden, A., Nix, J. C., Tian, S., Das, R., Kieft, J. S. 2018; 9
  • De novo computational RNA modeling into cryo-EM maps of large ribonucleoprotein complexes. Nature methods Kappel, K., Liu, S., Larsen, K. P., Skiniotis, G., Puglisi, E. V., Puglisi, J. D., Zhou, Z. H., Zhao, R., Das, R. 2018

    Abstract

    Increasingly, cryo-electron microscopy (cryo-EM) is used to determine the structures of RNA-protein assemblies, but nearly all maps determined with this method have biologically important regions where the local resolution does not permit RNA coordinate tracing. To address these omissions, we present de novo ribonucleoprotein modeling in real space through assembly of fragments together with experimental density in Rosetta (DRRAFTER). We show that DRRAFTER recovers near-native models for a diverse benchmark set of RNA-protein complexes including the spliceosome, mitochondrial ribosome, and CRISPR-Cas9-sgRNA complexes; rigorous blind tests include yeast U1 snRNP and spliceosomal P complex maps. Additionally, to aid in model interpretation, we present a method for reliable in situ estimation of DRRAFTER model accuracy. Finally, we apply DRRAFTER to recently determined maps of telomerase, the HIV-1 reverse transcriptase initiation complex, and the packaged MS2 genome, demonstrating the acceleration of accurate model building in challenging cases.

    View details for PubMedID 30377372

  • Sampling Native-like Structures of RNA-Protein Complexes through Rosetta Folding and Docking. Structure (London, England : 1993) Kappel, K., Das, R. 2018

    Abstract

    RNA-protein complexes underlie numerous cellular processes including translation, splicing, and posttranscriptional regulation of gene expression. The structures of these complexes are crucial to their functions but often elude high-resolution structure determination. Computational methods are needed that can integrate low-resolution data for RNA-protein complexes while modeling de novo the large conformational changes of RNA components upon complex formation. To address this challenge, we describe RNP-denovo, a Rosetta method to simultaneously fold-and-dock RNA to a protein surface. On a benchmark set of diverse RNA-protein complexes not solvable with prior strategies, RNP-denovo consistently sampled native-like structures with better than nucleotide resolution. We revisited three past blind modeling challenges involving the spliceosome, telomerase, and a methyltransferase-ribosomal RNA complex in which previous methods gave poor results. When coupled with the same sparse FRET, crosslinking, and functional data used previously, RNP-denovo gave models with significantly improved accuracy. These results open a route to modeling global folds of RNA-protein complexes from low-resolution data.

    View details for PubMedID 30416038

  • High-Throughput Investigation of Diverse Junction Elements in RNA Tertiary Folding. Cell Denny, S. K., Bisaria, N., Yesselman, J. D., Das, R., Herschlag, D., Greenleaf, W. J. 2018

    Abstract

    RNAs fold into defined tertiary structures to function in critical biological processes. While quantitative models can predict RNA secondary structure stability, we are still unable to predict the thermodynamic stability of RNA tertiary structure. Here, we probe conformational preferences of diverse RNA two-way junctions to develop a predictive model for the formation of RNA tertiary structure. We quantitatively measured tertiary assembly energetics of >1,000 of RNA junctions inserted in multiple structural scaffolds to generate a "thermodynamic fingerprint" for each junction. Thermodynamic fingerprints enabled comparison of junction conformational preferences, revealing principles for how sequence influences 3-dimensional conformations. Utilizing fingerprints of junctions with known crystal structures, we generated ensembles for related junctions that predicted their thermodynamic effects on assembly formation. This work reveals sequence-structure-energetic relationships in RNA, demonstrates the capacity for diverse compensation strategies within tertiary structures, and provides a path to quantitative modeling of RNA folding energetics based on "ensemble modularity."

    View details for PubMedID 29961580

  • Recording and Analyzing Nucleic Acid Distance Distributions with X-Ray Scattering Interferometry (XSI). Current protocols in nucleic acid chemistry Zettl, T., Das, R., Harbury, P. A., Herschlag, D., Lipfert, J., Mathew, R. S., Shi, X. 2018; 73 (1): e54

    Abstract

    Most structural techniques provide averaged information or information about a single predominant conformational state. However, biological macromolecules typically function through series of conformations. Therefore, a complete understanding of macromolecular structures requires knowledge of the ensembles that represent probabilities on a conformational free energy landscape. Here we describe an emerging approach, X-ray scattering interferometry (XSI), a method that provides instantaneous distance distributions for molecules in solution. XSI uses gold nanocrystal labels site-specifically attached to a macromolecule and measures the scattering interference from pairs of heavy metal labels. The recorded signal can directly be transformed into a distance distribution between the two probes. We describe the underlying concepts, present a detailed protocol for preparing samples and recording XSI data, and provide a custom-written graphical user interface to facilitate XSI data analysis. © 2018 by John Wiley & Sons, Inc.

    View details for PubMedID 29927110

  • Blind prediction of noncanonical RNA structure at atomic accuracy. Science advances Watkins, A. M., Geniesse, C., Kladwang, W., Zakrevsky, P., Jaeger, L., Das, R. 2018; 4 (5): eaar5316

    Abstract

    Prediction of RNA structure from nucleotide sequence remains an unsolved grand challenge of biochemistry and requires distinct concepts from protein structure prediction. Despite extensive algorithmic development in recent years, modeling of noncanonical base pairs of new RNA structural motifs has not been achieved in blind challenges. We report a stepwise Monte Carlo (SWM) method with a unique add-and-delete move set that enables predictions of noncanonical base pairs of complex RNA structures. A benchmark of 82 diverse motifs establishes the method's general ability to recover noncanonical pairs ab initio, including multistrand motifs that have been refractory to prior approaches. In a blind challenge, SWM models predicted nucleotide-resolution chemical mapping and compensatory mutagenesis experiments for three in vitro selected tetraloop/receptors with previously unsolved structures (C7.2, C7.10, and R1). As a final test, SWM blindly and correctly predicted all noncanonical pairs of a Zika virus double pseudoknot during a recent community-wide RNA-Puzzle. Stepwise structure formation, as encoded in the SWM method, enables modeling of noncanonical RNA structure in a variety of previously intractable problems.

    View details for PubMedID 29806027

  • Hidden Structural Modules in a Cooperative RNA Folding Transition CELL REPORTS Gracia, B., Al-Hashimi, H. M., Bisaria, N., Das, R., Herschlag, D., Russell, R. 2018; 22 (12): 3240–50

    Abstract

    Large-scale, cooperative rearrangements underlie the functions of RNA in RNA-protein machines and gene regulation. To understand how such rearrangements are orchestrated, we used high-throughput chemical footprinting to dissect a seemingly concerted rearrangement in P5abc RNA, a paradigm of RNA folding studies. With mutations that systematically disrupt or restore putative structural elements, we found that this transition reflects local folding of structural modules, with modest and incremental cooperativity that results in concerted behavior. First, two distant secondary structure changes are coupled through a bridging three-way junction and Mg²⁺-dependent tertiary structure. Second, long-range contacts are formed between modules, resulting in additional cooperativity. Given the sparseness of RNA tertiary contacts after secondary structure formation, we expect that modular folding and incremental cooperativity are generally important for specifying functional structures while also providing productive kinetic paths to these structures. Additionally, we expect our approach to be useful for uncovering modularity in other complex RNAs.

    View details for PubMedID 29562180

  • Allosteric mechanism of the V. vulnificus adenine riboswitch resolved by four-dimensional chemical mapping ELIFE Tian, S., Kladwang, W., Das, R. 2018; 7

    Abstract

    The structural interconversions that mediate the gene regulatory functions of RNA molecules may be different from classic models of allostery, but the relevant structural correlations have remained elusive in even intensively studied systems. Here, we present a four-dimensional expansion of chemical mapping called lock-mutate-map-rescue (LM2R), which integrates multiple layers of mutation with nucleotide-resolution chemical mapping. This technique resolves the core mechanism of the adenine-responsive V. vulnificus add riboswitch, a paradigmatic system for which both Monod-Wyman-Changeux (MWC) conformational selection models and non-MWC alternatives have been proposed. To discriminate amongst these models, we locked each functionally important helix through designed mutations and assessed formation or depletion of other helices via compensatory rescue evaluated by chemical mapping. These LM2R measurements give strong support to the pre-existing correlations predicted by MWC models, disfavor alternative models, and suggest additional structural heterogeneities that may be general across ligand-free riboswitches.

    View details for PubMedID 29446752

  • Updates to the RNA mapping database (RMDB), version 2 NUCLEIC ACIDS RESEARCH Yesselman, J. D., Tian, S., Liu, X., Shi, L., Li, J., Das, R. 2018; 46 (D1): D375–D379

    View details for DOI 10.1093/nar/gkx873

    View details for Web of Science ID 000419550700057

  • An Activity Switch in Human Telomerase Based on RNA Conformation and Shaped by TCAB1. Cell Chen, L., Roake, C. M., Freund, A., Batista, P. J., Tian, S., Yin, Y. A., Gajera, C. R., Lin, S., Lee, B., Pech, M. F., Venteicher, A. S., Das, R., Chang, H. Y., Artandi, S. E. 2018

    Abstract

    Ribonucleoprotein enzymes require dynamic conformations of their RNA constituents for regulated catalysis. Human telomerase employs a non-coding RNA (hTR) with a bipartite arrangement of domains: a template-containing core and a distal three-way junction (CR4/5) that stimulates catalysis through unknown means. Here, we show that telomerase activity unexpectedly depends upon the holoenzyme protein TCAB1, which in turn controls conformation of CR4/5. Cells lacking TCAB1 exhibit a marked reduction in telomerase catalysis without affecting enzyme assembly. Instead, TCAB1 inactivation causes unfolding of CR4/5 helices that are required for catalysis and for association with the telomerase reverse-transcriptase (TERT). CR4/5 mutations derived from patients with telomere biology disorders provoke defects in catalysis and TERT binding similar to TCAB1 inactivation. These findings reveal a conformational "activity switch" in human telomerase RNA controlling catalysis and TERT engagement. The identification of two discrete catalytic states for telomerase suggests an intramolecular means for controlling telomerase in cancers and progenitor cells.

    View details for PubMedID 29804836

  • Web-accessible molecular modeling with Rosetta: The Rosetta Online Server that Includes Everyone (ROSIE) PROTEIN SCIENCE Moretti, R., Lyskov, S., Das, R., Meiler, J., Gray, J. J. 2018; 27 (1): 259–68

    Abstract

    The Rosetta molecular modeling software package provides a large number of experimentally validated tools for modeling and designing proteins, nucleic acids, and other biopolymers, with new protocols being added continually. While freely available to academic users, external usage is limited by the need for expertise in the Unix command line environment. To make Rosetta protocols available to a wider audience, we previously created a web server called Rosetta Online Server that Includes Everyone (ROSIE), which provides a common environment for hosting web-accessible Rosetta protocols. Here we describe a simplification of the ROSIE protocol specification format, one that permits easier implementation of Rosetta protocols. Whereas the previous format required creating multiple separate files in different locations, the new format allows specification of the protocol in a single file. This new, simplified protocol specification has more than doubled the number of Rosetta protocols available under ROSIE. These new applications include pKa determination, lipid accessibility calculation, ribonucleic acid redesign, protein-protein docking, protein-small molecule docking, symmetric docking, antibody docking, cyclic toxin docking, critical binding peptide determination, and mapping small molecule binding sites. ROSIE is freely available to academic users at http://rosie.rosettacommons.org.

    View details for PubMedID 28960691

    View details for PubMedCentralID PMC5734271

  • RNA structure inference through chemical mapping after accidental or intentional mutations. Proceedings of the National Academy of Sciences of the United States of America Cheng, C. Y., Kladwang, W., Yesselman, J. D., Das, R. 2017; 114 (37): 9876-9881

    Abstract

    Despite the critical roles RNA structures play in regulating gene expression, sequencing-based methods for experimentally determining RNA base pairs have remained inaccurate. Here, we describe a multidimensional chemical-mapping method called "mutate-and-map read out through next-generation sequencing" (M2-seq) that takes advantage of sparsely mutated nucleotides to induce structural perturbations at partner nucleotides and then detects these events through dimethyl sulfate (DMS) probing and mutational profiling. In special cases, fortuitous errors introduced during DNA template preparation and RNA transcription are sufficient to give M2-seq helix signatures; these signals were previously overlooked or mistaken for correlated double-DMS events. When mutations are enhanced through error-prone PCR, in vitro M2-seq experimentally resolves 33 of 68 helices in diverse structured RNAs including ribozyme domains, riboswitch aptamers, and viral RNA domains with a single false positive. These inferences do not require energy minimization algorithms and can be made by either direct visual inspection or by a neural-network-inspired algorithm called M2-net. Measurements on the P4-P6 domain of the Tetrahymena group I ribozyme embedded in Xenopus egg extract demonstrate the ability of M2-seq to detect RNA helices in a complex biological environment.

    View details for DOI 10.1073/pnas.1619897114

    View details for PubMedID 28851837

    View details for PubMedCentralID PMC5603990

  • The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. Journal of chemical theory and computation Alford, R. F., Leaver-Fay, A., Jeliazkov, J. R., O'Meara, M. J., DiMaio, F. P., Park, H., Shapovalov, M. V., Renfrew, P. D., Mulligan, V. K., Kappel, K., Labonte, J. W., Pacella, M. S., Bonneau, R., Bradley, P., Dunbrack, R. L., Das, R., Baker, D., Kuhlman, B., Kortemme, T., Gray, J. J. 2017

    Abstract

    Over the past decade, the Rosetta biomolecular modeling suite has informed diverse biological questions and engineering challenges ranging from interpretation of low-resolution structural data to design of nanomaterials, protein therapeutics, and vaccines. Central to Rosetta's success is the energy function: a model parametrized from small-molecule and X-ray crystal structure data used to approximate the energy associated with each biomolecule conformation. This paper describes the mathematical models and physical concepts that underlie the latest Rosetta energy function, called the Rosetta Energy Function 2015 (REF15). Applying these concepts, we explain how to use Rosetta energies to identify and analyze the features of biomolecular models. Finally, we discuss the latest advances in the energy function that extend its capabilities from soluble proteins to also include membrane proteins, peptides containing noncanonical amino acids, small molecules, carbohydrates, nucleic acids, and other macromolecules.

    View details for DOI 10.1021/acs.jctc.7b00125

    View details for PubMedID 28430426

  • Primerize-2D: automated primer design for RNA multidimensional chemical mapping. Bioinformatics (Oxford, England) Tian, S., Das, R. 2017; 33 (9): 1405-1406

    Abstract

    Rapid RNA synthesis of comprehensive single mutant libraries and targeted multiple mutant libraries is enabling new multidimensional chemical approaches to solve RNA structures. PCR assembly of DNA templates and in vitro transcription allow synthesis and purification of hundreds of RNA mutants in a cost-effective manner, with sharing of primers across constructs allowing significant reductions in expense. However, these protocols require organization of primer locations across numerous 96-well plates and guidance for pipetting, non-trivial tasks for which informatics and visualization tools can prevent costly errors. We report here an online tool to accelerate synthesis of large libraries of desired mutants through design and efficient organization of primers. The underlying program and graphical interface have been experimentally tested in our laboratory for RNA domains with lengths up to 300 nucleotides and libraries encompassing up to 960 variants. In addition to the freely available Primerize-2D server, the primer design code is available as a stand-alone Python package for broader applications. Availability: http://primerize2d.stanford.edu. Contact: rhiju@stanford.edu. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btw814

    View details for PubMedID 28453672

    View details for PubMedCentralID PMC5859995

  • Single-molecule FRET-Rosetta reveals RNA structural rearrangements during human telomerase catalysis RNA Parks, J. W., Kappel, K., Das, R., Stone, M. D. 2017; 23 (2): 175-188

    Abstract

    Maintenance of telomeres by telomerase permits continuous proliferation of rapidly dividing cells, including the majority of human cancers. Despite its direct biomedical significance, the architecture of the human telomerase complex remains unknown. Generating homogeneous telomerase samples has presented a significant barrier to developing improved structural models. Here we pair single-molecule Förster resonance energy transfer (smFRET) measurements with Rosetta modeling to map the conformations of the essential telomerase RNA core domain within the active ribonucleoprotein. FRET-guided modeling places the essential pseudoknot fold distal to the active site on a protein surface comprising the C-terminal element, a domain that shares structural homology with canonical polymerase thumb domains. An independently solved medium-resolution structure of Tetrahymena telomerase provides a blind test of our modeling methodology and sheds light on the structural homology of this domain across diverse organisms. Our smFRET-Rosetta models reveal nanometer-scale rearrangements within the RNA core domain during catalysis. Taken together, our FRET data and pseudoatomic molecular models permit us to propose a possible mechanism for how RNA core domain rearrangement is coupled to template hybrid elongation.

    View details for DOI 10.1261/rna.058743.116

    View details for Web of Science ID 000392883800007

    View details for PubMedID 28096444

    View details for PubMedCentralID PMC5238793

  • RNA-Puzzles Round III: 3D RNA structure prediction of five riboswitches and one ribozyme. RNA (New York, N.Y.) Miao, Z., Adamiak, R. W., Antczak, M., Batey, R. T., Becka, A. J., Biesiada, M., Boniecki, M. J., Bujnicki, J., Chen, S., Cheng, C. Y., Chou, F., Ferré-D'Amaré, A. R., Das, R., Dawson, W. K., Feng, D., Dokholyan, N. V., Dunin-Horkawicz, S., Geniesse, C., Kappel, K., Kladwang, W., Krokhotin, A., Lach, G. E., Major, F., Mann, T. H., Magnus, M., Pachulska-Wieczorek, K., Patel, D. J., Piccirilli, J. A., Popenda, M., Purzycka, K. J., Ren, A., Rice, G. M., SantaLucia, J., Sarzynska, J., Szachniuk, M., Tandon, A., Trausch, J. J., Tian, S., Wang, J., Weeks, K. M., Williams, B., Xiao, Y., Xu, X., Zhang, D., Zok, T., Westhof, E. 2017

    Abstract

    RNA-Puzzles is a collective experiment in blind 3D RNA structure prediction. We report here a third round of RNA-Puzzles. Five puzzles (4, 8, 12, 13, and 14), all structures of riboswitch aptamers, and puzzle 7, a ribozyme structure, are included in this round of the experiment. The riboswitch structures include biological binding sites for small molecules (S-adenosyl methionine, cyclic diadenosine monophosphate, 5-amino 4-imidazole carboxamide riboside 5'-triphosphate, glutamine) and proteins (YbxF), and one set describes large conformational changes between ligand-free and ligand-bound states. The Varkud satellite ribozyme is the most recently solved structure of a known large ribozyme. All puzzles have established biological functions and require structural understanding to appreciate their molecular mechanisms. Through the use of fast-track experimental data, including multidimensional chemical mapping, and accurate prediction of RNA secondary structure, a large portion of the contacts in 3D have been predicted correctly, leading to similar topologies for the top-ranking predictions. Template-based and homology-derived predictions could predict structures to particularly high accuracies. However, achieving biological insights from de novo prediction of RNA 3D structures still depends on the size and complexity of the RNA. Blind computational predictions of RNA structures already appear to provide useful structural information in many cases. Similar to the previous RNA-Puzzles Round II experiment, the prediction of non-Watson-Crick interactions and the observed high atomic clash scores reveal a notable need for algorithmic improvement. All prediction models and assessment results are available at http://ahsoka.u-strasbg.fr/rnapuzzles/.

    View details for DOI 10.1261/rna.060368.116

    View details for PubMedID 28138060

    View details for PubMedCentralID PMC5393176

  • Functional 5' UTR mRNA structures in eukaryotic translation regulation and how to find them. Nature reviews. Molecular cell biology Leppek, K., Das, R., Barna, M. 2017

    Abstract

    RNA molecules can fold into intricate shapes that can provide an additional layer of control of gene expression beyond that of their sequence. In this Review, we discuss the current mechanistic understanding of structures in 5' untranslated regions (UTRs) of eukaryotic mRNAs and the emerging methodologies used to explore them. These structures may regulate cap-dependent translation initiation through helicase-mediated remodelling of RNA structures and higher-order RNA interactions, as well as cap-independent translation initiation through internal ribosome entry sites (IRESs), mRNA modifications and other specialized translation pathways. We discuss known 5' UTR RNA structures and how new structure probing technologies coupled with prospective validation, particularly compensatory mutagenesis, are likely to identify classes of structured RNA elements that shape post-transcriptional control of gene expression and the development of multicellular organisms.

    View details for PubMedID 29165424

  • Controllable molecular motors engineered from myosin and RNA. Nature nanotechnology Omabegho, T., Gurel, P. S., Cheng, C. Y., Kim, L. Y., Ruijgrok, P. V., Das, R., Alushin, G. M., Bryant, Z. 2017

    Abstract

    Engineering biomolecular motors can provide direct tests of structure-function relationships and customized components for controlling molecular transport in artificial systems or in living cells. Previously, synthetic nucleic acid motors and modified natural protein motors have been developed in separate complementary strategies to achieve tunable and controllable motor function. Integrating protein and nucleic-acid components to form engineered nucleoprotein motors may enable additional sophisticated functionalities. However, this potential has only begun to be explored in pioneering work harnessing DNA scaffolds to dictate the spacing, number and composition of tethered protein motors. Here, we describe myosin motors that incorporate RNA lever arms, forming hybrid assemblies in which conformational changes in the protein motor domain are amplified and redirected by nucleic acid structures. The RNA lever arm geometry determines the speed and direction of motor transport and can be dynamically controlled using programmed transitions in the lever arm structure. We have characterized the hybrid motors using in vitro motility assays, single-molecule tracking, cryo-electron microscopy and structural probing. Our designs include nucleoprotein motors that reversibly change direction in response to oligonucleotides that drive strand-displacement reactions. In multimeric assemblies, the controllable motors walk processively along actin filaments at speeds of 10-20 nm s⁻¹. Finally, to illustrate the potential for multiplexed addressable control, we demonstrate sequence-specific responses of RNA variants to oligonucleotide signals.

    View details for PubMedID 29109539

  • Blind tests of RNA nearest-neighbor energy prediction PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Chou, F., Kladwang, W., Kappel, K., Das, R. 2016; 113 (30): 8430-8435

    Abstract

    The predictive modeling and design of biologically active RNA molecules requires understanding the energetic balance among their basic components. Rapid developments in computer simulation promise increasingly accurate recovery of RNA's nearest-neighbor (NN) free-energy parameters, but these methods have not been tested in predictive trials or on nonstandard nucleotides. Here, we present, to our knowledge, the first such tests through a RECCES-Rosetta (reweighting of energy-function collection with conformational ensemble sampling in Rosetta) framework that rigorously models conformational entropy, predicts previously unmeasured NN parameters, and estimates these values' systematic uncertainties. RECCES-Rosetta recovers the 10 NN parameters for Watson-Crick stacked base pairs and 32 single-nucleotide dangling-end parameters with unprecedented accuracies: rmsd of 0.28 kcal/mol and 0.41 kcal/mol, respectively. For set-aside test sets, RECCES-Rosetta gives rmsd values of 0.32 kcal/mol on eight stacked pairs involving G-U wobble pairs and 0.99 kcal/mol on seven stacked pairs involving nonstandard isocytidine-isoguanosine pairs. To more rigorously assess RECCES-Rosetta, we carried out four blind predictions for stacked pairs involving 2,6-diaminopurine-U pairs, which achieved 0.64 kcal/mol rmsd accuracy when tested by subsequent experiments. Overall, these results establish that computational methods can now blindly predict energetics of basic RNA motifs, including chemically modified variants, with consistently better than 1 kcal/mol accuracy. Systematic tests indicate that resolving the remaining discrepancies will require energy function improvements beyond simply reweighting component terms, and we propose further blind trials to test such efforts.

    View details for DOI 10.1073/pnas.1523335113

    View details for Web of Science ID 000380346200043

    View details for PubMedID 27402765

    View details for PubMedCentralID PMC4968729

  • RNA structure through multidimensional chemical mapping QUARTERLY REVIEWS OF BIOPHYSICS Tian, S., Das, R. 2016; 49

    Abstract

    The discoveries of myriad non-coding RNA molecules, each transiting through multiple flexible states in cells or virions, present major challenges for structure determination. Advances in high-throughput chemical mapping give new routes for characterizing entire transcriptomes in vivo, but the resulting one-dimensional data generally remain too information-poor to allow accurate de novo structure determination. Multidimensional chemical mapping (MCM) methods seek to address this challenge. Mutate-and-map (M2), RNA interaction groups by mutational profiling (RING-MaP and MaP-2D analysis) and multiplexed •OH cleavage analysis (MOHCA) measure how the chemical reactivities of every nucleotide in an RNA molecule change in response to modifications at every other nucleotide. A growing body of in vitro blind tests and compensatory mutation/rescue experiments indicate that MCM methods give consistently accurate secondary structures and global tertiary structures for ribozymes, ribosomal domains and ligand-bound riboswitch aptamers up to 200 nucleotides in length. Importantly, MCM analyses provide detailed information on structurally heterogeneous RNA states, such as ligand-free riboswitches that are functionally important but difficult to resolve with other approaches. The sequencing requirements of currently available MCM protocols scale at least quadratically with RNA length, precluding general application to transcriptomes or viral genomes at present. We propose a modify-cross-link-map (MXM) expansion to overcome this and other current limitations to resolving the in vivo 'RNA structurome'.

    View details for DOI 10.1017/S0033583516000020

    View details for Web of Science ID 000375229800001

    View details for PubMedID 27266715

  • Principles for Predicting RNA Secondary Structure Design Difficulty. Journal of molecular biology Anderson-Lee, J., Fisker, E., Kosaraju, V., Wu, M., Kong, J., Lee, J., Lee, M., Zada, M., Treuille, A., Das, R. 2016; 428 (5): 748-757

    Abstract

    Designing RNAs that form specific secondary structures is enabling better understanding and control of living systems through RNA-guided silencing, genome editing and protein organization. Little is known, however, about which RNA secondary structures might be tractable for downstream sequence design, increasing the time and expense of design efforts due to inefficient secondary structure choices. Here, we present insights into specific structural features that increase the difficulty of finding sequences that fold into a target RNA secondary structure, summarizing the design efforts of tens of thousands of human participants and three automated algorithms (RNAInverse, INFO-RNA and RNA-SSD) in the Eterna massive open laboratory. Subsequent tests through three independent RNA design algorithms (NUPACK, DSS-Opt and MODENA) confirmed the hypothesized importance of several features in determining design difficulty, including sequence length, mean stem length, symmetry and specific difficult-to-design motifs such as zigzags. Based on these results, we have compiled an Eterna100 benchmark of 100 secondary structure design challenges that span a large range in design difficulty to help test future efforts. Our in silico results suggest new routes for improving computational RNA design methods and for extending these insights to assess "designability" of single RNA structures, as well as of switches for in vitro and in vivo applications.

    View details for DOI 10.1016/j.jmb.2015.11.013

    View details for PubMedID 26902426

    View details for PubMedCentralID PMC4833017

  • RNA Structure Refinement Using the ERRASER-Phenix Pipeline. Methods in molecular biology (Clifton, N.J.) Chou, F., Echols, N., Terwilliger, T. C., Das, R. 2016; 1320: 269-282

    Abstract

    The final step of RNA crystallography involves the fitting of coordinates into electron density maps. The large number of backbone atoms in RNA presents a difficult and tedious challenge, particularly when experimental density is poor. The ERRASER-Phenix pipeline can improve an initial set of RNA coordinates automatically based on a physically realistic model of atomic-level RNA interactions. The pipeline couples diffraction-based refinement in Phenix with the Rosetta-based real-space refinement protocol ERRASER (Enumerative Real-Space Refinement ASsisted by Electron density under Rosetta). The combination of ERRASER and Phenix can improve the geometrical quality of RNA crystallographic models while maintaining or improving the fit to the diffraction data (as measured by R free). Here we present a complete tutorial for running ERRASER-Phenix through the Phenix GUI, from the command-line, and via an application in the Rosetta On-line Server that Includes Everyone (ROSIE).

    View details for DOI 10.1007/978-1-4939-2763-0_17

    View details for PubMedID 26227049

  • Modeling Small Noncanonical RNA Motifs with the Rosetta FARFAR Server. Methods in molecular biology (Clifton, N.J.) Yesselman, J. D., Das, R. 2016; 1490: 187-198

    Abstract

    Noncanonical RNA motifs help define the vast complexity of RNA structure and function, and in many cases, these loops and junctions are on the order of only ten nucleotides in size. Unfortunately, despite their small size, there is no reliable method to determine the ensemble of lowest energy structures of junctions and loops at atomic accuracy. This chapter outlines straightforward protocols using a webserver for Rosetta Fragment Assembly of RNA with Full Atom Refinement (FARFAR) (http://rosie.rosettacommons.org/rna_denovo/submit) to model the 3D structure of small noncanonical RNA motifs for use in visualizing motifs and for further refinement or filtering with experimental data such as NMR chemical shifts.

    View details for DOI 10.1007/978-1-4939-6433-8_12

    View details for PubMedID 27665600

  • Rich RNA Structure Landscapes Revealed by Mutate-and-Map Analysis PLOS COMPUTATIONAL BIOLOGY Cordero, P., Das, R. 2015; 11 (11)

    Abstract

    Landscapes exhibiting multiple secondary structures arise in natural RNA molecules that modulate gene expression, protein synthesis, and viral infection. We report herein that high-throughput chemical experiments can isolate an RNA's multiple alternative secondary structures as they are stabilized by systematic mutagenesis (mutate-and-map, M2) and that a computational algorithm, REEFFIT, enables unbiased reconstruction of these states' structures and populations. In an in silico benchmark on non-coding RNAs with complex landscapes, M2-REEFFIT recovers 95% of RNA helices present with at least 25% population while maintaining a low false discovery rate (10%) and conservative error estimates. In experimental benchmarks, M2-REEFFIT recovers the structure landscapes of a 35-nt MedLoop hairpin, a 110-nt 16S rRNA four-way junction with an excited state, a 25-nt bistable hairpin, and a 112-nt three-state adenine riboswitch with its expression platform, molecules whose characterization previously required expert mutational analysis and specialized NMR or chemical mapping experiments. With this validation, M2-REEFFIT enabled tests of whether artificial RNA sequences might exhibit complex landscapes in the absence of explicit design. An artificial flavin mononucleotide riboswitch and a randomly generated RNA sequence are found to interconvert between three or more states, including structures for which there was no design, but that could be stabilized through mutations. These results highlight the likely pervasiveness of rich landscapes with multiple secondary structures in both natural and artificial RNAs and demonstrate an automated chemical/computational route for their empirical characterization.

    View details for DOI 10.1371/journal.pcbi.1004473

    View details for PubMedID 26566145

    View details for PubMedCentralID PMC4643908

  • Automated band annotation for RNA structure probing experiments with numerous capillary electrophoresis profiles. Bioinformatics Lee, S., Kim, H., Tian, S., Lee, T., Yoon, S., Das, R. 2015; 31 (17): 2808-2815

    Abstract

    Capillary electrophoresis (CE) is a powerful approach for structural analysis of nucleic acids, with recent high-throughput variants enabling three-dimensional RNA modeling and the discovery of new rules for RNA structure design. Among the steps composing CE analysis, the process of finding each band in an electrophoretic trace and mapping it to a position in the nucleic acid sequence has required significant manual inspection and remains the most time-consuming and error-prone step. The few available tools seeking to automate this band annotation have achieved limited accuracy and have not taken advantage of information across dozens of profiles routinely acquired in high-throughput measurements. We present a dynamic-programming-based approach to automate band annotation for high-throughput capillary electrophoresis. The approach is uniquely able to define and optimize a robust target function that takes into account multiple CE profiles (sequencing ladders, different chemical probes, different mutants) collected for the RNA. Over a large benchmark of multi-profile datasets for biological RNAs and designed RNAs from the EteRNA project, the method outperforms prior tools (QuSHAPE and FAST) significantly in terms of accuracy compared with gold-standard manual annotations. The amount of computation required is reasonable at a few seconds per dataset. We also introduce an 'E-score' metric to automatically assess the reliability of the band annotation and show it to be practically useful in flagging uncertainties in band annotation for further inspection. The implementation of the proposed algorithm is included in the HiTRACE software, freely available as an online server and for download at http://hitrace.stanford.edu. Contact: sryoon@snu.ac.kr or rhiju@stanford.edu. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btv282

    View details for PubMedID 25943472

  • RNA-Redesign: a web server for fixed-backbone 3D design of RNA. Nucleic acids research Yesselman, J. D., Das, R. 2015; 43 (W1): W498-501

    Abstract

    RNA is rising in importance as a design medium for interrogating fundamental biology and for developing therapeutic and bioengineering applications. While there are several online servers for design of RNA secondary structure, there are no tools available for the rational design of 3D RNA structure. Here we present RNA-Redesign (http://rnaredesign.stanford.edu), an online 3D design tool for RNA. This resource utilizes fixed-backbone design to optimize the sequence identity and nucleobase conformations of an RNA to match a desired backbone, analogous to fundamental tools that underlie rational protein engineering. The resulting sequences suggest thermostabilizing mutations that can be experimentally verified. Further, sequence preferences that differ between natural and computationally designed sequences can suggest whether natural sequences possess functional constraints besides folding stability, such as cofactor binding or conformational switching. Finally, for biochemical studies, the designed sequences can suggest experimental tests of 3D models, including concomitant mutation of base triples. In addition to the designs generated, detailed graphical analysis is presented through an integrated and user-friendly environment.

    View details for DOI 10.1093/nar/gkv465

    View details for PubMedID 25964298

    View details for PubMedCentralID PMC4489241

  • Primerize: automated primer assembly for transcribing non-coding RNA domains. Nucleic acids research Tian, S., Yesselman, J. D., Cordero, P., Das, R. 2015; 43 (W1): W522-6

    Abstract

    Customized RNA synthesis is in demand for biological and biotechnological research. While chemical synthesis and gel or chromatographic purification of RNA is costly and difficult for sequences longer than tens of nucleotides, a pipeline of primer assembly of DNA templates, in vitro transcription by T7 RNA polymerase and kit-based purification provides a cost-effective and fast alternative for preparing RNA molecules. Nevertheless, designing template primers that optimize cost and avoid mispriming during polymerase chain reaction currently requires expert inspection, downloading specialized software or both. Online servers are currently not available or maintained for the task. We report here a server named Primerize that makes available an efficient algorithm for primer design developed and experimentally tested in our laboratory for RNA domains with lengths up to 300 nucleotides. Free access: http://primerize.stanford.edu.

    View details for DOI 10.1093/nar/gkv538

    View details for PubMedID 25999345

    View details for PubMedCentralID PMC4489279

  • Consistent global structures of complex RNA states through multidimensional chemical mapping ELIFE Cheng, C. Y., Chou, F., Kladwang, W., Tian, S., Cordero, P., Das, R. 2015; 4

    Abstract

    Accelerating discoveries of non-coding RNA (ncRNA) in myriad biological processes pose major challenges to structural and functional analysis. Despite progress in secondary structure modeling, high-throughput methods have generally failed to determine ncRNA tertiary structures, even at the 1-nm resolution that enables visualization of how helices and functional motifs are positioned in three dimensions. We report that integrating a new method called MOHCA-seq (Multiplexed •OH Cleavage Analysis with paired-end sequencing) with mutate-and-map secondary structure inference guides Rosetta 3D modeling to consistent 1-nm accuracy for intricately folded ncRNAs with lengths up to 188 nucleotides, including a blind RNA-puzzle challenge, the lariat-capping ribozyme. This multidimensional chemical mapping (MCM) pipeline resolves unexpected tertiary proximities for cyclic-di-GMP, glycine, and adenosylcobalamin riboswitch aptamers without their ligands and a loose structure for the recently discovered human HoxA9D internal ribosome entry site regulon. MCM offers a sequencing-based route to uncovering ncRNA 3D structure, applicable to functionally important but potentially heterogeneous states.

    View details for DOI 10.7554/eLife.07600

    View details for Web of Science ID 000373439800001

    View details for PubMedCentralID PMC4495719

  • RNA-Puzzles Round II: assessment of RNA structure prediction programs applied to three large RNA structures RNA Miao, Z., Adamiak, R. W., Blanchet, M., Boniecki, M., Bujnicki, J. M., Chen, S., Cheng, C., Chojnowski, G., Chou, F., Cordero, P., Cruz, J. A., Ferre-D'Amare, A. R., Das, R., Ding, F., Dokholyan, N. V., Dunin-Horkawicz, S., Kladwang, W., Krokhotin, A., Lach, G., Magnus, M., Major, F., Mann, T. H., Masquida, B., Matelska, D., Meyer, M., Peselis, A., Popenda, M., Purzycka, K. J., Serganov, A., Stasiewicz, J., Szachniuk, M., Tandon, A., Tian, S., Wang, J., Xia, Y., Xu, X., Zhang, J., Zha, P., Zok, T., Westhof, E. 2015; 21 (6): 1066-1084

    Abstract

    This paper is a report of a second round of RNA-Puzzles, a collective and blind experiment in three-dimensional (3D) RNA structure prediction. Three puzzles, Puzzles 5, 6, and 10, represented sequences of three large RNA structures with limited or no homology with previously solved RNA molecules. A lariat-capping ribozyme, as well as riboswitches complexed to adenosylcobalamin and tRNA, were predicted by seven groups using RNAComposer, ModeRNA/SimRNA, Vfold, Rosetta, DMD, MC-Fold, 3dRNA, and AMBER refinement. Some groups derived models using data from state-of-the-art chemical-mapping methods (SHAPE, DMS, CMCT, and mutate-and-map). The comparisons between the predictions and the three subsequently released crystallographic structures, solved at diffraction resolutions of 2.5-3.2 Å, were carried out automatically using various sets of quality indicators. The comparisons clearly demonstrate the state of present-day de novo prediction abilities as well as the limitations of these state-of-the-art methods. All of the best prediction models have similar topologies to the native structures, which suggests that computational methods for RNA structure prediction can already provide useful structural information for biological problems. However, the prediction accuracy for non-Watson-Crick interactions, key to proper folding of RNAs, is low and some predicted models had high Clash Scores. These two difficulties point to some of the continuing bottlenecks in RNA structure prediction. All submitted models are available for download at http://ahsoka.u-strasbg.fr/rnapuzzles/.

    View details for DOI 10.1261/rna.049502.114

    View details for Web of Science ID 000356316200002

    View details for PubMedID 25883046

    View details for PubMedCentralID PMC4436661

  • Modeling complex RNA tertiary folds with Rosetta. Methods in enzymology Cheng, C. Y., Chou, F., Das, R. 2015; 553: 35-64

    Abstract

    Reliable modeling of RNA tertiary structures is key to both understanding these structures' roles in complex biological machines and to eventually facilitating their design for molecular computing and robotics. In recent years, a concerted effort to improve computational prediction of RNA structure through the RNA-Puzzles blind prediction trials has accelerated advances in the field. Among other approaches, the versatile and expanding Rosetta molecular modeling software now permits modeling of RNAs in the 100-300 nucleotide size range at consistent subhelical (~1 nm) resolution. Our laboratory's current state-of-the-art methods for RNAs in this size range involve Fragment Assembly of RNA with Full-Atom Refinement (FARFAR), which optimizes RNA conformations in the context of a physically realistic energy function, as well as hybrid techniques that leverage experimental data to inform computational modeling. In this chapter, we give a practical guide to our current workflow for modeling RNA three-dimensional structures using FARFAR, including strategies for using data from multidimensional chemical mapping experiments to focus sampling and select accurate conformations.

    View details for DOI 10.1016/bs.mie.2014.10.051

    View details for PubMedID 25726460

  • Consistent global structures of complex RNA states through multidimensional chemical mapping. eLife Cheng, C. Y., Chou, F., Kladwang, W., Tian, S., Cordero, P., Das, R. 2015; 4

    View details for DOI 10.7554/eLife.07600

    View details for PubMedID 26035425

    View details for PubMedCentralID PMC4495719

  • RNA regulons in Hox 5' UTRs confer ribosome specificity to gene regulation. Nature Xue, S., Tian, S., Fujii, K., Kladwang, W., Das, R., Barna, M. 2015; 517 (7532): 33-38

    Abstract

    Emerging evidence suggests that the ribosome has a regulatory function in directing how the genome is translated in time and space. However, how this regulation is encoded in the messenger RNA sequence remains largely unknown. Here we uncover unique RNA regulons embedded in homeobox (Hox) 5' untranslated regions (UTRs) that confer ribosome-mediated control of gene expression. These structured RNA elements, resembling viral internal ribosome entry sites (IRESs), are found in subsets of Hox mRNAs. They facilitate ribosome recruitment and require the ribosomal protein RPL38 for their activity. Despite numerous layers of Hox gene regulation, these IRES elements are essential for converting Hox transcripts into proteins to pattern the mammalian body plan. This specialized mode of IRES-dependent translation is enabled by an additional regulatory element that we term the translation inhibitory element (TIE), which blocks cap-dependent translation of transcripts. Together, these data uncover a new paradigm for ribosome-mediated control of gene expression and organismal development.

    View details for DOI 10.1038/nature14010

    View details for PubMedID 25409156

  • High-throughput mutate-map-rescue evaluates SHAPE-directed RNA structure and uncovers excited states RNA-A PUBLICATION OF THE RNA SOCIETY Tian, S., Cordero, P., Kladwang, W., Das, R. 2014; 20 (11): 1815-1826

    Abstract

    The three-dimensional conformations of noncoding RNAs underpin their biochemical functions but have largely eluded experimental characterization. Here, we report that integrating a classic mutation/rescue strategy with high-throughput chemical mapping enables rapid RNA structure inference with unusually strong validation. We revisit a 16S rRNA domain for which SHAPE (selective 2'-hydroxyl acylation with primer extension) and limited mutational analysis suggested a conformational change between apo- and holo-ribosome conformations. Computational support estimates, data from alternative chemical probes, and mutate-and-map (M²) experiments highlight issues of prior methodology and instead give a near-crystallographic secondary structure. Systematic interrogation of single base pairs via a high-throughput mutation/rescue approach then permits incisive validation and refinement of the M²-based secondary structure. The data further uncover the functional conformation as an excited state (20 ± 10% population) accessible via a single-nucleotide register shift. These results correct an erroneous SHAPE inference of a ribosomal conformational change, expose critical limitations of conventional structure mapping methods, and illustrate practical steps for more incisively dissecting RNA dynamic structure landscapes.

    View details for DOI 10.1261/rna.044321.114

    View details for Web of Science ID 000344065900015

  • Scientific rigor through videogames. Trends in biochemical sciences Treuille, A., Das, R. 2014; 39 (11): 507-509

    Abstract

    Hypothesis-driven experimentation - the scientific method - can be subverted by fraud, irreproducibility, and lack of rigorous predictive tests. A robust solution to these problems may be the 'massive open laboratory' model, recently embodied in the internet-scale videogame EteRNA. Deploying similar platforms throughout biology could enforce the scientific method more broadly.

    View details for DOI 10.1016/j.tibs.2014.08.005

    View details for PubMedID 25300714

  • Double-stranded RNA under force and torque: similarities to and striking differences from double-stranded DNA. Proceedings of the National Academy of Sciences of the United States of America Lipfert, J., Skinner, G. M., Keegstra, J. M., Hensgens, T., Jager, T., Dulin, D., Köber, M., Yu, Z., Donkers, S. P., Chou, F., Das, R., Dekker, N. H. 2014; 111 (43): 15408-15413

    Abstract

    RNA plays myriad roles in the transmission and regulation of genetic information that are fundamentally constrained by its mechanical properties, including the elasticity and conformational transitions of the double-stranded (dsRNA) form. Although double-stranded DNA (dsDNA) mechanics have been dissected with exquisite precision, much less is known about dsRNA. Here we present a comprehensive characterization of dsRNA under external forces and torques using magnetic tweezers. We find that dsRNA has a force-torque phase diagram similar to that of dsDNA, including plectoneme formation, melting of the double helix induced by torque, a highly overwound state termed "P-RNA," and a highly underwound, left-handed state denoted "L-RNA." Beyond these similarities, our experiments reveal two unexpected behaviors of dsRNA: Unlike dsDNA, dsRNA shortens upon overwinding, and its characteristic transition rate at the plectonemic buckling transition is two orders of magnitude slower than for dsDNA. Our results challenge current models of nucleic acid mechanics, provide a baseline for modeling RNAs in biological contexts, and pave the way for new classes of magnetic tweezers experiments to dissect the role of twist and torque for RNA-protein interactions at the single-molecule level.

    View details for DOI 10.1073/pnas.1407197111

    View details for PubMedID 25313077

    View details for PubMedCentralID PMC4217419

  • Blind predictions of DNA and RNA tweezers experiments with force and torque. PLoS computational biology Chou, F., Lipfert, J., Das, R. 2014; 10 (8)

    Abstract

    Single-molecule tweezers measurements of double-stranded nucleic acids (dsDNA and dsRNA) provide unprecedented opportunities to dissect how these fundamental molecules respond to forces and torques analogous to those applied by topoisomerases, viral capsids, and other biological partners. However, tweezers data are still most commonly interpreted post facto in the framework of simple analytical models. Testing falsifiable predictions of state-of-the-art nucleic acid models would be more illuminating but has not been performed. Here we describe a blind challenge in which numerical predictions of nucleic acid mechanical properties were compared to experimental data obtained recently for dsRNA under applied force and torque. The predictions were enabled by the HelixMC package, first presented in this paper. HelixMC advances crystallography-derived base-pair level models (BPLMs) to simulate kilobase-length dsDNAs and dsRNAs under external forces and torques, including their global linking numbers. These calculations recovered the experimental bending persistence length of dsRNA within the error of the simulations and accurately predicted that dsRNA's "spring-like" conformation would give a two-fold decrease of stretch modulus relative to dsDNA. Further blind predictions of helix torsional properties, however, exposed inaccuracies in current BPLM theory, including three-fold discrepancies in torsional persistence length at the high force limit and the incorrect sign of dsRNA link-extension (twist-stretch) coupling. Beyond these experiments, HelixMC predicted that 'nucleosome-excluding' poly(A)/poly(T) is at least two-fold stiffer than random-sequence dsDNA in bending, stretching, and torsional behaviors; Z-DNA to be at least three-fold stiffer than random-sequence dsDNA, with a near-zero link-extension coupling; and non-negligible effects from base pair step correlations. We propose that experimentally testing these predictions should be powerful next steps for understanding the flexibility of dsDNA and dsRNA in sequence contexts and under mechanical stresses relevant to their biology.

    View details for DOI 10.1371/journal.pcbi.1003756

    View details for PubMedID 25102226

    View details for PubMedCentralID PMC4125081

  • Understanding nucleic acid-ion interactions. Annual review of biochemistry Lipfert, J., Doniach, S., Das, R., Herschlag, D. 2014; 83: 813-841

    Abstract

    Ions surround nucleic acids in what is referred to as an ion atmosphere. As a result, the folding and dynamics of RNA and DNA and their complexes with proteins and with each other cannot be understood without a reasonably sophisticated appreciation of these ions' electrostatic interactions. However, the underlying behavior of the ion atmosphere follows physical rules that are distinct from the rules of site binding that biochemists are most familiar and comfortable with. The main goal of this review is to familiarize nucleic acid experimentalists with the physical concepts that underlie nucleic acid-ion interactions. Throughout, we provide practical strategies for interpreting and analyzing nucleic acid experiments that avoid pitfalls from oversimplified or incorrect models. We briefly review the status of theories that predict or simulate nucleic acid-ion interactions and experiments that test these theories. Finally, we describe opportunities for going beyond phenomenological fits to a next-generation, truly predictive understanding of nucleic acid-ion interactions.

    View details for DOI 10.1146/annurev-biochem-060409-092720

    View details for PubMedID 24606136

  • Standardization of RNA chemical mapping experiments. Biochemistry Kladwang, W., Mann, T. H., Becka, A., Tian, S., Kim, H., Yoon, S., Das, R. 2014; 53 (19): 3063-3065

    Abstract

    Chemical mapping experiments offer powerful information about RNA structure but currently involve ad hoc assumptions in data processing. We show that simple dilutions, referencing standards (GAGUA hairpins), and HiTRACE/MAPseeker analysis allow rigorous overmodification correction, background subtraction, and normalization for electrophoretic data and a ligation bias correction needed for accurate deep sequencing data. Comparisons across six noncoding RNAs stringently test the proposed standardization of dimethyl sulfate (DMS), 2'-OH acylation (SHAPE), and carbodiimide measurements. Identification of new signatures for extrahelical bulges and DMS "hot spot" pockets (including tRNA A58, methylated in vivo) illustrates the utility and necessity of standardization for quantitative RNA mapping.

    View details for DOI 10.1021/bi5003426

    View details for PubMedID 24766159

  • Structure determination of noncanonical RNA motifs guided by ¹H NMR chemical shifts. Nature methods Sripakdeevong, P., Cevec, M., Chang, A. T., Erat, M. C., Ziegeler, M., Zhao, Q., Fox, G. E., Gao, X., Kennedy, S. D., Kierzek, R., Nikonowicz, E. P., Schwalbe, H., Sigel, R. K., Turner, D. H., Das, R. 2014; 11 (4): 413-416

    Abstract

    Structured noncoding RNAs underlie fundamental cellular processes, but determining their three-dimensional structures remains challenging. We demonstrate that integrating ¹H NMR chemical shift data with Rosetta de novo modeling can be used to consistently determine high-resolution RNA structures. On a benchmark set of 23 noncanonical RNA motifs, including 11 'blind' targets, chemical-shift Rosetta for RNA (CS-Rosetta-RNA) recovered experimental structures with high accuracy (0.6-2.0 Å all-heavy-atom r.m.s. deviation) in 18 cases.

    View details for DOI 10.1038/nmeth.2876

    View details for PubMedID 24584194

  • Bayesian energy landscape tilting: towards concordant models of molecular ensembles. Biophysical journal Beauchamp, K. A., Pande, V. S., Das, R. 2014; 106 (6): 1381-1390

    Abstract

    Predicting biological structure has remained challenging for systems such as disordered proteins that take on myriad conformations. Hybrid simulation/experiment strategies have been undermined by difficulties in evaluating errors from computational model inaccuracies and data uncertainties. Building on recent proposals from maximum entropy theory and nonequilibrium thermodynamics, we address these issues through a Bayesian energy landscape tilting (BELT) scheme for computing Bayesian hyperensembles over conformational ensembles. BELT uses Markov chain Monte Carlo to directly sample maximum-entropy conformational ensembles consistent with a set of input experimental observables. To test this framework, we apply BELT to model trialanine, starting from disagreeing simulations with the force fields ff96, ff99, ff99sbnmr-ildn, CHARMM27, and OPLS-AA. BELT incorporation of limited chemical shift and ³J measurements gives convergent values of the peptide's α, β, and PPII conformational populations in all cases. As a test of predictive power, all five BELT hyperensembles recover set-aside measurements not used in the fitting and report accurate errors, even when starting from highly inaccurate simulations. BELT's principled framework thus enables practical predictions for complex biomolecular systems from discordant simulations and sparse data.

    View details for DOI 10.1016/j.bpj.2014.02.009

    View details for PubMedID 24655513

    View details for PubMedCentralID PMC3984982
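
    The maximum-entropy "tilting" mentioned in the BELT abstract can be sketched in a few lines. The example below is a simplification under stated assumptions: one observable, exact agreement with one measurement, and a deterministic bisection solve for the tilt parameter. BELT itself, per the abstract, samples ensembles with Markov chain Monte Carlo and accounts for data uncertainty, which this sketch does not attempt. All function names and the numbers are hypothetical.

```python
import math
from typing import List, Sequence


def tilted_weights(values: Sequence[float], alpha: float) -> List[float]:
    """Normalized weights proportional to exp(-alpha * f_j), one per conformation."""
    exponents = [-alpha * v for v in values]
    top = max(exponents)  # subtract the largest exponent for numerical stability
    raw = [math.exp(e - top) for e in exponents]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    return sum(v * w for v, w in zip(values, weights))


def solve_alpha(values: Sequence[float], target: float,
                lo: float = -50.0, hi: float = 50.0, tol: float = 1e-10) -> float:
    """Bisection for the alpha whose reweighted mean equals `target`.

    The reweighted mean decreases monotonically as alpha grows, so bisection
    is safe as long as the solution lies inside the bracket [lo, hi].
    """
    if not min(values) < target < max(values):
        raise ValueError("target must lie strictly inside the range of values")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if weighted_mean(values, tilted_weights(values, mid)) > target:
            lo = mid  # mean still too high: tilt harder
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical per-conformation predictions of one observable and one measurement.
predicted = [1.0, 2.0, 3.0, 4.0]
alpha = solve_alpha(predicted, target=2.0)
weights = tilted_weights(predicted, alpha)
```

    Starting from uniform weights, the exponential form is the reweighting that changes the ensemble least (in the relative-entropy sense) while matching the imposed average, which is why a single parameter per observable suffices in this toy case.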

  • RNA design rules from a massive open laboratory PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Lee, J., Kladwang, W., Lee, M., Cantu, D., Azizyan, M., Kim, H., Limpaecher, A., Yoon, S., Treuille, A., Das, R. 2014; 111 (6): 2122-2127

    Abstract

    Self-assembling RNA molecules present compelling substrates for the rational interrogation and control of living systems. However, imperfect in silico models--even at the secondary structure level--hinder the design of new RNAs that function properly when synthesized. Here, we present a unique and potentially general approach to such empirical problems: the Massive Open Laboratory. The EteRNA project connects 37,000 enthusiasts to RNA design puzzles through an online interface. Uniquely, EteRNA participants not only manipulate simulated molecules but also control a remote experimental pipeline for high-throughput RNA synthesis and structure mapping. We show herein that the EteRNA community leveraged dozens of cycles of continuous wet laboratory feedback to learn strategies for solving in vitro RNA design problems on which automated methods fail. The top strategies--including several previously unrecognized negative design rules--were distilled by machine learning into an algorithm, EteRNABot. Over a rigorous 1-y testing phase, both the EteRNA community and EteRNABot significantly outperformed prior algorithms in a dozen RNA secondary structure design tests, including the creation of dendrimer-like structures and scaffolds for small molecule sensors. These results show that an online community can carry out large-scale experiments, hypothesis generation, and algorithm design to create practical advances in empirical science.

    View details for DOI 10.1073/pnas.1313039111

    View details for Web of Science ID 000330999600027

    View details for PubMedID 24469816

    View details for PubMedCentralID PMC3926058

  • Challenging the state of the art in protein structure prediction: Highlights of experimental target structures for the 10th Critical Assessment of Techniques for Protein Structure Prediction Experiment CASP10 PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS Kryshtafovych, A., Moult, J., Bales, P., Bazan, J. F., Biasini, M., Burgin, A., Chen, C., Cochran, F. V., Craig, T. K., Das, R., Fass, D., Garcia-Doval, C., Herzberg, O., Lorimer, D., Luecke, H., Ma, X., Nelson, D. C., Van Raaij, M. J., Rohwer, F., Segall, A., Seguritan, V., Zeth, K., Schwede, T. 2014; 82: 26-42

    Abstract

    For the last two decades, CASP has assessed the state of the art in techniques for protein structure prediction and identified areas which required further development. CASP would not have been possible without the prediction targets provided by the experimental structural biology community. In the latest experiment, CASP10, more than 100 structures were suggested as prediction targets, some of which appeared to be extraordinarily difficult for modeling. In this article, authors of some of the most challenging targets discuss which specific scientific question motivated the experimental structure determination of the target protein, which structural features were especially interesting from a structural or functional perspective, and to what extent these features were correctly reproduced in the predictions submitted to CASP10. Specifically, the following targets will be presented: the acid-gated urea channel, a difficult to predict transmembrane protein from the important human pathogen Helicobacter pylori; the structure of human interleukin (IL)-34, a recently discovered helical cytokine; the structure of a functionally uncharacterized enzyme OrfY from Thermoproteus tenax formed by a gene duplication and a novel fold; an ORFan domain of mimivirus sulfhydryl oxidase R596; the fiber protein gene product 17 from bacteriophage T7; the bacteriophage CBA-120 tailspike protein; a virus coat protein from metagenomic samples of the marine environment; and finally, an unprecedented class of structure prediction targets based on engineered disulfide-rich small proteins.

    View details for Web of Science ID 000331147900004

    View details for PubMedID 24318984

  • The Mutate-and-Map Protocol for Inferring Base Pairs in Structured RNA. Methods in molecular biology (Clifton, N.J.) Cordero, P., Kladwang, W., VanLang, C. C., Das, R. 2014; 1086: 53-77

    Abstract

    Chemical mapping is a widespread technique for structural analysis of nucleic acids in which a molecule's reactivity to different probes is quantified at single nucleotide resolution and used to constrain structural modeling. This experimental framework has been extensively revisited in the past decade with new strategies for high-throughput readouts, chemical modification, and rapid data analysis. Recently, we have coupled the technique to high-throughput mutagenesis. Point mutations of a base paired nucleotide can lead to exposure of not only that nucleotide but also its interaction partner. Systematically carrying out the mutation and mapping for the entire system gives an experimental approximation of the molecule's "contact map." Here, we give our in-house protocol for this "mutate-and-map" (M2) strategy, based on 96-well capillary electrophoresis, and we provide practical tips on interpreting the data to infer nucleic acid structure.

    View details for DOI 10.1007/978-1-62703-667-2_4

    View details for PubMedID 24136598
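
    As a rough illustration of how a mutate-and-map data set can be read as an approximate "contact map", the sketch below converts a mutant-by-position reactivity matrix into per-position Z-scores and lists positions that become unusually reactive when a different position is mutated. It assumes that mutant m carries a mutation at position m and that a simple Z-score threshold is an adequate filter. The function names, the threshold, and the toy data are hypothetical and are not taken from the published protocol.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def m2_zscores(reactivity: Sequence[Sequence[float]]) -> List[List[float]]:
    """reactivity[m][i] is the reactivity at position i measured in mutant m.

    Each position is standardized across mutants: Z = (r - mean) / std.
    Positions with zero spread get Z = 0.
    """
    n_pos = len(reactivity[0])
    columns = [[row[i] for row in reactivity] for i in range(n_pos)]
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [
        [(row[i] - mus[i]) / sds[i] if sds[i] > 0 else 0.0 for i in range(n_pos)]
        for row in reactivity
    ]


def candidate_pairs(z: Sequence[Sequence[float]], threshold: float) -> List[Tuple[int, int]]:
    """(mutated position, responding position) pairs with Z >= threshold,
    skipping the mutated position itself."""
    return [
        (m, i)
        for m, row in enumerate(z)
        for i, value in enumerate(row)
        if m != i and value >= threshold
    ]


# Toy 4-nucleotide example with pairs 0-3 and 1-2: mutating one partner exposes the other.
toy = [
    [1.0, 0.1, 0.1, 1.0],
    [0.1, 1.0, 1.0, 0.1],
    [0.1, 1.0, 1.0, 0.1],
    [1.0, 0.1, 0.1, 1.0],
]
pairs = candidate_pairs(m2_zscores(toy), threshold=0.9)
```

    With only four mutants the Z-scores cannot grow large, hence the low threshold here; real data sets with many more mutants would support a stricter cutoff.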

  • Massively Parallel RNA Chemical Mapping with a Reduced Bias MAP-Seq Protocol. Methods in molecular biology (Clifton, N.J.) Seetin, M. G., Kladwang, W., Bida, J. P., Das, R. 2014; 1086: 95-117

    Abstract

    Chemical mapping methods probe RNA structure by revealing and leveraging correlations of a nucleotide's structural accessibility or flexibility with its reactivity to various chemical probes. Pioneering work by Lucks and colleagues has expanded this method to probe hundreds of molecules at once on an Illumina sequencing platform, obviating the use of slab gels or capillary electrophoresis on one molecule at a time. Here, we describe optimizations to this method from our lab, resulting in the MAP-seq protocol (Multiplexed Accessibility Probing read out through sequencing), version 1.0. The protocol permits the quantitative probing of thousands of RNAs at once, by several chemical modification reagents, on the time scale of a day using a tabletop Illumina machine. This method and a software package MAPseeker ( http://simtk.org/home/map_seeker ) address several potential sources of bias, by eliminating PCR steps, improving ligation efficiencies of ssDNA adapters, and avoiding problematic heuristics in prior algorithms. We hope that the step-by-step description of MAP-seq 1.0 will help other RNA mapping laboratories to transition from electrophoretic to next-generation sequencing methods and to further reduce the turnaround time and any remaining biases of the protocol.

    View details for DOI 10.1007/978-1-62703-667-2_6

    View details for PubMedID 24136600

  • Atomic-Accuracy Prediction of Protein Loop Structures through an RNA-Inspired Ansatz PLOS ONE Das, R. 2013; 8 (10)

    Abstract

    Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable as loop lengths exceed 10 residues and if surrounding side-chain conformations are erased. Current approaches, such as the protein local optimization protocol or kinematic inversion closure (KIC) Monte Carlo, involve stages that coarse-grain proteins, simplifying modeling but precluding a systematic search of all-atom configurations. This article introduces an alternative modeling strategy based on a 'stepwise ansatz', recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA) protocol enables enumerative sampling of a 12 residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA achieves sub-Angstrom accuracy models, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth 'RNA-puzzle' competition. These results establish all-atom enumeration as an unusually systematic approach to ab initio protein structure modeling that can leverage high performance computing and physically realistic energy functions to more consistently achieve atomic accuracy.

    View details for DOI 10.1371/journal.pone.0074830

    View details for PubMedID 24204571

  • Adding Diverse Noncanonical Backbones to Rosetta: Enabling Peptidomimetic Design PLOS ONE Drew, K., Renfrew, P. D., Craven, T. W., Butterfoss, G. L., Chou, F., Lyskov, S., Bullock, B. N., Watkins, A., Labonte, J. W., Pacella, M., Kilambi, K. P., Leaver-Fay, A., Kuhlman, B., Gray, J. J., Bradley, P., Kirshenbaum, K., Arora, P. S., Das, R., Bonneau, R. 2013; 8 (7)

    Abstract

    Peptidomimetics are classes of molecules that mimic structural and functional attributes of polypeptides. Peptidomimetic oligomers can frequently be synthesized using efficient solid phase synthesis procedures similar to peptide synthesis. Conformationally ordered peptidomimetic oligomers are finding broad applications for molecular recognition and for inhibiting protein-protein interactions. One critical limitation is the limited set of design tools for identifying oligomer sequences that can adopt desired conformations. Here, we present expansions to the ROSETTA platform that enable structure prediction and design of five non-peptidic oligomer scaffolds (noncanonical backbones), oligooxopiperazines, oligo-peptoids, β-peptides, hydrogen bond surrogate helices and oligosaccharides. This work is complementary to prior additions to model noncanonical protein side chains in ROSETTA. The main purpose of our manuscript is to give a detailed description to current and future developers of how each of these noncanonical backbones was implemented. Furthermore, we provide a general outline for implementation of new backbone types not discussed here. To illustrate the utility of this approach, we describe the first tests of the ROSETTA molecular mechanics energy function in the context of oligooxopiperazines, using quantum mechanical calculations as comparison points, scanning through backbone and side chain torsion angles for a model peptidomimetic. Finally, as an example of a novel design application, we describe the automated design of an oligooxopiperazine that inhibits the p53-MDM2 protein-protein interaction. For the general biological and bioengineering community, several noncanonical backbones have been incorporated into web applications that allow users to freely and rapidly test the presented protocols (http://rosie.rosettacommons.org). This work helps address the peptidomimetic community's need for an automated and expandable modeling tool for noncanonical backbones.

    View details for DOI 10.1371/journal.pone.0067051

    View details for Web of Science ID 000323110600005

    View details for PubMedID 23869206

    View details for PubMedCentralID PMC3712014

  • HiTRACE-Web: an online tool for robust analysis of high-throughput capillary electrophoresis NUCLEIC ACIDS RESEARCH Kim, H., Cordero, P., Das, R., Yoon, S. 2013; 41 (W1): W492-W498

    Abstract

    To facilitate the analysis of large-scale high-throughput capillary electrophoresis data, we previously proposed a suite of efficient analysis software named HiTRACE (High Throughput Robust Analysis of Capillary Electrophoresis). HiTRACE has been used extensively for quantitating data from RNA and DNA structure mapping experiments, including mutate-and-map contact inference, chromatin footprinting, the Eterna RNA design project and other high-throughput applications. However, HiTRACE is based on a suite of command-line MATLAB scripts that requires nontrivial efforts to learn, use and extend. Here, we present HiTRACE-Web, an online version of HiTRACE that includes standard features previously available in the command-line version and additional features such as automated band annotation and flexible adjustment of annotations, all via a user-friendly environment. By making use of parallelization, the on-line workflow is also faster than software implementations available to most users on their local computers. Free access: http://hitrace.org.

    View details for DOI 10.1093/nar/gkt501

    View details for Web of Science ID 000323603200079

  • Serverification of Molecular Modeling Applications: The Rosetta Online Server That Includes Everyone (ROSIE) PLOS ONE Lyskov, S., Chou, F., Conchuir, S. O., Der, B. S., Drew, K., Kuroda, D., Xu, J., Weitzner, B. D., Renfrew, P. D., Sripakdeevong, P., Borgo, B., Havranek, J. J., Kuhlman, B., Kortemme, T., Bonneau, R., Gray, J. J., Das, R. 2013; 8 (5)

    Abstract

    The Rosetta molecular modeling software package provides experimentally tested and rapidly evolving tools for the 3D structure prediction and high-resolution design of proteins, nucleic acids, and a growing number of non-natural polymers. Despite its free availability to academic users and improving documentation, use of Rosetta has largely remained confined to developers and their immediate collaborators due to the code's difficulty of use, the requirement for large computational resources, and the unavailability of servers for most of the Rosetta applications. Here, we present a unified web framework for Rosetta applications called ROSIE (Rosetta Online Server that Includes Everyone). ROSIE provides (a) a common user interface for Rosetta protocols, (b) a stable application programming interface for developers to add additional protocols, (c) a flexible back-end to allow leveraging of computer cluster resources shared by RosettaCommons member institutions, and (d) centralized administration by the RosettaCommons to ensure continuous maintenance. This paper describes the ROSIE server infrastructure, a step-by-step 'serverification' protocol for use by Rosetta developers, and the deployment of the first nine ROSIE applications by six separate developer teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance, Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated by the number and diversity of these applications, ROSIE offers a general and speedy paradigm for serverification of Rosetta applications that incurs negligible cost to developers and lowers barriers to Rosetta use for the broader biological community. ROSIE is available at http://rosie.rosettacommons.org.

    View details for DOI 10.1371/journal.pone.0063906

    View details for Web of Science ID 000320362700078

    View details for PubMedID 23717507

    View details for PubMedCentralID PMC3661552

  • Remodeling a beta-peptide bundle CHEMICAL SCIENCE Molski, M. A., Goodman, J. L., Chou, F., Baker, D., Das, R., Schepartz, A. 2013; 4 (1): 319-324

    View details for DOI 10.1039/c2sc21117c

    View details for Web of Science ID 000311971500036

  • Correcting pervasive errors in RNA crystallography through enumerative structure prediction NATURE METHODS Chou, F., Sripakdeevong, P., Dibrov, S. M., Hermann, T., Das, R. 2013; 10 (1): 74-U105

    Abstract

    Three-dimensional RNA models fitted into crystallographic density maps exhibit pervasive conformational ambiguities, geometric errors and steric clashes. To address these problems, we present enumerative real-space refinement assisted by electron density under Rosetta (ERRASER), coupled to Python-based hierarchical environment for integrated 'xtallography' (PHENIX) diffraction-based refinement. On 24 data sets, ERRASER automatically corrects the majority of MolProbity-assessed errors, improves the average R(free) factor, resolves functionally important discrepancies in noncanonical structure and refines low-resolution models to better match higher-resolution models.

    View details for DOI 10.1038/NMETH.2262

    View details for Web of Science ID 000312810100041

    View details for PubMedID 23202432

    View details for PubMedCentralID PMC3531565

  • Advances, Interactions, and Future Developments in the CNS, Phenix, and Rosetta Structural Biology Software Systems ANNUAL REVIEW OF BIOPHYSICS, VOL 42 Adams, P. D., Baker, D., Brunger, A. T., Das, R., DiMaio, F., Read, R. J., Richardson, D. C., Richardson, J. S., Terwilliger, T. C. 2013; 42: 265-287

    Abstract

    Advances in our understanding of macromolecular structure come from experimental methods, such as X-ray crystallography, and also computational analysis of the growing number of atomic models obtained from such experiments. The latter analyses have made it possible to develop powerful tools for structure prediction and optimization in the absence of experimental data. In recent years, a synergy between these computational methods for crystallographic structure determination and structure prediction and optimization has begun to be exploited. We review some of the advances in the algorithms used for crystallographic structure determination in the Phenix and Crystallography & NMR System software packages and describe how methods from ab initio structure prediction and refinement in Rosetta have been applied to challenging crystallographic problems. The prospects for future improvement of these methods are discussed.

    View details for DOI 10.1146/annurev-biophys-083012-130253

    View details for Web of Science ID 000321695700013

    View details for PubMedID 23451892

  • An RNA Mapping DataBase for curating RNA structure mapping experiments BIOINFORMATICS Cordero, P., Lucks, J. B., Das, R. 2012; 28 (22): 3006-3008

    Abstract

    We have established an RNA mapping database (RMDB) to enable structural, thermodynamic and kinetic comparisons across single-nucleotide-resolution RNA structure mapping experiments. The volume of structure mapping data has greatly increased since the development of high-throughput sequencing techniques, accelerated software pipelines and large-scale mutagenesis. For scientists wishing to infer relationships between RNA sequence/structure and these mapping data, there is a need for a database that is curated, tagged with error estimates and interfaced with tools for sharing, visualization, search and meta-analysis. Through its on-line front-end, the RMDB allows users to explore single-nucleotide-resolution mapping data in heat-map, bar-graph and colored secondary structure graphics; to leverage these data to generate secondary structure hypotheses; and to download the data in standardized and computer-friendly files, including the RDAT and community-consensus SNRNASM formats. At the time of writing, the database houses 53 entries, describing more than 2848 experiments of 1098 RNA constructs in several solution conditions and is growing rapidly. Freely available on the web at http://rmdb.stanford.edu. Contact: rhiju@stanford.edu. Supplementary data are available at Bioinformatics Online.

    View details for DOI 10.1093/bioinformatics/bts554

    View details for Web of Science ID 000311303500028

    View details for PubMedID 22976082

    View details for PubMedCentralID PMC3496344

  • Quantitative Dimethyl Sulfate Mapping for Automated RNA Secondary Structure Inference BIOCHEMISTRY Cordero, P., Kladwang, W., VanLang, C. C., Das, R. 2012; 51 (36): 7037-7039

    Abstract

    For decades, dimethyl sulfate (DMS) mapping has informed manual modeling of RNA structure in vitro and in vivo. Here, we incorporate DMS data into automated secondary structure inference using an energy minimization framework developed for 2'-OH acylation (SHAPE) mapping. On six noncoding RNAs with crystallographic models, DMS-guided modeling achieves overall false negative and false discovery rates of 9.5% and 11.6%, respectively, comparable to or better than those of SHAPE-guided modeling, and bootstrapping provides straightforward confidence estimates. Integrating DMS-SHAPE data and including 1-cyclohexyl(2-morpholinoethyl) carbodiimide metho-p-toluene sulfonate (CMCT) reactivities provide small additional improvements. These results establish DMS mapping, an already routine technique, as a quantitative tool for unbiased RNA secondary structure modeling.

    View details for DOI 10.1021/bi3008802

    View details for Web of Science ID 000308833500001

    View details for PubMedID 22913637

    View details for PubMedCentralID PMC3448840

  • Squaring theory with practice in RNA design CURRENT OPINION IN STRUCTURAL BIOLOGY Bida, J. P., Das, R. 2012; 22 (4): 457-466

    Abstract

    Ribonucleic acid (RNA) design offers unique opportunities for engineering genetic networks and nanostructures that self-assemble within living cells. Recent years have seen the creation of increasingly complex RNA devices, including proof-of-concept applications for in vivo three-dimensional scaffolding, imaging, computing, and control of biological behaviors. Expert intuition and simple design rules--the stability of double helices, the modularity of noncanonical RNA motifs, and geometric closure--have enabled these successful applications. Going beyond heuristics, emerging algorithms may enable automated design of RNAs with nucleotide-level accuracy but, as illustrated on a recent RNA square design, are not yet fully predictive. Looking ahead, technological advances in RNA synthesis and interrogation are poised to radically accelerate the discovery and stringent testing of design methods.

    View details for DOI 10.1016/j.sbi.2012.06.003

    View details for Web of Science ID 000308516800009

    View details for PubMedID 22832174

  • Ultraviolet Shadowing of RNA Can Cause Significant Chemical Damage in Seconds SCIENTIFIC REPORTS Kladwang, W., Hum, J., Das, R. 2012; 2

    Abstract

    Chemical purity of RNA samples is important for high-precision studies of RNA folding and catalytic behavior, but photodamage accrued during ultraviolet (UV) shadowing steps of sample preparation can reduce this purity. Here, we report the quantitation of UV-induced damage by using reverse transcription and single-nucleotide-resolution capillary electrophoresis. We found photolesions in a dozen natural and artificial RNAs; across multiple sequence contexts, dominantly at but not limited to pyrimidine doublets; and from multiple lamps recommended for UV shadowing. Irradiation time-courses revealed detectable damage within a few seconds of exposure for 254 nm lamps held at a distance of 5 to 10 cm from 0.5-mm thickness gels. Under these conditions, 200-nucleotide RNAs subjected to 20 seconds of UV shadowing incurred damage to 16-27% of molecules; and, due to a 'skin effect', the molecule-by-molecule distribution of lesions gave 4-fold higher variance than a Poisson distribution. Thicker gels, longer wavelength lamps, and shorter exposure times reduced but did not eliminate damage. These results suggest that RNA biophysical studies should report precautions taken to avoid artifactual heterogeneity from UV shadowing.

    View details for DOI 10.1038/srep00517

    View details for Web of Science ID 000306707600001

    View details for PubMedID 22816040

    View details for PubMedCentralID PMC3399121

  • Metal-ion rescue revisited: Biochemical detection of site-bound metal ions important for RNA folding RNA-A PUBLICATION OF THE RNA SOCIETY Frederiksen, J. K., Li, N., Das, R., Herschlag, D., Piccirilli, J. A. 2012; 18 (6): 1123-1141

    Abstract

    Within the three-dimensional architectures of RNA molecules, divalent metal ions populate specific locations, shedding their water molecules to form chelates. These interactions help the RNA adopt and maintain specific conformations and frequently make essential contributions to function. Defining the locations of these site-bound metal ions remains challenging despite the growing database of RNA structures. Metal-ion rescue experiments have provided a powerful approach to identify and distinguish catalytic metal ions within RNA active sites, but the ability of such experiments to identify metal ions that contribute to tertiary structure acquisition and structural stability is less developed and has been challenged. Herein, we use the well-defined P4-P6 RNA domain of the Tetrahymena group I intron to reevaluate prior evidence against the discriminatory power of metal-ion rescue experiments and to advance thermodynamic descriptions necessary for interpreting these experiments. The approach successfully identifies ligands within the RNA that occupy the inner coordination sphere of divalent metal ions and distinguishes them from ligands that occupy the outer coordination sphere. Our results underscore the importance of obtaining complete folding isotherms and establishing and evaluating thermodynamic models in order to draw conclusions from metal-ion rescue experiments. These results establish metal-ion rescue as a rigorous tool for identifying and dissecting energetically important metal-ion interactions in RNAs that are noncatalytic but critical for RNA tertiary structure.

    View details for DOI 10.1261/rna.028738.111

    View details for Web of Science ID 000304423000003

    View details for PubMedID 22539523

    View details for PubMedCentralID PMC3358636

  • Are Protein Force Fields Getting Better? A Systematic Benchmark on 524 Diverse NMR Measurements. Journal of chemical theory and computation Beauchamp, K. A., Lin, Y. S., Das, R., Pande, V. S. 2012; 8 (4): 1409-1414

    Abstract

    Recent hardware and software advances have enabled simulation studies of protein systems on biophysically-relevant timescales, often revealing the need for improved force fields. Although early force field development was limited by the lack of direct comparisons between simulation and experiment, recent work from several labs has demonstrated direct calculation of NMR observables from protein simulations. Here we quantitatively evaluate recent molecular dynamics force fields against a suite of 524 chemical shift and J coupling (³J(HN,Hα), ³J(HN,Cβ), ³J(Hα,C′), ³J(HN,C′), and ³J(Hα,N)) measurements on dipeptides, tripeptides, tetra-alanine, and ubiquitin. Of the force fields examined (ff96, ff99, ff03, ff03*, ff03w, ff99sb*, ff99sb-ildn, ff99sb-ildn-phi, ff99sb-ildn-nmr, CHARMM27, OPLS-AA), two force fields (ff99sb-ildn-phi, ff99sb-ildn-nmr) combining recent side chain and backbone torsion modifications achieve high accuracy in our benchmark. For the two optimal force fields, the calculation error is comparable to the uncertainty in the experimental comparison. This observation suggests that extracting additional force field improvements from NMR data may require increased accuracy in J coupling and chemical shift prediction. To further investigate the limitations of current force fields, we also consider conformational populations of dipeptides, which were recently estimated using vibrational spectroscopy.

    View details for DOI 10.1021/ct2007814

    View details for PubMedID 22754404

    View details for PubMedCentralID PMC3383641

  • RNA-Puzzles: A CASP-like evaluation of RNA three-dimensional structure prediction RNA-A PUBLICATION OF THE RNA SOCIETY Cruz, J. A., Blanchet, M., Boniecki, M., Bujnicki, J. M., Chen, S., Cao, S., Das, R., Ding, F., Dokholyan, N. V., Flores, S. C., Huang, L., Lavender, C. A., Lisi, V., Major, F., Mikolajczak, K., Patel, D. J., Philips, A., Puton, T., Santalucia, J., Sijenyi, F., Hermann, T., Rother, K., Rother, M., Serganov, A., Skorupski, M., Soltysinski, T., Sripakdeevong, P., Tuszynska, I., Weeks, K. M., Waldsich, C., Wildauer, M., Leontis, N. B., Westhof, E. 2012; 18 (4): 610-625

    Abstract

    We report the results of a first, collective, blind experiment in RNA three-dimensional (3D) structure prediction, encompassing three prediction puzzles. The goals are to assess the leading edge of RNA structure prediction techniques; compare existing methods and tools; and evaluate their relative strengths, weaknesses, and limitations in terms of sequence length and structural complexity. The results should give potential users insight into the suitability of available methods for different applications and facilitate efforts in the RNA structure prediction community in ongoing efforts to improve prediction tools. We also report the creation of an automated evaluation pipeline to facilitate the analysis of future RNA structure prediction exercises.

    View details for DOI 10.1261/rna.031054.111

    View details for Web of Science ID 000301954600002

    View details for PubMedID 22361291

    View details for PubMedCentralID PMC3312550

  • Automated RNA Structure Prediction Uncovers a Kink-Turn Linker in Double Glycine Riboswitches JOURNAL OF THE AMERICAN CHEMICAL SOCIETY Kladwang, W., Chou, F., Das, R. 2012; 134 (3): 1404-1407

    Abstract

    The tertiary structures of functional RNA molecules remain difficult to decipher. A new generation of automated RNA structure prediction methods may help address these challenges but have not yet been experimentally validated. Here we apply four prediction tools to a class of double glycine riboswitches that can bind two ligands cooperatively. A novel method (BPPalign), RMdetect, JAR3D, and Rosetta 3D modeling give consistent predictions for a new stem P0 and a kink-turn motif. These elements structure the linker between the RNAs' double aptamers. Chemical mapping on the Fusobacterium nucleatum riboswitch with N-methylisatoic anhydride, dimethyl sulfate and 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate probing, mutate-and-map studies, and mutation/rescue experiments all provide strong evidence for the structured linker. Under solution conditions that permit rigorous thermodynamic analysis, disrupting this helix-junction-helix structure gives 120- and 6-30-fold poorer dissociation constants for the RNA's two glycine-binding transitions, corresponding to an overall energetic impact of 4.3 ± 0.5 kcal/mol. Prior biochemical and crystallography studies did not include this critical element due to over-truncation of the RNA. We speculate that several further undiscovered elements are likely to exist in the flanking regions of this and other functional RNAs, and automated prediction tools can play a useful role in their detection and dissection.

    View details for DOI 10.1021/ja2093508

    View details for PubMedID 22192063

  • An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Sripakdeevong, P., Kladwang, W., Das, R. 2011; 108 (51): 20573-20578

    Abstract

    Atomic-accuracy structure prediction of macromolecules should be achievable by optimizing a physically realistic energy function but is presently precluded by incomplete sampling of a biopolymer's many degrees of freedom. We present herein a working hypothesis, called the "stepwise ansatz," for recursively constructing well-packed atomic-detail models in small steps, enumerating several million conformations for each monomer, and covering all build-up paths. By making use of high-performance computing and the Rosetta framework, we provide first tests of this hypothesis on a benchmark of 15 RNA loop-modeling problems drawn from riboswitches, ribozymes, and the ribosome, including 10 cases that are not solvable by current knowledge-based modeling approaches. For each loop problem, this deterministic stepwise assembly method either reaches atomic accuracy or exposes flaws in Rosetta's all-atom energy function, indicating the resolution of the conformational sampling bottleneck. As a further rigorous test, we have carried out a blind all-atom prediction for a noncanonical RNA motif, the C7.2 tetraloop/receptor, and validated this model through nucleotide-resolution chemical mapping experiments. Stepwise assembly is an enumerative, ab initio build-up method that systematically outperforms existing Monte Carlo and knowledge-based methods for 3D structure prediction.

    View details for DOI 10.1073/pnas.1106516108

    View details for Web of Science ID 000298289400065

    View details for PubMedID 22143768

    View details for PubMedCentralID PMC3251086

  • A two-dimensional mutate-and-map strategy for non-coding RNA structure NATURE CHEMISTRY Kladwang, W., VanLang, C. C., Cordero, P., Das, R. 2011; 3 (12): 954-962

    Abstract

    Non-coding RNAs fold into precise base-pairing patterns to carry out critical roles in genetic regulation and protein synthesis, but determining RNA structure remains difficult. Here, we show that coupling systematic mutagenesis with high-throughput chemical mapping enables accurate base-pair inference of domains from ribosomal RNA, ribozymes and riboswitches. For a six-RNA benchmark that has challenged previous chemical/computational methods, this 'mutate-and-map' strategy gives secondary structures that are in agreement with crystallography (helix error rates, 2%), including a blind test on a double-glycine riboswitch. Through modelling of partially ordered states, the method enables the first test of an interdomain helix-swap hypothesis for ligand-binding cooperativity in a glycine riboswitch. Finally, the data report on tertiary contacts within non-coding RNAs, and coupling to the Rosetta/FARFAR algorithm gives nucleotide-resolution three-dimensional models (helix root-mean-squared deviation, 5.7 Å) of an adenine riboswitch. These results establish a promising two-dimensional chemical strategy for inferring the secondary and tertiary structures that underlie non-coding RNA behaviour.

    View details for DOI 10.1038/NCHEM.1176

    View details for Web of Science ID 000297685800014

    View details for PubMedID 22109276

  • Understanding the Errors of SHAPE-Directed RNA Structure Modeling BIOCHEMISTRY Kladwang, W., VanLang, C. C., Cordero, P., Das, R. 2011; 50 (37): 8049-8056

    Abstract

    Single-nucleotide-resolution chemical mapping for structured RNA is being rapidly advanced by new chemistries, faster readouts, and coupling to computational algorithms. Recent tests have shown that selective 2'-hydroxyl acylation by primer extension (SHAPE) can give near-zero error rates (0-2%) in modeling the helices of RNA secondary structure. Here, we benchmark the method using six molecules for which crystallographic data are available: tRNA(phe) and 5S rRNA from Escherichia coli, the P4-P6 domain of the Tetrahymena group I ribozyme, and ligand-bound domains from riboswitches for adenine, cyclic di-GMP, and glycine. SHAPE-directed modeling of these highly structured RNAs gave an overall false negative rate (FNR) of 17% and a false discovery rate (FDR) of 21%, with at least one helix prediction error in five of the six cases. Extensive variations of data processing, normalization, and modeling parameters did not significantly mitigate modeling errors. Only one variation, filtering out data collected with deoxyinosine triphosphate during primer extension, gave a modest improvement (FNR = 12%, and FDR = 14%). The residual structure modeling errors are explained by the insufficient information content of these RNAs' SHAPE data, as evaluated by a nonparametric bootstrapping analysis. Beyond these benchmark cases, bootstrapping suggests a low level of confidence (<50%) in the majority of helices in a previously proposed SHAPE-directed model for the HIV-1 RNA genome. Thus, SHAPE-directed RNA modeling is not always unambiguous, and helix-by-helix confidence estimates, as described herein, may be critical for interpreting results from this powerful methodology.

    View details for DOI 10.1021/bi200524n

    View details for Web of Science ID 000294791100021

    View details for PubMedID 21842868

    View details for PubMedCentralID PMC3172344

  • Quantitative comparison of villin headpiece subdomain simulations and triplet-triplet energy transfer experiments PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Beauchamp, K. A., Ensign, D. L., Das, R., Pande, V. S. 2011; 108 (31): 12734-12739

    Abstract

    As the fastest folding protein, the villin headpiece (HP35) serves as an important bridge between simulation and experimental studies of protein folding. Despite the simplicity of this system, experiments continue to reveal a number of surprises, including structure in the unfolded state and complex equilibrium dynamics near the native state. Using 2.5 ms of molecular dynamics and Markov state models, we connect to current experimental results in three ways. First, we present and validate a novel method for the quantitative prediction of triplet-triplet energy transfer experiments. Second, we construct a many-state model for HP35 that is consistent with previous experiments. Finally, we predict contact-formation time traces for all 1,225 possible triplet-triplet energy transfer experiments on HP35.

    View details for DOI 10.1073/pnas.1010880108

    View details for Web of Science ID 000293385700043

    View details for PubMedID 21768345

    View details for PubMedCentralID PMC3150881

  • HiTRACE: high-throughput robust analysis for capillary electrophoresis BIOINFORMATICS Yoon, S., Kim, J., Hum, J., Kim, H., Park, S., Kladwang, W., Das, R. 2011; 27 (13): 1798-1805

    Abstract

    Capillary electrophoresis (CE) of nucleic acids is a workhorse technology underlying high-throughput genome analysis and large-scale chemical mapping for nucleic acid structural inference. Despite the wide availability of CE-based instruments, there remain challenges in leveraging their full power for quantitative analysis of RNA and DNA structure, thermodynamics and kinetics. In particular, the slow rate and poor automation of available analysis tools have bottlenecked a new generation of studies involving hundreds of CE profiles per experiment. We propose a computational method called high-throughput robust analysis for capillary electrophoresis (HiTRACE) to automate the key tasks in large-scale nucleic acid CE analysis, including the profile alignment that has heretofore been a rate-limiting step in the highest throughput experiments. We illustrate the application of HiTRACE on 13 datasets representing 4 different RNAs, 3 chemical modification strategies and up to 480 single mutant variants; the largest datasets each include 87,360 bands. By applying a series of robust dynamic programming algorithms, HiTRACE outperforms prior tools in terms of alignment and fitting quality, as assessed by measures including the correlation between quantified band intensities between replicate datasets. Furthermore, while the smallest of these datasets required 7-10 h of manual intervention using prior approaches, HiTRACE quantitation of even the largest datasets herein was achieved in 3-12 min. The HiTRACE method, therefore, resolves a critical barrier to the efficient and accurate analysis of nucleic acid structure in experiments involving tens of thousands of electrophoretic bands.

    View details for DOI 10.1093/bioinformatics/btr277

    View details for Web of Science ID 000291752600058

    View details for PubMedID 21561922

  • Sharing and archiving nucleic acid structure mapping data RNA-A PUBLICATION OF THE RNA SOCIETY Rocca-Serra, P., Bellaousov, S., Birmingham, A., Chen, C., Cordero, P., Das, R., Davis-Neulander, L., Duncan, C. D., Halvorsen, M., Knight, R., Leontis, N. B., Mathews, D. H., Ritz, J., Stombaugh, J., Weeks, K. M., Zirbel, C. L., Laederach, A. 2011; 17 (7): 1204-1212

    Abstract

    Nucleic acids are particularly amenable to structural characterization using chemical and enzymatic probes. Each individual structure mapping experiment reveals specific information about the structure and/or dynamics of the nucleic acid. Currently, there is no simple approach for making these data publicly available in a standardized format. We therefore developed a standard for reporting the results of single nucleotide resolution nucleic acid structure mapping experiments, or SNRNASMs. We propose a schema for sharing nucleic acid chemical probing data that uses generic public servers for storing, retrieving, and searching the data. We have also developed a consistent nomenclature (ontology) within the Ontology of Biomedical Investigations (OBI), which provides unique identifiers (termed persistent URLs, or PURLs) for classifying the data. Links to standardized data sets shared using our proposed format along with a tutorial and links to templates can be found at http://snrnasm.bio.unc.edu.

    View details for DOI 10.1261/rna.2753211

    View details for Web of Science ID 000291683500002

    View details for PubMedID 21610212

    View details for PubMedCentralID PMC3138558

  • Four Small Puzzles That Rosetta Doesn't Solve PLOS ONE Das, R. 2011; 6 (5)

    Abstract

    A complete macromolecule modeling package must be able to solve the simplest structure prediction problems. Despite recent successes in high resolution structure modeling and design, the Rosetta software suite fares poorly on small protein and RNA puzzles, some as small as four residues. To illustrate these problems, this manuscript presents Rosetta results for four well-defined test cases: the 20-residue mini-protein Trp cage, an even smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies, several lines of evidence indicate that conformational sampling is not the major bottleneck in modeling these small systems. Instead, approximations and omissions in the Rosetta all-atom energy function currently preclude discriminating experimentally observed conformations from de novo models at atomic resolution. These molecular "puzzles" should serve as useful model systems for developers wishing to make foundational improvements to this powerful modeling suite.

    View details for DOI 10.1371/journal.pone.0020044

    View details for Web of Science ID 000290793400036

    View details for PubMedID 21625446

    View details for PubMedCentralID PMC3098862

  • A mutate-and-map strategy accurately infers the base pairs of a 35-nucleotide model RNA RNA-A PUBLICATION OF THE RNA SOCIETY Kladwang, W., Cordero, P., Das, R. 2011; 17 (3): 522-534

    Abstract

    We present a rapid experimental strategy for inferring base pairs in structured RNAs via an information-rich extension of classic chemical mapping approaches. The mutate-and-map method, previously applied to a DNA/RNA helix, systematically searches for single mutations that enhance the chemical accessibility of base-pairing partners distant in sequence. To test this strategy for structured RNAs, we have carried out mutate-and-map measurements for a 35-nt hairpin, called the MedLoop RNA, embedded within an 80-nt sequence. We demonstrate the synthesis of all 105 single mutants of the MedLoop RNA sequence and present high-throughput DMS, CMCT, and SHAPE modification measurements for this library at single-nucleotide resolution. The resulting two-dimensional data reveal visually clear, punctate features corresponding to RNA base pair interactions as well as more complex features; these signals can be qualitatively rationalized by comparison to secondary structure predictions. Finally, we present an automated, sequence-blind analysis that permits the confident identification of nine of the 10 MedLoop RNA base pairs at single-nucleotide resolution, while discriminating against all 1460 false-positive base pairs. These results establish the accuracy and information content of the mutate-and-map strategy and support its feasibility for rapidly characterizing the base-pairing patterns of larger and more complex RNA systems.

    View details for DOI 10.1261/rna.2516311

    View details for Web of Science ID 000287195900014

    View details for PubMedID 21239468

    View details for PubMedCentralID PMC3039151

  • ROSETTA3: AN OBJECT-ORIENTED SOFTWARE SUITE FOR THE SIMULATION AND DESIGN OF MACROMOLECULES METHODS IN ENZYMOLOGY, VOL 487: COMPUTER METHODS, PT C Leaver-Fay, A., Tyka, M., Lewis, S. M., Lange, O. F., Thompson, J., Jacak, R., Kaufman, K., Renfrew, P. D., Smith, C. A., Sheffler, W., Davis, I. W., Cooper, S., Treuille, A., Mandell, D. J., Richter, F., Ban, Y. A., Fleishman, S. J., Corn, J. E., Kim, D. E., Lyskov, S., Berrondo, M., Mentzer, S., Popovic, Z., Havranek, J. J., Karanicolas, J., Das, R., Meiler, J., Kortemme, T., Gray, J. J., Kuhlman, B., Baker, D., Bradley, P. 2011: 545-574

    Abstract

    We have recently completed a full re-architecturing of the ROSETTA molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as ROSETTA3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This chapter describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.

    View details for DOI 10.1016/S0076-6879(11)87019-9

    View details for Web of Science ID 000286532000019

    View details for PubMedID 21187238

  • Rosetta in CAPRI rounds 13-19 PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS Fleishman, S. J., Corn, J. E., Strauch, E. M., Whitehead, T. A., Andre, I., Thompson, J., Havranek, J. J., Das, R., Bradley, P., Baker, D. 2010; 78 (15): 3212-3218

    Abstract

    Modeling the conformational changes that occur on binding of macromolecules is an unsolved challenge. In previous rounds of the Critical Assessment of PRediction of Interactions (CAPRI), it was demonstrated that the Rosetta approach to macromolecular modeling could capture side chain conformational changes on binding with high accuracy. In rounds 13-19 we tested the ability of various backbone remodeling strategies to capture the main-chain conformational changes observed during binding events. These approaches span a wide range of backbone motions, from limited refinement of loops to relieve clashes in homologous docking, through extensive remodeling of loop segments, to large-scale remodeling of RNA. Although the results are encouraging, major improvements in sampling and energy evaluation are clearly required for consistent high accuracy modeling. Analysis of our failures in the CAPRI challenges suggests that conformational sampling at the termini of exposed beta strands is a particularly pressing area for improvement.

    View details for DOI 10.1002/prot.22784

    View details for Web of Science ID 000283565000020

    View details for PubMedID 20597089

    View details for PubMedCentralID PMC2952713

  • A Mutate-and-Map Strategy for Inferring Base Pairs in Structured Nucleic Acids: Proof of Concept on a DNA/RNA Helix BIOCHEMISTRY Kladwang, W., Das, R. 2010; 49 (35): 7414-7416

    Abstract

    We propose a rapid chemical strategy for identifying base pairs in structured nucleic acid systems. The approach goes beyond traditional chemical mapping approaches by monitoring perturbations of each residue's chemical accessibility in response to systematic mutagenesis of residues that are distant in sequence but nearby in three dimensions. As a proof of concept, we present high-throughput dimethyl sulfate accessibility data for a chimeric DNA/RNA system in which every possible sequence variation and deletion in a 20 bp region has been synthesized and tested. The data demonstrate that 88% of the system's base pairs can be robustly inferred, with A/A and T/C DNA/RNA mismatches giving the strongest signals. These results point to the feasibility of rapid base pair inference in larger and more complex nucleic acid systems with unknown structure.

    View details for DOI 10.1021/bi101123g

    View details for Web of Science ID 000281305200002

    View details for PubMedID 20677780

  • Atomic accuracy in predicting and designing noncanonical RNA structure NATURE METHODS Das, R., Karanicolas, J., Baker, D. 2010; 7 (4): 291-294

    Abstract

    We present fragment assembly of RNA with full-atom refinement (FARFAR), a Rosetta framework for predicting and designing noncanonical motifs that define RNA tertiary structure. In a test set of thirty-two 6-20-nucleotide motifs, FARFAR recapitulated 50% of the experimental structures at near-atomic accuracy. Sequence redesign calculations recovered native bases at 65% of residues engaged in noncanonical interactions, and we experimentally validated mutations predicted to stabilize a signal recognition particle domain.

    View details for DOI 10.1038/NMETH.1433

    View details for Web of Science ID 000276150600018

    View details for PubMedID 20190761

    View details for PubMedCentralID PMC2854559

  • Simultaneous prediction of protein folding and docking at high resolution PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Das, R., Andre, I., Shen, Y., Wu, Y., Lemak, A., Bansal, S., Arrowsmith, C. H., Szyperski, T., Baker, D. 2009; 106 (45): 18978-18983

    Abstract

    Interleaved dimers and higher order symmetric oligomers are ubiquitous in biology but present a challenge to de novo structure prediction methodology: The structure adopted by a monomer can be stabilized largely by interactions with other monomers and hence not the lowest energy state of a single chain. Building on the Rosetta framework, we present a general method to simultaneously model the folding and docking of multiple-chain interleaved homo-oligomers. For more than a third of the cases in a benchmark set of interleaved homo-oligomers, the method generates near-native models of large alpha-helical bundles, interlocking beta sandwiches, and interleaved alpha/beta motifs with an accuracy high enough for molecular replacement based phasing. With the incorporation of NMR chemical shift information, accurate models can be obtained consistently for symmetric complexes with as many as 192 total amino acids; a blind prediction was within 1 Å rmsd of the traditionally determined NMR structure, and fit independently collected RDC data equally well. Together, these results show that the Rosetta "fold-and-dock" protocol can produce models of homo-oligomeric complexes with near-atomic-level accuracy and should be useful for crystallographic phasing and the rapid determination of the structures of multimers with limited NMR information.

    View details for DOI 10.1073/pnas.0904407106

    View details for Web of Science ID 000271637500021

    View details for PubMedID 19864631

    View details for PubMedCentralID PMC2770007

  • A robust peak detection method for RNA structure inference by high-throughput contact mapping BIOINFORMATICS Kim, J., Yu, S., Shim, B., Kim, H., Min, H., Chung, E., Das, R., Yoon, S. 2009; 25 (9): 1137-1144

    Abstract

    For high-throughput prediction of the helical arrangements of large RNA molecules, an innovative method termed multiplexed hydroxyl radical (·OH) cleavage analysis (MOHCA) has been proposed. A key step in this promising technique is to detect peaks accurately from noisy radioactivity profiles. Since manual peak finding is laborious and prone to error, an automated peak detection method to improve the accuracy and throughput of MOHCA is required. Existing methods were not applicable to MOHCA due to their high false positive rates. We developed a two-step computational method that can detect peaks from MOHCA profiles in a robust manner. The first step exploits an ensemble of linear and non-linear signal processing techniques to find true peak candidates. In the second step, a binary classifier trained with the characteristics of true and false peaks is used to eliminate false peaks out of the peak candidates. We tested the proposed approach with 2002 MOHCA cleavage profiles and obtained the median recall, precision and F-measure values of 0.917, 0.750 and 0.830, respectively. Compared with the alternatives considered, the proposed method was able to handle false peaks substantially better, thus resulting in 51.0-71.8% higher median values of precision and F-measure. The software and supplementary data are available at http://dna.korea.ac.kr/pub/mohca.

    View details for DOI 10.1093/bioinformatics/btp110

    View details for Web of Science ID 000265523300007

    View details for PubMedID 19246511

  • Prospects for de novo phasing with de novo protein models ACTA CRYSTALLOGRAPHICA SECTION D-BIOLOGICAL CRYSTALLOGRAPHY Das, R., Baker, D. 2009; 65: 169-175

    Abstract

    The prospect of phasing diffraction data sets 'de novo' for proteins with previously unseen folds is appealing but largely untested. In a first systematic exploration of phasing with Rosetta de novo models, it is shown that all-atom refinement of coarse-grained models significantly improves both the model quality and performance in molecular replacement with the Phaser software. 15 new cases of diffraction data sets that are unambiguously phased with de novo models are presented. These diffraction data sets represent nine space groups and span a large range of solvent contents (33-79%) and asymmetric unit copy numbers (1-4). No correlation is observed between the ease of phasing and the solvent content or asymmetric unit copy number. Instead, a weak correlation is found with the length of the modeled protein: larger proteins required somewhat less accurate models to give successful molecular replacement. Overall, the results of this survey suggest that de novo models can phase diffraction data for approximately one sixth of proteins with sizes of 100 residues or less. However, for many of these cases, 'de novo phasing with de novo models' requires significant investment of computational power, much greater than 10³ CPU days per target. Improvements in conformational search methods will be necessary if molecular replacement with de novo models is to become a practical tool for targets without homology to previously solved protein structures.

    View details for DOI 10.1107/S0907444908020039

    View details for Web of Science ID 000263557900009

    View details for PubMedID 19171972

    View details for PubMedCentralID PMC2631639

  • Structure prediction for CASP8 with all-atom refinement using Rosetta PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS Raman, S., Vernon, R., Thompson, J., Tyka, M., Sadreyev, R., Pei, J., Kim, D., Kellogg, E., DiMaio, F., Lange, O., Kinch, L., Sheffler, W., Kim, B., Das, R., Grishin, N. V., Baker, D. 2009; 77: 89-99

    Abstract

    We describe predictions made using the Rosetta structure prediction methodology for the Eighth Critical Assessment of Techniques for Protein Structure Prediction. Aggressive sampling and all-atom refinement were carried out for nearly all targets. A combination of alignment methodologies was used to generate starting models from a range of templates, and the models were then subjected to Rosetta all atom refinement. For the 64 domains with readily identified templates, the best submitted model was better than the best alignment to the best template in the Protein Data Bank for 24 cases, and improved over the best starting model for 43 cases. For 13 targets where only very distant sequence relationships to proteins of known structure were detected, models were generated using the Rosetta de novo structure prediction methodology followed by all-atom refinement; in several cases the submitted models were better than those based on the available templates. Of the 12 refinement challenges, the best submitted model improved on the starting model in seven cases. These improvements over the starting template-based models and refinement tests demonstrate the power of Rosetta structure refinement in improving model accuracy.

    View details for DOI 10.1002/prot.22540

    View details for Web of Science ID 000272244700009

    View details for PubMedID 19701941

    View details for PubMedCentralID PMC3688471

  • Remeasuring the double helix SCIENCE Mathew-Fenn, R. S., Das, R., Harbury, P. A. 2008; 322 (5900): 446-449

    Abstract

    DNA is thought to behave as a stiff elastic rod with respect to the ubiquitous mechanical deformations inherent to its biology. To test this model at short DNA lengths, we measured the mean and variance of end-to-end length for a series of DNA double helices in solution, using small-angle x-ray scattering interference between gold nanocrystal labels. In the absence of applied tension, DNA is at least one order of magnitude softer than measured by single-molecule stretching experiments. Further, the data rule out the conventional elastic rod model. The variance in end-to-end length follows a quadratic dependence on the number of base pairs rather than the expected linear dependence, indicating that DNA stretching is cooperative over more than two turns of the DNA double helix. Our observations support the idea of long-range allosteric communication through DNA structure.

    View details for DOI 10.1126/science.1158881

    View details for Web of Science ID 000260094500048

    View details for PubMedID 18927394

    View details for PubMedCentralID PMC2684691

  • Structural inference of native and partially folded RNA by high-throughput contact mapping PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Das, R., Kudaravalli, M., Jonikas, M., Laederach, A., Fong, R., Schwans, J. P., Baker, D., Piccirilli, J. A., Altman, R. B., Herschlag, D. 2008; 105 (11): 4144-4149

    Abstract

    The biological behaviors of ribozymes, riboswitches, and numerous other functional RNA molecules are critically dependent on their tertiary folding and their ability to sample multiple functional states. The conformational heterogeneity and partially folded nature of most of these states has rendered their characterization by high-resolution structural approaches difficult or even intractable. Here we introduce a method to rapidly infer the tertiary helical arrangements of large RNA molecules in their native and non-native solution states. Multiplexed hydroxyl radical (·OH) cleavage analysis (MOHCA) enables the high-throughput detection of numerous pairs of contacting residues via random incorporation of radical cleavage agents followed by two-dimensional gel electrophoresis. We validated this technology by recapitulating the unfolded and native states of a well studied model RNA, the P4-P6 domain of the Tetrahymena ribozyme, at subhelical resolution. We then applied MOHCA to a recently discovered third state of the P4-P6 RNA that is stabilized by high concentrations of monovalent salt and whose partial order precludes conventional techniques for structure determination. The three-dimensional portrait of a compact, non-native RNA state reveals a well ordered subset of native tertiary contacts, in contrast to the dynamic but otherwise similar molten globule states of proteins. With its applicability to nearly any solution state, we expect MOHCA to be a powerful tool for illuminating the many functional structures of large RNA molecules and RNA/protein complexes.

    View details for DOI 10.1073/pnas.0709032105

    View details for Web of Science ID 000254263300015

    View details for PubMedID 18322008

    View details for PubMedCentralID PMC2393762

  • Macromolecular modeling with Rosetta ANNUAL REVIEW OF BIOCHEMISTRY Das, R., Baker, D. 2008; 77: 363-382

    Abstract

    Advances over the past few years have begun to enable prediction and design of macromolecular structures at near-atomic accuracy. Progress has stemmed from the development of reasonably accurate and efficiently computed all-atom potential functions as well as effective conformational sampling strategies appropriate for searching a highly rugged energy landscape, both driven by feedback from structure prediction and design tests. A unified energetic and kinematic framework in the Rosetta program allows a wide range of molecular modeling problems, from fibril structure prediction to RNA folding to the design of new protein interfaces, to be readily investigated and highlights areas for improvement. The methodology enables the creation of novel molecules with useful functions and holds promise for accelerating experimental structural inference. Emerging connections to crystallographic phasing, NMR modeling, and lower-resolution approaches are described and critically assessed.

    View details for DOI 10.1146/annurev.biochem.77.062906.171838

    View details for Web of Science ID 000257596800016

    View details for PubMedID 18410248

  • High-resolution structure prediction and the crystallographic phase problem NATURE Qian, B., Raman, S., Das, R., Bradley, P., McCoy, A. J., Read, R. J., Baker, D. 2007; 450 (7167): 259-U7

    Abstract

    The energy-based refinement of low-resolution protein structure models to atomic-level accuracy is a major challenge for computational structural biology. Here we describe a new approach to refining protein structure models that focuses sampling in regions most likely to contain errors while allowing the whole structure to relax in a physically realistic all-atom force field. In applications to models produced using nuclear magnetic resonance data and to comparative models based on distant structural homologues, the method can significantly improve the accuracy of the structures in terms of both the backbone conformations and the placement of core side chains. Furthermore, the resulting models satisfy a particularly stringent test: they provide significantly better solutions to the X-ray crystallographic phase problem in molecular replacement trials. Finally, we show that all-atom refinement can produce de novo protein structure predictions that reach the high accuracy required for molecular replacement without any experimental phase information and in the absence of templates suitable for molecular replacement from the Protein Data Bank. These results suggest that the combination of high-resolution structure prediction with state-of-the-art phasing tools may be unexpectedly powerful in phasing crystallographic data for which molecular replacement is hindered by the absence of sufficiently accurate previous models.

    View details for DOI 10.1038/nature06249

    View details for Web of Science ID 000250746200052

    View details for PubMedID 17934447

    View details for PubMedCentralID PMC2504711

  • Automated de novo prediction of native-like RNA tertiary structures PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Das, R., Baker, D. 2007; 104 (37): 14664-14669

    Abstract

    RNA tertiary structure prediction has been based almost entirely on base-pairing constraints derived from phylogenetic covariation analysis. We describe here a complementary approach, inspired by the Rosetta low-resolution protein structure prediction method, that seeks the lowest energy tertiary structure for a given RNA sequence without using evolutionary information. In a benchmark test of 20 RNA sequences with known structure and lengths of approximately 30 nt, the new method reproduces better than 90% of Watson-Crick base pairs, comparable with the accuracy of secondary structure prediction methods. In more than half the cases, at least one of the top five models agrees with the native structure to better than 4 Å rmsd over the backbone. Most importantly, the method recapitulates more than one-third of non-Watson-Crick base pairs seen in the native structures. Tandem stacks of "sheared" base pairs, base triplets, and pseudoknots are among the noncanonical features reproduced in the models. In the cases in which none of the top five models were native-like, higher energy conformations similar to the native structures are still sampled frequently but not assigned low energies. These results suggest that modest improvements in the energy function, together with the incorporation of information from phylogenetic covariance, may allow confident and accurate structure prediction for larger and more complex RNA chains.

    View details for DOI 10.1073/pnas.0703836104

    View details for Web of Science ID 000249513000023

    View details for PubMedID 17726102

    View details for PubMedCentralID PMC1955458

  • Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home. Proteins Das, R., Qian, B., Raman, S., Vernon, R., Thompson, J., Bradley, P., Khare, S., Tyka, M. D., Bhat, D., Chivian, D., Kim, D. E., Sheffler, W. H., Malmström, L., Wollacott, A. M., Wang, C., Andre, I., Baker, D. 2007; 69: 118-128

    Abstract

    We describe predictions made using the Rosetta structure prediction methodology for both template-based modeling and free modeling categories in the Seventh Critical Assessment of Techniques for Protein Structure Prediction. For the first time, aggressive sampling and all-atom refinement could be carried out for the majority of targets, an advance enabled by the Rosetta@home distributed computing network. Template-based modeling predictions using an iterative refinement algorithm improved over the best existing templates for the majority of proteins with less than 200 residues. Free modeling methods gave near-atomic accuracy predictions for several targets under 100 residues from all secondary structure classes. These results indicate that refinement with an all-atom energy function, although computationally expensive, is a powerful method for obtaining accurate structure predictions.

    View details for PubMedID 17894356

  • Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home 7th Meeting on Critical Assessment of Techniques for Protein Structure Prediction Das, R., Qian, B., Raman, S., Vernon, R., Thompson, J., Bradley, P., Khare, S., Tyka, M. D., Bhat, D., Chivian, D., Kim, D. E., Sheffler, W. H., Malmstrom, L., Wollacott, A. M., Wang, C., Andre, I., Baker, D. WILEY-BLACKWELL. 2007: 118–128

    Abstract

    We describe predictions made using the Rosetta structure prediction methodology for both template-based modeling and free modeling categories in the Seventh Critical Assessment of Techniques for Protein Structure Prediction. For the first time, aggressive sampling and all-atom refinement could be carried out for the majority of targets, an advance enabled by the Rosetta@home distributed computing network. Template-based modeling predictions using an iterative refinement algorithm improved over the best existing templates for the majority of proteins with less than 200 residues. Free modeling methods gave near-atomic accuracy predictions for several targets under 100 residues from all secondary structure classes. These results indicate that refinement with an all-atom energy function, although computationally expensive, is a powerful method for obtaining accurate structure predictions.

    View details for DOI 10.1002/prot.21636

    View details for Web of Science ID 000251502400013

  • Determining the Mg2+ stoichiometry for folding an RNA metal ion core JOURNAL OF THE AMERICAN CHEMICAL SOCIETY Das, R., Travers, K. J., Bai, Y., Herschlag, D. 2005; 127 (23): 8272-8273

    Abstract

    The folding and catalytic function of RNA molecules depend on their interactions with divalent metal ions, such as magnesium. As with every molecular process, the most basic knowledge required for understanding the close relationship of an RNA with its metal ions is the stoichiometry of the interaction. Unfortunately, inventories of the numbers of divalent ions associated with unfolded and folded RNA states have been unattainable. A common approach has been to interpret Hill coefficients fit to folding equilibria as the number of metal ions bound upon folding. However, this approach is vitiated by the presence of diffusely associated divalent ions in a dynamic ion atmosphere and by the likelihood of multiple transitions along a folding pathway. We demonstrate that the use of molar concentrations of background monovalent salt can alleviate these complications. These simplifying solution conditions allow a precise determination of the stoichiometry of the magnesium ions involved in folding the metal ion core of the P4-P6 domain of the Tetrahymena group I ribozyme. Hill analysis of hydroxyl radical footprinting data suggests that the P4-P6 RNA core folds cooperatively upon the association of two metal ions. This unexpectedly small stoichiometry is strongly supported by counting magnesium ions associated with the P4-P6 RNA via fluorescence titration and atomic emission spectroscopy. By pinpointing the metal ion stoichiometry, these measurements provide a critical but previously missing step in the thermodynamic dissection of the coupling between metal ion binding and RNA folding.

    View details for DOI 10.1021/ja051422h

    View details for Web of Science ID 000229751100020

    View details for PubMedID 15941246

    View details for PubMedCentralID PMC2538950

  • SAFA: Semi-automated footprinting analysis software for high-throughput quantification of nucleic acid footprinting experiments RNA-A PUBLICATION OF THE RNA SOCIETY Das, R., Laederach, A., Pearlman, S. M., Herschlag, D., Altman, R. B. 2005; 11 (3): 344-354

    Abstract

    Footprinting is a powerful and widely used tool for characterizing the structure, thermodynamics, and kinetics of nucleic acid folding and ligand binding reactions. However, quantitative analysis of the gel images produced by footprinting experiments is tedious and time-consuming, due to the absence of informatics tools specifically designed for footprinting analysis. We have developed SAFA, a semi-automated footprinting analysis software package that achieves accurate gel quantification while reducing the time to analyze a gel from several hours to 15 min or less. The increase in analysis speed is achieved through a graphical user interface that implements a novel methodology for lane and band assignment, called "gel rectification," and an optimized band deconvolution algorithm. The SAFA software yields results that are consistent with published methodologies and reduces the investigator-dependent variability compared to less automated methods. These software developments simplify the analysis procedure for a footprinting gel and can therefore facilitate the use of quantitative footprinting techniques in nucleic acid laboratories that otherwise might not have considered their use. Further, the increased throughput provided by SAFA may allow a more comprehensive understanding of molecular interactions. The software and documentation are freely available for download at http://safa.stanford.edu.

    View details for DOI 10.1261/rna.7214405

    View details for Web of Science ID 000227190000011

    View details for PubMedID 15701734

    View details for PubMedCentralID PMC1262685