Blind tests of RNA-protein binding affinity prediction.
Proceedings of the National Academy of Sciences of the United States of America
Interactions between RNA and proteins are pervasive in biology, driving fundamental processes such as protein translation and participating in the regulation of gene expression. Modeling the energies of RNA-protein interactions is therefore critical for understanding and repurposing living systems but has been hindered by complexities unique to RNA-protein binding. Here, we bring together several advances to complete a calculation framework for RNA-protein binding affinities, including a unified free energy function for bound complexes, automated Rosetta modeling of mutations, and use of secondary structure-based energetic calculations to model unbound RNA states. The resulting Rosetta-Vienna RNP-DeltaDeltaG method achieves root-mean-squared errors (RMSEs) of 1.3 kcal/mol on high-throughput MS2 coat protein-RNA measurements and 1.5 kcal/mol on an independent test set involving the signal recognition particle, human U1A, PUM1, and FOX-1. As a stringent test, the method achieves RMSE accuracy of 1.4 kcal/mol in blind predictions of hundreds of human PUM2-RNA relative binding affinities. Overall, these RMSE accuracies are significantly better than those attained by prior structure-based approaches applied to the same systems. Importantly, Rosetta-Vienna RNP-DeltaDeltaG establishes a framework for further improvements in modeling RNA-protein binding that can be tested by prospective high-throughput measurements on new systems.
View details for DOI 10.1073/pnas.1819047116
View details for PubMedID 30962376
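The Rosetta-Vienna RNP-ΔΔG abstract above reports accuracy as RMSEs in kcal/mol on relative binding affinities. The minimal Python sketch below shows the two quantities involved. The first is a relative binding free energy computed from dissociation constants, ΔΔG = RT ln(Kd,mutant / Kd,reference), with the assumed sign convention that positive values mean weaker binding. The second is the RMSE between paired predictions and measurements. The function names are hypothetical, and this is not the published method's code.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd(kd_mutant: float, kd_reference: float, temperature_k: float = 298.15) -> float:
    """Relative binding free energy in kcal/mol; positive means weaker binding than the reference."""
    if kd_mutant <= 0 or kd_reference <= 0:
        raise ValueError("dissociation constants must be positive")
    return R_KCAL * temperature_k * math.log(kd_mutant / kd_reference)


def rmse(predicted: list[float], measured: list[float]) -> float:
    """Root-mean-squared error between paired predicted and measured values."""
    if len(predicted) != len(measured) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `ddg_from_kd(10.0, 1.0)` gives about 1.36 kcal/mol at 298.15 K. An RMSE of 1.3 to 1.5 kcal/mol therefore corresponds to typical errors of roughly an order of magnitude in Kd.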
De novo computational RNA modeling into cryo-EM maps of large ribonucleoprotein complexes.
Increasingly, cryo-electron microscopy (cryo-EM) is used to determine the structures of RNA-protein assemblies, but nearly all maps determined with this method have biologically important regions where the local resolution does not permit RNA coordinate tracing. To address these omissions, we present de novo ribonucleoprotein modeling in real space through assembly of fragments together with experimental density in Rosetta (DRRAFTER). We show that DRRAFTER recovers near-native models for a diverse benchmark set of RNA-protein complexes including the spliceosome, mitochondrial ribosome, and CRISPR-Cas9-sgRNA complexes; rigorous blind tests include yeast U1 snRNP and spliceosomal P complex maps. Additionally, to aid in model interpretation, we present a method for reliable in situ estimation of DRRAFTER model accuracy. Finally, we apply DRRAFTER to recently determined maps of telomerase, the HIV-1 reverse transcriptase initiation complex, and the packaged MS2 genome, demonstrating the acceleration of accurate model building in challenging cases.
View details for DOI 10.1038/s41592-018-0172-2
View details for PubMedID 30377372
Sampling Native-like Structures of RNA-Protein Complexes through Rosetta Folding and Docking.
Structure (London, England : 1993)
RNA-protein complexes underlie numerous cellular processes including translation, splicing, and posttranscriptional regulation of gene expression. The structures of these complexes are crucial to their functions but often elude high-resolution structure determination. Computational methods are needed that can integrate low-resolution data for RNA-protein complexes while modeling de novo the large conformational changes of RNA components upon complex formation. To address this challenge, we describe RNP-denovo, a Rosetta method to simultaneously fold-and-dock RNA to a protein surface. On a benchmark set of diverse RNA-protein complexes not solvable with prior strategies, RNP-denovo consistently sampled native-like structures with better than nucleotide resolution. We revisited three past blind modeling challenges involving the spliceosome, telomerase, and a methyltransferase-ribosomal RNA complex in which previous methods gave poor results. When coupled with the same sparse FRET, crosslinking, and functional data used previously, RNP-denovo gave models with significantly improved accuracy. These results open a route to modeling global folds of RNA-protein complexes from low-resolution data.
View details for DOI 10.1016/j.str.2018.10.001
View details for PubMedID 30416038
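The RNP-denovo abstract above judges sampled models by their closeness to native structures. Structural accuracy of this kind is conventionally expressed as a root-mean-square deviation (RMSD) of coordinates after optimal superposition. The sketch below uses NumPy and the Kabsch algorithm. The abstract does not state which atoms or which superposition scheme its benchmark uses, so those choices are assumptions here.

```python
import numpy as np


def superposed_rmsd(model: np.ndarray, reference: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition.

    Kabsch algorithm: remove translation by centering both sets, then find the
    rotation minimizing the summed squared deviation via an SVD.
    """
    if model.shape != reference.shape or model.ndim != 2 or model.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    p = model - model.mean(axis=0)          # centered model coordinates
    q = reference - reference.mean(axis=0)  # centered reference coordinates
    u, _, vt = np.linalg.svd(p.T @ q)       # SVD of the 3x3 covariance matrix
    # Flip the last axis if needed so the result is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = p @ rotation.T                # rotate every model coordinate
    return float(np.sqrt(((aligned - q) ** 2).sum(axis=1).mean()))
```

A rotated and translated copy of a structure should give an RMSD of essentially zero. Any residual value reflects genuine differences in conformation.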
Architecture of an HIV-1 reverse transcriptase initiation complex.
Reverse transcription of the HIV-1 RNA genome into double-stranded DNA is a central step in viral infection [1] and a common target of antiretroviral drugs [2]. The reaction is catalysed by viral reverse transcriptase (RT) [3,4], which is packaged in an infectious virion with two copies of viral genomic RNA [5], each bound to host lysine 3 transfer RNA (tRNALys3), which acts as a primer for initiation of reverse transcription [6,7]. Upon viral entry into cells, initiation is slow and non-processive compared to elongation [8,9]. Despite extensive efforts, the structural basis of RT function during initiation has remained a mystery. Here we use cryo-electron microscopy to determine a three-dimensional structure of an HIV-1 RT initiation complex. In our structure, RT is in an inactive polymerase conformation with open fingers and thumb and with the nucleic acid primer-template complex shifted away from the active site. The primer binding site (PBS) helix formed between tRNALys3 and HIV-1 RNA lies in the cleft of RT and is extended by additional pairing interactions. The 5' end of the tRNA refolds and stacks on the PBS to create a long helical structure, while the remaining viral RNA forms two helical stems positioned above the RT active site, with a linker that connects these helices to the RNase H region of the PBS. Our results illustrate how RNA structure in the initiation complex alters RT conformation to decrease activity, highlighting a potential target for drug action.
View details for DOI 10.1038/s41586-018-0055-9
View details for PubMedID 29695867
The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design.
Journal of chemical theory and computation
Over the past decade, the Rosetta biomolecular modeling suite has informed diverse biological questions and engineering challenges ranging from interpretation of low-resolution structural data to design of nanomaterials, protein therapeutics, and vaccines. Central to Rosetta's success is the energy function: a model parametrized from small-molecule and X-ray crystal structure data used to approximate the energy associated with each biomolecule conformation. This paper describes the mathematical models and physical concepts that underlie the latest Rosetta energy function, called the Rosetta Energy Function 2015 (REF15). Applying these concepts, we explain how to use Rosetta energies to identify and analyze the features of biomolecular models. Finally, we discuss the latest advances in the energy function that extend its capabilities from soluble proteins to also include membrane proteins, peptides containing noncanonical amino acids, small molecules, carbohydrates, nucleic acids, and other macromolecules.
View details for DOI 10.1021/acs.jctc.7b00125
View details for PubMedID 28430426
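The REF15 abstract above describes an energy function that approximates the energy of each biomolecular conformation. As we understand the general form of Rosetta energy functions, the total score is a weighted linear combination of individual terms. In the expression below, E_i is the i-th energy term, evaluated on its relevant degrees of freedom Θ_i and residue identities aa_i, and w_i is that term's weight. This is a schematic summary, not a reproduction of the paper's full list of terms.

```latex
\Delta E_{\mathrm{total}} = \sum_{i} w_i \, E_i\!\left(\Theta_i, aa_i\right)
```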
Single-molecule FRET-Rosetta reveals RNA structural rearrangements during human telomerase catalysis.
2017; 23 (2): 175-188
Maintenance of telomeres by telomerase permits continuous proliferation of rapidly dividing cells, including the majority of human cancers. Despite its direct biomedical significance, the architecture of the human telomerase complex remains unknown. Generating homogeneous telomerase samples has presented a significant barrier to developing improved structural models. Here we pair single-molecule Förster resonance energy transfer (smFRET) measurements with Rosetta modeling to map the conformations of the essential telomerase RNA core domain within the active ribonucleoprotein. FRET-guided modeling places the essential pseudoknot fold distal to the active site on a protein surface comprising the C-terminal element, a domain that shares structural homology with canonical polymerase thumb domains. An independently solved medium-resolution structure of Tetrahymena telomerase provides a blind test of our modeling methodology and sheds light on the structural homology of this domain across diverse organisms. Our smFRET-Rosetta models reveal nanometer-scale rearrangements within the RNA core domain during catalysis. Taken together, our FRET data and pseudoatomic molecular models permit us to propose a possible mechanism for how RNA core domain rearrangement is coupled to template hybrid elongation.
View details for DOI 10.1261/rna.058743.116
View details for Web of Science ID 000392883800007
View details for PubMedID 28096444
View details for PubMedCentralID PMC5238793
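The smFRET-Rosetta abstract above uses FRET measurements to constrain modeled distances. The sketch below uses the ideal Förster relation, E = 1 / (1 + (r / R0)^6), and its inversion, r = R0 (1/E - 1)^(1/6). It assumes a point-dipole model with the dye orientation factor already folded into R0. The Förster radius for the study's dye pair is not given in the abstract, so R0 here is a caller-supplied assumption. Real FRET-guided modeling must also account for dye linkers and orientation, which this sketch ignores.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Ideal Förster transfer efficiency E = 1 / (1 + (r / R0)^6)."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def distance_from_efficiency(efficiency: float, forster_radius_nm: float) -> float:
    """Invert the Förster relation: r = R0 * (1/E - 1)^(1/6)."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return forster_radius_nm * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)
```

At r = R0 the efficiency is exactly 0.5. The sixth-power dependence makes FRET most distance-sensitive near R0 and nearly flat well above or below it.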
RNA-Puzzles Round III: 3D RNA structure prediction of five riboswitches and one ribozyme.
RNA (New York, N.Y.)
RNA-Puzzles is a collective experiment in blind 3D RNA structure prediction. We report here a third round of RNA-Puzzles. This round includes five puzzles (4, 8, 12, 13, and 14), all structures of riboswitch aptamers, and puzzle 7, a ribozyme structure. The riboswitch structures include biological binding sites for small molecules (S-adenosyl methionine, cyclic diadenosine monophosphate, 5-aminoimidazole-4-carboxamide riboside 5'-triphosphate, glutamine) and proteins (YbxF), and one set describes large conformational changes between ligand-free and ligand-bound states. The Varkud satellite ribozyme is the most recently solved structure of a known large ribozyme. All puzzles have established biological functions and require structural understanding to appreciate their molecular mechanisms. Through the use of fast-track experimental data, including multidimensional chemical mapping, and accurate prediction of RNA secondary structure, a large portion of the contacts in 3D were predicted correctly, leading to similar topologies for the top-ranking predictions. Template-based and homology-derived methods predicted structures with particularly high accuracy. However, achieving biological insights from de novo prediction of RNA 3D structures still depends on the size and complexity of the RNA. Blind computational predictions of RNA structures already appear to provide useful structural information in many cases. As in the previous RNA-Puzzles Round II experiment, the prediction of non-Watson-Crick interactions and the observed high atomic clash scores reveal a notable need for algorithmic improvement. All prediction models and assessment results are available at http://ahsoka.u-strasbg.fr/rnapuzzles/.
View details for DOI 10.1261/rna.060368.116
View details for PubMedID 28138060
View details for PubMedCentralID PMC5393176
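The RNA-Puzzles abstract above discusses correctly predicted 3D contacts and non-Watson-Crick interactions, but it does not name its assessment metrics. One metric used in RNA-Puzzles assessments is the interaction network fidelity (INF), the geometric mean of precision (PPV) and sensitivity (STY) over annotated pairwise interactions. The Python sketch below computes INF from two interaction sets. The tuple encoding of an interaction is a hypothetical choice made for illustration.

```python
import math

# Hypothetical encoding: (residue_i, residue_j, interaction_type) with residue_i < residue_j,
# e.g. (3, 18, "cWW") for a cis Watson-Crick/Watson-Crick base pair.
Interaction = tuple[int, int, str]


def interaction_network_fidelity(predicted: set[Interaction], reference: set[Interaction]) -> float:
    """INF = sqrt(PPV * STY) over sets of annotated pairwise interactions."""
    true_pos = len(predicted & reference)
    if true_pos == 0:
        return 0.0  # no shared interactions (also covers empty inputs)
    ppv = true_pos / len(predicted)  # precision: fraction of predicted interactions that are real
    sty = true_pos / len(reference)  # sensitivity: fraction of real interactions recovered
    return math.sqrt(ppv * sty)
```

Scoring interactions by type, rather than just by residue pair, is what lets a metric like this separate Watson-Crick from non-Watson-Crick accuracy.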
Blind tests of RNA nearest-neighbor energy prediction.
Proceedings of the National Academy of Sciences of the United States of America
2016; 113 (30): 8430-8435
The predictive modeling and design of biologically active RNA molecules requires understanding the energetic balance among their basic components. Rapid developments in computer simulation promise increasingly accurate recovery of RNA's nearest-neighbor (NN) free-energy parameters, but these methods have not been tested in predictive trials or on nonstandard nucleotides. Here, we present, to our knowledge, the first such tests through a RECCES-Rosetta (reweighting of energy-function collection with conformational ensemble sampling in Rosetta) framework that rigorously models conformational entropy, predicts previously unmeasured NN parameters, and estimates these values' systematic uncertainties. RECCES-Rosetta recovers the 10 NN parameters for Watson-Crick stacked base pairs and 32 single-nucleotide dangling-end parameters with unprecedented accuracies: rmsd of 0.28 kcal/mol and 0.41 kcal/mol, respectively. For set-aside test sets, RECCES-Rosetta gives rmsd values of 0.32 kcal/mol on eight stacked pairs involving G-U wobble pairs and 0.99 kcal/mol on seven stacked pairs involving nonstandard isocytidine-isoguanosine pairs. To more rigorously assess RECCES-Rosetta, we carried out four blind predictions for stacked pairs involving 2,6-diaminopurine-U pairs, which achieved 0.64 kcal/mol rmsd accuracy when tested by subsequent experiments. Overall, these results establish that computational methods can now blindly predict energetics of basic RNA motifs, including chemically modified variants, with consistently better than 1 kcal/mol accuracy. Systematic tests indicate that resolving the remaining discrepancies will require energy function improvements beyond simply reweighting component terms, and we propose further blind trials to test such efforts.
View details for DOI 10.1073/pnas.1523335113
View details for Web of Science ID 000380346200043
View details for PubMedID 27402765
View details for PubMedCentralID PMC4968729
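The RECCES-Rosetta abstract above refers to "10 NN parameters for Watson-Crick stacked base pairs." The count follows from symmetry. Of the 16 ordered dinucleotide stacks, four are self-complementary, and the remaining twelve coincide in pairs with their reverse complements, giving 4 + 6 = 10. The Python sketch below makes this explicit and shows how stacking parameters are summed along a duplex. The parameter values in the tests are toy numbers, not measured or predicted nearest-neighbor energies. The energy sum also deliberately omits initiation and terminal terms.

```python
from itertools import product

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Watson-Crick partner strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical_stack(dinucleotide: str) -> str:
    """A stack 5'-XY-3'/3'-X'Y'-5' is the same motif read from either strand;
    label it by the alphabetically smaller of the two readings."""
    return min(dinucleotide, reverse_complement(dinucleotide))


def unique_wc_stacks() -> list[str]:
    """All distinct Watson-Crick nearest-neighbor stacks."""
    return sorted({canonical_stack(x + y) for x, y in product("ACGU", repeat=2)})


def stacking_energy(top_strand: str, params: dict[str, float]) -> float:
    """Sum stacking free energies over adjacent pairs of a fully Watson-Crick-paired duplex.

    Simplified: omits the initiation, terminal, and symmetry terms of a full nearest-neighbor model.
    """
    return sum(params[canonical_stack(top_strand[i:i + 2])] for i in range(len(top_strand) - 1))
```

Calling `unique_wc_stacks()` returns `['AA', 'AC', 'AG', 'AU', 'CA', 'CC', 'CG', 'GA', 'GC', 'UA']`, one label for each of the ten Watson-Crick stacks.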
Accelerated molecular dynamics simulations of ligand binding to a muscarinic G-protein-coupled receptor.
Quarterly reviews of biophysics
2015; 48 (4): 479–87
Elucidating the detailed process of ligand binding to a receptor is pharmaceutically important for identifying druggable binding sites. With the ability to provide atomistic detail, computational methods are well poised to study these processes. Here, accelerated molecular dynamics (aMD) is proposed to simulate processes of ligand binding to a G-protein-coupled receptor (GPCR), in this case the M3 muscarinic receptor, which is a target for treating many human diseases, including cancer, diabetes and obesity. Long-timescale aMD simulations were performed to observe the binding of three chemically diverse ligand molecules: antagonist tiotropium (TTP), partial agonist arecoline (ARc) and full agonist acetylcholine (ACh). In comparison with earlier microsecond-timescale conventional MD simulations, aMD greatly accelerated the binding of ACh to the receptor orthosteric ligand-binding site and the binding of TTP to an extracellular vestibule. Further aMD simulations also captured binding of ARc to the receptor orthosteric site. Additionally, all three ligands were observed to bind in the extracellular vestibule during their binding pathways, suggesting that it is a metastable binding site. This study demonstrates the applicability of aMD to protein-ligand binding, especially the drug recognition of GPCRs.
View details for DOI 10.1017/S0033583515000153
View details for PubMedID 26537408
View details for PubMedCentralID PMC5435230
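The aMD abstract above does not state the boost form or its parameters. Assuming the widely used single-boost formula, aMD adds ΔV = (E - V)^2 / (α + E - V) whenever the potential energy V falls below a threshold E, and adds nothing otherwise. This raises energy basins and speeds up barrier crossing. The Python sketch below implements that formula together with the exponential reweighting factor exp(ΔV / kT). Whether the study used this form, a dihedral boost, or a dual boost is not stated in the abstract.

```python
import math


def amd_boost(potential: float, threshold: float, alpha: float) -> float:
    """Boost energy: (E - V)^2 / (alpha + E - V) when V < E, otherwise 0."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if potential >= threshold:
        return 0.0  # above the threshold the landscape is left unmodified
    gap = threshold - potential
    return gap * gap / (alpha + gap)


def boosted_potential(potential: float, threshold: float, alpha: float) -> float:
    """Modified potential V* = V + Delta V on which aMD dynamics are run."""
    return potential + amd_boost(potential, threshold, alpha)


def reweighting_factor(boost: float, kt: float) -> float:
    """Exponential weight exp(Delta V / kT) used to recover canonical averages from aMD frames."""
    return math.exp(boost / kt)
```

Smaller α values flatten basins more aggressively but make the exponential reweighting noisier. That trade-off is the main tuning choice in this scheme.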
The binding mechanism, multiple binding modes, and allosteric regulation of Staphylococcus aureus Sortase A probed by molecular dynamics simulations.
Protein science : a publication of the Protein Society
2012; 21 (12): 1858–71
Sortase enzymes are vitally important for the virulence of Gram-positive bacteria as they play a key role in the attachment of surface proteins to the cell wall. These enzymes recognize a specific sorting sequence in proteins destined to be displayed on the bacterial surface and catalyze the transpeptidation reaction that links each such protein to a cell wall precursor molecule. Because of their role in establishing pathogenicity, and in light of the recent rise of antibiotic-resistant bacterial strains, sortase enzymes are novel drug targets. Here, we present a study of the prototypical sortase protein Staphylococcus aureus Sortase A (SrtA). Both conventional and accelerated molecular dynamics simulations of S. aureus SrtA in its apo state and when bound to an LPATG sorting signal (SS) were performed. Results support a binding mechanism that may be characterized as conformational selection followed by induced fit. Additionally, the SS was found to adopt multiple metastable states, thus resolving discrepancies between binding conformations in previously reported experimental structures. Finally, correlation analysis reveals that the SS actively affects allosteric pathways throughout the protein that connect the first and the second substrate binding sites, which are proposed to be located on opposing faces of the protein. Overall, these calculations shed new light on the role of dynamics in the binding mechanism and function of sortase enzymes.
View details for DOI 10.1002/pro.2168
View details for PubMedID 23023444
View details for PubMedCentralID PMC3575916
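The Sortase A abstract above mentions a correlation analysis of allosteric pathways without specifying the measure. One common choice for MD trajectories is the dynamic cross-correlation matrix, C_ij = ⟨Δr_i · Δr_j⟩ / sqrt(⟨|Δr_i|²⟩⟨|Δr_j|²⟩). The NumPy sketch below computes it under that assumption. It is not necessarily the analysis performed in the study.

```python
import numpy as np


def dynamic_cross_correlation(trajectory: np.ndarray) -> np.ndarray:
    """Normalized cross-correlation of atomic fluctuations.

    trajectory has shape (n_frames, n_atoms, 3) and is assumed to be superposed
    onto a common reference, so overall rotation and translation are removed.
    """
    if trajectory.ndim != 3 or trajectory.shape[2] != 3:
        raise ValueError("expected an array of shape (n_frames, n_atoms, 3)")
    fluct = trajectory - trajectory.mean(axis=0)  # dr_i(t) = r_i(t) - <r_i>
    cov = np.einsum("fid,fjd->ij", fluct, fluct) / trajectory.shape[0]  # <dr_i . dr_j>
    scale = np.sqrt(np.diag(cov))  # RMS fluctuation of each atom
    if np.any(scale == 0):
        raise ValueError("every atom must fluctuate for the correlation to be defined")
    return cov / np.outer(scale, scale)
```

Values near +1 indicate atoms moving together, and values near -1 indicate anticorrelated motion. Such patterns are often traced to propose communication between distant sites.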