Assistant Professor, Biochemistry
Honors & Awards
Career Award at the Scientific Interface, Burroughs-Wellcome Foundation (2008-present)
Ph.D., Stanford University, Physics (2005)
M.Res., University College London, Biocomplexity (2000)
M.Phil., Cambridge University, Physics (Radio Astronomy) (1999)
A.B., s.c.l., Harvard University, Physics (1998)
Current Research and Scholarly Interests
We strive for a predictive understanding of how biopolymer sequences code for biopolymer structures, with an initial focus on RNA.
Our research follows three tracks:
First, we are exploring new ab initio algorithms to predict the structures and energetics of RNAs and proteins at high resolution, with an initial focus on the smallest such puzzles. We test and apply these ideas through community-wide blind trials; by fixing crystallographic models; and by solving structures with sparse chemical mapping and NMR data.
Second, we are developing information-rich biochemical methods to solve the myriad structures of noncoding RNAs that remain unknown. Current efforts focus on applying these experimental methods to basic mysteries in RNA behavior, including the extent of RNA structure inside cells and viruses.
Third, we are integrating high-throughput biochemistry with a 100,000-player on-line game called Eterna. This project is revealing missing rules in RNA folding and design, and is engineering RNA devices for cellular control and computing. As the first instantiation of 'cloud biochemistry', Eterna empowers expert and citizen scientists to collaboratively solve fundamental biochemical problems on-line with rapid experimental certification.
Overall, our work aims to bring us a future in which coding living systems with RNA is as agile and pervasive as coding conventional computers with programming languages.
- Biological Macromolecules
BIOC 241, BIOPHYS 241, SBIO 241 (Spr)
- Computational Macromolecule Structure Modeling
BIOS 208 (Spr)
- Development of Thesis Research
BIOC 350 (Aut)
Independent Studies (11)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Directed Reading in Biochemistry
BIOC 299 (Aut, Win, Spr, Sum)
- Directed Reading in Biophysics
BIOPHYS 399 (Aut, Win, Spr, Sum)
- Graduate Research
BIOPHYS 300 (Aut, Win, Spr, Sum)
- Graduate Research and Special Advanced Work
BIOC 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOC 370 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Out-of-Department Advanced Research Laboratory in Experimental Biology
BIO 199X (Aut, Win, Spr, Sum)
- The Teaching of Biochemistry
BIOC 221 (Aut, Win, Sum)
- Undergraduate Research
BIOC 199 (Aut, Win, Spr, Sum)
The Mutate-and-Map Protocol for Inferring Base Pairs in Structured RNA.
Methods in molecular biology (Clifton, N.J.)
2014; 1086: 53-77
Chemical mapping is a widespread technique for structural analysis of nucleic acids in which a molecule's reactivity to different probes is quantified at single nucleotide resolution and used to constrain structural modeling. This experimental framework has been extensively revisited in the past decade with new strategies for high-throughput readouts, chemical modification, and rapid data analysis. Recently, we have coupled the technique to high-throughput mutagenesis. Point mutations of a base paired nucleotide can lead to exposure of not only that nucleotide but also its interaction partner. Systematically carrying out the mutation and mapping for the entire system gives an experimental approximation of the molecule's "contact map." Here, we give our in-house protocol for this "mutate-and-map" (M2) strategy, based on 96-well capillary electrophoresis, and we provide practical tips on interpreting the data to infer nucleic acid structure.
View details for DOI 10.1007/978-1-62703-667-2_4
View details for PubMedID 24136598
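The mutate-and-map idea summarized in the abstract above (a point mutation exposes the mutated nucleotide's pairing partner) can be illustrated with a small sketch. This is not the published M2 protocol or its analysis software: the function names, the Z-score threshold, and the toy reactivity values are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def z_scores(reactivity):
    """reactivity[m][i] is the chemical reactivity at position i in mutant m.

    Returns Z-scores computed per position (column) across all mutants.
    """
    n_pos = len(reactivity[0])
    z = [[0.0] * n_pos for _ in reactivity]
    for i in range(n_pos):
        column = [row[i] for row in reactivity]
        mu, sigma = mean(column), pstdev(column)
        for m, value in enumerate(column):
            z[m][i] = (value - mu) / sigma if sigma > 0 else 0.0
    return z

def candidate_pairs(reactivity, mutated_positions, threshold=2.0):
    """List (mutated position, exposed position) pairs with a strong signal.

    The threshold is a hypothetical cutoff, not a published value.
    """
    z = z_scores(reactivity)
    pairs = []
    for m, mut_pos in enumerate(mutated_positions):
        for i, score in enumerate(z[m]):
            if i != mut_pos and score >= threshold:
                pairs.append((mut_pos, i))
    return pairs

# Toy data: 6 positions, one mutant per position, baseline reactivity 0.1.
# Positions 1 and 4 are assumed to be base paired, so mutating one
# exposes the other.
toy = [[0.1] * 6 for _ in range(6)]
toy[1][4] = 1.0
toy[4][1] = 1.0
pairs = candidate_pairs(toy, mutated_positions=list(range(6)))
print(pairs)
```

In this toy case the two reciprocal signals, (1, 4) and (4, 1), play the role of the "contact map" entries described in the abstract; real data would need normalization and error handling that this sketch omits.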
Massively Parallel RNA Chemical Mapping with a Reduced Bias MAP-Seq Protocol.
Methods in molecular biology (Clifton, N.J.)
2014; 1086: 95-117
Chemical mapping methods probe RNA structure by revealing and leveraging correlations of a nucleotide's structural accessibility or flexibility with its reactivity to various chemical probes. Pioneering work by Lucks and colleagues has expanded this method to probe hundreds of molecules at once on an Illumina sequencing platform, obviating the use of slab gels or capillary electrophoresis on one molecule at a time. Here, we describe optimizations to this method from our lab, resulting in the MAP-seq protocol (Multiplexed Accessibility Probing read out through sequencing), version 1.0. The protocol permits the quantitative probing of thousands of RNAs at once, by several chemical modification reagents, on the time scale of a day using a tabletop Illumina machine. This method and a software package MAPseeker ( http://simtk.org/home/map_seeker ) address several potential sources of bias, by eliminating PCR steps, improving ligation efficiencies of ssDNA adapters, and avoiding problematic heuristics in prior algorithms. We hope that the step-by-step description of MAP-seq 1.0 will help other RNA mapping laboratories to transition from electrophoretic to next-generation sequencing methods and to further reduce the turnaround time and any remaining biases of the protocol.
View details for DOI 10.1007/978-1-62703-667-2_6
View details for PubMedID 24136600
Atomic-Accuracy Prediction of Protein Loop Structures through an RNA-Inspired Ansatz
PLOS ONE
2013; 8 (10)
Consistently predicting biopolymer structure at atomic resolution from sequence alone remains a difficult problem, even for small sub-segments of large proteins. Such loop prediction challenges, which arise frequently in comparative modeling and protein design, can become intractable as loop lengths exceed 10 residues and if surrounding side-chain conformations are erased. Current approaches, such as the protein local optimization protocol or kinematic inversion closure (KIC) Monte Carlo, involve stages that coarse-grain proteins, simplifying modeling but precluding a systematic search of all-atom configurations. This article introduces an alternative modeling strategy based on a 'stepwise ansatz', recently developed for RNA modeling, which posits that any realistic all-atom molecular conformation can be built up by residue-by-residue stepwise enumeration. When harnessed to a dynamic-programming-like recursion in the Rosetta framework, the resulting stepwise assembly (SWA) protocol enables enumerative sampling of a 12 residue loop at a significant but achievable cost of thousands of CPU-hours. In a previously established benchmark, SWA recovers crystallographic conformations with sub-Angstrom accuracy for 19 of 20 loops, compared to 14 of 20 by KIC modeling with a comparable expenditure of computational power. Furthermore, SWA gives high accuracy results on an additional set of 15 loops highlighted in the biological literature for their irregularity or unusual length. Successes include cis-Pro touch turns, loops that pass through tunnels of other side-chains, and loops of lengths up to 24 residues. Remaining problem cases are traced to inaccuracies in the Rosetta all-atom energy function. In five additional blind tests, SWA achieves sub-Angstrom accuracy models, including the first such success in a protein/RNA binding interface, the YbxF/kink-turn interaction in the fourth 'RNA-puzzle' competition. 
These results establish all-atom enumeration as an unusually systematic approach to ab initio protein structure modeling that can leverage high performance computing and physically realistic energy functions to more consistently achieve atomic accuracy.
View details for DOI 10.1371/journal.pone.0074830
View details for Web of Science ID 000326032600003
View details for PubMedID 24204571
Adding Diverse Noncanonical Backbones to Rosetta: Enabling Peptidomimetic Design
PLOS ONE
2013; 8 (7)
Peptidomimetics are classes of molecules that mimic structural and functional attributes of polypeptides. Peptidomimetic oligomers can frequently be synthesized using efficient solid phase synthesis procedures similar to peptide synthesis. Conformationally ordered peptidomimetic oligomers are finding broad applications for molecular recognition and for inhibiting protein-protein interactions. One critical limitation is the limited set of design tools for identifying oligomer sequences that can adopt desired conformations. Here, we present expansions to the ROSETTA platform that enable structure prediction and design of five non-peptidic oligomer scaffolds (noncanonical backbones): oligooxopiperazines, oligo-peptoids, β-peptides, hydrogen bond surrogate helices and oligosaccharides. This work is complementary to prior additions to model noncanonical protein side chains in ROSETTA. The main purpose of our manuscript is to give a detailed description to current and future developers of how each of these noncanonical backbones was implemented. Furthermore, we provide a general outline for implementation of new backbone types not discussed here. To illustrate the utility of this approach, we describe the first tests of the ROSETTA molecular mechanics energy function in the context of oligooxopiperazines, using quantum mechanical calculations as comparison points, scanning through backbone and side chain torsion angles for a model peptidomimetic. Finally, as an example of a novel design application, we describe the automated design of an oligooxopiperazine that inhibits the p53-MDM2 protein-protein interaction. For the general biological and bioengineering community, several noncanonical backbones have been incorporated into web applications that allow users to freely and rapidly test the presented protocols (http://rosie.rosettacommons.org).
This work helps address the peptidomimetic community's need for an automated and expandable modeling tool for noncanonical backbones.
View details for DOI 10.1371/journal.pone.0067051
View details for Web of Science ID 000323110600005
View details for PubMedID 23869206
HiTRACE-Web: an online tool for robust analysis of high-throughput capillary electrophoresis.
Nucleic acids research
2013; 41 (Web Server issue): W492-8
To facilitate the analysis of large-scale high-throughput capillary electrophoresis data, we previously proposed a suite of efficient analysis software named HiTRACE (High Throughput Robust Analysis of Capillary Electrophoresis). HiTRACE has been used extensively for quantitating data from RNA and DNA structure mapping experiments, including mutate-and-map contact inference, chromatin footprinting, the Eterna RNA design project and other high-throughput applications. However, HiTRACE is based on a suite of command-line MATLAB scripts that requires nontrivial efforts to learn, use and extend. Here, we present HiTRACE-Web, an online version of HiTRACE that includes standard features previously available in the command-line version and additional features such as automated band annotation and flexible adjustment of annotations, all via a user-friendly environment. By making use of parallelization, the on-line workflow is also faster than software implementations available to most users on their local computers. Free access: http://hitrace.org.
View details for DOI 10.1093/nar/gkt501
View details for PubMedID 23761448
Serverification of Molecular Modeling Applications: The Rosetta Online Server That Includes Everyone (ROSIE)
PLOS ONE
2013; 8 (5)
The Rosetta molecular modeling software package provides experimentally tested and rapidly evolving tools for the 3D structure prediction and high-resolution design of proteins, nucleic acids, and a growing number of non-natural polymers. Despite its free availability to academic users and improving documentation, use of Rosetta has largely remained confined to developers and their immediate collaborators due to the code's difficulty of use, the requirement for large computational resources, and the unavailability of servers for most of the Rosetta applications. Here, we present a unified web framework for Rosetta applications called ROSIE (Rosetta Online Server that Includes Everyone). ROSIE provides (a) a common user interface for Rosetta protocols, (b) a stable application programming interface for developers to add additional protocols, (c) a flexible back-end to allow leveraging of computer cluster resources shared by RosettaCommons member institutions, and (d) centralized administration by the RosettaCommons to ensure continuous maintenance. This paper describes the ROSIE server infrastructure, a step-by-step 'serverification' protocol for use by Rosetta developers, and the deployment of the first nine ROSIE applications by six separate developer teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance, Supercharge, Beta peptide design, NCBB design, and VIP redesign. As illustrated by the number and diversity of these applications, ROSIE offers a general and speedy paradigm for serverification of Rosetta applications that incurs negligible cost to developers and lowers barriers to Rosetta use for the broader biological community. ROSIE is available at http://rosie.rosettacommons.org.
View details for DOI 10.1371/journal.pone.0063906
View details for Web of Science ID 000320362700078
View details for PubMedID 23717507
- Remodeling a beta-peptide bundle CHEMICAL SCIENCE 2013; 4 (1): 319-324
Correcting pervasive errors in RNA crystallography through enumerative structure prediction
NATURE METHODS
2013; 10 (1): 74-U105
Three-dimensional RNA models fitted into crystallographic density maps exhibit pervasive conformational ambiguities, geometric errors and steric clashes. To address these problems, we present enumerative real-space refinement assisted by electron density under Rosetta (ERRASER), coupled to Python-based hierarchical environment for integrated 'xtallography' (PHENIX) diffraction-based refinement. On 24 data sets, ERRASER automatically corrects the majority of MolProbity-assessed errors, improves the average R(free) factor, resolves functionally important discrepancies in noncanonical structure and refines low-resolution models to better match higher-resolution models.
View details for DOI 10.1038/NMETH.2262
View details for Web of Science ID 000312810100041
View details for PubMedID 23202432
Advances, Interactions, and Future Developments in the CNS, Phenix, and Rosetta Structural Biology Software Systems
ANNUAL REVIEW OF BIOPHYSICS, VOL 42
2013; 42: 265-287
Advances in our understanding of macromolecular structure come from experimental methods, such as X-ray crystallography, and also computational analysis of the growing number of atomic models obtained from such experiments. The latter analyses have made it possible to develop powerful tools for structure prediction and optimization in the absence of experimental data. In recent years, a synergy between these computational methods for crystallographic structure determination and structure prediction and optimization has begun to be exploited. We review some of the advances in the algorithms used for crystallographic structure determination in the Phenix and Crystallography & NMR System software packages and describe how methods from ab initio structure prediction and refinement in Rosetta have been applied to challenging crystallographic problems. The prospects for future improvement of these methods are discussed.
View details for DOI 10.1146/annurev-biophys-083012-130253
View details for Web of Science ID 000321695700013
View details for PubMedID 23451892
An RNA Mapping DataBase for curating RNA structure mapping experiments
BIOINFORMATICS
2012; 28 (22): 3006-3008
We have established an RNA mapping database (RMDB) to enable structural, thermodynamic and kinetic comparisons across single-nucleotide-resolution RNA structure mapping experiments. The volume of structure mapping data has greatly increased since the development of high-throughput sequencing techniques, accelerated software pipelines and large-scale mutagenesis. For scientists wishing to infer relationships between RNA sequence/structure and these mapping data, there is a need for a database that is curated, tagged with error estimates and interfaced with tools for sharing, visualization, search and meta-analysis. Through its on-line front-end, the RMDB allows users to explore single-nucleotide-resolution mapping data in heat-map, bar-graph and colored secondary structure graphics; to leverage these data to generate secondary structure hypotheses; and to download the data in standardized and computer-friendly files, including the RDAT and community-consensus SNRNASM formats. At the time of writing, the database houses 53 entries, describing more than 2848 experiments of 1098 RNA constructs in several solution conditions and is growing rapidly. Freely available on the web at http://email@example.com. Supplementary data are available at Bioinformatics Online.
View details for DOI 10.1093/bioinformatics/bts554
View details for Web of Science ID 000311303500028
View details for PubMedID 22976082
Quantitative Dimethyl Sulfate Mapping for Automated RNA Secondary Structure Inference
BIOCHEMISTRY
2012; 51 (36): 7037-7039
For decades, dimethyl sulfate (DMS) mapping has informed manual modeling of RNA structure in vitro and in vivo. Here, we incorporate DMS data into automated secondary structure inference using an energy minimization framework developed for 2'-OH acylation (SHAPE) mapping. On six noncoding RNAs with crystallographic models, DMS-guided modeling achieves overall false negative and false discovery rates of 9.5% and 11.6%, respectively, comparable to or better than those of SHAPE-guided modeling, and bootstrapping provides straightforward confidence estimates. Integrating DMS-SHAPE data and including 1-cyclohexyl(2-morpholinoethyl) carbodiimide metho-p-toluene sulfonate (CMCT) reactivities provide small additional improvements. These results establish DMS mapping, an already routine technique, as a quantitative tool for unbiased RNA secondary structure modeling.
View details for DOI 10.1021/bi3008802
View details for Web of Science ID 000308833500001
View details for PubMedID 22913637
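The false negative and false discovery rates quoted in the abstract above can be illustrated with a short sketch. It assumes helices are compared by exact label match, which is a simplification; the function name and the example helix labels are hypothetical, and the papers' own matching criteria may differ.

```python
def helix_error_rates(reference, predicted):
    """Return (false negative rate, false discovery rate) for helix sets.

    FNR: fraction of reference helices missing from the prediction.
    FDR: fraction of predicted helices absent from the reference.
    Helices are matched by exact label here (an assumption).
    """
    reference, predicted = set(reference), set(predicted)
    missed = len(reference - predicted)
    spurious = len(predicted - reference)
    fnr = missed / len(reference) if reference else 0.0
    fdr = spurious / len(predicted) if predicted else 0.0
    return fnr, fdr

# Hypothetical example: four reference helices, one missed and one spurious.
fnr, fdr = helix_error_rates(
    reference={"P1", "P2", "P3", "P4"},
    predicted={"P1", "P2", "P3", "alt"},
)
print(f"FNR = {fnr:.0%}, FDR = {fdr:.0%}")
```

Reporting both rates matters because a model can lower one at the expense of the other: predicting fewer helices tends to reduce false discoveries while increasing false negatives.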
Squaring theory with practice in RNA design
CURRENT OPINION IN STRUCTURAL BIOLOGY
2012; 22 (4): 457-466
Ribonucleic acid (RNA) design offers unique opportunities for engineering genetic networks and nanostructures that self-assemble within living cells. Recent years have seen the creation of increasingly complex RNA devices, including proof-of-concept applications for in vivo three-dimensional scaffolding, imaging, computing, and control of biological behaviors. Expert intuition and simple design rules--the stability of double helices, the modularity of noncanonical RNA motifs, and geometric closure--have enabled these successful applications. Going beyond heuristics, emerging algorithms may enable automated design of RNAs with nucleotide-level accuracy but, as illustrated on a recent RNA square design, are not yet fully predictive. Looking ahead, technological advances in RNA synthesis and interrogation are poised to radically accelerate the discovery and stringent testing of design methods.
View details for DOI 10.1016/j.sbi.2012.06.003
View details for Web of Science ID 000308516800009
View details for PubMedID 22832174
Ultraviolet Shadowing of RNA Can Cause Significant Chemical Damage in Seconds
SCIENTIFIC REPORTS
Chemical purity of RNA samples is important for high-precision studies of RNA folding and catalytic behavior, but photodamage accrued during ultraviolet (UV) shadowing steps of sample preparation can reduce this purity. Here, we report the quantitation of UV-induced damage by using reverse transcription and single-nucleotide-resolution capillary electrophoresis. We found photolesions in a dozen natural and artificial RNAs; across multiple sequence contexts, dominantly at but not limited to pyrimidine doublets; and from multiple lamps recommended for UV shadowing. Irradiation time-courses revealed detectable damage within a few seconds of exposure for 254 nm lamps held at a distance of 5 to 10 cm from 0.5-mm thickness gels. Under these conditions, 200-nucleotide RNAs subjected to 20 seconds of UV shadowing incurred damage to 16-27% of molecules; and, due to a 'skin effect', the molecule-by-molecule distribution of lesions gave 4-fold higher variance than a Poisson distribution. Thicker gels, longer wavelength lamps, and shorter exposure times reduced but did not eliminate damage. These results suggest that RNA biophysical studies should report precautions taken to avoid artifactual heterogeneity from UV shadowing.
View details for DOI 10.1038/srep00517
View details for Web of Science ID 000306707600001
View details for PubMedID 22816040
Metal-ion rescue revisited: Biochemical detection of site-bound metal ions important for RNA folding
RNA-A PUBLICATION OF THE RNA SOCIETY
2012; 18 (6): 1123-1141
Within the three-dimensional architectures of RNA molecules, divalent metal ions populate specific locations, shedding their water molecules to form chelates. These interactions help the RNA adopt and maintain specific conformations and frequently make essential contributions to function. Defining the locations of these site-bound metal ions remains challenging despite the growing database of RNA structures. Metal-ion rescue experiments have provided a powerful approach to identify and distinguish catalytic metal ions within RNA active sites, but the ability of such experiments to identify metal ions that contribute to tertiary structure acquisition and structural stability is less developed and has been challenged. Herein, we use the well-defined P4-P6 RNA domain of the Tetrahymena group I intron to reevaluate prior evidence against the discriminatory power of metal-ion rescue experiments and to advance thermodynamic descriptions necessary for interpreting these experiments. The approach successfully identifies ligands within the RNA that occupy the inner coordination sphere of divalent metal ions and distinguishes them from ligands that occupy the outer coordination sphere. Our results underscore the importance of obtaining complete folding isotherms and establishing and evaluating thermodynamic models in order to draw conclusions from metal-ion rescue experiments. These results establish metal-ion rescue as a rigorous tool for identifying and dissecting energetically important metal-ion interactions in RNAs that are noncatalytic but critical for RNA tertiary structure.
View details for DOI 10.1261/rna.028738.111
View details for Web of Science ID 000304423000003
View details for PubMedID 22539523
RNA-Puzzles: A CASP-like evaluation of RNA three-dimensional structure prediction
RNA-A PUBLICATION OF THE RNA SOCIETY
2012; 18 (4): 610-625
We report the results of a first, collective, blind experiment in RNA three-dimensional (3D) structure prediction, encompassing three prediction puzzles. The goals are to assess the leading edge of RNA structure prediction techniques; compare existing methods and tools; and evaluate their relative strengths, weaknesses, and limitations in terms of sequence length and structural complexity. The results should give potential users insight into the suitability of available methods for different applications and facilitate efforts in the RNA structure prediction community in ongoing efforts to improve prediction tools. We also report the creation of an automated evaluation pipeline to facilitate the analysis of future RNA structure prediction exercises.
View details for DOI 10.1261/rna.031054.111
View details for Web of Science ID 000301954600002
View details for PubMedID 22361291
Automated RNA Structure Prediction Uncovers a Kink-Turn Linker in Double Glycine Riboswitches
JOURNAL OF THE AMERICAN CHEMICAL SOCIETY
2012; 134 (3): 1404-1407
The tertiary structures of functional RNA molecules remain difficult to decipher. A new generation of automated RNA structure prediction methods may help address these challenges but have not yet been experimentally validated. Here we apply four prediction tools to a class of double glycine riboswitches that can bind two ligands cooperatively. A novel method (BPPalign), RMdetect, JAR3D, and Rosetta 3D modeling give consistent predictions for a new stem P0 and a kink-turn motif. These elements structure the linker between the RNAs' double aptamers. Chemical mapping on the Fusobacterium nucleatum riboswitch with N-methylisatoic anhydride, dimethyl sulfate and 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate probing, mutate-and-map studies, and mutation/rescue experiments all provide strong evidence for the structured linker. Under solution conditions that permit rigorous thermodynamic analysis, disrupting this helix-junction-helix structure gives 120- and 6-30-fold poorer dissociation constants for the RNA's two glycine-binding transitions, corresponding to an overall energetic impact of 4.3 ± 0.5 kcal/mol. Prior biochemical and crystallography studies did not include this critical element due to over-truncation of the RNA. We speculate that several further undiscovered elements are likely to exist in the flanking regions of this and other functional RNAs, and automated prediction tools can play a useful role in their detection and dissection.
View details for DOI 10.1021/ja2093508
View details for Web of Science ID 000301084400005
View details for PubMedID 22192063
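The fold changes in dissociation constants quoted in the abstract above can be related to free energies through ΔΔG = RT ln(fold change). The sketch below is a back-of-the-envelope check, not the authors' calculation: the temperature of 298.15 K is an assumption (the abstract does not state it), as is the reading that the overall impact is the sum over the two glycine-binding transitions.

```python
import math

R_KCAL_PER_MOL_K = 1.987204e-3  # gas constant in kcal/(mol K)

def ddg_from_fold_change(fold, temperature_k=298.15):
    """Free energy change (kcal/mol) for a given fold change in Kd."""
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold)

# First transition: 120-fold; second transition: 6- to 30-fold.
first = ddg_from_fold_change(120)
second_low = ddg_from_fold_change(6)
second_high = ddg_from_fold_change(30)

total_low = first + second_low
total_high = first + second_high
print(f"Summed impact: {total_low:.1f} to {total_high:.1f} kcal/mol")
```

Under these assumptions the summed range brackets the 4.3 ± 0.5 kcal/mol reported in the abstract.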
An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2011; 108 (51): 20573-20578
Atomic-accuracy structure prediction of macromolecules should be achievable by optimizing a physically realistic energy function but is presently precluded by incomplete sampling of a biopolymer's many degrees of freedom. We present herein a working hypothesis, called the "stepwise ansatz," for recursively constructing well-packed atomic-detail models in small steps, enumerating several million conformations for each monomer, and covering all build-up paths. By making use of high-performance computing and the Rosetta framework, we provide first tests of this hypothesis on a benchmark of 15 RNA loop-modeling problems drawn from riboswitches, ribozymes, and the ribosome, including 10 cases that are not solvable by current knowledge-based modeling approaches. For each loop problem, this deterministic stepwise assembly method either reaches atomic accuracy or exposes flaws in Rosetta's all-atom energy function, indicating the resolution of the conformational sampling bottleneck. As a further rigorous test, we have carried out a blind all-atom prediction for a noncanonical RNA motif, the C7.2 tetraloop/receptor, and validated this model through nucleotide-resolution chemical mapping experiments. Stepwise assembly is an enumerative, ab initio build-up method that systematically outperforms existing Monte Carlo and knowledge-based methods for 3D structure prediction.
View details for DOI 10.1073/pnas.1106516108
View details for Web of Science ID 000298289400065
View details for PubMedID 22143768
A two-dimensional mutate-and-map strategy for non-coding RNA structure
NATURE CHEMISTRY
2011; 3 (12): 954-962
Non-coding RNAs fold into precise base-pairing patterns to carry out critical roles in genetic regulation and protein synthesis, but determining RNA structure remains difficult. Here, we show that coupling systematic mutagenesis with high-throughput chemical mapping enables accurate base-pair inference of domains from ribosomal RNA, ribozymes and riboswitches. For a six-RNA benchmark that has challenged previous chemical/computational methods, this 'mutate-and-map' strategy gives secondary structures that are in agreement with crystallography (helix error rates, 2%), including a blind test on a double-glycine riboswitch. Through modelling of partially ordered states, the method enables the first test of an interdomain helix-swap hypothesis for ligand-binding cooperativity in a glycine riboswitch. Finally, the data report on tertiary contacts within non-coding RNAs, and coupling to the Rosetta/FARFAR algorithm gives nucleotide-resolution three-dimensional models (helix root-mean-squared deviation, 5.7 Å) of an adenine riboswitch. These results establish a promising two-dimensional chemical strategy for inferring the secondary and tertiary structures that underlie non-coding RNA behaviour.
View details for DOI 10.1038/NCHEM.1176
View details for Web of Science ID 000297685800014
View details for PubMedID 22109276
Understanding the Errors of SHAPE-Directed RNA Structure Modeling
BIOCHEMISTRY
2011; 50 (37): 8049-8056
Single-nucleotide-resolution chemical mapping for structured RNA is being rapidly advanced by new chemistries, faster readouts, and coupling to computational algorithms. Recent tests have shown that selective 2'-hydroxyl acylation by primer extension (SHAPE) can give near-zero error rates (0-2%) in modeling the helices of RNA secondary structure. Here, we benchmark the method using six molecules for which crystallographic data are available: tRNA(phe) and 5S rRNA from Escherichia coli, the P4-P6 domain of the Tetrahymena group I ribozyme, and ligand-bound domains from riboswitches for adenine, cyclic di-GMP, and glycine. SHAPE-directed modeling of these highly structured RNAs gave an overall false negative rate (FNR) of 17% and a false discovery rate (FDR) of 21%, with at least one helix prediction error in five of the six cases. Extensive variations of data processing, normalization, and modeling parameters did not significantly mitigate modeling errors. Only one variation, filtering out data collected with deoxyinosine triphosphate during primer extension, gave a modest improvement (FNR = 12%, and FDR = 14%). The residual structure modeling errors are explained by the insufficient information content of these RNAs' SHAPE data, as evaluated by a nonparametric bootstrapping analysis. Beyond these benchmark cases, bootstrapping suggests a low level of confidence (<50%) in the majority of helices in a previously proposed SHAPE-directed model for the HIV-1 RNA genome. Thus, SHAPE-directed RNA modeling is not always unambiguous, and helix-by-helix confidence estimates, as described herein, may be critical for interpreting results from this powerful methodology.
View details for DOI 10.1021/bi200524n
View details for Web of Science ID 000294791100021
View details for PubMedID 21842868
Quantitative comparison of villin headpiece subdomain simulations and triplet-triplet energy transfer experiments
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2011; 108 (31): 12734-12739
As the fastest folding protein, the villin headpiece (HP35) serves as an important bridge between simulation and experimental studies of protein folding. Despite the simplicity of this system, experiments continue to reveal a number of surprises, including structure in the unfolded state and complex equilibrium dynamics near the native state. Using 2.5 ms of molecular dynamics and Markov state models, we connect to current experimental results in three ways. First, we present and validate a novel method for the quantitative prediction of triplet-triplet energy transfer experiments. Second, we construct a many-state model for HP35 that is consistent with previous experiments. Finally, we predict contact-formation time traces for all 1,225 possible triplet-triplet energy transfer experiments on HP35.
View details for DOI 10.1073/pnas.1010880108
View details for Web of Science ID 000293385700043
View details for PubMedID 21768345
HiTRACE: high-throughput robust analysis for capillary electrophoresis
BIOINFORMATICS
2011; 27 (13): 1798-1805
Capillary electrophoresis (CE) of nucleic acids is a workhorse technology underlying high-throughput genome analysis and large-scale chemical mapping for nucleic acid structural inference. Despite the wide availability of CE-based instruments, there remain challenges in leveraging their full power for quantitative analysis of RNA and DNA structure, thermodynamics and kinetics. In particular, the slow rate and poor automation of available analysis tools have bottlenecked a new generation of studies involving hundreds of CE profiles per experiment. We propose a computational method called high-throughput robust analysis for capillary electrophoresis (HiTRACE) to automate the key tasks in large-scale nucleic acid CE analysis, including the profile alignment that has heretofore been a rate-limiting step in the highest throughput experiments. We illustrate the application of HiTRACE on 13 datasets representing 4 different RNAs, 3 chemical modification strategies and up to 480 single mutant variants; the largest datasets each include 87 360 bands. By applying a series of robust dynamic programming algorithms, HiTRACE outperforms prior tools in terms of alignment and fitting quality, as assessed by measures including the correlation between quantified band intensities between replicate datasets. Furthermore, while the smallest of these datasets required 7-10 h of manual intervention using prior approaches, HiTRACE quantitation of even the largest datasets herein was achieved in 3-12 min. The HiTRACE method, therefore, resolves a critical barrier to the efficient and accurate analysis of nucleic acid structure in experiments involving tens of thousands of electrophoretic bands.
View details for DOI 10.1093/bioinformatics/btr277
View details for Web of Science ID 000291752600058
View details for PubMedID 21561922
Sharing and archiving nucleic acid structure mapping data
RNA-A PUBLICATION OF THE RNA SOCIETY
2011; 17 (7): 1204-1212
Nucleic acids are particularly amenable to structural characterization using chemical and enzymatic probes. Each individual structure mapping experiment reveals specific information about the structure and/or dynamics of the nucleic acid. Currently, there is no simple approach for making these data publicly available in a standardized format. We therefore developed a standard for reporting the results of single nucleotide resolution nucleic acid structure mapping experiments, or SNRNASMs. We propose a schema for sharing nucleic acid chemical probing data that uses generic public servers for storing, retrieving, and searching the data. We have also developed a consistent nomenclature (ontology) within the Ontology of Biomedical Investigations (OBI), which provides unique identifiers (termed persistent URLs, or PURLs) for classifying the data. Links to standardized data sets shared using our proposed format along with a tutorial and links to templates can be found at http://snrnasm.bio.unc.edu.
View details for DOI 10.1261/rna.2753211
View details for Web of Science ID 000291683500002
View details for PubMedID 21610212
Four Small Puzzles That Rosetta Doesn't Solve
2011; 6 (5)
A complete macromolecule modeling package must be able to solve the simplest structure prediction problems. Despite recent successes in high resolution structure modeling and design, the Rosetta software suite fares poorly on small protein and RNA puzzles, some as small as four residues. To illustrate these problems, this manuscript presents Rosetta results for four well-defined test cases: the 20-residue mini-protein Trp cage, an even smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies, several lines of evidence indicate that conformational sampling is not the major bottleneck in modeling these small systems. Instead, approximations and omissions in the Rosetta all-atom energy function currently preclude discriminating experimentally observed conformations from de novo models at atomic resolution. These molecular "puzzles" should serve as useful model systems for developers wishing to make foundational improvements to this powerful modeling suite.
View details for DOI 10.1371/journal.pone.0020044
View details for Web of Science ID 000290793400036
View details for PubMedID 21625446
An effective assay for high cellular resolution time-lapse imaging of sensory placode formation and morphogenesis
The vertebrate peripheral nervous system contains sensory neurons that arise from ectodermal placodes. Placodal cells ingress to move inside the head to form sensory neurons of the cranial ganglia. To date, however, the process of placodal cell ingression and underlying cellular behavior are poorly understood as studies have relied upon static analyses on fixed tissues. Visualizing placodal cell behavior requires an ability to distinguish the surface ectoderm from the underlying mesenchyme. This necessitates high resolution imaging along the z-plane which is difficult to accomplish in whole embryos. To address this issue, we have developed an imaging system using cranial slices that allows direct visualization of placode formation. We demonstrate an effective imaging assay for capturing placode development at single cell resolution using chick embryonic tissue ex vivo. This provides the first time-lapse imaging of mitoses in the trigeminal placodal ectoderm, ingression, and intercellular contacts of placodal cells. Cell divisions with varied orientations were found in the placodal ectoderm all along the apical-basal axis. Placodal cells initially have short cytoplasmic processes during ingression as young neurons and mature over time to elaborate long axonal processes in the mesenchyme. Interestingly, the time-lapse imaging data reveal that these delaminating placodal neurons begin ingression early on from within the ectoderm, where they start to move and continue on to exit as individual or strings of neurons through common openings on the basal side of the epithelium. Furthermore, dynamic intercellular contacts are abundant among the delaminating placodal neurons, between these and the already delaminated cells, as well as among cells in the forming ganglion. This new imaging assay provides a powerful method to analyze directly development of placode-derived sensory neurons and subsequent ganglia formation for the first time in amniotes.
Viewing placode development in a head cross-section provides a vantage point from which it is possible to study comprehensive events in placode formation, from differentiation, cell ingression to ganglion assembly. Understanding how placodal neurons form may reveal a new mechanism of neurogenesis distinct from that in the central nervous system and provide new insight into how cells acquire motility from a stationary epithelial cell type.
View details for DOI 10.1186/1471-2202-12-37
View details for Web of Science ID 000291750600001
View details for PubMedID 21554727
A mutate-and-map strategy accurately infers the base pairs of a 35-nucleotide model RNA
RNA-A PUBLICATION OF THE RNA SOCIETY
2011; 17 (3): 522-534
We present a rapid experimental strategy for inferring base pairs in structured RNAs via an information-rich extension of classic chemical mapping approaches. The mutate-and-map method, previously applied to a DNA/RNA helix, systematically searches for single mutations that enhance the chemical accessibility of base-pairing partners distant in sequence. To test this strategy for structured RNAs, we have carried out mutate-and-map measurements for a 35-nt hairpin, called the MedLoop RNA, embedded within an 80-nt sequence. We demonstrate the synthesis of all 105 single mutants of the MedLoop RNA sequence and present high-throughput DMS, CMCT, and SHAPE modification measurements for this library at single-nucleotide resolution. The resulting two-dimensional data reveal visually clear, punctate features corresponding to RNA base pair interactions as well as more complex features; these signals can be qualitatively rationalized by comparison to secondary structure predictions. Finally, we present an automated, sequence-blind analysis that permits the confident identification of nine of the 10 MedLoop RNA base pairs at single-nucleotide resolution, while discriminating against all 1460 false-positive base pairs. These results establish the accuracy and information content of the mutate-and-map strategy and support its feasibility for rapidly characterizing the base-pairing patterns of larger and more complex RNA systems.
View details for DOI 10.1261/rna.2516311
View details for Web of Science ID 000287195900014
View details for PubMedID 21239468
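The library size quoted in the abstract above (105 single mutants of a 35-nt RNA) is consistent with three alternative bases at each of 35 positions. As a minimal sketch of how such a library could be enumerated, the Python below lists every single-nucleotide substitution of a sequence. The function name `single_mutants`, the mutant naming scheme, and the placeholder sequence are assumptions made for illustration; the sequence is not the MedLoop RNA, and the authors' actual library design may differ.

```python
def single_mutants(sequence, alphabet="ACGU"):
    """Return (name, mutant_sequence) for every single-base substitution.

    Names follow the hypothetical pattern <wild-type base><1-based position><new base>,
    e.g. "G1A" for a G-to-A change at position 1.
    """
    mutants = []
    for i, wild_type in enumerate(sequence):
        for base in alphabet:
            if base != wild_type:
                name = f"{wild_type}{i + 1}{base}"
                mutants.append((name, sequence[:i] + base + sequence[i + 1:]))
    return mutants


# Placeholder 35-nt sequence (illustrative only, not the MedLoop RNA).
placeholder = ("GACU" * 9)[:35]
library = single_mutants(placeholder)
print(len(library))  # 3 substitutions x 35 positions
```

Each mutant in such a library would then be chemically probed separately, giving one row of the two-dimensional mutate-and-map data set described in the abstract.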
ROSETTA3: AN OBJECT-ORIENTED SOFTWARE SUITE FOR THE SIMULATION AND DESIGN OF MACROMOLECULES
METHODS IN ENZYMOLOGY, VOL 487: COMPUTER METHODS, PT C
We have recently completed a full re-architecturing of the ROSETTA molecular modeling program, generalizing and expanding its existing functionality. The new architecture enables the rapid prototyping of novel protocols by providing easy-to-use interfaces to powerful tools for molecular modeling. The source code of this rearchitecturing has been released as ROSETTA3 and is freely available for academic use. At the time of its release, it contained 470,000 lines of code. Counting currently unpublished protocols at the time of this writing, the source includes 1,285,000 lines. Its rapid growth is a testament to its ease of use. This chapter describes the requirements for our new architecture, justifies the design decisions, sketches out central classes, and highlights a few of the common tasks that the new software can perform.
View details for DOI 10.1016/S0076-6879(11)87019-9
View details for Web of Science ID 000286532000019
View details for PubMedID 21187238
Rosetta in CAPRI rounds 13-19
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2010; 78 (15): 3212-3218
Modeling the conformational changes that occur on binding of macromolecules is an unsolved challenge. In previous rounds of the Critical Assessment of PRediction of Interactions (CAPRI), it was demonstrated that the Rosetta approach to macromolecular modeling could capture side chain conformational changes on binding with high accuracy. In rounds 13-19 we tested the ability of various backbone remodeling strategies to capture the main-chain conformational changes observed during binding events. These approaches span a wide range of backbone motions, from limited refinement of loops to relieve clashes in homologous docking, through extensive remodeling of loop segments, to large-scale remodeling of RNA. Although the results are encouraging, major improvements in sampling and energy evaluation are clearly required for consistent high accuracy modeling. Analysis of our failures in the CAPRI challenges suggests that conformational sampling at the termini of exposed beta strands is a particularly pressing area for improvement.
View details for DOI 10.1002/prot.22784
View details for Web of Science ID 000283565000020
View details for PubMedID 20597089
A Mutate-and-Map Strategy for Inferring Base Pairs in Structured Nucleic Acids: Proof of Concept on a DNA/RNA Helix
2010; 49 (35): 7414-7416
We propose a rapid chemical strategy for identifying base pairs in structured nucleic acid systems. The approach goes beyond traditional chemical mapping approaches by monitoring perturbations of each residue's chemical accessibility in response to systematic mutagenesis of residues that are distant in sequence but nearby in three dimensions. As a proof of concept, we present high-throughput dimethyl sulfate accessibility data for a chimeric DNA/RNA system in which every possible sequence variation and deletion in a 20 bp region has been synthesized and tested. The data demonstrate that 88% of the system's base pairs can be robustly inferred, with A/A and T/C DNA/RNA mismatches giving the strongest signals. These results point to the feasibility of rapid base pair inference in larger and more complex nucleic acid systems with unknown structure.
View details for DOI 10.1021/bi101123g
View details for Web of Science ID 000281305200002
View details for PubMedID 20677780
Atomic accuracy in predicting and designing noncanonical RNA structure
2010; 7 (4): 291-294
We present fragment assembly of RNA with full-atom refinement (FARFAR), a Rosetta framework for predicting and designing noncanonical motifs that define RNA tertiary structure. In a test set of thirty-two 6-20-nucleotide motifs, FARFAR recapitulated 50% of the experimental structures at near-atomic accuracy. Sequence redesign calculations recovered native bases at 65% of residues engaged in noncanonical interactions, and we experimentally validated mutations predicted to stabilize a signal recognition particle domain.
View details for DOI 10.1038/NMETH.1433
View details for Web of Science ID 000276150600018
View details for PubMedID 20190761
A robust peak detection method for RNA structure inference by high-throughput contact mapping
2009; 25 (9): 1137-1144
For high-throughput prediction of the helical arrangements of large RNA molecules, an innovative method termed multiplexed hydroxyl radical (•OH) cleavage analysis (MOHCA) has been proposed. A key step in this promising technique is to detect peaks accurately from noisy radioactivity profiles. Since manual peak finding is laborious and prone to error, an automated peak detection method to improve the accuracy and throughput of MOHCA is required. Existing methods were not applicable to MOHCA due to their high false positive rates. We developed a two-step computational method that can detect peaks from MOHCA profiles in a robust manner. The first step exploits an ensemble of linear and non-linear signal processing techniques to find true peak candidates. In the second step, a binary classifier trained with the characteristics of true and false peaks is used to eliminate false peaks out of the peak candidates. We tested the proposed approach with 2002 MOHCA cleavage profiles and obtained the median recall, precision and F-measure values of 0.917, 0.750 and 0.830, respectively. Compared with the alternatives considered, the proposed method was able to handle false peaks substantially better, thus resulting in 51.0-71.8% higher median values of precision and F-measure. The software and supplementary data are available at http://dna.korea.ac.kr/pub/mohca.
View details for DOI 10.1093/bioinformatics/btp110
View details for Web of Science ID 000265523300007
View details for PubMedID 19246511
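The recall, precision and F-measure values reported in the abstract above are standard classification metrics. The Python sketch below shows how they are conventionally computed for one profile from counts of true-positive, false-positive and false-negative peaks. The function name `peak_metrics` and the example counts are hypothetical and are not taken from the paper. Because the abstract reports medians over many profiles, the reported F-measure need not equal the harmonic mean of the reported precision and recall.

```python
def peak_metrics(true_positives, false_positives, false_negatives):
    """Return (precision, recall, f_measure) for one set of detected peaks."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    # F-measure (F1) is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts for a single profile: 9 correct peaks,
# 3 spurious detections, 1 missed peak.
precision, recall, f_measure = peak_metrics(9, 3, 1)
print(precision, recall, f_measure)
```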
Remeasuring the double helix
2008; 322 (5900): 446-449
DNA is thought to behave as a stiff elastic rod with respect to the ubiquitous mechanical deformations inherent to its biology. To test this model at short DNA lengths, we measured the mean and variance of end-to-end length for a series of DNA double helices in solution, using small-angle x-ray scattering interference between gold nanocrystal labels. In the absence of applied tension, DNA is at least one order of magnitude softer than measured by single-molecule stretching experiments. Further, the data rule out the conventional elastic rod model. The variance in end-to-end length follows a quadratic dependence on the number of base pairs rather than the expected linear dependence, indicating that DNA stretching is cooperative over more than two turns of the DNA double helix. Our observations support the idea of long-range allosteric communication through DNA structure.
View details for DOI 10.1126/science.1158881
View details for Web of Science ID 000260094500048
View details for PubMedID 18927394
Structural inference of native and partially folded RNA by high-throughput contact mapping
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2008; 105 (11): 4144-4149
The biological behaviors of ribozymes, riboswitches, and numerous other functional RNA molecules are critically dependent on their tertiary folding and their ability to sample multiple functional states. The conformational heterogeneity and partially folded nature of most of these states has rendered their characterization by high-resolution structural approaches difficult or even intractable. Here we introduce a method to rapidly infer the tertiary helical arrangements of large RNA molecules in their native and non-native solution states. Multiplexed hydroxyl radical (•OH) cleavage analysis (MOHCA) enables the high-throughput detection of numerous pairs of contacting residues via random incorporation of radical cleavage agents followed by two-dimensional gel electrophoresis. We validated this technology by recapitulating the unfolded and native states of a well studied model RNA, the P4-P6 domain of the Tetrahymena ribozyme, at subhelical resolution. We then applied MOHCA to a recently discovered third state of the P4-P6 RNA that is stabilized by high concentrations of monovalent salt and whose partial order precludes conventional techniques for structure determination. The three-dimensional portrait of a compact, non-native RNA state reveals a well ordered subset of native tertiary contacts, in contrast to the dynamic but otherwise similar molten globule states of proteins. With its applicability to nearly any solution state, we expect MOHCA to be a powerful tool for illuminating the many functional structures of large RNA molecules and RNA/protein complexes.
View details for DOI 10.1073/pnas.0709032105
View details for Web of Science ID 000254263300015
View details for PubMedID 18322008
Macromolecular modeling with Rosetta
ANNUAL REVIEW OF BIOCHEMISTRY
2008; 77: 363-382
Advances over the past few years have begun to enable prediction and design of macromolecular structures at near-atomic accuracy. Progress has stemmed from the development of reasonably accurate and efficiently computed all-atom potential functions as well as effective conformational sampling strategies appropriate for searching a highly rugged energy landscape, both driven by feedback from structure prediction and design tests. A unified energetic and kinematic framework in the Rosetta program allows a wide range of molecular modeling problems, from fibril structure prediction to RNA folding to the design of new protein interfaces, to be readily investigated and highlights areas for improvement. The methodology enables the creation of novel molecules with useful functions and holds promise for accelerating experimental structural inference. Emerging connections to crystallographic phasing, NMR modeling, and lower-resolution approaches are described and critically assessed.
View details for DOI 10.1146/annurev.biochem.77.062906.171838
View details for Web of Science ID 000257596800016
View details for PubMedID 18410248
High-resolution structure prediction and the crystallographic phase problem
2007; 450 (7167): 259-U7
The energy-based refinement of low-resolution protein structure models to atomic-level accuracy is a major challenge for computational structural biology. Here we describe a new approach to refining protein structure models that focuses sampling in regions most likely to contain errors while allowing the whole structure to relax in a physically realistic all-atom force field. In applications to models produced using nuclear magnetic resonance data and to comparative models based on distant structural homologues, the method can significantly improve the accuracy of the structures in terms of both the backbone conformations and the placement of core side chains. Furthermore, the resulting models satisfy a particularly stringent test: they provide significantly better solutions to the X-ray crystallographic phase problem in molecular replacement trials. Finally, we show that all-atom refinement can produce de novo protein structure predictions that reach the high accuracy required for molecular replacement without any experimental phase information and in the absence of templates suitable for molecular replacement from the Protein Data Bank. These results suggest that the combination of high-resolution structure prediction with state-of-the-art phasing tools may be unexpectedly powerful in phasing crystallographic data for which molecular replacement is hindered by the absence of sufficiently accurate previous models.
View details for DOI 10.1038/nature06249
View details for Web of Science ID 000250746200052
View details for PubMedID 17934447
Automated de novo prediction of native-like RNA tertiary structures
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2007; 104 (37): 14664-14669
RNA tertiary structure prediction has been based almost entirely on base-pairing constraints derived from phylogenetic covariation analysis. We describe here a complementary approach, inspired by the Rosetta low-resolution protein structure prediction method, that seeks the lowest energy tertiary structure for a given RNA sequence without using evolutionary information. In a benchmark test of 20 RNA sequences with known structure and lengths of approximately 30 nt, the new method reproduces better than 90% of Watson-Crick base pairs, comparable with the accuracy of secondary structure prediction methods. In more than half the cases, at least one of the top five models agrees with the native structure to better than 4 Å rmsd over the backbone. Most importantly, the method recapitulates more than one-third of non-Watson-Crick base pairs seen in the native structures. Tandem stacks of "sheared" base pairs, base triplets, and pseudoknots are among the noncanonical features reproduced in the models. In the cases in which none of the top five models were native-like, higher energy conformations similar to the native structures are still sampled frequently but not assigned low energies. These results suggest that modest improvements in the energy function, together with the incorporation of information from phylogenetic covariance, may allow confident and accurate structure prediction for larger and more complex RNA chains.
View details for DOI 10.1073/pnas.0703836104
View details for Web of Science ID 000249513000023
View details for PubMedID 17726102
Structure prediction for CASP7 targets using extensive all-atom refinement with Rosetta@home
PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS
2007; 69: 118-128
We describe predictions made using the Rosetta structure prediction methodology for both template-based modeling and free modeling categories in the Seventh Critical Assessment of Techniques for Protein Structure Prediction. For the first time, aggressive sampling and all-atom refinement could be carried out for the majority of targets, an advance enabled by the Rosetta@home distributed computing network. Template-based modeling predictions using an iterative refinement algorithm improved over the best existing templates for the majority of proteins with less than 200 residues. Free modeling methods gave near-atomic accuracy predictions for several targets under 100 residues from all secondary structure classes. These results indicate that refinement with an all-atom energy function, although computationally expensive, is a powerful method for obtaining accurate structure predictions.
View details for DOI 10.1002/prot.21636
View details for Web of Science ID 000251502400013
Determining the Mg2+ stoichiometry for folding an RNA metal ion core
JOURNAL OF THE AMERICAN CHEMICAL SOCIETY
2005; 127 (23): 8272-8273
The folding and catalytic function of RNA molecules depend on their interactions with divalent metal ions, such as magnesium. As with every molecular process, the most basic knowledge required for understanding the close relationship of an RNA with its metal ions is the stoichiometry of the interaction. Unfortunately, inventories of the numbers of divalent ions associated with unfolded and folded RNA states have been unattainable. A common approach has been to interpret Hill coefficients fit to folding equilibria as the number of metal ions bound upon folding. However, this approach is vitiated by the presence of diffusely associated divalent ions in a dynamic ion atmosphere and by the likelihood of multiple transitions along a folding pathway. We demonstrate that the use of molar concentrations of background monovalent salt can alleviate these complications. These simplifying solution conditions allow a precise determination of the stoichiometry of the magnesium ions involved in folding the metal ion core of the P4-P6 domain of the Tetrahymena group I ribozyme. Hill analysis of hydroxyl radical footprinting data suggests that the P4-P6 RNA core folds cooperatively upon the association of two metal ions. This unexpectedly small stoichiometry is strongly supported by counting magnesium ions associated with the P4-P6 RNA via fluorescence titration and atomic emission spectroscopy. By pinpointing the metal ion stoichiometry, these measurements provide a critical but previously missing step in the thermodynamic dissection of the coupling between metal ion binding and RNA folding.
View details for DOI 10.1021/ja051422h
View details for Web of Science ID 000229751100020
View details for PubMedID 15941246
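The Hill analysis mentioned in the abstract above fits a folding titration to the form f = [Mg]^n / (K^n + [Mg]^n), where n is the Hill coefficient and K the midpoint concentration. The Python below is a minimal sketch of that fit using the linearized Hill plot, log(f/(1-f)) = n log[Mg] - n log K. The function names, the concentration values and the midpoint of 0.7 (arbitrary units) are assumptions for illustration and are not data from the paper; n = 2 is used only because the abstract reports folding upon association of two metal ions. As the abstract cautions, a Hill coefficient is not in general the number of ions bound.

```python
import math


def hill_fraction(conc, k_half, n):
    """Fraction folded at concentration conc for midpoint k_half and Hill coefficient n."""
    return conc ** n / (k_half ** n + conc ** n)


def fit_hill(concs, fractions):
    """Least-squares line through the Hill plot; returns (n, k_half).

    Requires every fraction to lie strictly between 0 and 1.
    """
    xs = [math.log(c) for c in concs]
    ys = [math.log(f / (1.0 - f)) for f in fractions]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    n = sxy / sxx                      # slope of the Hill plot
    intercept = y_mean - n * x_mean    # equals -n * log(k_half)
    return n, math.exp(-intercept / n)


# Synthetic, noise-free titration (hypothetical values, arbitrary units).
concs = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
fractions = [hill_fraction(c, 0.7, 2.0) for c in concs]
n_fit, k_fit = fit_hill(concs, fractions)
print(n_fit, k_fit)
```

With real footprinting data the fractions carry noise, and a nonlinear fit of the Hill equation itself is usually preferable to the linearized plot, which weights points near 0 and 1 heavily.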
SAFA: Semi-automated footprinting analysis software for high-throughput quantification of nucleic acid footprinting experiments
RNA-A PUBLICATION OF THE RNA SOCIETY
2005; 11 (3): 344-354
Footprinting is a powerful and widely used tool for characterizing the structure, thermodynamics, and kinetics of nucleic acid folding and ligand binding reactions. However, quantitative analysis of the gel images produced by footprinting experiments is tedious and time-consuming, due to the absence of informatics tools specifically designed for footprinting analysis. We have developed SAFA, a semi-automated footprinting analysis software package that achieves accurate gel quantification while reducing the time to analyze a gel from several hours to 15 min or less. The increase in analysis speed is achieved through a graphical user interface that implements a novel methodology for lane and band assignment, called "gel rectification," and an optimized band deconvolution algorithm. The SAFA software yields results that are consistent with published methodologies and reduces the investigator-dependent variability compared to less automated methods. These software developments simplify the analysis procedure for a footprinting gel and can therefore facilitate the use of quantitative footprinting techniques in nucleic acid laboratories that otherwise might not have considered their use. Further, the increased throughput provided by SAFA may allow a more comprehensive understanding of molecular interactions. The software and documentation are freely available for download at http://safa.stanford.edu.
View details for DOI 10.1261/rna.7214405
View details for Web of Science ID 000227190000011
View details for PubMedID 15701734