Rachael Kretsch
Postdoctoral Scholar, Bioengineering
All Publications
-
Structures of nucleotide-bound human telomerase at several steps of its telomeric DNA repeat addition cycle.
Nature communications
2026
Abstract
In most eukaryotes, the reverse transcriptase telomerase counteracts telomere shortening by processively adding telomeric DNA repeat sequences to chromosome ends. Telomerase activity depends on the telomerase reverse transcriptase (TERT) and the telomerase RNA (hTR in humans). Processive telomere elongation is critical for genome stability, and defects in this mechanism are linked to cellular dysfunction and human disease. However, the structural basis for telomerase repeat addition processivity in humans has remained elusive. Here, we present cryo-electron microscopy structures of human telomerase bound to telomeric DNA and an incoming nucleotide, captured at three distinct stages of its repeat addition cycle: initiation, elongation, and pre-termination. Across these states, the TERT active site maintains a conserved architecture that stabilises a short DNA-RNA duplex of constant length of four base-pairs. Beyond the active site, we identify dynamic structural features in both TERT and hTR that facilitate substrate engagement and RNA template repositioning, thereby supporting the synthesis of successive telomeric repeats. Together, these structures provide key insights into how human telomerase achieves its unique processivity to maintain telomere length and genome integrity.
View details for DOI 10.1038/s41467-026-68560-8
View details for PubMedID 41565648
-
Template-based RNA structure prediction advanced through a blind code competition.
bioRxiv : the preprint server for biology
2025
Abstract
Automatically predicting RNA 3D structure from sequence remains an unsolved challenge in biology and biotechnology. Here, we describe a Kaggle code competition engaging over 1700 teams and 43 previously unreleased structures to tackle this challenge. The top three submitted algorithms achieved scores within statistical error of the winners of the recent CASP16 competition. Unexpectedly, the top Kaggle strategy involved a pipeline for discovering 3D templates, without the use of deep learning. We integrated this template-modeling pipeline and other Kaggle strategies to develop a single model RNAPro that retrospectively outperformed individual Kaggle models on the same test set. These results suggest a growing importance of template-based modeling in RNA structure prediction.
View details for DOI 10.64898/2025.12.30.696949
View details for PubMedID 41509375
View details for PubMedCentralID PMC12776560
-
Blind Prediction of Complex Water and Ion Ensembles Around RNA in CASP16.
Proteins
2025
Abstract
Biomolecules rely on water and ions for stable folding, but these interactions are often transient, dynamic, or disordered and thus hidden from experiments and evaluation challenges that represent biomolecules as single, ordered structures. Here, we compare blindly predicted ensembles of water and ion structure to the cryo-EM densities observed around the Tetrahymena ribozyme at 2.2-2.3 Å resolution, collected through target R1260 in the CASP16 competition. Twenty-six groups participated in this solvation "cryo-ensemble" prediction challenge, submitting over 350 million atoms in total, offering the first opportunity to compare blind predictions of dynamic solvent shell ensembles to cryo-EM density. Predicted atomic ensembles were converted to density through local alignment and these densities were compared to the cryo-EM densities using Pearson correlation, Spearman correlation, mutual information, and precision-recall curves. These predictions show that an ensemble representation is able to capture information of transient or dynamic water and ions better than traditional atomic models, but there remains a large accuracy gap to the performance ceiling set by experimental uncertainty. Overall, molecular dynamics approaches best matched the cryo-EM density, with blind predictions from bussilab_plain_md, SoutheRNA, bussilab_replex, coogs2, and coogs3 outperforming the baseline molecular dynamics prediction. This study indicates that simulations of water and ions can be quantitatively evaluated with cryo-EM maps. We propose that further community-wide blind challenges can drive and evaluate progress in modeling water, ions, and other previously hidden components of biomolecular systems.
View details for DOI 10.1002/prot.70079
View details for PubMedID 41204761
-
Blind prediction of complex water and ion ensembles around RNA in CASP16.
bioRxiv : the preprint server for biology
2025
Abstract
Biomolecules rely on water and ions for stable folding, but these interactions are often transient, dynamic, or disordered and thus hidden from experiments and evaluation challenges that represent biomolecules as single, ordered structures. Here, we compare blindly predicted ensembles of water and ion structure to the cryo-EM densities observed around the Tetrahymena ribozyme at 2.2-2.3 Å resolution, collected through target R1260 in the CASP16 competition. 26 groups participated in this solvation 'cryo-ensemble' prediction challenge, submitting over 350 million atoms in total, offering the first opportunity to compare blind predictions of dynamic solvent shell ensembles to cryo-EM density. Predicted atomic ensembles were converted to density through local alignment and these densities were compared to the cryo-EM densities using Pearson correlation, Spearman correlation, mutual information, and precision-recall curves. These predictions show that an ensemble representation is able to capture information of transient or dynamic water and ions better than traditional atomic models, but there remains a large accuracy gap to the performance ceiling set by experimental uncertainty. Overall, molecular dynamics approaches best matched the cryo-EM density, with blind predictions from bussilab_plain_md, SoutheRNA, bussilab_replex, coogs2, and coogs3 outperforming the baseline molecular dynamics prediction. This study indicates that simulations of water and ions can be quantitatively evaluated with cryo-EM maps. We propose that further community-wide blind challenges can drive and evaluate progress in modeling water, ions and other previously hidden components of biomolecular systems.
View details for DOI 10.1101/2025.11.03.685595
View details for PubMedID 41279659
View details for PubMedCentralID PMC12637500
-
Assessment of Protein Complex Predictions in CASP16: Are We Making Progress?
Proteins
2025
Abstract
The assessment of oligomer targets in the Critical Assessment of Structure Prediction Round 16 (CASP16) suggests that complex structure prediction remains an unsolved challenge. Even the leading groups can only predict slightly more than half of the targets to high accuracy. Most CASP16 groups relied on AlphaFold-Multimer (AFM) or AlphaFold3 (AF3) as their core modeling engines. By optimizing input MSAs, refining modeling constructs (using partial rather than full sequences), and employing massive model sampling and selection, top-performing groups were able to significantly outperform the default AFM/AF3 predictions. CASP16 also introduced two additional challenges: Phase 0, which required predictions without stoichiometry information, and Phase 2, which provided participants with thousands of models generated by MassiveFold (MF) to enable large-scale sampling for resource-limited groups. Across all phases, the MULTICOM series and Kiharalab emerged as top performers based on the quality of their best models. However, these groups did not have a strong advantage in model ranking, and thus their lead over other teams, such as Yang-Multimer and kozakovvajda, was less pronounced when evaluating only the first submitted models. Compared to CASP15, CASP16 showed moderate overall improvement, likely driven by the release of AF3 and the extensive model sampling employed by top groups. Several notable trends highlight frontiers for future development. First, the kozakovvajda group significantly outperformed others on antibody-antigen targets, achieving over a 60% success rate without relying on AFM or AF3 as their primary modeling framework, suggesting that alternative approaches may offer promising solutions for these difficult targets. Second, model ranking and selection continue to be major bottlenecks. 
The PEZYFoldings group demonstrated a notable advantage in selecting their best models as first models, suggesting that their pipeline for model ranking may offer important insights for the field. Finally, the Phase 0 experiment indicated moderate success in stoichiometry prediction; however, stoichiometry prediction remains challenging for high-order assemblies and targets that differ from available homologous templates. Overall, CASP16 demonstrated steady progress in multimer prediction while emphasizing the need for more effective model ranking strategies, improved stoichiometry prediction, and new modeling methods that extend beyond the current AF-based paradigm.
View details for DOI 10.1002/prot.70068
View details for PubMedID 41170922
-
Assessment of Nucleic Acid Structure Prediction in CASP16.
Proteins
2025
Abstract
Consistently accurate 3D nucleic acid structure prediction would facilitate studies of the diverse RNA and DNA molecules underlying life. In CASP16, blind predictions for 42 targets canvassing a full array of nucleic acid functions, from dopamine binding by DNA to formation of elaborate RNA nanocages, were submitted by 65 groups from 46 different labs worldwide. In contrast to concurrent protein structure predictions, performance on nucleic acids was generally poor, with no predictions of previously unseen natural RNA structures achieving TM-scores above 0.8. Even though automated server performance has improved, all top-performing groups were human expert predictors: Vfold, GuangzhouRNA-human, and KiharaLab. Good performance on one template-free modeling target (OLE RNA) and accurate global secondary structure prediction suggested that structural information can be extracted from multiple sequence alignments. However, 3D accuracy generally appeared to depend on the availability of closely related 3D structure templates, and predictions still did not achieve consistent recovery of pseudoknots, singlet Watson-Crick-Franklin pairs, non-canonical pairs, or tertiary motifs like A-minor interactions. For the first time, blind predictions of nucleic acid interactions with small molecules, proteins, and other nucleic acids could be assessed in CASP16. As with nucleic acid monomers, prediction accuracy for nucleic acid complexes was generally poor unless 3D templates were available. Accounting for template availability, there has not been a notable increase in nucleic acid modeling accuracy between previous blind challenges and CASP16.
View details for DOI 10.1002/prot.70072
View details for PubMedID 41165252
-
Functional Relevance of CASP16 Nucleic Acid Predictions as Evaluated by Structure Providers.
Proteins
2025
Abstract
Accurate biomolecular structure prediction enables the prediction of mutational effects, the speculation of function based on predicted structural homology, the analysis of ligand binding modes, experimental model building, and many other applications. Such algorithms to predict essential functional and structural features remain out of reach for biomolecular complexes containing nucleic acids. Here, we report a quantitative and qualitative evaluation of nucleic acid structures for the CASP16 blind prediction challenge by 12 of the experimental groups who provided nucleic acid targets. Blind predictions accurately model secondary structure and some aspects of tertiary structure, including reasonable global folds for some complex RNAs; however, predictions often lack accuracy in the regions of highest functional importance. All models have inaccuracies in non-canonical regions where, for example, the nucleic-acid backbone bends, deviating from an A-form helix geometry, or a base forms a non-standard hydrogen bond (not a Watson-Crick base pair). These bends and non-canonical interactions are integral to forming functionally important regions such as RNA enzymatic active sites. Additionally, the modeling of conserved and functional interfaces between nucleic acids and ligands, proteins, or other nucleic acids remains poor. For some targets, the experimental structures may not represent the only structure the biomolecular complex occupies in solution or in its functional life cycle, posing a future challenge for the community.
View details for DOI 10.1002/prot.70043
View details for PubMedID 40905273
-
Cryo-EM structure of human telomerase dimer reveals H/ACA RNP-mediated dimerization.
Science (New York, N.Y.)
2025; 389 (6756): eadr5817
Abstract
Telomerase ribonucleoprotein (RNP) synthesizes telomeric repeats at chromosome ends using a telomerase reverse transcriptase (TERT) and a telomerase RNA (hTR in humans). Previous structural work showed that human telomerase is typically monomeric, containing a single copy of TERT and hTR. Evidence for dimeric complexes exists, although the composition, high-resolution structure, and function remain elusive. Here, we report the cryo-electron microscopy (cryo-EM) structure of a human telomerase dimer bound to telomeric DNA. The structure reveals a 26-subunit assembly and a dimerization interface mediated by the Hinge and ACA box (H/ACA) RNP of telomerase. Premature aging disease mutations map to this interface. Disrupting dimer formation affects RNP assembly, bulk telomerase activity, and telomere maintenance in cells. Our findings address a long-standing enigma surrounding the telomerase dimer and suggest a role for the dimer in telomerase assembly.
View details for DOI 10.1126/science.adr5817
View details for PubMedID 40638752
-
Assessment of Protein Complex Predictions in CASP16: Are we making progress?
bioRxiv : the preprint server for biology
2025
Abstract
The assessment of oligomer targets in the Critical Assessment of Structure Prediction Round 16 (CASP16) suggests that complex structure prediction remains an unsolved challenge. More than 30% of targets, particularly antibody-antigen targets, were highly challenging, with each group correctly predicting structures for only about a quarter of such targets. Most CASP16 groups relied on AlphaFold-Multimer (AFM) or AlphaFold3 (AF3) as their core modeling engines. By optimizing input MSAs, refining modeling constructs (using partial rather than full sequences), and employing massive model sampling and selection, top-performing groups were able to significantly outperform the default AFM/AF3 predictions. CASP16 also introduced two additional challenges: Phase 0, which required predictions without stoichiometry information, and Phase 2, which provided participants with thousands of models generated by MassiveFold (MF) to enable large-scale sampling for resource-limited groups. Across all phases, the MULTICOM series and Kiharalab emerged as top performers based on the quality of their best models per target. However, these groups did not have a strong advantage in model ranking, and thus their lead over other teams, such as Yang-Multimer and kozakovvajda, was less pronounced when evaluating only the first submitted models. Compared to CASP15, CASP16 showed moderate overall improvement, likely driven by the release of AF3 and the extensive model sampling employed by top groups. Several notable trends highlight key frontiers for future development. First, the kozakovvajda group significantly outperformed others on antibody-antigen targets, achieving over a 60% success rate without relying on AFM or AF3 as their primary modeling framework, suggesting that alternative approaches may offer promising solutions for these difficult targets. Second, model ranking and selection continue to be major bottlenecks. 
The PEZYFoldings group demonstrated a notable advantage in selecting their best models as first models, suggesting that their pipeline for model ranking may offer important insights for the field. Finally, the Phase 0 experiment indicated reasonable success in stoichiometry prediction; however, stoichiometry prediction remains challenging for high-order assemblies and targets that differ from available homologous templates. Overall, CASP16 demonstrated steady progress in multimer prediction while emphasizing the urgent need for more effective model ranking strategies, improved stoichiometry prediction, and the development of new modeling methods that extend beyond the current AF-based paradigm.
View details for DOI 10.1101/2025.05.29.656875
View details for PubMedID 40501681
View details for PubMedCentralID PMC12154902
-
Assessment of nucleic acid structure prediction in CASP16.
bioRxiv : the preprint server for biology
2025
Abstract
Consistently accurate 3D nucleic acid structure prediction would facilitate studies of the diverse RNA and DNA molecules underlying life. In CASP16, blind predictions for 42 targets canvassing a full array of nucleic acid functions, from dopamine binding by DNA to formation of elaborate RNA nanocages, were submitted by 65 groups from 46 different labs worldwide. In contrast to concurrent protein structure predictions, performance on nucleic acids was generally poor, with no predictions of previously unseen natural RNA structures achieving TM-scores above 0.8. Even though automated server performance has improved, all top-performing groups were human expert predictors: Vfold, GuangzhouRNA-human, and KiharaLab. Good performance on one template-free modeling target (OLE RNA) and accurate global secondary structure prediction suggested that structural information can be extracted from multiple sequence alignments. However, 3D accuracy generally appeared to depend on the availability of closely related 3D structures, and predictions still did not achieve consistent recovery of pseudoknots, singlet Watson-Crick-Franklin pairs, non-canonical pairs, or tertiary motifs like A-minor interactions. For the first time, blind predictions of nucleic acid interactions with small molecules, proteins, and other nucleic acids could be assessed in CASP16. As with nucleic acid monomers, prediction accuracy for nucleic acid complexes was generally poor unless 3D templates were available. Accounting for template availability, there has not been a notable increase in nucleic acid modeling accuracy between previous blind challenges and CASP16.
View details for DOI 10.1101/2025.05.06.652459
View details for PubMedID 40655015
-
Naturally ornate RNA-only complexes revealed by cryo-EM.
Nature
2025
Abstract
Myriad families of natural RNAs have been proposed, but not yet experimentally shown, to form biologically important structures (refs. 1-4). Here we report three-dimensional structures of three large ornate bacterial RNAs using cryogenic electron microscopy at resolutions of 2.9-3.1 Å. Without precedent among previously characterized natural RNA molecules, Giant, Ornate, Lake- and Lactobacillales-Derived (GOLLD), Rumen-Originating, Ornate, Large (ROOL), and Ornate Large Extremophilic (OLE) RNAs form homo-oligomeric complexes whose stoichiometries are retained at concentrations lower than expected in the cell. OLE RNA forms a dimeric complex with long co-axial pipes spanning two monomers. Both GOLLD and ROOL form distinct RNA-only multimeric nanocages with diameters larger than the ribosome, empty except for a disordered loop. Extensive intra- and intermolecular A-minor interactions, kissing loops, an unusual A-A helix, and other interactions stabilize the three complexes. Sequence covariation analysis of these large RNAs reveals evolutionary conservation of intermolecular interactions, supporting the biological importance of large, ornate RNA quaternary structures that can assemble without any involvement of proteins.
View details for DOI 10.1038/s41586-025-09073-0
View details for PubMedID 40328315
-
Engaging the Community: CASP Special Interest Groups.
Proteins
2025
Abstract
The Critical Assessment of Structure Prediction (CASP) brings together a diverse group of scientists, from deep learning experts to NMR specialists, all aimed at developing accurate prediction algorithms that can effectively characterize the structural aspects of biomolecules relevant to their functions. Engagement within the CASP community has traditionally been limited to the prediction season and the conference, with limited discourse in the 1.5 years between CASP seasons. CASP special interest groups (SIGs) were established in 2023 to encourage continuous dialogue within the community. The online seminar series has drawn global participation from across disciplines and career stages. This has facilitated cross-disciplinary discussions fostering collaborations. The archives of these seminars have become a vital learning tool for newcomers to the field, lowering the barrier to entry.
View details for DOI 10.1002/prot.26833
View details for PubMedID 40304050
-
Functional relevance of CASP16 nucleic acid predictions as evaluated by structure providers.
bioRxiv : the preprint server for biology
2025
Abstract
Accurate biomolecular structure prediction enables the prediction of mutational effects, the speculation of function based on predicted structural homology, the analysis of ligand binding modes, experimental model building, and many other applications. Such algorithms to predict essential functional and structural features remain out of reach for biomolecular complexes containing nucleic acids. Here, we report a quantitative and qualitative evaluation of nucleic acid structures for the CASP16 blind prediction challenge by 12 of the experimental groups who provided nucleic acid targets. Blind predictions accurately model secondary structure and some aspects of tertiary structure, including reasonable global folds for some complex RNAs; however, predictions often lack accuracy in the regions of highest functional importance. All models have inaccuracies in non-canonical regions where, e.g., the nucleic-acid backbone bends or a base forms a non-standard hydrogen bond. These bends and non-canonical interactions are integral to forming functionally important regions such as RNA enzymatic active sites. Additionally, the modeling of conserved and functional interfaces between nucleic acids and ligands, proteins, or other nucleic acids remains poor. For some targets, the experimental structures may not represent the only structure the biomolecular complex occupies in solution or in its functional life cycle, posing a future challenge for the community.
View details for DOI 10.1101/2025.04.15.649049
View details for PubMedID 40568131
-
Complex water networks visualized by cryogenic electron microscopy of RNA.
Nature
2025
Abstract
The stability and function of biomolecules are directly influenced by their myriad interactions with water (refs. 1-16). In this study, we investigated water through cryogenic electron microscopy (cryo-EM) on a highly solvated molecule, the Tetrahymena ribozyme, determined at 2.2 and 2.3 Å resolutions. By employing segmentation-guided water and ion modeling (SWIM; refs. 17, 18), an approach combining resolvability and chemical parameters, we automatically modeled and cross-validated water molecules and Mg2+ ions in the ribozyme core, revealing the extensive involvement of water in mediating RNA non-canonical interactions. Unexpectedly, in regions where SWIM does not model ordered water, we observed highly similar densities in both cryo-EM maps. In many of these regions, the cryo-EM densities superimpose with complex water networks predicted by molecular dynamics (MD), supporting their assignment as water and suggesting a biophysical explanation for their elusiveness to conventional atomic coordinate modeling. Our study demonstrates an approach to unveil both rigid and flexible waters that surround biomolecules through cryo-EM map densities, statistical and chemical metrics, and MD simulations.
View details for DOI 10.1038/s41586-025-08855-w
View details for PubMedID 40068818
-
Complex Water Networks Visualized through 2.2-2.3 Å Cryogenic Electron Microscopy of RNA.
bioRxiv : the preprint server for biology
2025
Abstract
The stability and function of biomolecules are directly influenced by their myriad interactions with water. In this study, we investigated water through cryogenic electron microscopy (cryo-EM) on a highly solvated molecule, the Tetrahymena ribozyme, determined at 2.2 and 2.3 Å resolutions. By employing segmentation-guided water and ion modeling (SWIM), an approach combining resolvability and chemical parameters, we automatically modeled and cross-validated water molecules and Mg2+ ions in the ribozyme core, revealing the extensive involvement of water in mediating RNA non-canonical interactions. Unexpectedly, in regions where SWIM does not model ordered water, we observed highly similar densities in both cryo-EM maps. In many of these regions, the cryo-EM densities superimpose with complex water networks predicted by molecular dynamics (MD), supporting their assignment as water and suggesting a biophysical explanation for their elusiveness to conventional atomic coordinate modeling. Our study demonstrates an approach to unveil both rigid and flexible waters that surround biomolecules through cryo-EM map densities, statistical and chemical metrics, and MD simulations.
View details for DOI 10.1101/2025.01.23.634578
View details for PubMedID 39896454
-
Conformational ensembles reveal the origins of serine protease catalysis.
Science (New York, N.Y.)
2025; 387 (6735): eado5068
Abstract
Enzymes exist in ensembles of states that encode the energetics underlying their catalysis. Conformational ensembles built from 1231 structures of 17 serine proteases revealed atomic-level changes across their reaction states. By comparing the enzymatic and solution reaction, we identified molecular features that provide catalysis and quantified their energetic contributions to catalysis. Serine proteases precisely position their reactants in destabilized conformers, creating a downhill energetic gradient that selectively favors the motions required for reaction while limiting off-pathway conformational states. The same catalytic features have repeatedly evolved in proteases and additional enzymes across multiple distinct structural folds. Our ensemble-function analyses revealed previously unknown catalytic features, provided quantitative models based on simple physical and chemical principles, and identified motifs recurrent in nature that may inspire enzyme design.
View details for DOI 10.1126/science.ado5068
View details for PubMedID 39946472
-
RNA-Puzzles Round V: blind predictions of 23 RNA structures.
Nature methods
2024
Abstract
RNA-Puzzles is a collective endeavor dedicated to the advancement and improvement of RNA three-dimensional structure prediction. With agreement from structural biologists, RNA structures are predicted by modeling groups before publication of the experimental structures. We report a large-scale set of predictions by 18 groups for 23 RNA-Puzzles: 4 RNA elements, 2 Aptamers, 4 Viral elements, 5 Ribozymes and 8 Riboswitches. We describe automatic assessment protocols for comparisons between prediction and experiment. Our analyses reveal some critical steps to be overcome to achieve good accuracy in modeling RNA structures: identification of helix-forming pairs and of non-Watson-Crick modules, correct coaxial stacking between helices and avoidance of entanglements. Three of the top four modeling groups in this round also ranked among the top four in the CASP15 contest.
View details for DOI 10.1038/s41592-024-02543-9
View details for PubMedID 39623050
View details for PubMedCentralID 3312550
-
Tertiary folds of the SL5 RNA from the 5' proximal region of SARS-CoV-2 and related coronaviruses.
Proceedings of the National Academy of Sciences of the United States of America
2024; 121 (10): e2320493121
Abstract
Coronavirus genomes sequester their start codons within stem-loop 5 (SL5), a structured, 5' genomic RNA element. In most alpha- and betacoronaviruses, the secondary structure of SL5 is predicted to contain a four-way junction of helical stems, some of which are capped with UUYYGU hexaloops. Here, using cryogenic electron microscopy (cryo-EM) and computational modeling with biochemically determined secondary structures, we present three-dimensional structures of SL5 from six coronaviruses. The SL5 domain of betacoronavirus severe-acute-respiratory-syndrome-related coronavirus 2 (SARS-CoV-2), resolved at 4.7 Å resolution, exhibits a T-shaped structure, with its UUYYGU hexaloops at opposing ends of a coaxial stack, the T's "arms." Further analysis of SL5 domains from SARS-CoV-1 and MERS (7.1 and 6.4 to 6.9 Å resolution, respectively) indicates that the junction geometry and inter-hexaloop distances are conserved features across these human-infecting betacoronaviruses. The MERS SL5 domain displays an additional tertiary interaction, which is also observed in the non-human-infecting betacoronavirus BtCoV-HKU5 (5.9 to 8.0 Å resolution). SL5s from human-infecting alphacoronaviruses, HCoV-229E and HCoV-NL63 (6.5 and 8.4 to 9.0 Å resolution, respectively), exhibit the same coaxial stacks, including the UUYYGU-capped arms, but with a phylogenetically distinct crossing angle, an X-shape. As such, all SL5 domains studied herein fold into stable tertiary structures with cross-genus similarities and notable differences, with implications for potential protein-binding modes and therapeutic targets.
View details for DOI 10.1073/pnas.2320493121
View details for PubMedID 38427602
-
Ribonanza: deep learning of RNA structure through dual crowdsourcing.
bioRxiv : the preprint server for biology
2024
Abstract
Prediction of RNA structure from sequence remains an unsolved problem, and progress has been slowed by a paucity of experimental data. Here, we present Ribonanza, a dataset of chemical mapping measurements on two million diverse RNA sequences collected through Eterna and other crowdsourced initiatives. Ribonanza measurements enabled solicitation, training, and prospective evaluation of diverse deep neural networks through a Kaggle challenge, followed by distillation into a single, self-contained model called RibonanzaNet. When fine-tuned on auxiliary datasets, RibonanzaNet achieves state-of-the-art performance in modeling experimental sequence dropout, RNA hydrolytic degradation, and RNA secondary structure, with implications for modeling RNA tertiary structure.
View details for DOI 10.1101/2024.02.24.581671
View details for PubMedID 38464325
-
CASP15 cryo-EM protein and RNA targets: Refinement and analysis using experimental maps.
Proteins
2023; 91 (12): 1935-1951
Abstract
CASP assessments primarily rely on comparing predicted coordinates with experimental reference structures. However, experimental structures by their nature are only models themselves: their construction involves a certain degree of subjectivity in interpreting density maps and translating them to atomic coordinates. Here, we directly utilized density maps to evaluate the predictions by employing a method for ranking the quality of protein chain predictions based on their fit into the experimental density. The fit-based ranking was found to correlate well with the CASP assessment scores. Overall, the evaluation against the density map indicated that the models are of high accuracy, and occasionally even better than the reference structure in some regions of the model. Local assessment of predicted side chains in a 1.52 Å resolution map showed that side chains are sometimes poorly positioned. Additionally, the top 118 predictions associated with 9 protein target reference structures were selected for automated refinement, in addition to the top 40 predictions for 11 RNA targets. For both proteins and RNA, the refinement of CASP15 predictions resulted in structures that are close to the reference target structure. This refinement was successful despite large conformational changes often being required, showing that predictions from CASP-assessed methods could serve as a good starting point for building atomic models in cryo-EM maps for both proteins and RNA. Loop modeling continued to pose a challenge for predictors, and together with the lack of consensus amongst models in these regions suggests that modeling, in combination with model-fit to the density, holds the potential for identifying more flexible regions within the structure.
View details for DOI 10.1002/prot.26644
View details for PubMedID 37994556
-
Tertiary folds of the SL5 RNA from the 5' proximal region of SARS-CoV-2 and related coronaviruses.
bioRxiv : the preprint server for biology
2023
Abstract
Coronavirus genomes sequester their start codons within stem-loop 5 (SL5), a structured, 5' genomic RNA element. In most alpha- and betacoronaviruses, the secondary structure of SL5 is predicted to contain a four-way junction of helical stems, some of which are capped with UUYYGU hexaloops. Here, using cryogenic electron microscopy (cryo-EM) and computational modeling with biochemically-determined secondary structures, we present three-dimensional structures of SL5 from six coronaviruses. The SL5 domain of betacoronavirus SARS-CoV-2, resolved at 4.7 Å resolution, exhibits a T-shaped structure, with its UUYYGU hexaloops at opposing ends of a coaxial stack, the T's "arms." Further analysis of SL5 domains from SARS-CoV-1 and MERS (7.1 and 6.4-6.9 Å resolution, respectively) indicates that the junction geometry and inter-hexaloop distances are conserved features across the studied human-infecting betacoronaviruses. The MERS SL5 domain displays an additional tertiary interaction, which is also observed in the non-human-infecting betacoronavirus BtCoV-HKU5 (5.9-8.0 Å resolution). SL5s from human-infecting alphacoronaviruses, HCoV-229E and HCoV-NL63 (6.5 and 8.4-9.0 Å resolution, respectively), exhibit the same coaxial stacks, including the UUYYGU-capped arms, but with a phylogenetically distinct crossing angle, an X-shape. As such, all SL5 domains studied herein fold into stable tertiary structures with cross-genus similarities, with implications for potential protein-binding modes and therapeutic targets.
Significance: The three-dimensional structures of viral RNAs are of interest to the study of viral pathogenesis and therapeutic design, but they remain poorly characterized. Here, we provide the first 3D structures of the SL5 domain (124-160 nt, 40.0-51.4 kDa) from the majority of human-infecting coronaviruses. All studied SL5s exhibit a similar 4-way junction, with their crossing angles grouped along phylogenetic boundaries. Further, across all species studied, conserved UUYYGU hexaloop pairs are located at opposing ends of a coaxial stack, suggesting that their three-dimensional arrangement is important for their as-yet-undefined function. These conserved tertiary features support the relevance of SL5 for pan-coronavirus fitness and highlight new routes in understanding its molecular and virological roles and in developing SL5-based antivirals.
Classification: Biological Sciences, Biophysics and Computational Biology.
View details for DOI 10.1101/2023.11.22.567964
View details for PubMedID 38076883
-
Assessment of three-dimensional RNA structure prediction in CASP15.
Proteins
2023
Abstract
The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report assessments of RNA structure predictions in CASP15, the first CASP exercise that involved RNA structure modeling. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and x-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global fold of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as noncanonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
View details for DOI 10.1002/prot.26602
View details for PubMedID 37876231
-
RNA target highlights in CASP15: Evaluation of predicted models by structure providers.
Proteins
2023
Abstract
The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible because of the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA-only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their x-ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include enhancing the accuracy of local interaction predictions and giving greater consideration to experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA-protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.
View details for DOI 10.1002/prot.26550
View details for PubMedID 37466021
-
New prediction categories in CASP15.
Proteins
2023
Abstract
Prediction categories in the Critical Assessment of Structure Prediction (CASP) experiments change with the need to address specific problems in structure modeling. In CASP15, four new prediction categories were introduced: RNA structure, ligand-protein complexes, accuracy of oligomeric structures and their interfaces, and ensembles of alternative conformations. This paper lists technical specifications for these categories and describes their integration in the CASP data management system.
View details for DOI 10.1002/prot.26515
View details for PubMedID 37306011
-
Assessment of three-dimensional RNA structure prediction in CASP15.
bioRxiv : the preprint server for biology
2023
Abstract
The prediction of RNA three-dimensional structures remains an unsolved problem. Here, we report double-blind assessments of RNA structure predictions in CASP15, the first CASP exercise in which RNA modeling was assessed. Forty-two predictor groups submitted models for at least one of twelve RNA-containing targets. These models were evaluated by the RNA-Puzzles organizers and, separately, by a CASP-recruited team using metrics (GDT, lDDT) and approaches (Z-score rankings) initially developed for assessment of proteins and generalized here for RNA assessment. The two assessments independently ranked the same predictor groups as first (AIchemy_RNA2), second (Chen), and third (RNAPolis and GeneSilico, tied); predictions from deep learning approaches were significantly worse than these top ranked groups, which did not use deep learning. Further analyses based on direct comparison of predicted models to cryogenic electron microscopy (cryo-EM) maps and X-ray diffraction data support these rankings. With the exception of two RNA-protein complexes, models submitted by CASP15 groups correctly predicted the global topology of the RNA targets. Comparisons of CASP15 submissions to designed RNA nanostructures as well as molecular replacement trials highlight the potential utility of current RNA modeling approaches for RNA nanotechnology and structural biology, respectively. Nevertheless, challenges remain in modeling fine details such as non-canonical pairs, in ranking among submitted models, and in prediction of multiple structures resolved by cryo-EM or crystallography.
View details for DOI 10.1101/2023.04.25.538330
View details for PubMedID 37162955
-
Ensemble-function relationships to connect structure to mechanism: application of EnsemblePDB to the serine protease reaction coordinate and its catalytic features
CELL PRESS. 2022: 441A
View details for Web of Science ID 000759523002687
-
Cryo-EM and antisense targeting of the 28-kDa frameshift stimulation element from the SARS-CoV-2 RNA genome.
Nature structural & molecular biology
2021
Abstract
Drug discovery campaigns against COVID-19 are beginning to target the SARS-CoV-2 RNA genome. The highly conserved frameshift stimulation element (FSE), required for balanced expression of viral proteins, is a particularly attractive SARS-CoV-2 RNA target. Here we present a 6.9 Å resolution cryo-EM structure of the FSE (88 nucleotides, ~28 kDa), validated through an RNA nanostructure tagging method. The tertiary structure presents a topologically complex fold in which the 5' end is threaded through a ring formed inside a three-stem pseudoknot. Guided by this structure, we develop antisense oligonucleotides that impair FSE function in frameshifting assays and knock down SARS-CoV-2 virus replication in A549-ACE2 cells at 100 nM concentration.
View details for DOI 10.1038/s41594-021-00653-y
View details for PubMedID 34426697
-
Interpretation of RNA cryo-EM maps of various resolutions
INT UNION CRYSTALLOGRAPHY. 2021: A217
View details for DOI 10.1107/S0108767321097828
View details for Web of Science ID 000720840500218
-
De novo 3D models of SARS-CoV-2 RNA elements from consensus experimental secondary structures.
Nucleic acids research
2021
Abstract
The rapid spread of COVID-19 is motivating development of antivirals targeting conserved SARS-CoV-2 molecular machinery. The SARS-CoV-2 genome includes conserved RNA elements that offer potential small-molecule drug targets, but most of their 3D structures have not been experimentally characterized. Here, we provide a compilation of chemical mapping data from our and other labs, secondary structure models, and 3D model ensembles based on Rosetta's FARFAR2 algorithm for SARS-CoV-2 RNA regions including the individual stems SL1-8 in the extended 5' UTR; the reverse complement of the 5' UTR SL1-4; the frameshift stimulating element (FSE); and the extended pseudoknot, hypervariable region, and s2m of the 3' UTR. For eleven of these elements (the stems in SL1-8, reverse complement of SL1-4, FSE, s2m and 3' UTR pseudoknot), modeling convergence supports the accuracy of predicted low energy states; subsequent cryo-EM characterization of the FSE confirms modeling accuracy. To aid efforts to discover small molecule RNA binders guided by computational models, we provide a second set of similarly prepared models for RNA riboswitches that bind small molecules. Both datasets ('FARFAR2-SARS-CoV-2', at https://github.com/DasLab/FARFAR2-SARS-CoV-2; and 'FARFAR2-Apo-Riboswitch', at https://github.com/DasLab/FARFAR2-Apo-Riboswitch) include up to 400 models for each RNA element, which may facilitate drug discovery approaches targeting dynamic ensembles of RNA molecules.
View details for DOI 10.1093/nar/gkab119
View details for PubMedID 33693814
-
Structure of hepcidin-bound ferroportin reveals iron homeostatic mechanisms.
Nature
2020
Abstract
The serum iron level in humans is tightly controlled by the action of the hormone hepcidin on the iron efflux transporter ferroportin. Hepcidin regulates iron absorption and recycling by inducing ferroportin internalization and degradation. Aberrant ferroportin activity can lead to diseases of iron overload, such as hemochromatosis, or iron limitation anemias. Here, we determined cryogenic electron microscopy (cryo-EM) structures of ferroportin in lipid nanodiscs, both in the apo state and in complex with cobalt, an iron mimetic, and hepcidin. These structures and accompanying molecular dynamics simulations identify two metal binding sites within the N- and C-domains of ferroportin. Hepcidin binds ferroportin in an outward-open conformation and completely occludes the iron efflux pathway to inhibit transport. The carboxy-terminus of hepcidin directly contacts the divalent metal in the ferroportin C-domain. We further show that hepcidin binding to ferroportin is coupled to iron binding, with an 80-fold increase in hepcidin affinity in the presence of iron. These results suggest a model for hepcidin regulation of ferroportin, where only iron-loaded ferroportin molecules are targeted for degradation. More broadly, our structural and functional insights are likely to enable more targeted manipulation of the hepcidin-ferroportin axis in disorders of iron homeostasis.
View details for DOI 10.1038/s41586-020-2668-z
View details for PubMedID 32814342
https://orcid.org/0000-0002-6935-518X