All Publications


  • SPLASH: A statistical, reference-free genomic algorithm unifies biological discovery. Cell Chaung, K., Baharav, T. Z., Henderson, G., Zheludev, I. N., Wang, P. L., Salzman, J. 2023; 186 (25): 5440-5456.e26

    Abstract

    Today's genomics workflows typically require alignment to a reference sequence, which limits discovery. We introduce a unifying paradigm, SPLASH (Statistically Primary aLignment Agnostic Sequence Homing), which directly analyzes raw sequencing data, using a statistical test to detect a signature of regulation: sample-specific sequence variation. SPLASH detects many types of variation and can be efficiently run at scale. We show that SPLASH identifies complex mutation patterns in SARS-CoV-2, discovers regulated RNA isoforms at the single-cell level, detects the vast sequence diversity of adaptive immune receptors, and uncovers biology in non-model organisms undocumented in their reference genomes: geographic and seasonal variation and diatom association in eelgrass, an oceanic plant impacted by climate change, and tissue-specific transcripts in octopus. SPLASH is a unifying approach to genomic analysis that enables expansive discovery without metadata or references.

    View details for DOI 10.1016/j.cell.2023.10.028

    View details for PubMedID 38065078

  • Tertiary folds of the SL5 RNA from the 5' proximal region of SARS-CoV-2 and related coronaviruses. bioRxiv : the preprint server for biology Kretsch, R. C., Xu, L., Zheludev, I. N., Zhou, X., Huang, R., Nye, G., Li, S., Zhang, K., Chiu, W., Das, R. 2023

    Abstract

    Coronavirus genomes sequester their start codons within stem-loop 5 (SL5), a structured, 5' genomic RNA element. In most alpha- and betacoronaviruses, the secondary structure of SL5 is predicted to contain a four-way junction of helical stems, some of which are capped with UUYYGU hexaloops. Here, using cryogenic electron microscopy (cryo-EM) and computational modeling with biochemically determined secondary structures, we present three-dimensional structures of SL5 from six coronaviruses. The SL5 domain of betacoronavirus SARS-CoV-2, resolved at 4.7 Å resolution, exhibits a T-shaped structure, with its UUYYGU hexaloops at opposing ends of a coaxial stack, the T's "arms." Further analysis of SL5 domains from SARS-CoV-1 and MERS (7.1 and 6.4-6.9 Å resolution, respectively) indicates that the junction geometry and inter-hexaloop distances are conserved features across the studied human-infecting betacoronaviruses. The MERS SL5 domain displays an additional tertiary interaction, which is also observed in the non-human-infecting betacoronavirus BtCoV-HKU5 (5.9-8.0 Å resolution). SL5s from human-infecting alphacoronaviruses, HCoV-229E and HCoV-NL63 (6.5 and 8.4-9.0 Å resolution, respectively), exhibit the same coaxial stacks, including the UUYYGU-capped arms, but with a phylogenetically distinct crossing angle, an X-shape. As such, all SL5 domains studied herein fold into stable tertiary structures with cross-genus similarities, with implications for potential protein-binding modes and therapeutic targets. Significance: The three-dimensional structures of viral RNAs are of interest to the study of viral pathogenesis and therapeutic design, but they remain poorly characterized. Here, we provide the first 3D structures of the SL5 domain (124-160 nt, 40.0-51.4 kDa) from the majority of human-infecting coronaviruses. All studied SL5s exhibit a similar 4-way junction, with their crossing angles grouped along phylogenetic boundaries. Further, across all species studied, conserved UUYYGU hexaloop pairs are located at opposing ends of a coaxial stack, suggesting that their three-dimensional arrangement is important for their as-of-yet defined function. These conserved tertiary features support the relevance of SL5 for pan-coronavirus fitness and highlight new routes in understanding its molecular and virological roles and in developing SL5-based antivirals. Classification: Biological Sciences, Biophysics and Computational Biology.

    View details for DOI 10.1101/2023.11.22.567964

    View details for PubMedID 38076883

  • RNA target highlights in CASP15: Evaluation of predicted models by structure providers. Proteins Kretsch, R. C., Andersen, E. S., Bujnicki, J. M., Chiu, W., Das, R., Luo, B., Masquida, B., McRae, E. K., Schroeder, G. M., Su, Z., Wedekind, J. E., Xu, L., Zhang, K., Zheludev, I. N., Moult, J., Kryshtafovych, A. 2023

    Abstract

    The first RNA category of the Critical Assessment of Techniques for Structure Prediction competition was only made possible because of the scientists who provided experimental structures to challenge the predictors. In this article, these scientists offer a unique and valuable analysis of both the successes and areas for improvement in the predicted models. All 10 RNA-only targets yielded predictions topologically similar to experimentally determined structures. For one target, experimentalists were able to phase their x-ray diffraction data by molecular replacement, showing a potential application of structure predictions for RNA structural biologists. Recommended areas for improvement include: enhancing the accuracy in local interaction predictions and increased consideration of the experimental conditions such as multimerization, structure determination method, and time along folding pathways. The prediction of RNA-protein complexes remains the most significant challenge. Finally, given the intrinsic flexibility of many RNAs, we propose the consideration of ensemble models.

    View details for DOI 10.1002/prot.26550

    View details for PubMedID 37466021

  • Hybrids of RNA viruses and viroid-like elements replicate in fungi. Nature communications Forgia, M., Navarro, B., Daghino, S., Cervera, A., Gisel, A., Perotto, S., Aghayeva, D. N., Akinyuwa, M. F., Gobbi, E., Zheludev, I. N., Edgar, R. C., Chikhi, R., Turina, M., Babaian, A., Di Serio, F., de la Pena, M. 2023; 14 (1): 2591

    Abstract

    Earth's life may have originated as self-replicating RNA, and it has been argued that RNA viruses and viroid-like elements are remnants of such a pre-cellular RNA world. RNA viruses are defined by linear RNA genomes encoding an RNA-dependent RNA polymerase (RdRp), whereas viroid-like elements consist of small, single-stranded, circular RNA genomes that, in some cases, encode paired self-cleaving ribozymes. Here we show that the number of candidate viroid-like elements occurring in geographically and ecologically diverse niches is much higher than previously thought. We report that, amongst these circular genomes, fungal ambiviruses are viroid-like elements that undergo rolling circle replication and encode their own viral RdRp. Thus, ambiviruses are distinct infectious RNAs showing hybrid features of viroid-like RNAs and viruses. We also detected similar circular RNAs, containing active ribozymes and encoding RdRps, related to mitochondrial-like fungal viruses, highlighting fungi as an evolutionary hub for RNA viruses and viroid-like elements. Our findings point to a deep co-evolutionary history between RNA viruses and subviral elements and offer new perspectives on the origin and evolution of primordial infectious agents, and RNA life.

    View details for DOI 10.1038/s41467-023-38301-2

    View details for PubMedID 37147358

  • Restriction Endonuclease-Based Modification-Dependent Enrichment (REMoDE) of DNA for Metagenomic Sequencing. Applied and environmental microbiology Enam, S. U., Cherry, J. L., Leonard, S. R., Zheludev, I. N., Lipman, D. J., Fire, A. Z. 2022: e0167022

    Abstract

    Metagenomic sequencing is a swift and powerful tool to ascertain the presence of an organism of interest in a sample. However, sequencing coverage of the organism of interest can be insufficient due to an inundation of reads from irrelevant organisms in the sample. Here, we report a nuclease-based approach to rapidly enrich for DNA from certain organisms, including enterobacteria, based on their differential endogenous modification patterns. We exploit the ability of taxon-specific methylated motifs to resist the action of cognate methylation-sensitive restriction endonucleases that thereby digest unwanted, unmethylated DNA. Subsequently, we use a distributive exonuclease or electrophoretic separation to deplete or exclude the digested fragments, thus enriching for undigested DNA from the organism of interest. As a proof of concept, we apply this method to enrich for the enterobacteria Escherichia coli and Salmonella enterica by 11- to 142-fold from mock metagenomic samples and validate this approach as a versatile means to enrich for genomes of interest in metagenomic samples. IMPORTANCE Pathogens that contaminate the food supply or spread through other means can cause outbreaks that bring devastating repercussions to the health of a populace. Investigations to trace the source of these outbreaks are initiated rapidly but can be drawn out due to the labored methods of pathogen isolation. Metagenomic sequencing can alleviate this hurdle but is often insufficiently sensitive. The approach and implementations detailed here provide a rapid means to enrich for many pathogens involved in foodborne outbreaks, thereby improving the utility of metagenomic sequencing as a tool in outbreak investigations. Additionally, this approach provides a means to broadly enrich for otherwise minute levels of modified DNA, which may escape unnoticed in metagenomic samples.

    View details for DOI 10.1128/aem.01670-22

    View details for PubMedID 36519847

  • A statistical, reference-free algorithm subsumes myriad problems in genome science and enables novel discovery. bioRxiv : the preprint server for biology Chaung, K., Baharav, T., Zheludev, I., Salzman, J. 2022

    Abstract

    We present a unifying statistical formulation for many fundamental problems in genome science and develop a reference-free, highly efficient algorithm that solves it. Sequence diversification - nucleic acid mutation, rearrangement, and reassortment - is necessary for the differentiation and adaptation of all replicating organisms. Identifying sample-dependent sequence diversification, e.g. adaptation or regulated isoform expression, is fundamental to many biological studies, and is achieved today with next-generation sequencing. Paradoxically, current analyses begin with attempts to align to or assemble necessarily incomplete reference genomes, a step that is at odds with detecting the most important examples of sequence diversification. In addition to being computationally expensive, reference-first approaches suffer from diminished discovery power: they are blind to unaligned or mis-aligned sequences. We provide a unifying formulation for detecting sample-dependent sequence diversification that subsumes core problems faced in diverse biological fields. This formulation allows us to construct an algorithm that performs inference on raw reads, avoiding references completely. We illustrate the power of our approach for new data-driven biological discovery with examples of novel single-cell resolved, cell-type-specific isoform expression, including expression in the major histocompatibility complex, and de novo prediction of viral protein adaptation including in SARS-CoV-2.

    View details for DOI 10.1101/2022.06.24.497555

    View details for PubMedID 35794890

  • Cryo-EM and antisense targeting of the 28-kDa frameshift stimulation element from the SARS-CoV-2 RNA genome. Nature structural & molecular biology Zhang, K., Zheludev, I. N., Hagey, R. J., Haslecker, R., Hou, Y. J., Kretsch, R., Pintilie, G. D., Rangan, R., Kladwang, W., Li, S., Wu, M. T., Pham, E. A., Bernardin-Souibgui, C., Baric, R. S., Sheahan, T. P., D'Souza, V., Glenn, J. S., Chiu, W., Das, R. 2021

    Abstract

    Drug discovery campaigns against COVID-19 are beginning to target the SARS-CoV-2 RNA genome. The highly conserved frameshift stimulation element (FSE), required for balanced expression of viral proteins, is a particularly attractive SARS-CoV-2 RNA target. Here we present a 6.9 Å resolution cryo-EM structure of the FSE (88 nucleotides, ~28 kDa), validated through an RNA nanostructure tagging method. The tertiary structure presents a topologically complex fold in which the 5' end is threaded through a ring formed inside a three-stem pseudoknot. Guided by this structure, we develop antisense oligonucleotides that impair FSE function in frameshifting assays and knock down SARS-CoV-2 virus replication in A549-ACE2 cells at 100 nM concentration.

    View details for DOI 10.1038/s41594-021-00653-y

    View details for PubMedID 34426697

  • De novo 3D models of SARS-CoV-2 RNA elements from consensus experimental secondary structures. Nucleic acids research Rangan, R., Watkins, A. M., Chacon, J., Kretsch, R., Kladwang, W., Zheludev, I. N., Townley, J., Rynge, M., Thain, G., Das, R. 2021

    Abstract

    The rapid spread of COVID-19 is motivating development of antivirals targeting conserved SARS-CoV-2 molecular machinery. The SARS-CoV-2 genome includes conserved RNA elements that offer potential small-molecule drug targets, but most of their 3D structures have not been experimentally characterized. Here, we provide a compilation of chemical mapping data from our and other labs, secondary structure models, and 3D model ensembles based on Rosetta's FARFAR2 algorithm for SARS-CoV-2 RNA regions including the individual stems SL1-8 in the extended 5' UTR; the reverse complement of the 5' UTR SL1-4; the frameshift stimulating element (FSE); and the extended pseudoknot, hypervariable region, and s2m of the 3' UTR. For eleven of these elements (the stems in SL1-8, reverse complement of SL1-4, FSE, s2m and 3' UTR pseudoknot), modeling convergence supports the accuracy of predicted low energy states; subsequent cryo-EM characterization of the FSE confirms modeling accuracy. To aid efforts to discover small molecule RNA binders guided by computational models, we provide a second set of similarly prepared models for RNA riboswitches that bind small molecules. Both datasets ('FARFAR2-SARS-CoV-2', at https://github.com/DasLab/FARFAR2-SARS-CoV-2; and 'FARFAR2-Apo-Riboswitch', at https://github.com/DasLab/FARFAR2-Apo-Riboswitch) include up to 400 models for each RNA element, which may facilitate drug discovery approaches targeting dynamic ensembles of RNA molecules.

    View details for DOI 10.1093/nar/gkab119

    View details for PubMedID 33693814

  • RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses: a first look. RNA (New York, N.Y.) Rangan, R., Zheludev, I. N., Das, R. 2020

    Abstract

    As the COVID-19 outbreak spreads, there is a growing need for a compilation of conserved RNA genome regions in the SARS-CoV-2 virus along with their structural propensities to guide development of antivirals and diagnostics. Here we present a first look at RNA sequence conservation and structural propensities in the SARS-CoV-2 genome. Using sequence alignments spanning a range of betacoronaviruses, we rank genomic regions by RNA sequence conservation, identifying 79 regions of length at least 15 nucleotides as exactly conserved over SARS-related complete genome sequences available near the beginning of the COVID-19 outbreak. We then confirm the conservation of the majority of these genome regions across 739 SARS-CoV-2 sequences subsequently reported from the COVID-19 outbreak, and we present a curated list of 30 'SARS-related-conserved' regions. We find that known RNA structured elements curated as Rfam families and in prior literature are enriched in these conserved genome regions, and we predict additional conserved, stable secondary structures across the viral genome. We provide 106 'SARS-CoV-2-conserved-structured' regions as potential targets for antivirals that bind to structured RNA. We further provide detailed secondary structure models for the extended 5' UTR, frame-shifting element, and 3' UTR. Last, we predict regions of the SARS-CoV-2 viral genome that have low propensity for RNA secondary structure and are conserved within SARS-CoV-2 strains. These 59 'SARS-CoV-2-conserved-unstructured' genomic regions may be most easily targeted in primer-based diagnostic and oligonucleotide-based therapeutic strategies.

    View details for DOI 10.1261/rna.076141.120

    View details for PubMedID 32398273

  • Accelerated cryo-EM-guided determination of three-dimensional RNA-only structures. Nature methods Kappel, K., Zhang, K., Su, Z., Watkins, A. M., Kladwang, W., Li, S., Pintilie, G., Topkar, V. V., Rangan, R., Zheludev, I. N., Yesselman, J. D., Chiu, W., Das, R. 2020; 17 (7): 699–707

    Abstract

    The discovery and design of biologically important RNA molecules is outpacing three-dimensional structural characterization. Here, we demonstrate that cryo-electron microscopy can routinely resolve maps of RNA-only systems and that these maps enable subnanometer-resolution coordinate estimation when complemented with multidimensional chemical mapping and Rosetta DRRAFTER computational modeling. This hybrid 'Ribosolve' pipeline detects and falsifies homologies and conformational rearrangements in 11 previously unknown 119- to 338-nucleotide protein-free RNA structures: full-length Tetrahymena ribozyme, hc16 ligase with and without substrate, full-length Vibrio cholerae and Fusobacterium nucleatum glycine riboswitch aptamers with and without glycine, Mycobacterium SAM-IV riboswitch with and without S-adenosylmethionine, and the computer-designed ATP-TTR-3 aptamer with and without AMP. Simulation benchmarks, blind challenges, compensatory mutagenesis, cross-RNA homologies and internal controls demonstrate that Ribosolve can accurately resolve the global architectures of RNA molecules but does not resolve atomic details. These tests offer guidelines for making inferences in future RNA structural studies with similarly accelerated throughput.

    View details for DOI 10.1038/s41592-020-0878-9

    View details for PubMedID 32616928

  • RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses. bioRxiv : the preprint server for biology Rangan, R., Zheludev, I. N., Das, R. 2020

    Abstract

    As the COVID-19 outbreak spreads, there is a growing need for a compilation of conserved RNA genome regions in the SARS-CoV-2 virus along with their structural propensities to guide development of antivirals and diagnostics. Using sequence alignments spanning a range of betacoronaviruses, we rank genomic regions by RNA sequence conservation, identifying 79 regions of length at least 15 nucleotides as exactly conserved over SARS-related complete genome sequences available near the beginning of the COVID-19 outbreak. We then confirm the conservation of the majority of these genome regions across 739 SARS-CoV-2 sequences reported to date from the current COVID-19 outbreak, and we present a curated list of 30 'SARS-related-conserved' regions. We find that known RNA structured elements curated as Rfam families and in prior literature are enriched in these conserved genome regions, and we predict additional conserved, stable secondary structures across the viral genome. We provide 106 'SARS-CoV-2-conserved-structured' regions as potential targets for antivirals that bind to structured RNA. We further provide detailed secondary structure models for the 5' UTR, frame-shifting element, and 3' UTR. Last, we predict regions of the SARS-CoV-2 viral genome that have low propensity for RNA secondary structure and are conserved within SARS-CoV-2 strains. These 59 'SARS-CoV-2-conserved-unstructured' genomic regions may be most easily targeted in primer-based diagnostic and oligonucleotide-based therapeutic strategies.

    View details for DOI 10.1101/2020.03.27.012906

    View details for PubMedID 32511306

    View details for PubMedCentralID PMC7217285