All Publications


  • Toward universal cell embeddings: integrating single-cell RNA-seq datasets across species with SATURN. Nature methods Rosen, Y., Brbić, M., Roohani, Y., Swanson, K., Li, Z., Leskovec, J. 2024

    Abstract

    Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, interspecies genomic differences limit the joint analysis of cross-species datasets to homologous genes. Here we present SATURN, a deep learning method for learning universal cell embeddings that encodes genes' biological properties using protein language models. By coupling protein embeddings from language models with RNA expression, SATURN integrates datasets profiled from different species regardless of their genomic similarity. SATURN can detect functionally related genes coexpressed across species, redefining differential expression for cross-species analysis. Applying SATURN to three-species whole-organism atlases and frog and zebrafish embryogenesis datasets, we show that SATURN can effectively transfer annotations across species, even when they are evolutionarily remote. We also demonstrate that SATURN can be used to find potentially divergent gene functions between glaucoma-associated genes in humans and four other species.

    View details for DOI 10.1038/s41592-024-02191-z

    View details for PubMedID 38366243

    View details for PubMedCentralID 5762154

  • Towards Universal Cell Embeddings: Integrating Single-cell RNA-seq Datasets across Species with SATURN. bioRxiv : the preprint server for biology Rosen, Y., Brbić, M., Roohani, Y., Swanson, K., Li, Z., Leskovec, J. 2023

    Abstract

    Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, inter-species genomic differences limit the joint analysis of cross-species datasets to homologous genes. Here, we present SATURN, a deep learning method for learning universal cell embeddings that encodes genes' biological properties using protein language models. By coupling protein embeddings from language models with RNA expression, SATURN integrates datasets profiled from different species regardless of their genomic similarity. SATURN has a unique ability to detect functionally related genes co-expressed across species, redefining differential expression for cross-species analysis. We apply SATURN to three-species whole-organism atlases and frog and zebrafish embryogenesis datasets. We show that cell embeddings learnt in SATURN can be effectively used to transfer annotations across species and identify both homologous and species-specific cell types, even across evolutionarily remote species. Finally, we use SATURN to reannotate the five-species Cell Atlas of Human Trabecular Meshwork and Aqueous Outflow Structures and find evidence of potentially divergent functions between glaucoma-associated genes in humans and other species.

    View details for DOI 10.1101/2023.02.03.526939

    View details for PubMedID 36778387

    View details for PubMedCentralID PMC9915700

  • Machine learning predicts cellular response to genetic perturbation. Nature biotechnology Roohani, Y., Leskovec, J. 2023

    View details for DOI 10.1038/s41587-023-01907-4

    View details for Web of Science ID 001050053500001

    View details for PubMedID 37592037

    View details for PubMedCentralID 5181115

  • Predicting transcriptional outcomes of novel multigene perturbations with GEARS. Nature biotechnology Roohani, Y., Huang, K., Leskovec, J. 2023

    Abstract

    Understanding cellular responses to genetic perturbation is central to numerous biomedical applications, from identifying genetic interactions involved in cancer to developing methods for regenerative medicine. However, the combinatorial explosion in the number of possible multigene perturbations severely limits experimental interrogation. Here, we present graph-enhanced gene activation and repression simulator (GEARS), a method that integrates deep learning with a knowledge graph of gene-gene relationships to predict transcriptional responses to both single and multigene perturbations using single-cell RNA-sequencing data from perturbational screens. GEARS is able to predict outcomes of perturbing combinations consisting of genes that were never experimentally perturbed. GEARS exhibited 40% higher precision than existing approaches in predicting four distinct genetic interaction subtypes in a combinatorial perturbation screen and identified the strongest interactions twice as well as prior approaches. Overall, GEARS can predict phenotypically distinct effects of multigene perturbations and thus guide the design of perturbational experiments.

    View details for DOI 10.1038/s41587-023-01905-6

    View details for PubMedID 37592036

    View details for PubMedCentralID 9310669

  • Artificial intelligence foundation for therapeutic science. Nature chemical biology Huang, K., Fu, T., Gao, W., Zhao, Y., Roohani, Y., Leskovec, J., Coley, C. W., Xiao, C., Sun, J., Zitnik, M. 2022

    View details for DOI 10.1038/s41589-022-01131-2

    View details for PubMedID 36131149

  • Improving Accuracy of Nuclei Segmentation by Reducing Histological Image Variability. Roohani, Y. H., Kiss, E. G., Stoyanov, D., Taylor, Z., Ciompi, F., Xu, Y. Springer International Publishing AG. 2018: 3–10