Towards Universal Cell Embeddings: Integrating Single-cell RNA-seq Datasets across Species with SATURN.
bioRxiv: the preprint server for biology
Analysis of single-cell datasets generated from diverse organisms offers unprecedented opportunities to unravel fundamental evolutionary processes of conservation and diversification of cell types. However, inter-species genomic differences limit the joint analysis of cross-species datasets to homologous genes. Here, we present SATURN, a deep learning method for learning universal cell embeddings that encodes genes' biological properties using protein language models. By coupling protein embeddings from language models with RNA expression, SATURN integrates datasets profiled from different species regardless of their genomic similarity. SATURN has a unique ability to detect functionally related genes co-expressed across species, redefining differential expression for cross-species analysis. We apply SATURN to three-species whole-organism atlases and to frog and zebrafish embryogenesis datasets. We show that cell embeddings learnt in SATURN can be effectively used to transfer annotations across species and identify both homologous and species-specific cell types, even across evolutionarily remote species. Finally, we use SATURN to reannotate the five-species Cell Atlas of Human Trabecular Meshwork and Aqueous Outflow Structures and find evidence of potentially divergent functions between glaucoma-associated genes in humans and other species.
DOI: 10.1101/2023.02.03.526939
PubMed ID: 36778387
PubMed Central ID: PMC9915700
- Machine learning predicts cellular response to genetic perturbation NATURE BIOTECHNOLOGY 2023
Predicting transcriptional outcomes of novel multigene perturbations with GEARS.
Understanding cellular responses to genetic perturbation is central to numerous biomedical applications, from identifying genetic interactions involved in cancer to developing methods for regenerative medicine. However, the combinatorial explosion in the number of possible multigene perturbations severely limits experimental interrogation. Here, we present graph-enhanced gene activation and repression simulator (GEARS), a method that integrates deep learning with a knowledge graph of gene-gene relationships to predict transcriptional responses to both single and multigene perturbations using single-cell RNA-sequencing data from perturbational screens. GEARS is able to predict outcomes of perturbing combinations consisting of genes that were never experimentally perturbed. GEARS exhibited 40% higher precision than existing approaches in predicting four distinct genetic interaction subtypes in a combinatorial perturbation screen and identified the strongest interactions twice as well as prior approaches. Overall, GEARS can predict phenotypically distinct effects of multigene perturbations and thus guide the design of perturbational experiments.
DOI: 10.1038/s41587-023-01905-6
PubMed ID: 37592036
PubMed Central ID: 9310669
- Artificial intelligence foundation for therapeutic science. Nature chemical biology 2022
- Improving Accuracy of Nuclei Segmentation by Reducing Histological Image Variability SPRINGER INTERNATIONAL PUBLISHING AG. 2018: 3–10