All Publications

  • Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models. Nature biomedical engineering Carrillo-Perez, F., Pizurica, M., Zheng, Y., Nandi, T. N., Madduri, R., Shen, J., Gevaert, O. 2024


    Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.

    View details for DOI 10.1038/s41551-024-01193-8

    View details for PubMedID 38514775

  • Digital profiling of cancer transcriptomes from histology images with grouped vision attention. bioRxiv : the preprint server for biology Zheng, Y., Pizurica, M., Carrillo-Perez, F., Noor, H., Yao, W., Wohlfart, C., Marchal, K., Vladimirova, A., Gevaert, O. 2023


    Cancer is a heterogeneous disease that demands precise molecular profiling for better understanding and management. RNA-sequencing has emerged as a potent tool to unravel the transcriptional heterogeneity. However, large-scale characterization of cancer transcriptomes is hindered by the limitations of costs and tissue accessibility. Here, we develop SEQUOIA, a deep learning model employing a transformer architecture to predict cancer transcriptomes from whole-slide histology images. We pre-train the model using data from 2,242 normal tissues, and the model is fine-tuned and evaluated in 4,218 tumor samples across nine cancer types. The results are further validated across two independent cohorts comprising 1,305 tumors. The highest performance was observed in cancers from breast, kidney and lung, where SEQUOIA accurately predicted 13,798, 10,922 and 9,735 genes, respectively. The well-predicted genes are associated with the regulation of inflammatory response, cell cycles and hypoxia-related metabolic pathways. Leveraging the well-predicted genes, we develop a digital signature to predict the risk of recurrence in breast cancer. While the model is trained at the tissue level, we showcase its potential in predicting spatial gene expression patterns using spatial transcriptomics datasets. SEQUOIA deciphers clinically relevant gene expression patterns from histology images, opening avenues for improved cancer management and personalized therapies.

    View details for DOI 10.1101/2023.09.28.560068

    View details for PubMedID 37808782

  • EpiMix is an integrative tool for epigenomic subtyping using DNA methylation. Cell reports methods Zheng, Y., Jun, J., Brennan, K., Gevaert, O. 2023; 3 (7): 100515


    DNA methylation (DNAme) is a major epigenetic factor influencing gene expression with alterations leading to cancer and immunological and cardiovascular diseases. Recent technological advances have enabled genome-wide profiling of DNAme in large human cohorts. There is a need for analytical methods that can more sensitively detect differential methylation profiles present in subsets of individuals from these heterogeneous, population-level datasets. We developed an end-to-end analytical framework named "EpiMix" for population-level analysis of DNAme and gene expression. Compared with existing methods, EpiMix showed higher sensitivity in detecting abnormal DNAme that was present in only small patient subsets. We extended the model-based analyses of EpiMix to cis-regulatory elements within protein-coding genes, distal enhancers, and genes encoding microRNAs and long non-coding RNAs (lncRNAs). Using cell-type-specific data from two separate studies, we discover epigenetic mechanisms underlying childhood food allergy and survival-associated, methylation-driven ncRNAs in non-small cell lung cancer.

    View details for DOI 10.1016/j.crmeth.2023.100515

    View details for PubMedID 37533639

    View details for PubMedCentralID PMC10391348

  • Spatial cellular architecture predicts prognosis in glioblastoma. Nature communications Zheng, Y., Carrillo-Perez, F., Pizurica, M., Heiland, D. H., Gevaert, O. 2023; 14 (1): 4122


    Intra-tumoral heterogeneity and cell-state plasticity are key drivers for the therapeutic resistance of glioblastoma. Here, we investigate the association between spatial cellular organization and glioblastoma prognosis. Leveraging single-cell RNA-seq and spatial transcriptomics data, we develop a deep learning model to predict transcriptional subtypes of glioblastoma cells from histology images. Employing this model, we phenotypically analyze 40 million tissue spots from 410 patients and identify consistent associations between tumor architecture and prognosis across two independent cohorts. Patients with poor prognosis exhibit higher proportions of tumor cells expressing a hypoxia-induced transcriptional program. Furthermore, a clustering pattern of astrocyte-like tumor cells is associated with worse prognosis, while dispersion and connection of the astrocytes with other transcriptional subtypes correlate with decreased risk. To validate these results, we develop a separate deep learning model that utilizes histology images to predict prognosis. Applying this model to spatial transcriptomics data reveals survival-associated regional gene expression programs. Overall, our study presents a scalable approach to unravel the transcriptional heterogeneity of glioblastoma and establishes a critical connection between spatial cellular architecture and clinical outcomes.

    View details for DOI 10.1038/s41467-023-39933-0

    View details for PubMedID 37433817

    View details for PubMedCentralID PMC10336135

  • Multimodal deep learning to predict prognosis in adult and pediatric brain tumors. Communications medicine Steyaert, S., Qiu, Y. L., Zheng, Y., Mukherjee, P., Vogel, H., Gevaert, O. 2023; 3 (1): 44


    The introduction of deep learning in both imaging and genomics has significantly advanced the analysis of biomedical data. For complex diseases such as cancer, different data modalities may reveal different disease characteristics, and the integration of imaging with genomic data has the potential to reveal more information than using these data sources in isolation. Here, we propose a DL framework that combines these two modalities with the aim of predicting brain tumor prognosis. Using two separate glioma cohorts of 783 adults and 305 pediatric patients, we developed a DL framework that can fuse histopathology images with gene expression profiles. Three strategies for data fusion were implemented and compared: early, late, and joint fusion. Additional validation of the adult glioma models was done on an independent cohort of 97 adult patients. Here we show that the developed multimodal data models not only achieve better prediction results than the single data models, but also lead to the identification of more relevant biological pathways. When testing our adult models on a third brain tumor dataset, we show that our multimodal framework is able to generalize and performs better on new data from different cohorts. Leveraging the concept of transfer learning, we demonstrate how our pediatric multimodal models can be used to predict prognosis for two rarer pediatric brain tumors with fewer available samples. Our study illustrates that a multimodal data fusion approach can be successfully implemented and customized to model clinical outcome of adult and pediatric brain tumors.

    View details for DOI 10.1038/s43856-023-00276-y

    View details for PubMedID 36991216

    View details for PubMedCentralID 5563115

  • A deep-learning algorithm to classify skin lesions from mpox virus infection. Nature medicine Thieme, A. H., Zheng, Y., Machiraju, G., Sadee, C., Mittermaier, M., Gertler, M., Salinas, J. L., Srinivasan, K., Gyawali, P., Carrillo-Perez, F., Capodici, A., Uhlig, M., Habenicht, D., Loser, A., Kohler, M., Schuessler, M., Kaul, D., Gollrad, J., Ma, J., Lippert, C., Billick, K., Bogoch, I., Hernandez-Boussard, T., Geldsetzer, P., Gevaert, O. 2023


    Undetected infection and delayed isolation of infected individuals are key factors driving the monkeypox virus (now termed mpox virus or MPXV) outbreak. To enable earlier detection of MPXV infection, we developed an image-based deep convolutional neural network (named MPXV-CNN) for the identification of the characteristic skin lesions caused by MPXV. We assembled a dataset of 139,198 skin lesion images, split into training/validation and testing cohorts, comprising non-MPXV images (n=138,522) from eight dermatological repositories and MPXV images (n=676) from the scientific literature, news articles, social media and a prospective cohort of the Stanford University Medical Center (n=63 images from 12 patients, all male). In the validation and testing cohorts, the sensitivity of the MPXV-CNN was 0.83 and 0.91, the specificity was 0.965 and 0.898 and the area under the curve was 0.967 and 0.966, respectively. In the prospective cohort, the sensitivity was 0.89. The classification performance of the MPXV-CNN was robust across various skin tones and body regions. To facilitate the usage of the algorithm, we developed a web-based app by which the MPXV-CNN can be accessed for patient guidance. The capability of the MPXV-CNN for identifying MPXV lesions has the potential to aid in MPXV outbreak mitigation.

    View details for DOI 10.1038/s41591-023-02225-7

    View details for PubMedID 36864252

  • Early Dietary Exposures Epigenetically Program Mammary Cancer Susceptibility through Igf1-Mediated Expansion of the Mammary Stem Cell Compartment. Cells Zheng, Y., Luo, L., Lambertz, I. U., Conti, C. J., Fuchs-Young, R. 2022; 11 (16)


    Diet is a critical environmental factor affecting breast cancer risk, and recent evidence shows that dietary exposures during early development can affect lifetime mammary cancer susceptibility. To elucidate the underlying mechanisms, we used our established crossover feeding mouse model, where exposure to a high-fat and high-sugar (HFHS) diet during defined developmental windows determines mammary tumor incidence and latency in carcinogen-treated mice. Mammary tumor incidence is significantly increased in mice receiving a HFHS post-weaning diet (high-tumor mice, HT) compared to those receiving a HFHS diet during gestation (low-tumor mice, LT). The current study revealed that the mammary stem cell (MaSC) population was significantly increased in mammary glands from HT compared to LT mice. Igf1 expression was increased in mammary stromal cells from HT mice, where it promoted MaSC self-renewal. The increased Igf1 expression was induced by DNA hypomethylation of the Igf1 Pr1 promoter, mediated by a decrease in Dnmt3b levels. Mammary tissues from HT mice also had reduced levels of Igfbp5, leading to increased bioavailability of tissue Igf1. This study provides novel insights into how early dietary exposures program mammary cancer risk, demonstrating that effective dietary intervention can reduce mammary cancer incidence.

    View details for DOI 10.3390/cells11162558

    View details for PubMedID 36010633

    View details for PubMedCentralID PMC9406400

  • Overexpression of IGF-1 During Early Development Expands the Number of Mammary Stem Cells and Primes them for Transformation. Stem cells (Dayton, Ohio) Luo, L., Santos, A., Konganti, K., Hillhouse, A., Lambertz, I. U., Zheng, Y., Gunaratna, R. T., Threadgill, D. W., Fuchs-Young, R. S. 2022; 40 (3): 273-289


    Insulin-like growth factor I (IGF-1) has been implicated in breast cancer due to its mitogenic and anti-apoptotic effects. Despite substantial research on the role of IGF-1 in tumor progression, the relationship of IGF-1 to tissue stem cells, particularly in mammary tissue, and the resulting tumor susceptibility has not been elucidated. Previous studies with the BK5.IGF-1 transgenic (Tg) mouse model reveal that IGF-1 does not act as a classical, post-carcinogen tumor promoter in the mammary gland. Pre-pubertal Tg mammary glands display increased numbers and enlarged sizes of terminal end buds, a niche for mammary stem cells (MaSCs). Here we show that MaSCs from both wild-type (WT) and Tg mice expressed IGF-1R and that overexpression of Tg IGF-1 increased numbers of MaSCs by undergoing symmetric division, resulting in an expansion of the MaSC and luminal progenitor (LP) compartments in pre-pubertal female mice. This expansion was maintained post-pubertally and validated by mammosphere assays in vitro and transplantation assays in vivo. The addition of recombinant IGF-1 promoted, and IGF-1R downstream inhibitors decreased, mammosphere formation. Single-cell transcriptomic profiles generated from 2 related platforms reveal that IGF-1 stimulated quiescent MaSCs to enter the cell cycle and increased their expression of genes involved in proliferation, plasticity, tumorigenesis, invasion, and metastasis. This study identifies a novel, pro-tumorigenic mechanism, where IGF-1 increases the number of transformation-susceptible carcinogen targets during the early stages of mammary tissue development, and "primes" their gene expression profiles for transformation.

    View details for DOI 10.1093/stmcls/sxab018

    View details for PubMedID 35356986