All Publications


  • The literacy barrier in clinical trial consents: a retrospective analysis. EClinicalMedicine Mirza, F. N., Wu, E., Abdulrazeq, H. F., Connolly, I. D., Tang, O. Y., Zogg, C. K., Williamson, T., Galamaga, P. F., Roye, G., Sampath, P., Telfeian, A. E., Qureshi, A. A., Groff, M. W., Shin, J. H., Asaad, W. F., Libby, T. J., Gokaslan, Z. L., Kohane, I. S., Zou, J., Ali, R. 2024; 75
  • Discovery and generalization of tissue structures from spatial omics data. Cell reports methods Wu, Z., Kondo, A., McGrady, M., Baker, E. A., Chidester, B., Wu, E., Rahim, M. K., Bracey, N. A., Charu, V., Cho, R. J., Cheng, J. B., Afkarian, M., Zou, J., Mayer, A. T., Trevino, A. E. 2024: 100838

    Abstract

    Tissues are organized into anatomical and functional units at different scales. New technologies for high-dimensional molecular profiling in situ have enabled the characterization of structure-function relationships in increasing molecular detail. However, it remains a challenge to consistently identify key functional units across experiments, tissues, and disease contexts, a task that demands extensive manual annotation. Here, we present spatial cellular graph partitioning (SCGP), a flexible method for the unsupervised annotation of tissue structures. We further present a reference-query extension pipeline, SCGP-Extension, that generalizes reference tissue structure labels to previously unseen samples, performing data integration and tissue structure discovery. Our experiments demonstrate reliable, robust partitioning of spatial data in a wide variety of contexts and best-in-class accuracy in identifying expertly annotated structures. Downstream analysis on SCGP-identified tissue structures reveals disease-relevant insights regarding diabetic kidney disease, skin disorder, and neoplastic diseases, underscoring its potential to drive biological insight and discovery from spatial datasets.

    View details for DOI 10.1016/j.crmeth.2024.100838

    View details for PubMedID 39127044

  • Systematic analysis of 32,111 AI model cards characterizes documentation practice in AI. Nature machine intelligence Liang, W., Rajani, N., Yang, X., Ozoani, E., Wu, E., Chen, Y., Smith, D., Zou, J. 2024
  • PEPSI: Polarity measurements from spatial proteomics imaging suggest immune cell engagement. Pacific Symposium on Biocomputing Wu, E., Wu, Z., Mayer, A. T., Trevino, A. E., Zou, J. 2024; 29: 492-505

    Abstract

    Subcellular protein localization is important for understanding functional states of cells, but measuring and quantifying this information can be difficult and typically requires high-resolution microscopy. In this work, we develop a metric to define surface protein polarity from immunofluorescence (IF) imaging data and use it to identify distinct immune cell states within tumor microenvironments. We apply this metric to characterize over two million cells across 600 patient samples and find that cells identified as having polar expression exhibit characteristics relating to tumor-immune cell engagement. Additionally, we show that incorporating these polarity-defined cell subtypes improves the performance of deep learning models trained to predict patient survival outcomes. This method provides a first look at using subcellular protein expression patterns to phenotype immune cell functional states with applications to precision medicine.

    View details for PubMedID 38160302

  • GPT detectors are biased against non-native English writers. Patterns (New York, N.Y.) Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., Zou, J. 2023; 4 (7): 100779

    Abstract

    GPT detectors frequently misclassify non-native English writing as AI generated, raising concerns about fairness and robustness. Addressing the biases in these detectors is crucial to prevent the marginalization of non-native English speakers in evaluative and educational settings and to create a more equitable digital landscape.

    View details for DOI 10.1016/j.patter.2023.100779

    View details for PubMedID 37521038

  • 7-UP: Generating in silico CODEX from a small set of immunofluorescence markers. PNAS nexus Wu, E., Trevino, A. E., Wu, Z., Swanson, K., Kim, H. J., D'Angio, H. B., Preska, R., Chiou, A. E., Charville, G. W., Dalerba, P., Duvvuri, U., Colevas, A. D., Levi, J., Bedi, N., Chang, S., Sunwoo, J., Egloff, A. M., Uppaluri, R., Mayer, A. T., Zou, J. 2023; 2 (6): pgad171

    Abstract

    Multiplex immunofluorescence (mIF) assays multiple protein biomarkers on a single tissue section. Recent high-plex CODEX (co-detection by indexing) systems enable simultaneous imaging of 40+ protein biomarkers, unlocking more detailed molecular phenotyping and leading to richer insights into cellular interactions and disease. However, high-plex data can be slower and more costly to collect, limiting its applications, especially in clinical settings. We propose a machine learning framework, 7-UP, that can computationally generate in silico 40-plex CODEX at single-cell resolution from a standard 7-plex mIF panel by leveraging cellular morphology. We demonstrate the usefulness of the imputed biomarkers in accurately classifying cell types and predicting patient survival outcomes. Furthermore, 7-UP's imputations generalize well across samples from different clinical sites and cancer types. 7-UP opens the possibility of in silico CODEX, making insights from high-plex mIF more widely available.

    View details for DOI 10.1093/pnasnexus/pgad171

    View details for PubMedID 37275261

    View details for PubMedCentralID PMC10236358

  • Leveraging Physiology and Artificial Intelligence to Deliver Advancements in Healthcare. Physiological reviews Zhang, A., Wu, Z., Wu, E., Wu, M., Snyder, M. P., Zou, J., Wu, J. C. 2023

    Abstract

    Artificial Intelligence (AI) in healthcare has generated remarkable innovation and progress in the last decade. Significant advancements can be attributed to the utilization of AI to transform physiology data to advance healthcare. In this review, we will explore how past work has shaped the field and defined future challenges and directions. In particular, we focus on three areas of development. First, we give an overview of AI, with special attention to the most relevant AI models. We then detail how physiology data has been harnessed by AI to advance the main areas of healthcare such as automating existing healthcare tasks, increasing access to care, and augmenting healthcare capabilities. Finally, we discuss emerging concerns surrounding the use of individual physiology data and detail an increasingly important consideration for the field, namely the challenges of deploying AI models to achieve meaningful clinical impact.

    View details for DOI 10.1152/physrev.00033.2022

    View details for PubMedID 37104717

  • From patterns to patients: Advances in clinical machine learning for cancer diagnosis, prognosis, and treatment. Cell Swanson, K., Wu, E., Zhang, A., Alizadeh, A. A., Zou, J. 2023

    Abstract

    Machine learning (ML) is increasingly used in clinical oncology to diagnose cancers, predict patient outcomes, and inform treatment planning. Here, we review recent applications of ML across the clinical oncology workflow. We review how these techniques are applied to medical imaging and to molecular data obtained from liquid and solid tumor biopsies for cancer diagnosis, prognosis, and treatment design. We discuss key considerations in developing ML for the distinct challenges posed by imaging and molecular data. Finally, we examine ML models approved for cancer-related patient usage by regulatory agencies and discuss approaches to improve the clinical usefulness of ML.

    View details for DOI 10.1016/j.cell.2023.01.035

    View details for PubMedID 36905928

  • Graph deep learning for the characterization of tumour microenvironments from spatial protein profiles in tissue specimens. Nature biomedical engineering Wu, Z., Trevino, A. E., Wu, E., Swanson, K., Kim, H. J., D'Angio, H. B., Preska, R., Charville, G. W., Dalerba, P. D., Egloff, A. M., Uppaluri, R., Duvvuri, U., Mayer, A. T., Zou, J. 2022

    Abstract

    Multiplexed immunofluorescence imaging allows the multidimensional molecular profiling of cellular environments at subcellular resolution. However, identifying and characterizing disease-relevant microenvironments from these rich datasets is challenging. Here we show that a graph neural network that leverages spatial protein profiles in tissue specimens to model tumour microenvironments as local subgraphs captures distinctive cellular interactions associated with differential clinical outcomes. We applied this spatial cellular-graph strategy to specimens of human head-and-neck and colorectal cancers assayed with 40-plex immunofluorescence imaging to identify spatial motifs associated with cancer recurrence and with patient survival after treatment. The graph deep learning model was substantially more accurate in predicting patient outcomes than deep learning approaches that model spatial data on the basis of the local composition of cell types, and it generated insights into the effect of the spatial compartmentalization of tumour cells and granulocytes on patient prognosis. Local graphs may also aid in the analysis of disease-relevant motifs in histology samples characterized via spatial transcriptomics and other -omics techniques.

    View details for DOI 10.1038/s41551-022-00951-w

    View details for PubMedID 36357512

  • Machine Learning Prediction of Clinical Trial Operational Efficiency. The AAPS journal Wu, K., Wu, E., DAndrea, M., Chitale, N., Lim, M., Dabrowski, M., Kantor, K., Rangi, H., Liu, R., Garmhausen, M., Pal, N., Harbron, C., Rizzo, S., Copping, R., Zou, J. 2022; 24 (3): 57

    Abstract

    Clinical trials are the gatekeepers and bottlenecks of progress in medicine. In recent years, they have become increasingly complex and expensive, driven by a growing number of stakeholders requiring more endpoints, more diverse patient populations, and a stringent regulatory environment. Trial designers have historically relied on investigator expertise and legacy norms established within sponsor companies to improve operational efficiency while achieving study goals. As such, data-driven forecasts of operational metrics can be a useful resource for trial design and planning. We develop a machine learning model to predict clinical trial operational efficiency using a novel dataset from Roche containing over 2,000 clinical trials across 20 years and multiple disease areas. The data includes important operational metrics related to patient recruitment and trial duration, as well as a variety of trial features such as the number of procedures, eligibility criteria, and endpoints. Our results demonstrate that operational efficiency can be predicted robustly using trial features, which can provide useful insights to trial designers on the potential impact of their decisions on patient recruitment success and trial duration.

    View details for DOI 10.1208/s12248-022-00703-3

    View details for PubMedID 35449371

  • How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nature medicine Wu, E., Wu, K., Daneshjou, R., Ouyang, D., Ho, D. E., Zou, J. 2021

    View details for DOI 10.1038/s41591-021-01312-x

    View details for PubMedID 33820998

  • Robust breast cancer detection in mammography and digital breast tomosynthesis using an annotation-efficient deep learning approach. Nature medicine Lotter, W., Diab, A. R., Haslam, B., Kim, J. G., Grisot, G., Wu, E., Wu, K., Onieva, J. O., Boyer, Y., Boxerman, J. L., Wang, M., Bandler, M., Vijayaraghavan, G. R., Sorensen, A. G. 2021

    Abstract

    Breast cancer remains a global challenge, causing over 600,000 deaths in 2018 (ref. 1). To achieve earlier cancer detection, health organizations worldwide recommend screening mammography, which is estimated to decrease breast cancer mortality by 20-40% (refs. 2,3). Despite the clear value of screening mammography, significant false positive and false negative rates along with non-uniformities in expert reader availability leave opportunities for improving quality and access (refs. 4,5). To address these limitations, there has been much recent interest in applying deep learning to mammography (refs. 6-18), and these efforts have highlighted two key difficulties: obtaining large amounts of annotated training data and ensuring generalization across populations, acquisition equipment and modalities. Here we present an annotation-efficient deep learning approach that (1) achieves state-of-the-art performance in mammogram classification, (2) successfully extends to digital breast tomosynthesis (DBT; '3D mammography'), (3) detects cancers in clinically negative prior mammograms of patients with cancer, (4) generalizes well to a population with low screening rates and (5) outperforms five out of five full-time breast-imaging specialists with an average increase in sensitivity of 14%. By creating new 'maximum suspicion projection' (MSP) images from DBT data, our progressively trained, multiple-instance learning approach effectively trains on DBT exams using only breast-level labels while maintaining localization-based interpretability. Altogether, our results demonstrate promise towards software that can improve the accuracy of and access to screening mammography worldwide.

    View details for DOI 10.1038/s41591-020-01174-9

    View details for PubMedID 33432172