Zhi Huang received his Bachelor of Science degree in Automation (BS-MS straight entrance class) from the School of Electronic and Information Engineering at Xi'an Jiaotong University in June 2015. In August 2021, he received a Ph.D. in Electrical and Computer Engineering (ECE) from Purdue University.
His background is in machine learning and deep learning, computational pathology, computational biology, and bioinformatics.
From May to August 2019, he was a research intern at Philips Research North America.

Professional Education

  • Doctor of Philosophy, Purdue University (2021)
  • Bachelor of Science, Xi'an Jiaotong University (2015)

All Publications

  • A visual-language foundation model for pathology image analysis using medical Twitter. Nature medicine Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J., Zou, J. 2023


    The lack of annotated, publicly available medical images is a major barrier to computational research and educational innovation. At the same time, clinicians share many de-identified images and much knowledge on public forums such as medical Twitter. Here we harness these crowd platforms to curate OpenPath, a large dataset of 208,414 pathology images paired with natural language descriptions. We demonstrate the value of this resource by developing pathology language-image pretraining (PLIP), a multimodal artificial intelligence with both image and text understanding, which is trained on OpenPath. PLIP achieves state-of-the-art performance for classifying new pathology images across four external datasets: for zero-shot classification, PLIP achieves F1 scores of 0.565-0.832, compared to F1 scores of 0.030-0.481 for the previous contrastive language-image pretrained model. Training a simple supervised classifier on top of PLIP embeddings also achieves a 2.5% improvement in F1 score compared to using other supervised model embeddings. Moreover, PLIP enables users to retrieve similar cases by either image or natural language search, greatly facilitating knowledge sharing. Our approach demonstrates that publicly shared medical information is a tremendous resource that can be harnessed to develop medical artificial intelligence for enhancing diagnosis, knowledge sharing and education.

    View details for DOI 10.1038/s41591-023-02504-3

    View details for PubMedID 37592105

    View details for PubMedCentralID 9883475

  • Brain proteomic analysis implicates actin filament processes and injury response in resilience to Alzheimer's disease. Nature communications Huang, Z., Merrihew, G. E., Larson, E. B., Park, J., Plubell, D., Fox, E. J., Montine, K. S., Latimer, C. S., Dirk Keene, C., Zou, J. Y., MacCoss, M. J., Montine, T. J. 2023; 14 (1): 2747


    Resilience to Alzheimer's disease is an uncommon combination of high disease burden without dementia that offers valuable insights into limiting clinical impact. Here we assessed 43 research participants meeting stringent criteria (11 healthy controls, 12 with resilience to Alzheimer's disease, and 20 with Alzheimer's disease dementia) and analyzed matched isocortical regions, hippocampus, and caudate nucleus by mass spectrometry-based proteomics. Of 7115 differentially expressed soluble proteins, lower isocortical and hippocampal soluble Aβ levels are a significant feature of resilience when compared to the healthy control and Alzheimer's disease dementia groups. Protein co-expression analysis reveals 181 densely interacting proteins significantly associated with resilience that are enriched for actin filament-based processes, cellular detoxification, and wound healing in isocortex and hippocampus, further supported by four validation cohorts. Our results suggest that lowering soluble Aβ concentration may suppress severe cognitive impairment along the Alzheimer's disease continuum. The molecular basis of resilience likely holds important therapeutic insights.

    View details for DOI 10.1038/s41467-023-38376-x

    View details for PubMedID 37173305

    View details for PubMedCentralID 3266529

  • Artificial intelligence reveals features associated with breast cancer neoadjuvant chemotherapy responses from multi-stain histopathologic images. NPJ precision oncology Huang, Z., Shao, W., Han, Z., Alkashash, A. M., De la Sancha, C., Parwani, A. V., Nitta, H., Hou, Y., Wang, T., Salama, P., Rizkalla, M., Zhang, J., Huang, K., Li, Z. 2023; 7 (1): 14


    Advances in computational algorithms and tools have made the prediction of cancer patient outcomes using computational pathology feasible. However, predicting clinical outcomes from pre-treatment histopathologic images remains a challenging task, limited by the poor understanding of tumor immune micro-environments. In this study, an automatic, accurate, comprehensive, interpretable, and reproducible whole slide image (WSI) feature extraction pipeline known as IMage-based Pathological REgistration and Segmentation Statistics (IMPRESS) is described. Using both H&E and multiplex IHC (PD-L1, CD8+, and CD163+) images, we investigated whether artificial intelligence (AI)-based algorithms using automatic feature extraction methods can predict neoadjuvant chemotherapy (NAC) outcomes in HER2-positive (HER2+) and triple-negative breast cancer (TNBC) patients. Features are derived from the tumor immune micro-environment and clinical data and used to train machine learning models to accurately predict the response to NAC in breast cancer patients (HER2+ AUC=0.8975; TNBC AUC=0.7674). The results demonstrate that this method outperforms models trained on features manually generated by pathologists. The developed image features and algorithms were further externally validated by independent cohorts, yielding encouraging results, especially for the HER2+ subtype.

    View details for DOI 10.1038/s41698-023-00352-5

    View details for PubMedID 36707660

  • Systematic pan-cancer analysis of mutation-treatment interactions using large real-world clinicogenomics data. Nature medicine Liu, R., Rizzo, S., Waliany, S., Garmhausen, M. R., Pal, N., Huang, Z., Chaudhary, N., Wang, L., Harbron, C., Neal, J., Copping, R., Zou, J. 2022


    Quantifying the effectiveness of different cancer therapies in patients with specific tumor mutations is critical for improving patient outcomes and advancing precision medicine. Here we perform a large-scale computational analysis of 40,903 US patients with cancer who have detailed mutation profiles, treatment sequences and outcomes derived from electronic health records. We systematically identify 458 mutations that predict the survival of patients on specific immunotherapies, chemotherapy agents or targeted therapies across eight common cancer types. We further characterize mutation-mutation interactions that impact the outcomes of targeted therapies. This work demonstrates how computational analysis of large real-world data generates insights, hypotheses and resources to enable precision oncology.

    View details for DOI 10.1038/s41591-022-01873-5

    View details for PubMedID 35773542

  • TSUNAMI: Translational Bioinformatics Tool Suite for Network Analysis and Mining. Genomics, Proteomics & Bioinformatics Huang, Z., Han, Z., Wang, T., Shao, W., Xiang, S., Salama, P., Rizkalla, M., Huang, K., Zhang, J. 2021; 19 (6): 1023-1031


    Gene co-expression network (GCN) mining identifies gene modules with highly correlated expression profiles across samples/conditions. It enables researchers to discover latent gene/molecule interactions, identify novel gene functions, and extract molecular features from certain disease/condition groups, thus helping to identify disease biomarkers. However, there is no easy-to-use tool package for mining GCN modules that are relatively small and tightly connected, which makes them convenient for downstream gene set enrichment analysis, as well as modules that may share common members. To address this need, we developed an online GCN mining tool package: TSUNAMI (Tools SUite for Network Analysis and MIning). TSUNAMI incorporates our state-of-the-art lmQCM algorithm to mine GCN modules for both public and user-input data (microarray, RNA-seq, or any other numerical omics data), and then performs downstream gene set enrichment analysis for the identified modules. It has several features and advantages: 1) a user-friendly interface and real-time co-expression network mining through a web server; 2) direct access and search of the NCBI Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA) databases, as well as user-input gene expression matrices, for GCN module mining; 3) multiple co-expression analysis tools to choose from, all of which are highly flexible with regard to parameter selection options; 4) identified GCN modules are summarized as eigengenes, making it convenient for users to check their correlation with other clinical traits; 5) integrated downstream Enrichr enrichment analysis and links to other gene set enrichment tools; and 6) visualization of gene loci by Circos plot at any step of the process. The web service is freely accessible through URL: Source code is available at

    View details for DOI 10.1016/j.gpb.2019.05.006

    View details for Web of Science ID 000847852700013

    View details for PubMedID 33705981

    View details for PubMedCentralID PMC9403021