Kai Zhang's Profile | Stanford Profiles

Bio

Kai Zhang is a computer scientist dedicated to advancing artificial intelligence in healthcare. Having earned his Ph.D. from Lehigh University, he is joining Stanford Medicine as a postdoctoral scholar. His research bridges multimodal learning and self-improving AI to build robust, interpretable foundation models. Through his work, he aims to create adaptive systems that reliably drive biomedical discovery and empower real-world clinical decisions.

Stanford Advisors

Xianjin Dai, Postdoctoral Faculty Sponsor

Contact

Academic
zkai@stanford.edu

University - Scholar
Department: Radiation Oncology
Position: Postdoctoral Scholar

Additional Info

ORCID:
https://orcid.org/0000-0002-6322-6096

Current Research and Scholarly Interests

My research develops AI systems for biomedicine, with a focus on multimodal learning, foundation models, and self-improving AI. I study how models can integrate medical images, clinical text, EHRs, and biomedical knowledge to support diagnosis, clinical workflows, and scientific discovery, while improving through feedback, evaluation, and human-AI interaction.

All Publications

Interpretable agentic AI system with localized reasoning for radiology. NPJ digital medicine Chen, W., Dong, Y., Ding, Z., Shi, Y., Zeng, F., Zhou, Y., Zhao, H., Luo, Y., Lin, T., Su, Y., Wu, Y., Liu, J., Zhang, K., Wang, W., Xiang, Z., Liu, T., Liu, N., Li, Q., Sun, L., Yuan, Y., Li, X. 2026

Abstract

Medical AI has produced many radiology models, particularly for chest X-rays (CXR), each excelling at isolated tasks like lesion detection or report generation. However, these models have disparate capabilities and limited generalizability due to training on restricted datasets, making clinical integration challenging. Large language models (LLMs) now enable interfacing heterogeneous models within agentic frameworks that automatically interpret and unify outputs in natural language. In this work, we present RadFabric, an agentic AI system that orchestrates fourteen specialized open-source CXR analytics models and two Vision-Language Models (VLMs) through a modular protocol. RadFabric includes an Anatomical Interpretation Agent that grounds visual findings in anatomical context, and a trainable reasoning agent that synthesizes these anatomically-enriched outputs with VLM-generated radiology reports into transparent, step-by-step diagnoses, even when model outputs are heterogeneous or conflicting. This architecture enables explainable, robust diagnoses across common and rare pathologies while facilitating extensibility through additional agents. Evaluation results on the MIMIC-CXR dataset show that RadFabric can achieve an AUC of 85.18% on the task of detecting different lesion types from a given CXR, outperforming all state-of-the-art CXR models. Notably, the reasoning agent particularly improves detection of uncommon findings, demonstrating enhanced interpretability, generalizability, and clinical applicability.

View details for DOI 10.1038/s41746-026-02994-8

View details for PubMedID 42457975
EIP: Weighted Ranking of LLMs by Quantifying Question Difficulty International Conference on Learning Representations Hu, X., Zhang, Z., Huang, Y., Zhang, K., Chen, R., Liu, Y., Wen, Q., Xu, K., Zhang, X., Gong, N. Z., Sun, L. 2026
Scaling up biomedical vision-language models: Fine-tuning, instruction tuning, and multi-modal learning. Journal of biomedical informatics Peng, C., Zhang, K., Lyu, M., Liu, H., Sun, L., Wu, Y. 2025; 171: 104946

Abstract

To advance biomedical vision language model capabilities through scaling up, fine-tuning, and instruction tuning, develop vision-language models with improved performance in handling long text, explore strategies to efficiently adopt vision language models for diverse multi-modal biomedical tasks, and examine the zero-shot learning performance. We developed two biomedical vision language models, BiomedGPT-Large and BiomedGPT-XLarge, based on an encoder-decoder-based transformer architecture. We fine-tuned the two models on 23 benchmark datasets from 6 multi-modal biomedical tasks, including one image-only task (image classification), three language-only tasks (text understanding, text summarization, and question answering), and two vision-language tasks (visual question answering and image captioning). We compared the developed scaled models with our previous BiomedGPT-Base model and existing prestigious models reported in the literature. We instruction-tuned the two models using a large-scale multi-modal biomedical instruction-tuning dataset and assessed the zero-shot learning performance and alignment accuracy. The experimental results show that the new models developed in this study outperform our previous BiomedGPT-Base model on 17 of 23 benchmark datasets and achieve state-of-the-art performance on 15 of 23 datasets when compared to previous models reported in the literature. The new models also demonstrated improved ability in handling long text, particularly on text summarization on the MIMIC-III dataset and text understanding on the SEER dataset, with a remarkable improvement of 4.6-11.4%. Instruction tuning on the scaled models resulted in significant enhancements in zero-shot learning ability and alignment accuracy in following complex instructions across multiple tasks, including image classification, visual question answering, and image captioning. This study develops two vision-language models in the biomedical domain and examines technologies to improve long text content in vision language models through scaling, fine-tuning, and instruction tuning. This study demonstrates the potential of vision language models to integrate multiple data modalities to solve diverse multimodal tasks in the biomedical domain.

View details for DOI 10.1016/j.jbi.2025.104946

View details for PubMedID 41138953

View details for PubMedCentralID PMC12885065
Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement Adhikarla, E., Zhang, K., VidalMata, R. G., Aithal, M., Madhusudhana, N., Nicholson, J., Sun, L., Davison, B. D. edited by Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, C. L., Bhattacharya, S., Pal, U. SPRINGER INTERNATIONAL PUBLISHING AG. 2025: 260-275

View details for DOI 10.1007/978-3-031-78110-0_17

View details for Web of Science ID 001565109600017
Secure Embedding Aggregation for Cross-Silo Federated Representation Learning IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY Li, S., Tang, J., Zhu, J., Zhang, K., Sun, L., Dong, C. 2025; 20: 6810-6825

View details for DOI 10.1109/TIFS.2025.3580228

View details for Web of Science ID 001567346100013
A comprehensive survey on pretrained foundation models: a history from BERT to ChatGPT INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS Zhou, C., Li, Q., Li, C., Yu, J., Liu, Y., Wang, G., Zhang, K., Ji, C., Yan, Q., He, L., Peng, H., Li, J., Wu, J., Liu, Z., Xie, P., Xiong, C., Pei, J., Yu, P. S., Sun, L. 2025; 16 (12): 9851-9915

View details for DOI 10.1007/s13042-024-02443-6

View details for Web of Science ID 001363100600001
A generalist vision-language foundation model for diverse biomedical tasks. Nature medicine Zhang, K., Zhou, R., Adhikarla, E., Yan, Z., Liu, Y., Yu, J., Liu, Z., Chen, X., Davison, B. D., Ren, H., Huang, J., Chen, C., Zhou, Y., Fu, S., Liu, W., Liu, T., Li, X., Chen, Y., He, L., Zou, J., Li, Q., Liu, H., Sun, L. 2024

Abstract

Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize holistic information. Generalist AI holds the potential to address these limitations due to its versatility in interpreting different data types and generating tailored outputs for diverse needs. However, existing biomedical generalist AI solutions are typically heavyweight and closed source to researchers, practitioners and patients. Here, we describe BiomedGPT, the first open-source and lightweight vision-language foundation model, designed as a generalist capable of performing various biomedical tasks. BiomedGPT achieved state-of-the-art results in 16 out of 25 experiments while maintaining a computing-friendly model scale. We also conducted human evaluations to assess the capabilities of BiomedGPT in radiology visual question answering, report generation and summarization. BiomedGPT exhibits robust prediction ability with a low error rate of 3.8% in question answering, satisfactory performance with an error rate of 8.3% in writing complex radiology reports, and competitive summarization ability with a nearly equivalent preference score to human experts. Our method demonstrates that effective training with diverse data can lead to more practical biomedical AI for improving diagnosis and workflow efficiency.

View details for DOI 10.1038/s41591-024-03185-2

View details for PubMedID 39112796

View details for PubMedCentralID 6295806
FedSecurity: A Benchmark for Attacks and Defenses in Federated Learning and Federated LLMs Han, S., Buyukates, B., Hu, Z., Jin, H., Jin, W., Sun, L., Wang, X., Wu, W., Xie, C., Yao, Y., Zhang, K., Zhang, Q., Zhang, Y., Joe-Wong, C., Avestimehr, S., He, C. ASSOC COMPUTING MACHINERY. 2024: 5070-5081

View details for DOI 10.1145/3637528.3671545

View details for Web of Science ID 001324524205022
Adversarial Attack and Defense on Graph Data: A Survey IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING Sun, L., Dou, Y., Yang, C., Zhang, K., Wang, J., Yu, P. S., He, L., Li, B. 2023; 35 (8): 7693-7711

View details for DOI 10.1109/TKDE.2022.3201243

View details for Web of Science ID 001033571000007

Kai Zhang

Postdoctoral Scholar, Radiation Physics
