Bio


Kai Zhang is a computer scientist dedicated to advancing artificial intelligence in healthcare. After earning his Ph.D. from Lehigh University, he is joining Stanford Medicine as a postdoctoral scholar. His research bridges multimodal learning and self-improving AI to build robust, interpretable foundation models. Through his work, he aims to create adaptive systems that reliably drive biomedical discovery and empower real-world clinical decisions.

Current Research and Scholarly Interests


My research develops AI systems for biomedicine, with a focus on multimodal learning, foundation models, and self-improving AI. I study how models can integrate medical images, clinical text, EHRs, and biomedical knowledge to support diagnosis, clinical workflows, and scientific discovery, while improving through feedback, evaluation, and human-AI interaction.

All Publications


  • EIP: Weighted Ranking of LLMs by Quantifying Question Difficulty International Conference on Learning Representations Hu, X., Zhang, Z., Huang, Y., Zhang, K., Chen, R., Liu, Y., Wen, Q., Xu, K., Zhang, X., Gong, N. Z., Sun, L. 2026
  • Scaling up biomedical vision-language models: Fine-tuning, instruction tuning, and multi-modal learning. Journal of biomedical informatics Peng, C., Zhang, K., Lyu, M., Liu, H., Sun, L., Wu, Y. 2025; 171: 104946

    Abstract

    To advance biomedical vision language model capabilities through scaling up, fine-tuning, and instruction tuning, develop vision-language models with improved performance in handling long text, explore strategies to efficiently adopt vision language models for diverse multi-modal biomedical tasks, and examine the zero-shot learning performance.

    We developed two biomedical vision language models, BiomedGPT-Large and BiomedGPT-XLarge, based on an encoder-decoder-based transformer architecture. We fine-tuned the two models on 23 benchmark datasets from 6 multi-modal biomedical tasks, including one image-only task (image classification), three language-only tasks (text understanding, text summarization, and question answering), and two vision-language tasks (visual question answering and image captioning). We compared the developed scaled models with our previous BiomedGPT-Base model and existing prestigious models reported in the literature. We instruction-tuned the two models using a large-scale multi-modal biomedical instruction-tuning dataset and assessed the zero-shot learning performance and alignment accuracy.

    The experimental results show that the new models developed in this study outperform our previous BiomedGPT-Base model on 17 of 23 benchmark datasets and achieve state-of-the-art performance on 15 of 23 datasets when compared to previous models reported in the literature. The new models also demonstrated improved ability in handling long text, particularly on text summarization on the MIMIC-III dataset and text understanding on the SEER dataset, with a remarkable improvement of 4.6-11.4%. Instruction tuning on the scaled models resulted in significant enhancements in zero-shot learning ability and alignment accuracy in following complex instructions across multiple tasks, including image classification, visual question answering, and image captioning.

    This study develops two vision-language models in the biomedical domain and examines technologies to improve long text content in vision language models through scaling, fine-tuning, and instruction tuning. This study demonstrates the potential of vision language models to integrate multiple data modalities to solve diverse multimodal tasks in the biomedical domain.

    View details for DOI 10.1016/j.jbi.2025.104946

    View details for PubMedID 41138953

    View details for PubMedCentralID PMC12885065

  • Secure Embedding Aggregation for Cross-Silo Federated Representation Learning IEEE Transactions on Information Forensics and Security Li, S., Tang, J., Zhu, J., Zhang, K., Sun, L., Dong, C. 2025; 20: 6810-6825
  • A generalist vision-language foundation model for diverse biomedical tasks. Nature medicine Zhang, K., Zhou, R., Adhikarla, E., Yan, Z., Liu, Y., Yu, J., Liu, Z., Chen, X., Davison, B. D., Ren, H., Huang, J., Chen, C., Zhou, Y., Fu, S., Liu, W., Liu, T., Li, X., Chen, Y., He, L., Zou, J., Li, Q., Liu, H., Sun, L. 2024

    Abstract

    Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize holistic information. Generalist AI holds the potential to address these limitations due to its versatility in interpreting different data types and generating tailored outputs for diverse needs. However, existing biomedical generalist AI solutions are typically heavyweight and closed source to researchers, practitioners and patients. Here, we describe BiomedGPT, the first open-source and lightweight vision-language foundation model, designed as a generalist capable of performing various biomedical tasks. BiomedGPT achieved state-of-the-art results in 16 out of 25 experiments while maintaining a computing-friendly model scale. We also conducted human evaluations to assess the capabilities of BiomedGPT in radiology visual question answering, report generation and summarization. BiomedGPT exhibits robust prediction ability with a low error rate of 3.8% in question answering, satisfactory performance with an error rate of 8.3% in writing complex radiology reports, and competitive summarization ability with a nearly equivalent preference score to human experts. Our method demonstrates that effective training with diverse data can lead to more practical biomedical AI for improving diagnosis and workflow efficiency.

    View details for DOI 10.1038/s41591-024-03185-2

    View details for PubMedID 39112796

    View details for PubMedCentralID 6295806

  • FedSecurity: A Benchmark for Attacks and Defenses in Federated Learning and Federated LLMs Han, S., Buyukates, B., Hu, Z., Jin, H., Jin, W., Sun, L., Wang, X., Wu, W., Xie, C., Yao, Y., Zhang, K., Zhang, Q., Zhang, Y., Joe-Wong, C., Avestimehr, S., He, C. ASSOC COMPUTING MACHINERY. 2024: 5070-5081