All Publications


  • Genetic Diagnosis and Discovery Enabled by Large Language Models. Advanced science (Weinheim, Baden-Wurttemberg, Germany) Tu, T., Saab, K., Liu, W., Fang, Z., Cheng, Z., Spasic, S., Djurisic, M., Mohri, H., Ren, W., Palepu, A., Gottweis, J., Karthikesalingam, A., Kulkarni, K., Pawlosky, A., Bonner, D., Kravets, E., Marwaha, S., Mendez, H. R., Wheeler, M. T., Bernstein, J. A., Tsai, C. Y., Wu, C. C., Stankovic, K. M., Natarajan, V., Peltz, G. 2026: e18656

    Abstract

    Artificial intelligence (AI) has been used in many areas of medicine, and large language models (LLMs) have shown potential utility for various clinical applications. To determine whether LLMs can accelerate the pace of genetic diagnosis and discovery, we examined whether recently developed LLMs (Med-PaLM 2 and Gemini) could assist in solving four types of genetic problems of sequentially increasing complexity. First, in response to free-text input, Med-PaLM 2 correctly identified the murine genes containing experimentally verified causative genetic factors for six previously studied murine models of biomedical traits. Second, Med-PaLM 2 identified a novel causative murine genetic factor for spontaneous hearing loss, which was validated using knock-in mice. Third, we developed a retrieval and grounding pipeline that enabled Gemini 2.5 Pro to analyze large lists of genes containing genetic variants identified in the genomic sequences of 20 human subjects with hearing loss, and we demonstrated that it can assist in identifying causative genetic factors for hearing loss. Fourth, we modified the genetic analysis pipeline so that Gemini 2.5 Pro, without any task-specific fine-tuning, could identify causative genetic factors for six subjects with rare genetic diseases whose multi-faceted symptom complexes required 14 to 34 different terms to describe. These results demonstrate that an AI pipeline can facilitate genetic diagnosis and discovery in mice and humans.

    DOI: 10.1002/advs.202518656

    PubMed ID: 41655254

  • A Tandem Repeat Atlas for the Genome of Inbred Mouse Strains: A Genetic Variation Resource. bioRxiv : the preprint server for biology Ren, W., Liu, W., Fang, Z., Dolzhenko, E., Weisburd, B., Cheng, Z., Peltz, G. 2025

    Abstract

    Tandem repeats (TRs) are a significant source of genetic variation in the human population, and TR alleles are responsible for over 60 human genetic diseases and for inter-individual differences in many biomedical traits. We therefore used long-read sequencing and state-of-the-art computational programs to produce a database of 2,528,854 TRs covering 39 inbred mouse strains. As in humans, murine TRs are abundant and are primarily located in intergenic regions. However, there were important species differences: murine TRs lacked the extensive repeat expansions associated with human repeat-expansion diseases, and they were not associated with transposable elements. By analyzing two biomedical phenotypes that were identified over 40 years ago, we demonstrate that this TR database can enhance our ability to characterize the genetic basis for trait differences among the inbred strains.

    DOI: 10.1101/2025.05.23.655792

    PubMed ID: 40475611

    PubMed Central ID: PMC12139781