All Publications

  • Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine. NPJ digital medicine Savage, T., Nayak, A., Gallo, R., Rangan, E., Chen, J. H. 2024; 7 (1): 20


    One of the major barriers to using large language models (LLMs) in medicine is the perception that they use uninterpretable methods to make clinical decisions that are inherently different from the cognitive processes of clinicians. In this manuscript we develop diagnostic reasoning prompts to study whether LLMs can imitate clinical reasoning while accurately forming a diagnosis. We find that GPT-4 can be prompted to mimic the common clinical reasoning processes of clinicians without sacrificing diagnostic accuracy. This is significant because an LLM that can imitate clinical reasoning to provide an interpretable rationale offers physicians a means to evaluate whether an LLM's response is likely correct and can be trusted for patient care. Prompting methods that use diagnostic reasoning have the potential to mitigate the "black box" limitations of LLMs, bringing them one step closer to safe and effective use in medicine.

    View details for DOI 10.1038/s41746-024-01010-1

    View details for PubMedID 38267608

    View details for PubMedCentralID 9931230

  • Things We Do for No Reason™: Routine early PEG tube placement for dysphagia after acute stroke. Journal of hospital medicine Gallo, R. J., Wang, J. E., Madill, E. S. 2024

    View details for DOI 10.1002/jhm.13263

    View details for PubMedID 38180160

  • ChatGPT Influence on Medical Decision-Making, Bias, and Equity: A Randomized Study of Clinicians Evaluating Clinical Vignettes. medRxiv : the preprint server for health sciences Goh, E., Bunning, B., Khoong, E., Gallo, R., Milstein, A., Centola, D., Chen, J. H. 2023


    In a randomized, pre-post intervention study, we evaluated the influence of a large language model (LLM) generative AI system on the accuracy of physician decision-making and on bias in healthcare. Fifty US-licensed physicians reviewed a video clinical vignette featuring actors representing different demographics (a White male or a Black female) with chest pain. Participants were asked to answer clinical questions about triage, risk, and treatment based on these vignettes, then asked to reconsider after receiving advice generated by ChatGPT+ (GPT-4). The primary outcome was the accuracy of clinical decisions based on pre-established evidence-based guidelines. Results showed that physicians were willing to change their initial clinical impressions given AI assistance, and that this led to a significant improvement in clinical decision-making accuracy in a chest pain evaluation scenario without introducing or exacerbating existing race or gender biases. A survey of physician participants indicated that the majority expect LLM tools to play a significant role in clinical decision-making.

    View details for DOI 10.1101/2023.11.24.23298844

    View details for PubMedID 38076944

    View details for PubMedCentralID PMC10705632

  • K Grant Funding to Internal Medicine Specialties. Journal of general internal medicine Gallo, R. J., Asch, S. M., Chan, D. C. 2023

    View details for DOI 10.1007/s11606-023-08483-y

    View details for PubMedID 37904071

  • Administrative Coding Versus Laboratory Diagnosis of Inpatient Hypoglycemia. Diabetes care Gallo, R. J., Fang, D. Z., Heidenreich, P. A. 2023

    View details for DOI 10.2337/dc23-0053

    View details for PubMedID 37068271

  • Addition of Coronary Artery Calcium Scores to Primary Prevention Risk Estimation Models-Primum Non Nocere. JAMA internal medicine Gallo, R. J., Brown, D. L. 2022

    View details for DOI 10.1001/jamainternmed.2022.1258

    View details for PubMedID 35467702