Teresa Phuongtram Nguyen
Clinical Assistant Professor, Anesthesiology, Perioperative and Pain Medicine
Bio
Teresa Nguyen, MD, is a Clinical Assistant Professor of Anesthesiology at Stanford Medicine and affiliated faculty at the Stanford Institute for Human-Centered AI (HAI). She serves as a co-Principal Investigator on a cross-disciplinary initiative between Stanford HAI and the Department of Computer Science, directing the development of autonomous quadruped robots for robotics education and clinical deployment. Her research also develops frameworks for integrating machine learning models into healthcare delivery systems and examines their impact on clinical decision-making.
Previously a Scientific Researcher at Genentech, Dr. Nguyen co-invented and patented a series of therapeutics for chronic and neuropathic pain. She holds a BS in Chemistry and an MD from Stanford University, where she conducted research in the laboratory of Nobel Laureate Carolyn Bertozzi.
A first-generation immigrant, U.S. Department of State Critical Language Scholar (Arabic/Morocco), and licensed helicopter pilot, Dr. Nguyen is the co-founder of two organizations: The Lighthouse Initiative, a mentorship platform achieving a 100% college admissions success rate for first-generation students, and Hands-On Robotics, a nonprofit accelerating technical education through accessible hardware and curriculum. She continues to serve as the instructor for Chemistry Unleashed (Chem 93) at the Stanford Department of Chemistry, bridging molecular theory with clinical practice for the next generation of scientists.
Clinical Focus
- Anesthesiology, Perioperative, and Pain Medicine
- Anesthesia
Administrative Appointments
- Affiliated Faculty, Stanford Institute for Human-Centered Artificial Intelligence (2024 - Present)
Honors & Awards
- Critical Language Scholarship - Arabic, United States Department of State
- Bing Fellowship, Stanford University Department of Chemistry, Prof. Barry Trost
- Medical Scholars, Stanford School of Medicine, Prof. Carolyn Bertozzi (2017)
Professional Education
- Board Certification: American Board of Anesthesiology, Anesthesia (2025)
- Medical Education: Stanford University School of Medicine (2020) CA
- Residency: Stanford University Anesthesiology Residency (2024) CA
- Internship: Kaiser Permanente at Santa Clara (2021) CA
- Medical Doctorate, Stanford University School of Medicine (2020)
- Bachelor of Science, Stanford University, Chemistry (2014)
Community and International Work
- Co-Founder
  Topic: Hands On Robotics
  Ongoing Project: Yes
  Opportunities for Student Involvement: Yes
- Founder
  Partnering Organization(s): The Lighthouse Initiative
  Ongoing Project: Yes
  Opportunities for Student Involvement: Yes
Patents
- Bergeron, P, Chowdhury, S, Dehnhardt, CM, Focken, T, Grimwood, ME, Hasan, A, Lai, KW, Liu, Z, McKerrall, S, Nguyen, TP, Safina, B, Sutherlin, D, Tan, WT. "United States Patent WO 2017058821 A1 Therapeutic Compounds and Methods of Use Thereof", Apr 16, 2017
2025-26 Courses
- Chemistry Unleashed: Exploring the Chemistry that Transforms Our World
CHEM 93 (Win)
All Publications
- An automated framework for assessing how well LLMs cite relevant medical references.
Nature communications
2025; 16 (1): 3615
Abstract
As large language models (LLMs) are increasingly used to address health-related queries, it is crucial that they support their conclusions with credible references. While models can cite sources, the extent to which these support claims remains unclear. To address this gap, we introduce SourceCheckup, an automated agent-based pipeline that evaluates the relevance and supportiveness of sources in LLM responses. We evaluate seven popular LLMs on a dataset of 800 questions and 58,000 pairs of statements and sources on data that represent common medical queries. Our findings reveal that between 50% and 90% of LLM responses are not fully supported, and sometimes contradicted, by the sources they cite. Even for GPT-4o with Web Search, approximately 30% of individual statements are unsupported, and nearly half of its responses are not fully supported. Independent assessments by doctors further validate these results. Our research underscores significant limitations in current LLMs to produce trustworthy medical references.
View details for DOI 10.1038/s41467-025-58551-6
View details for PubMedID 40240349
View details for PubMedCentralID 10543445
- Verifying Facts in Patient Care Documents Generated by Large Language Models Using Electronic Health Records
NEJM AI
2025; 3 (1)
View details for DOI 10.1056/AIdbp2500418
- The evaluation of the performance of ChatGPT in the management of labor analgesia.
Journal of clinical anesthesia
2024; 98: 111582
Abstract
ChatGPT4 is a leading large language model (LLM) chatbot released by OpenAI in 2023. ChatGPT4 can respond to free-text queries, answer questions and make suggestions regarding virtually any topic. ChatGPT4 has successfully answered anesthesia and even obstetric anesthesia knowledge-based questions with reasonable accuracy. However, ChatGPT4 has yet to be challenged in obstetric anesthesia clinical decision-making. In this study, we evaluated the performance of ChatGPT4 in the management of clinical labor analgesia scenarios compared to expert obstetric anesthesiologists. Eight clinical questions with progressively increasing medical complexity were posed to ChatGPT4. The ChatGPT4 responses were rated by seven expert obstetric anesthesiologists based on safety, accuracy and completeness of each response using a five-point Likert rating scale. ChatGPT4 was deemed safe in 73% of responses to the presented obstetric anesthesia clinical scenarios (27% of responses were deemed unsafe). None of the ChatGPT4 responses were unanimously deemed to be safe by all seven expert obstetric anesthesiologists. Moreover, ChatGPT4 responses were overall partly accurate (score 4 out of 5) and somewhat incomplete (score 3.5 out of 5). In summary, approximately one quarter of all responses by ChatGPT4 were deemed unsafe by expert obstetric anesthesiologists. These findings may suggest the need for more fine-tuning and training of LLMs such as ChatGPT4 specifically for clinical decision making in obstetric anesthesia or other specialized medical fields. These LLMs may come to play an important future role in assisting obstetric anesthesiologists in clinical decision making and enhancing overall patient care.
View details for DOI 10.1016/j.jclinane.2024.111582
View details for PubMedID 39167880
- Comparison of artificial intelligence large language model chatbots in answering frequently asked questions in anaesthesia.
BJA open
2024; 10: 100280
Abstract
Patients are increasingly using artificial intelligence (AI) chatbots to seek answers to medical queries. Ten frequently asked questions in anaesthesia were posed to three AI chatbots: ChatGPT4 (OpenAI), Bard (Google), and Bing Chat (Microsoft). Each chatbot's answers were evaluated in a randomised, blinded order by five residency programme directors from 15 medical institutions in the USA. Three medical content quality categories (accuracy, comprehensiveness, safety) and three communication quality categories (understandability, empathy/respect, and ethics) were scored between 1 and 5 (1 representing worst, 5 representing best). ChatGPT4 and Bard outperformed Bing Chat (median [inter-quartile range] scores: 4 [3-4], 4 [3-4], and 3 [2-4], respectively; P<0.001 with all metrics combined). All AI chatbots performed poorly in accuracy (score of ≥4 by 58%, 48%, and 36% of experts for ChatGPT4, Bard, and Bing Chat, respectively), comprehensiveness (score ≥4 by 42%, 30%, and 12% of experts for ChatGPT4, Bard, and Bing Chat, respectively), and safety (score ≥4 by 50%, 40%, and 28% of experts for ChatGPT4, Bard, and Bing Chat, respectively). Notably, answers from ChatGPT4, Bard, and Bing Chat differed statistically in comprehensiveness (ChatGPT4, 3 [2-4] vs Bing Chat, 2 [2-3], P<0.001; and Bard 3 [2-4] vs Bing Chat, 2 [2-3], P=0.002). All large language model chatbots performed well with no statistical difference for understandability (P=0.24), empathy (P=0.032), and ethics (P=0.465). In answering anaesthesia patient frequently asked questions, the chatbots perform well on communication metrics but are suboptimal for medical content metrics. Overall, ChatGPT4 and Bard were comparable to each other, both outperforming Bing Chat.
View details for DOI 10.1016/j.bjao.2024.100280
View details for PubMedID 38764485
View details for PubMedCentralID PMC11099318
- A comparative study of English and Japanese ChatGPT responses to anaesthesia-related medical questions.
BJA open
2024; 10: 100296
Abstract
The expansion of artificial intelligence (AI) within large language models (LLMs) has the potential to streamline healthcare delivery. Despite the increased use of LLMs, disparities in their performance, particularly in different languages, remain underexplored. This study examines the quality of ChatGPT responses in English and Japanese, specifically to questions related to anaesthesiology. Anaesthesiologists proficient in both languages were recruited as experts in this study. Ten frequently asked questions in anaesthesia were selected and translated for evaluation. Three non-sequential responses from ChatGPT were assessed for content quality (accuracy, comprehensiveness, and safety) and communication quality (understanding, empathy/tone, and ethics) by expert evaluators. Eight anaesthesiologists evaluated English and Japanese LLM responses. The overall quality for all questions combined was higher in English compared with Japanese responses. Content and communication quality were significantly higher in English compared with Japanese LLM responses (both P<0.001) in all three responses. Comprehensiveness, safety, and understanding scored higher in English LLM responses. In all three responses, more than half of the evaluators marked overall English responses as better than Japanese responses. English LLM responses to anaesthesia-related frequently asked questions were superior in quality to Japanese responses when assessed by bilingual anaesthesia experts in this report. This study highlights the potential for language-related disparities in healthcare information and the need to improve the quality of AI responses in underrepresented languages. Future studies are needed to explore these disparities in other commonly spoken languages and to compare the performance of different LLMs.
View details for DOI 10.1016/j.bjao.2024.100296
View details for PubMedID 38975242
View details for PubMedCentralID PMC11225650
- In Response.
Anesthesia and analgesia
2024; 138 (6): e37-e38
View details for DOI 10.1213/ANE.0000000000006979
View details for PubMedID 38771606
- The Accuracy of ChatGPT-Generated Responses in Answering Commonly Asked Patient Questions About Labor Epidurals: A Survey-Based Study.
Anesthesia and analgesia
2024
View details for DOI 10.1213/ANE.0000000000006801
View details for PubMedID 38180897
- Innovating pediatric care with social robots to alleviate anxiety.
Paediatric anaesthesia
2023
View details for DOI 10.1111/pan.14798
View details for PubMedID 37936541
- Consumption of cruciferous vegetables and the risk of bladder cancer in a prospective US cohort: data from the NIH-AARP diet and health study
American journal of clinical and experimental urology
2021; 9 (3): 229-238
View details for Web of Science ID 000672671600004
- Hemodynamic changes in patients undergoing office-based sinus procedures under local anesthesia.
International forum of allergy & rhinology
2020; 10 (1): 114–20
Abstract
The objective of this study is to characterize changes in hemodynamics, pain, and anxiety during office-based endoscopic sinus procedures performed under local anesthesia. We conducted a prospective study of adults undergoing in-office endoscopic sinus procedures under local anesthesia. Patients with American Society of Anesthesiologists (ASA) Physical Status Classification System class 1 or 2 were included. Anesthesia was administered by topical 4% lidocaine/oxymetazoline and submucosal injection of 1% lidocaine/1:200,000 epinephrine. Vital signs and pain were measured at baseline, postinjection, and 5-minute intervals throughout the procedure. Anxiety levels were scored using the State-Trait Anxiety Inventory (STAI). Univariate and multivariate regression analyses were performed to identify factors significantly associated with changes in each hemodynamic metric. Twenty-five patients were studied. This cohort was 52% male, mean age of 57.8 ± 14.4 years, and Charlson Comorbidity Index (CCI) median of 2. Mean procedure duration was 25.0 ± 10.3 minutes. Mean maximal increase in systolic blood pressure (SBP) was 24.6 ± 17.8 mmHg from baseline. Mean maximal heart rate increase was 22.8 ± 10.8 beats per minute (bpm) from baseline. In multivariate regression analysis, when accounting for patient age, cardiac comorbidity, CCI, and ASA, older age was significantly associated with an increase of >20 mmHg in SBP (p = 0.043). Mean pain score during procedures was 1.5 ± 1.3 with a mean maximum of 4.0 ± 2.6. STAI anxiety scores did not change significantly from preprocedure to postprocedure (32.8 ± 11.6 to 31.0 ± 12.6, p = 0.46). No medical complications occurred. Although patients appear to tolerate office procedures well, providers should recognize the potential for significant fluctuations in blood pressure during the procedure, especially in older patients.
View details for DOI 10.1002/alr.22460
View details for PubMedID 31899857
- Structure- and Ligand-Based Discovery of Chromane Arylsulfonamide Nav1.7 Inhibitors for the Treatment of Chronic Pain.
Journal of medicinal chemistry
2019; 62 (8): 4091-4109
Abstract
Using structure- and ligand-based design principles, a novel series of piperidyl chromane arylsulfonamide Nav1.7 inhibitors was discovered. Early optimization focused on improvement of potency through refinement of the low energy ligand conformation and mitigation of high in vivo clearance. An in vitro hepatotoxicity hazard was identified and resolved through optimization of lipophilicity and lipophilic ligand efficiency to arrive at GNE-616 (24), a highly potent, metabolically stable, subtype selective inhibitor of Nav1.7. Compound 24 showed a robust PK/PD response in a Nav1.7-dependent mouse model, and site-directed mutagenesis was used to identify residues critical for the isoform selectivity profile of 24.
View details for DOI 10.1021/acs.jmedchem.9b00141
View details for PubMedID 30943032
- Biomechanical Study of a Multifilament Stainless Steel Cable Crimp System Versus a Multistrand Ultra-High Molecular Weight Polyethylene Polyester Suture Krackow Technique for Achilles Tendon Rupture Repair.
The Journal of foot and ankle surgery : official publication of the American College of Foot and Ankle Surgeons
2019; 59 (1): 86–90
Abstract
Currently, Achilles tendon rupture repair is surgically addressed with an open or minimally invasive approach using a heavy, nonabsorbable suture in a locking stitch configuration. However, these sutures have low stiffness and a propensity to stretch, which can result in gapping at the repair site. Our study compares a new multifilament stainless steel cable-crimp repair method to a standard Krackow repair using multistrand, ultra-high molecular weight polyethylene polyester sutures. Eight matched pairs of cadavers were randomly assigned for Achilles tendon repair using either Krackow technique with polyethylene polyester sutures or the multifilament stainless steel cable-crimp technique. Each repair was cyclically loaded from 10 to 50 N for 100 loading cycles, followed by a linear increase in load until complete failure of the repair. During cyclic loading, 4 of the 8 Krackow polyethylene polyester suture repairs failed, whereas none of the multifilament stainless steel cable crimp repairs failed. Load to failure was greater for the multifilament stainless steel cable crimp repairs (321.03 ± 118.71 N) than for the Krackow polyethylene polyester suture repairs (132.47 ± 103.39 N, p = .0078). The ultimate tensile strength of the multifilament stainless steel cable crimp repairs was also greater than that of the Krackow polyethylene polyester suture repairs (485.69 ± 47.93 N vs 378.71 ± 107.23 N, respectively, p = .12). The mode of failure was by suture breakage at the crimp for all cable-crimp repairs and by suture breakage at the knot, within the tendon, or suture pullout for the polyethylene polyester suture repairs. The multifilament stainless steel cable crimp construct may be a better alternative for Achilles tendon rupture repairs.
View details for DOI 10.1053/j.jfas.2019.01.022
View details for PubMedID 31882153
- Budesonide irrigation with olfactory training improves outcomes compared with olfactory training alone in patients with olfactory loss
WILEY. 2018: 977–81
View details for DOI 10.1002/alr.22140
View details for Web of Science ID 000443132000002
https://orcid.org/0000-0001-9522-8937