Minh Nguyen's Profile | Stanford Profiles

Bio

Previous bios as a PhD student:
DARE Fellow (Diversifying Academia, Recruiting Excellence): https://vpge.stanford.edu/people/minh-nguyen
Data Science Scholar: https://datascience.stanford.edu/people/minh-nguyen

Education & Certifications

Ph.D., Stanford University, Biomedical Informatics; PhD minor: Management Science & Engineering (2024)
M.A., University of California, Berkeley, Biostatistics (2016)
B.S., San Francisco State University, Nursing (2010)

Contact

Work
minh084@stanford.edu

University - Staff
Department: Medicine - Med/Stanford Prevention Research Center
Position: Casual Employee

University - Affiliate
Department: Medicine - Med/Stanford Prevention Research Center

Additional Info

Mail Code: 5702
ORCID: https://orcid.org/0000-0001-7149-849X

Work Experience

Data Scientist, PAVIR

Clinical AI Research Scientist

Location

Palo Alto

Registered Nurse, UCSF Health (8/9/2010)

Location

San Francisco, CA, USA

All Publications

Red teaming ChatGPT in medicine to yield real-world insights on model behavior. NPJ digital medicine Chang, C. T., Farah, H., Gui, H., Rezaei, S. J., Bou-Khalil, C., Park, Y. J., Swaminathan, A., Omiye, J. A., Kolluri, A., Chaurasia, A., Lozano, A., Heiman, A., Jia, A. S., Kaushal, A., Jia, A., Iacovelli, A., Yang, A., Salles, A., Singhal, A., Narasimhan, B., Belai, B., Jacobson, B. H., Li, B., Poe, C. H., Sanghera, C., Zheng, C., Messer, C., Kettud, D. V., Pandya, D., Kaur, D., Hla, D., Dindoust, D., Moehrle, D., Ross, D., Chou, E., Lin, E., Haredasht, F. N., Cheng, G., Gao, I., Chang, J., Silberg, J., Fries, J. A., Xu, J., Jamison, J., Tamaresis, J. S., Chen, J. H., Lazaro, J., Banda, J. M., Lee, J. J., Matthys, K. E., Steffner, K. R., Tian, L., Pegolotti, L., Srinivasan, M., Manimaran, M., Schwede, M., Zhang, M., Nguyen, M., Fathzadeh, M., Zhao, Q., Bajra, R., Khurana, R., Azam, R., Bartlett, R., Truong, S. T., Fleming, S. L., Raj, S., Behr, S., Onyeka, S., Muppidi, S., Bandali, T., Eulalio, T. Y., Chen, W., Zhou, X., Ding, Y., Cui, Y., Tan, Y., Liu, Y., Shah, N., Daneshjou, R. 2025; 8 (1): 149

Abstract

Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical to improving the equity and accuracy of large language models, but red teaming by parties not affiliated with model creators is scant in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and to categorize inappropriate responses along axes of safety, privacy, hallucinations/accuracy, and bias. Six medically trained reviewers re-analyzed prompt-response pairs and added qualitative annotations. Of 376 unique prompts (1504 responses), 20.1% were inappropriate (GPT-3.5: 25.8%; GPT-4.0: 16%; GPT-4.0 with Internet: 17.8%). We then demonstrate the utility of our benchmark by testing GPT-4o, a model released after our event (20.4% inappropriate). Of the responses that were appropriate with GPT-3.5, 21.5% were inappropriate in the updated models. We share insights for constructing red teaming prompts and present our benchmark for iterative model assessments.

DOI: 10.1038/s41746-025-01542-0
PubMed ID: 40055532
PubMed Central ID: 10564921

Thick Data Analytics (TDA): An Iterative and Inductive Framework for Algorithmic Improvement. The American statistician Nguyen, M., Eulalio, T., Marafino, B. J., Rose, C., Chen, J. H., Baiocchi, M. 2024; 78 (4): 456-464

Abstract

A gap remains between developing risk prediction models and deploying them to support real-world decision making, especially in high-stakes situations. Human experts' reasoning abilities remain critical for identifying potential improvements and ensuring safety. We propose a thick data analytics (TDA) framework for eliciting expert human insight and incorporating it into the evaluation of models. The insight is threefold: (1) statistical methods are limited to using joint distributions of observable quantities for predictions, yet real-world settings often contain more information than algorithms can use; (2) domain experts can access more information (e.g., patient files) than an algorithm and bring additional knowledge into their assessments by drawing on their insights and experience; and (3) experts can reframe and re-evaluate prediction problems to suit real-world situations. Here, we revisit an example of predicting temporal risk of intensive care admission within 24 hours of hospitalization. We propose a sampling procedure for identifying informative cases for deeper inspection. Expert feedback is used to understand sources of information that can improve model development and deployment. We recommend model assessment based on objective evaluation metrics derived from subjective evaluations of the problem formulation. TDA insights facilitate iterative model development toward safer, actionable, and acceptable risk predictions.

DOI: 10.1080/00031305.2024.2327535
PubMed ID: 39524529
PubMed Central ID: PMC11545316

Constrained Design of a Binary Instrument in a Partially Linear Model. arXiv Morrison, T., Nguyen, M., Baiocchi, M., Owen, A. 2024

Abstract

We study the question of how best to assign an encouragement in a randomized encouragement study. In our setting, units arrive with covariates, receive a nudge toward treatment or control, acquire one of those statuses in a way that need not align with the nudge, and finally have a response observed. The nudge can be seen as a binary instrument that affects the response only via the treatment status. Our goal is to assign the nudge as a function of covariates in a way that best estimates the local average treatment effect (LATE). We assume a partially linear model, wherein the baseline model is non-parametric and the treatment term is linear in the covariates. Under this model, we outline a two-stage procedure to consistently estimate the LATE. Though the variance of the LATE is intractable, we derive a finite sample approximation and thus a design criterion to minimize. This criterion is convex, allowing for constraints that might arise for budgetary or ethical reasons. We prove conditions under which our solution asymptotically recovers the lowest true variance among all possible nudge propensities. We apply our method to a semi-synthetic example involving triage in an emergency department and find significant gains relative to a regression discontinuity design.

arXiv: https://arxiv.org/pdf/2406.05592

Developing machine learning models to personalize care levels among emergency room patients for hospital admission. Journal of the American Medical Informatics Association : JAMIA Nguyen, M., Corbin, C. K., Eulalio, T., Ostberg, N. P., Machiraju, G., Marafino, B. J., Baiocchi, M., Rose, C., Chen, J. H. 2021

Abstract

OBJECTIVE: To develop prediction models for intensive care unit (ICU) vs non-ICU level-of-care need within 24 hours of inpatient admission for emergency department (ED) patients using electronic health record data.

MATERIALS AND METHODS: Using records of 41 654 ED visits to a tertiary academic center from 2015 to 2019, we tested 4 algorithms (feed-forward neural networks, regularized regression, random forests, and gradient-boosted trees) to predict ICU vs non-ICU level of care within 24 hours and at the 24th hour following admission. Simple-feature models included patient demographics, Emergency Severity Index (ESI), and a vital sign summary. Complex-feature models added all vital signs, lab results, and counts of diagnosis, imaging, procedures, medications, and lab orders.

RESULTS: The best-performing model, a gradient-boosted tree using the full feature set, achieved an AUROC of 0.88 (95% CI: 0.87-0.89) and an AUPRC of 0.65 (95% CI: 0.63-0.68) for predicting ICU care need within 24 hours of admission. The logistic regression model using ESI achieved an AUROC of 0.67 (95% CI: 0.65-0.70) and an AUPRC of 0.37 (95% CI: 0.35-0.40). At a discrimination threshold of 0.6, for example, the positive predictive value, negative predictive value, sensitivity, and specificity were 85%, 89%, 30%, and 99%, respectively. Vital signs were the most important predictors.

DISCUSSION AND CONCLUSIONS: Undertriaging admitted ED patients who subsequently require ICU care is common and associated with poorer outcomes. Machine learning models using readily available electronic health record data predict subsequent need for ICU admission with good discrimination, substantially better than the benchmark ESI system. The results could be used in a multitiered clinical decision-support system to improve ED triage.

DOI: 10.1093/jamia/ocab118
PubMed ID: 34402507

Machine learning for initial insulin estimation in hospitalized patients. Journal of the American Medical Informatics Association : JAMIA Nguyen, M., Jankovic, I., Kalesinskas, L., Baiocchi, M., Chen, J. H. 2021

Abstract

OBJECTIVE: The study sought to determine whether machine learning can predict the initial inpatient total daily dose (TDD) of insulin from electronic health records more accurately than existing guideline-based dosing recommendations.

MATERIALS AND METHODS: Using 2008-2020 electronic health records from a tertiary academic center for 16 848 inpatients who received subcutaneous insulin and achieved target blood glucose control of 100-180 mg/dL on a calendar day, we trained an ensemble machine learning algorithm consisting of regularized regression, random forest, and gradient-boosted tree models for 2-stage TDD prediction. We evaluated the ability to predict which patients require more than 6 units TDD and their point-value TDDs for achieving target glucose control.

RESULTS: The method achieves an area under the receiver operating characteristic curve of 0.85 (95% confidence interval [CI], 0.84-0.87) and an area under the precision-recall curve of 0.65 (95% CI, 0.64-0.67) for classifying patients who require more than 6 units TDD. For patients requiring more than 6 units TDD, the mean absolute percent error in dose prediction based on standard clinical calculators using patient weight is in the range of 136%-329%, while a regression model based on weight improves this to 60% (95% CI, 57%-63%), and the full ensemble model further improves it to 51% (95% CI, 48%-54%).

DISCUSSION: Owing to the narrow therapeutic window and wide individual variability, insulin dosing requires adaptive and predictive approaches that can be supported through data-driven analytic tools.

CONCLUSIONS: Machine learning approaches based on readily available electronic medical records can discriminate which inpatients will require more than 6 units TDD and estimate individual doses more accurately than standard guidelines and practices.

DOI: 10.1093/jamia/ocab099
PubMed ID: 34279615

Dynamic Impact of Transfusion Ratios on Outcomes in Severely Injured Patients: Targeted Machine Learning Analysis of the PROPPR Randomized Clinical Trial. The journal of trauma and acute care surgery Nguyen, M., Pirracchio, R., Kornblith, L. Z., Callcut, R., Fox, E. E., Wade, C. E., Schreiber, M., Holcomb, J. B., Coyle, J., Cohen, M., Hubbard, A. 2020

Abstract

BACKGROUND: Massive transfusion protocols to treat post-injury hemorrhage are based on predefined blood product transfusion ratios, followed by goal-directed transfusion based on the patient's clinical evolution. However, it remains unclear how these transfusion ratios affect patient outcomes over time from injury.

METHODS: The Pragmatic, Randomized Optimal Platelet and Plasma Ratios (PROPPR) trial was a phase 3 randomized controlled trial conducted across 12 level I trauma centers in North America. From 2012 to 2013, 680 severely injured patients required massive transfusion. We used semiparametric machine learning techniques and causal inference methods to augment the intention-to-treat analysis of PROPPR, estimating the dynamic relationship between transfusion ratios and two outcomes, mortality and hemostasis, at different time points during the first 24 hours after admission.

RESULTS: In the intention-to-treat analysis, the 1:1:1 group tended to have lower mortality, but the difference was not statistically significant. For patients in whom hemostasis took longer than 2 hours, the 1:1:1 ratio was associated with a higher probability of hemostasis, which was statistically significant from hour 4 onward. In the per-protocol analysis of the transfusion ratios actually received, no significant association was found between the actual ratios and mortality during four successive time intervals. When patients who received both high plasma:PRBC and high platelet:PRBC ratios were compared with those who received low ratios of both, the relative risk of achieving hemostasis was 2.49 (95% CI = 1.19-5.22) during the third hour after admission, suggesting a significant beneficial impact of higher plasma and platelet transfusion ratios on hemostasis.

CONCLUSIONS: Our results suggest that the impact of transfusion ratios on hemostasis is dynamic. Overall, the transfusion ratios had no significant impact on mortality over time. However, receiving higher ratios of platelets and plasma relative to red blood cells hastens hemostasis in subjects who have yet to achieve hemostasis within 3 hours after hospital admission.

LEVEL OF EVIDENCE: Prognostic, level III.

DOI: 10.1097/TA.0000000000002819
PubMed ID: 32520897

Context is Key: Using the Audit Log to Capture Contextual Factors Affecting Stroke Care Processes. AMIA ... Annual Symposium proceedings. AMIA Symposium Noshad, M., Rose, C. C., Thombley, R., Chiang, J., Corbin, C. K., Nguyen, M., Liu, V. X., Adler-Milstein, J., Chen, J. H. 2020; 2020: 953–62

Abstract

High-quality patient care through timely, precise, and efficacious management depends not only on a patient's clinical presentation but also on the context of the care environment in which they present. Understanding and improving factors that affect streamlined workflow, such as provider or department busyness or experience, is essential to improving these care processes, but these factors have been difficult to measure with traditional approaches and clinical data sources. In this exploratory data analysis, we aim to determine whether such contextual factors can be captured for important clinical processes by taking advantage of nontraditional data sources such as EHR audit logs, which passively track the electronic behavior of clinical teams. Our results illustrate the potential of defining multiple measures of contextual factors and examining their correlation with key care processes. We demonstrate this using thrombolytic (tPA) treatment for ischemic stroke as an example process, but the measurement approaches can be generalized to multiple scenarios.

PubMed ID: 33936471
