DARE fellow (Diversifying Academia, Recruiting Excellence)

Data Science Scholar

Education & Certifications

  • M.A., University of California, Berkeley, Biostatistics (2016)
  • B.S., San Francisco State University, Nursing (2010)

Work Experience

  • Registered Nurse, UCSF Health (8/9/2010 - Present)


    San Francisco, CA, USA

All Publications

  • Thick Data Analytics (TDA): An Iterative and Inductive Framework for Algorithmic Improvement. The American Statistician Nguyen, M., Eulalio, T., Marafino, B. J., Rose, C., Chen, J. H., Baiocchi, M. 2024
  • Constrained Design of a Binary Instrument in a Partially Linear Model. arXiv Morrison, T., Nguyen, M., Baiocchi, M., Owen, A. 2024


    We study the question of how best to assign an encouragement in a randomized encouragement study. In our setting, units arrive with covariates, receive a nudge toward treatment or control, acquire one of those statuses in a way that need not align with the nudge, and finally have a response observed. The nudge can be seen as a binary instrument that affects the response only via the treatment status. Our goal is to assign the nudge as a function of covariates in a way that best estimates the local average treatment effect (LATE). We assume a partially linear model, wherein the baseline model is non-parametric and the treatment term is linear in the covariates. Under this model, we outline a two-stage procedure to consistently estimate the LATE. Though the variance of the LATE is intractable, we derive a finite sample approximation and thus a design criterion to minimize. This criterion is convex, allowing for constraints that might arise for budgetary or ethical reasons. We prove conditions under which our solution asymptotically recovers the lowest true variance among all possible nudge propensities. We apply our method to a semi-synthetic example involving triage in an emergency department and find significant gains relative to a regression discontinuity design.
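
    To make the estimand in the abstract above concrete, here is a minimal sketch of the textbook Wald ratio for the LATE with a binary instrument. It is not the two-stage partially linear procedure or the design criterion from the paper; the function name `wald_late` and the toy data are assumptions chosen for illustration.

```python
from statistics import mean


def wald_late(z, d, y):
    """Wald ratio estimate of the LATE for a binary instrument.

    z: nudge (instrument) assignments, 0 or 1
    d: treatment actually received, 0 or 1
    y: observed responses
    """
    if not (len(z) == len(d) == len(y)):
        raise ValueError("z, d, and y must have the same length")

    y_nudged = [yi for zi, yi in zip(z, y) if zi == 1]
    y_control = [yi for zi, yi in zip(z, y) if zi == 0]
    d_nudged = [di for zi, di in zip(z, d) if zi == 1]
    d_control = [di for zi, di in zip(z, d) if zi == 0]

    # Intent-to-treat effect of the nudge on the response.
    itt_response = mean(y_nudged) - mean(y_control)
    # First stage: effect of the nudge on treatment uptake.
    first_stage = mean(d_nudged) - mean(d_control)
    if first_stage == 0:
        raise ValueError("the nudge does not shift treatment uptake")

    return itt_response / first_stage


# Hypothetical toy data: 8 units, half nudged toward treatment.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 1, 0, 1, 0, 0, 0]
y = [3, 3, 3, 1, 3, 1, 1, 1]
late_estimate = wald_late(z, d, y)
```

    In this toy data the nudge raises uptake from 25% to 75% and the mean response from 1.5 to 2.5, so the ratio is 2.0, the effect among units whose treatment status follows the nudge.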

  • Developing machine learning models to personalize care levels among emergency room patients for hospital admission. Journal of the American Medical Informatics Association : JAMIA Nguyen, M., Corbin, C. K., Eulalio, T., Ostberg, N. P., Machiraju, G., Marafino, B. J., Baiocchi, M., Rose, C., Chen, J. H. 2021


    OBJECTIVE: To develop prediction models for intensive care unit (ICU) vs non-ICU level-of-care need within 24 hours of inpatient admission for emergency department (ED) patients using electronic health record data. MATERIALS AND METHODS: Using records of 41 654 ED visits to a tertiary academic center from 2015 to 2019, we tested 4 algorithms (feed-forward neural networks, regularized regression, random forests, and gradient-boosted trees) to predict ICU vs non-ICU level-of-care within 24 hours and at the 24th hour following admission. Simple-feature models included patient demographics, Emergency Severity Index (ESI), and vital sign summary. Complex-feature models added all vital signs, lab results, and counts of diagnosis, imaging, procedures, medications, and lab orders. RESULTS: The best-performing model, a gradient-boosted tree using a full feature set, achieved an AUROC of 0.88 (95% CI: 0.87-0.89) and AUPRC of 0.65 (95% CI: 0.63-0.68) for predicting ICU care need within 24 hours of admission. The logistic regression model using ESI achieved an AUROC of 0.67 (95% CI: 0.65-0.70) and AUPRC of 0.37 (95% CI: 0.35-0.40). Using a discrimination threshold, such as 0.6, the positive predictive value, negative predictive value, sensitivity, and specificity were 85%, 89%, 30%, and 99%, respectively. Vital signs were the most important predictors. DISCUSSION AND CONCLUSIONS: Undertriaging admitted ED patients who subsequently require ICU care is common and associated with poorer outcomes. Machine learning models using readily available electronic health record data predict subsequent need for ICU admission with good discrimination, substantially better than the benchmarking ESI system. The results could be used in a multitiered clinical decision-support system to improve ED triage.

    View details for DOI 10.1093/jamia/ocab118

    View details for PubMedID 34402507
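
    The threshold-based metrics reported in the abstract above (positive predictive value, negative predictive value, sensitivity, specificity) can be computed from predicted probabilities as in the following sketch. The function name `threshold_metrics` and the toy scores are hypothetical; only the 0.6 threshold is taken from the abstract, and none of the study's data or models are reproduced here.

```python
def threshold_metrics(probs, labels, threshold=0.6):
    """Confusion-matrix metrics for a binary classifier at one threshold.

    probs: predicted probabilities of the positive class (e.g. ICU need)
    labels: true outcomes, 1 for positive and 0 for negative
    Returns None for a metric whose denominator is zero.
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        predicted_positive = p >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "ppv": ratio(tp, tp + fp),          # precision among flagged patients
        "npv": ratio(tn, tn + fn),          # reliability of a negative call
        "sensitivity": ratio(tp, tp + fn),  # share of true positives caught
        "specificity": ratio(tn, tn + fp),  # share of true negatives cleared
    }


# Hypothetical scores for 7 patients (3 true positives, 4 true negatives).
probs = [0.9, 0.7, 0.4, 0.2, 0.65, 0.1, 0.3]
labels = [1, 1, 1, 0, 0, 0, 0]
metrics = threshold_metrics(probs, labels, threshold=0.6)
```

    Raising the threshold generally trades sensitivity for specificity, which is consistent with the high specificity and low sensitivity the abstract reports at 0.6.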

  • Machine learning for initial insulin estimation in hospitalized patients. Journal of the American Medical Informatics Association : JAMIA Nguyen, M., Jankovic, I., Kalesinskas, L., Baiocchi, M., Chen, J. H. 2021


    OBJECTIVE: The study sought to determine whether machine learning can predict initial inpatient total daily dose (TDD) of insulin from electronic health records more accurately than existing guideline-based dosing recommendations. MATERIALS AND METHODS: Using electronic health records from a tertiary academic center between 2008 and 2020 of 16 848 inpatients receiving subcutaneous insulin who achieved target blood glucose control of 100-180 mg/dL on a calendar day, we trained an ensemble machine learning algorithm consisting of regularized regression, random forest, and gradient boosted tree models for 2-stage TDD prediction. We evaluated the ability to predict patients requiring more than 6 units TDD and their point-value TDDs to achieve target glucose control. RESULTS: The method achieves an area under the receiver-operating characteristic curve of 0.85 (95% confidence interval [CI], 0.84-0.87) and area under the precision-recall curve of 0.65 (95% CI, 0.64-0.67) for classifying patients who require more than 6 units TDD. For patients requiring more than 6 units TDD, the mean absolute percent error in dose prediction based on standard clinical calculators using patient weight is in the range of 136%-329%, while the regression model based on weight improves to 60% (95% CI, 57%-63%), and the full ensemble model further improves to 51% (95% CI, 48%-54%). DISCUSSION: Owing to the narrow therapeutic window and wide individual variability, insulin dosing requires adaptive and predictive approaches that can be supported through data-driven analytic tools. CONCLUSIONS: Machine learning approaches based on readily available electronic medical records can discriminate which inpatients will require more than 6 units TDD and estimate individual doses more accurately than standard guidelines and practices.

    View details for DOI 10.1093/jamia/ocab099

    View details for PubMedID 34279615
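
    The abstract above compares dosing methods by mean absolute percent error. As a minimal sketch of that metric, assuming the usual definition (absolute error divided by the actual value, averaged and expressed as a percentage), with a hypothetical function name `mean_absolute_percent_error` and made-up doses:

```python
from statistics import mean


def mean_absolute_percent_error(actual, predicted):
    """Mean absolute percent error, in percent.

    actual: observed values (e.g. total daily insulin dose in units)
    predicted: model or calculator estimates for the same patients
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("percent error is undefined when an actual value is 0")
    return 100 * mean(abs(p - a) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical doses in units, not taken from the study.
actual_doses = [10, 20, 40]
predicted_doses = [15, 10, 40]
error_pct = mean_absolute_percent_error(actual_doses, predicted_doses)
```

    Because the error is scaled by the actual dose, overestimating a small dose is penalized heavily, so values above 100% such as the 136%-329% range quoted in the abstract are possible.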

  • Dynamic Impact of Transfusion Ratios on Outcomes in Severely Injured Patients: Targeted Machine Learning Analysis of the PROPPR Randomized Clinical Trial. The journal of trauma and acute care surgery Nguyen, M., Pirracchio, R., Kornblith, L. Z., Callcut, R., Fox, E. E., Wade, C. E., Schreiber, M., Holcomb, J. B., Coyle, J., Cohen, M., Hubbard, A. 2020


    BACKGROUND: Massive transfusion protocols to treat post-injury hemorrhage are based on pre-defined blood product transfusion ratios followed by goal-directed transfusion based on the patient's clinical evolution. However, it remains unclear how these transfusion ratios impact patient outcomes over time from injury. METHODS: The Pragmatic, Randomized Optimal Platelet and Plasma Ratios (PROPPR) trial is a phase 3, randomized controlled trial conducted across 12 level-I trauma centers in North America. From 2012 to 2013, 680 severely injured patients required massive transfusion. We used semi-parametric machine learning techniques and causal inference methods to augment the intent-to-treat analysis of PROPPR, estimating the dynamic relationship between transfusion ratios and outcomes: mortality and hemostasis at different time-points during the first 24 hours after admission. RESULTS: In the intention-to-treat analysis, the 1:1:1 group tended to have decreased mortality, but with no statistical significance. For patients in whom hemostasis took longer than 2 hours, the 1:1:1 ratio was associated with a higher probability of hemostasis, statistically significant from the fourth hour on. In the per-protocol, actual-transfusion-ratios-received analysis, during four successive time intervals, no significant association was found between the actual ratios and mortality. When comparing patient groups who received both high plasma:PRBC and high platelet:PRBC ratios to the group of low ratios in both, the relative risk of achieving hemostasis was 2.49 (95% CI = 1.19-5.22) during the third hour after admission, suggesting a significant beneficial impact of higher transfusion ratios of plasma and platelets on hemostasis. CONCLUSIONS: Our results suggest that the impact of transfusion ratios on hemostasis is dynamic. Overall, the transfusion ratios had no significant impact on mortality over time. However, receiving higher ratios of platelets and plasma relative to red blood cells hastens hemostasis in subjects who have yet to achieve hemostasis within 3 hours after hospital admission. LEVEL OF EVIDENCE: Prognostic, level III.

    View details for DOI 10.1097/TA.0000000000002819

    View details for PubMedID 32520897
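
    For readers unfamiliar with the relative risk and confidence interval quoted in the abstract above, the following sketch computes an unadjusted relative risk with a 95% interval using the standard log-scale normal approximation. This is not the targeted machine learning estimator used in the study; the function name `relative_risk` and the counts are hypothetical.

```python
import math


def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Unadjusted relative risk of group A vs group B with a Wald-type CI.

    events_a, n_a: number of events and group size in group A
    events_b, n_b: number of events and group size in group B
    z: normal quantile for the interval (1.96 gives roughly 95% coverage)
    Returns (rr, lower, upper).
    """
    if events_a == 0 or events_b == 0:
        raise ValueError("log-scale interval needs at least one event per group")
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) from the delta method.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 20 of 40 patients reach the outcome in the high-ratio
# group versus 10 of 40 in the low-ratio group.
rr, lower, upper = relative_risk(20, 40, 10, 40)
```

    An interval whose lower bound stays above 1, as in the abstract's 1.19-5.22, indicates that the outcome is significantly more likely in the first group at the chosen level.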

  • Context is Key: Using the Audit Log to Capture Contextual Factors Affecting Stroke Care Processes. AMIA ... Annual Symposium proceedings. AMIA Symposium Noshad, M., Rose, C. C., Thombley, R., Chiang, J., Corbin, C. K., Nguyen, M., Liu, V. X., Adler-Milstein, J., Chen, J. H. 2020; 2020: 953–62


    High-quality patient care through timely, precise, and efficacious management depends not only on the clinical presentation of a patient, but also on the context of the care environment to which they present. Understanding and improving factors that affect streamlined workflow, such as provider or department busyness or experience, are essential to improving these care processes, but have been difficult to measure with traditional approaches and clinical data sources. In this exploratory data analysis, we aim to determine whether such contextual factors can be captured for important clinical processes by taking advantage of non-traditional data sources like EHR audit logs, which passively track the electronic behavior of clinical teams. Our results illustrate the potential of defining multiple measures of contextual factors and their correlation with key care processes. We illustrate this using thrombolytic (tPA) treatment for ischemic stroke as an example process, but the measurement approaches can be generalized to multiple scenarios.

    View details for PubMedID 33936471