Fateme (Fatima) Nateghi
Postdoctoral Scholar, Biomedical Informatics
Bio
As a postdoctoral researcher at the Division of Computational Medicine, I work at the exciting intersection of machine learning and healthcare. My journey began with a PhD in Biomedical Sciences from KU Leuven in Belgium, where I explored the complexities of machine learning algorithms and their transformative potential in clinical settings. My research focused on adapting these algorithms for time-to-event analysis, a method used to predict when specific events may occur in a patient's future.
At Stanford, my work centers on building trustworthy AI systems to enhance healthcare delivery. I develop and evaluate machine learning models that integrate structured electronic health records (EHRs) and unstructured clinical notes to support real-world clinical decision-making. My recent projects include predicting treatment retention in opioid use disorder, improving antibiotic stewardship for urinary tract infections, and enabling digital consultations through large language models (LLMs). I'm particularly interested in embedding-based retrieval and retrieval-augmented generation (RAG) methods that help bridge cutting-edge AI research with clinical practice.
My role involves not just advancing the integration of machine learning in healthcare but also collaborating with a diverse team of clinicians, data scientists, and engineers. Together, we're striving to unravel complex healthcare challenges and ultimately improve patient outcomes.
Professional Education
-
PhD, KU Leuven, Biomedical Sciences (2023)
All Publications
-
Quantization-aware matrix factorization for low bit rate image compression
Information Sciences
2026; 722
DOI: 10.1016/j.ins.2025.122646
Web of Science ID: 001568281800001
-
Holistic evaluation of large language models for medical tasks with MedHELM.
Nature Medicine
2026
Abstract
While large language models (LLMs) achieve near-perfect scores on medical licensing exams, these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. Here we introduce MedHELM, an extensible evaluation framework with three contributions. First, a clinician-validated taxonomy organizing medical AI applications into five categories that mirror real clinical tasks: clinical decision support (diagnostic decisions, treatment planning), clinical note generation (visit documentation, procedure reports), patient communication (education materials, care instructions), medical research (literature analysis, clinical data analysis), and administration (scheduling, workflow coordination). These encompass 22 subcategories and 121 specific tasks reflecting daily medical practice. Second, a comprehensive benchmark suite of 37 evaluations covering all subcategories. Third, a systematic comparison of nine frontier LLMs (Claude 3.5 Sonnet, Claude 3.7 Sonnet, DeepSeek R1, Gemini 1.5 Pro, Gemini 2.0 Flash, GPT-4o, GPT-4o mini, Llama 3.3, and o3-mini) using an automated LLM-jury evaluation method. Our LLM-jury uses multiple AI evaluators to assess model outputs against expert-defined criteria. Advanced reasoning models (DeepSeek R1, o3-mini) demonstrated superior performance with win rates of 66%, although Claude 3.5 Sonnet achieved comparable results at 15% lower computational cost. These results not only highlight current model capabilities but also demonstrate how MedHELM could enable evidence-based selection of medical AI systems for healthcare applications.
DOI: 10.1038/s41591-025-04151-2
PMID: 41559415
PMCID: 10916499
-
Session Introduction: AI and Machine Learning in Clinical Medicine: Generative and Interactive Systems at the Human-Machine Interface.
Pacific Symposium on Biocomputing
2025; 30: 33-39
Abstract
Artificial Intelligence (AI) technologies are increasingly capable of processing complex and multilayered datasets. Innovations in generative AI and deep learning have notably enhanced the extraction of insights from unstructured text, images, and structured data alike. These breakthroughs in AI technology have spurred a wave of research in the medical field, leading to the creation of a variety of tools aimed at improving clinical decision-making, patient monitoring, image analysis, and emergency response systems. However, thorough research is essential to fully understand the broader impact and potential consequences of deploying AI within the healthcare sector.
PMID: 39670359
-
Predicting treatment retention in medication for opioid use disorder: a machine learning approach using NLP and LLM-derived clinical features.
Journal of the American Medical Informatics Association (JAMIA)
2025
Abstract
OBJECTIVE: Building upon our previous work on predicting treatment retention in medications for opioid use disorder, we aimed to improve 6-month retention prediction in buprenorphine-naloxone (BUP-NAL) therapy by incorporating features derived from large language models (LLMs) applied to unstructured clinical notes. MATERIALS AND METHODS: We used de-identified electronic health record (EHR) data from Stanford Health Care (STARR) for model development and internal validation, and the NeuroBlu behavioral health database for external validation. Structured features were supplemented with 13 clinical and psychosocial features extracted from free-text notes using the CLinical Entity Augmented Retrieval pipeline, which combines named entity recognition with LLM-based classification to provide contextual interpretation. We trained classification models (Logistic Regression, Random Forest, XGBoost) and survival models (CoxPH, Random Survival Forest, Survival XGBoost), evaluated using Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and C-index. RESULTS: XGBoost achieved the highest classification performance (ROC-AUC = 0.65). Incorporating LLM-derived features improved model performance across all architectures, with the largest gains observed in simpler models such as Logistic Regression. In time-to-event analysis, Random Survival Forest and Survival XGBoost reached the highest C-index (0.65). SHapley Additive exPlanations analysis identified LLM-extracted features such as Chronic Pain, Liver Disease, and Major Depression as key predictors. We also developed an interactive web tool for real-time clinical use. DISCUSSION: Features extracted using NLP and LLM-assisted methods improved model accuracy and interpretability, revealing valuable psychosocial risks not captured in structured EHRs. CONCLUSION: Combining structured EHR data with LLM-extracted features moderately improves BUP-NAL retention prediction, enabling personalized risk stratification and advancing AI-driven care for substance use disorders.
DOI: 10.1093/jamia/ocaf157
PMID: 40977375
-
Antibiotic Resistance Microbiology Dataset (ARMD): A Resource for Antimicrobial Resistance from EHRs.
Scientific Data
2025; 12 (1): 1299
Abstract
The Antibiotic Resistance Microbiology Dataset (ARMD) is a de-identified resource derived from electronic health records (EHRs) that facilitates research in antimicrobial resistance (AMR). ARMD encompasses data from adult patients collected over more than 15 years at two academic-affiliated hospitals, focusing on microbiological cultures, antibiotic susceptibilities, and associated clinical and demographic features. Key attributes include organism identification, susceptibility patterns for 55 antibiotics, implied susceptibility rules, and de-identified patient information. This dataset supports studies on antimicrobial stewardship, causal inference, and clinical decision-making. ARMD is designed to be reusable and interoperable, promoting collaboration and innovation in combating AMR. This paper describes the dataset's acquisition, structure, and utility while detailing its de-identification process.
DOI: 10.1038/s41597-025-05649-7
PMID: 40715119
PMCID: PMC12297523
-
Clinical entity augmented retrieval for clinical information extraction.
npj Digital Medicine
2025; 8 (1): 45
Abstract
Large language models (LLMs) with retrieval-augmented generation (RAG) have improved information extraction over previous methods, yet their reliance on embeddings often leads to inefficient retrieval. We introduce CLinical Entity Augmented Retrieval (CLEAR), a RAG pipeline that retrieves information using entities. We compared CLEAR to embedding RAG and full-note approaches for extracting 18 variables using six LLMs across 20,000 clinical notes. Average F1 scores were 0.90, 0.86, and 0.79; inference times were 4.95, 17.41, and 20.08 s per note; average model queries were 1.68, 4.94, and 4.18 per note; and average input tokens were 1.1k, 3.8k, and 6.1k per note for CLEAR, embedding RAG, and full-note approaches, respectively. In conclusion, CLEAR utilizes clinical entities for information retrieval and achieves >70% reduction in token usage and inference time with improved performance compared to modern methods.
DOI: 10.1038/s41746-024-01377-1
PMID: 39828800
PMCID: 4287068
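For readers curious how entity-based retrieval differs from embedding-based chunk retrieval, a minimal sketch follows. This is hypothetical illustration code, not the published CLEAR implementation: the helper names and the keyword-matching stand-in for named entity recognition are assumptions for clarity. The idea is that only sentences mentioning the target entity are sent to the LLM, which is where the token and latency savings come from.

```python
# Illustrative sketch of entity-anchored retrieval for clinical notes.
# Instead of embedding and ranking every chunk, keep only the sentences
# that mention an entity of interest, shrinking the context for an LLM.
import re


def split_sentences(note: str) -> list[str]:
    # Naive sentence splitter; a real pipeline would use a clinical NLP tool.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def entity_retrieve(note: str, entity_terms: list[str]) -> str:
    """Return only the sentences mentioning any of the entity terms."""
    terms = [t.lower() for t in entity_terms]
    hits = [s for s in split_sentences(note)
            if any(t in s.lower() for t in terms)]
    return " ".join(hits)


note = ("Patient reports chronic pain. Vitals stable. "
        "History of major depression, currently on sertraline. "
        "No known liver disease.")
# Retrieve context for two variables of interest, then (in a full
# pipeline) pass `snippet` to an LLM for classification.
snippet = entity_retrieve(note, ["chronic pain", "depression"])
```

In this toy example only the two matching sentences survive, so the downstream model query carries a fraction of the note's tokens; the paper's reported >70% reduction in token usage comes from this kind of targeted retrieval, with proper entity recognition in place of keyword matching.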
-
Predictability of buprenorphine-naloxone treatment retention: A multi-site analysis combining electronic health records and machine learning.
Addiction
2024
Abstract
Opioid use disorder (OUD) and opioid dependence lead to significant morbidity and mortality, yet treatment retention, crucial for the effectiveness of medications like buprenorphine-naloxone, remains unpredictable. Our objective was to determine the predictability of 6-month retention in buprenorphine-naloxone treatment using electronic health record (EHR) data from diverse clinical settings and to identify key predictors. This retrospective observational study developed and validated machine learning-based clinical risk prediction models using EHR data. Data were sourced from Stanford University's healthcare system and Holmusk's NeuroBlu database, reflecting a wide range of healthcare settings. The study analyzed 1800 Stanford and 7957 NeuroBlu treatment encounters from 2008 to 2023 and from 2003 to 2023, respectively. The outcome was continuous prescription of buprenorphine-naloxone for at least 6 months, without a gap of more than 30 days. The performance of machine learning prediction models was assessed by area under the receiver operating characteristic curve (ROC-AUC) as well as precision, recall, and calibration. To further validate our approach's clinical applicability, we conducted two secondary analyses: a time-to-event analysis on a single site to estimate the duration of buprenorphine-naloxone treatment continuity, evaluated by the C-index, and a comparative evaluation against predictions made by three human clinical experts. Attrition rates at 6 months were 58% (NeuroBlu) and 61% (Stanford). Prediction models trained and internally validated on NeuroBlu data achieved ROC-AUCs up to 75.8 (95% confidence interval [CI] = 73.6-78.0). Addiction medicine specialists' predictions showed a ROC-AUC of 67.8 (95% CI = 50.4-85.2). Time-to-event analysis on Stanford data indicated a median treatment retention time of 65 days, with a random survival forest model achieving an average C-index of 65.9. The top predictor of treatment retention was the diagnosis of opioid dependence. US patients with opioid use disorder or opioid dependence treated with buprenorphine-naloxone prescriptions appear to have high (~60%) treatment attrition by 6 months. Machine learning models trained on diverse electronic health record datasets appear able to predict treatment continuity with accuracy comparable to that of clinical experts.
DOI: 10.1111/add.16587
PMID: 38923168
-
Improving 1-Year Mortality Prediction After Pediatric Heart Transplantation Using Hypothetical Donor-Recipient Matches
IEEE Access
2024; 12: 89754-89762
DOI: 10.1109/ACCESS.2024.3418146
Web of Science ID: 001262677900001
ORCID: https://orcid.org/0000-0002-8874-8835