Behzad Naderalvojoud's Profile

Bio

Behzad Naderalvojoud is a biomedical informatics scientist at the Stanford Center for Biomedical Informatics Research. He received his Ph.D. in computer science from Hacettepe University, Turkey, in 2020. His work spans machine learning, deep learning, natural language understanding, and big data analytics. He builds health knowledge discovery platforms that shift big health data from volume-based to value-based use by generating relational knowledge that supports innovative treatments, prediction of therapeutic outcomes, and early diagnosis. He led many industrial AI projects in healthcare intelligence and information management within the Eureka cluster programs.

Dr. Naderalvojoud has published several papers in the field of natural language understanding by working on word sense disambiguation, sentiment analysis, neural word embeddings, and deep learning models through national and international projects.

He is currently working on the NLM-funded project "Advancing Knowledge Discovery for Postoperative Pain Management" under the supervision of Dr. Tina Hernandez-Boussard. He develops descriptive, predictive, and analytical tools based on the OMOP CDM for postoperative pain research, with the goal of generating timely evidence across multiple populations and settings.

All Publications

Graph augmented transformers improve chemotherapy toxicity symptom extraction from clinical notes. Nature communications Saquand, E., Naderalvojoud, B., Schuessler, M., Pillai, M., Rice, B. T., Blayney, D. W., Hernandez-Boussard, T. 2026

Abstract

Chemotherapy is essential for cancer treatment but may cause adverse events requiring emergency department visits and hospitalizations, placing substantial burdens on patients and healthcare systems. Existing approaches to detect these events often rely on structured electronic health records (EHR) data, which incompletely capture patients' symptom trajectories. Clinical notes contain richer information yet remain challenging to synthesize. Here we show that integrating transformer-based language models with graph neural networks improves extraction of chemotherapy-related toxicity symptoms from clinical notes. We developed Graph-Augmented Transformer for Clinical Notes (GAT-CN), which embeds patient notes using Bio+ClinicalBERT and links them to symptom-related terms within a heterogeneous clinical graph learned using GraphSAGE. In a multi-symptom classification task, GAT-CN outperformed transformer-only models, achieving a weighted AUROC of 0.850 and AUPRC of 0.812. The model also identified additional diagnoses not captured in structured EHRs, confirmed through manual annotation. These results demonstrate that graph-augmented models improve symptom detection from clinical narratives and support earlier monitoring of chemotherapy-related adverse events.

View details for DOI 10.1038/s41467-026-72347-2

View details for PubMedID 42049739
Cost-Benefit Analysis of Preventing Acute Care Use in Oncology Patients Following Systemic Therapy Using Medicare Claims Data: Retrospective Cohort Study. JMIR medical informatics Keller, S. A., Schuessler, M., Naderalvojoud, B., Seto, T., Tian, L., Roy, M., Hernandez-Boussard, T. 2025; 13: e77891

Abstract

Acute care use (ACU) represents a major economic burden in oncology, which can ideally be prevented. Existing models effectively predict such events. We aimed to quantify the cost savings achieved by implementing a model to predict ACU in oncology patients undergoing systemic therapy. This retrospective cohort study analyzed patients with cancer at an academic medical center from 2010 to 2022. We included patients who received systemic therapy and identified ACU events occurring after treatment initiation, excluding those with known death dates within the study period. Data on ACU-related expenses were gathered from Medicare claims and mapped to service codes in electronic health records, yielding average daily costs for each patient over 180 days following the start of therapy. The exposure was an ACU event. The main outcome was the average daily cost per patient at the end of the first 180 days of systemic therapy. We observed that expense accumulation flattened earlier and more rapidly among non-ACU patients. This study included 20,556 patients, of whom 3820 (18.58%) experienced at least 1 ACU. The average daily cost per patient for those with and without ACU was US $94.62 (SD US $72.54; 95% CI US $92.32-$96.92) and US $53.28 (SD US $59.92; 95% CI US $52.37-$54.19), respectively. The average total cost per ACU and non-ACU patient was US $17,031.92 (SD US $13,056.63; 95% CI US $16,616.74-$17,445.09) and US $9591.06 (SD US $10,785.83; 95% CI US $9427.64-$9754.48), respectively. To estimate the long-term financial impact of deploying the predictive model, we conducted a cost-benefit analysis based on an annual cohort size of 2177 patients. In the first year alone, the model yielded projected savings of US $910,000. By year 6, projected savings grew to US $9.46 million annually. The cumulative avoided costs over a 6-year deployment period totaled approximately US $31.11 million. These estimates compared the baseline cost model to the intervention model assuming a prevention rate of 35% for preventable ACU events and an average ACU cost of US $17,031.92 (SD US $13,037). Predictive analytics can significantly reduce costs associated with ACU events, enhancing economic efficiency in cancer care. Further research is needed to explore potential health benefits.
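As a quick consistency check, the 180-day totals reported in this abstract can be approximately recovered from the average daily costs. The sketch below uses only figures quoted in the abstract; the variable names are illustrative. Small gaps from the published totals are expected, since the published averages were presumably computed per patient before rounding.

```python
# Figures quoted in the abstract (US dollars); names are illustrative.
DAYS = 180
daily_acu = 94.62       # average daily cost, patients with an ACU event
daily_non_acu = 53.28   # average daily cost, patients without an ACU event
n_total = 20_556        # patients in the study
n_acu = 3_820           # patients with at least 1 ACU event

# Scale the average daily cost to the 180-day observation window.
total_acu = daily_acu * DAYS
total_non_acu = daily_non_acu * DAYS

# Share of patients with at least one ACU event, in percent.
acu_share = 100 * n_acu / n_total

print(f"ACU patients, {DAYS}-day cost:     ${total_acu:,.2f}")
print(f"Non-ACU patients, {DAYS}-day cost: ${total_non_acu:,.2f}")
print(f"Difference per patient:          ${total_acu - total_non_acu:,.2f}")
print(f"Share with ACU:                  {acu_share:.2f}%")
```

The computed totals land within a dollar of the reported US $17,031.92 and US $9591.06, and the share matches the reported 18.58%.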

View details for DOI 10.2196/77891

View details for PubMedID 41380118
Large Language Models Outperform Traditional Natural Language Processing Methods in Extracting Patient-Reported Outcomes in Inflammatory Bowel Disease. Gastro hep advances Patel, P. V., Davis, C., Ralbovsky, A., Tinoco, D., Williams, C. Y., Slatter, S., Naderalvojoud, B., Rosen, M. J., Hernandez-Boussard, T., Rudrapatna, V. 2025; 4 (2): 100563

Abstract

Patient-reported outcomes (PROs) are vital in assessing disease activity and treatment outcomes in inflammatory bowel disease (IBD). However, manual extraction of these PROs from the free-text of clinical notes is burdensome. We aimed to improve data curation from free-text information in the electronic health record, making it more available for research and quality improvement. This study aimed to compare traditional natural language processing (tNLP) and large language models (LLMs) in extracting 3 IBD PROs (abdominal pain, diarrhea, fecal blood) from clinical notes across 2 institutions. Clinic notes were annotated for each PRO using preset protocols. Models were developed and internally tested at the University of California, San Francisco, and then externally validated at Stanford University. We compared tNLP and LLM-based models on accuracy, sensitivity, specificity, and positive and negative predictive value. In addition, we conducted fairness and error assessments. Interrater reliability between annotators was >90%. On the University of California, San Francisco test set (n = 50), the top-performing tNLP models showcased accuracies of 92% (abdominal pain), 82% (diarrhea) and 80% (fecal blood), comparable to GPT-4, which was 96%, 88%, and 90% accurate, respectively. On external validation at Stanford (n = 250), tNLP models failed to generalize (61%-62% accuracy) while GPT-4 maintained accuracies >90%. Pathways Language Model-2 and Generative Pre-trained Transformer-4 showed similar performance. No biases were detected based on demographics or diagnosis. LLMs are accurate and generalizable methods for extracting PROs. They maintain excellent accuracy across institutions, despite heterogeneity in note templates and authors. Widespread adoption of such tools has the potential to enhance IBD research and patient care.

View details for DOI 10.1016/j.gastha.2024.10.003

View details for PubMedID 39877865

View details for PubMedCentralID PMC11772946
Evaluating the impact of data biases on algorithmic fairness and clinical utility of machine learning models for prolonged opioid use prediction. JAMIA open Naderalvojoud, B., Curtin, C., Asch, S. M., Humphreys, K., Hernandez-Boussard, T. 2025; 8 (5): ooaf115

Abstract

Objectives: The growing use of machine learning (ML) in healthcare raises concerns about how data biases affect real-world model performance. While existing frameworks evaluate algorithmic fairness, they often overlook the impact of bias on generalizability and clinical utility, which are critical for safe deployment. Building on prior methods, this study extends bias analysis to include clinical utility, addressing a key gap between fairness evaluation and decision-making. Materials and Methods: We applied a 3-phase evaluation to a previously developed model predicting prolonged opioid use (POU), validated on Veterans Health Administration (VHA) data. The analysis included internal and external validation, model retraining on VHA data, and subgroup evaluation across demographic, vulnerable, risk, and comorbidity groups. We assessed performance using area under the receiver operating characteristic curve (AUROC), calibration, and decision curve analysis, incorporating standardized net-benefits to evaluate clinical utility alongside fairness and generalizability. Results: The internal cohort (N=41929) had a 14.7% POU prevalence, compared to 34.3% in the external VHA cohort (N=397150). The model's AUROC decreased from 0.74 in the internal test cohort to 0.70 in the full external cohort. Subgroup-level performance averaged 0.69 (SD=0.01), showing minimal deviation from the external cohort overall. Retraining on VHA data improved AUROCs to 0.82. Clinical utility analysis showed systematic shifts in net-benefit across threshold probabilities. Discussion: While the POU model showed generalizability and fairness internally, external validation and retraining revealed performance and utility shifts across subgroups. Conclusion: Population-specific biases affect clinical utility, an often-overlooked dimension in fairness evaluation that is key to ensuring equitable benefits across diverse patient groups.
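For readers unfamiliar with the decision curve analysis mentioned in this abstract, the sketch below shows the standard net-benefit formula and its prevalence-standardized form. It is a minimal illustration on made-up toy data, not the authors' implementation; the function names and the toy labels and probabilities are assumptions introduced here.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every patient with predicted risk >= threshold.

    Standard decision-curve formula: TP/N - (FP/N) * pt / (1 - pt).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def standardized_net_benefit(y_true, y_prob, threshold):
    """Net benefit divided by outcome prevalence (at most 1 for a perfect model)."""
    prevalence = sum(y_true) / len(y_true)
    return net_benefit(y_true, y_prob, threshold) / prevalence


# Toy data (hypothetical): 8 patients, 3 with the outcome.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

nb = net_benefit(y_true, y_prob, threshold=0.5)
snb = standardized_net_benefit(y_true, y_prob, threshold=0.5)
print(f"net benefit = {nb:.3f}, standardized net benefit = {snb:.3f}")
```

Dividing by prevalence puts cohorts with different outcome rates on a common scale, which matters when comparing sites such as the internal and VHA cohorts described in the abstract (14.7% versus 34.3% POU prevalence).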

View details for DOI 10.1093/jamiaopen/ooaf115

View details for PubMedID 41036091
Ensemble learning to enhance accurate identification of patients with glaucoma using electronic health records. JAMIA open Mungle, T., Naderalvojoud, B., Andrews, C. A., An, H. S., Bicket, A., Zhang, A., Rosenthal, J., Lee, W. S., Ludwig, C. A., Mekonnen, B., Pershing, S., Stein, J. D., Hernandez-Boussard, T. 2025; 8 (4): ooaf080

Abstract

Existing ophthalmology studies for clinical phenotype identification in real-world datasets (RWD) rely exclusively on structured data elements (SDE). We evaluated the performance, generalizability, and fairness of multimodal ensemble models that integrate real-world SDE and free-text data compared to SDE-only models to identify patients with glaucoma. This is a retrospective cross-sectional study involving 2 health systems: University of Michigan (UoM) and Stanford University (SU). It involves 1728 patients visiting eye clinics during 2012-2021. Free-text embeddings extracted using BioClinicalBERT were combined with SDE. EditedNearestNeighbor (ENN) undersampling and Borderline-Synthetic Minority Over-sampling Technique (bSMOTE) addressed class imbalance. Lasso Regression (LR), Random Forest (RF), and Support Vector Classifier (SVC) models were trained on UoM imbalanced (imb) and resampled data along with a bagging ensemble method. Models were externally validated with SU data. Fairness was assessed using equalized odds difference (EOD) and Target Probability Difference (TPD). Among 900 and 828 patients from UoM and SU, 10% and 23% respectively had glaucoma as confirmed by ophthalmologists. At UoM, multimodal LRimb (F1 = 76.60 [61.90-88.89]; AUROC = 95.41 [87.01-99.63]) outperformed unimodal RFimb (F1 = 69.77 [52.94-83.64]; AUROC = 97.72 [95.95-99.18]) and ICD-coding method (F1 = 53.01 [39.51-65.43]; AUROC = 90.10 [84.59-93.93]). Bagging (BM = LRENN + LRbSMOTE) improved performance, achieving an F1 of 83.02 [70.59-92.86] and AUROC of 97.59 [92.98-99.88]. During external validation, BM achieved the highest F1 (68.47 [62.61-73.75]), outperforming unimodal (F1 = 51.26 [43.80-58.13]) and multimodal LRimb (F1 = 62.46 [55.95-68.24]). BM EOD revealed lower disparities for sex (<0.1), race (<0.5) and ethnicity (<0.5), and had the least uncertainty using TPD analysis as compared to traditional models. Multimodal ensemble models integrating structured and unstructured EHR data outperformed traditional SDE models, achieving fair predictions across demographic sub-groups. Among ensemble methods, bagging demonstrated better generalizability than stacking, particularly when training data is limited. This approach can enhance phenotype discovery to enable future research studies using RWD, leading to better patient management and clinical outcomes.

View details for DOI 10.1093/jamiaopen/ooaf080

View details for PubMedID 40799932

View details for PubMedCentralID PMC12342940
Post-operative pain management in elders HELIYON Titan, A. L., Naderalvojoud, B., Suarez, P., Coquet, J., Baiu, I., Hernandez-Boussard, T., Curtin, C. 2025; 11 (12)

View details for DOI 10.1016/j.heliyon.2025.e43465

View details for Web of Science ID 001772891500001
Large Language Models Outperform Traditional Natural Language Processing Methods in Extracting Patient- Reported Outcomes in Inflammatory Bowel Disease GASTRO HEP ADVANCES Patel, P., Davis, C., Ralbovsky, A., Tinoco, D., Williams, C. Y. K., Slatter, S., Naderalvojoud, B., Rosen, M. J., Hernandez-Boussard, T., Rudrapatna, V. 2025; 4 (2)

View details for DOI 10.1016/j.gastha.2024.10.003

View details for Web of Science ID 001398433800001
Large language models outperform traditional natural language processing methods in extracting patient-reported outcomes in IBD. medRxiv : the preprint server for health sciences Patel, P. V., Davis, C., Ralbovsky, A., Tinoco, D., Williams, C. Y., Slatter, S., Naderalvojoud, B., Rosen, M. J., Hernandez-Boussard, T., Rudrapatna, V. 2024

Abstract

Patient-reported outcomes (PROs) are vital in assessing disease activity and treatment outcomes in inflammatory bowel disease (IBD). However, manual extraction of these PROs from the free-text of clinical notes is burdensome. We aimed to improve data curation from free-text information in the electronic health record, making it more available for research and quality improvement. This study aimed to compare traditional natural language processing (tNLP) and large language models (LLMs) in extracting three IBD PROs (abdominal pain, diarrhea, fecal blood) from clinical notes across two institutions. Clinic notes were annotated for each PRO using preset protocols. Models were developed and internally tested at the University of California San Francisco (UCSF), and then externally validated at Stanford University. We compared tNLP and LLM-based models on accuracy, sensitivity, specificity, positive and negative predictive value. Additionally, we conducted fairness and error assessments. Inter-rater reliability between annotators was >90%. On the UCSF test set (n=50), the top-performing tNLP models showcased accuracies of 92% (abdominal pain), 82% (diarrhea) and 80% (fecal blood), comparable to GPT-4, which was 96%, 88%, and 90% accurate, respectively. On external validation at Stanford (n=250), tNLP models failed to generalize (61-62% accuracy) while GPT-4 maintained accuracies >90%. PaLM-2 and GPT-4 showed similar performance. No biases were detected based on demographics or diagnosis. LLMs are accurate and generalizable methods for extracting PROs. They maintain excellent accuracy across institutions, despite heterogeneity in note templates and authors. Widespread adoption of such tools has the potential to enhance IBD research and patient care.

View details for DOI 10.1101/2024.09.05.24313139

View details for PubMedID 39281744

View details for PubMedCentralID PMC11398594
Towards global model generalizability: independent cross-site feature evaluation for patient-level risk prediction models using the OHDSI network. Journal of the American Medical Informatics Association : JAMIA Naderalvojoud, B., Curtin, C. M., Yanover, C., El-Hay, T., Choi, B., Park, R. W., Tabuenca, J. G., Reeve, M. P., Falconer, T., Humphreys, K., Asch, S. M., Hernandez-Boussard, T. 2024

Abstract

Predictive models show promise in healthcare, but their successful deployment is challenging due to limited generalizability. Current external validation often focuses on model performance with restricted feature use from the original training data, lacking insights into their suitability at external sites. Our study introduces an innovative methodology for evaluating features during both the development phase and the validation, focusing on creating and validating predictive models for post-surgery patient outcomes with improved generalizability. Electronic health records (EHRs) from 4 countries (United States, United Kingdom, Finland, and Korea) were mapped to the OMOP Common Data Model (CDM), 2008-2019. Machine learning (ML) models were developed to predict post-surgery prolonged opioid use (POU) risks using data collected 6 months before surgery. Both local and cross-site feature selection methods were applied in the development and external validation datasets. Models were developed using Observational Health Data Sciences and Informatics (OHDSI) tools and validated on separate patient cohorts. Model development included 41 929 patients, 14.6% with POU. The external validation included 31 932 (UK), 23 100 (US), 7295 (Korea), and 3934 (Finland) patients with POU of 44.2%, 22.0%, 15.8%, and 21.8%, respectively. The top-performing model, Lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 during local validation and 0.69 (SD = 0.02) (averaged) in external validation. Models trained with cross-site feature selection significantly outperformed those using only features from the development site through external validation (P < .05). Using EHRs across four countries mapped to the OMOP CDM, we developed generalizable predictive models for POU. Our approach demonstrates the significant impact of cross-site feature selection in improving model performance, underscoring the importance of incorporating diverse feature sets from various clinical settings to enhance the generalizability and utility of predictive healthcare models.

View details for DOI 10.1093/jamia/ocae028

View details for PubMedID 38412331
Trends in Influenza Vaccination Rates among a Medicaid Population from 2016 to 2021. Vaccines Naderalvojoud, B., Shah, N. D., Mutanga, J. N., Belov, A., Staiger, R., Chen, J. H., Whitaker, B., Hernandez-Boussard, T. 2023; 11 (11)

Abstract

Seasonal influenza is a leading cause of death in the U.S., causing significant morbidity, mortality, and economic burden. Despite the proven efficacy of vaccinations, rates remain notably low, especially among Medicaid enrollees. Leveraging Medicaid claims data, this study characterizes influenza vaccination rates among Medicaid enrollees and aims to elucidate factors influencing vaccine uptake, providing insights that might also be applicable to other vaccine-preventable diseases, including COVID-19. This study used Medicaid claims data from nine U.S. states (2016-2021), encompassing three types of claims: fee-for-service, major Medicaid managed care plan, and combined. We included Medicaid enrollees who had an in-person healthcare encounter during an influenza season in this period, excluding those under 6 months of age, over 65 years, or having telehealth-only encounters. Vaccination was the primary outcome, with secondary outcomes involving in-person healthcare encounters. Chi-square tests, multivariable logistic regression, and Fisher's exact test were utilized for statistical analysis. A total of 20,868,910 enrollees with at least one healthcare encounter in at least one influenza season were included in the study population between 2016 and 2021. Overall, 15% (N = 3,050,471) of enrollees received an influenza vaccine between 2016 and 2021. During peri-COVID periods, there was an increase in vaccination rates among enrollees compared to pre-COVID periods, from 14% to 16%. Children had the highest influenza vaccination rates among all age groups at 29%, whereas only 17% of those aged 5-17 years and 10% of those aged 18-64 years were vaccinated. We observed differences in the likelihood of receiving the influenza vaccine among enrollees based on their health conditions and medical encounters. In a study of Medicaid enrollees across nine states, 15% received an influenza vaccine from July 2016 to June 2021. Vaccination rates rose annually, peaking during peri-COVID seasons. The highest uptake was among children (6 months-4 years), and the lowest was in adults (18-64 years). Female gender, urban residency, and Medicaid-managed care affiliation positively influenced uptake. However, mental health and substance abuse disorders decreased the likelihood. This study, reliant on Medicaid claims data, underscores the need for outreach services.

View details for DOI 10.3390/vaccines11111712

View details for PubMedID 38006044
Improving machine learning with ensemble learning on observational healthcare data. AMIA ... Annual Symposium proceedings. AMIA Symposium Naderalvojoud, B., Hernandez-Boussard, T. 2023; 2023: 521-529

Abstract

Ensemble learning is a powerful technique for improving the accuracy and reliability of prediction models, especially in scenarios where individual models may not perform well. However, combining models with varying accuracies may not always improve the final prediction results, as models with lower accuracies may obscure the results of models with higher accuracies. This paper addresses this issue and answers the question of when an ensemble approach outperforms individual models for prediction. As a result, we propose an ensemble model for predicting patients at risk of postoperative prolonged opioid use. The model incorporates two machine learning models that are trained using different covariates, resulting in high precision and recall. Our study, which employs five different machine learning algorithms, shows that the proposed approach significantly improves the final prediction results in terms of AUROC and AUPRC.

View details for PubMedID 38222353

View details for PubMedCentralID PMC10785929

Behzad Naderalvojoud

Biostatistician 2, Computational Medicine
