Dhamanpreet Kaur

Resident in Cardiothoracic Surgery - Thoracic Surgery
Affiliate, Department Funds

Publications

Sex disparities in deep learning estimation of ejection fraction from cardiac magnetic resonance imaging. NPJ digital medicine Kaur, D., Shad, R., Kumar, A., Mathur, M., Cho, J., Fong, R., Zakka, C., Phillips, C., Hiesinger, W. 2026

Abstract

The advent of artificial intelligence in cardiovascular imaging holds immense potential for earlier diagnoses, precision medicine, and improved disease management. However, sex-based disparities in deep learning models for cardiac imaging, and strategies to mitigate these biases, remain understudied. In this study, we analyzed algorithmic bias in a foundation model that was pretrained on cardiac magnetic resonance imaging and radiology reports from multiple institutions and finetuned to estimate ejection fraction (EF) on the UK Biobank dataset. In diagnosing reduced EF, the model performed significantly worse for females than for males. Algorithmic fairness did not improve with masking of protected attributes in radiology reports or with data resampling, although explicit input of sex during model finetuning may improve EF estimation in some cases. The underdiagnosis of reduced EF among females has critical implications, as it could exacerbate existing sex-based disparities in cardiovascular health. We advise caution in the development of models for cardiovascular imaging to avoid such pitfalls.

DOI: 10.1038/s41746-025-02330-6

PubMed ID: 41577988

Race, Sex, and Age Disparities in the Performance of ECG Deep Learning Models Predicting Heart Failure. Circulation. Heart failure Kaur, D., Hughes, J. W., Rogers, A. J., Kang, G., Narayan, S. M., Ashley, E. A., Perez, M. V. 2023: e010879

Abstract

Deep learning models may combat widening racial disparities in heart failure outcomes through early identification of individuals at high risk. However, demographic biases in the performance of these models have not been well studied.

This retrospective analysis used 12-lead ECGs taken between 2008 and 2018 from 326 518 patient encounters referred for standard clinical indications to Stanford Hospital. The primary model was a convolutional neural network trained to predict incident heart failure within 5 years. Biases were evaluated on the testing set (160 312 ECGs) using the area under the receiver operating characteristic curve, stratified across the protected attributes of race, ethnicity, age, and sex.

There were 59 817 cases of incident heart failure observed within 5 years of ECG collection. The performance of the primary model declined with age. There were no significant differences observed between racial groups overall. However, the primary model performed significantly worse in Black patients aged 0 to 40 years compared with all other racial groups in this age group, with differences most pronounced among young Black women. Disparities in model performance did not improve with the integration of race, ethnicity, sex, and age into model architecture, by training separate models for each racial group, or by providing the model with a data set of equal racial representation. Using probability thresholds individualized for race, age, and sex offered substantial improvements in F1 scores.

The biases found in this study warrant caution against perpetuating disparities through the development of machine learning tools for the prognosis and management of heart failure. Customizing the application of these models by using probability thresholds individualized by race, ethnicity, age, and sex may offer an avenue to mitigate existing algorithmic disparities.

DOI: 10.1161/CIRCHEARTFAILURE.123.010879

PubMed ID: 38126168

Application of Bayesian networks to generate synthetic health data. Journal of the American Medical Informatics Association : JAMIA Kaur, D., Sobiesk, M., Patil, S., Liu, J., Bhagat, P., Gupta, A., Markuzon, N. 2021; 28 (4): 801-811

Abstract

This study seeks to develop a fully automated method of generating synthetic data from a real dataset that could be employed by medical organizations to distribute health data to researchers, reducing the need for access to real data. We hypothesize that the application of Bayesian networks will improve upon the predominant existing method, medBGAN, in handling the complexity and dimensionality of healthcare data.

We employed Bayesian networks to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure. We used the University of California Irvine (UCI) heart disease and diabetes datasets as well as the MIMIC-III diagnoses database. We evaluated our method through statistical tests, machine learning tasks, preservation of rare events, disclosure risk, and the ability of a machine learning classifier to discriminate between the real and synthetic data.

Our Bayesian network model outperformed or equaled medBGAN in all key metrics. Notable improvement was achieved in capturing rare variables and preserving association rules.

Bayesian networks generated data sufficiently similar to the original data with minimal risk of disclosure, while offering additional transparency, computational efficiency, and capacity to handle more data types in comparison to existing methods. We hope this method will allow healthcare organizations to efficiently disseminate synthetic health data to researchers, enabling them to generate hypotheses and develop analytical tools.

We conclude that the application of Bayesian networks is a promising option for generating realistic synthetic health data that preserves the features of the original data without compromising data privacy.

DOI: 10.1093/jamia/ocaa303

PubMed ID: 33367620

PubMed Central ID: PMC7973486

Evaluation of clinician interaction with alerts to enhance performance of the tele-critical care medical environment. International journal of medical informatics Kaur, D., Panos, R. J., Badawi, O., Bapat, S. S., Wang, L., Gupta, A. 2020; 139: 104165

Abstract

This study sought to identify opportunities to improve the interaction between clinicians and Tele-Critical Care (Tele-CC) programs through an analysis of alert occurrence and reactivation in a specific Tele-CC application.

Data were collected automatically through the Philips eCaremanager® software system used at multiple hospitals in the Avera health system. We evaluated the distribution of alerts per patient, frequency of alert types, time between consecutive alerts, and Tele-CC clinician choice of alert reactivation times.

Each patient generated an average of 79.8 alerts during their ICU stay (median 31.0; 25th - 75th percentile 10.0-89.0) with 46.4 for blood pressure and 38.4 for oxygenation. The most frequent alerts for continuous physiological parameters were: MAP limit (28.9 %), O2/RR (26.4 %), MAP trend (16.5 %), HR trend (12.1 %), and HR limit (11.3 %). The median time between consecutive alerts for one parameter was less than 10 min for 86 % of patients. Tele-CC providers responded to all alert types with immediate reactivation 47-88 % of the time. Limit alerts had longer reactivation times than their trend alert counterparts (p-value < .001).

The alert-type-specific differences in frequency, timing of occurrence, and provider choice of reactivation time provide insight into how clinicians interact with the Tele-CC system. Systems engineering enhancements to Tele-CC software algorithms may reduce alert burden and thereby decrease clinicians' cognitive workload for alert assessment. Further study of Tele-CC alert generation, alert presentation to clinicians, and the clinicians' options to respond to these alerts may reduce provider workload, minimize alert desensitization, and optimize the ability of Tele-CC clinicians to provide efficient and timely critical care management.

DOI: 10.1016/j.ijmedinf.2020.104165

PubMed ID: 32402986

Racial disparities in prostate cancer survival in a screened population: Reality versus artifact. Cancer Kaur, D., Ulloa-Pérez, E., Gulati, R., Etzioni, R. 2018; 124 (8): 1752-1759

Abstract

Racial disparities in prostate cancer survival (PCS) narrowed during the prostate-specific antigen (PSA) era, suggesting that screening may induce more equitable outcomes. However, the effects of lead time and overdiagnosis can inflate survival even without real screening benefit.

A simulation model of PCS in the early PSA era (1991-2000) was created. The modeled survival started with baseline survival in the pre-PSA era (1975-1990) and added lead times and overdiagnosis using estimates from published studies. The authors quantified 1) discrepancies between modeled and observed PCS in the PSA era and 2) residual period effects on PCS given specified values for screening benefit.

Lead time and overdiagnosis explained more of the improvement in PCS for older ages at diagnosis (46% [95% confidence interval (CI), 44%-50%] for blacks and 51% [95% CI, 50%-52%] for all races ages 50-54 years vs 98% [95% CI, 97%-99%] for blacks and 100% for all races ages 75-79 years). They also explained more of the narrowing in PCS disparities for older ages (33% [95% CI, 31%-43%] for men ages 50-54 years vs 74% [95% CI, 71%-81%] for men ages 75-79 years). The period effects amounted to reductions of 27% to 40% among blacks and 26% to 38% among all races in the risk of prostate cancer death, depending on the screening benefit.

Real improvements in survival disparities in the PSA era are smaller than those observed and reflect similar reductions in the risk of prostate cancer death among blacks and all races. Understanding screening artifacts is necessary for valid interpretation of observed survival trends.

Cancer 2018;124:1752-9. © 2018 American Cancer Society.

DOI: 10.1002/cncr.31253

PubMed ID: 29370459

PubMed Central ID: PMC5891383

Red teaming ChatGPT in medicine to yield real-world insights on model behavior. NPJ digital medicine Chang, C. T., Farah, H., Gui, H., Rezaei, S. J., Bou-Khalil, C., Park, Y. J., Swaminathan, A., Omiye, J. A., Kolluri, A., Chaurasia, A., Lozano, A., Heiman, A., Jia, A. S., Kaushal, A., Jia, A., Iacovelli, A., Yang, A., Salles, A., Singhal, A., Narasimhan, B., Belai, B., Jacobson, B. H., Li, B., Poe, C. H., Sanghera, C., Zheng, C., Messer, C., Kettud, D. V., Pandya, D., Kaur, D., Hla, D., Dindoust, D., Moehrle, D., Ross, D., Chou, E., Lin, E., Haredasht, F. N., Cheng, G., Gao, I., Chang, J., Silberg, J., Fries, J. A., Xu, J., Jamison, J., Tamaresis, J. S., Chen, J. H., Lazaro, J., Banda, J. M., Lee, J. J., Matthys, K. E., Steffner, K. R., Tian, L., Pegolotti, L., Srinivasan, M., Manimaran, M., Schwede, M., Zhang, M., Nguyen, M., Fathzadeh, M., Zhao, Q., Bajra, R., Khurana, R., Azam, R., Bartlett, R., Truong, S. T., Fleming, S. L., Raj, S., Behr, S., Onyeka, S., Muppidi, S., Bandali, T., Eulalio, T. Y., Chen, W., Zhou, X., Ding, Y., Cui, Y., Tan, Y., Liu, Y., Shah, N., Daneshjou, R. 2025; 8 (1): 149

Abstract

Red teaming, the practice of adversarially exposing unexpected or undesired model behaviors, is critical to improving the equity and accuracy of large language models, but red teaming by parties not affiliated with model creators is scarce in healthcare. We convened teams of clinicians, medical and engineering students, and technical professionals (80 participants total) to stress-test models with real-world clinical cases and to categorize inappropriate responses along the axes of safety, privacy, hallucinations/accuracy, and bias. Six medically trained reviewers re-analyzed prompt-response pairs and added qualitative annotations. Of 376 unique prompts (1504 responses), 20.1% were inappropriate (GPT-3.5: 25.8%; GPT-4.0: 16%; GPT-4.0 with Internet: 17.8%). Subsequently, we show the utility of our benchmark by testing GPT-4o, a model released after our event (20.4% inappropriate). Of the responses that were appropriate with GPT-3.5, 21.5% were inappropriate in updated models. We share insights for constructing red teaming prompts and present our benchmark for iterative model assessments.

DOI: 10.1038/s41746-025-01542-0

PubMed ID: 40055532

PubMed Central ID: 10564921

Leveraging Masked Autoencoders for Self-Supervised Pretraining in Surgical Video Activity Recognition. 104th Annual Meeting of The American Association for Thoracic Surgery Tang, L., Zakka, C., Dahlan, A., Chaurasia, A., Dalal, A., Kaur, D., Shad, R., Fong, R., Hiesinger, W. 2024

A Generalizable Deep Learning System for Cardiac MRI. Shad, R., Zakka, C. R., Kaur, D., Mongan, J., Kallianos, K. G., Filice, R., Khandwala, N., Eng, D., Langlotz, C., Hiesinger, W. Lippincott Williams & Wilkins. 2023

DOI: 10.1161/circ.148.suppl_1.13588

Web of Science ID: 001157891302333

A deep learning-based electrocardiogram risk score for long term cardiovascular death and disease. NPJ digital medicine Hughes, J. W., Tooley, J., Torres Soto, J., Ostropolets, A., Poterucha, T., Christensen, M. K., Yuan, N., Ehlert, B., Kaur, D., Kang, G., Rogers, A., Narayan, S., Elias, P., Ouyang, D., Ashley, E., Zou, J., Perez, M. V. 2023; 6 (1): 169

Abstract

The electrocardiogram (ECG) is the most frequently performed cardiovascular diagnostic test, but it is unclear how much information resting ECGs contain about long-term cardiovascular risk. Here we report that a deep convolutional neural network can accurately predict the long-term risk of cardiovascular mortality and disease based on a resting ECG alone. Using a large dataset of resting 12-lead ECGs collected at Stanford University Medical Center, we developed SEER, the Stanford Estimator of Electrocardiogram Risk. SEER predicts 5-year cardiovascular mortality with an area under the receiver operating characteristic curve (AUC) of 0.83 in a held-out test set at Stanford, and with AUCs of 0.78 and 0.83, respectively, when independently evaluated at Cedars-Sinai Medical Center and Columbia University Irving Medical Center. SEER predicts 5-year atherosclerotic cardiovascular disease (ASCVD) with an AUC of 0.67, similar to the Pooled Cohort Equations for ASCVD Risk, while being only modestly correlated with them. When used in conjunction with the Pooled Cohort Equations, SEER accurately reclassified 16% of patients from low to moderate risk, uncovering a group with an actual average 10-year ASCVD risk of 9.9% who would not otherwise have been indicated for statin therapy. SEER can also predict several other cardiovascular conditions, such as heart failure and atrial fibrillation. Using only lead I of the ECG, it predicts 5-year cardiovascular mortality with an AUC of 0.80. SEER, used alongside the Pooled Cohort Equations and other risk tools, can substantially improve cardiovascular risk stratification and aid in medical decision making.

DOI: 10.1038/s41746-023-00916-6

PubMed ID: 37700032

PubMed Central ID: 8145781

Weak acids as an alternative anti-microbial therapy. Biofilm Kundukad, B., Udayakumar, G., Grela, E., Kaur, D., Rice, S. A., Kjelleberg, S., Doyle, P. S. 2020; 2: 100019

Abstract

Weak acids such as acetic acid and N-acetyl cysteine (NAC) at pH less than their pKa can effectively eradicate biofilms due to their ability to penetrate the biofilm matrix and the cell membrane. However, the optimum conditions for their activity against drug-resistant strains, and their safety, need to be understood before they can be applied to treat infections or to inactivate biofilms on hard surfaces. Here, we investigate the efficacy of weak acids and the optimum conditions under which they can eradicate biofilms. We compared the efficacy of various mono- and triprotic weak acids, such as N-acetyl cysteine (NAC), acetic acid, formic acid, and citric acid, in eradicating biofilms. We found that monoprotic weak acids/acid drugs can kill mucoid P. aeruginosa mucA biofilm bacteria provided the pH is less than their pKa, demonstrating that the extracellular biofilm matrix does not protect the bacteria from the activity of the weak acids. Triprotic acids, such as citric acid, kill biofilm bacteria at pH < pKa1. However, at a pH between pKa1 and pKa2, citric acid is effective in killing the bacteria at the core of biofilm microcolonies but does not kill the bacteria on the periphery. The efficacy of a monoprotic weak acid (NAC) and a triprotic weak acid (citric acid) was tested on biofilms formed by Klebsiella pneumoniae KP1, Pseudomonas putida OUS82, Staphylococcus aureus 15981, P. aeruginosa DK1-NH57388A (a mucoid cystic fibrosis isolate), and P. aeruginosa PA_D25 (an antibiotic-resistant strain). We showed that weak acids have a broad spectrum of activity against a wide range of bacteria, including antibiotic-resistant bacteria. Further, we showed that a weak acid drug, NAC, can kill bacteria without being toxic to human cells if its pH is maintained close to its pKa. Thus, weak acids/weak acid drugs target antibiotic-resistant bacteria and eradicate the persister cells in biofilms, which are tolerant to other conventional methods of biofilm eradication.

DOI: 10.1016/j.bioflm.2020.100019

PubMed ID: 33447805

PubMed Central ID: PMC7798471
