Debadutta (Dev) Dash, MD, MPH
Clinical Assistant Professor, Emergency Medicine
Bio
Dr. Dash is an emergency medicine physician who delivers care in the Stanford Health Care Level 1 trauma center. He is a clinical assistant professor in the Department of Emergency Medicine at Stanford University School of Medicine.
He completed fellowship training in clinical informatics at Stanford Health Care and earned a Master of Public Health (MPH) degree from Harvard University.
His research interests include computer vision and natural language processing, as well as quality assurance and quality improvement in digital health initiatives.
Dr. Dash's other research projects include the development of an image classification algorithm that helps predict hypoxic outcomes and a hardware and software system that provides real-time feedback on cardiac function at the patient's bedside.
While completing his fellowship, Dr. Dash served as vice president of the American Medical Informatics Association Clinical Fellows. He was also a postdoctoral research fellow at the Stanford University Center for Artificial Intelligence in Medicine & Imaging.
He is a member of the American College of Emergency Physicians and the American Academy of Emergency Medicine.
He speaks English and Oriya fluently. He also speaks, reads, and writes Japanese and Spanish with intermediate competence.
His interests outside of patient care include piano, computer programming, sustainable energy projects, and cooking multi-course East Asian meals.
Clinical Focus
- Artificial Intelligence
- Clinical Informatics
- Emergency Medicine
Boards, Advisory Committees, Professional Organizations
- Member, Sigma Pi Sigma (2008 - Present)
- Member, ACEP (2018 - Present)
Professional Education
- Fellowship: Stanford University Clinical Informatics Fellowship (2022) CA
- Board Certification: American Board of Emergency Medicine, Emergency Medicine (2019)
- Residency: University Hospitals Cleveland Medical Center Emergency Medicine Program (2018) OH
- Medical Education: Baylor College of Medicine (2013) TX
- MPH, Harvard School of Public Health, Epidemiology & Statistics (2018)
- BS, University of Texas at Austin, Radiation Physics (2008)
- MD, Baylor College of Medicine (2013)
All Publications
- Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review.
JAMA
2024
Abstract
Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. This review summarizes existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty. A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024, and studies evaluating 1 or more LLMs in health care were included. Three independent reviewers categorized studies via keyword searches based on the data used, the health care tasks, the NLP and NLU tasks, the dimensions of evaluation, and the medical specialty. Of the 519 studies reviewed, only 5% used real patient care data for LLM evaluation. The most common health care tasks were assessing medical knowledge such as answering medical licensing examination questions (44.5%) and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions (0.2%) were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while tasks such as summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, in terms of medical specialty area, most studies were in generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) being the least represented. Existing evaluations of LLMs mostly focus on the accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity and deployment considerations received limited attention. Future evaluations should adopt standardized applications and metrics, use clinical data, and broaden focus to include a wider range of tasks and specialties.
View details for DOI 10.1001/jama.2024.21700
View details for PubMedID 39405325
View details for PubMedCentralID PMC11480901
- Using an artificial intelligence software improves emergency medicine physician intracranial haemorrhage detection to radiologist levels.
Emergency medicine journal : EMJ
2024
Abstract
BACKGROUND: Tools to increase the turnaround speed and accuracy of imaging reports could positively influence ED logistics. The Caire ICH is an artificial intelligence (AI) software developed for ED physicians to recognise intracranial haemorrhages (ICHs) on non-contrast enhanced cranial CT scans to manage the clinical care of these patients in a timelier fashion. METHODS: A dataset of 532 non-contrast cranial CT scans was reviewed by five board-certified emergency physicians (EPs) with an average of 14.8 years of practice experience. The scans were labelled in random order for the presence or absence of an ICH. If an ICH was detected, the reader further labelled all subtypes present (ie, epidural, subdural, subarachnoid, intraparenchymal and/or intraventricular haemorrhage). After a washout period, the five EPs individually reviewed the scans again with the assistance of Caire ICH. The mean accuracy of the EP readings with AI assistance was compared with the mean accuracy of three general radiologists reading the films individually. The final diagnosis (ie, ground truth) was adjudicated by a consensus of the radiologists after their individual readings. RESULTS: Mean EP reader accuracy increased significantly, by 6.20% (95% CI for the difference 5.10%-7.29%; p=0.0092), when using Caire ICH to detect an ICH. The EP cohort using Caire ICH was also more accurate at detecting an ICH than the radiologist cohort prior to discussion; this difference, however, was not statistically significant. CONCLUSION: The Caire ICH software significantly improved the accuracy and sensitivity of ICH detection by EPs, to a level comparable to general radiologists. Further prospective research with larger numbers will be needed to understand the impact of Caire ICH on ED logistics and patient outcomes.
View details for DOI 10.1136/emermed-2023-213158
View details for PubMedID 38233106
- Pseudo-randomized testing of a discharge medication alert to reduce free-text prescribing.
Applied clinical informatics
2023
Abstract
Pseudo-randomized testing can be applied to perform rigorous yet practical evaluations of clinical decision support tools. We apply this methodology to an interruptive alert aimed at reducing free-text prescriptions. Using free text instead of structured computerized provider order entry elements can cause medication errors and inequity in care by bypassing medication-based clinical decision support tools and hindering automated translation of prescription instructions. Our objective was to evaluate the effectiveness of an interruptive alert at reducing free-text prescriptions via pseudo-randomized testing using native electronic health record (EHR) functionality. Two versions of an EHR alert triggered when a provider attempted to sign a discharge free-text prescription. The visible version displayed an interruptive alert to the user, and a silent version triggered in the background, serving as a control. Providers were assigned to the visible and silent arms based on even/odd EHR provider IDs. The proportion of encounters with a free-text prescription was calculated across the groups. Alert trigger rates were compared in process control charts. Free-text prescriptions were analyzed to identify prescribing patterns. Over the 28-week study period, 143 providers triggered 695 alerts (345 visible and 350 silent). The proportions of encounters with free-text prescriptions were 83% (266/320) and 90% (273/303) in the intervention and control groups, respectively (p-value = 0.01). For the active alert, the median time to action was 31 seconds. Alert trigger rates between groups were similar over time. Ibuprofen, oxycodone, steroid tapers, and oncology-related prescriptions accounted for most free-text prescriptions. A majority of these prescriptions originated from user preference lists. The interruptive alert was associated with a modest reduction in free-text prescriptions, and the majority of these prescriptions could have been reproduced using structured order entry fields. Targeting user preference lists shows promise for future intervention.
View details for DOI 10.1055/a-2068-6940
View details for PubMedID 37015344
- AI-enabled assessment of cardiac function and video quality in emergency department point-of-care echocardiograms.
The Journal of emergency medicine
2023
Abstract
The adoption of point-of-care ultrasound (POCUS) has greatly improved the ability to rapidly evaluate unstable emergency department (ED) patients at the bedside. One major use of POCUS is to obtain echocardiograms to assess cardiac function. We developed EchoNet-POCUS, a novel deep learning system, to aid emergency physicians (EPs) in interpreting POCUS echocardiograms and to reduce operator-to-operator variability. We collected a new dataset of POCUS echocardiogram videos obtained in the ED by EPs and annotated the cardiac function and quality of each video. Using this dataset, we trained EchoNet-POCUS to evaluate both cardiac function and video quality in POCUS echocardiograms. EchoNet-POCUS achieves an area under the receiver operating characteristic curve (AUROC) of 0.92 (0.89-0.94) for predicting whether cardiac function is abnormal and an AUROC of 0.81 (0.78-0.85) for predicting video quality. EchoNet-POCUS can be applied to bedside echocardiogram videos in real time using commodity hardware, as we demonstrate in a prospective pilot study.
View details for DOI 10.1016/j.jemermed.2023.02.005
View details for PubMedID 38369413
- Investigating real-world consequences of biases in commonly used clinical calculators.
The American journal of managed care
2023; 29 (1): e1-e7
Abstract
OBJECTIVES: To evaluate whether one summary metric of calculator performance sufficiently conveys equity across different demographic subgroups, and to evaluate how calculator predictive performance affects downstream health outcomes. STUDY DESIGN: We evaluated 3 commonly used clinical calculators (Model for End-Stage Liver Disease [MELD], CHA2DS2-VASc, and simplified Pulmonary Embolism Severity Index [sPESI]) on cohorts extracted from the Stanford Medicine Research Data Repository, following the cohort selection process described in the respective calculator derivation papers. METHODS: We quantified the predictive performance of the 3 clinical calculators across sex and race. Then, using the clinical guidelines that guide care based on these calculators' output, we quantified potential disparities in subsequent health outcomes. RESULTS: Across the examined subgroups, the MELD calculator exhibited worse performance for the female and White populations, the CHA2DS2-VASc calculator for the male population, and sPESI for the Black population. The extent to which such performance differences translated into differential health outcomes depended on the distribution of the calculators' scores around the thresholds used to trigger a care action via the corresponding guidelines. In particular, under the old guideline for CHA2DS2-VASc, among those who would not have been offered anticoagulant therapy, the Hispanic subgroup exhibited the highest rate of stroke. CONCLUSIONS: Clinical calculators, even when they do not include variables such as sex and race as inputs, can have very different care consequences across those subgroups. These differences in health care outcomes across subgroups can be explained by examining the distribution of scores and their calibration around the thresholds encoded in the accompanying care guidelines.
View details for DOI 10.37765/ajmc.2023.89306
View details for PubMedID 36716157
- Paging the Clinical Informatics Community: Respond STAT to Dobbs v Jackson Women's Health Organization.
Applied clinical informatics
2022
- Deep Learning System Boosts Radiologist Detection of Intracranial Hemorrhage.
Cureus
2022; 14 (10): e30264
Abstract
BACKGROUND: Intracranial hemorrhage (ICH) requires emergent medical treatment for positive outcomes. While previous artificial intelligence (AI) solutions achieved rapid diagnostics, none were shown to improve the performance of radiologists in detecting ICHs. Here, we show that the Caire ICH artificial intelligence system enhances a radiologist's ICH diagnosis performance. METHODS: A dataset of non-contrast-enhanced axial cranial computed tomography (CT) scans (n=532) was labeled by three radiologists for the presence or absence of an ICH. If an ICH was detected, its ICH subtype was identified. After a washout period, the three radiologists reviewed the same dataset with the assistance of the Caire ICH system. Performance was measured with respect to reader agreement, accuracy, sensitivity, and specificity when compared with the ground truth, defined as reader consensus. RESULTS: Caire ICH improved inter-reader agreement on average by 5.76% in a dataset with an ICH prevalence of 74.3%. Further, radiologists using Caire ICH detected an average of 18 more ICHs and significantly increased their accuracy by 6.15%, their sensitivity by 4.6%, and their specificity by 10.62%. The Caire ICH system also improved the radiologists' ability to accurately identify the ICH subtypes present. CONCLUSION: The Caire ICH device significantly improves the performance of a cohort of radiologists. Such a device has the potential to be a tool that can improve patient outcomes and reduce the misdiagnosis of ICH.
View details for DOI 10.7759/cureus.30264
View details for PubMedID 36381767
- Assessment of Adherence to Reporting Guidelines by Commonly Used Clinical Prediction Models From a Single Vendor: A Systematic Review.
JAMA network open
2022; 5 (8): e2227779
Abstract
Importance: Various model reporting guidelines have been proposed to ensure clinical prediction models are reliable and fair. However, no consensus exists about which model details are essential to report, and commonalities and differences among reporting guidelines have not been characterized. Furthermore, how well documentation of deployed models adheres to these guidelines has not been studied. Objectives: To assess the information requested by model reporting guidelines and whether the documentation for commonly used machine learning models developed by a single vendor provides the information requested. Evidence Review: MEDLINE was queried using "machine learning model card" and "reporting machine learning" from November 4 to December 6, 2020. References were reviewed to find additional publications, and publications without specific reporting recommendations were excluded. Similar elements requested for reporting were merged into representative items. Four independent reviewers and 1 adjudicator assessed how often documentation for the most commonly used models developed by a single vendor reported the items. Findings: From 15 model reporting guidelines, 220 unique items were identified that represented the collective reporting requirements. Although 12 items were commonly requested (requested by 10 or more guidelines), 77 items were requested by just 1 guideline. Documentation for 12 commonly used models from a single vendor reported a median of 39% (IQR, 37%-43%; range, 31%-47%) of items from the collective reporting requirements. Many of the commonly requested items had 100% reporting rates, including items concerning outcome definition, area under the receiver operating characteristic curve, internal validation, and intended clinical use. Several items related to reliability, such as external validation, uncertainty measures, and strategy for handling missing data, were reported half the time or less. Other frequently unreported items related to fairness (summary statistics and subgroup analyses, including for race and ethnicity or sex). Conclusions and Relevance: These findings suggest that consistent reporting recommendations for clinical predictive models are needed for model developers to share necessary information for model deployment. The many published guidelines would, collectively, require reporting more than 200 items. Model documentation from 1 vendor reported the most commonly requested items from model reporting guidelines; however, areas for improvement were identified in reporting items related to model reliability and fairness. This analysis led to feedback to the vendor, which motivated updates to the documentation for future users.
View details for DOI 10.1001/jamanetworkopen.2022.27779
View details for PubMedID 35984654
- Building a Learning Health System: Creating an Analytical Workflow for Evidence Generation to Inform Institutional Clinical Care Guidelines.
Applied clinical informatics
2022; 13 (1): 315-321
Abstract
BACKGROUND: One key aspect of a learning health system (LHS) is utilizing data generated during care delivery to inform clinical care. However, institutional guidelines that utilize observational data are rare and require months to create, making current processes impractical for more urgent scenarios such as those posed by the COVID-19 pandemic. There exists a need to rapidly analyze institutional data to drive guideline creation where evidence from randomized controlled trials is unavailable. OBJECTIVES: This article provides background on the current state of observational data generation in institutional guideline creation and details our institution's experience in creating a novel workflow to (1) demonstrate the value of such a workflow, (2) demonstrate a real-world example, and (3) discuss difficulties encountered and future directions. METHODS: Utilizing a multidisciplinary team of database specialists, clinicians, and informaticists, we created a workflow for identifying a clinical need, translating it into a queryable format in our clinical data warehouse, creating data summaries, and feeding this information back into clinical guideline creation. RESULTS: Clinical questions posed by the hospital medicine division were answered in a rapid time frame and informed the creation of institutional guidelines for the care of patients with COVID-19. Setting up the workflow, answering the questions, and producing data summaries required around 300 hours of effort and $300,000 USD. CONCLUSION: A key component of an LHS is the ability to learn from data generated during care delivery. Such examples are rare in the literature; we demonstrate one, along with our proposed approach to ideal multidisciplinary team formation and deployment.
View details for DOI 10.1055/s-0042-1743241
View details for PubMedID 35235994