Bonnie Armstrong
Postdoctoral Scholar, Radiology
All Publications
-
Predicting the Value of Radiology Artificial Intelligence Applications: Large-Scale Predeployment Evaluation of a Portfolio of Models.
AJR. American journal of roentgenology
2026
Abstract
Background: Real-world performance of radiology artificial intelligence (AI) applications frequently diverges from previously reported results, creating challenges in anticipating a model's clinical value and impact. Objective: To develop a structured predeployment evaluation method for radiology AI models that combines standard performance metrics with new augmentation metrics in predicting overall AI model value and to test this method's predictions against radiologists' real-world postdeployment perceptions of model value. Methods: In this prospective study, a large national radiology practice conducted a predeployment evaluation from July 2022 to November 2024 of a single vendor's portfolio of 13 AI models for 12 clinical tasks. A four-radiologist workgroup identified attributes contributing to inherent value of AI assistance for clinical tasks, assigned weights to those attributes, and rated models accordingly. Performance of radiologists (based on clinical reports) and AI was assessed for 88,645 examinations across clinical sites using conventional metrics and augmentation metrics reflecting enhanced detection cases (i.e., AI-detected radiologist-missed positive cases). The workgroup combined inherent task values and pooled AI performance to predict models' overall value. Radiologists completed a postdeployment survey. Results: The workgroup identified three attributes as most likely to contribute to inherent value of AI assistance: tediousness of the task, likelihood that the radiologist would miss the finding, and a missed finding's potential clinical impact. Five, five, and two tasks were rated as having high, medium, and low inherent value, respectively. Across tasks, radiologists generally had higher positive predictive value (PPV), whereas AI generally had higher sensitivity. Models showed widely varying absolute and relative enhanced detection rates (0.03-2.28% and 4.5-60.5%, respectively).
Five, five, and three models were predicted to have high, medium, and low overall value, respectively. Survey response rate was 43.2% (54/125). Perceived value categories agreed between survey respondents and workgroup predictions for ten of 12 tasks. Conclusion: We present a structured method for predeployment evaluation of AI models' potential value, combining task-inherent value assessments with radiologist and AI performance metrics. A validation survey indicated high agreement between predeployment predictions and real-world postdeployment value perceptions. Clinical Impact: This practical evaluation approach can help guide radiology practices in evidence-based purchasing and deployment decisions for radiology AI models.
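The augmentation metrics described in the abstract can be made concrete with a short sketch. The function name and the exact denominators below are illustrative assumptions, not the paper's definitions: the absolute enhanced detection rate divides AI-detected radiologist-missed positive cases by all examinations, while the relative rate divides them by all positive cases.

```python
def enhanced_detection_rates(n_exams, n_positives, n_enhanced):
    """Illustrative augmentation metrics (hypothetical definitions).

    n_enhanced: AI-detected, radiologist-missed positive cases.
    Returns (absolute %, relative %) enhanced detection rates.
    """
    absolute = 100.0 * n_enhanced / n_exams      # share of all examinations
    relative = 100.0 * n_enhanced / n_positives  # share of all positive cases
    return absolute, relative

# Example: 20 enhanced-detection cases among 10,000 exams containing 250 positives
abs_rate, rel_rate = enhanced_detection_rates(10_000, 250, 20)
print(f"absolute: {abs_rate:.2f}%, relative: {rel_rate:.1f}%")
```

Rates in the study's reported ranges (0.03-2.28% absolute, 4.5-60.5% relative) would arise from similarly small numerators over much larger denominators.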
View details for DOI 10.2214/AJR.25.34340
View details for PubMedID 41779377
-
Automated real-time assessment of intracranial hemorrhage detection AI using an ensembled monitoring model (EMM).
NPJ digital medicine
2025; 8 (1): 608
Abstract
Artificial intelligence (AI) tools for radiology are commonly unmonitored once deployed. The lack of real-time case-by-case assessments of AI prediction confidence requires users to independently distinguish between trustworthy and unreliable AI predictions, which increases cognitive burden, reduces productivity, and potentially leads to misdiagnoses. To address these challenges, we introduce the Ensembled Monitoring Model (EMM), a framework inspired by clinical consensus practices using multiple expert reviews. Designed specifically for black-box commercial AI products, EMM operates independently without requiring access to internal AI components or intermediate outputs, while still providing robust confidence measurements. Using intracranial hemorrhage detection as our test case on a large, diverse dataset of 2919 studies, we demonstrate that EMM can successfully categorize confidence in the AI-generated prediction, suggest appropriate actions, and help physicians recognize low confidence scenarios, ultimately reducing cognitive burden. Importantly, we provide key technical considerations and best practices for successfully translating EMM into clinical settings.
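A minimal sketch of the consensus idea behind an ensembled monitor follows. All names, thresholds, and category labels here are illustrative assumptions, not EMM's actual implementation: several independent monitors each score the commercial model's prediction, and their level of agreement maps to a confidence category.

```python
def ensemble_confidence(monitor_scores, vote_threshold=0.5):
    """Toy consensus monitor: each independent monitor emits a probability
    that the black-box AI's prediction is correct; the fraction of monitors
    voting 'trustworthy' maps to a confidence category (cutoffs hypothetical)."""
    votes = [score >= vote_threshold for score in monitor_scores]
    agreement = sum(votes) / len(votes)
    if agreement >= 0.8:
        return "high"    # likely trustworthy; routine review
    if agreement >= 0.5:
        return "medium"  # ambiguous; closer review advised
    return "low"         # flag for careful manual read

print(ensemble_confidence([0.9, 0.8, 0.85, 0.7, 0.9]))  # all monitors agree -> high
```

Because the monitors only consume the commercial model's inputs and outputs, a scheme like this needs no access to the vendor model's internals, which is the property the abstract emphasizes.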
View details for DOI 10.1038/s41746-025-02007-0
View details for PubMedID 41102370
View details for PubMedCentralID PMC6560460
-
Automated Real-time Assessment of Intracranial Hemorrhage Detection AI Using an Ensembled Monitoring Model (EMM).
Research square
2025
Abstract
Artificial intelligence (AI) tools for radiology are commonly unmonitored once deployed. The lack of real-time case-by-case assessments of AI prediction confidence requires users to independently distinguish between trustworthy and unreliable AI predictions, which increases cognitive burden, reduces productivity, and potentially leads to misdiagnoses. To address these challenges, we introduce the Ensembled Monitoring Model (EMM), a framework inspired by clinical consensus practices using multiple expert reviews. Designed specifically for black-box commercial AI products, EMM operates independently without requiring access to internal AI components or intermediate outputs, while still providing robust confidence measurements. Using intracranial hemorrhage detection as our test case on a large, diverse dataset of 2919 studies, we demonstrate that EMM successfully categorizes confidence in the AI-generated prediction, suggesting different actions and helping improve the overall performance of AI tools to ultimately reduce cognitive burden. Importantly, we provide key technical considerations and best practices for successfully translating EMM into clinical settings.
View details for DOI 10.21203/rs.3.rs-6683104/v1
View details for PubMedID 40502778
View details for PubMedCentralID PMC12154145
-
Cognitive biases in surgery: systematic review.
The British journal of surgery
2023
Abstract
Although numerous studies have established cognitive biases as contributors to surgical adverse events, their prevalence and impact in surgery are unknown. This review aimed to describe types of cognitive bias in surgery, their impact on surgical performance and patient outcomes, their source, and the mitigation strategies used to reduce their effect. A literature search was conducted on 9 April and 6 December 2021 using MEDLINE, Embase, PsycINFO, Scopus, Web of Science, Cochrane Central Register of Controlled Trials, and the Cochrane Database of Systematic Reviews. Included studies investigated how cognitive biases affect surgery and the mitigation strategies used to combat their impact. The National Institutes of Health tools were used to assess study quality. Inductive thematic analysis was used to identify themes of cognitive bias impact on surgical performance. Thirty-nine studies were included, comprising 6514 surgeons and over 200 000 patients. Thirty-one types of cognitive bias were identified, with overconfidence, anchoring, and confirmation bias the most common. Cognitive biases differentially influenced six themes of surgical performance. For example, overconfidence bias was associated with inaccurate perceptions of ability, whereas anchoring bias was associated with inaccurate risk-benefit estimations and failure to consider alternative options. Anchoring and confirmation biases were associated with actual patient harm, such as never events. No studies investigated cognitive bias source or mitigation strategies. Cognitive biases have a negative impact on surgical performance and patient outcomes across all points of surgical care. This review highlights the scarcity of research investigating the sources that give rise to cognitive biases in surgery and the mitigation strategies that target these factors.
View details for DOI 10.1093/bjs/znad004
View details for PubMedID 36752583
-
Electroencephalography can provide advance warning of technical errors during laparoscopic surgery.
Surgical endoscopy
2022
Abstract
Intraoperative adverse events lead to patient injury and death, and are increasing. Early warning systems (EWSs) have been used to detect patient deterioration and save lives. However, few studies have used EWSs to monitor surgical performance and warn of imminent technical errors. Previous (non-surgical) research has investigated neural activity to predict future motor errors using electroencephalography (EEG). The present proof-of-concept cohort study investigates whether EEG could predict technical errors in surgery. In a large academic hospital, three surgical fellows performed 12 elective laparoscopic general surgeries. Audiovisual data of the operating room and the surgeon's neural activity were recorded. Technical errors and epochs of good surgical performance were coded into events. Neural activity was observed 40 s prior to and 10 s after errors and good events to determine how far in advance errors were detected. A hierarchical regression model was used to account for possible clustering within surgeons. This prospective, proof-of-concept cohort study was conducted from July to November 2021, with a pilot period from February to March 2020 used to optimize the data-capture technique; participants were blinded to the study hypotheses. Forty-five technical errors, mainly due to too little force or distance (n = 39), and 27 good surgical events were coded during grasping and dissection. Neural activity representing error monitoring (p = .008) and motor uncertainty (p = .034) was detected 17 s prior to errors, but not prior to good surgical performance. These results show that distinct neural signatures are predictive of technical error in laparoscopic surgery. If replicated with low false-alarm rates, an EEG-based EWS of technical errors could be used to improve individualized surgical training by flagging imminent unsafe actions before errors occur and cause patient harm.
View details for DOI 10.1007/s00464-022-09799-2
View details for PubMedID 36478137