Susan Lee's Profile | Stanford Profiles

All Publications

Extraction of Unstructured Electronic Health Records to Evaluate Glioblastoma Treatment Patterns. JCO clinical cancer informatics Swaminathan, A., Ren, A. L., Wu, J. Y., Bhargava-Shah, A., Lopez, I., Srivastava, U., Alexopoulos, V., Pizzitola, R., Bui, B., Alkhani, L., Lee, S., Mohit, N., Seo, N., Macedo, N., Cheng, W., Wang, W., Tran, E., Thomas, R., Gevaert, O. 2024; 8: e2300091

Abstract

Data on lines of therapy (LOTs) for cancer treatment are important for clinical oncology research, but LOTs are not explicitly recorded in electronic health records (EHRs). We present an efficient approach for clinical data abstraction and a flexible algorithm to derive LOTs from EHR-based medication data on patients with glioblastoma multiforme (GBM). Nonclinicians were trained to abstract the diagnosis of GBM from EHRs, and their accuracy was compared with abstraction performed by clinicians. The resulting data were used to build a cohort of patients with confirmed GBM diagnosis. An algorithm was developed to derive LOTs using structured medication data, accounting for the addition and discontinuation of therapies and drug class. Descriptive statistics were calculated and time-to-next-treatment (TTNT) analysis was performed using the Kaplan-Meier method. Treating clinicians as the gold standard, nonclinicians abstracted GBM diagnosis with a sensitivity of 0.98, specificity 1.00, positive predictive value 1.00, and negative predictive value 0.90, suggesting that nonclinician abstraction of GBM diagnosis was comparable with clinician abstraction. Of 693 patients with a confirmed diagnosis of GBM, 246 patients contained structured information about the types of medications received. Of them, 165 (67.1%) received a first-line therapy (1L) of temozolomide, and the median TTNT from the start of 1L was 179 days. We described a workflow for extracting diagnosis of GBM and LOT from EHR data that combines nonclinician abstraction with algorithmic processing, demonstrating comparable accuracy with clinician abstraction and highlighting the potential for scalable and efficient EHR-based oncology research.
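
The Kaplan-Meier step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `kaplan_meier` and `median_time` are hypothetical, the durations and event flags are made-up toy values, and an event here is assumed to mean "the patient started a next treatment" while a non-event means the patient was censored.

```python
from fractions import Fraction
from typing import List, Optional, Tuple


def kaplan_meier(durations: List[int], events: List[bool]) -> List[Tuple[int, Fraction]]:
    """Return (time, survival probability) at each time where an event occurred.

    durations: days from start of first-line therapy to event or censoring.
    events: True if the next treatment was observed, False if censored.
    Fractions are used so the 0.5 comparison below is exact.
    """
    at_risk = len(durations)
    survival = Fraction(1)
    curve = []
    for t in sorted(set(durations)):
        at_t = [ev for dur, ev in zip(durations, events) if dur == t]
        n_events = sum(at_t)
        if n_events:
            # Product-limit update: multiply by the share of at-risk patients
            # who did not have the event at time t.
            survival *= Fraction(at_risk - n_events, at_risk)
            curve.append((t, survival))
        # Everyone observed at t (event or censored) leaves the risk set.
        at_risk -= len(at_t)
    return curve


def median_time(curve: List[Tuple[int, Fraction]]) -> Optional[int]:
    """First time at which estimated survival drops to 0.5 or below, if any."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None


# Toy data (hypothetical, not from the study).
toy_durations = [60, 100, 150, 210, 300, 420]
toy_events = [True, True, False, True, True, False]
print(median_time(kaplan_meier(toy_durations, toy_events)))  # 210
```

In practice a survival-analysis library would normally be used instead; the sketch only shows how censored patients leave the risk set without lowering the survival estimate.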

View details for DOI 10.1200/CCI.23.00091

View details for PubMedID 38857465

Selective prediction for extracting unstructured clinical data. Journal of the American Medical Informatics Association : JAMIA Swaminathan, A., Lopez, I., Wang, W., Srivastava, U., Tran, E., Bhargava-Shah, A., Wu, J. Y., Ren, A. L., Caoili, K., Bui, B., Alkhani, L., Lee, S., Mohit, N., Seo, N., Macedo, N., Cheng, W., Liu, C., Thomas, R., Chen, J. H., Gevaert, O. 2023

Abstract

While there are currently approaches to handle unstructured clinical data, such as manual abstraction and structured proxy variables, these methods may be time-consuming, not scalable, and imprecise. This article aims to determine whether selective prediction, which gives a model the option to abstain from generating a prediction, can improve the accuracy and efficiency of unstructured clinical data abstraction. We trained selective classifiers (logistic regression, random forest, support vector machine) to extract 5 variables from clinical notes: depression (n = 1563), glioblastoma (GBM, n = 659), rectal adenocarcinoma (DRA, n = 601), and abdominoperineal resection (APR, n = 601) and low anterior resection (LAR, n = 601) of adenocarcinoma. We varied the cost of false positives (FP), false negatives (FN), and abstained notes and measured total misclassification cost. The depression selective classifiers abstained on anywhere from 0% to 97% of notes, and the change in total misclassification cost ranged from -58% to 9%. Selective classifiers abstained on 5%-43% of notes across the GBM and colorectal cancer models. The GBM selective classifier abstained on 43% of notes, which led to improvements in sensitivity (0.94 to 0.96), specificity (0.79 to 0.96), PPV (0.89 to 0.98), and NPV (0.88 to 0.91) when compared to a non-selective classifier and when compared to structured proxy variables. We showed that selective classifiers outperformed both non-selective classifiers and structured proxy variables for extracting data from unstructured clinical notes. Selective prediction should be considered when abstaining is preferable to making an incorrect prediction.
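
The idea of abstaining and then scoring by total misclassification cost can be sketched as follows. The abstract does not state the abstention rule the authors used, so the confidence-margin rule here is an assumption, and the names `selective_predict` and `total_cost`, the cost values, and the toy probabilities are all hypothetical.

```python
from typing import List, Optional


def selective_predict(prob: float, threshold: float = 0.5, margin: float = 0.2) -> Optional[bool]:
    """Return True/False, or None to abstain.

    Assumed rule: abstain when the predicted probability lies within
    `margin` of the decision threshold. With margin=0 the classifier
    never abstains (the non-selective baseline).
    """
    if abs(prob - threshold) < margin:
        return None
    return prob > threshold


def total_cost(
    preds: List[Optional[bool]],
    labels: List[bool],
    cost_fp: float = 1.0,
    cost_fn: float = 1.0,
    cost_abstain: float = 0.25,
) -> float:
    """Sum the cost of false positives, false negatives, and abstentions."""
    cost = 0.0
    for pred, label in zip(preds, labels):
        if pred is None:
            cost += cost_abstain  # note is routed to manual review
        elif pred and not label:
            cost += cost_fp
        elif not pred and label:
            cost += cost_fn
    return cost


# Toy classifier outputs and gold labels (hypothetical, not from the study).
toy_probs = [0.95, 0.55, 0.40, 0.10, 0.65]
toy_labels = [True, False, True, False, True]

selective = [selective_predict(p, margin=0.2) for p in toy_probs]
baseline = [selective_predict(p, margin=0.0) for p in toy_probs]
print(total_cost(selective, toy_labels))  # 0.75
print(total_cost(baseline, toy_labels))   # 2.0
```

On this toy input the selective classifier abstains on three of five notes and ends up cheaper than the baseline, but only because an abstention is priced below an error. Raising `cost_abstain` above the error costs reverses the comparison, which is why the study varied the three costs.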

View details for DOI 10.1093/jamia/ocad182

View details for PubMedID 37769323

Susan Lee

Masters Student in Computer Science, admitted Autumn 2022
