All Publications

  • Selective prediction for extracting unstructured clinical data. Journal of the American Medical Informatics Association : JAMIA Swaminathan, A., Lopez, I., Wang, W., Srivastava, U., Tran, E., Bhargava-Shah, A., Wu, J. Y., Ren, A. L., Caoili, K., Bui, B., Alkhani, L., Lee, S., Mohit, N., Seo, N., Macedo, N., Cheng, W., Liu, C., Thomas, R., Chen, J. H., Gevaert, O. 2023


    While there are currently approaches to handle unstructured clinical data, such as manual abstraction and structured proxy variables, these methods may be time-consuming, not scalable, and imprecise. This article aims to determine whether selective prediction, which gives a model the option to abstain from generating a prediction, can improve the accuracy and efficiency of unstructured clinical data abstraction. We trained selective classifiers (logistic regression, random forest, support vector machine) to extract 5 variables from clinical notes: depression (n = 1563), glioblastoma (GBM, n = 659), rectal adenocarcinoma (DRA, n = 601), and abdominoperineal resection (APR, n = 601) and low anterior resection (LAR, n = 601) of adenocarcinoma. We varied the cost of false positives (FP), false negatives (FN), and abstained notes and measured total misclassification cost. The depression selective classifiers abstained on anywhere from 0% to 97% of notes, and the change in total misclassification cost ranged from -58% to 9%. Selective classifiers abstained on 5%-43% of notes across the GBM and colorectal cancer models. The GBM selective classifier abstained on 43% of notes, which led to improvements in sensitivity (0.94 to 0.96), specificity (0.79 to 0.96), PPV (0.89 to 0.98), and NPV (0.88 to 0.91) when compared to a non-selective classifier and when compared to structured proxy variables. We showed that selective classifiers outperformed both non-selective classifiers and structured proxy variables for extracting data from unstructured clinical notes. Selective prediction should be considered when abstaining is preferable to making an incorrect prediction.
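    The idea of abstaining and then scoring the result by total misclassification cost can be illustrated with a minimal sketch. This is not the authors' implementation: the abstention rule (a probability band between `lower` and `upper`), the cost values, the function names, and the toy data are all assumptions chosen for illustration.

```python
from typing import List, Optional


def selective_predict(probs: List[float], lower: float = 0.3,
                      upper: float = 0.7) -> List[Optional[int]]:
    """Return 1, 0, or None (abstain) for each predicted probability.

    Hypothetical rule: abstain when the probability falls strictly
    between `lower` and `upper`. Setting lower == upper disables abstention.
    """
    decisions: List[Optional[int]] = []
    for p in probs:
        if p >= upper:
            decisions.append(1)
        elif p <= lower:
            decisions.append(0)
        else:
            decisions.append(None)  # model declines to label this note
    return decisions


def total_cost(y_true: List[int], decisions: List[Optional[int]],
               cost_fp: float = 1.0, cost_fn: float = 1.0,
               cost_abstain: float = 0.25) -> float:
    """Sum the assumed costs of false positives, false negatives, and abstentions."""
    cost = 0.0
    for y, d in zip(y_true, decisions):
        if d is None:
            cost += cost_abstain
        elif d == 1 and y == 0:
            cost += cost_fp
        elif d == 0 and y == 1:
            cost += cost_fn
    return cost


# Toy data (made up): true labels and a classifier's predicted probabilities.
y_true = [1, 0, 1, 0, 1, 0]
probs = [0.95, 0.10, 0.55, 0.60, 0.20, 0.05]

selective = selective_predict(probs)                            # may abstain
non_selective = selective_predict(probs, lower=0.5, upper=0.5)  # never abstains

print("selective cost:", total_cost(y_true, selective))
print("non-selective cost:", total_cost(y_true, non_selective))
```

    In this toy setup the selective classifier abstains on the two uncertain notes and pays a small abstention cost instead of risking a full-cost error. Whether that trade is worthwhile depends on the relative costs assigned to FP, FN, and abstained notes, which is the quantity the study varied.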

    View details for DOI 10.1093/jamia/ocad182

    View details for PubMedID 37769323