Stanford Advisors


All Publications


  • APRICOT-Mamba: Acuity Prediction in Intensive Care Unit (ICU): Development and Validation of a Stability, Transitions, and Life-Sustaining Therapies Prediction Model. Research square Contreras, M., Silva, B., Shickel, B., Davidson, A., Ozrazgat-Baslanti, T., Ren, Y., Guan, Z., Balch, J., Zhang, J., Bandyopadhyay, S., Loftus, T., Khezeli, K., Nerella, S., Bihorac, A., Rashidi, P. 2024

    Abstract

    On average, more than 5 million patients are admitted to intensive care units (ICUs) in the US each year, with mortality rates ranging from 10 to 29%. The acuity state of patients in the ICU can quickly change from stable to unstable, sometimes leading to life-threatening conditions. Early detection of deteriorating conditions can assist in more timely interventions and improved survival rates. While Artificial Intelligence (AI)-based models show potential for assessing acuity in a more granular and automated manner, they typically use mortality as a proxy of acuity in the ICU. Furthermore, these methods do not determine the acuity state of a patient (i.e., stable or unstable), the transition between acuity states, or the need for life-sustaining therapies. In this study, we propose APRICOT-M (Acuity Prediction in Intensive Care Unit-Mamba), a 1M-parameter state space-based neural network to predict acuity state, transitions, and the need for life-sustaining therapies in real-time among ICU patients. The model integrates ICU data in the preceding four hours (including vital signs, laboratory results, assessment scores, and medications) and patient characteristics (age, sex, race, and comorbidities) to predict the acuity outcomes in the next four hours. Our state space-based model can process sparse and irregularly sampled data without manual imputation, thus reducing the noise in input data and increasing inference speed. The model was trained on data from 107,473 patients (142,062 ICU admissions) from 55 hospitals between 2014 and 2017 and validated externally on data from 74,901 patients (101,356 ICU admissions) from 143 hospitals. Additionally, it was validated temporally on data from 12,927 patients (15,940 ICU admissions) from one hospital in 2018-2019 and prospectively on data from 215 patients (369 ICU admissions) from one hospital in 2021-2023.
    Three datasets were used for training and evaluation: the University of Florida Health (UFH) dataset, the electronic ICU Collaborative Research Database (eICU), and the Medical Information Mart for Intensive Care (MIMIC)-IV dataset. APRICOT-M significantly outperforms the baseline acuity assessment, Sequential Organ Failure Assessment (SOFA), for mortality prediction in both external (AUROC 0.95 CI: 0.94-0.95 compared to 0.78 CI: 0.78-0.79) and prospective (AUROC 0.99 CI: 0.97-1.00 compared to 0.80 CI: 0.65-0.92) cohorts, as well as for instability prediction (external AUROC 0.75 CI: 0.74-0.75 compared to 0.51 CI: 0.51-0.51, and prospective AUROC 0.69 CI: 0.64-0.74 compared to 0.53 CI: 0.50-0.57). This tool has the potential to help clinicians make timely interventions by predicting the transition between acuity states and supporting decision-making on life-sustaining therapies within the next four hours in the ICU.

    View details for DOI 10.21203/rs.3.rs-4790824/v1

    View details for PubMedID 39149454
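
    The AUROC values with confidence intervals reported in the abstract above can be illustrated with a minimal sketch. The abstract does not say how the intervals were computed; the percentile bootstrap used here is an assumption, and the function names (`auroc`, `bootstrap_ci`) and the toy labels and scores are hypothetical, not taken from the study.

```python
import random

def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores
    higher than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (an assumed method,
    not necessarily the one used in the paper)."""
    if len(set(labels)) < 2:
        raise ValueError("need both classes to bootstrap AUROC")
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # resample drew a single class; AUROC is undefined
        stats.append(auroc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper

# Hypothetical toy data: 1 = unstable, 0 = stable.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(toy_labels, toy_scores)
toy_ci = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

    The pairwise comparison is quadratic in cohort size, so for cohorts as large as those in the abstract a rank-based implementation would be the more practical choice.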

  • Developing a fair and interpretable representation of the clock drawing test for mitigating low education and racial bias. Scientific reports Zhang, J., Bandyopadhyay, S., Kimmet, F., Wittmayer, J., Khezeli, K., Libon, D. J., Price, C. C., Rashidi, P. 2024; 14 (1): 17444

    Abstract

    The clock drawing test (CDT) is a neuropsychological assessment tool to screen an individual's cognitive ability. In this study, we developed a Fair and Interpretable Representation of Clock drawing test (FaIRClocks) to evaluate and mitigate classification bias against people with less than 8 years of education, while screening their cognitive function using an array of neuropsychological measures. We represented clock drawings by a previously published 10-dimensional deep learning feature set trained on publicly available data from the National Health and Aging Trends Study (NHATS). These embeddings were further fine-tuned with clocks from a preoperative cognitive screening program at the University of Florida to predict three cognitive scores: the Mini-Mental State Examination (MMSE) total score, an attention composite z-score (ATT-C), and a memory composite z-score (MEM-C). ATT-C and MEM-C scores were developed by averaging z-scores based on normative references. The cognitive screening classifiers were initially tested to compare their performance in patients with low years of education (≤ 8 years) versus patients with higher education (> 8 years), and by race. Results indicated that the initial unweighted classifiers confounded lower education with cognitive compromise, resulting in a 100% type I error rate for this group. Therefore, the samples were re-weighted using multiple fairness metrics to achieve sensitivity/specificity and positive/negative predictive value (PPV/NPV) balance across groups. In summary, we report the FaIRClocks model, which shows promise to help identify and mitigate bias against people with less than 8 years of education during preoperative cognitive screening.

    View details for DOI 10.1038/s41598-024-68481-w

    View details for PubMedID 39075127

    View details for PubMedCentralID 8402420
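
    The abstract above mentions re-weighting samples to balance performance across education groups but does not give the weighting scheme. As a rough illustration only, the sketch below uses the classic "reweighing" scheme (weight = P(group) * P(label) / P(group, label)), which is a stand-in technique and not necessarily what the authors used. The function name `reweighing_weights`, the group names, and the toy labels are hypothetical.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Per-sample weights that make group membership and label
    statistically independent in the weighted data.
    weight(g, y) = P(g) * P(y) / P(g, y), estimated from counts."""
    n = len(groups)
    group_count = Counter(groups)
    label_count = Counter(labels)
    joint_count = Counter(zip(groups, labels))
    return [
        (group_count[g] * label_count[y]) / (n * joint_count[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total

# Hypothetical toy data: education group and a binary screening label.
toy_groups = ["low", "low", "low", "high", "high", "high", "high", "high"]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0]
toy_weights = reweighing_weights(toy_groups, toy_labels)
rate_low = weighted_positive_rate(toy_groups, toy_labels, toy_weights, "low")
rate_high = weighted_positive_rate(toy_groups, toy_labels, toy_weights, "high")
```

    After weighting, both toy groups have the same weighted positive rate, which is the kind of balance a weighted classifier could then be trained on.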

  • Transformers and large language models in healthcare: A review. Artificial intelligence in medicine Nerella, S., Bandyopadhyay, S., Zhang, J., Contreras, M., Siegel, S., Bumin, A., Silva, B., Sena, J., Shickel, B., Bihorac, A., Khezeli, K., Rashidi, P. 2024; 154: 102900

    Abstract

    With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformer neural network architecture is rapidly changing many applications. The Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of healthcare data, including clinical NLP, medical imaging, structured Electronic Health Records (EHR), social media, bio-physiological signals, and biomolecular sequences. Furthermore, we have also included articles that used the transformer architecture for generating surgical instructions and predicting adverse outcomes after surgeries under the umbrella of critical care. Under diverse settings, these models have been used for clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. Finally, we also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.

    View details for DOI 10.1016/j.artmed.2024.102900

    View details for PubMedID 38878555

  • Wearable sensors in patient acuity assessment in critical care. Frontiers in neurology Sena, J., Mostafiz, M. T., Zhang, J., Davidson, A. E., Bandyopadhyay, S., Nerella, S., Ren, Y., Ozrazgat-Baslanti, T., Shickel, B., Loftus, T., Schwartz, W. R., Bihorac, A., Rashidi, P. 2024; 15: 1386728

    Abstract

    Acuity assessments are vital for timely interventions and fair resource allocation in critical care settings. Conventional acuity scoring systems heavily depend on subjective patient assessments, leaving room for implicit bias and errors. These assessments are often manual, time-consuming, intermittent, and challenging to interpret accurately, especially for healthcare providers. This risk of bias and error is likely most pronounced in time-constrained and high-stakes environments, such as critical care settings. Furthermore, such scores do not incorporate other information, such as patients' mobility level, which can indicate recovery or deterioration in the intensive care unit (ICU), especially at a granular level. We hypothesized that wearable sensor data could assist in assessing patient acuity granularly, especially in conjunction with clinical data from electronic health records (EHR). In this prospective study, we evaluated the impact of integrating mobility data collected from wrist-worn accelerometers with clinical data obtained from EHR for estimating acuity. Accelerometry data were collected from 87 patients wearing accelerometers on their wrists in an academic hospital setting. The data were evaluated using five deep neural network models: VGG, ResNet, MobileNet, SqueezeNet, and a custom Transformer network. These models outperformed a rule-based clinical score (Sequential Organ Failure Assessment, SOFA) used as a baseline when predicting acuity state (for ground truth, we labeled patients as unstable if they needed life-supporting therapies, and as stable otherwise), particularly regarding precision, sensitivity, and F1 score. The results demonstrate that integrating accelerometer data with demographics and clinical variables improves predictive performance compared to traditional scoring systems in healthcare.
    Deep learning models consistently outperformed the SOFA score baseline across various scenarios, showing notable enhancements in metrics such as the area under the receiver operating characteristic (ROC) curve (AUC), precision, sensitivity, specificity, and F1 score. The most comprehensive scenario, leveraging accelerometer, demographics, and clinical data, achieved the highest AUC of 0.73, compared to 0.53 when using SOFA score as the baseline, with significant improvements in precision (0.80 vs. 0.23), specificity (0.79 vs. 0.73), and F1 score (0.77 vs. 0.66). This study demonstrates a novel approach beyond the simplistic differentiation between stable and unstable conditions. By incorporating mobility and comprehensive patient information, we distinguish between these states in critically ill patients and capture essential nuances in physiology and functional status. Unlike rudimentary definitions, such as equating low blood pressure with instability, our methodology delves deeper, offering a more holistic understanding and potentially valuable insights for acuity assessment.

    View details for DOI 10.3389/fneur.2024.1386728

    View details for PubMedID 38784909

    View details for PubMedCentralID PMC11112699
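
    The precision, sensitivity, specificity, and F1 score compared in the abstract above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions, treating "unstable" as the positive class. The function name `classification_metrics` and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts.
    tp/fp/fn/tn = true positives, false positives,
    false negatives, true negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # also called recall
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts for illustration only.
toy_metrics = classification_metrics(tp=8, fp=2, fn=4, tn=6)
```

    F1 is the harmonic mean of precision and sensitivity, so it drops sharply when either one is low.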