Bio


Keith Morse, MD, MBA, is a pediatric hospitalist and Medical Director of Clinical Informatics - Enterprise AI at Stanford Medicine. His work in operational and research informatics focuses on the meaningful deployment of machine learning in clinical settings. He serves as Stanford's co-site PI for participation in PEDSnet, an 11-site pediatric research consortium. His academic roles include serving as Program Director of Stanford's Clinical Informatics Fellowship.

Clinical Focus


  • Pediatric Hospital Medicine

Academic Appointments


  • Clinical Associate Professor, Pediatrics

Administrative Appointments


  • Medical Director of Clinical Informatics - Enterprise AI, Stanford Medicine Children's Health (2024 - Present)
  • Program Director, Clinical Informatics Fellowship, Stanford Medicine (2024 - Present)

Professional Education


  • Board Certification: American Board of Preventive Medicine, Clinical Informatics (2021)
  • Board Certification: American Board of Pediatrics, Pediatrics (2018)
  • Fellowship: Stanford University (Stanford Hospital and Clinics), Clinical Informatics (2020), CA
  • Residency: Phoenix Children's Hospital, Pediatrics (2018), AZ
  • MD: Sidney Kimmel Medical College (Jefferson Medical College), Thomas Jefferson University (2015), PA
  • MBA: Washington University in St. Louis (2009)

All Publications


  • Large Language Model Responses to Adolescent Patient and Proxy Messages. JAMA pediatrics Tse, G., Zahedivash, A., Anoshiravani, A., Carlson, J., Haberkorn, W., Morse, K. E. 2024

    View details for DOI 10.1001/jamapediatrics.2024.4438

    View details for PubMedID 39495530

  • Accuracy of a Proprietary Large Language Model in Labeling Obstetric Incident Reports. Joint Commission journal on quality and patient safety Johnson, J., Brown, C., Lee, G., Morse, K. 2024

    Abstract

    BACKGROUND: Using the data collected through incident reporting systems is challenging because the data consist of a large volume of primarily qualitative information. Large language models (LLMs), such as ChatGPT, provide novel capabilities in text summarization and labeling that could support safety data trending and early identification of opportunities to prevent patient harm. This study assessed the capability of a proprietary LLM (GPT-3.5) to automatically label a cross-sectional sample of real-world obstetric incident reports. METHODS: A sample of 370 incident reports submitted to inpatient obstetric units between December 2022 and May 2023 was extracted. Human-annotated labels were assigned by a clinician reviewer and considered the gold standard. The LLM was prompted to label incident reports relying solely on its pretrained knowledge and information included in the prompt. Primary outcomes assessed were sensitivity, specificity, positive predictive value, and negative predictive value. A secondary outcome assessed the human-perceived quality of the model's justification for the label(s) applied. RESULTS: The LLM demonstrated the ability to label incident reports with high sensitivity and specificity. The model applied a total of 79 labels compared to the reviewer's 49 labels. Overall sensitivity for the model was 85.7%, and specificity was 97.9%. Positive and negative predictive values were 53.2% and 99.6%, respectively. For 60.8% of labels, the reviewer approved of the model's justification for applying the label. CONCLUSION: The proprietary LLM demonstrated the ability to label obstetric incident reports with high sensitivity and specificity. LLMs offer the potential to enable more efficient use of data from incident reporting systems.

    View details for DOI 10.1016/j.jcjq.2024.08.001

    View details for PubMedID 39256071

  • A multi-center study on the adaptability of a shared foundation model for electronic health records. NPJ digital medicine Guo, L. L., Fries, J., Steinberg, E., Fleming, S. L., Morse, K., Aftandilian, C., Posada, J., Shah, N., Sung, L. 2024; 7 (1): 171

    Abstract

    Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components adaptable for various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, have demonstrated benefits including increased performance with fewer training labels and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across hospitals and their performance in local tasks. This multi-center study examined the adaptability of a publicly accessible structured EHR foundation model (FMSM), trained on 2.57 million patient records from Stanford Medicine. Experiments used EHR data from The Hospital for Sick Children (SickKids) and Medical Information Mart for Intensive Care (MIMIC-IV). We assessed both adaptability via continued pretraining on local data and task adaptability compared to baselines of models trained locally from scratch, including a local foundation model. Evaluations on 8 clinical prediction tasks showed that adapting the off-the-shelf FMSM matched the performance of gradient boosting machines (GBM) locally trained on all data while providing a 13% improvement in settings with few task-specific training labels. With continued pretraining on local data, FMSM required fewer than 1% of training examples to match the fully trained GBM's performance and was 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings demonstrate that adapting EHR foundation models across hospitals provides improved prediction performance at less cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.

    View details for DOI 10.1038/s41746-024-01166-w

    View details for PubMedID 38937550

    View details for PubMedCentralID PMC10396962

  • Using a Large Language Model to Identify Adolescent Patient Portal Account Access by Guardians. JAMA network open Liang, A. S., Vedak, S., Dussaq, A., Yao, D. H., Morse, K., Ip, W., Pageler, N. M. 2024; 7 (6): e2418454

    View details for DOI 10.1001/jamanetworkopen.2024.18454

    View details for PubMedID 38916895

  • Systematic data quality assessment of electronic health record data to evaluate study-specific fitness: Report from the PRESERVE research study. PLOS digital health Razzaghi, H., Goodwin Davies, A., Boss, S., Bunnell, H. T., Chen, Y., Chrischilles, E. A., Dickinson, K., Hanauer, D., Huang, Y., Ilunga, K. T., Katsoufis, C., Lehmann, H., Lemas, D. J., Matthews, K., Mendonca, E. A., Morse, K., Ranade, D., Rosenman, M., Taylor, B., Walters, K., Denburg, M. R., Forrest, C. B., Bailey, L. C. 2024; 3 (6): e0000527

    Abstract

    Study-specific data quality testing is an essential part of minimizing analytic errors, particularly for studies making secondary use of clinical data. We applied a systematic and reproducible approach for study-specific data quality testing to the analysis plan for PRESERVE, a 15-site, EHR-based observational study of chronic kidney disease in children. This approach integrated widely adopted data quality concepts with healthcare-specific evaluation methods. We implemented two rounds of data quality assessment. The first produced high-level evaluation using aggregate results from a distributed query, focused on cohort identification and main analytic requirements. The second focused on extended testing of row-level data centralized for analysis. We systematized reporting and cataloguing of data quality issues, providing institutional teams with prioritized issues for resolution. We tracked improvements and documented anomalous data for consideration during analyses. The checks we developed identified 115 and 157 data quality issues in the two rounds, involving completeness, data model conformance, cross-variable concordance, consistency, and plausibility, extending traditional data quality approaches to address more complex stratification and temporal patterns. Resolution efforts focused on higher priority issues, given finite study resources. In many cases, institutional teams were able to correct data extraction errors or obtain additional data, avoiding exclusion of 2 institutions entirely and resolving 123 other gaps. Other results identified complexities in measures of kidney function, bearing on the study's outcome definition. Where limitations such as these are intrinsic to clinical data, the study team must account for them in conducting analyses. This study rigorously evaluated fitness of data for intended use. The framework is reusable and built on a strong theoretical underpinning. Significant data quality issues that would have otherwise delayed analyses or made data unusable were addressed. This study highlights the need for teams combining subject-matter and informatics expertise to address data quality when working with real world data.

    View details for DOI 10.1371/journal.pdig.0000527

    View details for PubMedID 38935590

    View details for PubMedCentralID PMC11210795

  • Learning competing risks across multiple hospitals: one-shot distributed algorithms. Journal of the American Medical Informatics Association : JAMIA Zhang, D., Tong, J., Jing, N., Yang, Y., Luo, C., Lu, Y., Christakis, D. A., Güthe, D., Hornig, M., Kelleher, K. J., Morse, K. E., Rogerson, C. M., Divers, J., Carroll, R. J., Forrest, C. B., Chen, Y. 2024

    Abstract

    To characterize the complex interplay between multiple clinical conditions in a time-to-event analysis framework using data from multiple hospitals, we developed two novel one-shot distributed algorithms for competing risk models (ODACoR). By applying our algorithms to the EHR data from eight national children's hospitals, we quantified the impacts of a wide range of risk factors on the risk of post-acute sequelae of SARS-CoV-2 (PASC) among children and adolescents. Our ODACoR algorithms are straightforward to execute because of their simple design and communication efficiency. We evaluated our algorithms via extensive simulation studies as well as applications to quantification of the impacts of risk factors for PASC among children and adolescents using data from eight children's hospitals, including the Children's Hospital of Philadelphia, Cincinnati Children's Hospital Medical Center, and Children's Hospital of Colorado, covering over 6.5 million pediatric patients. The accuracy of the estimation was assessed by comparing the results from our ODACoR algorithms with the estimators derived from the meta-analysis and the pooled data. The meta-analysis estimator showed a high relative bias (∼40%) when the clinical condition was relatively rare (∼0.5%), whereas ODACoR algorithms exhibited a substantially lower relative bias (∼0.2%). The estimated effects from our ODACoR algorithms were on par with the estimates from the pooled data, suggesting the high reliability of our federated learning algorithms. In contrast, the meta-analysis estimate failed to identify risk factors such as age, gender, chronic conditions history, and obesity, compared to the pooled data. Our proposed ODACoR algorithms are communication-efficient, highly accurate, and suitable for characterizing the complex interplay between multiple clinical conditions. Our study demonstrates that our ODACoR algorithms are communication-efficient and widely applicable for analyzing multiple clinical conditions in a time-to-event analysis framework.

    View details for DOI 10.1093/jamia/ocae027

    View details for PubMedID 38456459

  • Characterizing the limitations of using diagnosis codes in the context of machine learning for healthcare. BMC medical informatics and decision making Guo, L. L., Morse, K. E., Aftandilian, C., Steinberg, E., Fries, J., Posada, J., Fleming, S. L., Lemmon, J., Jessa, K., Shah, N., Sung, L. 2024; 24 (1): 51

    Abstract

    Diagnostic codes are commonly used as inputs for clinical prediction models, to create labels for prediction tasks, and to identify cohorts for multicenter network studies. However, the coverage rates of diagnostic codes and their variability across institutions are underexplored. The primary objective was to describe lab- and diagnosis-based labels for 7 selected outcomes at three institutions. Secondary objectives were to describe agreement, sensitivity, and specificity of diagnosis-based labels against lab-based labels. This study included three cohorts: SickKids from The Hospital for Sick Children, and StanfordPeds and StanfordAdults from Stanford Medicine. We included seven clinical outcomes with lab-based definitions: acute kidney injury, hyperkalemia, hypoglycemia, hyponatremia, anemia, neutropenia, and thrombocytopenia. For each outcome, we created four lab-based labels (abnormal, mild, moderate, and severe) based on test results and one diagnosis-based label. The proportion of admissions with a positive label was presented for each outcome, stratified by cohort. Using lab-based labels as the gold standard, agreement (Cohen's Kappa), sensitivity, and specificity were calculated for each lab-based severity level. The numbers of admissions included were: SickKids (n = 59,298), StanfordPeds (n = 24,639), and StanfordAdults (n = 159,985). The proportion of admissions with a positive diagnosis-based label was significantly higher for StanfordPeds compared to SickKids across all outcomes, with odds ratios (99.9% confidence interval) for abnormal diagnosis-based labels ranging from 2.2 (1.7-2.7) for neutropenia to 18.4 (10.1-33.4) for hyperkalemia. Lab-based labels were more similar across institutions. When using lab-based labels as the gold standard, Cohen's Kappa and sensitivity were lower at SickKids for all severity levels compared to StanfordPeds. Across multiple outcomes, diagnosis codes were consistently different between the two pediatric institutions. This difference was not explained by differences in test results. These results may have implications for machine learning model development and deployment.

    View details for DOI 10.1186/s12911-024-02449-8

    View details for PubMedID 38355486

    View details for PubMedCentralID PMC10868117

  • Evaluation of a Large Language Model to Identify Confidential Content in Adolescent Encounter Notes. JAMA pediatrics Rabbani, N., Brown, C., Bedgood, M., Goldstein, R. L., Carlson, J. L., Pageler, N. M., Morse, K. E. 2024

    View details for DOI 10.1001/jamapediatrics.2023.6032

    View details for PubMedID 38252434

    View details for PubMedCentralID PMC10804277

  • MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records. Fleming, S. L., Lozano, A., Haberkorn, W. J., Jindal, J. A., Reis, E., Thapa, R., Blankemeier, L., Genkins, J. Z., Steinberg, E., Nayak, A., Patel, B., Chiang, C., Callahan, A., Huo, Z., Gatidis, S., Adams, S., Fayanju, O., Shah, S. J., Savage, T., Goh, E., Chaudhari, A. S., Aghaeepour, N., Sharp, C., Pfeffer, M. A., Liang, P., Chen, J. H., Morse, K. E., Brunskill, E. P., Fries, J. A., Shah, N. H. In: Wooldridge, M., Dy, J., Natarajan, S. (eds.). Association for the Advancement of Artificial Intelligence. 2024: 22021-22030

  • Self-supervised machine learning using adult inpatient data produces effective models for pediatric clinical prediction tasks. Journal of the American Medical Informatics Association : JAMIA Lemmon, J., Guo, L. L., Steinberg, E., Morse, K. E., Fleming, S. L., Aftandilian, C., Pfohl, S. R., Posada, J. D., Shah, N., Fries, J., Sung, L. 2023

    Abstract

    Development of electronic health records (EHR)-based machine learning models for pediatric inpatients is challenged by limited training data. Self-supervised learning using adult data may be a promising approach to creating robust pediatric prediction models. The primary objective was to determine whether a self-supervised model trained in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients, for pediatric inpatient clinical prediction tasks. This retrospective cohort study used EHR data and included patients with at least one admission to an inpatient unit. One admission per patient was randomly selected. Adult inpatients were 18 years or older, while pediatric inpatients were older than 28 days and younger than 18 years. Admissions were temporally split into training (January 1, 2008 to December 31, 2019), validation (January 1, 2020 to December 31, 2020), and test (January 1, 2021 to August 1, 2022) sets. The primary comparison was a self-supervised model trained in adult inpatients versus count-based logistic regression models trained in pediatric inpatients. The primary outcome was mean area under the receiver operating characteristic curve (AUROC) for 11 distinct clinical outcomes. Models were evaluated in pediatric inpatients. When evaluated in pediatric inpatients, the mean AUROC of the self-supervised model trained in adult inpatients (0.902) was noninferior to that of count-based logistic regression models trained in pediatric inpatients (0.868) (mean difference = 0.034, 95% CI = 0.014-0.057; P < .001 for noninferiority and P = .006 for superiority). Self-supervised learning in adult inpatients was noninferior to logistic regression models trained in pediatric inpatients. This finding suggests transferability of self-supervised models trained in adult patients to pediatric patients, without requiring costly model retraining.

    View details for DOI 10.1093/jamia/ocad175

    View details for PubMedID 37639620

  • Pseudo-randomized testing of a discharge medication alert to reduce free-text prescribing. Applied clinical informatics Rabbani, N., Ho, M., Dash, D., Calway, T., Morse, K., Chadwick, W. 2023

    Abstract

    Pseudo-randomized testing can be applied to perform rigorous yet practical evaluations of clinical decision support tools. We apply this methodology to an interruptive alert aimed at reducing free-text prescriptions. Using free text instead of structured computerized provider order entry elements can cause medication errors and inequity in care by bypassing medication-based clinical decision support tools and hindering automated translation of prescription instructions. The objective was to evaluate the effectiveness of an interruptive alert at reducing free-text prescriptions via pseudo-randomized testing using native electronic health record (EHR) functionality. Two versions of an EHR alert were triggered when a provider attempted to sign a free-text discharge prescription. The visible version displayed an interruptive alert to the user, while the silent version triggered in the background, serving as a control. Providers were assigned to the visible and silent arms based on even/odd EHR provider IDs. The proportion of encounters with a free-text prescription was calculated across the groups. Alert trigger rates were compared in process control charts. Free-text prescriptions were analyzed to identify prescribing patterns. Over the 28-week study period, 143 providers triggered 695 alerts (345 visible and 350 silent). The proportions of encounters with free-text prescriptions were 83% (266/320) and 90% (273/303) in the intervention and control groups, respectively (p-value = 0.01). For the active alert, median time to action was 31 seconds. Alert trigger rates between groups were similar over time. Ibuprofen, oxycodone, steroid tapers, and oncology-related prescriptions accounted for most free-text prescriptions. A majority of these prescriptions originated from user preference lists. An interruptive alert was associated with a modest reduction in free-text prescriptions. Furthermore, the majority of these prescriptions could have been reproduced using structured order entry fields. Targeting user preference lists shows promise for future intervention.

    View details for DOI 10.1055/a-2068-6940

    View details for PubMedID 37015344

  • A Natural Language Processing Model to Identify Confidential Content in Adolescent Clinical Notes. Applied clinical informatics Rabbani, N., Bedgood, M., Brown, C., Steinberg, E., Goldstein, R., Carlson, J., Pageler, N., Morse, K. 2023

    Abstract

    BACKGROUND: The 21st Century Cures Act mandates the immediate, electronic release of health information to patients. However, in the case of adolescents, special consideration is required to ensure that confidentiality is maintained. The detection of confidential content in clinical notes may support operational efforts to preserve adolescent confidentiality while implementing information sharing. OBJECTIVE: Determine whether a natural language processing (NLP) algorithm can identify confidential content in adolescent clinical progress notes. METHODS: A total of 1,200 outpatient adolescent progress notes written between 2016 and 2019 were manually annotated to identify confidential content. Labeled sentences from this corpus were featurized and used to train a two-part logistic regression model, which provides both sentence-level and note-level probability estimates that a given text contains confidential content. This model was prospectively validated on a set of 240 progress notes written in May 2022. It was subsequently deployed in a pilot intervention to augment an ongoing operational effort to identify confidential content in progress notes. Note-level probability estimates were used to triage notes for review, and sentence-level probability estimates were used to highlight high-risk portions of those notes to aid the manual reviewer. RESULTS: The prevalence of notes containing confidential content was 21% (255/1200) and 22% (53/240) in the train/test and validation cohorts, respectively. The ensemble logistic regression model achieved an AUROC of 90% and 88% in the test and validation cohorts, respectively. Its use in a pilot intervention identified outlier documentation practices and demonstrated efficiency gains over completely manual note review. DISCUSSION: An NLP algorithm can identify confidential content in progress notes with high accuracy. Its human-in-the-loop deployment in clinical operations augmented an ongoing operational effort to identify confidential content in adolescent progress notes. These findings suggest NLP may be used to support efforts to preserve adolescent confidentiality in the wake of the information blocking mandate.

    View details for DOI 10.1055/a-2051-9764

    View details for PubMedID 36898410

  • The Prevalence of Confidential Content in Adolescent Progress Notes Prior to the 21st Century Cures Act Information Blocking Mandate. Applied clinical informatics Bedgood, M., Rabbani, N., Brown, C., Goldstein, R., Carlson, J. L., Steinberg, E., Powell, A., Pageler, N. M., Morse, K. 2023; 14 (2): 337-344

    Abstract

    The 21st Century Cures Act information blocking final rule mandated the immediate and electronic release of health care data in 2020. There is anecdotal concern that a significant amount of information is documented in notes that would breach adolescent confidentiality if released electronically to a guardian. The purpose of this study was to quantify the prevalence of confidential information, based on California laws, within progress notes for adolescent patients that would be released electronically and to assess differences in prevalence across patient demographics. This is a single-center retrospective chart review of outpatient progress notes written between January 1, 2016, and December 31, 2019, at a large suburban academic pediatric network. Notes were labeled into one of three confidential domains by five expert reviewers trained on a rubric defining confidential information for adolescents derived from California state law. Participants included a random sample of eligible patients aged 12 to 17 years at the time of note creation. Secondary analysis included prevalence of confidentiality across age, gender, language spoken, and patient race. Of 1,200 manually reviewed notes, 255 notes (21.3%) (95% confidence interval: 19-24%) contained confidential information. Gender and age were similarly distributed in the cohort; 83.9% of patients were English speaking, and 41.2% were white or Caucasian. Confidential information was more likely to be found in notes for females (p < 0.05) as well as for English-speaking patients (p < 0.05). Older patients had a higher probability of notes containing confidential information (p < 0.05). This study demonstrates that there is a significant risk of breaching adolescent confidentiality if historical progress notes are released electronically to proxies without further review or redaction. With increased sharing of health care data, there is a need to protect adolescents' privacy and prevent potential breaches of confidentiality.

    View details for DOI 10.1055/s-0043-1767682

    View details for PubMedID 37137339

    View details for PubMedCentralID PMC10156443

  • A machine learning-based phenotype for long COVID in children: An EHR-based study from the RECOVER program. PloS one Lorman, V., Razzaghi, H., Song, X., Morse, K., Utidjian, L., Allen, A. J., Rao, S., Rogerson, C., Bennett, T. D., Morizono, H., Eckrich, D., Jhaveri, R., Huang, Y., Ranade, D., Pajor, N., Lee, G. M., Forrest, C. B., Bailey, L. C. 2023; 18 (8): e0289774

    Abstract

    As clinical understanding of pediatric post-acute sequelae of SARS-CoV-2 (PASC) develops, and hence the clinical definition evolves, it is desirable to have a method to reliably identify patients who are likely to have PASC in health systems data. In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and model performance was assessed using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanations (SHAP) values. The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses.

    View details for DOI 10.1371/journal.pone.0289774

    View details for PubMedID 37561683

  • User-centred design for machine learning in health care: a case study from care management. BMJ health & care informatics Seneviratne, M. G., Li, R. C., Schreier, M., Lopez-Martinez, D., Patel, B. S., Yakubovich, A., Kemp, J. B., Loreaux, E., Gamble, P., El-Khoury, K., Vardoulakis, L., Wong, D., Desai, J., Chen, J. H., Morse, K. E., Downing, N. L., Finger, L. T., Chen, M., Shah, N. 2022; 29 (1)

    Abstract

    OBJECTIVES: Few machine learning (ML) models are successfully deployed in clinical practice. One of the common pitfalls across the field is inappropriate problem formulation: designing ML to fit the data rather than to address a real-world clinical pain point. METHODS: We introduce a practical toolkit for user-centred design consisting of four questions covering: (1) solvable pain points, (2) the unique value of ML (eg, automation and augmentation), (3) the actionability pathway and (4) the model's reward function. This toolkit was implemented in a series of six participatory design workshops with care managers in an academic medical centre. RESULTS: Pain points amenable to ML solutions included outpatient risk stratification and risk factor identification. The endpoint definitions, triggering frequency and evaluation metrics of the proposed risk scoring model were directly influenced by care manager workflows and real-world constraints. CONCLUSIONS: Integrating user-centred design early in the ML life cycle is key for configuring models in a clinically actionable way. This toolkit can guide problem selection and influence choices about the technical setup of the ML problem.

    View details for DOI 10.1136/bmjhci-2022-100656

    View details for PubMedID 36220304

  • Assessment of Adherence to Reporting Guidelines by Commonly Used Clinical Prediction Models From a Single Vendor: A Systematic Review. JAMA network open Lu, J. H., Callahan, A., Patel, B. S., Morse, K. E., Dash, D., Pfeffer, M. A., Shah, N. H. 2022; 5 (8): e2227779

    Abstract

    Importance: Various model reporting guidelines have been proposed to ensure clinical prediction models are reliable and fair. However, no consensus exists about which model details are essential to report, and commonalities and differences among reporting guidelines have not been characterized. Furthermore, how well documentation of deployed models adheres to these guidelines has not been studied. Objectives: To assess information requested by model reporting guidelines and whether the documentation for commonly used machine learning models developed by a single vendor provides the information requested. Evidence Review: MEDLINE was queried using the terms "machine learning model card" and "reporting machine learning" from November 4 to December 6, 2020. References were reviewed to find additional publications, and publications without specific reporting recommendations were excluded. Similar elements requested for reporting were merged into representative items. Four independent reviewers and 1 adjudicator assessed how often documentation for the most commonly used models developed by a single vendor reported the items. Findings: From 15 model reporting guidelines, 220 unique items were identified that represented the collective reporting requirements. Although 12 items were commonly requested (requested by 10 or more guidelines), 77 items were requested by just 1 guideline. Documentation for 12 commonly used models from a single vendor reported a median of 39% (IQR, 37%-43%; range, 31%-47%) of items from the collective reporting requirements. Many of the commonly requested items had 100% reporting rates, including items concerning outcome definition, area under the receiver operating characteristic curve, internal validation, and intended clinical use. Items reported half the time or less included several related to reliability, such as external validation, uncertainty measures, and strategy for handling missing data. Other frequently unreported items related to fairness (summary statistics and subgroup analyses, including for race and ethnicity or sex). Conclusions and Relevance: These findings suggest that consistent reporting recommendations for clinical predictive models are needed for model developers to share necessary information for model deployment. The many published guidelines would, collectively, require reporting more than 200 items. Model documentation from 1 vendor reported the most commonly requested items from model reporting guidelines. However, areas for improvement were identified in reporting items related to model reliability and fairness. This analysis led to feedback to the vendor, which motivated updates to the documentation for future users.

    View details for DOI 10.1001/jamanetworkopen.2022.27779

    View details for PubMedID 35984654

  • Monitoring Approaches for a Pediatric Chronic Kidney Disease Machine Learning Model. Applied clinical informatics Morse, K. E., Brown, C., Fleming, S., Todd, I., Powell, A., Russell, A., Scheinker, D., Sutherland, S. M., Lu, J., Watkins, B., Shah, N. H., Pageler, N. M., Palma, J. P. 2022; 13 (2): 431-438

    Abstract

    OBJECTIVE: The purpose of this study is to evaluate the ability of three metrics to monitor for a reduction in performance of a chronic kidney disease (CKD) model deployed at a pediatric hospital. METHODS: The CKD risk model estimates a patient's risk of developing CKD 3 to 12 months following an inpatient admission. The model was developed on a retrospective dataset of 4,879 admissions from 2014 to 2018, then run silently on 1,270 admissions from April to October 2019. Three metrics were used to monitor its performance during the silent phase: (1) standardized mean differences (SMDs); (2) performance of a "membership model"; and (3) response distribution analysis. Observed patient outcomes for the 1,270 admissions were used to calculate prospective model performance and the ability of the three metrics to detect performance changes. RESULTS: The deployed model had an area under the receiver operating characteristic curve (AUROC) of 0.63 in the prospective evaluation, a significant decrease from an AUROC of 0.76 on retrospective data (p = 0.033). Among the three metrics, SMDs were significantly different between retrospective and deployment data for 66/75 (88%) of the model's input variables (p < 0.05). The membership model was able to discriminate between the two settings (AUROC = 0.71, p < 0.0001), and the response distributions were significantly different (p < 0.0001) for the two settings. CONCLUSION: This study suggests that the three metrics examined could provide early indication of performance deterioration in deployed models.

    View details for DOI 10.1055/s-0042-1746168

    View details for PubMedID 35508197

  • Ensuring Adolescent Patient Portal Confidentiality in the Age of the Cures Act Final Rule. The Journal of adolescent health : official publication of the Society for Adolescent Medicine Xie, J., McPherson, T., Powell, A., Fong, P., Hogan, A., Ip, W., Morse, K., Carlson, J. L., Lee, T., Pageler, N. 2021

    Abstract

    PURPOSE: Managing confidential adolescent health information in patient portals presents unique challenges. Adolescent patients and guardians electronically access medical records and communicate with providers via portals. In confidential matters like sexual health, ensuring confidentiality is crucial. A key aspect of confidential portals is ensuring that the account is registered to and utilized by the intended user. Inappropriately registered or guardian-accessed adolescent portal accounts may lead to confidentiality breaches. METHODS: We used a quality improvement framework to develop screening methodologies to flag guardian-accessible accounts. Accounts of patients aged 12-17 were flagged via manual review of account emails and natural language processing of portal messages. We implemented a reconciliation program to correct affected accounts' registered email. Clinics were notified about sign-up errors and educated on sign-up workflow. An electronic alert was created to check the adolescent's email prior to account activation. RESULTS: After initial screening, 2,307 of 3,701 (62%) adolescent accounts were flagged as registered with a guardian's email. The holders of those accounts were notified to resolve their logins. After five notifications over 8 weeks, 266 of 2,307 accounts (12%) were corrected; the remaining 2,041 (88%) were deactivated. CONCLUSIONS: The finding that 62% of adolescent portal accounts were used or accessed by guardians has significant confidentiality implications. In the context of the Cures Act Final Rule and increased information sharing, our institution's experience with ensuring appropriate access to adolescent portal accounts is necessary, timely, and relevant. This study highlights ways to improve patient portal confidentiality and prompts institutions caring for adolescents to review their systems and processes.

    View details for DOI 10.1016/j.jadohealth.2021.09.009

    View details for PubMedID 34666956

  • Assessment of Prevalence of Adolescent Patient Portal Account Access by Guardians. JAMA network open Ip, W., Yang, S., Parker, J., Powell, A., Xie, J., Morse, K., Aikens, R. C., Lee, J., Gill, M., Vundavalli, S., Huang, Y., Huang, J., Chen, J. H., Hoffman, J., Kuelbs, C., Pageler, N. 2021; 4 (9): e2124733

    Abstract

    Importance: Patient portals can be configured to allow confidential communication for adolescents' sensitive health care information. Guardian access to adolescent patient portal accounts could compromise adolescents' confidentiality. Objective: To estimate the prevalence of guardian access to adolescent patient portals at 3 academic children's hospitals. Design, Setting, and Participants: A cross-sectional study to estimate the prevalence of guardian access to adolescent patient portal accounts was conducted at 3 academic children's hospitals. Adolescent patients (aged 13-18 years) with access to their patient portal account and at least 1 outbound message from their portal during the study period were included. A rule-based natural language processing algorithm was used to analyze all portal messages from June 1, 2014, to February 28, 2020, and identify any message sent by guardians. The sensitivity and specificity of the algorithm at each institution were estimated through manual review of a stratified subsample of patient accounts. The overall proportion of accounts with guardian access was estimated after correcting for the sensitivity and specificity of the natural language processing algorithm. Exposures: Use of a patient portal. Main Outcome and Measures: Percentage of adolescent portal accounts indicating guardian access. Results: A total of 3429 eligible adolescent accounts containing 25,642 messages across 3 institutions were analyzed. A total of 1797 adolescents (52%) were female and mean (SD) age was 15.6 (1.6) years. The percentage of adolescent portal accounts with apparent guardian access ranged from 52% to 57% across the 3 institutions. After correcting for the sensitivity and specificity of the algorithm based on manual review of 200 accounts per institution, an estimated 64% (95% CI, 59%-69%) to 76% (95% CI, 73%-88%) of accounts with outbound messages were accessed by guardians across the 3 institutions. Conclusions and Relevance: In this study, more than half of adolescent accounts with outbound messages were estimated to have been accessed by guardians at least once. These findings have implications for health systems intending to rely on separate adolescent accounts to protect adolescent confidentiality.

    View details for DOI 10.1001/jamanetworkopen.2021.24733

    View details for PubMedID 34529064

  • A survey of extant organizational and computational setups for deploying predictive models in health systems. Journal of the American Medical Informatics Association : JAMIA Kashyap, S., Morse, K. E., Patel, B., Shah, N. H. 2021

    Abstract

    OBJECTIVE: Artificial intelligence (AI)- and machine learning (ML)-enabled healthcare is now feasible for many health systems, yet little is known about effective strategies of system architecture and governance mechanisms for implementation. Our objective was to identify the different computational and organizational setups that early-adopter health systems have utilized to integrate AI/ML clinical decision support (AI-CDS) and scrutinize their trade-offs. MATERIALS AND METHODS: We conducted structured interviews with health systems with AI deployment experience about their organizational and computational setups for deploying AI-CDS at point of care. RESULTS: We contacted 34 health systems and interviewed 20 healthcare sites (58% response rate). Twelve (60%) sites used the native electronic health record vendor configuration for model development and deployment, making it the most common shared infrastructure. Nine (45%) sites used alternative computational configurations which varied significantly. Organizational configurations for managing AI-CDS were distinguished by how they identified model needs, built and implemented models, and were separable into 3 major types: Decentralized translation (n=10, 50%), IT Department-led (n=2, 10%), and AI in Healthcare (AIHC) Team (n=8, 40%). DISCUSSION: No singular computational configuration enables all current use cases for AI-CDS. Health systems need to consider their desired applications for AI-CDS and whether investment in extending the off-the-shelf infrastructure is needed. Each organizational setup confers trade-offs for health systems planning strategies to implement AI-CDS. CONCLUSION: Health systems will be able to use this framework to understand strengths and weaknesses of alternative organizational and computational setups when designing their strategy for artificial intelligence.

    View details for DOI 10.1093/jamia/ocab154

    View details for PubMedID 34423364

  • Quantifying Discharge Medication Reconciliation Errors at 2 Pediatric Hospitals. Pediatric quality & safety Morse, K. E., Chadwick, W. A., Paul, W., Haaland, W., Pageler, N. M., Tarrago, R. 2021; 6 (4): e436

    Abstract

    Introduction: Medication reconciliation errors (MREs) are common and can lead to significant patient harm. Quality improvement efforts to identify and reduce these errors typically rely on resource-intensive chart reviews or adverse event reporting. Quantifying these errors hospital-wide is complicated and rarely done. The purpose of this study is to define a set of 6 MREs that can be easily identified across an entire healthcare organization and report their prevalence at 2 pediatric hospitals. Methods: An algorithmic analysis of discharge medication lists, with confirmation by clinician reviewers, was used to find the prevalence of the 6 discharge MREs at 2 pediatric hospitals. These errors represent deviations from the standards for medication instruction completeness, clarity, and safety. The 6 error types are Duplication, Missing Route, Missing Dose, Missing Frequency, Unlisted Medication, and See Instructions errors. Results: This study analyzed 67,339 discharge medications and found MREs to be common at both hospitals. For Institution A, a total of 4,234 errors were identified, with 29.9% of discharges containing at least 1 error and an average of 0.7 errors per discharge. For Institution B, a total of 5,942 errors were identified, with 42.2% of discharges containing at least 1 error and an average of 1.6 errors per discharge. The most common error types were Duplication and See Instructions errors. Conclusion: The presented method shows these MREs to be a common finding in pediatric care. This work offers a tool to strengthen hospital-wide quality improvement efforts to reduce pediatric medication errors.

    View details for DOI 10.1097/pq9.0000000000000436

    View details for PubMedID 34345749

  • Digital Symptom Checker Usage and Triage: Population-Based Descriptive Study in a Large North American Integrated Health System. Journal of medical Internet research Morse, K. E., Ostberg, N. P., Jones, V. G., Chan, A. S. 2020

    Abstract

    BACKGROUND: Pressure on the United States (US) healthcare system has been increasing due to a combination of aging populations, rising healthcare expenditures, and, most recently, the COVID-19 pandemic. Responses are hindered in part by a reliance on a limited supply of highly trained healthcare professionals, creating a need for scalable technological solutions. Digital symptom checkers are artificial intelligence (AI)-supported software tools that use a conversational "chatbot" format to support rapid diagnosis and consistent triage. The COVID-19 pandemic has brought new attention to these tools, given the need to avoid face-to-face contact and preserve urgent care capacity. However, evidence-based deployment of these chatbots requires an understanding of the user demographics and associated triage recommendations generated by a large, general population. OBJECTIVE: In this study, we evaluate the user demographics and levels of triage acuity provided by one symptom checker chatbot deployed in partnership with a large integrated health system in the US. METHODS: This population-based descriptive study included all online symptom assessments completed on the website and patient portal of the Sutter Health system (24 hospitals in Northern California) from April 24, 2019, to February 1, 2020. User demographics were compared to relevant US Census population data. RESULTS: A total of 26,646 symptom assessments were completed during the study period. Most assessments (17,816/26,646, 66.9%) were completed by female users. Mean user age was 34.3 years (SD: 14.4 years), compared to a median age of 37.3 years in the general population. The most common initial symptom was 'abdominal pain' (2,060/26,646, 7.7%). Nearly half of assessments (12,357/26,646, 46.4%) were completed outside of typical physician office hours. Most users were advised to seek medical care the same day (7,299/26,646, 27.4%) or within 2-3 days (6,301/26,646, 23.6%). Over one quarter of assessments required a high degree of urgency (7,723/26,646, 29.0%). CONCLUSIONS: Users of the symptom checker chatbot were broadly representative of our patient population, though skewed towards younger and female users. Triage recommendations are comparable to those of nurse-staffed phone triage lines. While the emergence of COVID-19 increases enthusiasm for remote medical assessment tools, it is important to take an evidence-based approach to their deployment.

    View details for DOI 10.2196/20549

    View details for PubMedID 33170799

  • Estimate the hidden deployment cost of predictive models to improve patient care. Nature medicine Morse, K. E., Bagley, S. C., Shah, N. H. 2020; 26 (1): 18-19

    View details for DOI 10.1038/s41591-019-0651-8

    View details for PubMedID 31932778

  • Your Patient Has a New Health App? Start With Its Data Source. Journal of participatory medicine Morse, K. E., Schremp, J., Pageler, N. M., Palma, J. P. 2019; 11 (2): e14288

    Abstract

    Recent regulatory and technological advances have enabled a new era of health apps that are controlled by patients and contain valuable health information. These health apps will be numerous and use novel interfaces that appeal to patients but will likely be unfamiliar to practitioners. We posit that understanding the origin of the health data is the most meaningful and versatile way for physicians to understand and effectively use these apps in patient care. This will allow providers to better support patients and encourage patient engagement in their own care.

    View details for DOI 10.2196/14288

    View details for PubMedID 33055064

    View details for PubMedCentralID PMC7434101

  • Hospital-Level Variation in Practice Patterns and Patient Outcomes for Pediatric Patients Hospitalized With Functional Constipation. Hospital pediatrics Librizzi, J., Flores, S., Morse, K., Kelleher, K., Carter, J., Bode, R. 2017; 7 (6): 320-327

    Abstract

    Constipation is a common pediatric condition with a prevalence of 3% to 5% in children aged 4 to 17 years. Currently, there are no evidence-based guidelines for the management of pediatric patients hospitalized with constipation. The primary objective was to evaluate practice patterns and patient outcomes for the hospital management of functional constipation in US children's hospitals. We conducted a multicenter, retrospective cohort study of children aged 0 to 18 years hospitalized for functional constipation from 2012 to 2014 by using the Pediatric Health Information System. Patients were identified by using International Classification of Diseases, Ninth Revision, diagnosis codes for constipation and other related conditions. Patients with complex chronic conditions were excluded. Outcome measures included percentage of hospitalizations due to functional constipation, therapies used, length of stay, and 90-day readmission rates. Statistical analysis included means with 95% confidence intervals for individual hospital outcomes. A total of 14,243 hospitalizations were included, representing 12,804 unique patients. The overall percentage of hospitalizations due to functional constipation was 0.65% (range: 0.19%-1.41%, P < .0001). The percentage of patients receiving each of the following treatments during hospitalization ranged across hospitals as follows: electrolyte laxatives, 40% to 96%; sodium phosphate enema, 0% to 64%; mineral oil enema, 0% to 61%; glycerin suppository, 0% to 37%; bisacodyl, 0% to 47%; senna, 0% to 23%; and docusate, 0% to 11%. Mean length of stay was 1.97 days (range: 1.31-2.73 days, P < .0001). Mean 90-day readmission rate was 3.78% (range: 0.95%-7.53%, P < .0001). There is significant variation in practice patterns and clinical outcomes for pediatric patients hospitalized with functional constipation across US children's hospitals. Collaborative initiatives to adopt evidence-based best-practice guidelines could help standardize the hospital management of pediatric functional constipation.

    View details for DOI 10.1542/hpeds.2016-0101

    View details for PubMedID 28522604