Bio


April Shichu Liang, M.D., is a fellow in the Stanford University Clinical Informatics Fellowship Program. Dr. Liang holds a B.S.E. in Computer Science from Princeton University and an M.D. from UCSF School of Medicine. She completed her Internal Medicine residency at UCSF. Dr. Liang’s clinical interests include hospital medicine, and her research interests include the implementation of machine learning tools in healthcare, clinical decision support, and data-driven quality improvement. Her past work includes the development of a machine learning model to predict incident delirium in hospitalized patients and EHR-based interventions to increase guideline-recommended public health screening.

Clinical Focus


  • Fellow
  • Internal Medicine
  • Medical Informatics
  • Machine Learning
  • Artificial Intelligence
  • Clinical Decision Support

Honors & Awards


  • Pediatrics Fellow Scholarship Award, Stanford Department of Pediatrics (December 2023)

Boards, Advisory Committees, Professional Organizations


  • Executive Board, AMIA Clinical Informatics Fellows (ACIF) (2024 - Present)
  • Co-Chair, Housestaff Information Technology Enhancement Council (HITEC) (2024 - Present)
  • Project Leader, Stanford Resident Safety Council (2024 - Present)

Professional Education


  • Residency, UCSF, Internal Medicine (2023)
  • MD, UCSF (2020)
  • BSE, Princeton University, Computer Science (2015)

All Publications


  • Clinical entity augmented retrieval for clinical information extraction. NPJ digital medicine Lopez, I., Swaminathan, A., Vedula, K., Narayanan, S., Nateghi Haredasht, F., Ma, S. P., Liang, A. S., Tate, S., Maddali, M., Gallo, R. J., Shah, N. H., Chen, J. H. 2025; 8 (1): 45

    Abstract

    Large language models (LLMs) with retrieval-augmented generation (RAG) have improved information extraction over previous methods, yet their reliance on embeddings often leads to inefficient retrieval. We introduce CLinical Entity Augmented Retrieval (CLEAR), a RAG pipeline that retrieves information using entities. We compared CLEAR to embedding RAG and full-note approaches for extracting 18 variables using six LLMs across 20,000 clinical notes. Average F1 scores were 0.90, 0.86, and 0.79; inference times were 4.95, 17.41, and 20.08 s per note; average model queries were 1.68, 4.94, and 4.18 per note; and average input tokens were 1.1k, 3.8k, and 6.1k per note for CLEAR, embedding RAG, and full-note approaches, respectively. In conclusion, CLEAR utilizes clinical entities for information retrieval and achieves >70% reduction in token usage and inference time with improved performance compared to modern methods.

    View details for DOI 10.1038/s41746-024-01377-1

    View details for PubMedID 39828800

    View details for PubMedCentralID 4287068

  • Ambient artificial intelligence scribes: utilization and impact on documentation time. Journal of the American Medical Informatics Association : JAMIA Ma, S. P., Liang, A. S., Shah, S. J., Smith, M., Jeong, Y., Devon-Sand, A., Crowell, T., Delahaie, C., Hsia, C., Lin, S., Shanafelt, T., Pfeffer, M. A., Sharp, C., Garcia, P. 2024

    Abstract

    To quantify utilization and impact on documentation time of a large language model-powered ambient artificial intelligence (AI) scribe. This prospective quality improvement study was conducted at a large academic medical center with 45 physicians from 8 ambulatory disciplines over 3 months. Utilization and documentation times were derived from electronic health record (EHR) use measures. The ambient AI scribe was utilized in 9629 of 17 428 encounters (55.25%) with significant interuser heterogeneity. Compared to baseline, median time per note reduced significantly by 0.57 minutes. Median daily documentation, afterhours, and total EHR time also decreased significantly by 6.89, 5.17, and 19.95 minutes/day, respectively. An early pilot of an ambient AI scribe demonstrated robust utilization and reduced time spent on documentation and in the EHR. There was notable individual-level heterogeneity. Large language model-powered ambient AI scribes may reduce documentation burden. Further studies are needed to identify which users benefit most from current technology and how future iterations can support a broader audience.

    View details for DOI 10.1093/jamia/ocae304

    View details for PubMedID 39688515

  • Ambient artificial intelligence scribes: physician burnout and perspectives on usability and documentation burden. Journal of the American Medical Informatics Association : JAMIA Shah, S. J., Devon-Sand, A., Ma, S. P., Jeong, Y., Crowell, T., Smith, M., Liang, A. S., Delahaie, C., Hsia, C., Shanafelt, T., Pfeffer, M. A., Sharp, C., Lin, S., Garcia, P. 2024

    Abstract

    This study evaluates the pilot implementation of ambient AI scribe technology to assess physician perspectives on usability and the impact on physician burden and burnout. This prospective quality improvement study was conducted at Stanford Health Care with 48 physicians over a 3-month period. Outcome measures included burden, burnout, usability, and perceived time savings. Paired survey analysis (n = 38) revealed large statistically significant reductions in task load (-24.42, p <.001) and burnout (-1.94, p <.001), and moderate statistically significant improvements in usability scores (+10.9, p <.001). Post-survey responses (n = 46) indicated favorable utility with improved perceptions of efficiency, documentation quality, and ease of use. In one of the first pilot implementations of ambient AI scribe technology, improvements in physician task load, burnout, and usability were demonstrated. Ambient AI scribes like DAX Copilot may enhance clinical workflows. Further research is needed to optimize widespread implementation and evaluate long-term impacts.

    View details for DOI 10.1093/jamia/ocae295

    View details for PubMedID 39657021

  • Perspectives on Artificial Intelligence-Generated Responses to Patient Messages. JAMA network open Kim, J., Chen, M. L., Rezaei, S. J., Liang, A. S., Seav, S. M., Onyeka, S., Lee, J. J., Vedak, S. C., Mui, D., Lal, R. A., Pfeffer, M. A., Sharp, C., Pageler, N. M., Asch, S. M., Linos, E. 2024; 7 (10): e2438535

    View details for DOI 10.1001/jamanetworkopen.2024.38535

    View details for PubMedID 39412810

  • Using a Large Language Model to Identify Adolescent Patient Portal Account Access by Guardians. JAMA network open Liang, A. S., Vedak, S., Dussaq, A., Yao, D. H., Morse, K., Ip, W., Pageler, N. M. 2024; 7 (6): e2418454

    View details for DOI 10.1001/jamanetworkopen.2024.18454

    View details for PubMedID 38916895

  • Transcription-independent TFIIIC-bound sites cluster near heterochromatin boundaries within lamina-associated domains in C. elegans. Epigenetics & chromatin Stutzman, A. V., Liang, A. S., Beilinson, V., Ikegami, K. 2020; 13 (1): 1

    Abstract

    Chromatin organization is central to precise control of gene expression. In various eukaryotic species, domains of pervasive cis-chromatin interactions demarcate functional domains of the genomes. In nematode Caenorhabditis elegans, however, pervasive chromatin contact domains are limited to the dosage-compensated sex chromosome, leaving the principle of C. elegans chromatin organization unclear. Transcription factor III C (TFIIIC) is a basal transcription factor complex for RNA polymerase III, and is implicated in chromatin organization. TFIIIC binding without RNA polymerase III co-occupancy, referred to as extra-TFIIIC binding, has been implicated in insulating active and inactive chromatin domains in yeasts, flies, and mammalian cells. Whether extra-TFIIIC sites are present and contribute to chromatin organization in C. elegans remains unknown. We identified 504 TFIIIC-bound sites absent of RNA polymerase III and TATA-binding protein co-occupancy characteristic of extra-TFIIIC sites in C. elegans embryos. Extra-TFIIIC sites constituted half of all identified TFIIIC binding sites in the genome. Extra-TFIIIC sites formed dense clusters in cis. The clusters of extra-TFIIIC sites were highly over-represented within the distal arm domains of the autosomes that presented a high level of heterochromatin-associated histone H3K9 trimethylation (H3K9me3). Furthermore, extra-TFIIIC clusters were embedded in the lamina-associated domains. Despite the heterochromatin environment of extra-TFIIIC sites, the individual clusters of extra-TFIIIC sites were devoid of and resided near the individual H3K9me3-marked regions. Clusters of extra-TFIIIC sites were pervasive in the arm domains of C. elegans autosomes, near the outer boundaries of H3K9me3-marked regions. Given the reported activity of extra-TFIIIC sites in heterochromatin insulation in yeasts, our observation raised the possibility that TFIIIC may also demarcate heterochromatin in C. elegans.

    View details for DOI 10.1186/s13072-019-0325-2

    View details for PubMedID 31918747

    View details for PubMedCentralID PMC6950938

  • Large Scale Semi-Automated Labeling of Routine Free-Text Clinical Records for Deep Learning. Journal of digital imaging Trivedi, H. M., Panahiazar, M., Liang, A., Lituiev, D., Chang, P., Sohn, J. H., Chen, Y. Y., Franc, B. L., Joe, B., Hadley, D. 2019; 32 (1): 30-37

    Abstract

    Breast cancer is a leading cause of cancer death among women in the USA. Screening mammography is effective in reducing mortality, but has a high rate of unnecessary recalls and biopsies. While deep learning can be applied to mammography, large-scale labeled datasets, which are difficult to obtain, are required. We aim to remove many barriers of dataset development by automatically harvesting data from existing clinical records using a hybrid framework combining traditional NLP and IBM Watson. An expert reviewer manually annotated 3521 breast pathology reports with one of four outcomes: left positive, right positive, bilateral positive, negative. Traditional NLP techniques using seven different machine learning classifiers were compared to IBM Watson's automated natural language classifier. Techniques were evaluated using precision, recall, and F-measure. Logistic regression outperformed all other traditional machine learning classifiers and was used for subsequent comparisons. Both traditional NLP and Watson's NLC performed well for cases under 1024 characters with weighted average F-measures above 0.96 across all classes. Performance of traditional NLP was lower for cases over 1024 characters with an F-measure of 0.83. We demonstrate a hybrid framework using traditional NLP techniques combined with IBM Watson to annotate over 10,000 breast pathology reports for development of a large-scale database to be used for deep learning in mammography. Our work shows that traditional NLP and IBM Watson perform extremely well for cases under 1024 characters and can accelerate the rate of data annotation.

    View details for DOI 10.1007/s10278-018-0105-8

    View details for PubMedID 30128778

    View details for PubMedCentralID PMC6382632

  • Development and Validation of an Electronic Health Record-Based Machine Learning Model to Estimate Delirium Risk in Newly Hospitalized Patients Without Known Cognitive Impairment. JAMA network open Wong, A., Young, A. T., Liang, A. S., Gonzales, R., Douglas, V. C., Hadley, D. 2018; 1 (4): e181018

    Abstract

    Current methods for identifying hospitalized patients at increased risk of delirium require nurse-administered questionnaires with moderate accuracy. To develop and validate a machine learning model that predicts incident delirium risk based on electronic health data available on admission. Retrospective cohort study evaluating 5 machine learning algorithms to predict delirium using 796 clinical variables identified by an expert panel as relevant to delirium prediction and consistently available in electronic health records within 24 hours of admission. The training set comprised 14 227 adult patients with non-intensive care unit hospital stays and no delirium on admission who were discharged between January 1, 2016, and August 31, 2017, from UCSF Health, a large academic health institution. The test set comprised 3996 patients with hospital stays who were discharged between August 1, 2017, and November 30, 2017. Patient demographic characteristics, diagnoses, nursing records, laboratory results, and medications available in electronic health records during hospitalization. Delirium was defined as a positive Nursing Delirium Screening Scale or Confusion Assessment Method for the Intensive Care Unit score. Models were assessed using the area under the receiver operating characteristic curve (AUC) and compared against the 4-point scoring system AWOL (age >79 years, failure to spell world backward, disorientation to place, and higher nurse-rated illness severity), a validated delirium risk-assessment tool routinely administered in this cohort. The training set included 14 227 patients (5113 [35.9%] aged >64 years; 7335 [51.6%] female; 687 [4.8%] with delirium), and the test set included 3996 patients (1491 [37.3%] aged >64 years; 1966 [49.2%] female; 191 [4.8%] with delirium). In total, the analysis included 18 223 hospital admissions (6604 [36.2%] aged >64 years; 9301 [51.0%] female; 878 [4.8%] with delirium). The AWOL system achieved a baseline AUC of 0.678. The gradient boosting machine model performed best, with an AUC of 0.855. Setting specificity at 90%, the model had a 59.7% (95% CI, 52.4%-66.7%) sensitivity, 23.1% (95% CI, 20.5%-25.9%) positive predictive value, 97.8% (95% CI, 97.4%-98.1%) negative predictive value, and a number needed to screen of 4.8. Penalized logistic regression and random forest models also performed well, with AUCs of 0.854 and 0.848, respectively. Machine learning can be used to estimate hospital-acquired delirium risk using electronic health record data available within 24 hours of hospital admission. Such a model may allow more precise targeting of delirium prevention resources without increasing the burden on health care professionals.

    View details for DOI 10.1001/jamanetworkopen.2018.1018

    View details for PubMedID 30646095

    View details for PubMedCentralID PMC6324291