Malvika Pillai is a postdoctoral research fellow in the VA Big Data Scientist Training Enhancement Program (BD-STEP), jointly at Stanford University School of Medicine (Biomedical Informatics) in the Boussard Lab and at VA Palo Alto. She received her BS in Quantitative Biology and her PhD in Health Informatics from the University of North Carolina at Chapel Hill. Her current work focuses on the development, evaluation, and implementation of machine learning algorithms for clinical decision support.

Professional Education

  • Doctor of Philosophy (PhD), Health Informatics, University of North Carolina at Chapel Hill (2022)
  • Bachelor of Science (BS), Quantitative Biology, University of North Carolina at Chapel Hill (2017)

Stanford Advisors

All Publications

  • Measuring quality-of-care in treatment of young children with attention-deficit/hyperactivity disorder using pre-trained language models. Journal of the American Medical Informatics Association: JAMIA Pillai, M., Posada, J., Gardner, R. M., Hernandez-Boussard, T., Bannett, Y. 2024


    To measure pediatrician adherence to evidence-based guidelines in the treatment of young children with attention-deficit/hyperactivity disorder (ADHD) in a diverse healthcare system using natural language processing (NLP) techniques. We extracted structured and free-text data from electronic health records (EHRs) of all office visits (2015-2019) of children aged 4-6 years in a community-based primary healthcare network in California who had ≥1 visit with an ICD-10 diagnosis of ADHD. Two pediatricians annotated clinical notes of the first ADHD visit for 423 patients. Inter-annotator agreement (IAA) was assessed for the recommendation for first-line behavioral treatment (F-measure = 0.89). Four pre-trained language models, including BioClinical Bidirectional Encoder Representations from Transformers (BioClinicalBERT), were used to identify behavioral treatment recommendations using a 70/30 train/test split. For temporal validation, we deployed BioClinicalBERT on 1,020 unannotated notes from other ADHD visits and well-care visits; all positively classified notes (n = 53) and 5% of negatively classified notes (n = 50) were manually reviewed. Of 423 patients, 313 (74%) were male; 298 (70%) were privately insured; 138 (33%) were White; 61 (14%) were Hispanic. The BioClinicalBERT model trained on the first ADHD visits achieved F1 = 0.76, precision = 0.81, recall = 0.72, and AUC = 0.81 [0.72-0.89]. Temporal validation achieved F1 = 0.77, precision = 0.68, and recall = 0.88. Fairness analysis revealed low model performance in publicly insured patients (F1 = 0.53). Deploying pre-trained language models on a variable set of clinical notes accurately captured pediatrician adherence to guidelines in the treatment of children with ADHD. Validating this approach in other patient populations is needed to achieve equitable measurement of quality of care at scale and to improve clinical care for mental health conditions.

    View details for DOI 10.1093/jamia/ocae001

    View details for PubMedID 38244997
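    The precision, recall, and F1 figures reported above follow the standard definitions for binary classification. A minimal sketch of that arithmetic (not the authors' code; the function name and example labels are illustrative):

    ```python
    # Standard binary-classification metrics, as reported for the
    # BioClinicalBERT classifier on the held-out 30% test split.
    # This is an illustrative sketch, not the study's actual code.

    def precision_recall_f1(y_true, y_pred):
        """Compute precision, recall, and F1 from 0/1 label lists."""
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        return precision, recall, f1

    # Toy example: one true positive, one false positive, one false negative
    p, r, f1 = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])
    ```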

  • Augmenting Quality Assurance Measures in Treatment Review with Machine Learning in Radiation Oncology. Advances in Radiation Oncology Pillai, M., Shumway, J. W., Adapa, K., Dooley, J., McGurk, R., Mazur, L. M., Das, S. K., Chera, B. S. 2023; 8 (6): 101234


    Pretreatment quality assurance (QA) of treatment plans often requires a high cognitive workload and considerable time expenditure. This study explores the use of machine learning to classify pretreatment chart check QA for a given radiation plan as difficult or less difficult, thereby alerting physicists to increase scrutiny on difficult plans. Pretreatment QA data were collected for 973 cases between July 2018 and October 2020. The outcome variable, degree of difficulty, was collected as a subjective rating by the physicists who performed the pretreatment chart checks. Potential features were identified based on clinical relevance, contribution to plan complexity, and QA metrics. Five machine learning models were developed: a support vector machine, random forest classifier, AdaBoost classifier, decision tree classifier, and neural network. These were incorporated into a voting classifier, in which at least 2 algorithms needed to predict a case as difficult for it to be classified as such. Sensitivity analyses were conducted to evaluate feature importance. The voting classifier achieved an overall accuracy of 77.4% on the test set, with 76.5% accuracy on difficult cases and 78.4% accuracy on less difficult cases. Sensitivity analysis showed that features associated with plan complexity (number of fractions, dose per monitor unit, number of planning structures, and number of image sets) and clinical relevance (patient age) were sensitive across at least 3 algorithms. This approach can be used to equitably allocate plans to physicists rather than allocate them randomly, potentially improving pretreatment chart check effectiveness by reducing errors that propagate downstream.

    View details for DOI 10.1016/j.adro.2023.101234

    View details for Web of Science ID 001054543200001

    View details for PubMedID 37205277

    View details for PubMedCentralID PMC10185740

  • Toward Community-Based Natural Language Processing (CBNLP): Cocreating With Communities. Journal of Medical Internet Research Pillai, M., Griffin, A. C., Kronk, C. A., McCall, T. 2023; 25: e48498


    Rapid development and adoption of natural language processing (NLP) techniques have led to a multitude of exciting and innovative societal and health care applications. These advancements have also generated concerns that such tools perpetuate historical injustices and lack cultural considerations. While traditional health care NLP techniques typically include clinical subject matter experts to extract health information or aid in interpretation, few NLP tools involve community stakeholders with lived experiences. In this perspective paper, we draw upon the field of community-based participatory research, which gathers input from community members for the development of public health interventions, to identify and examine ways to equitably involve communities in developing health care NLP tools. To realize the potential of community-based NLP (CBNLP), research and development teams must thoughtfully consider the mechanisms and resources needed to collaborate effectively with community members for maximal societal and ethical impact of NLP-based tools.

    View details for DOI 10.2196/48498

    View details for PubMedID 37540551

  • Validation approaches for computational drug repurposing: a review. AMIA Annual Symposium Proceedings Pillai, M., Wu, D. 2023; 2023: 559-568

    View details for PubMedID 38222367

    View details for PubMedCentralID PMC10785886

  • Recommendations for design of a mobile application to support management of anxiety and depression among Black American women. Frontiers in digital health McCall, T., Threats, M., Pillai, M., Lakdawala, A., Bolton, C. S. 2022; 4: 1028408


    Black American women experience adverse health outcomes due to anxiety and depression. They face systemic barriers to accessing culturally appropriate mental health care, leading to underutilization of mental health services and resources. Mobile technology can be leveraged to increase access to culturally relevant resources; however, the specific needs and preferences that Black women find useful in an app to support management of anxiety and depression are rarely reflected in existing digital health tools. This study aims to assess what types of content, features, and other important considerations should be included in the design of a mobile app tailored to support management of anxiety and depression among Black women. Focus groups were conducted with 20 women (mean age 36.6 years, SD 17.8 years), with 5 participants per group. Focus groups were led by a moderator, with a notetaker present, using an interview guide to discuss topics such as participants' attitudes and perceptions toward mental health and use of mental health services, and content, features, and concerns for the design of a mobile app to support management of anxiety and depression. Descriptive qualitative content analysis was conducted. Recommendations for content were either informational (e.g., information on finding a Black woman therapist) or inspirational (e.g., encouraging stories about overcoming adversity). Suggested features would allow users to monitor their progress, practice healthy coping techniques, and connect with others. The importance of feeling "a sense of community" was emphasized. Transparency about who created and owns the app, and how users' data will be used and protected, was recommended to establish trust. The findings from this study were consistent with previous literature, which highlighted the need for educational, psychotherapy, and personal development components in mental health apps. There has been exponential growth in the digital mental health space due to the COVID-19 pandemic; however, a one-size-fits-all approach may lead to more options but continued disparity in receiving mental health care. Designing a mental health app for and with Black women may help advance digital health equity by providing a tool that addresses their specific needs and preferences and increases engagement.

    View details for DOI 10.3389/fdgth.2022.1028408

    View details for PubMedID 36620185

    View details for PubMedCentralID PMC9816326

  • An Interpretable Machine Learning Approach to Prioritizing Factors Contributing to Clinician Burnout Pillai, M., Adapa, K., Foster, M., Kratzke, I., Charguia, N., Mazur, L., Ceci, M., Flesca, S., Masciari, E., Manco, G., Ras, Z. W. Springer International Publishing AG. 2022: 149-161