Akshay Paruchuri
Postdoctoral Scholar, Psychiatry
AI4ALL Graduate Mentor, Stanford Pre-Collegiate Studies
Bio
I'm a postdoctoral scholar at Stanford University, advised by Professor Ehsan Adeli. I'm affiliated with the Stanford Translational AI (STAI) Lab and the Stanford Vision and Learning (SVL) Lab. I earned my PhD in computer science at UNC Chapel Hill, advised by Professor Henry Fuchs. I build and evaluate multimodal AI systems, from general-purpose methods for interactive computing to applications in healthcare. Currently, I'm working toward a future where multimodal AI can safely and reliably integrate into healthcare systems to improve the entire patient journey, from advanced diagnostic imaging and surgical support to all-day health monitoring and management, with the aim of achieving better therapeutic outcomes for cancer and aging-related diseases. I'm interested in opportunities that let me keep deepening my research expertise while leading and contributing to projects that benefit people everywhere, whether through foundational research, real-world products, or shaping how these systems are evaluated and deployed.
Previously, I was a visiting researcher at IDSIA USI-SUPSI, working with Professor Piotr Didyk on the interpretability of multimodal language models (MLMs), with a focus on capabilities such as visual perception. I've published in leading venues on topics such as remote health sensing (WACV, NeurIPS), 3D reconstruction (ECCV, MICCAI), LLM-based conversational agents for personal health (EMNLP, Nature Communications), and energy-efficient operation of smart glasses (ISMAR). I've completed internships at Google AR/VR, Google Consumer Health Research, and Kitware.
Professional Education
-
Ph.D., University of North Carolina at Chapel Hill, Computer Science (2025)
All Publications
-
Transforming wearable data into personal health insights using large language model agents
NATURE COMMUNICATIONS
2026; 17 (1): 1143
Abstract
Deriving personalized insights from popular wearable trackers requires complex numerical reasoning that challenges standard large language models (LLMs), necessitating tool-based approaches like code generation. LLM agents present a promising yet largely untapped solution for this analysis at scale. We introduce the Personal Health Insights Agent (PHIA), a system leveraging multistep reasoning with code generation and information retrieval to analyze and interpret behavioral health data. To test its capabilities, we create and share two benchmark datasets with over 4,000 health insights questions. A 650-hour human expert evaluation shows that PHIA significantly outperforms a strong code generation baseline, achieving 84% accuracy on objective, numerical questions and, for open-ended ones, earning 83% favorable ratings while being twice as likely to achieve the highest quality rating. This work can advance behavioral health by empowering individuals to understand their data, enabling a new era of accessible, personalized, and data-driven wellness for the wider population.
DOI: 10.1038/s41467-025-67922-y
Web of Science ID: 001674380300001
PubMed ID: 41526380
PubMed Central ID: PMC12855967
-
EgoTrigger: Toward Audio-Driven Image Capture for Human Memory Enhancement in All-Day Energy-Efficient Smart Glasses
IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS
2025; 31 (11): 9720-9729
Abstract
All-day smart glasses are likely to emerge as platforms capable of continuous contextual sensing, uniquely positioning them for unprecedented assistance in our daily lives. Integrating the multimodal AI agents required for human memory enhancement while performing continuous sensing, however, presents a major energy efficiency challenge for all-day usage. Achieving this balance requires intelligent, context-aware sensor management. Our approach, EgoTrigger, leverages audio cues from the microphone to selectively activate power-intensive cameras, enabling efficient sensing while preserving substantial utility for human memory enhancement. EgoTrigger uses a lightweight audio model (YAMNet) and a custom classification head to trigger image capture from hand-object interaction (HOI) audio cues, such as the sound of a drawer opening or a medication bottle being opened. In addition to evaluating on the QA-Ego4D dataset, we introduce and evaluate on the Human Memory Enhancement Question-Answer (HME-QA) dataset. Our dataset contains 340 human-annotated first-person QA pairs from full-length Ego4D videos that were curated to ensure that they contained audio, focusing on HOI moments critical for contextual understanding and memory. Our results show that EgoTrigger can use 54% fewer frames on average, significantly saving energy in both power-hungry sensing components (e.g., cameras) and downstream operations (e.g., wireless transmission), while achieving comparable performance on datasets for an episodic memory task. We believe this context-aware triggering strategy represents a promising direction for enabling energy-efficient, functional smart glasses capable of all-day use, supporting applications like helping users recall where they placed their keys or information about their routine activities (e.g., taking medications).
DOI: 10.1109/TVCG.2025.3616866
Web of Science ID: 001616171800014
PubMed ID: 41056161
-
Structure-Preserving Image Translation for Depth Estimation in Colonoscopy
edited by Linguraru, M. G., Dou, Q., Feragen, A., Giannarou, S., Glocker, B., Lekadir, K., Schnabel, J. A.
SPRINGER INTERNATIONAL PUBLISHING AG. 2024: 667-677
DOI: 10.1007/978-3-031-72120-5_62
Web of Science ID: 001342238400062
https://orcid.org/0000-0003-4664-3186