Bio
Sanmi Koyejo is an Assistant Professor in the Department of Computer Science at Stanford University and an adjunct Associate Professor at the University of Illinois at Urbana-Champaign. He leads the Stanford Trustworthy AI Research (STAIR) lab, which develops measurement-theoretic foundations for trustworthy AI systems, spanning AI evaluation science, algorithmic accountability, and privacy-preserving machine learning, with applications to healthcare and scientific discovery. His research on AI capabilities evaluation has challenged conventional understanding in the field, including work on measurement frameworks cited in the 2024 Economic Report of the President.
Koyejo has received the Presidential Early Career Award for Scientists and Engineers (PECASE), the Skip Ellis Early Career Award, an Alfred P. Sloan Research Fellowship, an NSF CAREER Award, and multiple outstanding paper awards at flagship venues, including NeurIPS and ACL. He has delivered keynote presentations at major conferences, including ECCV and FAccT. He holds key leadership roles, including serving as Board President of Black in AI and on the Board of Directors of the Neural Information Processing Systems Foundation, along with other positions in professional organizations advancing AI research and broadening participation in the field.
Academic Appointments
- Assistant Professor, Computer Science
- Member, Bio-X
- Member, Wu Tsai Neurosciences Institute
2025-26 Courses
- AI Measurement Science - CS 321M (Spr)
- Governing Artificial Intelligence: Law, Policy, and Institutions - COMM 152A, COMM 252A, CS 283, GLOBAL 245B, INTLPOL 245B (Aut)
- Governing Artificial Intelligence: Law, Policy, and Institutions - LAW 4052 (Aut)
- Governing Artificial Intelligence: Law, Policy, and Institutions - POLISCI 145B, POLISCI 445B (Aut)
- Machine Learning - CS 229, STATS 229 (Win)
- Machine Learning from Human Preferences - CS 329H (Aut)
Independent Studies (18)
- Advanced Reading and Research - CS 499 (Aut, Win, Spr, Sum)
- Advanced Reading and Research - CS 499P (Aut, Win, Spr, Sum)
- Curricular Practical Training - CS 390A (Aut, Win, Spr, Sum)
- Curricular Practical Training - CS 390B (Aut, Win, Spr, Sum)
- Curricular Practical Training - CS 390C (Aut, Win, Spr, Sum)
- Independent Project - CS 399 (Aut, Win, Spr, Sum)
- Independent Project - CS 399P (Aut, Win, Spr, Sum)
- Independent Study - SYMSYS 196 (Aut, Win, Spr, Sum)
- Independent Work - CS 199 (Aut, Win, Spr, Sum)
- Independent Work - CS 199P (Aut, Win, Spr, Sum)
- Master's Research - CME 291 (Win, Spr)
- Part-time Curricular Practical Training - CS 390D (Aut, Win, Spr, Sum)
- Ph.D. Research - CME 400 (Aut, Win)
- Ph.D. Research Rotation - CME 391 (Win)
- Senior Honors Tutorial - SYMSYS 190 (Aut, Win, Spr, Sum)
- Senior Project - CS 191 (Aut, Win, Spr)
- Supervised Undergraduate Research - CS 195 (Aut, Win, Spr, Sum)
- Writing Intensive Senior Research Project - CS 191W (Aut, Win, Spr)
Prior Year Courses
2024-25 Courses
- Machine Learning - CS 229, STATS 229 (Win)
- Machine Learning from Human Preferences - CS 329H (Aut)
2023-24 Courses
- Artificial Intelligence: Principles and Techniques - CS 221 (Spr)
- Machine Learning - CS 229, STATS 229 (Win)
- Machine Learning from Human Preferences - CS 329H (Aut)
2022-23 Courses
- Machine Learning
Stanford Advisees
- Doctoral Dissertation Reader (AC): Edward Chen, Richard Chen
- Postdoctoral Faculty Sponsor: Youssef Allouah, Meena Jagadeesan, Alexander Spangher, Zeyu Tang
- Doctoral Dissertation Advisor (AC): Steven Dillmann
- Orals Evaluator: Sabri Eyuboglu, Shirley Wu
- Doctoral Dissertation Co-Advisor (AC): Ahmed Ahmed, Suhana Bedi, Fangrui Huang, Josh Kazdan, Alisa Levin, Kara Liu, Ken Liu, Anka Reuel, Neha Srivathsa, Alyssa Unell, Maya Varma
- Master's Program Advisor: Anthony Argyropoulos, Stefan Ene, Nicolás Kennedy, Sreyana Kukadia, Hoang Nguyen, Isaac Park, Nestor Perez Fernandez, Jacob Rubenstein, Haoyue Xiao, Christine Zhang
- Postdoctoral Research Mentor: Joachim Baumann
- Doctoral (Program): Nicole Chiou, Natalie Dullerud, Brando Miranda, Zach Robertson, Rylan Schaeffer, Nikil Selvam, Sang Truong, Yibo Zhang
All Publications
-
Holistic evaluation of large language models for medical tasks with MedHELM.
Nature medicine
2026
Abstract
While large language models (LLMs) achieve near-perfect scores on medical licensing exams, these evaluations inadequately reflect the complexity and diversity of real-world clinical practice. Here we introduce MedHELM, an extensible evaluation framework with three contributions. First, a clinician-validated taxonomy organizing medical AI applications into five categories that mirror real clinical tasks: clinical decision support (diagnostic decisions, treatment planning), clinical note generation (visit documentation, procedure reports), patient communication (education materials, care instructions), medical research (literature analysis, clinical data analysis) and administration (scheduling, workflow coordination). These encompass 22 subcategories and 121 specific tasks reflecting daily medical practice. Second, a comprehensive benchmark suite of 37 evaluations covering all subcategories. Third, systematic comparison of nine frontier LLMs (Claude 3.5 Sonnet, Claude 3.7 Sonnet, DeepSeek R1, Gemini 1.5 Pro, Gemini 2.0 Flash, GPT-4o, GPT-4o mini, Llama 3.3 and o3-mini) using an automated LLM-jury evaluation method. Our LLM-jury uses multiple AI evaluators to assess model outputs against expert-defined criteria. Advanced reasoning models (DeepSeek R1, o3-mini) demonstrated superior performance with win rates of 66%, although Claude 3.5 Sonnet achieved comparable results at 15% lower computational cost. These results not only highlight current model capabilities but also demonstrate how MedHELM could enable evidence-based selection of medical AI systems for healthcare applications.
View details for DOI 10.1038/s41591-025-04151-2
View details for PubMedID 41559415
View details for PubMedCentralID PMC10916499
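As a rough illustration of the LLM-jury idea described in the abstract above (multiple AI evaluators scoring a response against expert-defined criteria), a minimal sketch follows. The criteria list, the 1-5 scale, and the stub judge are illustrative assumptions, not MedHELM's actual interface.
```python
# Minimal sketch of an LLM-jury evaluation: several judge models each score a
# candidate response against expert-defined criteria, and scores are averaged.
# The criteria, 1-5 scale, and stub judge below are illustrative assumptions.
from statistics import mean
from typing import Callable

CRITERIA = ["accuracy", "completeness", "clarity"]  # stand-ins for expert-defined criteria

def jury_score(task: str, response: str, judges: list[Callable[[str], float]]) -> float:
    """Average the per-criterion 1-5 ratings returned by each judge model."""
    scores = []
    for judge in judges:
        for criterion in CRITERIA:
            prompt = (f"Task: {task}\nResponse: {response}\n"
                      f"Rate the response's {criterion} from 1 (poor) to 5 (excellent).")
            scores.append(judge(prompt))
    return mean(scores)

# Stub judge standing in for a call to a hosted LLM; replace with a real API call.
stub_judge = lambda prompt: 4.0
print(jury_score("Summarize the discharge note.",
                 "Patient stable, follow up in 2 weeks.",
                 [stub_judge, stub_judge, stub_judge]))
```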
-
Shaping AI's Impact on Billions of Lives
COMMUNICATIONS OF THE ACM
2026; 69 (1): 54-65
View details for DOI 10.1145/3746132
View details for Web of Science ID 001650706200006
-
The inadequacy of offline large language model evaluations: A need to account for personalization in model behavior.
Patterns (New York, N.Y.)
2025; 6 (12): 101397
Abstract
Standard offline evaluations for language models fail to capture how these models actually behave in practice, where personalization fundamentally alters model behavior. In this work, we provide empirical evidence showcasing this phenomenon by comparing offline evaluations to field evaluations conducted by having 800 real users of ChatGPT and Gemini pose benchmark and other questions to their chat interfaces.
View details for DOI 10.1016/j.patter.2025.101397
View details for Web of Science ID 001641944700001
View details for PubMedID 41472831
View details for PubMedCentralID PMC12745978
-
TIMER: temporal instruction modeling and evaluation for longitudinal clinical records.
NPJ digital medicine
2025; 8 (1): 577
Abstract
Electronic health records (EHRs) contain rich longitudinal information for clinical decision-making, yet LLMs struggle to reason across patient timelines. We introduce TIMER (Temporal Instruction Modeling and Evaluation for Longitudinal Clinical Records), a method to improve LLMs' temporal reasoning over multi-visit EHRs through time-aware instruction tuning. TIMER grounds LLMs in patient-specific temporal contexts by linking each instruction-response pair to specific timestamps, ensuring temporal fidelity throughout the training process. Evaluations show that TIMER-tuned models outperform conventional medical instruction-tuned approaches by 6.6% in completeness on clinician-curated benchmarks, with distribution-matched training demonstrating advantages up to 6.5% in temporal reasoning. Qualitative analyses reveal that using TIMER enhances temporal boundary adherence, trend detection, and chronological precision, which are necessary for applications such as disease trajectory modeling and treatment response monitoring. Overall, TIMER provides a methodological basis for developing LLMs that can effectively engage with the inherently longitudinal nature of data for patient care. Code is available in the TIMER repository.
View details for DOI 10.1038/s41746-025-01965-9
View details for PubMedID 41006898
View details for PubMedCentralID PMC12475073
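A minimal sketch of the time-aware instruction-tuning data construction described in the abstract above: each instruction-response pair is anchored to a visit timestamp so training respects the patient timeline. The `Visit` structure, prompt wording, and training targets are illustrative assumptions rather than TIMER's actual pipeline.
```python
# Minimal sketch of time-anchored instruction data in the spirit of TIMER:
# every instruction-response pair is tied to a specific visit timestamp so that
# temporal ordering is preserved during instruction tuning.
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Visit:
    timestamp: datetime
    note: str

def build_timed_examples(visits: list[Visit]) -> list[dict]:
    """Create instruction-response pairs grounded in the timeline up to each visit."""
    visits = sorted(visits, key=lambda v: v.timestamp)
    examples = []
    for i, visit in enumerate(visits[1:], start=1):
        history = "\n".join(f"[{v.timestamp:%Y-%m-%d}] {v.note}" for v in visits[:i])
        examples.append({
            "instruction": (f"Given the record through {visits[i-1].timestamp:%Y-%m-%d}, "
                            f"summarize changes expected by {visit.timestamp:%Y-%m-%d}."),
            "context": history,
            "response": visit.note,  # placeholder target; the paper derives richer targets
            "anchor_time": visit.timestamp.isoformat(),
        })
    return examples

examples = build_timed_examples([
    Visit(datetime(2023, 1, 5), "Type 2 diabetes diagnosed; metformin started."),
    Visit(datetime(2023, 6, 2), "HbA1c improved to 6.8%; continue current regimen."),
])
print(examples[0]["anchor_time"])
```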
-
Advancing science- and evidence-based AI policy.
Science (New York, N.Y.)
2025; 389 (6759): 459-461
Abstract
Policy must be informed by, but also facilitate the generation of, scientific evidence.
View details for DOI 10.1126/science.adu8449
View details for PubMedID 40743343
-
Rethinking machine unlearning for large language models
NATURE MACHINE INTELLIGENCE
2025
View details for DOI 10.1038/s42256-025-00985-0
View details for Web of Science ID 001423011100001
-
Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review.
JAMA
2024
Abstract
Large language models (LLMs) can assist in various health care activities, but current evaluation approaches may not adequately identify the most useful application areas. This review summarizes existing evaluations of LLMs in health care in terms of 5 components: (1) evaluation data type, (2) health care task, (3) natural language processing (NLP) and natural language understanding (NLU) tasks, (4) dimension of evaluation, and (5) medical specialty. A systematic search of PubMed and Web of Science was performed for studies published between January 1, 2022, and February 19, 2024, and studies evaluating 1 or more LLMs in health care were included. Three independent reviewers categorized studies via keyword searches based on the data used, the health care tasks, the NLP and NLU tasks, the dimensions of evaluation, and the medical specialty. Of 519 studies reviewed, published between January 1, 2022, and February 19, 2024, only 5% used real patient care data for LLM evaluation. The most common health care tasks were assessing medical knowledge such as answering medical licensing examination questions (44.5%) and making diagnoses (19.5%). Administrative tasks such as assigning billing codes (0.2%) and writing prescriptions (0.2%) were less studied. For NLP and NLU tasks, most studies focused on question answering (84.2%), while tasks such as summarization (8.9%) and conversational dialogue (3.3%) were infrequent. Almost all studies (95.4%) used accuracy as the primary dimension of evaluation; fairness, bias, and toxicity (15.8%), deployment considerations (4.6%), and calibration and uncertainty (1.2%) were infrequently measured. Finally, in terms of medical specialty area, most studies were in generic health care applications (25.6%), internal medicine (16.4%), surgery (11.4%), and ophthalmology (6.9%), with nuclear medicine (0.6%), physical medicine (0.4%), and medical genetics (0.2%) being the least represented. Existing evaluations of LLMs mostly focus on accuracy of question answering for medical examinations, without consideration of real patient care data. Dimensions such as fairness, bias, and toxicity and deployment considerations received limited attention. Future evaluations should adopt standardized applications and metrics, use clinical data, and broaden focus to include a wider range of tasks and specialties.
View details for DOI 10.1001/jama.2024.21700
View details for PubMedID 39405325
View details for PubMedCentralID PMC11480901
-
Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models
edited by Duh, K., Gomez, H., Bethard, S.
ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2024: 2849-2900
View details for Web of Science ID 001511618600182
-
Evaluating anti-LGBTQIA+ medical bias in large language models.
PLOS digital health
2025; 4 (9): e0001001
Abstract
Large Language Models (LLMs) are increasingly deployed in clinical settings for tasks ranging from patient communication to decision support. While these models demonstrate race-based and binary gender biases, anti-LGBTQIA+ bias remains understudied despite documented healthcare disparities affecting these populations. In this work, we evaluated the potential of LLMs to propagate anti-LGBTQIA+ medical bias and misinformation. We prompted 4 LLMs (Gemini 1.5 Flash, Claude 3 Haiku, GPT-4o, Stanford Medicine Secure GPT [GPT-4.0]) with 38 prompts consisting of explicit questions and synthetic clinical notes created by medically-trained reviewers and LGBTQIA+ health experts. The prompts consisted of pairs of prompts with and without LGBTQIA+ identity terms and explored clinical situations across two axes: (i) situations where historical bias has been observed versus not observed, and (ii) situations where LGBTQIA+ identity is relevant to clinical care versus not relevant. Medically-trained reviewers evaluated LLM responses for appropriateness (safety, privacy, hallucination/accuracy, and bias) and clinical utility. We found that all 4 LLMs generated inappropriate responses for prompts with and without LGBTQIA+ identity terms. The proportion of inappropriate responses ranged from 43-62% for prompts mentioning LGBTQIA+ identities versus 47-65% for those without. The most common reason for inappropriate classification tended to be hallucination/accuracy, followed by bias or safety. Qualitatively, we observed differential bias patterns, with LGBTQIA+ prompts eliciting more severe bias. Average clinical utility score for inappropriate responses was lower than for appropriate responses (2.6 versus 3.7 on a 5-point Likert scale). Future work should focus on tailoring output formats to stated use cases, decreasing sycophancy and reliance on extraneous information in the prompt, and improving accuracy and decreasing bias for LGBTQIA+ patients. We present our prompts and annotated responses as a benchmark for evaluation of future models. Content warning: This paper includes prompts and model-generated responses that may be offensive.
View details for DOI 10.1371/journal.pdig.0001001
View details for PubMedID 40920790
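A minimal sketch of the paired-prompt comparison described in the abstract above: each clinical scenario is posed with and without an LGBTQIA+ identity term, reviewer-labeled responses are collected, and the share of inappropriate responses is compared between the two arms. The record fields and toy labels below are illustrative assumptions, not the study's annotation schema.
```python
# Sketch of a paired-prompt bias evaluation: compare the proportion of
# reviewer-labeled inappropriate responses with vs. without identity terms.
def inappropriate_rate(reviews: list[dict], with_identity: bool) -> float:
    arm = [r for r in reviews if r["identity_term_present"] == with_identity]
    return sum(r["is_inappropriate"] for r in arm) / len(arm)

# Toy reviewer labels; real labels come from medically trained reviewers.
reviews = [
    {"identity_term_present": True,  "is_inappropriate": True},
    {"identity_term_present": True,  "is_inappropriate": False},
    {"identity_term_present": False, "is_inappropriate": True},
    {"identity_term_present": False, "is_inappropriate": False},
]
print(inappropriate_rate(reviews, with_identity=True),
      inappropriate_rate(reviews, with_identity=False))
```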
-
Fidelity of Medical Reasoning in Large Language Models.
JAMA network open
2025; 8 (8): e2526021
View details for DOI 10.1001/jamanetworkopen.2025.26021
View details for PubMedID 40779272
-
Advancing oil and gas emissions assessment through large language model data extraction
ENERGY AND AI
2025; 20
View details for DOI 10.1016/j.egyai.2025.100481
View details for Web of Science ID 001437342500001
-
The Reality of AI and Biorisk
ASSOC COMPUTING MACHINERY. 2025: 763-771
View details for DOI 10.1145/3715275.3732048
View details for Web of Science ID 001543679300045
-
More than Marketing? On the Information Value of AI Benchmarks for Practitioners
ASSOC COMPUTING MACHINERY. 2025: 1032-1047
View details for DOI 10.1145/3708359.3712152
View details for Web of Science ID 001477132000061
-
Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs
edited by Che, W., Nabende, J., Shutova, E., Pilehvar, M. T.
ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2025: 6867-6893
View details for Web of Science ID 001611615400020
-
Publisher Correction: Increasing the presence of BIPOC researchers in computational science.
Nature computational science
2024
View details for DOI 10.1038/s43588-024-00710-8
View details for PubMedID 39354103
-
Increasing the presence of BIPOC researchers in computational science.
Nature computational science
2024; 4 (9): 646-653
View details for DOI 10.1038/s43588-024-00693-6
View details for PubMedID 39317763
-
Artificial Intelligence, Social Responsibility, and the Roles of the University
COMMUNICATIONS OF THE ACM
2024; 67 (8): 22-25
View details for DOI 10.1145/3640541
View details for Web of Science ID 001293981400007
-
Single-Trial Detection and Classification of Event-Related Optical Signals for a Brain-Computer Interface Application.
Bioengineering (Basel, Switzerland)
2024; 11 (8)
Abstract
Event-related optical signals (EROS) measure fast modulations in the brain's optical properties related to neuronal activity. EROS offer a high spatial and temporal resolution and can be used for brain-computer interface (BCI) applications. However, the ability to classify single-trial EROS remains unexplored. This study evaluates the performance of neural network methods for single-trial classification of motor response-related EROS. EROS activity was obtained from a high-density recording montage covering the motor cortex during a two-choice reaction time task involving responses with the left or right hand. This study utilized a convolutional neural network (CNN) approach to extract spatiotemporal features from EROS data and perform classification of left and right motor responses. Subject-specific classifiers trained on EROS phase data outperformed those trained on intensity data, reaching an average single-trial classification accuracy of around 63%. Removing low-frequency noise from intensity data is critical for achieving discriminative classification results with this measure. Our results indicate that deep learning with high-spatial-resolution signals, such as EROS, can be successfully applied to single-trial classifications.
View details for DOI 10.3390/bioengineering11080781
View details for PubMedID 39199739
View details for PubMedCentralID PMC11351476
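A minimal sketch of the CNN approach described in the abstract above, treating a single EROS trial as a channels-by-time image and classifying left versus right motor responses. The layer sizes, input shape, and PyTorch framing are illustrative assumptions, not the paper's architecture.
```python
# Sketch of a small CNN for single-trial EROS classification (left vs. right),
# with spatiotemporal convolutions over a (channels x time) trial.
import torch
import torch.nn as nn

class ErosCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(3, 7), padding=(1, 3)),  # spatiotemporal filters
            nn.ReLU(),
            nn.MaxPool2d((2, 4)),
            nn.Conv2d(8, 16, kernel_size=(3, 5), padding=(1, 2)),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(16 * 4 * 4, 2)  # left vs. right response

    def forward(self, x):  # x: (batch, 1, n_channels, n_times)
        return self.classifier(self.features(x).flatten(1))

model = ErosCNN()
logits = model(torch.randn(4, 1, 32, 128))  # 4 synthetic trials, 32 channels, 128 samples
print(logits.shape)
```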
-
Bridging gaps in automated acute myocardial infarction detection between high-income and low-income countries
PLOS GLOBAL PUBLIC HEALTH
2024; 4 (6): e0003240
View details for DOI 10.1371/journal.pgph.0003240
View details for Web of Science ID 001418792700001
View details for PubMedID 38941326
-
Author Correction: Opportunistic detection of type 2 diabetes using deep learning from frontal chest radiographs.
Nature communications
2024; 15 (1): 4817
View details for DOI 10.1038/s41467-024-49184-2
View details for PubMedID 38844459
View details for PubMedCentralID PMC11156917
-
Impact of biased models in the context of fairness towards patients, and how to avoid or minimise biases in our datasets
ELSEVIER IRELAND LTD. 2024: S46
View details for Web of Science ID 001331355600057
-
Latent Multimodal Functional Graphical Model Estimation
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
2024; 119 (547): 2217-2229
View details for DOI 10.1080/01621459.2023.2252142
View details for Web of Science ID 001201774100001
-
The Case for Globalizing Fairness: A Mixed Methods Study on Colonialism, AI, and Health in Africa
ASSOC COMPUTING MACHINERY. 2024
View details for DOI 10.1145/3689904.3694708
View details for Web of Science ID 001537950100010
-
Adaptive Compression in Federated Learning via Side Information
edited by Dasgupta, S., Mandt, S., Li, Y.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2024
View details for Web of Science ID 001221034001020
-
Invariant Aggregator for Defending against Federated Backdoor Attacks
edited by Dasgupta, S., Mandt, S., Li, Y.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2024
View details for Web of Science ID 001286500300031
-
Proxy Methods for Domain Adaptation
edited by Dasgupta, S., Mandt, S., Li, Y.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2024
View details for Web of Science ID 001286500304014
-
Causally Inspired Regularization Enables Domain General Representations
edited by Dasgupta, S., Mandt, S., Li, Y.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2024
View details for Web of Science ID 001286500302003
-
Towards Trustworthy Large Language Models
ASSOC COMPUTING MACHINERY. 2024: 1126-1127
View details for DOI 10.1145/3616855.3636454
View details for Web of Science ID 001182230100136
-
Bayesian Optimization for Crop Genetics with Scalable Probabilistic Models
edited by Antoran, J., Naesseth, C. A.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2024: 30-44
View details for Web of Science ID 001347127100002
-
Disentangling Fact from Grid Cell Fiction in Trained Deep Path Integrators.
ArXiv
2023
Abstract
Work on deep learning-based models of grid cells suggests that grid cells generically and robustly arise from optimizing networks to path integrate, i.e., track one's spatial position by integrating self-velocity signals. In previous work [27], we challenged this path integration hypothesis by showing that deep neural networks trained to path integrate almost always do so, but almost never learn grid-like tuning unless separately inserted by researchers via mechanisms unrelated to path integration. In this work, we restate the key evidence substantiating these insights, then address a response to [27] by authors of one of the path integration hypothesis papers [32]. First, we show that the response misinterprets our work, indirectly confirming our points. Second, we evaluate the response's preferred "unified theory for the origin of grid cells" in trained deep path integrators [31, 33, 34] and show that it is at best "occasionally suggestive," not exact or comprehensive. We finish by considering why assessing model quality through prediction of biological neural activity by regression of activity in deep networks [23] can lead to the wrong conclusions.
View details for PubMedID 38106458
-
Longitudinal assessment of demographic representativeness in the Medical Imaging and Data Resource Center open data commons
JOURNAL OF MEDICAL IMAGING
2023; 10 (6): 061105
Abstract
The Medical Imaging and Data Resource Center (MIDRC) open data commons was launched to accelerate the development of artificial intelligence (AI) algorithms to help address the COVID-19 pandemic. The purpose of this study was to quantify longitudinal representativeness of the demographic characteristics of the primary MIDRC dataset compared to the United States general population (US Census) and COVID-19 positive case counts from the Centers for Disease Control and Prevention (CDC). The Jensen-Shannon distance (JSD), a measure of similarity of two distributions, was used to longitudinally measure the representativeness of the distribution of (1) all unique patients in the MIDRC data to the 2020 US Census and (2) all unique COVID-19 positive patients in the MIDRC data to the case counts reported by the CDC. The distributions were evaluated in the demographic categories of age at index, sex, race, ethnicity, and the combination of race and ethnicity. Representativeness of the MIDRC data by ethnicity and the combination of race and ethnicity was impacted by the percentage of CDC case counts for which this was not reported. The distributions by sex and race have retained their level of representativeness over time. The representativeness of the open medical imaging datasets in the curated public data commons at MIDRC has evolved over time as the number of contributing institutions and overall number of subjects have grown. The use of metrics such as the JSD to support measurement of representativeness is one step needed for fair and generalizable AI algorithm development.
View details for DOI 10.1117/1.JMI.10.6.061105
View details for Web of Science ID 001139907400011
View details for PubMedID 37469387
View details for PubMedCentralID PMC10353566
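The representativeness measure described above reduces to computing the Jensen-Shannon distance between a dataset's demographic distribution and a reference distribution (Census or CDC case counts). A minimal sketch, with made-up category counts, follows; note that scipy's `jensenshannon` returns the distance (the square root of the JS divergence).
```python
# Sketch of demographic representativeness via the Jensen-Shannon distance
# between a dataset's category counts and a reference distribution.
import numpy as np
from scipy.spatial.distance import jensenshannon

def js_distance(counts_a: dict, counts_b: dict) -> float:
    cats = sorted(set(counts_a) | set(counts_b))
    p = np.array([counts_a.get(c, 0) for c in cats], dtype=float)
    q = np.array([counts_b.get(c, 0) for c in cats], dtype=float)
    return float(jensenshannon(p / p.sum(), q / q.sum(), base=2))

dataset_sex = {"female": 5200, "male": 4800}      # made-up dataset counts
reference_sex = {"female": 50.8, "male": 49.2}    # percentages also work after normalization
print(js_distance(dataset_sex, reference_sex))    # 0 = identical, 1 = maximally different
```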
-
Toward fairness in artificial intelligence for medical image analysis: identification and mitigation of potential biases in the roadmap from data collection to model deployment
JOURNAL OF MEDICAL IMAGING
2023; 10 (6): 061104
Abstract
There is an increasing interest in developing medical imaging-based machine learning methods, also known as medical imaging artificial intelligence (AI), for the detection, diagnosis, prognosis, and risk assessment of disease with the goal of clinical implementation. These tools are intended to help improve traditional human decision-making in medical imaging. However, biases introduced in the steps toward clinical deployment may impede their intended function, potentially exacerbating inequities. Specifically, medical imaging AI can propagate or amplify biases introduced in the many steps from model inception to deployment, resulting in a systematic difference in the treatment of different groups. Recognizing and addressing these sources of bias is essential for algorithmic fairness and trustworthiness and for a just and equitable deployment of AI in medical imaging. Our multi-institutional team included medical physicists, medical imaging artificial intelligence/machine learning (AI/ML) researchers, experts in AI/ML bias, statisticians, physicians, and scientists from regulatory bodies. We identified sources of bias in AI/ML, mitigation strategies for these biases, and developed recommendations for best practices in medical imaging AI/ML development. Five main steps along the roadmap of medical imaging AI/ML were identified: (1) data collection, (2) data preparation and annotation, (3) model development, (4) model evaluation, and (5) model deployment. Within these steps, or bias categories, we identified 29 sources of potential bias, many of which can impact multiple steps, as well as mitigation strategies. Our findings provide a valuable resource to researchers, clinicians, and the public at large.
View details for DOI 10.1117/1.JMI.10.6.061104
View details for Web of Science ID 001139907400013
View details for PubMedID 37125409
View details for PubMedCentralID PMC10129875
-
Opportunistic detection of type 2 diabetes using deep learning from frontal chest radiographs.
Nature communications
2023; 14 (1): 4039
Abstract
Deep learning (DL) models can harness electronic health records (EHRs) to predict diseases and extract radiologic findings for diagnosis. With ambulatory chest radiographs (CXRs) frequently ordered, we investigated detecting type 2 diabetes (T2D) by combining radiographic and EHR data using a DL model. Our model, developed from 271,065 CXRs and 160,244 patients, was tested on a prospective dataset of 9,943 CXRs. Here we show the model effectively detected T2D with a ROC AUC of 0.84 and a 16% prevalence. The algorithm flagged 1,381 cases (14%) as suspicious for T2D. External validation at a distinct institution yielded a ROC AUC of 0.77, with 5% of patients subsequently diagnosed with T2D. Explainable AI techniques revealed correlations between specific adiposity measures and high predictivity, suggesting CXRs' potential for enhanced T2D screening.
View details for DOI 10.1038/s41467-023-39631-x
View details for PubMedID 37419921
View details for PubMedCentralID PMC10328953
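A minimal sketch of the image-plus-EHR idea described in the abstract above: a chest radiograph encoder and an EHR-feature encoder are fused for a binary type 2 diabetes prediction. The encoder sizes, feature count, and late-fusion design are illustrative assumptions, not the published model.
```python
# Sketch of late fusion of a CXR encoder and an EHR-feature encoder for a
# binary T2D logit; shapes and layers are illustrative placeholders.
import torch
import torch.nn as nn

class CxrEhrFusion(nn.Module):
    def __init__(self, n_ehr_features: int = 20):
        super().__init__()
        self.cxr_encoder = nn.Sequential(            # stand-in for a pretrained CNN backbone
            nn.Conv2d(1, 8, 7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.ehr_encoder = nn.Sequential(nn.Linear(n_ehr_features, 16), nn.ReLU())
        self.head = nn.Linear(8 + 16, 1)              # fused features -> T2D logit

    def forward(self, cxr, ehr):
        return self.head(torch.cat([self.cxr_encoder(cxr), self.ehr_encoder(ehr)], dim=1))

model = CxrEhrFusion()
logit = model(torch.randn(2, 1, 224, 224), torch.randn(2, 20))  # 2 synthetic patients
print(logit.shape)
```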
-
Fast Optical Signals for Real-Time Retinotopy and Brain Computer Interface.
Bioengineering (Basel, Switzerland)
2023; 10 (5)
Abstract
A brain-computer interface (BCI) allows users to control external devices through brain activity. Portable neuroimaging techniques, such as near-infrared (NIR) imaging, are suitable for this goal. NIR imaging has been used to measure rapid changes in brain optical properties associated with neuronal activation, namely fast optical signals (FOS) with good spatiotemporal resolution. However, FOS have a low signal-to-noise ratio, limiting their BCI application. Here FOS were acquired with a frequency-domain optical system from the visual cortex during visual stimulation consisting of a rotating checkerboard wedge, flickering at 5 Hz. We used measures of photon count (Direct Current, DC light intensity) and time of flight (phase) at two NIR wavelengths (690 nm and 830 nm) combined with a machine learning approach for fast estimation of visual-field quadrant stimulation. The input features of a cross-validated support vector machine classifier were computed as the average modulus of the wavelet coherence between each channel and the average response among all channels in 512 ms time windows. An above chance performance was obtained when differentiating visual stimulation quadrants (left vs. right or top vs. bottom) with the best classification accuracy of ~63% (information transfer rate of ~6 bits/min) when classifying the superior and inferior stimulation quadrants using DC at 830 nm. The method is the first attempt to provide generalizable retinotopy classification relying on FOS, paving the way for the use of FOS in real-time BCI.
View details for DOI 10.3390/bioengineering10050553
View details for PubMedID 37237623
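A minimal sketch of the classification pipeline described in the abstract above: each channel's coherence with the grand-average response becomes a feature for a cross-validated SVM. Ordinary spectral coherence stands in for the paper's wavelet coherence, and the shapes, sampling rate, and synthetic data are illustrative assumptions.
```python
# Sketch of coherence-based features plus a cross-validated SVM for classifying
# stimulation conditions from fast optical signals (synthetic data shown).
import numpy as np
from scipy.signal import coherence
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def trial_features(trial: np.ndarray, fs: float = 100.0) -> np.ndarray:
    """trial: (n_channels, n_samples). Mean coherence of each channel with the average response."""
    avg = trial.mean(axis=0)
    return np.array([coherence(ch, avg, fs=fs, nperseg=32)[1].mean() for ch in trial])

rng = np.random.default_rng(0)
trials = rng.standard_normal((40, 16, 128))         # 40 synthetic trials, 16 channels
labels = rng.integers(0, 2, size=40)                # e.g., left vs. right visual quadrant
X = np.vstack([trial_features(t) for t in trials])
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
print(cross_val_score(clf, X, labels, cv=5).mean()) # ~chance accuracy on random data
```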
-
One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning
edited by Ruiz, F., Dy, J., VanDeMeent, J. W.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001222727702004
-
Finite-sample Guarantees for Nash Q-learning with Linear Function Approximation
edited by Evans, R. J., Shpitser, I.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023: 424-432
View details for Web of Science ID 001222701100040
-
Unraveling the Connections between Privacy and Certified Robustness in Federated Learning Against Poisoning Attacks
ASSOC COMPUTING MACHINERY. 2023: 1511-1525
View details for DOI 10.1145/3576915.3623193
View details for Web of Science ID 001124987201035
-
Self-Supervised Learning of Representations for Space Generates Multi-Modular Grid Cells
edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S.
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001228825107028
-
DECODINGTRUST: A Comprehensive Assessment of Trustworthiness in GPT Models
edited by Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S.
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001220600008008
-
Pairwise Ranking Losses of Click-Through Rates Prediction for Welfare Maximization in Ad Auctions
edited by Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001372371906018
-
Adapting to Latent Subgroup Shifts via Concepts and Proxies
edited by Ruiz, F., Dy, J., VanDeMeent, J. W.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001298469303032
-
Fair Wrapping for Black-box Predictions
edited by Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A.
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2022
View details for Web of Science ID 001215469503006
https://orcid.org/0000-0002-4023-419X