
Bio
Juan Carlos Niebles received an Engineering degree in Electronics from Universidad del Norte (Colombia) in 2002, an M.Sc. in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2007, and a Ph.D. in Electrical Engineering from Princeton University in 2011. He has been a Research Director at Salesforce and an Adjunct Professor of Computer Science at Stanford since 2021, and is co-Director of the Stanford Vision and Learning Lab. Before that, he was Associate Director of Research at the Stanford-Toyota Center for AI Research and a Senior Research Scientist at the Stanford AI Lab from 2015 to 2021, and an Associate Professor of Electrical and Electronic Engineering at Universidad del Norte (Colombia) from 2011 to 2019. His research interests are in computer vision and machine learning, with a focus on visual recognition and understanding of human actions and activities, objects, scenes, and events. He serves as an Area Chair for the top computer vision conferences CVPR and ICCV, and as an Associate Editor for IEEE TPAMI. He is also a member of the AI Index Steering Committee and the Curriculum Director for Stanford-AI4ALL. He is a recipient of a Google Faculty Research Award (2015), the Microsoft Research Faculty Fellowship (2012), a Google Research Award (2011), and a Fulbright Fellowship (2005).
Academic Appointments
- Sr. Research Engineer, Computer Science
Honors & Awards
- Faculty Research Award, Google (2015)
- Senior Member, IEEE (2015)
- Faculty Fellow, Microsoft Research (2012)
- Research Award, Google (2011)
- Fulbright PhD Fellowship, Fulbright-Colciencias-DNP (2005)
Boards, Advisory Committees, Professional Organizations
- Steering Committee, AI Index (2018 - Present)
- Associate Director of Research, Stanford AI Lab-Toyota Center for AI Research (2015 - Present)
- Senior Member, IEEE (2015 - Present)
- Member, IEEE Computer Society (2014 - Present)
- Member, IEEE (2007 - Present)
Professional Education
- Ph.D., Princeton University, Electrical Engineering (2011)
- M.A., Princeton University, Electrical Engineering (2009)
- M.Sc., University of Illinois at Urbana-Champaign, Electrical and Computer Engineering (2007)
- Engineer, Universidad del Norte, Electronics Engineering (2002)
Current Research and Scholarly Interests
My research is in computer vision. Its goal is to enable computers and robots to perceive the visual world by developing novel algorithms for the automatic analysis of images and videos. From a scientific point of view, we tackle fundamental open problems in computer vision related to the visual recognition and understanding of human actions and activities, objects, scenes, and events. From an application perspective, we build systems that solve practical real-world problems by bringing cutting-edge computer vision technologies into new application domains.
2023-24 Courses
- Computer Vision: Foundations and Applications, CS 131 (Win)
Independent Studies (9)
- Advanced Reading and Research, CS 499 (Aut, Spr)
- Advanced Reading and Research, CS 499P (Aut, Spr)
- Curricular Practical Training, CS 390A (Sum)
- Curricular Practical Training, CS 390B (Sum)
- Independent Project, CS 399 (Aut, Win, Spr)
- Independent Work, CS 199 (Aut, Win)
- Senior Project, CS 191 (Aut, Spr)
- Supervised Undergraduate Research, CS 195 (Win)
- Writing Intensive Senior Research Project, CS 191W (Aut, Win)
Prior Year Courses
2022-23 Courses
- Computer Vision: Foundations and Applications, CS 131 (Aut)
2021-22 Courses
- Computer Vision: Foundations and Applications, CS 131 (Aut)
2020-21 Courses
- Computer Vision: Foundations and Applications, CS 131 (Aut)
All Publications
- Quantifying Parkinson's disease motor severity under uncertainty using MDS-UPDRS videos.
Medical image analysis
2021; 73: 102179
Abstract
Parkinson's disease (PD) is a brain disorder that primarily affects motor function, leading to slow movement, tremor, and stiffness, as well as postural instability and difficulty with walking/balance. The severity of PD motor impairments is clinically assessed by part III of the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS), a universally accepted rating scale. However, experts often disagree on the exact scoring of individuals. In the presence of label noise, training a machine learning model using only scores from a single rater may introduce bias, while training models with multiple noisy ratings is a challenging task due to inter-rater variability. In this paper, we introduce an ordinal focal neural network that estimates MDS-UPDRS scores from input videos, leveraging the ordinal nature of MDS-UPDRS scores and combating class imbalance. To handle multiple noisy labels per exam, the training of the network is regularized via rater confusion estimation (RCE), which encodes the rating habits and skills of raters via a confusion matrix. We apply our pipeline to estimate MDS-UPDRS test scores from video recordings of gait (with multiple raters, R=3) and finger tapping (single rater). On a sizable clinical dataset for the gait test (N=55), we obtained a classification accuracy of 72% with majority vote as ground truth, and our model predicted at least one of the raters' scores with an accuracy of ∼84%. Our work demonstrates how computer-assisted technologies can be used to track patients and their motor impairments, even when there is uncertainty in the clinical ratings. The latest version of the code will be available at https://github.com/mlu355/PD-Motor-Severity-Estimation.
View details for DOI 10.1016/j.media.2021.102179
View details for PubMedID 34340101
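Two ideas from the abstract above can be made concrete: the cumulative ("ordinal") encoding of MDS-UPDRS scores, and the rater confusion estimation step, which maps the model's belief over the true score through a per-rater confusion matrix. The following minimal numpy sketch is illustrative only; the function names and the toy confusion matrix are assumptions of mine, not the paper's released code:

```python
import numpy as np

def ordinal_targets(score, num_classes=4):
    # Encode an ordinal score k as cumulative binary targets
    # [k > 0, k > 1, ..., k > num_classes-2].
    return np.array([1.0 if score > t else 0.0 for t in range(num_classes - 1)])

def rater_adjusted_probs(p_true, confusion):
    # Map the model's distribution over true scores through a rater's
    # confusion matrix: P(rater says j) = sum_i P(true = i) * C[i, j].
    return p_true @ confusion

# Toy example: belief over scores 0..3 and a slightly noisy rater whose
# confusion matrix puts 0.85 on the diagonal and 0.05 elsewhere.
p = np.array([0.7, 0.2, 0.1, 0.0])
C = np.eye(4) * 0.8 + np.full((4, 4), 0.05)

print(ordinal_targets(2))            # [1. 1. 0.]
print(rater_adjusted_probs(p, C))    # still a valid distribution (sums to 1)
```

In the paper the confusion matrices are learned jointly with the network as a regularizer; here the matrix is fixed just to show the mapping.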
- CoCon: Cooperative-Contrastive Learning
IEEE COMPUTER SOC. 2021: 3379-3388
View details for DOI 10.1109/CVPRW53098.2021.00377
View details for Web of Science ID 000705890203052
- Metadata Normalization.
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
2021; 2021: 10912-10922
Abstract
Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods. While these techniques normalize feature distributions by standardizing with batch statistics, they do not correct the influence on features from extraneous variables or multiple distributions. Such extra variables, referred to as metadata here, may create bias or confounding effects (e.g., race when classifying gender from face images). We introduce the Metadata Normalization (MDN) layer, a new batch-level operation which can be used end-to-end within the training framework, to correct the influence of metadata on feature distributions. MDN adopts a regression analysis technique traditionally used for preprocessing to remove (regress out) the metadata effects on model features during training. We utilize a metric based on distance correlation to quantify the distribution bias from the metadata and demonstrate that our method successfully removes metadata effects on four diverse settings: one synthetic, one 2D image, one video, and one 3D medical image dataset.
View details for DOI 10.1109/cvpr46437.2021.01077
View details for PubMedID 34776724
View details for PubMedCentralID PMC8589298
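The core of the Metadata Normalization operation described above is an ordinary least-squares "regress-out": each feature dimension is regressed on the metadata (plus an intercept) and only the residual is kept. The paper implements this as a batch-level layer inside the network; the standalone sketch below, with names and toy data of my own choosing, only shows the regress-out idea:

```python
import numpy as np

def metadata_normalize(features, metadata):
    # Regress each feature column on the metadata (with an intercept)
    # and keep the residual, removing the metadata's linear effect.
    n = features.shape[0]
    M = np.hstack([np.ones((n, 1)), metadata])       # design matrix
    beta, *_ = np.linalg.lstsq(M, features, rcond=None)
    residual = features - M @ beta
    return residual + features.mean(axis=0)          # restore the feature means

rng = np.random.default_rng(0)
meta = rng.normal(size=(64, 1))                      # a confounding variable
feats = 2.0 * meta + rng.normal(size=(64, 8))        # features contaminated by it
clean = metadata_normalize(feats, meta)

# Correlation with the metadata is ~0 after normalization.
print(np.corrcoef(meta[:, 0], clean[:, 0])[0, 1])
```

Because least-squares residuals are orthogonal to the design matrix columns, the cleaned features have (numerically) zero linear correlation with the metadata.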
- Home Action Genome: Cooperative Compositional Action Understanding
IEEE COMPUTER SOC. 2021: 11179-11188
View details for DOI 10.1109/CVPR46437.2021.01103
View details for Web of Science ID 000742075001037
- Representation Learning with Statistical Independence to Mitigate Bias.
IEEE Winter Conference on Applications of Computer Vision
2021; 2021: 2512-2522
Abstract
The presence of bias (in datasets or tasks) is inarguably one of the most critical challenges in machine learning applications and has given rise to pivotal debates in recent years. Such challenges range from spurious associations between variables in medical studies to racial bias in gender or face recognition systems. Controlling for all types of bias at the dataset curation stage is cumbersome and sometimes impossible. The alternative is to use the available data and build models that incorporate fair representation learning. In this paper, we propose such a model based on adversarial training with two competing objectives: to learn features that have (1) maximum discriminative power with respect to the task and (2) minimal statistical mean dependence on the protected (bias) variable(s). Our approach does so by incorporating a new adversarial loss function that encourages a vanished correlation between the bias and the learned features. We apply our method to synthetic data, medical images (containing task bias), and a dataset for gender classification (containing dataset bias). Our results show that the features learned by our method not only yield superior prediction performance but also are unbiased.
View details for DOI 10.1109/wacv48630.2021.00256
View details for PubMedID 34522832
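The "vanished correlation" objective in the abstract above can be illustrated by a penalty that measures the squared Pearson correlation between each learned feature and the protected variable; the adversarial game drives this toward zero. A minimal numpy sketch under my own illustrative names (not the authors' loss function as implemented):

```python
import numpy as np

def correlation_penalty(features, bias_var, eps=1e-8):
    # Mean squared Pearson correlation between each feature column and
    # the protected (bias) variable; 0 means no linear dependence.
    f = features - features.mean(axis=0)
    b = bias_var - bias_var.mean()
    num = f.T @ b
    den = np.sqrt((f ** 2).sum(axis=0) * (b ** 2).sum()) + eps
    r = num / den
    return float((r ** 2).mean())

rng = np.random.default_rng(1)
bias = rng.normal(size=128)
# Features dominated by the bias variable vs. features independent of it.
biased_feats = np.outer(bias, np.ones(4)) + 0.1 * rng.normal(size=(128, 4))
fair_feats = rng.normal(size=(128, 4))

print(correlation_penalty(biased_feats, bias))   # close to 1
print(correlation_penalty(fair_feats, bias))     # close to 0
```

In training, such a term would be minimized alongside the task loss, so the encoder is rewarded for features that predict the task but carry no linear trace of the protected variable.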
- Vision-based Estimation of MDS-UPDRS Gait Scores for Assessing Parkinson's Disease Motor Severity.
Medical Image Computing and Computer-Assisted Intervention (MICCAI)
2020; 12263: 637–47
Abstract
Parkinson's disease (PD) is a progressive neurological disorder primarily affecting motor function resulting in tremor at rest, rigidity, bradykinesia, and postural instability. The physical severity of PD impairments can be quantified through the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS), a widely used clinical rating scale. Accurate and quantitative assessment of disease progression is critical to developing a treatment that slows or stops further advancement of the disease. Prior work has mainly focused on dopamine transport neuroimaging for diagnosis or costly and intrusive wearables evaluating motor impairments. For the first time, we propose a computer vision-based model that observes non-intrusive video recordings of individuals, extracts their 3D body skeletons, tracks them through time, and classifies the movements according to the MDS-UPDRS gait scores. Experimental results show that our proposed method performs significantly better than chance and competing methods with an F1-score of 0.83 and a balanced accuracy of 81%. This is the first benchmark for classifying PD patients based on MDS-UPDRS gait severity and could be an objective biomarker for disease severity. Our work demonstrates how computer-assisted technologies can be used to non-intrusively monitor patients and their motor impairments. The code is available at https://github.com/mlu355/PD-Motor-Severity-Estimation.
View details for DOI 10.1007/978-3-030-59716-0_61
View details for PubMedID 33103164
- Socially and Contextually Aware Human Motion and Pose Forecasting
IEEE ROBOTICS AND AUTOMATION LETTERS
2020; 5 (4): 6033–40
View details for DOI 10.1109/LRA.2020.3010742
View details for Web of Science ID 000554894900027
- Explaining VQA predictions using visual grounding and a knowledge base
IMAGE AND VISION COMPUTING
2020; 101
View details for DOI 10.1016/j.imavis.2020.103968
View details for Web of Science ID 000570137900006
- Segmenting the Future
IEEE ROBOTICS AND AUTOMATION LETTERS
2020; 5 (3): 4202–9
View details for DOI 10.1109/LRA.2020.2992184
View details for Web of Science ID 000541731600001
- Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction
IEEE ROBOTICS AND AUTOMATION LETTERS
2020; 5 (2): 3485–92
View details for DOI 10.1109/LRA.2020.2976305
View details for Web of Science ID 000520954200034
- Motion Reasoning for Goal-Based Imitation Learning
IEEE. 2020: 4878-4884
View details for Web of Science ID 000712319503063
- Adversarial Cross-Domain Action Recognition with Co-Attention
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2020: 11815-11822
View details for Web of Science ID 000668126804033
- Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision
IEEE COMPUTER SOC. 2020: 2773–82
View details for Web of Science ID 000578444802088
- Action-Agnostic Human Pose Forecasting
IEEE. 2019: 1423–32
View details for DOI 10.1109/WACV.2019.00156
View details for Web of Science ID 000469423400149
- Imitation Learning for Human Pose Prediction
IEEE. 2019: 7123–32
View details for DOI 10.1109/ICCV.2019.00722
View details for Web of Science ID 000548549202023
- Learning Temporal Action Proposals With Fewer Labels
IEEE. 2019: 7072–81
View details for DOI 10.1109/ICCV.2019.00717
View details for Web of Science ID 000548549202018
- Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning
IEEE. 2019: 2635–42
View details for Web of Science ID 000544658402034
- Peeking into the Future: Predicting Future Person Activities and Locations in Videos
IEEE COMPUTER SOC. 2019: 5718–27
View details for DOI 10.1109/CVPR.2019.00587
View details for Web of Science ID 000529484005092
- D3TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation
IEEE COMPUTER SOC. 2019: 3541–50
View details for DOI 10.1109/CVPR.2019.00366
View details for Web of Science ID 000529484003071
- Peeking into the Future: Predicting Future Person Activities and Locations in Videos
IEEE. 2019: 2960–63
View details for DOI 10.1109/CVPRW.2019.00358
View details for Web of Science ID 000569983600352
- Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration
IEEE. 2019: 8557–66
View details for DOI 10.1109/CVPR.2019.00876
View details for Web of Science ID 000542649302018
- Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining
IEEE. 2019: 349–57
View details for DOI 10.1109/WACV.2019.00043
View details for Web of Science ID 000469423400036
- Learning to Decompose and Disentangle Representations for Video Prediction
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
View details for Web of Science ID 000461823300048
- Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos
IEEE. 2018: 5948–57
View details for DOI 10.1109/CVPR.2018.00623
View details for Web of Science ID 000457843606011
- What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets
IEEE. 2018: 7366–75
View details for DOI 10.1109/CVPR.2018.00769
View details for Web of Science ID 000457843607054
- Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos
IMAGE AND VISION COMPUTING
2017; 59: 63–75
View details for DOI 10.1016/j.imavis.2016.11.004
View details for Web of Science ID 000397687900005
- Visual Forecasting by Imitating Dynamics in Natural Sequences
IEEE. 2017: 3018–27
View details for DOI 10.1109/ICCV.2017.326
View details for Web of Science ID 000425498403009
- Dense-Captioning Events in Videos
IEEE. 2017: 706–15
View details for DOI 10.1109/ICCV.2017.83
View details for Web of Science ID 000425498400074
- Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
IEEE. 2017: 1032–41
View details for DOI 10.1109/CVPR.2017.116
View details for Web of Science ID 000418371401011
- Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization
IEEE. 2017: 1330–38
View details for DOI 10.1109/CVPR.2017.146
View details for Web of Science ID 000418371401041
- Risky Region Localization with Point Supervision
IEEE. 2017: 246–53
View details for DOI 10.1109/ICCVW.2017.38
View details for Web of Science ID 000425239600031
- Connectionist Temporal Modeling for Weakly Supervised Action Labeling
SPRINGER INTERNATIONAL PUBLISHING AG. 2016: 137–53
View details for DOI 10.1007/978-3-319-46493-0_9
View details for Web of Science ID 000389385100009
- Title Generation for User Generated Videos
SPRINGER INTERNATIONAL PUBLISHING AG. 2016: 609–25
View details for DOI 10.1007/978-3-319-46475-6_38
View details for Web of Science ID 000389383900038
- Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos
IEEE. 2016: 1914–23
View details for DOI 10.1109/CVPR.2016.211
View details for Web of Science ID 000400012301103
- A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets
IEEE. 2016: 1981–90
View details for DOI 10.1109/CVPR.2016.218
View details for Web of Science ID 000400012302004
- DAPs: Deep Action Proposals for Action Understanding
SPRINGER INTERNATIONAL PUBLISHING AG. 2016: 768–84
View details for DOI 10.1007/978-3-319-46487-9_47
View details for Web of Science ID 000389384800047
- Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification
11th European Conference on Computer Vision
SPRINGER-VERLAG BERLIN. 2010: 392–405
View details for Web of Science ID 000286164000029