Bio


Juan Carlos Niebles received an Engineering degree in Electronics from Universidad del Norte (Colombia) in 2002, an M.Sc. degree in Electrical and Computer Engineering from the University of Illinois at Urbana-Champaign in 2007, and a Ph.D. degree in Electrical Engineering from Princeton University in 2011. Since 2021 he has been Research Director at Salesforce and Adjunct Professor of Computer Science at Stanford. He is co-Director of the Stanford Vision and Learning Lab. Before that, from 2015 to 2021, he was Associate Director of Research at the Stanford-Toyota Center for AI Research and a Senior Research Scientist at the Stanford AI Lab. He was also an Associate Professor of Electrical and Electronic Engineering at Universidad del Norte (Colombia) from 2011 to 2019. His research interests are in computer vision and machine learning, with a focus on the visual recognition and understanding of human actions and activities, objects, scenes, and events. He serves as Area Chair for CVPR and ICCV, the top computer vision conferences, and as Associate Editor for IEEE TPAMI. He is also a member of the AI Index Steering Committee and the Curriculum Director for Stanford-AI4ALL. He has received a Google Faculty Research Award (2015), the Microsoft Research Faculty Fellowship (2012), a Google Research Award (2011), and a Fulbright Fellowship (2005).

Academic Appointments


  • Senior Research Engineer, Computer Science

Honors & Awards


  • Faculty Research Award, Google (2015)
  • Senior Member, IEEE (2015)
  • Faculty Fellow, Microsoft Research (2012)
  • Research Award, Google (2011)
  • Fulbright PhD Fellowship, Fulbright-Colciencias-DNP (2005)

Boards, Advisory Committees, Professional Organizations


  • Steering Committee, AI Index (2018 - Present)
  • Associate Director of Research, Stanford AI Lab-Toyota Center for AI Research (2015 - Present)
  • Senior Member, IEEE (2015 - Present)
  • Member, IEEE Computer Society (2014 - Present)
  • Member, IEEE (2007 - Present)

Professional Education


  • Ph.D., Princeton University, Electrical Engineering (2011)
  • M.A., Princeton University, Electrical Engineering (2009)
  • M.Sc., University of Illinois at Urbana-Champaign, Electrical and Computer Engineering (2007)
  • Engineer, Universidad del Norte, Electronics Engineering (2002)

Current Research and Scholarly Interests


My research is in computer vision. Its goal is to enable computers and robots to perceive the visual world by developing novel computer vision algorithms for the automatic analysis of images and videos. On the scientific side, we tackle fundamental open problems in the visual recognition and understanding of human actions and activities, objects, scenes, and events. On the application side, we develop systems that solve real-world problems by bringing cutting-edge computer vision technologies into new application domains.

All Publications


  • Quantifying Parkinson's disease motor severity under uncertainty using MDS-UPDRS videos. Medical Image Analysis. Lu, M., Zhao, Q., Poston, K. L., Sullivan, E. V., Pfefferbaum, A., Shahid, M., Katz, M., Kouhsari, L. M., Schulman, K., Milstein, A., Niebles, J. C., Henderson, V. W., Fei-Fei, L., Pohl, K. M., Adeli, E. 2021; 73: 102179

    Abstract

    Parkinson's disease (PD) is a brain disorder that primarily affects motor function, leading to slow movement, tremor, and stiffness, as well as postural instability and difficulty with walking and balance. The severity of PD motor impairments is clinically assessed by Part III of the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS), a universally accepted rating scale. However, experts often disagree on the exact scoring of individuals. In the presence of such label noise, training a machine learning model using scores from only a single rater may introduce bias, while training with multiple noisy ratings is challenging because of inter-rater variability. In this paper, we introduce an ordinal focal neural network that estimates MDS-UPDRS scores from input videos, leveraging the ordinal nature of MDS-UPDRS scores and combating class imbalance. To handle multiple noisy labels per exam, the training of the network is regularized via rater confusion estimation (RCE), which encodes the rating habits and skills of raters via a confusion matrix. We apply our pipeline to estimate MDS-UPDRS test scores from video recordings of gait (multiple raters, R=3) and finger tapping (single rater). On a sizable clinical dataset for the gait test (N=55), we obtained a classification accuracy of 72% with the majority vote as ground truth, and an accuracy of ∼84% when a prediction is counted as correct if it matches at least one rater's score. Our work demonstrates how computer-assisted technologies can be used to track patients and their motor impairments, even when there is uncertainty in the clinical ratings. The latest version of the code will be available at https://github.com/mlu355/PD-Motor-Severity-Estimation.

    DOI: 10.1016/j.media.2021.102179

    PubMed ID: 34340101

  • Home Action Genome: Cooperative Compositional Action Understanding. Rai, N., Chen, H., Ji, J., Desai, R., Kozuka, K., Ishizaka, S., Adeli, E., Niebles, J. IEEE Computer Society. 2021: 11179-11188
  • CoCon: Cooperative-Contrastive Learning. Rai, N., Adeli, E., Lee, K., Gaidon, A., Niebles, J. IEEE Computer Society. 2021: 3379-3388
  • Metadata Normalization. Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Lu, M., Zhao, Q., Zhang, J., Pohl, K. M., Fei-Fei, L., Niebles, J. C., Adeli, E. 2021; 2021: 10912-10922

    Abstract

    Batch Normalization (BN) and its variants have delivered tremendous success in combating the covariate shift induced by the training step of deep learning methods. While these techniques normalize feature distributions by standardizing with batch statistics, they do not correct the influence of extraneous variables or multiple distributions on the features. Such extra variables, referred to here as metadata, may create bias or confounding effects (e.g., race when classifying gender from face images). We introduce the Metadata Normalization (MDN) layer, a new batch-level operation that can be used end-to-end within the training framework to correct the influence of metadata on feature distributions. MDN adopts a regression analysis technique traditionally used for preprocessing to remove (regress out) the metadata effects on model features during training. We use a metric based on distance correlation to quantify the distribution bias from the metadata and demonstrate that our method successfully removes metadata effects in four diverse settings: one synthetic, one 2D image, one video, and one 3D medical image dataset.

    DOI: 10.1109/cvpr46437.2021.01077

    PubMed ID: 34776724

    PubMed Central ID: PMC8589298

  • Representation Learning with Statistical Independence to Mitigate Bias. IEEE Winter Conference on Applications of Computer Vision. Adeli, E., Zhao, Q., Pfefferbaum, A., Sullivan, E. V., Fei-Fei, L., Niebles, J. C., Pohl, K. M. 2021; 2021: 2512-2522

    Abstract

    The presence of bias (in datasets or tasks) is inarguably one of the most critical challenges in machine learning applications, and it has given rise to pivotal debates in recent years. Such challenges range from spurious associations between variables in medical studies to racial bias in gender or face recognition systems. Controlling for all types of bias at the dataset curation stage is cumbersome and sometimes impossible. The alternative is to use the available data and build models that incorporate fair representation learning. In this paper, we propose such a model based on adversarial training with two competing objectives, learning features that have (1) maximum discriminative power with respect to the task and (2) minimal statistical mean dependence on the protected (bias) variable(s). Our approach does so by incorporating a new adversarial loss function that drives the correlation between the bias and the learned features toward zero. We apply our method to synthetic data, medical images (containing task bias), and a dataset for gender classification (containing dataset bias). Our results show that the features learned by our method not only yield superior prediction performance but are also unbiased.

    DOI: 10.1109/wacv48630.2021.00256

    PubMed ID: 34522832

  • Vision-based Estimation of MDS-UPDRS Gait Scores for Assessing Parkinson's Disease Motor Severity. Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention. Lu, M., Poston, K., Pfefferbaum, A., Sullivan, E. V., Fei-Fei, L., Pohl, K. M., Niebles, J. C., Adeli, E. 2020; 12263: 637–47

    Abstract

    Parkinson's disease (PD) is a progressive neurological disorder primarily affecting motor function, resulting in tremor at rest, rigidity, bradykinesia, and postural instability. The physical severity of PD impairments can be quantified through the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS), a widely used clinical rating scale. Accurate and quantitative assessment of disease progression is critical to developing a treatment that slows or stops further advancement of the disease. Prior work has mainly focused on dopamine transport neuroimaging for diagnosis or on costly and intrusive wearables for evaluating motor impairments. For the first time, we propose a computer vision-based model that observes non-intrusive video recordings of individuals, extracts their 3D body skeletons, tracks them through time, and classifies the movements according to the MDS-UPDRS gait scores. Experimental results show that our proposed method performs significantly better than chance and competing methods, with an F1-score of 0.83 and a balanced accuracy of 81%. This is the first benchmark for classifying PD patients based on MDS-UPDRS gait severity, and it could serve as an objective biomarker for disease severity. Our work demonstrates how computer-assisted technologies can be used to non-intrusively monitor patients and their motor impairments. The code is available at https://github.com/mlu355/PD-Motor-Severity-Estimation.

    DOI: 10.1007/978-3-030-59716-0_61

    PubMed ID: 33103164

  • Socially and Contextually Aware Human Motion and Pose Forecasting. IEEE Robotics and Automation Letters. Adeli, V., Adeli, E., Reid, I., Niebles, J., Rezatofighi, H. 2020; 5 (4): 6033–40
  • Explaining VQA predictions using visual grounding and a knowledge base. Image and Vision Computing. Riquelme, F., De Goyeneche, A., Zhang, Y., Niebles, J., Soto, A. 2020; 101
  • Segmenting the Future. IEEE Robotics and Automation Letters. Chiu, H., Adeli, E., Niebles, J. 2020; 5 (3): 4202–9
  • Spatiotemporal Relationship Reasoning for Pedestrian Intent Prediction. IEEE Robotics and Automation Letters. Liu, B., Adeli, E., Cao, Z., Lee, K., Shenoi, A., Gaidon, A., Niebles, J. 2020; 5 (2): 3485–92
  • Motion Reasoning for Goal-Based Imitation Learning. Huang, D., Chao, Y., Paxton, C., Deng, X., Fei-Fei, L., Niebles, J., Garg, A., Fox, D. IEEE. 2020: 4878-4884
  • Adversarial Cross-Domain Action Recognition with Co-Attention. Pan, B., Cao, Z., Adeli, E., Niebles, J. Association for the Advancement of Artificial Intelligence. 2020: 11815-11822
  • Disentangling Human Dynamics for Pedestrian Locomotion Forecasting with Noisy Supervision. Mangalam, K., Adeli, E., Lee, K., Gaidon, A., Niebles, J. IEEE Computer Society. 2020: 2773–82
  • Action-Agnostic Human Pose Forecasting. Chiu, H., Adeli, E., Wang, B., Huang, D., Niebles, J. IEEE. 2019: 1423–32
  • Learning Temporal Action Proposals With Fewer Labels. Ji, J., Cao, K., Niebles, J. IEEE. 2019: 7072–81
  • Imitation Learning for Human Pose Prediction. Wang, B., Adeli, E., Chiu, H., Huang, D., Niebles, J. IEEE. 2019: 7123–32
  • Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning. Huang, D., Xu, D., Zhu, Y., Garg, A., Savarese, S., Fei-Fei, L., Niebles, J. IEEE. 2019: 2635–42
  • Peeking into the Future: Predicting Future Person Activities and Locations in Videos. Liang, J., Jiang, L., Niebles, J., Hauptmann, A., Fei-Fei, L. IEEE Computer Society. 2019: 5718–27
  • D³TW: Discriminative Differentiable Dynamic Time Warping for Weakly Supervised Action Alignment and Segmentation. Chang, C., Huang, D., Sui, Y., Fei-Fei, L., Niebles, J. IEEE Computer Society. 2019: 3541–50
  • Peeking into the Future: Predicting Future Person Activities and Locations in Videos. Liang, J., Jiang, L., Niebles, J., Hauptmann, A., Fei-Fei, L. IEEE. 2019: 2960–63
  • Neural Task Graphs: Generalizing to Unseen Tasks from a Single Video Demonstration. Huang, D., Nair, S., Xu, D., Zhu, Y., Garg, A., Fei-Fei, L., Savarese, S., Niebles, J. IEEE Computer Society; IEEE. 2019: 8557–66
  • Interpretable Visual Question Answering by Visual Grounding from Attention Supervision Mining. Zhang, Y., Niebles, J., Soto, A. IEEE. 2019: 349–57
  • Learning to Decompose and Disentangle Representations for Video Prediction. Hsieh, J., Liu, B., Huang, D., Fei-Fei, L., Niebles, J. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.). Neural Information Processing Systems (NIPS). 2018
  • Finding "It": Weakly-Supervised Reference-Aware Visual Grounding in Instructional Videos. Huang, D., Buch, S., Dery, L., Garg, A., Fei-Fei, L., Niebles, J. IEEE. 2018: 5948–57
  • What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets. Huang, D., Ramanathan, V., Mahajan, D., Torresani, L., Paluri, M., Fei-Fei, L., Niebles, J. IEEE. 2018: 7366–75
  • Sparse composition of body poses and atomic actions for human activity recognition in RGB-D videos. Image and Vision Computing. Lillo, I., Niebles, J., Soto, A. 2017; 59: 63–75
  • Risky Region Localization with Point Supervision. Kozuka, K., Niebles, J. IEEE. 2017: 246–53
  • Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization. Zeng, K., Chou, S., Chan, F., Niebles, J., Sun, M. IEEE. 2017: 1330–38
  • Dense-Captioning Events in Videos. Krishna, R., Hata, K., Ren, F., Fei-Fei, L., Niebles, J. IEEE. 2017: 706–15
  • Visual Forecasting by Imitating Dynamics in Natural Sequences. Zeng, K., Shen, W. B., Huang, D., Sun, M., Niebles, J. IEEE. 2017: 3018–27
  • Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos. Huang, D., Lim, J. J., Fei-Fei, L., Niebles, J. IEEE. 2017: 1032–41
  • Connectionist Temporal Modeling for Weakly Supervised Action Labeling. Huang, D., Fei-Fei, L., Niebles, J. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.). Springer International Publishing AG. 2016: 137–53
  • Title Generation for User Generated Videos. Zeng, K., Chen, T., Niebles, J., Sun, M. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.). Springer International Publishing AG. 2016: 609–25
  • Fast Temporal Activity Proposals for Efficient Detection of Human Actions in Untrimmed Videos. Heilbron, F., Niebles, J., Ghanem, B. IEEE. 2016: 1914–23
  • A Hierarchical Pose-Based Approach to Complex Action Understanding Using Dictionaries of Actionlets and Motion Poselets. Lillo, I., Niebles, J., Soto, A. IEEE. 2016: 1981–90
  • DAPs: Deep Action Proposals for Action Understanding. Escorcia, V., Heilbron, F., Niebles, J., Ghanem, B. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.). Springer International Publishing AG. 2016: 768–84
  • Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification. 11th European Conference on Computer Vision. Niebles, J. C., Chen, C., Fei-Fei, L. Springer-Verlag Berlin. 2010: 392–405
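The Metadata Normalization abstract describes two steps. The first removes ("regresses out") metadata effects from features with a batch-level regression. The second measures any remaining dependence with a metric based on distance correlation. The sketch below illustrates both ideas in NumPy. It makes several assumptions. The regression is ordinary least squares with an intercept on a single batch. The dependence measure is the standard sample (V-statistic) distance correlation. The names `regress_out` and `distance_correlation` are hypothetical. This is not the paper's MDN layer, which runs end-to-end during training.

```python
import numpy as np


def regress_out(features, metadata):
    """Remove the linear effect of metadata from a batch of features (illustrative sketch).

    features: shape (n, d); metadata: shape (n,) or (n, k).
    Fits features ~ intercept + metadata by ordinary least squares and
    subtracts only the metadata term, so each feature keeps its intercept.
    """
    features = np.asarray(features, dtype=float)
    n = features.shape[0]
    meta = np.asarray(metadata, dtype=float).reshape(n, -1)
    design = np.column_stack([np.ones(n), meta])  # intercept column + metadata columns
    beta, *_ = np.linalg.lstsq(design, features, rcond=None)
    return features - meta @ beta[1:]  # beta[0] is the intercept row; keep it


def _double_centered_distances(x):
    """Pairwise Euclidean distance matrix, double-centered over rows and columns."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation in [0, 1].

    The population version is zero exactly when x and y are independent,
    so it also detects nonlinear dependence that plain correlation misses.
    """
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = max(float((a * b).mean()), 0.0)
    dvar = float(np.sqrt((a * a).mean() * (b * b).mean()))
    return 0.0 if dvar == 0.0 else float(np.sqrt(dcov2 / dvar))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    site = rng.normal(size=(200, 1))  # synthetic stand-in for a metadata variable
    feats = np.hstack([2 * site + 0.5 * rng.normal(size=(200, 1)),
                       rng.normal(size=(200, 1))])
    cleaned = regress_out(feats, site)
    print(distance_correlation(feats, site), distance_correlation(cleaned, site))
```

The sketch subtracts only the metadata term, so each feature keeps its mean. A linear fit removes only linear effects, which is why a nonlinear dependence measure such as distance correlation is a sensible check afterwards.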