Bio


Jiajun Wu is an Assistant Professor of Computer Science and, by courtesy, of Psychology at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. Wu's research has been recognized through the Young Investigator Programs (YIP) by ONR and by AFOSR, the NSF CAREER award, the Okawa research grant, paper awards and finalists at ICCV, CVPR, SIGGRAPH Asia, CoRL, and IROS, dissertation awards from ACM, AAAI, and MIT, the 2020 Samsung AI Researcher of the Year, and faculty research awards from J.P. Morgan, Samsung, Amazon, and Meta.

Honors & Awards


  • Research Grant, Okawa Foundation (2024)
  • CAREER Award, NSF (2024)
  • Young Investigator Program (YIP), ONR (2024)
  • Innovators Under 35 Asia Pacific, MIT Technology Review (2024)
  • Young Investigator Program (YIP), AFOSR (2023)
  • Best Paper Award, SIGGRAPH Asia, ACM (2023)
  • Best Systems Paper Award, CoRL (2023)
  • Best Paper Award Finalist, ICCV, IEEE/CVF (2023)
  • Best Paper Award Candidate, CVPR, IEEE/CVF (2023)
  • Global Research Outreach (GRO) Award, Samsung (2023)
  • New Faculty Highlights, AAAI (2023)
  • Best Paper Award Nominee, CoRL (2022)
  • Faculty Research Award, J.P. Morgan (2022)
  • 30 Under 30, Science, Forbes (2022)
  • Early Career Professor Award Finalist, Agilent (2022)
  • Research Award, Meta (2021)
  • Research Award, Amazon (2021)
  • AI Researcher of the Year, Samsung (2020)
  • Global Research Outreach (GRO) Award, Samsung (2020)
  • George M. Sprowls PhD Thesis Award in Artificial Intelligence and Decision-Making, MIT (2020)
  • Doctoral Dissertation Award Honorable Mention, ACM (2019)
  • Dissertation Award, AAAI/ACM SIGAI (2019)
  • PhD Fellowship, Facebook (2017–2019)
  • Best Paper Award on Cognitive Robotics, IROS, IEEE/RSJ (2018)
  • PhD Fellowship, Samsung (2016–2017)
  • Graduate Fellowship, Nvidia (2016–2017)
  • Research Fellowship, Adobe (2015)
  • Edwin S. Webster Fellowship, MIT (2014)

Program Affiliations


  • Symbolic Systems Program

Professional Education


  • Ph.D., MIT, EECS (2020)
  • S.M., MIT, EECS (2016)

All Publications


  • Physical scene understanding AI MAGAZINE Wu, J. 2024

    View details for DOI 10.1002/aaai.12148

    View details for Web of Science ID 001158170300001

  • Neurosymbolic Models for Computer Graphics COMPUTER GRAPHICS FORUM Ritchie, D., Guerrero, P., Jones, R., Mitra, N. J., Schulz, A., Willis, K. D., Wu, J. 2023; 42 (2): 545-568

    View details for DOI 10.1111/cgf.14775

    View details for Web of Science ID 001000062600040

  • Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Xue, T., Wu, J., Bouman, K. L., Freeman, W. T. 2019; 41 (9): 2236–50

    Abstract

    We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods that have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, and on real-world video frames. We present analyses of the learned network representations, showing it is implicitly learning a compact encoding of object appearance and motion. We also demonstrate a few of its applications, including visual analogy-making and video extrapolation.

    View details for DOI 10.1109/TPAMI.2018.2854726

    View details for Web of Science ID 000480343900014

    View details for PubMedID 30004870

  • 3D Congealing: 3D-Aware Image Alignment in the Wild Zhang, Y., Li, Z., Raj, A., Engelhardt, A., Li, Y., Hou, T., Wu, J., Jampani, V., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Leonardis, A., Ricci, E. SPRINGER INTERNATIONAL PUBLISHING AG. 2025: 387-404
  • Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos Sun, K., Litvak, D., Zhang, Y., Li, H., Wu, J., Wu, S., Roth, S., Russakovsky, O., Sattler, T., Varol, G., Leonardis, A., Ricci, E. SPRINGER INTERNATIONAL PUBLISHING AG. 2025: 100-119
  • Foundation models in robotics: Applications, challenges, and the future INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH Firoozi, R., Tucker, J., Tian, S., Majumdar, A., Sun, J., Liu, W., Zhu, Y., Song, S., Kapoor, A., Hausman, K., Ichter, B., Driess, D., Wu, J., Lu, C., Schwager, M. 2024
  • Partial-View Object View Synthesis via Filtering Inversion Sun, F., Tremblay, J., Blukis, V., Lin, K., Xu, D., Ivanovic, B., Karkus, P., Birchfield, S., Fox, D., Zhang, R., Li, Y., Wu, J., Pavone, M., Haber, N. IEEE COMPUTER SOC. 2024: 453-463
  • SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing Wang, Z., Prabha, R., Huang, T., Wu, J., Rajagopal, R., Dy, J., Natarajan, S., Wooldridge, M. ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2024: 5805-5813
  • CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series Huang, T., Wu, Z., Wu, J., Hwang, J., Rajagopal, R., Wooldridge, M., Dy, J., Natarajan, S. ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2024: 22123-22131
  • DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks Jin, X., Xu, C., Gao, R., Wu, J., Wang, G., Li, S., Spencer, S. ASSOC COMPUTING MACHINERY. 2024
  • RoboCraft: Learning to see, simulate, and shape elasto-plastic objects in 3D with graph networks INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH Shi, H., Xu, H., Huang, Z., Li, Y., Wu, J. 2023
  • Object Motion Guided Human Motion Synthesis ACM TRANSACTIONS ON GRAPHICS Li, J., Wu, J., Liu, C. 2023; 42 (6)

    View details for DOI 10.1145/3618333

    View details for Web of Science ID 001139790400025

  • Fluid Simulation on Neural Flow Maps ACM TRANSACTIONS ON GRAPHICS Deng, Y., Yu, H., Zhang, D., Wu, J., Zhu, B. 2023; 42 (6)

    View details for DOI 10.1145/3618392

    View details for Web of Science ID 001139790400076

  • Editing Motion Graphics Video via Motion Vectorization and Transformation ACM TRANSACTIONS ON GRAPHICS Zhang, S., Ma, J., Wu, J., Ritchie, D., Agrawala, M. 2023; 42 (6)

    View details for DOI 10.1145/3618316

    View details for Web of Science ID 001139790400057

  • Differentiable Physics Simulation of Dynamics-Augmented Neural Objects IEEE ROBOTICS AND AUTOMATION LETTERS Le Cleac'h, S., Yu, H., Guo, M., Howell, T., Gao, R., Wu, J., Manchester, Z., Schwager, M. 2023; 8 (5): 2780-2787
  • Rendering Humans from Object-Occluded Monocular Videos Xiang, T., Sun, A., Wu, J., Adeli, E., Fei-Fei, L. IEEE COMPUTER SOC. 2023: 3216-3227
  • Model-Based Control with Sparse Neural Dynamics Liu, Z., Zhou, G., He, J., Marcucci, T., Li Fei-Fei, Wu, J., Li, Y., Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
  • Learning Rational Subgoals from Demonstrations and Instructions Luo, Z., Mao, J., Wu, J., Lozano-Perez, T., Tenenbaum, J. B., Kaelbling, L., Williams, B., Chen, Y., Neville, J. ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2023: 12068-12078
  • Benchmarking Rigid Body Contact Models Guo, M., Jiang, Y., Spielberg, A., Wu, J., Liu, K., Pappas, G. J., Matni, N., Morari, M. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
  • Learning to Design and Use Tools for Robotic Manipulation Liu, Z., Tian, S., Guo, M., Liu, C., Wu, J., Tan, J., Toussaint, M., Darvish, K. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
  • VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models Huang, W., Wang, C., Zhang, R., Li, Y., Wu, J., Fei-Fei, L., Tan, J., Toussaint, M., Darvish, K. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
  • NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities Zhang, R., Lee, S., Hwang, M., Hiranaka, A., Wang, C., Ai, W., Tan, J., Gupta, S., Hao, Y., Levine, G., Gao, R., Norcia, A., Li Fei-Fei, Wu, J., Tan, J., Toussaint, M., Darvish, K. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
  • Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning? Hsu, J., Poesia, G., Wu, J., Goodman, N. D., Antoran, J., Blaas, A., Buchanan, K., Feng, F., Fortuin, Ghalebikesabi, S., Kriegler, A., Mason, Rohde, D., Ruiz, F. J., Uelwer, T., Xie, Y., Yang, R. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023: 21-28
  • Siamese Masked Autoencoders Gupta, A., Wu, J., Deng, J., Li Fei-Fei, Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
  • Compositional Diffusion-Based Continuous Constraint Solvers Yang, Z., Mao, J., Du, Y., Wu, J., Tenenbaum, J. B., Lozano-Perez, T., Kaelbling, L., Tan, J., Toussaint, M., Darvish, K. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
  • Learning Sequential Acquisition Policies for Robot-Assisted Feeding Sundaresan, P., Wu, J., Sadigh, D., Tan, J., Toussaint, M., Darvish, K. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
  • RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools Shi, H., Xu, H., Clarke, S., Li, Y., Wu, J., Tan, J., Toussaint, M., Darvish, K. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
  • Composable Part-Based Manipulation Liu, W., Mao, J., Hsu, J., Hermans, T., Garg, A., Wu, J., Tan, J., Toussaint, M., Darvish, K. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
  • Inferring Hybrid Neural Fluid Fields from Videos Yu, H., Zheng, Y., Gao, Y., Deng, Y., Zhu, B., Wu, J., Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
  • Holistic Evaluation of Text-to-Image Models Lee, T., Yasunaga, M., Meng, C., Mai, Y., Park, J., Gupta, A., Zhang, Y., Narayanan, D., Teufel, H., Bellagente, M., Kang, M., Park, T., Leskovec, J., Zhu, J., Li Fei-Fei, Wu, J., Ermon, S., Liang, P., Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
  • 3D Neural Field Generation using Triplane Diffusion Shue, J., Chan, E., Po, R., Ankner, Z., Wu, J., Wetzstein, G. IEEE COMPUTER SOC. 2023: 20875-20886
  • Disentanglement via Latent Quantization Hsu, K., Dorrell, W., Whittington, J. R., Wu, J., Finn, C., Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
  • CIRCLE: Capture In Rich Contextual Environments Araujo, J., Li, J., Vetrivel, K., Agarwal, R., Wu, J., Gopinath, D., Clegg, A., Liu, C. IEEE COMPUTER SOC. 2023: 21211-21221
  • Ego-Body Pose Estimation via Ego-Head Pose Estimation Li, J., Liu, C., Wu, J. IEEE COMPUTER SOC. 2023: 17142-17151
  • Primitive Skill-based Robot Learning from Human Evaluative Feedback Hiranaka, A., Hwang, M., Lee, S., Wang, C., Fei-Fei, L., Wu, J., Zhang, R. IEEE. 2023: 7817-7824
  • Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark Kuang, Z., Zhang, Y., Yu, H., Agarwala, S., Wu, S., Wu, J., Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
  • SOUNDCAM: A Dataset for Finding Humans Using Room Acoustics Wang, M., Clarke, S., Wang, J., Gao, R., Wu, J., Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
  • What's Left? Concept Grounding with Logic-Enhanced Foundation Models Hsu, J., Mao, J., Tenenbaum, J. B., Wu, J., Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
  • STAP: Sequencing Task-Agnostic Policies Agia, C., Migimatsu, T., Wu, J., Bohg, J. IEEE. 2023: 7951-7958
  • NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations Hsu, J., Mao, J., Wu, J. IEEE COMPUTER SOC. 2023: 2614-2623
  • ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding Xue, L., Gao, M., Xing, C., Martin-Martin, R., Wu, J., Xiong, C., Xu, R., Niebles, J., Savarese, S. IEEE COMPUTER SOC. 2023: 1179-1189
  • 3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection Ge, Y., Yu, H., Zhao, C., Guo, Y., Huang, X., Ren, L., Itti, L., Wu, J., Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
  • Accidental Light Probes Yu, H., Agarwala, S., Herrmann, C., Szeliski, R., Snavely, N., Wu, J., Sun, D. IEEE COMPUTER SOC. 2023: 12521-12530
  • Multi-Object Manipulation via Object-Centric Neural Scattering Functions Tian, S., Cai, Y., Yu, H., Zakharov, S., Liu, K., Gaidon, A., Li, Y., Wu, J. IEEE COMPUTER SOC. 2023: 9021-9031
  • The OBJECTFOLDER BENCHMARK: Multisensory Learning with Neural and Real Objects Gao, R., Dou, Y., Li, H., Agarwal, T., Bohg, J., Li, Y., Fei-Fei, L., Wu, J. IEEE COMPUTER SOC. 2023: 17276-17286
  • PyPose: A Library for Robot Learning with Physics-based Optimization Wang, C., Gao, D., Xu, K., Geng, J., Hu, Y., Qiu, Y., Li, B., Yang, F., Moon, B., Pandey, A., Aryan, Xu, J., Wu, T., He, H., Huang, D., Ren, Z., Zhao, S., Fu, T., Reddy, P., Lin, X., Wang, W., Shi, J., Talak, R., Cao, K., Du, Y., Wang, H., Yu, H., Wang, S., Chen, S., Kashyap, A., Bandaru, R., Dantu, K., Wu, J., Xie, L., Carlone, L., Hutter, M., Scherer, S. IEEE COMPUTER SOC. 2023: 22024-22034
  • Task-Driven Graph Attention for Hierarchical Relational Object Navigation Lingelbach, M., Li, C., Hwang, M., Kurenkov, A., Lou, A., Martin-Martin, R., Zhang, R., Li Fei-Fei, Wu, J. IEEE. 2023: 886-893
  • SONICVERSE: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear Gao, R., Li, H., Dharan, G., Wang, Z., Li, C., Xia, F., Savarese, S., Fei-Fei, L., Wu, J. IEEE. 2023: 704-711
  • Putting People in Their Place: Affordance-Aware Human Insertion into Scenes Kulal, S., Brooks, T., Aiken, A., Wu, J., Yang, J., Lu, J., Efros, A. A., Singh, K. IEEE COMPUTER SOC. 2023: 17089-17099
  • Tree-Structured Shading Decomposition Geng, C., Yu, H., Zhang, S., Agrawala, M., Wu, J. IEEE COMPUTER SOC. 2023: 488-498
  • VQ3D: Learning a 3D-Aware Generative Model on ImageNet Sargent, K., Koh, J., Zhang, H., Chang, H., Herrmann, C., Srinivasan, P., Wu, J., Sun, D. IEEE COMPUTER SOC. 2023: 4217-4227
  • RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks Shi, H., Xu, H., Huang, Z., Li, Y., Wu, J., Hauser, K., Shell, D., Huang, S. RSS FOUNDATION-ROBOTICS SCIENCE & SYSTEMS FOUNDATION. 2022
  • Unsupervised Segmentation in Real-World Images via Spelke Object Inference Chen, H., Venkatesh, R., Friedman, Y., Wu, J., Tenenbaum, J. B., Yamins, D. K., Bear, D. M., Avidan, S., Brostow, G., Cisse, M., Farinella, G. M., Hassner, T. SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 719-735
  • Rotationally Equivariant 3D Object Detection Yu, H., Wu, J., Yi, L. IEEE COMPUTER SOC. 2022: 1446-1454
  • Revisiting the "Video" in Video-Language Understanding Buch, S., Eyzaguirre, C., Gaidon, A., Wu, J., Li Fei-Fei, Niebles, J. IEEE COMPUTER SOC. 2022: 2907-2917
  • Translating a Visual LEGO Manual to a Machine-Executable Plan Wang, R., Zhang, Y., Mao, J., Cheng, C., Wu, J., Avidan, S., Brostow, G., Cisse, M., Farinella, G. M., Hassner, T. SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 677-694
  • Video Extrapolation in Space and Time Zhang, Y., Wu, J., Avidan, S., Brostow, G., Cisse, M., Farinella, G. M., Hassner, T. SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 313-333
  • Scene Synthesis from Human Motion Ye, S., Wang, Y., Li, J., Park, D., Liu, C., Xu, H., Wu, J., Spencer, S. N. ASSOC COMPUTING MACHINERY. 2022
  • Programmatic Concept Learning for Human Motion Description and Synthesis Kulal, S., Mao, J., Aiken, A., Wu, J. IEEE COMPUTER SOC. 2022: 13833-13842
  • Hierarchical Motion Understanding via Motion Programs Kulal, S., Mao, J., Aiken, A., Wu, J. IEEE COMPUTER SOC. 2021: 6564-6572
  • Neural Radiance Flow for 4D View Synthesis and Video Processing Du, Y., Zhang, Y., Yu, H., Tenenbaum, J. B., Wu, J. IEEE. 2021: 14304-14314
  • Learning Temporal Dynamics from Cycles in Narrated Video Epstein, D., Wu, J., Schmid, C., Sun, C. IEEE. 2021: 1460-1469
  • Augmenting Policy Learning with Routines Discovered from a Single Demonstration Zhao, Z., Gan, C., Wu, J., Guo, X., Tenenbaum, J. ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2021: 11024-11032
  • KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control Jakab, T., Tucker, R., Makadia, A., Wu, J., Snavely, N., Kanazawa, A. IEEE COMPUTER SOC. 2021: 12778-12787
  • De-rendering the World's Revolutionary Artefacts Wu, S., Makadia, A., Wu, J., Snavely, N., Tucker, R., Kanazawa, A. IEEE COMPUTER SOC. 2021: 6334-6343
  • pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis Chan, E. R., Monteiro, M., Kellnhofer, P., Wu, J., Wetzstein, G. IEEE COMPUTER SOC. 2021: 5795-5805
  • Repopulating Street Scenes Wang, Y., Liu, A., Tucker, R., Wu, J., Curless, B. L., Seitz, S. M., Snavely, N. IEEE COMPUTER SOC. 2021: 5106-5115
  • Learning Generative Models of 3D Structures Chaudhuri, S., Ritchie, D., Wu, J., Xu, K., Zhang, H. WILEY. 2020: 643–66

    View details for DOI 10.1111/cgf.14020

    View details for Web of Science ID 000548709600052

  • End-to-End Optimization of Scene Layout Luo, A., Zhang, Z., Wu, J., Tenenbaum, J. B. IEEE. 2020: 3753–62
  • DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs Wang, Y., Liu, B., Wu, J., Zhu, Y., Du, S. S., Li Fei-Fei, Tenenbaum, J. B., Bessiere, C. IJCAI-INT JOINT CONF ARTIF INTELL. 2020: 4190-4198
  • Accurate Vision-based Manipulation through Contact Reasoning Kloss, A., Bauza, M., Wu, J., Tenenbaum, J. B., Rodriguez, A., Bohg, J. IEEE. 2020: 6738-6744
  • Visual Grounding of Learned Physical Models Li, Y., Lin, T., Yi, K., Bear, D. M., Yamins, D. K., Wu, J., Tenenbaum, J. B., Torralba, A., Daume, H., Singh, A. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2020
  • Perspective Plane Program Induction from a Single Image Li, Y., Mao, J., Zhang, X., Freeman, W. T., Tenenbaum, J. B., Wu, J. IEEE. 2020: 4433–42
  • Video Enhancement with Task-Oriented Flow INTERNATIONAL JOURNAL OF COMPUTER VISION Xue, T., Chen, B., Wu, J., Wei, D., Freeman, W. T. 2019; 127 (8): 1106–25
  • An integrative computational architecture for object-driven cortex CURRENT OPINION IN NEUROBIOLOGY Yildirim, I., Wu, J., Kanwisher, N., Tenenbaum, J. 2019; 55: 73–81

    Abstract

    Objects in motion activate multiple cortical regions in every lobe of the human brain. Do these regions represent a collection of independent systems, or is there an overarching functional architecture spanning all of object-driven cortex? Inspired by recent work in artificial intelligence (AI), machine learning, and cognitive science, we consider the hypothesis that these regions can be understood as a coherent network implementing an integrative computational system that unifies the functions needed to perceive, predict, reason about, and plan with physical objects, as in the paradigmatic case of using or making tools. Our proposal draws on a modeling framework that combines multiple AI methods, including causal generative models, hybrid symbolic-continuous planning algorithms, and neural recognition networks, with object-centric, physics-based representations. We review evidence relating specific components of our proposal to the specific regions that comprise object-driven cortex, and lay out future research directions with the goal of building a complete functional and mechanistic account of this system.

    View details for DOI 10.1016/j.conb.2019.01.010

    View details for Web of Science ID 000472127600011

    View details for PubMedID 30825704

    View details for PubMedCentralID PMC6548583

  • See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion SCIENCE ROBOTICS Fazeli, N., Oller, M., Wu, J., Wu, Z., Tenenbaum, J. B., Rodriguez, A. 2019; 4 (26)
  • Visual Concept-Metaconcept Learning Han, C., Mao, J., Gan, C., Tenenbaum, J. B., Wu, J., Wallach, H., Larochelle, H., Beygelzimer, A., d'Alche-Buc, F., Fox, E., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2019
  • Combining Physical Simulators and Object-Based Networks for Control Ajay, A., Bauza, M., Wu, J., Fazeli, N., Tenenbaum, J. B., Rodriguez, A., Kaelbling, L. P., IEEE, Howard, A., Althoefer, K., Arai, F., Arrichiello, F., Caputo, B., Castellanos, J., Hauser, K., Isler, Kim, J., Liu, H., Oh, P., Santos, Scaramuzza, D., Ude, A., Voyles, R., Yamane, K., Okamura, A. IEEE. 2019: 3217–23
  • Propagation Networks for Model-Based Control Under Partial Observation Li, Y., Wu, J., Zhu, J., Tenenbaum, J. B., Torralba, A., Tedrake, R., IEEE, Howard, A., Althoefer, K., Arai, F., Arrichiello, F., Caputo, B., Castellanos, J., Hauser, K., Isler, Kim, J., Liu, H., Oh, P., Santos, Scaramuzza, D., Ude, A., Voyles, R., Yamane, K., Okamura, A. IEEE. 2019: 1205–11
  • ChainQueen: A Real-Time Differentiable Physical Simulator for Soft Robotics Hu, Y., Liu, J., Spielberg, A., Tenenbaum, J. B., Freeman, W. T., Wu, J., Rus, D., Matusik, W., IEEE, Howard, A., Althoefer, K., Arai, F., Arrichiello, F., Caputo, B., Castellanos, J., Hauser, K., Isler, Kim, J., Liu, H., Oh, P., Santos, Scaramuzza, D., Ude, A., Voyles, R., Yamane, K., Okamura, A. IEEE. 2019: 6265–71
  • Program-Guided Image Manipulators Mao, J., Zhang, X., Li, Y., Freeman, W. T., Tenenbaum, J. B., Wu, J. IEEE COMPUTER SOC. 2019: 4029–38
  • Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations Smith, K. A., Mei, L., Yao, S., Wu, J., Spelke, E., Tenenbaum, J. B., Ullman, T. D., Wallach, H., Larochelle, H., Beygelzimer, A., d'Alche-Buc, F., Fox, E., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2019
  • Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning INTERNATIONAL JOURNAL OF COMPUTER VISION Owens, A., Wu, J., McDermott, J. H., Freeman, W. T., Torralba, A. 2018; 126 (10): 1120–37
  • 3D Interpreter Networks for Viewer-Centered Wireframe Modeling INTERNATIONAL JOURNAL OF COMPUTER VISION Wu, J., Xue, T., Lim, J. J., Tian, Y., Tenenbaum, J. B., Torralba, A., Freeman, W. T. 2018; 126 (9): 1009–26
  • Augmenting Physical Simulators with Stochastic Neural Networks: Case Study of Planar Pushing and Bouncing Ajay, A., Wu, J., Fazeli, N., Bauza, M., Kaelbling, L. P., Tenenbaum, J. B., Rodriguez, A., Kosecka, J., Maciejewski, A. A., Okamura, A., Bicchi, A., Stachniss, C., Song, D. Z., Lee, D. H., Chaumette, F., Ding, H., Li, J. S., Wen, J., Roberts, J., Masamune, K., Chong, N. Y., Amato, N., Tsagwarakis, N., Rocco, P., Asfour, T., Chung, W. K., Yasuyoshi, Y., Sun, Y., Maciekeski, T., Althoefer, K., AndradeCetto, J., Chung, W. K., Demircan, E., Dias, J., Fraisse, P., Gross, R., Harada, H., Hasegawa, Y., Hayashibe, M., Kiguchi, K., Kim, K., Kroeger, T., Li, Y., Ma, S., Mochiyama, H., Monje, C. A., Rekleitis, Roberts, R., Stulp, F., Tsai, C. H., Zollo, L. IEEE. 2018: 3066–73
  • Unsupervised Learning of Latent Physical Properties Using Perception-Prediction Networks Zheng, D., Luo, V., Wu, J., Tenenbaum, J. B., Globerson, A., Silva, R. AUAI PRESS. 2018: 497–507
  • Physical Primitive Decomposition Liu, Z., Freeman, W. T., Tenenbaum, J. B., Wu, J., Ferrari, Hebert, M., Sminchisescu, C., Weiss, Y. SPRINGER INTERNATIONAL PUBLISHING AG. 2018: 3–20
  • Seeing Tree Structure from Vibration Xue, T., Wu, J., Zhang, Z., Zhang, C., Tenenbaum, J. B., Freeman, W. T., Ferrari, Hebert, M., Sminchisescu, C., Weiss, Y. SPRINGER INTERNATIONAL PUBLISHING AG. 2018: 762–79
  • MoSculp: Interactive Visualization of Shape and Time Zhang, X., Dekel, T., Xue, T., Owens, A., He, Q., Wu, J., Mueller, S., Freeman, W. T. ASSOC COMPUTING MACHINERY. 2018: 275–85
  • Learning to Reconstruct Shapes from Unseen Classes Zhang, X., Zhang, Z., Zhang, C., Tenenbaum, J. B., Freeman, W. T., Wu, J., Bengio, S., Wallach, H., Larochelle, H., Grauman, K., CesaBianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
  • Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification Long, X., Gan, C., de Melo, G., Wu, J., Liu, X., Wen, S. IEEE. 2018: 7834–43
  • Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling Sun, X., Wu, J., Zhang, X., Zhang, Z., Zhang, C., Xue, T., Tenenbaum, J. B., Freeman, W. T. IEEE. 2018: 2974–83
  • 3D Shape Perception from Monocular Vision, Touch, and Shape Priors Wang, S., Wu, J., Sun, X., Yuan, W., Freeman, W. T., Tenenbaum, J. B., Adelson, E. H., Kosecka, J., Maciejewski, A. A., Okamura, A., Bicchi, A., Stachniss, C., Song, D. Z., Lee, D. H., Chaumette, F., Ding, H., Li, J. S., Wen, J., Roberts, J., Masamune, K., Chong, N. Y., Amato, N., Tsagwarakis, N., Rocco, P., Asfour, T., Chung, W. K., Yasuyoshi, Y., Sun, Y., Maciekeski, T., Althoefer, K., AndradeCetto, J., Chung, W. K., Demircan, E., Dias, J., Fraisse, P., Gross, R., Harada, H., Hasegawa, Y., Hayashibe, M., Kiguchi, K., Kim, K., Kroeger, T., Li, Y., Ma, S., Mochiyama, H., Monje, C. A., Rekleitis, Roberts, R., Stulp, F., Tsai, C. H., Zollo, L. IEEE. 2018: 1606–13
  • 3D-Aware Scene Manipulation via Inverse Graphics Yao, S., Hsu, T., Zhu, J., Wu, J., Torralba, A., Freeman, W. T., Tenenbaum, J. B., Bengio, S., Wallach, H., Larochelle, H., Grauman, K., CesaBianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
  • Visual Object Networks: Image Generation with Disentangled 3D Representation Zhu, J., Zhang, Z., Zhang, C., Wu, J., Torralba, A., Tenenbaum, J. B., Freeman, W. T., Bengio, S., Wallach, H., Larochelle, H., Grauman, K., CesaBianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
  • Learning to Exploit Stability for 3D Scene Parsing Du, Y., Liu, Z., Basevi, H., Leonardis, A., Freeman, W. T., Tenenbaum, J. B., Wu, J., Bengio, S., Wallach, H., Larochelle, H., Grauman, K., CesaBianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
  • Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding Yi, K., Wu, J., Gan, C., Torralba, A., Kohli, P., Tenenbaum, J. B., Bengio, S., Wallach, H., Larochelle, H., Grauman, K., CesaBianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
  • Learning to See Physics via Visual De-animation Wu, J., Lu, E., Kohli, P., Freeman, W. T., Tenenbaum, J. B., Guyon, Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
  • MarrNet: 3D Shape Reconstruction via 2.5D Sketches Wu, J., Wang, Y., Xue, T., Sun, X., Freeman, W. T., Tenenbaum, J. B., Guyon, Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
  • Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes with Deep Generative Networks Soltani, A., Huang, H., Wu, J., Kulkarni, T. D., Tenenbaum, J. B. IEEE. 2017: 2511–19
  • Neural Scene De-rendering Wu, J., Tenenbaum, J. B., Kohli, P. IEEE. 2017: 7035–43
  • Raster-to-Vector: Revisiting Floorplan Transformation Liu, C., Wu, J., Kohli, P., Furukawa, Y. IEEE. 2017: 2214–22
  • Generative Modeling of Audible Shapes for Object Perception Zhang, Z., Wu, J., Li, Q., Huang, Z., Traer, J., McDermott, J. H., Tenenbaum, J. B., Freeman, W. T. IEEE. 2017: 1260–69
  • Shape and Material from Sound Zhang, Z., Li, Q., Huang, Z., Wu, J., Tenenbaum, J. B., Freeman, W. T., Guyon, Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
  • Self-Supervised Intrinsic Image Decomposition Janner, M., Wu, J., Kulkarni, T. D., Yildirim, I., Tenenbaum, J. B., Guyon, Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
  • Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks Xue, T., Wu, J., Bouman, K. L., Freeman, W. T., Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2016
  • Single Image 3D Interpreter Network Wu, J., Xue, T., Lim, J. J., Tian, Y., Tenenbaum, J. B., Torralba, A., Freeman, W. T., Leibe, B., Matas, J., Sebe, N., Welling, M. SPRINGER INT PUBLISHING AG. 2016: 365–82
  • Ambient Sound Provides Supervision for Visual Learning Owens, A., Wu, J., McDermott, J. H., Freeman, W. T., Torralba, A., Leibe, B., Matas, J., Sebe, N., Welling, M. SPRINGER INTERNATIONAL PUBLISHING AG. 2016: 801–16
  • Unsupervised Object Class Discovery via Saliency-Guided Multiple Class Learning IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Zhu, J., Wu, J., Xu, Y., Chang, E., Tu, Z. 2015; 37 (4): 862–75

    Abstract

    In this paper, we tackle the problem of common object (multiple classes) discovery from a set of input images, where we assume the presence of one object class in each image. This problem is, loosely speaking, unsupervised since we do not know a priori the object type, location, and scale in each image. We observe that the general task of object class discovery in a fully unsupervised manner is intrinsically ambiguous; here we adopt saliency detection to propose candidate image windows/patches to turn an unsupervised learning problem into a weakly-supervised learning problem. In the paper, we propose an algorithm for simultaneously localizing objects and discovering object classes via bottom-up (saliency-guided) multiple class learning (bMCL). Our contributions are three-fold: (1) we adopt saliency detection to convert unsupervised learning into multiple instance learning, formulated as bottom-up multiple class learning (bMCL); (2) we propose an integrated framework that simultaneously performs object localization, object class discovery, and object detector training; (3) we demonstrate that our framework yields significant improvements over existing methods for multi-class object discovery and possesses evident advantages over competing methods in computer vision. In addition, although saliency detection has recently attracted much attention, its practical usage for high-level vision tasks has yet to be justified. Our method validates the usefulness of saliency detection to output "noisy input" for a top-down method to extract common patterns.

    View details for DOI 10.1109/TPAMI.2014.2353617

    View details for Web of Science ID 000351213400012

    View details for PubMedID 26353299

  • Deep Multiple Instance Learning for Image Classification and Auto-Annotation Wu, J., Yu, Y., Huang, C., Yu, K. IEEE. 2015: 3460–69
  • MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation Wu, J., Zhao, Y., Zhu, J., Luo, S., Tu, Z. IEEE. 2014: 256–63
  • Harvesting Mid-level Visual Concepts from Large-scale Internet Images Li, Q., Wu, J., Tu, Z. IEEE. 2013: 851–58
  • A classification approach to coreference in discharge summaries: 2011 i2b2 challenge JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Xu, Y., Liu, J., Wu, J., Wang, Y., Tu, Z., Sun, J., Tsujii, J., Chang, E. 2012; 19 (5): 897–905

    Abstract

    To create a highly accurate coreference system in discharge summaries for the 2011 i2b2 challenge. The coreference categories include Person, Problem, Treatment, and Test. An integrated coreference resolution system was developed by exploiting Person attributes, contextual semantic clues, and world knowledge. It includes three subsystems: Person coreference system based on three Person attributes, Problem/Treatment/Test system based on numerous contextual semantic extractors and world knowledge, and Pronoun system based on a multi-class support vector machine classifier. The three Person attributes are patient, relative and hospital personnel. Contextual semantic extractors include anatomy, position, medication, indicator, temporal, spatial, section, modifier, equipment, operation, and assertion. The world knowledge is extracted from external resources such as Wikipedia. Micro-averaged precision, recall and F-measure in MUC, BCubed and CEAF were used to evaluate results. The system achieved an overall micro-averaged precision, recall and F-measure of 0.906, 0.925, and 0.915, respectively, on test data (from four hospitals) released by the challenge organizers. It achieved a precision, recall and F-measure of 0.905, 0.920 and 0.913, respectively, on test data without Pittsburgh data. We ranked first out of 20 competing teams. Among the four sub-tasks on Person, Problem, Treatment, and Test, the highest F-measure was seen for Person coreference. This system achieved encouraging results. The Person system can determine whether personal pronouns and proper names are coreferent or not. The Problem/Treatment/Test system benefits from both world knowledge in evaluating the similarity of two mentions and contextual semantic extractors in identifying semantic clues. The Pronoun system can automatically detect whether a Pronoun mention is coreferent to that of the other four types. This study demonstrates that it is feasible to accomplish the coreference task in discharge summaries.

    View details for DOI 10.1136/amiajnl-2011-000734

    View details for Web of Science ID 000307934600030

    View details for PubMedID 22505762

    View details for PubMedCentralID PMC3422828

  • Unsupervised Object Class Discovery via Saliency-Guided Multiple Class Learning Zhu, J., Wu, J., Wei, Y., Chang, E., Tu, Z. IEEE. 2012: 3218–25