Jiajun Wu
Assistant Professor of Computer Science and, by courtesy, of Psychology
Web page: https://jiajunwu.com
Bio
Jiajun Wu is an Assistant Professor of Computer Science and, by courtesy, of Psychology at Stanford University, working on computer vision, machine learning, and computational cognitive science. Before joining Stanford, he was a Visiting Faculty Researcher at Google Research. He received his PhD in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. Wu's research has been recognized through the Young Investigator Programs (YIP) by ONR and by AFOSR, the NSF CAREER award, the Okawa research grant, paper awards and finalists at ICCV, CVPR, SIGGRAPH Asia, CoRL, and IROS, dissertation awards from ACM, AAAI, and MIT, the 2020 Samsung AI Researcher of the Year, and faculty research awards from J.P. Morgan, Samsung, Amazon, and Meta.
Academic Appointments
-
Assistant Professor, Computer Science
-
Assistant Professor (By courtesy), Psychology
-
Member, Bio-X
-
Faculty Affiliate, Institute for Human-Centered Artificial Intelligence (HAI)
-
Member, Wu Tsai Neurosciences Institute
Honors & Awards
-
Research Grant, Okawa Foundation (2024)
-
CAREER Award, NSF (2024)
-
Young Investigator Program (YIP), ONR (2024)
-
Innovators Under 35 Asia Pacific, MIT Technology Review (2024)
-
Young Investigator Program (YIP), AFOSR (2023)
-
Best Paper Award, SIGGRAPH Asia, ACM (2023)
-
Best Systems Paper Award, CoRL (2023)
-
Best Paper Award Finalist, ICCV, IEEE/CVF (2023)
-
Best Paper Award Candidate, CVPR, IEEE/CVF (2023)
-
Global Research Outreach (GRO) Award, Samsung (2023)
-
New Faculty Highlights, AAAI (2023)
-
Best Paper Award Nominee, CoRL (2022)
-
Faculty Research Award, J.P. Morgan (2022)
-
30 Under 30, Science, Forbes (2022)
-
Early Career Professor Award Finalist, Agilent (2022)
-
Research Award, Meta (2021)
-
Research Award, Amazon (2021)
-
AI Researcher of the Year, Samsung (2020)
-
Global Research Outreach (GRO) Award, Samsung (2020)
-
George M. Sprowls PhD Thesis Award in Artificial Intelligence and Decision-Making, MIT (2020)
-
Doctoral Dissertation Award Honorable Mention, ACM (2019)
-
Dissertation Award, AAAI/ACM SIGAI (2019)
-
PhD Fellowship, Facebook (2017–2019)
-
Best Paper Award on Cognitive Robotics, IROS, IEEE/RSJ (2018)
-
PhD Fellowship, Samsung (2016–2017)
-
Graduate Fellowship, Nvidia (2016–2017)
-
Research Fellowship, Adobe (2015)
-
Edwin S. Webster Fellowship, MIT (2014)
Program Affiliations
-
Symbolic Systems Program
Professional Education
-
Ph.D., MIT, EECS (2020)
-
S.M., MIT, EECS (2016)
2024-25 Courses
- Minds and Machines
CS 24, LINGUIST 35, PHIL 99, PSYCH 35, SYMSYS 1, SYMSYS 200 (Win)
Independent Studies (16)
- Advanced Reading and Research
CS 499 (Aut, Win, Spr, Sum)
- Advanced Reading and Research
CS 499P (Aut, Win, Spr, Sum)
- Curricular Practical Training
CS 390A (Aut, Win, Spr, Sum)
- Curricular Practical Training
CS 390B (Aut, Win, Spr, Sum)
- Curricular Practical Training
CS 390C (Aut, Win, Spr, Sum)
- Graduate Research
NEPR 399 (Aut, Win, Spr, Sum)
- Independent Project
CS 399 (Aut, Win, Spr, Sum)
- Independent Project
CS 399P (Aut, Win, Spr, Sum)
- Independent Study
SYMSYS 296 (Aut, Win, Spr, Sum)
- Independent Work
CS 199 (Aut, Win, Spr, Sum)
- Independent Work
CS 199P (Aut, Win, Spr, Sum)
- Part-time Curricular Practical Training
CS 390D (Aut, Win, Spr, Sum)
- Programming Service Project
CS 192 (Aut, Win, Spr, Sum)
- Senior Project
CS 191 (Aut, Win, Spr, Sum)
- Supervised Undergraduate Research
CS 195 (Aut, Win, Spr, Sum)
- Writing Intensive Senior Research Project
CS 191W (Aut, Win, Spr)
Prior Year Courses
2023-24 Courses
- Computer Graphics in the Era of AI
CS 348I (Win)
- Minds and Machines
CS 24, LINGUIST 35, PHIL 99, PSYCH 35, SYMSYS 1, SYMSYS 200 (Win)
2022-23 Courses
- Minds and Machines
CS 24, LINGUIST 35, PHIL 99, PSYCH 35, SYMSYS 1, SYMSYS 200 (Aut)
2021-22 Courses
- Computer Graphics in the Era of AI
CS 348I (Aut)
- Deep Learning for Computer Vision
CS 231N (Spr)
- Triangulating Intelligence: Melding Neuroscience, Psychology, and AI
CS 322, PSYCH 225 (Win)
Stanford Advisees
-
Doctoral Dissertation Reader (AC)
Honglin Chen, Rastko Ciric, Zipeng Fu, Keenon Werling, Josiah Wong
-
Postdoctoral Faculty Sponsor
Weiyu Liu, Elliott Wu, Mengdi Xu
-
Orals Evaluator
Honglin Chen
-
Doctoral Dissertation Advisor (AC)
Michael Lingelbach
-
Master's Program Advisor
Aditya Bora, Tim Chen, Emma Escandon, Koren Gilbai, Emily Jin, Tarun Kumar Martheswaran, Ananjan Nandi, Nikil Ravi, Arvind Saligrama, Bhavna Sud, Jeremy Tian, Ethan Tiao, Chuyi Zhang, Paris Zhang, Frank Zhao, Fangjun Zhou
-
Doctoral Dissertation Co-Advisor (AC)
Eric Chan, Cristobal Eyzaguirre, Michelle Guo, Kyle Hsu, Jiaman Li, Kyle Sargent, Fan-Yun Sun
-
Doctoral (Program)
Samuel Clarke, Chen Geng, Joy Hsu, Stephen Tian, Koven Yu, Yunzhi Zhang
-
Postdoctoral Research Mentor
Stefan Stojanov
All Publications
-
Physical scene understanding
AI MAGAZINE
2024
View details for DOI 10.1002/aaai.12148
View details for Web of Science ID 001158170300001
-
Neurosymbolic Models for Computer Graphics
COMPUTER GRAPHICS FORUM
2023; 42 (2): 545-568
View details for DOI 10.1111/cgf.14775
View details for Web of Science ID 001000062600040
-
REALIMPACT: A Dataset of Impact Sound Fields for Real Objects
IEEE COMPUTER SOC. 2023: 1516-1525
View details for DOI 10.1109/CVPR52729.2023.00152
View details for Web of Science ID 001058542601079
-
Seeing a Rose in Five Thousand Ways
IEEE COMPUTER SOC. 2023: 962-971
View details for DOI 10.1109/CVPR52729.2023.00099
View details for Web of Science ID 001058542601026
-
OBJECTFOLDER 2.0: A Multisensory Object Dataset for Sim2Real Transfer
IEEE COMPUTER SOC. 2022: 10588-10598
View details for DOI 10.1109/CVPR52688.2022.01034
View details for Web of Science ID 000870759103065
-
3D Shape Generation and Completion through Point-Voxel Diffusion
IEEE. 2021: 5806-5815
View details for DOI 10.1109/ICCV48922.2021.00577
View details for Web of Science ID 000797698906004
-
Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
2019; 41 (9): 2236–50
Abstract
We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods that have tackled this problem in a deterministic or non-parametric way, we propose to model future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. To synthesize realistic movement of objects, we propose a novel network structure, namely a Cross Convolutional Network; this network encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, and on real-world video frames. We present analyses of the learned network representations, showing it is implicitly learning a compact encoding of object appearance and motion. We also demonstrate a few of its applications, including visual analogy-making and video extrapolation.
View details for DOI 10.1109/TPAMI.2018.2854726
View details for Web of Science ID 000480343900014
View details for PubMedID 30004870
-
Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2016
View details for Web of Science ID 000458973700060
-
Galileo: Perceiving Physical Object Properties by Integrating a Physics Engine with Deep Learning
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2015
View details for Web of Science ID 000450913101040
-
3D Congealing: 3D-Aware Image Alignment in the Wild
SPRINGER INTERNATIONAL PUBLISHING AG. 2025: 387-404
View details for DOI 10.1007/978-3-031-73232-4_22
View details for Web of Science ID 001346378300022
-
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos
SPRINGER INTERNATIONAL PUBLISHING AG. 2025: 100-119
View details for DOI 10.1007/978-3-031-73232-4_6
View details for Web of Science ID 001346378300006
-
Foundation models in robotics: Applications, challenges, and the future
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH
2024
View details for DOI 10.1177/02783649241281508
View details for Web of Science ID 001319814900001
-
Partial-View Object View Synthesis via Filtering Inversion
IEEE COMPUTER SOC. 2024: 453-463
View details for DOI 10.1109/3DV62453.2024.00105
View details for Web of Science ID 001250581700033
-
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2024: 5805-5813
View details for Web of Science ID 001239936300079
-
CityPulse: Fine-Grained Assessment of Urban Change with Street View Time Series
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2024: 22123-22131
View details for Web of Science ID 001239985800028
-
DiffSound: Differentiable Modal Sound Rendering and Inverse Rendering for Diverse Inference Tasks
ASSOC COMPUTING MACHINERY. 2024
View details for DOI 10.1145/3641519.3657493
View details for Web of Science ID 001282218200099
-
RoboCraft: Learning to see, simulate, and shape elasto-plastic objects in 3D with graph networks
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH
2023
View details for DOI 10.1177/02783649231219020
View details for Web of Science ID 001126259900001
-
Object Motion Guided Human Motion Synthesis
ACM TRANSACTIONS ON GRAPHICS
2023; 42 (6)
View details for DOI 10.1145/3618333
View details for Web of Science ID 001139790400025
-
Fluid Simulation on Neural Flow Maps
ACM TRANSACTIONS ON GRAPHICS
2023; 42 (6)
View details for DOI 10.1145/3618392
View details for Web of Science ID 001139790400076
-
Editing Motion Graphics Video via Motion Vectorization and Transformation
ACM TRANSACTIONS ON GRAPHICS
2023; 42 (6)
View details for DOI 10.1145/3618316
View details for Web of Science ID 001139790400057
-
Differentiable Physics Simulation of Dynamics-Augmented Neural Objects
IEEE ROBOTICS AND AUTOMATION LETTERS
2023; 8 (5): 2780-2787
View details for DOI 10.1109/LRA.2023.3257707
View details for Web of Science ID 000964797800002
-
Rendering Humans from Object-Occluded Monocular Videos
IEEE COMPUTER SOC. 2023: 3216-3227
View details for DOI 10.1109/ICCV51070.2023.00300
View details for Web of Science ID 001159644303043
-
Model-Based Control with Sparse Neural Dynamics
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001226352807027
-
Learning Rational Subgoals from Demonstrations and Instructions
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2023: 12068-12078
View details for Web of Science ID 001243749200063
-
Benchmarking Rigid Body Contact Models
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001221742900113
-
Learning to Design and Use Tools for Robotic Manipulation
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001221201500042
-
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001221201500025
-
NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001221201501042
-
Can Visual Scratchpads With Diagrammatic Abstractions Augment LLM Reasoning?
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023: 21-28
View details for Web of Science ID 001347141700002
-
Siamese Masked Autoencoders
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001228825108029
-
Compositional Diffusion-Based Continuous Constraint Solvers
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001221201503016
-
Learning Sequential Acquisition Policies for Robot-Assisted Feeding
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001221201501018
-
RoboCook: Long-Horizon Elasto-Plastic Object Manipulation with Diverse Tools
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001221201500029
-
Composable Part-Based Manipulation
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001221201501019
-
Inferring Hybrid Neural Fluid Fields from Videos
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001224281507019
-
Holistic Evaluation of Text-to-Image Models
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001224281504033
-
3D Neural Field Generation using Triplane Diffusion
IEEE COMPUTER SOC. 2023: 20875-20886
View details for DOI 10.1109/CVPR52729.2023.02000
View details for Web of Science ID 001062531305021
-
Disentanglement via Latent Quantization
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001220818801008
-
CIRCLE: Capture In Rich Contextual Environments
IEEE COMPUTER SOC. 2023: 21211-21221
View details for DOI 10.1109/CVPR52729.2023.02032
View details for Web of Science ID 001062531305053
-
Ego-Body Pose Estimation via Ego-Head Pose Estimation
IEEE COMPUTER SOC. 2023: 17142-17151
View details for DOI 10.1109/CVPR52729.2023.01644
View details for Web of Science ID 001062531301043
-
Primitive Skill-based Robot Learning from Human Evaluative Feedback
IEEE. 2023: 7817-7824
View details for DOI 10.1109/IROS55552.2023.10341912
View details for Web of Science ID 001136907802021
-
Stanford-ORB: A Real-World 3D Object Inverse Rendering Benchmark
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001229751906005
-
SOUNDCAM: A Dataset for Finding Humans Using Room Acoustics
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001229751904038
-
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001230083405016
-
STAP: Sequencing Task-Agnostic Policies
IEEE. 2023: 7951-7958
View details for DOI 10.1109/ICRA48891.2023.10160220
View details for Web of Science ID 001048371101041
-
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations
IEEE COMPUTER SOC. 2023: 2614-2623
View details for DOI 10.1109/CVPR52729.2023.00257
View details for Web of Science ID 001058542602090
-
ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding
IEEE COMPUTER SOC. 2023: 1179-1189
View details for DOI 10.1109/CVPR52729.2023.00120
View details for Web of Science ID 001058542601047
-
3D Copy-Paste: Physically Plausible Object Insertion for Monocular 3D Detection
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001202273400003
-
Accidental Light Probes
IEEE COMPUTER SOC. 2023: 12521-12530
View details for DOI 10.1109/CVPR52729.2023.01205
View details for Web of Science ID 001062522104081
-
Multi-Object Manipulation via Object-Centric Neural Scattering Functions
IEEE COMPUTER SOC. 2023: 9021-9031
View details for DOI 10.1109/CVPR52729.2023.00871
View details for Web of Science ID 001062522101031
-
The OBJECTFOLDER BENCHMARK: Multisensory Learning with Neural and Real Objects
IEEE COMPUTER SOC. 2023: 17276-17286
View details for DOI 10.1109/CVPR52729.2023.01657
View details for Web of Science ID 001062531301056
-
PyPose: A Library for Robot Learning with Physics-based Optimization
IEEE COMPUTER SOC. 2023: 22024-22034
View details for DOI 10.1109/CVPR52729.2023.02109
View details for Web of Science ID 001062531306035
-
Task-Driven Graph Attention for Hierarchical Relational Object Navigation
IEEE. 2023: 886-893
View details for DOI 10.1109/ICRA48891.2023.10161157
View details for Web of Science ID 001036713000053
-
SONICVERSE: A Multisensory Simulation Platform for Embodied Household Agents that See and Hear
IEEE. 2023: 704-711
View details for DOI 10.1109/ICRA48891.2023.10160461
View details for Web of Science ID 001036713000028
-
Putting People in Their Place: Affordance-Aware Human Insertion into Scenes
IEEE COMPUTER SOC. 2023: 17089-17099
View details for DOI 10.1109/CVPR52729.2023.01639
View details for Web of Science ID 001062531301038
-
Tree-Structured Shading Decomposition
IEEE COMPUTER SOC. 2023: 488-498
View details for DOI 10.1109/ICCV51070.2023.00051
View details for Web of Science ID 001159644300045
-
VQ3D: Learning a 3D-Aware Generative Model on ImageNet
IEEE COMPUTER SOC. 2023: 4217-4227
View details for DOI 10.1109/ICCV51070.2023.00391
View details for Web of Science ID 001159644304044
-
RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks
RSS FOUNDATION-ROBOTICS SCIENCE & SYSTEMS FOUNDATION. 2022
View details for Web of Science ID 000827625700008
-
Unsupervised Segmentation in Real-World Images via Spelke Object Inference
SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 719-735
View details for DOI 10.1007/978-3-031-19818-2_41
View details for Web of Science ID 000903735000041
-
Rotationally Equivariant 3D Object Detection
IEEE COMPUTER SOC. 2022: 1446-1454
View details for DOI 10.1109/CVPR52688.2022.00151
View details for Web of Science ID 000867754201068
-
Revisiting the "Video" in Video-Language Understanding
IEEE COMPUTER SOC. 2022: 2907-2917
View details for DOI 10.1109/CVPR52688.2022.00293
View details for Web of Science ID 000867754203017
-
Translating a Visual LEGO Manual to a Machine-Executable Plan
SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 677-694
View details for DOI 10.1007/978-3-031-19836-6_38
View details for Web of Science ID 000903756400038
-
Video Extrapolation in Space and Time
SPRINGER INTERNATIONAL PUBLISHING AG. 2022: 313-333
View details for DOI 10.1007/978-3-031-19787-1_18
View details for Web of Science ID 000904102900018
-
Scene Synthesis from Human Motion
ASSOC COMPUTING MACHINERY. 2022
View details for DOI 10.1145/3550469.3555426
View details for Web of Science ID 001074614400051
-
Programmatic Concept Learning for Human Motion Description and Synthesis
IEEE COMPUTER SOC. 2022: 13833-13842
View details for DOI 10.1109/CVPR52688.2022.01347
View details for Web of Science ID 000870759106090
-
Hierarchical Motion Understanding via Motion Programs
IEEE COMPUTER SOC. 2021: 6564-6572
View details for DOI 10.1109/CVPR46437.2021.00650
View details for Web of Science ID 000739917306077
-
Neural Radiance Flow for 4D View Synthesis and Video Processing
IEEE. 2021: 14304-14314
View details for DOI 10.1109/ICCV48922.2021.01406
View details for Web of Science ID 000798743204050
-
Learning Temporal Dynamics from Cycles in Narrated Video
IEEE. 2021: 1460-1469
View details for DOI 10.1109/ICCV48922.2021.00151
View details for Web of Science ID 000797698901064
-
Augmenting Policy Learning with Routines Discovered from a Single Demonstration
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2021: 11024-11032
View details for Web of Science ID 000681269802081
-
KeypointDeformer: Unsupervised 3D Keypoint Discovery for Shape Control
IEEE COMPUTER SOC. 2021: 12778-12787
View details for DOI 10.1109/CVPR46437.2021.01259
View details for Web of Science ID 000742075002096
-
De-rendering the World's Revolutionary Artefacts
IEEE COMPUTER SOC. 2021: 6334-6343
View details for DOI 10.1109/CVPR46437.2021.00627
View details for Web of Science ID 000739917306054
-
pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis
IEEE COMPUTER SOC. 2021: 5795-5805
View details for DOI 10.1109/CVPR46437.2021.00574
View details for Web of Science ID 000739917306001
-
Repopulating Street Scenes
IEEE COMPUTER SOC. 2021: 5106-5115
View details for DOI 10.1109/CVPR46437.2021.00507
View details for Web of Science ID 000739917305031
-
Learning Generative Models of 3D Structures
WILEY. 2020: 643–66
View details for DOI 10.1111/cgf.14020
View details for Web of Science ID 000548709600052
-
End-to-End Optimization of Scene Layout
IEEE. 2020: 3753–62
View details for DOI 10.1109/CVPR42600.2020.00381
View details for Web of Science ID 000620679504003
-
DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs
IJCAI-INT JOINT CONF ARTIF INTELL. 2020: 4190-4198
View details for Web of Science ID 000764196704043
-
Accurate Vision-based Manipulation through Contact Reasoning
IEEE. 2020: 6738-6744
View details for Web of Science ID 000712319504073
-
Visual Grounding of Learned Physical Models
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2020
View details for Web of Science ID 000683178506006
-
Perspective Plane Program Induction from a Single Image
IEEE. 2020: 4433–42
View details for DOI 10.1109/CVPR42600.2020.00449
View details for Web of Science ID 000620679504071
-
Video Enhancement with Task-Oriented Flow
INTERNATIONAL JOURNAL OF COMPUTER VISION
2019; 127 (8): 1106–25
View details for DOI 10.1007/s11263-018-01144-2
View details for Web of Science ID 000474559000008
-
An integrative computational architecture for object-driven cortex
CURRENT OPINION IN NEUROBIOLOGY
2019; 55: 73–81
Abstract
Objects in motion activate multiple cortical regions in every lobe of the human brain. Do these regions represent a collection of independent systems, or is there an overarching functional architecture spanning all of object-driven cortex? Inspired by recent work in artificial intelligence (AI), machine learning, and cognitive science, we consider the hypothesis that these regions can be understood as a coherent network implementing an integrative computational system that unifies the functions needed to perceive, predict, reason about, and plan with physical objects, as in the paradigmatic case of using or making tools. Our proposal draws on a modeling framework that combines multiple AI methods, including causal generative models, hybrid symbolic-continuous planning algorithms, and neural recognition networks, with object-centric, physics-based representations. We review evidence relating specific components of our proposal to the specific regions that comprise object-driven cortex, and lay out future research directions with the goal of building a complete functional and mechanistic account of this system.
View details for DOI 10.1016/j.conb.2019.01.010
View details for Web of Science ID 000472127600011
View details for PubMedID 30825704
View details for PubMedCentralID PMC6548583
-
See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion
SCIENCE ROBOTICS
2019; 4 (26)
View details for DOI 10.1126/scirobotics.aav3123
View details for Web of Science ID 000458560100005
-
Visual Concept-Metaconcept Learning
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2019
View details for Web of Science ID 000534424305005
-
Combining Physical Simulators and Object-Based Networks for Control
IEEE. 2019: 3217–23
View details for Web of Science ID 000494942302054
-
Propagation Networks for Model-Based Control Under Partial Observation
IEEE. 2019: 1205–11
View details for Web of Science ID 000494942300127
-
ChainQueen: A Real-Time Differentiable Physical Simulator for Soft Robotics
IEEE. 2019: 6265–71
View details for Web of Science ID 000494942304086
-
Program-Guided Image Manipulators
IEEE COMPUTER SOC. 2019: 4029–38
View details for DOI 10.1109/ICCV.2019.00413
View details for Web of Science ID 000531438104018
-
Modeling Expectation Violation in Intuitive Physics with Coarse Probabilistic Object Representations
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2019
View details for Web of Science ID 000535866900056
-
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
INTERNATIONAL JOURNAL OF COMPUTER VISION
2018; 126 (10): 1120–37
View details for DOI 10.1007/s11263-018-1083-5
View details for Web of Science ID 000443018400004
-
3D Interpreter Networks for Viewer-Centered Wireframe Modeling
INTERNATIONAL JOURNAL OF COMPUTER VISION
2018; 126 (9): 1009–26
View details for DOI 10.1007/s11263-018-1074-6
View details for Web of Science ID 000441553300008
-
Augmenting Physical Simulators with Stochastic Neural Networks: Case Study of Planar Pushing and Bouncing
IEEE. 2018: 3066–73
View details for Web of Science ID 000458872702129
-
Unsupervised Learning of Latent Physical Properties Using Perception-Prediction Networks
AUAI PRESS. 2018: 497–507
View details for Web of Science ID 000493119200049
-
Physical Primitive Decomposition
SPRINGER INTERNATIONAL PUBLISHING AG. 2018: 3–20
View details for DOI 10.1007/978-3-030-01258-8_1
View details for Web of Science ID 000604449400001
-
Seeing Tree Structure from Vibration
SPRINGER INTERNATIONAL PUBLISHING AG. 2018: 762–79
View details for DOI 10.1007/978-3-030-01240-3_46
View details for Web of Science ID 000594233000046
-
MoSculp: Interactive Visualization of Shape and Time
ASSOC COMPUTING MACHINERY. 2018: 275–85
View details for DOI 10.1145/3242587.3242592
View details for Web of Science ID 000494260500025
-
Learning to Reconstruct Shapes from Unseen Classes
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
View details for Web of Science ID 000461823302028
-
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
IEEE. 2018: 7834–43
View details for DOI 10.1109/CVPR.2018.00817
View details for Web of Science ID 000457843607102
-
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
IEEE. 2018: 2974–83
View details for DOI 10.1109/CVPR.2018.00314
View details for Web of Science ID 000457843603012
-
3D Shape Perception from Monocular Vision, Touch, and Shape Priors
IEEE. 2018: 1606–13
View details for Web of Science ID 000458872701105
-
3D-Aware Scene Manipulation via Inverse Graphics
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
View details for Web of Science ID 000461823301084
-
Visual Object Networks: Image Generation with Disentangled 3D Representation
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
View details for Web of Science ID 000461823300012
-
Learning to Exploit Stability for 3D Scene Parsing
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
View details for Web of Science ID 000461823301069
-
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
View details for Web of Science ID 000461823301006
-
Learning to See Physics via Visual De-animation
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
View details for Web of Science ID 000452649400015
-
MarrNet: 3D Shape Reconstruction via 2.5D Sketches
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
View details for Web of Science ID 000452649400052
-
Synthesizing 3D Shapes via Modeling Multi-View Depth Maps and Silhouettes with Deep Generative Networks
IEEE. 2017: 2511–19
View details for DOI 10.1109/CVPR.2017.269
View details for Web of Science ID 000418371402061
-
Neural Scene De-rendering
IEEE. 2017: 7035–43
View details for DOI 10.1109/CVPR.2017.744
View details for Web of Science ID 000418371407015
-
Raster-to-Vector: Revisiting Floorplan Transformation
IEEE. 2017: 2214–22
View details for DOI 10.1109/ICCV.2017.241
View details for Web of Science ID 000425498402029
-
Generative Modeling of Audible Shapes for Object Perception
IEEE. 2017: 1260–69
View details for DOI 10.1109/ICCV.2017.141
View details for Web of Science ID 000425498401034
-
Shape and Material from Sound
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
View details for Web of Science ID 000452649401031
-
Self-Supervised Intrinsic Image Decomposition
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
View details for Web of Science ID 000452649406002
-
Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2016
View details for Web of Science ID 000458973704076
-
Single Image 3D Interpreter Network
SPRINGER INT PUBLISHING AG. 2016: 365–82
View details for DOI 10.1007/978-3-319-46466-4_22
View details for Web of Science ID 000389499900022
-
Ambient Sound Provides Supervision for Visual Learning
SPRINGER INTERNATIONAL PUBLISHING AG. 2016: 801–16
View details for DOI 10.1007/978-3-319-46448-0_48
View details for Web of Science ID 000389382700048
-
Unsupervised Object Class Discovery via Saliency-Guided Multiple Class Learning
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
2015; 37 (4): 862–75
Abstract
In this paper, we tackle the problem of common object (multiple classes) discovery from a set of input images, where we assume the presence of one object class in each image. This problem is, loosely speaking, unsupervised since we do not know a priori about the object type, location, and scale in each image. We observe that the general task of object class discovery in a fully unsupervised manner is intrinsically ambiguous; here we adopt saliency detection to propose candidate image windows/patches to turn an unsupervised learning problem into a weakly-supervised learning problem. In the paper, we propose an algorithm for simultaneously localizing objects and discovering object classes via bottom-up (saliency-guided) multiple class learning (bMCL). Our contributions are three-fold: (1) we adopt saliency detection to convert unsupervised learning into multiple instance learning, formulated as bottom-up multiple class learning (bMCL); (2) we propose an integrated framework that simultaneously performs object localization, object class discovery, and object detector training; (3) we demonstrate that our framework yields significant improvements over existing methods for multi-class object discovery and possesses evident advantages over competing methods in computer vision. In addition, although saliency detection has recently attracted much attention, its practical usage for high-level vision tasks has yet to be justified. Our method validates the usefulness of saliency detection to output "noisy input" for a top-down method to extract common patterns.
View details for DOI 10.1109/TPAMI.2014.2353617
View details for Web of Science ID 000351213400012
View details for PubMedID 26353299
-
Deep Multiple Instance Learning for Image Classification and Auto-Annotation
IEEE. 2015: 3460–69
View details for Web of Science ID 000387959203053
-
MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation
IEEE. 2014: 256–63
View details for DOI 10.1109/CVPR.2014.40
View details for Web of Science ID 000361555600033
-
Harvesting Mid-level Visual Concepts from Large-scale Internet Images
IEEE. 2013: 851–58
View details for DOI 10.1109/CVPR.2013.115
View details for Web of Science ID 000331094300108
-
A classification approach to coreference in discharge summaries: 2011 i2b2 challenge
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION
2012; 19 (5): 897–905
Abstract
To create a highly accurate coreference system in discharge summaries for the 2011 i2b2 challenge. The coreference categories include Person, Problem, Treatment, and Test. An integrated coreference resolution system was developed by exploiting Person attributes, contextual semantic clues, and world knowledge. It includes three subsystems: Person coreference system based on three Person attributes, Problem/Treatment/Test system based on numerous contextual semantic extractors and world knowledge, and Pronoun system based on a multi-class support vector machine classifier. The three Person attributes are patient, relative and hospital personnel. Contextual semantic extractors include anatomy, position, medication, indicator, temporal, spatial, section, modifier, equipment, operation, and assertion. The world knowledge is extracted from external resources such as Wikipedia. Micro-averaged precision, recall and F-measure in MUC, BCubed and CEAF were used to evaluate results. The system achieved an overall micro-averaged precision, recall and F-measure of 0.906, 0.925, and 0.915, respectively, on test data (from four hospitals) released by the challenge organizers. It achieved a precision, recall and F-measure of 0.905, 0.920 and 0.913, respectively, on test data without Pittsburgh data. We ranked first out of 20 competing teams. Among the four sub-tasks on Person, Problem, Treatment, and Test, the highest F-measure was seen for Person coreference. This system achieved encouraging results. The Person system can determine whether personal pronouns and proper names are coreferent or not. The Problem/Treatment/Test system benefits from both world knowledge in evaluating the similarity of two mentions and contextual semantic extractors in identifying semantic clues. The Pronoun system can automatically detect whether a Pronoun mention is coreferent to that of the other four types.
This study demonstrates that it is feasible to accomplish the coreference task in discharge summaries.
View details for DOI 10.1136/amiajnl-2011-000734
View details for Web of Science ID 000307934600030
View details for PubMedID 22505762
View details for PubMedCentralID PMC3422828
-
Unsupervised Object Class Discovery via Saliency-Guided Multiple Class Learning
IEEE. 2012: 3218–25
View details for Web of Science ID 000309166203049