Bio


Silvio Savarese is an Associate Professor of Computer Science at Stanford University and director of the SAIL-Toyota Center for AI Research at Stanford. He earned his Ph.D. in Electrical Engineering from the California Institute of Technology in 2005 and was a Beckman Institute Fellow at the University of Illinois at Urbana-Champaign from 2005 to 2008. He joined Stanford in 2013 after serving as Assistant and then Associate Professor of Electrical and Computer Engineering at the University of Michigan, Ann Arbor, from 2008 to 2013. His research interests include computer vision, robotic perception and machine learning. He is the recipient of several awards, including a Best Student Paper Award at CVPR 2016, the James R. Croes Medal in 2013, a TRW Automotive Endowed Research Award in 2012, an NSF Career Award in 2011 and a Google Research Award in 2010. In 2002 he was awarded the Walker von Brimer Award for outstanding research initiative.

Administrative Appointments


  • Mindtree Faculty Scholar, Mindtree (2018 - Present)

All Publications


  • Watch-n-Patch: Unsupervised Learning of Actions and Relations IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Wu, C., Zhang, J., Sener, O., Selman, B., Savarese, S., Saxena, A. 2018; 40 (2): 467–81

    Abstract

    There is a large variation in the activities that humans perform in their everyday lives. We consider modeling these composite human activities, which comprise multiple basic-level actions, in a completely unsupervised setting. Our model learns high-level co-occurrence and temporal relations between the actions. We consider the video as a sequence of short-term action clips, which contain human-words and object-words. An activity is about a set of action-topics and object-topics indicating which actions are present and which objects are being interacted with. We then propose a new probabilistic model relating the words and the topics. It allows us to model long-range action relations that commonly exist in the composite activities, which was challenging in previous works. We apply our model to unsupervised action segmentation and clustering, and to a novel application that detects forgotten actions, which we call action patching. For evaluation, we contribute a new challenging RGB-D activity video dataset recorded by the new Kinect v2, which contains several human daily activities as compositions of multiple actions interacting with different objects. Moreover, we develop a robotic system that watches and reminds people using our action patching algorithm. Our robotic setup can be easily deployed on any assistive robot.

    View details for DOI 10.1109/TPAMI.2017.2679054

    View details for Web of Science ID 000422706000015

    View details for PubMedID 28287959

  • Tracking The Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies Sadeghian, A., Alahi, A., Savarese, S. IEEE. 2017: 300–311
  • Adversarially Robust Policy Learning: Active Construction of Physically-Plausible Perturbations Mandlekar, A., Zhu, Y., Garg, A., Fei-Fei, L., Savarese, S., Bicchi, A., Okamura, A. IEEE. 2017: 3932–39
  • Deep View Morphing Ji, D., Kwon, J., McFarland, M., Savarese, S. IEEE. 2017: 7092–7100
  • Feedback Networks Zamir, A. R., Wu, T., Sun, L., Shen, W. B., Shi, B. E., Malik, J., Savarese, S. IEEE. 2017: 1808–17
  • Lattice Long Short-Term Memory for Human Action Recognition Sun, L., Jia, K., Chen, K., Yeung, D., Shi, B. E., Savarese, S. IEEE. 2017: 2166–75
  • Robust real-time tracking combining 3D shape, color, and motion INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH Held, D., Levinson, J., Thrun, S., Savarese, S. 2016; 35 (1-3): 30-49
  • Automatic Extrinsic Calibration of Vision and Lidar by Maximizing Mutual Information JOURNAL OF FIELD ROBOTICS Pandey, G., McBride, J. R., Savarese, S., Eustice, R. M. 2015; 32 (5): 696-722

    View details for DOI 10.1002/rob.21542

    View details for Web of Science ID 000358016200005

  • Indoor Scene Understanding with Geometric and Semantic Contexts INTERNATIONAL JOURNAL OF COMPUTER VISION Choi, W., Chao, Y., Pantofaru, C., Savarese, S. 2015; 112 (2): 204-220
  • Relating Things and Stuff via Object Property Interactions IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Sun, M., Kim, B., Kohli, P., Savarese, S. 2014; 36 (7): 1370-1383

    Abstract

    In the last few years, substantially different approaches have been adopted for segmenting and detecting "things" (object categories that have a well defined shape such as people and cars) and "stuff" (object categories which have an amorphous spatial extent such as grass and sky). While things have been typically detected by sliding window or Hough transform based methods, detection of stuff is generally formulated as a pixel or segment-wise classification problem. This paper proposes a framework for scene understanding that models both things and stuff using a common representation while preserving their distinct nature by using a property list. This representation allows us to enforce sophisticated geometric and semantic relationships between thing and stuff categories via property interactions in a single graphical model. We use the latest advances made in the field of discrete optimization to efficiently perform maximum a posteriori (MAP) inference in this model. We evaluate our method on the Stanford dataset by comparing it against state-of-the-art methods for object segmentation and detection. We also show that our method achieves competitive performances on the challenging PASCAL '09 segmentation dataset.

    View details for DOI 10.1109/TPAMI.2013.193

    View details for PubMedID 26353309

  • Understanding Collective Activities of People from Videos IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Choi, W., Savarese, S. 2014; 36 (6): 1242-1257
  • A Bayesian generative model for learning semantic hierarchies FRONTIERS IN PSYCHOLOGY Mittelman, R., Sun, M., Kuipers, B., Savarese, S. 2014; 5

    Abstract

    Building fine-grained visual recognition systems that are capable of recognizing tens of thousands of categories has received much attention in recent years. The well-known semantic hierarchical structure of categories and concepts has been shown to provide a key prior which allows for optimal predictions. The hierarchical organization of various domains and concepts has been subject to extensive research, and has led to the development of the WordNet domains hierarchy (Fellbaum, 1998), which was also used to organize the images in the ImageNet (Deng et al., 2009) dataset, in which the category count approaches the human capacity. Still, for the human visual system, the form of the hierarchy must be discovered with minimal use of supervision or innate knowledge. In this work, we propose a new Bayesian generative model for learning such domain hierarchies, based on semantic input. Our model is motivated by the super-subordinate organization of domain labels and concepts that characterizes WordNet, and accounts for several important challenges: maintaining context information when progressing deeper into the hierarchy, learning a coherent semantic concept for each node, and modeling uncertainty in the perception process.

    View details for DOI 10.3389/fpsyg.2014.00417

    View details for Web of Science ID 000336085600001

    View details for PubMedID 24904452

    View details for PubMedCentralID PMC4033064

  • Beyond PASCAL: A Benchmark for 3D Object Detection in the Wild Xiang, Y., Mottaghi, R., Savarese, S. 2014
  • Monocular Multiview Object Tracking with 3D Aspect Parts 13th European Conference on Computer Vision (ECCV) Xiang, Y., Song, C., Mottaghi, R., Savarese, S. SPRINGER-VERLAG BERLIN. 2014: 220–235
  • A Hierarchical Representation for Future Action Prediction 13th European Conference on Computer Vision (ECCV) Lan, T., Chen, T., Savarese, S. SPRINGER INT PUBLISHING AG. 2014: 689–704
  • Layout Estimation of Highly Cluttered Indoor Scenes using Geometric and Semantic Cues Chao, Y. -W., Choi, W., Pantofaru, C., Savarese, S. 2013
  • Find the Best Path: an Efficient and Accurate Classifier for Image Hierarchies IEEE International Conference on Computer Vision (ICCV) Sun, M., Huang, W., Savarese, S. IEEE. 2013: 265–272
  • 3D Scene Understanding by Voxel-CRF IEEE International Conference on Computer Vision (ICCV) Kim, B., Kohli, P., Savarese, S. IEEE. 2013: 1425–1432
  • Breaking the chain: liberation from the temporal Markov assumption for tracking human poses IEEE International Conference on Computer Vision (ICCV) Tokola, R., Choi, W., Savarese, S. IEEE. 2013: 2424–2431
  • Object Detection by 3D Aspectlets and Occlusion Reasoning IEEE International Conference on Computer Vision Workshops (ICCVW) Xiang, Y., Savarese, S. IEEE. 2013: 530–537
  • Free your Camera: 3D Indoor Scene Understanding from Arbitrary Camera Motion 24th British Machine Vision Conference Furlan, A., Miller, S., Sorrenti, D. G., Fei-Fei, L., Savarese, S. B M V A PRESS. 2013

    View details for DOI 10.5244/C.27.24

    View details for Web of Science ID 000346352700021

  • Accurate Localization of 3D Objects from RGB-D Data using Segmentation Hypotheses Kim, B., Xu, S., Savarese, S. 2013
  • Dense Object Reconstruction Using Semantic Priors Bao, Y., Chandraker, M., Lin, Y., Savarese, S. 2013
  • Learning Hierarchical Linguistic Descriptions of Visual Datasets NAACL-HLT Workshop on Vision and Language Mittelman, R., Sun, M., Kuipers, B., Savarese, S. 2013
  • Understanding Indoor Scenes using 3D Geometric Phrases Choi, W., Chao, Y. -W., Pantofaru, C., Savarese, S. 2013
  • Weakly Supervised Learning of Mid-Level Features with Beta-Bernoulli Process Restricted Boltzmann Machines Mittelman, R., Lee, H., Kuipers, B., Savarese, S. 2013
  • Recognizing Complex Human Activities via Crowd Context Augmented Vision and Reality Choi, W., Savarese, S. Springer. 2013: 1
  • Object Detection by 3D Aspectlets and Occlusion Reasoning in the 4th International IEEE Workshop on 3D Representation and Recognition (3dRR) Xiang, Y., Savarese, S. 2013
  • Label Transfer Exploiting Three-dimensional Structure for Semantic Segmentation Garro, V., Fusiello, A., Savarese, S. 2013
  • Object detection, shape recovery, and 3D modelling by depth-encoded hough voting in Computer Vision and Image Understanding (CVIU) Sun, M., Kumar, S. S., Bradski, G., Savarese, S. 2013
  • Automatic targetless extrinsic calibration of a 3d lidar and camera by maximizing mutual information Pandey, G., McBride, J. R., Savarese, S., Eustice, R. M. 2012
  • Mobile Object Detection through Client-Server based Vote Transfer Kumar, S., Sun, M., Savarese, S. 2012
  • Multimodality video indexing and retrieval using directed information IEEE Transactions on Multimedia Chen, X., Hero, A., Savarese, S. 2012; 14 (1)
  • Object Detection using Geometrical Context Feedback International Journal of Computer Vision Sun, M., Bao, S. Y., Savarese, S. 2012; 2
  • An Efficient Branch-and-Bound Algorithm for Optimal Human Pose Estimation Sun, M., Telaprolu, M., Lee, H., Savarese, S. 2012
  • Relating Things and Stuff by High-Order Potential Modeling ECCV 2012 Workshop on Higher-Order Models and Global Constraints in Computer Vision (HiPot). Bao, Y., Xiang, Y., Savarese, S. 2012
  • Estimating the Aspect Layout of Object Categories Xiang, Y., Savarese, S. 2012
  • Structure From Motion with Points, Objects, and Regions Bao, Y., Bagra, M., Chao, Y. -W., Savarese, S. 2012
  • Toward Mutual Information based Automatic Registration of 3D Point Clouds Pandey, G., McBride, J., Savarese, S., Eustice, R. 2012
  • A Unified Framework for Multi-Target Tracking and Collective Activity Recognition Choi, W., Savarese, S. 2012
  • Model-based object recognition Encyclopedia of Computer Vision Sun, M., Savarese, S. Springer. 2012: 1
  • Object Co-detection Bao, Y., Xiang, Y., Savarese, S. 2012
  • 3D Shape from Specular Reflections Encyclopedia of Computer Vision Savarese, S. Springer. 2012: 1
  • MVSS: Michigan Visual Sonification System Clemons, J., Bao, Y., Sharma, V., Savarese, S., Austin, T. 2012
  • Scene Understanding for the Visually Impaired Using Visual Sonification by Visual Feature Analysis and Auditory Signature Clemons, J., Bao, Y., Bagra, M., Austin, T., Savarese, S. 2012
  • Efficient and Exact MAP Inference using Branch and Bound Sun, M., Telaprolu, M., Lee, H., Savarese, S. 2012
  • Research in Visualization Techniques for Field Construction JOURNAL OF CONSTRUCTION ENGINEERING AND MANAGEMENT-ASCE Kamat, V. R., Martinez, J. C., Fischer, M., Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2011; 137 (10): 853-862
  • Toward coherent object detection and scene layout understanding Image and Vision Computing Bao, S. Y., Sun, M., Savarese, S. 2011; 9
  • Visually Bootstrapped generalized ICP Pandey, G., McBride, J. R., Savarese, S., Eustice, R. M. 2011
  • EFFEX: An Embedded Processor for Computer Vision Based Feature Extraction Clemons, J., Savarese, S., Austin, T. 2011
  • Deformable Part Models Revisited: A Performance Evaluation for Object Category Pose Estimation IEEE Workshop on Challenges and Opportunities in Robot Perception (in conjunction with ICCV-11). Lopez, R., Tuytelaars, T., Savarese, S. 2011
  • Monitoring Changes of 3D Building Elements from Unordered Photo Collections IEEE workshop on Computer Vision for Remote Sensing of the Environment (in conjunction with ICCV-11). Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2011
  • Learning Context for Collective Activity Recognition Choi, W., Shahid, K., Savarese, S. 2011
  • Semantic Structure from Motion Bao, S. Y., Savarese, S. 2011
  • MEVBench: A Mobile Computer Vision Benchmarking Suite Clemons, J., Zhu, H., Savarese, S., Austin, T. 2011
  • Visualization of Construction Progress Monitoring using Unordered Construction Photo Collections and 4D Building Information Models in "Augmented Reality", ISBN 978-953-307-631-7 Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2011: 1
  • Toward Automatic 3D Generic Object Modeling from One Single Image 3DIM-PVT Sun, M., Kumar, S. 2011
  • Semantic Structure From Motion with Object and Point Interactions IEEE Workshop on Challenges and Opportunities in Robot Perception (in conjunction with ICCV-11). Bao, S. Y., Bagra, M., Savarese, S. 2011
  • Hierarchical Classification of Images by Sparse Approximation Kim, B., Park, J., Gilbert, A., Savarese, S. 2011
  • Articulated Part-based Model for Joint Object Detection and Pose Estimation Sun, M., Savarese, S. 2011
  • Integrated Sequential As-Built and As-Planned Representation with D4AR Tools in Support of Decision-Making Tasks in the AEC/FM Industry ASCE Journal of Construction Engineering and Management Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2011
  • Robust Object Pose Estimation via Statistical Manifold Modeling Mei, L., Liu, J., Hero, A., Savarese, S. 2011
  • Representations and Techniques for 3D Object Recognition and Scene Interpretation Synthesis lecture on Artificial Intelligence and Machine Learning Hoiem, D., Savarese, S. Morgan Claypool Publishers. 2011: 1
  • Recognizing Human Actions by Attributes Liu, J., Kuipers, B., Savarese, S. 2011
  • Cross-View Action Recognition via View Knowledge Transfer Liu, J., Shah, M., Kuipers, B., Savarese, S. 2011
  • Detecting and Tracking People using an RGB-D Camera via Multiple Detector Fusion Workshop on Challenges and Opportunities in Robot Perception (in conjunction with ICCV-11). Choi, W., Pantofaru, C., Savarese, S. 2011
  • A computer analysis of the mirror in Hans Memling’s Virgin and Child and Maarten van Nieuwenhove Digital Imaging for Cultural Heritage Preservation Savarese, S., Stork, D. G., DelPozo, A., Spronk, R. CRC Press. 2011: 1
  • Multi-view Object Categorization and Pose Estimation Computer Vision: Detection, Recognition and Reconstruction (Studies in Computational Intelligence) Savarese, S., Fei-Fei, L. Springer. 2010: 1
  • Remote assessment of pre and post-disaster critical physical infrastructures using segway mobile workstation chariot and D4AR 4D augmented reality models. Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2010
  • Automated model component-based recognition of progress using daily construction photographs and 4D IFC-based models. Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2010
  • D4AR 4 Dimensional augmented reality - tools for automated remote progress tracking and support of decision-enabling tasks in the AEC/FM industry Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2010
  • Depth-Encoded Hough Voting for Joint Object Detection and Shape Recovery Sun, M., Bradski, G., Xu, B., Savarese, S. 2010
  • CEC: Research in Visualization Techniques for Field Construction Kamat, V., Martinez, J., Fischer, M., Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2010
  • D4AR - 4 DIMENSIONAL AUGMENTED REALITY - MODELS FOR AUTOMATION AND INTERACTIVE VISUALIZATION OF CONSTRUCTION PROGRESS MONITORING Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2010
  • Extrinsic calibration of a 3d laser scanner and an omnidirectional camera. Pandey, G., McBride, J., Savarese, S., Eustice, R. 2010
  • Toward automated generation of parametric BIMs based on hybrid video and laser scanning data. In Journal of Advanced Engineering Informatics Brilakis, I., Lourakis, M., Sacks, R., Savarese, S., Christodoulou, S., Teizer, J. 2010; 24 (4): 456-465
  • Model-based detection of progress using D4AR - A 4 Dimensional augmented reality- models generated by daily site photologs and building information models Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2010
  • Toward Coherent Object Detection And Scene Layout Understanding Bao, S. Y., Sun, M., Savarese, S. 2010
  • Multiple Target Tracking in World Coordinate with Single, Minimally Calibrated Camera Choi, W., Savarese, S. 2010
  • Object Detection with Geometrical Context Feedback Loop Sun, M., Bao, S. Y., Savarese, S. 2010
  • Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories 12th IEEE International Conference on Computer Vision Su, H., Sun, M., Fei-Fei, L., Savarese, S. IEEE. 2009: 213–220
  • A Multi-View Probabilistic Model for 3D Object Classes. Sun, M., Su, H., Savarese, S., Fei-Fei, L. 2009
  • Monitoring of Construction Performance Using Daily Progress Photograph Logs and 4D As-Planned Models Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2009
  • What are they doing? : Collective Activity Classification Using Spatio-Temporal Relationship Among People Choi, W., Shahid, K., Savarese, S. 2009
  • D4AR- A 4-Dimensional Augmented Reality Model for Automating Construction Progress Data Collection Golparvar-Fard, M., Pena-Mora, F. 2009
  • Unsupervised Object Pose Classification from Short Video Sequences Mei, L., Sun, M., Carter, K. M., Hero III, A. O., Savarese, S. 2009
  • Sparse Reconstruction and Geo-Registration of Daily Site Photographs for Representation of As-Built Construction Scene and Automatic Construction Progress Data Collection Golparvar-Fard, M., Pena-Mora, F., Savarese, S. 2009
  • Scene Categorization from Low Definition Video Gupta, P., Arrabolu, S., Brown, M., Savarese, S. 2009
  • Interactive Visual Construction Progress Monitoring with 4D Augmented Reality Model Golparvar-Fard, M., Savarese, S., Pena-Mora, F. 2009
  • View synthesis for recognizing unseen poses of object classes. Savarese, S., Fei-Fei, L. 2008
  • Why do we see some surfaces as reflective? DelPozo, A., Savarese, S., Baker, D., Simons, D. J. 2008
  • When are reflections useful in perceiving the shape of shiny surfaces? Savarese, S., DelPozo, A., Baker, D., Simons, D. J. 2008
  • Spatial-Temporal Correlations for Unsupervised Action Classification Savarese, S., DelPozo, A., Niebles, J. C., Fei-Fei, L. 2008
  • Reflections on praxis and facture in a devotional portrait diptych: A computer analysis of the mirror in Hans Memling’s Virgin and Child and Maarten van Nieuwenhove Savarese, S., Spronk, R., Stork, D. G., DelPozo, A. 2008
  • Interactive Visual Construction Progress Monitoring with 4D Augmented Reality Model CCBE-XI Golparvar-Fard, M., Savarese, S., Pena-Mora, F. 2008
  • Detecting Specular Surfaces on Natural Images DelPozo, A., Savarese, S. 2007
  • Carving from ray-tracing constraints: IRT-carving Andreetto, M., Savarese, S., Perona, P. 2006
  • Discriminative Object Class Models of Appearance and Shape by Correlatons Savarese, S., Winn, J., Criminisi, A. 2006
  • 3D Reconstruction by Shadow Carving: Theory and Practical Evaluation International Journal of Computer Vision (IJCV) Savarese, S., Andreetto, M., Rushmeier, H. 2006; 71 (3): 305-336
  • Local Shape from Mirror Reflections International Journal of Computer Vision (IJCV) Savarese, S., Chen, M., Perona, P. 2005; 64 (1): 31-67
  • What do reflections tell us about the shape of a mirror? in Applied Perception in Graphics and Visualization [sponsored by ACM SIGGRAPH] Savarese, S., Fei-Fei, L., Perona, P. 2004: 115-118
  • Recovering local shape of a mirror surface from reflection of a regular grid Savarese, S., Chen, M., Perona, P. 2004
  • Can We See the Shape of a Mirror? Savarese, S., Fei-Fei, L., Perona, P. 2003
  • Implementation of a Shadow Carving System for Shape Capture Savarese, S., Rushmeier, H., Bernardini, F., Perona, P. 2002
  • Local Analysis for 3D Reconstruction of Specular Surfaces -- part II Savarese, S., Perona, P. 2002
  • Second Order Local Analysis for 3D Reconstruction of Specular Surfaces Savarese, S., Chen, M., Perona, P. 2002
  • Local Analysis for 3D Reconstruction of Specular Surfaces Savarese, S., Perona, P. 2001
  • Shadow Carving Savarese, S., Rushmeier, H., Bernardini, F., Perona, P. 2001