All Publications


  • CGOF++: Controllable 3D Face Synthesis With Conditional Generative Occupancy Fields IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Sun, K., Wu, S., Zhang, N., Huang, Z., Wang, Q., Li, H. 2024; 46 (2): 913-926

    Abstract

    Capitalizing on recent advances in image generation models, existing controllable face image synthesis methods can generate high-fidelity images with some degree of controllability, e.g., over the shapes, expressions, textures, and poses of the generated face images. However, previous methods focus on controllable 2D image generative models, which are prone to producing inconsistent face images under large expression and pose changes. In this paper, we propose a new NeRF-based conditional 3D face synthesis framework, which enables 3D controllability over the generated face images by imposing explicit 3D conditions from 3D face priors. At its core is a conditional Generative Occupancy Field (cGOF++) that effectively enforces the shape of the generated face to conform to a given 3D Morphable Model (3DMM) mesh, built on top of EG3D (Chan et al. 2022), a recent tri-plane-based generative model. To achieve accurate control over fine-grained 3D face shapes of the synthesized images, we additionally incorporate a 3D landmark loss as well as a volume warping loss into our synthesis framework. Experiments validate the effectiveness of the proposed method, which generates high-fidelity face images and shows more precise 3D controllability than state-of-the-art 2D-based controllable face synthesis methods. (A toy sketch of the landmark-loss idea follows this entry.)

    View details for DOI 10.1109/TPAMI.2023.3328912

    View details for PubMedID 38153826
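
    As a companion to the abstract above, here is a minimal PyTorch sketch of what a 3D landmark loss of the kind described there can look like. It is an illustration only, not code from the paper: the tensor shapes, the source of the landmarks (a detector run on the generated face and the corresponding vertices of the conditioning 3DMM mesh), and the mean-Euclidean form of the loss are all assumptions.

        import torch

        def landmark_loss(pred_lm: torch.Tensor, mesh_lm: torch.Tensor) -> torch.Tensor:
            """Mean Euclidean distance between 3D landmarks detected on the
            generated face (pred_lm) and the corresponding vertices of the
            conditioning 3DMM mesh (mesh_lm); both are (B, K, 3) tensors."""
            return torch.linalg.norm(pred_lm - mesh_lm, dim=-1).mean()

    In a full pipeline, a term like this would be weighted and added to the generator's adversarial objective alongside the volume warping loss the abstract mentions; the weights are hyperparameters not specified here.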

  • DOVE: Learning Deformable 3D Objects by Watching Videos INTERNATIONAL JOURNAL OF COMPUTER VISION Wu, S., Jakab, T., Rupprecht, C., Vedaldi, A. 2023
  • Unsupervised Learning of Probably Symmetric Deformable 3D Objects From Images in the Wild (Invited Paper) IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE Wu, S., Rupprecht, C., Vedaldi, A. 2023; 45 (4): 5268-5281

    Abstract

    We propose a method to learn 3D deformable object categories from raw single-view images, without external supervision. The method is based on an autoencoder that factors each input image into depth, albedo, viewpoint, and illumination. To disentangle these components without supervision, we use the fact that many object categories have, at least approximately, a symmetric structure. We show that reasoning about illumination allows us to exploit the underlying object symmetry even when the appearance is not symmetric due to shading. Furthermore, we model objects that are probably, but not certainly, symmetric by predicting a symmetry probability map, learned end-to-end with the other components of the model. Our experiments show that this method recovers the 3D shape of human faces, cat faces, and cars from single-view images with high accuracy, without any supervision or a prior shape model. On benchmarks, we demonstrate superior accuracy compared to another method that uses supervision at the level of 2D image correspondences. (A toy sketch of the flip-and-reconstruct loss follows this entry.)

    View details for DOI 10.1109/TPAMI.2021.3076536

    View details for Web of Science ID 000947840300082

    View details for PubMedID 33914682
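
    To make the decomposition described above concrete, here is a toy PyTorch sketch of a symmetry-aware reconstruction loss: the input is reconstructed once from the predicted factors and once from horizontally flipped depth and albedo, and both reconstructions are scored with a confidence-weighted L1 loss so asymmetric regions can be downweighted. The render callable and the log-sigma confidence maps are assumed components; this is a sketch of the idea, not the paper's implementation.

        import torch

        def confidence_weighted_l1(image, recon, log_sigma):
            # Heteroscedastic L1: pixels with large predicted sigma (low
            # confidence) contribute less, letting the model tolerate
            # asymmetries such as hair partings or uneven shading.
            return (torch.abs(image - recon) / log_sigma.exp() + log_sigma).mean()

        def symmetric_reconstruction_loss(image, depth, albedo, light, view,
                                          render, log_sigma, log_sigma_flip):
            # Reconstruction from the predicted depth/albedo/light/viewpoint.
            recon = render(depth, albedo, light, view)
            # Reconstruction from mirrored depth and albedo: for a perfectly
            # symmetric object this should also match the input image.
            recon_flip = render(depth.flip(-1), albedo.flip(-1), light, view)
            return (confidence_weighted_l1(image, recon, log_sigma)
                    + confidence_weighted_l1(image, recon_flip, log_sigma_flip))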

  • Self-supervised learning for using overhead imagery as maps in outdoor range sensor localization INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH Tang, T. Y., De Martini, D., Wu, S., Newman, P. 2021; 40 (12-14): 1488-1509

    Abstract

    Traditional approaches to outdoor vehicle localization assume a reliable prior map is available, typically built using the same sensor suite as the on-board sensors used during localization. This work makes a different assumption: that an overhead image of the workspace is available, and uses it as a map for range-based sensor localization by a vehicle. Here, range-based sensors are radars and lidars. Our motivation is simple: off-the-shelf, publicly available overhead imagery such as Google satellite images can be a ubiquitous, cheap, and powerful tool for vehicle localization when a usable prior sensor map is unavailable, inconvenient, or expensive. The challenge is that overhead images are not directly comparable to data from ground range sensors because of their starkly different modalities. We present a learned metric localization method that not only handles the modality difference but is also cheap to train, learning in a self-supervised fashion without requiring metrically accurate ground truth. Evaluating across multiple real-world datasets, we demonstrate the robustness and versatility of our method for various sensor configurations in cross-modality localization, achieving localization errors on par with a prior supervised approach while requiring no pixel-wise aligned ground truth for supervision during training. We pay particular attention to the use of millimeter-wave radar, which, owing to its complex interaction with the scene and its immunity to weather and lighting conditions, makes for a compelling and valuable use case. (A sketch of the cross-modality matching idea follows this entry.)

    View details for DOI 10.1177/02783649211045736

    View details for Web of Science ID 000703251000001

    View details for PubMedID 34992328

    View details for PubMedCentralID PMC8721700
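
    The cross-modality matching described above can be pictured with a short sketch: two encoders (not shown) map the overhead image and the live radar/lidar scan into a shared dense embedding space, and sliding the scan embedding over the map embedding yields a correlation score for every candidate offset. Everything here, including the function names and the convolution-as-correlation trick, is an illustrative assumption rather than the paper's code; in particular, rotation search and the self-supervised training signal are omitted.

        import torch
        import torch.nn.functional as F

        def localize(map_emb: torch.Tensor, scan_emb: torch.Tensor) -> tuple:
            """map_emb: (C, H, W) embedding of the overhead image.
            scan_emb: (C, h, w) embedding of the range-sensor scan, h<=H, w<=W.
            Returns the (row, col) offset with the highest correlation."""
            # Using the scan embedding as a convolution kernel computes the
            # dot product between scan and map features at every offset.
            score = F.conv2d(map_emb.unsqueeze(0), scan_emb.unsqueeze(0))
            best = score.flatten().argmax().item()
            return divmod(best, score.shape[-1])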

  • De-rendering the World's Revolutionary Artefacts Wu, S., Makadia, A., Wu, J., Snavely, N., Tucker, R., Kanazawa, A. IEEE COMPUTER SOC. 2021: 6334-6343
  • Unsupervised Learning of Probably Symmetric Deformable 3D Objects from Images in the Wild Wu, S., Rupprecht, C., Vedaldi, A. IEEE. 2020: 1-10
  • Deep High Dynamic Range Imaging with Large Foreground Motions Wu, S., Xu, J., Tai, Y., Tang, C. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) SPRINGER INTERNATIONAL PUBLISHING AG. 2018: 120-135
  • Image Generation from Sketch Constraint Using Contextual GAN Lu, Y., Wu, S., Tai, Y., Tang, C. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) SPRINGER INTERNATIONAL PUBLISHING AG. 2018: 213-228