Anyi Rao

Postdoctoral Scholar, Computer Science

Web page: https://anyirao.com/

Bio

Anyi Rao is a Postdoctoral Scholar at Stanford. He studies reliable human-centered AI for creativity and film, focusing on intelligent media editing and creation and on semantic and cinematic analysis. He aims to build connections between AI and humans for collaborative intelligence and to unleash human creativity and productivity. His work includes ControlNet, AnimateDiff, MovieNet, Virtual Studio, Shoot360, and CityNeRF, and has been recognized with a Marr Prize (ICCV best paper award). He plays a leading role in organizing the Creative Video Editing and Understanding Workshop at CVPR24 and ICCV23, the Generative Models Course at SIGGRAPH24, and the 2023 Paris AI Short Film Festival. He has research experience at Meta Reality Lab, Vector Institute, University of Toronto, and Hong Kong University. He received his Ph.D. from MMLab at the Chinese University of Hong Kong in 2022.

Honors & Awards

Marr Prize (Best Paper Award), ICCV (2023)
Magic Grant, Brown Institute (2023)
Research Funding by Prime Video, Amazon (2023)
Grant for Organizing ICCV23 Creative Video Editing and Understanding Workshop, Pika, KAUST (2023)
Grant for Organizing ECCV22 Creative Video Editing and Understanding Workshop, KAUST (2022)
Grant for Organizing ICCV21 Creative Video Editing and Understanding Workshop, Adobe (2021)
Most Influential Papers, Paper Digest (2021)

Boards, Advisory Committees, Professional Organizations

Leading Organizer, SIGGRAPH Course on Generative Models for Visual Content Editing and Creation (2024)
Leading/Key Organizer, CVPR2024/ICCV2023/ECCV2022/ICCV2021 Workshop AI for Creative Video Editing and Understanding (2021 - Present)
Founder, Virtual Film Studio https://virtualfilmstudio.github.io/ (2023 - Present)
Co-Founder, City-Super https://city-super.github.io/ (2021 - Present)
Co-Founder, MovieNet https://movienet.github.io/ (2020 - Present)
Program Committee Member and Reviewer, CVPR, ICCV, ECCV, ACCV, SIGGRAPH, SIGGRAPH Asia, CHI, UIST, MM, NeurIPS, ICML, ICLR, AAAI, IJCAI (2021 - Present)
Journal Reviewer, TPAMI, TVCG, TMM, TCSVT, IJCV (2021 - Present)

Stanford Advisors

Maneesh Agrawala, Postdoctoral Faculty Sponsor

Contact

Academic
anyirao@stanford.edu

University - Scholar
Department: Computer Science
Position: Postdoctoral Scholar

Additional Info

Mail Code: 9025
ORCID: https://orcid.org/0000-0003-1004-7753

Current Research and Scholarly Interests

Human AI for Creativity, Computer Vision, Graphics, Human-Computer Interaction

All Publications

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning International Conference on Learning Representations Guo, Y., Yang, C., Rao, A., Liang, Z., Wang, Y., Qiao, Y., Agrawala, M., Lin, D., Dai, B. 2024

View details for DOI 10.48550/arXiv.2307.04725
Adding Conditional Control to Text-to-Image Diffusion Models IEEE/CVF International Conference on Computer Vision (ICCV) Zhang, L., Rao, A., Agrawala, M. 2023

View details for DOI 10.1109/ICCV51070.2023.00355
Dynamic Storyboard Generation in an Engine-based Virtual Environment for Video Production SIGGRAPH Special Interest Group on Computer Graphics and Interactive Techniques Conference Poster Rao, A., Jiang, X., Guo, Y., Xu, L., Yang, L., Jin, L., Lin, D., Dai, B. 2023

View details for DOI 10.1145/3588028.3603647
Shoot360: Normal View Video Creation from City Panorama Footage SIGGRAPH Special Interest Group on Computer Graphics and Interactive Techniques Conference Rao, A., Xu, L., Lin, D. 2022

View details for DOI 10.1145/3528233.3530702
BungeeNeRF: Progressive Neural Radiance Field for Extreme Multi-scale Scene Rendering European Conference on Computer Vision (ECCV) Xiangli, Y., Xu, L., Pan, X., Zhao, N., Rao, A., Theobalt, C., Dai, B., Lin, D. 2022

View details for DOI 10.1007/978-3-031-19824-3_7
MovieNet: A Holistic Dataset for Movie Understanding European Conference on Computer Vision (ECCV) Huang, Q., Xiong, Y., Rao, A., Wang, J., Lin, D. 2020

View details for DOI 10.1007/978-3-030-58548-8_41
HotFlip: White-Box Adversarial Examples for Text Classification Annual Meeting of the Association for Computational Linguistics (ACL) Ebrahimi, J., Rao, A., Lowd, D., Dou, D. 2018

View details for DOI 10.18653/v1/p18-2006
HireVAE: An Online and Adaptive Factor Model Based on Hierarchical and Regime-Switch VAE International Joint Conference on Artificial Intelligence (IJCAI) Wei, Z., Rao, A., Dai, B., Lin, D. 2023
Self-supervised Action Representation Learning from Partial Spatio-Temporal Skeleton Sequences The AAAI Conference on Artificial Intelligence Zhou, Y., Duan, H., Rao, A., Su, B., Wang, J. 2023

View details for DOI 10.1609/aaai.v37i3.25495
A Coarse-to-Fine Framework for Automatic Video Unscreen IEEE Transactions on Multimedia (TMM) Rao, A., Xu, L., Li, Z., Huang, Q., Kuang, Z., Zhang, W., Lin, D. 2022

View details for DOI 10.1109/TMM.2022.3150177
AutoGPart: Intermediate Supervision Search for Generalizable 3D Part Segmentation IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Liu, X., Xu, X., Rao, A., Gan, C., Yi, L. 2022

View details for DOI 10.1109/CVPR52688.2022.01133
BlockPlanner: City Block Generation with Vectorized Graph Representation IEEE/CVF International Conference on Computer Vision (ICCV) Xu, L., Xiangli, Y., Rao, A., Zhao, N., Dai, B., Liu, Z., Lin, D. 2021

View details for DOI 10.1109/ICCV48922.2021.00503
Jointly Learning the Attributes and Composition of Shots for Boundary Detection in Videos IEEE Transactions on Multimedia (TMM) Jiang, X., Jin, L., Rao, A., Xu, L., Lin, D. 2021

View details for DOI 10.1109/tmm.2021.3092143
A Local-to-Global Approach to Multi-Modal Movie Scene Segmentation IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Rao, A., Xu, L., Xiong, Y., Xu, G., Huang, Q., Zhou, B., Lin, D. 2020

View details for DOI 10.1109/cvpr42600.2020.01016
A Unified Framework for Shot Type Classification Based on Subject Centric Lens European Conference on Computer Vision (ECCV) Rao, A., Wang, J., Xu, L., Jiang, X., Huang, Q., Zhou, B., Lin, D. 2020

View details for DOI 10.1007/978-3-030-58621-8_2
Online Multi-modal Person Search in Videos European Conference on Computer Vision (ECCV) Xia, J., Rao, A., Huang, Q., Wen, J., Lin, D. 2020

View details for DOI 10.1007/978-3-030-58610-2_11