Sicheng Xu

I am currently a Senior Researcher at Microsoft Research Asia (MSRA), which I joined in late 2021. My research spans multimodal AIGC, 3D vision and embodied AI, with an emphasis on digital humans, 3D generation and reconstruction.

Feel free to contact me if you’re interested in my research, collaboration, or internship opportunities.

Publications

Trellis.2
Native and Compact Structured Latents for 3D Generation
Jianfeng Xiang+, Xiaoxue Chen+, Sicheng Xu, Ruicheng Wang+, Zelong Lv+, Yu Deng, Hongyuan Zhu, Yue Dong, Hao Zhao, Nicholas Jing Yuan, Jiaolong Yang
+: Intern at MSRA.
MV-RoboBench
Seeing Across Views: Benchmarking Spatial Reasoning of Vision-Language Models in Robotic Scenes
Z. Feng, Z. Kang, Q. Wang, Z. Du, J. Yan, S. Shi, C. Yuan, H. Liang, Y. Deng, Q. Li, R. Yang, R. An, L. Zheng, W. Wang, S. Chen, S. Xu, Y. Liang, J. Yang, B. Guo
VITRA
Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
Q. Li, Y. Deng, Y. Liang, L. Luo, L. Zhou, C. Yao, L. Zeng, Z. Feng, H. Liang, S. Xu, Y. Zhang, X. Chen, H. Chen, L. Sun, D. Chen, J. Yang, B. Guo
[PDF] [Project] [Code] ICRA 2026
VASA-3D
VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image
Sicheng Xu*, Guojun Chen*, Jiaolong Yang, Yizhong Zhang, Stephen Lin, Baining Guo
[PDF] [Project] NeurIPS 2025
*: Equal contributions.
MoGe-2
MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details
Ruicheng Wang+, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang+, Zelong Lv+, Guangzhong Sun, Xin Tong, Jiaolong Yang
[PDF] [Project] [Code] NeurIPS 2025
+: Intern at MSRA.
GVFDiffusion
Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
Bowen Zhang+, Sicheng Xu, Chuxin Wang, Jiaolong Yang, Feng Zhao, Dong Chen, Baining Guo
[PDF] [Project] [Code] ICCV 2025
+: Intern at MSRA.
VASA-Rig
VASA-Rig: Audio-Driven 3D Facial Animation with 'Live' Mood Dynamics in Virtual Reality
Ye Pan, Chang Liu, Sicheng Xu, Shuai Tan, Jiaolong Yang
[PDF] VR 2025 & TVCG, Best Paper Honorable Mention Award
Trellis
Structured 3D Latents for Scalable and Versatile 3D Generation
Jianfeng Xiang+, Zelong Lv+, Sicheng Xu, Yu Deng, Ruicheng Wang+, Bowen Zhang+, Dong Chen, Xin Tong, Jiaolong Yang
[PDF] [Project] [Code] CVPR 2025, Spotlight
+: Intern at MSRA.
MoGe
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Ruicheng Wang+, Sicheng Xu, Cassie Dai+, Jianfeng Xiang+, Yu Deng, Xin Tong, Jiaolong Yang
[PDF] [Project] [Code] CVPR 2025, Oral Presentation
+: Intern at MSRA.
CogACT
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
Q. Li, Y. Liang, Z. Wang, L. Luo, X. Chen, M. Liao, F. Wei, Yu Deng, S. Xu, Y. Zhang, X. Wang, B. Liu, J. Fu, J. Bao, D. Chen, Y. Shi, J. Yang, B. Guo
*: Equal contributions.
VASA-1
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
Sicheng Xu*, Guojun Chen*, Yu-Xiao Guo*, Jiaolong Yang*, Chong Li, Zhenyu Zang, Yizhong Zhang, Xin Tong, Baining Guo
[PDF] [Project] [CAPP] NeurIPS 2024, Oral Presentation
*: Equal contributions.
AniPortraitGAN
AniPortraitGAN: Animatable 3D Portrait Generation from 2D Image Collections
Yue Wu+*, Sicheng Xu*, Jianfeng Xiang, Fangyun Wei, Qifeng Chen, Jiaolong Yang, Xin Tong
[PDF] [Project] [Code] SIGGRAPH Asia 2023
+: Intern at MSRA, *: Equal contributions.
Deep 3D Portrait
Deep 3D Portrait from a Single Image
Sicheng Xu, Jiaolong Yang, Dong Chen, Fang Wen, Yu Deng, Yunde Jia, Xin Tong
[PDF] [Code] CVPR 2020
Work done during my internship at MSRA.
Accurate 3D Face Reconstruction
Accurate 3D Face Reconstruction with Weakly-Supervised Learning: From Single Image to Image Set
Yu Deng, Jiaolong Yang, Sicheng Xu, Dong Chen, Yunde Jia, Xin Tong
[PDF] [Code] CVPRW 2019, Best Paper Award
Work done during my internship at MSRA.

Academic Services

Conference Reviewer: CVPR, ICCV, ECCV, NeurIPS, SIGGRAPH, SIGGRAPH Asia
Journal Reviewer: TPAMI, TVCG, IJCV, TIP