Xiangpeng Yang

Ph.D. Student, University of Technology Sydney (UTS)

Generative AI · Video Generation · Vision & Language · Multi-modal Learning

Sydney, Australia · Open to collaboration

Biography

Hi, I'm Xiangpeng. I am currently a Ph.D. student at the University of Technology Sydney (UTS).

My research interests involve Generative AI, Video Generation, and Multi-modal Learning. Specifically, I focus on video world models, video generation, and multi-modal foundation models.

Looking ahead, I am deeply motivated to build unified video models capable of jointly understanding dynamic visual environments and generating coherent future content within a single framework. I believe this direction is a crucial step toward world models, where systems can reason about and interact with the physical world through continuous video understanding and prediction.

I am currently seeking research intern opportunities. If there are suitable positions available, please feel free to reach out. Thank you!

News

[02/2026] VideoCoF is accepted to CVPR 2026, with a strong average score of 5!
[12/2025] We released the code and model of VideoCoF.
[07/2025] Gave an invited talk at Sydney AI Meet-Up.
[01/2025] VideoGrain is accepted to ICLR 2025.
[12/2023] DGL is accepted to AAAI 2024.

Selected Publications

Unified Video Editing with Temporal Reasoner

Xiangpeng Yang, Ji Xie, Yiyuan Yang, Min Xu, Qiang Wu

Conference on Computer Vision and Pattern Recognition (CVPR), 2026

TL;DR: A Chain-of-Frames reasoning framework for unified video editing with 16× length extrapolation.

ArXiv · Project · Code · HF Page · Video · Dataset · Media (机器之心)

VideoGrain: Modulating Space-Time Attention for Multi-Grained Video Editing

Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang

The International Conference on Learning Representations (ICLR), 2025

TL;DR: The first work to propose multi-grained video editing, covering class-, instance-, and part-level edits.

Paper · Project · Code · HF Page · Media (ak) · Media (gradio) · Media (量子位)

DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval

Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang

AAAI Conference on Artificial Intelligence (AAAI), 2024

TL;DR: Tunes only 0.83 MB of parameters yet outperforms full fine-tuning.

Paper · Code · Video

Industry Experience

Baidu Research — Aug 2022 – Mar 2023 (Beijing)
ByteDance AI Lab — May 2021 – Sep 2021 (Beijing)

Invited Talks

[06/2025] "Multimodal SSMs, Multimodal Reasoning, and Multi-Grained Video Editing" at Twelve Labs, hosted by James Le (video).
[03/2025] "VideoGrain: Exploration and Application of Multi-Grained Video Editing Based on Diffusion Models" at Qingke AI (video).

Academic Service

Regular Reviewer: CVPR, ICLR, ICML, NeurIPS, ICCV, ECCV
