CVPR 2026 · Past · Efficiency · Computer vision

1st Workshop on Video World Models: Interaction, Memory, Efficiency (Non-Proceedings Track)

CVPR 2026 Workshop VideoWorldModel


Submission deadline: May 3, 2026, 01:39 UTC
Deadline synced from OpenReview: 2026-05-03 01:39 UTC (as of 2026-06-23). Extensions on OpenReview are applied automatically; verify on the official website.
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise.

Accepted papers (24)

Fetched from OpenReview (v2) on 2026-06-10.

Building a Precise Video Language with Human–AI Oversight
Siyuan Cen, Hewei Wang, Chancharik Mitra, Isaac Li, Yuhan Huang, Yu Tong Tiffany Ling, Irene Pi, Shihang Zhu, Yili Han, Yilun Du, Deva Ramanan, Zhiqiu Lin · PDF
Causal State Compression for Long-Horizon Video World Models: A Bounded-Drift Theory and Efficient Architecture
Kaustubh S. Bukkapatnam, Siddharth Karuturi · PDF
Causal State Entropy Bounds on Predictive Horizons in Video World Models
Siddharth Karuturi, Kaustubh S. Bukkapatnam · PDF
DECOMWM: Interpretable Reward Decomposition for World-Model-Based Trajectory Selection
Yun Sang Nam, KimJinChan · PDF
Dexterous World Models
Byungjun Kim, Taeksoo Kim, Junyoung Lee, Hanbyul Joo · PDF
EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses
Enrico Pallotta, Sina Mokhtarzadeh Azar, Lars Doorenbos, Serdar Ozsoy, Umar Iqbal, Juergen Gall · PDF
Forecasting Motion in the Wild
Neerja Thakkar, Shiry Ginosar, Jacob C Walker, Jitendra Malik, Joao Carreira, Carl Doersch · PDF
Future Optical Flow Prediction Improves Robot Control & Video Generation
Kanchana Ranasinghe, Honglu Zhou, Yu Fang, Luyu Yang, Le Xue, Ran Xu, Caiming Xiong, Silvio Savarese, Michael S Ryoo, Juan Carlos Niebles · PDF
Inference-Time Planning with Action-Conditioned Video Models for Generalizable Robot Manipulation
Zhiting Mei, Yanbo Xu, Tenny Yin, Ola Sho, Anirudha Majumdar · PDF
Is Your Driving World Model an All-Around Player?
Lingdong Kong, Alan Liang, Tianyi Yan, Hongsi Liu, Yu Yang, Ziqi Huang, Xian Sun, Wei Yin, Jialong Zuo, Yixuan Hu, Dekai Zhu, Dongyue Lu, Youquan Liu, Guangfeng Jiang, Linfeng Li, Xiangtai Li, Long Zhuo, Lai Xing Ng, Benoit R Cottereau, Changxin Gao, Liang Pan, Wei Tsang Ooi, Ziwei Liu · PDF
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos
Yuxue Yang, Lue Fan, Ziqi Shi, Junran Peng, Feng Wang, Zhaoxiang Zhang · PDF
Objects in Generated Videos Are Slower Than They Appear: Models Suffer Sub-Earth Gravity and Don't Know Galileo's Principle...for now
Varun Varma Thozhiyoor, Shivam Tripathi, Venkatesh Babu Radhakrishnan, Anand Bhattad · PDF
Olaf-World: Orienting Latent Actions for Video World Modeling
Yuxin Jiang, Yuchao Gu, Ivor Tsang, Mike Zheng Shou · PDF
OmniView: An All-Seeing Diffusion Model for 3D and 4D View Synthesis
Xiang Fan, Sharath Girish, Vivek Ramanujan, Chaoyang Wang, Ashkan Mirzaei, Peter Sushko, Aliaksandr Siarohin, Sergey Tulyakov, Ranjay Krishna · PDF
Rays as Pixels: Learning a Joint Distribution of Videos and Camera Trajectories
Wonbong Jang, Shikun Liu, Soubhik Sanyal, Juan Camilo Perez, Kam Woh Ng, Juan-Manuel Perez-Rua, Yiannis Douratsos, Tao Xiang · PDF
RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation
Feng Jiang, Yang Chen, Jingkai Xu, Yuchen Liu, Haifeng Wang, Zhenhao Shen, Jasper Lu, Shu Chen, Shengze Huang, Yuanfei Wang, Ruihai Wu · PDF
SEGAR: Selective Enhancement for Generative Augmented Reality
Fanjun Bu, Chenyang Yuan, Hiroshi Yasuda · PDF
Spectral World Models: Provably Consistent Long-Horizon Video Generation via Koopman Operator Decomposition
Siddharth Karuturi, Kaustubh S. Bukkapatnam · PDF
The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics
Xiangbo Gao, Mingyang Wu, Siyuan Yang, Jiongze Yu, Pardis Taghavi, Fangzhou Lin, Zhengzhong Tu · PDF
VerseCrafter: Dynamic Realistic Video World Model with 4D Geometric Control
Sixiao Zheng, Minghao Yin, Wenbo Hu, Xiaoyu Li, Ying Shan, Yanwei Fu · PDF
Video Models Reason Early: Exploiting Plan Commitment for Maze Solving
Kaleb Newman, Tyler Zhu, Olga Russakovsky · PDF
WFM-Eval: Interpretable Error Diagnostics for Video World Models in Robotics
Sahil Khose, Mengqi Zhang, Prithvijit Chattopadhyay, Judy Hoffman · PDF
WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation
Jisu Nam, Yicong Hong, Chun-Hao Paul Huang, Feng Liu, JoungBin Lee, Jiyoung Kim, Siyoon Jin, Yunsung Lee, Jaeyoon Jung, Suhwan Choi, Seungryong Kim, Yang Zhou · PDF
WorldPack: Dynamic Frame Compression for Long-context Video World Modeling
Yuta Oshima, Yusuke Iwasawa, Masahiro Suzuki, Yutaka Matsuo, Hiroki Furuta · PDF
