CVPR 2026 · Past · Large language models · Computer vision

CVPR 2026 Video LLMs Workshop

VidLLMs 2026

Submission deadline
TBA
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10. Topics are keyword-guessed and may be inaccurate.

Accepted papers (18)

Fetched from OpenReview (v2) on 2026-06-10.

  1. 4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

    Chiao-An Yang, Ryo Hachiuma, Sifei Liu, Subhashree Radhakrishnan, Raymond A. Yeh, Yu-Chiang Frank Wang, Min-Hung Chen
  2. CausalScene: Typed Causal Scene Graphs for Counterfactual Physical Reasoning with a Path to Video LLMs

    Noor Islam S. Mohammad, Ulug Bayazit
  3. CoSeLECT: Adaptive Frame Selection for Video-Language Understanding

    Bhavika Suresh Devnani, Jitesh Jain, Humphrey Shi, Judy Hoffman
  4. Evaluating Video Question Answering Multimodal Large Language Models

    George Awad, Sanjay Purushotham
  5. FineBench: Benchmarking and Enhancing Vision-Language Models for Fine-grained Human Activity Understanding

    Gueter Josmy Faure, Min-Hung Chen, Jia-Fong Yeh, Hung-Ting Su, Winston H. Hsu
  6. Grounding Video Reasoning in Physical Signals

    Alibay Osmanli, Zixu Cheng, Shaogang Gong
  7. Hidden Clones: Exposing and Fixing Family Bias in Vision-Language Model Ensembles

    Zacharie Bugaud
  8. MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks

    Han Zhang, Wanting Jiang, Tomasz Kornuta, Tian Zheng, Vidya Nariyambut Murali
  9. Mind the Gap: Dataset and Fine-grained Evaluation for Inline Audio Descriptions

    Subhashini Venugopalan, Yingwen Tan, Taylor Roper, Jimmy Tobin, Anton Kast, Alicia Martin, Sam Sepah, Amy Pavel
  10. One Identity, Many Roles: Multimodal Entity Coreference for Enhanced Video Situation Recognition

    Balaji Darur, Amanmeet Garg, Makarand Tapaswi
  11. StreamMem: Query-Agnostic KV Cache Memory for Streaming Video Understanding

    Yanlai Yang, Zhuokai Zhao, Satya Narayan Shukla, Aashu Singh, Shlok Kumar Mishra, Lizhu Zhang, Mengye Ren
  12. StreamReady: Learning What to Answer and When in Long Streaming Videos

    Shehreen Azad, Vibhav Vineet, Yogesh S Rawat
  13. Test-Time Horizon Scaling in Video LLMs via Adaptive Temporal Memory Compression

    Mahule Roy, Subhas Roy
  14. TimeBlind: A Spatio-Temporal Compositionality Benchmark for Video LLMs

    Baiqi Li, Kangyi Zhao, Ce Zhang, Chancharik Mitra, Jean de Dieu Nyandwi, Gedas Bertasius
  15. VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models

    Pritam Sarkar, Ali Etemad
  16. VideoCritic: Diagnosing and Localizing Reasoning Errors in Video-Language Models

    Chenwei Xu, Jianshu Zhang, Shang Wu, Lie Lu, Pranav Maneriker, Fan Du, Manling Li, Han Liu
  17. VideoNet: A Large-Scale Dataset for Domain-Specific Action Recognition

    Tanush Yadav, Mohammadreza Salehi, Jae Sung Park, Vivek Ramanujan, Hannaneh Hajishirzi, Yejin Choi, Ali Farhadi, Rohun Tripathi, Ranjay Krishna
  18. VisCoP: Visual Probing for Video Domain Adaptation of Vision Language Models

    Dominick Reilly, Manish Kumar Govind, Le Xue, Srijan Das