CVPR 2026: 2nd Workshop on Multimodal Spatial Intelligence

MUSI

Submission deadline
TBA
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10. Topics are keyword-guessed and have not been verified.

Accepted papers (18)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models

    Yanpeng Zhao, Wentao Ding, Hongtao Li, Baoxiong Jia, Zilong Zheng · PDF
  2. ARGOS: Who, Where, and When in Agentic Multi-Camera Person Search

    Myungchul Kim, Kwanyong Park, Junmo Kim, In So Kweon · PDF
  3. Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

    Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang, Yi Ma, Joseph Tighe, Fanyi Xiao · PDF
  4. Bridging the Granularity Gap: Object-Centric Masking for Contextual Visual Learning

    Jike Zhong · PDF
  5. Can VLMs Handle Multi-hop Compositional Spatial Reasoning?

    Youngwan Lee, Soojin Jang, Yoorhim Cho, Seunghwan Lee, Yong-Ju Lee, Sung Ju Hwang · PDF
  6. CoT-PL: Chain-of-Thought Pseudo-Labeling for Open-Vocabulary Object Detection

    Hojun Choi, Youngsun Lim, Jaeyo Shin, Hyunjung Shim · PDF
  7. Hear you are: Teaching LLMs Spatial Reasoning with Vision and Spatial Sound

    Hyeonggon Ryu, Joon Son Chung, David Harwath · PDF
  8. Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

    Mahtab Bigverdi, Linjie Li, Weikai Huang, Yiming Liu, Jaemin Cho, Jieyu Zhang, Tuhin Kundu, Chris Dongjoo Kim, Zelun Luo, Ranjay Krishna, Linda Shapiro · PDF
  9. Improving Scene Text Recognition in Multimodal Large Language Models using Visual Text Grounding

    Shashank Krishna Vempati, Chetan Arora · PDF
  10. MindBlock: Probing Spatial Assembly and Structure in Unified Multimodal Models

    Baiqiao Yin, Junhao Liu, Han Yin, Heyang Yu, Tingxuan Zhang, Zhiheng Li, Chengzu Li, Jihan Yang, Manling Li, Chen Feng, Yiming Li · PDF
  11. Multi-Modal Manipulation via Multi-Modal Policy Consensus

    Haonan Chen, Jiaming Xu, Hongyu Chen, Kaiwen Hong, Binghao Huang, Chaoqi Liu, Jiayuan Mao, Yunzhu Li, Yilun Du, Katherine Rose Driggs-Campbell · PDF
  12. Name That Part: 3D Part Segmentation and Naming

    Soumava Paul, Prakhar Kaushik, Ankit Vaidya, Anand Bhattad, Alan Yuille · PDF
  13. SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation

    Vaibhav Agrawal, Rishubh Parihar, Pradhaan S Bhat, Ravi Kiran Sarvadevabhatla, Venkatesh Babu Radhakrishnan · PDF
  14. SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

    Siyi Chen, Mikaela Angelina Uy, Chan Hee Song, Faisal Ladhak, Adithyavairavan Murali, Qing Qu, Stan Birchfield, Valts Blukis, Jonathan Tremblay · PDF
  15. SPOT: Structured Prompting with Object-centric Tokens for open-world scene graphs

    Mengqi Zhang, Sahil Khose, Fiona Ryan, Judy Hoffman · PDF
  16. Synthesis of Interactive and Expansive Apartment Environments

    ChunTeng Chen · PDF
  17. Synthetic Counterfactual World Models for Multimodal Spatial Reasoning in Low-Resource 3D Domains

    Mahule Roy, Subhas Roy · PDF
  18. Theory of Space: Evaluating Multimodal Spatial Belief through Active Exploration

    Pingyue Zhang, Zihan Huang, Yue Wang, Jieyu Zhang, Letian Xue, Zihan Wang, Qineng Wang, Keshigeyan Chandrasegaran, Ruohan Zhang, Yejin Choi, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Manling Li · PDF