CVPR 2026: 2nd Workshop on Multimodal Spatial Intelligence
MUSI
- Submission deadline
- TBA
- Submission portal
- OpenReview
- Notes
- Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).
Accepted papers (18)
Fetched from OpenReview (v2) on 2026-06-10.
- A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models
- ARGOS: Who, Where, and When in Agentic Multi-Camera Person Search
- Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning
- Bridging the Granularity Gap: Object-Centric Masking for Contextual Visual Learning
- Can VLMs Handle Multi-hop Compositional Spatial Reasoning?
- CoT-PL: Chain-of-Thought Pseudo-Labeling for Open-Vocabulary Object Detection
- Hear you are: Teaching LLMs Spatial Reasoning with Vision and Spatial Sound
- Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models
- Improving Scene Text Recognition in Multimodal Large Language Models using Visual Text Grounding
- MindBlock: Probing Spatial Assembly and Structure in Unified Multimodal Models
- Multi-Modal Manipulation via Multi-Modal Policy Consensus
- Name That Part: 3D Part Segmentation and Naming
- SeeThrough3D: Occlusion Aware 3D Control in Text-to-Image Generation
- SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
- SPOT: Structured Prompting with Object-centric Tokens for open-world scene graphs
- Synthesis of Interactive and Expansive Apartment Environments
- Synthetic Counterfactual World Models for Multimodal Spatial Reasoning in Low-Resource 3D Domains
- Theory of Space: Evaluating Multimodal Spatial Belief through Active Exploration