NeurIPS 2025 · Past · Robotics · Computer vision

NeurIPS 2025 Workshop on Space in Vision, Language, and Embodied AI

SpaVLE

Submission deadline: Sep 3, 2025, 11:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (56)

Fetched from OpenReview (v2) on 2026-06-10.

  1. An Emergent Symbolic Representation of Space as a Bridge Between Language and Reinforcement Learning in Continuous Environments

    Ziqi Ma, Sao Mai Nguyen, Philippe Xu · PDF
  2. Avi: A 3D Vision-Language Action Model Architecture generating Action from Volumetric Inference

    Harris Song, Long Le · PDF
  3. BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning

    Hongyi Zhou, Weiran Liao, Xi Huang, Yucheng Tang, Fabian Otto, Xiaogang Jia, Xinkai Jiang, Simon Hilber, Ge Li, Qian Wang, Ömer Erdinç Yağmurlu, Nils Blank, Moritz Reuss, Rudolf Lioutikov · PDF
  4. Breaking Down and Building Up: Mixture of Skill-Based Vision-and-Language Navigation Agents

    Tianyi Ma, Yue Zhang, Zehao Wang, Parisa Kordjamshidi · PDF
  5. Bridging Embodiment Gaps: Deploying Vision-Language-Action Models on Soft Robots

    Haochen Su, Cristian Meo, Francesco Stella, Andrea Peirone, Kai Junge, Josie Hughes · PDF
  6. COREVQA: Spatial Reasoning and Multi-Step Visual Entailment in Crowded Environments

    Kazuma Choji, Ishant Yunay Chintapatla, Naaisha Agarwal, Andrew Lwin, Charles Duong · PDF
  7. DenseScan: Advancing 3D Scene Understanding with 2D Dense Annotation

    Zirui Wang, Tao Zhang · PDF
  8. Evaluation of Vision-LLMs in Surveillance Video

    Pascal Benschop, Cristian Meo, Justin Dauwels · PDF
  9. Every Camera Effect, Every Time, All at Once: 4D Gaussian Ray Tracing for Physics-based Camera Effect Data Generation

    Yi-Ruei Liu, You-Zhe Xie, Yu-Hsiang Hsu, I-Sheng Fang, Yu-Lun Liu, Jun-Cheng Chen · PDF
  10. FINDINGDORY: A Benchmark to Evaluate Memory in Embodied Agents

    Karmesh Yadav, Yusuf Ali, Gunshi Gupta, Yarin Gal, Zsolt Kira · PDF
  11. Flow Equivariant World Models: Structured Dynamics Outside the Field of View

    Hansen Lillemark, Benhao Huang, Fangneng Zhan, Yilun Du, T. Anderson Keller · PDF
  12. FoR-SALE: Frame of Reference-guided Spatial Adjustment in LLM-based Diffusion Editing

    Tanawan Premsri, Parisa Kordjamshidi · PDF
  13. From Static Domain Adaptation to State-Adaptive Perception in Embodied Agents

    Yu Zhang · PDF
  14. GeoGrid-Bench: Can Foundation Models Understand Multimodal Gridded Geo-Spatial Data?

    Bowen Jiang, Yangxinyu Xie, Xiaomeng Wang, Jiashu He, John K Hutchison, Camillo Jose Taylor, Tanwi Mallick · PDF
  15. Grounding Foundational Vision Models with 3D Human Poses for Robust Action Recognition

    Nicholas Babey, Tiffany Gu, Yiheng Li, Cristian Meo, Kevin Zhu · PDF
  16. Hierarchical Equivariant Policy via Frame Transfer

    Haibo Zhao, Dian Wang, Yizhe Zhu, Xupeng Zhu, Owen Lewis Howell, Linfeng Zhao, Yaoyao Qian, Robin Walters, Robert Platt · PDF
  17. Hierarchical Object-Oriented POMDP Planning for Object Rearrangement

    Rajesh Devaraddi Mangannavar, Alan Fern, Prasad Tadepalli · PDF
  18. I Know Kung Fu: Synthetic Dexterous Hand Demonstration Collection via VR Teleoperation

    Kara Lu, Yanzi He, Cohen Lu, Peihao Li · PDF
  19. Improving Vision-and-Language Navigation with Explicit Sub-Instruction Alignment

    Mulang Shi · PDF
  20. LayerFusion: Harmonized Multi-Layer Text-to-Image Generation with Generative Priors

    Yusuf Dalva, Yijun Li, Qing Liu, Nanxuan Zhao, Jianming Zhang, Zhe Lin, Pinar Yanardag · PDF
  21. LayoutAgent: A Vision-Language Agent Guided Compositional Diffusion for Spatial Layout Planning

    Zezhong Fan, Xiaohan Li, Luyi Ma, Kai Zhao, Liang Peng, Topojoy Biswas, Evren Korpeoglu, Kaushiki Nag, Kannan Achan · PDF
  22. Learning Dynamics of Multitask Training Data in Vision Language Models

    Tyler Zhu, Nathan Koome Murungi, Polina Kirichenko, Olga Russakovsky · PDF
  23. Look and Tell: A Dataset for Multimodal Grounding Across Egocentric and Exocentric Views

    Anna Deichler, Jonas Beskow · PDF
  24. Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots

    Junyao Shi, Rujia Yang, Kaitian Chao, Bingqing Selina Wan, Yifei Simon Shao, Jiahui Lei, Jianing Qian, Long Le, Pratik Chaudhari, Kostas Daniilidis, Chuan Wen, Dinesh Jayaraman · PDF
  25. Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark

    Xinjie Shen, Mufei Li, Pan Li · PDF
  26. MetaVLA: Unified Meta Co-Training for Efficient Embodied Adaptation

    Chen Li, Han Zhang, Zhantao Yang, Fangyi Chen, Anudeepsekhar Bolimera, Marios Savvides · PDF
  27. Motion as Language: Towards a Situation–Motion Language for Spatio-Temporal Learning

    Alejandro Sanchez Guinea, Achref Doula, Thomas Kreutz · PDF
  28. NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language

    Danial Kamali, Parisa Kordjamshidi · PDF
  29. NinA: Normalizing Flows in Action. Training VLA Models with Normalizing Flows

    Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Lyubaykin Nikita, Andrei Polubarov, Alexander Derevyagin, Vladislav Kurenkov · PDF
  30. Object-Centric Agentic Robot Policies

    Sacha Morin, Kumaraditya Gupta, Mahtab Sandhu, Charlie Gauthier, Francesco Argenziano, Kirsty Ellis, Liam Paull · PDF
  31. Probing the Limits of Embodied Spatial Planning in LLMs

    Xiangjue Dong, Manling Li, James Caverlee · PDF
  32. Rethinking the Simulation vs. Rendering Dichotomy: No Free Lunch in Spatial World Modelling

    Dezhi Luo, Qingying Gao, Hokin Deng · PDF
  33. Revisiting Depth Representations for Feed-Forward 3D Gaussian Splatting

    Duochao Shi, Weijie Wang, Donny Y. Chen, Zeyu Zhang, Jia-Wang Bian, Bohan Zhuang · PDF
  34. RIV-CoT: Retrieval-Based Interleaved Visual Chain-of-Thought for Multimodal Reasoning

    Charles Corbière, Simon Roburin, Syrielle Montariol, Antoine Bosselut, Alexandre Alahi · PDF
  35. RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems

    Mingcong Lei, Honghao Cai, Zezhou Cui, Liangchen Tan, Junkun Hong, Gehan Hu, Shuangyu Zhu, Yimou Wu, Shaohan Jiang, Ge Wang, Zhen Li, Shuguang Cui, Yiming Zhao, Yatong Han · PDF
  36. ROSE: Reconstructing Objects, Scenes, and Trajectories from Casual Videos for Robotic Manipulation

    Peihao Li, Haoran Geng, Jameson Crate, Yanbing Han, Junyi Zhang, Feishi Wang, Charlie Tianyue Cheng, Runpei Dong, Yen-Jen Wang, Haozhe Lou, Trevor Darrell, Pieter Abbeel, Jitendra Malik · PDF
  37. See it. Say it. Sorted: Agentic System for Compositional Diagram Generation

    Hantao Zhang, Jingyang Liu, Ed Li · PDF
  38. Seeing Beyond the Scene: Analyzing and Mitigating Background Bias in Action Recognition

    Ellie Zhou, Jihoon Chung, Olga Russakovsky · PDF
  39. Self-Augmented Learning of Differentiable Object Models for Compositional Interpretation of Complex Scenes

    Antoni Nowinowski, Krzysztof Krawiec · PDF
  40. SITCOM: Scaling Inference-Time COMpute for VLAs

    Ayudh Saxena, Harsh Shah, Sandeep Routray, Rishi Rajesh Shah, Esha Pahwa · PDF
  41. Spatial Reasoning in Foundation Models: Benchmarking Object-Centric Spatial Understanding

    Vahid Mirjalili, Ramin Giahi, Sriram Kollipara, Akshay Kekuda, Kehui Yao, Kai Zhao, Jianpeng Xu, Kaushiki Nag, Sinduja Subramaniam, Topojoy Biswas, Evren Korpeoglu, Kannan Achan · PDF
  42. SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning

    Byungwoo Jeon, Dongyoung Kim, Huiwon Jang, Insoo Kim, Jinwoo Shin · PDF
  43. SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards

    Hunar Batra, Haoqin Tu, Hardy Chen, Yuanze Lin, Cihang Xie, Ronald Clark · PDF
  44. Spatio-Temporal Grounding of Large Language Models from Perception Streams

    Jacob Anderson, Bardh Hoxha, Georgios Fainekos, Hideki Okamoto, Danil V. Prokhorov · PDF
  45. SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs

    Yuyou Zhang, Radu Corcodel, Chiori Hori, Anoop Cherian, Ding Zhao · PDF
  46. TC-LoRA: Temporally Modulated Conditional LoRA for Adaptive Diffusion Control

    Minkyoung Cho, Ruben Ohana, Christian Jacobsen, Adityan Jothi, Zhuoqing Mao, Min-Hung Chen, Ethem F. Can · PDF
  47. Think, Remember, Navigate: Zero-Shot Object-Goal Navigation with VLM-Powered Reasoning

    Mobin Habibpour, Fatemeh Afghah · PDF
  48. TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

    Yuyang Liu, Chuan Wen, Yihang Hu, Dinesh Jayaraman, Yang Gao · PDF
  49. Towards Understanding Multimodal Fine-Tuning: A Case Study into Spatial Features

    Lachin Naghashyar, Hunar Batra, Ashkan Khakzar, Philip Torr, Ronald Clark, Christian Schroeder de Witt, Constantin Venhoff · PDF
  50. TriFusion-AE: Language-Guided Depth and LiDAR Fusion for Robust Point Cloud Processing

    Susmit Neogi · PDF
  51. VFSI: Validity First Spatial Intelligence for Constraint-Guided Traffic Diffusion

    Kargi Chauhan, Leilani H. Gilpin · PDF
  52. Viewpoint-Invariant Latent Action Learning from Human Video Demonstrations

    Jung Min Lee, Dohyeok Lee, Jungwoo Lee · PDF
  53. VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning

    Li Kang, Xiufeng Song, Heng Zhou, Yiran Qin, Jie Yang, Xiaohong Liu, Philip Torr, Lei Bai, Zhenfei Yin · PDF
  54. ViPRA: Video Prediction for Robot Actions

    Sandeep Routray, Hengkai Pan, Unnat Jain, Shikhar Bahl, Deepak Pathak · PDF
  55. Weakly-supervised Latent Models for Task-specific Visual-Language Control

    Xian Yeow Lee, Lasitha Vidyaratne, Gregory Sin, Ahmed K. Farahat, Chetan Gupta · PDF
  56. Wholly Unsupervised! Segmenting Objects by Contrast and Context

    Fei Pan, Yixing Wang, Sangryul Jeon, Stella X. Yu · PDF