ICLR 2026 · Past · Math & reasoning · Efficiency
The First Workshop on Efficient Spatial Reasoning
ES-Reasoning @ ICLR 2026
- Submission deadline: Feb 13, 2026, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).
Accepted papers (48)
Fetched from OpenReview (v2) on 2026-06-10.
- An Analysis of Reasoning Length Scaling and Positional Effects in Vision Language Models for Spatial Reasoning
- Anytime Safe PAC Efficient Reasoning
- Bio-Inspired Spatial Reasoning Transformer: Grid Cells, Place Cells, and Attractor Dynamics for Text-Based Spatial Understanding
- CivicEmbed: Feature-specific embeddings for efficient geographic reasoning and retrieval
- Demystifying Action Space Design for Robotic Manipulation Policies
- DREAM-R: Multimodal Speculative Reasoning with RL-Based Refined Drafting, Precise Verification, and Fully Parallel Execution
- EarthSpatialBench: Benchmarking Spatial Reasoning Capabilities of Multimodal LLMs on Earth Imagery
- Efficient Dense Features With BRIXEL
- Embedding Morphology into Transformers for Cross-Robot Policy Learning
- ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
- Enhancing Aerial Vision-Language Navigation with Map Grounding and History Awareness
- Evaluating VLMs' Spatial Reasoning Over Robot Motion: A Step Towards Robot Planning with Motion Preferences
- Explicit 3D Spatial Reasoning via Program Generation
- FlashDrive: Flash Vision-Language-Action Inference for Autonomous Driving
- FROM STEERING TO PEDALLING: DO AUTONOMOUS DRIVING VLMS GENERALIZE TO CYCLIST-ASSISTIVE SPATIAL PERCEPTION AND PLANNING?
- FROST: Filtering Reasoning Outliers with Attention for Efficient Reasoning
- Geometry-aware 4D Video Generation for Robot Manipulation
- GRAID: Enhancing Spatial Reasoning of VLMs through High-Fidelity Data Generation
- HiResNets: Native Full-HD Video Recognition with Foveal Residual Streams
- Improving GUI Grounding with Explicit Position-to-Coordinate Mapping
- LEO-VL: Efficient Scene Representation for Scalable 3D Vision-Language Learning
- LocationReasoner: Evaluating LLMs on Real-World Site Selection Reasoning
- MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse
- Multimodal Language Models Cannot Spot Spatial Inconsistencies
- Omni-View: Unlocking How Generation Facilitates Understanding in Unified 3D Model based on Multiview images
- On the Provable Performance Guarantee of Efficient Reasoning Models
- Orion: A Fully Deterministic and Interpretable Pipeline for Video Scene Graph Generation with Explicit Causal Influence Scoring
- PhyRPR: Training-Free Physics-Constrained Video Generation
- PhysNote: Self-Knowledge Notes for Evolvable Physical Reasoning in Vision-Language Model
- PointWorld: Scaling 3D World Models for In-The-Wild Robotic Manipulation
- Probing Perceptual Constancy in Large Vision-Language Models
- Probing Visual Planning in Image Editing Models
- Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision–Language–Action Models via Latent Iterative Reasoning
- REMAP: Evaluating Geometric Dual Representations in Multi-view Spatial Reasoning
- ReSpace: Text-Driven Autoregressive 3D Indoor Scene Synthesis and Editing
- RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic
- SCOPE: Spatially-Constrained Parametric Editing for Text-Guided CAD Models
- Seeing Once is Enough? Online Geometry-Aware Token Pruning for 3D Question Answering
- Solving Spatial Supersensing Without Spatial Supersensing
- Spatial Competence Benchmark
- SpatialTree: How Spatial Abilities Branch Out in MLLMs
- Structural Graph Probing of Vision–Language Models
- SVQA-R1: Reinforcing Spatial Reasoning in MLLMs via View-Consistent Reward Optimization
- The Dual Mechanisms of Spatial Reasoning in Vision–Language Models
- TIDES: Test-time Inference Drift Exploitation via Scaling
- VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents
- VisualThinker: First ever R1-Zero's Aha Moment on just a 2B non-SFT Model
- ViTaB-A: Evaluating Multimodal Large Language Models on Visual Table Attribution