CVPR 2026 · Past · Large language models · Agents · Robotics
The 2nd Workshop on Foundation Models Meet Embodied Agents at CVPR 2026
FMEA @ CVPR 2026
- Submission deadline: May 11, 2026, 23:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (34)
Fetched from OpenReview (v2) on 2026-06-10.
- $Re^2$: Reflective Rule Induction and Rule-Guided Refinement for Embodied Planning
- A Physics-Grounded Benchmark for Multi-Agent Dynamics in World Models
- ADeltaM: An Exploratory Counterfactual Delta-Memory Interface for Egocentric Agents
- Automated Skill Optimization via Formal Verification for Embodied Agents
- EmbodiedMidtrain: Bridging the Gap between Vision-Language Models and Vision-Language-Action Models via Mid-training
- EvoWorld: A World-Model-Centric Framework for Continuous Self-Evolution of Modular Embodied Skills
- FunFact: Building Probabilistic Functional 3D Scene Graphs via Factor-Graph Reasoning
- GeoWorld-VLM: Sequential 3D Generation via Evidential Memory
- HoMMI: Learning Whole-Body Mobile Manipulation from Human Demonstrations
- Inference-Time Planning with Action-Conditioned Video Models for Generalizable Robot Manipulation
- InterPrior: Scaling Generative Control for Physics-Based Human-Object Interactions
- LARE: Low-Attention Region Encoding for Text–Image Retrieval
- Learning Situated Awareness in the Real World
- Making Your Action Policies Interpretable: Mixture of Action Queries
- MASER: Modality-Adaptive Specialist Routing for Embodied 3D Spatial Intelligence
- MOSAIC: The Right Modules for Each Task in Embodied Agents
- Multimodal Causal Subtask Modeling for Scalable VLA Pipelines in Long-Horizon Manipulation
- PEFT Methods for Embodied VLM Agents: A Systematic Study and MoE-DoRA
- PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning
- PInVerify: An Offline Embodied Benchmark for Active Instance Verification
- PLanAR: Planning-Language-Grounded Agentic Reasoning for Robot Manipulation
- RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks
- RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies
- RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains
- Scene2Demo: Self-Evolving Embodied Data Generation via Object-Action Graph
- Self-Improving Loops for Visual Robotic Planning
- Semantic Horizons: Information-Theoretic Limits of Foundation Model-Guided Embodied Planning
- Task-Relevant Depth Quality Metrics for Suction Grasping
- Theory of Space: Benchmarking Active Spatial Belief Construction and Revision in Foundation Models for Embodied Agents
- TIC-VLA: A Think-in-Control Vision-Language-Action Model for Robot Navigation in Dynamic Environments
- VL-Nav: A Neuro-Symbolic Approach for Reasoning-based Vision-Language Navigation
- VLS: Steering Pretrained Robot Policies via Vision–Language Models
- WFM-Eval: Interpretable Error Diagnostics for Video World Models in Robotics
- When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning