CVPR 2025 | Past | Large language models, Agents, Robotics
Workshop on Foundation Models Meet Embodied Agents at CVPR 2025
FMEA @ CVPR 2025
- Submission deadline: May 26, 2025, 19:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).
Accepted papers (19)
Fetched from OpenReview (v2) on 2026-06-10.
- 3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model
- AetherVision-Bench: An Open-Vocabulary RGB-Infrared Benchmark for Multi-Angle Segmentation across Aerial and Ground Perspectives
- Beyond Needle(s) in the Embodied Haystack: Environment, Architecture, and Training Considerations for Long Context Reasoning
- Embodied AI with Knowledge Graphs: Material-Aware Obstacle Handling for Autonomous Agents
- Episodic Memory Banks for Lifelong Robot Learning: A Case Study Focusing on Household Navigation and Manipulation
- Human-like Navigation in a World Built for Humans
- Interactive Post-Training for Vision-Language-Action Models
- Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation
- Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving
- One Demo Is All It Takes: Planning Domain Derivation with LLMs from A Single Demonstration
- Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
- Robotic Manipulation by Imitating Generated Videos Without Physical Demonstrations
- SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation
- Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction
- SemNav: A Model-Based Planner for Zero-Shot Object Goal Navigation Using Vision-Foundation Models
- Slot-Level Robotic Placement via Visual Imitation from Single Human Video
- TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation
- Visual Planning: Let's Think Only with Images
- ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos