ICRA 2025 · Past · Large language models · Safety & alignment · Robotics

1st Workshop on Safely Leveraging Vision-Language Foundation Models in Robotics: Challenges and Opportunities

ICRA-Safe-VLM-WS-2025

Submission deadline: May 3, 2025, 06:59 UTC
imported from OpenReview — check the website for extensions
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise — edits welcome.

Accepted papers (18)

Fetched from OpenReview (v2) on 2026-06-10.

Adapting Diffusion Policies to Human Preferences via Reward-Guided Fine-Tuning
Yuxin Chen, Devesh K. Jha, Masayoshi Tomizuka, Diego Romeres · PDF
Adaptive Energy Regularization for Autonomous Gait Transition and Energy-Efficient Quadruped Locomotion
Boyuan Liang, Lingfeng Sun, Xinghao Zhu, Bike Zhang, Ziyin Xiong, Yixiao Wang, Chenran Li, Koushil Sreenath, Masayoshi Tomizuka · PDF
CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
Arthur Zhang, Harshit Sikchi, Amy Zhang, Joydeep Biswas · PDF
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
Peiqi Liu, Zhanqiu Guo, Mohit Warke, Soumith Chintala, Chris Paxton, Nur Muhammad Mahi Shafiullah, Lerrel Pinto · PDF
Human-in-the-loop Foundation Model Failure Recovery for Robot-Assisted Bite Acquisition
Krishna Palempalli, Rohan Banerjee, Sarah Dean, Tapomayukh Bhattacharjee · PDF
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan, Xiaosong Jia, Yihang Sun, Yixiao Wang, Jianglan Wei, ZiYang Gong, Xiangyu Zhao, Masayoshi Tomizuka, Xue Yang, Junchi Yan, Mingyu Ding · PDF
KitchenVLA: Iterative Vision-Language Corrections for Robotic Execution of Human Tasks
Kai Lu, Chenyang Ma, Chiori Hori, Diego Romeres · PDF
Let's Talk About Language! Investigating Linguistic Diversity in Embodied AI Datasets
Selma Liliane Wanna, Agnes Luhtaru, Ryan Barron, Jonathan Salfity, Juston Moore, Cynthia Matuszek, Mitch Pryor · PDF
MAGIC-VFM: Meta-learning Adaptation for Ground Interaction Control with Visual Foundation Models
Elena Sorina Lupu, Fengze Xie, James A Preiss, Jedidiah Alindogan, Matthew Anderson, Soon-Jo Chung · PDF
OpenLex3D: A New Evaluation Benchmark for Open-Vocabulary 3D Scene Representations
Christina Kassab, Sacha Morin, Martin Büchner, Matias Mattamala, Kumaraditya Gupta, Abhinav Valada, Liam Paull, Maurice Fallon · PDF
Probing a Vision-Language-Action Model for Symbolic States and Integration into a Cognitive Architecture
Hong Lu, Matthias Scheutz · PDF
Residual Policy Gradient: A Reward View of KL-regularized Objective
Pengcheng Wang, Xinghao Zhu, Yuxin Chen, Chenfeng Xu, Masayoshi Tomizuka, Chenran Li · PDF
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments
Haritheja Etukuru, Norihito Naka, Zijin Hu, Seungjae Lee, Chris Paxton, Soumith Chintala, Lerrel Pinto, Nur Muhammad Mahi Shafiullah · PDF
Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust
Asher James Hancock, Allen Z. Ren, Anirudha Majumdar · PDF
Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards
Lukas Brunke, Yanni Zhang, Ralf Römer, Jack Naimer, Nikola Staykov, SiQi Zhou, Angela P. Schoellig · PDF
Towards Safe Robot Foundation Models Using Inductive Biases
Maximilian Tölle, Theo Gruner, Daniel Palenicek, Tim Schneider, Jonas Günster, Joe Watson, Davide Tateo, Puze Liu, Jan Peters · PDF
Versatile Legged Locomotion Adaptation through Vision-Language Grounding
I Made Aswin Nahrendra, Seunghyun Lee, Dongkyu Lee, Hyun Myung · PDF
Vision Foundation Model Embedding-Based Semantic Anomaly Detection
Max Peter Ronecker, Matt Foutter, Amine Elhafsi, Daniele Gammelli, Ihor Barakaiev, Marco Pavone, Daniel Watzenig · PDF
