ICML 2024 · Past · Large language models · Robotics · Multimodal

Multi-modal Foundation Model meets Embodied AI Workshop @ ICML2024

MFM-EAI@ICML2024

Submission deadline
May 31, 2024, 23:59 UTC
imported from OpenReview — check the website for extensions
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).

Accepted papers (23)

Fetched from OpenReview (v2) on 2026-06-10.

  1. An Embodied Generalist Agent in 3D World

    Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang · PDF
  2. BEDD: The MineRL BASALT Evaluation and Demonstrations Dataset for Training and Benchmarking Agents that Solve Fuzzy Tasks

    Stephanie Milani, Anssi Kanervisto, Karolis Jucys, Sander V Schulhoff, Brandon Houghton, Rohin Shah · PDF
  3. Behavior Generation with Latent Actions

    Seungjae Lee, Yibin Wang, Haritheja Etukuru, H. Jin Kim, Nur Muhammad Mahi Shafiullah, Lerrel Pinto · PDF
  4. DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

    Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan · PDF
  5. DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

    Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar · PDF
  6. DPO-Finetuned Large Multi-Modal Planner with Retrieval-Augmented Generation @ EgoPlan Challenge ICML 2024

    Kwanghyeon Lee, Mina Kang, Hyungho Na, HeeSun Bae, Byeonghu Na, Doyun Kwon, Seungjae Shin, Yeongmin Kim, Kim Taewoo, Seungmin Yun, Il-chul Moon · PDF
  7. EPD: Long-term Memory Extraction, Context-aware Planning and Multi-iteration Decision @ EgoPlan Challenge ICML 2024

    Letian Shi, Qi Lv, Xiang Deng, Liqiang Nie · PDF
  8. GROOT-1.5: Learning to Follow Multi-Modal Instructions from Weak Supervision

    Shaofei Cai, Bowei Zhang, Zihao Wang, Xiaojian Ma, Anji Liu, Yitao Liang · PDF
  9. Hierarchical State Space Models for Continuous Sequence-to-Sequence Modeling

    Raunaq Bhirangi, Chenyu Wang, Venkatesh Pattabiraman, Carmel Majidi, Abhinav Gupta, Tess Hellebrekers, Lerrel Pinto · PDF
  10. Instruction-Guided Visual Masking

    Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan · PDF
  11. Jina CLIP: Your CLIP Model Is Also Your Text Retriever

    Han Xiao, Georgios Mastrapas, Bo Wang · PDF
  12. LEGENT: Open Platform for Embodied Agents

    Zhili Cheng, Jinyi Hu, Zhitong Wang, Yuge Tu, Shengding Hu, An Liu, Pengkai Li, Lei Shi, Zhiyuan Liu, Maosong Sun · PDF
  13. LLM3: Large Language Model-based Task and Motion Planning with Motion Failure Reasoning

    Shu Wang, Muzhi Han, Ziyuan Jiao, Zeyu Zhang, Ying Nian Wu, Song-Chun Zhu, Hangxin Liu · PDF
  14. Long-Horizon Planning for Multi-Agent Robots in Partially Observable Environments

    Siddharth Nayak, Adelmo Morrison Orozco, Marina Ten Have, Jackson Zhang, Vittal Thirumalai, Darren Chen, Aditya Kapoor, Eric Robinson, Karthik Gopalakrishnan, James Harrison, Anuj Mahajan, Brian Ichter, Hamsa Balakrishnan · PDF
  15. MAP-THOR: Benchmarking Long-Horizon Multi-Agent Planning Frameworks in Partially Observable Environments

    Siddharth Nayak, Adelmo Morrison Orozco, Marina Ten Have, Vittal Thirumalai, Jackson Zhang, Darren Chen, Aditya Kapoor, Eric Robinson, Karthik Gopalakrishnan, Brian Ichter, James Harrison, Anuj Mahajan, Hamsa Balakrishnan · PDF
  16. Multimodal foundation world models for generalist embodied agents

    Pietro Mazzaglia, Tim Verbelen, Bart Dhoedt, Aaron Courville, Sai Rajeswar · PDF
  17. OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

    Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang · PDF
  18. RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective

    Chenxi Wang, Hongjie Fang, Hao-Shu Fang, Cewu Lu · PDF
  19. RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model

    Hantao Zhou, Tianying Ji, Lukas Sommerhalder, Michael Görner, Norman Hendrich, Fuchun Sun, Jianwei Zhang, Huazhe Xu · PDF
  20. STREAM: Embodied Reasoning through Code Generation

    Daniil Cherniavskii, Phillip Lippe, Andrii Zadaianchuk, Efstratios Gavves · PDF
  21. The Embodied World Model Based on LLM with Visual Information and Prediction-Oriented Prompts

    Wakana Haijima, Kou Nakakubo, Masahiro Suzuki, Yutaka Matsuo · PDF
  22. Vision-Language Models Provide Promptable Representations for Reinforcement Learning

    William Chen, Oier Mees, Aviral Kumar, Sergey Levine · PDF
  23. What can VLMs Do for Zero-shot Embodied Task Planning?

    Xian Fu, Min Zhang, Jianye Hao, Peilong Han, Hao Zhang, Lei Shi, Hongyao Tang · PDF