NeurIPS 2024 Workshop on Open-World Agents

Submission deadline: Sep 21, 2024, 00:01 UTC
imported from OpenReview — check the website for extensions
Submission portal: OpenReview

Accepted papers (97)

Fetched from OpenReview (v2) on 2026-06-10.

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David Fouhey, Joyce Chai · PDF
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied, Thomas Adler, Vihang Prakash Patil, Maximilian Beck, Korbinian Pöppel, Johannes Brandstetter, Günter Klambauer, Razvan Pascanu, Sepp Hochreiter · PDF
A Simplified A Priori Theory Of Meaning, –Nature based AI ‘first principles’–
Marcus Abundis · PDF
Advancing Agentic Systems: Dynamic Task Decomposition, Tool Integration and Evaluation using Novel Metrics and Dataset
Shankar Kumar Jeyakumar, Alaa Alameer Ahmad, Adrian Garret Gabriel · PDF
Agent S: An Open Agentic Framework that Uses Computers Like a Human
Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, Xin Eric Wang · PDF
Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems
Tamer Abuelsaad, Deepak Akkil, Prasenjit Dey, Ashish Jagmohan, Aditya Vempaty, Ravi Kokku · PDF
Agentic Anomaly Detection for Shipping
Alexander Timms, Abigail Langbridge, Fearghal O'Donncha · PDF
Agents Thinking Fast and Slow: A Talker-Reasoner Architecture
Konstantina Christakopoulou, Shibl Mourad, Maja Mataric · PDF
AgentStudio: A Toolkit for Building General Virtual Agents
Longtao Zheng, Zhiyuan Huang, Zhenghai Xue, Xinrun Wang, Bo An, Shuicheng YAN · PDF
An Efficient Open World Benchmark for Multi-Agent Reinforcement Learning
Eric Ye, Natasha Jaques · PDF
Are Expressive Models Truly Necessary for Offline RL?
Guan Wang, Haoyi Niu, Jianxiong Li, Li Jiang, Jianming HU, Xianyuan Zhan · PDF
Articulated Animal AI: An Environment for Animal-like Cognition in a Limbed Agent
Jeremy Lucas, Isabeau Prémont-Schwarz · PDF
Automated Design of Agentic Systems
Shengran Hu, Cong Lu, Jeff Clune · PDF
Automating Thought of Search: A Journey Towards Soundness and Completeness
Daniel Yiming Cao, Michael Katz, Harsha Kokel, Kavitha Srinivas, Shirin Sohrabi · PDF
Can VLMs Play Action Role-Playing Games? Take Black Myth Wukong as a Study Case
Peng Chen, Pi Bu, Jun Song, Yuan Gao, Bo Zheng · PDF
CARD: Cross-modal Agent Framework for Generative and Editable Residential Design
Pengyu Zeng, Maowei Jiang, Zihang Wang, Jizhizi Li, Jun Yin, Shuai Lu · PDF
Chain-of-Imagination for Reliable Instruction Following in Decision Making
Enshen Zhou, Yiran Qin, Zhenfei Yin, Yuzhou Huang, Ruimao Zhang, Lu Sheng, Yu Qiao, Jing Shao · PDF
Cognitive Planning for Object Goal Navigation using Generative AI Models
Arjun P S, Andrew Melnik, Gora Chand Nandi · PDF
Collective Wisdom in Language Models: Harnessing LLM-Swarm for Agile Project Management
Tahmid Hussain, Tashin Ahmed, Md Shahedul Haque, Mohammad Rifat Ahmmad Rashid · PDF
CRAB: Cross-platform agent benchmark for multi-modal embodied language model agents
Tianqi Xu, Linyao Chen, Dai-Jie Wu, Yanjun Chen, Zecheng Zhang, Xiang Yao, Zhiqiang Xie, Yongchao Chen, Shilong Liu, Bochen Qian, Philip Torr, Bernard Ghanem, Guohao Li · PDF
Cradle: Empowering Foundation Agents towards General Computer Control
Weihao Tan, Wentao Zhang, Xinrun Xu, Haochong Xia, Gang Ding, Boyu Li, Bohan Zhou, Junpeng Yue, Jiechuan Jiang, Yewen Li, Ruyi An, Molei Qin, Chuqiao Zong, Longtao Zheng, YuJie Wu, Xiaoqiang Chai, Yifei Bi, Tianbao Xie, Pengjie Gu, Xiyun Li, Ceyao Zhang, Long Tian, Chaojie Wang, Xinrun Wang, Börje F. Karlsson, Bo An, Shuicheng YAN, Zongqing Lu · PDF
DARD: A Multi-Agent Approach for Task-Oriented Dialog Systems
Aman Gupta, Anirudh Ravichandran, Ziji Zhang, Swair Shah, Anurag Beniwal, Narayanan Sadagopan · PDF
DepsRAG: Towards Agentic Reasoning and Planning for Software Dependency Management
Mohannad Alhanahnah, Yazan Boshmaf · PDF
Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Henry Wu, Rishi Rajesh Shah, Jing Yu Koh, Russ Salakhutdinov, Daniel Fried, Aditi Raghunathan · PDF
Do LLM Personas Dream of Bull Markets? Comparing Human and AI Investment Strategies Through the Lens of the Five-Factor Model
Harris Borman, Anna Leontjeva, Luiz Pizzato, Max Kun Jiang, Dan Jermyn · PDF
Efficient Reinforcement Learning via Large Language Model-based Search
Siddhant Bhambri, Amrita Bhattacharjee, huan liu, Subbarao Kambhampati · PDF
Enhancing Data Efficiency in Reinforcement Learning: A Novel Imagination Mechanism Based on Mesh Information Propagation
Zihang Wang, Maowei Jiang, Pengyu Zeng, Ruiqi Li, Quangao Liu, Peter Búš · PDF
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms
Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Dongsheng Li, Deqing Yang · PDF
FEABench: Evaluating Language Models on Real World Physics Reasoning Ability
Nayantara Mudur, Hao Cui, Subhashini Venugopalan, Paul Raccuglia, Michael Brenner, Peter Christian Norgaard · PDF
Fine-Tuning Web Agents: It Works, But It's Trickier Than You Think
Massimo Caccia, Megh Thakkar, Léo Boisvert, Thibault Le Sellier de Chezelles, Alexandre Piché, Nicolas Chapados, Alexandre Drouin, Maxime Gasse, Alexandre Lacoste · PDF
First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs
Ben Norman, Jeff Clune · PDF
FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL
Woosung Koh, Wonbeen Oh, Siyeol Kim, Suhin Shin, Hyeongjin Kim, Jaein Jang, Junghyun Lee, Se-Young Yun · PDF
FPGA-Gym: An FPGA-Accelerated Reinforcement Learning Environment Simulation Framework
Jiayi Li, Hongxiao Zhao, Wenshuo Yue, Yihan Fu, Daijing Shi, Anjunyi Fan, Qinghao Wang, Yaodong Yang, Bonan Yan · PDF
From Context to Action: Analysis of the Impact of State Representation and Context on the Generalization of Multi-Turn Web Navigation Agents
Nalin Tiwary, Vardhan Dongre, Sanil Arun Chawla, Ashwin Lamani, Dilek Hakkani Tur · PDF
Generalized Open-World Semi-Supervised Object Detection
Garvita Allabadi, Ana Lucic, Siddarth Aananth, Tiffany Yang, Yu-Xiong Wang, Vikram S. Adve · PDF
GTA: A Benchmark for General Tool Agents
Jize Wang, Ma Zerun, Yining Li, Songyang Zhang, Cailian Chen, Kai Chen, Xinyi Le · PDF
HSCL-RL: Mitigating Hallucinations in Multimodal Large Language Models
Zichen Song, Sitan Huang · PDF
Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models
Logan Cross, Violet Xiang, Agam Bhatia, Daniel LK Yamins, Nick Haber · PDF
IDEA: Enhancing the Rule Learning Ability of Language Agent through Induction, Deduction, and Abduction
Kaiyu He, Mian Zhang, Shuo yan, Peilin Wu, Zhiyu Chen · PDF
IDS-Agent: An LLM Agent for Explainable Intrusion Detection in IoT Networks
Yanjie Li, Zhen Xiang, Nathaniel D. Bastian, Dawn Song, Bo Li · PDF
Improving Decision-Making in Open-World Agents with Conformal Prediction and Monty Hall
Harit Vishwakarma, Alan Mishler, Thomas Cook, Niccolo Dalmasso, Natraj Raman, Sumitra Ganesh · PDF
In-Context Imitation Learning via Next-Token Prediction
Letian Fu, Huang Huang, Gaurav Datta, Lawrence Yunliang Chen, William Chung-Ho Panitch, Fangchen Liu, Hui Li, Ken Goldberg · PDF
Infer Human’s Intentions Before Following Natural Language Instructions
Yanming Wan, Yue Wu, Yiping Wang, Jiayuan Mao, Natasha Jaques · PDF
Infogent: An Agent-based Framework for Web Information Aggregation
Revanth Gangi Reddy, Sagnik Mukherjee, Jeonghwan Kim, Zhenhailong Wang, Dilek Hakkani Tur, Heng Ji · PDF
Integrating Visual and Linguistic Instructions for Context-Aware Navigation Agents
Suhwan Choi, Yongjun Cho, Minchan Kim, Jaeyoon Jung, Myunchul Joe, Park Yu Been, Minseo Kim, Sungwoong Kim, Sungjae Lee, WHISEONG PARK, Jiwan Chung, Youngjae Yu · PDF
Interactive Navigation of Quadruped Robots in Challenging Environments using Large Language Models
Kangjie Zhou, Yao Mu, Pengying Wu, Han Gao, Chang Liu · PDF
Inverse Attention Agent in Multi-Agent System
Qian Long, Ruoyan Li, Minglu Zhao, Tao Gao, Demetri Terzopoulos · PDF
Language Models and Symbolic Planners can Infer Action Semantics through Environment Feedback
Wang Bill Zhu, Ishika Singh, Robin Jia, Jesse Thomason · PDF
Learning Region-Word Alignment with Attentive Masking for Open-Vocabulary Object Detection
Masoumeh Zareapoor, Pourya Shamsolmoali, Yue Lu · PDF
Learning to Bridge the Gap: Efficient Novelty Recovery with Planning and Reinforcement Learning
Alicia Li, Nishanth Kumar, Tomás Lozano-Pérez, Leslie Pack Kaelbling · PDF
Lightweight Neural App Control
Filippos Christianos, Georgios Papoudakis, Thomas Coste, Jianye HAO, Jun Wang, Kun Shao · PDF
LLM2Swarm: Robot Swarms that Responsively Reason, Plan, and Collaborate through LLMs
Volker Strobel, Marco Dorigo, Mario Fritz · PDF
LLM4Drive: A Survey of Large Language Models for Autonomous Driving
Zhenjie Yang, Xiaosong Jia, Hongyang Li, Junchi Yan · PDF
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench
Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati · PDF
MASAI: Modular Architecture for Software-engineering AI Agents
Nalin Wadhwa, Atharv Sonwane, Daman Arora, Abhav Mehrotra, Saiteja Utpala, Ramakrishna B Bairi, Aditya Kanade, Nagarajan Natarajan · PDF
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
Somnath Sendhil Kumar, Yash Vinesh Gadhia, Tanuja Ganu, Akshay Nambi · PDF
MobileFlow: A Multimodal LLM For Mobile GUI Agent
Songqin Nong, Jiali Zhu, Rui Wu, Jiongchao Jin, Shuo Shan, Xiutian Huang, Wenhao Xu · PDF
Multimodal Auto Validation For Self-Refinement in Web Agents
Ruhana Azam, Tamer Abuelsaad, Aditya Vempaty, Ashish Jagmohan · PDF
OASIS: Open Agents Social Interaction Simulations on One Million Agents
Ziyi Yang, Zaibin Zhang, Zirui Zheng, Yuxian Jiang, Ziyue Gan, Zhiyu Wang, Zijian Ling, Konisberg, Martz Ma, Bowen Dong, Prateek Gupta, Shuyue Hu, Zhenfei Yin, Guohao Li, Xu Jia, Lijun Wang, Bernard Ghanem, Huchuan Lu, Wanli Ouyang, Yu Qiao, Philip Torr, Jing Shao · PDF
One-shot World Models Using a Transformer Trained on a Synthetic Prior
Fabio Ferreira, Moreno Schlageter, Raghu Rajan, André Biedenkapp, Frank Hutter · PDF
Planning as Inpainting: A Generative Framework for Realistic Embodied Path Planning
Cheng-Fu Yang, Haoyang Xu, Te-Lin Wu, Xiaofeng Gao, Kai-Wei Chang, Feng Gao · PDF
Policy optimization to align the validity, coherence and efficiency of reasoning agents in multi-turn dialogues
Jeremy Curuksu · PDF
Put Your Money Where Your Mouth Is: Evaluating Strategic Planning and Execution of LLM Agents in an Auction Arena
Jiangjie Chen, Siyu Yuan, Rong Ye, Bodhisattwa Prasad Majumder, Kyle Richardson · PDF
Quality-Diversity Self-Play: Open-Ended Strategy Innovation via Foundation Models
Aaron Dharna, Cong Lu, Jeff Clune · PDF
RAP: Retrieval-Augmented Planning with Contextual Memory for Multimodal LLM Agents
Tomoyuki Kagaya, Thong Jing Yuan, Yuxuan Lou, Jayashree Karlekar, Sugiri Pranata, Akira Kinose, Koki Oguri, Felix Wick, Yang You · PDF
RAR-Agent: Retrieval Augmented Reflection Learning from Scratch for Reasoning
Shipeng Xie, HAICHAO ZHU, Da Chen · PDF
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning and Verification in Long-Horizon Generation
Zihao Wang, Anji Liu, Haowei Lin, Jiaqi Li, Xiaojian Ma, Yitao Liang · PDF
RefactorBench: Evaluating Stateful Reasoning In Language Agents Through Code
Dhruv Gautam, Spandan Garg, Jinu Jang, Neel Sundaresan, Roshanak Zilouchian Moghaddam · PDF
REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments
Kaustubh Sridhar, Souradeep Dutta, Dinesh Jayaraman, Insup Lee · PDF
RH20T-P: A Primitive-Level Robotic Manipulation Dataset Towards Composable Generalization Agents in Real-world Scenarios
Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng · PDF
Richelieu: Self-Evolving LLM-Based Agents for AI Diplomacy
Zhenyu Guan, Xiangyu Kong, Fangwei Zhong, Yizhou Wang · PDF
Robo-MUTUAL: Robotic Multimodal Task Specification via Unimodal Learning
Jianxiong Li, Zhihao Wang, Jinliang Zheng, Xiaoai Zhou, Guanming Wang, Guanglu Song, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Junzhi Yu, Xianyuan Zhan · PDF
Robust Offline Learning via Adversarial World Models
Uljad Berdica, Kelvin Li, Michael Beukman, Alexander David Goldie, Perla Maiolino, Jakob Nicolaus Foerster · PDF
ROCKET-1: Master Open-World Interaction with Visual-Temporal Context Prompting
Shaofei Cai, Zihao Wang, Kewei Lian, Zhancun Mu, Xiaojian Ma, Anji Liu, Yitao Liang · PDF
Scaling Population-Based Reinforcement Learning with GPU Accelerated Simulation
Asad Ali Shahid · PDF
SEAL: Suite for Evaluating API-use of LLMs
Woojeong Kim, Ashish Jagmohan, Aditya Vempaty · PDF
SELFGOAL: Your Language Agents Already Know How to Achieve High-level Goals
Ruihan Yang, Jiangjie Chen, Yikai Zhang, Siyu Yuan, Aili Chen, Kyle Richardson, Yanghua Xiao, Deqing Yang · PDF
Semantically Safe Robot Manipulation: From Semantic Scene Understanding to Motion Safeguards
Lukas Brunke, Yanni Zhang, Ralf Römer, Jack Naimer, Nikola Staykov, SiQi Zhou, Angela P. Schoellig · PDF
ShowUI: One Vision-Language-Action Model for Generalist GUI Agent
Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Zechen Bai, Weixian Lei, Lijuan Wang, Mike Zheng Shou · PDF
Simulating User Agents for Embodied Conversational AI
Daniel Philipov, Vardhan Dongre, gokhan tur, Dilek Hakkani Tur · PDF
Sliding Puzzles Gym: A Scalable Benchmark for State Representation in Visual Reinforcement Learning
Bryan Lincoln Marques de Oliveira, Bruno Brandão, Murilo Lopes da Luz, Luana Guedes Barros Martins, Telma Woerle de Lima Soares, Luckeciano Carvalho Melo · PDF
SPA-BENCH: A Comprehensive Benchmark for Smartphone Agent Evaluation
Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao Yang, Gongwei Chen, Zhihao Wu, Li Yixing, Xurui Zhou, Weiwen Liu, Shuai Wang, Rui Shao, Liqiang Nie, Yasheng Wang, Jianye HAO, Jun Wang, Kun Shao · PDF
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows
Yiran Wu, Tianwei Yue, Shaokun Zhang, Chi Wang, Qingyun Wu · PDF
The Impact of Element Ordering on LM Agent Performance
Wayne Chi, Ameet Talwalkar, Chris Donahue · PDF
Thermal and Energy Management with Fan Control Through Offline Meta-Reinforcement Learning
Shao-Yu Yen, Yen Ru Lai, Fu-Chieh Chang, Pei-Yuan Wu · PDF
Towards Automated Patent Workflows: AI-Orchestrated Multi-Agent Framework for Intellectual Property Management and Analysis
Sagar Srinivas Sakhinana, Vijay sri vaikunth, Venkataramana Runkana · PDF
Towards Autonomous Agents: Adaptive-planning, Reasoning, and Acting in Language Models
Abhishek Dutta, Yen-Che Hsiao · PDF
Towards Humanoid: Value-Driven Agent Modeling Based on Large Language Models
Xuzheng Chen, Zhangshiyin, Guojie Song · PDF
Towards Principled Representation Learning from Videos for Reinforcement Learning
Dipendra Misra, Akanksha Saran, Tengyang Xie, Alex Lamb, John Langford · PDF
Towards Robust Estimation of Human Intention Hierarchy in Robot Teleoperation
Nikki Lijing Kuang, Songpo Li, Soshi Iba · PDF
Variational Inequality Perspective and Optimizers for Multi-Agent Reinforcement Learning
Baraah A. M. Sidahmed, Tatjana Chavdarova · PDF
VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks
Lawrence Keunho Jang, Yinheng Li, Charles Ding, Justin Lin, Paul Pu Liang, Dan Zhao, Rogerio Bonatti, Kazuhito Koishida · PDF
What Do You Mean by "Open World"?
Bowen Xu · PDF
Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale
Rogerio Bonatti, Dan Zhao, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, Lawrence Keunho Jang, Zheng Hui · PDF
Words as Beacons: Guiding RL Agents with High-Level Language Prompts
Unai Ruiz-Gonzalez, Alain Andres, Pedro G. Bascoy, Javier Del Ser · PDF
xTED: Cross-Domain Adaptation via Diffusion-Based Trajectory Editing
Haoyi Niu, Qimao Chen, Tenglong Liu, Jianxiong Li, Guyue Zhou, Yi ZHANG, Jianming HU, Xianyuan Zhan · PDF
Zero-shot Whole-Body Humanoid Control via Behavioral Foundation Models
Andrea Tirinzoni, Ahmed Touati, Jesse Farebrother, Mateusz Guzek, Anssi Kanervisto, Yingchen Xu, Alessandro Lazaric, Matteo Pirotta · PDF
