ICLR 2024 · Past · Large language models · Agents

ICLR 2024 Workshop on Large Language Model (LLM) Agents

LLMAgents @ ICLR 2024

Submission deadline: Feb 12, 2024, 23:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (96)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

    Kuang-Huei Lee, Xinyun Chen, Hiroki Furuta, John Canny, Ian Fischer · PDF
  2. A-CONECT: Designing AI-based Conversational Chatbot for Early Dementia Intervention

    Junyuan Hong, Wenqing Zheng, Han Meng, Siqi Liang, Anqing Chen, Hiroko H. Dodge, Jiayu Zhou, Zhangyang Wang · PDF
  3. Adapting Uni-Modal Language Models for Dense Multi-Modal Co-Reference Resolution using Parameter Augmentation

    Samuel Osebe, Prashan Wanigasekara, Thanh Tran, Thomas Gueudre · PDF
  4. Agent Instructs Large Language Models to be General Zero-Shot Reasoners

    Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, Chenguang Wang · PDF
  5. Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast

    Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin · PDF
  6. Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization

    Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, Weiming Lu · PDF
  7. AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents

    Chang Ma, Junlei Zhang, Zhihao Zhu, Cheng Yang, Yujiu Yang, Yaohui Jin, Zhenzhong Lan, Lingpeng Kong, Junxian He · PDF
  8. Agents: An Open-source Framework for Autonomous Language Agents

    Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shuai Wang, Jiamin Chen, Jintian Zhang, Jing Chen, Xiangru Tang, Peng Cui, Ningyu Zhang, Huajun Chen, Mrinmaya Sachan · PDF
  9. An Embodied Generalist Agent in 3D World

    Jiangyong Huang, Silong Yong, Xiaojian Ma, Xiongkun Linghu, Puhao Li, Yan Wang, Qing Li, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang · PDF
  10. ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

    Yifei Zhou, Andrea Zanette, Jiayi Pan, Aviral Kumar, Sergey Levine · PDF
  11. Are Machines Better at Slow Thinking? Unveiling Human-Machine Inference Gaps in Entailment Verification

    Soumya Sanyal, Tianyi Xiao, Jiacheng Liu, Wenya Wang, Xiang Ren · PDF
  12. AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning

    Shuofei Qiao, Ningyu Zhang, Runnan Fang, Yujie Luo, Wangchunshu Zhou, Yuchen Eleanor Jiang, Chengfei Lv, Huajun Chen · PDF
  13. AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

    Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, Ahmed Hassan Awadallah, Ryen W White, Doug Burger, Chi Wang · PDF
  14. Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

    Lucas Lehnert, Sainbayar Sukhbaatar, Paul McVay, Michael Rabbat, Yuandong Tian · PDF
  15. BioDiscoveryAgent: An AI Agent for Designing Genetic Perturbation Experiments

    Yusuf H Roohani, Jian Vora, Qian Huang, Percy Liang, Jure Leskovec · PDF
  16. BOLAA: Benchmarking and Orchestrating LLM Autonomous Agents

    Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh R N, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil L Mui, Huan Wang, Caiming Xiong, Silvio Savarese · PDF
  17. Bring Your Own KG: Self-Supervised Program Synthesis for Zero-Shot KGQA

    Dhruv Agarwal, Rajarshi Das, Sopan Khosla, Rashmi Gangadharaiah · PDF
  18. Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning

    Mohamed Aghzal, Erion Plaku, Ziyu Yao · PDF
  19. Collaborative LLM-Agents for Editable Driving Scene Simulation

    Yuxi Wei, Zi Wang, Yifan Lu, Chenxin Xu, Changxing Liu, Hao Zhao, Siheng Chen, Yanfeng Wang · PDF
  20. Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach

    Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Guoliang Fan, Lijuan Li · PDF
  21. Corex: Pushing the Boundaries of Complex Reasoning through Multi-Model Collaboration

    Qiushi Sun, Zhangyue Yin, Xiang Li, Zhiyong Wu, Xipeng Qiu, Lingpeng Kong · PDF
  22. Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow

    Wenqi Zhang, Yongliang Shen, Weiming Lu, Yueting Zhuang · PDF
  23. Decision-Oriented Dialogue for Human-AI Collaboration

    Jessy Lin, Nicholas Tomlin, Jacob Andreas, Jason Eisner · PDF
  24. Do LLM Agents Have Regret? A Case Study in Online Learning and Games

    Chanwoo Park, Xiangyu Liu, Asuman E. Ozdaglar, Kaiqing Zhang · PDF
  25. EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction

    Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Kan Ren, Dongsheng Li, Deqing Yang · PDF
  26. EcoAssistant: Using LLM Assistants More Affordably and Accurately

    Jieyu Zhang, Ranjay Krishna, Ahmed Hassan Awadallah, Chi Wang · PDF
  27. Efficient Human-AI Coordination via Preparatory Language-based Convention

    Cong Guan, Lichao Zhang, Chunpeng Fan, Yi-Chen Li, Feng Chen, Lihe Li, Yunjia Tian, Lei Yuan, Yang Yu · PDF
  28. EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records

    Wenqi Shi, Ran Xu, Yuchen Zhuang, Yue Yu, Jieyu Zhang, Hang Wu, Yuanda Zhu, Joyce C. Ho, Carl Yang, May Dongmei Wang · PDF
  29. Empowering Autonomous Driving with Large Language Models: A Safety Perspective

    Yixuan Wang, Ruochen Jiao, Simon Sinong Zhan, Chengtian Lang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu · PDF
  30. Executable Code Actions Elicit Better LLM Agents

    Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhang, Yunzhu Li, Hao Peng, Heng Ji · PDF
  31. Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View

    Jintian Zhang, Xin Xu, Ningyu Zhang, Ruibo Liu, Bryan Hooi, Shumin Deng · PDF
  32. Exposing Limitations of Language Model Agents in Sequential-Task Compositions on the Web

    Hiroki Furuta, Yutaka Matsuo, Aleksandra Faust, Izzeddin Gur · PDF
  33. Expressing and Exploiting Parallelism in Language Model Decoding

    Tian Jin, Ellie Y Cheng, Michael Carbin · PDF
  34. FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design

    Haohang Li, Yangyang Yu, Zhi Chen, Yuechen Jiang, Yang Li, Denghui Zhang, Rong Liu, Jordan W. Suchow, Khaldoun Khashanah · PDF
  35. FL-TAC: Enhanced Fine-Tuning in Federated Learning via Low-Rank, Task-Specific Adapter Clustering

    Siqi Ping, Yuzhu Mao, Yang Liu, Xiao-Ping Zhang, Wenbo Ding · PDF
  36. FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

    Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo · PDF
  37. GPT-4V(ision) is a Generalist Web Agent, if Grounded

    Boyuan Zheng, Boyu Gou, Jihyung Kil, Huan Sun, Yu Su · PDF
  38. HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models

    Gabriel Herbert Sarch, Sahil Somani, Raghav Kapoor, Michael J. Tarr, Katerina Fragkiadaki · PDF
  39. Hierarchical Auto-Organizing System for Open-Ended Multi-Agent Navigation

    Zhonghan Zhao, Kewei Chen, Dongxu Guo, Wenhao Chai, Tian Ye, Yanting Zhang, Gaoang Wang · PDF
  40. If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents

    Ke Yang, Jiateng Liu, John Wu, Chaoqi Yang, Yi Fung, Sha Li, Zixuan Huang, Xu Cao, Xingyao Wang, Heng Ji, ChengXiang Zhai · PDF
  41. IntentGPT: Few-shot Intent Discovery with Large Language Models

    Juan A. Rodriguez, Nicholas Botzer, David Vazquez, Christopher Pal, Marco Pedersoli, Issam H. Laradji · PDF
  42. Is it Possible to Edit Large Language Models Robustly?

    Xinbei Ma, Tianjie Ju, Jiyang Qiu, Zhuosheng Zhang, Hai Zhao, Lifeng Liu, Yulong Wang · PDF
  43. L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects

    Yutaro Yamada, Khyathi Chandu, Bill Yuchen Lin, Jack Hessel, Ilker Yildirim, Yejin Choi · PDF
  44. LangProp: A code optimization framework using Large Language Models applied to driving

    Shu Ishida, Gianluca Corrado, George Fedoseev, Hudson Yeo, Lloyd Russell, Jamie Shotton, Joao F. Henriques, Anthony Hu · PDF
  45. Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

    Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang · PDF
  46. Language-guided Skill Learning with Temporal Variational Inference

    Haotian Fu, Pratyusha Sharma, Elias Stengel-Eskin, George Konidaris, Nicolas Le Roux, Marc-Alexandre Côté, Xingdi Yuan · PDF
  47. Large Language Model Evaluation Via Multi AI Agents: Preliminary results

    Zeeshan Rasheed, Muhammad Waseem, Kari Systä, Pekka Abrahamsson · PDF
  48. Large Language Models can Strategically Deceive their Users when Put Under Pressure

    Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn · PDF
  49. LEAGUE++: Empowering Continual Robot Learning through Guided Skill Acquisition with Large Language Models

    Zhaoyi Li, Kelin Yu, Shuo Cheng, Danfei Xu · PDF
  50. Limitations of Agents Simulated by Predictive Models

    Raymond Douglas, Jacek Karwowski, Chan Bae, Andis Draguns, Victoria Krakovna · PDF
  51. LLF-Bench: Benchmark for Interactive Learning from Language Feedback

    Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan · PDF
  52. LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models

    Shibo Hao, Yi Gu, Haotian Luo, Tianyang Liu, Xiyan Shao, Xinyuan Wang, Shuhua Xie, Haodi Ma, Adithya Samavedhi, Qiyue Gao, Zhen Wang, Zhiting Hu · PDF
  53. LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Game

    Sahar Abdelnabi, Amr Gomaa, Sarath Sivaprasad, Lea Schönherr, Mario Fritz · PDF
  54. Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs

    Da Yin, Faeze Brahman, Abhilasha Ravichander, Khyathi Chandu, Kai-Wei Chang, Yejin Choi, Bill Yuchen Lin · PDF
  55. MAGIC: Investigation of Large Language Model Powered Multi-Agent in Cognition, Adaptability, Rationality and Collaboration

    Lin Xu, Zhiyuan Hu, Daquan Zhou, Hongyu Ren, Zhen Dong, Kurt Keutzer, See-Kiong Ng, Jiashi Feng · PDF
  56. Making Retrieval-Augmented Language Models Robust to Irrelevant Context

    Ori Yoran, Tomer Wolfson, Ori Ram, Jonathan Berant · PDF
  57. MathChat: Converse to Tackle Challenging Math Problems with LLM Agents

    Yiran Wu, Feiran Jia, Shaokun Zhang, Hangyu Li, Erkang Zhu, Yue Wang, Yin Tat Lee, Richard Peng, Qingyun Wu, Chi Wang · PDF
  58. MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning

    Xiangru Tang, Anni Zou, Zhuosheng Zhang, Ziming Li, Yilun Zhao, Xingyao Zhang, Arman Cohan, Mark Gerstein · PDF
  59. Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception

    Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang · PDF
  60. On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent

    Licheng Wen, Xuemeng Yang, Daocheng Fu, Xiaofeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi, Yu Qiao · PDF
  61. Open-TI: Open Traffic Intelligence with Augmented Language Model

    Longchao Da, Kuan-Ru Liou, Tiejin Chen, Xuesong Zhou, Xiangyong Luo, Yezhou Yang, Hua Wei · PDF
  62. OpenAgents: An Open Platform for Language Agents in the Wild

    Tianbao Xie, Fan Zhou, Zhoujun Cheng, Peng Shi, Luoxuan Weng, Yitao Liu, Toh Jing Hua, Junning Zhao, Qian Liu, Che Liu, Zeyu Liu, Yiheng Xu, Hongjin Su, Dongchan Shin, Caiming Xiong, Tao Yu · PDF
  63. OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models

    Yuxuan Kuang, Hai Lin, Meng Jiang · PDF
  64. OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

    Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhoumianze Liu, Shunyu Yao, Tao Yu, Lingpeng Kong · PDF
  65. Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks

    Murtaza Dalal, Tarun Chiruvolu, Devendra Singh Chaplot, Ruslan Salakhutdinov · PDF
  66. Preference-Conditioned Language-Guided Abstraction

    Andi Peng, Andreea Bobu, Belinda Z. Li, Theodore Sumers, Ilia Sucholutsky, Nishanth Kumar, Thomas L. Griffiths, Julie Shah · PDF
  67. Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science

    Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein · PDF
  68. ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning

    Alireza Ghafarollahi, Markus Buehler · PDF
  69. R-Judge: Benchmarking Safety Risk Awareness for LLM Agents

    Tongxin Yuan, Zhiwei He, Lingzhong Dong, Yiming Wang, Ruijie Zhao, Tian Xia, Lizhen Xu, Binglin Zhou, Li Fangqi, Zhuosheng Zhang, Rui Wang, Gongshen Liu · PDF
  70. R2E: Turning any Github Repository into a Programming Agent Environment

    Naman Jain, Manish Shetty, Tianjun Zhang, King Han, Koushik Sen, Ion Stoica · PDF
  71. Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement

    Wonseok Jeon, Mukul Gagrani, Raghavv Goel, Junyoung Park, Mingu Lee, Christopher Lott · PDF
  72. ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

    Renat Aksitov, Sobhan Miryoosefi, Zonglin Li, Daliang Li, Sheila Babayan, Kavya Kopparapu, Zachary Fisher, Ruiqi Guo, Sushant Prakash, Pranesh Srinivasan, Manzil Zaheer, Felix Yu, Sanjiv Kumar · PDF
  73. REX: Rapid Exploration and eXploitation for AI agents

    Rithesh R N, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil L Mui, Huan Wang, Caiming Xiong, Silvio Savarese · PDF
  74. S-Agents: Self-organizing Agents in Open-ended Environments

    Jiaqi Chen, Yuxian Jiang, Jiachen Lu, Li Zhang · PDF
  75. SAGE: Bridging Semantic and Actionable Parts for Generalizable Manipulation of Articulated Objects

    Haoran Geng, Songlin Wei, Congyue Deng, Bokui Shen, He Wang, Leonidas Guibas · PDF
  76. SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code

    Ziniu Hu · PDF
  77. SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

    Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Li YanTao, Jianbing Zhang, Zhiyong Wu · PDF
  78. Self-Alignment of Large Language Models via Multi-Agent Social Simulation

    Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen · PDF
  79. SELF-IMAGINE: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination

    Syeda Nahida Akter, Aman Madaan, Sangwu Lee, Yiming Yang, Eric Nyberg · PDF
  80. Self-Training Language Models in Arithmetic Reasoning

    Marek Kadlčík, Michal Štefánik, Ondrej Sotolar, Vlastimil Martinek · PDF
  81. Simulating Opinion Dynamics with Networks of LLM-based Agents

    Yun-Shiuan Chuang, Agam Goyal, Nikunj Harlalka, Siddharth Suresh, Robert D. Hawkins, Sijia Yang, Dhavan V. Shah, Junjie Hu, Timothy T. Rogers · PDF
  82. TaskBench: Benchmarking Large Language Models for Task Automation

    Yongliang Shen, Kaitao Song, Xu Tan, Wenqi Zhang, Kan Ren, Siyu Yuan, Weiming Lu, Dongsheng Li, Yueting Zhuang · PDF
  83. The Agent Ohana: Designing Unified Data and Training Pipeline for Effective Agent Learning

    Jianguo Zhang, Tian Lan, Rithesh R N, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Quoc Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, Ming Zhu, Tulika Manoj Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong · PDF
  84. The ART of LLM Refinement: Ask, Refine, Trust

    Kumar Shridhar, Koustuv Sinha, Andrew Cohen, Tianlu Wang, Ping Yu, Ramakanth Pasunuru, Mrinmaya Sachan, Jason E Weston, Asli Celikyilmaz · PDF
  85. The Wisdom of Partisan Crowds: Comparing Collective Intelligence in Humans and LLM-based Agents

    Yun-Shiuan Chuang, Nikunj Harlalka, Siddharth Suresh, Agam Goyal, Robert D. Hawkins, Sijia Yang, Dhavan V. Shah, Junjie Hu, Timothy T. Rogers · PDF
  86. Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study

    Weihao Tan, Ziluo Ding, Wentao Zhang, Boyu Li, Bohan Zhou, Junpeng Yue, Haochong Xia, Jiechuan Jiang, Longtao Zheng, Xinrun Xu, Yifei Bi, Pengjie Gu, Xinrun Wang, Börje F. Karlsson, Bo An, Zongqing Lu · PDF
  87. Towards Natural Language-Driven Industrial Assembly Using Foundation Models

    Omkar Joglekar, Shir Kozlovsky, Tal Lancewicki, Vladimir Tchuiev, Zohar Feldman, Dotan Di Castro · PDF
  88. Towards Self-Improving Language Models for Code Generation

    Michaël Defferrard, Corrado Rainone, David W. Zhang, Blazej Manczak, Natasha Butt, Taco Cohen · PDF
  89. Towards Unified Alignment Between Agents, Humans, and Environment

    Zonghan Yang, An Liu, Zijun Liu, Kaiming Liu, Fangzhou Xiong, Yile Wang, Zeyuan Yang, Qingyuan Hu, XinRui Chen, Zhenhe Zhang, Fuwen Luo, Zhicheng Guo, Peng Li, Yang Liu · PDF
  90. TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

    Yilun Kong, Jingqing Ruan, YiHong Chen, Bin Zhang, Tianpeng Bao, Shi Shiwei, Du Guo Qing, Xiaoru Hu, Hangyu Mao, Ziyue Li, Xingyu Zeng, Rui Zhao, Xueqian Wang · PDF
  91. TravelPlanner: A Benchmark for Real-World Planning with Language Agents

    Jian Xie, Kai Zhang, Jiangjie Chen, Tinghui Zhu, Renze Lou, Yuandong Tian, Yanghua Xiao, Yu Su · PDF
  92. Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models

    Zhiyuan Hu, Chumin Liu, Xidong Feng, Yilun Zhao, See-Kiong Ng, Anh Tuan Luu, Junxian He, Pang Wei Koh, Bryan Hooi · PDF
  93. VisualWebArena: Evaluating Multimodal Agents on Realistic Visual Web Tasks

    Jing Yu Koh, Robert Lo, Lawrence Jang, Vikram Duvvur, Ming Chong Lim, Po-Yu Huang, Graham Neubig, Shuyan Zhou, Ruslan Salakhutdinov, Daniel Fried · PDF
  94. WavCraft: Audio Editing and Generation with Large Language Models

    Jinhua Liang, Huan Zhang, Haohe Liu, Yin Cao, Qiuqiang Kong, Xubo Liu, Wenwu Wang, Mark D Plumbley, Huy Phan, Emmanouil Benetos · PDF
  95. WebLINX: Real-World Website Navigation with Multi-Turn Dialogue

    Xing Han Lù, Zdeněk Kasner, Siva Reddy · PDF
  96. WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?

    Alexandre Drouin, Maxime Gasse, Massimo Caccia, Issam H. Laradji, Manuel Del Verme, Tom Marty, Léo Boisvert, Megh Thakkar, Quentin Cappart, David Vazquez, Nicolas Chapados, Alexandre Lacoste · PDF