ICML 2025 Workshop on Computer Use Agents

WCUA 2025

Submission deadline: May 21, 2025, 13:59 UTC
imported from OpenReview — check the website for extensions
Submission portal: OpenReview

Accepted papers (33)

Fetched from OpenReview (v2) on 2026-06-10.

AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents
Arman Zharmagambetov, Chuan Guo, Ivan Evtimov, Maya Pavlova, Ruslan Salakhutdinov, Kamalika Chaudhuri · PDF
API Agents vs. GUI Agents: Divergence and Convergence
Chaoyun Zhang, Shilin He, Liqun Li, Si Qin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang · PDF
BIMgent: Towards Autonomous Building Modeling via Computer-use Agents
Zihan Deng, Changyu Du, Stavros Nousias, André Borrmann · PDF
Coding Agents with Multimodal Browsing are Generalist Problem Solvers
Aditya Bharat Soni, Boxuan Li, Xingyao Wang, Valerie Chen, Graham Neubig · PDF
Context manipulation attacks : Web agents are susceptible to corrupted memory
Atharv Singh Patlan, S Ashwin Hebbar, Pramod Viswanath, Prateek Mittal · PDF
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
Léo Boisvert, Abhay Puri, Gabriel Huang, Mihir Bansal, Chandra Kiran Reddy Evuru, Avinandan Bose, Maryam Fazel, Quentin Cappart, Alexandre Lacoste, Alexandre Drouin, Krishnamurthy Dj Dvijotham · PDF
Dynamic Risk Assessments for Offensive Cybersecurity Agents
Boyi Wei, Benedikt Stroebl, Jiacen Xu, Joie Zhang, Zhou Li, Peter Henderson · PDF
EARL: Early Intent Recognition in GUI Tasks Using Theory of Mind
Shraddha Vijay Pawar, Balavarun Pedapudi, Pramod Kaushik, Sarath Sivaprasad, Mario Fritz, Shirish Karande · PDF
EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments
Sara Fish, Julia Shephard, Minkai Li, Ran I Shorrer, Yannai A. Gonczarowski · PDF
GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning
Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Nathaniel D. Bastian, Carl Yang, Dawn Song, Bo Li · PDF
How to Train Your LLM Web Agent: A Statistical Diagnosis
Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia · PDF
Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search
Samuel Holt, Max Ruiz Luyten, Thomas Pouplin, Mihaela van der Schaar · PDF
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang, Xiaotian Han, Hongxia Yang, Fei Wu · PDF
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
Boyu Gou, Zanming Huang, Yuting Ning, Yu Gu, Michael Lin, Weijian Qi, Andrei Kopanev, Botao Yu, Bernal Jiménez Gutiérrez, Yiheng Shu, Chan Hee Song, Jiaman Wu, Shijie Chen, Hanane Nour Moussa, Tianshu Zhang, Jian Xie, Yifei Li, Tianci Xue, Zeyi Liao, Kai Zhang, Boyuan Zheng, Zhaowei Cai, Viktor Rozgic, Morteza Ziyadi, Huan Sun, Yu Su · PDF
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, J Zico Kolter, Nicolas Flammarion, Maksym Andriushchenko · PDF
OS-MAP: How Far Can Computer Use Agents Go in Breadth and Depth?
Xuetian Chen, Yinghao Chen, Xinfeng Yuan, ZhuoPeng, Lu Chen, Yuekeng Li, Zhoujia Zhang, Yingqian Huang, Leyan Huang, Jiaqing Liang, Tianbao Xie, Zhiyong Wu, Qiushi Sun, Biqing Qi, Bowen Zhou · PDF
OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents
Reyna Abhyankar, Qi Qi, Yiying Zhang · PDF
Reimagining ABM with LLM Agents via Shachi
So Kuroki, Yingtao Tian, Kou Misaki, Takashi Ikegami, Takuya Akiba, Yujin Tang · PDF
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
Siliang Zeng, Quan Wei, William Brown, Oana Frunza, Yuriy Nevmyvaka, Yang Katie Zhao, Mingyi Hong · PDF
Replacing thinking with tool usage enables reasoning in small language models
Corrado Rainone, Tim Bakker, Roland Memisevic · PDF
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Robert Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu · PDF
Semantic Context for Tool Orchestration
Robert Müller · PDF
Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Joshua Kazdan, Avinandan Bose, Quentin Cappart, Maryam Fazel, Sai Rajeswar, Jason Stanley, Nicolas Chapados, Alexandre Drouin, Krishnamurthy Dj Dvijotham · PDF
ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
Ido Levy, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, Segev Shlomov · PDF
Toward Autonomous UI Exploration: The UIExplorer Benchmark
Andrei Cristian Nica, Akshaya Vishnu Kudlu Shanbhogue, Harshil Shah, Aleix Cambray, Tudor Berariu, Lucas Maystre, David Barber · PDF
UI-Evol: Automatic Knowledge Evolving for Computer Use Agents
Ziyun Zhang, Xinyi Liu, Xiaoyi Zhang, Jun Wang, Gang Chen, Yan Lu · PDF
Universal Retrieval for Multimodal Trajectory Modeling
Xuan Zhang, Ziyan Jiang, Rui Meng, Yifei Leng, Zhenbang Xiao, Zora Zhiruo Wang, Yanyi Shang, Dehan Kong · PDF
VerificAgent: Integrating Expert Knowledge and Fact-Checked Memory for Robust Domain-Specific Task Planning
Thong Q. Nguyen, Shubhang Desai, Yash Jain, Tanvir Aumi, Vishal Chowdhary · PDF
WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks
Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, Kamalika Chaudhuri · PDF
Weathering the CUA Storm: Mapping Security Threats in the Rapid Rise of Computer Use Agents
Daniel Jones, Martin Pouliot, Giorgio Severi, Joris de Gruyter, Gary David Lopez Munoz, Santiago Zanella-Beguelin, Justin Song, Amanda J. Minnich, Pamela Cortez · PDF
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
Zhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang, Bing Yin, Hyokun Yun, Lihong Li · PDF
WebGames: Challenging General-Purpose Web-Browsing AI Agents
George Thomas, Filippos Christianos, Alex James Chan, Rohit Midha, Jikun Kang, Wenqi Wu, Fraser David Greenlee, Andrew Toulis, Marvin Purtorab · PDF
WebQuest: A Benchmark for Multimodal QA on Web Page Sequences
Maria Wang, Srinivas Sunkara, Jason Lin, Gilles Baechler, Fedir Zubach, Lei Shu, Yun Zhu, Jindong Chen · PDF
