ICML 2025 Past Agents

ICML 2025 Workshop on Computer Use Agents

WCUA 2025

Submission deadline: May 21, 2025, 13:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (33)

Fetched from OpenReview (v2) on 2026-06-10.

  1. AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents

    Arman Zharmagambetov, Chuan Guo, Ivan Evtimov, Maya Pavlova, Ruslan Salakhutdinov, Kamalika Chaudhuri · PDF
  2. API Agents vs. GUI Agents: Divergence and Convergence

    Chaoyun Zhang, Shilin He, Liqun Li, Si Qin, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang · PDF
  3. BIMgent: Towards Autonomous Building Modeling via Computer-use Agents

    Zihan Deng, Changyu Du, Stavros Nousias, André Borrmann · PDF
  4. Coding Agents with Multimodal Browsing are Generalist Problem Solvers

    Aditya Bharat Soni, Boxuan Li, Xingyao Wang, Valerie Chen, Graham Neubig · PDF
  5. Context manipulation attacks: Web agents are susceptible to corrupted memory

    Atharv Singh Patlan, S Ashwin Hebbar, Pramod Viswanath, Prateek Mittal · PDF
  6. DoomArena: A framework for Testing AI Agents Against Evolving Security Threats

    Léo Boisvert, Abhay Puri, Gabriel Huang, Mihir Bansal, Chandra Kiran Reddy Evuru, Avinandan Bose, Maryam Fazel, Quentin Cappart, Alexandre Lacoste, Alexandre Drouin, Krishnamurthy Dj Dvijotham · PDF
  7. Dynamic Risk Assessments for Offensive Cybersecurity Agents

    Boyi Wei, Benedikt Stroebl, Jiacen Xu, Joie Zhang, Zhou Li, Peter Henderson · PDF
  8. EARL: Early Intent Recognition in GUI Tasks Using Theory of Mind

    Shraddha Vijay Pawar, Balavarun Pedapudi, Pramod Kaushik, Sarath Sivaprasad, Mario Fritz, Shirish Karande · PDF
  9. EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments

    Sara Fish, Julia Shephard, Minkai Li, Ran I Shorrer, Yannai A. Gonczarowski · PDF
  10. GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning

    Zhen Xiang, Linzhi Zheng, Yanjie Li, Junyuan Hong, Qinbin Li, Han Xie, Jiawei Zhang, Zidi Xiong, Chulin Xie, Nathaniel D. Bastian, Carl Yang, Dawn Song, Bo Li · PDF
  11. How to Train Your LLM Web Agent: A Statistical Diagnosis

    Dheeraj Vattikonda, Santhoshi Ravichandran, Emiliano Penaloza, Hadi Nekoei, Megh Thakkar, Thibault Le Sellier de Chezelles, Nicolas Gontier, Miguel Muñoz-Mármol, Sahar Omidi Shayegan, Stefania Raimondo, Xue Liu, Alexandre Drouin, Laurent Charlin, Alexandre Piché, Alexandre Lacoste, Massimo Caccia · PDF
  12. Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search

    Samuel Holt, Max Ruiz Luyten, Thomas Pouplin, Mihaela van der Schaar · PDF
  13. InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

    Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang, Xiaotian Han, Hongxia Yang, Fei Wu · PDF
  14. Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge

    Boyu Gou, Zanming Huang, Yuting Ning, Yu Gu, Michael Lin, Weijian Qi, Andrei Kopanev, Botao Yu, Bernal Jiménez Gutiérrez, Yiheng Shu, Chan Hee Song, Jiaman Wu, Shijie Chen, Hanane Nour Moussa, Tianshu Zhang, Jian Xie, Yifei Li, Tianci Xue, Zeyi Liao, Kai Zhang, Boyuan Zheng, Zhaowei Cai, Viktor Rozgic, Morteza Ziyadi, Huan Sun, Yu Su · PDF
  15. OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents

    Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, J Zico Kolter, Nicolas Flammarion, Maksym Andriushchenko · PDF
  16. OS-MAP: How Far Can Computer Use Agents Go in Breadth and Depth?

    Xuetian Chen, Yinghao Chen, Xinfeng Yuan, Zhuo Peng, Lu Chen, Yuekeng Li, Zhoujia Zhang, Yingqian Huang, Leyan Huang, Jiaqing Liang, Tianbao Xie, Zhiyong Wu, Qiushi Sun, Biqing Qi, Bowen Zhou · PDF
  17. OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents

    Reyna Abhyankar, Qi Qi, Yiying Zhang · PDF
  18. Reimagining ABM with LLM Agents via Shachi

    So Kuroki, Yingtao Tian, Kou Misaki, Takashi Ikegami, Takuya Akiba, Yujin Tang · PDF
  19. Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment

    Siliang Zeng, Quan Wei, William Brown, Oana Frunza, Yuriy Nevmyvaka, Yang Katie Zhao, Mingyi Hong · PDF
  20. Replacing thinking with tool usage enables reasoning in small language models

    Corrado Rainone, Tim Bakker, Roland Memisevic · PDF
  21. ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows

    Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Robert Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu · PDF
  22. Semantic Context for Tool Orchestration

    Robert Müller · PDF
  23. Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning

    Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Joshua Kazdan, Avinandan Bose, Quentin Cappart, Maryam Fazel, Sai Rajeswar, Jason Stanley, Nicolas Chapados, Alexandre Drouin, Krishnamurthy Dj Dvijotham · PDF
  24. ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents

    Ido Levy, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, Segev Shlomov · PDF
  25. Toward Autonomous UI Exploration: The UIExplorer Benchmark

    Andrei Cristian Nica, Akshaya Vishnu Kudlu Shanbhogue, Harshil Shah, Aleix Cambray, Tudor Berariu, Lucas Maystre, David Barber · PDF
  26. UI-Evol: Automatic Knowledge Evolving for Computer Use Agents

    Ziyun Zhang, Xinyi Liu, Xiaoyi Zhang, Jun Wang, Gang Chen, Yan Lu · PDF
  27. Universal Retrieval for Multimodal Trajectory Modeling

    Xuan Zhang, Ziyan Jiang, Rui Meng, Yifei Leng, Zhenbang Xiao, Zora Zhiruo Wang, Yanyi Shang, Dehan Kong · PDF
  28. VerificAgent: Integrating Expert Knowledge and Fact-Checked Memory for Robust Domain-Specific Task Planning

    Thong Q. Nguyen, Shubhang Desai, Yash Jain, Tanvir Aumi, Vishal Chowdhary · PDF
  29. WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks

    Ivan Evtimov, Arman Zharmagambetov, Aaron Grattafiori, Chuan Guo, Kamalika Chaudhuri · PDF
  30. Weathering the CUA Storm: Mapping Security Threats in the Rapid Rise of Computer Use Agents

    Daniel Jones, Martin Pouliot, Giorgio Severi, Joris de Gruyter, Gary David Lopez Munoz, Santiago Zanella-Beguelin, Justin Song, Amanda J. Minnich, Pamela Cortez · PDF
  31. WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning

    Zhepei Wei, Wenlin Yao, Yao Liu, Weizhi Zhang, Qin Lu, Liang Qiu, Changlong Yu, Puyang Xu, Chao Zhang, Bing Yin, Hyokun Yun, Lihong Li · PDF
  32. WebGames: Challenging General-Purpose Web-Browsing AI Agents

    George Thomas, Filippos Christianos, Alex James Chan, Rohit Midha, Jikun Kang, Wenqi Wu, Fraser David Greenlee, Andrew Toulis, Marvin Purtorab · PDF
  33. WebQuest: A Benchmark for Multimodal QA on Web Page Sequences

    Maria Wang, Srinivas Sunkara, Jason Lin, Gilles Baechler, Fedir Zubach, Lei Shu, Yun Zhu, Jindong Chen · PDF