ICML 2025 Workshop on Computer Use Agents
WCUA 2025
- Submission deadline: May 21, 2025, 13:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (33)
Fetched from OpenReview (v2) on 2026-06-10.
- AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents
- API Agents vs. GUI Agents: Divergence and Convergence
- BIMgent: Towards Autonomous Building Modeling via Computer-use Agents
- Coding Agents with Multimodal Browsing are Generalist Problem Solvers
- Context manipulation attacks : Web agents are susceptible to corrupted memory
- DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
- Dynamic Risk Assessments for Offensive Cybersecurity Agents
- EARL: Early Intent Recognition in GUI Tasks Using Theory of Mind
- EconEvals: Benchmarks and Litmus Tests for LLM Agents in Unknown Environments
- GuardAgent: Safeguard LLM Agents via Knowledge-Enabled Reasoning
- How to Train Your LLM Web Agent: A Statistical Diagnosis
- Improving LLM Agent Planning with In-Context Learning via Atomic Fact Augmentation and Lookahead Search
- InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
- Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
- OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
- OS-MAP: How Far Can Computer Use Agents Go in Breadth and Depth?
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents
- Reimagining ABM with LLM Agents via Shachi
- Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment
- Replacing thinking with tool usage enables reasoning in small language models
- ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows
- Semantic Context for Tool Orchestration
- Silent Sabotage: Injecting Backdoors into AI Agents Through Fine-Tuning
- ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents
- Toward Autonomous UI Exploration: The UIExplorer Benchmark
- UI-Evol: Automatic Knowledge Evolving for Computer Use Agents
- Universal Retrieval for Multimodal Trajectory Modeling
- VerificAgent: Integrating Expert Knowledge and Fact-Checked Memory for Robust Domain-Specific Task Planning
- WASP: Benchmarking Web Agent Security Against Prompt Injection Attacks
- Weathering the CUA Storm: Mapping Security Threats in the Rapid Rise of Computer Use Agents
- WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
- WebGames: Challenging General-Purpose Web-Browsing AI Agents
- WebQuest: A Benchmark for Multimodal QA on Web Page Sequences