ICML 2026 | Past | Agents | Safety & alignment | Privacy & security

Second Workshop on Agents in the Wild: Safety, Security, and Beyond

ICML 2026 AIWILD

Submission deadline
May 9, 2026, 12:00 UTC
Imported from OpenReview; check the website for extensions.
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).

Accepted papers (216)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Multi-Model Self-Evolving Framework for Zero-Data Document Understanding via Axiomatic Synthetic Refinement

    Yuwen Tao, Fuxiao Liu
  2. A Prompt-Masked Pilot for History-Dependent Safety Degradation in Multi-Turn Conversational Agents

    Bright Liu, Lia Zheng, Natalia Siwek, Karina Chung
  3. A Systematic Investigation of RL-Jailbreaking in LLMs

    Montaser Mohammedalamen, Kevin Roice, Reginald McLean, Alyssa Lefaivre Škopac
  4. ABRA: Agent Benchmark for Radiology Applications

    Bulat Maksudov, Vladislav Kurenkov, Kathleen M Curran, Alessandra Mileo
  5. Activation Steering for Tool-Poisoning Defense in Language-Model Agents

    Artem Zhuravel, Hubert M. Pysklo
  6. Adaptive Adversaries: A Multi-Turn, Multi-LLM Benchmark for LLM Agent Security

    Devina Jain, David Hartmann, Chuan Li
  7. AF-ARENA: A Multi-Dimensional Evaluation Suite for Alignment Faking

    Chijioke Ugwuanyi, Terry Jingchen Zhang, Zhijing Jin
  8. Agentic Aggregation for Parallel Scaling of Long-Horizon Agentic Tasks

    Yoonsang Lee, Howard Yen, Xi Ye, Danqi Chen
  9. Agentic Misalignment Deterrence: Incorporating Probability and Stakeholders to Decision-Making

    Karthik Reddy Konuganti, Ryan Co
  10. Agentic Reinforcement Learning for Search Misaligns Instruction-Tuning

    Yushi Yang, Shreyansh Padarha, Sarah Ball, Andrew Lee, Adam Mahdi
  11. AgentSociety: Incentivizing Agentic Social Intelligence

    Aditya Vema Reddy Kesari, Krishna Reddy Kesari
  12. AI Agent Safety is a Reinforcement Learning Problem

    Reginald McLean, Tabitha Edith Lee, Montaser Mohammedalamen, Kevin Roice, Glen Berseth, Patrick M. Pilarski, Marlos C. Machado, Alyssa Lefaivre Škopac, Benjamin Rosman
  13. AI Safeguards as Affordance Modulation: Embedded Population Assumptions in Agentic Systems

    Sumaya Nur Adan
  14. Aligning Language Models with Selective Prediction

    Gaoxiang Luo, Yifan Wu, Sinian Zhang, Aryan Deshwal, Ju Sun
  15. Approve the Effect, Not the Tool Call: Preventing Stale Consent in Tool-Using Agents

    Qi Zhang
  16. Architecture Matters for Multi-Agent Security

    Ben Hagag, William L. Anderson, Christian Schroeder de Witt, Sarah Scheffler
  17. ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning

    Xiaoxuan Wang, Han Zhang, Haixin Wang, Yidan Shi, Ruoyan Li, Kaiqiao Han, Chenyi Tong, Haoran Deng, Alexander K Taylor, Renliang Sun, Yanqiao Zhu, Jason Cong, Yizhou Sun, Wei Wang
  18. ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents

    Udari Madhushani Sehwag, Zhengyang Shan, Heming Liu, Dileepa Lakshan, Joseph Brandifino, Max Fenkell
  19. ATLAS: Adaptive Topology-Level Attack Synthesis for Probing Multi-Agent Systems

    Raja Sekhar Rao Dheekonda, Vincent Abruzzo
  20. Attack Selection in Agentic AI Control Evaluations Meaningfully Decreases Safety

    Tyler Crosse, Catherine Ge-Wang, Benjamin Hadad, Joachim Schaeffer, Ram Potham, Tyler Tracy
  21. Attractor States Emerge in Multi-Turn LLM Conversations

    Ting-Wen Ko, Jonas Geiping
  22. Autoformalization of Agent Instructions into Policy-as-Code

    Adam Mondl, Matthew Maisel, John H. Brock
  23. AutoHoney: Automating, Deploying, and Evaluating Scheming Honeypots Across Production Codebases

    Martin Ciesielski-Listwan, Alyssia Jovellanos, Victoria Krakovna
  24. Automata from Agent Traces: Failure and Next-Step Prediction

    Seonglae Cho, Franklin Cardenoso Fernandez, Umar Mohammed, Zekun Wu, Kleyton Da Costa, Ilham Wicaksono, Adriano Koshiyama
  25. Automated interpretability and feature discovery in language models with agents

    Arnau Marin-Llobet, Javier Ferrando
  26. BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate

    Arnon Mazza, Elad Levi
  27. BarrierSteer: LLM Safety via Learning Barrier Steering

    Thanh Q. Tran, Arun Verma, Kiwan Wong, Bryan Kian Hsiang Low, Daniela Rus, Wei Xiao
  28. Behavioral Code: Legible, Auditable Loops for Autonomous Agents

    Abutalib Namazov, Tae Wook Kim, Eagon Meng, Daniel Jackson
  29. Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification

    Sarah Wilson, Diem Linh Dang, Usman Ali Moazzam, Shan Ye, Gail Kaiser
  30. Beyond Single-Model Injection: A Threat Model and Defense Architecture for Prompt Injection in Multi-Agent Systems

    Rudrendu Kumar Paul, Sourav Nandy
  31. Bias and Discrimination in the Agentic Web and How Project NANDA Can Support Mitigation

    Sundaraparipurnan Narayanan
  32. BiasTrojan: LLM Judgers Are Easily Distorted by Few Hundreds of Contrastive Biased Training Data

    Zichen TANG, Zhenheng Tang, Qian Wang, Gaoning Pan, Yuhan Yang, Wei He, Shaohuai Shi, Xiaowen Chu, Bo Li
  33. BioAgent Bench: An AI Agent Evaluation Suite for Bioinformatics

    Dionizije Fa, Marko Culjak, Bruno Pandza, Mateo Čupić
  34. Boundary Point Jailbreaking of Black-Box LLMs

    Xander Davies, Giorgi Giglemiani, Edmund Lau, Eric Winsor, Geoffrey Irving, Yarin Gal
  35. Bridging Safety and Performance in Autonomous Systems using Offline Reinforcement Learning

    Mumuksh Tayal, Manan Tayal, Ravi Prakash
  36. CAD-bench: Benchmarking Language Models on Functional CAD Generation

    Dhruv Saini
  37. Calibrated Deferral Routing for Cost-Efficient Guardrails

    Giandomenico Cornacchia, Inkit Padhi, Manish Nagireddy, Subhajit Chaudhury, Pierre Dognin, Tejaswini Pedapati, Ambrish Rawat, Mark Purcell, Kush R. Varshney, Prasanna Sattigeri
  38. CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

    Hanna Foerster, Tom Blanchard, Kristina Nikolić, Ilia Shumailov, Cheng Zhang, Robert D. Mullins, Nicolas Papernot, Florian Tramèr, Yiren Zhao
  39. CANARY: Zero-Label Detection of Fine-Tuning Contamination in Language Models

    Swapnil Parekh
  40. Catching Infrastructure Sabotage When Coding Agents Are Insider Threats

    Preeti Ravindra, Rahul Tiwari
  41. Caught in the Act(ivation): Stopping Credential Exfiltration Before It Starts

    Kargi Chauhan, Pratibha Revankar
  42. Chain of Risk: Safety Failures in Large Reasoning Models and Mitigation via Adaptive Multi-Principle Steering

    Xiaomin Li, Jianheng Hou, Zheyuan Deng, Zhiwei Zhang, Taoran Li, binghang lu, Bing Hu, Yunhan Zhao, Yuexing Hao
  43. Chain-of-Sanitized-Thoughts: Reducing PII Leakage in Chain-of-Thought Reasoning

    Arghyadeep Das, Sai Sreenivas Chintha, Sharvi Endait, Rishiraj Girmal, Kinjal Pandey
  44. ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation

    Xiaochong Jiang, Shiqi Yang, Ziwei Li, Lifei Liu, Haoran Yu, Yichen Liu
  45. Chasing the Public Score: User Pressure and Evaluation Exploitation in Coding Agent Workflows

    Hardy Chen, Nancy Lau, Haoqin Tu, Shuo Yan, Xiangyan Liu, Zijun Wang, Juncheng Wu, Michael Qizhe Shieh, Cihang Xie, Yuyin Zhou
  46. Claudini: Autoresearch Discovers State-of-the-Art Adversarial Attack Algorithms for LLMs

    Alexander Panfilov, Peter Romov, Igor Shilov, Yves-Alexandre de Montjoye, Jonas Geiping, Maksym Andriushchenko
  47. ClinSeekAgent: Automating Multi-modal Evidence Seeking for Agentic Clinical Reasoning

    Juncheng Wu, Letian Zhang, Yuhan Wang, Haoqin Tu, Hardy Chen, Zijun Wang, Cihang Xie, Yuyin Zhou
  48. Coding Agents Don't Know When to Act

    Thibaud Gloaguen, Niels Mündler, Mark Niklas Mueller, Veselin Raychev, Martin Vechev
  49. Communication Boundary Control for Safer Multi-Agent Language Agents

    Max Ruiz Luyten, Mihaela van der Schaar
  50. Component and Dimension Sparsity in Transformer Refusal Mechanisms

    Vincent Siu, Glenn Grant-Richards, Vlad Pavlovich, Yizhou Sun, Dawn Song, Chenguang Wang
  51. Compound AI System Reliability: A Failure Taxonomy and Resilience Pattern Catalog from 150 Production Incidents

    Rudrendu Kumar Paul, Sourav Nandy
  52. Consensus–Bayesian Anomaly Detection in Agentic Access Graphs

    Pratyush uppuluri, Shilpa Noushad, Sajan Kumar, Jayanth Poosarla
  53. Containment Verification: AI Safety Guarantees Independent of Alignment

    Royce Moon, Lav R. Varshney
  54. CONTRA: Red-Teaming Configurations of Personalizable Agents

    Jonathan Nöther, Adish Singla, Goran Radanovic
  55. Contrastive Discovery: Open-Ended Scientific Discovery over Competing Explanations

    Ziang Liu, James J. Kim, Yijia Dai, Jennifer J. Sun
  56. Controlling Tool Use with Heading-Specific Activation Steering

    Yuqi Chen, Vincent Siu, Yang Liu, Dawn Song, Chenguang Wang
  57. Copy-on-Write Scoring: Application-Specific Agent Evaluations

    Joanna Roy, Sven Hoelzel
  58. Correcting Noise-Misspecified Operator Selection in Wild Compound LLM Agents

    Jiayi Qiu
  59. Coverage-Aware Test Generation for Conversational AI Agents

    Mridul Katta, Darin Glatt
  60. CPPO: Contrastive Perception Policy Optimization for VLM Agents

    Ahmad Rezaei, Mohsen Gholami, Saeed Ranjbar Alvar, Kevin Cannons, Mohammad Asiful Hossain, Zhou Weimin, Yong Zhang, Mohammad Akbari
  61. Cross-Agent Campaign Attribution: Linking Asynchronous Attacks Across LLM Agents

    SangJin Park, Myungsub Choi, Jineok Kim, Minseung Kang
  62. CrossAnchor: Image-Anchored Text Optimization Exposes Blind Spots in Multi-line Defenses of Agentic Systems

    Anish Kiran Kulkarni, Ravikumar Balakrishnan, Prashanth Arun
  63. CTFusion: A CTF-based Benchmark for LLM Agent Evaluation

    Dongjun Lee, Ga-eun Bae, Insu yun
  64. Decomposing Smooth Agentic Inference Scaling

    Ole Kristian Jorgensen, Rivka Mitchell, Tom Rainforth, Cozmin Ududec
  65. Digital Twin Builder: A Multi-Agent LLM System for Automated Industrial Digital Twin Development

    Maria Dziuba, Elizaveta Egorova, Nikolay Vasilev, Ilya Novitskiy, Vladislav Tuchin, Valeria Efimova
  66. Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models

    Wenlong Deng, Jiaji Huang, Kaan Ozkara, Yushu Li, Christos Thrampoulidis, Xiaoxiao Li, Youngsuk Park
  67. Do AI Agents Write Less Maintainable Code Than Human Developers?

    Betty Li Hou, Shaswat P Patel, Arun Purohit, Kai Xu, Jane Pan, He He, Valerie Chen
  68. Does the Optimal Hallucination Detector for Agentic Tool Calls Depend on Model Scale?

    Valentin NOËL, Kait Healy, Visakh Madathil, Bharathi Srinivasan
  69. Double-Helix Co-Training for Computer-Use Generator and Verifier Models

    Jikun Kang, Yi Heng Lim, Alex James Chan
  70. Dynamic Capability Scoping for Enterprise AI Agents: A Synthetic Dataset and Three-Source Permission Architecture

    Halil Burak Noyan
  71. Emergent Social Intelligence Risks in Generative Multi-Agent Systems

    Yue Huang, Yu Jiang, Wenjie Wang, Haomin Zhuang, Xiaonan Luo, Yuchen Ma, Zhangchen Xu, Zichen Chen, Nuno Moniz, Zinan Lin, Pin-Yu Chen, Nitesh V. Chawla, Nouha Dziri, Huan Sun, Xiangliang Zhang
  72. Ensemble Monitoring for AI Control: Diverse Signals Outweigh More Compute

    Yejun Yun, Samantha Tetef, Eugene Koran, Pablo Bernabeu-Perez, Benjamin Arnav
  73. Evaluating Agentic Configuration Repair for Computer Networks

    Rufat Asadli, Benjamin Hoffman, Ioannis Protogeros, Laurent Vanbever
  74. Evaluation Theater: How Structural Compliance Decouples from Cognitive Judgment in Deployed LLM Agents

    Ning Coeva
  75. EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience

    Taofeng Xue, Chong Peng, Mianqiu Huang, Linsen Guo, Tiancheng Han, Haozhe Wang, Xiaocheng Zhang, Xin Yang, Dengchang Zhao, Jinrui Ding, Xiandi Ma, Yuchen Xie, Peng Pei, Xunliang Cai, Xipeng Qiu
  76. Extracting Recurring Vulnerabilities from Black-Box LLM-Generated Software

    Tomer Kordonsky, Amit LeVi, Maayan Yamin, Noam Benzimra, Avi Mendelson
  77. Failure-Aware Query Refinement for Reliable Open-Vocabulary Home-Robot Perception

    Daun Jeong, Sohyeon Kim, Jouwon Song, Kyeongbo Kong
  78. FaultLoc: Evaluating Coding Agents For Fault Localization

    Jianhong Tu, Shubham Gaur, Rathik Murtinty, Zhun Wang, Tianneng Shi, Dawn Song, Chenguang Wang
  79. From Business Metrics to Behavioral Personas: Controllable User Simulation for Pre-Deployment Agent Testing

    Zhenyu Zhang, Yuan Ling, Dingyang Chen, Xinyang Shen
  80. From Reward-Hack Activations to Agentic Risk States: Context-Calibrated Mechanistic Monitoring in LLM Agents

    Patrick Wilhelm, Odej Kao
  81. From Segments to Scenes: Temporal Understanding for Agentic Autonomous Driving via Vision-Language Models

    Kevin Cannons, Saeed Ranjbar Alvar, Mohammad Asiful Hossain, Ahmad Rezaei, Mohsen Gholami, Alireza Heidari, Zhou Weimin, Yong Zhang, Mohammad Akbari
  82. From Self-Preservation to Peer-Preservation: A Staged Framing of Preservation-Oriented Misalignment in Frontier Models

    Rundong Yang
  83. From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

    Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah, Zhiwei Shang
  84. FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal Feedback

    Xueqing Wu, Zihan Xue, Da Yin, Shuyan Zhou, Kai-Wei Chang, Nanyun Peng, Yeming Wen
  85. Full-Season Agent Evaluation in Soybean Farm Operations under Real-World Agricultural Process Dynamics

    Ao Qu, Panagiotis Michelakis, Yiannis Hadjiyianni, Feng Li, Jingchi Jiang, Dimitrios Stamoulis, Jie Liu
  86. Game-Theoretic Multi-LLM Routing for Safer Agents in the Wild

    Jing Wang, Jie Shen, Dean Foster, Zohar Karnin
  87. GameDevBench: Evaluating Agentic Capabilities Through Game Development

    Wayne Chi, Yixiong Fang, Arnav Yayavaram, Siddharth Yayavaram, Seth Karten, Qiuhong Anna Wei, Runkun Chen, Alexander Wang, Valerie Chen, Ameet Talwalkar, Chris Donahue
  88. General Agent Evaluation

    Elron Bandel, Asaf Yehudai, Lilach Eden, Yehoshua Sagron, Yotam Perlitz, Elad Venezian, Natalia Razinkov, Natan Ergas, Shlomit Shachor Ifergan, Segev Shlomov, Michal Jacovi, Leshem Choshen, Liat Ein-Dor, Yoav Katz, Michal Shmueli-Scheuer
  89. Goal-Drift Probes: Anticipating Multi-Turn LLM Agent Failure From Mid-Network Activations

    Oliver Yangluo Chen
  90. Hidden in Memory: Sleeper Memory Poisoning in LLM Agents

    Sidharth Pulipaka, Stanislau Hlebik, Leonidas Raghav, Sahar Abdelnabi, Vyas Raina, Ivaxi Sheth, Mario Fritz
  91. Hidden in Plain Sight: Benchmarking Agent Safety Against Decomposition Attacks with DecompBench

    Vikhyath Kothamasu, Virginia Smith, Chhavi Yadav
  92. HiMA: Efficient Hybrid Model Serving for Agentic Systems

    Yuzhou Nie, Weifang Zhang, Zhen Xu, Songyang Peng, Yuheng Tang, Minzhou Pan, Bo Li, Dawn Song, Ce Zhang, Wenbo Guo
  93. Honeyval: A Comprehensive Evaluation Framework for LLM-powered HTTP Honeypots

    Mark Vero, Fabian Kaczmarczyck, Ivan Petrov, Ilia Shumailov, Niels Heinen, Jamie Hayes, Tianqi Fan, Luca Invernizzi, Martin Vechev
  94. House Rules: Institutional Design in Multi-Agent LLM Code Markets

    Tony O'Halloran, Allison Claire Zhuang, Michael Zhang, Thibault Soubeste, Alexandre Sallinen, Stefan Krsteski, Charlotte Meyer, Guillaume Allegre, Kailey Seiler
  95. How can we assess human-agent interactions? Case studies in software agent design

    Valerie Chen, Rohit Malhotra, Xingyao Wang, Juan Michelini, Xuhui Zhou, Aditya Bharat Soni, Hoang H. Tran, Calvin Smith, Ameet Talwalkar, Graham Neubig
  96. How Far Are VLMs from Privacy Awareness in the Physical World? An Empirical Study

    Junran Wang, Xinjie Shen, Zehao Jin, Pan Li
  97. How Should Your Agent Talk to Mine? Measuring the Utility–Security Frontier of Cross-Boundary Agentic Delegation

    Xisen Wang, Adel Bibi, Kevin Qinghong Lin, Philip Torr, Jindong Gu
  98. How Well Do Models Follow Their Constitutions?

    Arya Jakkli, Senthooran Rajamanoharan, Neel Nanda
  99. How LLM Decision Agents Fail in the Wild: A Reproducible Failure-Cell Framework

    Anthony Cruz
  100. InferenceBench: A Benchmark for Open-Ended LLM Inference Optimization by AI Agents

    Jehyeok Yeon, Ben Rank, Maksym Andriushchenko
  101. Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design

    Hershraj Niranjani, Ujwal Kumar, Phan Xuan Tan
  102. Internal-State Probes Read the Situation, Not the Action: Three Negative Results for Pre-Action Misalignment Monitoring

    Max Fomin, Elad David, Amit LeVi
  103. iOSWorld: A Benchmark for Personally Intelligent Phone Agents

    Lawrence Keunho Jang, Mareks Woodside, Geronimo Carom, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov
  104. Is Your LLM-as-a-Recommender Agent Trustable? LLMs' Recommendation is Easily Hacked by Biases (Preferences)

    Zichen TANG, Zirui Zhang, Qian Wang, Zhenheng Tang, Xiaowen Chu, Bo Li
  105. It's a TRAP! Task-Redirecting Agent Persuasion Benchmark for Web Agents

    Karolina Korgul, Yushi Yang, Arkadiusz Drohomirecki, Piotr Blaszczyk, Will Howard, Lukas Aichberger, Chris Russell, Philip Torr, Adam Mahdi, Adel Bibi
  106. Knowing When to STOP, RECOVER, and SEARCH: A Modular Framework for GUI Automation

    Qijun Han, Haoqin Tu, Zijun Wang, Haoyue Dai, Yiyang Zhou, Nancy Lau, Alvaro A. Cardenas, Yuhui Xu, Ran Xu, Caiming Xiong, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie
  107. Latent Undertow: How Ordinary Typos Break Probes

    Elad David, Max Fomin, Amit LeVi
  108. Lateral Data Exfiltration in MCP: How One Compromised Server Captures Cross-Domain Agent Data

    Akshat Sharda, Zhenyu Zhang
  109. Laundering AI Authority with Adversarial Examples

    Jie Zhang, Pura Peetathawatchai, Florian Tramèr, Avital Shafran
  110. Learning Stateful Predictive Knowledge From Experience

    Yan Song, Xidong Feng, Bo Liu, Xinyu Cui, Zichen Liu, Haotian Fu, Mengyue Yang, Cheng Deng, Jian Zhao, Jun Wang
  111. Learning to Inject: Automated Prompt Injection via Reinforcement Learning

    Xin Chen, Jie Zhang, Florian Tramèr
  112. Learning to Learn-at-Test-Time: Language Agents with Learnable Adaptation Policies

    Zhanzhi Lou, Hui Chen, Yibo Li, Qian Wang, Bryan Hooi
  113. LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

    Lalit Yadav, Akshaj Gurugubelli
  114. Liability Frameworks for Agentic AI Systems

    MARCEL OSMOND
  115. Life After Benchmark Saturation: A Case Study of CORE-Bench

    Nitya Nadgir, Sayash Kapoor, Kangheng Liu, Peter Kirgis, Matilda Orona, Stephan Rabanser, Tilman Bayer, Abhishek Shetty, Yue Ling, Derrick Chan-Sew, Rumi Nakagawa, Saiteja Utpala, Zachary S Siegel, Arvind Narayanan
  116. Linguistic Firewall: Geometry as Defense in Multi-Agent Systems Routing

    Dvir Alsheich, Adar Peleg, Ben Hagag, Rom Himelstein, Amit LeVi, Avi Mendelson
  117. LinuxArena: A Control Setting for AI Agents in Live Production Software Environments

    Tyler Tracy, Ram Potham, Nikolas Kuhn, Myles Heller, Anshul Khandelwal, Cody Rushing, Henri Lemoine, Miguel Brandão, Tomáš Turlík, Adam Hanson, Josh Hills, Amy Dieu-Am Ngo, Ram Rachum, Nik Mitchell, Falko Galperin, Oscar Sykes, Pip Arnott, Samuel Prieto Lima, Carlos Rafael Giudice, Matt Goldwater, Daniel J Popp, Drew de Wet, Ruben Castaing, Qi Guo, Douw Marx, Benjamin Shaffrey, Justin Shenk, Martin Milbradt, Hannah Meagher, Shaheen Ahmed-Chowdhury, Daniel O'Connell, Christopher William Canal, Buck Shlegeris, Aryan Bhatt
  118. LLM-Guided Planning for Multi-hop Reasoning over Multimodal Nuclear Regulatory Documents

    mingyu jeon, Bokyeong Kim, Suwan Cho, Jae Young Suh, Yonggyun Yu
  119. LLMs Struggle to Rank Products Robustly

    Kumail Alhamoud, Charikleia Moraitaki, Carlos Hinojosa, Jennifer Zhou, Yuexing Hao, Philip Torr, Adel Bibi, Marzyeh Ghassemi
  120. Lost in the Maze: Overcoming Context Limitations in Long-Horizon Agentic Search

    Howard Yen, Yoonsang Lee, Ashwin Paranjape, Mengzhou Xia, Thejas Venkatesh, Jack Hessel, Danqi Chen, Yuhao Zhang
  121. MacArena: Benchmarking Computer Use Agents on an Online macOS Environment

    Victor Muryn, Maksym Shamrai, Sofiia Mazepa, Yehor Khodysko
  122. Making Open-Source Text LLM Watermarks Durable Against Merging

    Luisa Scharff, Thibaud Gloaguen, Robin Staab, Martin Vechev
  123. Measuring Five-Nines Reliability: Sample-Efficient LLM Evaluation in Saturated Benchmarks

    Eungyeup Kim, Chenchen Gu, Vashisth Tiwari, J Zico Kolter
  124. Mecha-nudges for Machines

    Giulio Frey, Kawin Ethayarajh
  125. Mechanism Design Is Not Enough: Prosocial Agents for Cooperative AI

    Xuanqiang Angelo Huang, Charlie Tharas, Samuele Marro, Van Q. Truong, Bernhard Schölkopf, Emanuele La Malfa, Zhijing Jin
  126. Memory-Induced Tool-Drift in LLM Agents

    Mahavir Dabas, Jihyun Jeong, Ming Jin, Ruoxi Jia
  127. Meta-Harness: Post-Training Reliable Agent Systems via Harness Search

    Yoonho Lee, Roshen Sanjay Nair, Qizheng Zhang, Kangwook Lee, Omar Khattab, Chelsea Finn
  128. Minibinder Lab: The Reliability Gap Of Agents For Designing High Quality Protein Binders

    Anda-Raluca Epure, Deepa Mal Korani, Jonathan G. Hedley, Maxim Secor, Alejo Nevado-Holgado, Philip Torr
  129. MIRAGE: A Polarity-Flipping Encoding Subspace in LLM Agents

    Pratibha Revankar, Kargi Chauhan, Jihye Kim, Sadiba Nusrat Nur, Vincent Siu, Chenguang Wang
  130. Mitigating Over-Personalization in Language Models via Structured Memory

    Hakeem Hannoon, Andrew Zhao, Mihir Narayan, Sharvin Goyal, Ivaxi Sheth
  131. Mitigating Visual Hallucinations for Reliable Multimodal Agents

    Sohyeon Kim, Sang Yeon Yoon, Kyeongbo Kong
  132. MyPCBench: A Benchmark for Personally Intelligent Computer-Use Agents

    Lawrence Keunho Jang, Andrew Keunwoo Jang, Jing Yu Koh, Ruslan Salakhutdinov
  133. NEST: Nascent Encoded Steganographic Thoughts

    Artem Karpov
  134. Network-Level Prompt and Trait Leakage in Local Research Agents

    Hyejun Jeong, Mohammadreza Teymoorianfard, Abhinav Kumar, Amir Houmansadr, Eugene Bagdasarian
  135. NitroBox: Lightning-Fast Sandbox for Large-Scale RL Training

    Yuzhou Nie, Ruilin Zhou, Zhaorun Chen, Jingyang Zhang, Yan Shao, Hongwei Li, Bo Li, Dawn Song, Wenbo Guo
  136. Noticing the Watcher: LLM Agents Can Infer CoT Monitoring from Blocking Feedback

    Thomas Jiralerspong, Flemming Kondrup, Yoshua Bengio
  137. Odysseys: Benchmarking Web Agents on Realistic Long Horizon Tasks

    Lawrence Keunho Jang, Jing Yu Koh, Daniel Fried, Ruslan Salakhutdinov
  138. Omission Constraints Decay While Commission Constraints Persist in Long-Context LLM Agents

    Yeran Gamage
  139. One Turn Too Late: Response-Aware Defense Against Hidden Malicious Intent in Multi-Turn Dialogue

    Xinjie Shen, Rongzhe Wei, Peizhi Niu, Haoyu Peter Wang, Ruihan Wu, Eli Chien, Bo Li, Pin-Yu Chen, Pan Li
  140. Online Boundary-Aware Memory for Case-Based Reasoning Agents

    Zheng Dong, Luming Shang
  141. Open-World Evaluations for Measuring Frontier AI Capabilities

    Sayash Kapoor, Peter Kirgis, Andrew Schwartz, Stephan Rabanser, J.J. Allaire, Rishi Bommasani, Harry Coppock, Magda Dubois, Gillian K Hadfield, Andrew B. Hall, Sara Hooker, Seth Lazar, Steve Newman, Dimitris Papailiopoulos, Shoshannah Tekofsky, Helen Toner, Cozmin Ududec, Arvind Narayanan
  142. OpenSeeker: Democratizing Frontier Search Agents by Fully Open-Sourcing Training Data

    Yuwen Du, Rui Ye, Shuo Tang, Xinyu Zhu, Yijun Lu, Yuzhu Cai, Siheng Chen
  143. Out-of-Distribution Generalization of Risk Aversion in Language Models

    Kristina Zhang, Junior Chinomso Okoroafor, Benjamin Maltbie, Andrew Lin, Abhitej Bokka, Elliott Thornley
  144. Oversight is Not Compliance: Tacit Collusion in LLM Pricing Agents Under Antitrust Regulation

    Meiri Anto, Juan J Vazquez
  145. Parameters as Agentic Memory: Internalizing Long-Horizon Memories for Efficient LLM Agents

    Zhenheng Tang, Fanjunduo Wei, Zichen TANG, Peijie Dong, Xiang Liu, Qian Wang, Xiaowen Chu, Bo Li
  146. Peer-Preservation in Frontier Models

    Yujin Potter, Nicholas Crispino, Vincent Siu, Chenguang Wang, Dawn Song
  147. Plausible Deniability Guarantees for Whistleblowers

    Leo Richter, Matt J. Kusner
  148. Position: LLM Social-Simulation Agents in the Wild Cannot Serve as Social Scientific Evidence Without an Identification Strategy

    Zezheng Lin, Fengming Liu
  149. Position: Stop Hardcoding Multi-Agent Workflows That General Agents Will Outgrow

    Xisen Wang
  150. ReCode: Unify Plan and Action for Universal Granularity Control

    Zhaoyang Yu, Jiayi Zhang, Huixue Su, Yufan Zhao, Yifan Wu, Mingyi Deng, Jinyu Xiang, Yizhang Lin, Fanqi Kong, Lingxiao Tang, Yuyu Luo, Bang Liu, Chenglin Wu
  151. Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

    Hongwei Yao, Yiming Liu, Yiling He, Bingrun Yang
  152. Reliable to Expressive: A Curriculum for Rubric-Following Safety Judges

    YongTaek Lim, hyeji choi, Minwoo Kim
  153. Remote Control: AI Control with User Actions

    Jou Barzdukas, Matthew Nguyen
  154. RevengeBench: Reverse Engineering Code-Space Policies from Behavioral Experiments

    Babak Rahmani, Sebastian Dziadzio, Joschka Strüber, Sergio Hernández-Gutiérrez, Matthias Bethge
  155. Reward Bias Substitution: Single-Axis Bias Mitigations Redirect Optimization Pressure

    Max Lamparth, Daniel Fein, Andreas Haupt, Marcel Hussing, Mykel Kochenderfer
  156. Reward Hacking in Rubric-Based Reinforcement Learning

    Anas Mahmoud, MohammadHossein Rezaei, Zihao Wang, Anisha Gunjal, Bing Liu, Yunzhong He
  157. Robust Multi-Agent LLMs under Byzantine Faults

    Haejoon Lee, Vincent-Daniel Yun, Hyeonho Oh, Dimitra Panagou, Sai Praneeth Karimireddy
  158. SAFARI: Scaling Long Horizon Agentic Fault Attribution via Active Investigation

    Chenyang Zhu, Jiayu Yao, Kushal Chawla, Youbing Yin, Nathan Wolfe, Pengshan Cai, Jingyu Wu, Spencer Hong, Sangwoo Cho, Shi-Xiong Zhang, Daben Liu, Sambit Sahu, Erin Babinsky
  159. Safe Flow Q-Learning: Offline Safe Reinforcement Learning with Reachability-Based Flow Policies

    Mumuksh Tayal, Manan Tayal, Ravi Prakash
  160. Safe Under Budget? Verification Budgets and Abstention Failures in Web Agents

    Daoyuan Li
  161. SafeClawBench: An Operating-System Perspective on Evaluating the Security of Claw-like Agent Systems

    Peizhi Niu, Shangding Gu, Wenjie Qu, Tianneng Shi, Yuankai Li, Ahmad Tawaha, Hend Alzahrani, Vincent Siu, Boyi Li, Chenguang Wang, Jiaheng Zhang, Basel Alomair, Ming Jin, Muhao Chen, Chi Wang, Costas Spanos, Dawn Song
  162. Same Action, Different Justification: Path-Based Authorization for Irreversible Agent Actions

    Jungsoo Baek
  163. Same Biology, Different Scores: Quantifying the Tool-Use Confound in Agentic Biology Evaluation

    Alyssia Jovellanos, Martin Ciesielski-Listwan
  164. Scaling Laws for Strategic Interactions

    Joie Zhang, Danqi Chen, Peter Henderson, Lewis Hammond
  165. Sell Me This Stock: Unsafe Recommendation Drift in LLM Agents

    Zekun Wu, Adriano Koshiyama, Sahan Bulathwela, Maria Perez-Ortiz
  166. SkillOptimizer: Agent Skill Optimization Through Subskills Without Task Supervision

    Nicholas Crispino, Shubham Gaur, Xuefang Yang, Angela Yu, Berat Ercevik, Clara Sapugay, Yujin Potter, Dawn Song, Chenguang Wang
  167. SlotGuard: Stop Oversharing Private Local Context in LLM Agent Transcripts

    Haocheng Xia, Yongjoo Park
  168. Smarter Saboteurs, Better Fixers: Scaling & Security in Linear Multi-Agent Workflows

    Timothy McAllister, Sina Abdidizaji, Ivan Garibay, Ozlem Garibay
  169. Sockpuppetting: Jailbreaking LLMs by Combining Prefilling with Optimization

    Asen Dotsinski, Panagiotis Eustratiadis
  170. SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking

    Jindong Li, Ying Liu, Yali Fu, Jinjing Zhu, Leyao Wang, Menglin Yang, Rex Ying
  171. Steering Externalities: Benign Activation Steering Unintentionally Increases Jailbreak Risk for Large Language Models

    Chen Xiong, Zhiyuan He, Pin-Yu Chen, Ching-Yun Ko, Tsung-Yi Ho
  172. Stop Comparing LLM Agents Without Disclosing the Harness

    Yunbei Zhang, Janet Wang, Yingqiang Ge, Weijie Xu, Jihun Hamm, Chandan K. Reddy
  173. Stop Reporting System-Level AI Reasoning as Individual Model Capability

    Adhiraj Chhoda
  174. Structured Hallucination in Tool-Using Agents: Measuring and Mitigating LLM Synthesis Corruption in Production

    Tung Thanh Hoang
  175. SudoBench: A Contextual Authorization Benchmark for LLM Agents

    Vincent Siu, Tianneng Shi, Shangding Gu, Zhun Wang, Dawn Song, Chenguang Wang
  176. Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment

    Ye Wang, Jing Liu, Toshiaki Koike-Akino
  177. Terminal Agents Suffice for Enterprise Automation

    Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, Sai Rajeswar
  178. The Best-Laid SCHEMEs: Coordinated Sabotage and Monitoring in Multi-Agent Systems

    Nikolay Radev, Lennart J Haas, Benjamin Arnav, Pablo Bernabeu-Perez
  179. The Monitorability Gap Between Reasoning in Thinking and Reasoning in the Output

    Zora Che, David Lindner
  180. The Safety Illusion of Greedy Decoding: Diagnosing Booster's Compliant Leakage and a Phase-2 Mitigation

    Tung Lam Tran, Muhammad Rifki Kurniawan, Christopher Kanan
  181. The Token Tax: Measuring the Diminishing Returns of Test-Time Compute in Agentic Pipelines

    Kushagra Agrawal, Christian Beecks, Man-Fai Leung
  182. Thought Virus: Spreading Subliminal Biases in Multi-Agent Systems

    Moritz Weckbecker, Jonas Müller, Ben Hagag, Michael Mulet
  183. Tool Selection Bias Amplifies in Multi-turn User–Agent Interactions

    Nakyeong Yang, Junseok Kim, Kyomin Jung
  184. Tool-Framing Bypasses LLM Safety: Procedural Abstraction Reduces Refusal Rates by Up to 40 Percentage Points Across Models

    Kevin Power
  185. ToolFailBench: Diagnosing Tool-Use Failures in LLM Agents

    Harsh Soni
  186. Toward Scalable Terminal Task Synthesis via Skill Graphs

    Zhiyuan Fan, TingHao YU, Yuanjun Cai, JiangTaoGuan, Yun Yang, Dingxin Hu, Jiang Zhou, Xing W, Zhuo Han, feng zhang, Lilin Wang
  187. Towards Budget-Aware Agents: Do LLM Agents Know What They Will Spend?

    Yuxiang Lin, Zihan Wang, Mengyang Liu, Yuxuan Shan, Longju Bai, Junyao Zhang, Xing Jin, Boshan Chen, Jinyan Su, Xingyao Wang, Jiaxin Pei, Manling Li
  188. Towards Predictive Models of Strategic Behaviour in Large Language Model Agents

    Jennifer Za, Aristeidis Panos, Jan Cuhel
  189. TRACE: Capability-Targeted Agentic Training

    Hangoo Kang, Tarun Suresh, Jon Saad-Falcon, Azalia Mirhoseini
  190. TraceSafe: A Systematic Assessment of LLM Guardrails on Multi-Step Tool-Calling Trajectories

    Yen-Shan Chen, Sian-Yao Huang, Cheng-Lin Yang, Yun-Nung Chen
  191. Tracking the Behavioral Trajectories of Adapting Agents

    Jonah Leshin, Manish Shah, Ian Timmis
  192. Tracking the Truth: Object-Centric Spatio-Temporal Monitoring for Video Large Language Models

    Tri Cao, Khoi M. Le, Thong Thanh Nguyen, Cong-Duy T Nguyen, Quynh Vo, Anh Tuan Luu, Chunyan Miao, See-Kiong Ng, Shuicheng YAN, Bryan Hooi
  193. Training Language Agents to Learn from Experience

    Yuval Shalev, Zifeng Ding, Mateja Jamnik
  194. Training ML Models with Predictable Failures

    Will Schwarzer, Scott Niekum
  195. Untrusted Content Masking for Web Agents with Security Guarantees

    Kristina Nikolić, Egor Zverev, Javier Rando, Matthew Jagielski, Edoardo Debenedetti, Florian Tramèr
  196. VATS: Exploiting Implicit Authority in Error-Path Injection via Systematic Mutation

    Harshil Patel, Kunal Pai
  197. VIGIL: A Reflective Runtime for Self-Healing LLM Agents

    Christopher Cruz
  198. WARD: Adversarially Robust Defense of Web Agents Against Prompt Injections

    Tri Cao, Yulin Chen, Thanh Hieu Cao, Yibo Li, Khoi M. Le, Thong Thanh Nguyen, Yuexin Li, Yufei He, Yue Liu, Shuicheng YAN, Bryan Hooi
  199. WARP: A Wrapper-Based, Adaptive, Realistic Pipeline for Reliable Web-Agent Robustness Testing

    Jasmine Xinze Li, Ashton Chew, Maxwell Lin, Eliot Krzysztof Jones, Xiaohan Fu, Andy Zou
  200. We Let Agents Compete and They Tried to Cheat. KernelGuard: Defending GPU Competitions from Adversarial Agentic Systems

    Muhammed Emin Baslak, Erik Schultheis, Matej Sirovatka, Mark Saroufim, Alex L Zhang
  201. Web Agents Leak Sensitive Data on Simple Scalable Websites

    Zachary Yahn
  202. WebAgentGuard: A Reasoning-Driven Guard Model for Detecting Prompt Injection Attacks in Web Agents

    Yulin Chen, Tri Cao, Haoran Li, Yibo Li, Yue Liu, Yufei He, Khoi M. Le, Yangqiu Song, Shuicheng YAN, Bryan Hooi
  203. WebArena-Pro: A Heterogeneous, Multimodal, Reproducible Benchmark for Web Agents

    Imene Kerboua, Fatemeh Pesaran zadeh, Xing Han Lù, Weijian Qi, Alexander Miller, Junyi Song, Yunjia Tian, Dongjin Kang, Seyeon Choi, Marzia Nouri, Ewen Gueguen, Matteo Boglioni, Fengyuan Liu, Zeyi Liao, Mengqi Yuan, Yue Li, Alexandre Lacoste, Alexandre Drouin, Spandana Gella, Huan Sun, Gunhee Kim, Siva Reddy
  204. WebPII: Benchmarking Visual PII Detection for Computer-Use Agents

    Nathan J. Zhao
  205. What Can One Bad Tool Call Destroy? Measuring and Minimizing Blast Radius in Agentic Tool Use

    Muhammet Anil Yagiz
  206. What Game-Theoretic Benchmarks Miss: Strategic Silence in Multi-Agent LLMs

    Jerick Shi, Terry Jingchen Zhang, Vincent Conitzer, Zhijing Jin
  207. When Cloud Agents Meet Device Agents: Lessons from Hybrid Multi-Agent Systems

    Corrado Rainone, Davide Belli, Bence Major, Arash Behboodi
  208. When Do Covert Channels Emerge? Probing Steganographic Capacity in Multimodal Agents via Diffusion VAE Latents

    Catherine Ge-Wang, Tushar Nagar, Joy Zheyun Yang
  209. When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel

    Wenkai Li, Fan Yang, Ananya Hazarika, Shaunak A. Mehta, Koichi Onoue
  210. Who Flips? Self- and Cross-Model Counterarguments Reveal Answer Instability in LLMs

    Nafiseh Nikeghbal, Amir Hossein Kargaran, Shaghayegh Kolli, Jana Diesner
  211. Widening the Gap: Exploiting LLM Quantization via Outlier Injection

    Xiaohua Zhan, Kazuki Egashira, Robin Staab, Mark Vero, Martin Vechev
  212. WinDOM: Self-Family Distillation for Small-Model GUI Grounding

    Chengheng Li Chen, Zhiqian Zhou, Hao Chen, Nicolas Chauvin
  213. Within-Episode Failure Recovery in LLM Agents via Progress-Gated Dual-Process Routing

    Ankush Kadu, Aswanth Krishnan
  214. WorldFork: Trace-Auditable Forecasting Agents in Open-Ended Domains

    Hanson Wen, James Gui
  215. Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

    Zijun Wang, Haoqin Tu, Letian Zhang, Hardy Chen, Juncheng Wu, Xiangyan Liu, Zhenlong Yuan, Tianyu Pang, Michael Qizhe Shieh, Fengze Liu, Zeyu Zheng, Huaxiu Yao, Yuyin Zhou, Cihang Xie
  216. Your Cursor is Not Secure: Command Line Interface Agent Can Expose Realistic Risks Through Tactics, Techniques, and Procedures

    Weidi Luo, Qiming Zhang, Tianyu Lu, Xiaogeng Liu, Bin Hu, CHIU Hung Chun, Siyuan Ma, Yizhe Zhang, Xusheng Xiao, Yinzhi Cao, Zhen Xiang, Chaowei Xiao