ICLR 2026 Workshop on Lifelong Agents: Learning, Aligning, Evolving
LLA 2026
- Submission deadline: Feb 16, 2026, 23:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (104)
Fetched from OpenReview (v2) on 2026-06-10.
- AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization
- ACON: Optimizing Context Compression for Long-horizon LLM Agents
- Actor-Curator: A Scalable RL Post-training Framework with Co-adaptive Curricula
- Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
- Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
- AgentGym-RL: An Open-Source Framework to Train LLM Agents for Long-Horizon Decision Making via Multi-Turn RL
- Agentic Cognitive Profiling: Realigning Automated Alzheimer’s Disease Detection with Clinical Construct Validity
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- AIF-GEN: Open-Source Platform and Synthetic Dataset Suite for Reinforcement Learning on Large Language Models
- Aligning LLMs Toward Multi-Turn Conversational Outcomes Using Iterative RLHF
- Alignment Propagation: Spreading Cooperative Behaviors in Multi-Agent Systems through Seed Agents
- AlphaApollo: A System for Deep Agentic Reasoning
- Asymmetric Goal Drift in Coding Agents Under Value Conflict
- Benchmarking Continual Agent Memory for Online Learning, Transfer, and Forgetting
- Beyond Reward Maximization: Evaluating the Diversity of Trajectories in Reinforcement Learning with Temporal Vendi Score
- Beyond Syntax: Action Semantics Learning for App Agents
- BioProAgent: Neuro-Symbolic Grounding for Constrained Scientific Planning
- BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents
- Can We Predict Before Executing Machine Learning Agents?
- CAP: A Scalable Benchmark for Evaluating Cross-Site Browser Agents with Complex Actions and Perception
- CF-Router: Closed-Form Solution for Expert Selection in Multimodal Agent Lifelong Learning
- CoDaPO: Confidence and Difficulty-Adaptive Policy Optimization for LLM Reasoning
- Cold-Start Personalization via Training-Free Priors from Structured World Models
- Constructive Specification for Plug-and-Play Learnware Agents
- Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live
- CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents
- DéjàQ: Open-Ended Evolution of Diverse, Learnable and Verifiable Problems
- DETACH: Cross-domain Learning for Long-Horizon Tasks via Mixture of Disentangled Experts
- DomusMind: A Benchmark for Evaluating Lifelong Smart Home Agents Under Drift
- DRPG (Decompose, Retrieve, Plan, Generate): An Agentic Framework for Academic Rebuttal
- DSGym: A Standardized and Holistic Framework for Advancing Data Science Agents
- Efficient Tree-Structured Deep Research with Adaptive Resource Allocation
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
- ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
- EnterpriseOps-Gym: Environments and Evaluations for Stateful Agentic Planning and Tool Use in Enterprise Settings
- EvoTac: A Self-Evolving LLM Agent for Eliciting Reusable Tacit Negotiation Heuristics from Terminal Outcomes
- ExecTune: Effective Steering of Black-Box LLMs with Guide Models
- Expanding the Capabilities of Reinforcement Learning via Text Feedback
- Experiential Reinforcement Learning
- Federated Agent Reinforcement Learning
- FocusAgent: Simple Yet Effective Ways Of Trimming The Large Context of Web Agents
- From Word to World: Can Large Language Models be Implicit Text-based World Models?
- GASP: Guided Asymmetric Self-Play For Coding LLMs
- GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment Simulators
- Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control
- Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing
- Hierarchical Agenda Reasoning for Strategic Multi-Turn Dialogue Agents
- Human-Guided Harm Recovery for Computer Use Agents
- InfoPO: Information-Driven Policy Optimization for User-Centric Agents
- Inherited Goal Drift: Contextual Pressure Can Undermine Agentic Goals
- Intrinsic Credit Assignment for Long Horizon Interaction
- Learning Agent Routing From Early Experience
- Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
- Learning Physical Principles from Interaction: Self-Evolving Embodied Planning via Test-Time Memory
- Learning to Evolve: Scaling Open-Ended Discovery with Relative-Progress RL
- Learning to Self-Evolve
- Learning Transferable Skills in Action RPGs via Directed Skill Graphs and Selective Adaptation
- Learning What to Learn: Curriculum Curation for Test-Time Agent Learning
- LHAW: Controllable Underspecification for Long-Horizon Tasks
- Lifelong Contextual Safety Alignment at Test Time for Multi-Modal LLMs
- Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
- MAGE: Meta-Reinforcement Learning for Language Agents toward Strategic Exploration and Exploitation
- Mem$^2$Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation
- MemoryCD: Benchmarking Long-Context User Memory of LLM Agents for Lifelong Cross-Domain Personalization
- MindZero: Learning Online Mental Reasoning With Zero Annotations
- MobileMem: Evaluating Long-Horizon Memory for Language Agents in Real-World Mobile Environments
- Narrow Fine-Tuning Erodes Safety Alignment in Vision-Language Agents
- Navigating the Cost-Performance Pareto Frontier of Test-Time LLM Agent Adaptation
- Not All Clients Are Equal: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients
- Noticing the Watcher: LLM Agents Can Infer CoT Monitoring from Blocking Feedback
- OceanGym: Evaluating Language-Grounded Embodied Agents in Underwater Environments
- On Group Relative Policy Optimization Collapse in Agent Search: The Lazy Likelihood-Displacement
- On Path to Multimodal Historical Reasoning: HistBench and HistAgent
- One Model, Many Goals: Meta-Learning Preference-Conditioned Alignment for Lifelong LLM Agents
- PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution
- Paying Less Generalization Tax: A Cross-Domain Generalization Study of RL Training for LLM Agents
- PEARL: Self-Evolving Assistant for Time Management
- PlugMem: A Task-Agnostic Plugin Memory Module for LLM Agents
- PolicyBank: Evolving Policy Understanding For Evolving Agents
- Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization
- ReCreate: Reasoning and Creating Domain Agents Driven by Experience
- ReMix: Reinforcement Routing for Mixtures of LoRAs in LLM Finetuning
- Residual Off-Policy RL for Finetuning Behavior Cloning Policies
- ScenDroid: A Scenario-Level Benchmark for Long-Horizon, Time-Evolving GUI Agents
- Self-Distillation Enables Continual Learning
- Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
- Self-Questioning Language Models
- SimpleMem: Efficient Lifelong Memory for LLM Agents
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
- Streaming Memory Benchmark: Stage-level Diagnosis with Evidence Dependency Control
- SWITCH: Benchmarking Interaction and Verification on Real-World Interfaces in Lifelong Embodied Agents
- The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios
- The Hidden Costs of Domain Fine-Tuning: PII-Bearing Data Degrades Safety and Increases Leakage
- TSR: Trajectory‑Search Rollouts for Multi‑Turn RL of LLM Agents
- TTCS: Test-Time Curriculum Synthesis for Self-Evolving
- Understanding Knowledge Acquisition and Release in Language Models via Circuits
- Understanding Reasoning Collapse in Multi-Turn Agent Reinforcement Learning
- Universe Routing: Why Self-Evolving Agents Need Epistemic Control
- Verifying the Verifiers: Failure Attribution for Agentic Benchmark Diagnostics and Training Data Curation
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
- Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection
- When Drafts Evolve: Speculative Decoding Meets Online Learning
- Which Memory Operation Drives Recovery? A Factorial Study of Retrieve, Write, and Manage Adaptation under Domain Shift
- Zombie Agents: Persistent Control of Self-Evolving LLM Agents via Self-Reinforcing Injections