ICLR 2026 Workshop on AI with Recursive Self-Improvement
ICLR 2026 Workshop RSI
- Submission deadline: Feb 11, 2026, 11:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (110)
Fetched from OpenReview (v2) on 2026-06-10.
- A Framework for Prompt Optimization and Translation Across Foundation Models
- A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula
- ACE: Self-Evolving LLM Coding Framework Adversarial Unit Test Generation and Preference Optimization
- Actor-Curator: Scalable Policy-driven Curriculum Learning for RL Post-Training
- Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation
- Adaptive Meta-Curriculum for Test-Time Self-Improvement
- Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning
- Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
- Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- Aligned but Stereotypical? Understanding and Mitigating Social Bias in LLM-Driven Text-to-Image Models
- AlphaApollo: A System for Deep Agentic Reasoning
- Anchored Self-Play for Code Repair
- AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness
- Beyond Solving: A Closer Look at LLMs as Solution Verifiers
- Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants
- Can Current Language Models Close the Discovery to Application Loop?
- Can Language Models Discover Scaling Laws?
- CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad
- CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning
- Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision
- Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
- Contextual Drag: How Errors in the Context Affect LLM Reasoning
- Contrastive Self-Refinement for Low-Cost Adaptation in Real-World Text-to-SQL
- Correct Reasoning Paths Visit Shared Decision Pivots
- CoT-Seg: Rethinking Segmentation with Chain-of-Thought Reasoning and Self-Correction
- Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models
- Depth vs Recursion: Outperforming Transformers in Jigsaw Reconstruction
- Differentiable Evolutionary Reinforcement Learning
- Discover the distinguishing and effective reasoning patterns among LLMs via an LLM
- Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis
- Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences
- Dynamic Noise Preference Optimization: Self-Improvement of Large Language Models with Self-Synthetic Data
- Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning
- Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
- ESDAE: Evaluating Synthetic Data for Agent Evaluation
- Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?
- Federated Agent Reinforcement Learning
- Federation over Text
- Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison
- From Growing to Looping: A Unified View of Iterative Computation in LLMs
- GASP: Guided Asymmetric Self-Play For Coding LLMs
- Generative Recursive Reasoning Models
- Improved Iterative Refinement for Chart-to-Code Generation via Structured Instruction
- In-Context Adaptation
- Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement
- Intelligent Robot Manipulation Requires Self-Directed Learning
- Interestingness as an Inductive Heuristic for Future Compression Progress
- Just Enough Learning: GRPO-Guided Controllers for Hyperparameter Sweeps
- Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation
- Lang-PINN: From Language to Physics-Informed Neural Networks via a Multi-Agent Framework
- Language Self-Play For Data-Free Training
- Language-Guided Expertise Evolution for Protein Optimization
- Learning to Continually Learn via Meta-learning Agentic Memory Designs
- Learning to Evolve: Scaling Open-Ended Discovery with Relative-Progress RL
- Learning What to Learn: Curriculum Curation for Test-Time Agent Learning
- Leveraging Suboptimal and Noisy Trajectories for Goal-Conditional Offline RL
- LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers
- Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation
- MAPPA: Scaling Multiagent Systems with Process Rewards
- MimicAgent: Learning Quadruped Skills via Text-to-Trajectory Generation
- OMEGA: Optimizing Machine learning by Evaluating Generated Algorithms
- One-Step Video Depth Estimation via Self-Distillation
- Orthogonal Gradient Projection for Continual LLM Unlearning
- POLARIS: A Godel Agent Framework for Small Language Models through Experience Abstracted Policy Repair
- PostTrainBench: Can LLM Agents Automate LLM Post-Training?
- Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
- Real-Time Procedural Learning From Experience for AI Agents
- Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search
- Reasoning Cache: Learning to Extrapolate to Long Lengths via Short-Length RL
- Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space
- Reference-Guided Machine Unlearning
- Refining Large Language Models with Self-Generated Data Through Iterative Training
- Residual Off-Policy RL for Finetuning Behavior Cloning Policies
- Rethinking Machine Unlearning: Models Designed to Forget via Key Deletion
- Reward Hacking in Self-Improving Code Agents
- RFTF: Reinforcement Fine-tuning for Vision-language-action Models with Temporal Feedback
- SAGE: Self-play Adversarial Games Enhance Large Language Model Reasoning Capabilities
- SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement
- Self-Adapting Agents for Automating Research Coding Workflows
- Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation
- Self-EvolveRec: Self-Evolving Recommender Systems with LLM-based Directional Feedback
- Self-Evolving Language Models through Co-evolved Discriminative Rubrics
- Self-Improvement via Fast Tree-search
- Self-Improving Clinical Reasoning via Textual Gradients
- Self-Improving Vision-Language-Action Models with Data Generation via Residual RL
- Self-Improving VLM Judges Without Human Annotations
- Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks
- Simple Baselines are Competitive with Code Evolution
- SimpleMem: Efficient Lifelong Memory for LLM Agents
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
- Soft Mellowmax Monte Carlo Planning
- Structure Enables Effective Self-Localization of Errors in LLMs
- TamperBench: A Systematic Framework to Stress-Test LLM Safety Under Fine-Tuning and Tampering
- TangramSR: A Benchmark for Recursive Self-Improvement In Continuous Geometric Reasoning
- Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
- Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls
- Test-Time Meta-Adaptation with Self-Synthesis
- Test-Time Self-Distillation
- TextBO: Bayesian Optimization in Language Space for Eval-Efficient Self-Improving AI
- Theory-Driven Modeling and LLM-Guided Evolution for Power System Scheduling
- Tiny Autoregressive Recursive Models
- Towards Execution-Grounded Automated AI Research
- Unlocking Intrinsic Self-Reflection for LLM Preference Policy Optimization
- Unrolled Policy Iteration for Tiny Recursive Models
- Verifying the Verifiers: Failure Attribution for Agentic Benchmark Diagnostics and Training Data Curation
- VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use
- Vision-Guided Iterative Refinement for Frontend Code Generation
- VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model
- World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry
- Your Self-Play Algorithm is Secretly an Adversarial Imitator: Understanding LLM Self-Play through the Lens of Imitation Learning