ICLR 2025 | Past | Large language models | Computer vision
Scaling Self-Improving Foundation Models without Human Supervision
SSI-FM
- Submission deadline: Feb 13, 2025, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (73)
Fetched from OpenReview (v2) on 2026-06-10.
- A Self-Improving Coding Agent
- Adaptively-Labeled Vision Datasets Via Instance-Level Retrieval
- AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement
- AIDE: Agentically Improve Visual Language Model with Domain Experts
- Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning
- AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement
- AMPO: Active Multi Preference Optimization for Self-play Preference Selection
- An Adversarial Collaborative Framework for Comprehensive Image Captioning
- An Architecture Search Framework for Inference-Time Techniques
- Assessing Diversity Collapse in Reasoning
- Automated Capability Discovery via Model Self-Exploration
- Aviary: Training Language Agents on Challenging Scientific Tasks
- Boss LLM: Adaptation via No-Regret Learning
- Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens
- Can Language Models Falsify? The Need for Inverse Benchmarking
- D3: A Large Dataset for Training Code Language Models to Act Diff-by-Diff
- Demystifying Long Chain-of-Thought Reasoning in LLMs
- DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks
- DISC: Dynamic Decomposition Improves LLM Inference Scaling
- Don't Throw Away Data: Improving Sequence Knowledge Distillation with Minimum Bayes Risk Decoding
- Escaping Collapse: The Strength of Weak Data for Large Language Model Training
- Evaluating LLMs Without Oracle Feedback: Agentic Annotation Evaluation Through Unsupervised Consistency Signals
- Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models
- Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy
- Exploring the Pre-conditions for Memory-Learning Agents
- Game-Theoretic Regularized Self-Play Alignment of Large Language Models
- Great Models Think Alike and this Undermines AI Oversight
- HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning
- How to Mitigate Overfitting in Weak-to-strong Generalization?
- I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm
- Improving Test-Time Search for LLMs with Backtracking Against In-Context Value Verifiers
- InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context
- KernelBench: Can LLMs Write Efficient GPU Kernels?
- Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation
- LaMsS: When Large Language Models Meet Self-Skepticism
- Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment
- MALT: Improving Reasoning with Multi-Agent LLM Training
- MetaSC: Test-Time Safety Specification Optimization for Language Models
- Mitigating Short Board Effect via Dynamic Reward Balancing in Multi-reward LLM Optimization
- MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
- Moral Intrinsic Rewards for Automated Alignment of LLM Agents
- MPAW: Multi-Preference Alignment through Weak Model Collaboration for Efficient and Flexible LLM Decoding
- Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers (Abridged)
- Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
- Multi-Turn Code Generation Through Single-Step Rewards
- Natural Language Reinforcement Learning
- NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild
- OpenRAG: Optimizing RAG End-to-End via In-Context Retrieval Learning
- Optimizing Test-Time Compute via Meta Reinforcement Finetuning
- Policy-Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
- Preference Tree Optimization: Enhancing Goal-Oriented Dialogue with Look-Ahead Simulations
- ReSL: Enhancing Deep Clustering Through Reset-based Self-Labeling
- RMBoost: Reward Model Training With Preference-Conditional Multi-Aspect Synthetic Data Generation
- Safety is Essential for Responsible Open-Ended Systems
- Scalable Thompson Sampling via Ensemble++
- Scaling Flaws of Verifier-guided Search in Mathematical Reasoning
- Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
- SCOPE: Improving LLM Conversations with Efficient Semantic Space Planning
- Self-Correcting Self-Consuming Loops For Generative Model Training
- Self-correction for OOD generalization
- Self-Improving Diffusion Models With Synthetic Data
- Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
- Self-Taught Self-Correction for Small Language Models
- Solving Robotic Tasks via Self-Adapting Improvement Loops with Internet Video Knowledge
- Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources
- Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models
- Towards Internet-Scale Training For Agents
- Training a Generally Curious Agent
- Understanding the Capabilities and Limitations of Weak-to-Strong Generalization
- Value-Based Deep RL Scales Predictably
- Vision-Language Model Dialog Games for Self-Improvement
- VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
- Yes, Q-learning Helps Offline In-Context RL