ICLR 2025 · Past · Large language models · Computer vision

Scaling Self-Improving Foundation Models without Human Supervision

SSI-FM

Submission deadline: Feb 13, 2025, 11:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (73)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Self-Improving Coding Agent

    Maxime Robeyns, Martin Szummer, Laurence Aitchison · PDF
  2. Adaptively-Labeled Vision Datasets Via Instance-Level Retrieval

    Brandon Trabucco, Rishav Mukherji, Yutong Bai, Ruslan Salakhutdinov · PDF
  3. AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement

    J Rosser, Jakob Nicolaus Foerster · PDF
  4. AIDE: Agentically Improve Visual Language Model with Domain Experts

    Ming-Chang Chiu, Fuxiao Liu, Karan Sapra, Andrew Tao, Yaser Yacoob, Xuezhe Ma, Zhiding Yu, Guilin Liu · PDF
  5. Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

    Anja Šurina, Amin Mansouri, Amal Seddas, Maryna Viazovska, Emmanuel Abbe, Caglar Gulcehre · PDF
  6. AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement

    Pranjal Aggarwal, Bryan Parno, Sean Welleck · PDF
  7. AMPO: Active Multi Preference Optimization for Self-play Preference Selection

    Taneesh Gupta, Rahul Madhavan, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan · PDF
  8. An Adversarial Collaborative Framework for Comprehensive Image Captioning

    Dinesh Chowdary, Ying Xie, Linh Le · PDF
  9. An Architecture Search Framework for Inference-Time Techniques

    Jon Saad-Falcon, Adrian Gamarra Lafuente, Shlok Natarajan, Nahum Maru, Hristo Todorov, Etash Kumar Guha, E. Kelly Buchanan, Mayee F Chen, Neel Guha, Christopher Re, Azalia Mirhoseini · PDF
  10. Assessing Diversity Collapse in Reasoning

    Xingyu Dang, Christina Baek, J Zico Kolter, Aditi Raghunathan · PDF
  11. Automated Capability Discovery via Model Self-Exploration

    Cong Lu, Shengran Hu, Jeff Clune · PDF
  12. Aviary: Training Language Agents on Challenging Scientific Tasks

    Siddharth Narayanan, James D. Braza, Ryan-Rhys Griffiths, Manvitha Ponnapati, Albert Bou, Jon M Laurent, Ori Kabeli, Geemi Wellawatte, Sam Cox, Samuel G Rodriques, Andrew White · PDF
  13. Boss LLM: Adaptation via No-Regret Learning

    Yu Feng, Avishree Khare, Nghia Nguyen, Sikata Bela Sengupta · PDF
  14. Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens

    Zhepeng Cen, Yao Liu, Siliang Zeng, Pratik Chaudhari, Huzefa Rangwala, George Karypis, Rasool Fakoor · PDF
  15. Can Language Models Falsify? The Need for Inverse Benchmarking

    Shiven Sinha, Shashwat Goel, Ponnurangam Kumaraguru, Jonas Geiping, Matthias Bethge, Ameya Prabhu · PDF
  16. D3: A Large Dataset for Training Code Language Models to Act Diff-by-Diff

    Ulyana Piterbarg, Kanishk Gandhi, Lerrel Pinto, Noah Goodman, Rob Fergus · PDF
  17. Demystifying Long Chain-of-Thought Reasoning in LLMs

    Edward Yeo, Yuxuan Tong, Xinyao Niu, Graham Neubig, Xiang Yue · PDF
  18. DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks

    Amin Karimi Monsefi, Kishore Prakash Sailaja, Ali Alilooee, Ser-Nam Lim, Rajiv Ramnath · PDF
  19. DISC: Dynamic Decomposition Improves LLM Inference Scaling

    Jonathan Light, Wei Cheng, Yue Wu, Masafumi Oyamada, Mengdi Wang, Santiago Paternain, Haifeng Chen · PDF
  20. Don't Throw Away Data: Improving Sequence Knowledge Distillation with Minimum Bayes Risk Decoding

    Jun Wang, Eleftheria Briakou, Hamid Dadkhahi, Rishabh Agarwal, Colin Cherry, Trevor Cohn · PDF
  21. Escaping Collapse: The Strength of Weak Data for Large Language Model Training

    Kareem Amin, Sara Babakniya, Alex Bie, Weiwei Kong, Umar Syed, Sergei Vassilvitskii · PDF
  22. Evaluating LLMs Without Oracle Feedback: Agentic Annotation Evaluation Through Unsupervised Consistency Signals

    Cheng Chen, Haiyan Yin, Ivor Tsang · PDF
  23. Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models

    Sid Bharthulwar, John Rho, Katrina Brown · PDF
  24. Explain-Query-Test: Self-Evaluating LLMs Via Explanation and Comprehension Discrepancy

    Saeid Asgari, Joao Monteiro · PDF
  25. Exploring the Pre-conditions for Memory-Learning Agents

    Vishwa Shah, Vishruth Veerendranath, Graham Neubig, Daniel Fried, Zora Zhiruo Wang · PDF
  26. Game-Theoretic Regularized Self-Play Alignment of Large Language Models

    Xiaohang Tang, Sangwoong Yoon, Seongho Son, Huizhuo Yuan, Quanquan Gu, Ilija Bogunovic · PDF
  27. Great Models Think Alike and this Undermines AI Oversight

    Shashwat Goel, Joschka Strüber, Ilze Amanda Auzina, Karuna K Chandra, Ponnurangam Kumaraguru, Douwe Kiela, Ameya Prabhu, Matthias Bethge, Jonas Geiping · PDF
  28. HEAL: Hierarchical Embedding Alignment Loss for Improved Retrieval and Representation Learning

    Manish Bhattarai, Ryan Barron, Maksim E. Eren, Minh N. Vu, Vesselin Grantcharov, Ismael Boureima, Valentin Stanev, Cynthia Matuszek, Vladimir I Valtchinov, Kim Rasmussen, Boian S. Alexandrov · PDF
  29. How to Mitigate Overfitting in Weak-to-strong Generalization?

    Junhao Shi, Qingyuan Chen, Zhaoye Fei, Yining Zheng, Qipeng Guo, Xuanjing Huang, Xipeng Qiu · PDF
  30. I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

    Yiming Liang, Xingwei Qu, Tianyu Zheng, Jiawei Guo, Xeron Du, Zhenzhu Yang, Jiaheng Liu, Chenghua Lin, Ge Zhang, Lei Ma, Stephen Huang, Jiajun Zhang · PDF
  31. Improving Test-Time Search for LLMs with Backtracking Against In-Context Value Verifiers

    Anikait Singh, Kushal Arora, Sedrick Keh, Jean Mercat, Tatsunori Hashimoto, Chelsea Finn, Aviral Kumar · PDF
  32. InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context

    Bryan Lincoln Marques de Oliveira, Luana Guedes Barros Martins, Bruno Brandão, Luckeciano Carvalho Melo · PDF
  33. KernelBench: Can LLMs Write Efficient GPU Kernels?

    Anne Ouyang, Simon Guo, Simran Arora, Alex L Zhang, William Hu, Christopher Re, Azalia Mirhoseini · PDF
  34. Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation

    Tianyu Zheng, Shuyue Guo, Xingwei Qu, Jiawei Guo, Xeron Du, Chenghua Lin, Stephen Huang, Jie Fu, Ge Zhang · PDF
  35. LaMsS: When Large Language Models Meet Self-Skepticism

    Yetao Wu, Yihong Wang, Teng Chen, Ningyuan Xi, Qingqing Gu, Hongyang Lei, Luo Ji · PDF
  36. Leveraging Reasoning with Guidelines to Elicit and Utilize Knowledge for Enhancing Safety Alignment

    Haoyu Wang, Zeyu Qin, Li Shen, Xueqian Wang, Minhao Cheng, Dacheng Tao · PDF
  37. MALT: Improving Reasoning with Multi-Agent LLM Training

    Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das, Rafael Rafailov, Ivan Laptev, Philip Torr, Fabio Pizzati, Ronald Clark, Christian Schroeder de Witt · PDF
  38. MetaSC: Test-Time Safety Specification Optimization for Language Models

    Victor Gallego · PDF
  39. Mitigating Short Board Effect via Dynamic Reward Balancing in Multi-reward LLM Optimization

    Nuo Chen, Yufei Gao, Yongnan Jin, Yan Hu, Anningzhe Gao, Lingyong Yan, Benyou Wang · PDF
  40. MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge

    Yuntao Du, Kailin Jiang, Zhi Gao, Chenrui Shi, Zilong Zheng, Siyuan Qi, Qing Li · PDF
  41. Moral Intrinsic Rewards for Automated Alignment of LLM Agents

    Elizaveta Tennant, Stephen Hailes, Mirco Musolesi · PDF
  42. MPAW: Multi-Preference Alignment through Weak Model Collaboration for Efficient and Flexible LLM Decoding

    Nuo Chen, Guojun Xiong, Bingsheng He · PDF
  43. Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers (Abridged)

    Shalev Lifshitz, Sheila A. McIlraith, Yilun Du · PDF
  44. Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage

    Zhi Gao, Bofei Zhang, Pengxiang Li, Xiaojian Ma, Tao Yuan, Yue Fan, Yuwei Wu, Yunde Jia, Song-Chun Zhu, Qing Li · PDF
  45. Multi-Turn Code Generation Through Single-Step Rewards

    Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen, Alexander M Rush, Wenting Zhao, Sanjiban Choudhury · PDF
  46. Natural Language Reinforcement Learning

    Xidong Feng, Bo Liu, Ziyu Wan, Haotian Fu, Girish A. Koushik, Zhiyuan Hu, Mengyue Yang, Ying Wen, Jun Wang · PDF
  47. NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild

    Shikhar Murty, Hao Zhu, Dzmitry Bahdanau, Christopher D Manning · PDF
  48. OpenRAG: Optimizing RAG End-to-End via In-Context Retrieval Learning

    Jiawei Zhou, Lei Chen · PDF
  49. Optimizing Test-Time Compute via Meta Reinforcement Finetuning

    Yuxiao Qu, Matthew Y. R. Yang, Lewis Tunstall, Edward Emanuel Beeching, Ruslan Salakhutdinov · PDF
  50. Policy-Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone

    Max Sobol Mark, Tian Gao, Georgia Gabriela Sampaio, Mohan Kumar Srirama, Archit Sharma, Chelsea Finn, Aviral Kumar · PDF
  51. Preference Tree Optimization: Enhancing Goal-Oriented Dialogue with Look-Ahead Simulations

    Lior Baruch, Moshe Butman, Kfir Bar, Doron Friedman · PDF
  52. ReSL: Enhancing Deep Clustering Through Reset-based Self-Labeling

    Andrii Shkabrii, Timo Klein, Lukas Miklautz, Sebastian Tschiatschek, Claudia Plant · PDF
  53. RMBoost: Reward Model Training With Preference-Conditional Multi-Aspect Synthetic Data Generation

    Jiaming Shen, Ran Xu, Yennie Jun, Zhen Qin, Tianqi Liu, Carl Yang, Yi Liang, Simon Baumgartner, Michael Bendersky · PDF
  54. Safety is Essential for Responsible Open-Ended Systems

    Ivaxi Sheth, Jan Wehner, Sahar Abdelnabi, Ruta Binkyte, Mario Fritz · PDF
  55. Scalable Thompson Sampling via Ensemble++

    Yingru Li, Jiawei Xu, Baoxiang Wang, Zhi-Quan Luo · PDF
  56. Scaling Flaws of Verifier-guided Search in Mathematical Reasoning

    Fei Yu, Yingru Li, Benyou Wang · PDF
  57. Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension

    Xiyao Wang, Zhengyuan Yang, Linjie Li, Hongjin Lu, Yuancheng Xu, Chung-Ching Lin, Kevin Lin, Furong Huang, Lijuan Wang · PDF
  58. SCOPE: Improving LLM Conversations with Efficient Semantic Space Planning

    Zhiliang Chen, Xinyuan Niu, Chuan-Sheng Foo, Bryan Kian Hsiang Low · PDF
  59. Self-Correcting Self-Consuming Loops For Generative Model Training

    Nate Gillman, Michael Freeman, Daksh Aggarwal, Chia-Hong Hsu, Calvin Luo, Yonglong Tian, Chen Sun · PDF
  60. Self-correction for OOD generalization

    Vanya Bannihatti Kumar, Abhinav Sukumar Rao, Aditi Raghunathan · PDF
  61. Self-Improving Diffusion Models With Synthetic Data

    Sina Alemohammad, Ahmed Imtiaz Humayun, Shruti Agarwal, John Collomosse, Richard Baraniuk · PDF
  62. Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges

    Nayoung Lee, Ziyang Cai, Avi Schwarzschild, Kangwook Lee, Dimitris Papailiopoulos · PDF
  63. Self-Taught Self-Correction for Small Language Models

    Viktor Moskvoretskii, Chris Biemann, Irina Nikishina · PDF
  64. Solving Robotic Tasks via Self-Adapting Improvement Loops with Internet Video Knowledge

    Calvin Luo, Zilai Zeng, Yilun Du, Chen Sun · PDF
  65. Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

    Alisia Maria Lupidi, Carlos Gemmell, Nicola Cancedda, Jane Yu, Jason E Weston, Jakob Nicolaus Foerster, Roberta Raileanu, Maria Lomeli · PDF
  66. Think, Prune, Train, Improve: Scaling Reasoning Without Scaling Models

    Caia Costello · PDF
  67. Towards Internet-Scale Training For Agents

    Brandon Trabucco, Gunnar A Sigurdsson, Robinson Piramuthu, Ruslan Salakhutdinov · PDF
  68. Training a Generally Curious Agent

    Fahim Tajwar, Yiding Jiang, Abitha Thankaraj, Sumaita Sadia Rahman, J Zico Kolter, Jeff Schneider, Ruslan Salakhutdinov · PDF
  69. Understanding the Capabilities and Limitations of Weak-to-Strong Generalization

    Wei Yao, Wenkai Yang, Ziqiao Wang, Yankai Lin, Yong Liu · PDF
  70. Value-Based Deep RL Scales Predictably

    Oleh Rybkin, Michal Nauman, Preston Fu, Charlie Victor Snell, Pieter Abbeel, Sergey Levine, Aviral Kumar · PDF
  71. Vision-Language Model Dialog Games for Self-Improvement

    Ksenia Konyushkova, Christos Kaplanis, Serkan Cabi, Misha Denil · PDF
  72. VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making

    Jake Grigsby, Yuke Zhu, Michael S Ryoo, Juan Carlos Niebles · PDF
  73. Yes, Q-learning Helps Offline In-Context RL

    Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Andrei Polubarov, Lyubaykin Nikita, Alexander Derevyagin, Igor Kiselev, Vladislav Kurenkov · PDF