ICLR 2026 Workshop on AI with Recursive Self-Improvement

ICLR 2026 Workshop RSI

Submission deadline: Feb 11, 2026, 11:00 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (110)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Framework for Prompt Optimization and Translation Across Foundation Models

    Abhinav Shankaranarayanan Venkataraman, Athanasios Nikolakopoulos, Vishwanath Kumaraswamy, Tao Zhang, Sarath Chander, Rohit Saboo, Suleiman A. Khan · PDF
  2. A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula

    Chenruo Liu, Yijun Dong, Yiqiu Shen, Qi Lei · PDF
  3. ACE: Self-Evolving LLM Coding Framework Adversarial Unit Test Generation and Preference Optimization

    Yixu Huang, Xinglei Yu, Zhongyu Wei · PDF
  4. Actor-Curator: Scalable Policy-driven Curriculum Learning for RL Post-Training

    Zhengyao Gu, Jonathan Light, Raul Astudillo, Ziyu Ye, Langzhou He, Wei Cheng, Santiago Paternain, Philip S. Yu, Yisong Yue · PDF
  5. Adaptive Decoding via Test-Time Policy Learning for Self-Improving Generation

    Asmita Bhardwaj, Yuya Jeremy Ong, Eelaaf Zahid, Basel Shbita · PDF
  6. Adaptive Meta-Curriculum for Test-Time Self-Improvement

    Kaustubh S. Bukkapatnam, Aarav Lala, Laksh Patel · PDF
  7. Agent0-VL: Exploring Self-Evolving Agent for Tool-Integrated Vision-Language Reasoning

    Jiaqi Liu, Kaiwen Xiong, Peng Xia, Yiyang Zhou, Haonian Ji, Lu Feng, Siwei Han, Mingyu Ding, Huaxiu Yao · PDF
  8. Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning

    Peng Xia, Kaide Zeng, Jiaqi Liu, Can Qin, Fang Wu, Yiyang Zhou, Caiming Xiong, Huaxiu Yao · PDF
  9. Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

    Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, Kunle Olukotun · PDF
  10. Aligned but Stereotypical? Understanding and Mitigating Social Bias in LLM-Driven Text-to-Image Models

    NaHyeon Park, Na Min An, Kunhee Kim, Soyeon Yoon, Jiahao Huo, Hyunjung Shim · PDF
  11. AlphaApollo: A System for Deep Agentic Reasoning

    Zhanke Zhou, Chentao Cao, Xiao Feng, Xuan Li, Zongze Li, Xiangyu Lu, Jiangchao Yao, Weikai Huang, Tian Cheng, Jianghangfan Zhang, Tangyu Jiang, Linrui Xu, Yiming Zheng, Brando Miranda, Tongliang Liu, Sanmi Koyejo, Masashi Sugiyama, Bo Han · PDF
  12. Anchored Self-Play for Code Repair

    Caroline Choi, Zeyneb N. Kaya, Shirley Wu, Tengyu Ma, Tatsunori Hashimoto, Ludwig Schmidt · PDF
  13. AutoHarness: Improving LLM Agents by Automatically Synthesizing a Code Harness

    Xinghua Lou, Miguel Lazaro-Gredilla, Antoine Dedieu, Carter Wendelken, Wolfgang Lehrach, Kevin Murphy · PDF
  14. Beyond Solving: A Closer Look at LLMs as Solution Verifiers

    Jack Lu, Ryan Teehan, Jinran Jin, Mengye Ren · PDF
  15. Build, Judge, Optimize: A Blueprint for Continuous Improvement of Multi-Agent Consumer Assistants

    Alejandro Breen Herrera, Aayush Sheth, Steven Guanxing Xu, Zhucheng Zhan, Charles Wright, Hong Tai Wei, Marcus Yearwood, Sudeep Das, Danny Nightingale, Meg Watson, Charles Pollnow V · PDF
  16. Can Current Language Models Close the Discovery to Application Loop?

    Zhou Ziheng, Huacong Tang, Jinyuan Zhang, Haowei Lin, Bangcheng Yang, Qian Long, Fang Sun, Yizhou Sun, Yitao Liang, Ying Nian Wu, Demetri Terzopoulos, Xiaofeng Gao · PDF
  17. Can Language Models Discover Scaling Laws?

    Haowei Lin, Haotian Ye, Wenzheng Feng, Quzhe Huang, Yujun Li, Hubert Lim, Zhengrui Li, Jianzhu Ma, Yitao Liang, James Zou · PDF
  18. CausalEvolve: Towards Open-Ended Discovery with Causal Scratchpad

    Yongqiang Chen, Chenxi Liu, Zhenhao Chen, Tongliang Liu, Bo Han, Kun Zhang · PDF
  19. CircuitBuilder: From Polynomials to Circuits via Reinforcement Learning

    Weikun Zhang, Rohan Pandey, Bhaumik Mehta, Kaijie Jin, Naomi Morato, Archit Ganapule, Michael Ruofan Zeng, Jarod Alper · PDF
  20. Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

    Dulhan Jayalath, Shashwat Goel, Thomas Foster, Parag Jain, Suchin Gururangan, Cheng Zhang, Anirudh Goyal, Alan Schelten · PDF
  21. Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping

    Dwip Dalal, Gautam Vashishtha, Utkarsh Mishra, Jeonghwan Kim, Madhav Kanda, Hyeonjeong Ha, Svetlana Lazebnik, Heng Ji, Unnat Jain · PDF
  22. Contextual Drag: How Errors in the Context Affect LLM Reasoning

    Yun Cheng, Xingyu Zhu, Haoyu Zhao, Sanjeev Arora · PDF
  23. Contrastive Self-Refinement for Low-Cost Adaptation in Real-World Text-to-SQL

    Yuyang Wu, Xin Shi, Wei Long, Cam-Tu Nguyen, Xiaoliang Wang · PDF
  24. Correct Reasoning Paths Visit Shared Decision Pivots

    Dongkyu Cho, Amy B.Z. Zhang, Bilel Fehri, Sheng Wang, Rumi Chunara, Hengrui Cai, Rui Song · PDF
  25. CoT-Seg: Rethinking Segmentation with Chain-of-Thought Reasoning and Self-Correction

    Huaiqian Liu, Chak Ho Huang, Shiu-hong Kao, Yu-Wing Tai, Chi-Keung Tang · PDF
  26. Cross-Family Speculative Prefill: Training-Free Long-Context Compression with Small Draft Models

    Shubhangi Upasani, Guangtao Wang, Ravi Shanker Raju, Bo Li, Urmish Thakker, Mengmeng Ji, John Long, Chen Wu · PDF
  27. Depth vs Recursion: Outperforming Transformers in Jigsaw Reconstruction

    Artemii Miasoedov, Timofey Brayko, Rustam A. Lukmanov · PDF
  28. Differentiable Evolutionary Reinforcement Learning

    Sitao Cheng, Tianle Li, Xuhan Huang, Xunjian Yin, Difan Zou · PDF
  29. Discover the distinguishing and effective reasoning patterns among LLMs via an LLM

    Yida Chen, Yuning Mao, Xianjun Yang, Suyu Ge, Shengjie Bi, Lijuan Liu, Saghar Hosseini, Liang Tan, Yixin Nie, Shaoliang Nie · PDF
  30. Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis

    Ferdinand Kapl, Emmanouil Angelis, Tobias Höppe, Kaitlin Maile, Johannes von Oswald, Nino Scherrer, Stefan Bauer · PDF
  31. Duel-Evolve: Reward-Free Test-Time Scaling via LLM Self-Preferences

    Sweta Karlekar, Carolina Zheng, Nicolas Beltran-Velez, Magnus Saebo, Shuyang Yu, Michal Kucer, John Bowlan, David Blei · PDF
  32. Dynamic Noise Preference Optimization: Self-Improvement of Large Language Models with Self-Synthetic Data

    Haoyan Yang, Le Huy Khiem, Ting Hua, Shangqian Gao, Binfeng Xu, Zheng Tang, Jie Xu, Nitesh V. Chawla, Hongxia Jin, Vijay Srinivasan · PDF
  33. Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning

    Seijin Kobayashi, Yanick Schimpf, Maximilian Schlegel, Angelika Steger, Maciej Wolczyk, Johannes von Oswald, Nino Scherrer, Kaitlin Maile, Guillaume Lajoie, Blake Aaron Richards, Rif A. Saurous, James Manyika, Blaise Aguera y Arcas, Alexander Meulemans, Joao Sacramento · PDF
  34. Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence

    Bingji Yi, Qiyuan Liu, Yuwei Cheng, Haifeng Xu · PDF
  35. ESDAE: Evaluating Synthetic Data for Agent Evaluation

    Shuaiqi Wang, Aadyaa Maddi, Zinan Lin, Giulia Fanti · PDF
  36. Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?

    Thibaud Gloaguen, Niels Mündler, Mark Niklas Mueller, Veselin Raychev, Martin Vechev · PDF
  37. Federated Agent Reinforcement Learning

    Canyu Chen, Kangyu Zhu, Zhaorun Chen, Zhanhui Zhou, Shizhe Diao, Yiping Lu, Tian Li, Manling Li, Dawn Song · PDF
  38. Federation over Text

    Dixi Yao, Tahseen Rabbani, Tian Li · PDF
  39. Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison

    Yoonho Lee, Joseph Boen, Chelsea Finn · PDF
  40. From Growing to Looping: A Unified View of Iterative Computation in LLMs

    Ferdinand Kapl, Emmanouil Angelis, Kaitlin Maile, Johannes von Oswald, Stefan Bauer · PDF
  41. GASP: Guided Asymmetric Self-Play For Coding LLMs

    Swadesh Jana, Cansu Sancaktar, Tomáš Daniš, Georg Martius, Antonio Orvieto, Pavel Kolev · PDF
  42. Generative Recursive Reasoning Models

    Junyeob Baek, Mingyu Jo, Minsu Kim, Yoshua Bengio, Sungjin Ahn · PDF
  43. Improved Iterative Refinement for Chart-to-Code Generation via Structured Instruction

    Chengzhi Xu, Yuyang Wang, Lai Wei, Weiran Huang, Lichao Sun · PDF
  44. In-Context Adaptation

    Yongqiang Chen, Chenxi Liu, Qingyi Guo, Bo Han, Kun Zhang · PDF
  45. Inference-Time Scaling in Diffusion Models through Iterative Partial Refinement

    Taegu Kang, Jaesik Yoon, Sungjin Ahn · PDF
  46. Intelligent Robot Manipulation Requires Self-Directed Learning

    Li Chen, Chonghao Sima, Kashyap Chitta, Antonio Loquercio, Ping Luo, Yi Ma, Hongyang Li · PDF
  47. Interestingness as an Inductive Heuristic for Future Compression Progress

    Vincent Herrmann, Jürgen Schmidhuber · PDF
  48. Just Enough Learning: GRPO-Guided Controllers for Hyperparameter Sweeps

    Justin H Lee, Henry Ndubuaku · PDF
  49. Knowledge is Not Enough: Injecting RL Skills for Continual Adaptation

    Pingzhi Tang, Yiding Wang, Muhan Zhang · PDF
  50. Lang-PINN: From Language to Physics-Informed Neural Networks via a Multi-Agent Framework

    Xin He, Liangliang You, Hongduan Tian, Bo Han, Ivor Tsang, Yew-Soon Ong · PDF
  51. Language Self-Play For Data-Free Training

    Jakub Grudzien Kuba, Mengting Gu, Qi Ma, Yuandong Tian, Vijai Mohan, Chun-cheng Jason Chen · PDF
  52. Language-Guided Expertise Evolution for Protein Optimization

    Xingyue Liu, Zijie Xing, Runze Wang, Luoming Hu, Yanming Shen · PDF
  53. Learning to Continually Learn via Meta-learning Agentic Memory Designs

    Yiming Xiong, Shengran Hu, Jeff Clune · PDF
  54. Learning to Evolve: Scaling Open-Ended Discovery with Relative-Progress RL

    Xuan Li, Zhanke Zhou, Zongze Li, Jiangchao Yao, Bo Han · PDF
  55. Learning What to Learn: Curriculum Curation for Test-Time Agent Learning

    Qizheng Zhang, Sherry Ruan, Shubhangi Upasani, Fenglu Hong, Changxiu Ji, Changran Hu, Bo Li, Hanchen Li, Kunle Olukotun · PDF
  56. Leveraging Suboptimal and Noisy Trajectories for Goal-Conditional Offline RL

    Ningze Zhong, Yi Wang, Bo Wu · PDF
  57. LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers

    Nikhil Abhyankar, Parshin Shojaee, Chandan K. Reddy · PDF
  58. Log-Augmented Generation: Scaling Test-Time Reasoning with Reusable Computation

    Peter Baile Chen, Yi Zhang, Dan Roth, Samuel Madden, Jacob Andreas, Mike Cafarella · PDF
  59. MAPPA: Scaling Multiagent Systems with Process Rewards

    Ed Li, Junyu Ren, Cat Yan · PDF
  60. MimicAgent: Learning Quadruped Skills via Text-to-Trajectory Generation

    Narayanan Palghat Parameswaran, Lucky Kant Nayak, Neehar Peri, Deva Ramanan · PDF
  61. OMEGA: Optimizing Machine learning by Evaluating Generated Algorithms

    Annika Kaul Singh, Jeremy Nixon · PDF
  62. One-Step Video Depth Estimation via Self-Distillation

    Wenqing Cui, Zhenyu Li, Jian Shi, Shariq Farooq Bhat, Peter Wonka · PDF
  63. Orthogonal Gradient Projection for Continual LLM Unlearning

    Juan Belieni, Ana Carolina Erthal, Eliezer de Souza da Silva, Diego Mesquita · PDF
  64. POLARIS: A Godel Agent Framework for Small Language Models through Experience Abstracted Policy Repair

    Aditya Namdev Kakade, Vivek Srivastava, Shirish Karande · PDF
  65. PostTrainBench: Can LLM Agents Automate LLM Post-Training?

    Ben Rank, Hardik Bhatnagar, Ameya Prabhu, Shira Eisenberg, Karina Nguyen, Matthias Bethge, Maksym Andriushchenko · PDF
  66. Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations

    Chengzhi Liu, Yuzhe Yang, Kaiwen Zhou, Zhen Zhang, Yue Fan, Yanan Xie, Peng Qi, Xin Eric Wang · PDF
  67. Real-Time Procedural Learning From Experience for AI Agents

    Dasheng Bi, Yubin Hu, Mohammed N Nasir · PDF
  68. Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

    Yifei Zhang, Xu Yang, Xiao Yang, Bowen Xian, Qizheng Li, Shikai Fang, Jingyuan Li, Jian Wang, Minrui Xu, Yuge Zhang, Weiqing Liu, Jiang Bian · PDF
  69. Reasoning Cache: Learning to Extrapolate to Long Lengths via Short-Length RL

    Ian Wu, Yuxiao Qu, Amrith Setlur, Aviral Kumar · PDF
  70. Reasoning Within the Mind: Dynamic Multimodal Interleaving in Latent Space

    Chengzhi Liu, Yuzhe Yang, Yue Fan, Qingyue Wei, Sheng Liu, Xin Eric Wang · PDF
  71. Reference-Guided Machine Unlearning

    Jonas Mirlach, Sonia Laguna, Julia E Vogt · PDF
  72. Refining Large Language Models with Self-Generated Data Through Iterative Training

    Xiao Hu, Muxi Diao, Jizhi Zhang, Xin Chen, Xianyuan Zhan · PDF
  73. Residual Off-Policy RL for Finetuning Behavior Cloning Policies

    Lars Ankile, Zhenyu Jiang, Rocky Duan, Guanya Shi, Pieter Abbeel, Anusha Nagabandi · PDF
  74. Rethinking Machine Unlearning: Models Designed to Forget via Key Deletion

    Sonia Laguna, Jorge da Silva Gonçalves, Moritz Vandenhirtz, Alain Ryser, Irene Cannistraci, Julia E Vogt · PDF
  75. Reward Hacking in Self-Improving Code Agents

    Bingchen Zhao, Dhruv Srikanth, Yuxiang Wu, Zhengyao Jiang · PDF
  76. RFTF: Reinforcement Fine-tuning for Vision-language-action Models with Temporal Feedback

    Junyang Shu, Zhiwei Lin, Yongtao Wang · PDF
  77. SAGE: Self-play Adversarial Games Enhance Large Language Model Reasoning Capabilities

    Saraswathy Amjith, Michael X. Wang, Jayson Lynch, Hans Gundlach, Neil Thompson · PDF
  78. SAHOO: Safeguarded Alignment for High-Order Optimization Objectives in Recursive Self-Improvement

    Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary · PDF
  79. Self-Adapting Agents for Automating Research Coding Workflows

    Balaji Dinesh Gangireddi, Aniketh Garikaparthi, Manasi Patwardhan, Arman Cohan · PDF
  80. Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation

    Jinbang Huang, Zhiyuan Li, Yuanzhao Hu, Zhanguang Zhang, Mark Coates, Xingyue Quan, Yingxue Zhang · PDF
  81. Self-EvolveRec: Self-Evolving Recommender Systems with LLM-based Directional Feedback

    Sein Kim, Sangwu Park, HongSeok Kang, Wonjoong Kim, Jimin Seo, Yeonjun In, Kanghoon Yoon, Chanyoung Park · PDF
  82. Self-Evolving Language Models through Co-evolved Discriminative Rubrics

    Shuyue Stella Li, Rui Xin, Yike Wang, Teng Xiao, Rulin Shao, Zoey Hao, Melanie Sclar, Faeze Brahman, Pang Wei Koh, Yulia Tsvetkov · PDF
  83. Self-Improvement via Fast Tree-search

    Xinghong Fu, Aravinth Kulanthaivelu, Yutaro Yamada · PDF
  84. Self-Improving Clinical Reasoning via Textual Gradients

    Sean Wu, Fabien Scalzo, Ira Kurtz · PDF
  85. Self-Improving Vision-Language-Action Models with Data Generation via Residual RL

    Wenli Xiao, Haotian Lin, Andy Peng, Haoru Xue, Tairan He, Zhengyi Luo, Yuqi Xie, Fengyuan Hu, Linxi Fan, Guanya Shi, Yuke Zhu · PDF
  86. Self-Improving VLM Judges Without Human Annotations

    Inna Wanyin Lin, Yushi Hu, Shuyue Stella Li, Scott Geng, Pang Wei Koh, Luke Zettlemoyer, Tim Althoff, Marjan Ghazvininejad · PDF
  87. Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks

    Abhranil Chandra, Ayush Agrawal, Arian Hosseini, Sebastian Fischmeister, Rishabh Agarwal, Navin Goyal, Aaron Courville · PDF
  88. Simple Baselines are Competitive with Code Evolution

    Yonatan Gideoni, Sebastian Risi, Yarin Gal · PDF
  89. SimpleMem: Efficient Lifelong Memory for LLM Agents

    Jiaqi Liu, Yaofeng Su, Peng Xia, Siwei Han, Zeyu Zheng, Cihang Xie, Mingyu Ding, Huaxiu Yao · PDF
  90. SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

    Peng Xia, Jianwen Chen, Hanyang Wang, Jiaqi Liu, Kaide Zeng, Yu Wang, Siwei Han, Yiyang Zhou, Xujiang Zhao, Haifeng Chen, Zeyu Zheng, Cihang Xie, Huaxiu Yao · PDF
  91. Soft Mellowmax Monte Carlo Planning

    Danilo Vucetic, Gauthier Gidel · PDF
  92. Structure Enables Effective Self-Localization of Errors in LLMs

    Ankur Samanta, Akshayaa Magesh, Ayush Jain, Kavosh Asadi, Youliang Yu, Daniel Jiang, Boris Vidolov, Kaveh Hassani, Paul Sajda, Jalaj Bhandari, Yonathan Efroni · PDF
  93. TamperBench: A Systematic Framework to Stress-Test LLM Safety Under Fine-Tuning and Tampering

    Saad Hossain, Tom Tseng, Punya Syon Pandey, Samanvay Vajpayee, Matthew Kowal, Nayeema Nonta, Samuel Simko, Stephen Casper, Zhijing Jin, Kellin Pelrine, Sirisha Rambhatla · PDF
  94. TangramSR: A Benchmark for Recursive Self-Improvement In Continuous Geometric Reasoning

    Yikun Zong, Cheston Tan · PDF
  95. Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

    Shobhita Sundaram, John Quan, Ariel Kwiatkowski, Kartik Ahuja, Yann Ollivier, Julia Kempe · PDF
  96. Test-Time Adaptation via Many-Shot Prompting: Benefits, Limits, and Pitfalls

    Shubhangi Upasani, Chen Wu, Jay Rainton, Changran Hu, Qizheng Zhang, Bo Li, Urmish Thakker · PDF
  97. Test-Time Meta-Adaptation with Self-Synthesis

    Zeyneb N. Kaya, Nick Rui · PDF
  98. Test-Time Self-Distillation

    Jonas Hübotter, Frederike Lübeck, Lejs Deen Behric, Anton Baumann, Marco Bagatella, Daniel Marta, Ido Hakimi, Idan Shenfeld, Thomas Kleine Buening, Carlos Guestrin, Andreas Krause · PDF
  99. TextBO: Bayesian Optimization in Language Space for Eval-Efficient Self-Improving AI

    Enoch H. Kang, Hema Yoganarasimhan · PDF
  100. Theory-Driven Modeling and LLM-Guided Evolution for Power System Scheduling

    Zhangwushuaijun, Yikun Zong, YueYishuang, Kun Yuan · PDF
  101. Tiny Autoregressive Recursive Models

    Paulius Rauba, Claudio Fanconi, Mihaela van der Schaar · PDF
  102. Towards Execution-Grounded Automated AI Research

    Chenglei Si, Zitong Yang, Yejin Choi, Emmanuel Candes, Diyi Yang, Tatsunori Hashimoto · PDF
  103. Unlocking Intrinsic Self-Reflection for LLM Preference Policy Optimization

    Yu Li, Tian Lan, Zhengling Qi · PDF
  104. Unrolled Policy Iteration for Tiny Recursive Models

    Bahram Behzadian, Brett Daley, Gopeshh Subbaraj, Houssam Nassif · PDF
  105. Verifying the Verifiers: Failure Attribution for Agentic Benchmark Diagnostics and Training Data Curation

    Jesse Hu, Pratyush Shukla, Ke Huang · PDF
  106. VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

    Dongfu Jiang, Yi Lu, Zhuofeng Li, Zhiheng Lyu, Ping Nie, Haozhe Wang, Alex Su, Hui Chen, Kai Zou, Chao Du, Tianyu Pang, Wenhu Chen · PDF
  107. Vision-Guided Iterative Refinement for Frontend Code Generation

    Hannah Sansford, Derek H.C. Law, Wei Liu, Abhishek Tripathi, Niresh Agarwal, Gerrit J.J. Van den Burg · PDF
  108. VLAW: Iterative Co-Improvement of Vision-Language-Action Policy and World Model

    Yanjiang Guo, Tony Lee, Lucy Xiaoyang Shi, Jianyu Chen, Percy Liang, Chelsea Finn · PDF
  109. World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry

    Yuejiang Liu, Fan Feng, Lingjing Kong, Weifeng Lu, Jinzhou Tang, XiangCheng Zhang, Kun Zhang, Kevin Murphy, Chelsea Finn, Yilun Du · PDF
  110. Your Self-Play Algorithm is Secretly an Adversarial Imitator: Understanding LLM Self-Play through the Lens of Imitation Learning

    Shangzhe Li, Xuchao Zhang, Chetan Bansal, Weitong Zhang · PDF