NeurIPS 2025 · Past · Safety & alignment · Reinforcement learning · Theory

NeurIPS 2025 Workshop: Second Workshop on Aligning Reinforcement Learning Experimentalists and Theorists

ARLET

Submission deadline
Sep 3, 2025, 13:00 UTC
imported from OpenReview — check the website for extensions
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).

Accepted papers (101)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Regularized Actor-Critic Algorithm for Bi-Level Reinforcement Learning

    Sihan Zeng, Sujay Bhatt, Sumitra Ganesh, Alec Koppel · PDF
  2. A Reinforcement Learning Approach for Health-Behavioural Recommendations to Reduce Cancer Risk

    Gloria Desideri, Andrés Pasinetti · PDF
  3. A Theoretical Analysis of Information Bottlenecks for Zero-Shot Transfer in Reinforcement Learning

    Kenzo Clauw, Daniel Polani, Nicola Catenacci Volpi · PDF
  4. Accelerating Diffusion Planners in Offline RL via Reward-Aware Consistency Trajectory Distillation

    Xintong Duan, Yutong He, Fahim Tajwar, Ruslan Salakhutdinov, J Zico Kolter, Jeff Schneider · PDF
  5. Active Learning for Stochastic Contextual Linear Bandits

    Emma Brunskill, Ishani Karmarkar, Zhaoqi Li · PDF
  6. Ada-Diffuser: Latent-Aware Adaptive Diffusion for Decision-Making

    Fan Feng, Selena Ge, Minghao Fu, Zijian Li, Yujia Zheng, Zeyu Tang, Yingyao Hu, Biwei Huang, Kun Zhang · PDF
  7. All Roads Lead to Likelihood: The Value of RL in Fine-Tuning

    Gokul Swamy, Sanjiban Choudhury, Wen Sun, Steven Wu, Drew Bagnell · PDF
  8. Automatic Reward Shaping from Multi-Objective Human Heuristics

    Yuqing Xie, Jiayu Chen, Chao Yu, Yu Wang · PDF
  9. Bandit and Delayed Feedback in Online Structured Prediction

    Yuki Shibukawa, Taira Tsuchiya, Shinsaku Sakaue, Kenji Yamanishi · PDF
  10. Bandit Learning on Dynamic Graphs

    Amit Kiran Rege, Sourav Chakraborty, Lijun Chen, Claire Monteleoni · PDF
  11. Behavior-Aware Off-Policy Selection in High-Stake Human-Centric Environments

    Ge Gao, Aishwarya Mandyam, Joy He-Yueya, Min Chi, Emma Brunskill · PDF
  12. Beyond Marginals: Capturing Correlated Returns through Joint Distributional Reinforcement Learning

    Ege Can Kaya, Mahsa Ghasemi, Abolfazl Hashemi · PDF
  13. Beyond RLHF and NLHF: Population-Proportional Alignment under an Axiomatic Framework

    Kihyun Kim, Jiawei Zhang, Asuman E. Ozdaglar, Pablo A. Parrilo · PDF
  14. Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

    Michal Nauman, Marek Cygan, Carmelo Sferrazza, Aviral Kumar, Pieter Abbeel · PDF
  15. Bootstrap Ensemble Uncertainty for State-Adaptive Regularization in Offline Reinforcement Learning

    Rishav Rishav, Vincent Michalski, Samira Ebrahimi Kahou · PDF
  16. Compute-Optimal Scaling for Value-Based Deep RL

    Preston Fu, Oleh Rybkin, Zhiyuan Zhou, Michal Nauman, Pieter Abbeel, Sergey Levine, Aviral Kumar · PDF
  17. Constrained Linear Thompson Sampling

    Aditya Gangrade, Venkatesh Saligrama · PDF
  18. Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning

    Uri Sherman, Tomer Koren, Yishay Mansour · PDF
  19. Data-Dependent Regret Bounds for MABs with Constraints

    Gianmarco Genalti, Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti · PDF
  20. Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL

    Mahsa Bastankhah, Grace Liu, Dilip Arumugam, Thomas L. Griffiths, Benjamin Eysenbach · PDF
  21. DHP: Discrete Hierarchical Planning for HRL Agents

    Shashank Sharma, Janina Anna Hoffmann, Vinay P. Namboodiri · PDF
  22. Efficient Adversarial Attacks on High-dimensional Offline Bandits

    Seyed Mohammad Hadi Hosseini, Amir Najafi, Mahdieh Soleymani Baghshah · PDF
  23. Efficient Restarts in Non-Stationary Model-Free Reinforcement Learning

    Hiroshi Nonaka, Simon Ambrozak, Sofia R. Miskala-Dinc, Amedeo Ercole, Aviva Prins · PDF
  24. Efficiently Robust In-Context Reinforcement Learning with Adversarial Generalization and Adaptation

    Juncheng Dong, Hao-Lun Hsu, Miroslav Pajic, Vahid Tarokh · PDF
  25. Enhancing Diversity in Large Language Models via Determinantal Point Processes

    Yilei Chen, Lorenz Wolf, Souradip Chakraborty, Aldo Pacchiano, Ioannis Paschalidis · PDF
  26. Exploration Implies Data Augmentation: Reachability and Generalisation in Contextual MDPs

    Max Weltevrede, Caroline Horsch, Matthijs T. J. Spaan, Wendelin Boehmer · PDF
  27. Exploring Time-Step Size in Reinforcement Learning for Sepsis Treatment

    Yingchuan Sun, Shengpu Tang · PDF
  28. Fictive Learning Augments Model-Based Reinforcement Learning in the Two-Step Task

    Jianning Chen, Masakazu Taira, Kenji Doya · PDF
  29. floq: Training Critics via Flow-Matching for Scaling Compute in Value-Based RL

    Bhavya Kumar Agrawalla, Michal Nauman, Khush Agrawal, Aviral Kumar · PDF
  30. From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards

    Liad Erez, Tomer Koren · PDF
  31. Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update

    Yu-Jie Zhang, Sheng-An Xu, Peng Zhao, Masashi Sugiyama · PDF
  32. Generating Auxiliary Tasks with Reinforcement Learning

    Judah Goldfeder, Matthew So, Hod Lipson · PDF
  33. Horizon Reduction Makes RL Scalable

    Seohong Park, Kevin Frans, Deepinder Mann, Benjamin Eysenbach, Aviral Kumar, Sergey Levine · PDF
  34. How to Provably Improve Return Conditioned Supervised Learning?

    Zhishuai Liu, Yu Yang, Ruhan Wang, Pan Xu, Dongruo Zhou · PDF
  35. Human-Inspired Multi-Level Reinforcement Learning

    Mingkang Wu, Devin White, Vernon Lawhern, Nicholas R Waytowich, Yongcan Cao · PDF
  36. Hybrid Training for Enhanced Multi-task Generalization in Multi-agent Reinforcement Learning

    Mingliang Zhang, Sichang Su, Chengyang He, Guillaume Adrien Sartoretti · PDF
  37. Idea: Bridging Theoretical Fairness Definitions with Multi-Agent Coordination in the Real World

    Promise Osaine Ekpo, Brian La, Thomas Wiener, Saesha Agarwal, Arshia Agrawal, Gonzalo Gonzalez-Pumariega, Lekan P Molu, Angelique Taylor · PDF
  38. Idea: Fairness Constraints as Reliability Guarantees for RLHF Reward Models

    Advay Samnerkar, Doelle Bhattacharya, Kailash Ranganathan, Ashwinee Panda, Kevin Zhu · PDF
  39. Idea: Sharpe Ratio-Optimized Thompson Sampling for Risk-Aware Online Learning

    Sabrina Khurshid, Mohammad Taha Shah, Gourab Ghatak · PDF
  40. Improved Regret Bounds for Linear Bandits with Heavy-Tailed Rewards

    Artin Tajdini, Jonathan Scarlett, Kevin Jamieson · PDF
  41. Improved Training Mechanisms for Reinforcement Learning via Online Model Selection

    Aida Afshar, Aldo Pacchiano · PDF
  42. Improving Value Estimation Critically Enhances Vanilla Policy Gradient

    Tao Wang, Sicun Gao · PDF
  43. Intent-Based Reward Inference for Value-Aligned Reinforcement Learning

    Md Masudur Rahman, Juan Wachs · PDF
  44. Large Language Model-Enhanced RL for Diverse and Novel Recommendations

    Jiin Woo, Alireza Bagheri Garakani, Tianchen Zhou, Zhishen Huang, Yan Gao · PDF
  45. Learning a Pessimistic Reward in RLHF: KL Regularization is Not Necessary

    Yinglun Xu, Hangoo Kang, Tarun Suresh, Yuxuan Wan, Gagandeep Singh · PDF
  46. Linear Dynamics meets Linear MDPs: Closed-Form Optimal Policies via Reinforcement Learning

    Abed AlRahman Al Makdah, Oliver Kosut, Lalitha Sankar, Shaofeng Zou · PDF
  47. LLM-Driven Policy Diffusion: Enhancing Generalization in Offline Reinforcement Learning

    Hanping Zhang, Yuhong Guo · PDF
  48. Long-Horizon Model-Based Offline Reinforcement Learning Without Conservatism

    Tianwei Ni, Esther Derman, Vineet Jain, Vincent Taboga, Siamak Ravanbakhsh, Pierre-Luc Bacon · PDF
  49. MOBODY: Model-Based Off-Dynamics Offline Reinforcement Learning

    Yihong Guo, Yu Yang, Pan Xu, Anqi Liu · PDF
  50. On the relation of bisimulation, model irrelevance, and corresponding regret bounds

    Alperen Tercan, Necmiye Ozay · PDF
  51. Open Problem: Order Optimal Regret Bounds for Non-Markovian Rewards

    Aya Shabbar · PDF
  52. Optimal Regret Bounds for Policy Optimization in Contextual Bandits

    Orin Levy, Yishay Mansour · PDF
  53. Optimistic Actor-Critic with Parametric Policies: Unifying Sample Efficiency and Practicality

    Max Qiushi Lin, Reza Asad, Kevin Tan, Haque Ishfaq, Csaba Szepesvari, Sharan Vaswani · PDF
  54. Oracle-RLAIF: An Improved Fine-Tuning Framework for Multi-modal Video Models through Reinforcement Learning from Ranking Feedback

    Derek Shi, Ruben Glatt, Christine Klymko, Shubham Mohole, Hongjun Choi, Shashank Kushwaha, Wesam A. Sakla, Felipe Leno da Silva · PDF
  55. Outcome-based Exploration for LLM Reasoning

    Yuda Song, Julia Kempe, Rémi Munos · PDF
  56. Policy Compatible Skill Incremental Learning via Lazy Learning Interface

    Daehee Lee, TaeYoon Kwack, Dongsu Lee, Wonje Choi, Honguk Woo · PDF
  57. Policy Gradient Guidance Enables Test Time Control

    Jianing Qi, Hao Tang, Zhigang Zhu · PDF
  58. Policy Optimization in CMDPs with Bandit Feedback: Learning with Stochastic and Adversarial Constraints

    Francesco Emanuele Stradi, Anna Lunghi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti · PDF
  59. Policy Search via Bayesian Optimization with Temporal Difference Gaussian Processes

    Armin Lederer, Anuj Srivastava, Andreas Krause · PDF
  60. Policy Testing in Markov Decision Processes

    Kaito Ariu, Po-An Wang, Alexandre Proutiere, Kenshi Abe · PDF
  61. Principled Learning-to-Communicate in Cooperative MARL: An Information-Structure Perspective

    Xiangyu Liu, Haoyi You, Kaiqing Zhang · PDF
  62. Provably Efficient and Agile Randomized Q-Learning

    He Wang, Xingyu Xu, Yuejie Chi · PDF
  63. Quantile Reward Policy Optimization: Alignment with Pointwise Regression and Exact Partition Functions

    Simon Matrenok, Skander Moalla, Caglar Gulcehre · PDF
  64. Real-World Reinforcement Learning of Active Perception Behaviors

    Edward S. Hu, Jie Wang, Xingfang Yuan, Fiona Luo, Muyao Li, Gaspard Lambrechts, Oleh Rybkin, Dinesh Jayaraman · PDF
  65. Regret Bounds for Adversarial Contextual Bandits with General Function Approximation and Delayed Feedback

    Orin Levy, Liad Erez, Alon Cohen, Yishay Mansour · PDF
  66. Replicable Reinforcement Learning with Linear Function Approximation

    Eric Eaton, Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell · PDF
  67. Revisiting Actor-Critic Methods in Discrete Action Off-Policy Reinforcement Learning

    Reza Asad, Reza Babanezhad Harikandeh, Sharan Vaswani · PDF
  68. Revisiting Mixture Policies in Entropy-Regularized Actor-Critic

    Jiamin He, Samuel Neumann, Jincheng Mei, Adam White, Martha White · PDF
  69. Reward Model Overoptimisation in Iterated RLHF

    Lorenz Wolf, Robert Kirk, Mirco Musolesi · PDF
  70. RL's Razor: Why On-Policy Reinforcement Learning Forgets Less

    Idan Shenfeld, Jyothish Pari, Pulkit Agrawal · PDF
  71. Robust Constrained Offline Reinforcement Learning with Linear Function Approximation

    Wenbin Wang, He Wang · PDF
  72. Robust Policy Gradient Optimization through Parameter Perturbation in Reinforcement Learning

    Md Masudur Rahman, Juan Wachs, Yexiang Xue · PDF
  73. Safe Exploration via Policy Priors

    Manuel Wendl, Yarden As, Manish Prajapat, Anton Pollak, Stelian Coros, Andreas Krause · PDF
  74. Safe Guaranteed Dynamics Exploration with Probabilistic Models

    Manish Prajapat, Johannes Köhler, Melanie Zeilinger, Andreas Krause · PDF
  75. Safe, Trust Region Policy Optimization for Constrained Reinforcement Learning

    Md Asifur Rahman, Risal Shahriar Shefin, Debashis Gupta, Sarra Alqahtani · PDF
  76. Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking

    Paria Rashidinejad, Yuandong Tian · PDF
  77. Scaling Offline RL via Efficient and Expressive Shortcut Models

    Nicolas Espinosa-Dice, Yiyi Zhang, Yiding Chen, Bradley Guo, Owen Oertell, Gokul Swamy, Kianté Brantley, Wen Sun · PDF
  78. Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs

    Shulun Chen, Runlong Zhou, Zihan Zhang, Maryam Fazel, Simon Shaolei Du · PDF
  79. Shift Before You Learn: Enabling Low-Rank Representations in Reinforcement Learning

    Bastien Dubail, Stefan Stojanovic, Alexandre Proutiere · PDF
  80. SpatialThinker: Reinforcing 3D Reasoning in Multimodal LLMs via Spatial Rewards

    Hunar Batra, Haoqin Tu, Hardy Chen, Yuanze Lin, Cihang Xie, Ronald Clark · PDF
  81. Speaking the Language of Teamwork: LLM-Guided Credit Assignment in Multi-Agent Reinforcement Learning

    Muhan Lin, Shuyang Shi, Yue Guo, Vaishnav Tadiparthi, Behdad Chalaki, Ehsan Moradi Pari, Simon Stepputtis, Woojun Kim, Joseph Campbell, Katia P. Sycara · PDF
  82. Spectral Collapse Drives Loss of Plasticity in Deep Continual Learning

    Naicheng He, Kaicheng Guo, Arjun Prakash, Saket Tiwari, Tyrone Serapio, Ruo Yu Tao, Amy Greenwald, George Konidaris · PDF
  83. Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game

    Barna Pásztor, Thomas Kleine Buening, Andreas Krause · PDF
  84. State Entropy Regularization for Robust Reinforcement Learning

    Uri Koren, Yonatan Ashlag, Mirco Mutti, Esther Derman, Pierre-Luc Bacon, Shie Mannor · PDF
  85. Steering Diffusion Policies with Value-Guided Denoising

    Hanming Ye · PDF
  86. Structure Matters: Dynamic Policy Gradient

    Sara Klein, Xiangyuan Zhang, Tamer Basar, Simon Weissmann, Leif Döring · PDF
  87. SUSD: Structured Unsupervised Skill Discovery through State Factorization

    Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah · PDF
  88. TARC: Time-Adaptive Robotic Control

    Arnav Sukhija, Lenart Treven, Jin Cheng, Florian Dorfler, Stelian Coros, Andreas Krause · PDF
  89. Test Time Risk Adaption with Mixture of Agents

    Mohamad Chehade, Amrit Singh Bedi, Souradip Chakraborty, Amy Zhang, Hao Zhu · PDF
  90. The Good, The Bad, and The Hybrid: A Reward Structure Showdown in Reasoning Models Training

    Subramanyam Sahoo · PDF
  91. The Minimax Complexity of Preference-Based Decision Making in Multi-Objective Reinforcement Learning

    Kalyan Cherukuri, Aarav Lala · PDF
  92. The Role of Preference Data and Unembeddings in the Convergence Rate of DPO

    Gayathri Chandran, Sai Soumya Nalli, Sruthi Gorantla, Amit Deshpande, Anand Louis · PDF
  93. Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

    Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar · PDF
  94. Towards Parameter-Free Temporal Difference Learning

    Yunxiang Li, Mark Schmidt, Reza Babanezhad Harikandeh, Sharan Vaswani · PDF
  95. Towards shutdownable agents via stochastic choice

    Elliott Thornley, Alexander Roman, Christos Ziakas, Louis Thomson, Leyton Ho · PDF
  96. Uncertainty-Aware Policy-Preserving Abstractions with Abstention for One-Shot Decisions

    Sandy Tanwisuth, Daniel K Leja · PDF
  97. Unifying Agent Interaction and World Information for Multi-agent Coordination

    Dongsu Lee, Daehee Lee, Yaru Niu, Honguk Woo, Amy Zhang, Ding Zhao · PDF
  98. Unsupervised Contrastive Goal Reaching

    Ahmed Turkman, Raj Ghugare, Benjamin Eysenbach · PDF
  99. What Makes a Reward Model a Good Teacher? An Optimization Perspective

    Noam Razin, Zixuan Wang, Hubert Strauss, Stanley Wei, Jason D. Lee, Sanjeev Arora · PDF
  100. When Data is the Algorithm: A Systematic Study and Curation of Preference Optimization Datasets

    Aladin Djuhera, Farhan Ahmed, Swanand Ravindra Kadhe, Syed Zawad, Heiko Ludwig, Holger Boche · PDF
  101. When Maximum Entropy Misleads Policy Optimization

    Ruipeng Zhang, Ya-Chien Chang, Sicun Gao · PDF