ICML 2024 · Past · Safety & alignment · Reinforcement learning · Theory

ICML 2024 Workshop: Aligning Reinforcement Learning Experimentalists and Theorists

ARLET 2024

Submission deadline: Jun 1, 2024, 13:00 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (76)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Case for Validation Buffer in Pessimistic Actor-Critic

    Michal Nauman, Mateusz Ostaszewski, Marek Cygan · PDF
  2. A Theoretical Framework for Partially-Observed Reward States in RLHF

    Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari · PDF
  3. A Tractable Inference Perspective of Offline RL

    Xuejie Liu, Anji Liu, Guy Van den Broeck, Yitao Liang · PDF
  4. A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits

    Junghyun Lee, Se-Young Yun, Kwang-Sung Jun · PDF
  5. Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions

    Aman Mehra, Alexandre Capone, Jeff Schneider · PDF
  6. Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

    Onur Celik, Aleksandar Taranovic, Gerhard Neumann · PDF
  7. Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation

    Yingru Li, Jiawei Xu, Zhi-Quan Luo · PDF
  8. Adaptive Two-Level Quasi-Monte Carlo for Soft Actor-Critic

    Du Ouyang, Zhenpeng Shi, Aodong Guo, Huaze Tang, Hejin Wang, Chao Wang, Wenbo Ding · PDF
  9. Advantage Alignment Algorithms

    Juan Agustin Duque, Milad Aghajohari, Tim Cooijmans, Tianyu Zhang, Aaron Courville · PDF
  10. An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models

    Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr · PDF
  11. Batch Learning via Log-Sum-Exponential Estimator from Logged Bandit Feedback

    Armin Behnamnia, Gholamali Aminian, Alireza Aghaei, Chengchun Shi, Vincent Y. F. Tan, Hamid R. Rabiee · PDF
  12. Batched fixed-confidence pure exploration for bandits with switching constraints

    Newton Mwai, Milad Malekipirbazari, Fredrik D. Johansson · PDF
  13. BenchMARL: Benchmarking Multi-Agent Reinforcement Learning

    Matteo Bettini, Amanda Prorok, Vincent Moens · PDF
  14. Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control

    Michal Nauman, Mateusz Ostaszewski, Krzysztof Jankowski, Piotr Miłoś, Marek Cygan · PDF
  15. Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently

    Sergio Calo, Anders Jonsson, Gergely Neu, Ludovic Schwartz, Javier Segovia-Aguas · PDF
  16. Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL

    Philipp Becker, Sebastian Mossburger, Fabian Otto, Gerhard Neumann · PDF
  17. Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors

    Emma Cramer, Bernd Frauenknecht, Ramil Sabirov, Sebastian Trimpe · PDF
  18. Coordination Failure in Cooperative Offline MARL

    Callum Rhys Tilbury, Juan Claude Formanek, Louise Beyers, Jonathan Phillip Shock, Arnu Pretorius · PDF
  19. Decoupled Stochastic Gradient Descent for N-Player Games

    Ali Zindari, Parham Yazdkhasti, Tatjana Chavdarova, Sebastian U Stich · PDF
  20. Delayed Adversarial Attacks on Stochastic Multi-Armed Bandits

    Pierriccardo Olivieri, Matteo Castiglioni, Nicola Gatti · PDF
  21. Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm

    Miao Lu, Han Zhong, Tong Zhang, Jose Blanchet · PDF
  22. Dual Approximation Policy Optimization

    Zhihan Xiong, Maryam Fazel, Lin Xiao · PDF
  23. Efficient Offline Learning of Ranking Policies via Top-$k$ Policy Decomposition

    Ren Kishimoto, Koichi Tanaka, Haruka Kiyohara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito · PDF
  24. Efficient Offline Reinforcement Learning: The Critic is Critical

    Adam Jelley, Trevor McInroe, Sam Devlin, Amos Storkey · PDF
  25. EMPO: A Clustering-Based On-Policy Algorithm for Offline Reinforcement Learning

    Jongeui Park, Myungsik Cho, Youngchul Sung · PDF
  26. Enhancing Actor-Critic Decision-Making with Afterstate Models for Continuous Control

    Norio Kosaka · PDF
  27. Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning

    Batuhan Yardim, Niao He · PDF
  28. Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning

    Jia Wan, Sean R. Sinclair, Devavrat Shah, Martin J Wainwright · PDF
  29. Functional Acceleration for Policy Mirror Descent

    Veronica Chelu, Doina Precup · PDF
  30. Generalized Linear Bandits with Limited Adaptivity

    Ayush Sawarni, Nirjhar Das, Siddharth Barman, Gaurav Sinha · PDF
  31. Handling Delay in Reinforcement Learning Caused by Parallel Computations of Neurons

    Ivan Anokhin, Rishav Rishav, Stephen Chung, Irina Rish, Samira Ebrahimi Kahou · PDF
  32. How Does Return Distribution in Distributional Reinforcement Learning Help Optimization?

    Ke Sun, Bei Jiang, Linglong Kong · PDF
  33. Improved Algorithms for Adversarial Bandits with Unbounded Losses

    Mingyu Chen, Xuezhou Zhang · PDF
  34. In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning

    Mikhail Terekhov, Caglar Gulcehre · PDF
  35. Information Theoretic Guarantees For Policy Alignment In Large Language Models

    Youssef Mroueh · PDF
  36. Is Value Learning Really the Main Bottleneck in Offline RL?

    Seohong Park, Kevin Frans, Sergey Levine, Aviral Kumar · PDF
  37. Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

    Quentin Gallouédec, Edward Emanuel Beeching, Clément Romac, Emmanuel Dellandrea · PDF
  38. KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty

    Philipp Becker, Niklas Freymuth, Gerhard Neumann · PDF
  39. Learning to Steer Markovian Agents under Model Uncertainty

    Jiawei Huang, Vinzenz Thoma, Zebang Shen, Heinrich H. Nax, Niao He · PDF
  40. Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies

    Alex DeWeese, Guannan Qu · PDF
  41. Markov Persuasion Processes: How to Persuade Multiple Agents From Scratch

    Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Nicola Gatti, Alberto Marchesi · PDF
  42. Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error

    Ally Yalei Du, Lin Yang, Ruosong Wang · PDF
  43. Multi-Agent Imitation Learning: Value is Easy, Regret is Hard

    Jingwu Tang, Gokul Swamy, Fei Fang, Steven Wu · PDF
  44. No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO

    Skander Moalla, Andrea Miele, Razvan Pascanu, Caglar Gulcehre · PDF
  45. Offline Reinforcement Learning with Pessimistic Value Priors

    Filippo Valdettaro, Aldo A. Faisal · PDF
  46. Offline RL via Feature-Occupancy Gradient Ascent

    Gergely Neu, Nneka Okolo · PDF
  47. On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics

    Michal Nauman, Marek Cygan · PDF
  48. Oracle-Efficient Reinforcement Learning for Max Value Ensembles

    Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell · PDF
  49. ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization

    Chen Bo Calvin Zhang, Zhang-Wei Hong, Aldo Pacchiano, Pulkit Agrawal · PDF
  50. Partially Observable Multi-Agent Reinforcement Learning using Mean Field Control

    Kai Cui, Sascha H. Hauck, Christian Fabian, Heinz Koeppl · PDF
  51. PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling

    Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Bedi · PDF
  52. Policy Gradient Methods with Adaptive Policy Spaces

    Gianmarco Tedeschi, Matteo Papini, Marcello Restelli · PDF
  53. Provable Partially Observable Reinforcement Learning with Privileged Information

    Yang Cai, Xiangyu Liu, Argyris Oikonomou, Kaiqing Zhang · PDF
  54. Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer

    Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang · PDF
  55. Quantized Representations Prevent Dimensional Collapse in Self-predictive RL

    Aidan Scannell, Kalle Kujanpää, Yi Zhao, Mohammadreza Nakhaeinezhadfard, Arno Solin, Joni Pajarinen · PDF
  56. Realtime Reinforcement Learning: Towards Rapid Asynchronous Deployment of Large Models

    Matthew Riemer, Gopeshh Subbaraj, Glen Berseth, Irina Rish · PDF
  57. REBEL: Reinforcement Learning via Regressing Relative Rewards

    Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun · PDF
  58. Reinforcement Learning from Bagged Reward

    Yuting Tang, Xin-Qiang Cai, Yao-Xiang Ding, Qiyu Wu, Guoqing Liu, Masashi Sugiyama · PDF
  59. Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer

    Hannes Eriksson, Tommy Tram, Debabrota Basu, Mina Alibeigi, Christos Dimitrakakis · PDF
  60. Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity

    Guhao Feng, Han Zhong · PDF
  61. Reward Centering

    Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton · PDF
  62. Reweighted Bellman Targets for Continual Reinforcement Learning

    Ke Sun, Jun Jin, Xi Chen, Wulong Liu, Linglong Kong · PDF
  63. Risk-Aware Bandits for Best Crop Management

    Dorian Baudry, Romain Gautron · PDF
  64. RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation

    Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman E. Ozdaglar · PDF
  65. Safe exploration in reproducing kernel Hilbert spaces

    Abdullah Tokmak, Kiran G. Krishnan, Thomas B. Schön, Dominik Baumann · PDF
  66. Should You Trust DQN?

    Aditya Gopalan, Gugan Thoppe · PDF
  67. Survive on Planet Pandora: Robust Cross-Domain RL Under Distinct State-Action Representations

    Kuan-Chen Pan, MingHong Chen, Xi Liu, Ping-Chun Hsieh · PDF
  68. The Importance of Online Data: Understanding Preference Fine-Tuning via Coverage

    Yuda Song, Gokul Swamy, Aarti Singh, Drew Bagnell, Wen Sun · PDF
  69. Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning

    Andreas Schlaginhaufen, Maryam Kamgarpour · PDF
  70. Towards Zero-Shot Generalization in Offline Reinforcement Learning

    Zhiyong Wang, Chen Yang, John C.S. Lui, Dongruo Zhou · PDF
  71. Transductive Active Learning with Application to Safe Bayesian Optimization

    Jonas Hübotter, Bhavya Sukhija, Lenart Treven, Yarden As, Andreas Krause · PDF
  72. Transferable Reinforcement Learning via Generalized Occupancy Models

    Chuning Zhu, Xinqi Wang, Tyler Han, Simon Shaolei Du, Abhishek Gupta · PDF
  73. VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

    Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh, Han-Yuan Hsu, Yi-Ting Chen, Winston H. Hsu · PDF
  74. vMF-exp: von Mises-Fisher Exploration of Large Action Sets with Hyperspherical Embeddings

    Walid Bendada, Guillaume Salha-Galvan, Romain Hennequin, Théo Bontempelli, Thomas Bouabça, Tristan Cazenave · PDF
  75. When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL

    Lenart Treven, Bhavya Sukhija, Yarden As, Florian Dorfler, Andreas Krause · PDF
  76. Wind farm control with cooperative multi-agent reinforcement learning

    Claire Bizon Monroc, Ana Busic, Jiamin Zhu, Donatien Dubuc · PDF