ICML 2024 · Past · Safety & alignment · Reinforcement learning · Theory

ICML 2024 Workshop: Aligning Reinforcement Learning Experimentalists and Theorists

ARLET 2024

Submission deadline: Jun 1, 2024, 13:00 UTC
(Imported from OpenReview; check the official website for possible deadline extensions.)
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise.

Accepted papers (76)

Fetched from OpenReview (v2) on 2026-06-10.

A Case for Validation Buffer in Pessimistic Actor-Critic
Michal Nauman, Mateusz Ostaszewski, Marek Cygan · PDF
A Theoretical Framework for Partially-Observed Reward States in RLHF
Chinmaya Kausik, Mirco Mutti, Aldo Pacchiano, Ambuj Tewari · PDF
A Tractable Inference Perspective of Offline RL
Xuejie Liu, Anji Liu, Guy Van den Broeck, Yitao Liang · PDF
A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
Junghyun Lee, Se-Young Yun, Kwang-Sung Jun · PDF
Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions
Aman Mehra, Alexandre Capone, Jeff Schneider · PDF
Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts
Onur Celik, Aleksandar Taranovic, Gerhard Neumann · PDF
Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation
Yingru Li, Jiawei Xu, Zhi-Quan Luo · PDF
Adaptive Two-Level Quasi-Monte Carlo for Soft Actor-Critic
Du Ouyang, Zhenpeng Shi, Aodong Guo, Huaze Tang, Hejin Wang, Chao Wang, Wenbo Ding · PDF
Advantage Alignment Algorithms
Juan Agustin Duque, Milad Aghajohari, Tim Cooijmans, Tianyu Zhang, Aaron Courville · PDF
An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models
Yangchen Pan, Junfeng Wen, Chenjun Xiao, Philip Torr · PDF
Batch Learning via Log-Sum-Exponential Estimator from Logged Bandit Feedback
Armin Behnamnia, Gholamali Aminian, Alireza Aghaei, Chengchun Shi, Vincent Y. F. Tan, Hamid R. Rabiee · PDF
Batched fixed-confidence pure exploration for bandits with switching constraints
Newton Mwai, Milad Malekipirbazari, Fredrik D. Johansson · PDF
BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
Matteo Bettini, Amanda Prorok, Vincent Moens · PDF
Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control
Michal Nauman, Mateusz Ostaszewski, Krzysztof Jankowski, Piotr Miłoś, Marek Cygan · PDF
Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently
Sergio Calo, Anders Jonsson, Gergely Neu, Ludovic Schwartz, Javier Segovia-Aguas · PDF
Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL
Philipp Becker, Sebastian Mossburger, Fabian Otto, Gerhard Neumann · PDF
Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors
Emma Cramer, Bernd Frauenknecht, Ramil Sabirov, Sebastian Trimpe · PDF
Coordination Failure in Cooperative Offline MARL
Callum Rhys Tilbury, Juan Claude Formanek, Louise Beyers, Jonathan Phillip Shock, Arnu Pretorius · PDF
Decoupled Stochastic Gradient Descent for N-Player Games
Ali Zindari, Parham Yazdkhasti, Tatjana Chavdarova, Sebastian U Stich · PDF
Delayed Adversarial Attacks on Stochastic Multi-Armed Bandits
Pierriccardo Olivieri, Matteo Castiglioni, Nicola Gatti · PDF
Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm
Miao Lu, Han Zhong, Tong Zhang, Jose Blanchet · PDF
Dual Approximation Policy Optimization
Zhihan Xiong, Maryam Fazel, Lin Xiao · PDF
Efficient Offline Learning of Ranking Policies via Top-$k$ Policy Decomposition
Ren Kishimoto, Koichi Tanaka, Haruka Kiyohara, Yusuke Narita, Nobuyuki Shimizu, Yasuo Yamamoto, Yuta Saito · PDF
Efficient Offline Reinforcement Learning: The Critic is Critical
Adam Jelley, Trevor McInroe, Sam Devlin, Amos Storkey · PDF
EMPO: A Clustering-Based On-Policy Algorithm for Offline Reinforcement Learning
Jongeui Park, Myungsik Cho, Youngchul Sung · PDF
Enhancing Actor-Critic Decision-Making with Afterstate Models for Continuous Control
Norio Kosaka · PDF
Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning
Batuhan Yardim, Niao He · PDF
Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning
Jia Wan, Sean R. Sinclair, Devavrat Shah, Martin J Wainwright · PDF
Functional Acceleration for Policy Mirror Descent
Veronica Chelu, Doina Precup · PDF
Generalized Linear Bandits with Limited Adaptivity
Ayush Sawarni, Nirjhar Das, Siddharth Barman, Gaurav Sinha · PDF
Handling Delay in Reinforcement Learning Caused by Parallel Computations of Neurons
Ivan Anokhin, Rishav Rishav, Stephen Chung, Irina Rish, Samira Ebrahimi Kahou · PDF
How Does Return Distribution in Distributional Reinforcement Learning Help Optimization?
Ke Sun, Bei Jiang, Linglong Kong · PDF
Improved Algorithms for Adversarial Bandits with Unbounded Losses
Mingyu Chen, Xuezhou Zhang · PDF
In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning
Mikhail Terekhov, Caglar Gulcehre · PDF
Information Theoretic Guarantees For Policy Alignment In Large Language Models
Youssef Mroueh · PDF
Is Value Learning Really the Main Bottleneck in Offline RL?
Seohong Park, Kevin Frans, Sergey Levine, Aviral Kumar · PDF
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
Quentin Gallouédec, Edward Emanuel Beeching, Clément ROMAC, Emmanuel Dellandrea · PDF
KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty
Philipp Becker, Niklas Freymuth, Gerhard Neumann · PDF
Learning to Steer Markovian Agents under Model Uncertainty
Jiawei Huang, Vinzenz Thoma, Zebang Shen, Heinrich H. Nax, Niao He · PDF
Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies
Alex DeWeese, Guannan Qu · PDF
Markov Persuasion Processes: How to Persuade Multiple Agents From Scratch
Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni, Nicola Gatti, Alberto Marchesi · PDF
Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
Ally Yalei Du, Lin Yang, Ruosong Wang · PDF
Multi-Agent Imitation Learning: Value is Easy, Regret is Hard
Jingwu Tang, Gokul Swamy, Fei Fang, Steven Wu · PDF
No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
Skander Moalla, Andrea Miele, Razvan Pascanu, Caglar Gulcehre · PDF
Offline Reinforcement Learning with Pessimistic Value Priors
Filippo Valdettaro, Aldo A. Faisal · PDF
Offline RL via Feature-Occupancy Gradient Ascent
Gergely Neu, Nneka Okolo · PDF
On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics
Michal Nauman, Marek Cygan · PDF
Oracle-Efficient Reinforcement Learning for Max Value Ensembles
Marcel Hussing, Michael Kearns, Aaron Roth, Sikata Bela Sengupta, Jessica Sorrell · PDF
ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
Chen Bo Calvin Zhang, Zhang-Wei Hong, Aldo Pacchiano, Pulkit Agrawal · PDF
Partially Observable Multi-Agent Reinforcement Learning using Mean Field Control
Kai Cui, Sascha H. Hauck, Christian Fabian, Heinz Koeppl · PDF
PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
Utsav Singh, Wesley A. Suttle, Brian M. Sadler, Vinay P. Namboodiri, Amrit Bedi · PDF
Policy Gradient Methods with Adaptive Policy Spaces
Gianmarco Tedeschi, Matteo Papini, Marcello Restelli · PDF
Provable Partially Observable Reinforcement Learning with Privileged Information
Yang Cai, Xiangyu Liu, Argyris Oikonomou, Kaiqing Zhang · PDF
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang · PDF
Quantized Representations Prevent Dimensional Collapse in Self-predictive RL
Aidan Scannell, Kalle Kujanpää, Yi Zhao, Mohammadreza Nakhaeinezhadfard, Arno Solin, Joni Pajarinen · PDF
Realtime Reinforcement Learning: Towards Rapid Asynchronous Deployment of Large Models
Matthew Riemer, Gopeshh Subbaraj, Glen Berseth, Irina Rish · PDF
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao, Jonathan Daniel Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun · PDF
Reinforcement Learning from Bagged Reward
Yuting Tang, Xin-Qiang Cai, Yao-Xiang Ding, Qiyu Wu, Guoqing Liu, Masashi Sugiyama · PDF
Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer
Hannes Eriksson, Tommy Tram, Debabrota Basu, Mina Alibeigi, Christos Dimitrakakis · PDF
Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
Guhao Feng, Han Zhong · PDF
Reward Centering
Abhishek Naik, Yi Wan, Manan Tomar, Richard S. Sutton · PDF
Reweighted Bellman Targets for Continual Reinforcement Learning
Ke Sun, Jun Jin, Xi Chen, Wulong Liu, Linglong Kong · PDF
Risk-Aware Bandits for Best Crop Management
Dorian Baudry, Romain Gautron · PDF
RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman E. Ozdaglar · PDF
Safe exploration in reproducing kernel Hilbert spaces
Abdullah Tokmak, Kiran G. Krishnan, Thomas B. Schön, Dominik Baumann · PDF
Should You Trust DQN?
Aditya Gopalan, Gugan Thoppe · PDF
Survive on Planet Pandora: Robust Cross-Domain RL Under Distinct State-Action Representations
Kuan-Chen Pan, MingHong Chen, Xi Liu, Ping-Chun Hsieh · PDF
The Importance of Online Data: Understanding Preference Fine-Tuning via Coverage
Yuda Song, Gokul Swamy, Aarti Singh, Drew Bagnell, Wen Sun · PDF
Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
Andreas Schlaginhaufen, Maryam Kamgarpour · PDF
Towards Zero-Shot Generalization in Offline Reinforcement Learning
Zhiyong Wang, Chen Yang, John C.S. Lui, Dongruo Zhou · PDF
Transductive Active Learning with Application to Safe Bayesian Optimization
Jonas Hübotter, Bhavya Sukhija, Lenart Treven, Yarden As, Andreas Krause · PDF
Transferable Reinforcement Learning via Generalized Occupancy Models
Chuning Zhu, Xinqi Wang, Tyler Han, Simon Shaolei Du, Abhishek Gupta · PDF
VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh, Han-Yuan Hsu, Yi-Ting Chen, Winston H. Hsu · PDF
vMF-exp: von Mises-Fisher Exploration of Large Action Sets with Hyperspherical Embeddings
Walid Bendada, Guillaume Salha-Galvan, Romain Hennequin, Théo Bontempelli, Thomas Bouabça, Tristan Cazenave · PDF
When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL
Lenart Treven, Bhavya Sukhija, Yarden As, Florian Dorfler, Andreas Krause · PDF
Wind farm control with cooperative multi-agent reinforcement learning
Claire Bizon Monroc, Ana Busic, Jiamin Zhu, Donatien Dubuc · PDF
