ICML 2024 | Past | Safety & alignment | Reinforcement learning | Theory
ICML 2024 Workshop: Aligning Reinforcement Learning Experimentalists and Theorists
ARLET 2024
- Submission deadline: Jun 1, 2024, 13:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (76)
Fetched from OpenReview (v2) on 2026-06-10.
- A Case for Validation Buffer in Pessimistic Actor-Critic
- A Theoretical Framework for Partially-Observed Reward States in RLHF
- A Tractable Inference Perspective of Offline RL
- A Unified Confidence Sequence for Generalized Linear Models, with Applications to Bandits
- Accelerated Online Reinforcement Learning using Auxiliary Start State Distributions
- Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts
- Adaptive Foundation Models for Online Decisions: HyperAgent with Fast Incremental Uncertainty Estimation
- Adaptive Two-Level Quasi-Monte Carlo for Soft Actor-Critic
- Advantage Alignment Algorithms
- An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models
- Batch Learning via Log-Sum-Exponential Estimator from Logged Bandit Feedback
- Batched fixed-confidence pure exploration for bandits with switching constraints
- BenchMARL: Benchmarking Multi-Agent Reinforcement Learning
- Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control
- Bisimulation Metrics are Optimal Transport Distances, and Can be Computed Efficiently
- Combining Reconstruction and Contrastive Methods for Multimodal Representations in RL
- Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors
- Coordination Failure in Cooperative Offline MARL
- Decoupled Stochastic Gradient Descent for N-Player Games
- Delayed Adversarial Attacks on Stochastic Multi-Armed Bandits
- Distributionally Robust Reinforcement Learning with Interactive Data Collection: Fundamental Hardness and Near-Optimal Algorithm
- Dual Approximation Policy Optimization
- Efficient Offline Learning of Ranking Policies via Top-$k$ Policy Decomposition
- Efficient Offline Reinforcement Learning: The Critic is Critical
- EMPO: A Clustering-Based On-Policy Algorithm for Offline Reinforcement Learning
- Enhancing Actor-Critic Decision-Making with Afterstate Models for Continuous Control
- Exploiting Approximate Symmetry for Efficient Multi-Agent Reinforcement Learning
- Exploiting Exogenous Structure for Sample-Efficient Reinforcement Learning
- Functional Acceleration for Policy Mirror Descent
- Generalized Linear Bandits with Limited Adaptivity
- Handling Delay in Reinforcement Learning Caused by Parallel Computations of Neurons
- How Does Return Distribution in Distributional Reinforcement Learning Help Optimization?
- Improved Algorithms for Adversarial Bandits with Unbounded Losses
- In Search for Architectures and Loss Functions in Multi-Objective Reinforcement Learning
- Information Theoretic Guarantees For Policy Alignment In Large Language Models
- Is Value Learning Really the Main Bottleneck in Offline RL?
- Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent
- KalMamba: Towards Efficient Probabilistic State Space Models for RL under Uncertainty
- Learning to Steer Markovian Agents under Model Uncertainty
- Locally Interdependent Multi-Agent MDP: Theoretical Framework for Decentralized Agents with Dynamic Dependencies
- Markov Persuasion Processes: How to Persuade Multiple Agents From Scratch
- Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard
- No Representation, No Trust: Connecting Representation, Collapse, and Trust Issues in PPO
- Offline Reinforcement Learning with Pessimistic Value Priors
- Offline RL via Feature-Occupancy Gradient Ascent
- On the Theory of Risk-Aware Agents: Bridging Actor-Critic and Economics
- Oracle-Efficient Reinforcement Learning for Max Value Ensembles
- ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
- Partially Observable Multi-Agent Reinforcement Learning using Mean Field Control
- PIPER: Primitive-Informed Preference-based Hierarchical Reinforcement Learning via Hindsight Relabeling
- Policy Gradient Methods with Adaptive Policy Spaces
- Provable Partially Observable Reinforcement Learning with Privileged Information
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
- Quantized Representations Prevent Dimensional Collapse in Self-predictive RL
- Realtime Reinforcement Learning: Towards Rapid Asynchronous Deployment of Large Models
- REBEL: Reinforcement Learning via Regressing Relative Rewards
- Reinforcement Learning from Bagged Reward
- Reinforcement Learning in the Wild with Maximum Likelihood-based Model Transfer
- Rethinking Model-based, Policy-based, and Value-based Reinforcement Learning via the Lens of Representation Complexity
- Reward Centering
- Reweighted Bellman Targets for Continual Reinforcement Learning
- Risk-Aware Bandits for Best Crop Management
- RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
- Safe exploration in reproducing kernel Hilbert spaces
- Should You Trust DQN?
- Survive on Planet Pandora: Robust Cross-Domain RL Under Distinct State-Action Representations
- The Importance of Online Data: Understanding Preference Fine-Tuning via Coverage
- Towards the Transferability of Rewards Recovered via Regularized Inverse Reinforcement Learning
- Towards Zero-Shot Generalization in Offline Reinforcement Learning
- Transductive Active Learning with Application to Safe Bayesian Optimization
- Transferable Reinforcement Learning via Generalized Occupancy Models
- VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation
- vMF-exp: von Mises-Fisher Exploration of Large Action Sets with Hyperspherical Embeddings
- When to Sense and Control? A Time-adaptive Approach for Continuous-Time RL
- Wind farm control with cooperative multi-agent reinforcement learning