ICML 2024 Workshop: Foundations of Reinforcement Learning and Control -- Connections and Perspectives

FoRLaC

Submission deadline: May 30, 2024, 11:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (72)

Fetched from OpenReview (v2) on 2026-06-10.

  1. $\alpha$-Fair Contextual Bandits

    Siddhant Chaudhary, Abhishek Sinha · PDF
  2. A Best-of-both-worlds Algorithm for Bandits with Delayed Feedback with Robustness to Excessive Delays

    Saeed Masoudian, Julian Zimmert, Yevgeny Seldin · PDF
  3. A Policy Optimization Approach to the Solution of Unregularized Mean Field Games

    Sihan Zeng, Sujay Bhatt, Alec Koppel, Sumitra Ganesh · PDF
  4. A Pontryagin Perspective on Reinforcement Learning

    Onno Eberhard, Claire Vernade, Michael Muehlebach · PDF
  5. A safe exploration approach to constrained Markov decision processes

    Tingting Ni, Maryam Kamgarpour · PDF
  6. A Simple and Adaptive Learning Rate for FTRL in Online Learning with Minimax Regret of $\Theta(T^{2/3})$ and its Application to Best-of-Both-Worlds

    Taira Tsuchiya, Shinji Ito · PDF
  7. A Variational Formulation of Reinforcement Learning in Infinite-Horizon Markov Decision Processes

    Tim G. J. Rudner · PDF
  8. Adaptive Experimental Design for Policy Learning: Contextual Best Arm Identification

    Masahiro Kato, Kyohei Okumura, Takuya Ishihara, Toru Kitagawa · PDF
  9. Bandits with Abstention under Expert Advice

    Stephen Pasteris, Alberto Rumi, Maximilian Thiessen, Shota Saito, Atsushi Miyauchi, Fabio Vitale, Mark Herbster · PDF
  10. Bandits with Preference Feedback: A Stackelberg Game Perspective

    Barna Pásztor, Parnian Kassraie, Andreas Krause · PDF
  11. Bridging Distributional and Risk-Sensitive Reinforcement Learning: Balancing Statistical, Computational, and Risk Considerations

    Hao Liang · PDF
  12. Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage

    Kishan Panaganti, Zaiyan Xu, Dileep Kalathil, Mohammad Ghavamzadeh · PDF
  13. Causal Bandits: The Pareto Optimal Frontier of Adaptivity, a Reduction to Linear Bandits, and Limitations around Unknown Marginals

    Ziyi Liu, Idan Attias, Daniel M. Roy · PDF
  14. Certifying robustness to adaptive data poisoning

    Avinandan Bose, Madeleine Udell, Laurent Lessard, Maryam Fazel, Krishnamurthy Dj Dvijotham · PDF
  15. Chained Information-Theoretic Bounds and Tight Regret Rate for Linear Bandit Problems

    Amaury Gouverneur, Borja Rodríguez Gálvez, Tobias Oechtering, Mikael Skoglund · PDF
  16. Combining Neural Networks and Symbolic Regression for Analytical Lyapunov Function Discovery

    Jie Feng, Haohan Zou, Yuanyuan Shi · PDF
  17. Compatible Gradient Approximations for Actor-Critic Algorithms

    Baturay Saglam, Dionysis Kalogerias · PDF
  18. CPeSFA: Empowering SFs for Policy Learning and Transfer in Continuous Action Spaces

    Yining Li, Tianpei Yang, Wei Guo, Jianye Hao, Yan Zheng · PDF
  19. DARE: The Deep Adaptive Regulator for Control of Uncertain Continuous-Time Systems

    Harrison Waldon, Fayçal Drissi, Yannick Limmer, Uljad Berdica, Jakob Nicolaus Foerster, Alvaro Cartea · PDF
  20. DeePC-Hunt: Data-enabled Predictive Control Hyperparameter Tuning via Differentiable Optimization

    Michael Cummins, Alberto Padoan, Keith Moffat, John Lygeros, Florian Dorfler · PDF
  21. Defending Against Unknown Corrupted Agents: Reinforcement Learning of Adversarially Robust Nash Equilibria

    Andi Nika, Jonathan Nöther, Adish Singla, Goran Radanovic · PDF
  22. Distributional Monte-Carlo Planning with Thompson Sampling in Stochastic Environments

    Tuan Quang Dam, Brahim Driss, Odalric-Ambrym Maillard · PDF
  23. Essentially Sharp Estimates on the Entropy Regularization Error in Discounted Markov Decision Processes

    Johannes Müller, Semih Cayci · PDF
  24. Event-Based Federated Q-Learning

    Guner Dilsad Er, Michael Muehlebach · PDF
  25. Exploring Integrality Grip for Mixed-integer Programming by MCTS Planning

    Defeng Liu · PDF
  26. Finite Sample Identification: From Frequency to Time Domain

    Anastasios Tsiamis, Mohamed Abdalmoaty, Roy S. Smith, John Lygeros · PDF
  27. Finite-time convergence to an $\epsilon$-efficient Nash equilibrium in potential games

    Anna Maria Maddux, Reda Ouhamma, Maryam Kamgarpour · PDF
  28. Growing Q-Networks: Solving Continuous Control Tasks with Adaptive Control Resolution

    Tim Seyde, Peter Werner, Wilko Schwarting, Markus Wulfmeier, Daniela Rus · PDF
  29. Hierarchical Reinforcement Learning and Model Predictive Control for Strategic Motion Planning in Autonomous Racing

    Rudolf Reiter, Jasper Hoffmann, Joschka Boedecker, Moritz Diehl · PDF
  30. Hybrid Recurrent Models Support Emergent Descriptions for Hierarchical Planning and Control

    Poppy Collis, Ryan Singh, Paul Kinghorn, Christopher Buckley · PDF
  31. Identifiable latent bandits: Combining observational data and exploration for personalized healthcare

    Ahmet Zahid Balcıoğlu, Emil Carlsson, Fredrik D. Johansson · PDF
  32. Improved Algorithms for Contextual Dynamic Pricing

    Matilde Tullii, Solenne Gaucher, Nadav Merlis, Vianney Perchet · PDF
  33. Learning HJB Viscosity Solutions with PINNs for Continuous-Time Reinforcement Learning

    Alena Shilova, Thomas Delliaux, Philippe Preux, Bruno Raffin · PDF
  34. Learning Nash Equilibria in Zero-Sum Markov Games: A Single-Timescale Algorithm Under Weak Reachability

    Reda Ouhamma, Maryam Kamgarpour · PDF
  35. Learning to Explore with Lagrangians for Bandits under Unknown Constraints

    Udvas Das, Debabrota Basu · PDF
  36. Learning When to Trust the Expert for Guided Exploration in RL

    Felix Schulz, Jasper Hoffmann, Yuan Zhang, Joschka Boedecker · PDF
  37. Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

    Cameron Allen, Aaron T. Kirtland, Ruo Yu Tao, Sam Lobel, Daniel Scott, Nicholas Petrocelli, Omer Gottesman, Ronald Parr, Michael Littman, George Konidaris · PDF
  38. Model Based Diffusion for Trajectory Optimization

    Chaoyi Pan, Zeji Yi, Guanya Shi, Guannan Qu · PDF
  39. Multiple-policy Evaluation via Density Estimation

    Yilei Chen, Aldo Pacchiano, Ioannis Paschalidis · PDF
  40. NEORL: Efficient Exploration for Nonepisodic RL

    Bhavya Sukhija, Lenart Treven, Florian Dorfler, Stelian Coros, Andreas Krause · PDF
  41. Neural Dueling Bandits

    Arun Verma, Zhongxiang Dai, Xiaoqiang Lin, Patrick Jaillet, Bryan Kian Hsiang Low · PDF
  42. Non-ergodicity in reinforcement learning: robustness via ergodicity transformations

    Dominik Baumann, Erfaun Noorani, James Price, Ole Peters, Colm Connaughton, Thomas B. Schön · PDF
  43. Non-Linear $H_\infty$ Robustness Guarantees for Neural Network Policies

    Daniel Urieli · PDF
  44. On PI Controllers for Updating Lagrange Multipliers in Constrained Optimization

    Motahareh Sohrabi, Juan Ramirez, Tianyue H. Zhang, Simon Lacoste-Julien, Jose Gallego-Posada · PDF
  45. On Robust Reinforcement Learning with Lipschitz-Bounded Policy Networks

    Nicholas H. Barbara, Ruigang Wang, Ian Manchester · PDF
  46. Online Optimization of Closed-Loop Control Systems

    Hao Ma, Melanie Zeilinger, Michael Muehlebach · PDF
  47. Online Performance Optimization of Nonlinear Systems: A Gray-Box Approach

    Zhiyu He, Michael Muehlebach, Saverio Bolognani, Florian Dorfler · PDF
  48. Optimality of Stationary Policies in Risk-averse Total-reward MDPs with EVaR

    Xihong Su, Marek Petrik, Julien Grand-Clément · PDF
  49. Optimistic Information Directed Sampling

    Gergely Neu, Matteo Papini, Ludovic Schwartz · PDF
  50. Partial Structure Discovery is Sufficient for No-regret Learning in Causal Bandits

    Muhammad Qasim Elahi, Mahsa Ghasemi, Murat Kocaoglu · PDF
  51. Pink Noise LQR: How does Colored Noise affect the Optimal Policy in RL?

    Jakob Hollenstein, Marko Zaric, Samuele Tosatto, Justus Piater · PDF
  52. Power Mean Estimation in Stochastic Monte-Carlo Tree Search

    Tuan Quang Dam, Odalric-Ambrym Maillard, Emilie Kaufmann · PDF
  53. Preference Elicitation for Offline Reinforcement Learning

    Alizée Pace, Bernhard Schölkopf, Gunnar Ratsch, Giorgia Ramponi · PDF
  54. Randomized Confidence Bounds for Stochastic Partial Monitoring

    Maxime Heuillet, Ola Ahmad, Audrey Durand · PDF
  55. Recommender System Design via Online Feedback Optimization

    Sanjay Chandrasekaran, Giulia De Pasquale, Giuseppe Belgioioso, Florian Dorfler · PDF
  56. Recurrent Natural Policy Gradient for POMDPs

    Semih Cayci, Atilla Eryilmaz · PDF
  57. Reinforcement Learning of Adaptive Acquisition Policies for Inverse Problems

    Gianluigi Silvestri, Fabio Valerio Massoli, Tribhuvanesh Orekondy, Afshin Abdi, Arash Behboodi · PDF
  58. Reinforcement Learning with Lookahead Information

    Nadav Merlis · PDF
  59. Reinforcement Learning with Quasi-Hyperbolic Discounting

    Eshwar S R, Nibedita Roy, Gugan Thoppe · PDF
  60. Robust Best-of-Both-Worlds Gap Estimators Based on Importance-Weighted Sampling

    Sarah Clusiau, Saeed Masoudian, Yevgeny Seldin · PDF
  61. Safe online nonstochastic control from data

    Sebastian Kerz, Armin Lederer, Marion Leibold, Dirk Wollherr · PDF
  62. Safe Reinforcement Learning with Contrastive Risk Prediction

    Hanping Zhang, Yuhong Guo · PDF
  63. SMX: Sequential Monte Carlo Planning for Expert Iteration

    Edan Toledo, Matthew Macfarlane, Donal John Byrne, Siddarth Singh, Paul Duckworth, Alexandre Laterre · PDF
  64. Sum-Max Submodular Bandits

    Stephen Pasteris, Alberto Rumi, Fabio Vitale, Nicolò Cesa-Bianchi · PDF
  65. The Minimax Regret of Sequential Probability Assignment, Contextual Shtarkov Sums, and Contextual Normalized Maximum Likelihood

    Ziyi Liu, Idan Attias, Daniel M. Roy · PDF
  66. The Value of Reward Lookahead in Reinforcement Learning

    Nadav Merlis, Dorian Baudry, Vianney Perchet · PDF
  67. Tight Bounds for Online Convex Optimization with Adversarial Constraints

    Abhishek Sinha, Rahul Vaze · PDF
  68. Towards Empowerment Gain through Causal Structure Learning in Model-Based RL

    Hongye Cao, Fan Feng, Meng Fang, Shaokang Dong, Jing Huo, Yang Gao · PDF
  69. Truly No-Regret Learning in Constrained MDPs

    Adrian Müller, Pragnya Alatur, Volkan Cevher, Giorgia Ramponi, Niao He · PDF
  70. Uniform Last-Iterate Guarantee for Bandits and Reinforcement Learning

    Junyan Liu, Yunfan Li, Ruosong Wang, Lin Yang · PDF
  71. Variance-Dependent Regret Bounds for Nonstationary Linear Bandits

    Zhiyong Wang, Jize Xie, Yi Chen, John C.S. Lui, Dongruo Zhou · PDF
  72. When is Mean-Field Reinforcement Learning Tractable and Relevant?

    Batuhan Yardim, Artur Goldman, Niao He · PDF