ICML 2026 · Past · Reinforcement learning · Optimization · Datasets

Decision-Making from Offline Datasets to Online Adaptation: Black-Box Optimization to Reinforcement Learning

DEMO 2026

Submission deadline: May 9, 2026, 23:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (142)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Differentiable Bayesian Optimization Framework via Variational Mutual Information Estimation

    Farhad Mirkarimi · PDF
  2. A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise

    Mattia Forzo, Alessio Russo, Enea Monzio Compagnoni, Aldo Pacchiano · PDF
  3. A Language-Guided Bayesian Optimization for Efficient LoRA Hyperparameter Search

    Baek Seong-Eun, Lee Jung-Mok, Kim Sung-Bin, Tae-Hyun Oh · PDF
  4. A Mutual Information Lower Bound for Multimodal Regression Active Learning

    Leonardo Ferreira Guilhoto, Akshat Kaushal, Paris Perdikaris · PDF
  5. A Planning-Based Reinforcement Learning Approach to Numerical Optimization

    Uma Maheswari Natarajan, Sai Shruti Prakhya, Shivani Sanjiv Shukla, Chandramouli Kamanchi, Raghuram Bharadwaj Diddigi · PDF
  6. Abstraction for Offline Goal-Conditioned Reinforcement Learning

    Clarisse Wibault, Alexander David Goldie, Antonio León Villares, M. A. Osborne, Jakob Nicolaus Foerster · PDF
  7. Action-Free Offline RL via Demonstrator Diversity

    Felix Schur · PDF
  8. AdaDPO: Self-Adaptive Direct Preference Optimization with Balanced Gradient Updates

    Shaolong Chen, Madalina Ciobanu, Qingqing Mao, Ritankar Das · PDF
  9. Adaptive Querying with AI Persona Priors

    Kaizheng Wang, Yuhang Wu, Assaf Zeevi · PDF
  10. Adaptive Stratified Active Statistical Inference

    Pinaki Mohanty, Rajiv Khanna · PDF
  11. Aligning Flow Map Policies with Optimal $Q$-Guidance

    Christos Ziakas, Alessandra Russo, Joey Bose · PDF
  12. An Information-Theoretic Analysis of OOD Generalization in Meta-Reinforcement Learning

    Xingtu Liu · PDF
  13. ARCA: Adapter-Residual Credit Assignment When Token Signals Degenerate

    Rodney Lafuente-Mercado · PDF
  14. AsyncOPD: How Stale Can On-Policy Distillation Be?

    Wonjun Kang, Kevin Galim, Seunghyuk Oh, Minjun Kang, Sanghyun Park, Donghoon Kim, Minjae Lee, Minseo Kim, Rishabh Tiwari, Yuchen Zeng, Hyung Il Koo, Kangwook Lee · PDF
  15. Auditing Offline Demonstration Pruning for Online Robot Deployment

    Joy Zheyun Yang, Socrates Osorio · PDF
  16. Automated Kernel Discovery Towards Understanding High-dimensional Bayesian Optimization

    Taeyoung Yun, Woocheol Shin, Inhyuck Song, Jaewoo Lee, Jinkyoo Park · PDF
  17. Bayesian Optimization with Early Trial Termination for Speeding Up Parallel Neural Network Training

    Apivich Hemachandra, Yizhan Han, See-Kiong Ng, Bryan Kian Hsiang Low · PDF
  18. Behavior Cloning is Not All You Need: The Optimality of On-Policy Distillation for Noisy Expert Feedback

    Ved Sriraman, Peihan Liu, Daniel Hsu, Adam Block · PDF
  19. Belief-Aware Decision Transformers for Offline-to-Online Decision-Making under Partial Observability: A Geosteering Case Study

    Hibat Errahmen DJECTA, Sergey Alyaev, Kristian Fossum · PDF
  20. Bellman–Whitney Envelopes: Sharp Partial Identification in Offline Control under Support Holes

    Manoj Saravanan, Rohit Kumar Salla · PDF
  21. Beyond One-Size-Fits-All: Diagnosis-Driven Online Reinforcement Learning with Offline Priors

    Guozheng Ma, Lu Li, Zilin Wang, Pierre-Luc Bacon, Dacheng Tao · PDF
  22. BoLT: A Benchmark to Democratize Black-box Optimization Research for Expensive LLM Tasks

    Ruth Wan Theng Chew, Zhiliang Chen, Apivich Hemachandra, Bryan Kian Hsiang Low · PDF
  23. Boosting Direct Preference Optimization with Penalization

    Pengwei Sun · PDF
  24. Boosting for Reinforcement Learning in Structured MDPs

    Anh Do, Jessica Sorrell · PDF
  25. Can K Heads Explore Better Than One in Online Reinforcement Learning?

    Abhishek Jha, Satyapragnya Kar, Kishlay Kumar, Stephanie Milani, Rajesh Ranganath · PDF
  26. Can We Really Learn One Representation to Optimize All Rewards?

    Chongyi Zheng, Royina Karegoudra Jayanth, Benjamin Eysenbach · PDF
  27. Clarifying Uncertainty Quantification in Off-Policy Evaluation: Beyond Effective Sample Sizes, Towards Confidence Intervals

    Aditya Dutta, Kaixuan Liu, Shengpu Tang · PDF
  28. CODA: Coordination via On-Policy Diffusion for Multi-Agent Offline Reinforcement Learning

    Marcel Hedman, Kale-ab Tessera, Juan Claude Formanek, Anya Sims, Riccardo Zamboni, Trevor McInroe, John Torr, Elliot Fosong · PDF
  29. CombiLatent: Neural Combinatorial Optimization via Latent Space Search under Sinkhorn Divergence Regularization

    Younes Boukacem, Benouaklil Hodhaifa, Ait Said Yassine, Houssem Talbi, Mehdi Zakaria Adjal, Faissal IZERMINE · PDF
  30. Combinatorial Allocation Bandits with Nonlinear Arm Utility

    Yuki Shibukawa, Koichi Tanaka, Yuta Saito, Shinji Ito · PDF
  31. Conformal Candidate Certification for Offline Model-Based Optimization

    Seungjin Choi · PDF
  32. CoRDE: Concept-Prior Routed Diffusion Experts for Structural Generalization in Robot Manipulation

    Haidong Huang, Xixin Zhao, Yaohua Zhou, Jiayu Song, Jiayi Zhang, Jun Ma, Haiyue Zhu, Xiaocong Li · PDF
  33. Cost-Aware Learning

    Clara Mohri, Amir Globerson, Haim Kaplan, Tomer Koren, Yishay Mansour · PDF
  34. Curvature-Aware Active Statistical Inference: Reducing Labeling via Data Coherence

    Pinaki Mohanty, Rajiv Khanna · PDF
  35. Decision Titan: Test-Time Training for Long-Term Memory in Offline Reinforcement Learning

    Jude Waide, Robert Lieck · PDF
  36. Disentangled Differentiable Model Predictive Control for Data-efficient and Interpretable Imitation Learning

    Jonghak Bae, Taehyung Kim, Yonghwa Seo, Jongeun Choi · PDF
  37. Dual Advantage Fields

    Alexey Zemtsov, Maksim Bobrin, Alexander Nikulin, Dmitry V. Dylov, Fakhri Karray, Vladislav Kurenkov, Martin Takáč, Arip Asadulaev · PDF
  38. DUO: Diffusion Models for Universal Offline Black-Box Optimization

    Yihao Zheng, Ke Xue, Rong-Xi Tan, Chao Qian · PDF
  39. Efficient Algorithms for Contextual Apple Tasting with Log-Loss

    Byeongwoo An, Kapilan Balagopalan, Sehwa Jeong, Jiun Jeong, Kyoungseok Jang, Hyowon Wi, Noseong Park, Gi-Soo Kim, Kwang-Sung Jun · PDF
  40. Efficient Cost-Aware LLM Evaluation via Bayesian Bandit Gittins Indices

    Qian Xie, Yueli He, Nairen Cao · PDF
  41. Efficient Off-Policy RL for Video Generation via Forward-Consistent Reward Matching

    Hongzheng Yang, Mengyang LIU, Haoxuan Wu, Kun Li, Yuzhi Zhao, Wei Liu · PDF
  42. Efficient Post-training of LLMs for Code Generation With Offline Reinforcement Learning

    Abhinav Anand, Mingze Wu, Shweta Verma, Mira Mezini · PDF
  43. ElementMindX: Offline Supplier-Substitution Ranking for Natural-Language Trade-Shock Decision Support

    Sharv Murgai · PDF
  44. Fairness of Exposure in Stochastic Multiple-play Multi-armed Bandits

    Youngmi Jin, Dongdeok Kim, Young-Joo Suh · PDF
  45. Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability

    Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Quanquan Gu · PDF
  46. FASTER: Value-Guided Sampling for Fast RL

    Perry Dong, Alexander Swerdlow, Dorsa Sadigh, Chelsea Finn · PDF
  47. FICReg: Forward-Inverse Consistency Regularization for Latent World Models

    Sungwon Seo, Jean Seong Bjorn Choe, Jong-Kook Kim · PDF
  48. Flow-Based Offline Reinforcement Learning for Voltage Regulation in Distribution Networks

    Liyu Shan, Yongli Zhu · PDF
  49. Forgetting to Improve: Principled Data Removal in Active Learning

    Manuel Wendl, Erik Englesson, Andreas Krause, Carl Henrik Ek · PDF
  50. Freeze the Policy, Infer the Goal: Cross-Domain Imitation with World Models

    Xingyuan Zhang, Marvin Alles, Patrick van der Smagt, Philip Becker-Ehmck · PDF
  51. From Offline Evidence to Online Action: A Decision Framework for Imperfect Offline Evaluation

    Yuhao Wang, Lorenzo Masoero · PDF
  52. From Offline Global Information to Online Decentralized Policies in Edge Network Scheduling

    Justin Chang, Angela Zhang, Jiayi Chen, Aditya Akella · PDF
  53. From Offline Trajectories to Online Adaptation: A Multimodal JEPA Pretraining Study on Pokemon Red

    Stefano Campese, Alessandro Moschitti · PDF
  54. From Static Policies to Adaptive Priors in Offline Reinforcement Learning

    Tianwei Ni, Vineet Jain, Akash Karthikeyan, Pierre-Luc Bacon · PDF
  55. Future Information-Directed Sampling for Bayesian Nonstationary Bandits

    Yichen Song, Alessio Russo, Aldo Pacchiano · PDF
  56. Globally Convergent Offline Reinforcement Learning with Smoothed Bellman Residual Minimization

    Byungjun Park, Minhyeok Park, Enoch H. Kang, Kyoungseok Jang · PDF
  57. Good Experience Maximization

    Anthony GX-Chen, Stephanie Milani, Jatin Prakash, Sumit Chopra, Rob Fergus, Rajesh Ranganath · PDF
  58. Hazard Compression: Catastrophic Forgetting in Diffusion-Based Generative Replay for Safe Reinforcement Learning

    Wei Gao, Ellie Du, Akeel Majeed · PDF
  59. Hidden Failure Modes in Latent World-Model Planning from Offline Data

    Kanpat Vesessook, Kevin Yang · PDF
  60. How Many Initial Points Does Bayesian Optimization Need?

    Mujin Cheon, James Odgers, Dong-Yeun Koh, Calvin Tsay · PDF
  61. Imitating the Imperfect: Offline-to-Online Robust Imitation Learning from Heterogeneous Demonstrators

    Cheng Pan, Kai Arulkumaran, Nan Lu, Josie Hughes · PDF
  62. Improve Reasoning Ability by Reinforcing Only from Positive Rollouts

    Mingwei Xu, Hao Fang · PDF
  63. Improving Multi-Agent Coordination with a Drift-Aware RL Objective

    Sangeun Park, Guhyeon Kang, Minhae Kwon · PDF
  64. In-context Latent Space Bayesian Optimization

    Tuan A. Vu, Harri Lähdesmäki, Julien Martinelli · PDF
  65. In-Context Pure Exploration in Continuous Decision Spaces

    Alessio Russo, Yin-Ching Lee, Ryan Welch, Aldo Pacchiano · PDF
  66. Information-Directed Offline-to-Online Reinforcement Learning

    Keru Chen · PDF
  67. Instability and Interpretability Discrepancies Between CNNs and Vision Transformers in Keratoconus Detection

    Shankar Harikrishnan · PDF
  68. Is Temporal-Difference Learning the Only Path to Stitching in RL?

    Michał Bortkiewicz, Władysław Pałucki, Mateusz Ostaszewski, Benjamin Eysenbach · PDF
  69. Learning from the Right Mistakes: When Do Low-Performing Data Help Offline Policy Gradients?

    Jesse Silverberg, Glen Berseth, Marc G Bellemare · PDF
  70. Learning Insider-Threat Intervention Policies from Offline Logs

    Cyrine Fekih, Ashish Sai, Adriana Iamnitchi, Dennis J. N. J. Soemers · PDF
  71. Learning Reasoning Rewards from Expert Demonstrations with Inverse Reinforcement Learning

    Claudio Fanconi, Nicolás Astorga, Mihaela van der Schaar · PDF
  72. Learning through Adaptive Queries: a Directional Derivative Approach

    Doyoung Heo, Hwan Kyu Sung, Sanghwa Kim, Seungki Min · PDF
  73. Learning to Orchestrate Heterogeneous Agents under Uncertainty

    Mary Chriselda Antony Oliver, Lan Jiang, Aaron Bundi Anampiu, Elaf Almahmoud, Francesco Quinzan, Umang Bhatt · PDF
  74. Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

    Surbhi Goel, Jonathan Pei, James Wang · PDF
  75. Less Tuning, Better Planning: Simplifying Offline Model-Based Planning

    Co Yong, Chan-Hung Yu, Shao-Hua Sun · PDF
  76. Leveraging Instruction Tuning and Merging for Reasoning Model Adaptation

    Yu-Du Feng, Niels Mündler, Mark Vero, Martin Vechev · PDF
  77. Leveraging Offline Supervision for Efficient and Generalizable Reinforcement Learning in Large-Scale Vision–Language–Action Models

    Dmitriy Poyarkov, Aleksei Staroverov, Aleksandr Panov · PDF
  78. LLM-PriorCB: Textual Contextual Bandits with LLM-Induced Priors

    Geon-Hyeong Kim, Yu Jin Kim, June Yong Yang, Woohyung Lim, Youngsoo Jang, Moontae Lee · PDF
  79. Long-Horizon Model-Based Offline Reinforcement Learning Without Explicit Conservatism

    Tianwei Ni, Esther Derman, Vineet Jain, Vincent Taboga, Siamak Ravanbakhsh, Pierre-Luc Bacon · PDF
  80. Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning

    Jiahang Cao, Qiang Zhang, Ziqing Wang, Jingkai SUN, Jiaxu Wang, Hao Cheng, Yecheng Shao, Wen Zhao, Gang Han, Yijie Guo, Renjing Xu · PDF
  81. Meta-GC-TTT: Training Offline Goal-Conditioned Policies for Test-Time Adaptation

    Antonio Mari, Marco Bagatella, Jonas Hübotter, Andreas Krause · PDF
  82. MIRT: Multi-Dimensional IRT for SLO-Adaptive Multi-Agent Routing

    Hak Hyun Kim, Benjamin Huh, Jimin Moon, Chia-Wei Lee, Jason Peng, Wesley J. Marrero, Soroush Vosoughi · PDF
  83. Molten Pot: Evaluations & Datasets for Social Offline Reinforcement Learning

    Juan Claude Formanek, Marcel Hedman, Kale-ab Tessera, Christopher C. Holmes, Jonathan P. Shock · PDF
  84. Neuron-Aware Data Selection for Annotation-Free LLM Self-Distillation

    Zhuowei Chen, Xiang Lorraine Li · PDF
  85. Neutral Reward Filtering for Fair Offline-to-Online Diffusion Alignment

    YeonGyu Han, Junah Jung, Dongheon Lee · PDF
  86. Noisy-Space Policy Gradient for Diffusion Policies in Offline Reinforcement Learning

    Mahmoud Selim, Cristina Cipriani, Karl Henrik Johansson · PDF
  87. Offline Multi-Agent Reinforcement Learning for Objective-Weight Adaptation in Three-Sided Marketplace Dispatch

    Haochen Wu, Yi Hou, Shiguang Xie · PDF
  88. Offline Policy Learning for Clinical-Trial Strategy

    William James Bolton, Philip Torr · PDF
  89. Offline Policy Learning under Compliance Uncertainty: Adoption-Aware Decision-Making with Observational-to-RCT Calibration Drift

    Vairaaj Bindal · PDF
  90. Offline Preference Learning with Clustering and Active Data-Augmentation

    Jingyuan Liu, Fatemeh Ghaffari, Xuchuang Wang, Xutong Liu, Mohammad Hajiesmaili, Carlee Joe-Wong · PDF
  91. On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization

    Kaixuan Ji, Qiwei Di, Heyang Zhao, Qingyue Zhao, Quanquan Gu · PDF
  92. On the Role of Proposal Support in Diffusion-Based Offline RL for Sequential Decision-Making

    Yunho Kim, Jaehyun Park, Heejun Kim, Sejin Kim, Byung-Jun Lee, Sundong Kim · PDF
  93. Online Regret Minimization in Linear Bandits with Offline Data

    Sushant Vijayan, Karthikeyan Shanmugam, Arun Suggala, Soumyabrata Pal · PDF
  94. Online Self-Training for Co-Adaptation in Hierarchical Diffusion Policies

    Clémence Grislain, Mathilde Kappel, Olivier Sigaud, Mohamed CHETOUANI · PDF
  95. Orchestrating LLMs as Hierarchical Multi-Agent Reinforcement Learning System for Automotive Software Development

    Raghav R, Mohammed Farag · PDF
  96. PC3D: Zero-Shot Cooperation Across Variable Rosters via Personalized Context Distillation

    Ahmet Onur Akman, Rafal Kucharski · PDF
  97. Pessimism’s Paradox: Conservative Offline Training Amplifies Reward Hacking During Online Adaptation in Reasoning Models

    Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary · PDF
  98. Pitfalls and Remedies for Multi-Task Bayesian Optimization

    Carl Hvarfner, Samuel Daulton, Maximilian Balandat, Eytan Bakshy · PDF
  99. Policy-Only Power Sampling for Vision-Language-Action Control

    Jimin Park, Wonjeong Choi, Jaekyun Moon · PDF
  100. Position: Offline-Dataset Evaluation for Online Decision-Making Needs an Identification Standard

    Zezheng Lin, Jinhao Gan · PDF
  101. Practical Bayesian Optimization for Scientific Discovery

    Hamza Tahir Chaudhry, Sean H. Murphy, Umesh Padia, Cengiz Pehlevan, James Harrison, George Church, Jasper Snoek · PDF
  102. Provably Stable Neural Dynamics via Koopman Operator Certificates

    Aryan Dadwal · PDF
  103. PubSwap: Public-Data Off-Policy Coordination for Federated RLVR

    Anupam Nayak, Baris Askin, Muhammed Ustaomeroglu, Carlee Joe-Wong, Gauri Joshi · PDF
  104. Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy

    JaeHyeok Doo, Byeongguk Jeon, Seonghyeon Ye, Kimin Lee, Minjoon Seo · PDF
  105. Qantara: Bridge-Flow Training for Multi-Paradigm JEPA Control

    Ruslan Rakhimov, George Bredis, Yuriy Maksyuta, Daniil Gavrilov · PDF
  106. Rationale-Guided Policy Optimization: Learning to Reason with Adaptive Rationale Scaffolding

    Hoang Phan, Minh Pham, Chau Pham, Chinmay Hegde, Trung Le, Qi Lei · PDF
  107. Receding-Horizon Control via Drifting Models

    Alessio Russo, Daniele Foffano, Alexandre Proutiere · PDF
  108. Receding-Horizon Execution for Action Chunking in Offline-to-Online Reinforcement Learning

    Jihwan Lee, Geonwoo Cho, Hojun Yi, Jaegyun Im, Giseung Park, Sundong Kim · PDF
  109. Rethinking Bayesian Optimization for Co-Optimizing LLM Training Configurations

    Zhiliang Chen, Alfred Wei Lun Leong, Shao Yong Ong, Apivich Hemachandra, Gregory Kang Ruey Lau, Chuan-Sheng Foo, Zhengyuan Liu, Nancy F. Chen, Bryan Kian Hsiang Low · PDF
  110. REVES: REvision and VErification–Augmented Training for Test-Time Scaling

    Yuanxin Liu, Ruida Zhou, Xinyan Zhao, Amr Sharaf, Hongzhou Lin, Arijit Biswas, Mohammad Ghavamzadeh, Zhaoran Wang, Mingyi Hong · PDF
  111. Reward-Wise Value Estimation for Multi-Reward Optimization in Large Language Models

    Quan Wei, Zhongruo Wang, Chenliang Li, Xi Chen, Oana Frunza, Yang Katie Zhao, Mingyi Hong · PDF
  112. RLRank: Distilling Offline Oracles into Online Policies for Document Reranking

    Sagnik Palchaudhuri, Tadisetty Sai Yashwanth · PDF
  113. Safe-CDT: Adaptive Target Scheduling for Safe Cross-Domain Deployment of Constrained Decision Transformers

    Jiajun Shen, Bijan Sayyarrodsari · PDF
  114. SALT: Learning State- and Temporally-Abstracted World Models for Offline Long-Horizon Decision-Making

    William Huang, Benjamin Freed · PDF
  115. Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift

    Bochao Li, Yao Fu, Wei Chen, Fang Kong · PDF
  116. Sampling-Based Safe Reinforcement Learning

    Luca Vignola, Bruce D Lee, Manish Prajapat, Manuel Wendl, Melanie N. Zeilinger, Andreas Krause, Yarden As · PDF
  117. Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition

    Sanjeev Manivannan, Shuban V · PDF
  118. Soft Forward-Backward Representations for Zero-shot Reinforcement Learning with General Utilities

    Marco Bagatella, Thomas Rupf, Georg Martius, Andreas Krause · PDF
  119. Spectral Perturbation Bounds for Experience Replay: A Bias–Variance Decomposition for Offline Decision-Making

    Saket Atreya · PDF
  120. Static Benchmarks Are Broken: The Case for Dynamic Evaluation of LLMs

    Farhan Ahmed, Chad DeLuca · PDF
  121. Statistical Complexity of Soft Bellman Residual Minimization

    Enoch H. Kang, Kyoungseok Jang · PDF
  122. Structured Behavioral Heterogeneity as Latent Regime Constraints

    Yuting Yan, Haozhou Gao, Xinye Chen, Yinghao Fu, Shuang Li · PDF
  123. The Illusion of State: Sharp Memory-Decay Bounds in Linear SSMs

    Aryan Dadwal · PDF
  124. The Three Regimes of Offline-to-Online Reinforcement Learning

    Lu Li, Tianwei Ni, Yihao Sun, Pierre-Luc Bacon · PDF
  125. Tight Gap-Dependent Regret Bounds and Problem-Independent Bounds for Cost-aware Cascading Bandits

    Yuji Tamakoshi, Shinji Ito · PDF
  126. Towards Adapting Contrastive RL to the Offline Setting

    Catherine Ji, Grace Tan, Benjamin Eysenbach · PDF
  127. TRACER: Trust-Calibrated Offline-to-Online Reinforcement Learning

    Yutong Zhang, Yaoran Yang · PDF
  128. Transfer-Ready Critics: Auditing Conservatism Footprints for Offline-to-Online RL

    BOPENG · PDF
  129. Trust the Batch, Online or Offline: Adaptive Policy Optimization for Post-Training

    Rasool Fakoor, Murdock Aubry, Nicholas Stranges, Alex Smola · PDF
  130. UA2C: Uncertainty-Aware Adaptive Action Chunking for Offline-to-Online Decision-Making in Mixed Traffic

    Hongki Kim, Sangeun Park, Minhae Kwon · PDF
  131. Uncertainty-Guided Reward Labeling for Reinforcement Learning under Limited Feedback

    Renhao Zhang, Shreyas Chaudhari, Bruno Castro da Silva · PDF
  132. Unified Latent Steering and Residual Refinement for Online Improvement of Diffusion Policy Models

    Zhengbang Zhu, Ziyan Li, Xiu Yuan, Hanbo Zhang, Yuxiao Liu, Chongjie Zhang, Yong Yu, Weinan Zhang, Minghuan Liu · PDF
  133. UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

    ADITYA UPADHYAY · PDF
  134. Utilizing Historical Data for Neural Bandits with Domain Shift

    Donovan Barcelona, Lily Xu · PDF
  135. V-VLAPS: Value-Guided Planning for Vision-Language-Action Models

    Ke Ren, Ali Salamatian, Kieran Pattison, Cyrus Neary · PDF
  136. VLA Grounder: Language-Conditioning Space Optimization for Black-Box VLA Models

    Damir Shodiev, Aleksei Staroverov, Nikita Kachaev, Alexey Kovalev, Aleksandr Panov · PDF
  137. What Makes Value Learning Efficient in Residual Reinforcement Learning?

    Guozheng Ma, Lu Li, Haoyu Wang, Zixuan Liu, Pierre-Luc Bacon, Dacheng Tao · PDF
  138. When Does Deep RL Beat Calibrated Baselines? A Benchmark Study on Adaptive Resource Control

    Guilin Zhang, Kai Zhao, Chuanyi Sun, SHAHRYAR SARKANI, John M. Fossaceca · PDF
  139. When Loss Signals Dominate Context: Adaptive Expert Routing in the Loss-Dominance Regime

    Vatsal Khanna, Samridhi Saini · PDF
  140. When Offline Selectors Cannot Beat the Best Single Model: A Diagnostic Study on edX Dropout Prediction

    Tyler Crosse, Alan Nadelsticher Ruvalcaba, Dustin Khang LeDuc, Thomas Trask · PDF
  141. When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies

    Zhengzhe Yang · PDF
  142. XQCfD: Accelerating Fast Actor-Critic Algorithms with Prior Data and Prior Policies

    Daniel Palenicek, Florian Vogt, Joe Watson, Ingmar Posner, Danica Kragic, Jan Peters · PDF