OPT 2025: Optimization for Machine Learning

NeurIPS 2025 Workshop

Submission deadline: Sep 3, 2025, 12:00 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (132)

Fetched from OpenReview (v2) on 2026-06-10.

  1. LeonArDBO: Fast and Prior-Driven Bayesian Optimization without Surrogate Modeling

    Efe Mert Karagözlü, Conor Igoe, Barnabas Poczos, Jeff Schneider · PDF
  2. A Monte Carlo Approach to Nonsmooth Convex Optimization via Proximal Splitting Algorithms

    Nicholas Di, Eric Chi, Samy Wu Fung · PDF
  3. A Non-Convex Method for Polynomial Manifold Learning

    Param Mody, Elina Robeva · PDF
  4. A Simplified Analysis of SGD for Linear Regression with Weight Averaging

    Alexandru Meterez, Depen Morwani, Costin-Andrei Oncescu, Jingfeng Wu, Cengiz Pehlevan, Sham M. Kakade · PDF
  5. A stochastic Lagrangian-based method for nonconvex empirical risk minimization with nonlinear constraints

    Dimitri Papadimitriou · PDF
  6. A Theoretical Analysis for CUR Decomposition based Active Learning and Feature Selection

    Zhong Chen, Chen Zhao, Yi He · PDF
  7. A Unified Noise-Curvature View of Loss of Trainability

    Gunbir Singh Baveja, Alex Lewandowski, Mark Schmidt · PDF
  8. Achieving First-Order Statistical Improvements in Data-Driven Optimization

    Henry Lam, Tianyu Wang · PDF
  9. AdaGrad Meets Muon: Adaptive Stepsizes for Orthogonal Updates

    Minxin Zhang, Yuxuan Liu, Hayden Schaeffer · PDF
  10. Adaptive acceleration without strong convexity priors or restarts

    Joao V. Cavalcanti, Laurent Lessard, Ashia C. Wilson · PDF
  11. Algorithm design and sharper bounds for improving bandits

    Avrim Blum, Marten Garicano, Kavya Ravichandran, Dravyansh Sharma · PDF
  12. Aligning Distributionally Robust Optimization with Practical Deep Learning Needs

    Dmitrii Feoktistov, Igor Ignashin, Andrey Veprikov, Nikita Borovko, Aleksandr Bogdanov, Savelii Chezhegov, Aleksandr Beznosikov · PDF
  13. Aligning Theory with Practice for Muon-type Optimizers: A Layer-wise Framework

    Artem Riabinin, Egor Shulgin, Kaja Gruntkowska, Peter Richtárik · PDF
  14. Analysis of Schedule Free Non-Convex Optimization

    Connor Brown, Ahmed Khaled, Chi Jin · PDF
  15. Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization

    Fangzhao Zhang, Mert Pilanci · PDF
  16. Asymptotic and Finite-Time Guarantees for Langevin-Based Temperature Annealing in InfoNCE

    Faris Chaudhry · PDF
  17. Atlas – Rethinking Optimizer Design for Stability and Speed

    Janos Horvath · PDF
  18. Augmented Normalization: Differentiating the Generalized Geometric Median

    Tyler King, Ser-Nam Lim · PDF
  19. Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs

    Shmulik Markovich-Golan, Daniel Ohayon, Itay Niv, Yair Hanani · PDF
  20. Balanced Locality-Sensitive Hashing for Online Data Selection

    Hoang Phan, Yijun Dong, Andrew Gordon Wilson, Qi Lei · PDF
  21. BatchNorm Layers have an Outsized Effect on Adversarial Robustness

    Noam Zeise, Tiffany Joyce Vlaar · PDF
  22. Benefits of Learning Rate Annealing for Tuning-Robustness in Stochastic Optimization

    Amit Attia, Tomer Koren · PDF
  23. Block-Diagonal K-FAC: A Trade-off Between Curvature Information and Resource Efficiency

    Mingzhe Yu, Osamu Tatebe · PDF
  24. Can SGD Handle Heavy-Tailed Noise?

    Ilyas Fatkhullin, Florian Hübler, Guanghui Lan · PDF
  25. Can We Estimate The Entropy Of Arbitrary Distributions Known Up To A Normalization Constant?

    Safa Messaoud, Skander Charni, Elaa Bouazza, Ali Pourghasemi Fatideh, Halima Bensmail · PDF
  26. Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games

    Ashkan Soleymani, Georgios Piliouras, Gabriele Farina · PDF
  27. Central Limit Theorems for Asynchronous Averaged Q-Learning

    Xingtu Liu · PDF
  28. Chebyshev Moment Regularization (CMR): Condition-Number Control with Moment Shaping

    Jinwoo Baek · PDF
  29. Communication Efficient LLM Pre-training with SparseLoCo

    Amir Sarfi, Benjamin Thérien, Joel Lidin, Eugene Belilovsky · PDF
  30. Connecting Membership Inference Privacy and Generalization through Instance-Wise Measurements

    Leah Woldemariam, Anna Scaglione · PDF
  31. Convergence for Discrete Parameter Update Schemes

    Paul W Wilson, Fabio Zanasi, George Anthony Constantinides · PDF
  32. Convex Neural Networks For Robust ASR Language Detection

    Miria Feng, Mert Pilanci · PDF
  33. Curriculum-Learning PIELMs for Hemodynamic Flows

    Vikas Dwivedi, Monica Sigovan, Sixou Bruno · PDF
  34. Data Generation without Function Estimation

    Hadi Daneshmand, Ashkan Soleymani · PDF
  35. Data Geometry Determines Generalization Below the Edge-of-Stability

    Tongtong Liang, Alex Cloninger, Rahul Parhi, Yu-Xiang Wang · PDF
  36. Data Source Adaptive Online Learning under Heteroscedastic Noise

    Amith Bhat Hosadurga Anand, Aadirupa Saha, Thomas Kleine Buening, Haipeng Luo · PDF
  37. Data-Aware Training Quality Monitoring and Certification for Deep Learning

    Farhang Yeganegi, Arian Eamaz, Mojtaba Soltanalian · PDF
  38. Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation

    Kaoru Otsuka, Yuki Takezawa, Makoto Yamada · PDF
  39. Designing Algorithms for Entropic Optimal Transport from an Optimisation Perspective

    Vishwak Srinivasan, Qijia Jiang · PDF
  40. Distributionally Robust Nash Equilibria via Variational Inequalities

    Zeinab Alizadeh, Azadeh Farsi, Afrooz Jalilzadeh · PDF
  41. Distributionally Robust Optimization via Diffusion Ambiguity Modeling

    Jiaqi Wen, Jianyi Yang · PDF
  42. Domain-Aware Scaling Laws Uncover Data Synergy

    Kimia Hamidieh, Lester Mackey, David Alvarez-Melis · PDF
  43. DRO: A Python Library for Distributionally Robust Optimization in Machine Learning

    Jiashuo Liu, Tianyu Wang, Henry Lam, Hongseok Namkoong, Jose Blanchet · PDF
  44. DSGD-AC: controlled consensus errors improve generalization in decentralized training

    Zesen Wang, Mikael Johansson · PDF
  45. EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients

    He-Yen Hsieh, Hong Wang, H. T. Kung · PDF
  46. Efficient Algorithms for Combinatorial-Bandits with Monotonicity

    Aniket Wagde, Aadirupa Saha · PDF
  47. Efficient Training of CNN Ensembles via Feature-Prioritized Boosting

    Biyi Fang, Truong Vo, Jean Utke, Diego Klabjan · PDF
  48. EMA Without the Lag: Bias-Corrected Iterate Averaging Schemes

    Adam Block, Cyril Zhang · PDF
  49. Empirical-Bayes XTFC for Inverse Parameter Estimation

    Vikas Dwivedi, Monica Sigovan, Sixou Bruno · PDF
  50. Entropy Meets Importance: A Unified Head Importance–Entropy Score for Stable and Efficient Transformer Pruning

    Minsik Choi, Hyegang Son, Joohun Hyun, Seokmin Kim, Young Geun Kim · PDF
  51. Error Feedback for Muon and Friends

    Kaja Gruntkowska, Alexander Gaponov, Zhirayr Tovmasyan, Peter Richtárik · PDF
  52. Evolution of the Spectral Dimension of Transformer Activations

    Andy Zeyi Liu, Elliot Paquette, John Sous · PDF
  53. Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers

    Eric Tillmann Bill, Cristian Perez Jensen · PDF
  54. Extending $\mu$P: Spectral Conditions for Feature Learning Across Optimizers

    Akshita Gupta, Marieme Ngom, Sam Foreman, Venkatram Vishwanath · PDF
  55. FairPO: Fair Preference Optimization for Multi-Label Learning

    Soumen Kumar Mondal, Prateek Chanda, Akshit Varmora, Ganesh Ramakrishnan · PDF
  56. Fast decentralized gradient tracking for federated learning with local updates

    Chris Junchi Li · PDF
  57. Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization

    Lesi Chen, Junru Li, El Mahdi Chayti, Jingzhao Zhang · PDF
  58. Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update

    Abdulla Jasem Almansoori, Maria Ivanova, Andrey Veprikov, Aleksandr Beznosikov, Samuel Horváth, Martin Takáč · PDF
  59. Feature Learning as a Virtual Covariance Learning

    Taehun Cha, Donghun Lee · PDF
  60. FineAMP: Optimization-Based Automatic Mixed Precision Quantization for Efficient Diffusion Model Inference

    Burak Bartan, Ruizhong Qiu, Rafael Esteves, Yuwei Ren, Weiliang Will Zeng, An Chen · PDF
  61. First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions

    Egor Shulgin, Grigory Malinovsky, Sarit Khirirat, Peter Richtárik · PDF
  62. Flat Minima and Generalization: Insights from Stochastic Convex Optimization

    Matan Schliserman, Shira Vansover-Hager, Tomer Koren · PDF
  63. Foundations of Top-$k$ Decoding for Language Models

    Georgy Noarov, Soham Mallick, Tao Wang, Sunay Joshi, Yan Sun, Yangxinyu Xie, Mengxin Yu, Edgar Dobriban · PDF
  64. From Emergence to Intention: A Statistical Inductive Bias for Tractable Optimization in Multi-Agent Coordination

    Brennen Hill, Mant Koh En Wei, Jishnuanandh Thangavel · PDF
  65. Gradient Descent’s Last Iterate is Often (slightly) Suboptimal

    Guy Kornowski, Ohad Shamir · PDF
  66. Graph-theoretic perspectives on splitting methods for sparse optimal transport

    Jacob Lindbäck, Mikael Johansson · PDF
  67. Grassmannian Optimization Drives Generalization in Overparameterized DNN

    Changfeng Wang · PDF
  68. Hessian Spectrum is Constant Across Minimizers in Regularized Deep Scalar Factorization

    Anıl Kamber, Rahul Parhi · PDF
  69. Hessian-Dependent Sample Complexity in Zeroth-Order Stochastic Optimization: Nonconvex Support Sampling Is Necessary for Optimality

    Mengtian Hong, Jason D. Lee, Qian Yu · PDF
  70. High-dimensional isotropic scaling dynamics of Muon and SGD

    Guangyuan Wang, Elliot Paquette, Atish Agarwala · PDF
  71. HiSo: Efficient Federated Zeroth-Order Optimization via Hessian-Informed Acceleration and Scalar-Only Communication

    Zhe Li, Bicheng Ying, Zidong Liu, Chaosheng Dong, Haibo Yang · PDF
  72. How Does Layer Normalization Improve Deep $Q$-learning?

    Braham Snyder, Hadi Daneshmand, Chen-Yu Wei · PDF
  73. HyperPALoRA: Parameter-Efficient Pareto Hypernetworks via Preference-Based Diverse Low-Rank Adaptations

    Ashmita Bhattacharya, Malyaban Bal · PDF
  74. Hyperparameter-Free Auto-Scaled Gradient Normalization via Global Standard Deviation Dynamics

    Vincent-Daniel Yun · PDF
  75. Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime

    Beomhan Baek, Minhak Song, Chulhee Yun · PDF
  76. Implicit Bias of Polyak and Line-Search Step Sizes on Linear Classification with Separable Data

    Chen Fan, Reza Babanezhad Harikandeh, Christos Thrampoulidis, Mark Schmidt, Sharan Vaswani · PDF
  77. Incentivizing Permissionless Distributed Learning of LLMs

    Joel Lidin, Amir Sarfi, Evangelos Pappas, Samuel Dare, Eugene Belilovsky, Jacob Steeves · PDF
  78. Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression

    Tingkai Yan, Haodong Wen, Binghui Li, Kairong Luo, Wenguang Chen, Kaifeng Lyu · PDF
  79. Learning by solving differential equations

    Benoit Dherin, Michael Munn, Hanna Mazzawi, Michael Wunder, Sourabh Medapati, Javier Gonzalvo · PDF
  80. Lipschitz Optimization via Weighted Sampling Based on Expected Potential Maximizers Reduction

    Hideyuki Masui, Koki Nakane, Renshi Nagasawa · PDF
  81. LOTION: Smoothing the Optimization Landscape for Quantized Training

    Mujin Kwun, Depen Morwani, Huangyuan Su, Stephanie Gil, Nikhil Anand, Sham M. Kakade · PDF
  82. M+Adam: Stable Low-Precision Training with Combined Adam–Madam Updates

    Xiaoyuan Liang, Sebastian Loeschcke, Mads Toftrup, Anima Anandkumar · PDF
  83. Multi-Timescale Gradient Sliding for Distributed Optimization

    Junhui Zhang, Patrick Jaillet · PDF
  84. Muon Optimizes Under Spectral Norm Constraints

    Lizhang Chen, Jonathan Li, Qiang Liu · PDF
  85. New Optimization Methods for Very Large Scale SVMs

    Yifan Kang, Yarui Cao, Kai Liu · PDF
  86. On Optimizing Large Scale Multi-Class Logistic Regression

    Yifan Kang, Yarui Cao, Kai Liu · PDF
  87. On Riemannian Gradient Descent Algorithm using gradient averaging

    Saugata Purkayastha, Sukannya Purkayastha · PDF
  88. On the Benefits of Weight Normalization for Overparameterized Matrix Sensing

    Yudong Wei, Liang Zhang, Bingcong Li, Niao He · PDF
  89. On the Finite-Sample Bias of Minimizing Expected Wasserstein Loss Between Empirical Distributions

    Cheongjae Jang, Yung-Kyun Noh · PDF
  90. On the Limits of Momentum in Decentralized and Federated Optimization

    Riccardo Zaccone, Sai Praneeth Karimireddy, Carlo Masone · PDF
  91. On the Potential of the Four-Point Model for Studying the Role of Optimization in Robustness to Spurious Correlations

    Mahdi Ghaznavi, Hesam Asadollahzadeh · PDF
  92. On the Rollout-Training Mismatch in Modern RL Systems

    Feng Yao, Liyuan Liu, Dinghuai Zhang, Chengyu Dong, Jingbo Shang, Jianfeng Gao · PDF
  93. One-Sided Matrix Completion from Ultra-Sparse Samples

    Hongyang R. Zhang, Zhenshuo Zhang, Huy Nguyen, Guanghui Lan · PDF
  94. OptiBridge: Multi-Scale Multi-Shift Bridging for Conditioning Optimization Landscapes

    Farnaz Salehi Sadaghiani, Mojtaba Soltanalian · PDF
  95. Optimal Implicit Bias in Linear Regression

    K Nithin Varma, Babak Hassibi · PDF
  96. Optimized Statistical Ranking is All You Need for Robust Coreset Selection in Efficient Transformer-Based Spam Detection

    Aisha Hamad Hassan, Tushar Shinde · PDF
  97. OrthoGrad Improves Neural Calibration

    C. Evans Hedges · PDF
  98. Parameter-Agnostic Error Feedback Enhanced With Hessian-Corrected Momentum

    Abdurakhmon Sadiev, Yury Demidovich, Grigory Malinovsky, Igor Sokolov, Sarit Khirirat, Peter Richtárik · PDF
  99. Partial Parameter Updates for Efficient Distributed Training

    Anastasiia Filippova, Angelos Katharopoulos, David Grangier, Ronan Collobert · PDF
  100. PEARL-Prox: Proximal Algorithm for Resolving Player Drift in Multiplayer Federated Learning

    TaeHo Yoon, Nicolas Loizou · PDF
  101. Per-Group Distributionally Robust Optimization (Per-GDRO) with Learnable Ambiguity Set Sizes via Bilevel Optimization

    Seobeom Jung, Woojae Lee, Jihun Hamm, Jangho Park · PDF
  102. PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts

    Zeman Li, Yuan Deng, Peilin Zhong, Meisam Razaviyayn, Vahab Mirrokni · PDF
  103. Policy Gradient Methods Converge Globally in Imperfect-Information Extensive-Form Games

    Fivos Kalogiannis, Gabriele Farina · PDF
  104. Primal-dual hybrid algorithms for chi-squared regularized Optimal Transport: statistical-computational trade-offs and applications to Wasserstein Barycenters

    Denys Ruban, Augusto Gerolin · PDF
  105. Projected Compression

    Maciej Stefaniak, Michał Krutul, Mikołaj Dziok, Jan Małaśnicki, Maciej Pióro, Jakub Krajewski, Sebastian Jaszczur, Marek Cygan, Kamil Adamczewski, Jan Ludziejewski · PDF
  106. Provable Benefit of Sign Descent: A Minimal Model Under Heavy-Tail Class Imbalance

    Robin Yadav, Shuo Xie, Tianhao Wang, Zhiyuan Li · PDF
  107. Quantum Non-Linear Bandit Optimization

    Zakaria Shams Siam, Chaowen Guan, Chong Liu · PDF
  108. Quantum Optimal Transport: Regularization and Algorithms

    Pavlo Pelikh, Augusto Gerolin · PDF
  109. Quasi-Newton Methods for Federated Learning with Error Feedback

    Yanlin Wu, Dmitry Kamzolov, Martin Takáč · PDF
  110. Regularizing the Entropy Landscape of Self-Attention: Towards a Soft Inductive Bias in LLMs

    Nandan Kumar Jha, Brandon Reagen · PDF
  111. Revisiting Stochastic Proximal Point Methods: Generalized Smoothness and Similarity

    Zhirayr Tovmasyan, Grigory Malinovsky, Laurent Condat, Peter Richtárik · PDF
  112. Revisiting the Geometrically Decaying Step Size: Linear Convergence for Smooth or Non-Smooth Functions

    Jihun Kim · PDF
  113. Sharpness-Aware Minimization with Z-Score Gradient Filtering

    Vincent-Daniel Yun · PDF
  114. Simultaneous Fine-Tuning and Pruning of LLMs

    Finn Reinecke, Jörg K.H. Franke, Frank Hutter, Michael Hefenbrock · PDF
  115. Sparse Adversarial Perturbation-Driven Scalable Coreset Optimization

    Tushar Shinde, Manasa Madabhushi · PDF
  116. Spiking Brain Compression: Exploring One-Shot Post-Training Pruning and Quantization for Spiking Neural Networks

    Lianfeng Shi, Ao Li, Benjamin Ward-Cherrier · PDF
  117. Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game

    Barna Pásztor, Thomas Kleine Buening, Andreas Krause · PDF
  118. Stochastic Neural Tangent Kernel: Revisiting the NTK For SGD

    Bhavesh Kumar, Dan Mikulincer · PDF
  119. Switching Gradient Methods for Constrained Federated Optimization

    Antesh Upadhyay, Sang Bin Moon, Abolfazl Hashemi · PDF
  120. The Hebbian Forward-Forward Algorithm

    Andrii Krutsylo · PDF
  121. The Hidden Cost of Approximation in Online Mirror Descent

    Ofir Schlisselberg, Uri Sherman, Tomer Koren, Yishay Mansour · PDF
  122. The Limits of large learning rates: A Case Study in Single Index Models

    Bhavesh Kumar, Libin Zhu · PDF
  123. Toward the First Optimization Framework for Low-Rank Adaptation

    Grigory Malinovsky, Umberto Michieli, Hasan Abed Al Kader Hammoud, Taha Ceritli, Hayder Elesedy, Mete Ozay, Peter Richtárik · PDF
  124. Towards Characterizing the Complexity of Riemannian Online Convex Optimization

    Hibiki Fukushima, Hiroshi Hirai, Shinji Ito · PDF
  125. Towards Quantifying the Hessian Structure of Neural Networks

    Zhaorui Dong, Yushun Zhang, Jianfeng Yao, Ruoyu Sun · PDF
  126. Towards Robust Unroll Generalization in Learned Optimizers

    Xiaolong Huang, Benjamin Thérien, Eugene Belilovsky · PDF
  127. Understanding and Improving Shampoo via Kullback–Leibler Minimization

    Wu Lin, Scott C. Lowe, Felix Dangel, Runa Eschenhagen, Zikun Xu, Roger Baker Grosse · PDF
  128. Weight Decay may matter more than µP for Learning Rate Transfer in Practice

    Atli Kosson, Jeremy Welborn, Yang Liu, Martin Jaggi, Xi Chen · PDF
  129. What really matters in matrix-whitening optimizers?

    Kevin Frans, Pieter Abbeel, Sergey Levine · PDF
  130. Who to Trust? Aggregating Client Knowledge in Logit-Based Federated Learning

    Viktor Kovalchuk, Nikita Kotelevskii, Maxim Panov, Samuel Horváth, Martin Takáč · PDF
  131. Why Does Stochastic Gradient Descent Slow Down in Low-Precision Training?

    Vincent-Daniel Yun · PDF
  132. Zero-Infinity GAN: Stable Dynamics and Implicit Bias of Extragradient

    Kyungjae Lee, Donghwan Kim · PDF