OPT 2024: Optimization for Machine Learning

NeurIPS 2024 Workshop

Submission deadline: Sep 28, 2024, 12:00 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (106)

Fetched from OpenReview (v2) on 2026-06-10.

  1. $\mu$LO: Compute-Efficient Meta-Generalization of Learned Optimizers

    Benjamin Thérien, Charles-Étienne Joseph, Boris Knyazev, Edouard Oyallon, Irina Rish, Eugene Belilovsky · PDF
  2. A Continuous Variable Optimization method for the Quadratic Assignment Problem

    Aron Vizkeleti, Timothee Leleu · PDF
  3. A fast and efficient randomized quasi-Newton method

    Danny Duan, Hanbaek Lyu · PDF
  4. A Stochastic Algorithm for Sinkhorn Distance-Regularized Distributionally Robust Optimization

    Yufeng Yang, Yi Zhou, Zhaosong Lu · PDF
  5. A theoretical study of the $(L_0,L_1)$-smoothness condition in deep learning

    Y Cooper · PDF
  6. A Unified Convergence Theory for Large Language Model Efficient Fine-tuning

    Zhanhong Jiang, Nastaran Saadati, Aditya Balu, Minh Pham, Joshua Russell Waite, Nasla Saleem, Chinmay Hegde, Soumik Sarkar · PDF
  7. ACCO: Accumulate while you Communicate, Hiding Communications in Distributed LLM Training

    Adel Nabli, Louis Fournier, Pierre Erbacher, Louis Serrano, Eugene Belilovsky, Edouard Oyallon · PDF
  8. Adaptive Partitioning Schemes for Black-Box Optimization

    Raja Sunkara, Ardhendu Tripathy · PDF
  9. Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models

    Zeman Li, Xinwei Zhang, Peilin Zhong, Yuan Deng, Meisam Razaviyayn, Vahab Mirrokni · PDF
  10. AdEMAMix: Better and Faster Training with Older Gradients

    Matteo Pagliardini, Pierre Ablin, David Grangier · PDF
  11. Aggregating Data for Optimal and Private Learning

    Sushant Agarwal, Yukti Makhija, Rishi Saket, Aravindan Raghuveer · PDF
  12. Aligned Multi-Objective Optimization

    Yonathan Efroni, Daniel Jiang, Ben Kretzu, Jalaj Bhandari, Zheqing Zhu, Karen Ullrich · PDF
  13. Amplitude Modulated Riemannian Optimization for QAP

    Timothee Leleu, Aron Vizkeleti, Sam Reifenstein · PDF
  14. An Elementary Predictor Obtaining $2\sqrt{T}$ Distance to Calibration

    Eshwar Ram Arunachaleswaran, Natalie Collina, Aaron Roth, Mirah Shi · PDF
  15. Applications of fractional calculus in learned optimization

    Teodor Alexandru Szente, James Harrison, Mihai Zanfir, Cristian Sminchisescu · PDF
  16. Batch size invariant Adam

    Xi Wang, Laurence Aitchison · PDF
  17. BlockLLM: Memory-Efficient Adaptation of LLMs by Selecting and Optimizing the Right Coordinate Blocks

    Amrutha Varshini Ramesh, Vignesh Ganapathiraman, Issam H. Laradji, Mark Schmidt · PDF
  18. Communication-efficient Algorithms Under Generalized Smoothness Assumptions

    Sarit Khirirat, Abdurakhmon Sadiev, Artem Riabinin, Eduard Gorbunov, Peter Richtárik · PDF
  19. Communication-Efficient Loss Minimization over Heterogeneous Data with Federated Hierarchical Ensemble Aggregation via Distillation

    Sayantan Chowdhury, Ben Liang, Ali Tizghadam, Ilijc Albanese · PDF
  20. Connections between Schedule-Free SGD, Accelerated SGD Variants, and Weight Averaging

    Depen Morwani, Nikhil Vyas, Hanlin Zhang, Sham M. Kakade · PDF
  21. Consensus Based Optimization Accelerates Gradient Descent

    Anagha Satish, Ricardo Baptista, Franca Hoffmann · PDF
  22. Cyclic Data Parallelism for Efficient Parallelism of Deep Neural Networks

    Louis Fournier, Edouard Oyallon · PDF
  23. DADA: Dual Averaging with Distance Adaptation

    Mohammad Moshtaghifar, Anton Rodomanov, Daniil Vankov, Sebastian U Stich · PDF
  24. Deconstructing What Makes a Good Optimizer for Language Models

    Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham M. Kakade · PDF
  25. Dense Backpropagation Improves Routing for Sparsely-Gated Mixture-of-Experts

    Ashwinee Panda, Vatsal Baherwani, Zain Sarwar, Benjamin Thérien, Stephen Rawls, Sambit Sahu, Supriyo Chakraborty, Tom Goldstein · PDF
  26. Differentially Private Random Block Coordinate Descent

    Arto Maranjyan, Abdurakhmon Sadiev, Peter Richtárik · PDF
  27. Dimensionality Reduction Techniques for Global Bayesian Optimisation

    Luo Long, Coralia Cartis, Paz Fink Shustin · PDF
  28. Discrete-Continuous Variational Optimization with Local Gradients

    Jonathan H Warrell, Francesco Alesiani, Cameron Smith, Anja Mösch, Martin Renqiang Min · PDF
  29. DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction

    Xinwei Zhang, Zhiqi Bu, Borja Balle, Mingyi Hong, Meisam Razaviyayn, Vahab Mirrokni · PDF
  30. Distributionally Robust Linear Regression With Block Lewis Weights

    Naren Sarayu Manoj, Kumar Kshitij Patel · PDF
  31. Don't Be So Positive: Negative Step Sizes in Second-Order Methods

    Betty Shea, Mark Schmidt · PDF
  32. Dual Feature Reduction for the Sparse-Group Lasso and its Adaptive Variant

    Fabio Feser, Marina Evangelou · PDF
  33. Dueling in the Dark: An Efficient and Optimal Mirror Descent Approach for Online Optimization with Adversarial Preferences

    Aadirupa Saha, Yonathan Efroni, Barry-John Theobald · PDF
  34. Efficient Levenberg-Marquardt for SLAM

    Amir Belder, Refael Vivanti · PDF
  35. Estimating Vote Choice in U.S. Elections with Approximate Poisson-Binomial Logistic Regression

    Nic Fishman, Evan Rosenman · PDF
  36. Extra-Gradient and Optimistic Gradient Descent Converge in Iterates Faster than $O(1/\sqrt{T})$ in All Monotone Lipschitz Variational Inequalities

    Kimon Antonakopoulos · PDF
  37. Fast Convergence of Softmax Policy Mirror Ascent for Bandits & Tabular MDPs

    Reza Asad, Reza Babanezhad Harikandeh, Issam H. Laradji, Nicolas Le Roux, Sharan Vaswani · PDF
  38. Fast decentralized gradient tracking for federated learning with local updates: From mini to minimax optimization

    Chris Junchi Li · PDF
  39. From Gradient Clipping to Normalization for Heavy Tailed SGD

    Florian Hübler, Ilyas Fatkhullin, Niao He · PDF
  40. Glocal Smoothness: Line Search can really help!

    Curtis Fox, Mark Schmidt · PDF
  41. Graph Neural Networks for Hyperparameter Inference in Ising Solvers

    Edward Jiang, Sam Reifenstein, Milin Doppalapudi, Timothee Leleu · PDF
  42. Hierarchical Simplicity Bias of Neural Networks

    Zhehang Du · PDF
  43. High Dimensional First Order Mini-Batch Algorithms on Quadratic Problems

    Andrew Nicholas Cheng, Kiwon Lee, Courtney Paquette · PDF
  44. How Does Critical Batch Size Scale in Pre-training?

    Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Foster, Sham M. Kakade · PDF
  45. Improving Deep Learning Speed and Performance through Synaptic Neural Balance

    Antonios Alexos, Ian Domingo, Pierre Baldi · PDF
  46. In the Search for Optimal Portfolios of Counterstrategies in the Large Imperfect Information Games

    Karolina Drabent, David Milec, Ondrej Kubicek, Viliam Lisý · PDF
  47. Incentivizing Truthful Collaboration in Heterogeneous Federated Learning

    Dimitar Chakarov, Nikita Tsoy, Kristian Minchev, Nikola Konstantinov · PDF
  48. Intuitive Analysis of the Quantization based Optimization: From establishing a SDE to Quantum Mechanical Perspective

    Jinwuk Seok, Changsik Cho · PDF
  49. Langevin Dynamics: A Unified Perspective on Optimization via Lyapunov Potentials

    August Y Chen, Ayush Sekhari, Karthik Sridharan · PDF
  50. Learning Morphisms with Gauss-Newton Approximation for Growing Networks

    Neal Gregory Lawton, Aram Galstyan, Greg Ver Steeg · PDF
  51. Linear Attention Sequence Parallelism

    Weigao Sun, Zhen Qin, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong · PDF
  52. Lion's sign noise can make training more stable

    Simon Elistratov, Andrey Podivilov, Timofei Iuzhakov, Dmitry Vetrov · PDF
  53. Local Curvature Descent: Squeezing More Curvature out of Standard and Polyak Gradient Descent

    Peter Richtárik, Simone Maria Giancola, Dymitr Lubczyk, Robin Yadav · PDF
  54. LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression

    Laurent Condat, Arto Maranjyan, Peter Richtárik · PDF
  55. Memory Efficient Adaptive Stochastic Optimization via Subset-Norm

    Thien Hang Nguyen, Huy Nguyen · PDF
  56. Memory-Efficient Large Language Model (LLM) Training and Fine-Tuning via Gradient Subspace Tracking

    Sahar Rajabi, Sirisha Rambhatla · PDF
  57. MindFlayer: Efficient Asynchronous Parallel SGD in the Presence of Heterogeneous and Random Worker Compute Times

    Arto Maranjyan, Omar Shaikh Omar, Peter Richtárik · PDF
  58. Modularity aided consistent attributed graph clustering via coarsening

    Samarth Bhatia, Yukti Makhija, Manoj Kumar, Sandeep Kumar · PDF
  59. Multi Objective Regionalized Bayesian Optimization via Entropy Search

    Thomas James, Sinnu Thomas · PDF
  60. Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

    Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou · PDF
  61. Multimodal Federated Learning with Model Personalization

    Ratun Rahman, Dinh C. Nguyen · PDF
  62. Neural Entropic Multimarginal Optimal Transport

    Dor Tsur, Ziv Goldfeld, Kristjan Greenewald, Haim H. Permuter · PDF
  63. Neural Networks with Complex-Valued Weights Have No Spurious Local Minima

    Xingtu Liu · PDF
  64. Nonlinear tomographic reconstruction via nonsmooth optimization

    Vasileios Charisopoulos, Rebecca Willett · PDF
  65. Nonmonotone Line Searches Operate at the Edge of Stability

    Curtis Fox, Leonardo Galli, Mark Schmidt, Holger Rauhut · PDF
  66. Normalization Matters for Optimization Performance on Graph Neural Networks

    Alan Milligan, Frederik Kunstner, Hamed Shirzad, Mark Schmidt, Danica J. Sutherland · PDF
  67. Old Optimizer, New Norm: An Anthology

    Jeremy Bernstein, Laker Newhouse · PDF
  68. On the Convergence of DP-SGD with Adaptive Clipping

    Egor Shulgin, Peter Richtárik · PDF
  69. On the Convergence of FedProx with Extrapolation and Inexact Prox

    Hanmin Li, Peter Richtárik · PDF
  70. On the Crucial Role of Initialization for Matrix Factorization

    Bingcong Li, Liang Zhang, Aryan Mokhtari, Niao He · PDF
  71. On the Hardness of Meaningful Local Guarantees in Nonsmooth Nonconvex Optimization

    Guy Kornowski, Swati Padmanabhan, Ohad Shamir · PDF
  72. On the Hypomonotone Class of Variational Inequalities

    Khaled Alomar, Tatjana Chavdarova · PDF
  73. On the Inherent Privacy of Two Point Zeroth Order Projected Gradient Descent

    Devansh Gupta, Meisam Razaviyayn, Vatsal Sharan · PDF
  74. Online Nonconvex Bilevel Optimization with Bregman Divergences

    Jason Bohne, David S Rosenberg, Gary Kazantsev, Pawel Polak · PDF
  75. Optimal Transport for Probabilistic Circuits

    Adrian Ciotinga, YooJung Choi · PDF
  76. Optimizing Attention

    Hanno Ackermann, Hong Cai, Markus Nagel, Leyla Mirvakhabova, Farhad G. Zanjani, Fatih Porikli · PDF
  77. Partially Observed Trajectory Inference using Optimal Transport and a Dynamics Prior

    Anming Gu, Edward Chien, Kristjan Greenewald · PDF
  78. Path Integral Optimiser: Global Optimisation via Neural Schrödinger-Föllmer Diffusion

    Max McGuinness, Eirik Fladmark, Francisco Vargas · PDF
  79. Personalized Federated Learning via Low-Rank Matrix Factorization

    Ali Dadras, Sebastian U Stich, Alp Yurtsever · PDF
  80. Policy Optimization for Strictly Batch Imitation Learning

    Rishabh Agrawal, Nathan Dahlin, Rahul Jain, Ashutosh Nayyar · PDF
  81. Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training

    Hiroki Naganuma, Xinzhi Zhang, Man-Chung Yue, Ioannis Mitliagkas, Russell J. Hewett, Philipp Andre Witte, Yin Tat Lee · PDF
  82. Remove Symmetries to Control Model Expressivity and Improve Optimization

    Liu Ziyin, Yizhou Xu, Isaac L. Chuang · PDF
  83. Revisiting the Initial Steps in Adaptive Gradient Descent Optimization

    Abulikemu Abuduweili, Changliu Liu · PDF
  84. Role of Parametrization in Learning Dynamics of Recurrent Neural Networks

    Adwait Datar, Chinmay Datar, Zahra Monfared, Felix Dietrich · PDF
  85. Scalable Second-Order Optimization Algorithms for Minimizing Low-rank Functions

    Edward Tansley, Coralia Cartis · PDF
  86. Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks

    Shikai Qiu, Atish Agarwala, Jeffrey Pennington, Lechao Xiao · PDF
  87. Second-Order Forward-Mode Automatic Differentiation for Optimization

    Adam D. Cobb, Atilim Gunes Baydin, Barak A. Pearlmutter, Susmit Jha · PDF
  88. SICNN: Sparsity-induced Input Convex Neural Network for Optimal Transport

    Peter Chen, Yue Xie, Qingpeng Zhang · PDF
  89. Simple and Scalable Federated Learning with Uncertainty via Improved Variational Online Newton

    Shivam Pal, Aishwarya Gupta, Saqib Sarwar, Piyush Rai · PDF
  90. SOAP: Improving and Stabilizing Shampoo using Adam

    Nikhil Vyas, Depen Morwani, Rosie Zhao, Itai Shapira, David Brandfonbrener, Lucas Janson, Sham M. Kakade · PDF
  91. Solving hidden monotone variational inequalities with surrogate losses

    Ryan D'Orazio, Danilo Vucetic, Zichu Liu, Junhyung Lyle Kim, Ioannis Mitliagkas, Gauthier Gidel · PDF
  92. SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Nonconvex Cross-Device Federated Learning

    Avetik Karagulyan, Egor Shulgin, Abdurakhmon Sadiev, Peter Richtárik · PDF
  93. Spurious Stationarity and Hardness Results for Mirror Descent

    He Chen, Jiajin Li, Anthony Man-Cho So · PDF
  94. Statistical Inference in Latent Convex Objectives with Stream Data

    Rohan Chauhan, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Michael I. Jordan · PDF
  95. Stochastic Proximal Point Methods for Monotone Inclusions under Expected Similarity

    Abdurakhmon Sadiev, Laurent Condat, Peter Richtárik · PDF
  96. Stochastic Quasi-Variational Inequalities: Convergence Analysis Beyond Strong Monotonicity

    Zeinab Alizadeh, Afrooz Jalilzadeh · PDF
  97. Structured Regularization on the SPD Manifold

    Andrew Nicholas Cheng, Melanie Weber · PDF
  98. Tensor-GaLore: Memory-Efficient Training via Gradient Tensor Decomposition

    Robert Joseph George, David Pitt, Jiawei Zhao, Jean Kossaifi, Cheng Luo, Yuandong Tian, Anima Anandkumar · PDF
  99. The Crucial Role of Samplers in Online Direct Preference Optimization

    Ruizhe Shi, Runlong Zhou, Simon Shaolei Du · PDF
  100. The Dimension Strikes Back with Gradients: Generalization of Gradient Methods in Stochastic Convex Optimization

    Matan Schliserman, Uri Sherman, Tomer Koren · PDF
  101. Tight Lower Bounds and Improved Convergence in Performative Prediction

    Pedram Khorsandi, Rushil Gupta, Mehrnaz Mofakhami, Simon Lacoste-Julien, Gauthier Gidel · PDF
  102. u-$\mu$P: The Unit-Scaled Maximal Update Parametrization

    Charlie Blake, Constantin Eichenberg, Josef Dean, Lukas Balles, Luke Yuri Prince, Björn Deiseroth, Andres Felipe Cruz-Salinas, Carlo Luschi, Samuel Weinbach, Douglas Orr · PDF
  103. Uncoupled and Convergent Learning in Monotone Games under Bandit Feedback

    Jing Dong, Baoxiang Wang, Yaoliang Yu · PDF
  104. Understanding Adam Requires Better Rotation Dependent Assumptions

    Tianyue H. Zhang, Lucas Maes, Alexia Jolicoeur-Martineau, Ioannis Mitliagkas, Damien Scieur, Simon Lacoste-Julien, Charles Guille-Escuret · PDF
  105. WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

    Louis Fournier, Adel Nabli, Masih Aminbeidokhti, Marco Pedersoli, Eugene Belilovsky, Edouard Oyallon · PDF
  106. Weak to Strong Learning from Aggregate Labels

    Yukti Makhija, Rishi Saket · PDF