OPT 2025: Optimization for Machine Learning
NeurIPS 2025 Workshop
- Submission deadline: Sep 3, 2025, 12:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (132)
Fetched from OpenReview (v2) on 2026-06-10.
- LeonArDBO: Fast and Prior-Driven Bayesian Optimization without Surrogate Modeling
- A Monte Carlo Approach to Nonsmooth Convex Optimization via Proximal Splitting Algorithms
- A Non-Convex Method for Polynomial Manifold Learning
- A Simplified Analysis of SGD for Linear Regression with Weight Averaging
- A stochastic Lagrangian-based method for nonconvex empirical risk minimization with nonlinear constraints
- A Theoretical Analysis for CUR Decomposition based Active Learning and Feature Selection
- A Unified Noise-Curvature View of Loss of Trainability
- Achieving First-Order Statistical Improvements in Data-Driven Optimization
- AdaGrad Meets Muon: Adaptive Stepsizes for Orthogonal Updates
- Adaptive acceleration without strong convexity priors or restarts
- Algorithm design and sharper bounds for improving bandits
- Aligning Distributionally Robust Optimization with Practical Deep Learning Needs
- Aligning Theory with Practice for Muon-type Optimizers: A Layer-wise Framework
- Analysis of Schedule Free Non-Convex Optimization
- Analyzing Neural Network-Based Generative Diffusion Models through Convex Optimization
- Asymptotic and Finite-Time Guarantees for Langevin-Based Temperature Annealing in InfoNCE
- Atlas – Rethinking Optimizer Design for Stability and Speed
- Augmented Normalization: Differentiating the Generalized Geometric Median
- Automatic mixed precision for optimizing gained time with constrained loss mean-squared-error based on model partition to sequential sub-graphs
- Balanced Locality-Sensitive Hashing for Online Data Selection
- BatchNorm Layers have an Outsized Effect on Adversarial Robustness
- Benefits of Learning Rate Annealing for Tuning-Robustness in Stochastic Optimization
- Block-Diagonal K-FAC: A Trade-off Between Curvature Information and Resource Efficiency
- Can SGD Handle Heavy-Tailed Noise?
- Can We Estimate The Entropy Of Arbitrary Distributions Known Up To A Normalization Constant?
- Cautious Optimism: A Meta-Algorithm for Near-Constant Regret in General Games
- Central Limit Theorems for Asynchronous Averaged Q-Learning
- Chebyshev Moment Regularization (CMR): Condition-Number Control with Moment Shaping
- Communication Efficient LLM Pre-training with SparseLoCo
- Connecting Membership Inference Privacy and Generalization through Instance-Wise Measurements
- Convergence for Discrete Parameter Update Schemes
- Convex Neural Networks For Robust ASR Language Detection
- Curriculum-Learning PIELMs for Hemodynamic Flows
- Data Generation without Function Estimation
- Data Geometry Determines Generalization Below the Edge-of-Stability
- Data Source Adaptive Online Learning under Heteroscedastic Noise
- Data-Aware Training Quality Monitoring and Certification for Deep Learning
- Delayed Momentum Aggregation: Communication-efficient Byzantine-robust Federated Learning with Partial Participation
- Designing Algorithms for Entropic Optimal Transport from an Optimisation Perspective
- Distributionally Robust Nash Equilibria via Variational Inequalities
- Distributionally Robust Optimization via Diffusion Ambiguity Modeling
- Domain-Aware Scaling Laws Uncover Data Synergy
- DRO: A Python Library for Distributionally Robust Optimization in Machine Learning
- DSGD-AC: controlled consensus errors improve generalization in decentralized training
- EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients
- Efficient Algorithms for Combinatorial-Bandits with Monotonicity
- Efficient Training of CNN Ensembles via Feature-Prioritized Boosting
- EMA Without the Lag: Bias-Corrected Iterate Averaging Schemes
- Empirical-Bayes XTFC for Inverse Parameter Estimation
- Entropy Meets Importance: A Unified Head Importance–Entropy Score for Stable and Efficient Transformer Pruning
- Error Feedback for Muon and Friends
- Evolution of the Spectral Dimension of Transformer Activations
- Exploring Magnitude Preservation and Rotation Modulation in Diffusion Transformers
- Extending $\mu$P: Spectral Conditions for Feature Learning Across Optimizers
- FairPO: Fair Preference Optimization for Multi-Label Learning
- Fast decentralized gradient tracking for federated learning with local updates
- Faster Gradient Methods for Highly-smooth Stochastic Bilevel Optimization
- Faster Than SVD, Smarter Than SGD: The OPLoRA Alternating Update
- Feature Learning as a Virtual Covariance Learning
- FineAMP: Optimization-Based Automatic Mixed Precision Quantization for Efficient Diffusion Model Inference
- First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions
- Flat Minima and Generalization: Insights from Stochastic Convex Optimization
- Foundations of Top-$k$ Decoding for Language Models
- From Emergence to Intention: A Statistical Inductive Bias for Tractable Optimization in Multi-Agent Coordination
- Gradient Descent’s Last Iterate is Often (slightly) Suboptimal
- Graph-theoretic perspectives on splitting methods for sparse optimal transport
- Grassmannian Optimization Drives Generalization in Overparameterized DNN
- Hessian Spectrum is Constant Across Minimizers in Regularized Deep Scalar Factorization
- Hessian-Dependent Sample Complexity in Zeroth-Order Stochastic Optimization: Nonconvex Support Sampling Is Necessary for Optimality
- High-dimensional isotropic scaling dynamics of Muon and SGD
- HiSo: Efficient Federated Zeroth-Order Optimization via Hessian-Informed Acceleration and Scalar-Only Communication
- How Does Layer Normalization Improve Deep $Q$-learning?
- HyperPALoRA: Parameter-Efficient Pareto Hypernetworks via Preference-Based Diverse Low-Rank Adaptations
- Hyperparameter-Free Auto-Scaled Gradient Normalization via Global Standard Deviation Dynamics
- Implicit Bias of Per-sample Adam on Separable Data: Departure from the Full-batch Regime
- Implicit Bias of Polyak and Line-Search Step Sizes on Linear Classification with Separable Data
- Incentivizing Permissionless Distributed Learning of LLMs
- Larger Datasets Can Be Repeated More: A Theoretical Analysis of Multi-Epoch Scaling in Linear Regression
- Learning by solving differential equations
- Lipschitz Optimization via Weighted Sampling Based on Expected Potential Maximizers Reduction
- LOTION: Smoothing the Optimization Landscape for Quantized Training
- M+Adam: Stable Low-Precision Training with Combined Adam–Madam Updates
- Multi-Timescale Gradient Sliding for Distributed Optimization
- Muon Optimizes Under Spectral Norm Constraints
- New Optimization Methods for Very Large Scale SVMs
- On Optimizing Large Scale Multi-Class Logistic Regression
- On Riemannian Gradient Descent Algorithm using gradient averaging
- On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
- On the Finite-Sample Bias of Minimizing Expected Wasserstein Loss Between Empirical Distributions
- On the Limits of Momentum in Decentralized and Federated Optimization
- On the Potential of the Four-Point Model for Studying the Role of Optimization in Robustness to Spurious Correlations
- On the Rollout-Training Mismatch in Modern RL Systems
- One-Sided Matrix Completion from Ultra-Sparse Samples
- OptiBridge: Multi-Scale Multi-Shift Bridging for Conditioning Optimization Landscapes
- Optimal Implicit Bias in Linear Regression
- Optimized Statistical Ranking is All You Need for Robust Coreset Selection in Efficient Transformer-Based Spam Detection
- OrthoGrad Improves Neural Calibration
- Parameter-Agnostic Error Feedback Enhanced With Hessian-Corrected Momentum
- Partial Parameter Updates for Efficient Distributed Training
- PEARL-Prox: Proximal Algorithm for Resolving Player Drift in Multiplayer Federated Learning
- Per-Group Distributionally Robust Optimization (Per-GDRO) with Learnable Ambiguity Set Sizes via Bilevel Optimization
- PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts
- Policy Gradient Methods Converge Globally in Imperfect-Information Extensive-Form Games
- Primal-dual hybrid algorithms for chi-squared regularized Optimal Transport: statistical-computational trade-offs and applications to Wasserstein Barycenters
- Projected Compression
- Provable Benefit of Sign Descent: A Minimal Model Under Heavy-Tail Class Imbalance
- Quantum Non-Linear Bandit Optimization
- Quantum Optimal Transport: Regularization and Algorithms
- Quasi-Newton Methods for Federated Learning with Error Feedback
- Regularizing the Entropy Landscape of Self-Attention: Towards a Soft Inductive Bias in LLMs
- Revisiting Stochastic Proximal Point Methods: Generalized Smoothness and Similarity
- Revisiting the Geometrically Decaying Step Size: Linear Convergence for Smooth or Non-Smooth Functions
- Sharpness-Aware Minimization with Z-Score Gradient Filtering
- Simultaneous Fine-Tuning and Pruning of LLMs
- Sparse Adversarial Perturbation-Driven Scalable Coreset Optimization
- Spiking Brain Compression: Exploring One-Shot Post-Training Pruning and Quantization for Spiking Neural Networks
- Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game
- Stochastic Neural Tangent Kernel: Revisiting the NTK For SGD
- Switching Gradient Methods for Constrained Federated Optimization
- The Hebbian Forward-Forward Algorithm
- The Hidden Cost of Approximation in Online Mirror Descent
- The Limits of large learning rates: A Case Study in Single Index Models
- Toward the First Optimization Framework for Low-Rank Adaptation
- Towards Characterizing the Complexity of Riemannian Online Convex Optimization
- Towards Quantifying the Hessian Structure of Neural Networks
- Towards Robust Unroll Generalization in Learned Optimizers
- Understanding and Improving Shampoo via Kullback–Leibler Minimization
- Weight Decay may matter more than µP for Learning Rate Transfer in Practice
- What really matters in matrix-whitening optimizers?
- Who to Trust? Aggregating Client Knowledge in Logit-Based Federated Learning
- Why Does Stochastic Gradient Descent Slow Down in Low-Precision Training?
- Zero-Infinity GAN: Stable Dynamics and Implicit Bias of Extragradient