NeurIPS 2024 Workshop on Mathematics of Modern Machine Learning (M3L)
- Submission deadline: Oct 2, 2024, 19:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (81)
Fetched from OpenReview (v2) on 2026-06-10.
- A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
- A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
- A Theoretical Framework for Federated Domain Generalization with Gradient Alignment
- A Theory of Initialisation's Impact on Specialisation
- Accumulating Data Avoids Model Collapse
- Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs
- Adversarial Attacks as Near-Zero Eigenvalues in the Empirical Kernel of Neural Networks
- Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
- Algorithmic Stability of Minimum-Norm Interpolating Deep Neural Networks
- An empirical study of the $(L_0, L_1)$-smoothness condition
- Bayesian Treatment of the Spectrum of the Empirical Kernel in (Sub)Linear-Width Neural Networks
- Benign Overfitting in Out-of-Distribution Generalization of Linear Models
- Benign Overfitting in Single-Head Attention
- Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training
- Can Bayesian Neural Networks Make Confident Predictions?
- Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model
- Classifier-Free Guidance is a Predictor-Corrector
- Commute Your Domains: Trajectory Optimality Criterion for Multi-Domain Learning
- Comparing Implicit and Denoising Score-Matching Objectives
- Complexity of Vector-valued Prediction: From Linear Models to Stochastic Convex Optimization
- Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets
- Continuous-Time Analysis of Adaptive Optimization and Normalization
- Convergence of Distributed Adaptive Optimization with Local Updates
- Convergence Properties of Hyperbolic Neural Networks on Riemannian Manifolds
- Declarative characterizations of direct preference alignment algorithms
- Depth Extrapolation of Decoders Trained on Nested Structures
- Diffusion Model Learns Low-Dimensional Distributions via Subspace Clustering
- Diffusion Models With Learned Adaptive Noise Processes
- Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
- Does Machine Bring in Extra Bias in Learning? Approximating Discrimination Within Models Quickly
- Dynamics of Concept Learning and Compositional Generalization
- Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs
- Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
- Exploring Task Affinities through NTK Alignment and Early Training Dynamics in Multi-Task Learning
- Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks
- From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks
- From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
- Geometric Deep Learning with Quasiconformal Neural Networks: An Introduction
- Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift
- Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models
- HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks
- How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework
- How do students become teachers: A dynamical analysis for two-layer neural networks
- Implicit Bias of Adam versus Gradient Descent in One-Hidden-Layer Neural Networks
- Improving the Gaussian Approximation in Neural Networks: Para-Gaussians and Edgeworth Expansions
- In-Context Learning by Linear Attention: Exact Asymptotics and Experiments
- Increasing Fairness via Combination with Learning Guarantees
- Information-Theoretic Foundations for Neural Scaling Laws
- Information-Theoretic Generalization Bounds for Batch Reinforcement Learning
- Label Noise: Ignorance Is Bliss
- Leveraging Intermediate Neural Collapse with Simplex ETFs for Efficient Deep Neural Networks
- Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models
- Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error
- Mixture of Parrots: Mixtures of experts improve memorization more than reasoning
- On the Implicit Relation between Low-Rank Adaptation and Differential Privacy
- On Your Mark, Get Set, Warmup!
- Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
- Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression
- Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection
- Optimizing Fine-Tuning Efficiency: Gradient Subspace Tracking on Grassmann Manifolds for Large Language Models
- Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent
- Progressive distillation induces an implicit curriculum
- Provable unlearning in topic modeling and downstream tasks
- Provable weak-to-strong generalization via benign overfitting
- Robust Feature Learning for Multi-Index Models in High Dimensions
- Sample compression unleashed: New generalization bounds for real valued losses
- Self-Improvement in Language Models: The Sharpening Mechanism
- SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network
- Simple and Effective Masked Diffusion Language Models
- The Crucial Role of Samplers in Online Direct Preference Optimization
- The GAN is dead; long live the GAN! A Modern GAN Baseline
- Towards characterizing the value of edge embeddings in Graph Neural Networks
- Towards Principled Graph Transformers
- Towards the Effect of Examples on In-Context Learning: A Theoretical Case Study
- Transformers are Efficient Compilers, Provably
- Transformers Provably Solve Parity Efficiently with Chain of Thought
- Understanding Diffusion-based Representation Learning via Low-Dimensional Modeling
- Understanding Factual Recall in Transformers via Associative Memories
- Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
- Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
- Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues