NeurIPS 2024 Workshop on Mathematics of Modern Machine Learning

M3L

Submission deadline: Oct 2, 2024, 19:00 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (81)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers

    William Merrill, Ashish Sabharwal · PDF
  2. A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules

    Kairong Luo, Haodong Wen, Shengding Hu, Zhenbo Sun, Zhiyuan Liu, Maosong Sun, Kaifeng Lyu, Wenguang Chen · PDF
  3. A Theoretical Framework for Federated Domain Generalization with Gradient Alignment

    Mahdiyar Molahasani, Milad Soltany, Farhad Pourpanah, Michael Greenspan, Ali Etemad · PDF
  4. A Theory of Initialisation's Impact on Specialisation

    Devon Jarvis, Sebastian Lee, Clémentine Carla Juliette Dominé, Andrew M Saxe, Stefano Sarao Mannelli · PDF
  5. Accumulating Data Avoids Model Collapse

    Joshua Kazdan, Apratim Dey, Rylan Schaeffer, Matthias Gerstgrasser, Rafael Rafailov, David L. Donoho, Sanmi Koyejo · PDF
  6. Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs

    Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael Jordan, Song Mei · PDF
  7. Adversarial Attacks as Near-Zero Eigenvalues in the Empirical Kernel of Neural Networks

    Ouns El Harzli, Bernardo Cuenca Grau · PDF
  8. Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data

    Binghui Li, Yuanzhi Li · PDF
  9. Algorithmic Stability of Minimum-Norm Interpolating Deep Neural Networks

    Ouns El Harzli, Yoonsoo Nam, Ilja Kuzborskij, Bernardo Cuenca Grau, Ard A. Louis · PDF
  10. An empirical study of the $(L_0, L_1)$-smoothness condition

    Y Cooper · PDF
  11. Bayesian Treatment of the Spectrum of the Empirical Kernel in (Sub)Linear-Width Neural Networks

    Ouns El Harzli, Bernardo Cuenca Grau · PDF
  12. Benign Overfitting in Out-of-Distribution Generalization of Linear Models

    Shange Tang, Jiayun Wu, Jianqing Fan, Chi Jin · PDF
  13. Benign Overfitting in Single-Head Attention

    Roey Magen, Shuning Shang, Zhiwei Xu, Spencer Frei, Wei Hu, Gal Vardi · PDF
  14. Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training

    Anchit Jain, Rozhin Nobahari, Aristide Baratin, Stefano Sarao Mannelli · PDF
  15. Can Bayesian Neural Networks Make Confident Predictions?

    Katharine Fisher · PDF
  16. Can Neural Networks Achieve Optimal Computational-statistical Tradeoff? An Analysis on Single-Index Model

    Siyu Chen, Beining Wu, Miao Lu, Zhuoran Yang, Tianhao Wang · PDF
  17. Classifier-Free Guidance is a Predictor-Corrector

    Arwen Bradley, Preetum Nakkiran · PDF
  18. Commute Your Domains: Trajectory Optimality Criterion for Multi-Domain Learning

    Alexey Rukhovich, Alexander Podolskiy, Irina Piontkovskaya · PDF
  19. Comparing Implicit and Denoising Score-Matching Objectives

    Artem Artemev, Ayan Das, Farhang Nabiei, Alberto Bernacchia · PDF
  20. Complexity of Vector-valued Prediction: From Linear Models to Stochastic Convex Optimization

    Matan Schliserman, Tomer Koren · PDF
  21. Composing Global Optimizers to Reasoning Tasks via Algebraic Objects in Neural Nets

    Yuandong Tian · PDF
  22. Continuous-Time Analysis of Adaptive Optimization and Normalization

    Rhys Gould, Hidenori Tanaka · PDF
  23. Convergence of Distributed Adaptive Optimization with Local Updates

    Ziheng Cheng, Margalit Glasgow · PDF
  24. Convergence Properties of Hyperbolic Neural Networks on Riemannian Manifolds

    Nico Alvarado, Sebastian Burgos · PDF
  25. Declarative characterizations of direct preference alignment algorithms

    Kyle Richardson, Vivek Srikumar, Ashish Sabharwal · PDF
  26. Depth Extrapolation of Decoders Trained on Nested Structures

    Emile R Richard · PDF
  27. Diffusion Model Learns Low-Dimensional Distributions via Subspace Clustering

    Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu · PDF
  28. Diffusion Models With Learned Adaptive Noise Processes

    Subham Sekhar Sahoo, Aaron Gokaslan, Christopher De Sa, Volodymyr Kuleshov · PDF
  29. Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

    Yibo Jiang, Goutham Rajendran, Pradeep Kumar Ravikumar, Bryon Aragam · PDF
  30. Does Machine Bring in Extra Bias in Learning? Approximating Discrimination Within Models Quickly

    Yijun Bian, Yujie Luo, Ping Xu · PDF
  31. Dynamics of Concept Learning and Compositional Generalization

    Yongyi Yang, Core Francisco Park, Ekdeep Singh Lubana, Maya Okawa, Wei Hu, Hidenori Tanaka · PDF
  32. Efficiently Learning at Test-Time: Active Fine-Tuning of LLMs

    Jonas Hübotter, Sascha Bongni, Ido Hakimi, Andreas Krause · PDF
  33. Emergence in non-neural models: grokking modular arithmetic via average gradient outer product

    Neil Rohit Mallinar, Daniel Beaglehole, Libin Zhu, Adityanarayanan Radhakrishnan, Parthe Pandit, Mikhail Belkin · PDF
  34. Exploring Task Affinities through NTK Alignment and Early Training Dynamics in Multi-Task Learning

    Yoann Morello, Emilie Gregoire, Sam Verboven · PDF
  35. Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks

    Nikolaos Tsilivis, Gal Vardi, Julia Kempe · PDF
  36. From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks

    Clémentine Carla Juliette Dominé, Nicolas Anguita, Alexandra Maria Proca, Lukas Braun, Daniel Kunin, Pedro A. M. Mediano, Andrew M Saxe · PDF
  37. From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

    Kaiyue Wen, Huaqing Zhang, Hongzhou Lin, Jingzhao Zhang · PDF
  38. Geometric Deep Learning with Quasiconformal Neural Networks: An Introduction

    Nico Alvarado, Hans Lobel · PDF
  39. Harnessing the Power of Vicinity-Informed Analysis for Classification under Covariate Shift

    Mitsuhiro Fujikawa, Youhei Akimoto, Jun Sakuma, Kazuto Fukuchi · PDF
  40. Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

    Frederik Kunstner, Alan Milligan, Robin Yadav, Mark Schmidt, Alberto Bietti · PDF
  41. HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks

    Yongyi Yang, Jiaming Yang, Wei Hu, Michal Derezinski · PDF
  42. How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework

    Yinuo Ren, Haoxuan Chen, Grant M. Rotskoff, Lexing Ying · PDF
  43. How do students become teachers: A dynamical analysis for two-layer neural networks

    Zhenyu Zhu, Fanghui Liu, Volkan Cevher · PDF
  44. Implicit Bias of Adam versus Gradient Descent in One-Hidden-Layer Neural Networks

    Bhavya Vasudeva, Vatsal Sharan, Mahdi Soltanolkotabi · PDF
  45. Improving the Gaussian Approximation in Neural Networks: Para-Gaussians and Edgeworth Expansions

    Mihai Nica, Janosch Ortmann · PDF
  46. In-Context Learning by Linear Attention: Exact Asymptotics and Experiments

    Yue Lu, Mary Letey, Jacob A Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan · PDF
  47. Increasing Fairness via Combination with Learning Guarantees

    Yijun Bian, Kun Zhang · PDF
  48. Information-Theoretic Foundations for Neural Scaling Laws

    Hong Jun Jeon, Benjamin Van Roy · PDF
  49. Information-Theoretic Generalization Bounds for Batch Reinforcement Learning

    Xingtu Liu · PDF
  50. Label Noise: Ignorance Is Bliss

    Yilun Zhu, Jianxin Zhang, Aditya Gangrade, Clayton Scott · PDF
  51. Leveraging Intermediate Neural Collapse with Simplex ETFs for Efficient Deep Neural Networks

    Emily Liu · PDF
  52. Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models

    Yuda Song, Hanlin Zhang, Carson Eisenach, Sham M. Kakade, Dean Foster, Udaya Ghai · PDF
  53. Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error

    Ally Yalei Du, Lin Yang, Ruosong Wang · PDF
  54. Mixture of Parrots: Mixtures of experts improve memorization more than reasoning

    Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham M. Kakade, Eran Malach · PDF
  55. On the Implicit Relation between Low-Rank Adaptation and Differential Privacy

    Saber Malekmohammadi, Golnoosh Farnadi · PDF
  56. On Your Mark, Get Set, Warmup!

    Dayal Singh Kalra, Maissam Barkeshli · PDF
  57. Optimal Protocols for Continual Learning via Statistical Physics and Control Theory

    Francesco Mori, Stefano Sarao Mannelli, Francesca Mignacco · PDF
  58. Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression

    Juno Kim, Dimitri Meunier, Arthur Gretton, Taiji Suzuki, Zhu Li · PDF
  59. Optimizing Attention with Mirror Descent: Generalized Max-Margin Token Selection

    Aaron Alvarado Kristanto Julistiono, Davoud Ataee Tarzanagh, Navid Azizan · PDF
  60. Optimizing Fine-Tuning Efficiency: Gradient Subspace Tracking on Grassmann Manifolds for Large Language Models

    Sahar Rajabi, Sirisha Rambhatla · PDF
  61. Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent

    Liu Ziyin, Mingze Wang, Hongchao Li, Lei Wu · PDF
  62. Progressive distillation induces an implicit curriculum

    Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Andrej Risteski, Surbhi Goel · PDF
  63. Provable unlearning in topic modeling and downstream tasks

    Stanley Wei, Sadhika Malladi, Sanjeev Arora, Amartya Sanyal · PDF
  64. Provable weak-to-strong generalization via benign overfitting

    David Xing Wu, Anant Sahai · PDF
  65. Robust Feature Learning for Multi-Index Models in High Dimensions

    Alireza Mousavi-Hosseini, Adel Javanmard, Murat A Erdogdu · PDF
  66. Sample compression unleashed: New generalization bounds for real valued losses

    Mathieu Bazinet, Valentina Zantedeschi, Pascal Germain · PDF
  67. Self-Improvement in Language Models: The Sharpening Mechanism

    Audrey Huang, Adam Block, Dylan J Foster, Dhruv Rohatgi, Cyril Zhang, Max Simchowitz, Jordan T. Ash, Akshay Krishnamurthy · PDF
  68. SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network

    Tomer Galanti, Zachary S Siegel, Aparna Gupte, Tomaso A Poggio · PDF
  69. Simple and Effective Masked Diffusion Language Models

    Subham Sekhar Sahoo, Marianne Arriola, Aaron Gokaslan, Yair Schiff, Edgar Mariano Marroquin, Justin T Chiu, Alexander M Rush, Volodymyr Kuleshov · PDF
  70. The Crucial Role of Samplers in Online Direct Preference Optimization

    Ruizhe Shi, Runlong Zhou, Simon Shaolei Du · PDF
  71. The GAN is dead; long live the GAN! A Modern GAN Baseline

    Nick Huang, Aaron Gokaslan, Volodymyr Kuleshov, James Tompkin · PDF
  72. Towards characterizing the value of edge embeddings in Graph Neural Networks

    Dhruv Rohatgi, Tanya Marwah, Zachary Chase Lipton, Jianfeng Lu, Ankur Moitra, Andrej Risteski · PDF
  73. Towards Principled Graph Transformers

    Luis Müller, Daniel Kusuma, Blai Bonet, Christopher Morris · PDF
  74. Towards the Effect of Examples on In-Context Learning: A Theoretical Case Study

    Pengfei He, Yingqian Cui, Han Xu, Hui Liu, Makoto Yamada, Jiliang Tang, Yue Xing · PDF
  75. Transformers are Efficient Compilers, Provably

    Xiyu Zhai, Runlong Zhou, Liao Zhang, Simon Shaolei Du · PDF
  76. Transformers Provably Solve Parity Efficiently with Chain of Thought

    Juno Kim, Taiji Suzuki · PDF
  77. Understanding Diffusion-based Representation Learning via Low-Dimensional Modeling

    Xiao Li, Zekai Zhang, Xiang Li, Siyi Chen, Zhihui Zhu, Peng Wang, Qing Qu · PDF
  78. Understanding Factual Recall in Transformers via Associative Memories

    Eshaan Nichani, Jason D. Lee, Alberto Bietti · PDF
  79. Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

    Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin · PDF
  80. Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos

    Dayal Singh Kalra, Tianyu He, Maissam Barkeshli · PDF
  81. Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues

    Riccardo Grazzi, Julien Siems, Jörg K.H. Franke, Arber Zela, Frank Hutter, Massimiliano Pontil · PDF