ICML 2024 Past Math & reasoning

High-dimensional Learning Dynamics 2024: The Emergence of Structure and Reasoning

HiLD at ICML 2024

Submission deadline: May 29, 2024, 04:30 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).

Accepted papers (73)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Hessian-Aware Stochastic Differential Equation for Modelling SGD

    Xiang Li, Zebang Shen, Liang Zhang, Niao He · PDF
  2. A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention

    Hugo Cui, Freya Behrens, Florent Krzakala, Lenka Zdeborova · PDF
  3. A Random Matrix Analysis of Learning with Noisy Labels

    Aymane El Firdoussi, Mohamed El Amine Seddik · PDF
  4. A Unified Approach to Feature Learning in Bayesian Neural Networks

    Noa Rubin, Zohar Ringel, Inbar Seroussi, Moritz Helias · PDF
  5. A Universal Class of Sharpness-Aware Minimization Algorithms

    Behrooz Tahmasebi, Ashkan Soleymani, Dara Bahri, Stefanie Jegelka, Patrick Jaillet · PDF
  6. Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity

    Shuo Xie, Mohamad Amin Mohamadi, Zhiyuan Li · PDF
  7. All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models

    Charumathi Badrinath, Usha Bhalla, Alex Oesterling, Suraj Srinivas, Himabindu Lakkaraju · PDF
  8. An exactly solvable model for emergence and scaling laws

    Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Ard A. Louis · PDF
  9. Analysing feature learning of gradient descent using periodic functions

    Jaehui Hwang, Taeyoung Kim, Hongseok Yang · PDF
  10. Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training

    Atli Kosson, Bettina Messmer, Martin Jaggi · PDF
  11. Asymptotic Dynamics for Delayed Feature Learning in a Toy Model

    Blake Bordelon, Tanishq Kumar, Samuel J. Gershman, Cengiz Pehlevan · PDF
  12. Boundary between noise and information applied to filtering neural network weight matrices

    Max Staats, Matthias Thamm, Bernd Rosenow · PDF
  13. Closed form of the Hessian spectrum for some Neural Networks

    Sidak Pal Singh, Thomas Hofmann · PDF
  14. Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances

    Marcel Kühn, Bernd Rosenow · PDF
  15. Decomposing and Editing Predictions by Modeling Model Computation

    Harshay Shah, Andrew Ilyas, Aleksander Madry · PDF
  16. Deep Networks Always Grok and Here is Why

    Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk · PDF
  17. Do Parameters Reveal More than Loss for Membership Inference?

    Anshuman Suri, Xiao Zhang, David Evans · PDF
  18. Does SGD really happen in tiny subspaces?

    Minhak Song, Kwangjun Ahn, Chulhee Yun · PDF
  19. Early Period of Training Impacts Out-of-Distribution Generalization

    Chen Cecilia Liu, Iryna Gurevych · PDF
  20. Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution

    Naoki Yoshida, Shogo Nakakita, Masaaki Imaizumi · PDF
  21. Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling

    Moritz Haas, Jin Xu, Volkan Cevher, Leena Chennuru Vankadara · PDF
  22. Emergent representations in networks trained with the Forward-Forward algorithm

    Niccolo Tosato, Lorenzo Basile, Emanuele Ballarin, Giuseppe De Alteriis, Alberto Cazzaniga, Alessio Ansuini · PDF
  23. Exploring the development of complexity over depth and time in deep neural networks

    Hannah Pinson, Aurélien Boland, Vincent Ginis, Mykola Pechenizkiy · PDF
  24. Expressivity of Neural Networks with Fixed Weights and Learned Biases

    Ezekiel Williams, Avery Hee-Woon Ryoo, Thomas Jiralerspong, Alexandre Payeur, Matthew G Perich, Luca Mazzucato, Guillaume Lajoie · PDF
  25. Feature Learning Dynamics under Grokking in a Sparse Parity Task

    Javier Sanguino Bautiste, Gregor Bachmann, Bobby He, Lorenzo Noci, Thomas Hofmann · PDF
  26. Fine-grained Analysis of In-context Linear Estimation

    Yingcong Li, Ankit Singh Rawat, Samet Oymak · PDF
  27. Fundamental limits of weak learnability in high-dimensional multi-index models

    Emanuele Troiani, Yatin Dandi, Leonardo Defilippis, Lenka Zdeborova, Bruno Loureiro, Florent Krzakala · PDF
  28. Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

    Daniel Kunin, Allan Raventos, Clémentine Carla Juliette Dominé, Feng Chen, David Klindt, Andrew M Saxe, Surya Ganguli · PDF
  29. Gradient descent induces alignment between weights and the pre-activation tangents for deep non-linear networks

    Daniel Beaglehole, Ioannis Mitliagkas, Atish Agarwala · PDF
  30. Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks

    Chenyang Zhang, Gao Peifeng, Difan Zou, Yuan Cao · PDF
  31. Gradient Descent with Polyak’s Momentum Finds Flatter Minima via Large Catapults

    Prin Phunyaphibarn, Junghyun Lee, Bohan Wang, Huishuai Zhang, Chulhee Yun · PDF
  32. Gradient Dissent in Language Model Training and Saturation

    Andrei Mircea, Ekaterina Lobacheva, Irina Rish · PDF
  33. Hidden Learning Dynamics of Capability before Behavior in Diffusion Models

    Core Francisco Park, Maya Okawa, Andrew Lee, Ekdeep Singh Lubana, Hidenori Tanaka · PDF
  34. How Do Nonlinear Transformers Acquire Generalization-Guaranteed CoT Ability?

    Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen · PDF
  35. How Do Transformers Fill in the Blanks? A Case Study on Matrix Completion

    Pulkit Gopalani, Ekdeep Singh Lubana, Wei Hu · PDF
  36. How Truncating Weights Improves Reasoning in Language Models

    Lei Chen, Joan Bruna, Alberto Bietti · PDF
  37. InfoNCE: Identifying the Gap Between Theory and Practice

    Evgenia Rusak, Patrik Reizinger, Attila Juhos, Oliver Bringmann, Roland S. Zimmermann, Wieland Brendel · PDF
  38. Interpolated-MLPs: Controllable Inductive Bias

    Sean Wu, Jordan Hong, keybai, Gregor Bachmann · PDF
  39. Landscaping Linear Mode Connectivity

    Sidak Pal Singh, Linara Adilova, Michael Kamp, Asja Fischer, Bernhard Schölkopf, Thomas Hofmann · PDF
  40. Latent functional maps

    Marco Fumero, Marco Pegoraro, Valentino Maiorca, Francesco Locatello, Emanuele Rodolà · PDF
  41. Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics

    Alireza Mousavi-Hosseini, Denny Wu, Murat A Erdogdu · PDF
  42. Linear Weight Interpolation Leads to Transient Performance Gains

    Gaurav Iyer, Gintare Karolina Dziugaite, David Rolnick · PDF
  43. Looking at Deep Learning Phenomena Through a Telescoping Lens

    Alan Jeffares, Alicia Curth, Mihaela van der Schaar · PDF
  44. Loss landscape geometry reveals stagewise development of transformers

    George Wang, Matthew Farrugia-Roberts, Jesse Hoogland, Liam Carroll, Susan Wei, Daniel Murfet · PDF
  45. Merging Text Transformer Models from Different Initializations

    Neha Verma, Maha Elbayad · PDF
  46. Neural collapse versus low-rank bias: Is deep neural collapse really optimal?

    Peter Súkeník, Marco Mondelli, Christoph H. Lampert · PDF
  47. Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

    Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu · PDF
  48. Neural Symmetry Detection for Learning Neural Network Constraints

    Alex Gabel, Rick Quax, Stratis Gavves · PDF
  49. Nonconvex Meta-optimization for Deep Learning

    Xinyi Chen, Evan Dogariu, Zhou Lu, Elad Hazan · PDF
  50. On the metastability of learning algorithms in physics-informed neural networks: a case study on Schrödinger operators

    Alessandro Maria Selvitella · PDF
  51. Probability Tools for Sequential Random Projection

    Yingru Li · PDF
  52. Progress Measures for Grokking on Real-world Tasks

    Satvik Golechha · PDF
  53. Provable Benefit of Cutout and CutMix for Feature Learning

    Junsoo Oh, Chulhee Yun · PDF
  54. Provable Tempered Overfitting of Minimal Nets and Typical Nets

    Itamar Harel, William M. Hoza, Gal Vardi, Itay Evron, Nathan Srebro, Daniel Soudry · PDF
  55. Random matrix theory analysis of neural network weight matrices

    Matthias Thamm, Max Staats, Bernd Rosenow · PDF
  56. Rank Minimization, Alignment and Weight Decay in Neural Networks

    David Yunis, Kumar Kshitij Patel, Samuel Wheeler, Pedro Henrique Pamplona Savarese, Gal Vardi, Karen Livescu, Michael Maire, Matthew Walter · PDF
  57. ReLU Characteristic Activation Analysis

    Wenlin Chen, Hong Ge · PDF
  58. Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions

    Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Luca Pesce, Ludovic Stephan · PDF
  59. SGD vs GD: Rank Deficiency in Linear Networks

    Aditya Varre, Margarita Sagitova, Nicolas Flammarion · PDF
  60. Simple, unified analysis of Johnson-Lindenstrauss with applications

    Yingru Li · PDF
  61. The Butterfly Effect: Tiny Perturbations Cause Neural Network Training to Diverge

    Gül Sena Altıntaş, Devin Kwok, David Rolnick · PDF
  62. The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof

    Derek Lim, Theo Putterman, Robin Walters, Haggai Maron, Stefanie Jegelka · PDF
  63. The Hidden Pitfalls of the Cosine Similarity Loss

    Andrew Draganov, Sharvaree Vadgama, Erik J Bekkers · PDF
  64. The Implicit Bias of Adam on Separable Data

    Chenyang Zhang, Difan Zou, Yuan Cao · PDF
  65. The optimization landscape of Spectral neural network

    Chenghui Li, Rishi Sonthalia, Nicolas Garcia Trillos · PDF
  66. Three Mechanisms of Feature Learning in an Analytically Solvable Model

    Yizhou Xu, Liu Ziyin · PDF
  67. Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models

    Weihang Xu, Maryam Fazel, Simon Shaolei Du · PDF
  68. u-μP: The Unit-Scaled Maximal Update Parametrization

    Charlie Blake, Constantin Eichenberg, Josef Dean, Lukas Balles, Luke Yuri Prince, Björn Deiseroth, Andres Felipe Cruz-Salinas, Carlo Luschi, Samuel Weinbach, Douglas Orr · PDF
  69. Understanding Adversarially Robust Generalization via Weight-Curvature Index

    Yuelin Xu, Xiao Zhang · PDF
  70. Understanding Nonlinear Implicit Bias via Region Counts in Input Space

    Jingwei Li, Jing Xu, Zifan Wang, Huishuai Zhang, Jingzhao Zhang · PDF
  71. When Are Bias-Free ReLU Networks Like Linear Networks?

    Yedi Zhang, Andrew M Saxe, Peter E. Latham · PDF
  72. Where Do Large Learning Rates Lead Us? A Feature Learning Perspective

    Ildus Sadrtdinov, Maxim Kodryan, Eduard Pokonechny, Ekaterina Lobacheva, Dmitry Vetrov · PDF
  73. Why Pruning and Conditional Computation Work: A High-Dimensional Perspective

    Erdem Koyuncu · PDF