High-dimensional Learning Dynamics 2025

HiLD at ICML 2025

Submission deadline: May 22, 2025, 15:00 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (84)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention

    Nandan Kumar Jha, Brandon Reagen · PDF
  2. A simple connection from loss flatness to compressed neural representations

    Shirui Chen, Stefano Recanatesi, Eric Todd Shea-Brown · PDF
  3. A solvable generative model with a linear, one-step denoiser

    Indranil Halder · PDF
  4. Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold

    Xinghan Li, Haodong Wen, Kaifeng Lyu · PDF
  5. Adapting to High Dimensional Concepts with Metalearning

    Max Gupta · PDF
  6. Attention with Trained Embeddings Provably Selects Important Tokens

    Diyuan Wu, Aleksandr Shevchenko, Samet Oymak, Marco Mondelli · PDF
  7. Bayes optimal learning of attention-indexed models

    Fabrizio Boncoraglio, Emanuele Troiani, Vittorio Erba, Lenka Zdeborova · PDF
  8. Bayesian Influence Functions for Scalable Data Attribution

    Philipp Alexander Kreer, Wilson Wu, Maxwell Adam, Zach Furman, Jesse Hoogland · PDF
  9. Benignity of loss landscape with weight decay requires both large overparametrization and initialization

    Etienne Boursier, Matthew Bowditch, Matthias Englert, Ranko Lazic · PDF
  10. Better Rates for Private Linear Regression in the Proportional Regime via Aggressive Clipping

    Simone Bombari, Inbar Seroussi, Marco Mondelli · PDF
  11. Catalyst: Structured Pruning with Robust Bifurcation Dynamics

    Jaeheun Jung, Donghun Lee · PDF
  12. Data Free Metrics Are Not Reparameterisation Invariant Under the Critical and Robust Layer Phenomena

    Gabryel Mason-Williams, Israel Mason-Williams, Fredrik Dahlqvist · PDF
  13. Data-Free Transformer Quantization Using Parameter-Space Symmetry

    Lucas Laird, Bo Zhao, Rose Yu, Robin Walters · PDF
  14. Different simultaneous mechanisms for in-context recall have distinct learning dynamics

    Sultan Daniels, Dylan Davis, Dhruv Gautam, Wentinn Liao, Gireeja Ranade, Anant Sahai · PDF
  15. Emergence of Hebbian Dynamics in Regularized Non-Local Learners

    David Aaron Koplow, Tomaso Poggio, Liu Ziyin · PDF
  16. Emergent Linear Separability of Unseen Data Points in High-dimensional Last-Layer Feature Space

    Taehun Cha, Donghun Lee · PDF
  17. Emergent Specialization: Rare Token Neurons in Language Models

    Jing Liu, Haozheng Wang, Yueheng Li · PDF
  18. Exact Learning of Permutations for Nonzero Binary Inputs with Logarithmic Training Size and Quadratic Ensemble Complexity

    George Giapitzakis, Artur Back de Luca, Kimon Fountoulakis · PDF
  19. Exploration Behavior of Untrained Policies

    Jacob Adamczyk · PDF
  20. Exploring L2-Phase Transitions on Error Landscapes

    Ibrahim Talha Ersoy, Karoline Wiesner · PDF
  21. Feature learning is decoupled from generalization in high capacity neural networks

    Niclas Alexander Göring, Charles London, Abdurrahman Hadi Erturk, Chris Mingard, Yoonsoo Nam, Ard A. Louis · PDF
  22. From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD

    Konstantinos Christopher Tsiolis, Alireza Mousavi-Hosseini, Murat A Erdogdu · PDF
  23. From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning

    Junsoo Oh, Jerry Song, Chulhee Yun · PDF
  24. Fundamental Limits of Learning Single-Index Models under Structured Data

    Jivan Waber, Alireza Mousavi-Hosseini, Murat A Erdogdu · PDF
  25. Generalisation and Safety Critical Evaluations at Sharp Minima: A Geometric Reappraisal

    Israel Mason-Williams, Gabryel Mason-Williams, Helen Yannakoudakis · PDF
  26. Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)

    Artem Riabinin, Egor Shulgin, Kaja Gruntkowska, Peter Richtárik · PDF
  27. Grokking and Generalization Collapse: Insights from HTSR theory

    Hari Kishan Prakash, Charles H. Martin · PDF
  28. How Compositional Generalization and Creativity Improve as Diffusion Models are Trained

    Alessandro Favero, Antonio Sclocchi, Francesco Cagnetta, Pascal Frossard, Matthieu Wyart · PDF
  29. How Transformers Get Rich: Training Dynamics Analysis

    Mingze Wang, Ruoxi Yu, Weinan E, Lei Wu · PDF
  30. Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rank Solutions

    Baekrok Shin, Chulhee Yun · PDF
  31. Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data

    Chen Fan, Mark Schmidt, Christos Thrampoulidis · PDF
  32. In Search of Adam’s Secret Sauce

    Antonio Orvieto, Robert M. Gower · PDF
  33. Information-Geometric Neural Granger Causality

    Pauline Bourigault, Danilo Mandic · PDF
  34. Input differentiation via negative computation

    Linghao Kong, Angelina Ning, Nir N Shavit · PDF
  35. Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling

    Teodora Srećković, Jonas Geiping, Antonio Orvieto · PDF
  36. Jacobian Alignment Explains Grokking and Centroid Alignment Identifies It

    Thomas Walker, Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk · PDF
  37. Langevin Learning Dynamics in Lazy and Non-Lazy Wide Neural Networks

    Yehonatan Avidan, Haim Sompolinsky · PDF
  38. Latent Concept Disentanglement in Transformer-based Language Models

    Guan Zhe Hong, Bhavya Vasudeva, Vatsal Sharan, Cyrus Rashtchian, Prabhakar Raghavan, Rina Panigrahy · PDF
  39. Learning curves theory of hierarchically compositional data with power-law distributed features

    Francesco Cagnetta, Hyunmo Kang, Matthieu Wyart · PDF
  40. Learning how to step in gradient-based optimization: beyond convexity and smoothness

    Dravyansh Sharma · PDF
  41. Low Rank Gradients and Where To Find Them

    Rishi Sonthalia, Michael Murray, Guido Montufar · PDF
  42. Lyapunov Learning at the Onset of Chaos

    Alessandro Londei, Denise Lanzieri, Matteo Benati, Vittorio Loreto · PDF
  43. Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers

    Peter Súkeník, Christoph H. Lampert, Marco Mondelli · PDF
  44. New Evidence of the Two-Phase Learning Dynamics of Neural Networks

    Zhanpeng Zhou, Yongyi Yang, Mahito Sugiyama, Junchi Yan · PDF
  45. On Generalization of Spectral Gradient Descent: A Case Study on Imbalanced Data

    Bhavya Vasudeva, Puneesh Deora, Christos Thrampoulidis · PDF
  46. On the Existence of Hidden Subnetworks Within a Randomly Weighted Multi-Head Attention Mechanism

    Hikari Otsuka, Yasuyuki Okoshi, Daichi Fujiki, Susumu Takeuchi, Masato Motomura, Daiki Chijiwa · PDF
  47. On the Interaction of Noise, Compression, and Adaptivity under $(L_0,L_1)$-Smoothness: An SDE Approach

    Enea Monzio Compagnoni, Rustem Islamov, Antonio Orvieto, Eduard Gorbunov · PDF
  48. On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD

    Tongcheng Zhang, Zhanpeng Zhou, Mingze Wang, Andi Han, Wei Huang, Taiji Suzuki, Junchi Yan · PDF
  49. On the Performance of Differentially Private Optimization with Heavy-Tail Class Imbalance

    Qiaoyue Tang, Alain Zhiyanov, Mathias Lécuyer · PDF
  50. Origins of Creativity in Attention Based Diffusion Models

    Emma Lucia Byrnes Finn, T. Anderson Keller, Manos Theodosis, Demba E. Ba · PDF
  51. Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy

    Karthik Viswanathan, Sang Eon Park · PDF
  52. Quantitative Bounds for Length Generalization in Transformers

    Zachary Izzo, Eshaan Nichani, Jason D. Lee · PDF
  53. Quantization and the Bottom of the Loss Landscape

    Luca Di Carlo, Daniel T. Bernstein, David J. Schwab · PDF
  54. Reactivation: Empirical NTK Dynamics Under Task Shifts

    Yuzhi Liu, Zixuan Chen, Zirui Zhang, Yufei Liu, Giulia Lanzillotta · PDF
  55. Reduce and Conquer: Independent Component Analysis at linear sample complexity

    Fabiola Ricci, Lorenzo Bardone, Sebastian Goldt · PDF
  56. Rethinking Memorization–Generalization Trade-Off in Generative Models

    Jiseok Chae, Kyuwon Kim, Donghwan Kim · PDF
  57. Revisiting the Goldilocks Zone in Inhomogeneous Networks

    Zacharie Garnier Cuchet, Sarath Chandar, Ekaterina Lobacheva · PDF
  58. Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting

    Jiping Li, Rishi Sonthalia · PDF
  59. Selective Prediction via Training Dynamics

    Stephan Rabanser, Anvith Thudi, Kimia Hamidieh, Adam Dziedzic, Israfil Bahceci, Akram Bin Sediq, Hamza Sokun, Nicolas Papernot · PDF
  60. Spectral Dynamics of Contrastive Learning with Spurious Features

    Naghmeh Ghanooni, Dennis Wagner, Waleed Mustafa, Anthony Widjaja Lin, Sophie Fellenz, Marius Kloft · PDF
  61. Studying Data Complexity and Learned Structure in Neural Networks with Bayesian Probes

    Maxwell Adam, Zach Furman, Wilson Wu, Philipp Alexander Kreer, Jesse Hoogland · PDF
  62. Symmetries in Weight Space Learning: To Retain or Remove?

    Fynn Kiwitt, Behrooz Tahmasebi, Stefanie Jegelka · PDF
  63. The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets

    Yujun Kim, Chaewon Moon, Chulhee Yun · PDF
  64. The Interplay Between Implicit Bias and Adversarial Robustness in Linear Convolutional Neural Networks

    Aurélien Boland, Hannah Pinson · PDF
  65. The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks

    Vittorio Erba, Emanuele Troiani, Lenka Zdeborova, Florent Krzakala · PDF
  66. The Price of Robustness: Stable Classifiers Need Overparameterization

    Jonas von Berg, Adalbert Fono, Massimiliano Datres, Sohir Maskey, Gitta Kutyniok · PDF
  67. The Shape of Generalization through the Lens of Norm-based Capacity Control

    Yichen Wang, Yudong Chen, Lorenzo Rosasco, Fanghui Liu · PDF
  68. The Silent Helper: How Implicit Regularization Enhances Group Robustness

    Nahal Mirzaie, Mahdi Ghaznavi, Hosna Oyarhoseini, Alireza Alipanah, Erfan Sobhaei, Ali Abbasi, Amirmahdi Farzane, Hossein Jafarinia, Parsa Sharifi Sedeh, Arefe Boushehrian, Mahdieh Soleymani Baghshah, Mohammad Hossein Rohban · PDF
  69. Theoretical Guarantees and Training Dynamics of Contrastive Learning: How Misaligned Data Influence Feature Purity

    Jiawei Sun, Shuai Zhang, Hongkang Li, Meng Wang · PDF
  70. Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training

    Minhak Song, Beomhan Baek, Kwangjun Ahn, Chulhee Yun · PDF
  71. Topology-Aware Robust Representation Balancing for Estimating Causal Effects

    Amirhossein Farzam, Ahmed Aloui, Vahid Tarokh, Guillermo Sapiro · PDF
  72. Towards an Optimal Control Perspective of ResNet Training

    Jens Püttschneider, Simon Heilig, Asja Fischer, Timm Faulwasser · PDF
  73. Towards Understanding Orthogonalization in Muon

    Valentyn Boreiko, Zhiqi Bu, Sheng Zha · PDF
  74. Tracing the representation geometry of language models from pretraining to post-training

    Melody Zixuan Li, Kumar Krishna Agrawal, Arna Ghosh, Komal Kumar Teru, Guillaume Lajoie, Blake Aaron Richards · PDF
  75. Training Dynamics of In-Context Learning in Linear Attention

    Yedi Zhang, Aaditya K Singh, Peter E. Latham, Andrew M Saxe · PDF
  76. Two-point deterministic equivalence for SGD in random feature models

    Alexander Atanasov, Blake Bordelon, Jacob A Zavatone-Veth, Courtney Paquette, Cengiz Pehlevan · PDF
  77. Understanding Generalization in Diffusion Models via Probability Flow Distance

    Huijie Zhang, Zijian Huang, Siyi Chen, Jinfan Zhou, Zekai Zhang, Peng Wang, Qing Qu · PDF
  78. Understanding Lookahead Dynamics Through Laplace Transforms

    Aniket Sanyal, Tatjana Chavdarova · PDF
  79. Understanding Mamba in In-Context Learning with Outliers: A Theoretical Generalization Analysis

    Hongkang Li, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, Meng Wang · PDF
  80. Understanding Normalization Layers for Sparse Training

    Mohammed Adnan, Ekansh Sharma, Rahul Krishnan, Yani Ioannou · PDF
  81. Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers

    Annalisa Belloni, Lorenzo Noci, Antonio Orvieto · PDF
  82. What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers

    Pulkit Gopalani, Wei Hu · PDF
  83. When Can You Get Away with Low Memory Adam?

    Dayal Singh Kalra, John Kirchenbauer, Maissam Barkeshli, Tom Goldstein · PDF
  84. When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective

    Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A Erdogdu · PDF