High-dimensional Learning Dynamics 2025
HiLD at ICML 2025
- Submission deadline: May 22, 2025, 15:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (84)
Fetched from OpenReview (v2) on 2026-06-10.
- A Random Matrix Theory Perspective on the Learning Dynamics of Multi-head Latent Attention
- A simple connection from loss flatness to compressed neural representations
- A solvable generative model with a linear, one-step denoiser
- Adam Reduces a Unique Form of Sharpness: Theoretical Insights Near the Minimizer Manifold
- Adapting to High Dimensional Concepts with Metalearning
- Attention with Trained Embeddings Provably Selects Important Tokens
- Bayes optimal learning of attention-indexed models
- Bayesian Influence Functions for Scalable Data Attribution
- Benignity of loss landscape with weight decay requires both large overparametrization and initialization
- Better Rates for Private Linear Regression in the Proportional Regime via Aggressive Clipping
- Catalyst: Structured Pruning with Robust Bifurcation Dynamics
- Data Free Metrics Are Not Reparameterisation Invariant Under the Critical and Robust Layer Phenomena
- Data-Free Transformer Quantization Using Parameter-Space Symmetry
- Different simultaneous mechanisms for in-context recall have distinct learning dynamics
- Emergence of Hebbian Dynamics in Regularized Non-Local Learners
- Emergent Linear Separability of Unseen Data Points in High-dimensional Last-Layer Feature Space
- Emergent Specialization: Rare Token Neurons in Language Models
- Exact Learning of Permutations for Nonzero Binary Inputs with Logarithmic Training Size and Quadratic Ensemble Complexity
- Exploration Behavior of Untrained Policies
- Exploring L2-Phase Transitions on Error Landscapes
- Feature learning is decoupled from generalization in high capacity neural networks
- From Information to Generative Exponent: Learning Rate Induces Phase Transitions in SGD
- From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning
- Fundamental Limits of Learning Single-Index Models under Structured Data
- Generalisation and Safety Critical Evaluations at Sharp Minima: A Geometric Reappraisal
- Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)
- Grokking and Generalization Collapse: Insights from HTSR theory
- How Compositional Generalization and Creativity Improve as Diffusion Models are Trained
- How Transformers Get Rich: Training Dynamics Analysis
- Implicit Bias and Loss of Plasticity in Matrix Completion: Depth Promotes Low-Rank Solutions
- Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
- In Search of Adam’s Secret Sauce
- Information-Geometric Neural Granger Causality
- Input differentiation via negative computation
- Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
- Jacobian Alignment Explains Grokking and Centroid Alignment Identifies It
- Langevin Learning Dynamics in Lazy and Non-Lazy Wide Neural Networks
- Latent Concept Disentanglement in Transformer-based Language Models
- Learning curves theory of hierarchically compositional data with power-law distributed features
- Learning how to step in gradient-based optimization: beyond convexity and smoothness
- Low Rank Gradients and Where To Find Them
- Lyapunov Learning at the Onset of Chaos
- Neural Collapse is Globally Optimal in Deep Regularized ResNets and Transformers
- New Evidence of the Two-Phase Learning Dynamics of Neural Networks
- On Generalization of Spectral Gradient Descent: A Case Study on Imbalanced Data
- On the Existence of Hidden Subnetworks Within a Randomly Weighted Multi-Head Attention Mechanism
- On the Interaction of Noise, Compression, and Adaptivity under $(L_0,L_1)$-Smoothness: An SDE Approach
- On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
- On the Performance of Differentially Private Optimization with Heavy-Tail Class Imbalance
- Origins of Creativity in Attention Based Diffusion Models
- Probing Geometry of Next Token Prediction Using Cumulant Expansion of the Softmax Entropy
- Quantitative Bounds for Length Generalization in Transformers
- Quantization and the Bottom of the Loss Landscape
- Reactivation: Empirical NTK Dynamics Under Task Shifts
- Reduce and Conquer: Independent Component Analysis at linear sample complexity
- Rethinking Memorization–Generalization Trade-Off in Generative Models
- Revisiting the Goldilocks Zone in Inhomogeneous Networks
- Risk Phase Transitions in Spiked Regression: Alignment Driven Benign and Catastrophic Overfitting
- Selective Prediction via Training Dynamics
- Spectral Dynamics of Contrastive Learning with Spurious Features
- Studying Data Complexity and Learned Structure in Neural Networks with Bayesian Probes
- Symmetries in Weight Space Learning: To Retain or Remove?
- The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets
- The Interplay Between Implicit Bias and Adversarial Robustness in Linear Convolutional Neural Networks
- The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks
- The Price of Robustness: Stable Classifiers Need Overparameterization
- The Shape of Generalization through the Lens of Norm-based Capacity Control
- The Silent Helper: How Implicit Regularization Enhances Group Robustness
- Theoretical Guarantees and Training Dynamics of Contrastive Learning: How Misaligned Data Influence Feature Purity
- Through the River: Understanding the Benefit of Schedule-Free Methods for Language Model Training
- Topology-Aware Robust Representation Balancing for Estimating Causal Effects
- Towards an Optimal Control Perspective of ResNet Training
- Towards Understanding Orthogonalization in Muon
- Tracing the representation geometry of language models from pretraining to post-training
- Training Dynamics of In-Context Learning in Linear Attention
- Two-point deterministic equivalence for SGD in random feature models
- Understanding Generalization in Diffusion Models via Probability Flow Distance
- Understanding Lookahead Dynamics Through Laplace Transforms
- Understanding Mamba in In-Context Learning with Outliers: A Theoretical Generalization Analysis
- Understanding Normalization Layers for Sparse Training
- Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
- What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers
- When Can You Get Away with Low Memory Adam?
- When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective