ICML 2024 Past Math & reasoning
High-dimensional Learning Dynamics 2024: The Emergence of Structure and Reasoning
HiLD at ICML 2024
- Submission deadline: May 29, 2024, 04:30 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (73)
Fetched from OpenReview (v2) on 2026-06-10.
- A Hessian-Aware Stochastic Differential Equation for Modelling SGD
- A Phase Transition between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention
- A Random Matrix Analysis of Learning with Noisy Labels
- A Unified Approach to Feature Learning in Bayesian Neural Networks
- A Universal Class of Sharpness-Aware Minimization Algorithms
- Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
- All Roads Lead to Rome? Exploring Representational Similarities Between Latent Spaces of Generative Image Models
- An exactly solvable model for emergence and scaling laws
- Analysing feature learning of gradient descent using periodic functions
- Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training
- Asymptotic Dynamics for Delayed Feature Learning in a Toy Model
- Boundary between noise and information applied to filtering neural network weight matrices
- Closed form of the Hessian spectrum for some Neural Networks
- Correlated Noise in Epoch-Based Stochastic Gradient Descent: Implications for Weight Variances
- Decomposing and Editing Predictions by Modeling Model Computation
- Deep Networks Always Grok and Here is Why
- Do Parameters Reveal More than Loss for Membership Inference?
- Does SGD really happen in tiny subspaces?
- Early Period of Training Impacts Out-of-Distribution Generalization
- Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution
- Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling
- Emergent representations in networks trained with the Forward-Forward algorithm
- Exploring the development of complexity over depth and time in deep neural networks
- Expressivity of Neural Networks with Fixed Weights and Learned Biases
- Feature Learning Dynamics under Grokking in a Sparse Parity Task
- Fine-grained Analysis of In-context Linear Estimation
- Fundamental limits of weak learnability in high-dimensional multi-index models
- Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning
- Gradient descent induces alignment between weights and the pre-activation tangents for deep non-linear networks
- Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks
- Gradient Descent with Polyak’s Momentum Finds Flatter Minima via Large Catapults
- Gradient Dissent in Language Model Training and Saturation
- Hidden Learning Dynamics of Capability before Behavior in Diffusion Models
- How Do Nonlinear Transformers Acquire Generalization-Guaranteed CoT Ability?
- How Do Transformers Fill in the Blanks? A Case Study on Matrix Completion
- How Truncating Weights Improves Reasoning in Language Models
- InfoNCE: Identifying the Gap Between Theory and Practice
- Interpolated-MLPs: Controllable Inductive Bias
- Landscaping Linear Mode Connectivity
- Latent functional maps
- Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
- Linear Weight Interpolation Leads to Transient Performance Gains
- Looking at Deep Learning Phenomena Through a Telescoping Lens
- Loss landscape geometry reveals stagewise development of transformers
- Merging Text Transformer Models from Different Initializations
- Neural collapse versus low-rank bias: Is deep neural collapse really optimal?
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
- Neural Symmetry Detection for Learning Neural Network Constraints
- Nonconvex Meta-optimization for Deep Learning
- On the metastability of learning algorithms in physics-informed neural networks: a case study on Schrödinger operators
- Probability Tools for Sequential Random Projection
- Progress Measures for Grokking on Real-world Tasks
- Provable Benefit of Cutout and CutMix for Feature Learning
- Provable Tempered Overfitting of Minimal Nets and Typical Nets
- Random matrix theory analysis of neural network weight matrices
- Rank Minimization, Alignment and Weight Decay in Neural Networks
- ReLU Characteristic Activation Analysis
- Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions
- SGD vs GD: Rank Deficiency in Linear Networks
- Simple, unified analysis of Johnson-Lindenstrauss with applications
- The Butterfly Effect: Tiny Perturbations Cause Neural Network Training to Diverge
- The Empirical Impact of Neural Parameter Symmetries, or Lack Thereof
- The Hidden Pitfalls of the Cosine Similarity Loss
- The Implicit Bias of Adam on Separable Data
- The optimization landscape of Spectral neural network
- Three Mechanisms of Feature Learning in an Analytically Solvable Model
- Toward Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixture Models
- u-μP: The Unit-Scaled Maximal Update Parametrization
- Understanding Adversarially Robust Generalization via Weight-Curvature Index
- Understanding Nonlinear Implicit Bias via Region Counts in Input Space
- When Are Bias-Free ReLU Networks Like Linear Networks?
- Where Do Large Learning Rates Lead Us? A Feature Learning Perspective
- Why Pruning and Conditional Computation Work: A High-Dimensional Perspective