ICLR 2026 Past AI for science
Workshop on Scientific Methods for Understanding Deep Learning
Sci4DL 2026
- Submission deadline: Feb 5, 2026, 12:10 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (88)
Fetched from OpenReview (v2) on 2026-06-10.
- "Faithful to What?" On the Limits of Fidelity-Based Explanations
- Ablate and Rescue: A Causal Analysis of Residual Stream Hyper-Connections
- All in the Head?: A Controlled Study of Component Contributions in Few-Shot NLP
- Analysing the Linearity of Linguistic Relations in Language Model Embedding Spaces
- Attention Projection Mixing with Exogenous Anchors
- Attention Sinks as Internal Signals for Hallucination Detection in Large Language Models
- Birkhoff-Exact Hyper-Connections: Exact Spectral Stability for Deep Residual Networks
- Configuration-to-Performance Scaling Law with Neural Ansatz
- Decoupled Orthogonal Dynamics: Regularization for Deep Network Optimizers
- Deriving Hyperparameter Scaling Laws via Modern Optimization Theory
- DIAGNOSING FP4 INFERENCE: A LAYER-WISE AND BLOCK-WISE SENSITIVITY ANALYSIS OF NVFP4 AND MXFP4
- Divergent Tasks Harm Integration Of New Entities Via Fine-Tuning
- Divine Benevolence is an $x^2$: GLUs have asymptotically faster scaling laws than MLPs
- Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis
- Does Aurora Encode Atmospheric Structure? Latent Regime Analysis and Attribution
- Does LLM Pre-Training Typically Occur at the Edge of Stability?
- Dropout and the Outliers: Could Transformers Overcome Their Single Points of Failure?
- Endogenous Resistance to Activation Steering in Language Models
- Entropy-Lens: Uncovering Decision Strategies in LLMs
- Evidence Slopes and Effective Dimension in Singular Linear Models
- Expert-Data Alignment Governs Generation Quality in Decentralized Diffusion Models
- From Growing to Looping: A Unified View of Iterative Computation in LLMs
- Generalized Dual-Scale Optimization: Topology-Aware Margin Dynamics in Fine-Grained Vision
- Generating output diversity from prompt re-tokenization
- Genomic Next-Token Predictors are In-Context Learners
- Geometric Properties of Neural Multivariate Regression: An Empirical Study
- Geometric Stability of Representation Manifolds as a Training-Free Diagnostic for Studying Data Augmentations
- Gradual Stochastic Gradient Descent: from signSGD to SGD via $\ell_p$ Norm
- Homophily as a Lossy Channel: Decomposing Information in Graphs and Graph Neural Networks
- In-Context Benign Overfitting: A Feature-Selection Model in In-Context Linear Regression
- Information spreading in diffusion models from effective field theory
- Instruction Following by Principled Attention Boosting of Large Language Models
- Is GPU Numerical Noise Really Random? An Empirical Investigation of Floating-Point Error Structure
- LAYER-DEPENDENT STRUCTURE IN GRADIENT NOISE OF SMALL CONVOLUTIONAL NETWORKS
- Learning When to Be Sparse: Adaptive Activations via Two-Parameter Entropy
- Less Data, Faster Training: sampling bias from small dataset can speed up training
- Leveraging Low-Rank Structure for Effective Weight-Sharing in Language Models
- Masks Can Be Distracting: On Context Comprehension in Diffusion Language Models
- Model Evolution Under Zeroth-Order Optimization: A Neural Tangent Kernel Perspective
- Multi-Task Pretraining Drives Representational Convergence
- Network of Theseus (Like the ship)
- Neural Multivariate Regression with Multi-Task Learning and Target Preprocessing
- Normalized Conditional Mutual Information Surrogate Loss for Deep Learning Classifiers
- On the "Induction Bias" in Sequence Models
- On the Complexity of Neural Computation in Superposition
- On the Simplicity-Similarity Tradeoff of LoRA and Full Fine-Tuning
- Optimal learning rate scaling depends on data in deep scalar linear networks
- Optimal scaling laws in learning hierarchical multi-index models
- Optimization, Not Architecture, Governs Vision Transformer Generalization in Small-Data Regimes
- Pretraining with Masked Backstories in a Toy World
- PROBING INFORMATION FLOW IN VISION TRANSFORMERS THROUGH CONTROLLED ATTENTION PERTURBATION
- Process-then-Retrieve: A Mechanistic Study of Cross-Modal Alignment in Vision-Language Models
- Representation Geometry Mediates Neural Circuit Formation: Evidence from Systematic Regularization Analysis
- Revealing Task-Dependent Layer Relevance via Attentive Multi-Layer Fusion
- RouterInterp: Understanding Superposed Specialisation in MoE Routing
- Scaling-Law Analysis of SignSGD: From Feature-Space Linear Regression to LLM Pre-training
- Shared Gradient Discovery and Superposition: Learning Dynamics of Generalization in LLMs
- Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting
- Simple LLM Baselines are Competitive for Model Diffing
- Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws
- Skip To The Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs Autoregressive LLM
- Soft Gates for Sharp Experts in Tabular Representation Learning
- Special solutions with small volume exist
- Spherical Cautious Optimizers
- Steered LLM Activations are Non-Surjective
- STRIDE: Training Data Attribution Can Be Estimated In Activation Space
- Sustained Gradient Alignment Mediates Subliminal Learning in a Multi-Step Setting: Evidence from MNIST Auxiliary Logit Distillation Experiment
- The Feature-Space Alignment Hypothesis for Neural Network Sparsity
- The Offline-Frontier Shift: Diagnosing Distributional Limits in Generative Multi-Objective Optimization
- The Role of Data in Model Merging
- Thermodynamics of Reinforcement Learning Curricula
- To Use or not to Use Muon: How Simplicity Bias in Optimizers Matters
- Toy Models of Combinatorial Interpretability
- Training for Compositional Sensitivity Reduces Dense Retrieval Generalization
- TrasMuon: Trust-Region Adaptive Scaling for Orthogonalized Momentum Optimizers
- Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge
- Understanding Learning Dynamics of Zeroth-Order Optimization
- Understanding Scaling Laws With Token-Level Analysis
- Unified Perspectives on Balancedness and Parameter-norm Evolution in Neural Nets
- Vision Language Models Inherit Human Color Perception
- Weight Decay Improves Language Model Plasticity
- What Flow-Matching Brings to TD Learning?
- When Does Diffusion Help? PDE-Inspired Optimization on Fragmented and Noisy Data
- WHEN DOES META LEARNING ACTUALLY HELP? A SCIENTIFIC STUDY OF PHYSICAL INVERSE PROBLEMS
- When does Observational Data Teach Latent Dynamics? Understanding Control Misalignment with Synthetic Tasks
- When to restart? Exploring escalating restarts on convergence
- Which Sparse Code? Identifiability Failures in SAE Inference
- Zeroth-Order Optimization at the Edge of Stability