ICLR 2026 · Past · AI for science

Workshop on Scientific Methods for Understanding Deep Learning

Sci4DL 2026

Submission deadline
Feb 5, 2026, 12:10 UTC
imported from OpenReview — check the website for extensions
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).

Accepted papers (88)

Fetched from OpenReview (v2) on 2026-06-10.

  1. "Faithful to What?" On the Limits of Fidelity-Based Explanations

    Jackson Eshbaugh · PDF
  2. Ablate and Rescue: A Causal Analysis of Residual Stream Hyper-Connections

    William Gao Peng, Josheev Rai, Kevin Tseng, Siwei Wang, Sean Wu · PDF
  3. All in the Head?: A Controlled Study of Component Contributions in Few-Shot NLP

    Rishaan Desai · PDF
  4. Analysing the Linearity of Linguistic Relations in Language Model Embedding Spaces

    Vasudevan Nedumpozhimana, Fathima Thekkekara, John Kelleher · PDF
  5. Attention Projection Mixing with Exogenous Anchors

    Jonathan Su · PDF
  6. Attention Sinks as Internal Signals for Hallucination Detection in Large Language Models

    Jakub Binkowski, Kamil Adamczewski, Tomasz Jan Kajdanowicz · PDF
  7. Birkhoff-Exact Hyper-Connections: Exact Spectral Stability for Deep Residual Networks

    Hyunjun Kim · PDF
  8. Configuration-to-Performance Scaling Law with Neural Ansatz

    Huaqing Zhang, Kaiyue Wen, Tengyu Ma · PDF
  9. Decoupled Orthogonal Dynamics: Regularization for Deep Network Optimizers

    Hao Chen, Jh Yuan, Hanmin Zhang · PDF
  10. Deriving Hyperparameter Scaling Laws via Modern Optimization Theory

    Egor Shulgin, Dimitri von Rütte, Tianyue H. Zhang, Niccolò Ajroldi, Bernhard Schölkopf, Antonio Orvieto · PDF
  11. Diagnosing FP4 Inference: A Layer-Wise and Block-Wise Sensitivity Analysis of NVFP4 and MXFP4

    Musa Cim, Burak Topcu, Mahmut Kandemir · PDF
  12. Divergent Tasks Harm Integration Of New Entities Via Fine-Tuning

    Core Francisco Park · PDF
  13. Divine Benevolence is an $x^2$: GLUs have asymptotically faster scaling laws than MLPs

    Alejandro Francisco Queiruga · PDF
  14. Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis

    Ferdinand Kapl, Emmanouil Angelis, Tobias Höppe, Kaitlin Maile, Johannes von Oswald, Nino Scherrer, Stefan Bauer · PDF
  15. Does Aurora Encode Atmospheric Structure? Latent Regime Analysis and Attribution

    Emma Kasteleyn, Ana Lucic · PDF
  16. Does LLM Pre-Training Typically Occur at the Edge of Stability?

    Yuhang Cai, Haofeng Huang, Haodong Wen, Deyi Liu, Yiyuan Ma, Kaifeng Lyu · PDF
  17. Dropout and the Outliers: Could Transformers Overcome Their Single Points of Failure?

    Nour Hezbri, Gilles Bareilles, El-Mahdi El-Mhamdi · PDF
  18. Endogenous Resistance to Activation Steering in Language Models

    Alex McKenzie, Keenan Pepper, Stijn Servaes, Martin Leitgab, Murat Cubuktepe, Michael Vaiana, Diogo S de Lucena, Judd Rosenblatt, Michael S. A. Graziano · PDF
  19. Entropy-Lens: Uncovering Decision Strategies in LLMs

    Christopher Irwin, Francesco Caso, Riccardo Ali, Pietro Lio · PDF
  20. Evidence Slopes and Effective Dimension in Singular Linear Models

    Kalyaan Rao · PDF
  21. Expert-Data Alignment Governs Generation Quality in Decentralized Diffusion Models

    Marcos Villagra, Bidhan Roy, Raihan Seraj, Zhiying Jiang · PDF
  22. From Growing to Looping: A Unified View of Iterative Computation in LLMs

    Ferdinand Kapl, Emmanouil Angelis, Kaitlin Maile, Johannes von Oswald, Stefan Bauer · PDF
  23. Generalized Dual-Scale Optimization: Topology-Aware Margin Dynamics in Fine-Grained Vision

    Lingfeng Xia · PDF
  24. Generating output diversity from prompt re-tokenization

    Kanishk Jain, Matthew Day, Tankut Can · PDF
  25. Genomic Next-Token Predictors are In-Context Learners

    Nathan Breslow, Aayush Mishra, Michael Schatz, Anqi Liu, Mahler Revsine, Daniel Khashabi · PDF
  26. Geometric Properties of Neural Multivariate Regression: An Empirical Study

    George Andriopoulos, Zixuan Dong, Bimarsha Adhikari, Keith W. Ross · PDF
  27. Geometric Stability of Representation Manifolds as a Training-Free Diagnostic for Studying Data Augmentations

    Ahmad Taha, Rustam A. Lukmanov · PDF
  28. Gradual Stochastic Gradient Descent: from signSGD to SGD via $\ell_p$ Norm

    Jh Yuan, Liu Jiachen, Feiping Nie · PDF
  29. Homophily as a Lossy Channel: Decomposing Information in Graphs and Graph Neural Networks

    Vivek Kothari, Nicholas D. Lane · PDF
  30. In-Context Benign Overfitting: A Feature-Selection Model in In-Context Linear Regression

    Puneesh Deora, Bhavya Vasudeva, Christos Thrampoulidis · PDF
  31. Information spreading in diffusion models from effective field theory

    Navonil Neogi, Nabil Iqbal · PDF
  32. Instruction Following by Principled Attention Boosting of Large Language Models

    Vitoria Guardieiro, Avishree Khare, Adam Stein, Eric Wong · PDF
  33. Is GPU Numerical Noise Really Random? An Empirical Investigation of Floating-Point Error Structure

    Tadisetty Sai Yashwanth · PDF
  34. Layer-Dependent Structure in Gradient Noise of Small Convolutional Networks

    Mahule Roy, Subhas Roy · PDF
  35. Learning When to Be Sparse: Adaptive Activations via Two-Parameter Entropy

    Roman Rudamenko, Dmitry Abulkhanov, Konstantin Semenov, Michael Diskin, Alexander Savchenko · PDF
  36. Less Data, Faster Training: sampling bias from small dataset can speed up training

    Jingwen Liu, Ezra Edelman, Surbhi Goel, Bingbin Liu · PDF
  37. Leveraging Low-Rank Structure for Effective Weight-Sharing in Language Models

    Mark Muchane, George Sokolik, Micah Goldblum, Sanae Lotfi · PDF
  38. Masks Can Be Distracting: On Context Comprehension in Diffusion Language Models

    Julianna Piskorz, Cristina Pinneri, Alvaro Correia, Motasem Alfarra, Risheek Garrepalli, Christos Louizos · PDF
  39. Model Evolution Under Zeroth-Order Optimization: A Neural Tangent Kernel Perspective

    Chen Zhang, Yuxin Cheng, Chenchen Ding, Shuqi Wang, Jingreng Lei, Runsheng Yu, Yik-Chung Wu, Ngai Wong · PDF
  40. Multi-Task Pretraining Drives Representational Convergence

    Core Francisco Park · PDF
  41. Network of Theseus (Like the ship)

    Vighnesh Subramaniam, Colin Conwell, Boris Katz, Andrei Barbu, Brian Cheung · PDF
  42. Neural Multivariate Regression with Multi-Task Learning and Target Preprocessing

    George Andriopoulos, Soyuj Jung Basnet, Juan Guevara, Bimarsha Adhikari, Li Guo, Keith W. Ross · PDF
  43. Normalized Conditional Mutual Information Surrogate Loss for Deep Learning Classifiers

    Linfeng Ye, Zhixiang Chi · PDF
  44. On the "Induction Bias" in Sequence Models

    Reza Ebrahimi, Michaël Defferrard, Sunny Panchal, Roland Memisevic · PDF
  45. On the Complexity of Neural Computation in Superposition

    Micah Adler, Nir N Shavit · PDF
  46. On the Simplicity-Similarity Tradeoff of LoRA and Full Fine-Tuning

    Jerome Emery, Darshan Patil, François Leduc-Primeau, Sarath Chandar, Ekaterina Lobacheva · PDF
  47. Optimal learning rate scaling depends on data in deep scalar linear networks

    Yedi Zhang, Peter E. Latham, Leena Chennuru Vankadara, Andrew M Saxe · PDF
  48. Optimal scaling laws in learning hierarchical multi-index models

    Leonardo Defilippis, Florent Krzakala, Bruno Loureiro, Antoine Maillard · PDF
  49. Optimization, Not Architecture, Governs Vision Transformer Generalization in Small-Data Regimes

    Divyanshu Gupta · PDF
  50. Pretraining with Masked Backstories in a Toy World

    Sultan Daniels, Dylan Davis, Gireeja Ranade, Anant Sahai · PDF
  51. Probing Information Flow in Vision Transformers Through Controlled Attention Perturbation

    Thanh Do, Abe Leite · PDF
  52. Process-then-Retrieve: A Mechanistic Study of Cross-Modal Alignment in Vision-Language Models

    Arpita A Shanbhag, Julia Tran, Dhruv Reddy Mandala, Ayda Sultan · PDF
  53. Representation Geometry Mediates Neural Circuit Formation: Evidence from Systematic Regularization Analysis

    Hyunjun Kim · PDF
  54. Revealing Task-Dependent Layer Relevance via Attentive Multi-Layer Fusion

    Marco Morik, Laure Ciernik, Lukas Thede, Luca Eyring, Shinichi Nakajima, Zeynep Akata, Lukas Muttenthaler · PDF
  55. RouterInterp: Understanding Superposed Specialisation in MoE Routing

    Ilya Lasy, Nora Yinuo Cai, Kola Ayonrinde · PDF
  56. Scaling-Law Analysis of SignSGD: From Feature-Space Linear Regression to LLM Pre-training

    Binghui Li, Jianan Wang, Jinbo Wang, Lean Wang, Zilin Wang, Lei Wu · PDF
  57. Shared Gradient Discovery and Superposition: Learning Dynamics of Generalization in LLMs

    Andrei Mircea, Ildus Sadrtdinov, Irina Rish, Ekaterina Lobacheva · PDF
  58. Sharpness-Aware Pretraining Mitigates Catastrophic Forgetting

    Ishaan Watts, Catherine Li, Sachin Goyal, Jacob Mitchell Springer, Aditi Raghunathan · PDF
  59. Simple LLM Baselines are Competitive for Model Diffing

    Elias Kempf, Simon Schrodi, Bartosz Cywiński, Thomas Brox, Neel Nanda, Arthur Conmy · PDF
  60. Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws

    Fabrizio Boncoraglio, Vittorio Erba, Emanuele Troiani, Yizhou Xu, Florent Krzakala, Lenka Zdeborová · PDF
  61. Skip To The Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs Autoregressive LLM

    Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Christopher Lott, Fatih Porikli, Mingu Lee · PDF
  62. Soft Gates for Sharp Experts in Tabular Representation Learning

    Iago Breno Araujo · PDF
  63. Special solutions with small volume exist

    Tausifa Jan Saleem, Ramanjit Ahuja, Surendra Prasad, Brejesh Lall · PDF
  64. Spherical Cautious Optimizers

    Jh Yuan, Feiping Nie · PDF
  65. Steered LLM Activations are Non-Surjective

    Aayush Mishra, Daniel Khashabi, Anqi Liu · PDF
  66. STRIDE: Training Data Attribution Can Be Estimated In Activation Space

    Abir Harrasse, Rishit Dagli, Amir Abdullah, Zhijing Jin · PDF
  67. Sustained Gradient Alignment Mediates Subliminal Learning in a Multi-Step Setting: Evidence from MNIST Auxiliary Logit Distillation Experiment

    Chayanon Kitkana, Shivam Arora · PDF
  68. The Feature-Space Alignment Hypothesis for Neural Network Sparsity

    Linghao Kong, Micah Adler, Nir N Shavit · PDF
  69. The Offline-Frontier Shift: Diagnosing Distributional Limits in Generative Multi-Objective Optimization

    Stephanie Holly, Alexandru-Ciprian Zavoianu, Siegfried Silber, Sepp Hochreiter, Werner Zellinger · PDF
  70. The Role of Data in Model Merging

    Gaurav Iyer, Ekaterina Lobacheva · PDF
  71. Thermodynamics of Reinforcement Learning Curricula

    Jacob Adamczyk, Juan Sebastian Rojas, Rahul V Kulkarni · PDF
  72. To Use or not to Use Muon: How Simplicity Bias in Optimizers Matters

    Sara Dragutinović, Rajesh Ranganath · PDF
  73. Toy Models of Combinatorial Interpretability

    Nir N Shavit, Dan Alistarh, Micah Adler · PDF
  74. Training for Compositional Sensitivity Reduces Dense Retrieval Generalization

    Radoslav Ralev, Aditeya Baral, Iliya Sotirov Zhechev, Jen Agarwal, Srijith Rajamohan · PDF
  75. TrasMuon: Trust-Region Adaptive Scaling for Orthogonalized Momentum Optimizers

    Peng Cheng, Jiucheng Zang, Qingnan Li, Liheng Ma, Yufei Cui, Yingxue Zhang, Boxing Chen, Ming Jian, Wen Tong · PDF
  76. Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge

    Bhavya Vasudeva, Puneesh Deora, Alberto Bietti, Vatsal Sharan, Christos Thrampoulidis · PDF
  77. Understanding Learning Dynamics of Zeroth-Order Optimization

    Zhe Li, Bicheng Ying, Zidong Liu, Haibo Yang · PDF
  78. Understanding Scaling Laws With Token-Level Analysis

    Arkil Patel, Marius Mosbach, Siva Reddy, Dzmitry Bahdanau · PDF
  79. Unified Perspectives on Balancedness and Parameter-norm Evolution in Neural Nets

    Jasraj Singh, Enea Monzio Compagnoni, Antonio Orvieto · PDF
  80. Vision Language Models Inherit Human Color Perception

    Core Francisco Park · PDF
  81. Weight Decay Improves Language Model Plasticity

    Tessa Han, Sebastian Bordt, Hanlin Zhang, Sham M. Kakade · PDF
  82. What Flow-Matching Brings to TD Learning?

    Bhavya Kumar Agrawalla, Michal Nauman, Aviral Kumar · PDF
  83. When Does Diffusion Help? PDE-Inspired Optimization on Fragmented and Noisy Data

    Rahul D Ray · PDF
  84. When Does Meta Learning Actually Help? A Scientific Study of Physical Inverse Problems

    Rahul D Ray · PDF
  85. When does Observational Data Teach Latent Dynamics? Understanding Control Misalignment with Synthetic Tasks

    Kento Nishi, Raphael Tang, Karun Kumar, Core Francisco Park, Hidenori Tanaka · PDF
  86. When to restart? Exploring escalating restarts on convergence

    Ayush K. Varshney, Sarunas Girdzijauskas, Konstantinos Vandikas, Aneta Vulgarakis Feljan · PDF
  87. Which Sparse Code? Identifiability Failures in SAE Inference

    Alessa Carbo, Eric Nalisnick · PDF
  88. Zeroth-Order Optimization at the Edge of Stability

    Minhak Song, Liang Zhang, Bingcong Li, Niao He, Michael Muehlebach, Sewoong Oh · PDF