ICML 2026 Workshop on Foundations of Deep Generative Models: Understanding Memorization, Generalization, and Reasoning

ICML 2026 FoGen Workshop

Submission deadline
May 9, 2026, 11:59 UTC
imported from OpenReview — check the website for extensions
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).

Accepted papers (193)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Theoretical Analysis of Curriculum Training in Diffusion Models

    Jiawei Sun, Shuai Zhang, Hongkang Li, Sijia Liu, Pin-Yu Chen, Meng Wang · PDF
  2. A Theoretical Analysis of Why Masked Diffusion Models Mitigate the Reversal Curse

    Moongyu Jeon, Sangwoo Shin, Bumjun Kim, Kyelim Lee, Albert No · PDF
  3. A Unified Perspective on Task Retrieval and Learning in In-Context Learning on Markov Data

    Peng Wang, Huikang Liu, Zhiyuan Huang, Wendong Li, Po Chen, Rujun Jiang · PDF
  4. Accelerating Discrete Diffusion Models with Parallel-In-Time Sampling

    Yu Yao, Huanjian Zhou, Andi Han, Wei Huang, Masashi Sugiyama · PDF
  5. Active Flow Expansion for Out-of-Distribution Discovery: from Theory to Molecules

    Riccardo De Santi, Bruce D Lee, Cristian Perez Jensen, Kimon Protopapas, Sophia Tang, Cheng-Hao Liu, Pranam Chatterjee, Yisong Yue, Andreas Krause · PDF
  6. Amortising Bayesian Experimental Design for Sequential Information Gathering in LLMs

    Jakob Hartmann, James Harvey, Jhonathan Navott, Erik Y. Wang, Luckeciano Carvalho Melo, Flaviu Cipcigan, Cheng Zhang, Alessandro Abate · PDF
  7. An Isotropic Approach to Efficient Uncertainty Quantification with Gradient Norms

    Nils Grünefeld, Jes Frellsen, Christian Hardmeier · PDF
  8. Attention as In-Context Empirical Bayes: A Two-Stage View via Particle Dynamics

    Matthew Smart, Soumya Ganguly, Nilava Metya, Alexandre V. Morozov, Anirvan M. Sengupta · PDF
  9. Autoregressive Ranking: Bridging the Gap Between Dual and Cross Encoders

    Benjamin Rozonoyer, Chong You, Michael Boratko, Himanshu Jain, Nilesh Gupta, Srinadh Bhojanapalli, Andrew McCallum, Felix X. Yu · PDF
  10. Benchmarking Multimodal Personalized Reasoning of Vision-Language Models in the Wild

    Dohyun Kim, Junyong Lhim, Kwanyong Park · PDF
  11. Benign Overfitting Does Not Occur in Diffusion Models

    Tyler Farghly, Benjamin Dupuis, Alain Oliviero Durmus, Umut Simsekli · PDF
  12. Beyond Pixel Space: Frequency-Domain Uncertainty for Structure Aware Diffusion Guidance

    Tianqi Zhao, Xixi Liu, Yingzhen Li, Zhengrui Xiang, Liangrui Peng · PDF
  13. Beyond Power Spectra: Cross-Frequency Interactions in Generative Dynamics

    Amir Mehrpanah, Mohammed Al-Jaff, Matteo Gamba, Hossein Azizpour · PDF
  14. Beyond Raw Competence: Logical Equivariance in Diffusion Language Models

    Sunwoo Hong, Seo Hyun Kim, Younwoo Choi, Chen-Hao Chao, Se-Young Yun, Rahul G Krishnan · PDF
  15. Bidirectional Trajectory Smoothing for Training-Free Image Generation with Rectified Flows

    Yan Luo, Henry Huang, Mengyu Wang · PDF
  16. Blind denoising diffusion models and adaptive sampling algorithms

    Zahra Kadkhodaie, Aram-Alexandre Pooladian, Sinho Chewi, Eero P Simoncelli · PDF
  17. Boltz-Perturb: Probing Generalization in Co-Folding Models via Inference-Time Perturbation

    Hyeyun Jung, Alan C Cheng, BoRam Lee · PDF
  18. Brain-Measurable Diffusion Decoding: Auditing Information Provenance in fMRI Reconstruction

    Chun-Mei Tseng · PDF
  19. Chain-of-Generation: Progressive Latent Diffusion for Text-Guided Molecular Design

    Lingxiao Li, Haobo Zhang, Bin Chen, Jiayu Zhou · PDF
  20. Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation

    Young Kyung Kim, Oded Schlesinger, Yuzhou Zhao, J. Matias Di Martino, Guillermo Sapiro · PDF
  21. Chain-of-Thought Gradient Descent

    Hong-Yu Chen, Venkat Sripad Ganti, Jerry Yao-Chieh Hu, Hude Liu, Han Liu · PDF
  22. Characterizing Memorization in Diffusion Language Models: Generalized Extraction and Sampling Effects

    Xiaoyu Luo, Wenrui Yu, Qiongxiu Li, Johannes Bjerva · PDF
  23. Complexity-Stratified Evaluation Reveals Shortcut Regimes in Rotational Novel View Synthesis

    Rohan Keyur Dalal, Han Lee, Irene Tang · PDF
  24. Compositional Flow Matching with Factored Velocity Fields

    Avery Hee-Woon Ryoo, Dane Malenfant, Matthew G Perich, Guillaume Lajoie · PDF
  25. Context Over Content: Exposing Evaluation Faking in Automated Judges

    Manan Gupta, Inderjeet Jayakumar Nair, Lu Wang, Dhruv Kumar · PDF
  26. Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation

    Siyi Chen, Shaowei Liu, Yixuan Jia, Zian Wang, Huan Ling, Qing Qu, Jun Gao · PDF
  27. Demystifying the Slash Pattern in Attention: The Role of RoPE

    Yuan Cheng, Fengzhuo Zhang, Yunlong Hou, Cunxiao Du, Chao Du, Tianyu Pang, Aixin Sun, Zhuoran Yang · PDF
  28. Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations

    Manan Gupta, Dhruv Kumar · PDF
  29. Diffusion Model's Generalization Can Be Characterized by Inductive Biases toward a Data-Dependent Ridge Manifold

    Ye He, Yitong Qiu, Molei Tao · PDF
  30. Diffusion Models for Inverse Problems on Riemannian Manifolds

    Ivan Lee, Xingyu Xu, Yuejie Chi · PDF
  31. Distributional Biases in Post-Training: A Markovian Analysis of Reasoning Trajectories

    Dake Bu, Wei Huang, Andi Han, Bo Xue, Hau-San Wong, Qingfu Zhang, Taiji Suzuki, Atsushi Nitanda · PDF
  32. Distributional Readout: A Memorization Regime in Autoregressive Generative Models

    Anany Kotawala · PDF
  33. Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework

    Xiaoyu Luo, Yiyi Chen, Qiongxiu Li, Johannes Bjerva · PDF
  34. Do Thinking Tokens Help with Safety?

    Narutatsu Ri, Abhishek Panigrahi, Sanjeev Arora · PDF
  35. DPMI: A Principled Index for Neural Polysemanticity via Dirichlet Process Mixture Modeling

    Manan Gupta, Dhruv Kumar · PDF
  36. DPRM: A Plug-in Token-Ordering Module for Diffusion Language Models

    Dake Bu, Wei Huang, Andi Han, Hau-San Wong, Qingfu Zhang, Taiji Suzuki, Atsushi Nitanda · PDF
  37. Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?

    Rui Zhao, Kaiming Yang, Jifeng Zhu, Siyang Chen, Ziqi Wang, Weijia Wu, Kevin Qinghong Lin, Mike Zheng Shou · PDF
  38. DUEL: Exact Likelihood for Masked Diffusion via Deterministic Unmasking

    Gilad Turok, Yair Schiff, Christopher De Sa, Volodymyr Kuleshov · PDF
  39. DVD: Deterministic Video Depth Estimation with Generative Priors

    Hongfei Zhang, Harold Haodong Chen, Chenfei Liao, Jing He, Zixin Zhang, Haodong Li, Yihao Liang, Kanghao Chen, Bin Ren, Xu Zheng, Shuai Yang, Kun Zhou, Yinchuan Li, Nicu Sebe, Ying-Cong Chen · PDF
  40. Early Semantic Commitment in Diffusion Sampling

    Patrick Reichherzer · PDF
  41. EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL

    Lunjun Zhang, Jimmy Ba · PDF
  42. Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer

    Baris Askin, Muhammed Ustaomeroglu, Anupam Nayak, Gauri Joshi, Guannan Qu, Carlee Joe-Wong · PDF
  43. Enhancing Knowledge Injection with Surrounding Backgrounds in Continual Training LLMs

    Zichen TANG, Zhenheng Tang, Yifan Hou, Peijie Dong, Xiang Liu, Shaohuai Shi, Xiaowen Chu, Bo Li · PDF
  44. Evaluating Spatial World Modeling in Video Generators via 3D Camera Trajectory Generation

    Tianlong Wang, Lehan Yang, Yuheng Liu, Hanzhang Yuan, Wenhao Zhang, Pinqiao Wang, Yifan Li, Yu Kong, Sheng Li · PDF
  45. Evaluating the Representation Space of Diffusion Models via Self-Supervised Principles

    Xiao Li, Yixuan Jia, Zekai Zhang, Xiang Li, Lianghe Shi, Jinxin Zhou, Zhihui Zhu, Liyue Shen, Qing Qu · PDF
  46. Evolutionary System Prompt Learning for Reinforcement Learning in LLMs

    Lunjun Zhang, Ryan Chen, Bradly C. Stadie · PDF
  47. Extracting the Data Manifold from Diffusion Models via a Score-Based Non-Conformal Riemannian Metric

    Shinnosuke Saito, Takashi Matsubara · PDF
  48. Feedforward Mixing is as Sharp as it is Slow in Reverse

    Benedict Aaron Tjandra, Avi Wigderson, João G. M. Araújo, Alex Vitvitskyi, Federico Barbero, Petar Veličković · PDF
  49. FERMI: Feature-Mapping for Relational Membership Inference on Tabular Diffusion Models

    Abtin Mahyar, Masoumeh Shafieinejad, Yuhan Liu, Xi He · PDF
  50. Few-Shot Learning in Video Diffusion Models

    Pablo Acuaviva, Aram Davtyan, Mariam Hassan, Sebastian Stapf, Ahmad Rahimi, Alexandre Alahi, Paolo Favaro · PDF
  51. Fine-Tuning Dynamics of In-Context Factual Recall in Transformers

    Ruomin Huang, Eshaan Nichani, Jason D. Lee, Rong Ge · PDF
  52. Fixed-Point Reasoning: Stable and Adaptive Deep Looped Models

    Sajad Movahedi, Shlomo Libo Feigin, Vera Milovanović, Alexander Theus, Thomas Hofmann, Valentina Boeva, T. Konstantin Rusch, Antonio Orvieto · PDF
  53. Flow Matching on General Manifolds via Pulling Back Geodesic Convex Latent Manifolds

    Neil He, Ge Liu · PDF
  54. ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing

    Yixuan Jia, Siyi Chen, Yida Pan, Xiao Li, Lianghe Shi, Chanyong Jung, Haijie Yuan, Ismail Alkhouri, Yue Cynthia Wu, Saiprasad Ravishankar, Jeffrey A Fessler, Qing Qu · PDF
  55. Forward-Chaining Temporal Point Process

    Chao Yang, Wendi Ren, Shuang Li · PDF
  56. Frontier Language Models Struggle to Copy: Text Can Be Better Viewed in 2D

    Haodong Wen, Yiran Zhang, Yingfa Chen, Kaifeng Lyu · PDF
  57. Frontier Learning: Training LLM Reasoners at the Edge of Capability

    Shyam Sundhar Ramesh, Robin Faro, Enea Monzio Compagnoni, Ilija Bogunovic, Aurelien Lucchi · PDF
  58. Gating Enables Curvature: A Geometric Expressivity Gap in Attention

    Satwik Bathula, Anand Joshi · PDF
  59. Generalization of Diffusion Models Arises with a Balanced Representation Space

    Zekai Zhang, Xiao Li, Xiang Li, Lianghe Shi, Meng Wu, Molei Tao, Qing Qu · PDF
  60. Growing Images: Spatial Scheduling in Diffusion Inpainting

    Matteo Vilucchio, Luca Pesce, Florent Krzakala · PDF
  61. Hazard Compression: Catastrophic Forgetting in Diffusion-Based Generative Replay under Distribution Shift

    Wei Gao, Ellie Du, Akeel Majeed · PDF
  62. How Cross-Entropy Shapes Representation Geometry: A Spectral Study on Cycle Graphs

    Tina Behnia, Christos Thrampoulidis · PDF
  63. How Data Shapes RoPE Frequency Usage: From Positional Scale Matching to Length Generalization

    Xinyi Wu, Siyuan Liu, Ali Jadbabaie · PDF
  64. How Deep Are Deep GPs, Really? A Sharp Threshold and a Non-Gaussian Limit for Compositional GPs

    Mark Kozdoba, Shie Mannor · PDF
  65. How Recursive Training Collapses and What Can Be Done About It

    Ajaz A Bhat · PDF
  66. How to Train Your Latent Diffusion Language Model Jointly With the Latent Space

    Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov, Nikita Gushchin, Ilya Koziev, Alexander Korotin, Dmitry Vetrov · PDF
  67. Imagined Memorisation: Training-Data Leakage in Model-Based RL World Models

    Wang Ngai Ng · PDF
  68. In- and Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks

    Yannic Neuhaus, Chanoknan Tanchotikul, Nicolas Flammarion, Matthias Hein, Francesco Croce · PDF
  69. In-Place Feedback: Reliable Refinement for Multi-Turn Expert-LLM Collaboration

    Youngbin Choi, Minjong Lee, Saemi Moon, Seunghyuk Cho, Chaehyeon Chung, MoonJeong Park, Dongwoo Kim · PDF
  70. Interdomain Attention: Beyond Token-Level Key-Value Memory

    Naoki Kiyohara, Harrison Bo Hua Zhu, Riccardo El Hassanin, Zhuo Sun, Wenlong Chen, Samir Bhatt, Yingzhen Li · PDF
  71. Internal Data Repetition Destroys Language Models

    Jessica Chudnovsky, Joshua Kazdan, Noam Itzhak Levi, Rylan Schaeffer, Yegor Denisov-Blanch, Sanmi Koyejo, David L. Donoho · PDF
  72. Internal Tree Search Execution in Transformers

    Hibiki Fukushima, Taiji Suzuki · PDF
  73. Interpreting Latent CoT Reasoning as Dynamical Systems

    Shreya Sanjay Boyane, Sabari Iyyappan Duraipandian, Manju Nagesh, Jerome Francis, Archana Vaidheeswaran, Kevin Zhu · PDF
  74. Intrinsic Wasserstein Rates for Score-Based Generative Models on Smooth Manifolds

    Guoji Fu, Taiji Suzuki, Wee Sun Lee, Atsushi Nitanda · PDF
  75. Inverse-Confidence Sampling for Continuous Diffusion Language Models

    Andrei Rekesh, Jarrid Rector-Brooks, Cheng-Hao Liu · PDF
  76. Is your Flow Matching Model Really Generalising? A Path-Length Diagnostic

    Eshant English, Wei Huang, Taiji Suzuki · PDF
  77. JUMP: Single-Pass Membership Inference on Fine-Tuned Diffusion Language Models

    Yeachan Jun, Albert No · PDF
  78. Just Add More Capacitors: Eliminating Flux Leakage in Electrostatic Field Matching

    Daniil Shlenskii, S. I. Manukhov, Alexander Kolesov, Alexander Varlamov, Nazar Buzun, V.V. Palyulin, Alexander Korotin · PDF
  79. LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling

    Yuxin Chen, Chumeng Liang, Hangke Sui, Ruihan Guo, Chaoran Cheng, Jiaxuan You, Ge Liu · PDF
  80. Language Diffusion Models are Associative Memories Capable of Retrieving Unseen Data

    Bao Pham, Mohammed J Zaki, Luca Ambrogioni, Dmitry Krotov, Matteo Negri · PDF
  81. Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering

    Manan Gupta, Dhruv Kumar · PDF
  82. Learn from Your Mistakes: Self-Correcting Masked Diffusion Models

    Yair Schiff, Omer Belhasin, Roy Uziel, Guanghan Wang, Marianne Arriola, Gilad Turok, Ran Zilberstein, Michael Elad, Volodymyr Kuleshov · PDF
  83. Learned Relay Representations for Forward-Thinking Discrete Diffusion Models

    Benjamin Rozonoyer, Jacopo Minniti, Dhruvesh Patel, Neil Band, Joey Bose, Tim G. J. Rudner, Andrew McCallum · PDF
  84. Learning Human Habits with Rule-Guided Active Inference

    Zhiren Gong, Chao Yang, Wendi Ren, Shuang Li · PDF
  85. Learning Manifold Data with Flow Matching

    Sophia Pi, Mingcheng Lu, Jerry Yao-Chieh Hu, Maojiang Su, Weimin Wu, Han Liu · PDF
  86. Learning to Trade Like an Expert: Cognitive Fine-Tuning for Stable Financial Reasoning in Language Models

    Yuchen Pan, Soung Chang Liew · PDF
  87. Leveraging Instruction Tuning and Merging for Reasoning Model Adaptation

    Yu-Du Feng, Niels Mündler, Mark Vero, Martin Vechev · PDF
  88. LLM Generation Novelty Through the Lens of Semantic Similarity

    Philipp Davydov, Ameya Prabhu, Matthias Bethge, Elisa Nguyen, Seong Joon Oh · PDF
  89. LLM-WikiRace: A Benchmark for Planning and Reasoning over Real-World Knowledge Graphs

    Juliusz Ziomek, William Bankes, Lorenz Wolf, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic · PDF
  90. Local Coverage Governs Memorization in Diffusion Models

    Claudia Merger, Sebastian Goldt · PDF
  91. Local Manifold Identification with Latent Linear Models and OT Flows

    Sherman Khoo, Song Liu, Mark Beaumont · PDF
  92. Mamba as Measure-Valued Associative Memory: Infinite-Context Limits and Minimax-Optimal Learning

    Zeyu Bao, Tianyao Zhang · PDF
  93. Manifold-Guided Attention Steering

    Ian Li, Kapilesh Guruprasad, Raunak Sengupta, Ninad Satish, Loris D'Antoni, Rose Yu · PDF
  94. Masked Distillation: Internalizing Chain-of-Thought in Small Language Models

    Durgesh Kalwar, Vardhan Palod, Subbarao Kambhampati · PDF
  95. MCLR: Improving Conditional Modeling via Inter-Class Likelihood-Ratio Maximization and Unifying Classifier-Free Guidance with Alignment Objectives

    Xiang Li, Yixuan Jia, Xiao Li, Jeffrey A Fessler, Rongrong Wang, Qing Qu · PDF
  96. Measurement-Consistent Langevin Corrector for Stabilizing Latent Diffusion Inverse Problem Solvers

    Lee Hyoseok, Sohwi Lim, Eunju Cha, Tae-Hyun Oh · PDF
  97. Memorization Detection in Diffusion Models via Text Embedding Interpolation

    Changsu Shin, Jinseong Park, Sungyoon Lee · PDF
  98. Memorization, Retrieval, and Reasoning in LLM-Driven EDA: A Case Study in FPGA Timing Closure

    Saher Elsayed · PDF
  99. Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

    Victor Conchello Vendrell, Arnau Padrés Masdemont, Niccolò Grillo, Jordi Ros-Giralt, Arash Behboodi, Fabio Valerio Massoli · PDF
  100. Midpoint Generative Models

    Daniil Shlenskii, Nikita Gushchin, Lev Novitskiy, Dmitry V. Dylov, Alexander Korotin · PDF
  101. MINDGRAPH: Faithful Concept-Graph Memory for Long-Context Reasoning

    Xiangwen Wang, Varun Chandrasekaran · PDF
  102. Mixture-Greedy for Online Generative Model Selection: Is UCB Necessary in Diversity-Aware Multi-Armed Bandits?

    Bahar Dibaei Nia, Farzan Farnia · PDF
  103. Model Capacity Determines Grokking through Competing Memorisation and Generalisation Speeds

    YIDING SONG, Hanming Ye · PDF
  104. Neural Network-Based Diffusion Models Adapt to Low-Dimensional Multi-Modal Data Structure

    Jingda Wu, Changxiao Cai · PDF
  105. Norm-Controlled Likelihood Guidance for Diffusion-based Inverse Solver

    Yi Zhang, Xingwu Chen, Difan Zou · PDF
  106. Not Every Time and Frequency Need to Be Forgotten in Diffusion Unlearning

    Jinseong Park, Mijung Park · PDF
  107. On approximation and estimation of Schrödinger potentials without the curse of dimensionality

    Artsiom Patarusau, Nikita Puchkin, Konstantin Yakovlev · PDF
  108. On the approximation of Schrödinger bridge potentials

    Denis Belomestny, Alexey Naumov, Nikita Puchkin, Denis Suchkov · PDF
  109. On the Memorization of Consistency Distillation for Diffusion Models

    Bingqing Jiang, Difan Zou · PDF
  110. On the Policy Gradient Foundations of Group Relative Policy Optimization: Credit Assignment, Gradient Sparsity, and Rank Collapse

    Amritansh Mishra, Supriyo Chakraborty, Berkcan Kapusuzoglu · PDF
  111. On the Relationship between the Choice of Representation and In-context Learning

    Ioana Marinescu, Kyunghyun Cho, Eric Karl Oermann · PDF
  112. On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity

    Andrei Liviu Nicolicioiu, Mohammad Pezeshki, Aaron Courville · PDF
  113. One Coupling to Rule Them All: Optimal Transport as the Unifying Geometry of Diffusion Models, Flow Matching, and Reasoning in Deep Generative Models

    Mohammad Sajjad Ghaemi · PDF
  114. Pathwise Transported Memory Priors for Autoregressive Generative Models

    Tomoya Mizuguchi, Bum Jun Kim · PDF
  115. Personalized Federated Training of Latent Diffusion Models with Privacy Guarantees

    A F M Mahfuzul Kabir, Lingxiao Wang · PDF
  116. Personalized Privacy Control in LLMs via Attention Head Intervention

    Junseok Kim, Nakyeong Yang, Kyomin Jung · PDF
  117. Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them

    Woojung Han, Seil Kang, Youngjun Jun, Min-Hung Chen, Fu-En Yang, Seong Jae Hwang · PDF
  118. Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation

    Yi Zhang, Peng Wang, Difan Zou · PDF
  119. POLYOMINOGEN: A Controlled Testbed for Understanding Memorization and Compositional Generalization in Conditional Diffusion Models

    Aishi Huang · PDF
  120. Position Augmentation: Reducing RoPE Extrapolation Cliffs via Random Position Scaling During Training

    Zacharie Bugaud · PDF
  121. Prior Dominance in Audio-Visual LLMs: When Generative Models Memorize Over Reasoning Under Cross-modal Conflict

    Adarsh Sudheer, David Li, Omar El-Banna, Ishaan Kodarapu, Arjun Bahuguna, Vasu Sharma · PDF
  122. Probe Choice Changes Canary-Memorization Verdicts: Three Post-Hoc Disagreement Case Studies in a Text-Dominant LoRA-Tuned Autoregressive Testbed

    Zhichao Fan, Zexin Zhuang, Yanhang Li · PDF
  123. Quantifying the Effect of Test Set Contamination on Generative Evaluations

    Rylan Schaeffer, Joshua Kazdan, Baber Abbasi, Ken Liu, Brando Miranda, Ahmed M Ahmed, Fazl Barez, Abhay Puri, Stella Biderman, Niloofar Mireshghallah, Sanmi Koyejo · PDF
  124. Quantifying the Memorization-to-Generalization Transition: Scaling Laws and Phase Structure in Grokking

    Anish Kataria · PDF
  125. Reasoning Across Space: Tiny Recursive Models for Spatial Omics

    Dhruva Rajwade, Marianna Rapsomaniki · PDF
  126. Reasoning as State Transition: A Representational Analysis of Reasoning Evolution in Large Language Models

    Siyuan Zhang, Jialian Li, Yichi Zhang, Xiao Yang, Yinpeng Dong, Hang Su · PDF
  127. Reasoning Phases Are Continuous, Not Discrete: Evidence from Switching Linear Dynamical Systems Applied to Chain-of-Thought Residual Streams

    Manan Gupta, Dhruv Kumar · PDF
  128. ReCAST: Probing Sparse Reference Use in In-Context Image Generation

    YeonGyu Han, Junah Jung, Dongheon Lee · PDF
  129. Reducing Diffusion Model Memorization with Higher Order Langevin Dynamics

    Benjamin Sterling, Monica Bugallo, Tom Tirer · PDF
  130. Registers Matter for Pixel-space Diffusion Transformers

    Nikita Starodubcev, Ilia Sudakov, Ilya Drobyshevskiy, Artem Babenko, Dmitry Baranchuk · PDF
  131. Reinforcement Learning with Promising Tokens for Large Language Models

    Jing-Cheng Pang, Liang Lu, XianTang, KUN JIANG, Sijie Wu, Kai Zhang, Xubin Li · PDF
  132. Relative Score Policy Optimization for Diffusion Language Models

    Zichao Yu, Shengze Xu, Bingqing Jiang, Wenyi Zhang, Difan Zou · PDF
  133. Rethinking "RL Generalizes, SFT Memorizes": The Role of SFT Data

    Yunlong Hou, Fengzhuo Zhang, Yuan Cheng, Jiachun Pan, Xingyao Li, Zhuoran Yang · PDF
  134. Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

    Qihan Ren, Wang Peng, Ruikun Cai, Shuai Shao, Dadi Guo, Yuejin Xie, Yafu Li, Quanshi Zhang, Xia Hu, Jing Shao, Dongrui Liu · PDF
  135. Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

    Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding · PDF
  136. Rethinking On-Policy Self-Distillation for Thinking Models

    Simran Kaur, Narutatsu Ri, Yinghui He, Liam H Fowl, Sanjeev Arora · PDF
  137. Retrieval Dwelling: A Principled Sampling Strategy for Exploiting Spurious State Exploration

    Rohit Sinha, Saroj Kumar · PDF
  138. Revisiting Spectral Representations in Generative Diffusion Models

    Yuehao Wang, Peihao Wang, Hanwen Jiang, Ziyi Yang, Qixing Huang, Zhangyang Wang · PDF
  139. RLVR Training of LLMs Does Not Improve Thinking Ability for General QA: Evaluation Method and a Simple Solution

    Kaiyuan Li, Jing-Cheng Pang, Yang Yu · PDF
  140. SALSA: State Augmentation via Learned Selective Attention

    Joel Manu, Frederick Hoffman, Krithik Ramesh · PDF
  141. Sample Efficient Generative Model for Molecular Dynamics Trajectories via Twisted Sequential Monte Carlo

    Qijia Jiang · PDF
  142. Scale Dependent Data Duplication

    Joshua Kazdan, Noam Itzhak Levi, Rylan Schaeffer, Jessica Chudnovsky, Abhay Puri, Bo He, Mehmet Donmez, Sanmi Koyejo, David L. Donoho · PDF
  143. Scaling with Recursion in Masked Discrete Diffusion Models

    Alba Carballo-Castro, Julianna Piskorz, Paulius Rauba, Mihaela van der Schaar, Pascal Frossard · PDF
  144. SciReview: Diagnosing Compositional Scientific Reasoning in Frontier Models

    Sushant Mehta · PDF
  145. Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision

    Yinghui He, Simran Kaur, Adithya Bhaskar, Yongjin Yang, Jiarui Liu, Narutatsu Ri, Liam H Fowl, Abhishek Panigrahi, Danqi Chen, Sanjeev Arora · PDF
  146. Separating Intrinsic Ambiguity from Estimation Uncertainty in Deep Generative Models for Linear Inverse Problems

    Yuxin Guo, Dongrui Deng, Pulkit Grover · PDF
  147. Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models

    Advaith Ravishankar, Serena Liu, Mingyang Wang, Todd Y. Zhou, Jeffrey Zhou, Arnav Sharma, Ziling Hu, Léopold Das, Abdulaziz Sobirov, Faizaan Siddique, Freddy Yu, Julia Seungjoo Baek, Yan Luo, Mengyu Wang · PDF
  148. Sobolev Regularized Score Difference Estimation in Diffusion Models

    Chenghan Xie, Jose Blanchet, Renyuan Xu · PDF
  149. Solve the Loop: Attractor Models for Language and Reasoning

    Jacob Fein-Ashley, Paria Rashidinejad · PDF
  150. Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning

    Prakhar Gupta, Garv Shah, Satyam Goyal, Anirudh Kanchi · PDF
  151. Spectral Signatures of Memorization in Diffusion Models: A Multi-Scale Diagnostic Study

    Raghav Agarwal, Aayam Bansal, Ishaan Gangwani, Kabir Jain · PDF
  152. SRM-LoRA: Sub-Riemannian-Style Updates for Mitigating LLM Hallucination in Low-Rank Adaptation

    Changgyu Boo · PDF
  153. Steering Dynamical Regimes of Diffusion Models by Breaking Detailed Balance

    Haiqi Lu, Ying Tang · PDF
  154. Structure Over Scale: Rethinking Adaptation for Reinforcement Learning with Verifiable Rewards

    Allan Kazakov, Abdurrahman Javat · PDF
  155. Structuring The Future: Diffusion LLM Speculative Decoding via Calibrated Draft Graphs

    Sudhanshu Agrawal, Risheek Garrepalli, Raghavv Goel, Christopher Lott, Fatih Porikli, Mingu Lee · PDF
  156. Synthesizability-Aware Materials Generation with Target Properties via Reinforcement Learning

    Duc Dat Dang, Junwu Chen, Philippe Schwaller · PDF
  157. Temporal Backtracking Search for Test-time Generative Video Reasoning

    SeJoon Jun, Zheng Ding, Huangyuan Su, Weirui Ye, Yilun Du · PDF
  158. Temporal Task Diversity: Inductive Biases Under Non-Stationarity in Synthetic Sequence Modelling

    Afiq Abdillah Effiezal Aswadi, Oliver Britton, Ross Baker, Matthew Farrugia-Roberts · PDF
  159. Test of Time: Rethinking Temporal Signal of Benchmark Contamination

    Terry Jingchen Zhang, Gopal Dev, Ning Wang, Max Obreiter, Wenyuan Jiang, Punya Syon Pandey, Keenan Samway, Yinya Huang, Bernhard Schölkopf, Mrinmaya Sachan, Zhijing Jin · PDF
  160. The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models

    Dueun Kim, Albert No · PDF
  161. The Distillation Game: Adaptive Attacks & Efficient Defenses

    Youssef Allouah, Mahdi Haghifam, Sanmi Koyejo, Reza Shokri · PDF
  162. The Surprising Effectiveness of Deleting Weights in LLM Reasoning and Adaptation

    Jack Lu, Zhenbang Yang, Mike Lasby, Tejas Pote, Yani Ioannou, Mengye Ren · PDF
  163. TIGER: Bridging the Multimodal Reasoning-Access Gap via Modality Counterfactuals

    Gregory Kang Ruey Lau, Minh Huynh Nguyen, Bryan Kian Hsiang Low · PDF
  164. Tight PAC-Bayes Generalisation Guarantees for Large Language Model Safety Monitoring

    Tom A. Lamb, Philip Torr, Tim G. J. Rudner · PDF
  165. Time-Correlated Video Bridge Matching

    Viacheslav Vasilev, Arseny Ivanov, Nikita Gushchin, Maria Kovaleva, Alexander Korotin · PDF
  166. Towards Effective Theory of LLMs: A Representation Learning Approach

    Muhammed Ustaomeroglu, Guannan Qu · PDF
  167. Tracing Uncertainty in Language Model "Reasoning"

    Nils Grünefeld, Bertram Højer, Philipp Mondorf, Barbara Plank, Anna Rogers, Christian Hardmeier, Stefan Heinrich, Jes Frellsen · PDF
  168. Training In-Context and In-Weights Mixtures Via Contrastive Context Sampling

    Deevyanshu Malu, Deeptanshu Malu, Sunita Sarawagi, Aditya Nemiwal · PDF
  169. Transformers Learn the Optimal DDPM Denoiser for Multi-Token GMMs

    Hongkang Li, Hancheng Min, Rene Vidal · PDF
  170. TUBE: Tangent Upper Bound on Evidence for Discrete Diffusion Language Models

    Arseny Ivanov, Sergei Kholkin, Vladislav Gromadskii, Grigoriy Ksenofontov, Ivan Oseledets, Alexander Korotin · PDF
  171. Understanding Flatness in Generative Models: Its Role and Benefits

    Taehwan Lee, Kyeongkook Seo, Jaejun Yoo, Sung Whan Yoon · PDF
  172. Understanding Generalization in Diffusion Distillation via Probability Flow Distance

    Huijie Zhang, Zijian Huang, Siyi Chen, Jinfan Zhou, Zekai Zhang, Peng Wang, Qing Qu · PDF
  173. Understanding LLM generalization through fine-tuning

    Harshul Basava, Whang Shih Ee, Keshav Shenoy, Emil Ryd · PDF
  174. Understanding Solver-Induced Variance Distortion in Conditional Diffusion Regression

    Jaemin Song, Jaegi Jeon · PDF
  175. Universality, Composition Generalization, and Algorithm Emulation All In-Context

    Jerry Yao-Chieh Hu, Hong-Yu Chen, Po-Chiao Lin, Maojiang Su, Han Liu · PDF
  176. Unlearning for One-Step Generative Models via Unbalanced Optimal Transport

    Hyundo Choi, Junhyeong An, Jinseong Park, Jaewoong Choi · PDF
  177. Unlocking the Duality between Flow and Field Matching

    Daniil Shlenskii, Alexander Varlamov, Nazar Buzun, Alexander Korotin · PDF
  178. Velocity Adaptation for Flow-Matching Models

    Sai Praneeth, Gouri Shanker, Harshit Dubey, Chandra Sekhar Seelamantula · PDF
  179. ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?

    Haonan Han, Jiancheng Huang, Xiaopeng Sun, Jun-Yan He, Rui Yang, Jie Hu, Xiaojiang Peng, Lin Ma, Xiaoming Wei, Xiu Li · PDF
  180. What a Small Autoregressive Transformer Briefly Learns and Then Forgets: Transient Structural Capabilities and Probe-Specific Head Repurposing

    Juliana Li, Diya Sreedhar · PDF
  181. What Architectural Inductive Bias Makes Diffusion Models Succeed? A Perspective from the Implicit Regularization of Gradient Descent

    Tongtong Liang, Esha Singh, Rahul Parhi, Alex Cloninger, Yu-Xiang Wang · PDF
  182. What to Forget in Unlearning? Forget Set Curation for Language Models

    Animesh Jha, Arpandeep Khatua, Youssef Allouah, Sanmi Koyejo · PDF
  183. What You Predict Shapes How You Memorize: Target-Parameterization and Memorization Dynamics in Flow Matching

    Mohammed Al-Jaff, Amir Mehrpanah, Gustav Eje Henter, Hossein Azizpour, Danica Kragic · PDF
  184. When Does Diffusion Purification Amplify Perturbations?

    Haibo Zhang, Saiyue Lyu, Yuntao Wang, Takeshi Saitoh · PDF
  185. Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR

    Soeun Kim, Albert No · PDF
  186. Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement

    Injin Kong, Hyoungjoon Lee, Yohan Jo · PDF
  187. Where’s the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions

    Nicole H. Ma, Nick Rui · PDF
  188. Why Alignment Must Precede Distillation: A Minimal Working Explanation

    Sungmin Cha, Kyunghyun Cho · PDF
  189. Why Are Distribution-Matching Distilled Students Lazy? Understanding the Copying Behavior in Few-Step Distillation

    Shucheng Li, Iolo Jones, Alexander Tong, Michael M. Bronstein · PDF
  190. Why Does Pruning during Training Work? A Signal-to-Noise Analysis of Sparse Neural Network Training

    Shiyuan Ren, Jiarui Jiang, Miao Zhang, Xiang Deng, Min Zhang, Liqiang Nie · PDF
  191. Why is A+B Better Than B? A Simple Graph Perspective on Task Transfer

    Dang Nguyen, Jianhao Huang, Ali Payani, Baharan Mirzasoleiman · PDF
  192. Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

    Mujtaba farhan, Ashwinee Panda, Maheep Chaudhary, Sean Wu · PDF
  193. Within-Episode Failure Recovery in LLM Agents via Progress-Gated Dual-Process Routing

    Ankush Kadu, Aswanth Krishnan · PDF