ICLR 2026 · Past · Generative models · Theory

ICLR 2026 2nd Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy

ICLR 2026 DeLTa Workshop

Submission deadline
Feb 9, 2026, 12:59 UTC
imported from OpenReview — check the website for extensions
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).

Accepted papers (133)

Fetched from OpenReview (v2) on 2026-06-10.

  1. $\mathbf{R^3}$-Adapter: Progressive Residual Refinement and Representational Alignment for Personalized Image Generation

    Veddhanth Chakravarthy, Samir Kumar Das Mohapatra, Chandrakala Shanmuganathan · PDF
  2. $W_K, W_V$ is Probably All You Need: On the Necessity of the Query, Key, and Value Weight Triplet in Self-Attention Transformers

    Marko Karbevski, Antonij Mijoski · PDF
  3. A Complete Decomposition of Stochastic Differential Equations

    Samuel Duffield · PDF
  4. A Diffusive Classification Loss for Learning Energy-based Generative Models

    RuiKang OuYang, Louis Grenioux, José Miguel Hernández-Lobato · PDF
  5. A Geometric Perspective on Recursive Synthetic Training

    Patrick Batsell, Thomas Walker, Richard Baraniuk · PDF
  6. A Graph-Theoretical View of Space Folding via the Motzkin–Straus Framework

    Michal Lewandowski, Bernhard Heinzl, Roman Rainer, Bernhard Nessler, Bernhard A. Moser · PDF
  7. A Unified Density Operator View of Flow Control and Merging

    Riccardo De Santi, Malte Franke, Ya-Ping Hsieh, Andreas Krause · PDF
  8. Adapting Noise to Data by Quantile Learning

    Jannis Chemseddine, Gregor Kornhardt, Richard Duong, Gabriele Steidl · PDF
  9. AlphaQ: Calibration-Free Bit Allocation for Mixture-of-Experts Quantization

    Wanqi Yang, Yuexiao Ma, Alexander Conzelmann, Xiawu Zheng, Michael W. Mahoney, T. Konstantin Rusch, Shiwei Liu · PDF
  10. An Efficient Test-Time Scaling Approach for Image Generation

    Vignesh Sundaresha, Akash Haridas, Vikram Appia, Lav R. Varshney · PDF
  11. An Equivariance Toolbox for Learning Dynamics

    Yongyi Yang, Liu Ziyin · PDF
  12. ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling

    Yilang Zhang, Bingcong Li, Niao He, Georgios B. Giannakis · PDF
  13. AnimalBooth: Multimodal Feature Enhancement for Animal Subject Personalization

    Chen Liu, Haitao Wu, Kafeng Wang, Xiaowang Zhang, Weiran Huang · PDF
  14. AReUReDi: Annealed Rectified Updates for Refining Discrete Flows with Multi-Objective Guidance

    Tong Chen, Yinuo Zhang, Sophia Vincoff, Pranam Chatterjee · PDF
  15. Attention Projection Mixing with Exogenous Anchors

    Jonathan Su · PDF
  16. Avoid What You Know: Divergent Trajectory Balance for GFlowNets

    Pedro Dall'Antonia, Tiago Silva, Daniel Csillag, Salem Lahlou, Diego Mesquita · PDF
  17. B-DENSE: Branching For Dense Ensemble Network Supervision Efficiency

    Cherish Puniani, Tushar Kumar, Arnav Bendre, Gaurav Kumar, Shree Singhi · PDF
  18. Balancing Symmetry and Efficiency in Graph Flow Matching

    Benjamin Honoré, Alba Carballo-Castro, Yiming Qin, Pascal Frossard · PDF
  19. BézierFlow: Learning Bézier Stochastic Interpolant Schedulers for Few-Step Generation

    Yunhong Min, Juil Koo, Seungwoo Yoo, Minhyuk Sung · PDF
  20. BlockGen: Flexible Blockwise Sequence Modeling with Hybrid Samplers

    Justin Deschenaux, Caglar Gulcehre · PDF
  21. BSTabDiff: Block-Subunit Diffusion Priors for High-Dimensional Tabular Data Generation

    Al Zadid Sultan Bin Habib, Md Younus Ahamed, Prashnna Kumar Gyawali, Gianfranco Doretto, Donald Adjeroh · PDF
  22. CATS: Inference-aligned SFT for Diffusion LLMs via Context-sensitivity Aware Trajectory Sampling

    Seunghyuk Oh, Minjae Lee, Kevin Galim, Minseo Kim, Hyung Il Koo, Wonjun Kang, Hanbaek Lyu, Kangwook Lee · PDF
  23. CupOFMoCA: Coupled Objective-Guided Discrete Flows for Molecular Conjugate Assembly

    Ruoxi Zhang, Jiatao Gu, Pranam Chatterjee · PDF
  24. Curriculum Sampling: A Two-Phase Curriculum for Efficient Training of Flow Matching

    Pengwei Sun · PDF
  25. Data-Aware Random Feature Kernel for Transformers

    Amirhossein Farzam, Hossein Mobahi, Nolan Andrew Miller, Luke Sernau · PDF
  26. Decoding Large Language Diffusion Models with Foreseeing Movement

    Yichuan Mo, Quan Chen, Mingjie Li, Zeming Wei, Yisen Wang · PDF
  27. Decoupled Diffusion Solver for Inverse Problems on Function Spaces

    Thomas Y.L. Lin, Jiachen Yao, Lufang Chiang, Julius Berner, Anima Anandkumar · PDF
  28. DELTA: Robustly Training Diffusion Models with Weak Annotations

    Dong-Dong Wu, Jiacheng Cui, Wei Wang, Zhiqiang Shen, Masashi Sugiyama · PDF
  29. Demystifying Transition Matching: When and Why It Can Beat Flow Matching

    Jaihoon Kim, Rajarshi Saha, Minhyuk Sung, Youngsuk Park · PDF
  30. Designing Continuous Conditioning for GANs from WAE Latent Structure

    Pavlo Potapenko, Sebastien Bompas, Stefan Sandfeld · PDF
  31. Dichotomous Diffusion Policy Optimization

    Ruiming Liang, Yinan Zheng, Kexin Zheng, Tianyi Tan, Jianxiong Li, Liyuan Mao, Zhihao Wang, Guang Chen, Hangjun Ye, Jingjing Liu, Jinqiao Wang, Xianyuan Zhan · PDF
  32. Diffusion Models with Double Guidance

    Yanfeng Yang, Kenji Fukumizu · PDF
  33. Diffusion Policy Optimization without Drifting Apart

    Haozhe Jiang, Haiwen Feng, Jiantao Jiao, Angjoo Kanazawa, Nika Haghtalab · PDF
  34. Diffusion Schrödinger Bridge Matching: When Resampling Fails

    Teodora Reu, Michael M. Bronstein, Francisco Vargas · PDF
  35. DiffusionShield: A Watermarking Approach to Safeguarding Video Integrity Against Stable Diffusion

    Zuzanna Gawrysiak, Mateusz Gabor, Tomasz Hawro · PDF
  36. Dimension-Independent Convergence of Underdamped Langevin Monte Carlo in KL Divergence

    Shiyuan Zhang, Qiwei Di, Xuheng Li, Quanquan Gu · PDF
  37. Discrete Adjoint Schrödinger Bridge Sampler

    Wei Guo, Yuchen Zhu, Xiaochen Du, Juno Nam, Yongxin Chen, Rafael Gomez-Bombarelli, Guan-Horng Liu, Molei Tao, Jaemoo Choi · PDF
  38. Discrete Bridges for Mutual Information Estimation

    Iryna Zabarianska, Sergei Kholkin, Grigoriy Ksenofontov, Ivan Butakov, Alexander Korotin · PDF
  39. Discrete Diffusion Samplers and Bridges: Off-Policy Algorithms and Applications in Latent Spaces

    Arran Carter, Sanghyeok Choi, Kirill Tamogashev, Víctor Elvira, Nikolay Malkin · PDF
  40. Discrete Meanflow Training Curriculum

    Chia-Hong Hsu, Frank Wood · PDF
  41. Discriminative Multimodal Preference Models as Guidance for Personalized Image Generation

    Wenyi Mo, Ying Ba, Tianyu Zhang, Yalong Bai, Dimitris N. Metaxas · PDF
  42. Dynamic Mixture-of-Experts for Visual Autoregressive Model

    Jort Vincenti, Metod Jazbec, Guoxuan Xia · PDF
  43. Efficient Tail-Aware Generative Optimization via Flow Model Fine-Tuning

    Zifan Wang, Riccardo De Santi, Xiaoyu Mo, Michael M. Zavlanos, Andreas Krause, Karl Henrik Johansson · PDF
  44. Elucidating Guidance in Variance Exploding Diffusion Models: Fast Convergence and Better Diversity

    Ruofeng Yang, Yiyu Qiu, Bo Jiang, Cheng Chen, Shuai Li · PDF
  45. Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling

    Niclas Dern, Lennart Redl, Sebastian Pfister, Marcel Kollovieh, David Lüdke, Stephan Günnemann · PDF
  46. Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence

    Bingji Yi, Qiyuan Liu, Yuwei Cheng, Haifeng Xu · PDF
  47. Evaluating the Role of Great Pre-trained Diffusion Models in Few-shot Phase: Warm-up and Acceleration

    Ruofeng Yang, Yongcan Li, Bo Jiang, Cheng Chen, Shuai Li · PDF
  48. Expert-Data Alignment Governs Generation Quality in Decentralized Diffusion Models

    Marcos Villagra, Bidhan Roy, Raihan Seraj, Zhiying Jiang · PDF
  49. Exposing Diversity Bias in Deep Generative Models: Statistical Origins and Correction of Diversity Error

    Farzan Farnia, Mohammad Jalali, Azim Ospanov · PDF
  50. Flow Matching based Conditional Independence Tests and Causal Structure Learning

    Shunyu Zhao, Yanfeng Yang, Shuai Li, Kenji Fukumizu · PDF
  51. Flow Matching in the Low-Noise Regime: Pathologies and a Contrastive Remedy

    Weili Zeng, Yichao Yan · PDF
  52. FMMI: Flow Matching Mutual Information Estimation

    Ivan Butakov, Alexander Semenenko, Valeriia Kirova, Ivan Oseledets, Alexey Frolov · PDF
  53. From Compression to Expression: A Layerwise Analysis of In-Context Learning

    Jiachen Jiang, Yuxin Dong, Jinxin Zhou, Zhihui Zhu · PDF
  54. Generative Hints

    Andy Dimnaku, A. Yusuf Kavranoglu, Yaser S. Abu-Mostafa · PDF
  55. Generative Model via Quantile Assignment

    Georgi Hrusanov, Oliver Y. Chén, Julien Bodelet · PDF
  56. Gradual Fine-Tuning for Flow Matching Models

    Gudrun Thorkelsdottir, Arindam Banerjee · PDF
  57. Grokking of Diffusion Models: Case Study on Modular Addition

    Joon Hyeok Kim, Yong-Hyun Park, Mattis Dalsætra Østby, Jiatao Gu · PDF
  58. GUIDE: Guided Initialization and Distillation of Embeddings

    Khoa Trinh, Gaurav Menghani, Erik Vee · PDF
  59. Heterogeneous Low-Bandwidth Pre-Training of LLMs

    Yazan Obeidi, Amir Sarfi, Joel Lidin, Paul Janson, Eugene Belilovsky · PDF
  60. Higher-order grammar representations for molecular generation and learning

    Yiming Huang, Yujie Zeng, Vijay Prakash Dwivedi, Simone Foti, Jianmin Wang, Jure Leskovec, Tolga Birdal · PDF
  61. Information-Geometric Optimal Control for Diffusion Models: Unified Framework via Fisher-Rao Geodesics

    Kaustubh S. Bukkapatnam, Laksh Patel · PDF
  62. Informative Data Reweighting for Image Classification

    Yancheng Wang, Ping Li, Alvin C Silva, Teresa Wu, Yingzhen Yang · PDF
  63. Inverse Entropic Optimal Transport Solves Semi-supervised Learning via Data Likelihood Maximization

    Mikhail Persiianov, Arip Asadulaev, Nikita Andreev, Nikita Starodubcev, Dmitry Baranchuk, Anastasis Kratsios, Evgeny Burnaev, Alexander Korotin · PDF
  64. Inverse-distilled Diffusion Language Models

    David Li, Nikita Gushchin, Dmitry Abulkhanov, Eric Moulines, Ivan Oseledets, Maxim Panov, Alexander Korotin · PDF
  65. Latent Process Generator Matching

    Lukas Billera, Hedwig Nora Nordlinder, Ben Murrell · PDF
  66. Learning Generation Orders for Masked Discrete Diffusion Models via Variational Inference

    David Fox, Sam Bowyer, Song Liu, Laurence Aitchison, Raul Santos-Rodriguez, Mengyue Yang · PDF
  67. Learning Unmasking Policies for Diffusion Language Models

    Metod Jazbec, Theo X. Olausson, Louis Béthune, Pierre Ablin, Michael Kirchhof, Joao Monteiro, Victor Guilherme Turrisi da Costa, Jason Ramapuram, Marco Cuturi · PDF
  68. Log-density Hessian estimation without the curse of dimensionality via denoising score matching

    Konstantin Yakovlev, Anna Markovich, Nikita Puchkin · PDF
  69. Low-Pass Flow Matching

    Francesco M. Ruscio, T. Konstantin Rusch · PDF
  70. Manifold Generalization Provably Proceeds Memorization in Diffusion Models

    Zebang Shen, Ya-Ping Hsieh, Niao He · PDF
  71. Maximum Entropy under Carre du Champ Constraints

    Tanya Veeravalli, Atsushi Nitanda · PDF
  72. Minimal-Action Discrete Schrödinger Bridge Matching for Peptide Sequence Design

    Shrey Goel, Pranam Chatterjee · PDF
  73. MixFlow: Mixed Source Distributions Improve Rectified Flows

    Nazir Nayal, Christopher Wewer, Jan Eric Lenssen · PDF
  74. On Closed-Form Couplings

    Tobias Höppe, Stefan Bauer, Qiang Liu, Andrea Dittadi, Kirill Neklyudov · PDF
  75. On the "Induction Bias" in Sequence Models

    Reza Ebrahimi, Michaël Defferrard, Sunny Panchal, Roland Memisevic · PDF
  76. On the Lipschitz Regularity of Optimal Discriminators

    Karthik Srikumar, Aryaman Singh · PDF
  77. On the Memorization of Consistency Distillation for Diffusion Models

    Bingqing Jiang, Difan Zou · PDF
  78. On the Use of Schrödinger Bridges for Tabular Data Generation

    Irina Deeva, Igor Kartashov, Ivan Lopatin · PDF
  79. One LR Doesn’t Fit All: Heavy-Tail Guided Layerwise Learning Rates for LLMs

    Di He, Songjun Tu, Keyu Wang, Lu Yin, Shiwei Liu · PDF
  80. One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation

    Daniil Selikhanovych, David Li, Aleksei Leonov, Nikita Gushchin, Sergey Kushneryuk, Alexander Filippov, Evgeny Burnaev, Iaroslav Sergeevich Koshelev, Alexander Korotin · PDF
  81. Optimal Learning-Rate Schedules under Functional Scaling Laws: Power Decay and Warmup-Stable-Decay

    Binghui Li, Zilin Wang, Fengling Chen, Shiyang Zhao, Ruiheng Zheng, Lei Wu · PDF
  82. Overclocking Electrostatic Generative Models

    Daniil Shlenskii, Alexander Korotin · PDF
  83. Paired Wasserstein Autoencoders for Conditional Sampling

    Moritz Piening, Matthias Chung · PDF
  84. PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models

    Mingue Park, Jisung Hwang, Seungwoo Yoo, Kyeongmin Yeo, Minhyuk Sung · PDF
  85. Path Invariance and the Robustness of Flow Matching: Beyond Architectural and Data Perturbations

    Eshant English, Taiji Suzuki · PDF
  86. pCoMole: Pareto-Constrained Molecule Editing with Discrete Flows

    Tong Chen, Maximilian Holsman, Lin Zhao, Yinuo Zhang, Pranam Chatterjee · PDF
  87. Performance Limits of Score-Based Generative Models via Stochastic Thermodynamics

    Nathan Kodama, Michael Hinczewski · PDF
  88. Permutation-Symmetrized Diffusion for Unconditional Molecular Generation

    Gyeonghoon Ko, Juho Lee · PDF
  89. Pre-training Large Language Models with Dynamic Precision: Low-Cost Computation with High-Fidelity Performance

    Boao Kong, Weichen Jia, Engao Zhang, Guohong Li, Yonghan Dong, Yao Wang, Yaoyuan Wang, Yunke Peng, Kun Yuan · PDF
  90. Principled Randomized Exploration of Gradient Subspaces for Efficient LLM Training

    Sahar Rajabi, Nayeema Nonta, Sirisha Rambhatla · PDF
  91. Provable Benefits of RLVR over SFT for Reasoning Models: Learning to Backtrack Efficiently

    Stanley Wei, Juno Kim · PDF
  92. Query Lower Bounds for Diffusion Sampling

    Zhiyang Xun, Eric Price · PDF
  93. Rejection Mixing: Fast Semantic Propagation of Mask Tokens for Efficient DLLM Inference

    Yushi Ye, Feng Hong, Huangjie Zheng, Xu Chen, Zhiyong Chen, Yanfeng Wang, Jiangchao Yao · PDF
  94. Rethinking Reparameterization of Stochastic Processes in Generative Modeling

    Wojciech Maciej Kozłowski, Kamil Adamczewski, Radosław Kuczbański, Maciej Zieba · PDF
  95. Reward-Guided Discrete Diffusion via Clean-Sample Markov Chain

    Prin Phunyaphibarn, Minhyuk Sung · PDF
  96. Rex: A Family of Reversible Exponential (Stochastic) Runge-Kutta Solvers

    Zander W. Blasingame, Chen Liu · PDF
  97. RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance

    Tianlang Chen, Minkai Xu, Jure Leskovec, Stefano Ermon · PDF
  98. Robust Graph Diffusion Model

    Yancheng Wang, Ping Li, Chengshuai Zhao, Huan Liu, Dongfang Sun, Yingzhen Yang · PDF
  99. Robust Stochastic Gradient Posterior Sampling with Lattice Based Discretisation

    Zier Mensch, Lars Holdijk, Samuel Duffield, Maxwell Aifer, Patrick J. Coles, Max Welling, Miranda C. N. Cheng · PDF
  100. Scalable Sampling via Generalized Fixed-Point Diffusion Matching

    Denis Blessing, Lorenz Richter, Julius Berner, Egor Malitskiy, Gerhard Neumann · PDF
  101. Schrödinger bridge problem via empirical risk minimization

    Denis Belomestny, Alexey Naumov, Nikita Puchkin, Denis Suchkov · PDF
  102. Score-Guided Proximal Projection: A Unified Geometric Framework for Rectified Flow Editing

    Vansh Bansal, James G. Scott · PDF
  103. Search or Accelerate: Confidence-Switched Position Beam Search for Diffusion Language Models

    Mingyu Cao, Alvaro Correia, Christos Louizos, Shiwei Liu, Lu Yin · PDF
  104. SHAPE: Schedule Hessian Adaptive Parameter Estimation for Smoother Diffusion Optimization

    Ritika Lamba, Jing Ma · PDF
  105. SingLoRA: Low Rank Adaptation Using a Single Matrix

    David Bensaid, Noam Rotstein, Roy Velich, Daniel Ben Saïd, Ron Kimmel · PDF
  106. Skip To The Good Part: Representation Structure & Inference-Time Layer Skipping in Diffusion vs Autoregressive LLM

    Raghavv Goel, Risheek Garrepalli, Sudhanshu Agrawal, Christopher Lott, Mingu Lee, Fatih Porikli · PDF
  107. Sliding Critical Band in RoPE-based Length Extrapolation

    Zifei Bai, Zhiwei Xu · PDF
  108. Spectral Condition for $\mu$P under Width–Depth Scaling

    Chenyu Zheng, Rongzhen Wang, Xinyu Zhang, Chongxuan Li · PDF
  109. Steering diffusion models with quadratic rewards: a fine-grained analysis

    Ankur Moitra, Andrej Risteski, Dhruv Rohatgi · PDF
  110. Stein Diffusion Guidance: Training-Free Posterior Correction for Sampling Beyond High-Density Regions

    Van Khoa Nguyen, Lionel Blondé, Alexandros Kalousis · PDF
  111. Stochastic Few-step Models

    Romeo Passaro, Zander W. Blasingame, Michael M. Bronstein, Alexander Tong · PDF
  112. Strong Reward Only: Pareto-Guided Multi-Reward Optimization

    Ying Ba, Tianyu Zhang, Mohan Zhou, Wenyi Mo, Yalong Bai · PDF
  113. Structured image representation learning for flow-matching models

    Alexandros Graikos, Kostas Triaridis, Nikolay Malkin, Dimitris Samaras · PDF
  114. Support-Proximity Augmented Diffusion Estimation for Offline Black-Box Optimization

    Yonghan Yang, Ye Yuan, Zipeng Sun, Linfeng Du, Bowei He, Haolun Wu, Can Chen, Xue Liu · PDF
  115. Synergistic Intra- and Cross-Layer Regularization Losses for MoE Expert Specialization

    Rizhen Hu, Yuan Cao, Boao Kong, Mou Sun, Kun Yuan · PDF
  116. SYNTHONY: A Stress-Aware, Intent-Conditioned Agent for Deep Tabular Generative Models Selection

    Hochan Son, Xiaofeng Lin, Jason Ni, Guang Cheng · PDF
  117. TD3B: Transition-Directed Discrete Diffusion for Allosteric Binder Generation

    Hanqun Cao, Aastha Pal, Sophia Tang, Yinuo Zhang, Jingjie Zhang, Pheng-Ann Heng, Pranam Chatterjee · PDF
  118. Time Dependent Loss Reweighting for Flow Matching and Diffusion Models is Theoretically Justified

    Lukas Billera, Hedwig Nora Nordlinder, Ben Murrell · PDF
  119. Time-Correlated Video Bridge Matching

    Viacheslav Vasilev, Arseny Ivanov, Nikita Gushchin, Maria Kovaleva, Alexander Korotin · PDF
  120. Tokenize, Diffuse, Decode: A Generative Approach to Neighborhood Discovery on Graphs

    Zhuowen Yuan, Tao Liu, Kaushik Rangadurai, Yang Yang, Minhui Huang, Yiping Han, Bo Li, Shuang Yang · PDF
  121. Training Flow Matching: The Role of Weighting and Parameterization

    Anne Gagneux, Ségolène Tiffany Martin, Rémi Gribonval, Mathurin Massias · PDF
  122. Training-Free Length Discovery for Diffusion Language Model Infilling

    Hengchang Liu, Zhao Yang, Bing Su · PDF
  123. Understanding Deterministic Diffusion through Reverse Transition Kernels

    Adrita Das, Peiran Jiang, Dantong Zhu, Barnabas Poczos, Jose Lugo-Martinez · PDF
  124. Unleashing the Potential of Diffusion Models for End-to-End Autonomous Driving

    Yinan Zheng, Tianyi Tan, Bin Huang, Enguang Liu, Ruiming Liang, Jianlin Zhang, Jianwei Cui, Guang Chen, Kun Ma, Hangjun Ye, Long Chen, Ya-Qin Zhang, Xianyuan Zhan, Jingjing Liu · PDF
  125. Unlocking the Duality between Flow and Field Matching

    Daniil Shlenskii, Alexander Varlamov, Nazar Buzun, Alexander Korotin · PDF
  126. Video Unlearning via Low-Rank Refusal Vector

    Simone Facchiano, Stefano Saravalle, Matteo Migliarini, Edoardo De Matteis, Alessio Sampieri, Andrea Pilzer, Emanuele Rodolà, Indro Spinelli, Luca Franco, Fabio Galasso · PDF
  127. What Flow-Matching Brings To TD-Learning

    Bhavya Kumar Agrawalla, Michal Nauman, Aviral Kumar · PDF
  128. What Lies Beneath the Curve? Scaling Laws in the Presence of Exact Posteriors

    Arian Khorasani, Nathaniel Chen, Yug D Oswal, Akshat Santhana Gopalan, Egemen Kolemen, Ravid Shwartz-Ziv · PDF
  129. When Does Sparsity Mitigate the Curse of Depth in LLMs

    Dilxat Muhtar, Xinyuan Song, Sebastian Pokutta, Max Zimmer, Nico Pelleriti, Thomas Hofmann, Shiwei Liu · PDF
  130. When Does Stein Beat Antithetic Sampling? Distribution Complexity in Discrete Gradient Estimation

    Hyunjun Kim · PDF
  131. Why Diffusion Language Models Struggle with Truly Parallel (Non-Autoregressive) Decoding?

    Pengxiang Li, Dilxat Muhtar, Tianlong Chen, Lu Yin, Shiwei Liu · PDF
  132. WiSP-OSch: Solver Within-Step Parallelism and Order Scheduling for Diffusion Sampling

    Víctor Lucas Rosada Canesin, Julia Gusak · PDF
  133. Zero-Flow Encoders

    Yakun Wang, Leyang Wang, Song Liu, Taiji Suzuki · PDF