ICML 2026 Workshop on Foundations of Deep Generative Models: Understanding Memorization, Generalization, and Reasoning
ICML 2026 FoGen Workshop
- Submission deadline: May 9, 2026, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (193)
Fetched from OpenReview (v2) on 2026-06-10.
- A Theoretical Analysis of Curriculum Training in Diffusion Models
- A Theoretical Analysis of Why Masked Diffusion Models Mitigate the Reversal Curse
- A Unified Perspective on Task Retrieval and Learning in In-Context Learning on Markov Data
- Accelerating Discrete Diffusion Models with Parallel-In-Time Sampling
- Active Flow Expansion for Out-of-Distribution Discovery: from Theory to Molecules
- Amortising Bayesian Experimental Design for Sequential Information Gathering in LLMs
- An Isotropic Approach to Efficient Uncertainty Quantification with Gradient Norms
- Attention as In-Context Empirical Bayes: A Two-Stage View via Particle Dynamics
- Autoregressive Ranking: Bridging the Gap Between Dual and Cross Encoders
- Benchmarking Multimodal Personalized Reasoning of Vision-Language Models in the Wild
- Benign Overfitting Does Not Occur in Diffusion Models
- Beyond Pixel Space: Frequency-Domain Uncertainty for Structure Aware Diffusion Guidance
- Beyond Power Spectra: Cross-Frequency Interactions in Generative Dynamics
- Beyond Raw Competence: Logical Equivariance in Diffusion Language Models
- Bidirectional Trajectory Smoothing for Training-Free Image Generation with Rectified Flows
- Blind denoising diffusion models and adaptive sampling algorithms
- Boltz-Perturb: Probing Generalization in Co-Folding Models via Inference-Time Perturbation
- Brain-Measurable Diffusion Decoding: Auditing Information Provenance in fMRI Reconstruction
- Chain-of-Generation: Progressive Latent Diffusion for Text-Guided Molecular Design
- Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation
- Chain-of-Thought Gradient Descent
- Characterizing Memorization in Diffusion Language Models: Generalized Extraction and Sampling Effects
- Complexity-Stratified Evaluation Reveals Shortcut Regimes in Rotational Novel View Synthesis
- Compositional Flow Matching with Factored Velocity Fields
- Context Over Content: Exposing Evaluation Faking in Automated Judges
- Data-Forcing Distillation: Restoring Diversity and Fidelity in Few-Step Video Generation
- Demystifying the Slash Pattern in Attention: The Role of RoPE
- Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations
- Diffusion Model's Generalization Can Be Characterized by Inductive Biases toward a Data-Dependent Ridge Manifold
- Diffusion Models for Inverse Problems on Riemannian Manifolds
- Distributional Biases in Post-Training: A Markovian Analysis of Reasoning Trajectories
- Distributional Readout: A Memorization Regime in Autoregressive Generative Models
- Do LLMs Really Memorize Personally Identifiable Information? Revisiting PII Leakage with a Cue-Controlled Memorization Framework
- Do Thinking Tokens Help with Safety?
- DPMI: A Principled Index for Neural Polysemanticity via Dirichlet Process Mixture Modeling
- DPRM: A Plug-in Token-Ordering Module for Diffusion Language Models
- Dream.exe: Can Video Generation Models Dream Executable Robot Manipulation?
- DUEL: Exact Likelihood for Masked Diffusion via Deterministic Unmasking
- DVD: Deterministic Video Depth Estimation with Generative Priors
- Early Semantic Commitment in Diffusion Sampling
- EMA Policy Gradient: Taming Reinforcement Learning for LLMs with EMA Anchor and Top-k KL
- Emergent and Subliminal Misalignment Through the Lens of Data-Mediated Transfer
- Enhancing Knowledge Injection with Surrounding Backgrounds in Continual Training LLMs
- Evaluating Spatial World Modeling in Video Generators via 3D Camera Trajectory Generation
- Evaluating the Representation Space of Diffusion Models via Self-Supervised Principles
- Evolutionary System Prompt Learning for Reinforcement Learning in LLMs
- Extracting the Data Manifold from Diffusion Models via a Score-Based Non-Conformal Riemannian Metric
- Feedforward Mixing is as Sharp as it is Slow in Reverse
- FERMI: Feature-Mapping for Relational Membership Inference on Tabular Diffusion Models
- Few-Shot Learning in Video Diffusion Models
- Fine-Tuning Dynamics of In-Context Factual Recall in Transformers
- Fixed-Point Reasoning: Stable and Adaptive Deep Looped Models
- Flow Matching on General Manifolds via Pulling Back Geodesic Convex Latent Manifolds
- ForcingDAS: Unified and Robust Data Assimilation via Diffusion Forcing
- Forward-Chaining Temporal Point Process
- Frontier Language Models Struggle to Copy: Text Can Be Better Viewed in 2D
- Frontier Learning: Training LLM Reasoners at the Edge of Capability
- Gating Enables Curvature: A Geometric Expressivity Gap in Attention
- Generalization of Diffusion Models Arises with a Balanced Representation Space
- Growing Images: Spatial Scheduling in Diffusion Inpainting
- Hazard Compression: Catastrophic Forgetting in Diffusion-Based Generative Replay under Distribution Shift
- How Cross-Entropy Shapes Representation Geometry: A Spectral Study on Cycle Graphs
- How Data Shapes RoPE Frequency Usage: From Positional Scale Matching to Length Generalization
- How Deep Are Deep GPs, Really? A Sharp Threshold and a Non-Gaussian Limit for Compositional GPs
- How Recursive Training Collapses and What Can Be Done About It
- How to Train Your Latent Diffusion Language Model Jointly With the Latent Space
- Imagined Memorisation: Training-Data Leakage in Model-Based RL World Models
- In- and Out-of-Distribution Generalization of Reasoning in Multimodal LLMs for Simple Visual Planning Tasks
- In-Place Feedback: Reliable Refinement for Multi-Turn Expert-LLM Collaboration
- Interdomain Attention: Beyond Token-Level Key-Value Memory
- Internal Data Repetition Destroys Language Models
- Internal Tree Search Execution in Transformers
- Interpreting Latent CoT Reasoning as Dynamical Systems
- Intrinsic Wasserstein Rates for Score-Based Generative Models on Smooth Manifolds
- Inverse-Confidence Sampling for Continuous Diffusion Language Models
- Is your Flow Matching Model Really Generalising? A Path-Length Diagnostic
- JUMP: Single-Pass Membership Inference on Fine-Tuned Diffusion Language Models
- Just Add More Capacitors: Eliminating Flux Leakage in Electrostatic Field Matching
- LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling
- Language Diffusion Models are Associative Memories Capable of Retrieving Unseen Data
- Latent Phase-Shift Rollback: Inference-Time Error Correction via Residual Stream Monitoring and KV-Cache Steering
- Learn from Your Mistakes: Self-Correcting Masked Diffusion Models
- Learned Relay Representations for Forward-Thinking Discrete Diffusion Models
- Learning Human Habits with Rule-Guided Active Inference
- Learning Manifold Data with Flow Matching
- Learning to Trade Like an Expert: Cognitive Fine-Tuning for Stable Financial Reasoning in Language Models
- Leveraging Instruction Tuning and Merging for Reasoning Model Adaptation
- LLM Generation Novelty Through the Lens of Semantic Similarity
- LLM-WikiRace: A Benchmark for Planning and Reasoning over Real-World Knowledge Graphs
- Local Coverage Governs Memorization in Diffusion Models
- Local Manifold Identification with Latent Linear Models and OT Flows
- Mamba as Measure-Valued Associative Memory: Infinite-Context Limits and Minimax-Optimal Learning
- Manifold-Guided Attention Steering
- Masked Distillation: Internalizing Chain-of-Thought in Small Language Models
- MCLR: Improving Conditional Modeling via Inter-Class Likelihood-Ratio Maximization and Unifying Classifier-Free Guidance with Alignment Objectives
- Measurement-Consistent Langevin Corrector for Stabilizing Latent Diffusion Inverse Problem Solvers
- Memorization Detection in Diffusion Models via Text Embedding Interpolation
- Memorization, Retrieval, and Reasoning in LLM-Driven EDA: A Case Study in FPGA Timing Closure
- Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models
- Midpoint Generative Models
- MINDGRAPH: Faithful Concept-Graph Memory for Long-Context Reasoning
- Mixture-Greedy for Online Generative Model Selection: Is UCB Necessary in Diversity-Aware Multi-Armed Bandits?
- Model Capacity Determines Grokking through Competing Memorisation and Generalisation Speeds
- Neural Network-Based Diffusion Models Adapt to Low-Dimensional Multi-Modal Data Structure
- Norm-Controlled Likelihood Guidance for Diffusion-based Inverse Solver
- Not Every Time and Frequency Need to Be Forgotten in Diffusion Unlearning
- On approximation and estimation of Schrödinger potentials without the curse of dimensionality
- On the approximation of Schrödinger bridge potentials
- On the Memorization of Consistency Distillation for Diffusion Models
- On the Policy Gradient Foundations of Group Relative Policy Optimization: Credit Assignment, Gradient Sparsity, and Rank Collapse
- On the Relationship between the Choice of Representation and In-context Learning
- On-Policy Self-Distillation with Sampled Demonstrations Reduces Output Diversity
- One Coupling to Rule Them All: Optimal Transport as the Unifying Geometry of Diffusion Models, Flow Matching, and Reasoning in Deep Generative Models
- Pathwise Transported Memory Priors for Autoregressive Generative Models
- Personalized Federated Training of Latent Diffusion Models with Privacy Guarantees
- Personalized Privacy Control in LLMs via Attention Head Intervention
- Physics in 2-Steps: Locking Motion Priors Before Visual Refinement Erases Them
- Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation
- POLYOMINOGEN: A Controlled Testbed for Understanding Memorization and Compositional Generalization in Conditional Diffusion Models
- Position Augmentation: Reducing RoPE Extrapolation Cliffs via Random Position Scaling During Training
- Prior Dominance in Audio-Visual LLMs: When Generative Models Memorize Over Reasoning Under Cross-modal Conflict
- Probe Choice Changes Canary-Memorization Verdicts: Three Post-Hoc Disagreement Case Studies in a Text-Dominant LoRA-Tuned Autoregressive Testbed
- Quantifying the Effect of Test Set Contamination on Generative Evaluations
- Quantifying the Memorization-to-Generalization Transition: Scaling Laws and Phase Structure in Grokking
- Reasoning Across Space: Tiny Recursive Models for Spatial Omics
- Reasoning as State Transition: A Representational Analysis of Reasoning Evolution in Large Language Models
- Reasoning Phases Are Continuous, Not Discrete: Evidence from Switching Linear Dynamical Systems Applied to Chain-of-Thought Residual Streams
- ReCAST: Probing Sparse Reference Use in In-Context Image Generation
- Reducing Diffusion Model Memorization with Higher Order Langevin Dynamics
- Registers Matter for Pixel-space Diffusion Transformers
- Reinforcement Learning with Promising Tokens for Large Language Models
- Relative Score Policy Optimization for Diffusion Language Models
- Rethinking "RL Generalizes, SFT Memorizes": The Role of SFT Data
- Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
- Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
- Rethinking On-Policy Self-Distillation for Thinking Models
- Retrieval Dwelling: A Principled Sampling Strategy for Exploiting Spurious State Exploration
- Revisiting Spectral Representations in Generative Diffusion Models
- RLVR Training of LLMs Does Not Improve Thinking Ability for General QA: Evaluation Method and a Simple Solution
- SALSA: State Augmentation via Learned Selective Attention
- Sample Efficient Generative Model for Molecular Dynamics Trajectories via Twisted Sequential Monte Carlo
- Scale Dependent Data Duplication
- Scaling with Recursion in Masked Discrete Diffusion Models
- SciReview: Diagnosing Compositional Scientific Reasoning in Frontier Models
- Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision
- Separating Intrinsic Ambiguity from Estimation Uncertainty in Deep Generative Models for Linear Inverse Problems
- Setting-Matched and Semantics-Scaled Benchmarking of One-Step Generative Models Against Multistep Diffusion and Flow Models
- Sobolev Regularized Score Difference Estimation in Diffusion Models
- Solve the Loop: Attractor Models for Language and Reasoning
- Sparse Memory Finetuning as a Low-Forgetting Alternative to LoRA and Full Finetuning
- Spectral Signatures of Memorization in Diffusion Models: A Multi-Scale Diagnostic Study
- SRM-LoRA: Sub-Riemannian-Style Updates for Mitigating LLM Hallucination in Low-Rank Adaptation
- Steering Dynamical Regimes of Diffusion Models by Breaking Detailed Balance
- Structure Over Scale: Rethinking Adaptation for Reinforcement Learning with Verifiable Rewards
- Structuring The Future: Diffusion LLM Speculative Decoding via Calibrated Draft Graphs
- Synthesizability-Aware Materials Generation with Target Properties via Reinforcement Learning
- Temporal Backtracking Search for Test-time Generative Video Reasoning
- Temporal Task Diversity: Inductive Biases Under Non-Stationarity in Synthetic Sequence Modelling
- Test of Time: Rethinking Temporal Signal of Benchmark Contamination
- The Confidence Shortcut: A Reasoning Failure Mode of Masked Diffusion Models
- The Distillation Game: Adaptive Attacks & Efficient Defenses
- The Surprising Effectiveness of Deleting Weights in LLM Reasoning and Adaptation
- TIGER: Bridging the Multimodal Reasoning-Access Gap via Modality Counterfactuals
- Tight PAC-Bayes Generalisation Guarantees for Large Language Model Safety Monitoring
- Time-Correlated Video Bridge Matching
- Towards Effective Theory of LLMs: A Representation Learning Approach
- Tracing Uncertainty in Language Model "Reasoning"
- Training In-Context and In-Weights Mixtures Via Contrastive Context Sampling
- Transformers Learn the Optimal DDPM Denoiser for Multi-Token GMMs
- TUBE: Tangent Upper Bound on Evidence for Discrete Diffusion Language Models
- Understanding Flatness in Generative Models: Its Role and Benefits
- Understanding Generalization in Diffusion Distillation via Probability Flow Distance
- Understanding LLM generalization through fine-tuning
- Understanding Solver-Induced Variance Distortion in Conditional Diffusion Regression
- Universality, Composition Generalization, and Algorithm Emulation All In-Context
- Unlearning for One-Step Generative Models via Unbalanced Optimal Transport
- Unlocking the Duality between Flow and Field Matching
- Velocity Adaptation for Flow-Matching Models
- ViGoR-Bench: How Far Are Visual Generative Models From Zero-Shot Visual Reasoners?
- What a Small Autoregressive Transformer Briefly Learns and Then Forgets: Transient Structural Capabilities and Probe-Specific Head Repurposing
- What Architectural Inductive Bias Makes Diffusion Models Succeed? A Perspective from the Implicit Regularization of Gradient Descent
- What to Forget in Unlearning? Forget Set Curation for Language Models
- What You Predict Shapes How You Memorize: Target-Parameterization and Memorization Dynamics in Flow Matching
- When Does Diffusion Purification Amplify Perturbations?
- Where Rollouts Begin: Low-Load, High-Leverage First-Token Diversification for RLVR
- Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement
- Where’s the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions
- Why Alignment Must Precede Distillation: A Minimal Working Explanation
- Why Are Distribution-Matching Distilled Students Lazy? Understanding the Copying Behavior in Few-Step Distillation
- Why Does Pruning during Training Work? A Signal-to-Noise Analysis of Sparse Neural Network Training
- Why is A+B Better Than B? A Simple Graph Perspective on Task Transfer
- Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning
- Within-Episode Failure Recovery in LLM Agents via Progress-Gated Dual-Process Routing