ICML 2026 Workshop on Weight-Space Symmetries: from Foundations to Practical Applications

ICML 2026 Workshop WSS

Submission deadline: May 9, 2026, 11:59 UTC
OpenReview-synced 2026-05-09 11:59 UTC (as of 2026-06-23) — extensions on OpenReview are applied automatically; verify on the website.
Submission portal: OpenReview

Accepted papers (51)

Fetched from OpenReview (v2) on 2026-06-10.

A Geometric View of Model Merging: Quotient Fréchet Averages from Toy Models to LoRA
Marvin F. da Silva, Mohammed Adnan, Felix Dangel, Sageev Oore
Access Sets Matter: Budgeting Expert Reads for Scalable Weight-Space Model Merging
Yuanyi Wang, Yanggan Gu, Su Lu, Yifan Yang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, Hongxia Yang
Analyzing Stream Collapse in Hyper-Connections: From Diagnosis to Mitigation
Ekaterina Alimaskina, Gleb Molodtsov, Aleksandr Beznosikov
Are we Merging the Right Models? Impact of Expert Training Duration on Model Merging for LLMs
Nikita Kozodoi, Zainab Afolabi, Jack Butler
Attention Weight Decomposition for Vision Model Compression
Hyunwoo Yu, Yubin Cho, Kyeongbo Kong, Suk-Ju Kang
Auditing Neural Thickets with Low-Rank Routes
Miroslav Lžičař
Beyond Pairwise: Diagnosing Higher-Order Merge Failures via Hodge Decomposition
Dongzhe Zheng, Christine Allen-Blanchette
Beyond Structural Symmetries: Linear Mode Connectivity via Neuron Identifiability
Vincent Bürgin, Daniel Herbst, Ya-Wei Eileen Lin, Stefanie Jegelka
Block-Level Weight-Space Structure Persists Under Post-Training: An Empirical Study Across LLM Families
Zhaohui Geoffrey Wang
Breaking Random-Init Symmetry: Theory-Informed Initialization for ReLU Networks
Anass Al ammiri
Debugging ReBasin: What Limits Symmetry-Based Model Merging?
Elliot Stein, Christine Evers, Jonathon Hare
Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics
Connall Garrod, Jonathan P. Keating, Christos Thrampoulidis
Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization
Kirato Yoshihara
DotResize: Reducing LLM Width via Discrete Optimal Transport-based Neuron Merging
Neha Verma, Kenton Murray, Kevin Duh
Endpoint Symmetry for Edge Updates: Weight-Space Redundancy in GNNs on Undirected Graphs
Charlotte Cambier van Nooten, Stijn van den Beemt, Yuliya Shapovalova, Tom Heskes
Flow Equivariant Transformers
Ibrahim Khaliliya, T. Anderson Keller
Generic Fibers and Functional Dimension of Multi-Head Attention
Nathan W. Henry
Hierarchical Mixture-of-Experts with Two-Stage Optimization
Gleb Molodtsov, Alexander Miasnikov, Aleksandr Beznosikov
How Deep Are Deep GPs, Really? A Sharp Threshold and a Non-Gaussian Limit for Compositional GPs
Mark Kozdoba, Shie Mannor
How the Optimizer Shapes Learned Solutions in Equivariant Neural Networks
Teodor-Mihai Stupariu, Andrei Manolache
Iterative Magnitude Pruning Reduces Weight-Space Coupling
Lucas Perez, Mariana Ordones Oliveira Soares, Jackson de Faria, Fabricio Murai, Renato M. Assunção
Low-Rank Networks Recover Weight and Functional Symmetry Better
Janis Aiad
LS-Merge: Merging Language Models in Latent Space
Bedionita Soro, Aoxuan Silvia Zhang, Bruno Andreis, Jaehyeong Jo, Song Chong, Sung Ju Hwang
Meta-Merging by Checkpoint Nowcasting
Albert Manuel Orozco Camacho, Boris Knyazev, Eugene Belilovsky, Guy Wolf
Model Merging by Output-Space Projection
Bethan Evans, Benjamin Etheridge, S Roberts, Jared Tanner
Model Merging via Averaged Representational Similarity
Christopher Wang, Vighnesh Subramaniam, Dan Gutfreund, Boris Katz, Phillip Isola, Brian Cheung
MoRE: Mixture of Reused Experts
Eric S. Qiu, Utku Umur ACIKALIN, Justin Lovelace, Christian Belardi, Arjun B. Mulchandani, Carla P Gomes, Kilian Q Weinberger
No Global Gauge in Neural Weight Space: Branched Quotient Geometry and Atlas-Optimal Learning
Manoj Saravanan, Rohit Kumar Salla
Objective-Specific Privileged Bases via Full-Prefix Matryoshka Learning
Arghamitra Talukder, Philippe Chlenski, Itsik Pe'er
On the Interplay of Priors and Overparametrization in Bayesian Neural Network Posteriors
Julius Kobialka, Emanuel Sommer, Chris Kolb, Juntae Kwon, Daniel Dold, David Rügamer
Parameter symmetries determine representational geometry in overparameterized nonlinear networks
Marvin Theiss, Lukas Braun, Andrew M Saxe, Erin Grant
Pre-Normalization Momentum Governs Optimizer-Induced Rank Bias
Raghav Kaushik Ravi, Srivarshinee Sridhar
Quantifying Symmetries: How Optimisers Impact the Functional Dimension
Johanna Marie Gegenfurtner, Naima Elosegui Borras, Georgios Arvanitidis
Rethinking the Role of Tensor Decompositions in Post-Training LLM Compression
Artur Zagitov, Alexander Miasnikov, Maxim Krutikov, Artem Tsedenov, Vladimir Aletov, Gleb Molodtsov, Nail Bashirov, Aleksandr Beznosikov
Rotation Symmetry in Vision Quantization: The Objective Function is the Bottleneck
Jaewoo Park, Jihae Lee, Yunjeong yong
Scale-Equivariant Alignment: Closing the Residual Barrier After Permutation Matching
Kaustubh S. Bukkapatnam, Siddharth Karuturi
Scale-Invariant Empirical-Bayes Laplace Approximation for ReLU Networks
Shivam Pal, Piyush Rai
Sharpness-Aware Minimization Directly on the Boolean Hypercube
Ba-Hien TRAN
Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates
Edward Sun, Dmitrii Troitskii
SIB: Reparameterization of LLMs for Better Learning-Forgetting under SFT
Albert Catalan-Tatjer, Jonas Geiping
Symmetry Acquisition in Predictive Coding Networks
Adam Shaw, Jiayu Li, Michael Sperling, Michael Kim, Alvin Jin
Symmetry-Induced Non-Identifiability in Neural Circuit Inference
Seungwon Yu, Jaeho Yang, Kijung Yoon
T-REX: Tied Recurrence Extraction
Mozes Jacobs, T. Anderson Keller, Thomas Fel, Bingbin Liu, Richard Hakim, Yilun Du, Demba E. Ba
Task-Restricted Symmetries in Recurrent Weight Space
Simon Dräger
The GL(r) Gauge Symmetry of LoRA: Principal Bundle Structure, Loss Landscape Geometry, and Implications for Adapter Merging
Siddharth Karuturi, Kaustubh S. Bukkapatnam, Laksh Patel, Tanush Ajay Shastry
The Role of Symmetry in Optimizing Overparameterized Networks
Kusha Sareen, Mohammad Pedramfar, Sékou-Oumar Kaba, Mehran Shakerinava, Siamak Ravanbakhsh
Toward a Type-Theoretic Framework for Linear Mode Connectivity: Univalence and Path-Finding in Weight Spaces
Şuayp Talha Kocabay, Kerem Yalçın, Talha Rüzgar Akkuş, Erik Hillbom
WARP: Weight-Space Analysis for Recovering Training Data Portfolios
Tzu-Heng Huang, Aditya Goyal, John Cooper, Frederic Sala
Weight Space Representation Learning via Neural Field Adaptation
Zhuoqian Yang, Mathieu Salzmann, Sabine Süsstrunk
What Survives of Path Norms? Path-Lifting as an Intermediate Representation for ReLU Networks
Antoine Gonon, Rémi Gribonval
WK, WV is (Linearly) All You Need: On the Necessity of the QKV Weight Triplet in Self-Attention Transformers
Marko Karbevski, Antonij Mijoski
