ICML 2025 Past Efficiency

ICML 2025 Workshop on Methods and Opportunities at Small Scale

MOSS@ICML2025

Submission deadline
May 27, 2025, 15:50 UTC
imported from OpenReview — check the website for extensions
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).

Accepted papers (61)

Fetched from OpenReview (v2) on 2026-06-10.

  1. AdaptMI: Adaptive Skill-based In-context Math Instructions for Small Language Models

    Yinghui He, Abhishek Panigrahi, Yong Lin, Sanjeev Arora · PDF
  2. An Empirical Investigation of Initialization Strategies for Kolmogorov–Arnold Networks

    Spyros Rigas, Dhruv Verma, Georgios Alexandridis, Yixuan Wang · PDF
  3. Approximate Message Passing on General Factor Graphs using Shallow Neural Networks

    Leonhard Hennicke, Jan Lemcke, Rainer Schlosser, Ralf Herbrich · PDF
  4. CaliPSo: Calibrated Predictive Models with Sharpness as Loss Function

    Alexandre Capone, Kamron Zaidi, Tianyu Xu, Brian Yang, Geoff Pleiss, Jeff Schneider · PDF
  5. Continuous Chain of Thought Enables Parallel Exploration and Reasoning

    Halil Alperen Gozeten, Muhammed Emrullah Ildiz, Xuechen Zhang, Hrayr Harutyunyan, Ankit Singh Rawat, Samet Oymak · PDF
  6. Cross-Validation Error Dynamics in Smaller Datasets

    Bethany Austhof, Lev Reyzin · PDF
  7. Dataset Distillation for Memorized Data: Soft Labels can Leak Held-Out Teacher Knowledge

    Freya Behrens, Lenka Zdeborova
  8. Decomposed Learning: An Avenue for Mitigating Grokking

    Gabryel Mason-Williams, Israel Mason-Williams · PDF
  9. Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO

    Jaeha Lee, Gio Huh, Ning Su, Tony Yue Yu · PDF
  10. Do Larger Language Models Imply Better Generalization? A Pretraining Scaling Law for Implicit Reasoning

    Xinyi Wang, Shawn Tan, Mingyu Jin, William Yang Wang, Rameswar Panda, Yikang Shen · PDF
  11. Dynamic Low-Rank Training with Spectral Regularization: Achieving Robustness in Compressed Representations

    Steffen Schotthöfer, H. Lexie Yang, Stefan Schnake · PDF
  12. Effective Reinforcement Learning for Reasoning in Language Models

    Lianghuan Huang, Shuo Li, Sagnik Anupam, Insup Lee, Osbert Bastani · PDF
  13. Efficient B-Tree Insertions Using Proximal Policy Optimization and Hierarchical Attention Models

    Alexander Kastius, Nick Lechtenbörger, Felix Schulz, Johann Schulze Tast, Rainer Schlosser, Ralf Herbrich · PDF
  14. Emergence of Hebbian Dynamics in Regularized Non-Local Learners

    David Aaron Koplow, Tomaso Poggio, Liu Ziyin · PDF
  15. Emergence, pretraining loss and associative recall: a toy model

    Sultan Daniels, Dylan Davis, Dhruv Gautam, Wentinn Liao, Gireeja Ranade, Anant Sahai · PDF
  16. Encoding Domain Insights into Multi-modal Fusion: Improved Performance at the Cost of Robustness

    Jackson Sam Michaels, Sidong Zhang, Madalina Fiterau · PDF
  17. Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts

    Rahul Raja, Arpita Vats · PDF
  18. Evaluating Sparse Autoencoders: From Shallow Design to Matching Pursuit

    Valérie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, Demba E. Ba · PDF
  19. Exploring Diverse Solutions for Underdetermined Problems

    Eric Volkmann, Andreas Radler, Johannes Brandstetter, Arturs Berzins · PDF
  20. Extrapolation by Association: Length Generalization Transfer in Transformers

    Ziyang Cai, Nayoung Lee, Avi Schwarzschild, Samet Oymak, Dimitris Papailiopoulos · PDF
  21. Foundation Models on a Budget: Approximating Blocks in Large Vision Models

    Irene Cannistraci, Simone Antonelli, Emanuele Palumbo, Thomas M. Sutter, Emanuele Rodolà, Bastian Rieck, Julia E Vogt · PDF
  22. From SGD to Spectra: A Theory of Neural Network Weight Dynamics

    Brian Richard Olsen, Sam Fatehmanesh, Frank Xiao, Adarsh Kumarappan, Anirudh Gajula · PDF
  23. Generalizing Trust: Weak-to-Strong Trustworthiness in Language Models

    Lillian Sun, Martin Pawelczyk, Zhenting Qi, Aounon Kumar, Himabindu Lakkaraju · PDF
  24. Generative or Discriminative? Revisiting Text Classification in the Era of Transformers

    Siva Rajesh Kasa, Sumegh Roychowdhury, Karan Gupta, Yaswanth Biruduraju, Santhosh Kumar Kasa, Ashutosh Kumar, Pattisapu Nikhil Priyatam, Arindam Bhattacharya, Shailendra Agarwal, Vijay Huddar · PDF
  25. Geometry of Rank Constraints in Shallow Polynomial Neural Networks

    Param Mody, Maksym Zubkov · PDF
  26. Gradient descent in presence of extreme flatness and steepness

    Dravyansh Sharma
  27. How Much Context Does Natural Language Actually Require? An Analysis Using LLMs as Statistical Oracles

    Vala Vakilian, Sadegh Mahdavi, Christos Thrampoulidis · PDF
  28. Improving Pathfinding with Anchoring Tokens

    Huaqing Zhang, Bingbin Liu, Juno Kim, Andrej Risteski · PDF
  29. In-Context Occam’s Razor: How Transformers Prefer Simpler Hypotheses on the Fly

    Puneesh Deora, Bhavya Vasudeva, Tina Behnia, Christos Thrampoulidis · PDF
  30. Is Visual Prompting the Right Setup for Knowledge Transfer in new Foundation Models?

    Niclas Hergenröther, Antonio Orvieto · PDF
  31. Koopman Autoencoders Learn Neural Representation Dynamics

    Nishant Suresh Aswani, Saif Jabari · PDF
  32. Learning Gaussian Mixture Models via Transformer Measure Flows

    Aleksandr Zimin, Anastasiia Kutakh, Yury Polyanskiy, Philippe Rigollet · PDF
  33. LiteByte: Efficient and Fast-Adapting MLPs for Online Byte-Level Prediction

    Yu Mao, Yuyan Lin, Xue Liu, Chun Jason Xue · PDF
  34. Measuring Memorization and Generalization in Forecasting Models via Structured Perturbations of Chaotic Systems

    Max Kanwal, Caryn Tran · PDF
  35. Mind the Gap: Removing the Discretization Gap in Differentiable Logic Gate Networks

    Shakir Yousefi, Andreas Plesner, Till Aczel, Roger Wattenhofer · PDF
  36. Neural Stochastic Differential Equations on Compact State-Spaces

    Yue-Jane Liu, Malinda Lu, Matthew K. Nock, Yaniv Yacoby · PDF
  37. On the Emergence of Position Bias in Transformers

    Xinyi Wu, Yifei Wang, Stefanie Jegelka, Ali Jadbabaie · PDF
  38. Optimizing Explanations: Nuances Matter When Evaluation Metrics Become Loss Functions

    Jonas B Raedler, Hiwot Belay Tadesse, Weiwei Pan, Finale Doshi-Velez · PDF
  39. Parity Requires Unified Input Dependence and Negative Eigenvalues in SSMs

    Behnoush Khavari, Jayesh Khullar, Mehran Shakerinava, Jerry Huang, Siamak Ravanbakhsh, Sarath Chandar · PDF
  40. Performance Plateaus in Inference-Time Scaling for Text-to-Image Diffusion Without External Models

    Changhyun Choi, Sungha Kim, H. Jin Kim · PDF
  41. Permutations as a testbed for studying the effect of input representations on learning

    Sarah McGuire Scullen, Davis Brown, Robert Jasper, Henry Kvinge, Helen Jenne · PDF
  42. Personalizing AI Interventions in Multiple Health Behavioral Change Settings

    Samantha Marks, Michelle Chang, Eura Nofshin, Weiwei Pan, Finale Doshi-Velez · PDF
  43. Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

    Sai Sumedh R. Hindupur, Ekdeep Singh Lubana, Thomas Fel, Demba E. Ba · PDF
  44. Pruning Increases Orderedness in Weight-Tied Recurrent Computation

    Yiding Song · PDF
  45. Quantitative Bounds for Length Generalization in Transformers

    Zachary Izzo, Eshaan Nichani, Jason D. Lee · PDF
  46. Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought

    Hanlin Zhu, Shibo Hao, Zhiting Hu, Jiantao Jiao, Stuart Russell, Yuandong Tian · PDF
  47. Restoring Task-Relevant Information in Synthetic Data: A Small-Scale V-Information View

    Sid Bharthulwar · PDF
  48. Review, Remask, Refine: Process-Guided Block Diffusion for Text Generation

    Nikita Mounier, Parsa Idehpour · PDF
  49. Stats or Facts: Decomposing Generalization in Language Models with Small-Scale Models

    Tina Behnia, Puneesh Deora, Christos Thrampoulidis · PDF
  50. SynDaCaTE: A Synthetic Dataset For Evaluating Part-Whole Hierarchical Inference

    Jake Levi, Mark van der Wilk · PDF
  51. The Necessity for Intervention Fidelity: Unintended Side Effects When Steering LLMs

    Jonas B Raedler, Weiyue Li, Alyssa Mia Taliotis, Manasvi Goyal, Siddharth Swaroop, Weiwei Pan · PDF
  52. TinyServe: Query-Aware Cache Selection for Efficient LLM Inference

    Dong Liu, Yanxuan Yu · PDF
  53. Towards Understanding Self-Pretraining for Sequence Classification

    Omar Coser, Antonio Orvieto · PDF
  54. Transformers May Learn to Classify In-Context by Context-Adaptive Kernel Gradient Descent

    Sara Dragutinović, Andrew M Saxe, Aaditya K Singh · PDF
  55. Transformers Pretrained on Procedural Data Contain Modular Structures for Algorithmic Reasoning

    Zachary Shinnick, Liangze Jiang, Hemanth Saratchandran, Anton van den Hengel, Damien Teney · PDF
  56. Understanding Attention Glitches with Threshold Relative Attention

    Mattia Opper, Roland Fernandez, Paul Smolensky, Jianfeng Gao · PDF
  57. Understanding How Chess-Playing Language Models Compute Linear Board Representations

    Aaron Mei · PDF
  58. Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers

    Annalisa Belloni, Lorenzo Noci, Antonio Orvieto · PDF
  59. What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers

    Pulkit Gopalani, Wei Hu · PDF
  60. Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features

    Yize Zhao, Christos Thrampoulidis · PDF
  61. ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training

    Feijiang Han, Xiaodong Yu, Jianheng Tang, Qingyun Zeng, Licheng Guo, Lyle Ungar · PDF