Tiny Titans: The next wave of On-Device Learning for Foundational Models (TTODLer-FM)

TTODLer-FM @ ICML 2025

Submission deadline: May 27, 2025, 11:59 UTC
Imported from OpenReview; check the website for extensions.
Submission portal: OpenReview

Accepted papers (32)

Fetched from OpenReview (v2) on 2026-06-10.

Addition is almost all you need: Compressing neural networks with double binary factorization
Vladimír Boža, Vladimír Macko · PDF
Capability Transfer from Large to Small Models with Synthetically-Generated Data
Lillian Sun, Emma Yang, Arif Kerem Dayi · PDF
Compression of Large Language Models by Condensed Weight Representation
Yancheng Wang, Dongfang Sun, Yingzhen Yang · PDF
DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion
Makoto Shing, Takuya Akiba · PDF
Dynamic Guardian Models: Realtime Content Moderation With User-Defined Policies
Monte Hoover, Vatsal Baherwani, Neel Jain, Khalid Saifullah, Joseph James Vincent, Chirag Jain, Melissa Kazemi Rad, C. Bayan Bruss, Ashwinee Panda, Tom Goldstein · PDF
Enhancing Reasoning Capabilities of Small Language Models with Blueprints and Prompt Template Search
Dongge Han, Menglin Xia, Daniel Madrigal, Samuel Kessler, Ankur Mallick, Xuchao Zhang, Mirian Del Carmen Hipolito Garcia, Jin Xu, Victor Rühle, Saravan Rajmohan · PDF
FAST: Federated Active Learning with Foundation Models for Communication-efficient Sampling and Training
Haoyuan Li, Mathias Funk, Jindong Wang, Aaqib Saeed · PDF
FGFP: A Fractional Gaussian Filter and Pruning for Deep Neural Networks Compression
Kuan-Ting Tu, Po-Hsien Yu, Yu-Syuan Tseng, Shao-Yi Chien · PDF
First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions
Egor Shulgin, Grigory Malinovsky, Sarit Khirirat, Peter Richtárik · PDF
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko, Aleksandr Beznosikov, Martin Takáč, Samuel Horváth · PDF
Gatekeeper: Improving Model Cascades Through Confidence Tuning
Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha, Petra Poklukar, Wittawat Jitkrittum, Sean Augenstein, Congchao Wang, Federico Tombari · PDF
Higher Acceptance Rates for Speculative Decoding with Randomised Drafting
William Toner, Martin Asenov, Rajkarn Singh, Artjom Joosen · PDF
Kinetics: Rethinking Test-Time Scaling Laws
Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen · PDF
Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order LLM Fine-Tuning
Egor Petrov, Evseev Grigoriy, Aleksey Antonov, Andrey Veprikov, Pavel Plyusnin, Nikolay Bushkov, Stanislav Moiseev, Aleksandr Beznosikov · PDF
Lion Cub: Minimizing Communication Overhead in Distributed Lion
Satoki Ishikawa, Tal Ben-Nun, Brian Van Essen, Rio Yokota, Nikoli Dryden · PDF
LoFT: Low-Rank Adaptation That Behaves Like Full Fine-Tuning
Nurbek Tastan, Stefanos Laskaridis, Martin Takáč, Karthik Nandakumar, Samuel Horváth · PDF
MatMuls are Enough for Efficient and Performant Linear-Time Attention
Andrew Argatkiny, Ilya Makarov · PDF
Offloaded Reasoning: Efficient Inference for Large Language Models via Modular Reasoning and Refinement
Ishan Jindal, Jayant Taneja, Badrinath Chandana, Vikas Kapur, Sachin Dev Sharma · PDF
Overcoming label shift in targeted federated learning
Adam Breitholtz, Edvin Listo Zec, Fredrik D. Johansson · PDF
Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models
Katrina Brown, Aneesh Muppidi, Rana Shahout · PDF
Preserve then Quantize: Dominant-Subspace Guided Low-Rank Reconstruction
Yoonjun Cho, Dongjae Jeon, Soeun Kim, Albert No · PDF
Randomized Asymmetric Chain of LoRA: The First Meaningful Theoretical Framework for Low-Rank Adaptation for Federated Learning
Grigory Malinovsky, Umberto Michieli, Hasan Abed Al Kader Hammoud, Taha Ceritli, Hayder Elesedy, Mete Ozay, Peter Richtárik · PDF
SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning
Avetik Karagulyan, Egor Shulgin, Abdurakhmon Sadiev, Peter Richtárik · PDF
Spec-LLaVA: Accelerating Vision-Language Models with Dynamic Tree-Based Speculative Decoding
Mingxiao Huo, Jiayi Zhang, Hewei Wang, Jinfeng Xu, Zheyu Chen, Huilin Tai, Ian Yijun Chen · PDF
TensorSLM: Energy-efficient Embedding Compression of Sub-billion Parameter Language Models on Low-end Devices
Mingxue Xu, Yao Lei Xu, Danilo Mandic · PDF
Token-Efficient RL for LLM Reasoning
Alan Lee, Harry Tong · PDF
Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers
Joshua Barron, Devin White · PDF
Towards understanding of orthogonalization in Muon
Valentyn Boreiko, Zhiqi Bu, Sheng Zha · PDF
Unlocking the Potential of Extremely Low-Bit Sparse Transformers through Adaptive Multi-bit Supermasks and Random Weights
Yasuyuki Okoshi, Hikari Otsuka, Junnosuke Suzuki, Daichi Fujiki, Masato Motomura · PDF
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers
Berkin Durmus, Arda Okan, Eduardo Pacheco, Zach Nagengast, Atila Orhon · PDF
Zeroth-Order Optimization is Secretly Single-Step Policy Optimization
Junbin Qiu, Zhengpeng Xie, Xiangda Yan, Yongjie Yang, Yao Shu · PDF
Zoop it! Efficient Zero-Order Optimization with Output Perturbation
Xixi Hu, Bo Liu, Qiang Liu, Xiaocong Du, Bhargav Bhushanam, Louis Feng, Chengyue Gong, Kaizhao Liang · PDF
