ICLR 2025 · Past · Large language models · Efficiency · Optimization

First Workshop on Scalable Optimization for Efficient and Adaptive Foundation Models

SCOPE - ICLR 2025


Submission deadline: Feb 10, 2025, 12:05 UTC
imported from OpenReview — check the website for extensions
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise — edits welcome.

Accepted papers (56)

Fetched from OpenReview (v2) on 2026-06-10.

A Unified Approach to Routing and Cascading for LLMs
Jasper Dekoninck, Maximilian Baader, Martin Vechev · PDF
Acceleration Multiple Heads Decoding for LLM via Dynamic Tree Attention
Zhendong Zhang · PDF
Adaptive Length Image Tokenization via Recurrent Allocation
Shivam Duggal, Phillip Isola, Antonio Torralba, William T. Freeman · PDF
AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting
Abdelhakim Benechehab, Vasilii Feofanov, Giuseppe Paolo, Albert Thomas, Maurizio Filippone, Balázs Kégl · PDF
AsymLoRA: Unlocking the Power of Multimodal LLMs via Asymmetric LoRA
Xuyang Wei, Chunlin Tian, Li Li · PDF
Attention Is All You Need For Mixture-of-Depths Routing
Advait Gadhikar, Souptik Kumar Majumdar, Niclas Popp, Piyapat Saranrittichai, Martin Rapp, Lukas Schott · PDF
ChameleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters
Kamer Ali Yuksel, Hassan Sawaf · PDF
Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models
Andy Zhou, Ron Arel · PDF
Conformal Transformations for Symmetric Power Transformers
Saurabh Kumar, Jacob Buckman, Carles Gelada, Xiaowen Zhang · PDF
Context Is All You Need: Efficient Retrieval Augmented Generation for Domain Specific AI
Peixi Xiong, Chaunte W. Lacewell, Sameh Gobriel, Nilesh Jain · PDF
DARS: Robust Sparse Fine-Tuning with Regularized Subspace Disalignment
Sumin Park, Noseong Park · PDF
DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs
Zhen Tan, Daize Dong, Xinyu Zhao, Jianing Cai, Jie Peng, Yu Cheng, Tianlong Chen · PDF
Domain-Invariant Prompt Learning for Vision-Language Models
Arsham Gholamzadeh Khoee, Yinan Yu, Robert Feldt · PDF
Efficient Distributed Optimization under Heavy-Tailed Noise
Su Hyeong Lee, Manzil Zaheer, Tian Li · PDF
Efficient Open-set Test Time Adaptation of Vision Language Models
Manogna Sreenivas, Soma Biswas · PDF
Effortless Efficiency: Low-Cost Pruning of Diffusion Models
Yang Zhang, Er Jin, Yanfei Dong, Ashkan Khakzar, Philip Torr, Johannes Stegmaier, Kenji Kawaguchi · PDF
Enhanced Continual Learning of Vision-Language Models with Model Fusion
Haoyuan Gao, Zicong Zhang, Yuqi Wei, Linglan Zhao, Guilin Li, Yexin Li, Linghe Kong, Weiran Huang · PDF
Fast Gradient Computation for RoPE Attention in Almost Linear Time
Yifang Chen, Jiayan Huo, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song · PDF
FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models
Raghav Singhal, Kaustubh Ponkshe, Praneeth Vepakomma · PDF
Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations
Sajad Movahedi, Felix Sarnthein, Nicola Muca Cirone, Antonio Orvieto · PDF
Grams: Gradient Descent with Adaptive Momentum Scaling
Yang Cao, Xiaoyu Li, Zhao Song · PDF
Graph Low-Rank Adapters of High Regularity for Graph Neural Networks and Graph Transformers
Pantelis Papageorgiou, Haitz Sáez de Ocáriz Borde, Anastasis Kratsios, Michael M. Bronstein · PDF
In-batch Ensemble Drafting: Robust Speculative Decoding for LVLMs
Minjae Lee, Wonjun Kang, Byeongkeun Ahn, Christian Classen, Minghao Yan, Hyung Il Koo, Kangwook Lee · PDF
Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
Kevin Li, Sachin Goyal, João D. Semedo, J Zico Kolter · PDF
Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
Kaustubh Ponkshe, Raghav Singhal, Eduard Gorbunov, Alexey Tumanov, Samuel Horváth, Praneeth Vepakomma · PDF
KV Prediction for Improved Time to First Token
Maxwell Horton, Qingqing Cao, Chenfan Sun, Yanzi Jin, Sachin Mehta, Mohammad Rastegari, Moin Nabi · PDF
LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
Sihwan Park, Doohyuk Jang, Sung-Yub Kim, Souvik Kundu, Eunho Yang · PDF
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu Cheng · PDF
Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing
Aviv Bick, Tobias Katsch, Nimit Sharad Sohoni, Arjun D Desai, Albert Gu · PDF
Low-Rank Continual Personalization of Diffusion Models
Łukasz Staniszewski, Katarzyna Zaleska, Kamil Deja · PDF
M2R2: Efficient Transformers with Mixture of Multi-Rate Residuals
Nikhil Bhendawade, Mahyar Najibi, Devang Naik, Irina Belousova · PDF
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Jiwoo Hong, Sayak Paul, Noah Lee, Kashif Rasul, James Thorne, Jongheon Jeong · PDF
MixER: Better Mixture of Experts Routing for Hierarchical Meta-Learning
Roussel Desmond Nzoyem, Grant Stevens, Amarpal Sahota, David A.W. Barton, Tom Deakin · PDF
Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity
Weixin Liang, Junhong Shen, Genghan Zhang, Ning Dong, Luke Zettlemoyer, Lili Yu · PDF
N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs
Ilya Zisman, Alexander Nikulin, Viacheslav Sinii, Denis Tarasov, Lyubaykin Nikita, Andrei Polubarov, Igor Kiselev, Vladislav Kurenkov · PDF
Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2
Steven Abreu, Sumit Bam Shrestha, Rui-Jie Zhu, Jason Eshraghian · PDF
On Vanishing Variance in Transformer Length Generalization
Ruining Li, Gabrijel Boduljak, Jensen Zhou · PDF
OPPA: OPtimizing PArallelism for Language Model Training
Apivich Hemachandra, Yizhan Han, See-Kiong Ng, Bryan Kian Hsiang Low · PDF
Overtrained Language Models Are Harder to Fine-Tune
Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen, Tanishq Kumar, Xiang Yue, Sadhika Malladi, Graham Neubig, Aditi Raghunathan · PDF
PENCIL: Long Thoughts with Short Memory
Chenxiao Yang, Nathan Srebro, David McAllester, Zhiyuan Li · PDF
QMambaExtend: Improving Long-Context Extension of Memory-Efficient Mamba Models
Seyedarmin Azizi, Souvik Kundu, Mohammad Erfan Sadeghi, Massoud Pedram · PDF
RecurFormer: Not All Transformer Heads Need Self-Attention
Ruiqing Yan, Linghan Zheng, Xingbo Du, Han Zou, Yufeng Guo, Jianfei Yang · PDF
Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking
Will LeVine, Bijan Varjavand · PDF
ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals
Utkarsh Saxena, Sayeh Sharify, Kaushik Roy, Xin Wang · PDF
Revisiting Associative Recall in Modern Recurrent Models
Destiny Okpekpe, Antonio Orvieto · PDF
SageAttention2: Efficient Attention with Smoothing Q and Per-thread Quantization
Jintao Zhang, Haofeng Huang, Pengle Zhang, Jia Wei, Jun Zhu, Jianfei Chen · PDF
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu · PDF
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xiang Li, Li Shen, Tianlong Chen, Lu Liu, Qingsong Wen, Zhangyang Wang, Shiwei Liu · PDF
STIV: Scalable Text and Image Conditioned Video Generation
Zongyu Lin, Wei Liu, Chen Chen, Jiasen Lu, Wenze Hu, Tsu-Jui Fu, Jesse Allardice, Zhengfeng Lai, Liangchen Song, Bowen Zhang, Cha Chen, Yiran Fei, Yifan Jiang, Lezhi Li, Yizhou Sun, Kai-Wei Chang, Yinfei Yang · PDF
The Curse of Depth in Large Language Models
Wenfang Sun, Xinyuan Song, Pengxiang Li, Lu Yin, Yefeng Zheng, Shiwei Liu · PDF
Towards Infinite-Long Prefix in Transformers
Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang · PDF
Training Domain Draft Models for Speculative Decoding: Best Practices and Insights
Fenglu Hong, Ravi Shanker Raju, Jonathan Lingjie Li, Bo Li, Urmish Thakker, Avinash Ravichandran, Swayambhoo Jain, Changran Hu · PDF
UniForm: A Reuse Attention Mechanism for Efficient Transformers on Resource-Constrained Edge Devices
Seul-Ki Yeom, Tae-Ho Kim · PDF
Universal LLM Routing with Correctness-Based Representation
Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Jeevesh Juneja, Zifeng Wang, Chen-Yu Lee, Pradeep Shenoy, Rina Panigrahy, Aditya Krishna Menon, Sanjiv Kumar · PDF
XAMBA: Enabling Efficient State Space Models on Resource-Constrained Neural Processing Units
Arghadip Das, Arnab Raha, Shamik Kundu, Soumendu Kumar Ghosh, Deepak Mathaikutty, Vijay Raghunathan · PDF
Yes, Q-learning Helps Offline In-Context RL
Denis Tarasov, Alexander Nikulin, Ilya Zisman, Albina Klepach, Andrei Polubarov, Lyubaykin Nikita, Alexander Derevyagin, Igor Kiselev, Vladislav Kurenkov · PDF
