ICLR 2025 · Past · Large language models · Efficiency · Optimization
First Workshop on Scalable Optimization for Efficient and Adaptive Foundation Models
SCOPE - ICLR 2025
- Submission deadline: Feb 10, 2025, 12:05 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (56)
Fetched from OpenReview (v2) on 2026-06-10.
- A Unified Approach to Routing and Cascading for LLMs
- Acceleration Multiple Heads Decoding for LLM via Dynamic Tree Attention
- Adaptive Length Image Tokenization via Recurrent Allocation
- AdaPTS: Adapting Univariate Foundation Models to Probabilistic Multivariate Time Series Forecasting
- AsymLoRA: Unlocking the Power of Multimodal LLMs via Asymmetric LoRA
- Attention Is All You Need For Mixture-of-Depths Routing
- ChameleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters
- Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models
- Conformal Transformations for Symmetric Power Transformers
- Context Is All You Need: Efficient Retrieval Augmented Generation for Domain Specific AI
- DARS: Robust Sparse Fine-Tuning with Regularized Subspace Disalignment
- DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs
- Domain-Invariant Prompt Learning for Vision-Language Models
- Efficient Distributed Optimization under Heavy-Tailed Noise
- Efficient Open-set Test Time Adaptation of Vision Language Models
- Effortless Efficiency: Low-Cost Pruning of Diffusion Models
- Enhanced Continual Learning of Vision-Language Models with Model Fusion
- Fast Gradient Computation for RoPE Attention in Almost Linear Time
- FedEx-LoRA: Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models
- Fixed-Point RNNs: From Diagonal to Dense in a Few Iterations
- Grams: Gradient Descent with Adaptive Momentum Scaling
- Graph Low-Rank Adapters of High Regularity for Graph Neural Networks and Graph Transformers
- In-batch Ensemble Drafting: Robust Speculative Decoding for LVLMs
- Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
- Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning
- KV Prediction for Improved Time to First Token
- LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
- Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
- Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing
- Low-Rank Continual Personalization of Diffusion Models
- M2R2: Efficient Transformers with Mixture of Multi-Rate Residuals
- Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
- MixER: Better Mixture of Experts Routing for Hierarchical Meta-Learning
- Mixture-of-Mamba: Enhancing Multi-Modal State-Space Models with Modality-Aware Sparsity
- N-Gram Induction Heads for In-Context RL: Improving Stability and Reducing Data Needs
- Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2
- On Vanishing Variance in Transformer Length Generalization
- OPPA: OPtimizing PArallelism for Language Model Training
- Overtrained Language Models Are Harder to Fine-Tune
- PENCIL: Long Thoughts with Short Memory
- QMambaExtend: Improving Long-Context Extension of Memory-Efficient Mamba Models
- RecurFormer: Not All Transformer Heads Need Self-Attention
- Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking
- ResQ: Mixed-Precision Quantization of Large Language Models with Low-Rank Residuals
- Revisiting Associative Recall in Modern Recurrent Models
- SageAttention2: Efficient Attention with Smoothing Q and Per-thread Quantization
- SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
- Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
- STIV: Scalable Text and Image Conditioned Video Generation
- The Curse of Depth in Large Language Models
- Towards Infinite-Long Prefix in Transformers
- Training Domain Draft Models for Speculative Decoding: Best Practices and Insights
- UniForm: A Reuse Attention Mechanism for Efficient Transformers on Resource-Constrained Edge Devices
- Universal LLM Routing with Correctness-Based Representation
- XAMBA: Enabling Efficient State Space Models on Resource-Constrained Neural Processing Units
- Yes, Q-learning Helps Offline In-Context RL