ICML 2024 · Past · Large language models · Efficiency · ML systems

Workshop on Efficient Systems for Foundation Models II @ ICML2024

ES-FoMo-II 2024

Submission deadline
Jun 4, 2024, 11:59 UTC
Imported from OpenReview; check the website for extensions.
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).

Accepted papers (80)

Fetched from OpenReview (v2) on 2026-06-10.

  1. AdaInf: Adaptive Inference for Resource-Constrained Foundation Models

    Zhuoyan Xu, Khoi Duc Nguyen, Preeti Mukherjee, Somali Chaterji, Yingyu Liang, Yin Li · PDF
  2. Adam-mini: Use Fewer Learning Rates To Gain More

    Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun · PDF
  3. AdaNF: Quantization Group Adaptive NormalFloat for Low Bit Fine-tuning of LLMs

    Yeojoon Youn, Sehoon Kim, Suhong Moon, Sang Keun Choe, Ce Zhang · PDF
  4. BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

    Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Nicolaus Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Üstün, Acyr Locatelli · PDF
  5. Block Verification Accelerates Speculative Decoding

    Ziteng Sun, Uri Mendlovic, Yaniv Leviathan, Asaf Aharoni, Ahmad Beirami, Jae Hun Ro, Ananda Theertha Suresh · PDF
  6. Can Transformers Solve Least Squares to High Precision?

    Jerry Weihong Liu, Jessica Grogan, Owen M Dugan, Simran Arora, Atri Rudra, Christopher Re · PDF
  7. Characterizing Prompt Compression Methods for Long Context Inference

    Siddharth Jha, Lutfi Eren Erdogan, Sehoon Kim, Kurt Keutzer, Amir Gholami · PDF
  8. CLAM: Unifying Finetuning, Quantization, and Pruning by Chaining LLM Adapter Modules

    Neelay Velingker, Jason Liu, Amish Sethi, William Dodds, Zhiqiu Xu, Saikat Dutta, Mayur Naik, Eric Wong · PDF
  9. CO2: Precise Attention Score Observation for improving KV Cache Replacement in Large Language Model

    Meguru Yamazaki, Shivaram Venkataraman · PDF
  10. Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead

    Rickard Brüel Gabrielsson, Jiacheng Zhu, Onkar Bhardwaj, Leshem Choshen, Kristjan Greenewald, Mikhail Yurochkin, Justin Solomon · PDF
  11. DocParseNet: Advanced Semantic Segmentation and OCR Embeddings for Efficient Scanned Document Annotation

    Ahmad Mohammadshirazi, Ali Nosratifiroozsalari, Mengxi Zhou, Dheeraj Kulshrestha, Rajiv Ramnath · PDF
  12. Does your data spark joy? Performance gains from domain upsampling at the end of training

    Cody Blakeney, Mansheej Paul, Brett W. Larsen, Sean Owen, Jonathan Frankle · PDF
  13. Efficient LLM Pruning with Global Token-Dependency Awareness and Hardware-Adapted Inference

    Oshin Dutta, Ritvik Gupta, Sumeet Agarwal · PDF
  14. Efficient multi-prompt evaluation of LLMs

    Felipe Maia Polo, Ronald Xu, Lucas Weber, Mírian Silva, Onkar Bhardwaj, Leshem Choshen, Allysson Flavio Melo de Oliveira, Yuekai Sun, Mikhail Yurochkin · PDF
  15. Efficient Training of Language Models with Compact and Consistent Next Token Distributions

    Ashutosh Sathe, Sunita Sarawagi · PDF
  16. Enhancing Stability for Large Models Training in Constrained Bandwidth Networks

    Yun Dai, Tejas Dharamsi, Pin-Lun Hsu, Tao Song, Hamed Firooz · PDF
  17. Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion

    Filip Szatkowski, Bartosz Wójcik, Mikołaj Piórczyński, Simone Scardapane · PDF
  18. Exploring and Improving Drafts in Blockwise Parallel Decoding

    Taehyeon Kim, Ananda Theertha Suresh, Kishore A Papineni, Michael Riley, Sanjiv Kumar, Adrian Benton · PDF
  19. Exploring Monotonicity in Early-Exiting Language Models

    Filipe Laitenberger, Max Belitsky, Denys Sheremet · PDF
  20. ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement

    Eashan Adhikarla, Kai Zhang, John Nicholson, Brian D. Davison · PDF
  21. Exponential Quantum Communication Advantage in Distributed Inference and Learning

    Hagay Michaeli, Dar Gilboa, Daniel Soudry, Jarrod Ryan McClean · PDF
  22. Fast Adaptation and Robust Quantization of Multi-Modal Foundation Models from Associative Memory: A Case Study in SpeechLM

    Shang Wu, Yen-Ju Lu, Haozheng Luo, Jerry Yao-Chieh Hu, Jiayi Wang, Najim Dehak, Jesus Villalba, Han Liu · PDF
  23. Fast and Memory-Efficient Multi-Sequence Generation via Structured Masking

    Daniel Mingyi Israel, Siyan Zhao, Guy Van den Broeck, Aditya Grover · PDF
  24. Fast yet Safe: Early-Exiting with Risk Control

    Metod Jazbec, Alexander Timans, Tin Hadži Veljković, Kaspar Sakmann, Dan Zhang, Christian A. Naesseth, Eric Nalisnick · PDF
  25. Fewer Truncations Improve Language Modeling

    Hantian Ding, Zijian Wang, Giovanni Paolini, Varun Kumar, Anoop Deoras, Dan Roth, Stefano Soatto · PDF
  26. GPTVQ: The Blessing of Dimensionality for LLM Quantization

    Mart Van Baalen, Andrey Kuzmin, Markus Nagel, Peter Couperus, Artem Bolshakov, Cedric Bastoul, Eric Mahurin, Tijmen Blankevoort, Paul Whatmough · PDF
  27. GRASS: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients

    Aashiq Muhamed, Oscar Li, David Woodruff, Mona T. Diab, Virginia Smith · PDF
  28. Hardware-Efficient Quantization for Green Custom Foundation Models

    Toshiaki Koike-Akino, Chang Meng, Volkan Cevher, Giovanni De Micheli · PDF
  29. HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis

    Darren Yan Key, Andy He, Mason Bulling, Andrew Chang, Skyler Shapiro, Everett Lee · PDF
  30. Hydragen: High-Throughput LLM Inference with Shared Prefixes

    Jordan Juravsky, Bradley Brown, Ryan Saul Ehrlich, Daniel Y Fu, Christopher Re, Azalia Mirhoseini · PDF
  31. Implicit Optimization Bias of Next-token Prediction in Linear Models

    Christos Thrampoulidis · PDF
  32. In Defense of Structural Sparse Adapters for Concurrent LLM Serving

    Junda Su, Zirui Liu, Zeju Qiu, Weiyang Liu, Zhaozhuo Xu · PDF
  33. Janus: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences

    Krithik Ramesh, Sameed Muneeb Siddiqui, Michael Mitzenmacher, Pardis Sabeti · PDF
  34. Just read twice: closing the recall gap for recurrent language models

    Simran Arora, Aman Timalsina, Aaryan Singhal, Sabri Eyuboglu, Xinyi Zhao, Ashish Rao, Atri Rudra, Christopher Re · PDF
  35. LAuReL: Learned Augmented Residual Layer

    Gaurav Menghani, Ravi Kumar, Sanjiv Kumar · PDF
  36. LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference

    Qichen Fu, Minsik Cho, Thomas Merth, Sachin Mehta, Mohammad Rastegari, Mahyar Najibi · PDF
  37. Learned Best-Effort LLM Serving

    Siddharth Jha, Coleman Richard Charles Hooper, Xiaoxuan Liu, Sehoon Kim, Kurt Keutzer · PDF
  38. Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs

    Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal · PDF
  39. Low Rank Quantization-Aware Training for LLMs

    Yelysei Bondarenko, Riccardo Del Chiaro, Markus Nagel · PDF
  40. Low-rank Linearization of Large Language Models

    Michael Zhang, Aaryan Singhal, Benjamin Frederick Spector, Simran Arora, Christopher Re · PDF
  41. Mamba-PTQ: Outlier Channels in Recurrent Large Language Models

    Alessandro Pierro, Steven Abreu · PDF
  42. MInference: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

    Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu · PDF
  43. Mobile and Edge Evaluation of Large Language Models

    Stefanos Laskaridis, Kleomenis Katevas, Lorenzo Minto, Hamed Haddadi · PDF
  44. MoRe Fine-Tuning with 10x Fewer Parameters

    Wenxuan Tan, Nicholas Roberts, Tzu-Heng Huang, Jitian Zhao, John Cooper, Samuel Guo, Chengyu Duan, Frederic Sala · PDF
  45. NVDSL: Simplifying Tensor Cores with Python-Driven MLIR Metaprogramming

    Guray Ozen · PDF
  46. OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training

    Sami Jaghouar, Johannes Hagemann · PDF
  47. OpenELM: An Efficient Language Model Family with Open Training and Inference Framework

    Sachin Mehta, Mohammad Hossein Sekhavat, Qingqing Cao, Maxwell Horton, Yanzi Jin, Chenfan Sun, Seyed Iman Mirzadeh, Mahyar Najibi, Dmitry Belenko, Peter Zatloukal, Mohammad Rastegari · PDF
  48. Optimised Grouped-Query Attention Mechanism for Transformers

    Yuang Chen, Cheng Zhang, Xitong Gao, Robert D. Mullins, George Anthony Constantinides, Yiren Zhao · PDF
  49. Optimistic Verifiable Training by Controlling Hardware Nondeterminism

    Megha Srivastava, Simran Arora, Dan Boneh · PDF
  50. OutEffHop: A Principled Outlier-Efficient Attention Layer from Dense Associative Memory Models

    Haozheng Luo, Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu · PDF
  51. Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs

    Davide Paglieri, Saurabh Dash, Tim Rocktäschel, Jack Parker-Holder · PDF
  52. Performance Control in Early Exiting to Deploy Large Models at the Same Cost of Smaller Ones

    Mehrnaz Mofakhami, Reza Bayat, Ioannis Mitliagkas, Joao Monteiro, Valentina Zantedeschi · PDF
  53. PQV-Mobile: A Combined Pruning and Quantization Toolkit to Optimize Vision Transformers for Mobile Applications

    Kshitij Bhardwaj · PDF
  54. Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models

    Siyan Zhao, Daniel Mingyi Israel, Guy Van den Broeck, Aditya Grover · PDF
  55. Pretrained Hybrids with MAD Skills

    Nicholas Roberts, Samuel Guo, Zhiqi Gao, Satya Sai Srinath Namburi GNVV, Sonia Cromp, Chengjun Wu, Chengyu Duan, Frederic Sala · PDF
  56. Projectable Models: One-Shot Generation of Small Specialized Transformers from Large Ones

    Andrey Zhmoginov, Jihwan Lee, Mark Sandler · PDF
  57. Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation

    Harry Dong, Beidi Chen, Yuejie Chi · PDF
  58. Quantum-PEFT: Ultra parameter-efficient fine-tuning

    Toshiaki Koike-Akino, Francesco Tonin, Yongtao Wu, Leyla Naz Candogan, Volkan Cevher · PDF
  59. Revealing the Utilized Rank of Subspaces of Learning in Neural Networks

    Isha Garg, Christian Koguchi, Eshan Verma, Daniel Ulbricht · PDF
  60. Revisiting Cascaded Ensembles for Efficient Inference

    Steven Kolawole, Don Dennis, Ameet Talwalkar, Virginia Smith · PDF
  61. Robust Federated Finetuning of Foundation Models via Alternating Minimization of LoRA

    Shuangyi Chen, Yue Ju, Hardik Dalal, Zhongwen Zhu, Ashish J Khisti · PDF
  62. Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

    Alexander Hägele, Elie Bakouch, Atli Kosson, Loubna Ben Allal, Leandro von Werra, Martin Jaggi · PDF
  63. Scavenging Hyena: Distilling Transformers into Long Convolution Models

    Tokiniaina Raharison Ralambomihanta, Shahrad Mohammadzadeh, Sami Nur Islam, Wassim Jabbour, Laurence Liang · PDF
  64. Seeded LoRA: Collaborative Fine-Tuning Through Seed Initialization of Adapters

    Alejandro R. Salamanca, Ahmet Üstün, Nicki Skafte Detlefsen, Tim Dettmers · PDF
  65. Simple linear attention language models balance the recall-throughput tradeoff

    Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher Re · PDF
  66. SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths

    Kaixuan Huang, Xudong Guo, Mengdi Wang · PDF
  67. SVFT: Parameter-Efficient Fine-Tuning with Singular Vectors

    Vijay Lingam, Atula Tejaswi Neerkaje, Aditya Vavre, Aneesh Shetty, Gautham Krishna Gudur, Joydeep Ghosh, Alex Dimakis, Eunsol Choi, Aleksandar Bojchevski, Sujay Sanghavi · PDF
  68. Task Addition and Weight Disentanglement in Closed-Vocabulary Models

    Adam Hazimeh, Alessandro Favero, Pascal Frossard · PDF
  69. The Mamba in the Llama: Distilling and Accelerating Hybrid Models

    Junxiong Wang, Daniele Paliotta, Avner May, Alexander M Rush, Tri Dao · PDF
  70. Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding

    Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki M Asano, Babak Ehteshami Bejnordi · PDF
  71. TinyAgent: Quantization-aware Model Compression and Adaptation for On-device LLM Agent Deployment

    Jason Kong, Lanxiang Hu, Flavio Ponzina, Tajana Rosing · PDF
  72. Towards Efficient Large-Scale Language-3D Representation Learning

    Shentong Mo, Xiaogang Xu, Tongzhou Wang, Antonio Torralba, Shuang Li · PDF
  73. Towards smaller language models via layer looping

    Sabri Eyuboglu, Dylan Zinsley, Jon Saad-Falcon, Simran Arora, Atri Rudra, James Zou, Christopher Re · PDF
  74. Train your cake and eat it too! Repurposing collaborative training to tailor LLMs to private data without sharing

    Boris Radovič, Mohammed Aljahdali, Marco Canini, Veljko Pejović, Zuhair Khayyat · PDF
  75. Training-Free Acceleration of ViTs with Delayed Spatial Merging

    Jung Hwan Heo, Seyedarmin Azizi, Arash Fayyazi, Massoud Pedram · PDF
  76. Understanding and Minimising Outlier Features in Neural Network Training

    Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann · PDF
  77. Unlocking the Global Synergies in Low-Rank Adapters

    Zixi Zhang, Cheng Zhang, Xitong Gao, Robert D. Mullins, George Anthony Constantinides, Yiren Zhao · PDF
  78. Why Transformers Need Adam: A Hessian Perspective

    Yushun Zhang, Congliang Chen, Tian Ding, Ziniu Li, Ruoyu Sun, Zhi-Quan Luo · PDF
  79. xLSTM: Extended Long Short-Term Memory

    Maximilian Beck, Korbinian Pöppel, Markus Spanring, Andreas Auer, Oleksandra Prudnikova, Michael K Kopp, Günter Klambauer, Johannes Brandstetter, Sepp Hochreiter · PDF
  80. Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity

    Wentao Guo, Jikai Long, Yimeng Zeng, Zirui Liu, Xinyu Yang, Yide Ran, Jacob R. Gardner, Osbert Bastani, Christopher De Sa, Xiaodong Yu, Beidi Chen, Zhaozhuo Xu · PDF