
ICML 2025 Workshop on Long-Context Foundation Models

LCFM 2025


Submission deadline: May 29, 2025, 11:59 UTC
Imported from OpenReview; check the workshop website for deadline extensions.
Submission portal: OpenReview

Accepted papers (35)

Fetched from OpenReview (v2) on 2026-06-10.

Accelerated Inference with Long-Sequence Transformers on CPUs
Yuzhen Mao, Martin Ester, Ke Li
ALCo-FM: Adaptive Long-Context Foundation Model for Accident Prediction
Pinaki Prasad Guha Neogi, Ahmad Mohammadshirazi, Rajiv Ramnath
BSA: Ball Sparse Attention for Large-scale Geometries
Cătălin-Emanuel Brița, Hieu Nguyen, Lohithsai Yadala Chanchu, Domonkos Nagy, Maksim Zhdanov
CalibQuant: 1-Bit KV Cache Quantization for Multimodal LLMs
Insu Han, Zeliang Zhang, Zhiyuan Wang, Yifan Zhu, Susan Liang, Jiani Liu, Haiting Lin, Mingjie Zhao, Chenliang Xu, Kun Wan, Wentian Zhao
Dynamic Causal‐Graph Memory: Structured Retrieval for Million–Token Reasoning
Thomas Y Chen
e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs
Amrith Setlur, Matthew Y. R. Yang, Charlie Victor Snell, Jeremiah Greer, Ian Wu, Virginia Smith, Max Simchowitz, Aviral Kumar
Enhancing Retrieval-Augmented Generation with Dehallucinating Parallel Context Extension
Zexiong Ma, Shengnan An, Zeqi Lin, Yanzhen Zou, Jian-Guang Lou, Bing Xie
Evaluating Memory in LLM Agents via Incremental Multi-Turn Interactions
Yuanzhe Hu, Yu Wang, Julian McAuley
Foreign Sparse Attention: Effective Distillation into Sparse Attention
Vijaykaarti Sundarapandiyan, Tom Goldstein, Ashwinee Panda
GSM-Infinite: How Do your LLMs Behave over Infinitely Increasing Reasoning Complexity and Context Length?
Yang Zhou, Hongyi Liu, Zhuoming Chen, Yuandong Tian, Beidi Chen
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Cheng Luo, Zefan Cai, Hanshi Sun, Jinqi Xiao, Bo Yuan, Wen Xiao, Junjie Hu, Jiawei Zhao, Beidi Chen, Anima Anandkumar
How Much Context Does Natural Language Actually Require? An Analysis Using LLMs as Statistical Oracles
Vala Vakilian, Sadegh Mahdavi, Christos Thrampoulidis
Improving Context Fidelity via Native Retrieval-Augmented Reasoning
Suyuchen Wang, Jinlin Wang, Xinyu Wang, Shiqi Li, Xiangru Tang, Sirui Hong, Xiao-Wen Chang, Chenglin Wu, Bang Liu
Jailbreaking in the Haystack
Rishi Rajesh Shah, Chen Henry Wu, Ziqian Zhong, Alexander Robey, Aditi Raghunathan
Kinetics: Rethinking Test-Time Scaling Laws
Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng, Yang Zhou, Emma Strubell, Beidi Chen
Language Modeling with Learned Meta-Tokens
Alok Shah, Khush Gupta, Keshav Ramji, Pratik Chaudhari
Looking beyond the next token
Abitha Thankaraj, Yiding Jiang, J Zico Kolter, Yonatan Bisk
Making Small Language Models Efficient Reasoners: Intervention, Supervision, Reinforcement
Xuechen Zhang, Zijian Huang, Chenshun Ni, Ziyang Xiong, Jiasi Chen, Samet Oymak
MatMuls are Enough for Efficient and Performant Linear-Time Attention
Andrew Argatkiny, Ilya Makarov
MKA: Memory-Keyed Attention for Efficient Long-Context Reasoning
Dong Liu, Yanxuan Yu, Xuhong Wang, Ben Lengerich, Ying Nian Wu
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly
Zhaowei Wang, Wenhao Yu, Xiyu REN, Jipeng Zhang, Yu Zhao, Rohit Saxena, Liang Cheng, Ginny Wong, Simon See, Pasquale Minervini, Yangqiu Song, Mark Steedman
Multiscale Byte Language Models -- A Hierarchical Architecture for Causal Million-Length Sequence Modeling
Eric Egli, Matteo Manica, Jannis Born
NovelHopQA: Diagnosing Multi-Hop Reasoning Failures in Long Narrative Contexts
Abhay Gupta, Kevin Zhu, Vasu Sharma, Sean O'Brien, Michael Lu
OracleKV: Oracle Guidance for Question-Independent KV Cache Eviction
Yuanbing Zhu, Zhenheng Tang, Xiang Liu, Ang Li, Bo Li, Xiaowen Chu, Bo Han
Pause-Tuning for Long-Context Comprehension: A Lightweight Approach to LLM Attention Recalibration
James Begin, Namit Agrawal, Eshan Singh, Yicheng Fu, Sean O'Brien, Vasu Sharma, Kevin Zhu
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong, Chao Wan, Kamilė Stankevičiūtė, Anmol Kabra, Raphael Thesmar, Johann Lee, Julius Klenke, Carla P Gomes, Kilian Q Weinberger
pLSTM: parallelizable Linear Source Transition Mark networks
Korbinian Pöppel, Richard Freinschlag, Thomas Schmied, Wei Lin, Sepp Hochreiter
Say as It Is: Verbatim Fidelity Evaluation of Long-Context Language Model
Kyu Won Kim, Suhwan Choi, Myeongho Jeon
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Harry Dong, Bilge Acun, Beidi Chen, Yuejie Chi
Scaling Laws for Many-Shot In-Context Learning with Self-Generated Annotations
Zhengyao Gu, Henry Peng Zou, Aiwei Liu, Yankai Chen, Weizhi Zhang, Philip S. Yu
Simple, Scalable Reasoning via Iterated Summarization
Vivek Vajipey, Aditya Tadimeti, Justin Shen, Ben Prystawski, Michael Y. Li, Noah Goodman
SWAN-GPT: An Efficient and Scalable Approach for Long-Context Language Modeling
Krishna C Puvvada, Faisal Ladhak, Santiago Akle Serano, Cheng-Ping Hsieh, Shantanu Acharya, Somshubra Majumdar, Fei Jia, Samuel Kriman, Simeng Sun, Dima Rekesh, Boris Ginsburg
Thinformer: Guaranteed Attention Approximation via Low-Rank Thinning
Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester Mackey
Towards Understanding Self-Pretraining for Sequence Classification
Omar Coser, Antonio Orvieto
Unable to Forget: Proactive Interference Reveals Working Memory Limits in LLMs Beyond Context Length
Chupei Wang, Jiaqiu Vince Sun
