
ICML 2024 Workshop on Theoretical Foundations of Foundation Models

TF2M 2024


Submission deadline: Jun 1, 2024, 11:59 UTC
imported from OpenReview — check the website for extensions
Submission portal: OpenReview

Accepted papers (58)

Fetched from OpenReview (v2) on 2026-06-10.

A deeper look at depth pruning of LLMs
Shoaib Ahmed Siddiqui, Xin Dong, Greg Heinrich, Thomas Breuel, Jan Kautz, David Krueger, Pavlo Molchanov · PDF
A Theoretical Understanding of Self-Correction through In-context Alignment
Yifei Wang, Yuyang Wu, Zeming Wei, Stefanie Jegelka, Yisen Wang · PDF
Active Preference Optimization for Sample Efficient RLHF
Nirjhar Das, Souradip Chakraborty, Aldo Pacchiano, Sayak Ray Chowdhury · PDF
Attention Is All You Need But You Don’t Need All Of It For Inference of Large Language Models
Georgy Tyukin, Gbetondji Jean-Sebastien Dovonon, Jean Kaddour, Pasquale Minervini · PDF
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
Yunzhen Feng, Elvis Dohmatob, Pu Yang, Francois Charton, Julia Kempe · PDF
Decoding-Time Language Model Alignment with Multiple Objectives
Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon Shaolei Du · PDF
Detrimental Memories in Transfer Learning
Amal Alnouri, Timothy J Wroge, Bilal Alsallakh · PDF
Do LLM Agents Have Regret? A Case Study in Online Learning and Games
Chanwoo Park, Xiangyu Liu, Asuman E. Ozdaglar, Kaiqing Zhang · PDF
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Yibo Jiang, Goutham Rajendran, Pradeep Kumar Ravikumar, Bryon Aragam · PDF
Efficient Document Ranking with Learnable Late Interactions
Himanshu Jain, Ziwei Ji, Sashank J. Reddi, Ankit Singh Rawat, Felix Yu, Aditya Krishna Menon, Sadeep Jayasumana · PDF
Fast Machine Unlearning via Robust Training
Youssef Allouah, Joshua Kazdan, Rachid Guerraoui, Sanmi Koyejo · PDF
Fine-Tuning Large Language Models with User-Level Differential Privacy
Zachary Charles, Arun Ganesh, Ryan McKenna, Hugh Brendan McMahan, Nicole Elyse Mitchell, Krishna Pillutla, J Keith Rush · PDF
Fundamental Limits of Prompt Compression: A Rate-Distortion Framework for Black-Box Language Models
Adway Girish, Alliot Nagle, Ashok Vardhan Makkuva, Marco Bondaschi, Michael Gastpar, Hyeji Kim · PDF
Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment
Jiaxiang Li, Siliang Zeng, Hoi To Wai, Chenliang Li, Alfredo Garcia, Mingyi Hong · PDF
Hallmarks of Optimization Trajectories in Neural Networks and LLMs: Directional Exploration and Redundancy
Sidak Pal Singh, Bobby He, Thomas Hofmann, Bernhard Schölkopf · PDF
How Do Nonlinear Transformers Acquire Generalization-Guaranteed CoT Ability?
Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen · PDF
How Do Transformers Fill in the Blanks? A Case Study on Matrix Completion
Pulkit Gopalani, Ekdeep Singh Lubana, Wei Hu · PDF
How Transformers Learn Diverse Attention Correlations in Masked Vision Pretraining
Yu Huang, Zixin Wen, Yuejie Chi, Yingbin Liang · PDF
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression
Xingwu Chen, Lei Zhao, Difan Zou · PDF
Implementability of Information Elicitation Mechanisms with Pre-Trained Language Models
Zachary Robertson, Hannah Cha, Andrew Sheha, Sanmi Koyejo · PDF
Implicit Optimization Bias of Next-token Prediction in Linear Models
Christos Thrampoulidis · PDF
Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems
Bingcong Li, Liang Zhang, Niao He · PDF
Importance Weighted Multi-Draft Speculative Sampling
Ashish J Khisti, Arash Behravesh, Hassan Dbouk, Arash Behboodi, Roland Memisevic, Christos Louizos · PDF
In-Context Learning from Training on Unstructured Data: The Role of Co-Occurrence, Positional Information, and Training Data Structure
Kevin Christian Wibisono, Yixin Wang · PDF
In-Context Learning with Representations: Contextual Generalization of Trained Transformers
Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi · PDF
Local to Global: Learning Dynamics and Effect of Initialization for Transformers
Ashok Vardhan Makkuva, Marco Bondaschi, Chanakya Ekbote, Adway Girish, Alliot Nagle, Hyeji Kim, Michael Gastpar · PDF
Meta-optimization for Deep Learning via Nonstochastic Control
Xinyi Chen, Evan Dogariu, Zhou Lu, Elad Hazan · PDF
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su, Julia Kempe, Karen Ullrich · PDF
Modeling the Plurality of Human Preferences via Ideal Points
Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak · PDF
Models That Prove Their Own Correctness
Noga Amit, Shafi Goldwasser, Orr Paradise, Guy N. Rothblum · PDF
MSAMamba: Adapting Subquadratic Models To Long-Context DNA MSA Analysis
Vishrut Thoutam, Dina Ellsworth · PDF
Multilingual Compression Parity: How Efficiently Large Language Models Represent Information Across Languages?
Alexander Tsvetkov, Alon Kipnis · PDF
On Provable Length and Compositional Generalization
Kartik Ahuja, Amin Mansouri · PDF
On the Power of Convolution Augmented Transformer
Mingchen Li, Xuechen Zhang, Yixiao Huang, Samet Oymak · PDF
One-Shot Safety Alignment for Large Language Models via Optimal Dualization
Xinmeng Huang, Shuo Li, Edgar Dobriban, Osbert Bastani, Hamed Hassani, Dongsheng Ding · PDF
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
Junsong Chen, Simian Luo, Enze Xie · PDF
Preference Learning Algorithms Do Not Learn Preference Rankings
Angelica Chen, Sadhika Malladi, Lily H Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho · PDF
Progressive distillation improves feature learning via implicit curriculum
Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Andrej Risteski, Surbhi Goel · PDF
Rethinking Invariance in In-context Learning
Lizhe Fang, Yifei Wang, Khashayar Gatmiry, Lei Fang, Yisen Wang · PDF
RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation
Chanwoo Park, Mingyang Liu, Dingwen Kong, Kaiqing Zhang, Asuman E. Ozdaglar · PDF
SAIL: Self-improving Efficient Online Alignment of Large Language Models
Mucong Ding, Souradip Chakraborty, Vibhu Agrawal, Zora Che, Alec Koppel, Mengdi Wang, Amrit Bedi, Furong Huang · PDF
Self-Play Preference Optimization for Language Model Alignment
Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu · PDF
Setting the Record Straight on Transformer Oversmoothing
Gbetondji Jean-Sebastien Dovonon, Michael M. Bronstein, Matt Kusner · PDF
Sparse network initialization using deterministic Ramanujan graphs
Arindam Biswas, Suryam Arnav Kalra, Pabitra Mitra, Biswajit Basu · PDF
State Space Models are Comparable to Transformers in Estimating Functions with Dynamic Smoothness
Naoki Nishikawa, Taiji Suzuki · PDF
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Kiho Park, Yo Joong Choe, Yibo Jiang, Victor Veitch · PDF
Transformer Designs for In-Context Learning in Foundation Models for Time Series Forecasting with Covariates
Afrin Dange, Vaibhav Raj, Praneeth Netrapalli, Sunita Sarawagi · PDF
Transformer Efficiently Learns Low-dimensional Target Functions In-context
Yujin Song, Denny Wu, Kazusato Oko, Taiji Suzuki · PDF
Transformers are Minimax Optimal Nonparametric In-Context Learners
Juno Kim, Tai Nakamaki, Taiji Suzuki · PDF
Transformers need glasses! Information over-squashing in language tasks
Federico Barbero, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João Guilherme Madeira Araújo, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković · PDF
Unavoidable Learning Constraints Alter the Foundations of Direct Preference Optimization
David Wipf · PDF
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann · PDF
Understanding and Mitigating Tokenization Bias in Language Models
Buu Phan, Marton Havasi, Matthew J. Muckley, Karen Ullrich · PDF
Understanding the Role of Equivariance in Self-supervised Learning
Yifei Wang, Kaiwen Hu, Sharut Gupta, Ziyu Ye, Yisen Wang, Stefanie Jegelka · PDF
Unified Taxonomy in AI Safety: Watermarks, Adversarial Defenses, and Transferable Attacks
Grzegorz Gluch, Sai Ganesh Nagarajan, Berkant Turan · PDF
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
Sanae Lotfi, Yilun Kuang, Marc Anton Finzi, Brandon Amos, Micah Goldblum, Andrew Gordon Wilson · PDF
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers
Siyu Chen, Heejune Sheen, Tianhao Wang, Zhuoran Yang · PDF
Zero-Shot Generalization of GNNs over Distinct Attribute Domains
Yangyi Shen, Beatrice Bevilacqua, Joshua Robinson, Charilaos Kanatsoulis, Jure Leskovec, Bruno Ribeiro · PDF
