
ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models

ME-FoMo 2024


Submission deadline: Feb 4, 2024, 12:30 UTC
Imported from OpenReview; check the website for extensions.
Submission portal: OpenReview

Accepted papers (71)

Fetched from OpenReview (v2) on 2026-06-10.

"I'm not Racist but…": Discovering Bias in the Internal Knowledge of Large Language Models
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Asymmetry in Low-Rank Adapters of Foundation Models
Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh Ghassemi, Mikhail Yurochkin, Justin Solomon
Attributing Mode Collapse in the Fine-Tuning of Large Language Models
Backward Chaining Circuits in a Transformer Trained on a Symbolic Reasoning Task
Jannik Brinkmann, Abhay Sheshadri, Victor Levoso, Paul Swoboda, Christian Bartelt
Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT
Best Arm Identification for Prompt Learning under a Limited Budget
BlackMamba: Mixture of Experts for State-Space Models
Can Generative Multimodal Models Count to Ten?
Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks
Concept-aware Data Construction Improves In-context Learning of Language Models
Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-tuned LLMs
Do Diffusion Models Learn Semantically Meaningful and Efficient Representations?
Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability
Zhuoyan Xu, Zhenmei Shi, Yingyu Liang
Does Data Contamination Make a Difference? Insights from Intentionally Contamination Pre-training Data For Language Models
Dual Operating Modes of In-Context Learning
Editing Large Language Models: Problems, Methods, and Opportunities
Eliciting Latent Knowledge from Quirky Language Models
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
Few-Shot Dual-Path Adaptation of Vision-Language Foundation Models
Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic
Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study
GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks
How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation
Zhongyi Han, Guanglin Zhou, Rundong He, Jindong Wang, Tailin Wu, Yilong Yin, Salman Khan, Lina Yao, Tongliang Liu, Kun Zhang
In-Context Data Distillation with TabPFN
Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting
Is Mamba Capable of In-Context Learning?
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
LangBridge: Multilingual Reasoning Without Multilingual Supervision
Linear Alignment of Vision-language Models for Image Captioning
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
Massive Activations in Large Language Models
Mingjie Sun, Xinlei Chen, J Zico Kolter, Zhuang Liu
MathSensei: Mathematical Reasoning with a Tool-Augmented Large Language Model
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
On Fairness Implications and Evaluations of Low-Rank Adaptation of Large Models
Ken Liu, Zhoujie Ding, Berivan Isik, Sanmi Koyejo
On provable length and compositional generalization
On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval
Kaiyue Wen, Xingyu Dang, Kaifeng Lyu
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
Elan Rosenfeld, Andrej Risteski
Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models
Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L Leavitt, Mansheej Paul
Pre-training and In-context Learning IS Bayesian Inference a la De Finetti
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Preserving Principal Subspaces to Reduce Catastrophic Forgetting in Fine-tuning
Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation
Prompting a Pretrained Transformer Can Be a Universal Approximator
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP
QuRating: Selecting High-Quality Data for Training Language Models
Robust CLIP: Unsupervised Adversarial Fine-tuning of Vision Embeddings for Robust Large Vision-Language Models
Christian Schlarmann, Naman Deep Singh, Francesco Croce, Matthias Hein
Scalable Ensembling For Mitigating Reward Overoptimisation
Scaling Laws for Downstream Task Performance of Large Language Models
Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo
Scaling Laws for Fine-Grained Mixture of Experts
Selecting Large Language Model to Fine-tune via Rectified Scaling Law
Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, Zihao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, Yitao Liang
Self-Supervised Open-Ended Classification with Small Visual Language Models
ShERPA: Leveraging Neuron Alignment for Knowledge-preserving Fine-tuning
Shortened LLaMA: A Simple Depth Pruning for Large Language Models
Simple linear attention language models balance the recall-throughput tradeoff
SparQ Attention: Bandwidth-Efficient LLM Inference
The Effect of Model Capacity on the Emergence of In-Context Learning
tinyBenchmarks: evaluating LLMs with fewer examples
Towards an empirical understanding of Mixture of Experts Design Choices
Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations
Rylan Schaeffer, Berivan Isik, Dhruv Bhandarkar Pai, Andres Carranza, Victor Lecomte, Alyssa Unell, Mikail Khona, Thomas Edward Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo
Transformers Can Achieve Length Generalization But Not Robustly
Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou
Transformers Learn Nonlinear Features In Context
Juno Kim, Taiji Suzuki
Transformers' Spectral Bias and The Symmetric Group
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning
Understanding and Improving In-Context Learning on Vision-language Models
Unsupervised Domain Adaptation within Deep Foundation Latent Spaces
What makes vision transformers robust towards bit-flip attack?
Zero-Shot Recognition with Guided Cropping
