ICLR 2024, Past, Math & reasoning, Large language models
ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models
ME-FoMo 2024
- Submission deadline: Feb 4, 2024, 12:30 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (71)
Fetched from OpenReview (v2) on 2026-06-10.
- "I'm not Racist but…": Discovering Bias in the Internal Knowledge of Large Language Models
- Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
- Asymmetry in Low-Rank Adapters of Foundation Models
- Attributing Mode Collapse in the Fine-Tuning of Large Language Models
- Backward Chaining Circuits in a Transformer Trained on a Symbolic Reasoning Task
- Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT
- Best Arm Identification for Prompt Learning under a Limited Budget
- BlackMamba: Mixture of Experts for State-Space Models
- Can Generative Multimodal Models Count to Ten?
- Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks
- Concept-aware Data Construction Improves In-context Learning of Language Models
- Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-tuned LLMs
- Do Diffusion Models Learn Semantically Meaningful and Efficient Representations?
- Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability
- Does Data Contamination Make a Difference? Insights from Intentionally Contaminating Pre-training Data For Language Models
- Dual Operating Modes of In-Context Learning
- Editing Large Language Models: Problems, Methods, and Opportunities
- Eliciting Latent Knowledge from Quirky Language Models
- Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems
- Few-Shot Dual-Path Adaptation of Vision-Language Foundation Models
- Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic
- Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study
- GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks
- How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation
- In-Context Data Distillation with TabPFN
- Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting
- Is Mamba Capable of In-Context Learning?
- Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint
- LangBridge: Multilingual Reasoning Without Multilingual Supervision
- Linear Alignment of Vision-language Models for Image Captioning
- LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression
- Massive Activations in Large Language Models
- MathSensei: Mathematical Reasoning with a Tool-Augmented Large Language Model
- MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
- Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
- On Fairness Implications and Evaluations of Low-Rank Adaptation of Large Models
- On provable length and compositional generalization
- On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval
- Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling
- Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
- Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models
- Pre-training and In-context Learning IS Bayesian Inference a la De Finetti
- Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
- Preserving Principal Subspaces to Reduce Catastrophic Forgetting in Fine-tuning
- Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation
- Prompting a Pretrained Transformer Can Be a Universal Approximator
- Provably Robust DPO: Aligning Language Models with Noisy Feedback
- Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP
- QuRating: Selecting High-Quality Data for Training Language Models
- Robust CLIP: Unsupervised Adversarial Fine-tuning of Vision Embeddings for Robust Large Vision-Language Models
- Scalable Ensembling For Mitigating Reward Overoptimisation
- Scaling Laws for Downstream Task Performance of Large Language Models
- Scaling Laws for Fine-Grained Mixture of Experts
- Selecting Large Language Model to Fine-tune via Rectified Scaling Law
- Self-Supervised Open-Ended Classification with Small Visual Language Models
- ShERPA: Leveraging Neuron Alignment for Knowledge-preserving Fine-tuning
- Shortened LLaMA: A Simple Depth Pruning for Large Language Models
- Simple linear attention language models balance the recall-throughput tradeoff
- SparQ Attention: Bandwidth-Efficient LLM Inference
- The Effect of Model Capacity on the Emergence of In-Context Learning
- tinyBenchmarks: evaluating LLMs with fewer examples
- Towards an empirical understanding of Mixture of Experts Design Choices
- Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations
- Transformers Can Achieve Length Generalization But Not Robustly
- Transformers Learn Nonlinear Features In Context
- Transformers' Spectral Bias and The Symmetric Group
- Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning
- Understanding and Improving In-Context Learning on Vision-language Models
- Unsupervised Domain Adaptation within Deep Foundation Latent Spaces
- What makes vision transformers robust towards bit-flip attack?
- Zero-Shot Recognition with Guided Cropping