ICLR 2024 · Past · Math & reasoning · Large language models

ICLR 2024 Workshop on Mathematical and Empirical Understanding of Foundation Models

ME-FoMo 2024

Submission deadline: Feb 4, 2024, 12:30 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).

Accepted papers (71)

Fetched from OpenReview (v2) on 2026-06-10.

  1. "I'm not Racist but…": Discovering Bias in the Internal Knowledge of Large Language Models

    · PDF
  2. Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

    · PDF
  3. Asymmetry in Low-Rank Adapters of Foundation Models

    Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh Ghassemi, Mikhail Yurochkin, Justin Solomon · PDF
  4. Attributing Mode Collapse in the Fine-Tuning of Large Language Models

    · PDF
  5. Backward Chaining Circuits in a Transformer Trained on a Symbolic Reasoning Task

    Jannik Brinkmann, Abhay Sheshadri, Victor Levoso, Paul Swoboda, Christian Bartelt · PDF
  6. Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

    · PDF
  7. Best Arm Identification for Prompt Learning under a Limited Budget

    · PDF
  8. BlackMamba: Mixture of Experts for State-Space Models

    · PDF
  9. Can Generative Multimodal Models Count to Ten?

    · PDF
  10. Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks

    · PDF
  11. Concept-aware Data Construction Improves In-context Learning of Language Models

    · PDF
  12. Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-tuned LLMs

    · PDF
  13. Do Diffusion Models Learn Semantically Meaningful and Efficient Representations?

    · PDF
  14. Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability

    Zhuoyan Xu, Zhenmei Shi, Yingyu Liang · PDF
  15. Does Data Contamination Make a Difference? Insights from Intentionally Contaminating Pre-training Data For Language Models

    · PDF
  16. Dual Operating Modes of In-Context Learning

    · PDF
  17. Editing Large Language Models: Problems, Methods, and Opportunities

    · PDF
  18. Eliciting Latent Knowledge from Quirky Language Models

    · PDF
  19. Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems

    · PDF
  20. Few-Shot Dual-Path Adaptation of Vision-Language Foundation Models

    · PDF
  21. Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic

    · PDF
  22. Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study

    · PDF
  23. GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks

    · PDF
  24. How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation

    Zhongyi Han, Guanglin Zhou, Rundong He, Jindong Wang, Tailin Wu, Yilong Yin, Salman Khan, Lina Yao, Tongliang Liu, Kun Zhang · PDF
  25. In-Context Data Distillation with TabPFN

    · PDF
  26. Investigating Uncertainty Calibration of Aligned Language Models under the Multiple-Choice Setting

    · PDF
  27. Is Mamba Capable of In-Context Learning?

    · PDF
  28. Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-constraint

    · PDF
  29. LangBridge: Multilingual Reasoning Without Multilingual Supervision

    · PDF
  30. Linear Alignment of Vision-language Models for Image Captioning

    · PDF
  31. LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression

    · PDF
  32. Massive Activations in Large Language Models

    Mingjie Sun, Xinlei Chen, J Zico Kolter, Zhuang Liu · PDF
  33. MathSensei: Mathematical Reasoning with a Tool-Augmented Large Language Model

    · PDF
  34. MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

    · PDF
  35. Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?

    · PDF
  36. On Fairness Implications and Evaluations of Low-Rank Adaptation of Large Models

    Ken Liu, Zhoujie Ding, Berivan Isik, Sanmi Koyejo · PDF
  37. On provable length and compositional generalization

    · PDF
  38. On the Representation Gap Between Modern RNNs and Transformers: The Curse of Memory Efficiency and the Fix of In-Context Retrieval

    Kaiyue Wen, Xingyu Dang, Kaifeng Lyu · PDF
  39. Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling

    · PDF
  40. Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization

    Elan Rosenfeld, Andrej Risteski · PDF
  41. Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models

    Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L Leavitt, Mansheej Paul · PDF
  42. Pre-training and In-context Learning IS Bayesian Inference a la De Finetti

    · PDF
  43. Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

    · PDF
  44. Preserving Principal Subspaces to Reduce Catastrophic Forgetting in Fine-tuning

    · PDF
  45. Prometheus-Vision: Vision-Language Model as a Judge for Fine-Grained Evaluation

    · PDF
  46. Prompting a Pretrained Transformer Can Be a Universal Approximator

    · PDF
  47. Provably Robust DPO: Aligning Language Models with Noisy Feedback

    · PDF
  48. Quantified Task Misalignment to Inform PEFT: An Exploration of Domain Generalization and Catastrophic Forgetting in CLIP

    · PDF
  49. QuRating: Selecting High-Quality Data for Training Language Models

    · PDF
  50. Robust CLIP: Unsupervised Adversarial Fine-tuning of Vision Embeddings for Robust Large Vision-Language Models

    Christian Schlarmann, Naman Deep Singh, Francesco Croce, Matthias Hein · PDF
  51. Scalable Ensembling For Mitigating Reward Overoptimisation

    · PDF
  52. Scaling Laws for Downstream Task Performance of Large Language Models

    Berivan Isik, Natalia Ponomareva, Hussein Hazimeh, Dimitris Paparas, Sergei Vassilvitskii, Sanmi Koyejo · PDF
  53. Scaling Laws for Fine-Grained Mixture of Experts

    · PDF
  54. Selecting Large Language Model to Fine-tune via Rectified Scaling Law

    Haowei Lin, Baizhou Huang, Haotian Ye, Qinyu Chen, Zihao Wang, Sujian Li, Jianzhu Ma, Xiaojun Wan, James Zou, Yitao Liang · PDF
  55. Self-Supervised Open-Ended Classification with Small Visual Language Models

    · PDF
  56. ShERPA: Leveraging Neuron Alignment for Knowledge-preserving Fine-tuning

    · PDF
  57. Shortened LLaMA: A Simple Depth Pruning for Large Language Models

    · PDF
  58. Simple linear attention language models balance the recall-throughput tradeoff

    · PDF
  59. SparQ Attention: Bandwidth-Efficient LLM Inference

    · PDF
  60. The Effect of Model Capacity on the Emergence of In-Context Learning

    · PDF
  61. tinyBenchmarks: evaluating LLMs with fewer examples

    · PDF
  62. Towards an empirical understanding of Mixture of Experts Design Choices

    · PDF
  63. Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations

    Rylan Schaeffer, Berivan Isik, Dhruv Bhandarkar Pai, Andres Carranza, Victor Lecomte, Alyssa Unell, Mikail Khona, Thomas Edward Yerxa, Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo · PDF
  64. Transformers Can Achieve Length Generalization But Not Robustly

    Yongchao Zhou, Uri Alon, Xinyun Chen, Xuezhi Wang, Rishabh Agarwal, Denny Zhou · PDF
  65. Transformers Learn Nonlinear Features In Context

    Juno Kim, Taiji Suzuki · PDF
  66. Transformers' Spectral Bias and The Symmetric Group

    · PDF
  67. Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning

    · PDF
  68. Understanding and Improving In-Context Learning on Vision-language Models

    · PDF
  69. Unsupervised Domain Adaptation within Deep Foundation Latent Spaces

    · PDF
  70. What makes vision transformers robust towards bit-flip attack?

    · PDF
  71. Zero-Shot Recognition with Guided Cropping

    · PDF