NeurIPS 2024 · Past · Large language models

🍃 MINT: Foundation Model Interventions

MINT@NeurIPS2024

Submission deadline: Sep 14, 2024, 12:00 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).

Accepted papers (31)

Fetched from OpenReview (v2) on 2026-06-10.

  1. Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction

    Yushi Yang, Filip Sondej, Harry Mayne, Adam Mahdi · PDF
  2. Analysing the Residual Stream of Language Models Under Knowledge Conflicts

    Yu Zhao, Xiaotang Du, Giwon Hong, Aryo Pradipta Gema, Alessio Devoto, Hongru WANG, Xuanli He, Kam-Fai Wong, Pasquale Minervini · PDF
  3. Analyzing (In)Abilities of SAEs via Formal Languages

    Abhinav Menon, Manish Shrivastava, Ekdeep Singh Lubana, David Krueger · PDF
  4. Can sparse autoencoders be used to decompose and interpret steering vectors?

    Harry Mayne, Yushi Yang, Adam Mahdi · PDF
  5. Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks

    Madeline Brumley, Joe Kwon, David Krueger, Dmitrii Krasheninnikov, Usman Anwar · PDF
  6. Decomposing and Editing Predictions by Modeling Model Computation

    Harshay Shah, Andrew Ilyas, Aleksander Madry · PDF
  7. Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks

    Gregory Kang Ruey Lau, Wenyang Hu, Liu Diwen, Chen Jizhuo, See-Kiong Ng, Bryan Kian Hsiang Low · PDF
  8. Do LLMs internally "know" when they follow instructions?

    Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar, Shirley You Ren, Kwan Ho Ryan Chan, Udhyakumar Nallasamy, Andrew Miller, Jaya Narain · PDF
  9. Entropy-Based Decoding for Retrieval-Augmented Large Language Models

    Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King · PDF
  10. Extracting Paragraphs from LLM Token Activations

    Nicky Pochinkov, Angelo Benoit, Lovkush Agarwal, Zainab Ali Majid, Lucile Ter-Minassian · PDF
  11. GPT-2 Small Fine-Tuned on Logical Reasoning Summarizes Information on Punctuation Tokens

    Sonakshi Chauhan, Atticus Geiger · PDF
  12. Is Free Self-Alignment Possible?

    Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala · PDF
  13. Linearly Controlled Language Generation with Performative Guarantees

    Emily Cheng, Marco Baroni, Carmen Amo Alonso · PDF
  14. Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models

    Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang Cai · PDF
  15. LoFiT: Localized Fine-tuning on LLM Representations

    Fangcong Yin, Xi Ye, Greg Durrett · PDF
  16. Pay Attention to What Matters

    Pedro Luiz Silva, Fadhel Ayed, Antonio De Domenico, Ali Maatouk · PDF
  17. Probing the Decision Boundaries of In-context Learning in Large Language Models

    Siyan Zhao, Tung Nguyen, Aditya Grover · PDF
  18. Representation Tuning

    Christopher Ackerman · PDF
  19. SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models

    Carter Teplica, Yixin Liu, Arman Cohan, Tim G. J. Rudner · PDF
  20. Secret Seeds in Text-to-Image Diffusion Models

    Katherine Xu, Lingzhi Zhang, Jianbo Shi · PDF
  21. Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs

    Jiatong Han, Jannik Kossen, Muhammed Razzak, Yarin Gal · PDF
  22. Steering Clear: A Systematic Study of Activation Steering in a Toy Setup

    Dmitrii Krasheninnikov, David Krueger · PDF
  23. Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering

    Joris Postmus, Steven Abreu · PDF
  24. Steering semantic search with interpretable features from sparse autoencoders

    Christine Ye, Charles O'Neill, John F Wu, Kartheik G. Iyer · PDF
  25. Toward Explanation Bottleneck Models

    Shin'ya Yamaguchi, Kosuke Nishida · PDF
  26. Towards Reliable Evaluation of Behavior Steering Interventions in LLMs

    Itamar Pres, Laura Ruis, Ekdeep Singh Lubana, David Krueger · PDF
  27. Uncovering Uncertainty in Transformer Inference

    Greyson Brothers, Willa M. Mannering, John Winder, Amber Tien · PDF
  28. Understanding Visual Concepts Across Models

    Brandon Trabucco, Max A Gurinas, Kyle Doherty, Russ Salakhutdinov · PDF
  29. Unveiling and Manipulating Concepts in Time Series Foundation Models

    Michał Wiliński, Mononito Goswami, Nina Żukowska, Willa Potosnak, Artur Dubrawski · PDF
  30. WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models

    Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen · PDF
  31. Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering

    Ido Sobol, Chenfeng Xu, Or Litany · PDF