NeurIPS 2024 · Past · Large language models

🍃 MINT: Foundation Model Interventions

MINT@NeurIPS2024


Submission deadline: Sep 14, 2024, 12:00 UTC
imported from OpenReview — check the website for extensions
Submission portal: OpenReview

Accepted papers (31)

Fetched from OpenReview (v2) on 2026-06-10.

Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction
Yushi Yang, Filip Sondej, Harry Mayne, Adam Mahdi · PDF
Analysing the Residual Stream of Language Models Under Knowledge Conflicts
Yu Zhao, Xiaotang Du, Giwon Hong, Aryo Pradipta Gema, Alessio Devoto, Hongru WANG, Xuanli He, Kam-Fai Wong, Pasquale Minervini · PDF
Analyzing (In)Abilities of SAEs via Formal Languages
Abhinav Menon, Manish Shrivastava, Ekdeep Singh Lubana, David Krueger · PDF
Can sparse autoencoders be used to decompose and interpret steering vectors?
Harry Mayne, Yushi Yang, Adam Mahdi · PDF
Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks
Madeline Brumley, Joe Kwon, David Krueger, Dmitrii Krasheninnikov, Usman Anwar · PDF
Decomposing and Editing Predictions by Modeling Model Computation
Harshay Shah, Andrew Ilyas, Aleksander Madry · PDF
Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks
Gregory Kang Ruey Lau, Wenyang Hu, Liu Diwen, Chen Jizhuo, See-Kiong Ng, Bryan Kian Hsiang Low · PDF
Do LLMs internally "know" when they follow instructions?
Juyeon Heo, Christina Heinze-Deml, Oussama Elachqar, Shirley You Ren, Kwan Ho Ryan Chan, Udhyakumar Nallasamy, Andrew Miller, Jaya Narain · PDF
Entropy-Based Decoding for Retrieval-Augmented Large Language Models
Zexuan Qiu, Zijing Ou, Bin Wu, Jingjing Li, Aiwei Liu, Irwin King · PDF
Extracting Paragraphs from LLM Token Activations
Nicky Pochinkov, Angelo Benoit, Lovkush Agarwal, Zainab Ali Majid, Lucile Ter-Minassian · PDF
GPT-2 Small Fine-Tuned on Logical Reasoning Summarizes Information on Punctuation Tokens
Sonakshi Chauhan, Atticus Geiger · PDF
Is Free Self-Alignment Possible?
Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala · PDF
Linearly Controlled Language Generation with Performative Guarantees
Emily Cheng, Marco Baroni, Carmen Amo Alonso · PDF
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models
Xinyu Zhou, Delong Chen, Samuel Cahyawijaya, Xufeng Duan, Zhenguang Cai · PDF
LoFiT: Localized Fine-tuning on LLM Representations
Fangcong Yin, Xi Ye, Greg Durrett · PDF
Pay Attention to What Matters
Pedro Luiz Silva, Fadhel Ayed, Antonio De Domenico, Ali Maatouk · PDF
Probing the Decision Boundaries of In-context Learning in Large Language Models
Siyan Zhao, Tung Nguyen, Aditya Grover · PDF
Representation Tuning
Christopher Ackerman · PDF
SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
Carter Teplica, Yixin Liu, Arman Cohan, Tim G. J. Rudner · PDF
Secret Seeds in Text-to-Image Diffusion Models
Katherine Xu, Lingzhi Zhang, Jianbo Shi · PDF
Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs
Jiatong Han, Jannik Kossen, Muhammed Razzak, Yarin Gal · PDF
Steering Clear: A Systematic Study of Activation Steering in a Toy Setup
Dmitrii Krasheninnikov, David Krueger · PDF
Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
Joris Postmus, Steven Abreu · PDF
Steering semantic search with interpretable features from sparse autoencoders
Christine Ye, Charles O'Neill, John F Wu, Kartheik G. Iyer · PDF
Toward Explanation Bottleneck Models
Shin'ya Yamaguchi, Kosuke Nishida · PDF
Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
Itamar Pres, Laura Ruis, Ekdeep Singh Lubana, David Krueger · PDF
Uncovering Uncertainty in Transformer Inference
Greyson Brothers, Willa M. Mannering, John Winder, Amber Tien · PDF
Understanding Visual Concepts Across Models
Brandon Trabucco, Max A Gurinas, Kyle Doherty, Russ Salakhutdinov · PDF
Unveiling and Manipulating Concepts in Time Series Foundation Models
Michał Wiliński, Mononito Goswami, Nina Żukowska, Willa Potosnak, Artur Dubrawski · PDF
WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
Peng Wang, Zexi Li, Ningyu Zhang, Ziwen Xu, Yunzhi Yao, Yong Jiang, Pengjun Xie, Fei Huang, Huajun Chen · PDF
Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering
Ido Sobol, Chenfeng Xu, Or Litany · PDF
