NeurIPS 2024 Past Large language models
🍃 MINT: Foundation Model Interventions
MINT@NeurIPS2024
- Submission deadline: Sep 14, 2024, 12:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (31)
Fetched from OpenReview (v2) on 2026-06-10.
- Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction
- Analysing the Residual Stream of Language Models Under Knowledge Conflicts
- Analyzing (In)Abilities of SAEs via Formal Languages
- Can sparse autoencoders be used to decompose and interpret steering vectors?
- Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks
- Decomposing and Editing Predictions by Modeling Model Computation
- Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks
- Do LLMs internally "know" when they follow instructions?
- Entropy-Based Decoding for Retrieval-Augmented Large Language Models
- Extracting Paragraphs from LLM Token Activations
- GPT-2 Small Fine-Tuned on Logical Reasoning Summarizes Information on Punctuation Tokens
- Is Free Self-Alignment Possible?
- Linearly Controlled Language Generation with Performative Guarantees
- Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models
- LoFiT: Localized Fine-tuning on LLM Representations
- Pay Attention to What Matters
- Probing the Decision Boundaries of In-context Learning in Large Language Models
- Representation Tuning
- SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
- Secret Seeds in Text-to-Image Diffusion Models
- Semantic Entropy Neurons: Encoding Semantic Uncertainty in the Latent Space of LLMs
- Steering Clear: A Systematic Study of Activation Steering in a Toy Setup
- Steering Large Language Models using Conceptors: Improving Addition-Based Activation Engineering
- Steering semantic search with interpretable features from sparse autoencoders
- Toward Explanation Bottleneck Models
- Towards Reliable Evaluation of Behavior Steering Interventions in LLMs
- Uncovering Uncertainty in Transformer Inference
- Understanding Visual Concepts Across Models
- Unveiling and Manipulating Concepts in Time Series Foundation Models
- WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models
- Zero-to-Hero: Enhancing Zero-Shot Novel View Synthesis via Attention Map Filtering