NeurIPS 2024
Second NeurIPS Workshop on Attributing Model Behavior at Scale
ATTRIB 2024
- Submission deadline: Oct 5, 2024, 12:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (72)
Fetched from OpenReview (v2) on 2026-06-10.
- $\Delta$-Influence: Unlearning Poisons via Influence Functions
- $\texttt{dattri}$: A Library for Efficient Data Attribution
- A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification
- A Versatile Influence Function for Data Attribution with Non-Decomposable Loss
- Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction
- Accumulated Local Effects for Link Prediction with Graph Neural Networks
- Accumulating Data Avoids Model Collapse
- Activation Monitoring: Advantages of Using Internal Representations for LLM Oversight
- Adversarial Attacks on Data Attribution
- Algorithmic Phase Transitions in Large Language Models: A Mechanistic Case Study of Arithmetic
- Approximations to worst-case data dropping: unmasking failure modes
- Attributing Statistics to Synthesis Quality in Correlation-Based Texture Models
- BAKU: An Efficient Transformer for Multi-Task Policy Learning
- Better Counterfactual Model Reasoning with Submodular Quadratic Component Models
- Between the Bars: Gradient-based Jailbreaks are Bugs that induce Features
- Bias Analysis for Unconditional Image Generative Models
- Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations
- Converging to a Lingua Franca: Evolution of Linguistic Regions and Semantics Alignment in Multilingual Large Language Models
- Data Attribution for Multitask Learning
- Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in LLMs
- Detecting Origin Attribution for Text-to-Image Diffusion Models in RGB and Beyond
- Efficient Ensembles Improve Training Data Attribution
- Evaluating Sparse Autoencoders for Controlling Open-Ended Text Generation
- Evaluating Sparse Autoencoders on Targeted Concept Removal Tasks
- Evaluating Synthetic Activations composed of SAE Latents in GPT-2
- Evolution of SAE Features Across Layers in LLMs
- Feature Responsiveness Scores: Model-Agnostic Explanations for Agency
- Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods
- From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
- Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data
- Generalized Group Data Attribution
- GPT-2 Through the Lens of Vector Symbolic Architectures
- GRADE: A Fine-grained Approach to Measure Sample Diversity in Text-to-Image Models
- Hessian Sets: Uncovering Feature Interactions in Image Classification
- How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold
- How much can we forget about Data Contamination?
- In Search of Forgotten Domain Generalization
- Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples
- Inductive Linguistic Reasoning with Large Language Models
- Influence Functions for Scalable Data Attribution in Diffusion Models
- Influence-based Attributions can be Manipulated
- Investigating Language Model Dynamics using Meta-Tokens
- Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs
- Just Select Twice: Leveraging Low Quality Data to Improve Data Selection
- Latent Concept-based Explanation of NLP Models
- Loss-to-Loss Prediction: Language model scaling laws across datasets
- Most Influential Subset Selection: Challenges, Promises, and Beyond
- On Linear Representations and Pretraining Data Frequency in Language Models
- Peter Parker or Spiderman? Disambiguating Multiple Class Labels
- Pruning-based Data Selection and Network Fusion for Efficient Deep Learning
- Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond
- Quantifying Positional Biases in Text Embedding Models
- ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
- SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models
- Secret Seeds in Text-to-Image Diffusion Models
- Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale
- The Association Between Training Data and Text-to-Image Generation Capabilities
- Toward Optimal Search and Retrieval for RAG
- Towards a Mechanistic Explanation of Diffusion Model Generalization
- Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison
- Training on the Test Task Confounds Evaluation and Emergence
- U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models
- Understanding Compute-Parameter Trade-offs in Sparse Mixture-of-Expert Language Models
- Understanding the Sources of Performance in Deep Drug Response Models
- Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization
- Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
- Weak-to-Strong Confidence Prediction
- Weak-to-Strong In-Context Optimization of Language Model Reasoning
- What do Learning Dynamics Reveal about Generalization in LLM Reasoning?
- What's In My Big Data?
- When Attention Sink Emerges in Language Models: An Empirical View
- You can remove GPT2's LayerNorm by fine-tuning