NeurIPS 2024

Second NeurIPS Workshop on Attributing Model Behavior at Scale

ATTRIB 2024

Submission deadline
Oct 5, 2024, 12:00 UTC
(Imported from OpenReview; check the workshop website for any extensions.)
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10.

Accepted papers (72)

Fetched from OpenReview (v2) on 2026-06-10.

  1. Δ-Influence: Unlearning Poisons via Influence Functions

  2. dattri: A Library for Efficient Data Attribution

  3. A Comparative Study of Translation Bias and Accuracy in Multilingual Large Language Models for Cross-Language Claim Verification

  4. A Versatile Influence Function for Data Attribution with Non-Decomposable Loss

  5. Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction

  6. Accumulated Local Effects for Link Prediction with Graph Neural Networks

  7. Accumulating Data Avoids Model Collapse

  8. Activation Monitoring: Advantages of Using Internal Representations for LLM Oversight

  9. Adversarial Attacks on Data Attribution

  10. Algorithmic Phase Transitions in Large Language Models: A Mechanistic Case Study of Arithmetic

  11. Approximations to worst-case data dropping: unmasking failure modes

  12. Attributing Statistics to Synthesis Quality in Correlation-Based Texture Models

  13. BAKU: An Efficient Transformer for Multi-Task Policy Learning

  14. Better Counterfactual Model Reasoning with Submodular Quadratic Component Models

  15. Between the Bars: Gradient-based Jailbreaks are Bugs that induce Features

  16. Bias Analysis for Unconditional Image Generative Models

  17. Common Functional Decompositions Can Mis-attribute Differences in Outcomes Between Populations

  18. Converging to a Lingua Franca: Evolution of Linguistic Regions and Semantics Alignment in Multilingual Large Language Models

  19. Data Attribution for Multitask Learning

  20. Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in LLMs

  21. Detecting Origin Attribution for Text-to-Image Diffusion Models in RGB and Beyond

  22. Efficient Ensembles Improve Training Data Attribution

  23. Evaluating Sparse Autoencoders for Controlling Open-Ended Text Generation

  24. Evaluating Sparse Autoencoders on Targeted Concept Removal Tasks

  25. Evaluating Synthetic Activations composed of SAE Latents in GPT-2

  26. Evolution of SAE Features Across Layers in LLMs

  27. Feature Responsiveness Scores: Model-Agnostic Explanations for Agency

  28. Final-Model-Only Data Attribution with a Unifying View of Gradient-Based Methods

  29. From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty

  30. Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data

  31. Generalized Group Data Attribution

  32. GPT-2 Through the Lens of Vector Symbolic Architectures

  33. GRADE: A Fine-grained Approach to Measure Sample Diversity in Text-to-Image Models

  34. Hessian Sets: Uncovering Feature Interactions in Image Classification

  35. How Many Van Goghs Does It Take to Van Gogh? Finding the Imitation Threshold

  36. How much can we forget about Data Contamination?

  37. In Search of Forgotten Domain Generalization

  38. Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples

  39. Inductive Linguistic Reasoning with Large Language Models

  40. Influence Functions for Scalable Data Attribution in Diffusion Models

  41. Influence-based Attributions can be Manipulated

  42. Investigating Language Model Dynamics using Meta-Tokens

  43. Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs

  44. Just Select Twice: Leveraging Low Quality Data to Improve Data Selection

  45. Latent Concept-based Explanation of NLP Models

  46. Loss-to-Loss Prediction: Language model scaling laws across datasets

  47. Most Influential Subset Selection: Challenges, Promises, and Beyond

  48. On Linear Representations and Pretraining Data Frequency in Language Models

  49. Peter Parker or Spiderman? Disambiguating Multiple Class Labels

  50. Pruning-based Data Selection and Network Fusion for Efficient Deep Learning

  51. Quanda: An Interpretability Toolkit for Training Data Attribution Evaluation and Beyond

  52. Quantifying Positional Biases in Text Embedding Models

  53. ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models

  54. SCIURus: Shared Circuits for Interpretable Uncertainty Representations in Language Models

  55. Secret Seeds in Text-to-Image Diffusion Models

  56. Small-to-Large Generalization: Training Data Influences Models Consistently Across Scale

  57. The Association Between Training Data and Text-to-Image Generation Capabilities

  58. Toward Optimal Search and Retrieval for RAG

  59. Towards a Mechanistic Explanation of Diffusion Model Generalization

  60. Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison

  61. Training on the Test Task Confounds Evaluation and Emergence

  62. U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models

  63. Understanding Compute-Parameter Trade-offs in Sparse Mixture-of-Expert Language Models

  64. Understanding the Sources of Performance in Deep Drug Response Models

  65. Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

  66. Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling

  67. Weak-to-Strong Confidence Prediction

  68. Weak-to-Strong In-Context Optimization of Language Model Reasoning

  69. What do Learning Dynamics Reveal about Generalization in LLM Reasoning?

  70. What's In My Big Data?

  71. When Attention Sink Emerges in Language Models: An Empirical View

  72. You can remove GPT2's LayerNorm by fine-tuning
