NeurIPS 2024 · Past · Interpretability · Tabular & structured data
Interpretable AI: Past, Present and Future
IAI Workshop @ NeurIPS 2024
- Submission deadline: Sep 2, 2024, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (46)
Fetched from OpenReview (v2) on 2026-06-10.
- A Concept-Based Explainability Framework for Large Multimodal Models
- A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
- A Theory of Interpretable Approximations
- Aligning Characteristic Descriptors with Images for Human-Expert-like Explainability
- Bivariate Decision Trees: Smaller, Interpretable, More Accurate
- Can sparse autoencoders be used to decompose and interpret steering vectors?
- Clustering and Alignment: Understanding the Training Dynamics in Modular Addition
- Competence-Based Analysis of Language Models
- ConceptDrift: Uncovering Biases through the Lens of Foundation Models
- CoS: Enhancing Personalization and Mitigating Bias with Context Steering
- Deep quantum graph dreaming: deciphering neural network insights into quantum experiments
- Disentangling Mean Embeddings for Better Diagnostics of Image Generators
- Enhancing patient stratification and interpretability through class-contrastive and feature attribution techniques
- Error-controlled interaction discovery in deep neural networks
- Evaluating Machine Learning Models with NERO: Non-Equivariance Revealed on Orbits
- Explainable AI-based analysis of human pancreas sections detects traits of type 2 diabetes
- Explainable Concept Generation through Vision-Language Preference Learning
- Exploiting Interpretable Capabilities with Concept-Enhanced Diffusion and Prototype Networks
- GAMformer: Exploring In-Context Learning for Generalized Additive Models
- How Do Training Methods Influence the Utilization of Vision Models?
- Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability
- Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations
- Interpretable AI in Human-Machine Systems: Insights from Human-in-the-Loop Product Recommendation Engines
- Isometry pursuit
- Latent Concept-based Explanation of NLP Models
- Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions
- Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory
- On Interpretability and Overreliance
- Policy-shaped prediction: improving world modeling through interpretability
- Position: In Defence of Post-hoc Explainability
- Position: XAI needs formal notions of explanation correctness
- Positional Information Can Emerge Through Causal Attention Making Nearby Token Embeddings Similar Even Without Positional Encodings
- Probable-class Nearest-neighbor Explanations Improve AI & Human Accuracy
- ProtoS-ViT: Visual foundation models for sparse self-explainable classifications
- Residual Stream Analysis with Multi-Layer SAEs
- Riemann Sum Optimization for Accurate Integrated Gradients Computation
- Right on Time: Revising Time Series Models by Constraining their Explanations
- SignAttention: On the Interpretability of Transformer Models for Sign Language Translation
- Subgroup Discovery with the Cox Model
- The effect of whitening on explanation performance
- The Price of Freedom: An Adversarial Attack on Interpretability Evaluation
- This Looks Like Those: Illuminating Prototypical Concepts Using Multiple Visualizations
- Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
- Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
- You can remove GPT2's LayerNorm by fine-tuning
- Your Theory Is Wrong: Using Linguistic Frameworks for LLM Probing