Interpretable AI: Past, Present and Future

IAI Workshop @ NeurIPS 2024

Submission deadline: Sep 2, 2024, 11:59 UTC
(Imported from OpenReview; check the website for extensions.)
Submission portal: OpenReview

Accepted papers (46)

Fetched from OpenReview (v2) on 2026-06-10.

A Concept-Based Explainability Framework for Large Multimodal Models
Jayneel Parekh, Pegah KHAYATAN, Mustafa Shukor, Alasdair Newson, Matthieu Cord · PDF
A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders
David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, Joseph Isaac Bloom · PDF
A Theory of Interpretable Approximations
Marco Bressan, Nicolò Cesa-Bianchi, Emmanuel Esposito, Yishay Mansour, Shay Moran, Maximilian Thiessen · PDF
Aligning Characteristic Descriptors with Images for Human-Expert-like Explainability
Bharat Chandra Yalavarthi, Nalini K. Ratha · PDF
Bivariate Decision Trees: Smaller, Interpretable, More Accurate
Rasul Kairgeldin, Miguel Á. Carreira-Perpiñán · PDF
Can sparse autoencoders be used to decompose and interpret steering vectors?
Harry Mayne, Yushi Yang, Adam Mahdi · PDF
Clustering and Alignment: Understanding the Training Dynamics in Modular Addition
Tiberiu Mușat · PDF
Competence-Based Analysis of Language Models
Adam Davies, Jize Jiang, ChengXiang Zhai · PDF
ConceptDrift: Uncovering Biases through the Lens of Foundation Models
Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu · PDF
CoS: Enhancing Personalization and Mitigating Bias with Context Steering
Sashrika Pandey, Jerry Zhi-Yang He, Mariah L Schrum, Anca Dragan · PDF
Deep quantum graph dreaming: deciphering neural network insights into quantum experiments
Tareq Jaouni, Sören Arlt, Carlos Ruiz-Gonzalez, Ebrahim Karimi, Xuemei Gu, Mario Krenn · PDF
Disentangling Mean Embeddings for Better Diagnostics of Image Generators
Sebastian Gregor Gruber, Pascal Tobias Ziegler, Florian Buettner · PDF
Enhancing patient stratification and interpretability through class-contrastive and feature attribution techniques
Sharday Olowu, Neil D Lawrence, Soumya Banerjee · PDF
Error-controlled interaction discovery in deep neural networks
Winston Chen, Yifan Jiang, William Noble, Yang Young Lu · PDF
Evaluating Machine Learning Models with NERO: Non-Equivariance Revealed on Orbits
Zhuokai Zhao, Takumi Matsuzawa, William Irvine, Michael Maire, Gordon Kindlmann · PDF
Explainable AI-based analysis of human pancreas sections detects traits of type 2 diabetes
Lukas Klein, Sebastian Ziegler, Felicia Gerst, Yanni Morgenroth, Karol Gotkowski, Eyke Schöniger, Nicole Kipke, Annika Seiler, Ellen Geibelt, Martin Heni, Silvia Wagner, Silvio Nadalin, Falko Fend, Daniela Aust, Andre Mihaljevic, Daniel Hartmann, Jurgen Weitz, Reiner Jumpertz-von Schwartzenberg, Marius Distler, Andreas Birkefeld, Susanne Ullrich, Paul F Jaeger, Fabian Isensee, Michele Solimena, Robert Wagner · PDF
Explainable Concept Generation through Vision-Language Preference Learning
Aditya Taparia, Som Sagar, Ransalu Senanayake · PDF
Exploiting Interpretable Capabilities with Concept-Enhanced Diffusion and Prototype Networks
Alba Carballo-Castro, Sonia Laguna, Moritz Vandenhirtz, Julia E Vogt · PDF
GAMformer: Exploring In-Context Learning for Generalized Additive Models
Andreas C Mueller, Julien Siems, Harsha Nori, Rich Caruana, Frank Hutter · PDF
How Do Training Methods Influence the Utilization of Vision Models?
Paul Gavrikov, Shashank Agnihotri, Margret Keuper, Janis Keuper · PDF
Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability
Lukas Klein, Kenza Amara, Carsten T. Lüth, Antonio Foncubierta-Rodríguez, Hendrik Strobelt, Mennatallah El-Assady, Paul F Jaeger · PDF
Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations
Kola Ayonrinde, Michael T Pearce · PDF
Interpretable AI in Human-Machine Systems: Insights from Human-in-the-Loop Product Recommendation Engines
Pooria Assadi, NIMA SAFAEI · PDF
Isometry pursuit
Samson J Koelle, Marina Meila · PDF
Latent Concept-based Explanation of NLP Models
Xuemin Yu, Fahim Dalvi, Nadir Durrani, Marzia Nouri, Hassan Sajjad · PDF
Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions
Marc Canby, Adam Davies, Chirag Rastogi, Julia Hockenmaier · PDF
Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory
Pasan Dissanayake, Sanghamitra Dutta · PDF
On Interpretability and Overreliance
Julian Skirzynski, Elena Glassman, Berk Ustun · PDF
Policy-shaped prediction: improving world modeling through interpretability
Miles Richard Hutson, Isaac Kauvar, Nick Haber · PDF
Position: In Defence of Post-hoc Explainability
Nick Oh · PDF
Position: XAI needs formal notions of explanation correctness
Stefan Haufe, Rick Wilming, Benedict Clark, Rustam Zhumagambetov, Danny Panknin, Ahcene Boubekki · PDF
Positional Information Can Emerge Through Causal Attention Making Nearby Token Embeddings Similar Even Without Positional Encodings
Chunsheng Zuo, Pavel Guerzhoy, Michael Guerzhoy · PDF
Probable-class Nearest-neighbor Explanations Improve AI & Human Accuracy
Giang Nguyen, Valerie Chen, Mohammad Reza Taesiri, Anh Totti Nguyen · PDF
ProtoS-ViT: Visual foundation models for sparse self-explainable classifications
Hugues Turbe, Mina Bjelogrlic, Gianmarco Mengaldo, Christian Lovis · PDF
Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison · PDF
Riemann Sum Optimization for Accurate Integrated Gradients Computation
Shree Singhi, Swadesh Swain · PDF
Right on Time: Revising Time Series Models by Constraining their Explanations
Maurice Kraus, David Steinmann, Antonia Wüst, Andre Kokozinski, Kristian Kersting · PDF
SignAttention: On the Interpretability of Transformer Models for Sign Language Translation
Pedro Alejandro Dal Bianco, Oscar Agustín Stanchi, Facundo Manuel Quiroga, Franco Ronchetti, Enzo Ferrante · PDF
Subgroup Discovery with the Cox Model
Zachary Izzo, Iain Melvin · PDF
The effect of whitening on explanation performance
Benedict Clark, Stoyan Karastoyanov, Rick Wilming, Stefan Haufe · PDF
The Price of Freedom: An Adversarial Attack on Interpretability Evaluation
Kristoffer Knutsen Wickstrøm, Marina MC Höhne, Anna Hedström · PDF
This Looks Like Those: Illuminating Prototypical Concepts Using Multiple Visualizations
Chiyu Ma, Brandon Zhao, Chaofan Chen, Cynthia Rudin · PDF
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
Konstantin Donhauser, Gemma Elyse Moran, Aditya Ravuri, Kian Kenyon-Dean, Kristina Ulicna, Cian Eastwood, Jason Hartford · PDF
Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers
Omer Sahin Tas, Royden Wagner · PDF
You can remove GPT2's LayerNorm by fine-tuning
Stefan Heimersheim · PDF
Your Theory Is Wrong: Using Linguistic Frameworks for LLM Probing
Victoria Firsanova · PDF
