NeurIPS 2024 · Past · Interpretability · Tabular & structured data

Interpretable AI: Past, Present and Future

IAI Workshop @ NeurIPS 2024

Submission deadline
Sep 2, 2024, 11:59 UTC
Imported from OpenReview; check the website for extensions.
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).

Accepted papers (46)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Concept-Based Explainability Framework for Large Multimodal Models

    Jayneel Parekh, Pegah Khayatan, Mustafa Shukor, Alasdair Newson, Matthieu Cord · PDF
  2. A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders

    David Chanin, James Wilken-Smith, Tomáš Dulka, Hardik Bhatnagar, Joseph Isaac Bloom · PDF
  3. A Theory of Interpretable Approximations

    Marco Bressan, Nicolò Cesa-Bianchi, Emmanuel Esposito, Yishay Mansour, Shay Moran, Maximilian Thiessen · PDF
  4. Aligning Characteristic Descriptors with Images for Human-Expert-like Explainability

    Bharat Chandra Yalavarthi, Nalini K. Ratha · PDF
  5. Bivariate Decision Trees: Smaller, Interpretable, More Accurate

    Rasul Kairgeldin, Miguel Á. Carreira-Perpiñán · PDF
  6. Can sparse autoencoders be used to decompose and interpret steering vectors?

    Harry Mayne, Yushi Yang, Adam Mahdi · PDF
  7. Clustering and Alignment: Understanding the Training Dynamics in Modular Addition

    Tiberiu Mușat · PDF
  8. Competence-Based Analysis of Language Models

    Adam Davies, Jize Jiang, ChengXiang Zhai · PDF
  9. ConceptDrift: Uncovering Biases through the Lens of Foundation Models

    Cristian Daniel Paduraru, Antonio Barbalau, Radu Filipescu, Andrei Liviu Nicolicioiu, Elena Burceanu · PDF
  10. CoS: Enhancing Personalization and Mitigating Bias with Context Steering

    Sashrika Pandey, Jerry Zhi-Yang He, Mariah L Schrum, Anca Dragan · PDF
  11. Deep quantum graph dreaming: deciphering neural network insights into quantum experiments

    Tareq Jaouni, Sören Arlt, Carlos Ruiz-Gonzalez, Ebrahim Karimi, Xuemei Gu, Mario Krenn · PDF
  12. Disentangling Mean Embeddings for Better Diagnostics of Image Generators

    Sebastian Gregor Gruber, Pascal Tobias Ziegler, Florian Buettner · PDF
  13. Enhancing patient stratification and interpretability through class-contrastive and feature attribution techniques

    Sharday Olowu, Neil D Lawrence, Soumya Banerjee · PDF
  14. Error-controlled interaction discovery in deep neural networks

    Winston Chen, Yifan Jiang, William Noble, Yang Young Lu · PDF
  15. Evaluating Machine Learning Models with NERO: Non-Equivariance Revealed on Orbits

    Zhuokai Zhao, Takumi Matsuzawa, William Irvine, Michael Maire, Gordon Kindlmann · PDF
  16. Explainable AI-based analysis of human pancreas sections detects traits of type 2 diabetes

    Lukas Klein, Sebastian Ziegler, Felicia Gerst, Yanni Morgenroth, Karol Gotkowski, Eyke Schöniger, Nicole Kipke, Annika Seiler, Ellen Geibelt, Martin Heni, Silvia Wagner, Silvio Nadalin, Falko Fend, Daniela Aust, Andre Mihaljevic, Daniel Hartmann, Jurgen Weitz, Reiner Jumpertz-von Schwartzenberg, Marius Distler, Andreas Birkefeld, Susanne Ullrich, Paul F Jaeger, Fabian Isensee, Michele Solimena, Robert Wagner · PDF
  17. Explainable Concept Generation through Vision-Language Preference Learning

    Aditya Taparia, Som Sagar, Ransalu Senanayake · PDF
  18. Exploiting Interpretable Capabilities with Concept-Enhanced Diffusion and Prototype Networks

    Alba Carballo-Castro, Sonia Laguna, Moritz Vandenhirtz, Julia E Vogt · PDF
  19. GAMformer: Exploring In-Context Learning for Generalized Additive Models

    Andreas C Mueller, Julien Siems, Harsha Nori, Rich Caruana, Frank Hutter · PDF
  20. How Do Training Methods Influence the Utilization of Vision Models?

    Paul Gavrikov, Shashank Agnihotri, Margret Keuper, Janis Keuper · PDF
  21. Interactive Semantic Interventions for VLMs: A Human-in-the-Loop Approach to Interpretability

    Lukas Klein, Kenza Amara, Carsten T. Lüth, Antonio Foncubierta-Rodríguez, Hendrik Strobelt, Mennatallah El-Assady, Paul F Jaeger · PDF
  22. Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations

    Kola Ayonrinde, Michael T Pearce · PDF
  23. Interpretable AI in Human-Machine Systems: Insights from Human-in-the-Loop Product Recommendation Engines

    Pooria Assadi, Nima Safaei · PDF
  24. Isometry pursuit

    Samson J Koelle, Marina Meila · PDF
  25. Latent Concept-based Explanation of NLP Models

    Xuemin Yu, Fahim Dalvi, Nadir Durrani, Marzia Nouri, Hassan Sajjad · PDF
  26. Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions

    Marc Canby, Adam Davies, Chirag Rastogi, Julia Hockenmaier · PDF
  27. Model Reconstruction Using Counterfactual Explanations: A Perspective From Polytope Theory

    Pasan Dissanayake, Sanghamitra Dutta · PDF
  28. On Interpretability and Overreliance

    Julian Skirzynski, Elena Glassman, Berk Ustun · PDF
  29. Policy-shaped prediction: improving world modeling through interpretability

    Miles Richard Hutson, Isaac Kauvar, Nick Haber · PDF
  30. Position: In Defence of Post-hoc Explainability

    Nick Oh · PDF
  31. Position: XAI needs formal notions of explanation correctness

    Stefan Haufe, Rick Wilming, Benedict Clark, Rustam Zhumagambetov, Danny Panknin, Ahcene Boubekki · PDF
  32. Positional Information Can Emerge Through Causal Attention Making Nearby Token Embeddings Similar Even Without Positional Encodings

    Chunsheng Zuo, Pavel Guerzhoy, Michael Guerzhoy · PDF
  33. Probable-class Nearest-neighbor Explanations Improve AI & Human Accuracy

    Giang Nguyen, Valerie Chen, Mohammad Reza Taesiri, Anh Totti Nguyen · PDF
  34. ProtoS-ViT: Visual foundation models for sparse self-explainable classifications

    Hugues Turbe, Mina Bjelogrlic, Gianmarco Mengaldo, Christian Lovis · PDF
  35. Residual Stream Analysis with Multi-Layer SAEs

    Tim Lawson, Lucy Farnik, Conor Houghton, Laurence Aitchison · PDF
  36. Riemann Sum Optimization for Accurate Integrated Gradients Computation

    Shree Singhi, Swadesh Swain · PDF
  37. Right on Time: Revising Time Series Models by Constraining their Explanations

    Maurice Kraus, David Steinmann, Antonia Wüst, Andre Kokozinski, Kristian Kersting · PDF
  38. SignAttention: On the Interpretability of Transformer Models for Sign Language Translation

    Pedro Alejandro Dal Bianco, Oscar Agustín Stanchi, Facundo Manuel Quiroga, Franco Ronchetti, Enzo Ferrante · PDF
  39. Subgroup Discovery with the Cox Model

    Zachary Izzo, Iain Melvin · PDF
  40. The effect of whitening on explanation performance

    Benedict Clark, Stoyan Karastoyanov, Rick Wilming, Stefan Haufe · PDF
  41. The Price of Freedom: An Adversarial Attack on Interpretability Evaluation

    Kristoffer Knutsen Wickstrøm, Marina MC Höhne, Anna Hedström · PDF
  42. This Looks Like Those: Illuminating Prototypical Concepts Using Multiple Visualizations

    Chiyu Ma, Brandon Zhao, Chaofan Chen, Cynthia Rudin · PDF
  43. Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models

    Konstantin Donhauser, Gemma Elyse Moran, Aditya Ravuri, Kian Kenyon-Dean, Kristina Ulicna, Cian Eastwood, Jason Hartford · PDF
  44. Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers

    Omer Sahin Tas, Royden Wagner · PDF
  45. You can remove GPT2's LayerNorm by fine-tuning

    Stefan Heimersheim · PDF
  46. Your Theory Is Wrong: Using Linguistic Frameworks for LLM Probing

    Victoria Firsanova · PDF