ICML 2026 Workshop: Philosophy Meets Machine Learning

PhilML@ICML 2026

Submission deadline: TBA
Submission portal: OpenReview

Accepted papers (60)

Fetched from OpenReview (v2) on 2026-06-10.

A Definition of Good Explanations and the Challenges Explaining LLM Outputs
Louis Mahon, Elliot Ford, Callum Hackett
A Relativistic Perspective of Reliability in Machine Learning
Rajeev Verma
AI Review Is a Systemic Risk to Peer Review: Toward a Blockchain-Supported Claim-Level Ledger for Accountability
Yibo Miao, Yichi Zhang, Yinpeng Dong
AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs
Richard Ren, Kunyang Li, Mantas Mazeika, Wenyu Zhang, Yury Orlovskiy, Rishub Tamirisa, Wenjie Jacky Mo, Thuy Dung Nguyen, Long Phan, Steven Basart, Austin Meek, Aditya Mehta, Oliver Ingebretsen, Alice Blair, Brianna Adewinmbi, Vy Phan, Alice Gatti, Adam Khoja, Jason Hausenloy, Devin Kim, Dan Hendrycks
An Evolutionary Epistemology of Post-Training
Nicholas Clark
Articulate Intuition or Genuine Analysis? Benchmarking Epistemic Reliability in LLM-as-a-Judge Peer Review
Nuo Chen, Bingsheng He
Before Normative and Moral Alignment: Causal Contract Faithfulness as a Precondition for Trustworthy AI
Amine M'Charrak, Thong Pham, Thomas Lukasiewicz, Yuxiao Dong, Shohei Shimizu
Belief Without Justification: Sycophancy as a Single-Layer Truth–Compliance Tension in LLMs
Valentin Noël
Beyond Accuracy: Epistemic Justification in Trustworthy Machine Learning
Poojak Patel, Maneth Perera
Can LLMs Navigate Beliefs and Facts? Depends on How You Phrase It
Quang Minh Nguyen, Luis Frentzen Salim
Can Standard MARL Metrics Distinguish Communicative from Strategic Action?
Majid Ghasemi, Mark Crowley
Constituting What Counts: A Phenomenological Approach to Human-AI Ontological Translation
Prerna Luthra, Manojshyaam C J
Counterfactuals Without Worlds: When ML Counterfactual Explanations Are Ill-Posed
Muhammet Anil Yagiz
DeepSWIP: Single-World Counterfactual Semantics for DeepProbLog
Saimun Habib, Vaishak Belle, Fengxiang He
Dignity as Answerability: How World-Model AI Reframes Human Moral Standing
Junghoon Justin Park, Jiook Cha
Do LLMs Really Represent the World? A Challenge from Teleosemantics
Eliot Du Sordet
Efficient Counterfactual Reasoning in ProbLog via Single-World Intervention Programs
Saimun Habib, Vaishak Belle, Fengxiang He
Epistemic Misalignment in Human-AI Systems: A Four-Quadrant Taxonomy of Uncertainty
Mayank Kejriwal
Explaining What Machine Learning Learns through Explainable AI
Jinyeong Gim
Explanation for Whom? Hospitable Interpretability for Machine Learning
Abutalib Namazov
Explanation in an Emerging Science of Large Language Models
Ming Liang Ang
Explanations are a Means to an End: Decision Theoretic Explanation Evaluation
Ziyang Guo, Berk Ustun, Jessica Hullman
Factuality Beyond Reference in LLMs
Thierry Poibeau
Fair Learning with Biased Labels: When Observed Accuracy Is the Wrong Target
Heng-Chien Liou, I-Hsiang Wang
Fictionalism about Personas: Folk Psychology as an Interpretability Strategy
Weiming Sheng
From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models
Leonard Engmann, Christian Medeiros Adriano, Holger Giese
From Prompts to Proof Obligations: Formal Sidecars as an Epistemic Interface for Trustworthy ML
Junyu Ren
Getting Monosemantic About Monosemanticity
Raphaël Millière, Kola Ayonrinde
Interpretability Should Prioritise Use-Inspired Basic Research for AI Safety
Kola Ayonrinde
Lifted Representation Hypothesis in Language Models
Bumjin Park, Jaesik Choi
Measuring the Ruler: Reading Benchmark Saturation as Evidence
Sebastian M Schmon
Mistakes as Epistemic Signatures: An Efficiency-Modulated Cumulative Error Framework for Comparison and Diagnosis of AI Errors
Darshini N
Noticing the Watcher: LLM Agents Can Infer CoT Monitoring from Blocking Feedback
Thomas Jiralerspong, Flemming Kondrup, Yoshua Bengio
On Epistemic Diversity in Large Language Models
Elisabeth Kirsten, Nicole C. Krämer, Muhammad Bilal Zafar
On the Detectability of LLM-Generated Text: What Exactly Is LLM-Generated Text?
Mingmeng Geng, Thierry Poibeau
Online Boundary-Aware Memory for Case-Based Reasoning Agents
Zheng Dong, Luming Shang
Operative Contexts: Belief Revision and Memory in Agentic AI
Emma Cabalé, Selina Guter, Philippe Beraud, Philippe Limantour
Privileged Self-Access Matters for Introspection in AI
Siyuan Song, Harvey Lederman, Jennifer Hu, Kyle Mahowald
Procedural Generalization: A Resource-Sensitive Account of Knowing-How
Tomer Galanti, Saharsh Koganti, Priyadarsi Mishra, Pierfrancesco Beneventano
Proleptic Epistemology for Societal Impacts of AGI
Priyansh Singhal, Sandeep Kumar, Piyush Joshi
Reality and Practice: A Relational Reading of the Platonic Representation Hypothesis
Sebastian M Schmon
Reconciling Causality and Non-Equilibrium Thermodynamics with Hamiltonian Causal Models
Dario Rancati, Max Welling, Francesco Locatello
Reliability, Faithfulness, and the Limits of Post-hoc Explanations of Opaque Scientific Models
Nick Oh, Helen Jin
Reliable for Whom? Directional Reliability in AI-Mediated Political Dialogue
Jaeyoun You
Savage Without Monotonicity
Shuo Li Liu, Jingni Yang
Self-Reports Do Not Identify Self-Models: An Identifiability Test for Counterfactual Reports
Phongsakon Mark Konrad, Toygar Tanyel, Serkan Ayvaz
The Concept of Representation in ML: Beyond Plato and Aristotle
Gilad Landau, Aviv Keren
The Hawk Effect: Why We Need a Two-Dimensional Measure of Machine Intelligence
Fryderyk Kuzma
The Opacity of Descent: Optimization, Epistemic Asymmetry, and the Semantics of Convergence in Deep Learning
Mahdi Ghaznavi
The Wrong Question? Artificial Consciousness and the Politics of AI Agency
Thierry Poibeau
Towards Automated Evaluation of Socio-Technical Harms in LLMs: A Normative Taxonomy and Multi-Turn Red-Teaming Framework
Byeongho Lee, Hyundeuk Cheon
Towards Formalizing Skepticism of Autoregressive Language Models: A Taxonomy in the Language of the Theory of Computation
Michael Guerzhoy
Trust as Predictive Precision: Reliability and Influence in Representation Alignment
Hidenori Tanaka
Trustworthiness and co-cognition in artificial intelligence systems
Silvère Gangloff
Uncertainty as Perceptual Testimony in Vision-Language Models
Ahmad A Rushdi
Unsafe Consensus in Diagnostic Deliberation
Yuting Yan, Yinghao Fu, Haozhou Gao, Tianjian Zhang, Aoxi Liu, Shuang Li
Vision-Language Asymmetry in Bistable Image Captioning
Arohan Agate
When Do Transformer Components Compose? Validating a Log-Pool Decomposition Criterion
Junyu Ren, Su Hyeong Lee, Risi Kondor
Where Does Prediction Error Come From When the Data Is Perfect? A Decomposition of the Model–World Gap in Predictive Uncertainty
Johanna Einsiedler, Rosa Lavelle-Hill, Constantin T. A. Wiegand
Why Sampling Is Not Choosing: Intentionality, Agency, and Moral Responsibility in Large Language Models
Joseph Keshet