COLM 2025 · Past · Math & reasoning · Large language models · Interpretability

The First Workshop on the Application of LLM Explainability to Reasoning and Planning

XLLM-Reason-Plan

Submission deadline: Jun 28, 2025, 23:59 UTC
Imported from OpenReview; check the official website for extensions.
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise; edits are welcome.

Accepted papers (19)

Fetched from OpenReview (v2) on 2026-06-11.

Angular Steering: Behavior Control via Rotation in Activation Space
Hieu M. Vu, Tan Minh Nguyen
Are General-Purpose LLMs Ready for Planning? A Large-Scale Evaluation in PDDL
Kaustubh Vyas, Damien Graux, Sebastien Montella, Pavlos Vougiouklis, Jeff Z. Pan
Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation
Ruizhe Li, Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang, Emine Yilmaz
Before You 〈think/〉, Monitor: Implementing Flavell's Metacognitive Framework in LLMs
Nick Oh
Beyond Autocomplete: Designing CopilotLens Towards Transparent and Explainable AI Coding Agents
Runlong Ye, Zeling Zhang, Boushra Almazroua, Michael Liut
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation
Ziling Cheng, Meng Cao, Leila Pishdad, Yanshuai Cao, Jackie CK Cheung
Case-Based Reasoning Enhances the Predictive Power of LLMs in Drug-Drug Interaction
Guangyi Liu, Yongqi Zhang, Xunyuan Liu, Quanming Yao
Disambiguate First, Parse Later: Generating Interpretations for Ambiguity Resolution in Semantic Parsing
Irina Saparina, Mirella Lapata
Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data
Jiaming Zhou, Abbas Ghaddar, Ge Zhang, Liheng Ma, Yaochen Hu, Soumyasundar Pal, Bin Wang, Jianye HAO, Mark Coates, Yingxue Zhang
Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones
Daking Rai, Samuel Miller, Kevin Moran, Ziyu Yao
From Indirect Object Identification to Syllogisms: Exploring Binary Mechanisms in Transformer Circuits
Karim Saraipour, Shichang Zhang
How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence
Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, Himabindu Lakkaraju, Shichang Zhang
HYBRIDMIND: Meta Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning
Simeng Han, Tianyu Liu, Chuhan Li, Xuyuan Xiong, Arman Cohan
Latent Chain-of-Thought? Decoding the Depth-Recurrent Transformer
Wenquan Lu, Yuechuan Yang, Kyle Lee, Yanshu Li, Enqi Liu
Reasoning Riddles: How Explainability Reveals Cognitive Limits in Vision-Language Models
Prahitha Movva
ReCalibrate: RL for Uncertainty-Aware Reasoning in LLMs
Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Jacob Andreas
Rethinking (Human) Preference Evaluation of LLM Rationales
Ziang Li, Manasi Ganti, Zixian Ma, Helena Vasconcelos, Qijia He, Ranjay Krishna
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee, Lihao Sun, Chris Wendler, Fernanda Viégas, Martin Wattenberg
When Models Know More Than They Can Explain: Quantifying Knowledge Transfer in Human-AI Collaboration
Quan Shi, Carlos E Jimenez, Shunyu Yao, Nick Haber, Diyi Yang, Karthik R Narasimhan
