NeurIPS 2025 · Past · Interpretability · Neuroscience

First Workshop on CogInterp: Interpreting Cognition in Deep Learning Models

CogInterp @ NeurIPS 2025

Submission deadline: Aug 28, 2025, 11:59 UTC
Imported from OpenReview; check the website for extensions.
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise — edits welcome.

Accepted papers (112)

Fetched from OpenReview (v2) on 2026-06-10.

(How) Do LLMs Plan in One Forward Pass?
Michael Hanna, Emmanuel Ameisen · PDF
A Cognitive Architecture for Probing Hierarchical Processing and Predictive Coding in Deep Vision Models
Brennen Hill, Zhang Xinyu, Timothy Putra Prasetio · PDF
A Computational Model for Binding by Enhanced Firing Rate: Implementing Smooth Power-law enhancement in Object-Centric Representations
Ishanvir S. Choongh, Manu Madhav · PDF
A Control-Theoretic Account of Cognitive Effort in Language Models
Pranjal Garg · PDF
A Few Bad Neurons: Isolating and Surgically Correcting Sycophancy
Claire O'Brien, Jessica Seto, Dristi Roy, Aditya Dwivedi, Ryan Lagasse, Sunishchal Dev, Kevin Zhu, Sean O'Brien · PDF
A Multi-Method Interpretability Framework for Probing Cognitive Processing in Deep Neural Networks across Vision and Biomedical Domains
Harshini Suresha, Kavitha S H · PDF
A Neuroscience-Inspired Dual-Process Model of Compositional Generalization
Alexander Noviello, Claas Beger, Jacob Groner, Kevin Ellis, Weinan Sun · PDF
Acoustic Degradation Reweights Cortical and ASR Processing: A Brain-Model Alignment Study
Francis Pingfan Chien, Chia-Chun Dan Hsu, Po-Jang Hsieh, Yu Tsao · PDF
Actual or counterfactual? Asymmetric responsibility attributions in language models
Eric Bigelow, Yang Xiang, Tobias Gerstenberg, Tomer Ullman, Samuel J. Gershman · PDF
Are Humans Evolved Instruction Followers? An Underlying Inductive Bias Enables Rapid Instructed Task Learning
Anjishnu Kumar · PDF
Assessing Behavioral Effects of Reasoning (or the lack of) in LLMs
Arthur Buzelin, Samira Malaquias, Victoria Estanislau, Yan Aquino, Pedro Augusto Torres Bento, Lucas Dayrell, Arthur Chagas, Gisele L. Pappa, Wagner Meira Jr. · PDF
Bitter Lesson of the ARC-AGI Challenge: Intelligence may look very different in machines and humans
Soumya Banerjee · PDF
Bridging the Von Neuman Gap: Why LLMs Haven’t Made Novel Discoveries
Ashwin Saraswatula · PDF
Can You Spot the Virtual Patient? Expert Review, Turing Test, and Linguistic–Semantic Analysis
Reyhaneh Hosseinpourkhoshkbari, Wei-chen Huang, Suvel Muttreja, Richard M. Golden · PDF
Causal Interventions on Continuous Features in LLMs: A Case Study in Verb Bias
Zhenghao Zhou, R. Thomas McCoy, Robert Frank · PDF
Causality $\neq$ Decodability, and Vice Versa: Lessons from Interpreting Counting ViTs
Lianghuan Huang, Yingshan Chang · PDF
Cognitive Behavior Modeling via Activation Steering
Anthony Kuang, Ahmed Ismail, Ayo Akinkugbe, Kevin Zhu, Sean O'Brien · PDF
Cognitive Load Traces as Symbolic and Visual Accounts of Deep Model Cognition
Dong Liu, Yanxuan Yu · PDF
Cognitive Machine Learning for Patient-First Modeling in Clinical Research
Shashank Uttrani, Shruti Kaushik, Martin White · PDF
Cognitive Maps in Language Models: A Mechanistic Analysis of Spatial Planning
Caroline Baumgartner, Eleanor Spens, Neil Burgess, Petru Manescu · PDF
Conflict Adaptation in Vision-Language Models
Xiaoyang Hu · PDF
Context informs pragmatic interpretation in vision–language models
Alvin Wei Ming Tan, Ben Prystawski, Veronica Boyce, Michael Frank · PDF
CORE – Cognitive Observation of Reasoning Errors
Janos Horvath · PDF
Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression
Nathaniel Imel, Noga Zaslavsky · PDF
CurLL: Curriculum Learning of Language Models
Pavan Kalyan Tankala, Shubhra Mishra, Satya Lokam, Navin Goyal · PDF
DecepBench: Benchmarking Multimodal Deception Detection
Vittesh Maganti, Nysa Lalye, Ethan Braverman, Kevin Zhu, Vasu Sharma, Sean O'Brien · PDF
Decoding and Reconstructing Visual Experience from Brain Activity with Generative Latent Representations
Motokazu Umehara, Yoshihiro Nagano, Misato Tanaka, Yukiyasu Kamitani · PDF
Deconstructing the Reasoning Process of a Neuro-Fuzzy Agent: From Learned Concepts to Natural Language Narratives
Yumin Zhou, Whye Loon Tung, Hiok Quek · PDF
Demystifying Emergent Exploration in Goal-conditioned RL
Mahsa Bastankhah, Grace Liu, Dilip Arumugam, Thomas L. Griffiths, Benjamin Eysenbach · PDF
Detecting Motivated Reasoning in the Internal Representations of Language Models
Parsa Mirtaheri, Mikhail Belkin · PDF
Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction
James A. Michaelov, Catherine Arnett · PDF
Discovering Functionally Sufficient Projections with Functional Component Analysis
Satchel Grant · PDF
Disentangling Interpretable Cognitive Variables That Support Human Generalization
Xinyue Zhu, Daniel L. Kimmel · PDF
Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?
Siddhant Bhambri, Upasana Biswas, Subbarao Kambhampati · PDF
Do Large Language Models Show Biases in Causal Learning? Insights from Contingency Judgment
María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Giovanni Franco Gabriel Marraffini, Mario Leiva, Gerardo Simari, Maria Vanina Martinez · PDF
Do Sparse Subnetworks Exhibit Cognitively Aligned Attention? Effects of Pruning on Saliency Map Fidelity, Sparsity, and Concept Coherence
Sanish Suwal, Dipkamal Bhusal, Michael Clifford, Nidhi Rastogi · PDF
Does FLUX Know What It’s Writing?
Adrian Chang, Sheridan Feucht, Byron C Wallace, David Bau · PDF
Don’t Think of the White Bear: Ironic Negation in Transformer Models under Cognitive Load
Logan Mann, Nayan Saxena, Sarah Tandon, Chenhao Sun, Savar Toteja, Kevin Zhu · PDF
Emergent World Beliefs: Exploring Transformers in Stochastic Games
Tanish Rastogi, Michael Ma, Adam Kamel, Kailash Ranganathan · PDF
Exact Learning Dynamics of In-Context Learning in Linear Transformers and Its Application to Non-Linear Transformers
Nischal Mainali, Lucas Teixeira · PDF
Extracting Belief-Update Rules to Explain Theory-of-Mind Generalization Failures
Joel Phillips Michelson, Deepayan Sanyal, Maithilee Kunda · PDF
Forgetting as a Lens into Model Cognition: Selective Unlearning Reveals Cognitive Biases in Deep Neural Networks
Kaustubha V · PDF
From Black Box to Bedside: Distilling Reinforcement Learning for Interpretable Sepsis Treatment
Ella Lan, Andrea Yu, Sergio Charles · PDF
From Cephalopods to Large Language Models: Conceptions of Intelligence and Reasoning
Soumya Banerjee · PDF
From Comparison to Composition: Towards Understanding Machine Cognition of Unseen Categories
Minghao Fu, Sheng Zhang, Guangyi Chen, Zijian Li, Fan Feng, Yifan Shen, Shaoan Xie, Kun Zhang · PDF
Fuzzy, Symbolic, and Contextual: Enhancing LLM Instruction via Cognitive Scaffolding
Vanessa Figueiredo · PDF
GBEval: A SHAP-based Interpretable Gender Bias Assessment Framework for LLMs
Jayan Adhikari, Raj Dandekar, Rajat Dandekar, Sreedath Panat · PDF
Generating Compromises Between Two Points of View
Sumanta Bhattacharyya, Francine Chen, Scott Carter, Yan-Ying Chen, Tatiana Lau, Nayeli Suseth Bravo, Monica P Van, Kate Sieck, Charlene C. Wu · PDF
Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows
Billy Dickson, Zoran Tiganj · PDF
How Do LLMs Ask Questions? A Pragmatic Comparison with Human Question-Asking
Chani Jung, Jimin Mun, Xuhui Zhou, Alice Oh, Maarten Sap, Hyunwoo Kim · PDF
How Intrinsic Motivation Shapes Learned Representations in Decision Transformers: A Cognitive Interpretability Analysis
Leonardo Guiducci, Antonio Rizzo, Giovanna Maria Dimitri · PDF
I Am Large, I Contain Multitudes: Persona Transmission via Contextual Inference in LLMs
Puria Radmard, Shi Feng · PDF
Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs
Sonia Krishna Murthy, Rosie Zhao, Jennifer Hu, Sham M. Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman · PDF
InterpDetect: Interpretable Signals for Detecting Hallucinations in Retrieval-Augmented Generation
Likun Tan, Kuan-Wei Huang, Joy Shi, Kevin Wu · PDF
Interpretable Hybrid Neural-Cognitive Models Discover Cognitive Strategies Underlying Flexible Reversal Learning
Chonghao Cai, Liyuan Li, Yifei Cao, Maria K Eckstein · PDF
Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation
Siddhant Bhambri, Upasana Biswas, Subbarao Kambhampati · PDF
Interpreting style–content parsing in vision–language models
Fan L. Cheng, Xin Jing · PDF
Kindness or Sycophancy? Understanding and Shaping Model Personality via Synthetic Games
Maya Okawa, Ekdeep Singh Lubana, Mai Uchida, Hidenori Tanaka · PDF
Language models can associate objects with their features without forming integrated representations
Simon Jerome Han, James Lloyd McClelland · PDF
Language Models use Lookbacks to Track Beliefs
Nikhil Prakash, Natalie Shapira, Arnab Sen Sharma, Christoph Riedl, Yonatan Belinkov, Tamar Rott Shaham, David Bau, Atticus Geiger · PDF
Language-Based Dementia Classification Should Consider Model Cognition for Interpretability
Yui Ishihara, Michelle Cohn, Kartik Patwari, Alyssa Weakley, Chen-Nee Chuah · PDF
Learning to Look: Cognitive Attention Alignment with Vision-Language Models
Ryan L. Yang, Dipkamal Bhusal, Nidhi Rastogi · PDF
Let's Think 一步一步: A Cognitive Framework for Characterizing Code-Switching in LLM Reasoning
Eleanor Lin, David Jurgens · PDF
LLM Agents Beyond Utility: An Open-Ended Perspective
Asen Nachkov, Xi Wang, Luc Van Gool · PDF
LRP-CLIP: A Zero Shot Approach for the Explanation of the Cognitive Functions of Vision Models
Malte Singerhoff, Viktor Matkovic, Torben Weis · PDF
Measuring LLM Generation Spaces with EigenScore
Sunny Yu, Myra Cheng, Ahmad Jabbar, Robert D. Hawkins, Dan Jurafsky · PDF
Mechanisms of Symbol Processing in Transformers
Paul Smolensky, Roland Fernandez, Zhenghao Zhou, Mattia Opper, Adam Davies, Jianfeng Gao · PDF
Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis
Amartya Hatua · PDF
Mechanistic Interpretability of Semantic Abstraction in Biomedical Text
Nikhil Gourisetty, Vishnu Srinivas, Snata Mohanty, Soumil Jain, Kevin Zhu, Benjamin Liu, Sunishchal Dev, Sunith Vallabhaneni · PDF
MetaCD: A Meta Learning Framework for Cognitive Diagnosis based on Continual Learning
Jin Wu, Chanjin Zheng · PDF
Metacognitive Sensitivity for Test-Time Dynamic Model Selection
Le Tuan Minh Trinh, Le Minh Vu Pham, Thi Minh Anh Pham, An Duc Nguyen · PDF
Mind Games Machines Play: Contrastive Cognitive Bias Detection in LLMs and Distilled Models
Anusha Asim, Maryam Rifah · PDF
Minimization of Boolean Complexity in In-Context Concept Learning
Leroy Z. Wang, R. Thomas McCoy, Shane Steinert-Threlkeld · PDF
Misalignment Between Vision-Language Representations in Vision-Language Models
Yonatan Gideoni, Yoav Gelberg, Tim G. J. Rudner, Yarin Gal · PDF
Modulation of temporal decision-making in a deep reinforcement learning agent under the dual-task paradigm
Amrapali Pednekar, Álvaro Garrido Pérez, Yara Khaluf, Pieter Simoens · PDF
NiceWebRL: a Python library for human subject experiments with reinforcement learning environments
Wilka Carvalho, Vikram Srinivas Goddla, Ishaan Sinha, Hoon Shin, Kunal Jha · PDF
On the Role of Pretraining in Domain Adaptation in an Infant-Inspired Distribution Shift Task
Deepayan Sanyal, Joel Phillips Michelson, Maithilee Kunda · PDF
Pedagogical Alignment of LLMs requires Diverse Cognitively-Inspired Student Proxies
Suchir Salhan, Andrew Caines, Paula Buttery · PDF
Perceived vs. True Emergence: A Cognitive Account of Generalization in Clinical Time Series Models
Shashank Yadav · PDF
Personality Manipulation as a Cognitive Probe in Large Language Models
Gunmay Handa, Zekun Wu, Adriano Koshiyama, Philip Colin Treleaven · PDF
PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm
Jing-Jing Li, Joel Mire, Eve Fleisig, Valentina Pyatkin, Maarten Sap, Sydney Levine · PDF
Post-hoc Stochastic Concept Bottleneck Models
Wiktor Hoffmann, Sonia Laguna, Moritz Vandenhirtz, Emanuele Palumbo, Julia E Vogt · PDF
Predicting the Formation of Induction Heads
Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider · PDF
Priors in Time: A Generative View of Sparse Autoencoders for Sequential Representations
Ekdeep Singh Lubana, Sai Sumedh R. Hindupur, Can Rager, Valérie Costa, Oam Patel, Sonia Krishna Murthy, Thomas Fel, Greta Tuckute, Daniel Wurgaft, Demba E. Ba, Melanie Weber, Aaron Mueller · PDF
Privileged Self-Access Matters for Introspection in AI
Siyuan Song, Harvey Lederman, Jennifer Hu, Kyle Mahowald · PDF
Reverse-Engineering Memory in DreamerV3: From Sparse Representations to Functional Circuits
Jan Sobotka, Auke Ijspeert, Guillaume Bellegarda · PDF
RNNs reveal a new optimal stopping rule in sequential sampling for decision-making
Jialin Li, Kenway Louie, Paul W. Glimcher, Bo Shen · PDF
Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques
Lang Xiong, Raina Gao, Alyssa Jeong, Yicheng Fu, Kevin Zhu, Sean O'Brien, Vasu Sharma · PDF
Scratchpad Thinking: Alternation Between Storage and Computation in Latent Reasoning Models
Sayam Goyal, Brad Peters, María Emilia Granda, Akshath Vijayakumar Narmadha, Dharunish Yugeswardeenoo, Callum Stuart McDougall, Sean O'Brien, Ashwinee Panda, Kevin Zhu, Cole Blondin · PDF
Shared Parameter Subspaces and Cross-Task Linearity in Emergently Misaligned Behaviour
Eric Zhang, Daniel Aarao Reis Arturi, Andrew Adrian Ansah, Kevin Zhu, Ashwinee Panda, Aishwarya Balwani · PDF
Signatures of human-like processing in Transformer forward passes
Jennifer Hu, Michael A. Lepori, Michael Franke · PDF
Sparse Feature Coactivation Reveals Composable Semantic Modules in Large Language Models
Ruixuan Deng, Xiaoyang Hu, Miles Gilberti, Shane Storks, Aman Taxali, Mike Angstadt, Chandra Sripada, Joyce Chai · PDF
STAT: Skill-Targeted Adaptive Training
Yinghui He, Abhishek Panigrahi, Yong Lin, Sanjeev Arora · PDF
Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!
Subbarao Kambhampati, Kaya Stechly, Karthik Valmeekam, Lucas Paul Saldyt, Siddhant Bhambri, Vardhan Palod, Atharva Gundawar, Soumya Rani Samineni, Durgesh Kalwar, Upasana Biswas · PDF
Strategy and structure in Codenames: Comparing human and GPT-4 gameplay
Noah Prescott, Tracey Mills, Jonathan Phillips · PDF
The Mechanistic Emergence of Symbol Grounding in Language Models
Ziqiao Ma, Shuyu Wu, Xiaoxi Luo, Yidong Huang, Josue Torres-Fonseca, Freda Shi, Joyce Chai · PDF
The One Where They Brain-Tune for Social Cognition: Multi-Modal Brain-Tuning on Friends
Nico Policzer, Cameron Braunstein, Mariya Toneva · PDF
Theoretical Linguistics Constrains Hypothesis-Driven Causal Abstraction in Mechanistic Interpretability
Suchir Salhan, Konstantinos Voudouris · PDF
Towards Cognitively Plausible Concept Learning: Spatially Grounding Concepts with Anatomical Priors
Yuyu Zhou · PDF
Towards finding consensus about similarity of symbolic encodings associated with concepts between LLMs and human brain
Sushma Anand Akoju · PDF
Towards Visual Simulation in Multimodal Language Models
Catherine Finegan-Dollak · PDF
Tracing the Development of Syntax and Semantics in a Model trained on Child-Directed Speech and Visual Input
Nina Schoener, Mahesh Srinivasan, Colin Conwell · PDF
Understanding Pre-trained and Fine-tuned model behaviour using Model Diffing
Mallikarjuna Tupakula · PDF
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory
Ming Li, Nan Zhang, Chenrui Fan, Hong Jiao, Tianyi Zhou · PDF
Unifying Gestalt Principles Through Inference-Time Prior Integration
Tahereh Toosi, Kenneth D. Miller · PDF
Unraveling the cognitive patterns of Large Language Models through module communities
Kushal Raj Bhandari, Pin-Yu Chen, Jianxi Gao · PDF
Value Entanglement: Conflation Between Moral and Grammatical Good In (Some) Large Language Models
Seong Hah Cho, Junyi Li, Anna Leshinskaya · PDF
Video Finetuning Improves Reasoning Between Frames
Ruiqi Yang, Tian Yun, Zihan Wang, Ellie Pavlick · PDF
Visual symbolic mechanisms: Emergent symbol processing in vision language models
Rim Assouel, Declan Iain Campbell, Taylor Whittington Webb · PDF
What Comes to Mind? Interpretable Dimensions in Embedding Space Predict Human Ad Hoc Category Construction
Alina Dracheva, Jonathan Phillips · PDF
What is a Number, That a Large Language Model May Know It?
Raja Marjieh, Veniamin Veselovsky, Thomas L. Griffiths, Ilia Sucholutsky · PDF
When Researchers Say Mental Model/Theory of Mind of AI, What Are They Really Talking About?
Xiaoyun Yin, Elmira Zahmat Doost, Shiwen Zhou, Garima Arya Yadav, Jamie Gorman · PDF
