ICLR 2026 · Past · Math & reasoning · Large language models
ICLR 2026 Workshop on Logical Reasoning of Large Language Models
ICLR 2026 Workshop LLM Reasoning
- Submission deadline: Mar 21, 2026, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (159)
Fetched from OpenReview (v2) on 2026-06-10.
- A Causal Legal Reasoning Method for Judicial Subjective Questions via Key Legal Fact Identification
- Actor-Curator: Co-adaptive curricula via policy-improvement bandits for post-training
- Against Homogeneous Consensus: Why Scientific Discovery Requires Heterogeneous Adversarial LLM Agents
- Agentic Proving for Program Verification
- AGM-Bench: Do Large Language Models Revise Beliefs Rationally?
- AI-BAAM: AI-Driven Bank Statement Analytics as Alternative Data for Malaysian MSME Credit Scoring
- An Informal Logic LLM-Based Argumentation Framework
- An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems
- Are VLM Identity Judgments Logically Consistent? Evaluating Symmetry, Chain-of-Thought, and Transitivity in Person Re-Identification
- AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency
- AtomGraph: Reasoning Isn't Linear, Why Should Verification Be?
- Autoformalizing Biomedical Text into Verified Knowledge Graph Reasoning: A Neuro-Symbolic Architecture for Alzheimer's Disease
- Automatically Generating Hard Math Problems from Hypothesis-Driven Error Analysis
- AVSAD: Automating Vector Symbolic Architecture Discovery with Iterative Evolution
- Benchmark for Assessing Olfactory Perception of Large Language Models
- Benchmarking Logical Reasoning Inconsistencies in Local Large Language Models: Evidence from Multi-Domain Evaluation
- Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency
- Beyond Clause Count: A Study of Proof-Relevant Difficulty in LLM SAT Reasoning
- Beyond Rationalization: Criteria and Guidelines for Algorithmic Reasoning Traces in LLM Logical Reasoning
- Beyond Self-Refinement: Ensembling and Chaining for Neurosymbolic Reasoning
- Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order
- Causal Evidence of Stack Representations in Modeling Counter Languages Using Transformers
- CausalSim: Counterfactual Implication Inversion as a Logical Consistency Stress Test for Large Language Models
- Certified Coherent Reasoning for LLMs via Weighted MaxSAT and Belief-Revision Geometry
- CFLBENCH: BENCHMARKING NOVEL CONTROL FLOW LANGUAGE LEARNING
- Chain-of-Thought Injection as an Inference-Time Safety Intervention
- ChaosBench-Logic v2: Evaluating LLM Logical Reasoning over Dynamical Systems at Scale
- Characterizing Backtracking in CoT through Internal Probes and Surface-Level Features
- Commitment-Aware Axiomatic Coherence: Measuring Non-Vacuous Consistency in LMM Logical Reasoning
- Confidence-Gated RAG for Adaptive Retrieval in Sequential Agents
- Confident RAG: Enhancing the Performance of LLMs for Mathematics Question Answering through Multi-Embedding and Confidence Scoring
- Configuration Perturbation Induces Logical Contradictions Across Related Queries
- Constrained Wikigame: Benchmarking Deductive Reasoning for Multi-Step Planning
- CONSTRAINING PROBABILITY WITH LOGIC: A SPECTRUM FROM STATISTICAL ALIGNMENT TO STRUCTURAL GUARANTEE
- ContraPrompt: Contrastive Prompt Optimization via Dyadic Reasoning Trace Analysis
- Correct Chains, Wrong Answers: Dissociating Reasoning from Output in LLM Logic
- CROP: Token-Efficient Reasoning in Large Language Models via Regularized Prompt Optimization
- Debugging code world models
- DECODING LOGICAL NEGATION IN LARGE LANGUAGE MODELS: FROM STATISTICAL HEURISTICS TO CAUSAL SEMANTIC CIRCUITS
- Decoupling Reasoning from Action: Architectural Impacts on Agentic Planning Consistency
- DEDUCTIVE CONSTRAINT SATISFACTION VS. PREVALENCE PRIORS: BENCHMARKING LLM LOGIC IN CLINICAL DIAGNOSTICS
- DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models
- Detecting Scaling Factors Beyond the Model: A Reporting Framework for AI Agent Systems
- DIFFUSION REASONING FOR FORMAL LOGIC: CLOSING THE GAP BETWEEN MATHEMATICAL AND DEDUCTIVE CONSISTENCY IN LLMS
- Distilling SMT Solver Reasoning into Compact Language Models
- Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis
- Do LLM Recommenders Obey Preference Axioms? Testing Logical Rationality in LLM-Based Recommendation
- Do Transformers Use Their Depth Adaptively? Evidence from a Relational Reasoning Task
- Embedding Distance as a Reward Signal can replace Verifiers for LLM Reasoning
- Emergent Reasoning via Recursive Latent Reinforcement Pretraining
- Enforcing Logical Invariance in Large Language Models via Symmetry Pair Training
- Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey
- Enhancing LLMs in Legal Judgment Prediction via Neuro-Symbolic Reasoning
- Entailment Closure Failures in Large Language Models: A Benchmark for Cross-Query Logical Consistency
- Entropy Jurisprudence: Auditing Procedural Fidelity in LLM Normative Reasoning
- ERA-GAC for Stable Structured Reasoning with Attention Priors and Gain-Aware Entropy Control
- EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages
- Evaluation of Multi-Turn Consistency in LLM Agents: Survival Analysis and Failure-Rationale Taxonomy
- Finny: A Multi-Agent System for Structured Decision-Making with LLMs
- From Facts to Conclusions : Integrating Deductive Reasoning in Retrieval-Augmented LLMs
- From Growing to Looping: A Unified View of Iterative Computation in LLMs
- From Natural Language to Exact Cover: A Neuro-Symbolic Approach to Zebra Puzzles
- Fully Asynchronous Federated Learning with Faster Convergence for LLM Reasoning
- GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers
- GIFT: Guided Importance-Aware Fine-Tuning for Diffusion Language Models
- Governed Self-Improvement for Logical Reasoning: Edit-Time Governance for Developmental Consistency
- Grounding the "Not": Symbolic Representation of Negation for Logical Reasoning in VLMs
- GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning
- HALLUCINATION AS MISCLASSIFICATION: A COMPOSITE ABSTENTION ARCHITECTURE FOR LANGUAGE MODEL OUTPUT CONTROL
- How Clued up are LLMs? Evaluating Multi-Step Deductive Reasoning in a Text-Based Game Environment
- Improving Reachability on Reasoning Puzzles
- Interpreting Chain-of-thought Reasoning via Partial Information Decomposition
- Interventional Grounding Audits: Black-Box Premise-Dependency Tests for LLM Chain-of-Thought via Predicate Substitution
- interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors
- INVESTIGATING EQUATION-ONLY REASONING IN LARGE LANGUAGE MODELS
- KV Cache as a Reasoning Primitive for Long Context Reasoning
- LaPep: Can Language Contribute to Property-Guided Peptide Design?
- Large Language Models Generate Harmful Content Using a Unified Mechanism
- Latent-Implicit Thinking with Proof-Carrying Neuro-Symbolic Outputs for Biomedical Discovery
- Learning Reasoning Reward Models from Expert Demonstration via Inverse Reinforcement Learning
- Linear Mechanisms of Spatiotemporal Reasoning in Vision Language Models
- LLATAS: Large LAnguage models as Tabular Auxiliary feature Synthesizer
- LLM Routing as Reasoning: A MaxSAT View
- LLM-as-a-Prophet: Understanding AI's Predictive Intelligence with Prophet Arena
- LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers
- LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking
- Logic-Verified GRPO: Graded Z3 Process Rewards for Logical Reasoning in Small LLMs
- Logical Consistency Under Pressure: Probing and Repairing Cross-Query Contradictions in LLMs
- Logical Reasoning Evaluation and Social Bias
- LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision
- LogicVault: Persistent Symbolic Belief States for Cross-Query Logical Consistency in LLMs
- M3Kang: Evaluating Multilingual Multimodal Mathematical Reasoning in Vision-Language Models
- Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery
- Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning
- MODALBENCH: EVALUATING MODAL AND DEONTIC LOGIC REASONING IN LARGE LANGUAGE MODELS
- MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL
- MUX: Continuous Reasoning via Multiplexed Tokens
- Neuro-Symbolic Active Causal Hypothesis Testing for NAD+-Centered Alzheimer's Disease Reversal
- Neuro-Symbolic Rule Discovery: Empowering LLMs with Causality for Vehicle Diagnostics
- OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks
- On the "Induction Bias" in Sequence Models
- Out-of-Distribution Study of Rule-Based and Strategic Reasoning in Chess Transformers
- PAVE: Premise-Aware Validation and Editing for Retrieval-Augmented LLMs
- PeerCoT: Structured Multi-Agent Chain-of-Thought Collaboration for Error Localization in LLM Reasoning
- Position: Beyond Reasoning Zombies — AI Reasoning Requires Process Validity
- Position: Logical Soundness is not a Reliable Criterion for Neurosymbolic Fact-Checking with LLMs
- POSITION: THE REASONING TRAP — LOGICAL REASONING AS A MECHANISTIC PATHWAY TO SITUATIONAL AWARENESS
- Premises Reordering in Forward Chaining Improves LLM Symbolic Reasoning
- PRISM: Prompt-Refined In-Context System Modeling for Financial Retrieval
- ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward
- Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning
- Pruning via Causal Attribution Preserves Reasoning in Large Language Models
- Quantifying Consistency in LLM Logical Reasoning via Structural Uncertainty
- Quantifying Cross-Query Contradictions in Multi-Query LLM Reasoning
- R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
- RAG over Tables: Hierarchical Memory Index, Multi-Stage Retrieval, and Benchmarking
- Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models
- Reasoning Structure of Large Language Models
- Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models
- RecRoll: Adaptive Depth First Search in Autoregressive Predictive Space
- Recurrent Reasoning on Symbolic Puzzles with Sequence Models
- Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning
- ResistIA: Reasoning-Guided Agentic Evaluation of Synthetic Metal-Resistance Genes from Conditional Genomic Foundation Models
- Rethinking LLM Judges: Chain-of-Thought and Multi-Step Pipelines for Math Grading
- Rethinking LLMs as Verifiers: When Verification is Harder Than Solving
- Revisiting Causal Reasoning in Language Models through Controlled Synthetic Worlds
- RHIM: Benchmarking Redundant Hypothesis Identification Reveals Systematic Gaps in LLM Logical Reasoning
- Riemann-Bench: A Benchmark for Moonshot Mathematics
- RIGHT ANSWERS, WRONG REASONS: DISSOCIATING UNDERSTANDING FROM CORRECTNESS IN LLM REASONING
- RSCE: Training-Free Residual Stream Encoding for Persistent Context Amortization
- Rubric as Reward: Decomposing Verification Signals for Logical Reasoning in GRPO
- Safe Context Switching for Agents in the Wild: Mitigating Subspace Interference via Orthogonal Adaptation
- SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
- Scaffolding the Strategist: Architecture-Dependent Reasoning Interventions in Hotelling Spatial Markets
- Scaling Reasoning Depth Reveals Three Tiers of Failure in Multi-Model Mathematical Deduction
- Selective Enforcement of Order-Invariant Causal Reasoning in Language Models
- SELF-AWARE MARKOV MODELS FOR DISCRETE REASONING
- Semantic Search over 9 Million Mathematical Theorems
- Single-Position Intervention Fails: Distributed Output Templates Drive In-Context Learning
- Small LLMs with Expert Blocks Are Good Enough for Hyperparamter Tuning
- Sparse Spectral Signatures of Reasoning: Model-Agnostic Verification via Sentence-Level Graph Signals
- Spectral Attention Steering for Prompt Highlighting
- Stabilizing Iterative Self-Training with Verified Reasoning via Symbolic Recursive Self-Alignment
- Stratum-Aware LLM Reasoning under Per-User Slot Constraints
- STRuCT-LLM: Unifying Tabular and Graph Reasoning with Reinforcement Learning for Semantic Parsing
- Structured Abductive-Deductive-Inductive Reasoning for LLMs via Algebraic Invariants
- Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers
- The AI Barrister Flight Simulator: A Neuro-Symbolic Benchmark for Structured Legal Reasoning
- The Capability Frontier: Benchmarks Miss 82% of Model Performance
- The Epistemic Cost of Preference Optimization
- The First Tokens Matter: Early Confidence Signals for Evaluating LLM Reasoning
- The Language Of Bargaining: Linguistic Effects In LLM Negotiations
- The Yes-Bias in LLM Reasoning
- Think Less, Code Better: Probing When Chain-of-Thought Hurts and How to Route Around It
- TopoBench: Benchmarking LLMs on hard topological reasoning
- VariantBench: Benchmarking Language Models on Scientific Reasoning Across the Pharmacogenomic Evidence Pipeline
- When “Just Read the Chain of Thought” Fails: Five Tasks for Stress-Testing CoT Monitors
- When Long Contexts Break Logic: Separating Evidence Use and Decision Bias in Instruction-Tuned LLMs
- Your Model Diversity, Not Method, Determines Reasoning Strategy