ICLR 2026 Workshop on Logical Reasoning of Large Language Models

ICLR 2026 Workshop LLM Reasoning

Submission deadline: Mar 21, 2026, 11:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (159)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Causal Legal Reasoning Method for Judicial Subjective Questions via Key Legal Fact Identification

    Jinze Sang, Jiawen Zhang · PDF
  2. Actor-Curator: Co-adaptive curricula via policy-improvement bandits for post-training

    Zhengyao Gu, Jonathan Light, Raul Astudillo, Ziyu Ye, Langzhou He, Wei Cheng, Santiago Paternain, Philip S. Yu, Yisong Yue · PDF
  3. Against Homogeneous Consensus: Why Scientific Discovery Requires Heterogeneous Adversarial LLM Agents

    Shuai Wang · PDF
  4. Agentic Proving for Program Verification

    Alessandro Sosso, Akhil Arora, Bas Spitters · PDF
  5. AGM-Bench: Do Large Language Models Revise Beliefs Rationally?

    Ben Jenkins · PDF
  6. AI-BAAM: AI-Driven Bank Statement Analytics as Alternative Data for Malaysian MSME Credit Scoring

    Chun Chet Ng, Zhen Hao Chu, Jia Yu Lim, Boon Yin Yin, Low Wei Zeng, Jin Khye Tan · PDF
  7. An Informal Logic LLM-Based Argumentation Framework

    Paulo Pirozelli, Douglas Aldred, Victor Hugo Nascimento Rocha, Fabio Cozman · PDF
  8. An Investigation of Robustness of LLMs in Mathematical Reasoning: Benchmarking with Mathematically-Equivalent Transformation of Advanced Mathematical Problems

    Yuren Hao, Xiang Wan, ChengXiang Zhai · PDF
  9. Are VLM Identity Judgments Logically Consistent? Evaluating Symmetry, Chain-of-Thought, and Transitivity in Person Re-Identification

    Alok Upadhyay · PDF
  10. AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency

    Max Henning Höth, Kristian Kersting, Björn Deiseroth, Letitia Parcalabescu · PDF
  11. AtomGraph: Reasoning Isn't Linear, Why Should Verification Be?

    Aryan Karmore · PDF
  12. Autoformalizing Biomedical Text into Verified Knowledge Graph Reasoning: A Neuro-Symbolic Architecture for Alzheimer's Disease

    David Scott Lewis, Enrique Zueco · PDF
  13. Automatically Generating Hard Math Problems from Hypothesis-Driven Error Analysis

    Jiayu Fu, Mourad Heddaya, Chenhao Tan · PDF
  14. AVSAD: Automating Vector Symbolic Architecture Discovery with Iterative Evolution

    Deja N Scott, Dmitry Zubarev, Massimiliano Esposito, Avraham Shinnar, Abbas Rahimi, Kenneth L. Clarkson, Lior Horesh, Michael Hersche, Shashanka Ubaru · PDF
  15. Benchmark for Assessing Olfactory Perception of Large Language Models

    Eftychia Makri, Nikolaos Nakis, Laura Sisson, Leandros Tassiulas, Vahid Satarifard, Nicholas A. Christakis · PDF
  16. Benchmarking Logical Reasoning Inconsistencies in Local Large Language Models: Evidence from Multi-Domain Evaluation

    Tadisetty Sai Yashwanth, Dhatri C · PDF
  17. Better Think Thrice: Learning to Reason Causally with Double Counterfactual Consistency

    Victoria Lin, Xinnuo Xu, Rachel Lawrence, Risa Ueno, Amit Sharma, Javier Gonzalez, Niranjani Prasad · PDF
  18. Beyond Clause Count: A Study of Proof-Relevant Difficulty in LLM SAT Reasoning

    Tao Jiang, Shaowei Cai · PDF
  19. Beyond Rationalization: Criteria and Guidelines for Algorithmic Reasoning Traces in LLM Logical Reasoning

    Karun Thankachan, Prateek Kohli · PDF
  20. Beyond Self-Refinement: Ensembling and Chaining for Neurosymbolic Reasoning

    Devesh Maheshwari, Surbhi Sharma · PDF
  21. Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order

    Prakhar Gupta, Vaibhav Gupta · PDF
  22. Causal Evidence of Stack Representations in Modeling Counter Languages Using Transformers

    Nishit Singh · PDF
  23. CausalSim: Counterfactual Implication Inversion as a Logical Consistency Stress Test for Large Language Models

    youla yang · PDF
  24. Certified Coherent Reasoning for LLMs via Weighted MaxSAT and Belief-Revision Geometry

    Murari Ambati · PDF
  25. CFLBENCH: BENCHMARKING NOVEL CONTROL FLOW LANGUAGE LEARNING

    Aaroosh Rustagi, Jounghyuck Sohn, Thomas Peng, Mykaala Firdaus, Huanzhi Mao · PDF
  26. Chain-of-Thought Injection as an Inference-Time Safety Intervention

    Lindsay M. Smith, Ananya Malik, Edward James Young, Puria Radmard, Cameron Tice, Hannes Whittingham · PDF
  27. ChaosBench-Logic v2: Evaluating LLM Logical Reasoning over Dynamical Systems at Scale

    Noel Thomas · PDF
  28. Characterizing Backtracking in CoT through Internal Probes and Surface-Level Features

    Adiba Ejaz, Aditya Gupta, Arthur Pogosian, Peter Hase · PDF
  29. Commitment-Aware Axiomatic Coherence: Measuring Non-Vacuous Consistency in LMM Logical Reasoning

    Md Muntaqim Meherab · PDF
  30. Confidence-Gated RAG for Adaptive Retrieval in Sequential Agents

    Srikanth Devarakonda, RAJESH LINGAM, Vagdevi Challa · PDF
  31. Confident RAG: Enhancing the Performance of LLMs for Mathematics Question Answering through Multi-Embedding and Confidence Scoring

    Shi-Ting Chen, Zijian Zhao, Jinsong Chen · PDF
  32. Configuration Perturbation Induces Logical Contradictions Across Related Queries

    Raghav Subramaniam · PDF
  33. Constrained Wikigame: Benchmarking Deductive Reasoning for Multi-Step Planning

    Rafael Mosquera-Gómez, Juan Felipe Rodriguez, Martin Diaz Velez, Ivan Alvarenga de Sousa Junior, Juan Jaramillo · PDF
  34. CONSTRAINING PROBABILITY WITH LOGIC: A SPECTRUM FROM STATISTICAL ALIGNMENT TO STRUCTURAL GUARANTEE

    Kun Yuan · PDF
  35. ContraPrompt: Contrastive Prompt Optimization via Dyadic Reasoning Trace Analysis

    Rishav Rishav, Pushpak Pujari, Pushpendre Rastogi · PDF
  36. Correct Chains, Wrong Answers: Dissociating Reasoning from Output in LLM Logic

    Abinav Rao, Sujan Rachuri, Nikhil Vemuri · PDF
  37. CROP: Token-Efficient Reasoning in Large Language Models via Regularized Prompt Optimization

    Deep Shah, Sanket Badhe, Nehal Kathrotia, Priyanka Tiwari · PDF
  38. Debugging code world models

    Babak Rahmani · PDF
  39. DECODING LOGICAL NEGATION IN LARGE LANGUAGE MODELS: FROM STATISTICAL HEURISTICS TO CAUSAL SEMANTIC CIRCUITS

    Umair Tariq, Brian Cong, Archish Prakhya, Tinuade Adeleke, Sean Wu, Ruizhe Li · PDF
  40. Decoupling Reasoning from Action: Architectural Impacts on Agentic Planning Consistency

    Himaneesh Sompalle · PDF
  41. DEDUCTIVE CONSTRAINT SATISFACTION VS. PREVALENCE PRIORS: BENCHMARKING LLM LOGIC IN CLINICAL DIAGNOSTICS

    Dharini Raghavan · PDF
  42. DeltaLogic: Minimal Premise Edits Reveal Belief-Revision Failures in Logical Reasoning Models

    Amit Dhanda · PDF
  43. Detecting Scaling Factors Beyond the Model: A Reporting Framework for AI Agent Systems

    Kenta Kitamura · PDF
  44. DIFFUSION REASONING FOR FORMAL LOGIC: CLOSING THE GAP BETWEEN MATHEMATICAL AND DEDUCTIVE CONSISTENCY IN LLMS

    Ritika Lamba · PDF
  45. Distilling SMT Solver Reasoning into Compact Language Models

    Emre Kıyak, Cagatay Cingoz, Hakan Çapuk, Aykut Erdem · PDF
  46. Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis

    Ferdinand Kapl, Emmanouil Angelis, Tobias Höppe, Kaitlin Maile, Johannes von Oswald, Nino Scherrer, Stefan Bauer · PDF
  47. Do LLM Recommenders Obey Preference Axioms? Testing Logical Rationality in LLM-Based Recommendation

    Alok Upadhyay · PDF
  48. Do Transformers Use Their Depth Adaptively? Evidence from a Relational Reasoning Task

    Alicia Curth, Rachel Lawrence, Sushrut Karmalkar, Niranjani Prasad · PDF
  49. Embedding Distance as a Reward Signal can replace Verifiers for LLM Reasoning

    Abdelhakim Benechehab, Youssef Attia El Hili, Albert Thomas, Giuseppe Paolo, Maurizio Filippone · PDF
  50. Emergent Reasoning via Recursive Latent Reinforcement Pretraining

    Gopeshh Subbaraj, Istabrak Abbes, Artem Zholus, Matthew Riemer, Irina Rish, Sarath Chandar · PDF
  51. Enforcing Logical Invariance in Large Language Models via Symmetry Pair Training

    Prasanth · PDF
  52. Enhancing Code LLMs with Reinforcement Learning in Code Generation: A Survey

    Junqiao Wang, Zeng Zhang, Yangfan He, Zihao Zhang, Xinyuan Song, Yuyang Song, TIANYU SHI, Yuchen Li, Hengyuan Xu, Kunyu Wu, Yi Xin, Zhongwei Wan, Xinhang Yuan, Zijun Wang, Kuan Lu, Menghao Huo, Jingqun Tang, Guangwu Qian, Keqin Li, Qiuwu Chen, Lewei He · PDF
  53. Enhancing LLMs in Legal Judgment Prediction via Neuro-Symbolic Reasoning

    Zhaozuo Liu, Zhengnan Li, Fengxiang Cheng, Fenrong Liu · PDF
  54. Entailment Closure Failures in Large Language Models: A Benchmark for Cross-Query Logical Consistency

    Ben Jenkins · PDF
  55. Entropy Jurisprudence: Auditing Procedural Fidelity in LLM Normative Reasoning

    CHEN XIWEI · PDF
  56. ERA-GAC for Stable Structured Reasoning with Attention Priors and Gain-Aware Entropy Control

    Rian Atri · PDF
  57. EsoLang-Bench: Evaluating Genuine Reasoning in Large Language Models via Esoteric Programming Languages

    Aman Sharma, Paras Chopra · PDF
  58. Evaluation of Multi-Turn Consistency in LLM Agents: Survival Analysis and Failure-Rationale Taxonomy

    Igor Bogdanov, Olga Manakina, Chung-Horng Lung · PDF
  59. Finny: A Multi-Agent System for Structured Decision-Making with LLMs

    Harshitha Ravindra, Utkarsh Bajaj, Madhur Mehta · PDF
  60. From Facts to Conclusions: Integrating Deductive Reasoning in Retrieval-Augmented LLMs

    Shubham Mishra, Shiv Tiwari, Samyek Jain, Gorang Mehrishi, Dhruv Kumar, Pratik Narang, Harsh Sharma · PDF
  61. From Growing to Looping: A Unified View of Iterative Computation in LLMs

    Ferdinand Kapl, Emmanouil Angelis, Kaitlin Maile, Johannes von Oswald, Stefan Bauer · PDF
  62. From Natural Language to Exact Cover: A Neuro-Symbolic Approach to Zebra Puzzles

    Paulius Skaisgiris, Thomas Pammer, Veronika Semmelrock, Mykyta Ielanskyi, Maximilian Heisinger, Erich Kobler · PDF
  63. Fully Asynchronous Federated Learning with Faster Convergence for LLM Reasoning

    Jingyuan Zheng, Siyu Li · PDF
  64. GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers

    Shufan Jiang, Chios Chen, Zhiyang Chen · PDF
  65. GIFT: Guided Importance-Aware Fine-Tuning for Diffusion Language Models

    Guowei Xu, Wenxin Xu, Zhao Jiawang, Kaisheng Ma · PDF
  66. Governed Self-Improvement for Logical Reasoning: Edit-Time Governance for Developmental Consistency

    David Scott Lewis, Enrique Zueco · PDF
  67. Grounding the "Not": Symbolic Representation of Negation for Logical Reasoning in VLMs

    Inha Kang, Seonho Lee, Jiho Choi, Junsuk Choe, Hyunjung Shim · PDF
  68. GRPO-VPS: Enhancing Group Relative Policy Optimization with Verifiable Process Supervision for Effective Reasoning

    Jingyi Wang, Lei Zhu, Tengjin Weng, Song-Li Wu, Haochen Tan, Jierun Chen, Chaofan Tao, Haoli Bai, Lu Hou, Lifeng Shang, Xiao-Ping Zhang · PDF
  69. HALLUCINATION AS MISCLASSIFICATION: A COMPOSITE ABSTENTION ARCHITECTURE FOR LANGUAGE MODEL OUTPUT CONTROL

    Angelina Davini · PDF
  70. How Clued up are LLMs? Evaluating Multi-Step Deductive Reasoning in a Text-Based Game Environment

    Rebecca Ansell, Autumn Toney · PDF
  71. Improving Reachability on Reasoning Puzzles

    Sukruta Prakash Midigeshi, Sai Soumya Nalli, Utkarsh Tiwari, Amit Deshpande, Nagarajan Natarajan, Vineeth N. Balasubramanian, Amit Sharma, Gaurav Sinha · PDF
  72. Interpreting Chain-of-thought Reasoning via Partial Information Decomposition

    Barproda Halder, Qiuyi Zhang, Sanghamitra Dutta · PDF
  73. Interventional Grounding Audits: Black-Box Premise-Dependency Tests for LLM Chain-of-Thought via Predicate Substitution

    Hironao Nakamura · PDF
  74. interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors

    Vishak K Bhat, Prateek Chanda, Ashmit Khandelwal, Maitreyi Swaroop, Subbarao Kambhampati, Vineeth N. Balasubramanian, Nagarajan Natarajan, Amit Sharma · PDF
  75. INVESTIGATING EQUATION-ONLY REASONING IN LARGE LANGUAGE MODELS

    Jonathan Chung · PDF
  76. KV Cache as a Reasoning Primitive for Long Context Reasoning

    Rian Atri · PDF
  77. LaPep: Can Language Contribute to Property-Guided Peptide Design?

    Kimberly Liang, Tong Chen, Pranam Chatterjee · PDF
  78. Large Language Models Generate Harmful Content Using a Unified Mechanism

    Hadas Orgad, Boyi Wei, Kaden Zheng, Martin Wattenberg, Peter Henderson, Seraphina Goldfarb-Tarrant, Yonatan Belinkov · PDF
  79. Latent-Implicit Thinking with Proof-Carrying Neuro-Symbolic Outputs for Biomedical Discovery

    David Scott Lewis, Enrique Zueco · PDF
  80. Learning Reasoning Reward Models from Expert Demonstration via Inverse Reinforcement Learning

    Claudio Fanconi, Nicolás Astorga, Mihaela van der Schaar · PDF
  81. Linear Mechanisms of Spatiotemporal Reasoning in Vision Language Models

    Raphi Kang, Hongqiao Chen, Georgia Gkioxari, Pietro Perona · PDF
  82. LLATAS: Large LAnguage models as Tabular Auxiliary feature Synthesizer

    Yuzhen Mao, Martin Ester · PDF
  83. LLM Routing as Reasoning: A MaxSAT View

    Son Nguyen, Xinyuan Liu, Ransalu Senanayake · PDF
  84. LLM-as-a-Prophet: Understanding AI's Predictive Intelligence with Prophet Arena

    Qingchuan Yang, Simon Mahns, Sida Li, Anri Gu, Jibang Wu, Haifeng Xu · PDF
  85. LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers

    Nikhil Abhyankar, Parshin Shojaee, Chandan K. Reddy · PDF
  86. LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking

    Lukas Helff, Quentin Delfosse, David Steinmann, Ruben Härle, Hikaru Shindo, Patrick Schramowski, Wolfgang Stammer, Kristian Kersting, Felix Friedrich · PDF
  87. Logic-Verified GRPO: Graded Z3 Process Rewards for Logical Reasoning in Small LLMs

    Ishaan Gangwani, Aayam Bansal · PDF
  88. Logical Consistency Under Pressure: Probing and Repairing Cross-Query Contradictions in LLMs

    Aayam Bansal, Ishaan Gangwani · PDF
  89. Logical Reasoning Evaluation and Social Bias

    Sofia Martinelli, Guido Ivetta, Luciana Benotti · PDF
  90. LogicReward: Incentivizing LLM Reasoning via Step-Wise Logical Supervision

    Jundong Xu, Hao Fei, Huichi Zhou, Xin Quan, Qijun Huang, Shengqiong Wu, William Yang Wang, Mong-Li Lee, Wynne Hsu · PDF
  91. LogicVault: Persistent Symbolic Belief States for Cross-Query Logical Consistency in LLMs

    Sarim Chaudhry · PDF
  92. M3Kang: Evaluating Multilingual Multimodal Mathematical Reasoning in Vision-Language Models

    Aleix Torres-Camps, Nathaniel Mitrani Hadida, Victor Conchello Vendrell, Àlex Batlle Casellas, Arnau Padrés Masdemont, Jordi Ros-Giralt · PDF
  93. Mamba-SSM with LLM Reasoning for Feature Selection: Faithfulness-Aware Biomarker Discovery

    Pushpa Kumar Balan, Aijing Feng · PDF
  94. Mathematical Reasoning for Unmanned Aerial Vehicles: A RAG-Based Approach for Complex Arithmetic Reasoning

    Mehdi Azarafza, Mojtaba Nayyeri, Faezeh Pasandideh, Steffen Staab, Achim Rettberg · PDF
  95. MODALBENCH: EVALUATING MODAL AND DEONTIC LOGIC REASONING IN LARGE LANGUAGE MODELS

    mujtaba hasan · PDF
  96. MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL

    Zekun Xu, Siyu Xia, Chuhuai Yue, Jiajun Chai, Mingxue Tian, Xiaohan Wang, Wei Lin, Haoxuan Li, Guojun Yin · PDF
  97. MUX: Continuous Reasoning via Multiplexed Tokens

    Ayhan Suleymanzade, Halil Alperen Gozeten, Ismail Ilkan Ceylan, Jinwoo Kim · PDF
  98. Neuro-Symbolic Active Causal Hypothesis Testing for NAD+-Centered Alzheimer's Disease Reversal

    David Scott Lewis, Enrique Zueco · PDF
  99. Neuro-Symbolic Rule Discovery: Empowering LLMs with Causality for Vehicle Diagnostics

    Hugo Math, Julian Lorenz, Rainer Lienhart · PDF
  100. OmniEAR: Benchmarking Agent Reasoning in Embodied Tasks

    Zixuan Wang, Dingming Li, Hongxing Li, Yuchen Yan, Shuo Chen, Zhipiao Liu, Hongwei Yang, XIE GUOQING, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang · PDF
  101. On the "Induction Bias" in Sequence Models

    Reza Ebrahimi, Michaël Defferrard, Sunny Panchal, Roland Memisevic · PDF
  102. Out-of-Distribution Study of Rule-Based and Strategic Reasoning in Chess Transformers

    Anna Mészáros, Patrik Reizinger, Ferenc Huszár · PDF
  103. PAVE: Premise-Aware Validation and Editing for Retrieval-Augmented LLMs

    Tianyi Huang, Caden Yang, Emily Yin, Eric Wang, Michael Zhang · PDF
  104. PeerCoT: Structured Multi-Agent Chain-of-Thought Collaboration for Error Localization in LLM Reasoning

    Isha Chaturvedi, Rhys Llewellyn-Jones, Sage Rain Schaffer · PDF
  105. Position: Beyond Reasoning Zombies — AI Reasoning Requires Process Validity

    Rachel Lawrence, Jacqueline R. M. A. Maasch · PDF
  106. Position: Logical Soundness is not a Reliable Criterion for Neurosymbolic Fact-Checking with LLMs

    Jason Chan, Robert J. Gaizauskas, Zhixue Zhao · PDF
  107. POSITION: THE REASONING TRAP — LOGICAL REASONING AS A MECHANISTIC PATHWAY TO SITUATIONAL AWARENESS

    Subramanyam Sahoo, Aman Chadha, Vinija Jain, Divya Chaudhary · PDF
  108. Premises Reordering in Forward Chaining Improves LLM Symbolic Reasoning

    Xin Zhang · PDF
  109. PRISM: Prompt-Refined In-Context System Modeling for Financial Retrieval

    Chun Chet Ng, Jia Yu Lim, Low Wei Zeng · PDF
  110. ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward

    Jingpei Wu, Xiao Han, Weixiang Shen, Boer Zhang, Zifeng Ding, Volker Tresp · PDF
  111. Prompt-Level Distillation: A Non-Parametric Alternative to Model Fine-Tuning for Efficient Reasoning

    Sanket Badhe, Deep Shah · PDF
  112. Pruning via Causal Attribution Preserves Reasoning in Large Language Models

    Amogh Sheth, Andrew Lin, Yi Wen Huang, Biruk Assefa, Yuhao Ge · PDF
  113. Quantifying Consistency in LLM Logical Reasoning via Structural Uncertainty

    Baishali Chaudhury, Mengdie Flora Wang, Hyunji Hayley Park, Rahul Ghosh, Sungmin Hong, Jae Oh Woo · PDF
  114. Quantifying Cross-Query Contradictions in Multi-Query LLM Reasoning

    Rohit Kumar Salla, Ramya Manasa Amancherla, Manoj Saravanan · PDF
  115. R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning

    Yongchao Chen, Yueying Liu, Junwei Zhou, Yilun Hao, Jingquan Wang, Yang Zhang, Na Li, Chuchu Fan · PDF
  116. RAG over Tables: Hierarchical Memory Index, Multi-Stage Retrieval, and Benchmarking

    Jiaru Zou, Dongqi Fu, Sirui Chen, Xinrui He, Zihao Li, Yada Zhu, Jiawei Han, Jingrui He · PDF
  117. Reasoning or Rhetoric? An Empirical Analysis of Moral Reasoning Explanations in Large Language Models

    Aryan Kasat, Smriti Singh, Vinija Jain, Aman Chadha · PDF
  118. Reasoning Structure of Large Language Models

    Frédéric Berdoz, Luca A Lanzendörfer, Fabian Farestam, Roger Wattenhofer · PDF
  119. Reasoning Under Constraint: How Batch Prompting Suppresses Overthinking in Reasoning Models

    Saurabh Srivastava, Janit Bidhan, Hao Yan, Abhishek Dey, Tanu Kansal, Paras Kath, Sina Mansouri, Mohit Marvania, Vamsi Shankar Simhadri, Gaurav Singh · PDF
  120. RecRoll: Adaptive Depth First Search in Autoregressive Predictive Space

    Mykyta Ielanskyi, Sepp Hochreiter · PDF
  121. Recurrent Reasoning on Symbolic Puzzles with Sequence Models

    Gowrav Mannem, Chowdhury Marzia Mahjabin, Jason Chen, Shivank Garg, Kevin Zhu · PDF
  122. Residual Drift Dominates Contradiction in Multi-Turn Constraint Reasoning

    Sebastien Kawada · PDF
  123. ResistIA: Reasoning-Guided Agentic Evaluation of Synthetic Metal-Resistance Genes from Conditional Genomic Foundation Models

    José Vásquez-Bastías, Juan Stockle · PDF
  124. Rethinking LLM Judges: Chain-of-Thought and Multi-Step Pipelines for Math Grading

    Eric Chen, Aryan Gulati, Brando Miranda, Zeyu Tang, Sanmi Koyejo · PDF
  125. Rethinking LLMs as Verifiers: When Verification is Harder Than Solving

    Varul Srivastava, Sankarshan Damle, Manisha Padala · PDF
  126. Revisiting Causal Reasoning in Language Models through Controlled Synthetic Worlds

    Abhirath Sangala, Vineeth N. Balasubramanian, Amit Sharma · PDF
  127. RHIM: Benchmarking Redundant Hypothesis Identification Reveals Systematic Gaps in LLM Logical Reasoning

    Hai Dinh, Minh-Tuan Luong, Kha Pham · PDF
  128. Riemann-Bench: A Benchmark for Moonshot Mathematics

    Sushant Mehta · PDF
  129. RIGHT ANSWERS, WRONG REASONS: DISSOCIATING UNDERSTANDING FROM CORRECTNESS IN LLM REASONING

    Vimanyu Taneja, Soumya Banerjee · PDF
  130. RSCE: Training-Free Residual Stream Encoding for Persistent Context Amortization

    Adam Kamel, Eric Xu · PDF
  131. Rubric as Reward: Decomposing Verification Signals for Logical Reasoning in GRPO

    Ishaan Gangwani, Aayam Bansal · PDF
  132. Safe Context Switching for Agents in the Wild: Mitigating Subspace Interference via Orthogonal Adaptation

    Akash Das · PDF
  133. SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

    Yanxiao Zhao, Yaqian Li, Zi-Hao Bo, Rinyoichi Takezoe, Haojia Hui, Mo Guang, Lei Ren, Xiaolin Qin, Kaiwen Long · PDF
  134. Scaffolding the Strategist: Architecture-Dependent Reasoning Interventions in Hotelling Spatial Markets

    Pratyush Singh · PDF
  135. Scaling Reasoning Depth Reveals Three Tiers of Failure in Multi-Model Mathematical Deduction

    Harsh Rathwa · PDF
  136. Selective Enforcement of Order-Invariant Causal Reasoning in Language Models

    Devon Copley · PDF
  137. SELF-AWARE MARKOV MODELS FOR DISCRETE REASONING

    Gregor Kornhardt, Jannis Chemseddine, Christian Wald, Gabriele Steidl · PDF
  138. Semantic Search over 9 Million Mathematical Theorems

    Luke Alexander, Eric Leonen, Sophie Szeto, Artemii Remizov, Ignacio Tejeda, Giovanni Inchiostro, Vasily Ilin, Jarod Alper · PDF
  139. Single-Position Intervention Fails: Distributed Output Templates Drive In-Context Learning

    Bryan Cheng, Jasper Zhang · PDF
  140. Small LLMs with Expert Blocks Are Good Enough for Hyperparamter Tuning

    Om Naphade, Saksham Bansal, Parikshit Pareek · PDF
  141. Sparse Spectral Signatures of Reasoning: Model-Agnostic Verification via Sentence-Level Graph Signals

    Arjun Balaji · PDF
  142. Spectral Attention Steering for Prompt Highlighting

    Weixian Waylon Li, Yuchen Niu, Yongxin Yang, Keshuang Li, Tiejun Ma, Shay B Cohen · PDF
  143. Stabilizing Iterative Self-Training with Verified Reasoning via Symbolic Recursive Self-Alignment

    Xinyu Zhang · PDF
  144. Stratum-Aware LLM Reasoning under Per-User Slot Constraints

    Shijin Zhang, Tianyu Xia · PDF
  145. STRuCT-LLM: Unifying Tabular and Graph Reasoning with Reinforcement Learning for Semantic Parsing

    Josefa Lia Stoisser, Marc Boubnovski Martell, Lawrence Phillips, Casper Hansen, Julien Fauqueur · PDF
  146. Structured Abductive-Deductive-Inductive Reasoning for LLMs via Algebraic Invariants

    Sankalp Gilda, Shlok Gilda · PDF
  147. Surrogate Signals from Format and Length: Reinforcement Learning for Solving Mathematical Problems without Ground Truth Answers

    Rihui Xin, Han Liu, Zecheng Wang, Yupeng Zhang, Dianbo Sui, Xiaolin Hu, Bingning Wang · PDF
  148. The AI Barrister Flight Simulator: A Neuro-Symbolic Benchmark for Structured Legal Reasoning

    David Scott Lewis, Enrique Zueco, Haley Yi · PDF
  149. The Capability Frontier: Benchmarks Miss 82% of Model Performance

    Bradley Fowler, Ryan Smith, Daniel Thi Graviet, William Myers, Joshua Greaves, Narmeen Fatimah Oozeer, Antía García, Philip Quirke, Fazl Barez, Shriyash Kaustubh Upadhyay · PDF
  150. The Epistemic Cost of Preference Optimization

    Rian Atri · PDF
  151. The First Tokens Matter: Early Confidence Signals for Evaluating LLM Reasoning

    Ali Keramati, Justin Cheok, Jacob Horne, Mark Warschauer · PDF
  152. The Language Of Bargaining: Linguistic Effects In LLM Negotiations

    Stuti Sinha, Himanshu Kumar, Aryan Raju Mandapati, Rakshit Sakhuja, Dhruv Kumar · PDF
  153. The Yes-Bias in LLM Reasoning

    Mark Obozov, Egor Salygin, Peter Losev, Artem Alekseev, Nikolay Bushkov, Stanislav Moiseev · PDF
  154. Think Less, Code Better: Probing When Chain-of-Thought Hurts and How to Route Around It

    Rajarshi Ghoshal, Salma Emad Mahmoud Abdelhalim, Debadri Basak, Pratibha kaur arora · PDF
  155. TopoBench: Benchmarking LLMs on hard topological reasoning

    Mayug Maniparambil, Nils Hoehing, Janak Kapuriya, Arjun Karuvally, Ellen Rushe, Anthony Ventresque, Noel O'Connor, Fergal Reid · PDF
  156. VariantBench: Benchmarking Language Models on Scientific Reasoning Across the Pharmacogenomic Evidence Pipeline

    Shlok Natarajan, Andrew Lanpouthakoun, Etash Kumar Guha, Aaron Fanous, Roxana Daneshjou · PDF
  157. When “Just Read the Chain of Thought” Fails: Five Tasks for Stress-Testing CoT Monitors

    Daria Ivanova, Riya Tyagi, Joshua Engels, Neel Nanda · PDF
  158. When Long Contexts Break Logic: Separating Evidence Use and Decision Bias in Instruction-Tuned LLMs

    Pravish Sainath · PDF
  159. Your Model Diversity, Not Method, Determines Reasoning Strategy

    Moulik Choraria, Argyrios Gerogiannis, Anirban Das, Supriyo Chakraborty, Berkcan Kapusuzoglu, Chia-Hsuan Lee, Kartik Balasubramaniam, Shi-Xiong Zhang, Sambit Sahu · PDF