NeurIPS 2025 · Past · Math & reasoning · Large language models

The 5th Workshop on Mathematical Reasoning and AI

Name: The 5th Workshop on Mathematical Reasoning and AI (MATH-AI)
Start: Dec 6, 2025

MATH-AI


Unverified seed entry. Some fields are estimates — confirm everything on the official website before planning a submission.

Submission deadline: Aug 23, 2025, 11:59 UTC
SEED estimate of the historical deadline — verify
Workshop day: Dec 6, 2025
Submission portal: OpenReview
Notes: SEED DATA — name/website/date taken from the OpenReview venue record; verify remaining fields.

Previous editions

2024 — website ↗ papers ↗

Accepted papers (150)

Fetched from OpenReview (v2) on 2026-06-10.

Gambit: Generating Automated Mathematical Bounds, Inequalities, and Theorems
Randy Davila · PDF
A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models
Shubhra Mishra, Yuka Machino, Gabriel Poesia, Albert Q. Jiang, Joy Hsu, Adrian Weller, Challenger Mishra, David Broman, Joshua B. Tenenbaum, Mateja Jamnik, Cedegao E. Zhang, Katherine M. Collins · PDF
A NUMA Aware Compiler Framework for Large Scale Mathematical Reasoning Inference on PCIe Based Multi Accelerator Systems
JooHyoung Cha, Yongin Kwon · PDF
A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture
Roussel Rahman, Jeff Shrager · PDF
A Toolbox, Not a Hammer -- Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation
Bohan Yao, Vikas Yadav · PDF
Adaptive Control for Test-time Scaling
Taneesh Gupta, Rahul Madhavan, Rishabh Tiwari, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan, Kurt Keutzer · PDF
Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning
Rui Jerry Huang, Anastasia Miin, Wendy Yaqiao Liu, Lei Ding · PDF
AI Impact on Human Proof Formalization Workflows
Katherine M. Collins, Simon Frieder, Jonas Bayer, Jacob Loader, Jeck Lim, Peiyang Song, Fabian Zaiser, Lexin Zhou, Shanda Li, Shi-Zhuo Looi, Jose Hernandez-Orallo, Joshua B. Tenenbaum, Cameron Freer, Umang Bhatt, Adrian Weller, Valerie Chen, Ilia Sucholutsky · PDF
AI-Driven Mathematical Discovery for the Andrews–Curtis Conjecture
Caroline Zhang, Aaron Zhou, Robert Joseph George, Sergei Gukov, Anima Anandkumar · PDF
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang · PDF
Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization
Nathan Egbuna, Saatvik Gaur, Kevin Zhu, Sunishchal Dev, Ashwinee Panda, Maheep Chaudhary · PDF
Analytical Lyapunov Function Discovery: An RL-based Generative Approach
Haohan Zou, Jie Feng, Hao Zhao, Yuanyuan Shi · PDF
AntiderivBench: Evaluating language models on indefinite integration
Bartosz Piotrowski, Kaiyu Yang · PDF
ARM: Discovering Agentic Reasoning Modules for Mathematical Problem-Solving
Bohan Yao, Shiva Krishna Reddy Malay, Vikas Yadav · PDF
Aryabhata: An exam-focused language model for JEE Math
Ritvik Rastogi, Sachin Dharashivkar, Sandeep Varma Penmetsa · PDF
Automated Discovery of Conservation Laws via Hybrid Neural ODE-Transformers
Vivan Doshi · PDF
Axiom-Aware FunSearch for Non-Constructive Mathematics
Massimiliano Esposito, Besart Shyti · PDF
Babel-formal: Translation of Proofs between Lean and Rocq
Théo Stoskopf, Cyril Cohen, Nicolas Tabareau · PDF
Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning
Sara Rajaee, Rochelle Choenni, Ekaterina Shutova, Christof Monz · PDF
Beyond Accuracy: Evaluating Multimodal Mathematical and Scientific Reasoning Through Error Analysis and Self-Correction
Arka Mukherjee, Shreya Ghosh · PDF
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
Chenlu Ye, Zhou Yu, Ziji Zhang, Hao Chen, Narayanan Sadagopan, Jing Huang, Tong Zhang, Anurag Beniwal · PDF
Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer
Jinghan Zhang, Fengran Mo, Tharindu Cyril Weerasooriya, Kunpeng Liu · PDF
Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves
Zihao Wan, Pau Tong Lin Xu, Fuwen Luo, Ziyue Wang, Peng Li, Yang Liu · PDF
BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs
Ivo Petrov, Jasper Dekoninck, Martin Vechev · PDF
Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework
Yuan Xia, Akanksha Atrey, Fadoua Khmaissia, Kedar Namjoshi · PDF
CauSciBench: Assessing LLM Causal Reasoning for Scientific Research
Sawal Acharya, Terry Jingchen Zhang, Andrew Kim, Sun Xianlin, Anahita Haghighat, Maximilian Mordig, Rahul Babu Shrestha, Clijo Jose, Yahang Qi, Pepijn Cobben, Bernhard Schölkopf, Mrinmaya Sachan, Zhijing Jin · PDF
CayleyPy Growth: Efficient growth computations and hundreds of new conjectures on Cayley graphs
Alexander Chervov, Dmytro Fedoriaka, Mark Obozov, Elena V. Konstantinova, Anton Naumov, Igor Kiselev, Anastasia Sheveleva, Ivan Koltsov, Sergei Lytkin, Andrei Smolensky, Alexander Soibelman, Fedor Levkovich-Maslyuk, Ruslan Grimov, Dmitry Volovich, Artem Isakov, Anton Kostin, Michael Litvinov, Nick Vilkin-Krom, Alim Bidzhiev, Artem Krasnyi, Mikhail Evseev, Elizaveta Geraseva, Liliya Grunwald, Sergey Galkin, Eduard Koldunov, Stanislav Diner, Artem Chevychelov, Evelina Kudasheva, Arsenii Sychev, Zakhar Kogan, Altana Natyrova, Lidia Shishina, Lyudmila Cheldieva, Vladislav Zamkovoy, Dmitrii Kovalenko, Oleg Papulov, Kudashev Sergey, Dmitry Shiltsov, Rustem Turtayev, Olga Nikitina, Dariya Mamayeva, Nikolenko Sergei, Anton Titarenko, Antonina Dolgorukova, Alexey N. Aparnev, Orianne Debeaupuis, Simo Alami Chehboune, Herve Isambert · PDF
CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models
Runpeng Dai, Linfeng Song, Haolin Liu, Zhenwen Liang, Dian Yu, Haitao Mi, Zhaopeng Tu, Rui Liu, Tong Zheng, Hongtu Zhu, Dong Yu · PDF
CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process
Arman Akbari, Jian Gao, Yifei Zou, Mei Yang, Jinru Duan, Dmitrii Torbunov, Yanzhi Wang, Yihui Ren, Xuan Zhang · PDF
Climbing the Ladder of Reasoning: What LLMs Can—and Still Can’t—Solve after SFT?
Yiyou Sun, Georgia Zhou, Haoyue Bai, Hao Wang, Dacheng Li, Nouha Dziri, Dawn Song · PDF
Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models
Zizhuo Zhang, Jianing Zhu, Xinmu Ge, Zihua Zhao, Zhanke Zhou, Xuan Li, Xiao Feng, Jiangchao Yao, Bo Han · PDF
CoDaPO: Confidence and Difficulty-Adaptive Policy Optimization for Language Models
Zhanke Zhou, Xiangyu Lu, Chentao Cao, Brando Miranda, Tongliang Liu, Bo Han, Sanmi Koyejo · PDF
CombiGraph-Vis: A Curated Multimodal Olympiad Benchmark for Discrete Mathematical Reasoning
Hamed Mahdavi, Pouria Mahdavinia, Alireza Farhadi, Pegah Mohammadipour, Samira Malek, Majid Daliri, Pedram Mohammadipour, Alireza Hashemi, Amir Khasahmadi, Vasant G. Honavar · PDF
Combining Textual and Structural Information for Premise Selection in Lean
Job Petrovčič, David E. Narváez, Ljupco Todorovski · PDF
Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision
Dulhan Jayalath, Shashwat Goel, Thomas Foster, Parag Jain, Suchin Gururangan, Cheng Zhang, Anirudh Goyal, Alan Schelten · PDF
Concept Generalization in Humans and Large Language Models: Insights from the Number Game
Arghavan Bazigaran, Hansem Sohn · PDF
Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors
xuying li · PDF
Credit Cards, Confusion, Computation, and Consequences: How Well Do LLMs Reason About Financial Literacy?
Arnav Hiray, Agam Shah, Caleb Lu, Meghaj Tarte, Sreya Tummalapelly, Harsit Mittal, Sudheer Chava · PDF
Curiosity-driven RL for symbolic equation solving
Kevin O'Keeffe · PDF
DAG-Math: Graph-Guided Mathematical Reasoning in LLMs
Yuanhe Zhang, Ilja Kuzborskij, Jason D. Lee, Chenlei Leng, Fanghui Liu · PDF
Decompose, Adapt, and Evolve: Towards Efficient Scientific Equation Discovery with Large Language Models
Pouya Behzadifar, Parshin Shojaee, Sanchit Kabra, Kazem Meidani, Chandan K. Reddy · PDF
Decoupling Reasoning from Proving: A New Framework for Tackling Olympiad-Level Mathematics
Zhenwen Liang, Linfeng Song, Yang Li, TAO YANG, feng zhang, Haitao Mi, Dong Yu · PDF
DELTA: How Does RL Unlock and Transfer New Algorithms in LLMs?
Yiyou Sun, Yuhan Cao, Pohao Huang, Haoyue Bai, Hannaneh Hajishirzi, Nouha Dziri, Dawn Song · PDF
DiagramIR: An Automatic Pipeline for Educational Math Diagram Evaluation
Vishal Kumar, Shubhra Mishra, Rebecca Hao, Rizwaan Malik, David Broman, Dorottya Demszky · PDF
DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning
Qi Cao, Ruiyi Wang, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie · PDF
EchoRL: Learning to Plan through Experience for Efficient Reinforcement Learning
Dong Liu, Yanxuan Yu, Ying Nian Wu · PDF
Evaluating Spatial Reasoning in Language Models
Aarush Gupta · PDF
Exact Learning of Arithmetic with Differentiable Agents
Hristo Papazov, Francesco D'Angelo, Nicolas Flammarion · PDF
Expanding the Action Space of LLMs to Reason Beyond Language
Zhongqi Yue, Weishi Wang, Yundaichuan Zhan, Juncheng Li, Daniel Dahlmeier, Fredrik D. Johansson · PDF
Faults in our Formal Benchmarks
Pawan Sasanka Ammanamanchi, Siddharth Bhat · PDF
FoCus: Improving Faithfulness in Chain-of-Thoughts by Training on Structured Reasoning Data
Guan-Yi Lin, Chung-En Sun, Tsui-Wei Weng · PDF
FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory
Xiao-Wen Yang, Zihao Zhang, Jianuo Cao, Zhi Zhou, Zenan Li, Lan-Zhe Guo, Yuan Yao, Taolue Chen, Yu-Feng Li, Xiaoxing Ma · PDF
FractalBench: Diagnosing Visual-Mathematical Reasoning Through Recursive Program Synthesis
Jan Ondras, Marek Suppa · PDF
Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute
Sheng Liu, Tianlang Chen, Pan Lu, Haotian Ye, Yizheng Chen, Lei Xing, James Zou · PDF
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization
Haonian Ji, Shi Qiu, Siyang Xin, Siwei Han, Zhaorun Chen, Dake Zhang, Hongyi Wang, Huaxiu Yao · PDF
HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization
Hongzheng Chen, Yingheng Wang, Yaohui Cai, Hins Hu, Jiajie Li, Shirley Huang, Chenhui Deng, Rongjian Liang, Shufeng Kong, Haoxing Ren, Samitha Samaranayake, Carla P Gomes, Zhiru Zhang · PDF
Hilbert: Recursively Building Formal Proofs with Informal Reasoning
Sumanth Varambally, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu, Ke Ye · PDF
How does RL induce skill composition? A Case Study using Countdown
Simon Park, Simran Kaur, Sanjeev Arora · PDF
HYBRIDMIND: Meta Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning
Simeng Han, Tianyu Liu, Chuhan Li, Xuyuan Xiong, Arman Cohan · PDF
I-RAVEN-X: Benchmarking Generalization and Robustness of Analogical and Mathematical Reasoning in Large Language and Reasoning Models
Giacomo Camposampiero, Michael Hersche, Roger Wattenhofer, Abu Sebastian, Abbas Rahimi · PDF
IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation
Johannes Schmitt, Gergely Berczi, Jasper Dekoninck, Jeremy Feusi, Tim Gehrunger, Raphael Appenzeller, Jim Bryan, Niklas Canova, Timo de Wolff, Filippo Gaia, Michel van Garrel, Baran Hashemi, David Holmes, Aitor Iribar Lopez, Victor Jaeck, Martina Jørgensen, Steven Kelk, Stefan Kuhlmann, Adam Kurpisz, Chiara Meroni, Ingmar Metzler, Samuel Muñoz-Echániz, Robert Nowak, Georg Oberdieck, Daniel Platt, Dylan Possamaï, Gabriel Ribeiro, Raúl Sánchez Galán, Zheming Sun, Josef Teichmann, Richard P Thomas, Charles Vial · PDF
Improving autoformalization via cycle consistency and incremental type-checking using language-model probabilistic programs
Mauricio Barba da Costa, Fabian Zaiser, Katherine M. Collins, Romir Patel, Timothy J. O'Donnell, Alexander K. Lew, Joshua B. Tenenbaum, Vikash Mansinghka, Cameron Freer · PDF
Improving ML attacks on LWE with data repetition and stepwise regression
Alberto Alfarano, Eshika Saxena, Emily Wenger, Francois Charton, Kristin E. Lauter · PDF
In Good GRACES: Principled Teacher Selection for Knowledge Distillation
Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Sham M. Kakade, Surbhi Goel · PDF
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use
Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, Pan Lu · PDF
Infinite-Dimensional HiPPO Provides an Explicit Formula for LSSLs
Atsushi Takabatake, Takaharu Yaguchi · PDF
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
Siyan Zhao, Mengchen Liu, Jing Huang, Miao Liu, Chenyu Wang, Bo Liu, Yuandong Tian, Guan Pang, Sean Bell, Aditya Grover, Feiyu Chen · PDF
Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles
Antara Raaghavi Bhattacharya, Isabel Papadimitriou, Kathryn Davidson, David Alvarez-Melis · PDF
Kimina Lean Server: A High-Performance Lean Server for Large-Scale Verification
Marco Dos Santos, Hugues de Saxcé, Haiming Wang, Mantas Baksys, Mert Unsal, Junqi Liu, Zhengying Liu, Jia LI · PDF
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han · PDF
Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training
Aadim Nepal, Safal Shrestha, Anubhav Shrestha, Minwu Kim, Jalal Naghiyev, Ravid Shwartz-Ziv, Keith W. Ross · PDF
LeanDojo-v2: A Comprehensive Library for AI-Assisted Theorem Proving in Lean
Ryan Hsiang, William Adkisson, Robert Joseph George, Anima Anandkumar · PDF
Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning
Ningning Xu, Yuxuan Jiang, Shubhashis Roy Dipta · PDF
Learning Modular Exponentiation with Transformers
David Demitri Africa, Sara Mani Kapoor, Theo Simon Sorg, Challenger Mishra · PDF
Learning Permuted Congruential Sequences with Transformers
Tao Tao, Maissam Barkeshli · PDF
Learning to Reason on Hard Problems with Privileged On-Policy Exploration
Yuxiao Qu, Amrith Setlur, Virginia Smith, Ruslan Salakhutdinov, Aviral Kumar · PDF
Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning
Md Tanvirul Alam, Nidhi Rastogi · PDF
Limits of PRM-Guided Tree Search for Mathematical Reasoning with LLMs
Tristan Cinquin, Geoff Pleiss, Agustinus Kristiadi · PDF
LLM-Generated Search Heuristics Can Solve Open Instances of Combinatorial Design Problems
Christopher D. Rosin · PDF
Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains
Soumya Rani Samineni, Durgesh Kalwar, Vardaan Gangal, Siddhant Bhambri, Subbarao Kambhampati · PDF
MathBode: Understanding LLM Reasoning with Dynamical Systems
Charles L. Wang · PDF
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba · PDF
MathSticks: A Benchmark for Visual Symbolic Compositional Reasoning with Matchstick Puzzles
Yuheng Ji, Huajie Tan, Cheng Chi, Yijie Xu, Yuting Zhao, Enshen Zhou, Huaihai Lyu, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang, Xiaolong Zheng · PDF
Measuring Off-Trajectory Math Reasoning of LLMs
Aochong Oliver Li, Tanya Goyal · PDF
Meta Thinker: Thinking What AI Thinks
Junyu Guo, Shangding Gu, Costas Spanos, Javad Lavaei · PDF
Minif2f in Rocq: Automatic Translation Between Proof Assistants — A Case Study
Jules Viennot, Guillaume Baudart, Emilio Jesús Gallego Arias, Marc Lelarge · PDF
Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions
Haoze Wu, Cheng Wang, Wenshuo Zhao, Junxian He · PDF
Modeling Chain-of-Thought Collapse in Pruned Language Models: Fidelity and Similarity Analysis for Mathematical Reasoning
AVINASH KUMAR SHARMA, Tushar Shinde · PDF
Nested Depth Generalization in Transformers
Emile R Richard · PDF
Numbers Already Carry Their Own Embeddings
SuHyun Bae, Donghun Lee · PDF
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization
Yiyou Sun, Shawn Hu, Georgia Zhou, Ken Jiankun Zheng, Hannaneh Hajishirzi, Nouha Dziri, Dawn Song · PDF
On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning
Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Yang Yuan, Quanquan Gu, Andrew C Yao · PDF
On the Evolution of Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation
Yujun Zhou, Zhenwen Liang, Haolin Liu, Wenhao Yu, Kishan Panaganti, Linfeng Song, Dian Yu, Xiangliang Zhang, Haitao Mi, Dong Yu · PDF
One Token to Fool LLM-as-a-Judge
Yulai Zhao, Haolin Liu, Dian Yu, Sunyuan Kung, MEIJIA CHEN, Haitao Mi, Dong Yu · PDF
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
Yihe Deng, Hritik Bansal, Fan Yin, Nanyun Peng, Wei Wang, Kai-Wei Chang · PDF
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
Yuhua Jiang, Yuwen Xiong, Yufeng Yuan, Chao Xin, Wenyuan Xu, YuYue, Qianchuan Zhao, Lin Yan · PDF
Patching Gaps In LLM Reasoning With Interventional Training
Matthew Y. R. Yang, Hao Bai, Ian Wu, Gene Yang, Amrith Setlur, Aviral Kumar · PDF
Pretraining Scaling Laws for Generative Evaluations of Language Models
Rylan Schaeffer, Noam Itzhak Levi, Brando Miranda, Sanmi Koyejo · PDF
PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning
Wanjia Zhao, Qinwei Ma, Jingzhe Shi, Shirley Wu, Jiaqi Han, Yijia Xiao, Si-Yuan Chen, Xiao Luo, Ludwig Schmidt, James Zou · PDF
Probabilistic Soundness Guarantees in LLM Reasoning Chains
Weiqiu You, Anton Xue, Shreya Havaldar, Delip Rao, Helen Jin, Chris Callison-Burch, Eric Wong · PDF
Process-Verified Reinforcement Learning for Theorem Proving via Lean
Minsu Kim, Se-Young Yun · PDF
ProofGym: Unifying LLM-Based Theorem Proving Across Formal Systems
Xinrui Li, Wenjie Ma, Hangrui Bi, Zhaoyu Li, Xujie Si, Kaiyu Yang · PDF
ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations
Alex Gu, Bartosz Piotrowski, Fabian Gloeckle, Kaiyu Yang, Aram H. Markosyan · PDF
ProxyThinker: Test-Time Guidance through Small Visual Reasoners
Zilin Xiao, Jaywon Koo, Siru Ouyang, Jefferson Hernandez, Yu Meng, Vicente Ordonez · PDF
PVSGym: A Proof Learning Environment
Manoj Acharya, Karthik Nukala, Natarajan Shankar · PDF
Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead
Feiyang Kang, Michael Kuchnik, Karthik Padthe, Marin Vlastelica, Ruoxi Jia, Carole-Jean Wu, Newsha Ardalani · PDF
R-Zero: Self-Evolving Reasoning LLM from Zero Data
Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu · PDF
RADAR: Reasoning–Ability and Difficulty-Aware Routing for Reasoning LLMs
Nigel Fernandez, Branislav Kveton, Ryan A. Rossi, Andrew Lan, Zichao Wang · PDF
RAISE: Enhancing Scientific Reasoning in LLMs via Step-by-Step Retrieval
Minhae Oh, Jeonghye Kim, Nakyung Lee, Donggeon Seo, Taeuk Kim, Jungwoo Lee · PDF
RefGrader: Automated Grading of Mathematical Competition Proofs using Agentic Workflows
Hamed Mahdavi, Pouria Mahdavinia, Pegah Mohammadipour, Samira Malek, Alireza Farhadi, Majid Daliri, Alireza Hashemi, Niloofar Mireshghallah, Amir Khasahmadi, Vasant G. Honavar · PDF
Reinforcement Learning for Hierarchical Proof Generation in Lean 4
Fabian Gloeckle, Alex Gu, Gabriel Synnaeve, Amaury Hayat · PDF
Reliable Fine-Grained Evaluation of Natural Language Math Proofs
Wenjie Ma, Andrei Cojocaru, Neel Kolhe, Robin Sharif, Haihan Zhang, Vincent Zhuang, Matei Zaharia, Sewon Min · PDF
Restructuring the Corpus Makes RAG Work for Math
Negar Arabzadeh, Wenjie Ma, Sewon Min, Matei Zaharia · PDF
Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces
Minju Gwak, Guijin Son, Jaehyung Kim · PDF
Risk-Sensitive Reinforcement Learning for Alleviating Exploration Dilemmas in Large Language Models
Yuhua Jiang, Jiawei Huang, Yufeng Yuan, Xin Mao, YuYue, Qianchuan Zhao, Lin Yan · PDF
RLVR vs. Distillation: Understanding Accuracy and Capability in LLM Mathematical Reasoning
Minwu Kim, Anubhav Shrestha, Safal Shrestha, Aadim Nepal, Keith W. Ross · PDF
SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers
Chaitanya Manem, Pratik Prabhanjan Brahma, Prakamya Mishra, Zicheng Liu, Emad Barsoum · PDF
SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas
Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken · PDF
Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection
Sadegh Mahdavi, Branislav Kisacanin, Shubham Toshniwal, Wei Du, Ivan Moshkov, George Armstrong, Renjie Liao, Christos Thrampoulidis, Igor Gitman · PDF
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
Ran Xin, Zeyu Zheng, Yanchen Nie, Kun Yuan, Xia Xiao · PDF
Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems
Stephen Miner, Yoshiki Takashima, Simeng Han, Sam Kouteili, Ferhat Erata, Ruzica Piskac, Scott J Shapiro · PDF
SciML Agents: Write the Solver, Not the Solution
Saarth Gaonkar, Xiang Zheng, Haocheng Xi, Rishabh Tiwari, Kurt Keutzer, Dmitriy Morozov, Michael W. Mahoney, Amir Gholami · PDF
Scratchpad Thinking: Alternation Between Storage and Computation in Latent Reasoning Models
Sayam Goyal, Brad Peters, María Emilia Granda, Akshath Vijayakumar Narmadha, Dharunish Yugeswardeenoo, Callum Stuart McDougall, Sean O'Brien, Ashwinee Panda, Kevin Zhu, Cole Blondin · PDF
Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards
Yiran Jenny Shen, Yu Xia, Jonathan Daniel Chang, Prithviraj Ammanabrolu · PDF
Single-stream Policy Optimization
Zhongwen Xu, Zihan Ding · PDF
Skill-Aware Data Selection and Fine-Tuning for Data-Efficient Reasoning Distillation
Lechen Zhang, Yunxiang Zhang, Wei Hu, Lu Wang · PDF
Specifying exact circuit algorithms in universal transformers
Takuya Ito, Ruchir Puri, Parikshit Ram · PDF
SPG: Sandwiched Policy Gradient for Mask Diffusion Language Models
Chenyu Wang, Paria Rashidinejad, DiJia Su, Song Jiang, Sid Wang, Siyan Zhao, Cai Zhou, Shannon Zejiang Shen, Feiyu Chen, Tommi Jaakkola, Yuandong Tian, Bo Liu · PDF
SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification
Rocky Klopfenstein, YANG HE, Andrew Tremante, Yuepeng Wang, Nina Narodytska, Haoze Wu · PDF
STAT: Skill-Targeted Adaptive Training
Yinghui He, Abhishek Panigrahi, Yong Lin, Sanjeev Arora · PDF
STELAR-VISION: Self-Topology-Aware Efficient Learning for Aligned Reasoning in Vision
Chen Li, Han Zhang, Zhantao Yang, Fangyi Chen, ZIHAN WANG, Anudeepsekhar Bolimera, Marios Savvides · PDF
Stoic Reasoner: Dual-Mode Transformers that Compress to Think and Decompress to Speak
Angeliki Giannou, Liu Yang, Kangwook Lee, Robert D Nowak, Dimitris Papailiopoulos · PDF
StreetMath: Study of LLMs’ Approximation Behaviors
Chiung-Yi Tseng, Maisha thasin, Somshubhra Roy, Danyang Zhang, Blessing Effiong · PDF
Systematic Diagnosis of Brittle Reasoning in Large Language Models
V. S. Raghu Parupudi · PDF
Tales from a Graph: a Pipeline for Mathematical Problem Generation
Bastien Le Chenadec, Eleonora Kreacic, Mathieu Sibue, Gabriel Mercier, Jannik Brinkmann, Nelson Vadori, Manuela Veloso · PDF
Think, Align, Select: Query–Key Scores for LLM Reasoning
Mark Obozov, Eduard Tulchinskii, Kristian Kuznetsov, Michael Diskin, Serguei Barannikov · PDF
ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
Chung-En Sun, Ge Yan, Tsui-Wei Weng · PDF
Tool-Assisted Multi-Turn Theorem Proving with LLMs
Kanan Gupta, Jannis Limperg, Udaya Ghai · PDF
Towards Scaling Laws for Symbolic Regression
David Otte, Jörg K.H. Franke, Frank Hutter · PDF
Towards Understanding Self-play for LLM Reasoning
Justin Yang Chae, Md Tanvirul Alam, Nidhi Rastogi · PDF
TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models
Shima Imani, Seungwhan Moon, Lambert Mathias, Lu Zhang, Babak Damavandi · PDF
Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning
Bingning Huang, Tu Nguyen, Matthieu Zimmer · PDF
Understanding Tool-Integrated Reasoning
Heng Lin, Zhongwen Xu · PDF
Unspoken Logic: Understanding and bridging the gap between free-form and LLM-interpretable natural language mathematical proofs
Chenjun Guo, Manooshree Patel, Bjoern Hartmann, J.D. Zamfirescu-Pereira, Sarah Chasins, Gireeja Ranade · PDF
Usefulness-Driven Learning of Formal Mathematics
Timothe Kasriel, Thomas Lu, Qinghua Ding, Jingxuan He, Dawn Song · PDF
VeriBench-FTP: A Formal Theorem Proving Benchmark in Lean 4 for Code Verification
Slim Barkallah, Srivatsava Daruru, Brando Miranda, Leni Aniva, Allen Nie, Sanmi Koyejo · PDF
Why GRPO Needs Normalization: A Local-Curvature Perspective on Adaptive Gradients
Cheng Ge, Caitlyn Heqi Yin, Hao Liang, Jiawei Zhang · PDF
Why Reinforcement Learning Struggles with Expression Simplification: A Reward Analysis
Oleksii Shuhailo, Karel Chvalovsky, Tomáš Pevný · PDF
Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline
Yichen Huang, Lin F. Yang · PDF
You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models
Shuvendu Roy, Hossein Hajimirsadeghi, Mengyao Zhai, Golnoosh Samei · PDF

Previous editions

Accepted papers (150)

☆\textsc{Gambit}: Generating Automated Mathematical Bounds, Inequalities, and Theorems

☆A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models

☆A NUMA Aware Compiler Framework for Large Scale Mathematical Reasoning Inference on PCIe Based Multi Accelerator Systems

☆A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture

☆A Toolbox, Not a Hammer -- Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation

☆Adaptive Control for Test-time Scaling

☆Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning

☆AI Impact on Human Proof Formalization Workflows

☆AI-Driven Mathematical Discovery for the Andrews–Curtis Conjecture

☆AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

☆Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization

☆Analytical Lyapunov Function Discovery: An RL-based Generative Approach

☆AntiderivBench: Evaluating language models on indefinite integration

☆ARM: Discovering Agentic Reasoning Modules for Mathematical Problem-Solving

☆Aryabhata: An exam-focused language model for JEE Math

☆Automated Discovery of Conservation Laws via Hybrid Neural ODE-Transformers

☆Axiom-Aware FunSearch for Non-Constructive Mathematics

☆Babel-formal: Translation of Proofs between Lean and Rocq

☆Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning

☆Beyond Accuracy: Evaluating Multimodal Mathematical and Scientific Reasoning Through Error Analysis and Self-Correction

☆Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

☆Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer

☆Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves

☆BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

☆Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework

☆CauSciBench: Assessing LLM Causal Reasoning for Scientific Research

☆CayleyPy Growth: Efficient growth computations and hundreds of new conjectures on Cayley graphs

☆CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

☆CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process

☆Climbing the Ladder of Reasoning: What LLMs Can—and Still Can’t—Solve after SFT?

☆Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models

☆CoDaPO: Confidence and Difficulty-Adaptive Policy Optimization for Language Models

☆CombiGraph-Vis: A Curated Multimodal Olympiad Benchmark for Discrete Mathematical Reasoning

☆Combining Textual and Structural Information for Premise Selection in Lean

☆Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

☆Concept Generalization in Humans and Large Language Models: Insights from the Number Game

☆Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors

☆Credit Cards, Confusion, Computation, and Consequences: How Well Do LLMs Reason About Financial Literacy?

☆Curiosity-driven RL for symbolic equation solving

☆DAG-Math: Graph-Guided Mathematical Reasoning in LLMs

☆Decompose, Adapt, and Evolve: Towards Efficient Scientific Equation Discovery with Large Language Models

☆Decoupling Reasoning from Proving: A New Framework for Tackling Olympiad-Level Mathematics

☆DELTA: How Does RL Unlock and Transfer New Algorithms in LLMs?

☆DiagramIR: An Automatic Pipeline for Educational Math Diagram Evaluation

☆DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning

☆EchoRL: Learning to Plan through Experience for Efficient Reinforcement Learning

☆Evaluating Spatial Reasoning in Language Models

☆Exact Learning of Arithmetic with Differentiable Agents

☆Expanding the Action Space of LLMs to Reason Beyond Language

☆Faults in our Formal Benchmarks

☆FoCus: Improving Faithfulness in Chain-of-Thoughts by Training on Structured Reasoning Data

☆FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory

Gambit: Generating Automated Mathematical Bounds, Inequalities, and Theorems

A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models

A NUMA Aware Compiler Framework for Large Scale Mathematical Reasoning Inference on PCIe Based Multi Accelerator Systems

A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture

A Toolbox, Not a Hammer -- Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation

Adaptive Control for Test-time Scaling

Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning

AI Impact on Human Proof Formalization Workflows

AI-Driven Mathematical Discovery for the Andrews–Curtis Conjecture

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization

Analytical Lyapunov Function Discovery: An RL-based Generative Approach

AntiderivBench: Evaluating language models on indefinite integration

ARM: Discovering Agentic Reasoning Modules for Mathematical Problem-Solving

Aryabhata: An exam-focused language model for JEE Math

Automated Discovery of Conservation Laws via Hybrid Neural ODE-Transformers

Axiom-Aware FunSearch for Non-Constructive Mathematics

Babel-formal: Translation of Proofs between Lean and Rocq

Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning

Beyond Accuracy: Evaluating Multimodal Mathematical and Scientific Reasoning Through Error Analysis and Self-Correction

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer

Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves

BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework

CauSciBench: Assessing LLM Causal Reasoning for Scientific Research

CayleyPy Growth: Efficient growth computations and hundreds of new conjectures on Cayley graphs

CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process

Climbing the Ladder of Reasoning: What LLMs Can—and Still Can’t—Solve after SFT?

Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models

CoDaPO: Confidence and Difficulty-Adaptive Policy Optimization for Language Models

CombiGraph-Vis: A Curated Multimodal Olympiad Benchmark for Discrete Mathematical Reasoning

Combining Textual and Structural Information for Premise Selection in Lean

Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

Concept Generalization in Humans and Large Language Models: Insights from the Number Game

Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors

Credit Cards, Confusion, Computation, and Consequences: How Well Do LLMs Reason About Financial Literacy?

Curiosity-driven RL for symbolic equation solving

DAG-Math: Graph-Guided Mathematical Reasoning in LLMs

Decompose, Adapt, and Evolve: Towards Efficient Scientific Equation Discovery with Large Language Models

Decoupling Reasoning from Proving: A New Framework for Tackling Olympiad-Level Mathematics

DELTA: How Does RL Unlock and Transfer New Algorithms in LLMs?

DiagramIR: An Automatic Pipeline for Educational Math Diagram Evaluation

DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning

EchoRL: Learning to Plan through Experience for Efficient Reinforcement Learning

Evaluating Spatial Reasoning in Language Models

Exact Learning of Arithmetic with Differentiable Agents

Expanding the Action Space of LLMs to Reason Beyond Language

Faults in our Formal Benchmarks

FoCus: Improving Faithfulness in Chain-of-Thoughts by Training on Structured Reasoning Data

FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory

FractalBench: Diagnosing Visual-Mathematical Reasoning Through Recursive Program Synthesis

Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute

From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization

HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization

Hilbert: Recursively Building Formal Proofs with Informal Reasoning

How does RL induce skill composition? A Case Study using Countdown

HYBRIDMIND: Meta Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning

I-RAVEN-X: Benchmarking Generalization and Robustness of Analogical and Mathematical Reasoning in Large Language and Reasoning Models

IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation

Improving autoformalization via cycle consistency and incremental type-checking using language-model probabilistic programs

Improving ML attacks on LWE with data repetition and stepwise regression

In Good GRACES: Principled Teacher Selection for Knowledge Distillation

In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

Infinite-Dimensional HiPPO Provides an Explicit Formula for LSSLs

Inpainting-Guided Policy Optimization for Diffusion Large Language Models

Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles

Kimina Lean Server: A High-Performance Lean Server for Large-Scale Verification

Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training

LeanDojo-v2: A Comprehensive Library for AI-Assisted Theorem Proving in Lean

Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning

Learning Modular Exponentiation with Transformers

Learning Permuted Congruential Sequences with Transformers

Learning to Reason on Hard Problems with Privileged On-Policy Exploration

Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

Limits of PRM-Guided Tree Search for Mathematical Reasoning with LLMs

LLM-Generated Search Heuristics Can Solve Open Instances of Combinatorial Design Problems

Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains