NeurIPS 2025 · Past · Math & reasoning · Large language models

The 5th Workshop on Mathematical Reasoning and AI

MATH-AI

Unverified seed entry. Some fields are estimates — confirm everything on the official website before planning a submission.

Submission deadline
Aug 22, 2025, 23:59 AoE (UTC−12)
SEED estimate of the historical deadline — verify
Workshop day
Dec 6, 2025
Submission portal
OpenReview
Notes
SEED DATA — name/website/date taken from the OpenReview venue record; verify remaining fields.

Accepted papers (150)

Fetched from OpenReview (v2) on 2026-06-10.

  1. Gambit: Generating Automated Mathematical Bounds, Inequalities, and Theorems

    Randy Davila · PDF
  2. A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models

    Shubhra Mishra, Yuka Machino, Gabriel Poesia, Albert Q. Jiang, Joy Hsu, Adrian Weller, Challenger Mishra, David Broman, Joshua B. Tenenbaum, Mateja Jamnik, Cedegao E. Zhang, Katherine M. Collins · PDF
  3. A NUMA Aware Compiler Framework for Large Scale Mathematical Reasoning Inference on PCIe Based Multi Accelerator Systems

    JooHyoung Cha, Yongin Kwon · PDF
  4. A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture

    Roussel Rahman, Jeff Shrager · PDF
  5. A Toolbox, Not a Hammer -- Multi-TAG: Scaling Math Reasoning with Multi-Tool Aggregation

    Bohan Yao, Vikas Yadav · PDF
  6. Adaptive Control for Test-time Scaling

    Taneesh Gupta, Rahul Madhavan, Rishabh Tiwari, Xuchao Zhang, Chetan Bansal, Saravan Rajmohan, Kurt Keutzer · PDF
  7. Adaptive Coopetition: Leveraging Coarse Verifier Signals for Resilient Multi-Agent LLM Reasoning

    Rui Jerry Huang, Anastasia Miin, Wendy Yaqiao Liu, Lei Ding · PDF
  8. AI Impact on Human Proof Formalization Workflows

    Katherine M. Collins, Simon Frieder, Jonas Bayer, Jacob Loader, Jeck Lim, Peiyang Song, Fabian Zaiser, Lexin Zhou, Shanda Li, Shi-Zhuo Looi, Jose Hernandez-Orallo, Joshua B. Tenenbaum, Cameron Freer, Umang Bhatt, Adrian Weller, Valerie Chen, Ilia Sucholutsky · PDF
  9. AI-Driven Mathematical Discovery for the Andrews–Curtis Conjecture

    Caroline Zhang, Aaron Zhou, Robert Joseph George, Sergei Gukov, Anima Anandkumar · PDF
  10. AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

    Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang · PDF
  11. Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization

    Nathan Egbuna, Saatvik Gaur, Kevin Zhu, Sunishchal Dev, Ashwinee Panda, Maheep Chaudhary · PDF
  12. Analytical Lyapunov Function Discovery: An RL-based Generative Approach

    Haohan Zou, Jie Feng, Hao Zhao, Yuanyuan Shi · PDF
  13. AntiderivBench: Evaluating language models on indefinite integration

    Bartosz Piotrowski, Kaiyu Yang · PDF
  14. ARM: Discovering Agentic Reasoning Modules for Mathematical Problem-Solving

    Bohan Yao, Shiva Krishna Reddy Malay, Vikas Yadav · PDF
  15. Aryabhata: An exam-focused language model for JEE Math

    Ritvik Rastogi, Sachin Dharashivkar, Sandeep Varma Penmetsa · PDF
  16. Automated Discovery of Conservation Laws via Hybrid Neural ODE-Transformers

    Vivan Doshi · PDF
  17. Axiom-Aware FunSearch for Non-Constructive Mathematics

    Massimiliano Esposito, Besart Shyti · PDF
  18. Babel-formal: Translation of Proofs between Lean and Rocq

    Théo Stoskopf, Cyril Cohen, Nicolas Tabareau · PDF
  19. Best-of-L: Cross-Lingual Reward Modeling for Mathematical Reasoning

    Sara Rajaee, Rochelle Choenni, Ekaterina Shutova, Christof Monz · PDF
  20. Beyond Accuracy: Evaluating Multimodal Mathematical and Scientific Reasoning Through Error Analysis and Self-Correction

    Arka Mukherjee, Shreya Ghosh · PDF
  21. Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

    Chenlu Ye, Zhou Yu, Ziji Zhang, Hao Chen, Narayanan Sadagopan, Jing Huang, Tong Zhang, Anurag Beniwal · PDF
  22. Blind Spot Navigation in Large Language Model Reasoning with Thought Space Explorer

    Jinghan Zhang, Fengran Mo, Tharindu Cyril Weerasooriya, Kunpeng Liu · PDF
  23. Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves

    Zihao Wan, Pau Tong Lin Xu, Fuwen Luo, Ziyue Wang, Peng Li, Yang Liu · PDF
  24. BrokenMath: A Benchmark for Sycophancy in Theorem Proving with LLMs

    Ivo Petrov, Jasper Dekoninck, Martin Vechev · PDF
  25. Can Large Language Models Learn Formal Logic? A Data-Driven Training and Evaluation Framework

    Yuan Xia, Akanksha Atrey, Fadoua Khmaissia, Kedar Namjoshi · PDF
  26. CauSciBench: Assessing LLM Causal Reasoning for Scientific Research

    Sawal Acharya, Terry Jingchen Zhang, Andrew Kim, Sun Xianlin, Anahita Haghighat, Maximilian Mordig, Rahul Babu Shrestha, Clijo Jose, Yahang Qi, Pepijn Cobben, Bernhard Schölkopf, Mrinmaya Sachan, Zhijing Jin · PDF
  27. CayleyPy Growth: Efficient growth computations and hundreds of new conjectures on Cayley graphs

    Alexander Chervov, Dmytro Fedoriaka, Mark Obozov, Elena V. Konstantinova, Anton Naumov, Igor Kiselev, Anastasia Sheveleva, Ivan Koltsov, Sergei Lytkin, Andrei Smolensky, Alexander Soibelman, Fedor Levkovich-Maslyuk, Ruslan Grimov, Dmitry Volovich, Artem Isakov, Anton Kostin, Michael Litvinov, Nick Vilkin-Krom, Alim Bidzhiev, Artem Krasnyi, Mikhail Evseev, Elizaveta Geraseva, Liliya Grunwald, Sergey Galkin, Eduard Koldunov, Stanislav Diner, Artem Chevychelov, Evelina Kudasheva, Arsenii Sychev, Zakhar Kogan, Altana Natyrova, Lidia Shishina, Lyudmila Cheldieva, Vladislav Zamkovoy, Dmitrii Kovalenko, Oleg Papulov, Kudashev Sergey, Dmitry Shiltsov, Rustem Turtayev, Olga Nikitina, Dariya Mamayeva, Nikolenko Sergei, Anton Titarenko, Antonina Dolgorukova, Alexey N. Aparnev, Orianne Debeaupuis, Simo Alami Chehboune, Herve Isambert · PDF
  28. CDE: Curiosity-Driven Exploration for Efficient Reinforcement Learning in Large Language Models

    Runpeng Dai, Linfeng Song, Haolin Liu, Zhenwen Liang, Dian Yu, Haitao Mi, Zhaopeng Tu, Rui Liu, Tong Zheng, Hongtu Zhu, Dong Yu · PDF
  29. CircuitSense: A Hierarchical Circuit System Benchmark Bridging Visual Comprehension and Symbolic Reasoning in Engineering Design Process

    Arman Akbari, Jian Gao, Yifei Zou, Mei Yang, Jinru Duan, Dmitrii Torbunov, Yanzhi Wang, Yihui Ren, Xuan Zhang · PDF
  30. Climbing the Ladder of Reasoning: What LLMs Can—and Still Can’t—Solve after SFT?

    Yiyou Sun, Georgia Zhou, Haoyue Bai, Hao Wang, Dacheng Li, Nouha Dziri, Dawn Song · PDF
  31. Co-rewarding: Stable Self-supervised RL for Eliciting Reasoning in Large Language Models

    Zizhuo Zhang, Jianing Zhu, Xinmu Ge, Zihua Zhao, Zhanke Zhou, Xuan Li, Xiao Feng, Jiangchao Yao, Bo Han · PDF
  32. CoDaPO: Confidence and Difficulty-Adaptive Policy Optimization for Language Models

    Zhanke Zhou, Xiangyu Lu, Chentao Cao, Brando Miranda, Tongliang Liu, Bo Han, Sanmi Koyejo · PDF
  33. CombiGraph-Vis: A Curated Multimodal Olympiad Benchmark for Discrete Mathematical Reasoning

    Hamed Mahdavi, Pouria Mahdavinia, Alireza Farhadi, Pegah Mohammadipour, Samira Malek, Majid Daliri, Pedram Mohammadipour, Alireza Hashemi, Amir Khasahmadi, Vasant G. Honavar · PDF
  34. Combining Textual and Structural Information for Premise Selection in Lean

    Job Petrovčič, David E. Narváez, Ljupco Todorovski · PDF
  35. Compute as Teacher: Turning Inference Compute Into Reference-Free Supervision

    Dulhan Jayalath, Shashwat Goel, Thomas Foster, Parag Jain, Suchin Gururangan, Cheng Zhang, Anirudh Goyal, Alan Schelten · PDF
  36. Concept Generalization in Humans and Large Language Models: Insights from the Number Game

    Arghavan Bazigaran, Hansem Sohn · PDF
  37. Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors

    Xuying Li · PDF
  38. Credit Cards, Confusion, Computation, and Consequences: How Well Do LLMs Reason About Financial Literacy?

    Arnav Hiray, Agam Shah, Caleb Lu, Meghaj Tarte, Sreya Tummalapelly, Harsit Mittal, Sudheer Chava · PDF
  39. Curiosity-driven RL for symbolic equation solving

    Kevin O'Keeffe · PDF
  40. DAG-Math: Graph-Guided Mathematical Reasoning in LLMs

    Yuanhe Zhang, Ilja Kuzborskij, Jason D. Lee, Chenlei Leng, Fanghui Liu · PDF
  41. Decompose, Adapt, and Evolve: Towards Efficient Scientific Equation Discovery with Large Language Models

    Pouya Behzadifar, Parshin Shojaee, Sanchit Kabra, Kazem Meidani, Chandan K. Reddy · PDF
  42. Decoupling Reasoning from Proving: A New Framework for Tackling Olympiad-Level Mathematics

    Zhenwen Liang, Linfeng Song, Yang Li, Tao Yang, Feng Zhang, Haitao Mi, Dong Yu · PDF
  43. DELTA: How Does RL Unlock and Transfer New Algorithms in LLMs?

    Yiyou Sun, Yuhan Cao, Pohao Huang, Haoyue Bai, Hannaneh Hajishirzi, Nouha Dziri, Dawn Song · PDF
  44. DiagramIR: An Automatic Pipeline for Educational Math Diagram Evaluation

    Vishal Kumar, Shubhra Mishra, Rebecca Hao, Rizwaan Malik, David Broman, Dorottya Demszky · PDF
  45. DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning

    Qi Cao, Ruiyi Wang, Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie · PDF
  46. EchoRL: Learning to Plan through Experience for Efficient Reinforcement Learning

    Dong Liu, Yanxuan Yu, Ying Nian Wu · PDF
  47. Evaluating Spatial Reasoning in Language Models

    Aarush Gupta · PDF
  48. Exact Learning of Arithmetic with Differentiable Agents

    Hristo Papazov, Francesco D'Angelo, Nicolas Flammarion · PDF
  49. Expanding the Action Space of LLMs to Reason Beyond Language

    Zhongqi Yue, Weishi Wang, Yundaichuan Zhan, Juncheng Li, Daniel Dahlmeier, Fredrik D. Johansson · PDF
  50. Faults in our Formal Benchmarks

    Pawan Sasanka Ammanamanchi, Siddharth Bhat · PDF
  51. FoCus: Improving Faithfulness in Chain-of-Thoughts by Training on Structured Reasoning Data

    Guan-Yi Lin, Chung-En Sun, Tsui-Wei Weng · PDF
  52. FormalML: A Benchmark for Evaluating Formal Subgoal Completion in Machine Learning Theory

    Xiao-Wen Yang, Zihao Zhang, Jianuo Cao, Zhi Zhou, Zenan Li, Lan-Zhe Guo, Yuan Yao, Taolue Chen, Yu-Feng Li, Xiaoxing Ma · PDF
  53. FractalBench: Diagnosing Visual-Mathematical Reasoning Through Recursive Program Synthesis

    Jan Ondras, Marek Suppa · PDF
  54. Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute

    Sheng Liu, Tianlang Chen, Pan Lu, Haotian Ye, Yizheng Chen, Lei Xing, James Zou · PDF
  55. From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization

    Haonian Ji, Shi Qiu, Siyang Xin, Siwei Han, Zhaorun Chen, Dake Zhang, Hongyi Wang, Huaxiu Yao · PDF
  56. HeuriGym: An Agentic Benchmark for LLM-Crafted Heuristics in Combinatorial Optimization

    Hongzheng Chen, Yingheng Wang, Yaohui Cai, Hins Hu, Jiajie Li, Shirley Huang, Chenhui Deng, Rongjian Liang, Shufeng Kong, Haoxing Ren, Samitha Samaranayake, Carla P Gomes, Zhiru Zhang · PDF
  57. Hilbert: Recursively Building Formal Proofs with Informal Reasoning

    Sumanth Varambally, Thomas Voice, Yanchao Sun, Zhifeng Chen, Rose Yu, Ke Ye · PDF
  58. How does RL induce skill composition? A Case Study using Countdown

    Simon Park, Simran Kaur, Sanjeev Arora · PDF
  59. HYBRIDMIND: Meta Selection of Natural Language and Symbolic Language for Enhanced LLM Reasoning

    Simeng Han, Tianyu Liu, Chuhan Li, Xuyuan Xiong, Arman Cohan · PDF
  60. I-RAVEN-X: Benchmarking Generalization and Robustness of Analogical and Mathematical Reasoning in Large Language and Reasoning Models

    Giacomo Camposampiero, Michael Hersche, Roger Wattenhofer, Abu Sebastian, Abbas Rahimi · PDF
  61. IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation

    Johannes Schmitt, Gergely Berczi, Jasper Dekoninck, Jeremy Feusi, Tim Gehrunger, Raphael Appenzeller, Jim Bryan, Niklas Canova, Timo de Wolff, Filippo Gaia, Michel van Garrel, Baran Hashemi, David Holmes, Aitor Iribar Lopez, Victor Jaeck, Martina Jørgensen, Steven Kelk, Stefan Kuhlmann, Adam Kurpisz, Chiara Meroni, Ingmar Metzler, Samuel Muñoz-Echániz, Robert Nowak, Georg Oberdieck, Daniel Platt, Dylan Possamaï, Gabriel Ribeiro, Raúl Sánchez Galán, Zheming Sun, Josef Teichmann, Richard P Thomas, Charles Vial · PDF
  62. Improving autoformalization via cycle consistency and incremental type-checking using language-model probabilistic programs

    Mauricio Barba da Costa, Fabian Zaiser, Katherine M. Collins, Romir Patel, Timothy J. O'Donnell, Alexander K. Lew, Joshua B. Tenenbaum, Vikash Mansinghka, Cameron Freer · PDF
  63. Improving ML attacks on LWE with data repetition and stepwise regression

    Alberto Alfarano, Eshika Saxena, Emily Wenger, Francois Charton, Kristin E. Lauter · PDF
  64. In Good GRACES: Principled Teacher Selection for Knowledge Distillation

    Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi, Sham M. Kakade, Surbhi Goel · PDF
  65. In-the-Flow Agentic System Optimization for Effective Planning and Tool Use

    Zhuofeng Li, Haoxiang Zhang, Seungju Han, Sheng Liu, Jianwen Xie, Yu Zhang, Yejin Choi, James Zou, Pan Lu · PDF
  66. Infinite-Dimensional HiPPO Provides an Explicit Formula for LSSLs

    Atsushi Takabatake, Takaharu Yaguchi · PDF
  67. Inpainting-Guided Policy Optimization for Diffusion Large Language Models

    Siyan Zhao, Mengchen Liu, Jing Huang, Miao Liu, Chenyu Wang, Bo Liu, Yuandong Tian, Guan Pang, Sean Bell, Aditya Grover, Feiyu Chen · PDF
  68. Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles

    Antara Raaghavi Bhattacharya, Isabel Papadimitriou, Kathryn Davidson, David Alvarez-Melis · PDF
  69. Kimina Lean Server: A High-Performance Lean Server for Large-Scale Verification

    Marco Dos Santos, Hugues de Saxcé, Haiming Wang, Mantas Baksys, Mert Unsal, Junqi Liu, Zhengying Liu, Jia Li · PDF
  70. Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

    Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han · PDF
  71. Layer Importance for Mathematical Reasoning is Forged in Pre-Training and Invariant after Post-Training

    Aadim Nepal, Safal Shrestha, Anubhav Shrestha, Minwu Kim, Jalal Naghiyev, Ravid Shwartz-Ziv, Keith W. Ross · PDF
  72. LeanDojo-v2: A Comprehensive Library for AI-Assisted Theorem Proving in Lean

    Ryan Hsiang, William Adkisson, Robert Joseph George, Anima Anandkumar · PDF
  73. Learning How to Use Tools, Not Just When: Pattern-Aware Tool-Integrated Reasoning

    Ningning Xu, Yuxuan Jiang, Shubhashis Roy Dipta · PDF
  74. Learning Modular Exponentiation with Transformers

    David Demitri Africa, Sara Mani Kapoor, Theo Simon Sorg, Challenger Mishra · PDF
  75. Learning Permuted Congruential Sequences with Transformers

    Tao Tao, Maissam Barkeshli · PDF
  76. Learning to Reason on Hard Problems with Privileged On-Policy Exploration

    Yuxiao Qu, Amrith Setlur, Virginia Smith, Ruslan Salakhutdinov, Aviral Kumar · PDF
  77. Limits of Generalization in RLVR: Two Case Studies in Mathematical Reasoning

    Md Tanvirul Alam, Nidhi Rastogi · PDF
  78. Limits of PRM-Guided Tree Search for Mathematical Reasoning with LLMs

    Tristan Cinquin, Geoff Pleiss, Agustinus Kristiadi · PDF
  79. LLM-Generated Search Heuristics Can Solve Open Instances of Combinatorial Design Problems

    Christopher D. Rosin · PDF
  80. Local Coherence or Global Validity? Investigating RLVR Traces in Math Domains

    Soumya Rani Samineni, Durgesh Kalwar, Vardaan Gangal, Siddhant Bhambri, Subbarao Kambhampati · PDF
  81. MathBode: Understanding LLM Reasoning with Dynamical Systems

    Charles L. Wang · PDF
  82. MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

    Shaden Alshammari, Kevin Wen, Abrar Zainal, Mark Hamilton, Navid Safaei, Sultan Albarakati, William T. Freeman, Antonio Torralba · PDF
  83. MathSticks: A Benchmark for Visual Symbolic Compositional Reasoning with Matchstick Puzzles

    Yuheng Ji, Huajie Tan, Cheng Chi, Yijie Xu, Yuting Zhao, Enshen Zhou, Huaihai Lyu, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang, Xiaolong Zheng · PDF
  84. Measuring Off-Trajectory Math Reasoning of LLMs

    Aochong Oliver Li, Tanya Goyal · PDF
  85. Meta Thinker: Thinking What AI Thinks

    Junyu Guo, Shangding Gu, Costas Spanos, Javad Lavaei · PDF
  86. Minif2f in Rocq: Automatic Translation Between Proof Assistants — A Case Study

    Jules Viennot, Guillaume Baudart, Emilio Jesús Gallego Arias, Marc Lelarge · PDF
  87. Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions

    Haoze Wu, Cheng Wang, Wenshuo Zhao, Junxian He · PDF
  88. Modeling Chain-of-Thought Collapse in Pruned Language Models: Fidelity and Similarity Analysis for Mathematical Reasoning

    Avinash Kumar Sharma, Tushar Shinde · PDF
  89. Nested Depth Generalization in Transformers

    Emile R Richard · PDF
  90. Numbers Already Carry Their Own Embeddings

    SuHyun Bae, Donghun Lee · PDF
  91. OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization

    Yiyou Sun, Shawn Hu, Georgia Zhou, Ken Jiankun Zheng, Hannaneh Hajishirzi, Nouha Dziri, Dawn Song · PDF
  92. On the Design of KL-Regularized Policy Gradient Algorithms for LLM Reasoning

    Yifan Zhang, Yifeng Liu, Huizhuo Yuan, Yang Yuan, Quanquan Gu, Andrew C Yao · PDF
  93. On the Evolution of Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

    Yujun Zhou, Zhenwen Liang, Haolin Liu, Wenhao Yu, Kishan Panaganti, Linfeng Song, Dian Yu, Xiangliang Zhang, Haitao Mi, Dong Yu · PDF
  94. One Token to Fool LLM-as-a-Judge

    Yulai Zhao, Haolin Liu, Dian Yu, Sunyuan Kung, Meijia Chen, Haitao Mi, Dong Yu · PDF
  95. OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

    Yihe Deng, Hritik Bansal, Fan Yin, Nanyun Peng, Wei Wang, Kai-Wei Chang · PDF
  96. PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

    Yuhua Jiang, Yuwen Xiong, Yufeng Yuan, Chao Xin, Wenyuan Xu, Yu Yue, Qianchuan Zhao, Lin Yan · PDF
  97. Patching Gaps In LLM Reasoning With Interventional Training

    Matthew Y. R. Yang, Hao Bai, Ian Wu, Gene Yang, Amrith Setlur, Aviral Kumar · PDF
  98. Pretraining Scaling Laws for Generative Evaluations of Language Models

    Rylan Schaeffer, Noam Itzhak Levi, Brando Miranda, Sanmi Koyejo · PDF
  99. PRISM-Physics: Causal DAG-Based Process Evaluation for Physics Reasoning

    Wanjia Zhao, Qinwei Ma, Jingzhe Shi, Shirley Wu, Jiaqi Han, Yijia Xiao, Si-Yuan Chen, Xiao Luo, Ludwig Schmidt, James Zou · PDF
  100. Probabilistic Soundness Guarantees in LLM Reasoning Chains

    Weiqiu You, Anton Xue, Shreya Havaldar, Delip Rao, Helen Jin, Chris Callison-Burch, Eric Wong · PDF
  101. Process-Verified Reinforcement Learning for Theorem Proving via Lean

    Minsu Kim, Se-Young Yun · PDF
  102. ProofGym: Unifying LLM-Based Theorem Proving Across Formal Systems

    Xinrui Li, Wenjie Ma, Hangrui Bi, Zhaoyu Li, Xujie Si, Kaiyu Yang · PDF
  103. ProofOptimizer: Training Language Models to Simplify Proofs without Human Demonstrations

    Alex Gu, Bartosz Piotrowski, Fabian Gloeckle, Kaiyu Yang, Aram H. Markosyan · PDF
  104. ProxyThinker: Test-Time Guidance through Small Visual Reasoners

    Zilin Xiao, Jaywon Koo, Siru Ouyang, Jefferson Hernandez, Yu Meng, Vicente Ordonez · PDF
  105. PVSGym: A Proof Learning Environment

    Manoj Acharya, Karthik Nukala, Natarajan Shankar · PDF
  106. Quagmires in SFT-RL Post-Training: When High SFT Scores Mislead and What to Use Instead

    Feiyang Kang, Michael Kuchnik, Karthik Padthe, Marin Vlastelica, Ruoxi Jia, Carole-Jean Wu, Newsha Ardalani · PDF
  107. R-Zero: Self-Evolving Reasoning LLM from Zero Data

    Chengsong Huang, Wenhao Yu, Xiaoyang Wang, Hongming Zhang, Zongxia Li, Ruosen Li, Jiaxin Huang, Haitao Mi, Dong Yu · PDF
  108. RADAR: Reasoning–Ability and Difficulty-Aware Routing for Reasoning LLMs

    Nigel Fernandez, Branislav Kveton, Ryan A. Rossi, Andrew Lan, Zichao Wang · PDF
  109. RAISE: Enhancing Scientific Reasoning in LLMs via Step-by-Step Retrieval

    Minhae Oh, Jeonghye Kim, Nakyung Lee, Donggeon Seo, Taeuk Kim, Jungwoo Lee · PDF
  110. RefGrader: Automated Grading of Mathematical Competition Proofs using Agentic Workflows

    Hamed Mahdavi, Pouria Mahdavinia, Pegah Mohammadipour, Samira Malek, Alireza Farhadi, Majid Daliri, Alireza Hashemi, Niloofar Mireshghallah, Amir Khasahmadi, Vasant G. Honavar · PDF
  111. Reinforcement Learning for Hierarchical Proof Generation in Lean 4

    Fabian Gloeckle, Alex Gu, Gabriel Synnaeve, Amaury Hayat · PDF
  112. Reliable Fine-Grained Evaluation of Natural Language Math Proofs

    Wenjie Ma, Andrei Cojocaru, Neel Kolhe, Robin Sharif, Haihan Zhang, Vincent Zhuang, Matei Zaharia, Sewon Min · PDF
  113. Restructuring the Corpus Makes RAG Work for Math

    Negar Arabzadeh, Wenjie Ma, Sewon Min, Matei Zaharia · PDF
  114. Revisiting the Uniform Information Density Hypothesis in LLM Reasoning Traces

    Minju Gwak, Guijin Son, Jaehyung Kim · PDF
  115. Risk-Sensitive Reinforcement Learning for Alleviating Exploration Dilemmas in Large Language Models

    Yuhua Jiang, Jiawei Huang, Yufeng Yuan, Xin Mao, Yu Yue, Qianchuan Zhao, Lin Yan · PDF
  116. RLVR vs. Distillation: Understanding Accuracy and Capability in LLM Mathematical Reasoning

    Minwu Kim, Anubhav Shrestha, Safal Shrestha, Aadim Nepal, Keith W. Ross · PDF
  117. SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers

    Chaitanya Manem, Pratik Prabhanjan Brahma, Prakamya Mishra, Zicheng Liu, Emad Barsoum · PDF
  118. SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas

    Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken · PDF
  119. Scaling Generative Verifiers For Natural Language Mathematical Proof Verification And Selection

    Sadegh Mahdavi, Branislav Kisacanin, Shubham Toshniwal, Wei Du, Ivan Moshkov, George Armstrong, Renjie Liao, Christos Thrampoulidis, Igor Gitman · PDF
  120. Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers

    Ran Xin, Zeyu Zheng, Yanchen Nie, Kun Yuan, Xia Xiao · PDF
  121. Scheherazade: Evaluating Chain-of-Thought Math Reasoning in LLMs with Chain-of-Problems

    Stephen Miner, Yoshiki Takashima, Simeng Han, Sam Kouteili, Ferhat Erata, Ruzica Piskac, Scott J Shapiro · PDF
  122. SciML Agents: Write the Solver, Not the Solution

    Saarth Gaonkar, Xiang Zheng, Haocheng Xi, Rishabh Tiwari, Kurt Keutzer, Dmitriy Morozov, Michael W. Mahoney, Amir Gholami · PDF
  123. Scratchpad Thinking: Alternation Between Storage and Computation in Latent Reasoning Models

    Sayam Goyal, Brad Peters, María Emilia Granda, Akshath Vijayakumar Narmadha, Dharunish Yugeswardeenoo, Callum Stuart McDougall, Sean O'Brien, Ashwinee Panda, Kevin Zhu, Cole Blondin · PDF
  124. Simultaneous Multi-objective Alignment Across Verifiable and Non-verifiable Rewards

    Yiran Jenny Shen, Yu Xia, Jonathan Daniel Chang, Prithviraj Ammanabrolu · PDF
  125. Single-stream Policy Optimization

    Zhongwen Xu, Zihan Ding · PDF
  126. Skill-Aware Data Selection and Fine-Tuning for Data-Efficient Reasoning Distillation

    Lechen Zhang, Yunxiang Zhang, Wei Hu, Lu Wang · PDF
  127. Specifying exact circuit algorithms in universal transformers

    Takuya Ito, Ruchir Puri, Parikshit Ram · PDF
  128. SPG: Sandwiched Policy Gradient for Mask Diffusion Language Models

    Chenyu Wang, Paria Rashidinejad, DiJia Su, Song Jiang, Sid Wang, Siyan Zhao, Cai Zhou, Shannon Zejiang Shen, Feiyu Chen, Tommi Jaakkola, Yuandong Tian, Bo Liu · PDF
  129. SpotIt: Evaluating Text-to-SQL Evaluation with Formal Verification

    Rocky Klopfenstein, Yang He, Andrew Tremante, Yuepeng Wang, Nina Narodytska, Haoze Wu · PDF
  130. STAT: Skill-Targeted Adaptive Training

    Yinghui He, Abhishek Panigrahi, Yong Lin, Sanjeev Arora · PDF
  131. STELAR-VISION: Self-Topology-Aware Efficient Learning for Aligned Reasoning in Vision

    Chen Li, Han Zhang, Zhantao Yang, Fangyi Chen, Zihan Wang, Anudeepsekhar Bolimera, Marios Savvides · PDF
  132. Stoic Reasoner: Dual-Mode Transformers that Compress to Think and Decompress to Speak

    Angeliki Giannou, Liu Yang, Kangwook Lee, Robert D Nowak, Dimitris Papailiopoulos · PDF
  133. StreetMath: Study of LLMs’ Approximation Behaviors

    Chiung-Yi Tseng, Maisha Thasin, Somshubhra Roy, Danyang Zhang, Blessing Effiong · PDF
  134. Systematic Diagnosis of Brittle Reasoning in Large Language Models

    V. S. Raghu Parupudi · PDF
  135. Tales from a Graph: a Pipeline for Mathematical Problem Generation

    Bastien Le Chenadec, Eleonora Kreacic, Mathieu Sibue, Gabriel Mercier, Jannik Brinkmann, Nelson Vadori, Manuela Veloso · PDF
  136. Think, Align, Select: Query–Key Scores for LLM Reasoning

    Mark Obozov, Eduard Tulchinskii, Kristian Kuznetsov, Michael Diskin, Serguei Barannikov · PDF
  137. ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models

    Chung-En Sun, Ge Yan, Tsui-Wei Weng · PDF
  138. Tool-Assisted Multi-Turn Theorem Proving with LLMs

    Kanan Gupta, Jannis Limperg, Udaya Ghai · PDF
  139. Towards Scaling Laws for Symbolic Regression

    David Otte, Jörg K.H. Franke, Frank Hutter · PDF
  140. Towards Understanding Self-play for LLM Reasoning

    Justin Yang Chae, Md Tanvirul Alam, Nidhi Rastogi · PDF
  141. TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models

    Shima Imani, Seungwhan Moon, Lambert Mathias, Lu Zhang, Babak Damavandi · PDF
  142. Tree-OPO: Off-policy Monte Carlo Tree-Guided Advantage Optimization for Multistep Reasoning

    Bingning Huang, Tu Nguyen, Matthieu Zimmer · PDF
  143. Understanding Tool-Integrated Reasoning

    Heng Lin, Zhongwen Xu · PDF
  144. Unspoken Logic: Understanding and bridging the gap between free-form and LLM-interpretable natural language mathematical proofs

    Chenjun Guo, Manooshree Patel, Bjoern Hartmann, J.D. Zamfirescu-Pereira, Sarah Chasins, Gireeja Ranade · PDF
  145. Usefulness-Driven Learning of Formal Mathematics

    Timothe Kasriel, Thomas Lu, Qinghua Ding, Jingxuan He, Dawn Song · PDF
  146. VeriBench-FTP: A Formal Theorem Proving Benchmark in Lean 4 for Code Verification

    Slim Barkallah, Srivatsava Daruru, Brando Miranda, Leni Aniva, Allen Nie, Sanmi Koyejo · PDF
  147. Why GRPO Needs Normalization: A Local-Curvature Perspective on Adaptive Gradients

    Cheng Ge, Caitlyn Heqi Yin, Hao Liang, Jiawei Zhang · PDF
  148. Why Reinforcement Learning Struggles with Expression Simplification: A Reward Analysis

    Oleksii Shuhailo, Karel Chvalovsky, Tomáš Pevný · PDF
  149. Winning Gold at IMO 2025 with a Model-Agnostic Verification-and-Refinement Pipeline

    Yichen Huang, Lin F. Yang · PDF
  150. You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models

    Shuvendu Roy, Hossein Hajimirsadeghi, Mengyao Zhai, Golnoosh Samei · PDF