NeurIPS 2024 · Past · Math & reasoning · Large language models

The 4th Workshop on Mathematical Reasoning and AI

MATH-AI

Unverified seed entry. Some fields are estimates — confirm everything on the official website before planning a submission.

Submission deadline
Sep 15, 2024, 23:59 AoE (UTC−12)
SEED estimate of the historical deadline — verify
Workshop day
Dec 14, 2024
Submission portal
OpenReview
Notes
SEED DATA — name/website/date taken from the OpenReview venue record; verify remaining fields.

Accepted papers (69)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Hessian View of Grokking in Mathematical Reasoning

    Zhenshuo Zhang, Jerry Weihong Liu, Christopher Re, Hongyang R. Zhang · PDF
  2. ABEL: Sample Efficient Online Reinforcement Learning for Neural Theorem Proving

    Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, Amaury Hayat · PDF
  3. Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data

    Huajian Xin, Daya Guo, Zhihong Shao, Z.Z. Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang · PDF
  4. AI-Assisted Generation of Difficult Math Questions

    Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Jiatong Yu, Yinghui He, Nan Rosemary Ke, Michael Curtis Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal · PDF
  5. Attention Bias as an Inductive Bias: How to Teach Transformers Simple Arithmetic

    Shaoxiong Duan, Yining Shi, Wei Xu · PDF
  6. CAFA: Coding as Auto-Formulation Can Boost Large Language Models in Solving Linear Programming Problem

    Haoxuan Deng, Bohao Zheng, Yirui Jiang, Trung Hieu Tran · PDF
  7. CausalBench: A Comprehensive Benchmark for Evaluating Causal Reasoning Capabilities of Large Language Models

    Zeyu Wang · PDF
  8. Constraint-Based Synthetic Data Generation for LLM Mathematical Reasoning

    Timofey Fedoseev, Dimitar Iliev Dimitrov, Timon Gehr, Martin Vechev · PDF
  9. DafnyBench: A Benchmark for Formal Software Verification

    Chloe R Loughridge, Qinyi Sun, Seth Ahrenbach, Federico Cassano, Chuyue Sun, Ying Sheng, Anish Mudide, Md Rakib Hossain Misu, Nada Amin, Max Tegmark · PDF
  10. Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models

    Hyunsik Chae, Seungwoo Yoon, Chloe Yewon Chun, Gyehun Go, Yongin Cho, Gyeongmin Lee, Ernest K. Ryu · PDF
  11. DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images

    Sami Baral, Li Lucy, Ryan Knight, Alice Ng, Luca Soldaini, Neil Heffernan, Kyle Lo · PDF
  12. FEABench: Evaluating Language Models on Real World Physics Reasoning Ability

    Nayantara Mudur, Hao Cui, Subhashini Venugopalan, Paul Raccuglia, Michael Brenner, Peter Christian Norgaard · PDF
  13. Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

    Yihe Deng, Paul Mineiro · PDF
  14. Formal Representation and Solution of Plane Geometric Problems

    Xiaokai Zhang, Na Zhu, Cheng Qin, Yang Li, Zhenbing Zeng, Tuo Leng · PDF
  15. Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically

    Kefan Dong, Arvind V. Mahankali, Tengyu Ma · PDF
  16. Generative Verifiers: Reward Modeling as Next-Token Prediction

    Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal · PDF
  17. Genetic Curriculum Learning for Distribution Generalization on the Travelling Salesman Problem

    Michael Li, Christopher Haberland, Natasha Jaques · PDF
  18. Give me a hint: Can LLMs take a hint to solve math problems?

    Vansh Agrawal, Pratham Singla, Amitoj Singh Miglani, Shivank Garg, Ayush Mangal · PDF
  19. HARDMATH: A Benchmark Dataset for Challenging Problems in Applied Mathematics

    Jingxuan Fan, Sarah Martinson, Erik Y. Wang, Kaylie Hausknecht, Jonah Brenner, Danxian Liu, Nianli Peng, Corey Wang, Michael Brenner · PDF
  20. How Transformers Reason: A Case Study on a Synthetic Propositional Logic Problem

    Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, Rina Panigrahy · PDF
  21. Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving

    Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang · PDF
  22. InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning

    Xiaotian Han, Yiren Jian, Xuefeng Hu, Haogeng Liu, Yiqi Wang, Qihang Fan, Yuang Ai, Huaibo Huang, Ran He, Zhenheng Yang, Quanzeng You · PDF
  23. Interleaving Text and Number Embeddings to Solve Mathemathics Problems

    Marvin Alberts, Gianmarco Gabrieli, Irina Espejo Morales · PDF
  24. Intermediate Fine-Tuning Improves Mathematical Reasoning in Smaller Models

    Neeraj Gangwar, Suma Bhat, Nickvash Kani · PDF
  25. Lean-STaR: Learning to Interleave Thinking and Proving

    Haohan Lin, Zhiqing Sun, Sean Welleck, Yiming Yang · PDF
  26. Learning Elementary Cellular Automata with Transformers

    Mikhail Burtsev · PDF
  27. Learning Mathematical Rules with Large Language Models

    Antoine Gorceix, Bastien Le Chenadec, Ahmad Rammal, Nelson Vadori, Manuela Veloso · PDF
  28. Library Learning Doesn’t: The Curious Case of the Single-Use “Library”

    Ian Berlot-Attwell, Frank Rudzicz, Xujie Si · PDF
  29. LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery

    Pingchuan Ma, Tsun-Hsuan Wang, Minghao Guo, Zhiqing Sun, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan, Wojciech Matusik · PDF
  30. Looped Transformers for Length Generalization

    Ying Fan, Yilun Du, Kannan Ramchandran, Kangwook Lee · PDF
  31. Machine Learning meets Algebraic Combinatorics: A Suite of Datasets to Accelerate AI for Mathematics Research

    Herman Chau, Helen Jenne, Davis Brown, Jesse He, Mark Raugas, Sara C. Billey, Henry Kvinge · PDF
  32. Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes

    Jesse He, Helen Jenne, Herman Chau, Davis Brown, Mark Raugas, Sara C. Billey, Henry Kvinge · PDF
  33. Math for AI: On the Generalization of Learning Mathematical Problem Solving

    Ruochen Zhou, Minrui Xu, Shiqi Chen, Junteng Liu, Yunqi Li, Xinxin Lin, Zhengyu Chen, Junxian He · PDF
  34. Math2Sym: A System for Solving Elementary Problems via Large Language Models and Symbolic Solvers

    Minh Phu Nguyen, Minh Phuong Pham, Man Ngo, Kha Tuan Minh · PDF
  35. MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula

    Shubhra Mishra, Gabriel Poesia, Belinda Mo, Noah Goodman · PDF
  36. MathDSL: A Domain-Specific Language for Concise Mathematical Solutions Via Program Synthesis

    Sagnik Anupam, Maddy Bowers, Omar Costilla Reyes, Armando Solar-Lezama · PDF
  37. miniCTX: Neural Theorem Proving with (Long-)Contexts

    Jiewen Hu, Thomas Zhu, Sean Welleck · PDF
  38. Mining Math Conjectures from LLMs: A Pruning Approach

    Jake Chuharski, Elias Rojas Collins, Mark Meringolo · PDF
  39. Models Can and Should Embrace the Communicative Nature of Human-Generated Math

    Sasha Boguraev, Ben Lipkin, Leonie Weissweiler, Kyle Mahowald · PDF
  40. NLIR: Natural Language Intermediate Representation for Mechanized Theorem Proving

    Laetitia Teodorescu, Guillaume Baudart, Emilio Jesús Gallego Arias, Marc Lelarge · PDF
  41. Not All LLM Reasoners Are Created Equal

    Arian Hosseini, Alessandro Sordoni, Daniel Kenji Toyama, Aaron Courville, Rishabh Agarwal · PDF
  42. On Memorization of Large Language Models in Logical Reasoning

    Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar · PDF
  43. OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data

    Shubham Toshniwal, Wei Du, Ivan Moshkov, Branislav Kisacanin, Alexan Ayrapetyan, Igor Gitman · PDF
  44. Probabilistic Proof State Compression: Optimizing LLM-Guided Formal Verification

    Noor Rahim, Ali Abdul Rahim · PDF
  45. Proving Olympiad Algebraic Inequalities without Human Demonstrations

    Chenrui Wei, Mengzhou Sun, Wei Wang · PDF
  46. Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical Reasoning

    Aryan Gulati, Brando Miranda, Eric Chen, Emily Xia, Kai Fronsdal, Bruno de Moraes Dumont, Sanmi Koyejo · PDF
  47. Reasoning and Tools for Forecasting

    Elvis Hsieh, Preston Fu, Jonathan Chen · PDF
  48. Reasoning in Reasoning: A Hierarchical Framework for Better and Faster Neural Theorem Proving

    Ziyu Ye, Jiacheng Chen, Jonathan Light, Yifei Wang, Jiankai Sun, Mac Schwager, Philip Torr, Guohao Li, Yuxin Chen, Kaiyu Yang, Yisong Yue, Ziniu Hu · PDF
  49. Regress, Don’t Guess – A Regression-like Loss on Number Tokens for Language Models

    Jonas Zausinger, Lars Pennig, Kacper Chlodny, Vincent Limbach, Anna Ketteler, Thorben Prein, Vishwa Mohan Singh, Michael Danziger, Jannis Born · PDF
  50. Repeated examples help learn arithmetic

    Francois Charton, Julia Kempe · PDF
  51. SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation

    Prakhar Dixit, Tim Oates · PDF
  52. SBSC: Step-by-Step Coding for Improving Mathematical Olympiad Performance

    Kunal Singh, Ankan Biswas, Sayandeep Bhowmick, Pradeep Moturi · PDF
  53. Semantic Self-Consistency: Enhancing Language Model Reasoning via Semantic Weighting

    Tim Knappe, Ryan Luo Li, Ayush Chauhan, Kaylee Chhua, Kevin Zhu, Sean O'Brien · PDF
  54. Skywork-Math: Data Scaling Laws for Mathematical Reasoning in LLMs — The Story Goes On

    Liang Zeng, Liangjun Zhong · PDF
  55. Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

    Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi · PDF
  56. STEM-PoM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing

    Jiaru Zou, Qing Wang, Pratyush Thakur, Nickvash Kani · PDF
  57. Structure Based Dataset on SAT Solving with Graph Neural Networks

    Yi Fu, Anthony Tompkins, Yang Song, Maurice Pagnucco · PDF
  58. Synchronizing Verbal Responses and Board Writing for Multimodal Math Instruction with LLMs

    Yuan-Hao Jiang, Ruijia Li, Yuang Wei, Rui Jia, Xiaobao Shao, Hanglei Hu, Bo Jiang · PDF
  59. Synthesizing Verified Mathematical Problems

    Xuefeng Li, Yanheng He, Pengfei Liu · PDF
  60. The Art of Knowing When to Stop: Analysis of Optimal Stopping in People and Machines

    Fukun Evelene Zhang, Bonan Zhao · PDF
  61. The Karp Dataset

    Mason DiCicco, Eamon Worden, Daniel Reichman, Neil Heffernan, Conner Olsen, Nikhil Gangaram · PDF
  62. Towards Faster Quantum Circuit Simulation Using Graph Decompositions, GNNs and Reinforcement Learning

    Alexander Koziell-Pipe, Richie Yeung, Matthew Sutcliffe · PDF
  63. Transformers Can Do Arithmetic with the Right Embeddings

    Sean Michael McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein · PDF
  64. Transformers to Predict the Applicability of Symbolic Integration Routines

    Rashid Barket, Uzma Shafiq, Matthew England, Juergen Gerhard · PDF
  65. TurtleBench: A Visual Programming Benchmark in Turtle Geometry

    Sina Rismanchian, Yasaman Razeghi, Sameer Singh, Shayan Doroudi · PDF
  66. VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search

    David Brandfonbrener, Simon Henniger, Sibi Raja, Tarun Prasad, Chloe R Loughridge, Federico Cassano, Sabrina Ruixin Hu, Jianang Yang, William E. Byrd, Robert Zinkov, Nada Amin · PDF
  67. VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning

    Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance, Alessandro Sordoni, Siva Reddy, Aaron Courville, Nicolas Le Roux · PDF
  68. WILT: A Multi-turn, Memorization-Robust Inductive Logic Benchmark for LLMs

    Eryk Banatt, Jonathan Cheng, Tiffany Hwu · PDF
  69. Wu’s Method Boosts Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry

    Shiven Sinha, Ameya Prabhu, Ponnurangam Kumaraguru, Siddharth Bhat, Matthias Bethge · PDF