NeurIPS 2024 · Past · Math & reasoning · Large language models

The 4th Workshop on Mathematical Reasoning and AI

Name: The 4th Workshop on Mathematical Reasoning and AI (MATH-AI)
Start: Dec 14, 2024

MATH-AI


Unverified seed entry. Some fields are estimates — confirm everything on the official website before planning a submission.

Submission deadline: Sep 16, 2024, 11:59 UTC
SEED estimate of the historical deadline — verify
Workshop day: Dec 14, 2024
Submission portal: OpenReview
Notes: SEED DATA — name/website/date taken from the OpenReview venue record; verify remaining fields.

Accepted papers (69)

Fetched from OpenReview (v2) on 2026-06-10.

A Hessian View of Grokking in Mathematical Reasoning
Zhenshuo Zhang, Jerry Weihong Liu, Christopher Re, Hongyang R. Zhang · PDF
ABEL: Sample Efficient Online Reinforcement Learning for Neural Theorem Proving
Fabian Gloeckle, Jannis Limperg, Gabriel Synnaeve, Amaury Hayat · PDF
Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
Huajian Xin, Daya Guo, Zhihong Shao, Z.Z. Ren, Qihao Zhu, Bo Liu, Chong Ruan, Wenda Li, Xiaodan Liang · PDF
AI-Assisted Generation of Difficult Math Questions
Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Jiatong Yu, Yinghui He, Nan Rosemary Ke, Michael Curtis Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal · PDF
Attention Bias as an Inductive Bias: How to Teach Transformers Simple Arithmetic
Shaoxiong Duan, Yining Shi, Wei Xu · PDF
CAFA: Coding as Auto-Formulation Can Boost Large Language Models in Solving Linear Programming Problem
Haoxuan Deng, Bohao Zheng, Yirui Jiang, Trung Hieu Tran · PDF
CausalBench: A Comprehensive Benchmark for Evaluating Causal Reasoning Capabilities of Large Language Models
ZEYU WANG · PDF
Constraint-Based Synthetic Data Generation for LLM Mathematical Reasoning
Timofey Fedoseev, Dimitar Iliev Dimitrov, Timon Gehr, Martin Vechev · PDF
DafnyBench: A Benchmark for Formal Software Verification
Chloe R Loughridge, Qinyi Sun, Seth Ahrenbach, Federico Cassano, Chuyue Sun, Ying Sheng, Anish Mudide, Md Rakib Hossain Misu, Nada Amin, Max Tegmark · PDF
Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models
Hyunsik Chae, Seungwoo Yoon, Chloe Yewon Chun, Gyehun Go, Yongin Cho, Gyeongmin Lee, Ernest K. Ryu · PDF
DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images
Sami Baral, Li Lucy, Ryan Knight, Alice Ng, Luca Soldaini, Neil Heffernan, Kyle Lo · PDF
FEABench: Evaluating Language Models on Real World Physics Reasoning Ability
Nayantara Mudur, Hao Cui, Subhashini Venugopalan, Paul Raccuglia, Michael Brenner, Peter Christian Norgaard · PDF
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
Yihe Deng, Paul Mineiro · PDF
Formal Representation and Solution of Plane Geometric Problems
Xiaokai Zhang, Na Zhu, Cheng Qin, Yang Li, Zhenbing Zeng, Tuo Leng · PDF
Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically
Kefan Dong, Arvind V. Mahankali, Tengyu Ma · PDF
Generative Verifiers: Reward Modeling as Next-Token Prediction
Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal · PDF
Genetic Curriculum Learning for Distribution Generalization on the Travelling Salesman Problem
Michael Li, Christopher Haberland, Natasha Jaques · PDF
Give me a hint: Can LLMs take a hint to solve math problems?
Vansh Agrawal, Pratham Singla, Amitoj Singh Miglani, Shivank Garg, Ayush Mangal · PDF
HARDMATH: A Benchmark Dataset for Challenging Problems in Applied Mathematics
Jingxuan Fan, Sarah Martinson, Erik Y. Wang, Kaylie Hausknecht, Jonah Brenner, Danxian Liu, Nianli Peng, Corey Wang, Michael Brenner · PDF
How Transformers Reason: A Case Study on a Synthetic Propositional Logic Problem
Guan Zhe Hong, Nishanth Dikkala, Enming Luo, Cyrus Rashtchian, Xin Wang, Rina Panigrahy · PDF
Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang · PDF
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
Xiaotian Han, Yiren Jian, Xuefeng Hu, Haogeng Liu, Yiqi Wang, Qihang Fan, Yuang Ai, Huaibo Huang, Ran He, Zhenheng Yang, Quanzeng You · PDF
Interleaving Text and Number Embeddings to Solve Mathemathics Problems
Marvin Alberts, Gianmarco Gabrieli, Irina Espejo Morales · PDF
Intermediate Fine-Tuning Improves Mathematical Reasoning in Smaller Models
Neeraj Gangwar, Suma Bhat, Nickvash Kani · PDF
Lean-STaR: Learning to Interleave Thinking and Proving
Haohan Lin, Zhiqing Sun, Sean Welleck, Yiming Yang · PDF
Learning Elementary Cellular Automata with Transformers
Mikhail Burtsev · PDF
Learning Mathematical Rules with Large Language Models
Antoine Gorceix, Bastien Le Chenadec, Ahmad Rammal, Nelson Vadori, Manuela Veloso · PDF
Library Learning Doesn’t: The Curious Case of the Single-Use “Library”
Ian Berlot-Attwell, Frank Rudzicz, Xujie Si · PDF
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
Pingchuan Ma, Tsun-Hsuan Wang, Minghao Guo, Zhiqing Sun, Joshua B. Tenenbaum, Daniela Rus, Chuang Gan, Wojciech Matusik · PDF
Looped Transformers for Length Generalization
Ying Fan, Yilun Du, Kannan Ramchandran, Kangwook Lee · PDF
Machine Learning meets Algebraic Combinatorics: A Suite of Datasets to Accelerate AI for Mathematics Research
Herman Chau, Helen Jenne, Davis Brown, Jesse He, Mark Raugas, Sara C. Billey, Henry Kvinge · PDF
Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes
Jesse He, Helen Jenne, Herman Chau, Davis Brown, Mark Raugas, Sara C. Billey, Henry Kvinge · PDF
Math for AI: On the Generalization of Learning Mathematical Problem Solving
Ruochen Zhou, Minrui Xu, Shiqi Chen, Junteng Liu, Yunqi Li, Xinxin Lin, Zhengyu Chen, Junxian He · PDF
Math2Sym: A System for Solving Elementary Problems via Large Language Models and Symbolic Solvers
Minh Phu Nguyen, Minh Phuong Pham, Man Ngo, Kha Tuan Minh · PDF
MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula
Shubhra Mishra, Gabriel Poesia, Belinda Mo, Noah Goodman · PDF
MathDSL: A Domain-Specific Language for Concise Mathematical Solutions Via Program Synthesis
Sagnik Anupam, Maddy Bowers, Omar Costilla Reyes, Armando Solar-Lezama · PDF
miniCTX: Neural Theorem Proving with (Long-)Contexts
Jiewen Hu, Thomas Zhu, Sean Welleck · PDF
Mining Math Conjectures from LLMs: A Pruning Approach
Jake Chuharski, Elias Rojas Collins, Mark Meringolo · PDF
Models Can and Should Embrace the Communicative Nature of Human-Generated Math
Sasha Boguraev, Ben Lipkin, Leonie Weissweiler, Kyle Mahowald · PDF
NLIR: Natural Language Intermediate Representation for Mechanized Theorem Proving
Laetitia Teodorescu, Guillaume Baudart, Emilio Jesús Gallego Arias, Marc Lelarge · PDF
Not All LLM Reasoners Are Created Equal
Arian Hosseini, Alessandro Sordoni, Daniel Kenji Toyama, Aaron Courville, Rishabh Agarwal · PDF
On Memorization of Large Language Models in Logical Reasoning
Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar · PDF
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
Shubham Toshniwal, Wei Du, Ivan Moshkov, Branislav Kisacanin, Alexan Ayrapetyan, Igor Gitman · PDF
Probabilistic Proof State Compression: Optimizing LLM-Guided Formal Verification
Noor Rahim, Ali Abdul Rahim · PDF
Proving Olympiad Algebraic Inequalities without Human Demonstrations
Chenrui Wei, Mengzhou Sun, Wei Wang · PDF
Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical Reasoning
Aryan Gulati, Brando Miranda, Eric Chen, Emily Xia, Kai Fronsdal, Bruno de Moraes Dumont, Sanmi Koyejo · PDF
Reasoning and Tools for Forecasting
Elvis Hsieh, Preston Fu, Jonathan Chen · PDF
Reasoning in Reasoning: A Hierarchical Framework for Better and Faster Neural Theorem Proving
Ziyu Ye, Jiacheng Chen, Jonathan Light, Yifei Wang, Jiankai Sun, Mac Schwager, Philip Torr, Guohao Li, Yuxin Chen, Kaiyu Yang, Yisong Yue, Ziniu Hu · PDF
Regress, Don’t Guess – A Regression-like Loss on Number Tokens for Language Models
Jonas Zausinger, Lars Pennig, Kacper Chlodny, Vincent Limbach, Anna Ketteler, Thorben Prein, Vishwa Mohan Singh, Michael Danziger, Jannis Born · PDF
Repeated examples help learn arithmetic
Francois Charton, Julia Kempe · PDF
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation.
Prakhar Dixit, Tim Oates · PDF
SBSC: Step-by-Step Coding for Improving Mathematical Olympiad Performance
Kunal Singh, Ankan Biswas, Sayandeep Bhowmick, Pradeep Moturi · PDF
Semantic Self-Consistency: Enhancing Language Model Reasoning via Semantic Weighting
Tim Knappe, Ryan Luo Li, Ayush Chauhan, Kaylee Chhua, Kevin Zhu, Sean O'Brien · PDF
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in LLMs — The Story Goes On
Liang Zeng, Liangjun Zhong · PDF
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
Hritik Bansal, Arian Hosseini, Rishabh Agarwal, Vinh Q. Tran, Mehran Kazemi · PDF
STEM-PoM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing
Jiaru Zou, Qing Wang, Pratyush Thakur, Nickvash Kani · PDF
Structure Based Dataset on SAT Solving with Graph Neural Networks
Yi Fu, Anthony Tompkins, Yang Song, Maurice Pagnucco · PDF
Synchronizing Verbal Responses and Board Writing for Multimodal Math Instruction with LLMs
Yuan-Hao Jiang, Ruijia Li, Yuang Wei, Rui Jia, Xiaobao Shao, Hanglei Hu, Bo Jiang · PDF
Synthesizing Verified Mathematical Problems
Xuefeng Li, Yanheng He, Pengfei Liu · PDF
The Art of Knowing When to Stop: Analysis of Optimal Stopping in People and Machines
Fukun Evelene Zhang, Bonan Zhao · PDF
The Karp Dataset
Mason DiCicco, Eamon Worden, Daniel Reichman, Neil Heffernan, Conner Olsen, Nikhil Gangaram · PDF
Towards Faster Quantum Circuit Simulation Using Graph Decompositions, GNNs and Reinforcement Learning
Alexander Koziell-Pipe, Richie Yeung, Matthew Sutcliffe · PDF
Transformers Can Do Arithmetic with the Right Embeddings
Sean Michael McLeish, Arpit Bansal, Alex Stein, Neel Jain, John Kirchenbauer, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Jonas Geiping, Avi Schwarzschild, Tom Goldstein · PDF
Transformers to Predict the Applicability of Symbolic Integration Routines
Rashid Barket, Uzma Shafiq, Matthew England, Juergen Gerhard · PDF
TurtleBench: A Visual Programming Benchmark in Turtle Geometry
Sina Rismanchian, Yasaman Razeghi, Sameer Singh, Shayan Doroudi · PDF
VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search
David Brandfonbrener, Simon Henniger, Sibi Raja, Tarun Prasad, Chloe R Loughridge, Federico Cassano, Sabrina Ruixin Hu, Jianang Yang, William E. Byrd, Robert Zinkov, Nada Amin · PDF
VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning
Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance, Alessandro Sordoni, Siva Reddy, Aaron Courville, Nicolas Le Roux · PDF
WILT: A Multi-turn, Memorization-Robust Inductive Logic Benchmark for LLMs
Eryk Banatt, Jonathan Cheng, Tiffany Hwu · PDF
Wu’s Method Boosts Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry
Shiven Sinha, Ameya Prabhu, Ponnurangam Kumaraguru, Siddharth Bhat, Matthias Bethge · PDF
