NeurIPS 2024 · Past · Math & reasoning · Large language models
The 4th Workshop on Mathematical Reasoning and AI
MATH-AI
Unverified seed entry. Some fields are estimates — confirm everything on the official website before planning a submission.
- Submission deadline: Sep 15, 2024, 23:59 AoE (UTC−12) (SEED estimate of the historical deadline; verify)
- Workshop day: Dec 14, 2024
- Submission portal: OpenReview
- Notes: SEED DATA: name/website/date taken from the OpenReview venue record; verify remaining fields.
Accepted papers (69)
Fetched from OpenReview (v2) on 2026-06-10.
- A Hessian View of Grokking in Mathematical Reasoning
- ABEL: Sample Efficient Online Reinforcement Learning for Neural Theorem Proving
- Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
- AI-Assisted Generation of Difficult Math Questions
- Attention Bias as an Inductive Bias: How to Teach Transformers Simple Arithmetic
- CAFA: Coding as Auto-Formulation Can Boost Large Language Models in Solving Linear Programming Problem
- CausalBench: A Comprehensive Benchmark for Evaluating Causal Reasoning Capabilities of Large Language Models
- Constraint-Based Synthetic Data Generation for LLM Mathematical Reasoning
- DafnyBench: A Benchmark for Formal Software Verification
- Decomposing Complex Visual Comprehension into Atomic Visual Skills for Vision Language Models
- DrawEduMath: Evaluating Vision Language Models with Expert-Annotated Students’ Hand-Drawn Math Images
- FEABench: Evaluating Language Models on Real World Physics Reasoning Ability
- Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning
- Formal Representation and Solution of Plane Geometric Problems
- Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically
- Generative Verifiers: Reward Modeling as Next-Token Prediction
- Genetic Curriculum Learning for Distribution Generalization on the Travelling Salesman Problem
- Give me a hint: Can LLMs take a hint to solve math problems?
- HARDMATH: A Benchmark Dataset for Challenging Problems in Applied Mathematics
- How Transformers Reason: A Case Study on a Synthetic Propositional Logic Problem
- Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
- InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning
- Interleaving Text and Number Embeddings to Solve Mathemathics Problems
- Intermediate Fine-Tuning Improves Mathematical Reasoning in Smaller Models
- Lean-STaR: Learning to Interleave Thinking and Proving
- Learning Elementary Cellular Automata with Transformers
- Learning Mathematical Rules with Large Language Models
- Library Learning Doesn’t: The Curious Case of the Single-Use “Library”
- LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
- Looped Transformers for Length Generalization
- Machine Learning meets Algebraic Combinatorics: A Suite of Datasets to Accelerate AI for Mathematics Research
- Machines and Mathematical Mutations: Using GNNs to Characterize Quiver Mutation Classes
- Math for AI: On the Generalization of Learning Mathematical Problem Solving
- Math2Sym: A System for Solving Elementary Problems via Large Language Models and Symbolic Solvers
- MathCAMPS: Fine-grained Synthesis of Mathematical Problems From Human Curricula
- MathDSL: A Domain-Specific Language for Concise Mathematical Solutions Via Program Synthesis
- miniCTX: Neural Theorem Proving with (Long-)Contexts
- Mining Math Conjectures from LLMs: A Pruning Approach
- Models Can and Should Embrace the Communicative Nature of Human-Generated Math
- NLIR: Natural Language Intermediate Representation for Mechanized Theorem Proving
- Not All LLM Reasoners Are Created Equal
- On Memorization of Large Language Models in Logical Reasoning
- OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data
- Probabilistic Proof State Compression: Optimizing LLM-Guided Formal Verification
- Proving Olympiad Algebraic Inequalities without Human Demonstrations
- Putnam-AXIOM: A Functional and Static Benchmark for Measuring Higher Level Mathematical Reasoning
- Reasoning and Tools for Forecasting
- Reasoning in Reasoning: A Hierarchical Framework for Better and Faster Neural Theorem Proving
- Regress, Don’t Guess – A Regression-like Loss on Number Tokens for Language Models
- Repeated examples help learn arithmetic
- SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation.
- SBSC: Step-by-Step Coding for Improving Mathematical Olympiad Performance
- Semantic Self-Consistency: Enhancing Language Model Reasoning via Semantic Weighting
- Skywork-Math: Data Scaling Laws for Mathematical Reasoning in LLMs — The Story Goes On
- Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
- STEM-PoM: Evaluating Language Models Math-Symbol Reasoning in Document Parsing
- Structure Based Dataset on SAT Solving with Graph Neural Networks
- Synchronizing Verbal Responses and Board Writing for Multimodal Math Instruction with LLMs
- Synthesizing Verified Mathematical Problems
- The Art of Knowing When to Stop: Analysis of Optimal Stopping in People and Machines
- The Karp Dataset
- Towards Faster Quantum Circuit Simulation Using Graph Decompositions, GNNs and Reinforcement Learning
- Transformers Can Do Arithmetic with the Right Embeddings
- Transformers to Predict the Applicability of Symbolic Integration Routines
- TurtleBench: A Visual Programming Benchmark in Turtle Geometry
- VerMCTS: Synthesizing Multi-Step Programs using a Verifier, a Large Language Model, and Tree Search
- VinePPO: Accurate Credit Assignment in RL for LLM Mathematical Reasoning
- WILT: A Multi-turn, Memorization-Robust Inductive Logic Benchmark for LLMs
- Wu’s Method Boosts Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry