ICML 2025 · Past · Math & reasoning · Large language models · Theory

2nd AI for Math Workshop

AI4Math

Unverified seed entry. Some fields are estimates — confirm everything on the official website before planning a submission.

Submission deadline: May 25, 2025, 23:59 AoE (UTC−12). SEED estimate of the historical deadline; verify.
Workshop day: Jul 18, 2025
Submission portal: OpenReview
Notes: SEED DATA. Name, website, and date were taken from the OpenReview venue record; verify the remaining fields.

Accepted papers (112)

Fetched from OpenReview (v2) on 2026-06-10.

  1. Q♯: Provably Optimal Distributional RL for LLM Post-Training

    Jin Peng Zhou, Kaiwen Wang, Jonathan Daniel Chang, Zhaolin Gao, Nathan Kallus, Kilian Q Weinberger, Kianté Brantley, Wen Sun · PDF
  2. A Comprehensive Evaluation of Contemporary Machine-Learning-Based Solvers for CO

    Shengyu Feng, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang · PDF
  3. A Compute-Matched Re-Evaluation of TroVE on MATH

    Tobias Sesterhenn, Ian Berlot-Attwell, Janis Zenkner, Christian Bartelt · PDF
  4. A Markov Categorical Framework for Language Modeling

    Yifan Zhang · PDF
  5. A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning

    Hiroshi Yoshihara, Taiki Yamaguchi, Yuichi Inoue · PDF
  6. A Survey on Large Language Model Reasoning Failures

    Peiyang Song, Pengrui Han, Noah Goodman · PDF
  7. Ada-R1: Hybrid CoT via Bi-Level Adaptive Reasoning Optimization

    Haotian Luo, Haiying He, Yibo Wang, Jinluan Yang, Rui Liu, Naiqiang Tan, Xiaochun Cao, Dacheng Tao, Li Shen · PDF
  8. Beyond Accuracy: A Policy Gradient Reweighting Approach for Pass@K Maximization in LLMs

    Sadegh Mahdavi, Muchen Li, Kaiwen Liu, Renjie Liao, Christos Thrampoulidis · PDF
  9. Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs

    Yi Hu, Shijia Kang, Haotong Yang, Haotian Xu, Muhan Zhang · PDF
  10. Boolformer: Symbolic Regression of Logic Functions with Transformers

    Stéphane d'Ascoli, Arthur Renard, Vassilis Papadopoulos, Samy Bengio, Joshua M. Susskind, Emmanuel Abbe · PDF
  11. Boosting LLM Reasoning via Spontaneous Self-Correction

    Xutong Zhao, Tengyu Xu, Xuewei Wang, Zhengxing Chen, Di Jin, Liang Tan, Yen-Ting Lin, Zishun Yu, Zhuokai Zhao, Yun He, Sinong Wang, Han Fang, Sarath Chandar, Chen Zhu · PDF
  12. Chain of Thought in Order: Discovering Learning-Friendly Orders for Arithmetic

    Yuta Sato, Kazuhiko Kawamoto, Hiroshi Kera · PDF
  13. Chain-of-Thought Reasoning for Math: Theoretical Foundation and Applications

    Jessica E. Liang · PDF
  14. CLEVER: A Curated Benchmark for Formally Verified Code Generation

    Amitayush Thakur, Jasper Lee, George Tsoukalas, Meghana Sistla, Matthew Zhao, Stefan Zetzsche, Greg Durrett, Yisong Yue, Swarat Chaudhuri · PDF
  15. COAST: Intelligent Time-Adaptive Neural Operators

    Zhikai Wu, Shiyang Zhang, Sizhuang He, Sifan Wang, Min Zhu, Anran Jiao, Lu Lu, David van Dijk · PDF
  16. CoDaPO: Confidence and Difficulty-Adaptive Policy Optimization for Post-Training Language Models

    Zhanke Zhou, Xiangyu Lu, Chentao Cao, Brando Miranda, Tongliang Liu, Bo Han, Sanmi Koyejo · PDF
  17. DeduCE: Deductive Consistency as a Framework to Evaluate LLM Reasoning

    Atharva Pandey, Kshitij Dubey, Rahul Sharma, Amit Sharma · PDF
  18. Direct Induction Proof Challenge: Evaluating Large Language Models on Deeply Nested Mathematical Induction

    Risako Ando, Koji Mineshima, Mitsuhiro Okada · PDF
  19. Discovering Hidden Algebraic Structures via Transformers with Rank-Aware Beam GRPO

    Jaeha Lee, Gio Huh, Ning Su, Tony Yue YU · PDF
  20. Discrete Feynman-Kac Correctors

    Mohsin Hasan, Marta Skreta, Alan Aspuru-Guzik, Yoshua Bengio, Kirill Neklyudov · PDF
  21. Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs

    Zhihe Yang, Xufang Luo, Zilong Wang, Dongqi Han, Zhiyuan He, Dongsheng Li, Yunjian Xu · PDF
  22. Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

    Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, Gao Huang · PDF
  23. Done Is Better than Perfect: Unlocking Efficient Reasoning by Structured Multi-Turn Decomposition

    Zihao Zeng, Xuyao Huang, Boxiu Li, Hao Zhang, Zhijie Deng · PDF
  24. DSR-Bench: Evaluating the Structural Reasoning Abilities of LLMs via Data Structures

    Yu He, Yingxi Li, Colin White, Ellen Vitercik · PDF
  25. e3: Learning to Explore Enables Extrapolation of Test-Time Compute for LLMs

    Amrith Setlur, Matthew Y. R. Yang, Charlie Victor Snell, Jeremiah Greer, Ian Wu, Virginia Smith, Max Simchowitz, Aviral Kumar · PDF
  26. Enhancing Graph Neural Network for Boolean Satisfiability Solving via Data Augmentation

    Yi Fu, Anthony Tompkins, Yang Song, Maurice Pagnucco · PDF
  27. Entropy-Based Adaptive Weighting for Self-Training

    Xiaoxuan Wang, Yihe Deng, Mingyu Derek Ma, Wei Wang · PDF
  28. EquivaMap: Leveraging LLMs for Automatic Equivalence Checking of Optimization Formulations

    Haotian Zhai, Connor Lawless, Ellen Vitercik, Liu Leqi · PDF
  29. FMC: Formalization of Natural Language Mathematical Competition Problems

    Jiaxuan Xie, Chengwu Liu, Ye Yuan, Siqi Li, Zhiping Xiao, Ming Zhang · PDF
  30. Forget Less, Solve More: Sequential Fine-Tuning with Adapter Shrinking for Math Word Problems

    Gauri Toshniwal, S R Balasundaram · PDF
  31. From Narrative to Formalism: A Case Study in the Origin of Molecular Translation System

    Dmitry Zubarev · PDF
  32. Generalized Tree Edit Distance (GTED): A Faithful Evaluation Metric for Statement Autoformalization

    Yuntian Liu, Tao Zhu, Xiaoyang Liu, Yu Chen, Liu ZhaoXuan, Guo qingfeng, Jiashuo Zhang, Kangjie Bao, Tao Luo · PDF
  33. GenSelect: A Generative Approach to Best-of-N

    Shubham Toshniwal, Ivan Sorokin, Aleksander Ficek, Ivan Moshkov, Igor Gitman · PDF
  34. Governing Equation Discovery from Data Based on Differential Invariants

    Lexiang Hu, Yikang Li, Zhouchen Lin · PDF
  35. Graph Neural Networks for Tensor Product Decompositions of Lie Algebra Representations

    Max Vargas, Helen Jenne, Davis Brown, Henry Kvinge · PDF
  36. How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark

    Minglai Yang, Ethan Huang, Liang Zhang, Mihai Surdeanu, William Yang Wang, Liangming Pan · PDF
  37. Inequality Ranking and Inference System (IRIS): Giving Mathematical Conjectures Numerical Value

    Jillian Eddy, Randy Davila, Jesus De Loera, Junwei Lu, Ethan X Fang, Zini Yang · PDF
  38. Inferring Loop Invariants for Program Verification: an Abductive Learning Perspective

    Daiyang Luan, Ming Li · PDF
  39. Instilling Parallel Reasoning into Language Models

    Matthew Macfarlane, Minseon Kim, Nebojsa Jojic, Weijia Xu, Lucas Caccia, Xingdi Yuan, Wanru Zhao, Zhengyan Shi, Alessandro Sordoni · PDF
  40. IntegralBench: Benchmarking LLMs with Definite Integral Problems

    Bintao Tang, Xin Yang, Yuhao Wang, Zixuan Qiu, Zimo Ji, Wenyuan Jiang · PDF
  41. InternLM2.5-StepProver: Advancing Automated Theorem Proving via Critic-Guided Search

    Zijian Wu, Suozhi Huang, Zhejian Zhou, Huaiyuan Ying, Zheng Yuan, Wenwei Zhang, Dahua Lin, Kai Chen · PDF
  42. KELPS: A Framework for Verified Multi-Language Autoformalization via Semantic-Syntactic Alignment

    Jiyao Zhang, Chengli Zhong, Hui Xu, Li Qige, Jiajia Tian, Yi Zhou · PDF
  43. Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models

    Zhanke Zhou, Zhaocheng Zhu, Xuan Li, Mikhail Galkin, Xiao Feng, Sanmi Koyejo, Jian Tang, Bo Han · PDF
  44. Lean Finder: Semantic Search for Mathlib That Understands User Intents

    Jialin Lu, Kye Emond, Weiran Sun, Wuyang Chen · PDF
  45. Lean Meets Theoretical Computer Science: Scalable Synthesis of Theorem Proving Challenges in Formal-Informal Pairs

    Terry Jingchen Zhang, Wenyuan Jiang, Rongchuan Liu, Yisong Wang, Ning Wang, Junran Yang, Yinya Huang, Mrinmaya Sachan · PDF
  46. LeanTree: Accelerating White-Box Proof Search with Factorized States in Lean 4

    Matěj Kripner, Michal Sustr, Milan Straka · PDF
  47. LeanTutor: A Lean-Verified Tutor for Mathematical Proofs

    Manooshree Patel, Rayna Bhattacharyya, Thomas Lu, Arnav Mehta, Niels Voss, Narges Norouzi, Gireeja Ranade · PDF
  48. Learning an Effective Premise Retrieval Model for Efficient Mathematical Formalization

    Yicheng Tao, Haotian Liu, Shanwen Wang, Hongteng Xu · PDF
  49. Learning Moderately Input-Sensitive Functions: A Case Study in QR Code Decoding

    Kazuki Yoda, Kazuhiko Kawamoto, Hiroshi Kera · PDF
  50. Learning to Discover Abstractions for LLM Reasoning

    Yuxiao Qu, Anikait Singh, Yoonho Lee, Amrith Setlur, Ruslan Salakhutdinov, Chelsea Finn, Aviral Kumar · PDF
  51. Learning to Reason without External Rewards

    Xuandong Zhao, Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song · PDF
  52. Learning to Solve Complex Problems via Dataset Decomposition

    Wanru Zhao, Lucas Caccia, Zhengyan Shi, Minseon Kim, Xingdi Yuan, Weijia Xu, Marc-Alexandre Côté, Alessandro Sordoni · PDF
  53. Learning-Guided Local Search for Asymmetric Traveling Salesman Problem

    Lejun Zhou, Yi Ju, Scott Moura · PDF
  54. Lemmanaid: Neuro-Symbolic Lemma Conjecturing

    Yousef Alhessi, Sólrún Halla Einarsdóttir, George Granberry, Emily First, Moa Johansson, Sorin Lerner, Nicholas Smallbone · PDF
  55. Let’s Try Again: Eliciting Multi-Turn Reasoning in Language Models via Simplistic Feedback

    Licheng Liu, Zihan Wang, Linjie Li, Chenwei Xu, Yiping Lu, Han Liu, Avirup Sil, Manling Li · PDF
  56. Machine Learning and LLM-Boost Symbolic Regression for Predicting ℚ-Gonality of Modular Curves

    Xu Zhuang, Yuxiang Yao, Po-Chu Hsu, Xiaokang Wang, Peikai Qi · PDF
  57. Majority of the Bests: Improving Best-of-N via Bootstrapping

    Amin Rakhsha, Tianyu Zhang, Kanika Madan, Amir-massoud Farahmand, Amir Khasahmadi · PDF
  58. MIRB: Mathematical Information Retrieval Benchmark

    Haocheng Ju, Bin Dong · PDF
  59. MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem

    Fan Liu, Zhe-Rui Yang, Cancheng Liu, Tianrui Song, Xiaofeng Gao, Hao Liu · PDF
  60. NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation

    Weiming Wu, Zi-Kang Wang, Jin Ye, Zhi Zhou, Yu-Feng Li, Lan-Zhe Guo · PDF
  61. NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation

    Xiangyan Liu, Jinjie Ni, Zijian Wu, Chao Du, Longxu Dou, Haonan Wang, Tianyu Pang, Michael Qizhe Shieh · PDF
  62. Not All Votes Count! Translated Program for Verification Improves Self-Consistency of Language Models for Math Reasoning

    Vernon Toh, Deepanway Ghosal, Soujanya Poria · PDF
  63. O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

    Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, Dacheng Tao · PDF
  64. OctoThinker: Mid-Training Incentivizes Reinforcement Learning Scaling

    Zengzhi Wang, Fan Zhou, Xuefeng Li, Pengfei Liu · PDF
  65. Omni-Thinker: Scaling Cross-Domain Generalization in LLMs via Multi-Task RL with Hybrid Rewards

    Derek Li, Jiaming Zhou, Amirreza Kazemi, Qianyi Sun, Abbas Ghaddar, Liheng Ma, Yu Luo, Dong Li, Jianye HAO, Yingxue Zhang · PDF
  66. On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization

    Wenlong Deng, Yi Ren, Muchen Li, Danica J. Sutherland, Xiaoxiao Li, Christos Thrampoulidis · PDF
  67. Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

    Taishi Nakamura, Satoki Ishikawa, Masaki Kawamura, Takumi Okamoto, Daisuke Nohara, Jun Suzuki, Rio Yokota · PDF
  68. Optimizing Anytime Reasoning via Budget Relative Policy Optimization

    Penghui Qi, Zichen Liu, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin · PDF
  69. PADRE: Pseudo-Likelihood based Alignment of Diffusion Language Models

    Shiv Shankar · PDF
  70. Physics-Constrained Symbolic Regression from Imagery

    Zhenyu Yu, MOHD YAMANI IDNA IDRIS, Pei Wang · PDF
  71. Plane Geometry Diagram Formalization via Vision-Language Models

    Xiaoteng Cui, Yi Liu · PDF
  72. POD-KAN-NO: a physically interpretable neural operator

    Yanyu Ke · PDF
  73. Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

    Ivo Petrov, Jasper Dekoninck, Lyuben Baltadzhiev, Maria Drencheva, Kristian Minchev, Mislav Balunovic, Nikola Jovanović, Martin Vechev · PDF
  74. ProofCompass: Enhancing Specialized Provers with LLM Guidance

    Nicolas Wischermann, Claudio Mayrink Verdun, Gabriel Poesia, Francesco Noseda · PDF
  75. ProofWala: Multilingual Proof Data Synthesis and Theorem-Proving

    Amitayush Thakur, George Tsoukalas, Greg Durrett, Swarat Chaudhuri · PDF
  76. Prover Agent: An Agent-based Framework for Formal Mathematical Proofs

    Kaito Baba, Chaoran Liu, Shuhei Kurita, Akiyoshi Sannai · PDF
  77. Putnam-AXIOM: A Functional and Static Benchmark

    Aryan Gulati, Brando Miranda, Eric Chen, Emily Xia, Kai Fronsdal, Bruno de Moraes Dumont, Sanmi Koyejo · PDF
  78. Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

    Kusha Sareen, Morgane M Moss, Alessandro Sordoni, Rishabh Agarwal, Arian Hosseini · PDF
  79. README: Rapid Equation Discovery with Multimodal Encoders

    Gregory Kang Ruey Lau, Yue Ran Kang, Zi-Yu Khoo, Apivich Hemachandra, Ruth Wan Theng Chew, Bryan Kian Hsiang Low · PDF
  80. RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics

    Jie Zhang, Cezara Petrui, Kristina Nikolić, Florian Tramèr · PDF
  81. Reinforcement Learning Teachers of Test Time Scaling

    Edoardo Cetin, Tianyu Zhao, Yujin Tang · PDF
  82. Reward Inside the Model: A Lightweight Hidden‑State Reward Model for LLM's Best-of-N sampling

    Jizhou Guo, Zhaomin Wu, Philip S. Yu · PDF
  83. Reward Under Attack: Evaluating the Sensitivity of Process Reward Models

    Udbhav Bamba, Heng Yang, Rishabh Tiwari, Kurt Keutzer, Amir Gholami · PDF
  84. RL‑QESA: Reinforcement‑Learning Quasi‑Equilibrium Simulated Annealing

    Ruichen Xu, Kai Li, Haochun Wang, Georgios Kementzidis, Wei Zhu, Yuefan Deng · PDF
  85. Scalable Best-of-N Selection for Large Language Models via Self-Certainty

    Zhewei Kang, Xuandong Zhao, Dawn Song · PDF
  86. Scaling Mathematical Reasoning through Data, Tools, and Generative Selection

    Ivan Moshkov, Darragh Hanley, Ivan Sorokin, Shubham Toshniwal, Christof Henkel, Benedikt Schifferer, Wei Du, Igor Gitman · PDF
  87. Scaling Natural-Language Graph-Based Test Time Compute for Automated Theorem Proving

    Tim Knappe, Vincent Li, Yule Fu, Kevin Han, Kevin Zhu · PDF
  88. Simple, Scalable Reasoning via Iterated Summarization

    Vivek Vajipey, Aditya Tadimeti, Justin Shen, Ben Prystawski, Michael Y. Li, Noah Goodman · PDF
  89. Small Models Struggle to Learn from Strong Reasoners

    Yuetai Li, Xiang Yue, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Bill Yuchen Lin, Bhaskar Ramasubramanian, Radha Poovendran · PDF
  90. Solving Inequality Proofs with Large Language Models

    Jiayi Sheng, Luna Lyu, Jikai Jin, Tony Xia, Alex Gu, James Zou, Pan Lu · PDF
  91. SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

    Kechen Li, Wenqi Zhu, Coralia Cartis, Tianbo Ji, Shiwei Liu · PDF
  92. Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO

    Peter Chen, Xiaopeng Li, Ziniu Li, Xi Chen, Tianyi Lin · PDF
  93. SPEED-RL: Faster Training of Reasoning Models via Online Curriculum Learning

    Ruiqi Zhang, Daman Arora, Song Mei, Andrea Zanette · PDF
  94. Target-Based Automated Conjecturing for Neural Theorem Proving

    Marco Dos Santos, Albert Q. Jiang, Wenda Li, Mateja Jamnik · PDF
  95. Temporal Sampling for Forgotten Reasoning in LLMs

    Yuetai Li, Zhangchen Xu, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Xiang Yue, Radha Poovendran · PDF
  96. The Challenge of Teaching Reasoning to LLMs Without RL or Distillation

    Wei Du, Branislav Kisacanin, George Armstrong, Shubham Toshniwal, Ivan Moshkov, Alexan Ayrapetyan, Sadegh Mahdavi, Dan Zhao, Shizhe Diao, Dragan Mašulović, Advaith Avadhanam, Max Wang, Shitij Govil, Sri Yanamandra, Mihir Tandon, Sriram Ananthakrishnan, Vedant Rathi, David Zhang, Joonseok Kang, Leon Luo, Titu Andreescu, Ashmit Dutta, Boris Ginsburg, Igor Gitman · PDF
  97. The Invisible Leash: Why RLVR May Not Escape Its Origin

    Fang Wu, Yejin Choi · PDF
  98. The Open Proof Corpus: A Large-Scale Study of LLM-Generated Mathematical Proofs

    Jasper Dekoninck, Ivo Petrov, Kristian Minchev, Miroslav Marinov, Maria Drencheva, Lyuba Konova, Milen Milenov Shumanov, Kaloyan Tsvetkov, Nikolay Drenchev, Lazar D. Todorov, Kalina Nikolova, Nikolay Georgiev, Vanesa Kalinkova, Margulan Ismoldayev, Mislav Balunovic, Martin Vechev · PDF
  99. The Surprising Effectiveness of Negative Reinforcement in LLM Reasoning

    Xinyu Zhu, Mengzhou Xia, Zhepei Wei, Wei-Lin Chen, Danqi Chen, Yu Meng · PDF
  100. TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning

    Zhangchen Xu, Yuetai Li, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Radha Poovendran · PDF
  101. Token Hidden Reward: Steering Exploration-Exploitation in GRPO Training

    Wenlong Deng, Yi Ren, Danica J. Sutherland, Christos Thrampoulidis, Xiaoxiao Li · PDF
  102. Towards Geometry Problem Solving in the Large Model Era: A Survey

    Yurui Zhao, Xiang Wang, Jiahong Liu, Irwin King, Zhitao Huang · PDF
  103. Training Language Models to Reason Efficiently

    Daman Arora, Andrea Zanette · PDF
  104. Understanding R1-Zero-Like Training: A Critical Perspective

    Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin · PDF
  105. Value-Guided Search for Efficient Chain-of-Thought Reasoning

    Kaiwen Wang, Jin Peng Zhou, Jonathan Daniel Chang, Zhaolin Gao, Nathan Kallus, Kianté Brantley, Wen Sun · PDF
  106. VeriBench: End-to-End Formal Verification Benchmark for AI Code Generation in Lean 4

    Brando Miranda, Zhanke Zhou, Allen Nie, Elyas Obbad, Leni Aniva, Kai Fronsdal, Weston Kirk, Dilara Soylu, Andrea Yu, Ying Li, Sanmi Koyejo · PDF
  107. VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers

    Jianing Qi, Hao Tang, Zhigang Zhu · PDF
  108. Verifying Prompt-Induced Search-Space Shifts in LLM-Generated Mathematical Functions

    Shervin Ardeshir · PDF
  109. Verina: Benchmarking Verifiable Code Generation

    Zhe Ye, Zhengxu Yan, Jingxuan He, Timothe Kasriel, Kaiyu Yang, Dawn Song · PDF
  110. Vision Language Models are Biased: Counting legs of an animal is surprisingly hard

    An Vo, Khai-Nguyen Nguyen, Mohammad Reza Taesiri, Vy Tuong Dang, Anh Totti Nguyen, Daeyoung Kim · PDF
  111. When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

    Nishad Singhi, Hritik Bansal, Arian Hosseini, Aditya Grover, Kai-Wei Chang, Marcus Rohrbach, Anna Rohrbach · PDF
  112. Which Data Attributes Stimulate Math and Code Reasoning? An Investigation via Influence Functions

    Siqi Kou, Qingyuan Tian, Hanwen Xu, Zihao Zeng, Zhijie Deng · PDF