ICLR 2026 Workshop: VerifAI-2: The Second Workshop on AI Verification in the Wild

Submission deadline: Feb 9, 2026, 11:59 UTC (imported from OpenReview; check the workshop website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).

Accepted papers (39)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Nash Equilibrium Framework for Training-Free Multimodal Step Verification

    Rohit Sinha, Kunal Tilaganji, Tanuja Ganu, Nagarajan Natarajan, Amit Sharma, Vineeth N. Balasubramanian
  2. A Minimal Agent for Automated Theorem Proving

    Borja Requena, Austin Letson, Krystian Nowakowski, Izan Beltran Ferreiro, Leopoldo Sarra
  3. Agentic Uncertainty Reveals Agentic Overconfidence

    Jean Kaddour, Srijan Patel, Gbetondji Jean-Sebastien Dovonon, Leo Richter, Pasquale Minervini, Matt J. Kusner
  4. Autoformalizing Memory Device Specifications with Agents

    Jan Ole Ernst, Dmitri Michelangelo Saberi, Thomas Zimmermann, Derek Christ, Rajath Salegame, Suhaas M Bhat, Stanislav Levental, Thomas Dybdahl Ahle, Matthias Jung
  5. Beaver: An Efficient Deterministic LLM Verifier

    Tarun Suresh, Nalin Wadhwa, Debangshu Banerjee, Gagandeep Singh
  6. Benchmarking Code Verification Strategies with LLMs-as-a-judge

    Arnav Kumar Jain, Justin T Chiu, Tom Sherborne, Matthias Gallé
  7. Beyond Self-Checking: Fragment-Level Verification Across Diverse LLMs

    Ken Mueller, Arihant Choudhary, David Perez, Scott Mueller
  8. Computational Arbitrage in AI Model Markets

    Ricardo Olmedo, Bernhard Schölkopf, Moritz Hardt
  9. Conv-to-Bench: Evaluating Language Models Via User–Assistant Dialogues In Code Tasks

    Victor Moreli dos Santos, André Cerqueira Castro, Samuel Lopes de Souza Toledo, Bruno Moreira Lavalli Calura, Lisandra Cristina de Moura Menezes, Raul César Reis Mata, Telma Woerle de Lima Soares, Bryan Lincoln Marques de Oliveira
  10. DafnyLLM: Pre-training Dafny Representations with Large Language Models for Code Verification

    Shentong Mo
  11. Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning

    Kyuhee Kim, Auguste Poiroux, Antoine Bosselut
  12. Do LLMs Really Struggle at NL-FOL Translation? Revealing their Strengths via a Novel Benchmarking Strategy

    Andrea Brunello, Luca Geatti, Michele Mignani, Angelo Montanari, Nicola Saccomanno
  13. Enforcing Temporal Constraints for LLM Agents

    Adharsh Kamath, Sishen Zhang, Changming Xu, Shubham Ugare, Gagandeep Singh, Sasa Misailovic
  14. Epigraph-Guided Flow Matching for Safe and Performant Offline Reinforcement Learning

    Manan Tayal, Mumuksh Tayal
  15. Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence

    Bingji Yi, Qiyuan Liu, Yuwei Cheng, Haifeng Xu
  16. Evaluating Agentic Optimization on Large Codebases

    Atharva Sehgal, James Hou, Akanksha Sarkar, Ishaan Mantripragada, Swarat Chaudhuri, Jennifer J. Sun, Yisong Yue
  17. FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?

    Nikil Ravi, Kexing Ying, Vasilii Nesterov, Rayan Krishnan, Elif Uskuplu, Bingyu Xia, Janitha Aswedige, Langston Nashold
  18. Geometry of Reason: Probabilistic Spectral Verification for Mathematical Reasoning

    Valentin Noël
  19. GLEAN: Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification

    Yichi Zhang, Nabeel Seedat, Yinpeng Dong, Peng Cui, Jun Zhu, Mihaela van der Schaar
  20. Grounding Long-Horizon Agent Coordination in GUI Environments via Contract-based Structural Planning

    Hao Yu, Weiming Li, Yueming Lyu, Jie-Jing Shao, Yulei Sui, Ivor Tsang, Haiyan Yin
  21. Identifying and Mitigating Reasoning Errors in VLM Verifiers via Activation Decomposition

    Joonhyuk Cha, Moises Andrade, Zsolt Kira
  22. interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors

    Vishak K Bhat, Prateek Chanda, Ashmit Khandelwal, Maitreyi Swaroop, Subbarao Kambhampati, Vineeth N. Balasubramanian, Nagarajan Natarajan, Amit Sharma
  23. ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads?

    Ayush Nangia, Shikhar Mishra, Aman Gokrani, Paras Chopra
  24. Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math

    Guijin Son, Donghun Yang, Hitesh Laxmichand Patel, Hyunwoo Ko, Amit Agarwal, Sunghee Ahn, Kyong-Ha Lee, Youngjae Yu
  25. Learning from Synthetic Data Improves Multi-hop Reasoning

    Anmol Kabra, Yilun Yin, Albert Gong, Kamilė Stankevičiūtė, Dongyoung Go, Johann Lee, Katie Z Luo, Carla P Gomes, Kilian Q Weinberger
  26. Learning to Rank the Initial Branching Order of SAT Solvers

    Arvid Eriksson, Gabriel Poesia, Roman Bresson, Karl Henrik Johansson, David Broman
  27. Learning to Repair Lean Proofs from Compiler Feedback

    Evan Wang, Simon Chess, Daniel Lee, Siyuan Ge, Ajit Mallavarapu, Vasily Ilin
  28. MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models

    Guijin Son, Dongkeun Yoon, Juyoung Suk, Javier Aula-Blasco, Mano Aslan, Kim Vu, Shayekh Bin Islam, Jaume Prats-Cristià, Lucía Tormo-Bañuelos, Seungone Kim
  29. NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference

    Zhaohui Geoffrey Wang
  30. ProofRepairBench: Exploring Proof Repair in Lean

    Manooshree Patel, Bartosz Piotrowski, Leopold Haller, Hugh James Leather
  31. Quokka: Accelerating Program Verification with LLMs via Invariant Synthesis

    Anjiang Wei, Tarun Suresh, Tianran Sun, Haoze Wu, Ke Wang, Alex Aiken
  32. ROC-n-reroll: How verifier imperfection affects test-time scaling

    Florian E. Dorner, Yatong Chen, André F Cruz, Fanny Yang
  33. RocqSmith: Can Automatic Optimization Forge Better Proof Agents?

    Andrei Kozyrev, Nikita Khramov, Denis Lochmelis, Valerio Morelli, Gleb Solovev, Anton Podkopaev
  34. Scaling Evaluation-Time Compute with Reasoning Models as Process Evaluators

    Seungone Kim, Ian Wu, Jinu Lee, Xiang Yue, Seongyun Lee, Minkyeong Moon, Carolin Lawrence, Kiril Gashteovski, Julia Hockenmaier, Graham Neubig, Sean Welleck
  35. SorryDB: Can AI Provers Complete Real-World Lean Theorems?

    Austin Letson, Leopoldo Sarra, Auguste Poiroux, Oliver Dressler, Paul Lezeau, Dhyan Aranha, Frederick Pu, Aaron Hill, Miguel Corredera Hidalgo, Julian Berman, George Tsoukalas, Lenny Taelman
  36. The Dual Nature of Unlearning: Impact of Fact Salience and Model Fine-Tuning

    Anna Borisiuk, Andrey Savchenko, Alexander Panchenko, Elena Tutubalina
  37. ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents

    Dawei Li, Yuguang Yao, Zhen Tan, Huan Liu, Ruocheng Guo
  38. Unified Operational Formalism for LLM-based Theorem-proving Systems

    Avaljot Singh, Shaurya Gomber, Yasmin Sarita, José Meseguer, Gagandeep Singh
  39. Verification Limits Code LLM Training

    Srishti Gureja, Marzieh Fadaee, Sara Hooker, Matthias Gallé, Jingyi He, Elena Tommasone