ICLR 2026 Workshop: VerifAI-2: The Second Workshop on AI Verification in the Wild
- Submission deadline: Feb 9, 2026, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).
Accepted papers (39)
Fetched from OpenReview (v2) on 2026-06-10.
- A NASH EQUILIBRIUM FRAMEWORK FOR TRAINING FREE MULTIMODAL STEP VERIFICATION
- A Minimal Agent for Automated Theorem Proving
- Agentic Uncertainty Reveals Agentic Overconfidence
- Autoformalizing Memory Device Specifications with Agents
- Beaver: An Efficient Deterministic LLM Verifier
- Benchmarking Code Verification Strategies with LLMs-as-a-judge
- Beyond Self-Checking: Fragment-Level Verification Across Diverse LLMs
- Computational Arbitrage in AI Model Markets
- Conv-to-Bench: Evaluating Language Models Via User–Assistant Dialogues In Code Tasks
- DafnyLLM: Pre-training Dafny Representations with Large Language Models for Code Verification
- Do LLMs Game Formalization? Evaluating Faithfulness in Logical Reasoning
- Do LLMs Really Struggle at NL-FOL Translation? Revealing their Strengths via a Novel Benchmarking Strategy
- Enforcing Temporal Constraints for LLM Agents
- Epigraph-Guided Flow Matching for Safe and Performant Offline Reinforcement Learning
- Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
- Evaluating Agentic Optimization on Large Codebases
- FormalProofBench: Can Models Write Graduate Level Math Proofs That Are Formally Verified?
- Geometry of Reason: Probabilistic Spectral Verification for Mathematical Reasoning
- GLEAN: Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification
- Grounding Long-Horizon Agent Coordination in GUI Environments via Contract-based Structural Planning
- Identifying and Mitigating Reasoning Errors in VLM Verifiers via Activation Decomposition
- interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors
- ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads?
- Judging What We Cannot Solve: A Consequence-Based Approach for Oracle-Free Evaluation of Research-Level Math
- Learning from Synthetic Data Improves Multi-hop Reasoning
- Learning to Rank the Initial Branching Order of SAT Solvers
- Learning to Repair Lean Proofs from Compiler Feedback
- MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models
- NANOZK: Layerwise Zero-Knowledge Proofs for Verifiable Large Language Model Inference
- ProofRepairBench: Exploring Proof Repair in Lean
- Quokka: Accelerating Program Verification with LLMs via Invariant Synthesis
- ROC-n-reroll: How verifier imperfection affects test-time scaling
- RocqSmith: Can Automatic Optimization Forge Better Proof Agents?
- Scaling Evaluation-Time Compute with Reasoning Models as Process Evaluators
- SorryDB: Can AI Provers Complete Real-World Lean Theorems?
- The Dual Nature of Unlearning: Impact of Fact Salience and Model Fine-Tuning
- ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents
- Unified Operational Formalism for LLM-based Theorem-proving Systems
- Verification Limits Code LLM Training