ICML 2025 Workshop on Assessing World Models
ICML 2025 World Models Workshop
- Submission deadline
- May 22, 2025, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal
- OpenReview
- Notes
- Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (36)
Fetched from OpenReview (v2) on 2026-06-10.
- Adapting Vision-Language Models for Evaluating World Models
- APOD: Adaptive PDE-Observation Diffusion for Physics-Constrained Sampling
- Aquilon: Towards Building Multimodal Weather LLMs
- Are LLM Belief Updates Consistent with Bayes’ Theorem?
- Beyond Behavioural Evaluations for Assessing World Models
- Cards Against Contamination: TCG-Bench for Difficulty-Scalable Multilingual LLM Reasoning
- Contextual Effects in LLM and Human Causal Reasoning
- Deep Koopman operator framework for causal discovery in nonlinear dynamical systems
- Do Vision Language Models infer human intention without visual perspective-taking? Towards a scalable "One-Image-Probe-All" dataset
- Eliminating Discriminative Shortcuts in Multiple Choice Evaluations with Answer Matching
- Evaluating Forecasting is More Difficult than Other LLM Evaluations
- Evaluating Self-Orienting in Language and Reasoning Models
- FRED: Financial Retrieval-Enhanced Detection and Editing of Hallucinations in Language Models
- GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning
- HueManity: Probing Fine-Grained Visual Perception in MLLMs
- I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2
- Let’s Simulate Frame-by-Frame: In-Context Physical Simulations with Vision-Language Models
- Leveraging the Sequential Nature of Language for Interpretability
- Measuring Belief Updates in Curious Agents
- Measuring Rule-Following in Language Models
- MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models
- Newfluence: Boosting Model Interpretability and Understanding in High Dimensions
- On the Emergence of "Useless" Features in Next Token Predictors
- Open World Scene Graph Generation using Vision Language Models
- Probing the Limits of Mathematical World Models in LLMs
- ReviseQA: A Benchmark for Belief Revision in Multi-Turn Logical Reasoning
- RMA: Reward Model Alignment with Human preference
- Testing LLM Understanding of Scientific Literature through Expert-Driven Question Answering: Insights from High-Temperature Superconductivity
- Tracking World States with Language Models: State-Based Evaluation Using Chess
- Unbounded Memory and Consistent Imagination via Unified Diffusion–SSM World Models
- Uncertainty Quantification for LLM-Based Survey Simulations
- Understanding Large Language Models' Ability on Interdisciplinary Research
- Virtue Semantics: Probing the Consistency of Moral Values of Large Language Models
- What if Othello-Playing Language Models Could See?
- World Models and Consistent Mistakes in LLMs
- WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning