
ICML 2025 Workshop on Assessing World Models

ICML 2025 World Models Workshop

Submission deadline
May 22, 2025, 11:59 UTC
Imported from OpenReview; check the workshop website for deadline extensions.
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich this entry (topics are keyword-guessed).

Accepted papers (36)

Fetched from OpenReview (v2) on 2026-06-10.

  1. Adapting Vision-Language Models for Evaluating World Models

    Mariya Hendriksen, Tabish Rashid, David Bignell, Raluca Georgescu, Abdelhak Lemkhenter, Katja Hofmann, Sam Devlin, Sarah Parisot · PDF
  2. APOD: Adaptive PDE-Observation Diffusion for Physics-Constrained Sampling

    Ruichen Xu, Haochun Wang, Georgios Kementzidis, Chenhao Si, Yuefan Deng · PDF
  3. Aquilon: Towards Building Multimodal Weather LLMs

    Sumanth Varambally, Veeramakali Vignesh Manivannan, Yasaman Jafari, Luyu Han, Zachary Novack, Zhirui Xia, Salva Rühling Cachay, Srikar Eranky, Ruijia Niu, Taylor Berg-Kirkpatrick, Duncan Watson-Parris, Yian Ma, Rose Yu · PDF
  4. Are LLM Belief Updates Consistent with Bayes’ Theorem?

    Sohaib Imran, Ihor Kendiukhov, Matthew Broerman, Aditya Thomas, Riccardo Campanella, Rob Lamb, Peter M. Atkinson · PDF
  5. Beyond Behavioural Evaluations for Assessing World Models

    Kola Ayonrinde · PDF
  6. Cards Against Contamination: TCG-Bench for Difficulty-Scalable Multilingual LLM Reasoning

    Sultan AlRashed, Jianghui Wang, Francesco Orabona · PDF
  7. Contextual Effects in LLM and Human Causal Reasoning

    Zach Studdiford, Gary Lupyan · PDF
  8. Deep Koopman operator framework for causal discovery in nonlinear dynamical systems

    Juan Nathaniel, Carla Roesch, Jatan Buch, Derek DeSantis, Adam Rupe, Kara D Lamb, Pierre Gentine · PDF
  9. Do Vision Language Models infer human intention without visual perspective-taking? Towards a scalable "One-Image-Probe-All" dataset

    Bingyang Wang, Yijiang Li, Qingyang Zhou, Hui Yi Leong, Tianwei Zhao, Letian Ye, Hokin Deng, Dezhi Luo, Nuno Vasconcelos · PDF
  10. Eliminating Discriminative Shortcuts in Multiple Choice Evaluations with Answer Matching

    Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping · PDF
  11. Evaluating Forecasting is More Difficult than Other LLM Evaluations

    Daniel Paleka, Shashwat Goel, Jonas Geiping, Florian Tramèr · PDF
  12. Evaluating Self-Orienting in Language and Reasoning Models

    Eric J Bigelow, Zergham Ahmed, Tomer Ullman · PDF
  13. FRED: Financial Retrieval-Enhanced Detection and Editing of Hallucinations in Language Models

    Likun Tan, Kuan-Wei Huang, Kevin Wu · PDF
  14. GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning

    Sahiti Yerramilli, Nilay Pande, Jayant Sravan Tamarapalli, Rynaa Grover · PDF
  15. HueManity: Probing Fine-Grained Visual Perception in MLLMs

    Rynaa Grover, Jayant Sravan Tamarapalli, Sahiti Yerramilli, Nilay Pande · PDF
  16. I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2

    Oliver McLaughlin, Jack Merullo, Arjun Khurana · PDF
  17. Let’s Simulate Frame-by-Frame: In-Context Physical Simulations with Vision-Language Models

    YingQiao Wang, Eric J Bigelow, Tomer Ullman · PDF
  18. Leveraging the Sequential Nature of Language for Interpretability

    Usha Bhalla, Alex Oesterling, Claudio Mayrink Verdun, Flavio Calmon, Himabindu Lakkaraju · PDF
  19. Measuring Belief Updates in Curious Agents

    Joschka Strüber, Ilze Amanda Auzina, Shashwat Goel, Susanne Keller, Jonas Geiping, Ameya Prabhu, Matthias Bethge · PDF
  20. Measuring Rule-Following in Language Models

    · PDF
  21. MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models

    Vanya Cohen, Ray Mooney · PDF
  22. Newfluence: Boosting Model Interpretability and Understanding in High Dimensions

    Haolin Zou, Arnab Auddy, Yongchan Kwon, Kamiar Rahnama Rad, Arian Maleki · PDF
  23. On the Emergence of "Useless" Features in Next Token Predictors

    Mark Rofin, Jalal Naghiyev, Michael Hahn · PDF
  24. Open World Scene Graph Generation using Vision Language Models

    · PDF
  25. Probing the Limits of Mathematical World Models in LLMs

    Henry Kvinge, Elizabeth Coda, Eric Yeats, Davis Brown, John Buckheit, Sarah McGuire Scullen, Brendan Kennedy, Loc Truong, William Kay, Cliff Joslyn, Tegan Emerson, Michael J. Henry, John Anthony Emanuello · PDF
  26. ReviseQA: A Benchmark for Belief Revision in Multi-Turn Logical Reasoning

    Chadi Helwe, Sultan AlRashed, Francesco Orabona · PDF
  27. RMA: Reward Model Alignment with Human preference

    Ashish Gupta, Manjunatha Naik MC · PDF
  28. Testing LLM Understanding of Scientific Literature through Expert-Driven Question Answering: Insights from High-Temperature Superconductivity

    Haoyu Guo, Maria Tikhanovskaya, Paul Raccuglia, Alexey Vlaskin, Chris Co, Daniel J. Liebling, Scott Ellsworth, Matthew Abraham, Elizabeth Dorfman, Peter Armitage, John Tranquada, Senthil Todadri, Antoine Georges, Subir Sachdev, Steven Kivelson, Brad Ramshaw, Dominik Kiese, Chunhan Feng, Olivier Gingras, Vadim Oganesyan, Michael Brenner, Subhashini Venugopalan, Eun-Ah Kim · PDF
  29. Tracking World States with Language Models: State-Based Evaluation Using Chess

    Romain Harang, Jason Naradowsky, Yaswitha Gujju, Yusuke Miyao · PDF
  30. Unbounded Memory and Consistent Imagination via Unified Diffusion–SSM World Models

    Jia-Hua Lee, Bor-Jiun Lin, Wei-Fang Sun, Chun-Yi Lee · PDF
  31. Uncertainty Quantification for LLM-Based Survey Simulations

    Chengpiao Huang, Yuhang Wu, Kaizheng Wang · PDF
  32. Understanding Large Language Models' Ability on Interdisciplinary Research

    Yuanhao Shen, Daniel Xavier de Sousa, Ricardo Marçal de Andrade Nascimento, Ali Asad, Hongyu Guo, Xiaodan Zhu · PDF
  33. Virtue Semantics: Probing the Consistency of Moral Values of Large Language Models

    Em Smullen, Srihari Thirumaligai, Anna Leshinskaya · PDF
  34. What if Othello-Playing Language Models Could See?

    Xinyi Chen, Yifei Yuan, Jiaang Li, Serge Belongie, Maarten de Rijke, Anders Søgaard · PDF
  35. World Models and Consistent Mistakes in LLMs

    Christopher Wolfram, Aaron Schein · PDF
  36. WorldPrediction: A Benchmark for High-level World Modeling and Long-horizon Procedural Planning

    Delong Chen, Willy Chung, Yejin Bang, Ziwei Ji, Pascale Fung · PDF