NeurIPS 2025 · Past · Math & reasoning · Agents

NeurIPS 2025 Workshop on Bridging Language, Agent, and World Models for Reasoning and Planning

LAW

Submission deadline: Sep 21, 2025, 11:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (111)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments

    Manuel Cherep, Chengtian Ma, Abigail Xu, Maya Shaked, Patricia Maes, Nikhil Singh · PDF
  2. ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language

    Aly Lidayan, Jakob Brandt Bjorner, Satvik Golechha, Alane Suhr · PDF
  3. Acting Less is Reasoning More! Teaching Language Model to Act Efficiently

    Hongru WANG, Cheng Qian, Wanjun Zhong, Xiusi Chen, Jiahao Qiu, Shijue Huang, Bowen Jin, Mengdi Wang, Kam-Fai Wong, Heng Ji · PDF
  4. Adapting Vision-Language Models for Evaluating World Models

    Mariya Hendriksen, Tabish Rashid, David Bignell, Raluca Stevenson, Abdelhak Lemkhenter, Katja Hofmann, Sam Devlin, Sarah Parisot · PDF
  5. AgentChangeBench: A Multi-Dimensional Evaluation Framework for Goal-Shift Robustness in Conversational AI

    Manik Rana, Calissa Man, Jeffrey Paine, Anotida Expected Msiiwa, Ahan M R, Kevin Zhu, Vasu Sharma, Sunishchal Dev · PDF
  6. Agentic Design Patterns: A System-Theoretic Framework

    Dung Dao, Quy Minh Le, Hoang Thanh Lam, Duc-Trong Le, Quoc-Viet Pham, Barry O'Sullivan, Hoang D. Nguyen · PDF
  7. AgentMaster: A Modular Multi-Agent Framework with A2A and MCP Protocols via a Unified Conversational Interface

    Callie C. Liao, Duoduo Liao, Sai Surya Gadiraju · PDF
  8. AI Agents for Web Testing: A Case Study in the Wild

    Naimeng Ye, Xiao Yu, Ruize Xu, Tianyi Peng, Zhou Yu · PDF
  9. AirTrafficGen: Configurable Air Traffic Scenario Generation with Large Language Models

    Dewi Sid William Gould, George De Ath, Ben Carvell, Nick Pepper · PDF
  10. Anemoi: A Semi-Centralized Multi-agent System Based on Agent-to-Agent Communication MCP server from Coral Protocol

    Xinxing Ren, Caelum Forder, Qianbo Zang, Ahsen Tahir, Roman J. Georgio, Suman Deb, Peter Carroll, Önder GÜRCAN, Zekun Guo · PDF
  11. Are LLMs Generalist Hanabi Agents?

    Mahesh Ramesh, Aswinkumar Ramkumar, Pavan Thodima, Kaousheik Jayakumar, Aniket Rege · PDF
  12. Assessing Adaptive World Models in Machines with Novel Games

    Lance Ying, Katherine M. Collins, Prafull Sharma, Cédric Colas, Kaiya Ivy Zhao, Adrian Weller, Zenna Tavares, Phillip Isola, Samuel J. Gershman, Jacob Andreas, Thomas L. Griffiths, Francois Chollet, Kelsey R Allen, Joshua B. Tenenbaum · PDF
  13. ATLAS: Actor-Critic Task-completion with Look-ahead Action Simulation

    Jiali Cheng, Anjishnu Kumar, Rishi Rajasekaran, G Roshan Lal, Hani Ramezani, Oleg Rokhlenko, Omar Zia Khan, Sunny Chiu-Webster, Gang Hua, Hadi Amiri · PDF
  14. AUGUSTUS: An LLM-Driven Multimodal Agent System with Contextualized User Memory

    Jitesh Jain, Shubham Maheshwari, Ning Yu, Wen-mei Hwu, Humphrey Shi · PDF
  15. Automated Reward Design for Gran Turismo

    Michel Ma, Takuma Seno, Kaushik Subramanian, Peter R. Wurman, Peter Stone, Craig Sherstan · PDF
  16. Avi: A 3D Vision-Language Action Model Architecture generating Action from Volumetric Inference

    Harris Song, Long Le · PDF
  17. Behavioral Systems Require Behavioral Tests

    Manuel Cherep, Nikhil Singh, Patricia Maes · PDF
  18. Benchmarking Large Language Models for Zero-shot and Few-shot Phishing URL Detection

    Najmul Hasan, Prashanth BusiReddyGari · PDF
  19. Beyond Generative AI: World Models for Clinical Prediction, Counterfactuals, and Planning

    Mohammad Areeb Qazi, Maryam Nadeem, Mohammad Yaqub · PDF
  20. Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens

    Kaya Stechly, Karthik Valmeekam, Vardhan Palod, Atharva Gundawar, Subbarao Kambhampati · PDF
  21. BioVerge: A Comprehensive Benchmark and Study of Self-Evaluating Agents for Biomedical Hypothesis Generation

    Fuyi Yang, Chenchen Ye, Mingyu Derek Ma, Yijia Xiao, Matthew Yang, Wei Wang · PDF
  22. Blocks, Bots, and Bottlenecks: Studying Real-time and Adaptive Multi-Agent LLM Collaboration

    Isadora White, Kolby Nottingham, Max Robinson, Ayush Parasbhai Maniar, Mehul Maheshwari, Hansen Lillemark, Lianhui Qin, Prithviraj Ammanabrolu · PDF
  23. Bootstrapping World Models from Dynamics Models in Multimodal Foundation Models

    Yifu QIU, Yftah Ziser, Anna Korhonen, Shay B Cohen, Edoardo Ponti · PDF
  24. Bridging Symbols from Language and Hierarchical Reinforcement Learning with Active Imitation

    Ziqi Ma, Sao Mai Nguyen, Philippe Xu · PDF
  25. Bridging Tool Dependencies and Domain Knowledge: A Graph-Based Framework for In-Context Planning

    Shengjie Liu, Li Dong, Zhenyu Zhang · PDF
  26. Can LLMs Reliably Evaluate Themselves? A Probabilistic VC Framework

    Jae Oh Woo, Mengdie Flora Wang, Rahul Ghosh, Baishali Chaudhury, Mun Young Kim · PDF
  27. CaughtCheating: Is Your MLLM a Good Cheating Detective? Exploring the Boundary of Visual Perception and Reasoning

    Ming Li, Chenguang Wang, Tianyi Zhou · PDF
  28. Causal Masking on Spatial Data: An Information-Theoretic Case for Learning Spatial Datasets with Unimodal Language Models

    Jared Junkin, Samuel Nathanson · PDF
  29. CausalARC: Abstract Reasoning with Causal World Models

    Jacqueline R. M. A. Maasch, John Kalantari, Kia Khezeli · PDF
  30. Computer-Use Agents as Judges for Automatic GUI Design

    Kevin Qinghong Lin, Siyuan Hu, Linjie Li, Zhengyuan Yang, Lijuan Wang, Philip Torr, Mike Zheng Shou · PDF
  31. CORE: Full-Path Evaluation of LLM Agents Beyond Final State

    Panagiotis Michelakis, Yiannis Hadjiyianni, Dimitrios Stamoulis · PDF
  32. CORTEX: Collaborative LLM Agents for High-Stakes Alert Triage

    Bowen Wei, Yuan Shen Tay, Howard Liu, Jinhao Pan, Kun Luo, Ziwei Zhu, Chris Jordan · PDF
  33. Credit-Budgeted ICPC-Style Coding: When LLM Agents Must Pay for Every Decision

    Lingfeng Zhou, Junhao Shi, Jin Gao, Dequan Wang · PDF
  34. DDCG: Decoupled Dual-Critic Guidance for Embodied Agents

    Shaojin Ma, Min Zhang, Hongyao Tang, Jianye HAO, YAN ZHENG · PDF
  35. DeepPersona: Generative Engine for Scaling Deep Synthetic Personas

    Zhen Wang, Yufan Zhou, Zhongyan Luo, Lyumanshan Ye, Adam Wood, Man Yao, Luoshang Pan · PDF
  36. Democratizing Agentic RAG: Distillation-Guided Policy Optimization for Compact Language Models

    Rikuto Kotoge, Mai Nishimura, Jiaxin Ma · PDF
  37. Democratizing Microgrid Optimization: An LLM Agent for Dispatching Mobile Chargers to Construction Electric Vehicles

    Daniela Rojas Lozano, Yuanyuan Shi · PDF
  38. Demystify the Potential of Large Language Models as World Models of Code

    Bohan Lyu, Siqiao Huang, Zichen Liang, Wenjia Yang, Qian Sun, Jiaming Zhang · PDF
  39. DiffusionPack: Bin Packing with Custom Human Preferences

    Anurag Maurya, Shivam Vats, Gautham Balachandran, Ravi Prakash · PDF
  40. Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?

    Siddhant Bhambri, Upasana Biswas, Subbarao Kambhampati · PDF
  41. DR. WELL: Dynamic Reasoning and Learning with Symbolic World Model for Embodied LLM-Based Multi-Agent Collaboration

    Narjes Nourzad, Hanqing Yang, Shiyu Chen, Carlee Joe-Wong · PDF
  42. EconWebArena: Benchmarking Autonomous Agents on Economic Tasks in Realistic Web Environments

    Zefang Liu, Yinzhu Quan · PDF
  43. ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction

    Qineng Wang, Wenlong Huang, Yu Zhou, Hang Yin, Tianwei Bao, Jianwen Lyu, Weiyu Liu, Ruohan Zhang, Jiajun Wu, Li Fei-Fei, Manling Li · PDF
  44. Evaluating LLM Planning in Partially Observable Environments via Observation Representations and Action Sequences

    Hayeong Lee, Jun Ho Seo, Sunguk Shin, Jinho Lee, Myunsoo Kim, Minsuk Chang, Byung-Jun Lee · PDF
  45. Evaluating Long-Context Reasoning in LLM-Based WebAgents

    Andy Chung, Yichi Zhang, Kaixiang Lin, Aditya Rawal, Qiaozi Gao, Joyce Chai · PDF
  46. Every Answer Counts: Efficient Entity-Centric QA by Bayesian-Guided Subquery Sampling

    Binyamin Perets, Zohar Shnaider, Dvir Aran, Shie Mannor · PDF
  47. EvoMem: Improving Multi-Agent Planning with Dual-Evolving Memory

    Wenzhe Fan, Ning Yan, Masood S. Mortazavi · PDF
  48. Gaze-Guided Multimodal LLMs for Social Scene Understanding

    Shayan Nasiriboukani, Muhammad Awais, Sara Atito · PDF
  49. GAZE: Governance-Aware pre-annotation for Zero-shot World Model Environments

    Leela Krishna, Mengyang Zhao, Saicharithreddy Pasula, Harshit Rajgarhia, Abhishek Mukherji, Vasudevan Sundarababu · PDF
  50. GenPlanX. Integrating LLMs and Classical AI for Generation of Plans and Execution

    Daniel Borrajo, Giuseppe Canonaco, Tomás de la Rosa, Alfredo Garrachón Ruiz, Sriram Gopalakrishnan, Simerjot Kaur, Marianela Morales, Sunandita Patra, Alberto Pozanco, Keshav Ramani, Charese Smiley, Pietro Totis, Manuela Veloso · PDF
  51. GRIT: Teaching MLLMs to Think with Images

    Yue Fan, Xuehai He, Diji Yang, Kaizhi Zheng, Ching-Chen Kuo, Yuting Zheng, Xinze Guan, Xin Eric Wang · PDF
  52. Grounded-Retrieval Adversarial Imitation Loop: Integrating Language, Agent, and World Models

    Liv G. d'Aliberti, Manoel Horta Ribeiro · PDF
  53. GroundedPRM: Tree-Guided and Fidelity-Aware Process Reward Modeling for Step-Level Reasoning

    Yao Zhang, Yu Wu, Haowei Zhang, Weiguo Li, Haokun Chen, Guohao Li, Zhen Han, Volker Tresp · PDF
  54. Higher Embedding Dimension Creates a Stronger World Model for a Simple Sorting Task

    Brady Bhalla, Honglu Fan, Nancy Chen, Tony Yue YU · PDF
  55. HugAgent: Evaluating LLMs in Simulating Individual-Level Human Reasoning on Open-Ended Tasks

    Chance Jiajie Li, Zhenze Mo, Yuhan Tang, Ao Qu, Jiayi Wu, Kaiya Ivy Zhao, Yulu Gan, Jie Fan, Jiangbo Yu, Jinhua Zhao, Paul Pu Liang, Luis Alberto Alonso Pastor, Kent Larson · PDF
  56. Knot So Simple: A Minimalistic Environment for Spatial Reasoning

    Zizhao Chen, Yoav Artzi · PDF
  57. Language-conditioned world model improves policy generalization by reading environmental descriptions

    Joe Nguyen, Stefan Lee · PDF
  58. Law in Silico: Simulating Legal Society with LLM-Based Agents

    Yiding Wang, Yuxuan Chen, Fanxu Meng, Xifan Chen, Xiaolei Yang, Muhan Zhang · PDF
  59. Let’s Try Again: Eliciting Multi-Turn Reasoning in Language Models via Simplistic Feedback

    Licheng Liu, Zihan Wang, Linjie Li, Chenwei Xu, Yiping Lu, Han Liu, Avirup Sil, Manling Li · PDF
  60. LLM Economist: Large Population Models and Mechanism Design in Multi-Agent Generative Simulacra

    Seth Karten, Wenzhe Li, Zihan Ding, Samuel Kleiner, Yu Bai, Chi Jin · PDF
  61. LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding

    Yu Yu, Qian Xie, Li Jin · PDF
  62. LLMs as Scalable, General-Purpose Simulators For Evolving Digital Agent Training

    Yiming Wang, Da Yin, Yuedong Cui, Zhiqian Li, Ruichen Zheng, Zongyu Lin, Di Wu, Xueqing Wu, Chenchen Ye, Yu Zhou, Kai-Wei Chang · PDF
  63. Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers

    Xingyue Huang, Rishabh, Gregor Franke, Ziyi Yang, Jiamu Bai, Weijie Bai, Jinhe Bi, Zifeng Ding, Yiqun Duan, Chengyu Fan, Wendong Fan, Xin Gao, Ruohao Guo, Yuan He, Yicheng He, Xianglong Hu, Neil Johnson, Bowen Li, Fangru Lin, Siyu Lin, Tong Liu, Yunpu Ma, HAO SHEN, Hao Sun, Beibei Wang, Fangyijie Wang, Hao Wang, Haoran Wang, Yang Wang, Yifeng Wang, Zhaowei Wang, Ziyang Wang, Yifan Wu, Zikai Xiao, Chengxing Xie, Fan Yang, Junxiao Yang, Qianshuo Ye, Ziyu Ye, Guangtao Zeng, Yuwen Ebony Zhang, Zeyu Zhang, Zihao Zhu, Bernard Ghanem, Philip Torr, Guohao Li · PDF
  64. Measuring Physical-World Privacy Awareness of Large Language Models: An Evaluation Benchmark

    Xinjie Shen, Mufei Li, Pan Li · PDF
  65. Measuring Rhetorical Style in Scientific Writing with LLM Personas

    Jingyi Qiu, Hong Chen, Zongyi Li · PDF
  66. MetaSynth: Multi-Agent Metadata Generation from Implicit Feedback in Black-Box Systems

    Shreeranjani srirangamsridharan, Ali Abavisani, Reza Yousefi Maragheh, Ramin Giahi, Kai Zhao, Jason Cho, Sushant Kumar · PDF
  67. Mind-Map Agent: Enhancing Cooperative Task Planning through Communication Alignment with Large Language Models

    HoBeom Jeon, Hyungmin Kim, Dohyung Kim, Minsu Jang, Jaehong Kim · PDF
  68. MIRAI: Evaluating LLM Agents for International Event Forecasting

    Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang · PDF
  69. Model Context Protocol for Vision Agents: Schema, Memory, and World Model Implications

    Aditi Tiwari, Akshit Bhalla, Darshan Ganesh Prasad · PDF
  70. Modeling Open World Cognition as On-Demand Synthesis of Probabilistic Models

    Lionel Wong, Katherine M. Collins, Lance Ying, Cedegao E. Zhang, Adrian Weller, Tobias Gerstenberg, Timothy J. O'Donnell, Alexander K. Lew, Jacob Andreas, Joshua B. Tenenbaum, Tyler Brooke-Wilson · PDF
  71. Modeling Others' Minds as Code

    Kunal Jha, Aydan Yuenan Huang, Eric Ye, Natasha Jaques, Max Kleiman-Weiner · PDF
  72. NiceWebRL: a Python library for human subject experiments with reinforcement learning environments

    Wilka Carvalho, Vikram Srinivas Goddla, Ishaan Sinha, Hoon Shin, Kunal Jha · PDF
  73. Observer, Not Player: Simulating Theory of Mind in Large Language Models through Game Observation

    Jerry Wang, Ting Yu Liu · PDF
  74. Physics Supernova: AI Agent Matches Elite Gold Medalists at IPhO 2025

    Jiahao Qiu, Jingzhe Shi, Xinzhe Juan, Zelin Zhao, Jiayi Geng, Shilong Liu, Hongru WANG, Sanfeng Wu, Mengdi Wang · PDF
  75. Planning with Generative Cognitive Maps

    Jeffrey Qin, Albert Yang, Cole Wyeth, Ziheng Xu, Kevin Ellis, Marta Kryven · PDF
  76. Position: Hierarchical World Models with Causal Curation for Generalizing Agents

    Fei Dai, Hanqi Zhou, Alison Gopnik · PDF
  77. Position: Human-Robot Interaction Demands a Shift From Static Privacy Controls to Dynamic Learning

    Shuning Zhang, Hong Jia, Simin Li, Ting Dang, Yongquan Hu, Xin Yi, Hewu Li · PDF
  78. Position: The Physics-Physical Reasoning Interplay is Key for Future Embodied World Models

    Terry Jingchen Zhang, Kun Xiang, Yinya Huang, Jixi He, Zirong Liu, Yueling Tang, Ruizhe Zhou, Chengyu Yu, Xiaodan Liang · PDF
  79. QueryBandits for Hallucination Mitigation: Exploiting Semantic Features for No-Regret Rewriting

    Nicole Cho, William Watson, Alec Koppel, Sumitra Ganesh, Manuela Veloso · PDF
  80. R2P: Reformulate–Retrieve–Program for Robust Mathematical Reasoning in LLMs

    Yu Zhang, Shujun Peng, Xinhan Lin, Yang Hu, Shouyi Yin · PDF
  81. RealWebAssist: A Benchmark for Long-Horizon Web Assistance with Real-World Users

    Suyu Ye, Haojun Shi, Darren Shih, Hyokun Yun, Tanya G. Roosta, Tianmin Shu · PDF
  82. Reasoning Under Pressure: LLMs in Competitive Pokémon Battles

    Tadisetty Sai Yashwanth, Dhatri C · PDF
  83. RECOLLAB: Retrieval-Augmented LLMs for Cooperative Ad-hoc Teammate Modeling

    Conor Wallace, Umer Siddique, Yongcan Cao · PDF
  84. RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs

    Soumya Rani Samineni, Durgesh Kalwar, Karthik Valmeekam, Kaya Stechly, Subbarao Kambhampati · PDF
  85. ROSE: Reconstructing Objects, Scenes, and Trajectories from Casual Videos for Robotic Manipulation

    Peihao Li, Haoran Geng, Jameson Crate, Yanbing Han, Junyi Zhang, Feishi Wang, Charlie Tianyue Cheng, Runpei Dong, Yen-Jen Wang, Haozhe Lou, Trevor Darrell, Pieter Abbeel, Jitendra Malik · PDF
  86. Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting

    Michael Y. Hu, Benjamin Van Durme, Jacob Andreas, Harsh Jhamtani · PDF
  87. SAND: Boosting LLM Agents with Self-Taught Action Deliberation

    Yu Xia, Yiran Jenny Shen, Junda Wu, Tong Yu, Sungchul Kim, Ryan A. Rossi, Lina Yao, Julian McAuley · PDF
  88. SAPO: Safety-Aware Embodied Task Planning with fully Partially-Observable environment and physical constraints

    Hyungmin Kim, HoBeom Jeon, Dohyung Kim, Minsu Jang, Jaehong Kim · PDF
  89. SCALAR: Self-Supervised Composition and Learning of Skills with LLM Planning and RL

    Renos Zabounidis, Yue Wu, Simon Stepputtis, Tom Mitchell, Yuanzhi Li, Katia P. Sycara · PDF
  90. Scaling LLM Planning: NL2FLow for Parametric Workflow Problem Generation and Rigorous Evaluation

    Jungkoo Kang · PDF
  91. Similar: A Step-Wise, Multi-Dimensional Reward Model for Virtual Agent Learning and Reasoning

    Bingchen Miao, Yang Wu, Minghe Gao, Qifan Yu, Wendong Bu, Wenqiao Zhang, Yunfei Li, Siliang Tang, Tat-Seng Chua, Juncheng Li · PDF
  92. Social Behaviour and Strategic Adaptation of LLMs in Multiplayer Sequential Games

    Xijie Zeng, Frank Rudzicz, Marta Kryven · PDF
  93. Social World Models

    Xuhui Zhou, Jiarui Liu, Akhila Yerukola, Hyunwoo Kim, Maarten Sap · PDF
  94. Spatial Mental Modeling from Limited Views

    Baiqiao Yin, Qineng Wang, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Manling Li, Jiajun Wu, Li Fei-Fei · PDF
  95. Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!

    Subbarao Kambhampati, Kaya Stechly, Karthik Valmeekam, Lucas Paul Saldyt, Siddhant Bhambri, Vardhan Palod, Atharva Gundawar, Soumya Rani Samineni, Durgesh Kalwar, Upasana Biswas · PDF
  96. STRIDE: A Systematic Framework for Selecting AI Modalities—Agentic AI, AI Assistants, or LLM Calls

    Shubhi Asthana, Ruchi Mahindru, Bing Zhang, Hima Patel, Chad DeLuca · PDF
  97. Test-Time Scaling for Multistep Reasoning in Small Language Models via A* Search

    Alexander Braverman, Weitong Zhang, Quanquan Gu · PDF
  98. The Personality Illusion: Revealing Dissociation Between Self-Reports & Behavior in LLMs

    Pengrui Han, Rafal Dariusz Kocielnik, Peiyang Song, Ramit Debnath, Dean Mobbs, Anima Anandkumar, R. Michael Alvarez · PDF
  99. The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason

    Shanchao Liang, Spandan Garg, Roshanak Zilouchian Moghaddam · PDF
  100. Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

    Junhong Shen, Hao Bai, Lunjun Zhang, Yifei Zhou, Amrith Setlur, Shengbang Tong, Diego Caples, Nan Jiang, Tong Zhang, Ameet Talwalkar, Aviral Kumar · PDF
  101. ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark

    Vaskar Nath, Pranav Vishnu Raja, Jane Yu, Claire Yoon, Sean M. Hendryx · PDF
  102. Trust, Risk, and Security in Agentic AI: A Short Survey

    Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis · PDF
  103. UISim: An Interactive Image-Based UI Simulator for Dynamic Mobile Environments

    Jiannan Xiang, Yun Zhu, Lei Shu, Maria Wang, Lijun Yu, Gabriel Barcik, James David Lyon, Srinivas Sunkara, Jindong Chen · PDF
  104. ValuePilot: A Two-Phase Framework for Value-Driven Decision-Making

    Yitong Luo, Ziang Chen, Hou Hei Lam, Jiayu Zhan, Junqi Wang, Zhenliang Zhang, Xue Feng · PDF
  105. VideoAgent: Self-Improving Video Generation for Embodied Planning

    Achint Soni, Sreyas Venkataraman, Abhranil Chandra, Sebastian Fischmeister, Percy Liang, Bo Dai, Sherry Yang · PDF
  106. VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning

    Ye Liu, Kevin Qinghong Lin, Chang Wen Chen, Mike Zheng Shou · PDF
  107. What Limits Virtual Agent Application? OmniBench: A Scalable Multi-Dimensional Benchmark for Essential Virtual Agent Capabilities

    Wendong Bu, Yang Wu, Qifan Yu, Minghe Gao, Bingchen Miao, Zhenkui Zhang, Kaihang Pan, Yunfei Li, Mengze Li, Wei Ji, Juncheng Li, Siliang Tang, Yueting Zhuang · PDF
  108. Who Gets the Reward & Who Gets the Blame? Evaluation-Aligned Post-Training for Multi-LLM Agents

    Chih-Hsuan Yang, Tanwi Mallick, Ian Foster, Amal Gueroudji, Rajeev Thakur · PDF
  109. World Model Driven Episodic Memory for LLMs

    Shreyas Rajesh, Pavan S Holur, Chenda Duan, David Chong, Vwani Roychowdhury · PDF
  110. World Models must live in Parallel Worlds

    Sahithya Ravi, Aditya Chinchure, Pushkar Shukla, Vered Shwartz, Leonid Sigal · PDF
  111. WorldAgen: Unified State-Action Prediction with Test-Time World Model Training

    Chi Wan, Kangrui Wang, Yuan Si, Pingyue Zhang, Huang Huang, Manling Li · PDF