ICLR 2025 Workshop on Foundation Models in the Wild

ICLR 2025 FM-Wild Workshop

Submission deadline: Feb 11, 2025, 11:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (102)

Fetched from OpenReview (v2) on 2026-06-10.

  1. "Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence

    Shaopeng Fu, Liang Ding, Di Wang · PDF
  2. Accelerating Unbiased LLM Evaluation via Synthetic Feedback

    Zhaoyi Zhou, Yuda Song, Andrea Zanette · PDF
  3. Activation Steering in Neural Theorem Provers

    Shashank Kirtania · PDF
  4. Adjustment for Confounding using Pre-Trained Representations

    Rickmer Schulte, David Rügamer, Thomas Nagler · PDF
  5. AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models

    Mintong Kang, Chejian Xu, Shuang Yang, Bo Li · PDF
  6. Agentic Multimodal AI for Hyper-Personalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework

    Sagar Srinivas Sakhinana, Akash Das, Shivam Gupta, Venkataramana Runkana · PDF
  7. AgentTaxo: Dissecting and Benchmarking Token Distribution of LLM Multi-Agent Systems

    Qian Wang, Zhenheng Tang, Zichen Jiang, Nuo Chen, Tianyu Wang, Bingsheng He · PDF
  8. All It Takes Is One Prompt: An Autonomous LLM-MA System

    Qian Wang, Tianyu Wang, Zhenheng Tang, Qinbin Li, Nuo Chen, Jingsheng Liang, Bingsheng He · PDF
  9. AppVLM: A Lightweight Vision Language Model for Online App Control

    Georgios Papoudakis, Thomas Coste, Zhihao Wu, Jianye Hao, Jun Wang, Kun Shao · PDF
  10. Are DeepSeek R1 And Other Reasoning Models More Faithful?

    James Chua, Owain Evans · PDF
  11. Aria-UI: Visual Grounding for GUI Instructions

    Yuhao Yang, Yue Wang, Dongxu Li, Ziyang Luo, Bei Chen, Chao Huang, Junnan Li · PDF
  12. Attacking Multimodal OS Agents with Malicious Image Patches

    Lukas Aichberger, Alasdair Paren, Philip Torr, Yarin Gal, Adel Bibi · PDF
  13. Automated Benchmark Generation for Repository-Level Coding Tasks

    Konstantinos Vergopoulos, Mark Niklas Mueller, Martin Vechev · PDF
  14. Automated Capability Discovery via Model Self-Exploration

    Cong Lu, Shengran Hu, Jeff Clune · PDF
  15. AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind

    Zhining Zhang, Chuanyang Jin, Mung Yao Jia, Tianmin Shu · PDF
  16. Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images

    Aditya Kumar, Tom Blanchard, Adam Dziedzic, Franziska Boenisch · PDF
  17. Beyond ID Bias: PCA-Guided Dropout for Robust Fine-tuning

    Bo Fei, Xiaocheng Li, ZhangZhiqi, Youchen Qing, Yancong Deng · PDF
  18. Beyond Pixels: Enhancing LIME with Hierarchical Features and Segmentation Foundation Models

    Patrick Knab, Sascha Marton, Christian Bartelt · PDF
  19. Bridging vision language model (VLM) evaluation gaps with a framework for scalable and cost-effective benchmark generation

    Tim Rädsch, Leon Mayer, Simon Pavicic, Ali Emre Kavur, Marcel Knopp, Barış Öztürk, Klaus Maier-Hein, Paul F Jaeger, Fabian Isensee, Annika Reinke, Lena Maier-Hein · PDF
  20. Captured by Captions: On Memorization and its Mitigation in CLIP Models

    Wenhao Wang, Adam Dziedzic, Grace C. Kim, Michael Backes, Franziska Boenisch · PDF
  21. CARROT: A Cost Aware Rate Optimal Router

    Seamus Somerstep, Felipe Maia Polo, Allysson Flavio Melo de Oliveira, Prattyush Mangal, Mírian Silva, Onkar Bhardwaj, Mikhail Yurochkin, Subha Maity · PDF
  22. Cheap and Effective Personalization of Foundation Language Models for Imitating a User's Writing Style

    Armand Mihai Nicolicioiu, Eugenia Iofinova, Andrej Jovanovic, Eldar Kurtic, Mahdi Nikdan, Andrei Panferov, Ilia Markov, Nir N Shavit, Dan Alistarh · PDF
  23. Co-optimizing Recommendation and Evaluation for LLM Selection

    Tarun Kumar, Cong Xu, Arpit Shah, Baradji Diallo, Martin Foltin, Suparna Bhattacharya · PDF
  24. Cost-efficient Collaboration between On-device and Cloud Language Models

    Avanika Narayan, Sabri Eyuboglu, Dan Biderman, Avner May, Scott Linderman, James Zou, Christopher Re · PDF
  25. CROSS: Analyzing the Trade-offs in Long-Context Cross-lingual Retrieval

    Sina Bagheri Nezhad, Ameeta Agrawal · PDF
  26. DASFormer: Self-supervised Pretraining for Earthquake Monitoring

    Qianggang Ding, Zhichao Shen, Weiqiang Zhu, Bang Liu · PDF
  27. DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products

    Julien Siems, Timur Carstensen, Arber Zela, Frank Hutter, Massimiliano Pontil, Riccardo Grazzi · PDF
  28. Demystifying Long Chain-of-Thought Reasoning in LLMs

    Edward Yeo, Yuxuan Tong, Xinyao Niu, Graham Neubig, Xiang Yue · PDF
  29. Detecting Covariate Shifts With Vision-Language Foundation Models

    Alvin Heng, Harold Soh · PDF
  30. Diagnosing Robotics Systems Issues with Large Language Models -- A Case Study

    Jordis Emilia Herrmann, Aswath Mandakath Gopinath, Mikael Norrlof, Mark Niklas Mueller · PDF
  31. Disentangling Sequence Memorization and General Capability in Large Language Models

    Gaurav Rohit Ghosal, Pratyush Maini, Aditi Raghunathan · PDF
  32. Does Cross-Domain Pre-Training Truly Help Time-Series Foundation Models?

    Zhenwei Zhang, Jiawen Zhang, Shun Zheng, Yuantao Gu, Jiang Bian · PDF
  33. DP-GPL: Differentially Private Graph Prompt Learning

    Jing Xu, Franziska Boenisch, Iyiola Emmanuel Olatunji, Adam Dziedzic · PDF
  34. Efficient Backdoor Detection on Text-to-image Synthesis via Neuron Activation Variation

    Shengfang Zhai, Jiajun Li, Yue Liu, Yinpeng Dong, Zhihua Tian, Wenjie Qu, Qingni Shen, Ruoxi Jia, Jiaheng Zhang · PDF
  35. Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

    Jan Betley, Daniel Chee Hian Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans · PDF
  36. Exploring LLM Agents for Cleaning Tabular Machine Learning Datasets

    Tommaso Bendinelli, Artur Dox, Christian Holz · PDF
  37. Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems

    Matthew Barker, Andrew Bell, Evan Thomas, James Carr, Thomas Andrews, Umang Bhatt · PDF
  38. Few-Shot Whole Slide Pathology Classification with Multi-Granular Vision-Language Models

    Anh-Tien Nguyen, Duy Minh Ho Nguyen, Nghiem Tuong Diep, Trung Quoc Nguyen, Nhat Ho, Jacqueline Michelle Metsch, Miriam Cindy Maurer, Daniel Sonntag, Hanibal Bohnenberger, Anne-Christin Hauschild · PDF
  39. FlipAttack: Jailbreak LLMs via Flipping

    Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Yingwei Ma, Jiaheng Zhang, Bryan Hooi · PDF
  40. FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations

    Cheng-Yu Hsieh, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Chun-Liang Li, Ranjay Krishna, Oncel Tuzel, Hadi Pouransari · PDF
  41. Focus on this, not that! Steering LLMs with Adaptive Feature Specification

    Tom A. Lamb, Adam Davies, Alasdair Paren, Philip Torr, Francesco Pinto · PDF
  42. Foundation Model-Based Data Selection for Dense Prediction Tasks

    Niclas Popp, Dan Zhang, Jan Hendrik Metzen, Matthias Hein, Lukas Schott · PDF
  43. From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions

    Ruben Weijers, Denton Wu, Hannah Betts, Tamara Jacod, Yuxiang Guan, Vidya Sujaya, Kushal Dev, Toshali Goel, William Delooze, Reihaneh Rabbany, Ying Wu, Jean-François Godbout, Kellin Pelrine · PDF
  44. G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks

    Guibin Zhang, Yanwei Yue, Xiangguo Sun, Guancheng Wan, Miao Yu, Junfeng Fang, Kun Wang, Tianlong Chen, Dawei Cheng · PDF
  45. Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?

    Simon Park, Abhishek Panigrahi, Yun Cheng, Dingli Yu, Anirudh Goyal, Sanjeev Arora · PDF
  46. Geneshift: Impact of different scenario shift on Jailbreaking LLM

    Tianyi Wu, Zhiwei Xue, Yue Liu, Jiaheng Zhang, Bryan Hooi, See-Kiong Ng · PDF
  47. GeoFT: Fine-tuning Foundation Models for Automated OSINT Geolocation

    Selena Sun · PDF
  48. GuardReasoner: Towards Reasoning-based LLM Safeguards

    Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun Xia, Tianyi Wu, Zhiwei Xue, Yulin Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi · PDF
  49. Improving Your Model Ranking on Chatbot Arena by Vote Rigging

    Rui Min, Tianyu Pang, Chao Du, Qian Liu, Minhao Cheng, Min Lin · PDF
  50. Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters

    Kevin Li, Sachin Goyal, João D. Semedo, J Zico Kolter · PDF
  51. Infinite Leagues Under the Sea: Realistic 3D Underwater Terrain Generation Augmented by Visual Foundation Models

    Tianyi Zhang, Weiming Zhi, Joshua G Mangelson, Matthew Johnson-Roberson · PDF
  52. KnowGuard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning

    Zhen Xiang, Shuang Yang, Nathaniel D. Bastian, Bo Li · PDF
  53. KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking

    Jiawei Zhang, Chejian Xu, Yu Gai, Freddy Lecue, Shuang Yang, Dawn Song, Bo Li · PDF
  54. Latent Representation Encoding and Multimodal Biomarkers for Post-Stroke Speech Assessment

    Giulia Sanguedolce, Dragos-Cristian Gruia, Patrick Naylor, Fatemeh Geranmayeh · PDF
  55. Leveraging the true depth of LLMs

    Ramón Calvo González, Daniele Paliotta, Matteo Pagliardini, Martin Jaggi, François Fleuret · PDF
  56. MASQUE: Diffusion-Based Localized Adversarial Makeup for Facial Privacy

    Youngjin Kwon, Xiao Zhang · PDF
  57. Measuring In-Context Computation Complexity via Hidden State Prediction

    Vincent Herrmann, Róbert Csordás, Jürgen Schmidhuber · PDF
  58. MetaSC: Test-Time Safety Specification Optimization for Language Models

    Victor Gallego · PDF
  59. Mitigating Cache Noise in Test-Time Adaptation for Large Vision-Language Models

    Haotian Zhai, Xinyu Chen, Can Zhang, TianMing Sha, Ruirui Li · PDF
  60. MLLM Can See? Dynamic Correction Decoding for Hallucination Mitigation

    Chenxi Wang, Xiang Chen, Ningyu Zhang, Bozhong Tian, Haoming Xu, Shumin Deng, Huajun Chen · PDF
  61. MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention

    Yucheng Li, Huiqiang Jiang, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Amir H. Abdi, Dongsheng Li, Jianfeng Gao, Yuqing Yang, Lili Qiu · PDF
  62. MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression

    Tianyu Fu, Haofeng Huang, Xuefei Ning, Genghan Zhang, Boju Chen, Tianqi Wu, Hongyi Wang, Zixiao Huang, Shiyao Li, Shengen Yan, Guohao Dai, Huazhong Yang, Yu Wang · PDF
  63. Multi-Hypothesis Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity via Laplacian Visual Prompting

    Xiaohao Xu, Feng Xue, Xiang Li, Haowei Li, Shusheng Yang, Tianyi Zhang, Matthew Johnson-Roberson, Xiaonan Huang · PDF
  64. Narrowing Class-Wise Robustness Gaps in Adversarial Training

    Fatemeh Amerehi, Patrick Healy · PDF
  65. Navigating the Designs of Privacy-Preserving Fine-tuning for Large Language Models

    Haonan Shi, Tu Ouyang, An Wang · PDF
  66. OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

    Pan Lu, Bowen Chen, Sheng Liu, Rahul Thapa, Joseph Boen, James Zou · PDF
  67. OpenRAG: Optimizing RAG End-to-End via In-Context Retrieval Learning

    Jiawei Zhou, Lei Chen · PDF
  68. Optimizing Test-Time Compute via Meta Reinforcement Finetuning

    Yuxiao Qu, Matthew Y. R. Yang, Amrith Setlur, Lewis Tunstall, Edward Emanuel Beeching, Ruslan Salakhutdinov, Aviral Kumar · PDF
  69. PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos

    Steven Abreu, Tiffany D Do, Karan Ahuja, Eric J Gonzalez, Lee Payne, Daniel McDuff, Mar Gonzalez-Franco · PDF
  70. PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

    Wei Chow, Jiageng Mao, Boyi Li, Daniel Seita, Vitor Campagnolo Guizilini, Yue Wang · PDF
  71. Policy-Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone

    Max Sobol Mark, Tian Gao, Georgia Gabriela Sampaio, Mohan Kumar Srirama, Archit Sharma, Chelsea Finn, Aviral Kumar · PDF
  72. Privacy Auditing for Large Language Models with Natural Identifiers

    Lorenzo Rossi, Bartłomiej Marek, Franziska Boenisch, Adam Dziedzic · PDF
  73. Reasoning Without Self-Doubt: More Efficient Chain-of-Thought Through Certainty Probing

    Yichao Fu, Junda Chen, Yonghao Zhuang, Zheyu Fu, Ion Stoica, Hao Zhang · PDF
  74. ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding

    Xingyu Fu, Minqian Liu, Zhengyuan Yang, John Richard Corring, Yijuan Lu, Jianwei Yang, Dan Roth, Dinei Florencio, Cha Zhang · PDF
  75. Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking

    Will LeVine, Bijan Varjavand · PDF
  76. Reliable and Efficient Amortized Model-based Evaluation

    Sang T. Truong, Yuheng Tu, Percy Liang, Bo Li, Sanmi Koyejo · PDF
  77. Risks and Safety Considerations for Foundation Model-based Autonomous Agents' Interaction with the Environment

    Azmine Toushik Wasi, Mahfuz Ahmed Anik, Riashat Islam · PDF
  78. RoboMorph: Evolving Robot Morphology using Large Language Models

    Kevin Qiu, Władysław Pałucki, Krzysztof Ciebiera, Paweł Fijałkowski, Marek Cygan, Łukasz Kuciński · PDF
  79. SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging

    Aladin Djuhera, Swanand Ravindra Kadhe, Farhan Ahmed, Syed Zawad, Holger Boche · PDF
  80. SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations

    Zhaorun Chen, Francesco Pinto, Minzhou Pan, Shuang Yang, Bo Li · PDF
  81. SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More

    Tianrun Chen, Ankang Lu, Lanyun Zhu, Chaotao Ding, Chunan Yu, Deyi Ji, Zejian Li, Lingyun Sun, Papa Mao, Ying Zang · PDF
  82. SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation

    Sarthak Srivastava, Kathy Wu · PDF
  83. Shh, don't say that! Domain Certification in LLMs

    Cornelius Emde, Alasdair Paren, Preetham Arvind, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Bernard Ghanem, Philip Torr, Adel Bibi · PDF
  84. ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning

    Zhaorun Chen, Mintong Kang, Shuang Yang, Bo Li · PDF
  85. Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation

    Mufei Li, Siqi Miao, Pan Li · PDF
  86. StochasTok: Improving Fine-Grained Subword Understanding in LLMs

    Anya Sims, Cong Lu, Klara Kaleb, Jakob Nicolaus Foerster, Yee Whye Teh · PDF
  87. Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

    Guankun Wang, Long Bai, Wan Jun Nah, Jie Wang, Zhaoxi Zhang, Zhen Chen, Jinlin Wu, Mobarakol Islam, Hongbin Liu, Hongliang Ren · PDF
  88. Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels

    Maximilian Beck, Korbinian Pöppel, Phillip Lippe, Sepp Hochreiter · PDF
  89. Toward Trustworthy Neural Program Synthesis

    Wen-Ding Li, Darren Yan Key, Kevin Ellis · PDF
  90. Towards Universal Offline Black-Box Optimization via Learning String Embedding Space

    Rong-Xi Tan, Ming Chen, Ke Xue, Yao Wang, Yaoyuan Wang, Fu Sheng, Chao Qian · PDF
  91. TPP-LLM: Modeling Temporal Point Processes by Efficiently Fine-Tuning Large Language Models

    Zefang Liu, Yinzhu Quan · PDF
  92. Tradeoffs Between Alignment and Helpfulness in Language Models with Steering Methods

    Yotam Wolf, Noam Wies, Dorin Shteyman, Binyamin Rothberg, Yoav Levine, Amnon Shashua · PDF
  93. Understanding (Un)Reliability of Steering Vectors in Language Models

    Joschka Braun, Carsten Eickhoff, David Krueger, Seyed Ali Bahrainian, Dmitrii Krasheninnikov · PDF
  94. Unisolver: PDE-Conditional Transformers Are Universal Neural PDE Solvers

    Hang Zhou, Yuezhou Ma, Haixu Wu, Haowen Wang, Mingsheng Long · PDF
  95. Unlocking Post-hoc Dataset Inference with Synthetic Data

    Bihe Zhao, Pratyush Maini, Franziska Boenisch, Adam Dziedzic · PDF
  96. VisR-Bench: A Visual Retrieval Benchmark for Visually-Rich Documents

    Jian Chen, Ruiyi Zhang, Ming Li, Shijie Zhou, Changyou Chen · PDF
  97. WABER: Evaluating Reliability and Efficiency of Web Agents with Existing Benchmarks

    Su Kara, Fazle Faisal, Suman Nath · PDF
  98. Why Foundation Models Struggle with Cross-Modal Context

    Chen Henry Wu, Neil Kale, Aditi Raghunathan · PDF
  99. Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

    Kou Misaki, Yuichi Inoue, Yuki Imajuku, So Kuroki, Taishi Nakamura, Takuya Akiba · PDF
  100. Words or Vision: Do Vision-Language Models Have Blind Faith in Text?

    Ailin Deng, Tri Cao, Zhirui Chen, Bryan Hooi · PDF
  101. WorkflowAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data

    Junhong Shen, Atishay Jain, Zedian Xiao, Ishan Amlekar, Mouad Hadji, Aaron Podolny, Ameet Talwalkar · PDF
  102. xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference

    Maximilian Beck, Korbinian Pöppel, Phillip Lippe, Richard Kurle, Patrick M Blies, Günter Klambauer, Sebastian Böck, Sepp Hochreiter · PDF