ICLR 2025 Workshop on Foundation Models in the Wild
ICLR 2025 FM-Wild Workshop
- Submission deadline: Feb 11, 2025, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (102)
Fetched from OpenReview (v2) on 2026-06-10.
- "Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence
- Accelerating Unbiased LLM Evaluation via Synthetic Feedback
- ACTIVATION STEERING IN NEURAL THEOREM PROVERS
- Adjustment for Confounding using Pre-Trained Representations
- AdvWave: Stealthy Adversarial Jailbreak Attack against Large Audio-Language Models
- Agentic Multimodal AI for Hyper-Personalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework
- AgentTaxo: Dissecting and Benchmarking Token Distribution of LLM Multi-Agent Systems
- All It Takes Is One Prompt: An Autonomous LLM-MA System
- AppVLM: A Lightweight Vision Language Model for Online App Control
- Are DeepSeek R1 And Other Reasoning Models More Faithful?
- Aria-UI: Visual Grounding for GUI Instructions
- Attacking Multimodal OS Agents with Malicious Image Patches
- Automated Benchmark Generation for Repository-Level Coding Tasks
- Automated Capability Discovery via Model Self-Exploration
- AutoToM: Automated Bayesian Inverse Planning and Model Discovery for Open-ended Theory of Mind
- Beautiful Images, Toxic Words: Understanding and Addressing Offensive Text in Generated Images
- Beyond ID Bias: PCA-Guided Dropout for Robust Fine-tuning
- Beyond Pixels: Enhancing LIME with Hierarchical Features and Segmentation Foundation Models
- Bridging vision language model (VLM) evaluation gaps with a framework for scalable and cost-effective benchmark generation
- Captured by Captions: On Memorization and its Mitigation in CLIP Models
- CARROT: A Cost Aware Rate Optimal Router
- Cheap and Effective Personalization of Foundation Language Models for Imitating a User's Writing Style
- Co-optimizing Recommendation and Evaluation for LLM Selection
- Cost-efficient Collaboration between On-device and Cloud Language Models
- CROSS: Analyzing the Trade-offs in Long-Context Cross-lingual Retrieval
- DASFormer: Self-supervised Pretraining for Earthquake Monitoring
- DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products
- Demystifying Long Chain-of-Thought Reasoning in LLMs
- Detecting Covariate Shifts With Vision-Language Foundation Models
- Diagnosing Robotics Systems Issues with Large Language Models -- A Case Study
- Disentangling Sequence Memorization and General Capability in Large Language Models
- Does Cross-Domain Pre-Training Truly Help Time-Series Foundation Models?
- DP-GPL: DIFFERENTIALLY PRIVATE GRAPH PROMPT LEARNING
- Efficient Backdoor Detection on Text-to-image Synthesis via Neuron Activation Variation
- Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
- Exploring LLM Agents for Cleaning Tabular Machine Learning Datasets
- Faster, Cheaper, Better: Multi-Objective Hyperparameter Optimization for LLM and RAG Systems
- Few-Shot Whole Slide Pathology Classification with Multi-Granular Vision-Language Models
- FlipAttack: Jailbreak LLMs via Flipping
- FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
- Focus on this, not that! Steering LLMs with Adaptive Feature Specification
- Foundation Model-Based Data Selection for Dense Prediction Tasks
- From Intuition to Understanding: Using AI Peers to Overcome Physics Misconceptions
- G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks
- Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
- Geneshift: Impact of different scenario shift on Jailbreaking LLM
- GeoFT: Fine-tuning Foundation Models for Automated OSINT Geolocation
- GuardReasoner: Towards Reasoning-based LLM Safeguards
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging
- Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
- Infinite Leagues Under the Sea: Realistic 3D Underwater Terrain Generation Augmented by Visual Foundation Models
- KnowGuard: Robust Reasoning Enabled LLM Guardrail via Knowledge-Enhanced Logical Reasoning
- KnowHalu: Hallucination Detection via Multi-Form Knowledge Based Factual Checking
- Latent Representation Encoding and Multimodal Biomarkers for Post-Stroke Speech Assessment
- Leveraging the true depth of LLMs
- MASQUE: Diffusion-Based Localized Adversarial Makeup for Facial Privacy
- Measuring In-Context Computation Complexity via Hidden State Prediction
- MetaSC: Test-Time Safety Specification Optimization for Language Models
- MITIGATING CACHE NOISE IN TEST-TIME ADAPTATION FOR LARGE VISION-LANGUAGE MODELS
- MLLM CAN SEE? DYNAMIC CORRECTION DECODING FOR HALLUCINATION MITIGATION
- MMInference: Accelerating Pre-filling for Long-Context Visual Language Models via Modality-Aware Permutation Sparse Attention
- MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
- Multi-Hypothesis Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity via Laplacian Visual Prompting
- Narrowing Class-Wise Robustness Gaps in Adversarial Training
- Navigating the Designs of Privacy-Preserving Fine-tuning for Large Language Models
- OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
- OpenRAG: Optimizing RAG End-to-End via In-Context Retrieval Learning
- Optimizing Test-Time Compute via Meta Reinforcement Finetuning
- PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos
- PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding
- Policy-Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
- Privacy Auditing for Large Language Models with Natural Identifiers
- Reasoning Without Self-Doubt: More Efficient Chain-of-Thought Through Certainty Probing
- ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
- Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking
- Reliable and Efficient Amortized Model-based Evaluation
- Risks and Safety Considerations for Foundation Model-based Autonomous Agents' Interaction with the Environment
- RoboMorph: Evolving Robot Morphology using Large Language Models
- SafeMERGE: Preserving Safety Alignment in Fine-Tuned Large Language Models via Selective Layer-Wise Model Merging
- SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
- SAM2-Adapter: Evaluating & Adapting Segment Anything 2 in Downstream Tasks: Camouflage, Shadow, Medical Image Segmentation, and More
- SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation
- Shh, don't say that! Domain Certification in LLMs
- ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning
- Simple is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation
- StochasTok: Improving Fine-Grained Subword Understanding in LLMs
- Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery
- Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
- Toward Trustworthy Neural Program Synthesis
- Towards Universal Offline Black-Box Optimization via Learning String Embedding Space
- TPP-LLM: Modeling Temporal Point Processes by Efficiently Fine-Tuning Large Language Models
- Tradeoffs Between Alignment and Helpfulness in Language Models with Steering Methods
- Understanding (Un)Reliability of Steering Vectors in Language Models
- Unisolver: PDE-Conditional Transformers Are Universal Neural PDE Solvers
- Unlocking Post-hoc Dataset Inference with Synthetic Data
- VisR-Bench: A Visual Retrieval Benchmark for Visually-Rich Documents
- WABER: Evaluating Reliability and Efficiency of Web Agents with Existing Benchmarks
- Why Foundation Models Struggle with Cross-Modal Context
- Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search
- Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
- WorkflowAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data
- xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference