ICLR 2024 · Past · Large language models · Fairness & ethics
ICLR 2024 Workshop on Reliable and Responsible Foundation Models
ICLR 2024 R2-FM Workshop
- Submission deadline: Feb 11, 2024, 12:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (66)
Fetched from OpenReview (v2) on 2026-06-10.
- ©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model
- A StrongREJECT for Empty Jailbreaks
- Actions Speak Louder than Words: Superficial Fairness Alignment in LLMs
- Adversarial Robustness for Visual Grounding of Multimodal Large Language Models
- AI-Generated Images Introduce Invisible Relevance Bias to Text-Image Retrieval
- Aligning Modalities in Vision Large Language Models via Preference Fine-tuning
- Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
- Augmentation Alone Leads to Generalization
- AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition
- Boosting Jailbreak Attack With Momentum
- Can Generative Multimodal Models Count to Ten?
- Can Large Language Models Achieve Calibration with In-Context Learning?
- Can Large Language Models Reason Robustly with Noisy Rationales?
- Chain-of-Verification Reduces Hallucination in Large Language Models
- Composing Knowledge and Compression Interventions for Language Models
- Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation
- Dataset MKSL for measuring adequate response performance by knowledge level
- Does Data Contamination Make a Difference? Insights from Intentionally Contaminating Pre-training Data For Language Models
- Evaluating Model Bias Requires Characterizing its Mistakes
- Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs
- Explaining latent representations of generative models with large multimodal models
- Explicit Knowledge Factorization Meets In-Context Learning: What Do We Gain?
- Exploring the Robustness of In-Context Learning with Noisy Labels
- HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
- Hijacking Context in Large Multi-modal Models
- How many Opinions does your LLM have? Improving Uncertainty Estimation in NLG
- How to train your ViT for OOD detection
- Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation
- Instruction Tuning for Secure Code Generation
- Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates
- Large Language Model Cascades with Mixture of Thought Representations for Cost-Efficient Reasoning
- Large Language Models are Anonymizers
- Mapping Social Choice Theory to RLHF
- Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
- Memorization and Privacy Risks in Domain-Specific Large Language Models
- On Fairness Implications and Evaluations of Low-Rank Adaptation of Large Models
- Personalized Language Modeling from Personalized Human Feedback
- Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control
- Preventing Memorized Completions through White-Box Filtering
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
- Prompting for Robustness: Extracting Robust Classifiers from Foundation Models
- ProTransformer: Robustify Transformers via Plug-and-Play Paradigm
- Questioning the Survey Responses of Large Language Models
- RAMBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain
- Re-Ex: Revising after Explanation reduces the Factual Errors in LLM Responses
- Robust CLIP: Unsupervised Adversarial Fine-tuning of Vision Embeddings for Robust Large Vision-Language Models
- Scaling Compute Is Not All You Need for Adversarial Robustness
- Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding
- Self-Alignment of Large Language Models via Social Scene Simulation
- Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
- Setting the Record Straight on Transformer Oversmoothing
- Skip \n: A simple method to reduce hallucination in Large Vision-Language Models
- Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework
- texplain: Post-hoc Textual Explanation of Image Classifiers with Pre-trained Language Models
- The Bias of Harmful Label Associations in Vision-Language Models
- Towards Logically Consistent Language Models via Probabilistic Reasoning
- Towards Personalized AI: Early-stopping Low-Rank Adaptation of Foundation Models
- Unified Hallucination Detection for Multimodal Large Language Models
- Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation
- Unsolvable Problem Detection for Vision Language Models
- Value Augmented Sampling: Predict Your Rewards To Align Language Models
- Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
- Watermark Stealing in Large Language Models
- WAVES: Benchmarking the Robustness of Image Watermarks
- WorldBench: Quantifying Geographic Disparities in LLM Factual Recall