ICLR 2024 · Past · Large language models · Fairness & ethics

ICLR 2024 Workshop on Reliable and Responsible Foundation Models

ICLR 2024 R2-FM Workshop

Submission deadline
Feb 11, 2024, 12:59 UTC
imported from OpenReview — check the website for extensions
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).

Accepted papers (66)

Fetched from OpenReview (v2) on 2026-06-10.

  1. ©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model

    Chao Zhou, Huishuai Zhang, Jiang Bian, Weiming Zhang, Nenghai Yu · PDF
  2. A StrongREJECT for Empty Jailbreaks

    Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer · PDF
  3. Actions Speak Louder than Words: Superficial Fairness Alignment in LLMs

    Qiyao Wei, Alex James Chan, Lea Goetz, David Watson, Mihaela van der Schaar · PDF
  4. Adversarial Robustness for Visual Grounding of Multimodal Large Language Models

    Kuofeng Gao, Yang Bai, Jiawang Bai, Yong Yang, Shu-Tao Xia · PDF
  5. AI-Generated Images Introduce Invisible Relevance Bias to Text-Image Retrieval

    Shicheng Xu, Danyang Hou, Liang Pang, Jingcheng Deng, Jun Xu, Huawei Shen, Xueqi Cheng · PDF
  6. Aligning Modalities in Vision Large Language Models via Preference Fine-tuning

    Yiyang Zhou, Chenhang Cui, Rafael Rafailov, Chelsea Finn, Huaxiu Yao · PDF
  7. Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

    Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson · PDF
  8. Augmentation Alone Leads to Generalization

    Runtian Zhai, Bingbin Liu, Andrej Risteski, J Zico Kolter, Pradeep Kumar Ravikumar · PDF
  9. AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition

    Zhaorun Chen, Zhuokai Zhao, Zhihong Zhu, Ruiqi Zhang, Xiang Li, Bhiksha Raj, Huaxiu Yao · PDF
  10. Boosting Jailbreak Attack With Momentum

    Yihao Zhang, Zeming Wei · PDF
  11. Can Generative Multimodal Models Count to Ten?

    Sunayana Rane, Alexander Ku, Jason Michael Baldridge, Ian Tenney, Thomas L. Griffiths, Been Kim · PDF
  12. Can Large Language Models Achieve Calibration with In-Context Learning?

    Chengzu Li, Han Zhou, Goran Glavaš, Anna Korhonen, Ivan Vulić · PDF
  13. Can Large Language Models Reason Robustly with Noisy Rationales?

    Zhanke Zhou, Rong Tao, Jianing Zhu, Yiwen Luo, Zengmao Wang, Bo Han · PDF
  14. Chain-of-Verification Reduces Hallucination in Large Language Models

    Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason E Weston · PDF
  15. Composing Knowledge and Compression Interventions for Language Models

    Arinbjörn Kolbeinsson, Tianjin Huang, Shanghua Gao, Shiwei Liu, Jonathan Richard Schwarz, Anurag Jayant Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen · PDF
  16. Confidence Calibration and Rationalization for LLMs via Multi-Agent Deliberation

    Ruixin Yang, Dheeraj Rajagopal, Shirley Anugrah Hayati, Bin Hu, Dongyeop Kang · PDF
  17. Dataset MKSL for measuring adequate response performance by knowledge level

    Noh Myong Sung, Cho Ung Hui · PDF
  18. Does Data Contamination Make a Difference? Insights from Intentionally Contaminating Pre-training Data For Language Models

    Minhao Jiang, Ken Liu, Ming Zhong, Rylan Schaeffer, Siru Ouyang, Jiawei Han, Sanmi Koyejo · PDF
  19. Evaluating Model Bias Requires Characterizing its Mistakes

    Isabela Albuquerque, Jessica Schrouff, David Warde-Farley, Ali Taylan Cemgil, Sven Gowal, Olivia Wiles · PDF
  20. Experts Don't Cheat: Learning What You Don't Know By Predicting Pairs

    Daniel D. Johnson, Daniel Tarlow, David Duvenaud, Chris J. Maddison · PDF
  21. Explaining latent representations of generative models with large multimodal models

    Mengdan Zhu, Zhenke Liu, Bo Pan, Abhinav Angirekula, Liang Zhao · PDF
  22. Explicit Knowledge Factorization Meets In-Context Learning: What Do We Gain?

    Sarthak Mittal, Eric Elmoznino, Leo Gagnon, Sangnie Bhardwaj, Dhanya Sridhar, Guillaume Lajoie · PDF
  23. Exploring the Robustness of In-Context Learning with Noisy Labels

    Chen Cheng, Xinzhi Yu, Haodong Wen, Jingsong Sun, Guanzhang Yue, Yihao Zhang, Zeming Wei · PDF
  24. HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding

    Zhaorun Chen, Zhuokai Zhao, Hongyin Luo, Huaxiu Yao, Bo Li, Jiawei Zhou · PDF
  25. Hijacking Context in Large Multi-modal Models

    Joonhyun Jeong · PDF
  26. How many Opinions does your LLM have? Improving Uncertainty Estimation in NLG

    Lukas Aichberger, Kajetan Schweighofer, Mykyta Ielanskyi, Sepp Hochreiter · PDF
  27. How to train your ViT for OOD detection

    Maximilian Müller, Matthias Hein · PDF
  28. Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty

    Chen Ling, Xujiang Zhao, Xuchao Zhang, Yanchi Liu, Wei Cheng, Haoyu Wang, Zhengzhang Chen, Mika Oishi, Takao Osaki, Katsushi Matsuda, Liang Zhao, Haifeng Chen · PDF
  29. In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation

    Shiqi Chen, Miao Xiong, Junteng Liu, Zhengxuan Wu, Teng Xiao, Siyang Gao, Junxian He · PDF
  30. Instruction Tuning for Secure Code Generation

    Jingxuan He, Mark Vero, Gabriela Krasnopolska, Martin Vechev · PDF
  31. Keeping LLMs Aligned After Fine-tuning: The Crucial Role of Prompt Templates

    Kaifeng Lyu, Haoyu Zhao, Xinran Gu, Dingli Yu, Anirudh Goyal, Sanjeev Arora · PDF
  32. Large Language Model Cascades with Mixture of Thought Representations for Cost-Efficient Reasoning

    Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao · PDF
  33. Large Language Models are Anonymizers

    Robin Staab, Mark Vero, Mislav Balunovic, Martin Vechev · PDF
  34. Mapping Social Choice Theory to RLHF

    Jessica Dai, Eve Fleisig · PDF
  35. Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

    Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tim Rocktäschel, Edward Grefenstette, David Krueger · PDF
  36. Memorization and Privacy Risks in Domain-Specific Large Language Models

    Xinyu Yang, Zichen Wen, Wenjie Qu, Zhaorun Chen, Zhiying Xiang, Beidi Chen, Huaxiu Yao · PDF
  37. On Fairness Implications and Evaluations of Low-Rank Adaptation of Large Models

    Ken Liu, Zhoujie Ding, Berivan Isik, Sanmi Koyejo · PDF
  38. Personalized Language Modeling from Personalized Human Feedback

    Xinyu Li, Zachary Chase Lipton, Liu Leqi · PDF
  39. Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control

    Gunshi Gupta, Karmesh Yadav, Yarin Gal, Dhruv Batra, Zsolt Kira, Cong Lu, Tim G. J. Rudner · PDF
  40. Preventing Memorized Completions through White-Box Filtering

    Oam Patel, Rowan Wang · PDF
  41. Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines

    Yuchen Li, Alexandre Kirchmeyer, Aashay Mehta, Yilong Qin, Boris Dadachev, Kishore A Papineni, Sanjiv Kumar, Andrej Risteski · PDF
  42. Prompting for Robustness: Extracting Robust Classifiers from Foundation Models

    Amrith Setlur, Saurabh Garg, Virginia Smith, Sergey Levine · PDF
  43. ProTransformer: Robustify Transformers via Plug-and-Play Paradigm

    Zhichao Hou, Weizhi Gao, Yuchen Shen, Xiaorui Liu · PDF
  44. Questioning the Survey Responses of Large Language Models

    Ricardo Dominguez-Olmedo, Moritz Hardt, Celestine Mendler-Dünner · PDF
  45. RAMBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain

    William James Bolton, Rafael Poyiadzi, Edward Morrell, Gabriela van Bergen Gonzalez Bueno, Lea Goetz · PDF
  46. Re-Ex: Revising after Explanation reduces the Factual Errors in LLM Responses

    Juyeon Kim, Jeongeun Lee, YoonHo Chang, Chanyeol Choi, Jun-Seong Kim, Jy-yong Sohn · PDF
  47. Robust CLIP: Unsupervised Adversarial Fine-tuning of Vision Embeddings for Robust Large Vision-Language Models

    Christian Schlarmann, Naman Deep Singh, Francesco Croce, Matthias Hein · PDF
  48. Scaling Compute Is Not All You Need for Adversarial Robustness

    Edoardo Debenedetti, Zishen Wan, Maksym Andriushchenko, Vikash Sehwag, Kshitij Bhardwaj, Bhavya Kailkhura · PDF
  49. Seeing is Believing: Mitigating Hallucination in Large Vision-Language Models via CLIP-Guided Decoding

    Ailin Deng, Zhirui Chen, Bryan Hooi · PDF
  50. Self-Alignment of Large Language Models via Social Scene Simulation

    Xianghe Pang, Shuo Tang, Rui Ye, Yuxin Xiong, Bolun Zhang, Yanfeng Wang, Siheng Chen · PDF
  51. Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation

    Hang Li, Chengzhi Shen, Philip Torr, Volker Tresp, Jindong Gu · PDF
  52. Setting the Record Straight on Transformer Oversmoothing

    Gbetondji Jean-Sebastien Dovonon, Michael M. Bronstein, Matt Kusner · PDF
  53. Skip \n: A simple method to reduce hallucination in Large Vision-Language Models

    Zongbo Han, Zechen Bai, Haiyang Mei, Qianli Xu, Changqing Zhang, Mike Zheng Shou · PDF
  54. Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework

    Jingling Li, Zeyu Tang, Xiaoyu Liu, Peter Spirtes, Kun Zhang, Liu Leqi, Yang Liu · PDF
  55. texplain: Post-hoc Textual Explanation of Image Classifiers with Pre-trained Language Models

    Saeid Asgari, Aliasghar Khani, Amir Hosein Khasahmadi, Aditya Sanghi, Karl D.D. Willis, Ali Mahdavi Amiri · PDF
  56. The Bias of Harmful Label Associations in Vision-Language Models

    Caner Hazirbas, Alicia Yi Sun, Yonathan Efroni, Mark Ibrahim · PDF
  57. Towards Logically Consistent Language Models via Probabilistic Reasoning

    Diego Calanzone, Antonio Vergari, Stefano Teso · PDF
  58. Towards Personalized AI: Early-stopping Low-Rank Adaptation of Foundation Models

    Zihao Luo, Di Wang, Yun Sing Koh, Jingfeng Zhang · PDF
  59. Unified Hallucination Detection for Multimodal Large Language Models

    Xiang Chen, Chenxi Wang, Ningyu Zhang, Yida Xue, Xiaoyan Yang, Yue Shen, Jinjie Gu, Huajun Chen · PDF
  60. Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation

    Zhengyue Zhao, Jinhao Duan, Xing Hu, Kaidi Xu, Chenan Wang, Rui Zhang, Zidong Du, Qi Guo, Yunji Chen · PDF
  61. Unsolvable Problem Detection for Vision Language Models

    Atsuyuki Miyai, Jingkang Yang, Jingyang Zhang, Yifei Ming, Qing Yu, Go Irie, Yixuan Li, Hai Li, Ziwei Liu, Kiyoharu Aizawa · PDF
  62. Value Augmented Sampling: Predict Your Rewards To Align Language Models

    Seungwook Han, Idan Shenfeld, Akash Srivastava, Yoon Kim, Pulkit Agrawal · PDF
  63. Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations

    Katie Matton, Robert Ness, Emre Kiciman · PDF
  64. Watermark Stealing in Large Language Models

    Nikola Jovanović, Robin Staab, Martin Vechev · PDF
  65. WAVES: Benchmarking the Robustness of Image Watermarks

    Mucong Ding, Tahseen Rabbani, Bang An, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, Furong Huang · PDF
  66. WorldBench: Quantifying Geographic Disparities in LLM Factual Recall

    Mazda Moayeri, Soheil Feizi · PDF