ICLR 2026 · Past · ML systems · Agents · Safety & alignment

Algorithmic Fairness Across Alignment Procedures and Agentic Systems

AFAA 2026

Submission deadline
Feb 6, 2026, 11:59 UTC
imported from OpenReview — check the website for extensions
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).

Accepted papers (35)

Fetched from OpenReview (v2) on 2026-06-10.

  1. Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest

    Addison J. Wu, Ryan Liu, Shuyue Stella Li, Yulia Tsvetkov, Thomas L. Griffiths · PDF
  2. Automatically Finding Reward Model Biases

    Atticus Wang, Iván Arcuschin, Arthur Conmy · PDF
  3. Cross-Linguistic Failures and Disparities in LLM Medical Reasoning: Analyzing XMedBench and CrossMMLU Across Western and Non-Western Languages

    Rehan Nazeem, Akira Hoque, Vedesh Ray Peddoddi, Tim Liu, Kevin Zhu · PDF
  4. Differential Adjusted Parity for Learning Fair Representations

    Bucher Sahyouni, Matthew James Vowels, Liqun Chen, Simon Hadfield · PDF
  5. Disparities in Negation Understanding Across Languages in Vision-Language Models

    Charikleia Moraitaki, Skyler Pulling, Sarah Pan, Gwendolyn Flusche, Kumail Alhamoud, Marzyeh Ghassemi · PDF
  6. Distortion of AI Alignment Revisited: RLHF Is a Decent Utilitarian Aligner

    Kazusato Oko, Annie S Ulichney, Nika Haghtalab, Han Bao · PDF
  7. Evaluating black-box vulnerabilities with Wasserstein-constrained data perturbations

    Adriana Laurindo Monteiro, Jean-Michel Loubes · PDF
  8. Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search

    Manos Plitsis, Giorgos Bouritsas, Vassilis Katsouros, Yannis Panagakis · PDF
  9. FairMed-VLM: Toward Equitable Medical Diagnosis with Vision–Language Models

    Zihao Chang, Ruixiang Zhu, Daochu Li, Chaozhi Geng, Siqi Chen · PDF
  10. Fairness Failure Modes of Multimodal LLMs

    Canyu Chen, Anglin Cai, Joan Nwatu, Jianshu Zhang, Yale Li, Han Liu, Jessica Hullman, Rada Mihalcea, Kathleen McKeown, Manling Li · PDF
  11. GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory

    Pepijn Cobben, X. Angelo Huang, Thao Amelia Pham, Isabel Dahlgren, Terry Jingchen Zhang, Zhijing Jin · PDF
  12. Improving Fairness via Noise Injection in Vision Transformers

    Qiaoyue Tang, Sepidehsadat Hosseini, Mengyao Zhai, Thibaut Durand, Greg Mori · PDF
  13. Learning to Be Fair: Modeling Fairness Dynamics by Simulating Moral-Based Multi-Agent Resource Allocation

    Haiyan Feng, Yuqiao Du, Huacong Tang, Junjie Liao, Yipeng Kang, Mingjie Bi, Fangwei Zhong, Zhou Ziheng · PDF
  14. Long-term Fairness with Selective Labels

    Giovani Valdrighi, Isabel Valera, Marcos M. Raimundo · PDF
  15. Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations

    Preethi Seshadri, Samuel Cahyawijaya, Ayomide Odumakinde, Sameer Singh, Seraphina Goldfarb-Tarrant · PDF
  16. Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs

    Edie Pearman, Sophia Osborne, Mira Kandlikar-Bloch, Mina Arzaghi, Florian Carichon, Golnoosh Farnadi · PDF
  17. Memories That Discriminate: Detecting and Correcting Bias in Personalized Hiring Agents

    Himanshu Gharat, Himanshi Agrawal, Gourab K Patro · PDF
  18. Metanetworks as Regulatory Operators: Learning to Edit for Requirement Compliance

    Ioannis Kalogeropoulos, Giorgos Bouritsas, Yannis Panagakis · PDF
  19. MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment

    Andor Vári-Kakas, Ji Won Park, Natasa Tagasovska · PDF
  20. Mind the Gap: Evaluating Model- and Agentic-Level Vulnerabilities in LLMs with Action Graphs

    Ilham Wicaksono, Zekun Wu, Rahul Patel, Theo King, Adriano Koshiyama, Philip Colin Treleaven · PDF
  21. Moral Preferences of LLMs Under Directed Contextual Influence

    Phil Blandfort, Tushar Karayil, Urja Pawar, Alex McKenzie, Robert Graham, Dmitrii Krasheninnikov · PDF
  22. Navigating the Rashomon Set: The Impact of Score Distributions and Decision Thresholds on Model Agreement

    Giovani Valdrighi, Marcos M. Raimundo · PDF
  23. OC-PRM: Overcredit-Contrastive Training for Precision-First Process Reward Models

    Aakriti Agrawal, Souradip Chakraborty, Armin Saghafian, Nihal Sharma, Rizal Fathony, Nam H Nguyen, C. Bayan Bruss, Amrit Singh Bedi, Furong Huang · PDF
  24. Operationalizing Fairness in Text-to-Image Models: A Survey of Bias, Fairness Audits and Mitigation Strategies

    Megan Smith, Venkatesh Thirugnana Sambandham, Florian Richter, Matthias Uhl, Laura Crompton, Torsten Schön · PDF
  25. Persona Alchemy: Designing, Evaluating, and Implementing Psychologically-Grounded LLM Agents for Diverse Stakeholder Representation

    Sola Kim, Dongjune Chang, Jieshu Wang · PDF
  26. Probing Implicit Bias Risk Framing in Language Models

    Rishi Kalra, Andrea Dhelpra, Seonglae Cho, Adriano Koshiyama · PDF
  27. Procedural Fairness Failures in RLHF from Preference Averaging

    M P V S GOPINADH, Karthik Kamuju, Kummari Avinash, Muppana John Joshua, Srinivasa Raju Rudraraju · PDF
  28. Red Teaming the Rules: An Adversarial Approach to Legal Alignment

    Rui-Jie Yew, Greg Demirchyan · PDF
  29. Robust AI Evaluation through Maximal Lotteries

    Hadi Khalaf, Serena Lutong Wang, Daniel Halpern, Itai Shapira, Flavio Calmon, Ariel D. Procaccia · PDF
  30. Scalable Intersectional Bias Auditing in Vision-Language Models through Combinatorial Interaction Testing

    Heejin Bin, Junyoung Choi, JangHyun Kim, Seungjae Kim, Shin Yoo · PDF
  31. SOMnibus: Recovering Underlying Sensitive Attributes with Self-Organizing Maps

    Joseph Charles Bingham, Netanel Arussy, Dvir Aran · PDF
  32. State Space Models are Effective Sign Language Learners: Exploiting Phonological Compositionality for Vocabulary-Scale Recognition

    Bryan Cheng, Austin Jin, Jasper Zhang · PDF
  33. The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs

    Weijie Xu, Xi Fang, Yuchong Zhang, Stephanie Eckman, Scott Nickleach, Chandan K. Reddy · PDF
  34. Verifying Alignment Constraints Under Finite-Sample Uncertainty in Composite-Data Regimes

    Blossom Metevier, Max Springer, Bohdan Turbal, Aleksandra Korolova · PDF
  35. When AI Describes Race? Unveiling Racial Bias in Vision-Language Models in Brazilian People

    Leodécio Braz da Silva Segundo, Marcos M. Raimundo · PDF