ICLR 2026 · Past · ML systems · Agents · Safety & alignment
Algorithmic Fairness Across Alignment Procedures and Agentic Systems
AFAA 2026
- Submission deadline: Feb 6, 2026, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (35)
Fetched from OpenReview (v2) on 2026-06-10.
- Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest
- Automatically Finding Reward Model Biases
- Cross-Linguistic Failures and Disparities in LLM Medical Reasoning: Analyzing XMedBench and CrossMMLU Across Western and Non-Western Languages
- Differential Adjusted Parity for Learning Fair Representations
- Disparities in Negation Understanding Across Languages in Vision-Language Models
- Distortion of AI Alignment Revisited: RLHF Is a Decent Utilitarian Aligner
- Evaluating black-box vulnerabilities with Wasserstein-constrained data perturbations
- Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search
- FairMed-VLM: Toward Equitable Medical Diagnosis with Vision–Language Models
- Fairness Failure Modes of Multimodal LLMs
- GT-HarmBench: Benchmarking AI Safety Risks Through the Lens of Game Theory
- Improving Fairness via Noise Injection in Vision Transformers
- Learning to Be Fair: Modeling Fairness Dynamics by Simulating Moral-Based Multi-Agent Resource Allocation
- Long-term Fairness with Selective Labels
- Lost in Simulation: LLM-Simulated Users are Unreliable Proxies for Human Users in Agentic Evaluations
- Mechanics of Bias and Reasoning: Interpreting the Impact of Chain-of-Thought Prompting on Gender Bias in LLMs
- MEMORIES THAT DISCRIMINATE: DETECTING AND CORRECTING BIAS IN PERSONALIZED HIRING AGENTS
- Metanetworks as Regulatory Operators: Learning to Edit for Requirement Compliance
- MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment
- Mind the Gap: Evaluating Model- and Agentic-Level Vulnerabilities in LLMs with Action Graphs
- Moral Preferences of LLMs Under Directed Contextual Influence
- Navigating the Rashomon Set: The Impact of Score Distributions and Decision Thresholds on Model Agreement
- OC-PRM: Overcredit-Contrastive Training for Precision-First Process Reward Models
- Operationalizing Fairness in Text-to-Image Models: A Survey of Bias, Fairness Audits and Mitigation Strategies
- Persona Alchemy: Designing, Evaluating, and Implementing Psychologically-Grounded LLM Agents for Diverse Stakeholder Representation
- Probing Implicit Bias Risk Framing in Language Models
- Procedural Fairness Failures in RLHF from Preference Averaging
- Red Teaming the Rules: An Adversarial Approach to Legal Alignment
- Robust AI Evaluation through Maximal Lotteries
- Scalable Intersectional Bias Auditing in Vision-Language Models through Combinatorial Interaction Testing
- SOMnibus: Recovering Underlying Sensitive Attributes with Self-Organizing Maps
- State Space Models are Effective Sign Language Learners: Exploiting Phonological Compositionality for Vocabulary-Scale Recognition
- THE PERSONALIZATION TRAP: HOW USER MEMORY ALTERS EMOTIONAL REASONING IN LLMS
- Verifying Alignment Constraints Under Finite-Sample Uncertainty in Composite-Data Regimes
- When AI Describes Race? Unveiling Racial Bias in Vision-Language Models in Brazilian People