ICML 2026 Past Safety & alignment

Pluralistic Alignment Workshop at ICML 2026

Pluralistic-Alignment 2026

Submission deadline
May 9, 2026, 12:00 UTC
Imported from OpenReview; check the website for extensions.
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich; topics are keyword-guessed.

Accepted papers (80)

Fetched from OpenReview (v2) on 2026-06-10.

  1. Adaptive Pluralistic Alignment: a pipeline for dynamic artificial democracy

    Rachel Freedman · PDF
  2. AI Pluralism and the Worlds It Misses

    Rashid Mushkani · PDF
  3. Algorithmic Approaches to Opinion Selection for Online Deliberation: A Comparative Study

    Salim Hafid, Manon Berriche, Jean-Philippe Cointet · PDF
  4. Benchmarking Pluralistic Alignment Through Persona-Conditioned Behavioral Evaluation

    Archie Chaudhury, Shikhar Shiromani, Ayushi Mehta · PDF
  5. Beyond the Mean: Three-Axis Fidelity for Aligning LLM-Based Survey Simulators from Small Pilot Data

    Eun Cheol Choi, Youngrae Kim, Prabhu Pugalenthi, Hong-En Chen, Bo-Ruei Huang · PDF
  6. Bosses, Kings, and the Commons: Cooperation Under Power Asymmetry in LLM Societies

    Abhilekh Borah · PDF
  7. Changing Tunes: A Longitudinal Study of Political Drift in LLMs

    Bruno Demattos Nogueira, Jost Große Perdekamp, Leon Swazinna, Elisabeth Kirsten, Nils Christopher Köbis, Juhi Kulshrestha, Markus Pauly, Muhammad Bilal Zafar · PDF
  8. ConstitutionMAS-EC: Peer Constitutional Critique for Aligned Emergent Communication in Decentralized Multi-Agent LLMs

    Rishi Ashish Shah, Priyanshu Banik, Rahul Katarya, Himanshu Nandanwar · PDF
  9. Data Mixing for Group Preference Heterogeneity in Collaborative Filtering

    David Mingfei Liu, Haruka Kiyohara, Sarah Dean · PDF
  10. Deference by Design: Pluralistic Alignment Is an Interface Problem

    Steven Molotnikov, Cathy Mengying Fang, Patricia Maes · PDF
  11. Directional Influence and Consensus Formation in Multi-Agent Systems

    Prisha Priyadarshini, Aryan Shrivastava · PDF
  12. Diversifying Multiple Generative Agents by Aligning with Human Populations

    Manh Hung Nguyen, Sebastian Tschiatschek, Adish Singla · PDF
  13. Do LLMs Acknowledge Disputed Facts? A Benchmark for Factual Pluralism in LLMs

    Enfa Fane, Mihai Surdeanu · PDF
  14. Does AI Assistance Preserve or Collapse Disagreement? A Study of Pre-Annotations in Ambiguous Video Labeling

    Juan Gutiérrez, Víctor Gutiérrez-García, Jose Luis Blanco-Murillo · PDF
  15. Does Privacy Always Harm Fairness? Data-Dependent Trade-offs via Chernoff Information Neural Estimation

    Arjun Nichani, Hsiang Hsu, Chun-Fu Chen, Haewon Jeong · PDF
  16. Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in Large Language Models

    Jongwook Han, Jongwon Lim, Injin Kong, Yohan Jo · PDF
  17. EGGROLL-IPO: Pluralistic Alignment via Decentralised Post-Training with Population Preferences

    Alfie Lamerton, Bidipta Sarkar, Roberto-Rafael Maura-Rivero, Jakob Nicolaus Foerster · PDF
  18. Evaluating Pluralism in LLMs through Latent Perspectives

    Laura Majer, Jan Šnajder, Martin Tutek · PDF
  19. Event-Driven Reinforcement Learning for Pluralistic Alignment

    Soyoung Yun, Hayoung Oh · PDF
  20. For Questions of Ought, AI Could Use Some SAGE Advice

    Smitha Milli, Ratip Emin Berker, Sonja Kraiczy, Claudia Shi, Jack Kussman, Avinandan Bose, Edith Elkind, Himaghna Bhattacharjee, Ariel D. Procaccia, Maximilian Nickel · PDF
  21. FRAGILE: Benchmarking Framing Sensitivity in High-Stakes Decision-Making

    Seojin Hwang, Minju Kim, Junhyuk Choi, Hwanhee Lee · PDF
  22. From Rashomon Theory to PRAXIS: Efficient Decision Tree Rashomon Sets

    Zakk Heile, Hayden McTavish, Varun Babbar, Margo Seltzer, Cynthia Rudin · PDF
  23. Geometry of Values: Task Vector Composition for Ethical Preference Alignment in Language Models

    Utkarsh Agarwal, Monojit Choudhury · PDF
  24. HEARSAYBENCH: Can LLMs Navigate from Abstract Human Rights to Lived Lives?

    Sobhan Lotfi, Ava Iranmanesh, Ali Iranmanesh, Liwei Jiang · PDF
  25. Helpful or Safe? UltraFeedback's Binarized Labels Encode a Value Tradeoff

    Jingyi Zhang · PDF
  26. Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance

    Luozhijie Jin, Zijie Qiu, Zijie Diao, Lifeng Qiao, Ning Ding, Alex Lamb, Xipeng Qiu · PDF
  27. Innocuous-Seeming Data, Latent Ideology: Ideological Generalisation in Finetuned LLMs

    Robert Graham, Edward Stevinson, Yariv Barsheshat · PDF
  28. It’s Up to Interpretation: Aligning to One’s Ever-Shifting Internal State

    Tiffany Wang, Vincent Huang · PDF
  29. Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction

    Sejun Park, Yoonah Park, Jongwon Lim, Yohan Jo · PDF
  30. Learning Unanimously Acceptable Lotteries via Queries

    Davin Choo, Paul W. Goldberg, Nicholas Teh · PDF
  31. LLM Human Response Alignment: A Multi-Sample Debiasing Framework

    Li Jiang, Xiao Liu · PDF
  32. Majority Vote Silences Minority Values: Annotator Disagreement at the Hate/Offensive Boundary in HateXplain

    Joshua Muhumuza, Joab Ezra Agaba, Mercy Rebekah Amiyo · PDF
  33. Memetic Capture: A Pluralistic Policy Framework for Governing AI-Driven Cultural Disempowerment

    Subramanyam Sahoo · PDF
  34. Memetic Drift in Multi-Agent LLMs: Scaling Laws for Consensus Under Pluralistic Uncertainty

    Hidenori Tanaka · PDF
  35. Mission Impossible: Universal Moral Alignment

    Saimun Habib, Xiao Xiao, Meng Fang, Fengxiang He · PDF
  36. Modeling diverse preferences in movie artwork personalization with large language models

    HyunJi Nam, Sejoon Oh, Emma Yanyang Kong, Yesu Feng, Moumita Bhattacharya · PDF
  37. Moral Orientation and Calibration: Coupled in Human Annotators, Separable in Judge LLMs

    Youngsam Chun · PDF
  38. Multi-Action-Head On-Policy Self-Distillation for Pluralistic Alignment

    Yiran Jenny Shen, Yu Xia, Liuyi Yao, Prithviraj Ammanabrolu · PDF
  39. PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration

    Arnav Raj · PDF
  40. Pedagogical Games: Paths to Generalisation for Agentic Moral Alignment

    Krish Sen, Nikhil Narayanan, Luca Franceschetti, Jonathan Robinson, Yadnyesh Chakane, Shobhit Aggarwal, Dylan Waldner, Elizaveta Tennant · PDF
  41. Personalization, Personas, and Forecasting in Value Alignment

    James Wedgwood, Pratiksha Thaker, Neil Kale, Virginia Smith · PDF
  42. PIPE: Personalized Image-generation via Preference Encoding

    Moonkyung Ryu, Chih-Wei Hsu, Avinab Saha, Ofir Nabati, Guy Tennenholtz, Junfeng He, Craig Boutilier · PDF
  43. Playing Devil’s Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy

    Ishaan Kelkar, Nebras Alam, Vikram Kakaria, Madhur Panwar, Vasu Sharma, Maheep Chaudhary · PDF
  44. Pluralistic AI Alignment Requires Inference-Time Multi-Objective Control

    Weichen Li, Mislav Stojanović, Daniel Neider, Marius Kloft, Sophie Fellenz · PDF
  45. Pluralistic Preference Alignment via Sortition-Weighted RLHF

    Suvadip Sana, Jinzhou Wu, Martin T. Wells · PDF
  46. Position: Aggregate Preference Optimization Hides a Posterior Identifiability Failure for Pluralistic Alignment

    Zezheng Lin, Jinhao Gan · PDF
  47. Position: Align AI to Our Aspirations, Not Our Flaws

    Nikita Kazeev, Phan Bui Nhat Huyen · PDF
  48. Position: LLM alignment data should be regulated as mass media

    João Gonçalves · PDF
  49. Position: Why LLMs Should Be Reasonably Morally Inconsistent

    Jakob Stenseke, Aidan Kierans, Itamar Pres, Dylan Hadfield-Menell · PDF
  50. PRISM: When Agents Provably Learn from Pluralistic Human Feedback

    Shuo Yang, Zhen Chen, Sujay Sanghavi · PDF
  51. Provably Efficient Regularized Online RLHF with Generalized Bilinear Preferences

    Junghyun Lee, Minju Hong, Kwang-Sung Jun, Chulhee Yun, Se-Young Yun · PDF
  52. Reasoning Models Generate Societies of Thought

    Junsol Kim, Shiyang Lai, Nino Scherrer, Blaise Aguera y Arcas, James Evans · PDF
  53. Reducing Supervision Uncertainty Induces Model Miscalibration

    Leixin Zhang, Cagri Coltekin · PDF
  54. Response-Aware User Memory Selection for LLM Personalization

    Jillian Fisher, Jennifer Neville, Chan Young Park · PDF
  55. Rethinking AI Alignment: From Static Rewards to Social Reinforcement Learning

    Majid Ghasemi, Mark Crowley · PDF
  56. Rethinking Diversity-Preserving RL for Pluralistic Alignment: Empirical Evidence from Rubric-Grounded Moral Reasoning

    Zhaowei Zhang, Xiaohan Liu, Xuekai Zhu, Junchao Huang, Ceyao Zhang, Xiang Liu, ZhiYuan Feng, Yaodong Yang, Xiaoyuan Yi, Xing Xie · PDF
  57. Rethinking Scaffolding in LLM Tutors: The Interactional Mismatch Between Benchmarks and Real-World Deployments

    Alexandra Neagu, Jeffrey T. H. Wong, Marcus Messer, Rhodri Nelson, Peter B. Johnson · PDF
  58. RobotValues: Evaluating Household Robots When Human Values Conflict

    Jongwook Han, Hyeongjin Kim, Yohan Jo · PDF
  59. RouteJudge: Preference-Based Evaluation of LLM Routers under Pluralistic User Preferences

    Guannan Lai, Haoran Hu, Han-Jia Ye · PDF
  60. Same Facts, Different Updates: Inference Setup Shapes LLM Behavior in Medical Allocation

    Spencer Gibson, Tyler Crosse, Magnus Saebo, Achyutha Menon, Eyon Jang, Diogo Cruz · PDF
  61. Separating Value Disagreement from Data Uncertainty in Pluralistic Preference Data

    Ahmad A Rushdi · PDF
  62. Side Effects of Character Training: Quantifying Cross-Constitution Drift in LLMs

    Bhagyesh Kumar, Ananya Sutradhar, Saurav Panigrahi, Jonathn Chang, Lionel Levine · PDF
  63. Social Choice Foundations for Simulation-Augmented Generation

    Sonja Kraiczy, Smitha Milli, Ratip Emin Berker, Avinandan Bose, Brandon Amos, Jamelle Watson-Daniels, Maximilian Nickel, Edith Elkind, Ariel D. Procaccia · PDF
  64. Socially Grounded Agentic AI: Coordinating Plural Perspectives through Social Theory

    Matt Ratto, Abhishek Moturu, Daniel Silver · PDF
  65. Steerable Cultural Preference Optimization of Reward Models

    Minsik Oh, Advit Deepak, Sophie Wu, Douwe Kiela, Ekaterina Shutova · PDF
  66. The Homogenization Problem in LLMs: Towards Meaningful Diversity in AI Safety

    Ian Rios-Sialer · PDF
  67. The Language of Bargaining: Linguistic Effects in LLM Negotiations

    Stuti Sinha, Himanshu Kumar, Aryan Raju Mandapati, Rakshit Sakhuja, Dhruv Kumar · PDF
  68. The Persona Fidelity Gap: Behaviorally Grounded LLM Personas Still Compress Real-User Preference Diversity

    Rishav Kumar, Atul Dev, Shivank Garg · PDF
  69. The Wedge Questions: Latent Cultural Boundaries in LLMs via Persona Projection Divergence

    Yejin Son, Yongjin Yang, Ryan Faulkner, Matt Ratto, Seungwon Lim, Youngjae Yu, Zhijing Jin · PDF
  70. ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions

    Chuanyang Jin, Binze Li, Haopeng Xie, Cathy Mengying Fang, Tianjian Li, Shayne Longpre, Hongxiang Gu, Maximillian Chen, Tianmin Shu · PDF
  71. To Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demands

    Fangyi Yu, Nabeel Seedat, Jonathan Richard Schwarz, Andrew M. Bean · PDF
  72. ToolAlignBench: Investigating Alignment Conflicts in Tool-Calling Enabled LLMs

    Aryan Keluskar, Amrita Bhattacharjee, Huan Liu · PDF
  73. Toward Deployable Pluralistic Alignment in Robotics: Learning Similarity-Grouped Rewards from Diverse Human Preferences

    Taehyung Kim, Gwangmo Lee, Jonghak Bae, Dongjae Kim, Jaewoong Han, Jongeun Choi · PDF
  74. Universal Alignment Fails in Global Classrooms: Cross-Cultural Blind Spots in EdTech AI

    Zijin Wu, David Scott Lewis · PDF
  75. What Aggregate Accuracy Hides: Cultural Affective Inequity in Multilingual LLMs

    Youngjin Lee, Hayoung Oh · PDF
  76. What Does the AI Doctor Value? Auditing Pluralism in the Clinical Ethics of Language Models

    Payal Chandak, Victoria Alkin, David Wu, Maya Dagan, Taposh Dutta Roy, Maria Clara Saad Menezes, Ayush Noori, Nirali Somia, John S Brownstein, Ran Balicer, Rebecca Weintraub Brendel, Noa Dagan, Isaac S. Kohane, Gabriel A Brat · PDF
  77. When Disagreement Matters: Friction, Pluralistic Alignment, and National-Security AI

    Morgan Livingston · PDF
  78. When We Don’t See The Same Picture: Aligning Agents with Divergent Visual Spaces

    Gul Zain Khan, Stephan Alaniz, Eric Schulz, Zeynep Akata · PDF
  79. Where Models Concentrate and Humans Spread: A Coverage Framework for Distributional Pluralism in Open-Ended Generation

    Zini Yang, Emily Wenger, Richard Jean So · PDF
  80. Whose Alignment? Comparing LLM Process Alignment Across Diverse Organizational Decision Contexts

    Niklas Weller, Emilio Barkett · PDF