Pluralistic Alignment Workshop at NeurIPS 2024

Pluralistic-Alignment 2024

Submission deadline
Sep 11, 2024, 11:59 UTC
imported from OpenReview — check the website for extensions
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).

Accepted papers (48)

Fetched from OpenReview (v2) on 2026-06-10.

  1. "There are no solutions, only trade-offs." Taking A Closer Look At Safety Data Annotations.

    Elle Michelle Yang, Matthias Gallé, Seraphina Goldfarb-Tarrant · PDF
  2. A Case Study in Plural Governance Design

    Joel Miller, Christopher Kanich, Glen Weyl · PDF
  3. Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI

    Hadassah Harland, Richard Dazeley, Peter Vamplew, Hashini Senaratne, Bahareh Nakisa, Francisco Cruz · PDF
  4. AGR: Age Group fairness Reward for Bias Mitigation in LLMs

    Shuirong Cao, Ruoxi Cheng, Zhiqiang Wang · PDF
  5. AI, Pluralism, and (Social) Compensation

    Nandhini Swaminathan, David Danks · PDF
  6. Aligning LLMs using Reinforcement Learning from Market Feedback (RLMF) for Regime Adaptation

    Raeid Saqur · PDF
  7. Aligning to Thousands of Preferences via System Message Generalization

    Seongyun Lee, Sue Hyun Park, Seungone Kim, Minjoon Seo · PDF
  8. Are Large Language Models Consistent over Value-laden Questions?

    Jared Moore, Tanvi Deshpande, Diyi Yang · PDF
  9. Being Considerate as a Pathway Towards Pluralistic Alignment for Agentic AI

    Parand A. Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith · PDF
  10. Bottom-Up and Top-Down Analysis of Values, Agendas, and Observations in Corpora and LLMs

    Scott E. Friedman, Noam Benkler, Drisana Mosaphir, Jeffrey Rye, Sonja M. Schmer-Galunder, Micah Goldwater, Matthew McLure, Ruta Wheelock, Jeremy Gottlieb, Robert P. Goldman, Christopher Miller · PDF
  11. Can Language Models Reason about Individualistic Human Values and Preferences?

    Liwei Jiang, Sydney Levine, Yejin Choi · PDF
  12. Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment

    Andrew Konya, Aviv Ovadya, Kevin Feng, Quan Ze Chen, Lisa Schirch, Colin Irwin, Amy X Zhang · PDF
  13. Conditional Language Policy: A General Framework For Steerable Multi-Objective Finetuning

    Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Kumar Avinava Dubey, Alexandre Rame, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Leonard Hussenot, Olivier Bachem, Edouard Leurent · PDF
  14. Contrastive Learning Neuromotor Interface From Teacher

    Kilian Freitag, Ran Wei · PDF
  15. Controllable Safety Alignment: Adapting LLMs to Diverse Safety Requirements without Re-Training

    Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme · PDF
  16. Critique-out-Loud Reward Models

    Zachary Ankner, Mansheej Paul, Brandon Cui, Jonathan Daniel Chang, Prithviraj Ammanabrolu · PDF
  17. Diverging Preferences: When do Annotators Disagree and do Models Know?

    Michael JQ Zhang, Zhilin Wang, Jena D. Hwang, Yi Dong, Olivier Delalleau, Yejin Choi, Eunsol Choi, Xiang Ren, Valentina Pyatkin · PDF
  18. Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities

    Zheyuan Zhang, Fengyuan Hu, Jayjun Lee, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma · PDF
  19. Efficacy of the SAGE-RT Dataset for Model Safety Alignment: A Comparative Study

    Tanay Baswa, Nitin Aravind Birur, Divyanshu Kumar, Jatan Loya, Anurakt Kumar, Prashanth Harshangi, Sahil Agarwal · PDF
  20. Evaluating the Prompt Steerability of Large Language Models

    Erik Miehling, Michael Desmond, Karthikeyan Natesan Ramamurthy, Elizabeth M. Daly, Pierre Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu · PDF
  21. FairPlay: A Collaborative Approach to Mitigate Bias in Datasets for Improved AI Fairness

    Tina Behzad, Mithilesh Kumar Singh, Anthony J. Ripa, Klaus Mueller · PDF
  22. From Distributional to Overton Pluralism: Investigating Large Language Model Alignment

    Thom Lake, Eunsol Choi, Greg Durrett · PDF
  23. Group Robust Best-of-K Decoding of Language Models for Pluralistic Alignment

    Sangwoong Yoon, William Bankes, Seongho Son, Anja Petrovic, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic · PDF
  24. Intuitions of Compromise: Utilitarianism vs. Contractualism

    Jared Moore, Yejin Choi, Sydney Levine · PDF
  25. Learning from Personal Preferences

    Kelly Jiang, Berk Ustun, Jessica Hullman · PDF
  26. Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions

    Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang · PDF
  27. Mechanism Design for LLM Fine-tuning with Multiple Reward Models

    Haoran Sun, Yurong Chen, Siwei Wang, Wei Chen, Xiaotie Deng · PDF
  28. MID-Space: Aligning Diverse Communities' Needs to Inclusive Public Spaces

    Shravan Nayak, Rashid Mushkani, Hugo Berard, Allison Cohen, Shin Koseki, Hadrien Bertrand · PDF
  29. Model Plurality: A Taxonomy for Pluralistic AI

    Christina Lu, Max Van Kleek · PDF
  30. Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment

    Peter Vamplew, Conor F. Hayes, Cameron Foale, Richard Dazeley, Hadassah Harland · PDF
  31. Multilingual Trolley Problems for Language Models

    Zhijing Jin, Max Kleiman-Weiner, Giorgio Piatti, Sydney Levine, Jiarui Liu, Fernando Gonzalez Adauto, Francesco Ortu, András Strausz, Mrinmaya Sachan, Rada Mihalcea, Yejin Choi, Bernhard Schölkopf · PDF
  32. PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences

    Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak · PDF
  33. Pareto-Optimal Learning from Preferences with Hidden Context

    Ryan Boldi, Li Ding, Lee Spector, Scott Niekum · PDF
  34. Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

    Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, Natasha Jaques · PDF
  35. PersonalLLM: Tailoring LLMs to Individual Preferences

    Thomas P Zollo, Andrew Wei Tung Siah, Naimeng Ye, Ang Li, Hongseok Namkoong · PDF
  36. Pluralistic Alignment Over Time

    Toryn Q. Klassen, Parand A. Alamdari, Sheila A. McIlraith · PDF
  37. Plurality of value pluralism and AI value alignment

    Atoosa Kasirzadeh · PDF
  38. Plurals: A system for pluralistic AI via simulated social ensembles

    Joshua Ashkinaze, Eric Gilbert, Ceren Budak · PDF
  39. Policy Aggregation

    Parand A. Alamdari, Soroush Ebadian, Ariel D. Procaccia · PDF
  40. Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

    Ruoxi Cheng, Hao-Xuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo · PDF
  41. Representative Social Choice: From Learning Theory to AI Alignment

    Tianyi Qiu · PDF
  42. Rules, Cases, and Reasoning: Positivist Legal Theory as a Framework for Pluralistic AI Alignment

    Nicholas A. Caputo · PDF
  43. Selective Preference Aggregation

    Shreyas Kadekodi, Hayden McTavish, Berk Ustun · PDF
  44. Toward Democracy Levels for AI

    Aviv Ovadya, Luke Thorburn, Kyle Redman, Flynn Devine, Smitha Milli, Manon Revel, Andrew Konya, Atoosa Kasirzadeh · PDF
  45. Tractable Agreement Protocols

    Natalie Collina, Surbhi Goel, Varun Gupta, Aaron Roth · PDF
  46. Value Alignment from Unstructured Text

    Inkit Padhi, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Manish Nagireddy, Pierre Dognin, Kush R. Varshney · PDF
  47. Value-Aligned Imitation via focused Satisficing

    Rushit N. Shah, Nikolaos Agadakos, Synthia Sasulski, Ali Farajzadeh, Sanjiban Choudhury, Brian D Ziebart · PDF
  48. Virtual Personas for Language Models via an Anthology of Backstories

    Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Kohen Behar, David Chan · PDF