NeurIPS 2024 · Past · Safety & alignment

Pluralistic Alignment Workshop at NeurIPS 2024

Pluralistic-Alignment 2024


Submission deadline: Sep 11, 2024, 11:59 UTC
imported from OpenReview — check the website for extensions
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise — edits welcome.

Accepted papers (48)

Fetched from OpenReview (v2) on 2026-06-10.

"There are no solutions, only trade-offs." Taking A Closer Look At Safety Data Annotations.
Elle Michelle Yang, Matthias Gallé, Seraphina Goldfarb-Tarrant · PDF
A Case Study in Plural Governance Design
Joel Miller, Christopher Kanich, Glen Weyl · PDF
Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI
Hadassah Harland, Richard Dazeley, Peter Vamplew, Hashini Senaratne, Bahareh Nakisa, Francisco Cruz · PDF
AGR: Age Group fairness Reward for Bias Mitigation in LLMs
Shuirong Cao, Ruoxi Cheng, Zhiqiang Wang · PDF
AI, Pluralism, and (Social) Compensation
Nandhini Swaminathan, David Danks · PDF
Aligning LLMs using Reinforcement Learning from Market Feedback (RLMF) for Regime Adaptation
Raeid Saqur · PDF
Aligning to Thousands of Preferences via System Message Generalization
Seongyun Lee, Sue Hyun Park, Seungone Kim, Minjoon Seo · PDF
Are Large Language Models Consistent over Value-laden Questions?
Jared Moore, Tanvi Deshpande, Diyi Yang · PDF
Being Considerate as a Pathway Towards Pluralistic Alignment for Agentic AI
Parand A. Alamdari, Toryn Q. Klassen, Rodrigo Toro Icarte, Sheila A. McIlraith · PDF
Bottom-Up and Top-Down Analysis of Values, Agendas, and Observations in Corpora and LLMs
Scott E. Friedman, Noam Benkler, Drisana Mosaphir, Jeffrey Rye, Sonja M. Schmer-Galunder, Micah Goldwater, Matthew McLure, Ruta Wheelock, Jeremy Gottlieb, Robert P. Goldman, Christopher Miller · PDF
Can Language Models Reason about Individualistic Human Values and Preferences?
Liwei Jiang, Sydney Levine, Yejin Choi · PDF
Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment
Andrew Konya, Aviv Ovadya, Kevin Feng, Quan Ze Chen, Lisa Schirch, Colin Irwin, Amy X Zhang · PDF
Conditional Language Policy: A General Framework For Steerable Multi-Objective Finetuning
Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Kumar Avinava Dubey, Alexandre Rame, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Leonard Hussenot, Olivier Bachem, Edouard Leurent · PDF
Contrastive Learning Neuromotor Interface From Teacher
Kilian Freitag, Ran Wei · PDF
Controllable Safety Alignment: Adapting LLMs to Diverse Safety Requirements without Re-Training
Jingyu Zhang, Ahmed Elgohary, Ahmed Magooda, Daniel Khashabi, Benjamin Van Durme · PDF
Critique-out-Loud Reward Models
Zachary Ankner, Mansheej Paul, Brandon Cui, Jonathan Daniel Chang, Prithviraj Ammanabrolu · PDF
Diverging Preferences: When do Annotators Disagree and do Models Know?
Michael JQ Zhang, Zhilin Wang, Jena D. Hwang, Yi Dong, Olivier Delalleau, Yejin Choi, Eunsol Choi, Xiang Ren, Valentina Pyatkin · PDF
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
Zheyuan Zhang, Fengyuan Hu, Jayjun Lee, Freda Shi, Parisa Kordjamshidi, Joyce Chai, Ziqiao Ma · PDF
Efficacy of the SAGE-RT Dataset for Model Safety Alignment: A Comparative Study
Tanay Baswa, Nitin Aravind Birur, Divyanshu Kumar, Jatan Loya, Anurakt Kumar, Prashanth Harshangi, Sahil Agarwal · PDF
Evaluating the Prompt Steerability of Large Language Models
Erik Miehling, Michael Desmond, Karthikeyan Natesan Ramamurthy, Elizabeth M. Daly, Pierre Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu · PDF
FairPlay: A Collaborative Approach to Mitigate Bias in Datasets for Improved AI Fairness
Tina Behzad, Mithilesh Kumar Singh, Anthony J. Ripa, Klaus Mueller · PDF
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
Thom Lake, Eunsol Choi, Greg Durrett · PDF
Group Robust Best-of-K Decoding of Language Models for Pluralistic Alignment
Sangwoong Yoon, William Bankes, Seongho Son, Anja Petrovic, Shyam Sundhar Ramesh, Xiaohang Tang, Ilija Bogunovic · PDF
Intuitions of Compromise: Utilitarianism vs. Contractualism
Jared Moore, Yejin Choi, Sydney Levine · PDF
Learning from Personal Preferences
Kelly Jiang, Berk Ustun, Jessica Hullman · PDF
Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam, David Yao, Wenpin Tang · PDF
Mechanism Design for LLM Fine-tuning with Multiple Reward Models
Haoran Sun, Yurong Chen, Siwei Wang, Wei Chen, Xiaotie Deng · PDF
MID-Space: Aligning Diverse Communities' Needs to Inclusive Public Spaces
Shravan Nayak, Rashid Mushkani, Hugo Berard, Allison Cohen, Shin Koseki, Hadrien Bertrand · PDF
Model Plurality: A Taxonomy for Pluralistic AI
Christina Lu, Max Van Kleek · PDF
Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment
Peter Vamplew, Conor F. Hayes, Cameron Foale, Richard Dazeley, Hadassah Harland · PDF
Multilingual Trolley Problems for Language Models
Zhijing Jin, Max Kleiman-Weiner, Giorgio Piatti, Sydney Levine, Jiarui Liu, Fernando Gonzalez Adauto, Francesco Ortu, András Strausz, Mrinmaya Sachan, Rada Mihalcea, Yejin Choi, Bernhard Schölkopf · PDF
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
Daiwei Chen, Yi Chen, Aniket Rege, Ramya Korlakai Vinayak · PDF
Pareto-Optimal Learning from Preferences with Hidden Context
Ryan Boldi, Li Ding, Lee Spector, Scott Niekum · PDF
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
Sriyash Poddar, Yanming Wan, Hamish Ivison, Abhishek Gupta, Natasha Jaques · PDF
PersonalLLM: Tailoring LLMs to Individual Preferences
Thomas P Zollo, Andrew Wei Tung Siah, Naimeng Ye, Ang Li, Hongseok Namkoong · PDF
Pluralistic Alignment Over Time
Toryn Q. Klassen, Parand A. Alamdari, Sheila A. McIlraith · PDF
Plurality of value pluralism and AI value alignment
Atoosa Kasirzadeh · PDF
Plurals: A system for pluralistic AI via simulated social ensembles
Joshua Ashkinaze, Eric Gilbert, Ceren Budak · PDF
Policy Aggregation
Parand A. Alamdari, Soroush Ebadian, Ariel D. Procaccia · PDF
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
Ruoxi Cheng, Hao-Xuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo · PDF
Representative Social Choice: From Learning Theory to AI Alignment
Tianyi Qiu · PDF
Rules, Cases, and Reasoning: Positivist Legal Theory as a Framework for Pluralistic AI Alignment
Nicholas A. Caputo · PDF
Selective Preference Aggregation
Shreyas Kadekodi, Hayden McTavish, Berk Ustun · PDF
Toward Democracy Levels for AI
Aviv Ovadya, Luke Thorburn, Kyle Redman, Flynn Devine, Smitha Milli, Manon Revel, Andrew Konya, Atoosa Kasirzadeh · PDF
Tractable Agreement Protocols
Natalie Collina, Surbhi Goel, Varun Gupta, Aaron Roth · PDF
Value Alignment from Unstructured Text
Inkit Padhi, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Manish Nagireddy, Pierre Dognin, Kush R. Varshney · PDF
Value-Aligned Imitation via focused Satisficing
Rushit N. Shah, Nikolaos Agadakos, Synthia Sasulski, Ali Farajzadeh, Sanjiban Choudhury, Brian D Ziebart · PDF
Virtual Personas for Language Models via an Anthology of Backstories
Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Kohen Behar, David Chan · PDF
