NeurIPS 2024 Past Safety & alignment
Pluralistic Alignment Workshop at NeurIPS 2024
Pluralistic-Alignment 2024
- Submission deadline: Sep 11, 2024, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (48)
Fetched from OpenReview (v2) on 2026-06-10.
- "There are no solutions, only trade-offs." Taking A Closer Look At Safety Data Annotations.
- A Case Study in Plural Governance Design
- Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI
- AGR: Age Group fairness Reward for Bias Mitigation in LLMs
- AI, Pluralism, and (Social) Compensation
- Aligning LLMs using Reinforcement Learning from Market Feedback (RLMF) for Regime Adaptation
- Aligning to Thousands of Preferences via System Message Generalization
- Are Large Language Models Consistent over Value-laden Questions?
- Being Considerate as a Pathway Towards Pluralistic Alignment for Agentic AI
- Bottom-Up and Top-Down Analysis of Values, Agendas, and Observations in Corpora and LLMs
- Can Language Models Reason about Individualistic Human Values and Preferences?
- Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment
- Conditional Language Policy: A General Framework For Steerable Multi-Objective Finetuning
- Contrastive Learning Neuromotor Interface From Teacher
- Controllable Safety Alignment: Adapting LLMs to Diverse Safety Requirements without Re-Training
- Critique-out-Loud Reward Models
- Diverging Preferences: When do Annotators Disagree and do Models Know?
- Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
- Efficacy of the SAGE-RT Dataset for Model Safety Alignment: A Comparative Study
- Evaluating the Prompt Steerability of Large Language Models
- FairPlay: A Collaborative Approach to Mitigate Bias in Datasets for Improved AI Fairness
- From Distributional to Overton Pluralism: Investigating Large Language Model Alignment
- Group Robust Best-of-K Decoding of Language Models for Pluralistic Alignment
- Intuitions of Compromise: Utilitarianism vs. Contractualism
- Learning from Personal Preferences
- Mallows-DPO: Fine-Tune Your LLM with Preference Dispersions
- Mechanism Design for LLM Fine-tuning with Multiple Reward Models
- MID-Space: Aligning Diverse Communities' Needs to Inclusive Public Spaces
- Model Plurality: A Taxonomy for Pluralistic AI
- Multi-objective Reinforcement Learning: A Tool for Pluralistic Alignment
- Multilingual Trolley Problems for Language Models
- PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
- Pareto-Optimal Learning from Preferences with Hidden Context
- Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
- PersonalLLM: Tailoring LLMs to Individual Preferences
- Pluralistic Alignment Over Time
- Plurality of value pluralism and AI value alignment
- Plurals: A system for pluralistic AI via simulated social ensembles
- Policy Aggregation
- Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
- Representative Social Choice: From Learning Theory to AI Alignment
- Rules, Cases, and Reasoning: Positivist Legal Theory as a Framework for Pluralistic AI Alignment
- Selective Preference Aggregation
- Toward Democracy Levels for AI
- Tractable Agreement Protocols
- Value Alignment from Unstructured Text
- Value-Aligned Imitation via focused Satisficing
- Virtual Personas for Language Models via an Anthology of Backstories