Pluralistic Alignment Workshop at ICML 2026
Pluralistic-Alignment 2026
- Submission deadline: May 9, 2026, 12:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (80)
Fetched from OpenReview (v2) on 2026-06-10.
- Adaptive Pluralistic Alignment: a pipeline for dynamic artificial democracy
- AI Pluralism and the Worlds It Misses
- Algorithmic Approaches to Opinion Selection for Online Deliberation: A Comparative Study
- Benchmarking Pluralistic Alignment Through Persona-Conditioned Behavioral Evaluation
- Beyond the Mean: Three-Axis Fidelity for Aligning LLM-Based Survey Simulators from Small Pilot Data
- Bosses, Kings, and the Commons: Cooperation Under Power Asymmetry in LLM Societies
- Changing Tunes: A Longitudinal Study of Political Drift in LLMs
- ConstitutionMAS-EC: Peer Constitutional Critique for Aligned Emergent Communication in Decentralized Multi-Agent LLMs
- Data Mixing for Group Preference Heterogeneity in Collaborative Filtering
- Deference by Design: Pluralistic Alignment Is an Interface Problem
- Directional Influence and Consensus Formation in Multi-Agent Systems
- Diversifying Multiple Generative Agents by Aligning with Human Populations
- Do LLMs Acknowledge Disputed Facts? A Benchmark for Factual Pluralism in LLMs
- Does AI Assistance Preserve or Collapse Disagreement? A Study of Pre-Annotations in Ambiguous Video Labeling
- Does Privacy Always Harm Fairness? Data-Dependent Trade-offs via Chernoff Information Neural Estimation
- Dual Mechanisms of Value Expression: Intrinsic vs. Prompted Values in Large Language Models
- EGGROLL-IPO: Pluralistic Alignment via Decentralised Post-Training with Population Preferences
- Evaluating Pluralism in LLMs through Latent Perspectives
- Event-Driven Reinforcement Learning for Pluralistic Alignment
- For Questions of Ought, AI Could Use Some SAGE Advice
- FRAGILE: Benchmarking Framing Sensitivity in High-Stakes Decision-Making
- From Rashomon Theory to PRAXIS: Efficient Decision Tree Rashomon Sets
- Geometry of Values: Task Vector Composition for Ethical Preference Alignment in Language Models
- HEARSAYBENCH: Can LLMs Navigate from Abstract Human Rights to Lived Lives?
- Helpful or Safe? UltraFeedback's Binarized Labels Encode a Value Tradeoff
- Inference-Time Alignment Control for Diffusion Models with Reinforcement Learning Guidance
- Innocuous-Seeming Data, Latent Ideology: Ideological Generalisation in Finetuned LLMs
- It’s Up to Interpretation: Aligning to One’s Ever-Shifting Internal State
- Learning to Retrieve User History and Generate User Profiles for Personalized Persuasiveness Prediction
- Learning Unanimously Acceptable Lotteries via Queries
- LLM Human Response Alignment: A Multi-Sample Debiasing Framework
- Majority Vote Silences Minority Values: Annotator Disagreement at the Hate/Offensive Boundary in HateXplain
- Memetic Capture: A Pluralistic Policy Framework for Governing AI-Driven Cultural Disempowerment
- Memetic Drift in Multi-Agent LLMs: Scaling Laws for Consensus Under Pluralistic Uncertainty
- Mission Impossible: Universal Moral Alignment
- Modeling diverse preferences in movie artwork personalization with large language models
- Moral Orientation and Calibration: Coupled in Human Annotators, Separable in Judge LLMs
- Multi-Action-Head On-Policy Self-Distillation for Pluralistic Alignment
- PEBS: Per-rater Empirical-Bayes Shrinkage for RLHF Reward-Model Calibration
- Pedagogical Games: Paths to Generalisation for Agentic Moral Alignment
- Personalization, Personas, and Forecasting in Value Alignment
- PIPE: Personalized Image-generation via Preference Encoding
- Playing Devil’s Advocate: Off-the-Shelf Persona Vectors Rival Targeted Steering for Sycophancy
- Pluralistic AI Alignment Requires Inference-Time Multi-Objective Control
- Pluralistic Preference Alignment via Sortition-Weighted RLHF
- Position: Aggregate Preference Optimization Hides a Posterior Identifiability Failure for Pluralistic Alignment
- Position: Align AI to Our Aspirations, Not Our Flaws
- Position: LLM alignment data should be regulated as mass media
- Position: Why LLMs Should Be Reasonably Morally Inconsistent
- PRISM: When Agents Provably Learn from Pluralistic Human Feedback
- Provably Efficient Regularized Online RLHF with Generalized Bilinear Preferences
- Reasoning Models Generate Societies of Thought
- Reducing Supervision Uncertainty Induces Model Miscalibration
- Response-Aware User Memory Selection for LLM Personalization
- Rethinking AI Alignment: From Static Rewards to Social Reinforcement Learning
- Rethinking Diversity-Preserving RL for Pluralistic Alignment: Empirical Evidence from Rubric-Grounded Moral Reasoning
- Rethinking Scaffolding in LLM Tutors: The Interactional Mismatch Between Benchmarks and Real-World Deployments
- RobotValues: Evaluating Household Robots When Human Values Conflict
- RouteJudge: Preference-Based Evaluation of LLM Routers under Pluralistic User Preferences
- Same Facts, Different Updates: Inference Setup Shapes LLM Behavior in Medical Allocation
- Separating Value Disagreement from Data Uncertainty in Pluralistic Preference Data
- Side Effects of Character Training: Quantifying Cross-Constitution Drift in LLMs
- Social Choice Foundations for Simulation-Augmented Generation
- Socially Grounded Agentic AI: Coordinating Plural Perspectives through Social Theory
- Steerable Cultural Preference Optimization of Reward Models
- The Homogenization Problem in LLMs: Towards Meaningful Diversity in AI Safety
- The Language of Bargaining: Linguistic Effects in LLM Negotiations
- The Persona Fidelity Gap: Behaviorally Grounded LLM Personas Still Compress Real-User Preference Diversity
- The Wedge Questions: Latent Cultural Boundaries in LLMs via Persona Projection Divergence
- ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions
- To Whom Do Language Models Align? Measuring Principal Hierarchies Under High-Stakes Competing Demands
- ToolAlignBench: Investigating Alignment Conflicts in Tool-Calling Enabled LLMs
- Toward Deployable Pluralistic Alignment in Robotics: Learning Similarity-Grouped Rewards from Diverse Human Preferences
- Universal Alignment Fails in Global Classrooms: Cross-Cultural Blind Spots in EdTech AI
- What Aggregate Accuracy Hides: Cultural Affective Inequity in Multilingual LLMs
- What Does the AI Doctor Value? Auditing Pluralism in the Clinical Ethics of Language Models
- When Disagreement Matters: Friction, Pluralistic Alignment, and National-Security AI
- When We Don’t See The Same Picture: Aligning Agents with Divergent Visual Spaces
- Where Models Concentrate and Humans Spread: A Coverage Framework for Distributional Pluralism in Open-Ended Generation
- Whose Alignment? Comparing LLM Process Alignment Across Diverse Organizational Decision Contexts