NeurIPS 2024, Past, Large language models, Fairness & ethics
Workshop on Socially Responsible Language Modelling Research
SoLaR
- Submission deadline: Sep 15, 2024, 15:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (71)
Fetched from OpenReview (v2) on 2026-06-10.
- A Cautionary Tale on the Evaluation of Differentially Private In-Context Learning
- AI Sandbagging: Language Models can Selectively Underperform on Evaluations
- An Adversarial Perspective on Machine Unlearning for AI Safety
- Analyzing Probabilistic Methods for Evaluating Agent Capabilities
- Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
- Balancing Act: Prioritization Strategies for LLM-Designed Restless Bandit Rewards
- Beyond the Binary: Capturing Diverse Preferences With Reward Regularization
- CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
- Century: A Dataset of Sensitive Historical Images
- CoS: Enhancing Personalization with Context Steering
- Detection of Partially-Synthesized LLM Text
- Developing an occupational prestige scale using Large Language Models
- Developing Story: Case Studies of Generative AI’s Use in Journalism
- Different Bias Under Different Criteria: Assessing Bias in LLMs with a Fact-Based Approach
- Differentially Private Learning Needs Better Model Initialization and Self-Distillation
- Emergence of Steganography Between Large Language Models
- Enhancing Language Model Calibration to Human Responses in Ethical Ambiguity via Fine-Tuning
- Fact or Fiction? Can LLMs be Reliable Annotators for Political Truths?
- Failures to Find Transferable Image Jailbreaks Between Vision-Language Models
- Gender Bias in LLM-generated Interview Responses
- GPAI Evaluations Standards Taskforce: towards effective AI governance
- HEARTS: A Holistic Framework for Explainable, Sustainable and Robust Text Stereotype Detection
- Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
- How Does LLM Compression Affect Weight Exfiltration Attacks?
- I Think, Therefore I am: Benchmarking Awareness of Large Language Models Using AwareBench
- Investigating Goal-Aligned and Empathetic Social Reasoning Strategies for Human-Like Social Intelligence in LLMs
- Jailbreak Defense in a Narrow Domain: Failures of Existing Methods and Improving Transcript-Based Classifiers
- Jailbreaking Large Language Models with Symbolic Mathematics
- Just rephrase it! Uncertainty estimation in closed-source language models via multiple rephrased queries
- Language Models Resist Alignment
- Large Language Models Still Exhibit Bias in Long Text
- Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
- Levels of Autonomy: Liability in the age of AI Agents
- Linear Probe Penalties Reduce LLM Sycophancy
- LLM Alignment Using Soft Prompt Tuning: The Case of Cultural Alignment
- LLM Hallucination Reasoning with Zero-shot Knowledge Test
- Measuring AI Agent Autonomy: Towards a Scalable Approach With Code Inspection
- Measuring Free-Form Decision-Making Inconsistency of Language Models in Military Crisis Simulations
- MISR: Measuring Instrumental Self-Reasoning in Frontier Models
- Mitigating Downstream Model Risks via Model Provenance
- Monitoring Human Dependence On AI Systems With Reliance Drills
- NusaMT-7B: Machine Translation for Low-Resource Indonesian Languages with Large Language Models
- On Adversarial Robustness of Language Models in Transfer Learning
- On Demonstration Selection for Improving Fairness in Language Models
- On the Ethical Considerations of Generative Agents
- PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
- Plentiful Jailbreaks with String Compositions
- Policy Dreamer: Diverse Public Policy Generation Via Elicitation and Simulation of Human Preferences
- Position Paper: Model Access should be a Key Concern in AI Governance
- Position: AI Agents & Liability – Mapping Insights from ML and HCI Research to Policy
- Position: Governments Need to Increase and Interconnect Post-Deployment Monitoring of AI
- Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
- ReFeR: A Hierarchical Framework of Models as Evaluative and Reasoning Agents
- Report Cards: Qualitative Evaluation of LLMs Using Natural Language Summaries
- SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation
- Salad-Bowl-LLM: Multi-Culture LLMs by In-Context Demonstrations from Diverse Cultures
- Sandbag Detection through Model Impairment
- SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs
- Shh, don't say that! Domain Certification in LLMs
- Simulation System Towards Solving Societal-Scale Manipulation
- SocialStigmaQA Spanish and Japanese - Towards Multicultural Adaptation of Social Bias Benchmarks
- Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback
- THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models
- The Elicitation Game: Stress-Testing Capability Elicitation Techniques
- The Impact of Large Language Models in Academia: from Writing to Speaking
- The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
- Towards a Theory of AI Personhood
- Towards Safe Multilingual Frontier AI
- Toxic Neurons Aren’t Enough to Explain DPO: A Mechanistic Analysis for Toxicity Reduction
- Understanding Model Bias Requires Systematic Probing Across Tasks
- Ways Forward for Global AI Benefit Sharing