COLM 2025 · Past · Large language models · Fairness & ethics

Workshop on Socially Responsible Language Modelling Research

COLM 2025 Workshop SoLaR


Submission deadline: Jun 28, 2025, 11:59 UTC
Imported from OpenReview; check the website for extensions.
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise.

Accepted papers (25)

Fetched from OpenReview (v2) on 2026-06-11.

A Generative Approach to LLM Harmfulness Mitigation with Red Flag Tokens
Sophie Xhonneux, David Dobre, Mehrnaz Mofakhami, Leo Schwinn, Gauthier Gidel
A Study of Large Language Models for Extraction of Themes from Homeless Shelter Case Notes
Madhumitha Selvaraj, Teale Masrani, Yani Ioannou, Geoffrey Messier
Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin
CONECUT: Scalable Removal of Preference Redundancy
Purbid bambroo, Daniel S. Brown, Ana Marasovic
CourtReasoner: Can LLM Agents Reason Like Judges?
Simeng Han, Yoshiki Takashima, Shannon Zejiang Shen, Chen Liu, Yixin Liu, Roque K. Thuo, Sonia Knowlton, Ruzica Piskac, Scott J Shapiro, Arman Cohan
Detecting Biased Language in Icelandic: A Named Entity Recognition Approach for Socially Responsible Text Analysis
Steinunn Rut Friðriksdóttir, Hafsteinn Einarsson
IMPersona: Evaluating Individual Level LM Impersonation
Quan Shi, Carlos E Jimenez, Stephen Dong, Brian Seo, Caden Yao, Adam Kelch, Karthik R Narasimhan
Investigating Model Editing for Unlearning in Large Language Models
Shariqah Hossain, Lalana Kagal
Large Language Models in the Task of Automatic Validation of Text Classifier Predictions
Aleksandr Tsymbalov
LLMs are Vulnerable to Malicious Prompts Disguised as Scientific Language
Yubin Ge, Neeraja Kirtane, Hao Peng, Dilek Hakkani-Tür
LLMs on Trial: Evaluating Judicial Fairness for Large Language Models
Yiran HU, Zongyue Xue, Haitao Li, Siyuan Zheng, Qingjing Chen, Shaochun Wang, Xihan Zhang, Ning Zheng, Yun Liu, Qingyao Ai, Yiqun LIU, Charles L. A. Clarke, Weixing Shen
MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment
John Timothy Halloran
MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering
Yuexing Hao, Kumail Alhamoud, Hyewon Jeong, Haoran Zhang, Isha Puri, Philip Torr, Mike Schaekermann, Ariel Dora Stern, Marzyeh Ghassemi
Multi-Turn Jailbreaks Are Simpler Than They Seem
Xiaoxue Yang, Jaeha Lee, Anna-Katharina Dick, Jasper Timm, Fei Xie, Diogo Cruz
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Khaoula Chehbouni, Mohammed Haddou, Jackie CK Cheung, Golnoosh Farnadi
Poor Alignment and Steerability of Large Language Models: Evidence Using 30,000 College Admissions Essays
Jinsook Lee, AJ Alvero, Thorsten Joachims, Rene F Kizilcec
Practical Evaluation of Machine Learning Efficiency Requires Model Life Cycle Assessment
Jared Fernandez, Clara Na, Yonatan Bisk, Constantine Samaras, Emma Strubell
Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases
Yubeen Bae, Minchan Kim, Jaejin Lee, Sangbum Kim, Jaehyung Kim, Yejin Choi, Niloofar Mireshghallah
Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods
Yeonwoo Jang, Shariqah Hossain, Ashwin Sreevatsa, Diogo Cruz
Red Teaming Vision Language Models Under Change
Rebecca Tsekanovskiy, James Hendler
Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques
Lang Xiong, Raina Gao, Alyssa Jeong, Yicheng Fu, Kevin Zhu, Sean O'Brien, Vasu Sharma
The Alignment Game: The Inevitable Conflict of Values in Generative Models
Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson, Lukasz Golab
Towards Attuned AI: Integrating Care Ethics in Large Language Model Development and Alignment
Rayane El Masri, Aaron J Snoswell
TRUTH: Teaching LLMs to Rerank for Truth in Misinformation Detection
Hao Yu, Shenyang Huang, Zachary Yang, Maximilian Puelma Touzel, Kellin Pelrine, Jean-François Godbout, Reihaneh Rabbany
When Do Language Models Endorse Limitations on Universal Human Rights Principles?
Keenan Samway, Rada Mihalcea, Zhijing Jin
