COLM 2025 · Past · Large language models · Fairness & ethics

Workshop on Socially Responsible Language Modelling Research

COLM 2025 Workshop SoLaR

Submission deadline
Jun 28, 2025, 11:59 UTC
Imported from OpenReview; check the workshop website for extensions.
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-11 — please verify and enrich (topics are keyword-guessed).

Accepted papers (25)

Fetched from OpenReview (v2) on 2026-06-11.

  1. A Generative Approach to LLM Harmfulness Mitigation with Red Flag Tokens

    Sophie Xhonneux, David Dobre, Mehrnaz Mofakhami, Leo Schwinn, Gauthier Gidel · PDF
  2. A Study of Large Language Models for Extraction of Themes from Homeless Shelter Case Notes

    Madhumitha Selvaraj, Teale Masrani, Yani Ioannou, Geoffrey Messier · PDF
  3. Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards

    Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin · PDF
  4. CONECUT: Scalable Removal of Preference Redundancy

    Purbid Bambroo, Daniel S. Brown, Ana Marasovic · PDF
  5. CourtReasoner: Can LLM Agents Reason Like Judges?

    Simeng Han, Yoshiki Takashima, Shannon Zejiang Shen, Chen Liu, Yixin Liu, Roque K. Thuo, Sonia Knowlton, Ruzica Piskac, Scott J Shapiro, Arman Cohan · PDF
  6. Detecting Biased Language in Icelandic: A Named Entity Recognition Approach for Socially Responsible Text Analysis

    Steinunn Rut Friðriksdóttir, Hafsteinn Einarsson · PDF
  7. IMPersona: Evaluating Individual Level LM Impersonation

    Quan Shi, Carlos E Jimenez, Stephen Dong, Brian Seo, Caden Yao, Adam Kelch, Karthik R Narasimhan · PDF
  8. Investigating Model Editing for Unlearning in Large Language Models

    Shariqah Hossain, Lalana Kagal · PDF
  9. Large Language Models in the Task of Automatic Validation of Text Classifier Predictions

    Aleksandr Tsymbalov · PDF
  10. LLMs are Vulnerable to Malicious Prompts Disguised as Scientific Language

    Yubin Ge, Neeraja Kirtane, Hao Peng, Dilek Hakkani-Tür · PDF
  11. LLMs on Trial: Evaluating Judicial Fairness for Large Language Models

    Yiran HU, Zongyue Xue, Haitao Li, Siyuan Zheng, Qingjing Chen, Shaochun Wang, Xihan Zhang, Ning Zheng, Yun Liu, Qingyao Ai, Yiqun LIU, Charles L. A. Clarke, Weixing Shen · PDF
  12. MCP Safety Training: Learning to Refuse Falsely Benign MCP Exploits using Improved Preference Alignment

    John Timothy Halloran · PDF
  13. MedPAIR: Measuring Physicians and AI Relevance Alignment in Medical Question Answering

    Yuexing Hao, Kumail Alhamoud, Hyewon Jeong, Haoran Zhang, Isha Puri, Philip Torr, Mike Schaekermann, Ariel Dora Stern, Marzyeh Ghassemi · PDF
  14. Multi-Turn Jailbreaks Are Simpler Than They Seem

    Xiaoxue Yang, Jaeha Lee, Anna-Katharina Dick, Jasper Timm, Fei Xie, Diogo Cruz · PDF
  15. Neither Valid nor Reliable? Investigating the Use of LLMs as Judges

    Khaoula Chehbouni, Mohammed Haddou, Jackie CK Cheung, Golnoosh Farnadi · PDF
  16. Poor Alignment and Steerability of Large Language Models: Evidence Using 30,000 College Admissions Essays

    Jinsook Lee, AJ Alvero, Thorsten Joachims, Rene F Kizilcec · PDF
  17. Practical Evaluation of Machine Learning Efficiency Requires Model Life Cycle Assessment

    Jared Fernandez, Clara Na, Yonatan Bisk, Constantine Samaras, Emma Strubell · PDF
  18. Privacy-Preserving LLM Interaction with Socratic Chain-of-Thought Reasoning and Homomorphically Encrypted Vector Databases

    Yubeen Bae, Minchan Kim, Jaejin Lee, Sangbum Kim, Jaehyung Kim, Yejin Choi, Niloofar Mireshghallah · PDF
  19. Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods

    Yeonwoo Jang, Shariqah Hossain, Ashwin Sreevatsa, Diogo Cruz · PDF
  20. Red Teaming Vision Language Models Under Change

    Rebecca Tsekanovskiy, James Hendler · PDF
  21. Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques

    Lang Xiong, Raina Gao, Alyssa Jeong, Yicheng Fu, Kevin Zhu, Sean O'Brien, Vasu Sharma · PDF
  22. The Alignment Game: The Inevitable Conflict of Values in Generative Models

    Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson, Lukasz Golab · PDF
  23. Towards Attuned AI: Integrating Care Ethics in Large Language Model Development and Alignment

    Rayane El Masri, Aaron J Snoswell · PDF
  24. TRUTH: Teaching LLMs to Rerank for Truth in Misinformation Detection

    Hao Yu, Shenyang Huang, Zachary Yang, Maximilian Puelma Touzel, Kellin Pelrine, Jean-François Godbout, Reihaneh Rabbany · PDF
  25. When Do Language Models Endorse Limitations on Universal Human Rights Principles?

    Keenan Samway, Rada Mihalcea, Zhijing Jin · PDF