COLM 2025 · Past · Other

The First Workshop on the Interplay of Model Behavior and Model Internals

INTERPLAY

Submission deadline: Jul 11, 2025, 07:55 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-11; please verify and enrich (topics are keyword-guessed).

Accepted papers (22)

Fetched from OpenReview (v2) on 2026-06-11.

  1. Analyzing Representational Shifts in Multimodal Models: A Study of Feature Dynamics in Gemma and PaliGemma

    Aaron C Friedman, Trinabh Gupta, Raine Ma, Sean O'Brien, Kevin Zhu, Cole Blondin
  2. Angular Steering: Behavior Control via Rotation in Activation Space

    Hieu M. Vu, Tan Minh Nguyen
  3. Attributing Response to Context: A Jensen–Shannon Divergence Driven Mechanistic Study of Context Attribution in Retrieval-Augmented Generation

    Ruizhe Li, Chen Chen, Yuchen Hu, Yanjun Gao, Xi Wang, Emine Yilmaz
  4. BERTology in the Modern World

    Michael Li, Nishant Subramani
  5. Causal Interventions Reveal Shared Structure Across English Filler–Gap Constructions

    Sasha Boguraev, Christopher Potts, Kyle Mahowald
  6. Comparing Prompt and Representation Engineering for Personality Control in Language Models: A Case Study

    Pengrui Han
  7. Death by a Thousand Directions: Exploring the Geometry of Harmfulness in LLMs through Subconcept Probing

    McNair Shah, Saleena Angeline Sartawita, Adhitya Rajendra Kumar, Naitik Chheda, Kevin Zhu, Vasu Sharma, Sean O'Brien, Will Cai
  8. Emotions Where Art Thou: Understanding and Characterizing the Emotional Latent Space of Large Language Models

    Benjamin Reichman, Adar Avsian, Larry Heck
  9. Evaluating Contrast Localizer for Identifying Causal Units in Social & Mathematical Tasks in Language Models

    Yassine Jamaa, Badr AlKhamissi, Satrajit S Ghosh, Martin Schrimpf
  10. From Indirect Object Identification to Syllogisms: Exploring Binary Mechanisms in Transformer Circuits

    Karim Saraipour, Shichang Zhang
  11. How Post-Training Reshapes LLMs: A Mechanistic View on Knowledge, Truthfulness, Refusal, and Confidence

    Hongzhe Du, Weikai Li, Min Cai, Karim Saraipour, Zimin Zhang, Yizhou Sun, Himabindu Lakkaraju, Shichang Zhang
  12. Interpreting the Latent Structure of Operator Precedence in Language Models

    Dharunish Yugeswardeenoo, Harshil Nukala, Cole Blondin, Sean O'Brien, Vasu Sharma, Kevin Zhu
  13. LLM Microscope: What Model Internals Reveal About Answer Correctness and Context Utilization

    Jiarui Liu, Jivitesh Jain, Mona T. Diab, Nishant Subramani
  14. Localizing Persona Representations in LLMs

    Celia Cintas, Miriam Rateike, Erik Miehling, Elizabeth M. Daly, Skyler Speakman
  15. On the Geometry of Semantics in Next-token Prediction

    Yize Zhao, Christos Thrampoulidis
  16. One-shot Optimized Steering Vectors Mediate Safety-relevant Behaviors in LLMs

    Jacob Dunefsky, Arman Cohan
  17. Predicting Success of Model Editing via Intrinsic Features

    Yanay Soker, Martin Tutek, Yonatan Belinkov
  18. Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking

    Wuwei Zhang, Fangcong Yin, Howard Yen, Danqi Chen, Xi Ye
  19. Safety Subspaces are Not Distinct: A Fine-Tuning Case Study

    Shaan Shah, Kaustubh Ponkshe, Raghav Singhal, Praneeth Vepakomma
  20. Stochastic Chameleons: Irrelevant Context Hallucinations Reveal Class-Based (Mis)Generalization in LLMs

    Ziling Cheng, Meng Cao, Marc-Antoine Rondeau, Jackie CK Cheung
  21. Understanding In-context Learning of Addition via Activation Subspaces

    Xinyan Hu, Kayo Yin, Michael I. Jordan, Jacob Steinhardt, Lijie Chen
  22. Universal Neurons in GPT-2: Emergence, Persistence, and Functional Impact

    Advey Nandan, Cheng-Ting Chou, Amrit Kurakula, Cole Blondin, Kevin Zhu, Vasu Sharma, Sean O'Brien