NeurIPS 2025 · Past · Large language models

Lock-LLM Workshop: Prevent Unauthorized Knowledge Use from Large Language Models

NeurIPS Lock-LLM Workshop 2025

Submission deadline: Sep 18, 2025, 23:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (57)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Granular Study of Safety Pretraining under Model Abliteration

    Shashank Agnihotri, Jonas Jakubassa, Priyam Dey, Sachin Goyal, Bernt Schiele, Venkatesh Babu Radhakrishnan, Margret Keuper · PDF
  2. AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs

    Madhava Gaikwad · PDF
  3. ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization

    Ahmad Mohammadshirazi, Pinaki Prasad Guha Neogi, Dheeraj Kulshrestha, Rajiv Ramnath · PDF
  4. AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models

    Andrew Zagula, Aashray Reddy, Nicholas Saban · PDF
  5. Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs

    Krishiv Agarwal, Ramneet Kaur, Colin Samplawski, Manoj Acharya, Anirban Roy, Daniel Elenius, Brian Matejek, Adam D. Cobb, Susmit Jha · PDF
  6. Breaking Distortion-free Watermarks in Large Language Models

    Shayleen Reynolds, Hengzhi He, Dung Daniel Ngo, Saheed Obitayo, Niccolo Dalmasso, Guang Cheng, Vamsi K. Potluru, Manuela Veloso · PDF
  7. Can Editing LLMs Inject Harm?

    Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu · PDF
  8. Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning

    Filip Sondej, Yushi Yang · PDF
  9. Compressed but Compromised? A Study of Jailbreaking in Compressed LLMs

    Satya Sai Srinath Namburi GNVV, Alex James Boyd, Andrew Warrington · PDF
  10. Context-Masked Meta-Prompting for Privacy-Preserving LLM Adaptation in Finance

    Sayash Raaj Hiraou · PDF
  11. Cross-Modal Attention Guided Unlearning in Vision-Language Models

    Karuna Bhaila, Aneesh Komanduri, Minh-Hao Van, Xintao Wu · PDF
  12. Cryptographic Fingerprinting for Medical AI: A Proof-of-Concept Approach to Protecting Healthcare ML Models from API Extraction

    Saaketh Bhojanam, Sohum Mehta · PDF
  13. Differentially Private In-Context Learning with Nearest Neighbor Search

    Antti Koskela, Tejas Kulkarni, Laith Yousef Zumot · PDF
  14. DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge

    Asmita Mohanty, Gezheng Kang, Lei Gao, Murali Annavaram · PDF
  15. Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning

    Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar · PDF
  16. Does Machine Unlearning Truly Remove Knowledge?

    Haokun Chen, Yueqi Zhang, Yuan Bi, Yao Zhang, Tong Liu, Jinhe Bi, Jian Lan, Claudia Grosser, Denis Krompaß, Jindong Gu, Nassir Navab, Volker Tresp · PDF
  17. DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation

    Pingzhi Li, Zhen Tan, Yu-Chao Huang, Huaizhi Qu, Huan Liu, Tianlong Chen · PDF
  18. Economic Confidentiality without Secrets: Making Intercepted LLM-Agent Communications Unusable

    Bolaji Makinde · PDF
  19. Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning?

    Zexi Li, Xiangzhu Wang, William F. Shen, Meghdad Kurmanji, Xinchi Qiu, Dongqi Cai, Chao Wu, Nicholas D. Lane · PDF
  20. Evaluating and Mitigating Contextual Vulnerabilities in LLMs: An Architectural Approach to Resisting Multi-Turn Jailbreaks

    Adarsh Kumarappan, Ananya Mujoo · PDF
  21. Evaluating Privacy Leakage From In-Context Learning

    Hongyi Li, James Flemings, YoungJune, Murali Annavaram · PDF
  22. Exploiting the Experts: Unauthorized Compression in MoE-LLMs

    Pinaki Prasad Guha Neogi, Ahmad Mohammadshirazi, Dheeraj Kulshrestha, Rajiv Ramnath · PDF
  23. How to Make LLMs Safer? Detecting and Editing Key Heads in LLMs

    Kuan-Lin Chu, Chung-En Sun, Tsui-Wei Weng · PDF
  24. Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?

    Rishika Bhagwatkar, Kevin Kasa, Abhay Puri, Gabriel Huang, Irina Rish, Graham W. Taylor, Krishnamurthy Dj Dvijotham, Alexandre Lacoste · PDF
  25. Jailbreak Distillation: Renewable Safety Benchmarking

    Jingyu Zhang, Ahmed Elgohary, Xiawei Wang, A S M Iftekhar, Ahmed Magooda, Benjamin Van Durme, Daniel Khashabi, Kyle Jackson · PDF
  26. Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models

    Muhammad Haris Khan · PDF
  27. LLMs can hide text in other text of the same length

    Antonio Norelli, Michael M. Bronstein · PDF
  28. LSMAS (LLM Security Modeling via Activation Steering)

    Anthony Kuang, Ahmed Ismail, Ayo Akinkugbe, Kevin Zhu, Sean O'Brien · PDF
  29. MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models

    Hyunjun Kim, Sejong Kim · PDF
  30. MarkTune: Advancing the Quality-Detectability Pareto Frontier of Open-Weight LM Watermarking

    Yizhou Zhao, Steven Wu, Adam Block · PDF
  31. MGA-VQA: Secure and Interpretable Graph-Augmented Visual Question Answering with Memory-Guided Protection Against Unauthorized Knowledge Use

    Ahmad Mohammadshirazi, Pinaki Prasad Guha Neogi, Dheeraj Kulshrestha, Rajiv Ramnath · PDF
  32. Model Immunization by Trapping Harmful Finetuning

    Najibul Haque Sarker, Zaber Ibn Abdul Hakim, Alvi Md Ishmam, Chia-Wei Tang, Chris Thomas · PDF
  33. No Question, No Passage, No Problem: Investigating Artifact Exploitation and Reasoning in Multiple-Choice Reading Comprehension

    Anthony Cui, Rohan Raj Butani, Theodore Oltean · PDF
  34. OML: A Primitive for Reconciling Open Access with Owner Control in AI Model Distribution

    Zerui Cheng, Edoardo Contente, Benjamin Tsengel Finch, Oleg Aleksandrovich Golev, Jonathan Hayase, Andrew Miller, Niusha Moshrefi, Anshul Nasery, Sewoong Oh, Himanshu Tyagi, Pramod Viswanath · PDF
  35. On the Relationship Between Neural Tangent Kernel Frobenius Distance and Distillation Sample Complexity

    Arnav Sharma, Ahmed Wez, Karthik Srikumar · PDF
  36. PASTRAL: Privacy-aware AST and TRansformer-based Anomalous command-Line detection

    Xiayan Ji, Ecenaz Erdemir, Kyuhong Park, Bhavna Soman, Yi Fan · PDF
  37. Permissioned LLMs: Enforcing Access Control in Large Language Models

    Bargav Jayaraman, Virendra Marathe, Hamid Mozaffari, William F. Shen, Krishnaram Kenthapadi · PDF
  38. Probe-Rewrite-Evaluate: A Workflow for Reliable Benchmarks and Quantifying Evaluation Awareness

    Lang Xiong, Nishant Bhargava, Jeremy Chang, Jianhang Hong, Haihao Liu, Vasu Sharma, Kevin Zhu · PDF
  39. Reasoning Models Can be Easily Hacked by Fake Reasoning Bias

    Qian Wang, Zhenheng Tang, Nuo Chen, Wenxuan Wang, Bingsheng He · PDF
  40. SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning

    Kaiwen Zhou, Xuandong Zhao, Gaowen Liu, Jayanth Srinivasa, Aosong Feng, Dawn Song, Xin Eric Wang · PDF
  41. Safety Subspaces are Not Distinct: A Fine-Tuning Case Study

    Kaustubh Ponkshe, Shaan Shah, Raghav Singhal, Praneeth Vepakomma · PDF
  42. Scalable Fingerprinting of Large Language Models

    Anshul Nasery, Jonathan Hayase, Creston Brooks, Peiyao Sheng, Himanshu Tyagi, Pramod Viswanath, Sewoong Oh · PDF
  43. SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From

    Yao Tong, Haonan Wang, Siquan Li, Kenji Kawaguchi, Tianyang Hu · PDF
  44. Sell Data to AI Algorithms Without Revealing It: Secure Data Valuation and Sharing via Homomorphic Encryption

    Michael Yang, Ruijiang Gao, Zhiqiang Zheng · PDF
  45. Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security

    Ali Naseh, Anshuman Suri, Yuefeng Peng, Harsh Chaudhari, Alina Oprea, Amir Houmansadr · PDF
  46. The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models

    Ann-Kathrin Dombrowski, Dillon Bowen, Adam Gleave, Chris Cundy · PDF
  47. Towards Controlled LLM Unlearning

    William F. Shen, Xinchi Qiu, Meghdad Kurmanji, Alex Iacob, Lorenzo Sani, Yihong Chen, Nicola Cancedda, Nicholas D. Lane · PDF
  48. Towards Quantization-Adversarial Reparameterizations

    Raine Ma · PDF
  49. Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM

    Adarsh Kumarappan, Ayushi Mehrotra · PDF
  50. Un-Distillable LLMs via Entropy-Perturbed Logits

    Mithil Shah, Andrew Bae, Laksh Patel · PDF
  51. Undistillable Open Language Models with Teacher Scrambling

    Sebastian Dionicio, Aniq Elahi, Domenic Rosati, Hassan Sajjad · PDF
  52. Unlearners Can Lie: Evaluating “Honesty” in LLM Unlearning

    Renjie Gu, Jiazhen Du, Yihua Zhang, Sijia Liu · PDF
  53. User Confidence-Fueled Stereotypes: Investigating Sycophantic Amplification of Implicit Bias in Language Models

    Hannah You, Daniel Wang, Victor Chan, Mirabel Wang, Aslihan Akalin, Kevin Zhu · PDF
  54. Who’s Your Judge? On the Detectability of LLM-Generated Judgments

    Dawei Li, Zhen Tan, Chengshuai Zhao, Bohan Jiang, Baixiang Huang, Pingchuan Ma, Abdullah Alnaibari, Kai Shu, Huan Liu · PDF
  55. Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning

    Wassim Bouaziz, Mathurin Videau, Nicolas Usunier, El-Mahdi El-Mhamdi · PDF
  56. X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates

    Hyunjun Kim, Junwoo Ha, Haon Park, Sangyoon Yu · PDF
  57. Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs

    Arjun Damerla, Anirudh Sekar, Rachel Sharma, Mrinal Agarwal, Jasmine Zhang, Akitsugu Tanaka · PDF