Lock-LLM Workshop: Prevent Unauthorized Knowledge Use from Large Language Models
NeurIPS Lock-LLM Workshop 2025
- Submission deadline
- Sep 18, 2025, 23:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal
- OpenReview
- Notes
- Auto-imported from the OpenReview venue record on 2026-06-10 — please verify and enrich (topics are keyword-guessed).
Accepted papers (57)
Fetched from OpenReview (v2) on 2026-06-10.
- A Granular Study of Safety Pretraining under Model Abliteration
- AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs
- ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization
- AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
- Breaking Bad: Interpretability-Based Safety Audits of State-of-the-Art LLMs
- Breaking Distortion-free Watermarks in Large Language Models
- Can Editing LLMs Inject Harm?
- Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning
- Compressed but Compromised? A Study of Jailbreaking in Compressed LLMs
- Context-Masked Meta-Prompting for Privacy-Preserving LLM Adaptation in Finance
- Cross-Modal Attention Guided Unlearning in Vision-Language Models
- Cryptographic Fingerprinting for Medical AI: A Proof-of-Concept Approach to Protecting Healthcare ML Models from API Extraction
- Differentially Private In-Context Learning with Nearest Neighbor Search
- DistilLock: Safeguarding LLMs from Unauthorized Knowledge Distillation on the Edge
- Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
- Does Machine Unlearning Truly Remove Knowledge?
- DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation
- Economic Confidentiality without Secrets: Making Intercepted LLM-Agent Communications Unusable
- Editing as Unlearning: Are Knowledge Editing Methods Strong Baselines for Large Language Model Unlearning?
- Evaluating and Mitigating Contextual Vulnerabilities in LLMs: An Architectural Approach to Resisting Multi-Turn Jailbreaks
- Evaluating Privacy Leakage From In-Context Learning
- Exploiting the Experts: Unauthorized Compression in MoE-LLMs
- How to Make LLMs Safer? Detecting and Editing Key Heads in LLMs
- Indirect Prompt Injections: Are Firewalls All You Need, or Stronger Benchmarks?
- Jailbreak Distillation: Renewable Safety Benchmarking
- Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models
- LLMs can hide text in other text of the same length
- LSMAS (LLM Security Modeling via Activation Steering)
- MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models
- MarkTune: Advancing the Quality-Detectability Pareto Frontier of Open-Weight LM Watermarking
- MGA-VQA: Secure and Interpretable Graph-Augmented Visual Question Answering with Memory-Guided Protection Against Unauthorized Knowledge Use
- Model Immunization by Trapping Harmful Finetuning
- No Question, No Passage, No Problem: Investigating Artifact Exploitation and Reasoning in Multiple-Choice Reading Comprehension
- OML: A Primitive for Reconciling Open Access with Owner Control in AI Model Distribution
- On the Relationship Between Neural Tangent Kernel Frobenius Distance and Distillation Sample Complexity
- PASTRAL: Privacy-aware AST and TRansformer-based Anomalous command-Line detection
- Permissioned LLMs: Enforcing Access Control in Large Language Models
- Probe-Rewrite-Evaluate: A Workflow for Reliable Benchmarks and Quantifying Evaluation Awareness
- Reasoning Models Can be Easily Hacked by Fake Reasoning Bias
- SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
- Safety Subspaces are Not Distinct: A Fine-Tuning Case Study
- Scalable Fingerprinting of Large Language Models
- SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
- Sell Data to AI Algorithms Without Revealing It: Secure Data Valuation and Sharing via Homomorphic Encryption
- Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security
- The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models
- Towards Controlled LLM Unlearning
- Towards Quantization-Adversarial Reparameterizations
- Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM
- Un-Distillable LLMs via Entropy-Perturbed Logits
- Undistillable Open Language Models with Teacher Scrambling
- Unlearners Can Lie: Evaluating “Honesty” in LLM Unlearning
- User Confidence-Fueled Stereotypes: Investigating Sycophantic Amplification of Implicit Bias in Language Models
- Who’s Your Judge? On the Detectability of LLM-Generated Judgments
- Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning
- X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates
- Zero-Shot Embedding Drift Detection: A Lightweight Defense Against Prompt Injections in LLMs