NeurIPS 2025 Workshop: Reliable ML from Unreliable Data
- Submission deadline: Aug 30, 2025, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (149)
Fetched from OpenReview (v2) on 2026-06-10.
- $\texttt{strategic-fl-sim}$: An Extensible Package for Simulating Strategic Behavior in Federated Learning
- A Few Bad Neurons: Isolating and Surgically Correcting Sycophancy
- A Guide to Robust Generalization: The Impact of Architecture, Pre-training, and Optimization Strategy
- A Multi-Method Interpretability Framework for Probing Cognitive Processing in Deep Neural Networks across Vision and Biomedical Domains
- Active Slice Discovery in Large Language Models
- Adaptive Norm Selection Prevents Catastrophic Overfitting in Fast Adversarial Training
- Adversarial Attacks against Context-dependent Visual Association in Referring Multi-Object Tracking Systems
- Adversarially-robust probes for Deep Networks
- Aggregated Individual Reporting for Post-Deployment Evaluation: Mechanism Design & Modeling Considerations
- Ambient Diffusion Omni
- Ambient Proteins: Training Diffusion Models on Low Quality Structures
- An Analysis of Causal Effect Estimation using Outcome Invariant Data Augmentation
- Approximate Leave-One-Out Cross Validation for Robust Scatter Matrix Estimation
- Approximating Human Preferences Using a Multi-Judge Learned System
- AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin
- Automated Generation of Multilingual Jailbreak Prompts
- Batch-Adaptive Annotations for Causal Inference with Complex-Embedded Outcomes
- Bayesian Decision Making around Experts
- Better Data for Satellite Super Resolution
- Beyond Per-Question Privacy: Multi-Query Differential Privacy for RAG Systems
- Beyond Static Bias: Quantifying Fairness Variability in CheXpert
- Beyond Text: Multimodal Jailbreaking of Vision-Language and Audio Models through Perceptually-Aware Transformations
- Breaking Bad: Exploring the Dangers of LLM-generated Misinformation from Fringe Social Media
- Breaking the Mirror: Activation-Based Mitigation of Self-Preference in LLM Evaluators
- BridgePure: Limited Protection Leakage Can Break Black-Box Data Protection
- Certified Adversarial Robustness via Mixture-of-Gaussians Randomized Smoothing
- Check Yourself Before You Wreck Yourself: Selectively Quitting Improves LLM Agent Safety
- Clean-Label Physical Backdoor Attacks with Data Distillation
- COIR: Chain-of-Intention Reasoning Elicits Defense in Multimodal Large Language Models
- Complementing Self-Consistency with Cross-Model Disagreement for Uncertainty Quantification
- Concept-Based Masking: A Patch-Agnostic Defense Against Adversarial Patch Attacks
- Conformal Prediction for Molecular Properties under Label Shift
- Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates
- Cost Efficient Fairness Audit Under Partial Feedback
- CroPA++: Exposing Vulnerabilities in Vision Language Models and Enhancing Adversarial Transferability of Cross-Prompt Attacks
- Cross-Lingual Multimodal Retrieval-Augmented Generation for Open Question Answering in Tamil and Yoruba
- Curvature Tuning: Provable Training-free Model Steering From a Single Parameter
- Data Decomposition beyond Splitting for Causal Estimation
- Data-Efficient and Robust Coreset Selection via Sparse Adversarial Perturbations
- Deep Research Brings Deeper Harm
- Diffusion-supplemented Implicit Layers: Operator Smoothing for better Implicit Solvers
- Disarming Strategic Text: Span-Aware Counterfactuals for Robust Content Moderation
- Ditch the Denoiser: Emergence of Noise Robustness in Self-Supervised Learning from Data Curriculum
- Do Internal Layers of LLMs Reveal Patterns for Jailbreak Detection?
- Domain Generalization: A Tale of Two ERMs
- Don’t Make It Up: Preserving Ignorance Awareness in LLM Fine-Tuning
- Double Machine Learning Evaluation Under Distribution Shift and Selection Bias
- Drawing Reliable Conclusions with Imperfect Synthetic Data
- DynamiX: Dynamic Resource eXploration for Personalized Ad-Recommendations
- Efficiently Robust In-Context Reinforcement Learning with Adversarial Generalization and Adaptation
- Energy-Shaped Manifold Projections Enable Adversarial Detection
- ERGO: Entropy-guided Resetting for Generation Optimization in Multi-turn Language Models
- Evaluating robustness of tabular models under meta-features based shifts
- Evaluating the Quality of AI-Generated Resolutions from Conversational vs Structured Sources: Implications for Enterprise Knowledge Automation
- Extracting Latent Generalization from Models Trained with Noisy Labels
- Failure Prediction Is a Better Performance Proxy for Early-Exit Networks Than Calibration
- FairContrast: Enhancing Fairness through Contrastive learning and Customized Augmenting Methods on Tabular Data
- Fairness Implications of GNN-to-MLP Knowledge Distillation
- Fairness Through Independence via Cramér-von Mises Regularization
- False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
- FAVAE-Effective Frequency Aware Latent Tokenizer
- Few-Shot Knowledge Distillation for Language Models via Counterfactual Explanations
- Fine-Grained Uncertainty Decomposition in Large Language Models: A Spectral Approach
- Forget Less, Retain More: A Lightweight Regularizer for Rehearsal-Based Continual Learning
- From Clutter to Clarity: Visual Recognition through Foveated Object-Centric Learning (FocL)
- From Evidence to Knowledge: A Hierarchical Probabilistic Model of the Scientific Knowledge Landscape at Web Scale
- From Many Voices to One: A Statistically Principled Aggregation of LLM Judges
- From Search to Decision: A Framework for Adversarially Robust Approximate Nearest Neighbor Search
- From Semantics to Symbols: A Two-Stage Framework for Deconstructing LLM Reasoning into Concepts and Rules
- Generalizing Robustness from $\ell_p$ to Unforeseen Attack via Calibrated Adversarial Sampling
- GUARD: Guiding Unbiased Alignment through Reward Debiasing
- Human Uncertainty-Aware Reliable Data Selection and Efficient Annotation for Visual Question Answering
- Improving Consistency in Retrieval-Augmented Systems with Group Similarity Rewards
- Inducing Uncertainty on Open-Weight Models for Test-Time Privacy in Image Recognition
- Influence Functions for Preference Dataset Pruning
- Information-Theoretic Conditions for Chain-of-Thought Monitorability and Methods for Improving It
- Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models
- It is Hard to Unlearn Dogged Backdoor Samples in Diffusion Models
- KAIROS: Scalable Model-Agnostic Data Valuation
- Keep It Real: Challenges in Attacking Compression-Based Adversarial Purification
- Learning reliably under adversarial attacks, distribution shifts and strategic agents
- Lightweight Robust Direct Preference Optimization
- LoCaTE: A Local and Training Dynamics Perspective at Detecting Label Noise in Deep Classification
- Locks Tested Without Burglars: Using Coding Assistants to Break Prompt Injection Defenses
- Minimal Repairs for Learning Over Incomplete Data
- MPSelectTune: Prompt-type Selection for Fine-tuning improves Concept Unlearning in LLMs
- Near-Optimal Reinforcement Learning for Linear Distributionally Robust Markov Decision Processes
- Not All Samples Are Equal: Quantifying Instance-level Difficulty in Targeted Data Poisoning
- Not All Splits Are Equal: Rethinking Attribute Generalization Across Unrelated Categories
- Obscurable Fishermen
- On Fairness of Task Arithmetic: The Role of Task Vectors
- On the Interaction of Compressibility and Adversarial Robustness
- Optimal Fair Learning Robust to Adversarial Distribution Shift
- Optimal Lower Bounds and New Upper Bounds for Sequential Prediction with Abstention
- Persistent and Stealthy Backdoor Attacks in Federated Learning via Layerwise Model Poisoning
- Positive-Unlabeled Learning for Control Group Construction in Observational Causal Inference
- Probabilistic Framework for Robustness of Counterfactual Explanations Under Data Shifts
- Quantifying CBRN Risk in Frontier Models
- Reasoning as an Adaptive Defense for Safety
- Regression-Based Estimation of Causal Effects in the Presence of Selection Bias and Confounding
- Regularized Robustly Reliable Learners and Instance Targeted Attacks
- Reliable Active Learning from Unreliable Labels via Neural Collapse Geometry
- Reliable Compositional Editing with Overlap-Aware Attention in Diffusion Models
- Reliable Models via Responsiveness Verification
- Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection
- Responsible Imputation of User Behavior Surveys via Mask-Aware Transformers
- Rethinking Sparse Autoencoders: Select-and-Project for Fairness and Control from Encoder Features Alone
- Return Augmented Decision Transformer for Off-Dynamics Reinforcement Learning
- Reweighted Flow Matching via Unbalanced Optimal Transport for Long-tailed Generation
- RL-Guided Data Selection for Language Model Finetuning
- Robust Adversarial Reinforcement Learning in Stochastic Games via Sequence Modeling
- Robust Federated Learning under Heterogeneous Data with Generalized Heavy-Ball Momentum
- Robust Fine-Tuning from Non-Robust Pretrained Models: Mitigating Suboptimal Transfer With Epsilon-Scheduling
- Robust Multi-task Modeling for Bayesian Optimization via In-Context Learning
- Safety by Design: High-Probability Constrained Contextual Bandits
- SAGE: Streaming, Agreement-driven Gradient Sketches for Representative Subset Selection
- Sandbagging in a Simple Survival Bandit Problem
- Selective Cost-Aware Random Forests for Unreliable Data
- Selective Preference Aggregation
- SIVA: Self-Improving Vulnerability Agent
- Sparse Parameter Adaptation for Fair Model Transfer Across Domains
- Spectral Regularization as a Safety-Critical Inductive Bias
- StealthEval: A Probe-Rewrite-Evaluate Workflow for Reliable Benchmarks
- Strategic Feature Selection
- Stress-Testing Byzantine Defenses under Data Heterogeneity
- Stylistic Shifts in Human–LLM Conversations: Challenges and Adaptation
- Tackling the Noisy Elephant in the Room: Label Noise-robust Out-of-Distribution Detection via Loss Correction and Low-rank Decomposition
- Taming the Noisy Oracle: Robust Entity-Centric Question Answering via Learning from Imperfect Feedback
- Task Priors: Enhancing Model Evaluation by Considering the Entire Space of Downstream Tasks
- Teaming LLMs to Detect and Mitigate Hallucinations
- Temp-SCONE: A Novel Out-of-Distribution Detection and Domain Generalization Framework for Wild Data with Temporal Shift
- Testing Noise Assumptions of Learning Algorithms
- Text-Guided Data Attribution: Attributing the Influence of Simplicity Bias to Dataset
- The Impact of Training Data on Adversarial Robustness
- The Silent Judge: Unacknowledged Shortcut Bias in LLM-as-a-Judge
- The Statistical Fairness-Accuracy Frontier
- Towards Context-Aware Domain Generalization: Understanding the Benefits and Limits of Marginal Transfer Learning
- Towards Trustworthy Amortized Bayesian Model Comparison
- Trust, But Attribute: Tracing Impact of Data on Trustworthiness in Supervised LLM Fine-Tuning
- Uncertainty as Feature Gaps: Epistemic Uncertainty Quantification of LLMs in Contextual Question-Answering
- Uncertainty-Aware LLMs Fail to Flag Misleading Contexts
- Unlocking Transfer Learning for Open-World Few-Shot Recognition
- Unspoken Hints: Accuracy Without Acknowledgement in LLM Reasoning
- WASP: A Weight-Space Approach to Detecting Learned Spuriousness
- Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
- When "Competency" in Reasoning Opens the Door to Vulnerability: Jailbreaking LLMs via Novel Ciphers
- Why is Your Language Model a Poor Implicit Reward Model?
- Wrong Model, Right Uncertainty: Spatial Associations for Discrete Data with Misspecification
- Zero-Shot Robustness of Vision Language Models Via Confidence-Aware Weighting