NeurIPS 2025 · Past · ML systems

Machine Learning for Systems 2025

MLForSys2025

Submission deadline: Aug 30, 2025, 11:59 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview

Accepted papers (41)

Fetched from OpenReview (v2) on 2026-06-10.

A Data-driven ML Approach for Maximizing Performance in LLM-Adapter Serving
Ferran Agullo, Joan Oliveras Torra, Chen Wang, Alberto Gutierrez-Torre, Olivier Tardieu, Alaa Youssef, Jordi Torres, Josep Lluis Berral · PDF
A Joint Learning Approach to Hardware Caching and Prefetching
· PDF
Advancing Routing-Awareness in Analog ICs Floorplanning
· PDF
Adversarial Query Synthesis via Bayesian Optimization
· PDF
Agentic Bridge Framework: Closing the Gap Between Agentic Capability and Performance Benchmarks
· PDF
An Early Exploration of Deep-Learning-Driven Prefetching for Far Memory
· PDF
An Expert in Residence: LLM Agents for Always-On Operating System Tuning
· PDF
APCE: Adaptive Progressive Context Expansion for Long Context Processing
· PDF
ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training
· PDF
Attention-Informed Surrogates for Navigating Power-Performance Trade-offs in HPC
· PDF
Automated Multi-Agent Workflows for RTL Design
· PDF
Carbon-Aware RL-LLM Control for Energy-Efficient Liquid-Cooled HPC Data Centers
· PDF
DataSwift: Smart Choices for Safe Query Optimization
· PDF
Forecasting machine degradation of GPU Clusters
Shengnan Cai, Shuxin Nie, Zhehui Chen, Nupur Gulalkari, George Vanica, Chetna Jain, Sethuraman Sankaran · PDF
GraphFaaS: Serverless GNN Inference for Burst-Resilient, Real-Time Intrusion Detection
· PDF
How Should We Evaluate Data Deletion in Graph-Based ANN Indexes?
· PDF
InfraGym: Empowering LLM Agents for Real-World Computer System Optimization
· PDF
Learning to Shard: RL for Co-optimizing the Parallelism Degrees and Per-operator Sharding Dimensions in Distributed LLM Inference
· PDF
Leveraging Large Language Models to Enhance Machine-Learning-Driven HPC Job Scheduling
· PDF
LLM-Box: An Agentic Framework for Guided Black-Box Optimization in Mapping LLMs onto Specialized Hardware Accelerators
· PDF
LLM-Guided Autoscheduling for Large-Scale Sparse Machine Learning
· PDF
LLMVisor: A Real-Time Latency Attribution Model for Multi-Tenant LLM Serving
· PDF
Mind the Gap: Time-of-Check to Time-of-Use Vulnerabilities in LLM-Enabled Agents
Derek Lilienthal, Sanghyun Hong · PDF
ML-Guided Cold Plate Design and Thermal Analysis for Liquid-Cooled HPC Servers
· PDF
MoE-GPS: Guidelines for Prediction Strategy with Expert Duplication in MoE Load Balancing
· PDF
MXNorm: Reusing block scales for efficient tensor normalisation
· PDF
NetGent: Agent-Based Automation of Network Application Workflows
· PDF
NeuSym-HLS: Learning-Driven Symbolic Distillation in High-Level Synthesis of Hardware Accelerators
Chung-Mou Pan, Salma Elmalaki, Yasser Shoukry, Sitao Huang · PDF
Optimized Learned Count-Min Sketch
· PDF
OptRot: Mitigating Weight Outliers via Data-Free Rotations for Post-Training Quantization
Advait Gadhikar, Riccardo Grazzi, James Hensman · PDF
PORT: Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving
Fangzhou Wu, Sandeep Silwal · PDF
QAQ: Query-adaptive Mixed-precision Quantization for Large Language Models
· PDF
Retrieval on Verilog Repositories: A Knowledge-Graph Based Solution
· PDF
Small Language Models as Compiler Experts: Auto-Parallelization for Heterogeneous Systems
Prathamesh Devadiga · PDF
Small, Fast, and Certain: Developing a Specialized Verilog Code Completion Solution for the Enterprise
· PDF
Sustainable Control of Geo-Distributed Datacenters by Distilling Numerical Experts into Adaptive LLM Agents
· PDF
SwizzlePerf: Hardware-Aware LLMs for GPU Kernel Performance Optimization
· PDF
Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
Yusheng Zheng, YanPeng Hu, Wei Zhang, Andi Quinn · PDF
Towards Automatically Optimizing Retrieval Augmented AI Systems
· PDF
Ultra-Efficient Decoding for End-to-End Neural Compression and Reconstruction
· PDF
When to Reason: Semantic Router for vLLM
Chen Wang, Xunzhuo Liu, Yuhan Liu, Yue Zhu, Xiangxi Mo, Junchen Jiang, Huamin Chen · PDF
