NeurIPS 2025 Fourth Workshop on Deep Learning for Code

DL4C @ NeurIPS 2025

Submission deadline: Aug 28, 2025, 20:00 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).

Accepted papers (69)

Fetched from OpenReview (v2) on 2026-06-10.

  1. A Matter of Representation: Towards Graph-Based Abstract Code Generation

    Nyx Audrey Angelo Iskandar, Hisham Bedri, Andy Tsen · PDF
  2. A Note on the Code Quality Score System: LLMs for Maintainable Large Codebases

    Jalaj Bhandari, Sherman Wong, Fan Yang · PDF
  3. Adapting Language Models for Low-Resource Programming Languages

    Ananya Singha, Mukul Singh, Hosein Hasanbeig, Arjun Radhakrishna, Sumit Gulwani · PDF
  4. Advancing Environment Setup LLMs through Online Reinforcement Learning

    Alexander Kovrigin, Aleksandra Eliseeva, Konstantin Grotov, Egor Bogomolov, Yaroslav Zharov · PDF
  5. Agentic Property-Based Testing: Finding Bugs Across the Python Ecosystem

    Muhammad Maaz, Liam DeVoe, Zac Hatfield-Dodds, Nicholas Carlini · PDF
  6. Agint: Agentic Graph Compilation for Software Engineering Agents

    Abhiram Chivukula, Jay Somasundaram, Vijay Somasundaram · PDF
  7. Asm2SrcEval: Evaluating Large Language Models for Assembly to Source Code Translation

    Parisa Hamedi, Hamed Jelodar, Samita Bai, Mohammad Meymani, Roozbeh Razavi-Far, Ali A. Ghorbani · PDF
  8. Astra: A Multi-Agent System for GPU Kernel Performance Optimization

    Anjiang Wei, Tianran Sun, Yogesh Seenichamy, Hang Song, Anne Ouyang, Azalia Mirhoseini, Ke Wang, Alex Aiken · PDF
  9. Beyond Accuracy: Realistic and Diagnostic Evaluation of Code Generation Models

    Pareesa Ameneh Golnari, Adarsh Kumarappan, Wen Wen, Xiaoyu Liu, Gabriel Ryan, Yuting Sun, Shengyu Fu, Elsie Nallipogu · PDF
  10. BUILD-BENCH: Benchmarking LLM Agents on Compiling Real-World Open-Source Software

    Zehua Zhang, Ati Priya Bajaj, Divij Handa, Siyu Liu, Arvind S Raj, Hongkai Chen, Hulin Wang, Yibo Liu, Zion Leonahenahe Basque, Souradip Nath, Vishal Juneja, Nikhil Chapre, Yan Shoshitaishvili, Adam Doupe, Chitta Baral, Ruoyu Wang · PDF
  11. Can Test-Time Compute Help LLMs Write Low-Resource Parallel Code Better?

    Gautam Singh, Arjun Guha, Bhavya Kailkhura, Harshitha Menon · PDF
  12. ChopChop: Semantically Constraining the Code Output of Language Models

    Shaan Nagy, Timothy Zhou, Nadia Polikarpova, Loris D'Antoni · PDF
  13. Code2Video: A Code-centric Paradigm for Educational Video Generation

    Yanzhe Chen, Kevin Qinghong Lin, Mike Zheng Shou · PDF
  14. CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis

    Anjiang Wei, Tarun Suresh, Jiannan Cao, Naveen Kannan, Yuheng Wu, Kai Yan, Thiago S. F. X. Teixeira, Ke Wang, Alex Aiken · PDF
  15. CodeEvo: Interaction-Driven Synthesis of Code-centric Data through Hybrid and Iterative Feedback

    Qiushi Sun, Jingyang Gong, Qipeng Guo, Lei Li, Fei Yuan · PDF
  16. CodeMirage: A Multi-Lingual Benchmark for Detecting AI-Generated and Paraphrased Source Code from Production-Level LLMs

    Hanxi Guo, Siyuan Cheng, Kaiyuan Zhang, Guangyu Shen, Xiangyu Zhang · PDF
  17. CoDyn: Dynamic LLM Routing for Coding Tasks

    Mirazul Haque, Petr Babkin, Vali Tawosi, Saba Rahimi, Natraj Raman, Xiaomo Liu · PDF
  18. Constrained Decoding of Diffusion LLMs with Context-Free Grammars

    Niels Mündler, Jasper Dekoninck, Martin Vechev · PDF
  19. Cyber-Zero: Training Cybersecurity Agents without Runtime

    Terry Yue Zhuo, Dingmin Wang, Hantian Ding, Varun Kumar, Zijian Wang · PDF
  20. Deep-Reproducer: From Paper Understanding to Code Generation

    Pengcheng Chen, Ning Yan, Zihan Zhao, Yixiao Lin, Huaibo Chen, Yue Hu, Qinbo Bai, Xiang Li, Masood S. Mortazavi · PDF
  21. Demystify the Potential of Large Language Models as General-Purpose Surrogate Code Executors

    Bohan Lyu, Siqiao Huang, Zichen Liang, Wenjia Yang, Qian Sun, Jiaming Zhang · PDF
  22. Diff-XYZ: A Benchmark for Evaluating Diff Understanding

    Evgeniy Glukhov, Michele Conti, Egor Bogomolov, Yaroslav Golubev, Alex Bezzubov · PDF
  23. DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code

    Shriyansh Agrawal, Aidan Lau, Sanyam Shah, Ahan M R, Kevin Zhu, Sunishchal Dev, Vasu Sharma · PDF
  24. Efficient Code Embeddings from Code Generation Models

    Daria Kryvosheieva, Saba Sturua, Michael Günther, Scott Martens, Han Xiao · PDF
  25. Ensuring Functional Correctness of Large Code Models with Selective Generation

    Jaewoo Jeong, Taesoo Kim, Sangdon Park · PDF
  26. EquiBench: Benchmarking Large Language Models’ Understanding of Program Semantics via Equivalence Checking

    Anjiang Wei, Jiannan Cao, Ran Li, Hongyu Chen, Yuhui Zhang, Ziheng Wang, Yuan Liu, Thiago S. F. X. Teixeira, Diyi Yang, Ke Wang, Alex Aiken · PDF
  27. FreshBrew: A Benchmark for Evaluating AI Agents on Java Code Migration

    Victor May, Diganta Misra, Yanqi Luo, Anjali Sridhar, Justine Gehring, Silvio Soares Ribeiro Junior · PDF
  28. GitChameleon 2.0: Evaluating AI Code Generation Against Python Library Version Incompatibilities

    Diganta Misra, Nizar Islah, Victor May, Brice Rauby, Zihan Wang, Justine Gehring, Antonio Orvieto, Muawiz Sajjad Chaudhary, Eilif B. Muller, Irina Rish, Samira Ebrahimi Kahou, Massimo Caccia · PDF
  29. Good-Enough Structured Generation: A Case Study on JSON Schema

    Ivan Lee, Loris D'Antoni, Taylor Berg-Kirkpatrick · PDF
  30. HardTests: Synthesizing High-Quality Test Cases for LLM Coding

    Zhongmou He, Yee Man Choi, Kexun Zhang, Jiabao Ji, Junting Zhou, Dejia Xu, Ivan Bercovich, Aidan Zhang, Lei Li · PDF
  31. HarnessLLM: Automatic Testing Harness Generation via Reinforcement Learning

    Yujian Liu, Jiabao Ji, Yang Zhang, Wenbo Guo, Tommi Jaakkola, Shiyu Chang · PDF
  32. Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

    Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken · PDF
  33. Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces

    Anjiang Wei, Allen Nie, Thiago S. F. X. Teixeira, Rohan Yadav, Wonchan Lee, Ke Wang, Alex Aiken · PDF
  34. In-Context Learning for Esoteric Programming Languages: Evaluating and Enhancing LLM Reasoning Without Fine-Tuning

    Saraswathy Amjith, Michael X. Wang, Jayson Lynch, Arul Kolla, Neil Thompson · PDF
  35. Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks

    Amal Abed, Ivan Lukic, Jörg K.H. Franke, Frank Hutter · PDF
  36. Interactive Evaluation of Large Language Models for Multi-Requirement Software Engineering Tasks

    Dimitrios Rontogiannis, Maxime Peyrard, Nicolas Baldwin, Martin Josifoski, Robert West, Dimitrios Gunopulos · PDF
  37. Is Your Benchmark Still Useful? Dynamic Benchmarking for Code Language Models

    Batu Guan, Xiao Wu, Yuanyuan Yuan, Shaohua Li · PDF
  38. Learning From Design Procedure To Generate CAD Programs for Data Augmentation

    Yan-Ying Chen, Dule Shu, Matthew K Hong, Andrew Taber, Jonathan Qiang Li, Matthew Klenk · PDF
  39. Learning to Solve and Verify: A Self-Play Framework for Mutually Improving Code and Test Generation

    Zi Lin, Sheng Shen, Jingbo Shang, Jason E Weston, Yixin Nie · PDF
  40. LLM-Driven Multi-step Translation from C to Rust using Static Analysis

    Tianyang Zhou, Haowen Lin, Somesh Jha, Mihai Christodorescu, Kirill Levchenko, Varun Chandrasekaran · PDF
  41. LLM-JEPA: Large Language Models Meet Joint Embedding Predictive Architectures

    Hai Huang, Yann LeCun, Randall Balestriero · PDF
  42. MOSAIC: Multi-agent Orchestration for Task-Intelligent Scientific Coding

    Siddeshwar Raghavan, Tanwi Mallick · PDF
  43. Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

    Minju Seo, Jinheon Baek, Seongyun Lee, Sung Ju Hwang · PDF
  44. Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets

    Timur Galimzyanov, Olga Kolomyttseva, Egor Bogomolov · PDF
  45. pydra: Probing Code Representations With Synthetic Clones and Bugs

    Ellie Kitanidis, Cole J Hunter · PDF
  46. R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents

    Naman Jain, Jaskirat Singh, Manish Shetty, Tianjun Zhang, Liang Zheng, Koushik Sen, Ion Stoica · PDF
  47. Random Baselines for Simple Code Problems are Competitive with Code Evolution

    Yonatan Gideoni, Yujin Tang, Sebastian Risi, Yarin Gal · PDF
  48. Refactoring Codebases through Library Design

    Žiga Kovačič, Justin T Chiu, Celine Lee, Wenting Zhao, Kevin Ellis · PDF
  49. RocqStar: Leveraging Similarity-driven Retrieval and Agentic Systems for Rocq generation

    Andrei Kozyrev, Nikita Khramov, Gleb Solovev, Anton Podkopaev · PDF
  50. SATBench: Benchmarking LLMs' Logical Reasoning via Automated Puzzle Generation from SAT Formulas

    Anjiang Wei, Yuheng Wu, Yingjia Wan, Tarun Suresh, Huanmi Tan, Zhanke Zhou, Sanmi Koyejo, Ke Wang, Alex Aiken · PDF
  51. Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models

    Mehrzad Samadi, Aleksander Ficek, Sean Narenthiran, Siddhartha Jain, Wasi Uddin Ahmad, Somshubra Majumdar, Vahid Noroozi, Boris Ginsburg · PDF
  52. Schema Lineage Extraction at Scale: Multilingual Pipelines, Composite Evaluation, and Language-Model Benchmarks

    Jiaqi Yin, Yi-Wei Chen, Meng-Lung Lee, Xiya Liu · PDF
  53. Security Knowledge Dilution in Large Language Models: How Irrelevant Context Degrades Critical Domain Expertise

    Shivani Shukla, Himanshu Joshi · PDF
  54. SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

    Zhenghai Xue, Longtao Zheng, Qian Liu, Yingru Li, Xiaosen Zheng, Zejun MA, Bo An · PDF
  55. SQL-of-Thought: Multi-agentic Text-to-SQL with Guided Error Correction

    Saumya Chaturvedi, Aman Chadha, Laurent Bindschaedler · PDF
  56. STACKFEED: Structured Textual Actor-Critic Knowledge base editing with FEEDback

    Shashank Kirtania, Naman Gupta, Priyanshu Gupta, Sumit Gulwani, Arun Iyer, Suresh Parthasarathy Iyengar, Arjun Radhakrishna, Sriram K. Rajamani, Gustavo Soares · PDF
  57. SubtaskEval: Benchmarking LLMs on Competitive Programming Subtasks

    Samik Goyal · PDF
  58. SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development

    Yaxin Du, Yuzhu Cai, Yifan Zhou, Cheng Wang, Yu Qian, Xianghe Pang, Qian Liu, Yue Hu, Siheng Chen · PDF
  59. SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

    Xinyi He, Qian Liu, Mingzhe Du, Lin Yan, ZhiJie Fan, Yiming Huang, Zejian Yuan, Zejun MA · PDF
  60. SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution

    Yuxiang Wei, Olivier Duchenne, Jade Copet, Quentin Carbonneaux, LINGMING ZHANG, Daniel Fried, Gabriel Synnaeve, Rishabh Singh, Sida Wang · PDF
  61. The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management

    Tobias Lindenbauer, Igor Slinko, Ludwig Felder, Egor Bogomolov, Yaroslav Zharov · PDF
  62. The Valley of Code Reasoning: Scaling Knowledge Distillation of Large Language Models

    Muyu He, Muhammad Ali Shafique, Anand Kumar, Tsach Mackey, Nazneen Rajani · PDF
  63. Thyme: Think Beyond Images

    YiFan Zhang, Xingyu Lu, Shukang Yin, Chaoyou Fu, Wei Chen, Xiao Hu, Bin Wen, Kaiyu Jiang, Changyi Liu, Tianke Zhang, Haonan Fan, Kaibing Chen, Jiankang Chen, Haojie Ding, Kaiyu Tang, Zhang Zhang, Liang Wang, Fan Yang, Tingting Gao, Guorui Zhou · PDF
  64. Training Language Model Agents to Find Vulnerabilities with CTF-Dojo

    Terry Yue Zhuo, Dingmin Wang, Hantian Ding, Varun Kumar, Zijian Wang · PDF
  65. Training LLM Agents to Empower Humans

    Evan Ellis, Vivek Myers, Jens Tuyls, Sergey Levine, Anca Dragan, Benjamin Eysenbach · PDF
  66. Understanding Secret Leakage Risks in Code LLMs: A Tokenization Perspective

    Meifang Chen, Zhe YANG, HUANG Nianchen, Yizhan Huang, Yichen LI, Michael R. Lyu · PDF
  67. VeriCoder: Enhancing LLM-Based RTL Code Generation through Functional Correctness Validation

    Anjiang Wei, Huanmi Tan, Tarun Suresh, Daniel Mendoza, Thiago S. F. X. Teixeira, Ke Wang, Caroline Trippel, Alex Aiken · PDF
  68. Where's the Bug? Attention Probing for Scalable Fault Localization

    Adam Stein, Arthur Wayne, Aaditya Naik, Mayur Naik, Eric Wong · PDF
  69. Workflows vs Agents for Code Translation

    Henry Gray, Octavian Udrea, Tom Yotam · PDF