ICLR 2025 · Past · Other

ICLR 2025 Third Workshop on Deep Learning for Code

DL4C @ ICLR 2025


Submission deadline: Feb 13, 2025, 11:59 UTC (imported from OpenReview; check the official website for any extensions)
Submission portal: OpenReview

Accepted papers (46)

Fetched from OpenReview (v2) on 2026-06-10.

Adaptive Self-improvement LLM Agentic System for ML Library Development
Genghan Zhang, Weixin Liang, Olivia Hsu, Kunle Olukotun
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining
Yuxiang Wei, Hojae Han, Rajhans Samdani
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions
Ziming Li, Qianbo Zang, David Ma, Jiawei Guo, Tianyu Zheng, Minghao Liu, Xinyao Niu, Yue Wang, Jian Yang, Jiaheng Liu, Wanjun Zhong, Wangchunshu Zhou, Stephen Huang, Ge Zhang
Automated Benchmark Generation for Repository-Level Coding Tasks
Konstantinos Vergopoulos, Mark Niklas Mueller, Martin Vechev
BaxBench: Can LLMs Generate Correct and Secure Backends?
Mark Vero, Niels Mündler, Victor Chibotaru, Veselin Raychev, Maximilian Baader, Nikola Jovanović, Jingxuan He, Martin Vechev
Black-Box Adversarial Attacks on LLM-Based Code Completion
Slobodan Jenko, Niels Mündler, Jingxuan He, Mark Vero, Martin Vechev
CLOVER: A Test Case Generation Benchmark with Coverage, Long-Context, and Verification
Jiacheng Xu, Bo Pang, Jin Qu, Hiroaki Hayashi, Caiming Xiong, Yingbo Zhou
Code2JSON: Can a Zero-Shot LLM Extract Code Features for Code RAG?
Aryan Singhal, Rajat Ghosh, Ria Mundra, Harshil Dadlani, Debojyoti Dutta
CodeEditorBench: Evaluating Code Editing Capability of LLMs
Jiawei Guo, Ziming Li, Xueling Liu, Kaijing Ma, Tianyu Zheng, Zhouliang Yu, Ding Pan, Yizhi Li, Ruibo Liu, Yue Wang, Shuyue Guo, Xingwei Qu, Xiang Yue, Ge Zhang, Wenhu Chen, Jie Fu
CodeTransEngine: Ready-to-use Backend for LLM-based Code Translation
Marcos Macedo, Yuan Tian, Bram Adams
Contextual Augmented Multi-Model Programming (CAMP): A Local-Cloud Copilot Solution
Yuchen Wang, Shangxin Guo, Chee Wei Tan
Cracking the Code of Action: a Generative Approach to Affordances for Reinforcement Learning
Lynn Cherif, Flemming Kondrup, David Venuto, Ankit Anand, Doina Precup, Khimya Khetarpal
Diagnosing Robotics Systems Issues with Large Language Models – A Case Study
Jordis Emilia Herrmann, Aswath Mandakath Gopinath, Mikael Norrlof, Mark Niklas Mueller
DISC: Dynamic Decomposition Improves LLM Inference Scaling
Jonathan Light, Wei Cheng, Yue Wu, Masafumi Oyamada, Mengdi Wang, Santiago Paternain, Haifeng Chen
Do LLMs Understand Code Preference? Training Code Preference Models via Synthetic Code Evolution
Jiawei Liu, Thanh V Nguyen, Mingyue Shang, Hantian Ding, Xiaopeng Li, Yu Yu, Varun Kumar, Zijian Wang
EnvBench: A Benchmark for Automated Environment Setup
Aleksandra Eliseeva, Alexander Kovrigin, Ilia Kholkin, Egor Bogomolov, Yaroslav Zharov
Evaluating the Diversity and Quality of LLM Generated Content
Alexander Shypula, Shuo Li, Botong Zhang, Vishakh Padmakumar, Kayo Yin, Osbert Bastani
Evolving RL: Discovering New Activation Functions using LLMs
Kalyan Varma Nadimpalli, Shashank Reddy Chirra, Pradeep Varakantham, Stefan Bauer
Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models
Wenqi Pei, Hailing Xu, Henry Hengyuan Zhao, CHEN Han, Zining Zhang, Shizheng Hou, Luo Pingyi, Bingsheng He
From Pseudo-Code to Source Code: A Self-Supervised Search Approach
Adithya Kulkarni, Mohna Chakraborty, Yonas Afewerki Sium, Sai Charishma Valluri, Wei Le, Qi Li
GenePrune: Automated Pruning of Large Language Models for Code using Genetic Algorithm
Nikhil Reddy Varimalla, Ruturaj Godse
Generate-Feedback-Refine: How Much Does Model Quality in Each Role Matter?
Xiang Pan, Jason Phang, Guy Davidson, Ethan Perez
Generating Code to Verify Cryptic Crossword Reasoning
Martin Andrews, Sam Witteveen
GRAIL: Graph Edit Distance and Node Alignment using LLM-Generated Code
Samidha Verma, Arushi Goyal, Ananya Mathur, Ankit Anand, Sayan Ranu
Improving Automated Issue Resolution via Comprehensive Repository Exploration
Yingwei Ma, Yue Liu
InterTrans: Leveraging Transitive Intermediate Translations to Enhance LLM-based Code Translation
Marcos Macedo, Yuan Tian, Pengyu Nie, Filipe R. Cogo, Bram Adams
KernelBench: Can LLMs Write Efficient GPU Kernels?
Anne Ouyang, Simon Guo, Simran Arora, Alex L Zhang, William Hu, Christopher Re, Azalia Mirhoseini
LLM Program Optimization via Retrieval Augmented Search
Sagnik Anupam, Alexander Shypula, Osbert Bastani
LoRACode: LoRA Adapters for Code Embeddings
Saumya Chaturvedi, Aman Chadha, Laurent Bindschaedler
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
Xiangru Tang, Yuliang Liu, Zefan Cai, Yanjun Shao, Junjie Lu, Yichi Zhang, Zexuan Deng, Helan Hu, Kaikai An, Ruijun Huang, Shuzheng Si, Chen Sheng, Haozhe Zhao, Liang Chen, Tianyu Liu, Yujia Qin, Wangchunshu Zhou, Yilun Zhao, Zhiwei Jiang, Baobao Chang, Arman Cohan, Mark Gerstein
ML-Dev-Bench: Comparative Analysis of AI Agents on ML development workflows
Harshith Padigela, Chintan Shah, Dinkar Juyal
NextCoder: Robust Adaptation of Code LMs to Diverse Code Edits
Tushar Aggarwal, Swayam Singh, Abhijeet Awasthi, Aditya Kanade, Nagarajan Natarajan
On Pretraining For Project-Level Code Completion
Maksim Sapronov, Evgeniy Glukhov
One Model to Train Them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings
Andrea Gurioli, Federico Pennino, Joao Monteiro, Maurizio Gabbrielli
Parameter-Efficient Instruction Tuning Code Large Language Models: An Empirical Study
Terry Yue Zhuo, Armel Randy Zebaze, Leandro Von Werra, Harm de Vries, Qian Liu, Niklas Muennighoff
Programming with Pixels: Towards Generalist Software Engineering Agents
Pranjal Aggarwal, Sean Welleck
Shedding Light on Task Decomposition in Program Synthesis: The Driving Force of the Synthesizer Model
Janis Zenkner, Tobias Sesterhenn, Christian Bartelt
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution
Chengxing Xie, Bowen Li, Chang Gao, He Du, Wai Lam, Difan Zou, Kai Chen
Tasks, Challenges, and Paths Towards AI for Software Engineering
Alex Gu, Naman Jain, Wen-Ding Li, Manish Shetty, Kevin Ellis, Koushik Sen, Armando Solar-Lezama
Teaching Language Models to Critique via Reinforcement Learning
Zhihui Xie, Jie Chen, Liyu Chen, Weichao Mao, Jingjing Xu, Lingpeng Kong
Themisto: Jupyter-Based Runtime Benchmark
Konstantin Grotov, Sergey Titov
Toward Trustworthy Neural Program Synthesis
Wen-Ding Li, Darren Yan Key, Kevin Ellis
Training Software Engineering Agents and Verifiers with SWE-Gym
Jiayi Pan, Xingyao Wang, Graham Neubig, Navdeep Jaitly, Heng Ji, Alane Suhr, Yizhe Zhang
Type-Constrained Code Generation with Language Models
Niels Mündler, Jingxuan He, Hao Wang, Koushik Sen, Dawn Song, Martin Vechev
TypyBench: Evaluating LLM Type Inference for Untyped Python Repositories
Yuhe Jiang, Xun Deng, Jiacheng Yang, Honghua Dong, Gennady Pekhimenko, Fan Long, Xujie Si
WebMMU: A Benchmark for Multimodal Multilingual Website Understanding and Code Generation
Rabiul Awal, Mahsa Massoud, Zichao Li, Aarash Feizi, Suyuchen Wang, Christopher Pal, Aishwarya Agrawal, David Vazquez, Siva Reddy, Juan A. Rodriguez, Perouz Taslakian, Spandana Gella, Sai Rajeswar