ICML 2026 AI for Science Workshop
ICML2026-AI4Science
- Submission deadline: May 8, 2026, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (176)
Fetched from OpenReview (v2) on 2026-06-10.
- $\texttt{LAUGHS}$: An LLM-compatible Molecular String Representation
- A Multi-Agent LLM Framework with Hierarchical Citation Graph for Automated Survey Generation
- A Multimodal Literature Agent as Substrate for Autonomous Biology Research
- A Precedent-Guided Co-Scientist for Side-Effect-Aware Drug Redesign
- AblationBench: Evaluating Automated Planning of Ablations in Empirical AI Research
- ABLE: Choosing Perturbation Experiments to Recover Gene Logic
- Adopt Machine-Human Collaboration Peer-Review through Computational Research Assessment
- Advancing Ligand-based Virtual Screening and Molecular Generation with Pretrained Molecular Embedding Distance
- Adversarial Fast-Moving Real-World Domains as Test Beds For Benchmarking AI Scientist Capabilities
- AFDBench: A Benchmark for Evaluating AI-Generated National Weather Service Forecast Discussions
- Agent Systems for Academic Research Automation
- Agent-Native Research Artifacts for AI Scientists
- Agentic supervision of iterative assay design in high-throughput regulatory genomics
- Agentic Systems for Sample-Efficient Drug Formulation Design
- AgentSLR: Automating Systematic Literature Reviews in Epidemiology with Agentic AI
- AI Scientist Agents and Biosecurity: Capabilities, Risks, and Governance for Autonomous Labs
- AI-Assisted Descriptor Discovery for Electrochemical Interfacial States via Latent Organization of High-Throughput EIS
- AISFlow: Boundary-Informed Flow Matching for Long-Term AIS Trajectory Imputation
- All-Atom GPCR-Ligand Dynamics Simulation via a Residual Isometric Latent Flow Model
- An AI Scientist that Doesn't Drift: Taste, Structure, and Falsifiable Findings in a Quadruped Navigation Research Loop
- An LLM in Two Discovery Experiments for Extreme Astrophysics: Promising Tool and Co-author, Not Fully Independent Yet
- ARK: AI Research Harness - Offload the Labour, Steer the Science
- Asking the Right Question: Epistemic Inquiry as a Learnable Reasoning Skill for Scientific Discovery
- AtomWorld: A Benchmark for Evaluating Spatial Reasoning in Large Language Models on Material Structures
- Audit-Grade Harness for Agent-driven Scientific Computation Workflows
- Augmented Equivariant Mesh Networks for Anatomical Mesh Segmentation
- Automated Prototyping of Behavioral Experiments with Large Language Models
- Automating cognitive distillation for expert-level scientific literature synthesis
- Bayesian Last Layer for Neural Force Fields
- Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials
- Beyond Scalar Electrostatics: Multipole Features for Long-Range Molecular Machine Learning
- Beyond Static Snapshots: A Large-Scale Dataset for Dynamics-Aware Protein-Nucleic Acid Modeling
- Beyond Tool Use: Multimodal Distillation and the Evolution of Neural Networks toward AI Scientists
- Bi-semantic Chemical Embedder for Joint Representation Learning of SMILES and Natural Language
- BiomedBench Suite: Benchmarks for Evaluating LLM Performance on Biomedical Reasoning Tasks
- Bruno: An AI Product Manager for Scientists
- CADEngBench: Can AI Systems Co-Author Engineering Designs? A Hierarchical Benchmark for Physics-Verified Parametric CAD Generation
- ChemPRM: Improving Retrosynthesis by Structured Intermediate Process Reward
- ClaimGarden: Update-Aware Claim-State Control for AI Scientist Workflows
- CloudFlow: A Flow Matching Model to Generate High-Resolution Cloud Structures
- Compressing the Validation Bottleneck: An Agentic Self-Driving Lab for Metal Additive Manufacturing
- Context-Aware and Controllable Academic Paper Revision via Human-AI Collaboration
- Coupled Integral PINN for Discontinuity
- Credit Where Credit Is Due: A Taxonomy of AI Contributions to Scientific Discovery and Recommendations for Authorship Policy
- CreditMap: Provenance Ledgers for Attribution in Human–AI Scientific Collaboration
- CRYSTAL: Coordinated Multi-Objective Reinforcement Learning for Crystal Generation
- CrysTune: Crystal Generation via Fine-Tuning of Large Language Models on Wyckoff Representations
- DARWIN: A Framework for Target Specific Diversity Constrained Natural Product Like Molecule Generation
- Dead Science Walking: Publication Bias and the AI Scientist Pipeline
- Density of States-Intermediated Crystal Generation for Material Inverse Design
- Diagnostic Foundation for Evaluating LLMs' Research Integrity as Co-Scientists
- DiffResearch: Harnessing Diffusion Language Models for Automated Literature Review
- Diffusion Sampling of Adsorbate Configurations on Catalyst Surfaces
- Does Verbal Self-Reflection Transfer to Long-Horizon Scientific Discovery?
- Domain-Prior-Regularized Graph Modeling for Anomaly Detection in Cyber-Physical Systems
- Dynamic large language model representations for multi-objective chemical reaction optimisation
- Effective Harness Engineering for Algorithm Discovery with Coding Agents
- Efficient Cross-Functional Learning for Atomistic Modeling of Materials
- Efficient Vision Transformer-based Surrogate for Scalable Pressure Prediction in Incompressible Turbulent Flows
- Enabling Robust Epidemic Control via LLM-Elicited Causal Discovery
- Energy-Based Operator Learning in Function Space
- EO-Agents: A Three-Agent LLM Pipeline for Earth Observation Hypothesis Generation
- Evaluating System Design Choices in Biomedical AI Agents
- Evaluating the Progression of Large Language Model Capabilities for Small-Molecule Drug Design
- Evidence-Grounded Tree-Based Hypothesis Generation for Scientific Discovery
- EVIL Baseline: LLM-Discovered Heuristics can be strong Baselines for Scientific Inference
- EvoMaster: A Foundational Evolving Agent Framework for Agentic Science at Scale
- EvoVLM: Multimodal Evolutionary Feedback for Visual Symbolic Regression
- Experimental Attempts in Electronic Lab Notebooks: A Dataset Proposal for Scientific Debugging
- Expert-guided Bayesian optimization for sustainable protein formulation
- FabScore: Fine-Grained Evaluation of Fabrications in Automated AI Research
- FIRSTPASS: A Multi-Domain, Multi-Round Peer Review Dataset Grounded in Real Editorial Outcomes
- FirstPass: Grounding AI Scientific Judgment in Multi-Round Editorial Outcomes
- FLAME: Physics-Guided Neural Operators for Onboard Satellite Methane Detection in Hyperspectral Imagery
- FluorCode: Predicting Fluorescent Protein Photophysical Properties with LoRA-Fine-Tuned Protein Language Models
- From Literature to Experimental Conditions: Large Language Models for Co-Crystal Synthesis Design
- From Ricci Curvature to Metric Matching: A Simplified Approach to Geometric Transfer Learning
- GAE: Graph-Augmented Evolution for Scientific Discovery via Reinforcement Optimization
- Generative Pipeline for Discovering Solid-State Battery Materials with Universal Atomistic Potentials
- Geo-Expert: Towards Expert-Level Geological Reasoning via Parameter-Efficient Fine-Tuning
- Graph-Based Cross-Modal Learning for Drug–Target Affinity Prediction
- Grounded autonomous research: a fault-tolerant LLM pipeline from corpus to manuscript in frontier computational physics
- Grounded autonomous scrutiny at scale: emergent critique from reproduction of published computational physics papers
- Gyaradax: Local Gyrokinetics JAX Code
- Hierarchical Discovery of Adiabatic Hamiltonian Paths and RL Schedules for Quantum Linear System Solvers
- In silico evaluation of pre-training strategies based on synthetic data for functional DNA generation
- INDIBATOR: Diverse and Fact-Grounded Individuality for Multi-Agent Debate in Molecular Discovery
- LabProc and Tacit: Quantifying the Visual-Textual Prior Gap in Autonomous Laboratory Perception
- Lang2MLIP: End-to-End Language-to-Machine Learning Interatomic Potential Development with Autonomous Agentic Workflows
- Large Language Models as Generative Bayesian Policies
- Learning Reaction-Condition Plausibility for Evaluation and Self-Curation Under Noisy and Non-Unique Supervision
- Leveraging Biokinetic Knowledge Priors for Data-Scarce Bioprocess Modeling
- Lithography Solvent Discovery using Neuro-Symbolic Search Agent
- LitXBench: A Benchmark for Extracting Experiments from Scientific Literature
- LLM Agents for Distribution-Aware Algorithmic Discovery
- LOD Surrogate Modelling for multiscale Darcy-Flow Problems
- Look Before You Leap: Improving Structure-Based Drug Optimization with Attribution-Guided Genetic Operators
- MADField: Multi-fidelity Amortized Density Field for Adsorption in Nanoporous Materials
- MADS-CPS: A Machine-Checkable Admissibility Contract for AI Scientists in Autonomous Laboratories
- MATAI: A Unified Interactive Platform for AI-Driven Alloy Discovery
- MatDeplot: Agent-Ready Materials-Curve Understanding for Scientific Reasoning
- MELON: Multimodal Learning Framework for Spatial Multimodal Omics Data Integration
- MiRS-AI: A Data-Driven Framework for Accurate and Efficient Atmospheric Profile Retrieval from Microwave Observations
- Mobus: Data Infrastructure for Researchers and Autonomous Scientific Agents
- MOFology: A Knowledge Graph for Engineering Direct Air Capture Materials
- Navigating Order-(Dis)Order Family Trees via Group-Subgroup Transitions
- NeuriCo: Towards reliable AI scientists
- NMR Elucidation as an Agentic Search Problem, Not a Modeling Problem
- On the Effects of Reasoning Effort and Prompt-Based Diversification on Scientific Ideation Diversity
- Open Electrolyte Databank: Unlocking Molecular-Mixture Intelligence for Battery Electrolyte Discovery through a Standardized, Multimodal Foundation
- OpenDiscoveryTrace: Process Traces for Evaluating AI Scientist Workflows
- OpenSeesAgentBench: A Benchmark and Evaluation Framework for Agentic Structural Analysis in OpenSeesPy
- PaperDoctor: Evidence-Grounded and Actionable Feedback for Scientific Papers in Progress
- Parameter-Efficient Adaptation of a Pretrained Language Model via Soft Prompt Tuning Enables Hit-Enriched Conditional Cell Line-Specific Generation of Cell-Penetrating Peptides
- Patch size and its effect on representations in MAEs
- Peak Risk Score: A Peak-Space Verification Layer for AI Scientists in NMR-Guided Molecular Discovery
- PepLang-Bench - Evaluating Large Language Models Understanding On Peptide Related Tasks
- Periodic Complex Stochastic Processes for Retrieving Atomic Structures of Unknown Matters
- Physics Is All You Need? A Case Study in Physicist-Supervised AI Development of Scientific Software
- Physics-Informed Gaussian Processes for Hardness Prediction in Refractory High Entropy Alloys
- Physics-Informed Graph Learning Acceleration for Large-Scale AC-OPF with Topology Changes
- Polygenic-by-Environment Adjustment for Binary GWAS with Out-of-Fold Block-PRS and Low-Rank Bilinear Models
- Position: AI Should Verify, Not Judge, Scientific Work
- Position: Correct Answer, Wrong Mechanism - When AI Scientists Defend General Claims Their Own Data Contradicts
- Position: Preventing the Collapse of Peer Review Requires Verification-First AI
- Practical Bayesian Optimization for Scientific Discovery
- Pre-training on noncovalent interactions from synthetic protein-ligand structures to better predict binding affinity
- Pretrained Medical Representations for Practical Screening of Drug Repositioning Candidates
- Pretrained Model Representations as Acquisition Signals for Active Learning of MLIPs
- PRISM: Problem Discovery via Structural Motifs in Knowledge Graphs
- Propose, Critique, Falsify: Benchmarking Self-Verifying AI Scientists
- Recursive Flow Matching
- REFLEX: Reflective Evolution from LLM Experience
- REPA: Reproducibility Evaluation via an Autonomous Pipeline Architecture
- Retro-Forge: A Multi-Step Pairwise Retrosynthesis Framework for Solid-State Materials Synthesis
- RETROSPECT: RETROsynthesis via Sequential Prediction, and Chemically Transformed-ranking
- ReviewArena: A Large-Scale Cross-Conference Dataset and Benchmark for LLM Peer Review
- Revisiting Vanilla Bayesian Optimization in High-Dimensional Permutation Spaces
- RosettaSearch: Multi-Objective Inference-Time Search for Protein Sequence Design
- RSI for Science: A Verifier-First Framework for AI Scientists
- Rubric-Grounded Reinforcement Learning: Structured Judge Rewards for Generalizable Reasoning in Language Models
- Sample Efficient Generative Optimization for Molecular Design
- Scalable Neural Decoders for Practical Fault-Tolerant Quantum Computation
- SciContrib-Bench: Mapping the Autonomy Landscape of AI Scientists Through Stage-Dependent Detectability
- SciMem: Scientific Reasoning with Structured Memory for Materials Design
- SciPaths: Forecasting Pathways to Scientific Discovery
- SciReview: A Benchmark for Evaluating Frontier AI for Scientific Review
- SEAL - A Symmetry Encouraging Loss for High Energy Physics
- SeisAgentBench: A Failure-Aware Core Benchmark for Multi-Agent Earthquake Rupture Inversion and Forward Validation
- SENPAI: Self-ExperimentatioN for Physical AI An Observability-Based Research Harness
- SF-Cluster: Frustration-Aware MSA Subsampling for Protein Conformation Modeling
- Sibyl: A Multi-Agent Pipeline for Autonomous Hypothesis Generation
- Sibyl: Temporal Backtesting for Literature-Based Scientific Discovery with Large Language Model Agents
- SP-Mind: An Autonomous Reasoning Agent for Spatial Proteomics Analysis
- SR-Scientist: Scientific Equation Discovery With Agentic AI
- State-Aware Policy Optimization for a Reliable Multi-Turn, Multi-Tool Scientific Agent in Kinetic Biological Models
- Symmetry-Constrained Gaussian Processes for Sample-Efficient Molecular Property Prediction
- Synthesizability-Aware Materials Generation with Target Properties via Reinforcement Learning
- TasteBench: multimodal benchmark for sensory prediction, from molecules to sustainable foods
- The Few-Shot Unreliability of Molecular Foundation Models: A Geometric Diagnosis and Partial Remedy
- The Latent-First Laboratory: A Manifesto for Efficient, Audit-Based AI Science
- The Novelty Ceiling: PAC-Theoretic Bounds on Autonomous Scientific Discovery and the Minimum Oversight Rate
- The Socratic Scientist: Designing LLM Agentic Harness for Long-Running Computational Science
- The Story is Not the Science: Execution-Grounded Evaluation of Mechanistic Interpretability Research
- To aggregate or Not to Aggregate? Test-Time Aggregation Beyond Verifier-Friendly Benchmarks
- TotSyn: A Total Synthesis Reaction Dataset for Machine Learning in Organic Chemistry
- Towards AI-Driven Recommendation of Liquid Chromatography Conditions for Chemical Reactions
- Towards Self-Evolving Agentic Literature Retrieval
- TreeAgent: A Generalizable Multi-Agent Framework for Automated Bias Labeling in Forestry via Compiled Expert Rules and Vision-Language Models
- TRIAGE: An AI Scientist for Adversarial Target Falsification
- Trusted Convergence and Knowing What We Know Together: Privacy-Preserving Knowledge Discovery Across Neurodegenerative Disease Institutes
- TSAssistant: A Human-in-the-Loop Agentic Framework for Automated Target Safety Assessment
- U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster
- What the Records Don't Carry: A Position on Researcher-AI Co-Adaptation in Exploratory Laboratory Research
- When Does Structure Help? The Information Bonus of AlphaFold2 Representations over Protein Language Models
- When Should an AI Scientist Stop? Verifiable Experiment Steering and Refusal for Autonomous Discovery