NeurIPS 2025 Past Healthcare & biology
The Second Workshop on GenAI for Health: Potential, Trust, and Policy Compliance
GenAI4Health 2025
- Submission deadline: Sep 6, 2025, 11:59 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (99)
Fetched from OpenReview (v2) on 2026-06-10.
- 3D Brain MRI Generation with a Clinically-Conditioned VAE-GAN and Diffusion-Driven Feature Sampling
- An Interactive Framework for Generating Clinical Data with Human Feedback
- Application of Whisper in Clinical Practice: the Post-Stroke Speech Assessment during a Naming Task
- ArtifactGen: Benchmarking WGAN-GP vs Diffusion for Label-Aware EEG Artifact Synthesis
- Automatic Correction of AI Reports using Fact-Checking Model-guided LLMs
- Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
- Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL
- Beyond Overall Accuracy: A Psychometric Deep Dive into the Topic-Specific Medical Capabilities of 80 Large Language Models
- Bridging Graph and State-Space Modeling for Intensive Care Unit Length of Stay Prediction
- Brittleness and Promise: Knowledge Graph–Based Reward Modeling for Diagnostic Reasoning
- Can You Spot the Virtual Patient (VP)? Expert Evaluation, Turing Test, Linguistic Analysis, and Semantic Similarity Analysis
- CancerGUIDE: Cancer Guideline Understanding via Internal Disagreement Estimation
- ChatThero: A Language Agent for Recovery Support
- Clinically Grounded Agent-based Report Evaluation: An Interpretable Metric for Radiology Report Generation
- Count-Based Approaches Remain Strong: A Benchmark Against Transformer and LLM Pipelines on Structured EHR
- Demo: An Agentic Multi-Persona Generative AI System for Mental Health Companionship
- Demo: Building Maternal Health LLMs for Low-Resource Settings
- Demo: Can Visual Stimulation Enhance Reminiscence-Therapy Chatbot?
- Demo: Clinically Diverse Chest X-ray Synthesis via Cross-Modal Conditioning
- Demo: Customizing Open-Source LLMs for Quantitative Medication Attribute Extraction across Heterogeneous EHR Systems
- Demo: Generative AI helps Radiotherapy Planning with User Preference
- Demo: Guide-RAG: Evidence-Driven Corpus Curation for Retrieval-Augmented Generation in Long COVID
- Demo: H2AI: A Framework for Experiential Learning and De-Risking Generative AI in Healthcare
- Demo: Medbot, A Practical Tool Built by Clinicians for Clinicians to Leverage AI Agents to Enhance Clinical Practice
- Demo: Orchestrating Large Language Model Agents and Resources for Medical Deep Research
- Demo: PeerCoPilot: A Language Model-Powered Assistant for Behavioral Health Organizations
- Demo: PharmaData-Agent: A Specialized Agent for Pharmaceutical Data Analysis
- Demo: Sanitizing Medical Documents with Differential Privacy using Large Language Models
- Demo: Statistically Significant Results on Biases and Errors of LLMs Do Not Guarantee Generalizable Results
- Demo: Streamlining Health Insurance Claims Verifications with AI-Blockchain Integration through AI+ROAX (Rod of Asclepius eXchange)
- Demo: Towards Generating Long-Sequence Sleep Heart Rate Signals with Conditional Diffusion
- Detecting Synthetic Radiology Reports using Style Disentanglement
- Editing with AI: How Doctors Refine LLM-Generated Answers to Patient Queries
- Enhancing Fine-Tuning-Free Clinical Reasoning via Test-Time Scaling
- Examining the Vulnerability of Multi-Agent Medical Systems to Human Interventions for Clinical Reasoning
- Explainable Insulin Pump Control with LLMs for Type 1 Diabetes
- Explaining Temporal Effects in Sepsis Prediction
- FairGRPO: Towards Fair Reasoning Foundation Models for Clinical Diagnosis
- Faithful or Just Plausible? Evaluating Faithfulness for Medical Reasoning in Closed-Source LLMs
- FedMentor: Domain-Aware Differential Privacy for Heterogeneous Federated LLMs in Mental Health
- Foresight-England: Development of a National-Scale Generative AI Model of Patient Electronic Health Records for General Medical Event Prediction across the COVID-19 Pandemic
- GRASP: Graph Reasoning Agents for Systems Pharmacology with Human-in-the-Loop
- H-DDx: A Hierarchical Evaluation Framework for Differential Diagnosis
- HealthSLM-Bench: Benchmarking Small Language Models for Mobile and Wearable Healthcare Monitoring
- Hearing Health in Home Healthcare: Leveraging LLMs for Illness Scoring and ALMs for Vocal Biomarker Extraction
- High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training
- Improvisational Reasoning with Vision-Language Models for Grounded Procedural Planning
- K-Stain: Keypoint-Driven Correspondence for H&E-to-IHC Virtual Staining
- Large Language Models as Medical Codes Selectors: a benchmark using the International Classification of Primary Care
- Leaps Beyond the Seen: Reinforced Reasoning Augmented Generation for Clinical Notes
- m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models
- MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale
- MedBrowseComp: Benchmarking Medical Deep Research and Computer Use
- MedGUIDE: Benchmarking Clinical Decision-Making in Large Language Models
- Medical thinking with multiple images
- MedVAL: Toward Expert-Level Medical Text Validation with Language Models
- MedVLThinker: Simple Baselines for Multimodal Medical Reasoning
- Mending synthetic data with MAPS: Model Agnostic Post-hoc Synthetic Data Refinement Framework
- Mind the Gap: Aligning Knowledge Bases with User Needs to Enhance Mental Health Retrieval
- Mixture-of-Experts Guided Multi-Omic Integration for Gastrointestinal Cancer Subtype Prediction
- Modeling PTSD Trajectories with Conditional SVAEs and Synthetic Data Generation: Data-Efficient Prediction and Outcome-Specific Explainability
- Multi-Turn LLM Systems for Diagnostic Decision-Making: Considerations, Biases, and Challenges
- Natural Language Grounded Reinforcement Learning for Clinical Decision-Making in Virtual Patient Simulations
- Ordinal Label-Distribution Learning with Constrained Asymmetric Priors for Imbalanced Retinal Grading
- PAME-AI: Patient Messaging Creation and Optimization using Agentic AI
- Pandemic-Potential Viruses are a Blind Spot for Frontier Open-Source LLMs
- Physician Perceptions of Large Language Models in Clinical Practice: A Mixed-Methods Survey Study
- Position: Adjacent Technologies Are the Key Enablers of Scalable and Safe Clinical MLLM Deployment
- Position: AI Will Transform Neuropsychology Through Mental Health Digital Twins for Dynamic Mental Health Care, Especially for ADHD
- Position: AI-Driven Risk Stratification is Essential for Affordable Early Detection of Cancer
- Position: CARE-RAG: Clinical Assessment and Reasoning in RAG
- Position: Communities of Practice can be used to Address Challenges to Regulation and Governance of Generative AI in South East Asian Countries
- Position: Ophthalmology as a Lens for Trustworthy GenAI in Europe - Uncertainty-Aware AI under the EU AI Act
- Position: Restricted Release of Advanced Biological Models Safeguards Biosecurity
- Position: Specialty Society-Led Meta-Governance is Essential to Responsible Implementation of Generative AI in Cardiovascular Care
- Position: The Pitfalls of Over-Alignment: Overly Caution Health-Related Responses From LLMs are Unethical and Dangerous
- Position: Thematic Analysis of Unstructured Clinical Transcripts with Large Language Models
- PRISM: Physician Rules Integrated with Small large language Models for probable diagnoses associated with Abdominal Pain
- QRad: Enhancing Radiology Report Generation by Captioning-to-VQA Reframing
- Reliable or Risky? Assessing Diffusion Models for Biomedical Data Generation
- Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions
- RPRO: Ranked Preference Reinforcement Optimization for Enhancing Medical QA and Diagnostic Reasoning
- Scalable Whole-Slide Vision-Language Modeling with Learned Token Pruning
- SecureRAG: End-to-End Secure Retrieval-Augmented Generation
- Shallow Robustness, Deep Vulnerabilities: Multi-Turn Evaluation of Medical LLMs
- Stabilizing Reasoning in Medical LLMs with Continued Pretraining and Reasoning Preference Optimization
- SynLLM: A Comparative Analysis of Large Language Models for Medical Tabular Synthetic Data Generation via Prompt Engineering
- The Biased Oracle: Assessing LLMs’ Understandability and Empathy in Medical Diagnoses
- The Energy to Say No: Pre-Generation Abstention for Safety-Critical Medical RAG
- Towards Application Aligned Synthetic Surgical Image Synthesis
- Towards Memory-Efficient Foundation Models in Medical Imaging: A Federated Learning and Knowledge Distillation Approach
- Towards Synthesizing Normative Data for Cognitive Assessments Using Generative Multimodal Large Language Models
- Traj-CoA: Patient Trajectory Modeling via Chain-of-Agents for Lung Cancer Risk Prediction
- Unanchoring the Mind: AI-Guided Counterfactual Reasoning for Rare Disease Diagnosis
- Uncovering Intervention Opportunities for Suicide Prevention with Language Model Assistants
- Using LLM-as-a-Judge/Jury to Advance Scalable, Clinically-Validated Safety Evaluations of Model Responses to Users Demonstrating Psychosis
- Vision-Language Reasoning for Burn Depth Assessment with Structured Diagnostic Hypotheses
- When the Domain Expert Has No Time and the LLM Developer Has No Clinical Expertise: Real-World Lessons from LLM Co-Design in a Safety-Net Hospital
- Zero-Shot Large Language Model Agents for Fully Automated Radiotherapy Treatment Planning