CVPR 2026 · Past · Multimodal · Neuroscience

The 1st CogVL: Cognitive Foundations for Multimodal Models Workshop at CVPR 2026

CVPR 2026 Workshop CogVL


Submission deadline: Mar 7, 2026, 12:00 UTC
OpenReview-synced 2026-03-07 12:00 UTC (as of 2026-06-23). Extensions on OpenReview are applied automatically; verify on the website.
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise — edits welcome.

Accepted papers (17)

Fetched from OpenReview (v2) on 2026-06-10.

Action Without Interaction: Probing the Physical Foundations of Video LMMs via Contact-Release Detection
Daniel Harari, Michael Sidorov, Chen Shterental, Liel David, Abrham Kahsay Gebreselasie, Muhammad Haris Khan · PDF
Benchmarking Attribute Discrimination in Infant-Scale Vision-Language Models
Patrick Batsell, Satoshi Tsutsui, Bihan Wen · PDF
Can Vision-Language Models Count? A Synthetic Benchmark and Analysis of Attention-Based Interventions
Saurav Sengupta, Nazanin Moradinasab, Jiebei Liu, Donald E. Brown · PDF
CounterBench: A Controllable Counterfactual Testbed Reveals Systematic Reasoning Failures in Vision-Language Models
Aayam Bansal, Ishaan Gangwani · PDF
CP-VLM: Causal Prompting for Human Intention Inference with Vision–Language Models
Kazuki Osamura, Hidetsugu Uchida, Narishige Abe · PDF
Do Vision-Language Models Revise Beliefs or Just Rationalize? Evidence Update Prompting for Non-Monotonic Visual Reasoning
Aayam Bansal, Ishaan Gangwani · PDF
Jailbreaking Vision-Language Models Through the Visual Modality
Aharon Azulay, Jan Dubiński, Zhuoyun Li, Atharv Mittal, Yossi Gandelsman · PDF
Knowing When You Don’t Know: Metacognitive Uncertainty Calibration in Vision–Language Models
Mahule Roy, Subhas Roy · PDF
Latent-Stability Gated SAM: Detecting Hallucinated Segmentations under Domain Shift
Muhammad Imran, Yugyung Lee · PDF
Let Androids Dream of Electric Sheep: A Human-Inspired Image Metaphor Understanding and Reasoning Framework
Chenhao Zhang, Yazhe Niu · PDF
MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models
Anh Thai, Stefan Stojanov, Zixuan Huang, Bikram Boote, James Matthew Rehg · PDF
Multimodal Graph-of-Thoughts: Hypothesis-Verification Graphs for Multimodal Reasoning in Vision-Language Models
Irina Belyaeva · PDF
Relational Visual Similarity
Thao Nguyen, Sicheng Mo, Krishna Kumar Singh, Yilin Wang, Jing Shi, Nicholas Kolkin, Eli Shechtman, Yong Jae Lee, Yuheng Li · PDF
The Perceptual Observatory Characterizing Robustness and Grounding in MLLMs
Tejas Anvekar, Fenil Bardoliya, Pavan K. Turaga, Chitta Baral, Vivek Gupta · PDF
Think Slow, See Better? Dual-Process Prompting for Vision-Language Model Calibration
Aayam Bansal, Ishaan Gangwani · PDF
VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models
Pritam Sarkar, Ali Etemad · PDF
Vision–Language Pretraining with Structured Distractor Augmentation
Prasanth · PDF
