CVPR 2026 · Past · Computer vision

Third Workshop on Visual Concepts

VisCon 2026


Submission deadline: Apr 2, 2026, 07:59 UTC
Deadline synced from OpenReview: 2026-04-02 07:59 UTC (as of 2026-06-23). Extensions on OpenReview are applied automatically; verify on the website.
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise — edits welcome.

Accepted papers (27)

Fetched from OpenReview (v2) on 2026-06-10.

A Taxonomy-Aware Evaluation for Open-Vocabulary Wildlife Detection
Wenqi Xue, Pengxi Zhang, William Wang, Yijia Cai, Jieyu Zhang · PDF
CGEBench: Benchmarking Concept Generalization of Promptable Image Segmentation Models
Alexander von Recum, Christoph Schnabl · PDF
ConceptOT: Fine-Grained Vision-Language Alignment via Low-Rank Unbalanced Optimal Transport
Pawan Kumar · PDF
CTRL-STEER: Closed-Loop Neuron Activation Control in Vision-Language-Action Models
Abhijith Babu, Ramneet Kaur, Nathaniel D. Bastian, Olivera Kotevska, Susmit Jha, Yanzhao Wu, Sumit Kumar Jha, Anirban Roy · PDF
Dissecting Representation Structure in Vision Transformers: A Rigorous Architectural Study
Kim-Cuc Nguyen, Ngai-Man Cheung · PDF
Do VLMs Reason About Faces? Probing the Perception-Reasoning Gap in Identity Judgment
Mahsa Khoshnoodi, Sarah Adel Bargal · PDF
Entropy-based Patchification Creates Semantic Tokens
Suhao Yu, Jingjia Peng, Yao Tang, Jiatao Gu · PDF
Forecasting Animal Motion in the Wild
Neerja Thakkar, Shiry Ginosar, Jacob C Walker, Jitendra Malik, Joao Carreira, Carl Doersch · PDF
From Comparison to Composition: Towards Understanding Machine Cognition of Unseen Categories
Minghao Fu, Sheng Zhang, Guangyi Chen, Zijian Li, Fan Feng, Yifan Shen, Shaoan Xie, Heng Huang, Kun Zhang · PDF
Hidden Clones: Exposing and Fixing Family Bias in Vision-Language Model Ensembles
Zacharie Bugaud · PDF
Improved Vision-Language Alignment via Text-Conditioned Image Embeddings using Sparse Autoencoders
Sweta Mahajan, Sukrut Rao, Jiahao Xie, Alexander Koller, Bernt Schiele · PDF
INSID3: Training-Free In-Context Segmentation with DINOv3
Claudia Cuttano, Gabriele Trivigno, Christoph Reich, Daniel Cremers, Carlo Masone, Stefan Roth · PDF
LandCIS: Hierarchical Semantic Anchoring for Concept-Centric Continual Segmentation
Yuyin Ma, Yijian Wu, Wang Xinyu, Yijun Lu, Zhen Tian, Ming Yan, Yunni Xia · PDF
Learning Sparse Visual Representations via Spatial-Semantic Factorization
Theodore Zhengde Zhao, Sid Kiblawi, Jianwei Yang, Naoto Usuyama, Reuben Tan, Noel C Codella, Tristan Naumann, Hoifung Poon, Mu Wei · PDF
MCSBench: Probing Multimodal Conceptual Structure of Multimodal LLMs
Sheng Zhang, Minghao Fu, Kelin Yu, Tong Zheng, Guangyi Chen, Hong Jiao, Salman Khan, Zhiqiang Shen, Heng Huang · PDF
Most of This Video Is Boring
Anya Singh, Jiahang He, Varun Nair, Jai Relan, Vidyut Baradwaj, Cabrel Happi · PDF
Multi-hop Relational Contrastive Learning: Extending Spatial Contrastive Pre-training Beyond Pairwise Relations
Sheikh Tanvir Ahmed, Md. Tanvir Raihan · PDF
Seeing Only What Exists: Visibility-Aware Contrastive Learning for Concept-Level Hallucination in Vision–Language Models
Hikaru Shijo, Yutaka Yoshihama, Yasunori Ishii, Takayoshi Yamashita · PDF
Self-Consistency for LLM-Based Motion Trajectory Generation and Verification
Jiaju Ma, R. Kenny Jones, Jiajun Wu, Maneesh Agrawala · PDF
Semantic Concept Conditioning for State Space Image Super-Resolution
Andrii Ahitoliev, Bohdan Milian, Oleh Shtohryn, Anna-Alina Bondarets, Alina Labaz, Taras Rumezhak, Volodymyr Karpiv · PDF
SPOT: Structured Prompting with Object-centric Tokens for open-world scene graphs
Mengqi Zhang, Sahil Khose, Fiona Ryan, Judy Hoffman · PDF
Test-Time Visual Concept Anchoring via Entropic Optimal Transport
Pawan Kumar · PDF
Toward Compact and Structured Visual Representations in VLMs: SSM-Based Vision Encoders as an Alternative to Transformers
Shang-Jui Ray Kuo, Paola Cascante-Bonilla · PDF
Uncertainty-guided Compositional Alignment with Part-to-Whole Semantic Representativeness in Hyperbolic Vision-Language Models
Hayeon Kim, Ji Ha Jang, Junghun James Kim, Se Young Chun · PDF
VCode: A Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
Kevin Qinghong Lin, Yuhao Zheng, Hangyu Ran, Dongxing Mao, Linjie Li, Philip Torr, Alex Jinpeng Wang · PDF
VisAnalog: A Diagnostic Suite for Visual Concept Transfer on Natural Images
Zhaonan Li, Kyle R. Chickering, Bangzheng Li, Jacob Dineen, Xiao Ye, Zhikun Xu, Shijie Lu, Yuxi Huang, Ming Shen, Bach Nguyen, Jaya Adithya Pavuluri, Mau Son Nguyen, Sanika Chavan, Ngoc Minh Thu Le, Muhao Chen, Ben Zhou · PDF
WristCompass: Kinematic Coupling as a Learnable Visual Concept for Ego-Camera Orientation
Varun Nair, Vidyut Baradwaj, Jiahang He, Anya Singh, Jai Relan, Cabrel Happi · PDF
