CVPR 2025 · Past · Computer vision

Second Workshop on Visual Concepts

VisCon 2025


Submission deadline: Apr 16, 2025, 08:00 UTC
imported from OpenReview — check the website for extensions
Submission portal: OpenReview

Accepted papers (34)

Fetched from OpenReview (v2) on 2026-06-10.

BAR: Probing Brain Encoders with Concept-Based Explanations
Huadi Wang, Weihao Xia, Cengiz Oztireli · PDF
Beyond Accuracy: Metrics that Uncover What Makes a 'Good' Visual Descriptor
Ethan Lin, Linxi Zhao, Atharva Sehgal, Jennifer J. Sun · PDF
Beyond Language Priors: Enhancing Visual Comprehension and Attention in MLLMs
Aarti Ghatkesar, Uddeshya Upadhyay, Ganesh Venkatesh · PDF
Can generative models generate novel objects the same as familiar objects?
Zhaokun Xue, Chang Ye, Gamaleldin Fathy Elsayed, Junfeng He · PDF
Can Visual Encoder Learn to See Arrows?
Naoyuki Terashita, Yusuke Tozaki, Hideaki Omote, Kha Cong Nguyen, Ryosuke Nakamoto, Yuta Koreeda, Hiroaki Ozaki · PDF
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim, Hyungjin Chung, Byung-Hoon Kim · PDF
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning
Xindi Wu, Hee Seung Hwang, Polina Kirichenko, Olga Russakovsky · PDF
ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features
Alec Helbling, Tuna Han Salih Meral, Benjamin Hoover, Pinar Yanardag, Duen Horng Chau · PDF
Contrastive Mean-Shift Learning for Generalized Category Discovery
Sua Choi, Dahyun Kang, Minsu Cho · PDF
Coreset Selection via LLM-based Concept Bottlenecks
Akshay Mehra, Trisha Mittal, Subhadra Gopalakrishnan, Joshua Kimball · PDF
Dictionary-based Framework for Interpretable and Consistent Object Parsing
Tiezheng Zhang, Qihang Yu, Alan Yuille, Ju He · PDF
Disentangled Latent Spaces Facilitate Data-Driven Auxiliary Learning
Geri Skenderi, Luigi Capogrosso, Andrea Toaiari, Matteo Denitto, Franco Fummi, Simone Melzi · PDF
Emergence and Evolution of Interpretable Concepts in Diffusion Models Through the Lens of Sparse Autoencoders
Berk Tinaz, Zalan Fabian, Mahdi Soltanolkotabi · PDF
GaussianVAE: Adaptive Learning Dynamics of 3D Gaussians for High-Fidelity Super-Resolution
Shuja Khalid, Mohamed Ibrahim, Yang Liu · PDF
HyperVLM: Hyperbolic Space Guided Vision Language Modeling for Hierarchical Multi-Modal Understanding
Sarthak Srivastava, Kathy Wu · PDF
Learning Hierarchically using Formal Concepts
Deepika Vemuri, Sayanta Adhikari, Ankit Saha, Vineeth N. Balasubramanian · PDF
Learning reusable concepts across different video understanding tasks
Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro, Tatiana Tommasi, Giuseppe Averta · PDF
Memory-Modular Classification: Novel-Class Generalization with Web-Crawled Memory
Dahyun Kang, Ahmet Iscen, Eunchan Jo, Sua Choi, Minsu Cho, Cordelia Schmid · PDF
On Achieving Perfect Multimodal Alignment
Abhi Kamboj, Minh N. Do · PDF
PartComposer: Composing Part-Level Concepts from Single-Image Examples
Junyu Liu, R. Kenny Jones, Daniel Ritchie · PDF
Physical Rule-Guided Convolutional Neural Network
Kishor Datta Gupta, Marufa Kamal, Rakib Hossain Rifat, Mohd Ariful Haque, Roy George · PDF
ProtoDepth: Unsupervised Continual Depth Completion with Prototypes
Patrick Rim, Hyoungseob Park, Suchisrit Gangopadhyay, Ziyao Zeng, Younjoon Chung, Alex Wong · PDF
Pruning Visual Concepts for Efficient and Interpretable Transfer Learning
Zichao Li, Zong Ke · PDF
Quantifying Interpretability in CLIP Models with Concept Consistency
Avinash Madasu, Vasudev Lal, Phillip Howard · PDF
Seeing What Tastes Good: Revisiting Multimodal Distributional Semantics in the Billion Parameter Era
Dan Oneata, Desmond Elliott, Stella Frank · PDF
Sequentially Acquiring Concept Knowledge to Guide Continual Learning
Shivanand Kundargi, Kowshik Thopalli, Tejas Gokhale · PDF
SGBD: Sharpness-Aware Mirror Gradient with BLIP-Based Denoising for Robust Multimodal Product Recommendation
Sarthak Srivastava, Kathy Wu · PDF
SSCA: SigLIP-2 Sonar Concept Alignment
Trevor Brokowski, Alexandre Sallinen, Mary-Anne Hartley · PDF
Text Slider: Efficient and Precise Concept Control for Video Generation and Editing via LoRA Adapters
Pin-Yen Chiu, I-Sheng Fang, Jun-Cheng Chen · PDF
Towards Unified Benchmark and Models for Multi-Modal Perceptual Metrics
Sara Ghazanfari, Siddharth Garg, Nicolas Flammarion, Prashanth Krishnamurthy, Farshad Khorrami, Francesco Croce · PDF
Unsupervised Training of Vision Transformers with Synthetic Negatives
Nikolaos Giakoumoglou, Andreas Floros, Kleanthis Marios Papadopoulos, Tania Stathaki · PDF
Vision language models have difficulty recognizing virtual objects
Tyler Tran, Sangeet S. Khemlani, J. Gregory Trafton · PDF
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization
Xavier Thomas, Deepti Ghadiyaram · PDF
Where Do Erased Concepts Go in Diffusion Models?
Kevin Lu, Nicky Kriplani, Rohit Gandikota, Minh Pham, David Bau, Chinmay Hegde, Niv Cohen · PDF
