CVPR 2025 · Past · Large language models · Computer vision

CVPR 2025 Workshop Vision Language Models For All

VLMs4All 2025

Submission deadline
May 1, 2025, 12:00 UTC
Imported from OpenReview; check the website for extensions.
Submission portal
OpenReview
Notes
Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich; topics are keyword-guessed.

Accepted papers (18)

Fetched from OpenReview (v2) on 2026-06-10.

  1. Behind Maya: Building a Multilingual Vision Language Model

    Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Cecilia Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth.S, Snehanshu Mukherjee, Alham Fikri Aji · PDF
  2. Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models

    Srishti Yadav, Zhi Zhang, Daniel Hershcovich, Ekaterina Shutova · PDF
  3. Challenging Multimodal LLMs with African Standardized Exams: A Document VQA Evaluation

    Victor Tolulope Olufemi, Oreoluwa Boluwatife Babatunde, Emmanuel Bolarinwa, Kausar Yetunde Moshood · PDF
  4. Chitrakshara: A Large Multilingual Multimodal Dataset for Indian languages

    Shaharukh Khan, Ali Faraz, Abhinav Ravi, Mohd Nauman, Mohd Sarfraz, Akshat Patidar, Raja Kolla, Chandra Khatri, Shubham Agarwal · PDF
  5. CONCAP: Seeing Beyond English with Retrieval-Augmented Captioning

    George Ibrahim, Rita Ramos, Yova Kementchedjhieva · PDF
  6. Cultural Awareness in Vision-Language Models: A Cross-Country Exploration

    Avinash Madasu, Vasudev Lal, Phillip Howard · PDF
  7. Culturally-Aware Financial Fraud Detection Using Vision-Language Models

    Huangqi Jiang · PDF
  8. CultureShift: Mapping Temporal Cultural Evolution in Vision-Language Models

    Gautam Jajoo, Harsh Deshpande, Hamna, Pranjal A Chitale · PDF
  9. CultureVLM: Characterizing and Improving Cultural Understanding of Vision-Language Models for over 100 Countries

    Shudong Liu, Yiqiao Jin, Cheng Li, Derek F. Wong, Qingsong Wen, Lichao Sun, Haipeng Chen, Xing Xie, Jindong Wang · PDF
  10. Enhancing Cultural Awareness in Vision-Language Models: The Power of Multimodal Few-Shot Prompting

    Pitikorn Khlaisamniang · PDF
  11. Enhancing Vision-Language Models for Global Cultural Understanding through Semantic Expansion and Diversity Reranking

    Zirui Hou, Xiangzhe Yin, Guangyu Gao · PDF
  12. GeoDiv: Measuring Concept Diversity of Images Across Geographical Regions

    Abhipsa Basu, Mohana Singh, Venkatesh Babu Radhakrishnan · PDF
  13. JEEM: Vision-Language Understanding in Four Arabic Dialects

    Karima Kadaoui, Hanin Atwany, Hamdan Al-Ali, Abdelrahman Mohamed, Ali Mekky, Sergei Tilga, Natalia Fedorova, Ekaterina Artemova, Hanan Aldarmaki, Yova Kementchedjhieva · PDF
  14. Nayana: A Foundation for Document-Centric Vision-Language Models via Multi-Task, Multimodal, and Multilingual Data Synthesis

    Adithya S Kolavi, Samarth P, Vyoman Jain · PDF
  15. RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives

    Chirag Parikh, Deepti Rawat, Rakshitha R. T., Tathagata Ghosh, Ravi Kiran Sarvadevabhatla · PDF
  16. Synthetic Document Question Answering in Hungarian

    Jonathan Lingjie Li, Zoltan Csaki, Nidhi Hiremath, Etash Kumar Guha, Fenglu Hong, King Chun Ma, Urmish Thakker · PDF
  17. The use of multi-modal models and machine learning techniques to improve the efficiency and accuracy of geospatial data analysis

    Matthew C Gaskins · PDF
  18. Why do LLaVA Vision-Language Models Reply to Images in English?

    Musashi Hinck, Carolin Holtermann, Matthew Lyle Olson, Florian Schneider, Sungduk Yu, Anahita Bhiwandiwalla, Anne Lauscher, Shao-Yen Tseng, Vasudev Lal · PDF