Culture x AI: Evaluating AI as a Cultural Technology (ICML 2026)
- Submission deadline: TBA
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (63)
Fetched from OpenReview (v2) on 2026-06-10.
- “AI is (not) the new..”: A Diagnostic Analogy Framework for Generative AI’s Cultural Impacts
- A Charter for Cultural AI Evaluation: Methodological Principles for Long-Tail, Cross-Cultural Tasks
- A Vision for Cultural Alignment: Opportunities and Safety Imperatives for AI in Mental Health Support
- Agonistic AI: Advancing Interpretive Pluralism in the Cultural AI Value Space
- AI as Cultural Mediation: Agentic Sanskrit–English Translation with Linguistic Grounding
- AI-Assisted Video Montage as Coordination: Design Guidelines for Platforms of Interactive Agent-based Multimodal Synthesis
- Beyond Bias: Evaluating Cultural AI Through Participation and Interpretation
- Beyond Hallucination: Evaluating Cultural and Institutional Misinterpretation in Public-Facing LLMs
- Caesar Speaks Again: Bringing Historical Characters to Life using AI-Driven Avatars for Immersive Cultural Heritage in AR
- Care Is Not a Style Transfer Task: Evaluating Culturally Grounded Clinical AI
- Causal Mechanisms of the Gender Pay Gap
- Code-Switching Reveals Anchor Bias in Multilingual Large Language Models
- Consensus Is Not Enough: Disagreement-Preserving Evaluation for Cultural AI
- Cultural Fermentation: on Craft, Ecology, Listening, and Safety
- Cultural Fidelity in English-to-Hindi Translation: A Preservation–Fluency Frontier for Gender Recoverability
- Culturally-Adapted Red-Teaming Across East and Southeast Asian Contexts: A Methodological and Comparative Analysis
- CuPS: Measuring Cultural Preference Signatures in LLM/VLM Agents and Their Steering by Profile Memories
- Detecting and Mitigating Bias by Treating Fairness as a Symmetry Operation
- Does Persona Make LLM a K-pop Fan? A Pilot Study of LLM-Based Online Concert Audience Agents
- Environmental Slow AI: Design Principles for Generative Systems
- Evolution of Cooperation in LLM Societies : A Multi-Lingual Examination
- Fine-Tuning as Repair? Care Ethics and Situated Knowledges in LLM Alignment Cultures
- From Error Detection to Cultural Legibility: Human-AI Cooperation for Trauma-Informed Heritage Education in Conflict Zones
- From Style to Cultural Calibration: Evaluating Institutional Voice in LLM-Generated News
- GlotOCR Bench: OCR Models Still Struggle Beyond a Handful of Unicode Scripts
- IndicDB - Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages
- Injecting Knowledge from Social Science Journals to Improve Indonesian Cultural Understanding by LLMs
- Interpretive Anchoring for Culturally Situated LLM Evaluation
- KG-FairDiff: Knowledge Graph-Guided Prompt Refinement for Demographically Fair Text-to-Image Generation
- Korean Culture into LLM Alignment: From Refusal to Cultural Coherence
- LLMs Exhibit Significantly Lower Uncertainty in Creative Writing Than Professional Writers
- Manga109-v2026: Revisiting Manga109 Annotations for Modern Manga Understanding
- Mise en Place for Taste: Recipes, Connoisseurship, and Cultural Competence in Generative AI
- NarrativeWorldBench: A Frontier-Saturated Benchmark and a Latent World Model for Long-Horizon Co-Creative Audio Drama
- Operative Contexts: Belief Revision and Memory in Agentic AI
- PAUSE: Editable Strategy Artifacts for Long-Form Cultural Story Adaptation
- Plural Voices: A Cultural Contestability Framework for Evaluating AI-Mediated Service Work
- Reading Models’ Self-Defense: Narratology as Legibility Instrument for Cultural AI Evaluation
- Repertoires, Not Scores: Instability as Signal in Cultural Evaluation of LLMs
- Robustness of Cultural Norm Reasoning Under Language and Context Perturbations
- SAFE: Segment-Aware Filtering and Evaluation for Lyric Content Moderation
- SEA-MU: Cultural Meme Understanding Benchmark for Southeast Asia
- Spoiler Alert: Narrative Forecasting as a Metric for Tension in LLM Storytelling
- Stress-Testing Emotional Support Models: Moving from Homogeneous to Diverse Help Seekers
- StylisticBias: A Few Human Visual Cues Drive Most Social Bias in MLLMs
- The Homogenization Problem in LLMs: Towards Meaningful Diversity in AI Safety
- The Language of Bargaining: Linguistic Effects in LLM Negotiations
- The Modular Encyclopedia: LLMs and the Assemblage of Cultural Knowledge
- The Time of the Latent: Evaluating Cultural AI Through Human–AI Creative Trajectories
- Three Years of r/ChatGPT: Societal Impact Evaluations from Social Media Data
- Tokenization as Cultural Erasure: How Corpus Composition Shapes the Representation of Aymara Morphology in NLP Systems
- Toward a 21st Century Turing Test: Games, Authority, and Interpretive Intelligence in AI
- Towards A New Toolkit for Measuring AI-Enabled Influence Operations
- What Could Cézanne Have Painted? Geometric Search for Stylistic Gaps in Embedding Spaces
- What Do Historical Language Models Model?
- What does a surplus of interpretations consume?
- What Gets Lost When Memory Becomes Media? Evaluating AI-Generated Oral History Visualization
- What If Chinese Were Latinized? A Counterfactual Study of Script, Tokenization, and Language Modeling
- What Makes AI a Good Cultural Mediator? Evidence from Literary Paratexts
- When East Asia Loses Its Names: Interpreting Neighborhood Effect and Cultural Generalization in Vision-Language Models
- When Perspective Becomes Control: Verifying Role-Conditioned Image Generation
- Where Models Concentrate and Humans Spread: Toward Cultural Reach in Generative AI
- Whose Interpretation Counts? Reading Generative AI as an Interpretive Technology Across UK and Indian Households