ICLR 2024, Past, Large language models, Datasets
ICLR 2024 Workshop on Navigating and Addressing Data Problems for Foundation Models
DPFM 2024
- Submission deadline: Feb 13, 2024, 01:30 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10; please verify and enrich (topics are keyword-guessed).
Accepted papers (49)
Fetched from OpenReview (v2) on 2026-06-10.
- A Tale of Tails: Model Collapse as a Change of Scaling Laws
- AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent
- Augmenting Math Word Problems via Iterative Question Composing
- Autonomous Data Selection with Language Models for Mathematical Texts
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
- CollabEdit: Towards Non-destructive Collaborative Knowledge Editing
- Computational Copyright: Towards A Royalty Model for AI Music Generation Platforms
- Cookbook: A framework for improving LLM generative abilities via programmatic data generating templates
- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
- Data Alignment for Zero-Shot Concept Generation in Dermatology AI
- DELE: Data Efficient LLM Evaluation
- Distributional Dataset Distillation with Subtask Decomposition
- Does Data Contamination Make a Difference? Insights from Intentionally Contaminating Pre-training Data For Language Models
- Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
- Efficient Global Data Attribution for Diffusion Models
- Enhancing Data Quality in Federated Fine-Tuning of Large Language Models
- Evaluating Large Language Models in an Emerging Domain: A Pilot Study in Decentralized Finance
- Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
- Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
- Hallucination Augmented Recitations for Language Models
- How to Craft Backdoors with Unlabeled Data Alone?
- Improving Practical Counterfactual Fairness with Limited Causal Knowledge
- Incentivizing Inclusive Data Contributions in Personalized Federated Learning
- Intent-based Prompt Calibration: Enhancing prompt optimization with synthetic boundary cases
- Label-free Neural Semantic Image Synthesis
- LESS: Selecting Influential Data for Targeted Instruction Tuning
- LongForm: Effective Instruction Tuning with Reverse Instructions
- Model & Data Insights using Pre-trained Language Models
- Multimodal Dataset Upgrading: a New Challenge for Data Annotation
- On the Scalability of GNNs for Molecular Graphs
- OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning
- Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
- Perplexed by Perplexity: Perplexity-Based Pruning with Small Reference Models
- Pre-training Concept Frequency is predictive of CLIP Zero-shot Performance
- Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
- Prompt Optimization with Logged Bandit Data
- QuRating: Selecting High-Quality Data for Training Language Models
- Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
- Scaling Laws for Downstream Task Performance of Large Language Models
- Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models
- Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
- The Science of Data Filtering: Data Curation cannot be Compute Agnostic
- TOFU: A Task of Fictitious Unlearning for LLMs
- Toward Data-driven Skill Identification for General-purpose Vision-language Models
- Towards Unbiased Evaluation of Detecting Unanswerable Questions in EHRSQL
- VideoCon: Robust Video-Language Alignment via Contrast Captions
- Virtual Classifier: A Reversed Approach for Robust Image Evaluation
- West-of-N: Synthetic Preference Generation for Improved Reward Modeling
- What's in Your "Safe" Data?: Identifying Benign Data that Breaks Safety