ICLR 2024 · Past · Topics: AI for science, Datasets
ICLR 2024 Workshop on Data-centric Machine Learning Research (DMLR): Harnessing Momentum for Science
DMLR @ ICLR 2024
- Submission deadline: Feb 9, 2024, 12:00 UTC (imported from OpenReview; check the website for extensions)
- Submission portal: OpenReview
- Notes: Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).
Accepted papers (85)
Fetched from OpenReview (v2) on 2026-06-10.
- AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent
- Analyzing Diffusion Models on Synthesizing Training Datasets
- Annotating Ambiguous Images: General Annotation Strategy for High-Quality Data with Real-World Biomedical Validation
- Annotation Sensitivity: Drivers of Training Data Quality
- Atomic Data Groups: An issue in train-test splits for the real world as demonstrated through digital hardware design
- Autoregressive activity prediction for low-data drug discovery
- Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
- Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data
- Bidirectional Long-Range Parser for Sequential Data Understanding
- Birbal: An efficient 7B instruct-model fine-tuned with curated datasets
- Building Scalable Video Understanding Benchmarks through Sports
- Calibrated prediction of scarce adverse drug reaction labels with conditional neural processes
- CLE-SMOTE: Addressing Extreme Imbalanced Data Classification with Contrastive Learning-Enhanced SMOTE
- Coactive Learning for Large Language Models using Implicit User Feedback
- Combining Time Series Modalities to Create Endpoint-driven Patient Records
- Computational Copyright: Towards A Royalty Model for AI Music Generation Platforms
- Corrective Machine Unlearning
- CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
- Data Distribution Valuation
- Data-Efficient Multi-Modal Contrastive Learning: Prioritizing Data Quality over Quantity
- Denoising Drug Discovery ADMET Data for Improved Regression Task Performance
- Deploying Data Selection Techniques on Dynamic Datasets
- Distributional Dataset Distillation with Subtask Decomposition
- Empowering Large Language Models for Textual Data Augmentation
- Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research
- Enhanced Variational Autoencoder Estimation from Incomplete Data using Mixture Variational Families
- Environment-adjusted Topic Models
- Exploring the Efficacy of Meta-Learning: Unveiling Superior Data Diversity Utilization of MAML Over Pre-training
- Feedback-guided Data Synthesis for Imbalanced Classification
- Fractals as Pre-training Datasets for Anomaly Detection and Localization
- From Categories to Classifier: Name-Only Continual Learning by Exploring the Web
- FTFT: efficient and robust Fine-Tuning by transFerring Training Dynamics
- Genetic Learning for Designing Sim-to-Real Data Augmentations
- GitChameleon: Breaking the version barrier for code generation models
- Graph Kernel Convolutions for Interpretable Classification
- GRASP-GCN: Graph-Shape Prioritization for Neural Architecture Search under Distribution Shifts
- H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
- Heterogeneous Normal Classes Pose a Challenge for Anomaly Detection
- Identifying Spurious Correlations Early in Training through the Lens of Simplicity Bias
- Improving Semantic Segmentation Models through Synthetic Data Generation via Diffusion Models
- Information Compensation: A Fix for Any-scale Dataset Distillation
- Interpretable Graph Neural Networks for Tabular Data
- Is a picture of a bird a bird? A mixed-methods approach to understanding diverse human perspectives and ambiguity in machine vision models
- Is margin all you need? An extensive empirical study of deep active learning on tabular data
- Language Models as Science Tutors
- Learning Galaxy Intrinsic Alignment Correlations
- Learning representations of learning representations
- Learning to Rank for One-Round Active Learning
- Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress
- LLM-Guided Counterfactual Data Generation for Fairer AI
- Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
- Measuring Diversity in Datasets
- Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals and Industrial Pragmatism
- Multi-model evaluation with labeled and unlabeled data
- On the Scalability of GNNs for Molecular Graphs
- One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support
- OODRobustBench: a benchmark and large-scale analysis of adversarial robustness under distribution shift
- Open Domain Generalization with a Single Network by Regularization Exploiting Pre-trained Features
- PointSAGE : Mesh-independent superresolution approach to fluid flow predictions
- PRE: Vision-Language Prompt Learning with Reparameterization Encoder
- Pretraining Probabilistic Models for Scalable Precision Agriculture
- Private Data Measurements for Decentralized Data Markets
- Pushing the Decision Boundaries: Discovering New Classes in Audio Data
- QualEval: Qualitative Evaluation for Model Improvement
- Quantifying the Importance of Data Alignment in Downstream Model Performance
- QuRating: Selecting High-Quality Data for Training Language Models
- Re-evaluating Retrosynthesis Algorithms with Syntheseus
- Retail-786k: a Large-Scale Dataset for Visual Entity Matching
- Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design
- Style-Content Disentanglement Under Conditional Shift
- The Science of Data Filtering: Data Curation cannot be Compute Agnostic
- TOTEM: Tokenized Time Series Embeddings for General Time Series Analysis
- Towards Algorithmic Fairness by means of Instance-level Data Re-weighting based on Shapley Values
- Towards Efficient Active Learning in NLP via Pretrained Representations
- Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
- Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning
- Towards Robust Data Pruning
- Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift
- Unveiling the Intertwined Relationship Between Essential Sparsity and Robustness in Large Pre-trained Models
- Urban Sound Propagation: a Benchmark for 1-Step Generative Modeling of Complex Physical Systems
- Verified Training for Counterfactual Explanation Robustness under Data Shift
- VTruST: Controllable value function based subset selection for Data-Centric Trustworthy AI
- When is Off-Policy Evaluation Useful? A Data-Centric Perspective
- WINDSET: Weather Insights and Novel Data for Systematic Evaluation and Testing
- You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling