ICLR 2024 · Past · AI for science · Datasets

ICLR 2024 Workshop on Data-centric Machine Learning Research (DMLR): Harnessing Momentum for Science

DMLR @ ICLR 2024

Submission deadline: Feb 9, 2024, 12:00 UTC (imported from OpenReview; check the website for extensions)
Submission portal: OpenReview
Notes: Auto-imported from the OpenReview venue record on 2026-06-10. Please verify and enrich (topics are keyword-guessed).

Accepted papers (85)

Fetched from OpenReview (v2) on 2026-06-10.

  1. AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent

    · PDF
  2. Analyzing Diffusion Models on Synthesizing Training Datasets

    · PDF
  3. Annotating Ambiguous Images: General Annotation Strategy for High-Quality Data with Real-World Biomedical Validation

    · PDF
  4. Annotation Sensitivity: Drivers of Training Data Quality

    · PDF
  5. Atomic Data Groups: An issue in train-test splits for the real world as demonstrated through digital hardware design

    · PDF
  6. Autoregressive activity prediction for low-data drug discovery

    · PDF
  7. Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

    · PDF
  8. Beyond Scale: The Diversity Coefficient as a Data Quality Metric for Variability in Natural Language Data

    · PDF
  9. Bidirectional Long-Range Parser for Sequential Data Understanding

    · PDF
  10. Birbal: An efficient 7B instruct-model fine-tuned with curated datasets

    · PDF
  11. Building Scalable Video Understanding Benchmarks through Sports

    · PDF
  12. Calibrated prediction of scarce adverse drug reaction labels with conditional neural processes

    · PDF
  13. CLE-SMOTE: Addressing Extreme Imbalanced Data Classification with Contrastive Learning-Enhanced SMOTE

    · PDF
  14. Coactive Learning for Large Language Models using Implicit User Feedback

    · PDF
  15. Combining Time Series Modalities to Create Endpoint-driven Patient Records

    · PDF
  16. Computational Copyright: Towards A Royalty Model for AI Music Generation Platforms

    · PDF
  17. Corrective Machine Unlearning

    · PDF
  18. CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

    · PDF
  19. Data Distribution Valuation

    · PDF
  20. Data-Efficient Multi-Modal Contrastive Learning: Prioritizing Data Quality over Quantity

    · PDF
  21. Denoising Drug Discovery ADMET Data for Improved Regression Task Performance

    · PDF
  22. Deploying Data Selection Techniques on Dynamic Datasets

    · PDF
  23. Distributional Dataset Distillation with Subtask Decomposition

    · PDF
  24. Empowering Large Language Models for Textual Data Augmentation

    · PDF
  25. Enabling High Data Throughput Reinforcement Learning on GPUs: A Domain Agnostic Framework for Data-Driven Scientific Research

    · PDF
  26. Enhanced Variational Autoencoder Estimation from Incomplete Data using Mixture Variational Families

    · PDF
  27. Environment-adjusted Topic Models

    · PDF
  28. Exploring the Efficacy of Meta-Learning: Unveiling Superior Data Diversity Utilization of MAML Over Pre-training

    · PDF
  29. Feedback-guided Data Synthesis for Imbalanced Classification

    · PDF
  30. Fractals as Pre-training Datasets for Anomaly Detection and Localization

    · PDF
  31. From Categories to Classifier: Name-Only Continual Learning by Exploring the Web

    · PDF
  32. FTFT: efficient and robust Fine-Tuning by transFerring Training Dynamics

    · PDF
  33. Genetic Learning for Designing Sim-to-Real Data Augmentations

    · PDF
  34. GitChameleon: Breaking the version barrier for code generation models

    · PDF
  35. Graph Kernel Convolutions for Interpretable Classification

    · PDF
  36. GRASP-GCN: Graph-Shape Prioritization for Neural Architecture Search under Distribution Shifts

    · PDF
  37. H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps

    · PDF
  38. Heterogeneous Normal Classes Pose a Challenge for Anomaly Detection

    · PDF
  39. Identifying Spurious Correlations Early in Training through the Lens of Simplicity Bias

    · PDF
  40. Improving Semantic Segmentation Models through Synthetic Data Generation via Diffusion Models

    · PDF
  41. Information Compensation: A Fix for Any-scale Dataset Distillation

    · PDF
  42. Interpretable Graph Neural Networks for Tabular Data

    · PDF
  43. Is a picture of a bird a bird? A mixed-methods approach to understanding diverse human perspectives and ambiguity in machine vision models

    · PDF
  44. Is margin all you need? An extensive empirical study of deep active learning on tabular data

    · PDF
  45. Language Models as Science Tutors

    · PDF
  46. Learning Galaxy Intrinsic Alignment Correlations

    · PDF
  47. Learning representations of learning representations

    · PDF
  48. Learning to Rank for One-Round Active Learning

    · PDF
  49. Lifelong Benchmarks: Efficient Model Evaluation in an Era of Rapid Progress

    · PDF
  50. LLM-Guided Counterfactual Data Generation for Fairer AI

    · PDF
  51. Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning

    · PDF
  52. Measuring Diversity in Datasets

    · PDF
  53. Model-Based Data-Centric AI: Bridging the Divide Between Academic Ideals and Industrial Pragmatism

    · PDF
  54. Multi-model evaluation with labeled and unlabeled data

    · PDF
  55. On the Scalability of GNNs for Molecular Graphs

    · PDF
  56. One Law, Many Languages: Benchmarking Multilingual Legal Reasoning for Judicial Support

    · PDF
  57. OODRobustBench: a benchmark and large-scale analysis of adversarial robustness under distribution shift

    · PDF
  58. Open Domain Generalization with a Single Network by Regularization Exploiting Pre-trained Features

    · PDF
  59. PointSAGE: Mesh-independent superresolution approach to fluid flow predictions

    · PDF
  60. PRE: Vision-Language Prompt Learning with Reparameterization Encoder

    · PDF
  61. Pretraining Probabilistic Models for Scalable Precision Agriculture

    · PDF
  62. Private Data Measurements for Decentralized Data Markets

    · PDF
  63. Pushing the Decision Boundaries: Discovering New Classes in Audio Data

    · PDF
  64. QualEval: Qualitative Evaluation for Model Improvement

    · PDF
  65. Quantifying the Importance of Data Alignment in Downstream Model Performance

    · PDF
  66. QuRating: Selecting High-Quality Data for Training Language Models

    · PDF
  67. Re-evaluating Retrosynthesis Algorithms with Syntheseus

    · PDF
  68. Retail-786k: a Large-Scale Dataset for Visual Entity Matching

    · PDF
  69. Step-DAD: Semi-Amortized Policy-Based Bayesian Experimental Design

    · PDF
  70. Style-Content Disentanglement Under Conditional Shift

    · PDF
  71. The Science of Data Filtering: Data Curation cannot be Compute Agnostic

    · PDF
  72. TOTEM: Tokenized Time Series Embeddings for General Time Series Analysis

    · PDF
  73. Towards Algorithmic Fairness by means of Instance-level Data Re-weighting based on Shapley Values

    · PDF
  74. Towards Efficient Active Learning in NLP via Pretrained Representations

    · PDF
  75. Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models

    · PDF
  76. Towards Quantifying the Effect of Datasets for Benchmarking: A Look at Tabular Machine Learning

    · PDF
  77. Towards Robust Data Pruning

    · PDF
  78. Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift

    · PDF
  79. Unveiling the Intertwined Relationship Between Essential Sparsity and Robustness in Large Pre-trained Models

    · PDF
  80. Urban Sound Propagation: a Benchmark for 1-Step Generative Modeling of Complex Physical Systems

    · PDF
  81. Verified Training for Counterfactual Explanation Robustness under Data Shift

    · PDF
  82. VTruST: Controllable value function based subset selection for Data-Centric Trustworthy AI

    · PDF
  83. When is Off-Policy Evaluation Useful? A Data-Centric Perspective

    · PDF
  84. WINDSET: Weather Insights and Novel Data for Systematic Evaluation and Testing

    · PDF
  85. You can't handle the (dirty) truth: Data-centric insights improve pseudo-labeling

    · PDF