ICML 2024 · Past · Large language models · Efficiency

ICML 2024 Workshop on Efficient and Accessible Foundation Models for Biological Discovery

AccMLBio


Submission deadline: May 30, 2024, 12:01 UTC
imported from OpenReview — check the website for extensions
Submission portal: OpenReview
Notes: Topics were auto-suggested and may be imprecise.

Accepted papers (37)

Fetched from OpenReview (v2) on 2026-06-10.

2Bits of Protein: Efficient Protein Language Models at the Scale of 2-bits
Oliver M. Turnbull, Mohamed Baioumy, Charlotte Deane · PDF
A generative foundation model for antibody sequence understanding
Justin Barton, Aretas Gaspariunas, David A Yadin, Jorge Dias, Francesca L Nice, Danielle H Minns, Olivia Snudden, Chelsea Povall, Sara Valle Tomas, Harry Dobson, James H R Farmery, Jinwoo Leem, Jacob D Galson · PDF
ABodyBuilder3: Improved and scalable antibody structure predictions
Henry Kenlay, Frederic A Dreyer, Daniel Cutting, Daniel Allen Nissley, Charlotte Deane · PDF
Are Protein Language Models Compute Optimal?
Yaiza Serrano, Alvaro Ciudad Serrano, Alexis Molina · PDF
BioinformaticsBench: A collaboratively built large language model benchmark for Bioinformatics reasoning
Varuni Sarwal, Seungmo Lee, Rosemary He, Aingela Kattapuram, xiaoxuan wang, Eleazar Eskin, Wei Wang, Serghei Mangul · PDF
Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
Yair Schiff, Chia Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, Volodymyr Kuleshov · PDF
Compressing the Latent Space of Single-Sequence Protein Predictors for Multimodal Generation
Amy X. Lu, Wilson Yan, Vladimir Gligorijevic, Pieter Abbeel, Kevin K Yang, Nathan C. Frey · PDF
Cramming Protein Language Model Training in 24 GPU Hours
Nathan C. Frey, Taylor Joren, Aya Abdelsalam Ismail, Allen Goodman, Richard Bonneau, Kyunghyun Cho, Vladimir Gligorijevic · PDF
Enhancing Single-Cell VAE Latent Space via Semi-Supervision
Meichen Gong, Konstantin Ivanov, Merja Heinäniemi, Ville Hautamaki · PDF
Fine-tuning the ESM2 protein language model to understand the functional impact of missense variants
Ali Saadat, Jacques Fellay · PDF
FusOn-pLM: A Fusion Oncoprotein-Specific Language Model via Focused Probabilistic Masking
Sophia Vincoff, Shrey Goel, Kseniia Kholina, Pranam Chatterjee · PDF
Generative Model for Small Molecules with Latent Space RL Fine-Tuning to Protein Targets
Ulrich Armel Mbou Sob, Qiulin Li, Miguel Arbesú, Oliver Bent, Andries Petrus Smit, Arnu Pretorius · PDF
Geometric Algebra based encoding for graph prompting
Sotirios Panagiotis Chytas, Rudrasis Chakraborty, Vikas Singh · PDF
Graph2Token: Make LLMs Understand Molecule Graphs
Runze Wang, Mingqi Yang, Yanming Shen · PDF
High-Resolution In Silico Painting with Generative Models
Trang Le · PDF
Identifying Biological Priors and Structure in Single-Cell Foundation Models
Flavia Pedrocchi, Stefan Stark, Gunnar Ratsch, Amir Joudaki · PDF
Injecting Hierarchical Biological Priors into Graph Neural Networks for Flow Cytometry Prediction
Fatemeh Nassajian Mojarrad, Lorenzo Bini, Thomas Matthes, Stephane Marchand-Maillet · PDF
Interactome-scale comparison of co-immunoprecipitation and yeast two-hybrid assays for protein interaction prediction
Kapil Devkota, Lenore Cowen, Rohit Singh · PDF
Learning Generative Population Models From Multiple Clinical Datasets Via Probabilistic Programming
João Loula, Katherine M. Collins, Ulrich Schaechtle, Joshua B. Tenenbaum, Adrian Weller, Feras Saad, Timothy J. O'Donnell, Vikash Mansinghka · PDF
Likelihood-based fine-tuning of protein language models for few-shot fitness prediction and design
Alex Hawkins-Hooker, Jakub Kmec, Oliver Bent, Paul Duckworth · PDF
MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning
Kerstin Klaser, Blazej Banaszewski, Samuel Maddrell-Mander, Callum McLean, Luis Müller, Ali Parviz, Shenyang Huang, Andrew W Fitzgibbon · PDF
MolEval: An Evaluation Toolkit for Molecular Embeddings via LLMs
Shaghayegh Sadeghi, Ali Forooghi, Jianguo Lu, Alioune Ngom · PDF
MSA Pairing Transformer: protein interaction partner prediction with few-shot contrastive learning
Alex Hawkins-Hooker, Daniel Burkhardt Cerigo, Umberto Lupo, David Jones, Brooks Paige · PDF
Multi-Task Training Increases Native Sequence Recovery of Antigen-Specific T-cell Receptor Sequences
Dhuvarakesh Karthikeyan, Alex Rubinsteyn · PDF
One-Versus-Others Attention: Scalable Multimodal Integration for Biomedical Data
Michal Golovanevsky, Eva Schiller, Akira A Nair, Ritambhara Singh, Carsten Eickhoff · PDF
PLUTO: Pathology-Universal Transformer
Dinkar Juyal, Harshith Padigela, Chintan Shah, Daniel Shenker, Natalia Harguindeguy, Yi Liu, Blake Martin, Yibo Zhang, Michael Nercessian, Miles Markey, Isaac Finberg, Kelsey Luu, Daniel Borders, Syed Ashar Javed, Emma L Krause, Raymond Biju, Aashish Sood, Allen Ma, Jackson Nyman, John Shamshoian, Guillaume Chhor, Darpan Sanghavi, Marc Thibault, Limin Yu, Fedaa Najdawi, Jennifer A. Hipp, Darren Fahy, Benjamin Glass, Eric Walk, John Abel, Harsha Vardhan pokkalla, Andrew H. Beck, Sean Grullon · PDF
Pre-training of Single-cell Language Models through Genetic Pathway Learning
Xuxi Chen, Zhangyang Wang, Marinka Zitnik, Manolis Kellis, Tianlong Chen · PDF
Prot2Token: A multi-task framework for protein language processing using autoregressive language modeling
Mahdi Pourmirzaei, Farzaneh Esmaili, Mohammadreza Pourmirzaei, Duolin Wang, Dong Xu · PDF
ProtMamba: a homology-aware but alignment-free protein state space model
Damiano Sgarbossa, Cyril Malbranke, Anne-Florence Bitbol · PDF
Rethinking Molecular Design: Integrating Latent Variable and Auto-Regressive Models for Enhanced Goal Directed Generation
Arthur-Louis Heath, Amina Mollaysa, Michael Krauthammer · PDF
RFamLlama: an efficient conditional language model for RNA sequence generation across diverse structural families
Jinyuan Sun, Han Li, Yifan Deng · PDF
scTree: Discovering Cellular Hierarchies in the Presence of Batch Effects in scRNA-seq Data
Moritz Vandenhirtz, Florian Barkmann, Laura Manduchi, Julia E Vogt, Valentina Boeva · PDF
Simple and Effective Masked Diffusion Language Models
Subham Sekhar Sahoo, Marianne Arriola, Aaron Gokaslan, Edgar Mariano Marroquin, Alexander M Rush, Yair Schiff, Justin T Chiu, Volodymyr Kuleshov · PDF
SWUS: Active Learning with Structure Weighted Uncertainty Score
Andrea Karlova, Brooks Paige · PDF
Towards Generalizable Particle Picking in Cryo-EM Images by Leveraging Masked AutoEncoder
Andreas Zamanos, Panagiotis Koromilas, Giorgos Bouritsas, Panagiotis L. Kastritis, Yannis Panagakis · PDF
Training Compute-Optimal Protein Language Models
Xingyi Cheng, Bo Chen, Pan Li, Jing Gong, Jie Tang, Le Song · PDF
xMINT: A Multimodal Integration Transformer for Xenium Gene Imputation
Xiaohui Jiang, Yuxia Xie, Jichun Xie · PDF
