Alan N. Amin

Selected Works

* denotes equal contribution

Scalable, flexible models of large sequence data

A Diffusion Model to Shrink Proteins

Baron E*, Amin A N*, Weitzman R, Marks D S, Wilson A G

ArXiv 2025 🏆 Best Paper at MoML 2025 🏆 Best Paper at ExAIT Workshop 🏆 Pitch award at GenBio workshop

Paper Code

"Can we leverage evolutionary information to shrink proteins?"

Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra

Amin A N*, Potapczynski A*, Wilson A G

ICML 2025 🏆 Oral & 2nd Best Paper at AI4NA Workshop 🎤 Oral at MLCB

Paper Code

"Can we leverage deep neural networks to improve GWAS?"

Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences

Amin A N, Gruver N*, Kuang Y*, Li Y*, Elliott H, McCarter C, Raghu A, Greenside P, Wilson A G

ICLR 2024 🌟 Spotlight 🏆 Outstanding Poster at AIDrugX

Paper Code

"Can we evolve antibodies in the lab by learning how antibodies evolve in our bodies?"

Manufacturing-Aware Generative Model Architectures Enable Biological Sequence Design and Synthesis at Petascale

Weinstein E N*, Gollub M G*, Slabodkin A*, Gardner C L, Dobbs K, Cui X-B, Amin A N, Church G M, Wood E B

Nature Biotechnology 2025 🏆 Top 4 paper at MoML 2024

Paper Code

"Can we print large scale libraries that match generative models?"

A generative nonparametric Bayesian model for whole genomes

Amin A N*, Weinstein E N*, Marks D S

NeurIPS 2021

Paper Code

"Can we build a generative model of whole genomes?"

Theoretical foundations of modelling discrete sequence data

A Unification of Discrete, Gaussian, and Simplicial Diffusion

Chandra A*, Li Y L*, Amin A N*, Ali A, Rollins J, Ober S W, Raghu A, Wilson A G

ArXiv 2025

Paper Code

"Can we unify the different diffusion models for discrete data?"

Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion

Amin A N, Gruver N, Wilson A G

NeurIPS 2025

Paper Code

"Why are some discrete diffusion models easier to train than others?"

Kernels with Guaranteed Flexibility for Reliable Machine Learning on Biological Sequences

Amin A N, Weinstein E N*, Marks D S*

JMLR 2025 🏆 Student Research Award NESS 2023

Paper Code

"What data can different sequence kernels fit?"

Non-identifiability and the blessings of misspecification in models of molecular fitness and phylogeny

Weinstein E N*, Amin A N*, Frazer J, Marks D S

NeurIPS 2022 🎤 Oral Presentation

Paper

"Why do ML models of evolution work despite obvious violations of their assumptions?"

Flexible hypothesis testing on huge datasets and models

Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency

Amin A N, Wilson A G

ICML 2024

Paper Code

"Can we learn causal structures of huge systems?"

Kernel-Based Evaluation of Conditional Biological Sequence Models

Glaser P, Paul S, Hummer A M, Deane C M, Marks D S, Amin A N

ICML 2024

Paper

"Can we flexibly evaluate the fit of large structure-to-sequence models?"

A Kernelized Stein Discrepancy for Biological Sequences

Amin A N, Weinstein E N*, Marks D S*

ICML 2023

Paper Code

"Can we flexibly evaluate the fit of large sequence models?"
