Alan N. Amin

Faculty Fellow / Assistant Professor at NYU Courant Institute.
Host: Wilson Lab

I work on machine learning models built on modern, large databases of biological sequences. My projects either build models that can leverage these datasets in new ways, or seek to understand why these models work.

Previously: PhD at Harvard Systems Biology with Debora Marks (2023), Postdoc at Jura Bio (2023), BS in Biochemistry & Mathematics from University of Toronto (2019).

Selected Works

* denotes equal contribution

Scalable, flexible models of large sequence data

DeepWAS visualization
Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra
Amin A N*, Potapczynski A*, Wilson A G
ICML 2025 🏆 Oral & 2nd Best Paper at AI4NA Workshop 🎤 Oral at MLCB
"Can we leverage deep neural networks to improve GWAS?"
CloneBO visualization
Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences
Amin A N, Gruver N*, Kuang Y*, Li Y*, Elliott H, McCarter C, Raghu A, Greenside P, Wilson A G
ICLR 2024 🌟 Spotlight 🏆 Outstanding Poster at AIDrugX
"Can we evolve antibodies in the lab by learning how antibodies evolve in our bodies?"
Petascale synthesis
Manufacturing-Aware Generative Model Architectures Enable Biological Sequence Design and Synthesis at Petascale
Weinstein E N*, Gollub M G*, Slabodkin A*, Gardner C L, Dobbs K, Cui X-B, Amin A N, Church G M, Wood E B
arXiv 2024 🏆 Best paper at MoML 2024
"Can we print large scale libraries that match generative models?"
BEAR model
A generative nonparametric Bayesian model for whole genomes
Amin A N*, Weinstein E N*, Marks D S
NeurIPS 2021
"Can we build a generative model of whole genomes?"

Theoretical foundations of modelling discrete sequence data

SCUD diffusion
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Amin A N, Gruver N, Wilson A G
NeurIPS 2025
"Why are some discrete diffusion models easier to train than others?"
Kernels visualization
Kernels with Guaranteed Flexibility for Reliable Machine Learning on Biological Sequences
Amin A N, Weinstein E N*, Marks D S*
JMLR 2025 🏆 Student Research Award NESS 2023
"What data can different sequence kernels fit?"
Non-identifiability
Non-identifiability and the blessings of misspecification in models of molecular fitness and phylogeny
Weinstein E N*, Amin A N*, Frazer J, Marks D S
NeurIPS 2022 🎤 Oral Presentation
"Why do ML models of evolution work despite obvious violations of their assumptions?"

Flexible hypothesis testing on huge datasets and models

DAT graph
Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency
Amin A N, Wilson A G
ICML 2024
"Can we learn causal structures of huge systems?"
Kernel evaluation
Kernel-Based Evaluation of Conditional Biological Sequence Models
Glaser P, Paul S, Hummer A M, Deane C M, Marks D S, Amin A N
ICML 2024
"Can we flexibly evaluate the fit of large structure-to-sequence models?"
KSD visualization
A Kernelized Stein Discrepancy for Biological Sequences
Amin A N, Weinstein E N*, Marks D S*
ICML 2023
"Can we flexibly evaluate the fit of large sequence models?"

Teaching

CSCI-102: Data Structures
Looking for undergraduate graders at NYU - please reach out!
Reviews