Samuel King
Ph.D. Student in Bioengineering, admitted Autumn 2023
Education & Certifications
-
Master of Science, Stanford University, BIOE-MS (2025)
-
B.Sc., University of British Columbia, Hons. Biology (2022)
All Publications
-
Semantic design of functional de novo genes from a genomic language model.
Nature
2025
Abstract
Generative genomic models can design increasingly complex biological systems. However, controlling these models to generate novel sequences with desired functions remains challenging. Here, we show that Evo, a genomic language model, can leverage genomic context to perform function-guided design that accesses novel regions of sequence space. By learning semantic relationships across prokaryotic genes, Evo enables a genomic 'autocomplete' in which a DNA prompt encoding genomic context for a function of interest guides the generation of novel sequences enriched for related functions, which we refer to as 'semantic design'. We validate this approach by experimentally testing the activity of generated anti-CRISPR proteins and type II and III toxin-antitoxin systems, including de novo genes with no significant sequence similarity to natural proteins. In-context design of proteins and non-coding RNAs with Evo achieves robust activity and high experimental success rates even in the absence of structural priors, known evolutionary conservation or task-specific fine-tuning. We then use Evo to complete millions of prompts to produce SynGenome, a database containing over 120 billion base pairs of artificial intelligence-generated genomic sequences that enables semantic design across many functions. More broadly, these results demonstrate that generative genomics with biological language models can extend beyond natural sequences.
View details for DOI 10.1038/s41586-025-09749-7
View details for PubMedID 41261132
View details for PubMedCentralID 12057570
-
A multi-kingdom genetic barcoding system for precise clone isolation.
Nature biotechnology
2025
Abstract
Cell-tagging strategies with DNA barcodes have enabled the analysis of clone size dynamics and clone-restricted transcriptomic landscapes in heterogeneous populations. However, isolating a target clone that displays a specific phenotype from a complex population remains challenging. Here we present a multi-kingdom genetic barcoding system, CloneSelect, which enables a target cell clone to be triggered to express a reporter gene for isolation through barcode-specific CRISPR base editing. In CloneSelect, cells are first stably tagged with DNA barcodes and propagated so that their subpopulation can be subjected to a given experiment. A clone that shows a phenotype or genotype of interest at a given time can then be isolated from the initial or subsequent cell pools stored during the experiment using CRISPR base editing. CloneSelect is scalable and compatible with single-cell RNA sequencing. We demonstrate the versatility of CloneSelect in human embryonic kidney 293T cells, mouse embryonic stem cells, human pluripotent stem cells, yeast cells and bacterial cells.
View details for DOI 10.1038/s41587-025-02649-1
View details for PubMedID 40399693
View details for PubMedCentralID 4900892
-
Sequence modeling and design from molecular to genome scale with Evo.
Science (New York, N.Y.)
2024; 386 (6723): eado9336
Abstract
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism's function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples of protein-RNA and protein-DNA codesign with a language model. Evo also learns how small mutations affect whole-organism fitness and generates megabase-scale sequences with plausible genomic architecture. These prediction and generation capabilities span molecular to genomic scales of complexity, advancing our understanding and control of biology.
View details for DOI 10.1126/science.ado9336
View details for PubMedID 39541441
-
Spatiotemporal modeling of molecular holograms.
Cell
2024
Abstract
Quantifying spatiotemporal dynamics during embryogenesis is crucial for understanding congenital diseases. We developed Spateo (https://github.com/aristoteo/spateo-release), a 3D spatiotemporal modeling framework, and applied it to a 3D mouse embryogenesis atlas at E9.5 and E11.5, capturing eight million cells. Spateo enables scalable, partial, non-rigid alignment, multi-slice refinement, and mesh correction to create molecular holograms of whole embryos. It introduces digitization methods to uncover multi-level biology from subcellular to whole organ, identifying expression gradients along orthogonal axes of emergent 3D structures, e.g., secondary organizers such as midbrain-hindbrain boundary (MHB). Spateo further jointly models intercellular and intracellular interaction to dissect signaling landscapes in 3D structures, including the zona limitans intrathalamica (ZLI). Lastly, Spateo introduces "morphometric vector fields" of cell migration and integrates spatial differential geometry to unveil molecular programs underlying asymmetrical murine heart organogenesis and others, bridging macroscopic changes with molecular dynamics. Thus, Spateo enables the study of organ ecology at a molecular level in 3D space over time.
View details for DOI 10.1016/j.cell.2024.10.011
View details for PubMedID 39532097
-
Forecasting SARS-CoV-2 spike protein evolution from small data by deep learning and regression.
Frontiers in systems biology
2024; 4: 1284668
Abstract
The emergence of SARS-CoV-2 variants during the COVID-19 pandemic caused frequent global outbreaks that confounded public health efforts across many jurisdictions, highlighting the need for better understanding and prediction of viral evolution. Predictive models have been shown to support disease prevention efforts, such as with the seasonal influenza vaccine, but they require abundant data. For emerging viruses of concern, such models should ideally function with relatively sparse data typically encountered at the early stages of a viral outbreak. Conventional discrete approaches have proven difficult to develop due to the spurious and reversible nature of amino acid mutations and the overwhelming number of possible protein sequences adding computational complexity. We hypothesized that these challenges could be addressed by encoding discrete protein sequences into continuous numbers, effectively reducing the data size while enhancing the resolution of evolutionarily relevant differences. To this end, we developed a viral protein evolution prediction model (VPRE), which reduces amino acid sequences into continuous numbers by using an artificial neural network called a variational autoencoder (VAE) and models their most statistically likely evolutionary trajectories over time using Gaussian process (GP) regression. To demonstrate VPRE, we used a small amount of early SARS-CoV-2 spike protein sequences. We show that the VAE can be trained on a synthetic dataset based on this data. To recapitulate evolution along a phylogenetic path, we used only 104 spike protein sequences and trained the GP regression with the numerical variables to project evolution up to 5 months into the future. Our predictions contained novel variants and the most frequent prediction mapped primarily to a sequence that differed by only a single amino acid from the most reported spike protein within the prediction timeframe. 
Novel variants in the spike receptor binding domain (RBD) were capable of binding human angiotensin-converting enzyme 2 (ACE2) in silico, with comparable or better binding than previously resolved RBD-ACE2 complexes. Together, these results indicate the utility and tractability of combining deep learning and regression to model viral protein evolution with relatively sparse datasets, toward developing more effective medical interventions.
View details for DOI 10.3389/fsysb.2024.1284668
View details for PubMedID 40809129
View details for PubMedCentralID PMC12341966
-
DNA-GPS: A theoretical framework for optics-free spatial genomics and synthesis of current methods.
Cell systems
2023
Abstract
While single-cell sequencing technologies provide unprecedented insights into genomic profiles at the cellular level, they lose the spatial context of cells. Over the past decade, diverse spatial transcriptomics and multi-omics technologies have been developed to analyze molecular profiles of tissues. In this article, we categorize current spatial genomics technologies into three classes: optical imaging, positional indexing, and mathematical cartography. We discuss trade-offs in resolution and scale, identify limitations, and highlight synergies between existing single-cell and spatial genomics methods. Further, we propose DNA-GPS (global positioning system), a theoretical framework for large-scale optics-free spatial genomics that combines ideas from mathematical cartography and positional indexing. DNA-GPS has the potential to achieve scalable spatial genomics for multiple measurement modalities, and by eliminating the need for optical measurement, it has the potential to position cells in three dimensions (3D).
View details for DOI 10.1016/j.cels.2023.08.005
View details for PubMedID 37751737
-
Young innovators and the bioeconomy
Genomics and the Global Bioeconomy
Academic Press. 2023; 1st: 83-100
View details for DOI 10.1016/b978-0-323-91601-1.00005-5
-
Subcellular coordination of plant cell wall synthesis
Developmental Cell
2021; 56 (7): 933-948
Abstract
Organelles of the plant cell cooperate to synthesize and secrete a strong yet flexible polysaccharide-based extracellular matrix: the cell wall. Cell wall composition varies among plant species, across cell types within a plant, within different regions of a single cell wall, and in response to intrinsic or extrinsic signals. This diversity in cell wall makeup is underpinned by common cellular mechanisms for cell wall production. Cellulose synthase complexes function at the plasma membrane and deposit their product into the cell wall. Matrix polysaccharides are synthesized by a multitude of glycosyltransferases in hundreds of mobile Golgi stacks, and an extensive set of vesicle trafficking proteins govern secretion to the cell wall. In this review, we discuss the different subcellular locations at which cell wall synthesis occurs, review the molecular mechanisms that control cell wall biosynthesis, and examine how these are regulated in response to different perturbations to maintain cell wall homeostasis.
View details for DOI 10.1016/j.devcel.2021.03.004
View details for Web of Science ID 000641581300008
View details for PubMedID 33761322
https://orcid.org/0000-0002-1260-5045