Work Experience


  • Junior Machine Learning Scientist, ProteinQure (May 3, 2021 - August 31, 2022)

    Location

    Toronto, Ontario, Canada

All Publications


  • Sparks of function by de novo protein design. Nature biotechnology Chu, A. E., Lu, T., Huang, P. S. 2024; 42 (2): 203-215

    Abstract

    Information in proteins flows from sequence to structure to function, with each step causally driven by the preceding one. Protein design is founded on inverting this process: specify a desired function, design a structure executing this function, and find a sequence that folds into this structure. This 'central dogma' underlies nearly all de novo protein-design efforts. Our ability to accomplish these tasks depends on our understanding of protein folding and function and our ability to capture this understanding in computational methods. In recent years, deep learning-derived approaches for efficient and accurate structure modeling and enrichment of successful designs have enabled progression beyond the design of protein structures and towards the design of functional proteins. We examine these advances in the broader context of classical de novo protein design and consider implications for future challenges to come, including fundamental capabilities such as sequence and structure co-design and conformational control considering flexibility, and functional objectives such as antibody and enzyme design.

    View details for DOI 10.1038/s41587-024-02133-2

    View details for PubMedID 38361073

    View details for PubMedCentralID 6423711

  • Geometric Deep Learning for Structure-Based Ligand Design. ACS central science Powers, A. S., Yu, H. H., Suriana, P., Koodli, R. V., Lu, T., Paggi, J. M., Dror, R. O. 2023; 9 (12): 2257-2267

    Abstract

    A pervasive challenge in drug design is determining how to expand a ligand (a small molecule that binds to a target biomolecule) in order to improve various properties of the ligand. Adding single chemical groups, known as fragments, is important for lead optimization tasks, and adding multiple fragments is critical for fragment-based drug design. We have developed a comprehensive framework that uses machine learning and three-dimensional protein-ligand structures to address this challenge. Our method, FRAME, iteratively determines where on a ligand to add fragments, selects fragments to add, and predicts the geometry of the added fragments. On a comprehensive benchmark, FRAME consistently improves predicted affinity and selectivity relative to the initial ligand, while generating molecules with more drug-like chemical properties than docking-based methods currently in widespread use. FRAME learns to accurately describe molecular interactions despite being given no prior information on such interactions. The resulting framework for quality molecular hypothesis generation can be easily incorporated into the workflows of medicinal chemists for diverse tasks, including lead optimization, fragment-based drug discovery, and de novo drug design.

    View details for DOI 10.1021/acscentsci.3c00572

    View details for PubMedID 38161364

  • ELASPIC2 (EL2): Combining Contextualized Language Models and Graph Neural Networks to Predict Effects of Mutations. Journal of molecular biology Strokach, A., Lu, T., Kim, P. M. 2021; 433 (11): 166810

    Abstract

    The ELASPIC web server allows users to evaluate the effect of mutations on protein folding and protein-protein interaction on a proteome-wide scale. It uses homology models of proteins and protein-protein interactions, which have been precalculated for several proteomes, and machine learning models, which integrate structural information with sequence conservation scores, in order to make its predictions. Since the original publication of the ELASPIC web server, several advances have motivated a revisiting of the problem of mutation effect prediction. First, progress in neural network architectures and self-supervised pre-training has resulted in models which provide more informative embeddings of protein sequence and structure than those used by the original version of ELASPIC. Second, the amount of training data has increased several-fold, largely driven by advances in deep mutation scanning and other multiplexed assays of variant effect. Here, we describe two machine learning models which leverage the recent advances in order to achieve superior accuracy in predicting the effect of mutation on protein folding and protein-protein interaction. The models incorporate features generated using pre-trained transformer- and graph convolution-based neural networks, and are trained to optimize a ranking objective function, which permits the use of heterogeneous training data. The outputs from the new models have been incorporated into the ELASPIC web server, available at http://elaspic.kimlab.org.

    View details for DOI 10.1016/j.jmb.2021.166810

    View details for Web of Science ID 000648520800013

    View details for PubMedID 33450251