Tianyu Lu
Ph.D. Student in Bioengineering, admitted Autumn 2022
Masters Student in Bioengineering, admitted Autumn 2023
Work Experience
-
Junior Machine Learning Scientist, ProteinQure (May 3, 2021 - August 31, 2022)
Location: Toronto, Ontario, Canada
All Publications
-
Synthetic biology education and pedagogy: a review of evolving practices in a growing discipline
Frontiers in Education
2024; 9
DOI: 10.3389/feduc.2024.1441720
Web of Science ID: 001337078600001
-
Sparks of function by de novo protein design
Nature Biotechnology
2024; 42 (2): 203-215
Abstract
Information in proteins flows from sequence to structure to function, with each step causally driven by the preceding one. Protein design is founded on inverting this process: specify a desired function, design a structure executing this function, and find a sequence that folds into this structure. This 'central dogma' underlies nearly all de novo protein-design efforts. Our ability to accomplish these tasks depends on our understanding of protein folding and function and our ability to capture this understanding in computational methods. In recent years, deep learning-derived approaches for efficient and accurate structure modeling and enrichment of successful designs have enabled progression beyond the design of protein structures and towards the design of functional proteins. We examine these advances in the broader context of classical de novo protein design and consider implications for future challenges to come, including fundamental capabilities such as sequence and structure co-design and conformational control considering flexibility, and functional objectives such as antibody and enzyme design.
DOI: 10.1038/s41587-024-02133-2
PubMed ID: 38361073
PubMed Central ID: 6423711
-
Geometric Deep Learning for Structure-Based Ligand Design
ACS Central Science
2023; 9 (12): 2257-2267
Abstract
A pervasive challenge in drug design is determining how to expand a ligand (a small molecule that binds to a target biomolecule) in order to improve various properties of the ligand. Adding single chemical groups, known as fragments, is important for lead optimization tasks, and adding multiple fragments is critical for fragment-based drug design. We have developed a comprehensive framework that uses machine learning and three-dimensional protein-ligand structures to address this challenge. Our method, FRAME, iteratively determines where on a ligand to add fragments, selects fragments to add, and predicts the geometry of the added fragments. On a comprehensive benchmark, FRAME consistently improves predicted affinity and selectivity relative to the initial ligand, while generating molecules with more drug-like chemical properties than docking-based methods currently in widespread use. FRAME learns to accurately describe molecular interactions despite being given no prior information on such interactions. The resulting framework for quality molecular hypothesis generation can be easily incorporated into the workflows of medicinal chemists for diverse tasks, including lead optimization, fragment-based drug discovery, and de novo drug design.
DOI: 10.1021/acscentsci.3c00572
PubMed ID: 38161364
-
ELASPIC2 (EL2): Combining Contextualized Language Models and Graph Neural Networks to Predict Effects of Mutations
Journal of Molecular Biology
2021; 433 (11): 166810
Abstract
The ELASPIC web server allows users to evaluate the effect of mutations on protein folding and protein-protein interaction on a proteome-wide scale. It uses homology models of proteins and protein-protein interactions, which have been precalculated for several proteomes, and machine learning models, which integrate structural information with sequence conservation scores, in order to make its predictions. Since the original publication of the ELASPIC web server, several advances have motivated a revisiting of the problem of mutation effect prediction. First, progress in neural network architectures and self-supervised pre-training has resulted in models which provide more informative embeddings of protein sequence and structure than those used by the original version of ELASPIC. Second, the amount of training data has increased several-fold, largely driven by advances in deep mutational scanning and other multiplexed assays of variant effect. Here, we describe two machine learning models which leverage the recent advances in order to achieve superior accuracy in predicting the effect of mutations on protein folding and protein-protein interaction. The models incorporate features generated using pre-trained transformer- and graph convolution-based neural networks, and are trained to optimize a ranking objective function, which permits the use of heterogeneous training data. The outputs from the new models have been incorporated into the ELASPIC web server, available at http://elaspic.kimlab.org.
DOI: 10.1016/j.jmb.2021.166810
Web of Science ID: 000648520800013
PubMed ID: 33450251