
Sang Truong
Ph.D. Student in Computer Science, admitted Autumn 2021
All Publications
-
Systematic analysis of biomolecular conformational ensembles with PENSA.
The Journal of Chemical Physics
2025; 162 (1)
Abstract
Atomic-level simulations are widely used to study biomolecules and their dynamics. A common goal in such studies is to compare simulations of a molecular system under several conditions (for example, with various mutations or bound ligands) in order to identify differences between the molecular conformations adopted under these conditions. However, the large amount of data produced by simulations of ever larger and more complex systems often renders it difficult to identify the structural features that are relevant to a particular biochemical phenomenon. We present a flexible software package named Python ENSemble Analysis (PENSA) that enables a comprehensive and thorough investigation into biomolecular conformational ensembles. It provides featurization and feature transformations that allow for a complete representation of biomolecules such as proteins and nucleic acids, including water and ion binding sites, thus avoiding the bias that would come with manual feature selection. PENSA implements methods to systematically compare the distributions of molecular features across ensembles to find the significant differences between them and identify regions of interest. It also includes a novel approach to quantify the state-specific information between two regions of a biomolecule, which allows, for example, tracing information flow to identify allosteric pathways. PENSA also comes with convenient tools for loading data and visualizing results, making them quick to process and easy to interpret. PENSA is an open-source Python library maintained at https://github.com/drorlab/pensa along with an example workflow and a tutorial. We demonstrate its usefulness in real-world examples by showing how it helps us determine molecular mechanisms efficiently.
View details for DOI 10.1063/5.0235544
View details for PubMedID 39745157
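The abstract above describes comparing the distributions of molecular features across ensembles. As a rough illustration of that idea only, the sketch below computes a Jensen-Shannon distance between histograms of a single feature sampled in two ensembles. It does not use PENSA and is not its API: the choice of Jensen-Shannon distance, the function names `histogram` and `js_distance`, the bin count, and the synthetic Gaussian data are all assumptions made for the example.

```python
import math
import random


def histogram(samples, lo, hi, bins):
    """Normalized histogram of samples over [lo, hi] with equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for x in samples:
        # Clamp the right edge into the last bin; a zero-width range uses bin 0.
        idx = min(int((x - lo) / width), bins - 1) if width > 0 else 0
        counts[idx] += 1
    total = len(samples)
    return [c / total for c in counts]


def js_distance(samples_a, samples_b, bins=30):
    """Jensen-Shannon distance (log base 2, range 0 to 1) between two 1D samples.

    Hypothetical helper for illustration; not part of any library.
    """
    # Shared bin range so both histograms are directly comparable.
    lo = min(min(samples_a), min(samples_b))
    hi = max(max(samples_a), max(samples_b))
    p = histogram(samples_a, lo, hi, bins)
    q = histogram(samples_b, lo, hi, bins)
    jsd = 0.0
    for pi, qi in zip(p, q):
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            jsd += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            jsd += 0.5 * qi * math.log2(qi / mi)
    return math.sqrt(max(jsd, 0.0))


# Synthetic stand-ins for one feature (e.g. a torsion angle) in three ensembles.
rng = random.Random(0)
ensemble_a = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ensemble_b = [rng.gauss(0.0, 1.0) for _ in range(5000)]  # same condition
ensemble_c = [rng.gauss(3.0, 1.0) for _ in range(5000)]  # shifted condition

d_same = js_distance(ensemble_a, ensemble_b)
d_shifted = js_distance(ensemble_a, ensemble_c)
print(f"same condition: {d_same:.3f}, shifted condition: {d_shifted:.3f}")
```

Repeating such a comparison for every feature and ranking the features by distance is one simple way to flag regions that differ between conditions; the actual methods are documented in the PENSA repository and tutorial.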
-
Bayesian Optimization for Crop Genetics with Scalable Probabilistic Models
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2024: 30-44
View details for Web of Science ID 001347127100002
-
An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models
ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2024: 6549-6560
View details for Web of Science ID 001356731806041
-
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001220600008008
-
GAUCHE: A Library for Gaussian Processes in Chemistry
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001224281501021