All Publications


  • Systematic analysis of biomolecular conformational ensembles with PENSA. The Journal of chemical physics Vogele, M., Thomson, N. J., Truong, S. T., McAvity, J., Zachariae, U., Dror, R. O. 2025; 162 (1)

    Abstract

    Atomic-level simulations are widely used to study biomolecules and their dynamics. A common goal in such studies is to compare simulations of a molecular system under several conditions (for example, with various mutations or bound ligands) in order to identify differences between the molecular conformations adopted under these conditions. However, the large amount of data produced by simulations of ever larger and more complex systems often renders it difficult to identify the structural features that are relevant to a particular biochemical phenomenon. We present a flexible software package named Python ENSemble Analysis (PENSA) that enables a comprehensive and thorough investigation into biomolecular conformational ensembles. It provides featurization and feature transformations that allow for a complete representation of biomolecules such as proteins and nucleic acids, including water and ion binding sites, thus avoiding the bias that would come with manual feature selection. PENSA implements methods to systematically compare the distributions of molecular features across ensembles to find the significant differences between them and identify regions of interest. It also includes a novel approach to quantify the state-specific information between two regions of a biomolecule, which allows, for example, tracing information flow to identify allosteric pathways. PENSA also comes with convenient tools for loading data and visualizing results, making them quick to process and easy to interpret. PENSA is an open-source Python library maintained at https://github.com/drorlab/pensa along with an example workflow and a tutorial. We demonstrate its usefulness in real-world examples by showing how it helps us determine molecular mechanisms efficiently.

    DOI: 10.1063/5.0235544

    PubMed ID: 39745157

  • Bayesian Optimization for Crop Genetics with Scalable Probabilistic Models Azam, R., Truong, S. T., Fernandes, S. B., Leakey, A. B., Lipka, A., El-Kebir, M., Koyejo, S., Antoran, J., Naesseth, C. A. JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2024: 30-44
  • An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models Bhatt, G., Chen, Y., Das, A. M., Zhang, J., Truong, S. T., Mussmann, S., Zhu, Y., Bilmes, J., Du, S. S., Jamieson, K., Ash, J. T., Nowak, R. D., Martins, A., Srikumar, V., Ku, L. W. ASSOC COMPUTATIONAL LINGUISTICS-ACL. 2024: 6549-6560
  • DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models Wang, B., Chen, W., Pei, H., Xie, C., Kang, M., Zhang, C., Xu, C., Xiong, Z., Dutta, R., Schaeffer, R., Truong, S. T., Arora, S., Mazeika, M., Hendrycks, D., Lin, Z., Cheng, Y., Koyejo, S., Song, D., Li, B., Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
  • GAUCHE: A Library for Gaussian Processes in Chemistry Griffiths, R., Klarner, L., Moss, H., Ravuri, A., Truong, S., Stanton, S., Tom, G., Rankovic, B., Du, Y., Jamasb, A., Deshwal, A., Schwartz, J., Tripp, A., Kell, G., Frieder, S., Bourached, A., Chan, A. J., Moss, J., Guo, C., Durholt, J., Chaurasia, S., Park, J., Strieth-Kalthoff, F., Lee, A. A., Cheng, B., Aspuru-Guzik, A., Schwaller, P., Tang, J., Oh, A., Neumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023