All Publications


  • Crystal Structure Determination from Powder Diffraction Patterns with Generative Machine Learning. Journal of the American Chemical Society Riesel, E. A., Mackey, T., Nilforoshan, H., Xu, M., Badding, C. K., Altman, A. B., Leskovec, J., Freedman, D. E. 2024

    Abstract

    Powder X-ray diffraction (PXRD) is a cornerstone technique in materials characterization. However, complete structure determination from PXRD patterns alone remains time-consuming and is often intractable, especially for novel materials. Current machine learning (ML) approaches to PXRD analysis predict only a subset of the total information that comprises a crystal structure. We developed a pioneering generative ML model designed to solve crystal structures from real-world experimental PXRD data. In addition to strong performance on simulated diffraction patterns, we demonstrate full structure solutions over a large set of experimental diffraction patterns. Benchmarking our model, we predicted the structure for 134 experimental patterns from the RRUFF database and thousands of simulated patterns from the Materials Project, on which our model achieves state-of-the-art match rates of 42% and 67%, respectively. Further, we applied our model to determine the unreported structures of materials such as NaCu2P2, Ca2MnTeO6, ZrGe6Ni6, LuOF, and HoNdV2O8 from the Powder Diffraction File database. We extended this methodology to new materials created in our lab at high pressure with previously unsolved structures and found the new binary compounds Rh3Bi, RuBi2, and KBi3. We expect that our model will open avenues toward materials discovery under conditions that preclude single crystal growth and toward automated materials discovery pipelines, opening the door to new domains of chemistry.

    View details for DOI 10.1021/jacs.4c10244

    View details for PubMedID 39298266

  • An all-atom protein generative model. Proceedings of the National Academy of Sciences of the United States of America Chu, A. E., Kim, J., Cheng, L., El Nesr, G., Xu, M., Shuai, R. W., Huang, P. S. 2024; 121 (27): e2311500121

    Abstract

    Proteins mediate their functions through chemical interactions; modeling these interactions, which are typically through sidechains, is an important need in protein design. However, constructing an all-atom generative model requires an appropriate scheme for managing the jointly continuous and discrete nature of proteins encoded in the structure and sequence. We describe an all-atom diffusion model of protein structure, Protpardelle, which represents all sidechain states at once as a "superposition" state; superpositions defining a protein are collapsed into individual residue types and conformations during sample generation. When combined with sequence design methods, our model is able to codesign all-atom protein structure and sequence. Generated proteins are of good quality under the typical quality, diversity, and novelty metrics, and sidechains reproduce the chemical features and behavior of natural proteins. Finally, we explore the potential of our model to conduct all-atom protein design and scaffold functional motifs in a backbone- and rotamer-free way.

    View details for DOI 10.1073/pnas.2311500121

    View details for PubMedID 38916999

  • An all-atom protein generative model. bioRxiv : the preprint server for biology Chu, A. E., Cheng, L., El Nesr, G., Xu, M., Huang, P. S. 2023

    Abstract

    Proteins mediate their functions through chemical interactions; modeling these interactions, which are typically through sidechains, is an important need in protein design. However, constructing an all-atom generative model requires an appropriate scheme for managing the jointly continuous and discrete nature of proteins encoded in the structure and sequence. We describe an all-atom diffusion model of protein structure, Protpardelle, which instantiates a "superposition" over the possible sidechain states, and collapses it to conduct reverse diffusion for sample generation. When combined with sequence design methods, our model is able to co-design all-atom protein structure and sequence. Generated proteins are of good quality under the typical quality, diversity, and novelty metrics, and sidechains reproduce the chemical features and behavior of natural proteins. Finally, we explore the potential of our model to conduct all-atom protein design and scaffold functional motifs in a backbone- and rotamer-free way.

    View details for DOI 10.1101/2023.05.24.542194

    View details for PubMedID 37292974

    View details for PubMedCentralID PMC10245864

  • Graph and Geometry Generative Modeling for Drug Discovery. Xu, M., Liu, M., Jin, W., Ji, S., Leskovec, J., Ermon, S. Association for Computing Machinery (ACM). 2023: 5833-5834
  • Scaling Riemannian Diffusion Models. Lou, A., Xu, M., Farris, A., Ermon, S. Edited by Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. Neural Information Processing Systems (NIPS). 2023
  • Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation. Song, Y., Gong, J., Xu, M., Cao, Z., Lan, Y., Ermon, S., Zhou, H., Ma, W. Edited by Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. Neural Information Processing Systems (NIPS). 2023
  • MUDiff: Unified Diffusion for Complete Molecule Generation. Hua, C., Luan, S., Xu, M., Ying, R., Fu, J., Ermon, S., Precup, D. Edited by Villar, S., Chamberlain, B. Journal of Machine Learning Research (JMLR). 2023
  • When Do Graph Neural Networks Help with Node Classification? Investigating the Impact of Homophily Principle on Node Distinguishability. Luan, S., Hua, C., Xu, M., Lu, Q., Zhu, J., Chang, X., Fu, J., Leskovec, J., Precup, D. Edited by Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. Neural Information Processing Systems (NIPS). 2023
  • Infomax Neural Joint Source-Channel Coding via Adversarial Bit Flip. Song, Y., Xu, M., Yu, L., Zhou, H., Shao, S., Yu, Y. Association for the Advancement of Artificial Intelligence (AAAI). 2020: 5834-5841