Minkai Xu
Ph.D. Student in Computer Science, admitted Autumn 2022
All Publications
-
Crystal Structure Determination from Powder Diffraction Patterns with Generative Machine Learning.
Journal of the American Chemical Society
2024
Abstract
Powder X-ray diffraction (PXRD) is a cornerstone technique in materials characterization. However, complete structure determination from PXRD patterns alone remains time-consuming and is often intractable, especially for novel materials. Current machine learning (ML) approaches to PXRD analysis predict only a subset of the total information that comprises a crystal structure. We developed a pioneering generative ML model designed to solve crystal structures from real-world experimental PXRD data. In addition to strong performance on simulated diffraction patterns, we demonstrate full structure solutions over a large set of experimental diffraction patterns. Benchmarking our model, we predicted the structure for 134 experimental patterns from the RRUFF database and thousands of simulated patterns from the Materials Project, achieving state-of-the-art match rates of 42% and 67%, respectively. Further, we applied our model to determine the unreported structures of materials such as NaCu2P2, Ca2MnTeO6, ZrGe6Ni6, LuOF, and HoNdV2O8 from the Powder Diffraction File database. We extended this methodology to new materials created in our lab at high pressure with previously unsolved structures and found the new binary compounds Rh3Bi, RuBi2, and KBi3. We expect that our model will open avenues toward materials discovery under conditions that preclude single crystal growth and toward automated materials discovery pipelines, opening the door to new domains of chemistry.
View details for DOI 10.1021/jacs.4c10244
View details for PubMedID 39298266
-
An all-atom protein generative model.
Proceedings of the National Academy of Sciences of the United States of America
2024; 121 (27): e2311500121
Abstract
Proteins mediate their functions through chemical interactions; modeling these interactions, which are typically through sidechains, is an important need in protein design. However, constructing an all-atom generative model requires an appropriate scheme for managing the jointly continuous and discrete nature of proteins encoded in the structure and sequence. We describe an all-atom diffusion model of protein structure, Protpardelle, which represents all sidechain states at once as a "superposition" state; superpositions defining a protein are collapsed into individual residue types and conformations during sample generation. When combined with sequence design methods, our model is able to codesign all-atom protein structure and sequence. Generated proteins are of good quality under the typical quality, diversity, and novelty metrics, and sidechains reproduce the chemical features and behavior of natural proteins. Finally, we explore the potential of our model to conduct all-atom protein design and scaffold functional motifs in a backbone- and rotamer-free way.
View details for DOI 10.1073/pnas.2311500121
View details for PubMedID 38916999
-
An all-atom protein generative model.
bioRxiv : the preprint server for biology
2023
Abstract
Proteins mediate their functions through chemical interactions; modeling these interactions, which are typically through sidechains, is an important need in protein design. However, constructing an all-atom generative model requires an appropriate scheme for managing the jointly continuous and discrete nature of proteins encoded in the structure and sequence. We describe an all-atom diffusion model of protein structure, Protpardelle, which instantiates a "superposition" over the possible sidechain states, and collapses it to conduct reverse diffusion for sample generation. When combined with sequence design methods, our model is able to co-design all-atom protein structure and sequence. Generated proteins are of good quality under the typical quality, diversity, and novelty metrics, and sidechains reproduce the chemical features and behavior of natural proteins. Finally, we explore the potential of our model to conduct all-atom protein design and scaffold functional motifs in a backbone- and rotamer-free way.
View details for DOI 10.1101/2023.05.24.542194
View details for PubMedID 37292974
View details for PubMedCentralID PMC10245864
-
Graph and Geometry Generative Modeling for Drug Discovery
ASSOC COMPUTING MACHINERY. 2023: 5833-5834
View details for DOI 10.1145/3580305.3599559
View details for Web of Science ID 001118896305100
-
Scaling Riemannian Diffusion Models
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001228825106030
-
Equivariant Flow Matching with Hybrid Probability Transport for 3D Molecule Generation
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001220818805023
-
MUDiff: Unified Diffusion for Complete Molecule Generation
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023
View details for Web of Science ID 001221054300020
-
When Do Graph Neural Networks Help with Node Classification? Investigating the Impact of Homophily Principle on Node Distinguishability
NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2023
View details for Web of Science ID 001230083405025
-
Infomax Neural Joint Source-Channel Coding via Adversarial Bit Flip
ASSOC ADVANCEMENT ARTIFICIAL INTELLIGENCE. 2020: 5834-5841
View details for Web of Science ID 000667722805111