All Publications

  • Multinode Multi-GPU Two-Electron Integrals: Code Generation Using the Regent Language. Journal of Chemical Theory and Computation. Johnson, K. G., Mirchandaney, S., Hoag, E., Heirich, A., Aiken, A., Martinez, T. J. 2022


    The computation of two-electron repulsion integrals (ERIs) is often the most expensive step of integral-direct self-consistent field methods. Formally it scales as O(N^4), where N is the number of Gaussian basis functions used to represent the molecular wave function. In practice, this scaling can be reduced to O(N^2) or less by neglecting small integrals with screening methods. The contributions of the ERIs to the Fock matrix are of Coulomb (J) and exchange (K) type and require separate algorithms to compute matrix elements efficiently. We previously implemented highly efficient GPU-accelerated J-matrix and K-matrix algorithms in the electronic structure code TeraChem. Although these implementations supported the use of multiple GPUs on a node, they did not support the use of multiple nodes. This presents a key bottleneck to cutting-edge ab initio simulations of large systems, e.g., excited state dynamics of photoactive proteins. We present our implementation of multinode multi-GPU J- and K-matrix algorithms in TeraChem using the Regent programming language. Regent directly supports distributed computation in a task-based model and can generate code for a variety of architectures, including NVIDIA GPUs. We demonstrate multinode scaling up to 45 GPUs (3 nodes) and benchmark against hand-coded TeraChem integral code. We also outline our metaprogrammed Regent implementation, which enables flexible code generation for integrals of different angular momenta.

    View details for DOI 10.1021/acs.jctc.2c00414

    View details for PubMedID 36200649

  • Performance of Coupled-Cluster Singles and Doubles on Modern Stream Processing Architectures. Journal of Chemical Theory and Computation. Fales, B. S., Curtis, E. R., Johnson, K. G., Lahana, D., Seritan, S., Wang, Y., Weir, H., Martinez, T. J., Hohenstein, E. G. 2020


    We develop a new implementation of coupled-cluster singles and doubles (CCSD) optimized for the most recent graphical processing unit (GPU) hardware. We find that a single node with 8 NVIDIA V100 GPUs is capable of performing CCSD computations on roughly 100 atoms and 1300 basis functions in less than 1 day. Comparisons against massively parallel implementations of CCSD suggest that more than 64 CPU-based nodes (each with 16 cores) are required to match this performance.

    View details for DOI 10.1021/acs.jctc.0c00336

    View details for PubMedID 32567305