All Publications

  • Artificial intelligence foundation for therapeutic science. Nature Chemical Biology Huang, K., Fu, T., Gao, W., Zhao, Y., Roohani, Y., Leskovec, J., Coley, C. W., Xiao, C., Sun, J., Zitnik, M. 2022

    View details for DOI 10.1038/s41589-022-01131-2

    View details for PubMedID 36131149

  • HINT: Hierarchical interaction network for clinical-trial-outcome predictions. Patterns (New York, N.Y.) Fu, T., Huang, K., Xiao, C., Glass, L. M., Sun, J. 2022; 3 (4): 100445


    Clinical trials are crucial for drug development but often face uncertain outcomes due to safety, efficacy, or patient-recruitment problems. We propose the Hierarchical Interaction Network (HINT) to predict clinical trial outcomes. First, HINT encodes multi-modal data (drug molecule, target disease, trial eligibility criteria) into embeddings. Then, HINT trains knowledge-embedding modules using drug pharmacokinetic and historical trial data. Finally, a hierarchical interaction graph connects all of the embeddings to capture their interactions and predict trial outcomes. HINT was trained and validated on 1,160 phase I trials, 4,449 phase II trials, and 3,436 phase III trials. It obtained 0.665, 0.620, and 0.847 F1 scores on separate test sets of 627 phase I, 1,653 phase II, and 1,140 phase III trials, respectively. HINT significantly outperforms the best baseline method on most metrics. The benchmark dataset and codes are released at

    View details for DOI 10.1016/j.patter.2022.100445

    View details for PubMedID 35465223

  • Machine learning applications for therapeutic tasks with genomics data. Patterns (New York, N.Y.) Huang, K., Xiao, C., Glass, L. M., Critchlow, C. W., Gibson, G., Sun, J. 2021; 2 (10): 100328


    Thanks to the increasing availability of genomics and other biomedical data, many machine learning algorithms have been proposed for a wide range of therapeutic discovery and development tasks. In this survey, we review the literature on machine learning applications for genomics through the lens of therapeutic development. We investigate the interplay among genomics, compounds, proteins, electronic health records, cellular images, and clinical texts. We identify 22 applications of machine learning in genomics that span the whole therapeutics pipeline, from discovering novel targets, personalizing medicine, and developing gene-editing tools all the way to facilitating clinical trials and post-market studies. We also pinpoint seven key challenges in this field with potential for expansion and impact. This survey examines recent research at the intersection of machine learning, genomics, and therapeutic development.

    View details for DOI 10.1016/j.patter.2021.100328

    View details for Web of Science ID 000706708700002

    View details for PubMedID 34693370

    View details for PubMedCentralID PMC8515011