All Publications

  • Coding genomes with gapped pattern graph convolutional network. Bioinformatics (Oxford, England) Wang, R. H., Ng, Y. K., Zhang, X., Wang, J., Li, S. C. 2024


    MOTIVATION: Genome sequencing technologies reveal a huge amount of genomic sequences. Neural network-based methods can be prime candidates for retrieving insights from these sequences because of their applicability to large and diverse datasets. However, the highly variable lengths of genome sequences severely impair the presentation of sequences as input to the neural network. Genetic variations further complicate tasks that involve sequence comparison or alignment.

    RESULTS: Inspired by the theory and applications of spaced seeds, we propose a graph representation of genome sequences called gapped pattern graph. These graphs can be transformed through a Graph Convolutional Network to form lower-dimensional embeddings for downstream tasks. On the basis of the gapped pattern graphs, we implemented a neural network model and demonstrated its performance on diverse tasks involving microbe and mammalian genome data. Our method consistently outperformed all the other state-of-the-art methods across various metrics on all tasks, especially for the sequences with limited homology to the training data. In addition, our model was able to identify distinct gapped pattern signatures from the sequences.

    AVAILABILITY: The framework is available at

    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btae188

    View details for PubMedID 38603603

  • A novel deep generative model for mRNA vaccine development: Designing 5' UTRs with N1-methyl-pseudouridine modification. Acta pharmaceutica Sinica. B Tang, X., Huo, M., Chen, Y., Huang, H., Qin, S., Luo, J., Qin, Z., Jiang, X., Liu, Y., Duan, X., Wang, R., Chen, L., Li, H., Fan, N., He, Z., He, X., Shen, B., Li, S. C., Song, X. 2024; 14 (4): 1814-1826


    Efficient translation mediated by the 5' untranslated region (5' UTR) is essential for the robust efficacy of mRNA vaccines. However, the N1-methyl-pseudouridine (m1Psi) modification of mRNA can impact the translation efficiency of the 5' UTR. We discovered that the optimal 5' UTR for m1Psi-modified mRNA (m1Psi-5' UTR) differs significantly from its unmodified counterpart, highlighting the need for a specialized tool for designing m1Psi-5' UTRs rather than directly utilizing high-expression endogenous gene 5' UTRs. In response, we developed a novel machine learning-based tool, Smart5UTR, which employs a deep generative model to identify superior m1Psi-5' UTRs in silico. The tailored loss function and network architecture enable Smart5UTR to overcome limitations inherent in existing models. As a result, Smart5UTR can successfully design superior 5' UTRs, greatly benefiting mRNA vaccine development. Notably, Smart5UTR-designed superior 5' UTRs significantly enhanced antibody titers induced by COVID-19 mRNA vaccines against the Delta and Omicron variants of SARS-CoV-2, surpassing the performance of vaccines using high-expression endogenous gene 5' UTRs.

    View details for DOI 10.1016/j.apsb.2023.11.003

    View details for PubMedID 38572113