All Publications


  • Enhancing insights into diseases through horizontal gene transfer event detection from gut microbiome NUCLEIC ACIDS RESEARCH Wang, S., Jiang, Y., Che, L., Wang, R., Li, S. 2024

    Abstract

    Horizontal gene transfer (HGT) phenomena pervade the gut microbiome and significantly impact human health. Yet, no current method can accurately identify complete HGT events, including the transferred sequence and the associated deletion and insertion breakpoints from shotgun metagenomic data. Here, we develop LocalHGT, which facilitates the reliable and swift detection of complete HGT events from shotgun metagenomic data, delivering an accuracy of 99.4% (verified by Nanopore data) across 200 gut microbiome samples, and achieving an average F1 score of 0.99 on 100 simulated datasets. LocalHGT enables a systematic characterization of HGT events within the human gut microbiome across 2098 samples, revealing that multiple recipient genome sites can become targets of a transferred sequence, microhomology is enriched in HGT breakpoint junctions (P-value = 3.3e-58), and HGTs can function as host-specific fingerprints indicated by the significantly higher HGT similarity of intra-personal temporal samples than inter-personal samples (P-value = 4.3e-303). Crucially, HGTs showed potential contributions to colorectal cancer (CRC) and acute diarrhoea, as evidenced by the enrichment of the butyrate metabolism pathway (P-value = 3.8e-17) and the shigellosis pathway (P-value = 5.9e-13) in the respective associated HGTs. Furthermore, differential HGTs demonstrated promise as biomarkers for predicting various diseases. Integrating HGTs into a CRC prediction model achieved an AUC of 0.87.

    View details for DOI 10.1093/nar/gkae515

    View details for Web of Science ID 001247647300001

    View details for PubMedID 38884260

  • Coding genomes with gapped pattern graph convolutional network. Bioinformatics (Oxford, England) Wang, R. H., Ng, Y. K., Zhang, X., Wang, J., Li, S. C. 2024

    Abstract

    MOTIVATION: Genome sequencing technologies reveal a huge amount of genomic sequences. Neural network-based methods can be prime candidates for retrieving insights from these sequences because of their applicability to large and diverse datasets. However, the highly variable lengths of genome sequences severely impair the presentation of sequences as input to the neural network. Genetic variations further complicate tasks that involve sequence comparison or alignment. RESULTS: Inspired by the theory and applications of spaced seeds, we propose a graph representation of genome sequences called gapped pattern graph. These graphs can be transformed through a Graph Convolutional Network to form lower-dimensional embeddings for downstream tasks. On the basis of the gapped pattern graphs, we implemented a neural network model and demonstrated its performance on diverse tasks involving microbe and mammalian genome data. Our method consistently outperformed all the other state-of-the-art methods across various metrics on all tasks, especially for the sequences with limited homology to the training data. In addition, our model was able to identify distinct gapped pattern signatures from the sequences. AVAILABILITY: The framework is available at https://github.com/deepomicslab/GCNFrame. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btae188

    View details for PubMedID 38603603

  • A novel deep generative model for mRNA vaccine development: Designing 5' UTRs with N1-methyl-pseudouridine modification. Acta pharmaceutica Sinica. B Tang, X., Huo, M., Chen, Y., Huang, H., Qin, S., Luo, J., Qin, Z., Jiang, X., Liu, Y., Duan, X., Wang, R., Chen, L., Li, H., Fan, N., He, Z., He, X., Shen, B., Li, S. C., Song, X. 2024; 14 (4): 1814-1826

    Abstract

    Efficient translation mediated by the 5' untranslated region (5' UTR) is essential for the robust efficacy of mRNA vaccines. However, the N1-methyl-pseudouridine (m1Psi) modification of mRNA can impact the translation efficiency of the 5' UTR. We discovered that the optimal 5' UTR for m1Psi-modified mRNA (m1Psi-5' UTR) differs significantly from its unmodified counterpart, highlighting the need for a specialized tool for designing m1Psi-5' UTRs rather than directly utilizing high-expression endogenous gene 5' UTRs. In response, we developed a novel machine learning-based tool, Smart5UTR, which employs a deep generative model to identify superior m1Psi-5' UTRs in silico. The tailored loss function and network architecture enable Smart5UTR to overcome limitations inherent in existing models. As a result, Smart5UTR can successfully design superior 5' UTRs, greatly benefiting mRNA vaccine development. Notably, Smart5UTR-designed superior 5' UTRs significantly enhanced antibody titers induced by COVID-19 mRNA vaccines against the Delta and Omicron variants of SARS-CoV-2, surpassing the performance of vaccines using high-expression endogenous gene 5' UTRs.

    View details for DOI 10.1016/j.apsb.2023.11.003

    View details for PubMedID 38572113