Honors & Awards

  • Travel Award, Journal Metabolites (2018)
  • Student Travel Award, The International Metabolomics Society (2018)

Professional Education

  • Doctor of Philosophy, Chinese Academy Of Sciences (2019)
  • BS, Inner Mongolia University (2013)

All Publications

  • Deep learning-based pseudo-mass spectrometry imaging analysis for precision medicine. Briefings in bioinformatics Shen, X., Shao, W., Wang, C., Liang, L., Chen, S., Zhang, S., Rusu, M., Snyder, M. P. 2022


    Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics provides systematic profiling of metabolism. Yet, its applications in precision medicine (disease diagnosis) have been limited by several challenges, including metabolite identification, information loss and low reproducibility. Here, we present the deep-learning-based Pseudo-Mass Spectrometry Imaging (deepPseudoMSI) project (https://www.deeppseudomsi.org/), which converts LC-MS raw data to pseudo-MS images and then processes them by deep learning for precision medicine, such as disease diagnosis. Extensive tests based on real data demonstrated the superiority of deepPseudoMSI over traditional approaches and the capacity of our method to achieve an accurate individualized diagnosis. Our framework lays the foundation for future metabolic-based precision medicine.

    View details for DOI 10.1093/bib/bbac331

    View details for PubMedID 35947990

  • massDatabase: utilities for the operation of the public compound and pathway database. Bioinformatics (Oxford, England) Shen, X., Wang, C., Snyder, M. P. 2022


    SUMMARY: One of the major challenges in LC-MS data is converting many metabolic feature entries to biological function information, such as metabolite annotation and pathway enrichment, which are based on the compound and pathway databases. Multiple online databases have been developed. However, no tool has been developed for operating all these databases for biological analysis. Therefore, we developed massDatabase, an R package that operates the online public databases and combines with other tools for streamlined compound annotation and pathway enrichment. massDatabase is a flexible, simple, and powerful tool that can be installed on all platforms, allowing the users to leverage all the online public databases for biological function mining. A detailed tutorial and a case study are provided in the Supplementary Materials. AVAILABILITY AND IMPLEMENTATION: https://massdatabase.tidymass.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btac546

    View details for PubMedID 35944213

  • TidyMass an object-oriented reproducible analysis framework for LC-MS data. Nature communications Shen, X., Yan, H., Wang, C., Gao, P., Johnson, C. H., Snyder, M. P. 2022; 13 (1): 4365


    Reproducibility, traceability, and transparency have been long-standing issues for metabolomics data analysis. Multiple tools have been developed, but limitations still exist. Here, we present the tidyMass project (https://www.tidymass.org/), a comprehensive R-based computational framework that can achieve the traceable, shareable, and reproducible workflow needs of data processing and analysis for LC-MS-based untargeted metabolomics. TidyMass is an ecosystem of R packages that share an underlying design philosophy, grammar, and data structure, which provides a comprehensive, reproducible, and object-oriented computational framework. The modular architecture makes tidyMass a highly flexible and extensible tool, which other users can improve and integrate with other tools to customize their own pipeline.

    View details for DOI 10.1038/s41467-022-32155-w

    View details for PubMedID 35902589

  • Precision environmental health monitoring by longitudinal exposome and multi-omics profiling. Genome research Gao, P., Shen, X., Zhang, X., Jiang, C., Zhang, S., Zhou, X., Schüssler-Fiorenza Rose, S. M., Snyder, M. 2022


    Conventional environmental health studies have primarily focused on limited environmental stressors at the population level, which lacks the power to dissect the complexity and heterogeneity of individualized environmental exposures. Here, as a pilot case study, we integrated deep-profiled longitudinal personal exposome and internal multi-omics to systematically investigate how the exposome shapes a single individual's phenome. We annotated thousands of chemical and biological components in the personal exposome cloud and found they were significantly correlated with thousands of internal biomolecules, which was further cross-validated using corresponding clinical data. Our results showed that agrochemicals and fungi predominated in the highly diverse and dynamic personal exposome, and the biomolecules and pathways related to the individual's immune system, kidney, and liver were highly associated with the personal external exposome. Overall, this data-driven longitudinal monitoring study shows the potential dynamic interactions between the personal exposome and internal multi-omics, as well as the impact of the exposome on precision health by producing abundant testable hypotheses.

    View details for DOI 10.1101/gr.276521.121

    View details for PubMedID 35667843

  • Multiomic analysis reveals cell-type-specific molecular determinants of COVID-19 severity. Cell systems Zhang, S., Cooper-Knock, J., Weimer, A. K., Shi, M., Kozhaya, L., Unutmaz, D., Harvey, C., Julian, T. H., Furini, S., Frullanti, E., Fava, F., Renieri, A., Gao, P., Shen, X., Timpanaro, I. S., Kenna, K. P., Baillie, J. K., Davis, M. M., Tsao, P. S., Snyder, M. P. 2022


    The determinants of severe COVID-19 in healthy adults are poorly understood, which limits the opportunity for early intervention. We present a multiomic analysis using machine learning to characterize the genomic basis of COVID-19 severity. We use single-cell multiome profiling of human lungs to link genetic signals to cell-type-specific functions. We discover >1,000 risk genes across 19 cell types, which account for 77% of the SNP-based heritability for severe disease. Genetic risk is particularly focused within natural killer (NK) cells and T cells, placing the dysfunction of these cells upstream of severe disease. Mendelian randomization and single-cell profiling of human NK cells support the role of NK cells and further localize genetic risk to CD56bright NK cells, which are key cytokine producers during the innate immune response. Rare variant analysis confirms the enrichment of severe-disease-associated genetic variation within NK-cell risk genes. Our study provides insights into the pathogenesis of severe COVID-19 with potential therapeutic targets.

    View details for DOI 10.1016/j.cels.2022.05.007

    View details for PubMedID 35690068

  • metID: a R package for automatable compound annotation for LC-MS-based data. Bioinformatics (Oxford, England) Shen, X., Wu, S., Liang, L., Chen, S., Contrepois, K., Zhu, Z., Snyder, M. 2021


    SUMMARY: Accurate and efficient compound annotation is a long-standing challenge for LC-MS-based data (e.g., untargeted metabolomics and exposomics). Substantial efforts have been devoted to overcoming this obstacle, whereas current tools are limited by the sources of spectral information used (in-house and public databases) and are not automated and streamlined. Therefore, we developed metID, an R package that combines information from all major databases for comprehensive and streamlined compound annotation. metID is a flexible, simple, and powerful tool that can be installed on all platforms, allowing the compound annotation process to be fully automatic and reproducible. A detailed tutorial and a case study are provided in Supplementary Materials. AVAILABILITY AND IMPLEMENTATION: https://jaspershen.github.io/metID. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btab583

    View details for PubMedID 34432001

  • Metabolic Dynamics and Prediction of Gestational Age and Time to Delivery in Pregnant Women OBSTETRICAL & GYNECOLOGICAL SURVEY Liang, L., Rasmussen, M., Piening, B., Shen, X., Chen, S., Rost, H., Snyder, J. K., Tibshirani, R., Skotte, L., Lee, N. Y., Contrepois, K., Feenstra, B., Zackriah, H., Snyder, M., Melbye, M. 2020; 75 (11): 649–51
  • Metabolic Dynamics and Prediction of Gestational Age and Time to Delivery in Pregnant Women. Cell Liang, L., Rasmussen, M. H., Piening, B., Shen, X., Chen, S., Rost, H., Snyder, J. K., Tibshirani, R., Skotte, L., Lee, N. C., Contrepois, K., Feenstra, B., Zackriah, H., Snyder, M., Melbye, M. 2020; 181 (7): 1680


    Metabolism during pregnancy is a dynamic and precisely programmed process, the failure of which can bring devastating consequences to the mother and fetus. To define a high-resolution temporal profile of metabolites during healthy pregnancy, we analyzed the untargeted metabolome of 784 weekly blood samples from 30 pregnant women. Broad changes and a highly choreographed profile were revealed: 4,995 metabolic features (of 9,651 total), 460 annotated compounds (of 687 total), and 34 human metabolic pathways (of 48 total) were significantly changed during pregnancy. Using linear models, we built a metabolic clock with five metabolites that time gestational age in high accordance with ultrasound (R = 0.92). Furthermore, two to three metabolites can identify when labor occurs (time to delivery within two, four, and eight weeks, AUROC ≥ 0.85). Our study represents a weekly characterization of the human pregnancy metabolome, providing a high-resolution landscape for understanding pregnancy with potential clinical utilities.

    View details for DOI 10.1016/j.cell.2020.05.002

    View details for PubMedID 32589958

  • Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics NATURE COMMUNICATIONS Shen, X., Wang, R., Xiong, X., Yin, Y., Cai, Y., Ma, Z., Liu, N., Zhu, Z. 2019; 10: 1516


    Large-scale metabolite annotation is a challenge in liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics. Here, we develop a metabolic reaction network (MRN)-based recursive algorithm (MetDNA) that expands metabolite annotations without the need for a comprehensive standard spectral library. MetDNA is based on the rationale that seed metabolites and their reaction-paired neighbors tend to share structural similarities resulting in similar MS2 spectra. MetDNA characterizes initial seed metabolites using a small library of MS2 spectra, and utilizes their experimental MS2 spectra as surrogate spectra to annotate their reaction-paired neighbor metabolites, which subsequently serve as the basis for recursive analysis. Using different LC-MS platforms, data acquisition methods, and biological samples, we showcase the utility and versatility of MetDNA and demonstrate that about 2000 metabolites can cumulatively be annotated from one experiment. Our results demonstrate that MetDNA substantially expands metabolite annotation, enabling quantitative assessment of metabolic pathways and facilitating integrative multi-omics analysis.

    View details for DOI 10.1038/s41467-019-09550-x

    View details for Web of Science ID 000463171300005

    View details for PubMedID 30944337

    View details for PubMedCentralID PMC6447530

  • LipidIMMS Analyzer: integrating multi-dimensional information to support lipid identification in ion mobility-mass spectrometry based lipidomics BIOINFORMATICS Zhou, Z., Shen, X., Chen, X., Tu, J., Xiong, X., Zhu, Z. 2019; 35 (4): 698–700


    Ion mobility-mass spectrometry (IM-MS) has shown great application potential for lipidomics. However, IM-MS based lipidomics is significantly restricted by the available software for lipid structural identification. Here, we developed a software tool, namely, LipidIMMS Analyzer, to support the accurate identification of lipids in IM-MS. For the first time, the software incorporates a large-scale database covering over 260 000 lipids and four-dimensional structural information for each lipid [i.e. m/z, retention time (RT), collision cross-section (CCS) and MS/MS spectra]. Therefore, multi-dimensional information can be readily integrated to support lipid identifications, and significantly improve the coverage and confidence of identification. Currently, the software supports different IM-MS instruments and data acquisition approaches. The software is freely available at: http://imms.zhulab.cn/LipidIMMS/. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/bty661

    View details for Web of Science ID 000459316300028

    View details for PubMedID 30052780

  • Development of a Correlative Strategy To Discover Colorectal Tumor Tissue Derived Metabolite Biomarkers in Plasma Using Untargeted Metabolomics ANALYTICAL CHEMISTRY Wang, Z., Cui, B., Zhang, F., Yang, Y., Shen, X., Li, Z., Zhao, W., Zhang, Y., Deng, K., Rong, Z., Yang, K., Yu, X., Li, K., Han, P., Zhu, Z. 2019; 91 (3): 2401–8


    The metabolic profiling of biofluids using untargeted metabolomics provides a promising choice to discover metabolite biomarkers for clinical cancer diagnosis. However, metabolite biomarkers discovered in biofluids may not necessarily reflect the pathological status of tumor tissue, which makes these biomarkers difficult to reproduce. In this study, we developed a new analysis strategy by integrating the univariate and multivariate correlation analysis approach to discover tumor tissue derived (TTD) metabolites in plasma samples. Specifically, untargeted metabolomics was first used to profile a set of paired tissue and plasma samples from 34 colorectal cancer (CRC) patients. Next, univariate correlation analysis was used to select correlative metabolite pairs between tissue and plasma, and a random forest regression model was utilized to define 243 TTD metabolites in plasma samples. The TTD metabolites in CRC plasma were demonstrated to accurately reflect the pathological status of tumor tissue and have great potential for metabolite biomarker discovery. Accordingly, we conducted a clinical study using a set of 146 plasma samples from CRC patients and gender-matched polyp controls to discover metabolite biomarkers from TTD metabolites. As a result, eight metabolites were selected as potential biomarkers for CRC diagnosis with high sensitivity and specificity. For CRC patients after surgery, the survival risk score defined by metabolite biomarkers also performed well in predicting overall survival time (p = 0.022) and progression-free survival time (p = 0.002). In conclusion, we developed a new analysis strategy which effectively discovers tumor tissue related metabolite biomarkers in plasma for cancer diagnosis and prognosis.

    View details for DOI 10.1021/acs.analchem.8b05177

    View details for Web of Science ID 000458220300103

    View details for PubMedID 30580524

  • MetFlow: An Interactive and Integrated Workflow for Metabolomics Data Cleaning and Differential Metabolite Discovery Bioinformatics Shen, X., Zhu, Z. 2019
  • LipidCCS: Prediction of Collision Cross-Section Values for Lipids with High Precision To Support Ion Mobility-Mass Spectrometry-Based Lipidomics ANALYTICAL CHEMISTRY Zhou, Z., Tu, J., Xiong, X., Shen, X., Zhu, Z. 2017; 89 (17): 9559–66


    The use of collision cross-section (CCS) values derived from ion mobility-mass spectrometry (IM-MS) has been proven to facilitate lipid identifications. Its utility is restricted by the limited availability of CCS values. Recently, machine-learning algorithm-based prediction (e.g., MetCCS) has been reported to generate CCS values on a large scale. However, the prediction precision is not sufficient to differentiate lipids due to their high structural similarities and subtle differences in CCS values. To address this challenge, we developed a new approach, namely, LipidCCS, to precisely predict lipid CCS values. In LipidCCS, a set of molecular descriptors were optimized using bioinformatic approaches to comprehensively describe the subtle structure differences for lipids. The use of optimized molecular descriptors together with a large set of standard CCS values for lipids (458 in total) to build the prediction model significantly improved the precision. The prediction precision of LipidCCS was externally validated with median relative errors (MRE) of ∼1% using independent data sets across different instruments (Agilent DTIM-MS and Waters TWIM-MS) and laboratories. We also demonstrated that the improved precision in the predicted LipidCCS database (15 646 lipids and 63 434 CCS values in total) could effectively reduce false-positive identifications of lipids. Common users can freely access our LipidCCS web server for the following: (1) the prediction of lipid CCS values directly from SMILES structure; (2) database search; and (3) lipid match and identification. We believe LipidCCS will be a valuable tool to support IM-MS-based lipidomics. The web server is freely available on the Internet (http://www.metabolomics-shanghai.org/LipidCCS/).

    View details for DOI 10.1021/acs.analchem.7b02625

    View details for Web of Science ID 000410014900133

    View details for PubMedID 28764323

  • Large-Scale Prediction of Collision Cross-Section Values for Metabolites in Ion Mobility-Mass Spectrometry ANALYTICAL CHEMISTRY Zhou, Z., Shen, X., Tu, J., Zhu, Z. 2016; 88 (22): 11084–91


    The rapid development of metabolomics has significantly advanced health and disease related research. However, metabolite identification remains a major analytical challenge for untargeted metabolomics. While the use of collision cross-section (CCS) values obtained in ion mobility-mass spectrometry (IM-MS) effectively increases identification confidence of metabolites, it is restricted by the limited number of available CCS values for metabolites. Here, we demonstrated the use of a machine-learning algorithm called support vector regression (SVR) to develop a prediction method that utilized 14 common molecular descriptors to predict CCS values for metabolites. In this work, we first experimentally measured CCS values (ΩN2) of ∼400 metabolites in nitrogen buffer gas and used these values as training data to optimize the prediction method. The high prediction precision of this method was externally validated using an independent set of metabolites with a median relative error (MRE) of ∼3%, better than conventional theoretical calculation. Using the SVR based prediction method, a large-scale predicted CCS database was generated for 35 203 metabolites in the Human Metabolome Database (HMDB). For each metabolite, five different ion adducts in positive and negative modes were predicted, accounting for 176 015 CCS values in total. Finally, improved metabolite identification accuracy was demonstrated using real biological samples. Conclusively, our results proved that the SVR based prediction method can accurately predict nitrogen CCS values (ΩN2) of metabolites from molecular descriptors and effectively improve identification accuracy and efficiency in untargeted metabolomics. The predicted CCS database, namely, MetCCS, is freely available on the Internet.

    View details for DOI 10.1021/acs.analchem.6b03091

    View details for Web of Science ID 000388154700045

    View details for PubMedID 27768289

  • Serum metabolomics for early diagnosis of esophageal squamous cell carcinoma by UHPLC-QTOF/MS METABOLOMICS Wang, J., Zhang, T., Shen, X., Liu, J., Zhao, D., Sun, Y., Wang, L., Liu, Y., Gong, X., Liu, Y., Zhu, Z., Xue, F. 2016; 12 (7)
  • Normalization and integration of large-scale metabolomics data using support vector regression METABOLOMICS Shen, X., Gong, X., Cai, Y., Guo, Y., Tu, J., Li, H., Zhang, T., Wang, J., Xue, F., Zhu, Z. 2016; 12 (5)