All Publications

  • Multi-omics microsampling for the profiling of lifestyle-associated changes in health. Nature biomedical engineering Shen, X., Kellogg, R., Panyard, D. J., Bararpour, N., Castillo, K. E., Lee-McMullen, B., Delfarah, A., Ubellacker, J., Ahadi, S., Rosenberg-Hasson, Y., Ganz, A., Contrepois, K., Michael, B., Simms, I., Wang, C., Hornburg, D., Snyder, M. P. 2023


    Current healthcare practices are reactive and use limited physiological and clinical information, often collected months or years apart. Moreover, the discovery and profiling of blood biomarkers in clinical and research settings are constrained by geographical barriers, the cost and inconvenience of in-clinic venepuncture, low sampling frequency and the low depth of molecular measurements. Here we describe a strategy for the frequent capture and analysis of thousands of metabolites, lipids, cytokines and proteins in 10 μl of blood alongside physiological information from wearable sensors. We show the advantages of such frequent and dense multi-omics microsampling in two applications: the assessment of the reactions to a complex mixture of dietary interventions, to discover individualized inflammatory and metabolic responses; and deep individualized profiling, to reveal large-scale molecular fluctuations as well as thousands of molecular relationships associated with intra-day physiological variations (in heart rate, for example) and with the levels of clinical biomarkers (specifically, glucose and cortisol) and of physical activity. Combining wearables and multi-omics microsampling for frequent and scalable omics may facilitate dynamic health profiling and biomarker discovery.

    View details for DOI 10.1038/s41551-022-00999-8

    View details for PubMedID 36658343

  • Semi-supervised Cooperative Learning for Multiomics Data Fusion Ding, D., Shen, X., Snyder, M., Tibshirani, R., Maier, A. K., Schnabel, J. A., Tiwari, P., Stegle, O. SPRINGER INTERNATIONAL PUBLISHING AG. 2024: 54-63
  • Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP). Nature cell biology Jain, S., Pei, L., Spraggins, J. M., Angelo, M., Carson, J. P., Gehlenborg, N., Ginty, F., Gonçalves, J. P., Hagood, J. S., Hickey, J. W., Kelleher, N. L., Laurent, L. C., Lin, S., Lin, Y., Liu, H., Naba, A., Nakayasu, E. S., Qian, W. J., Radtke, A., Robson, P., Stockwell, B. R., Van de Plas, R., Vlachos, I. S., Zhou, M., Börner, K., Snyder, M. P. 2023


    The Human BioMolecular Atlas Program (HuBMAP) aims to create a multi-scale spatial atlas of the healthy human body at single-cell resolution by applying advanced technologies and disseminating resources to the community. As the HuBMAP moves past its first phase, creating ontologies, protocols and pipelines, this Perspective introduces the production phase: the generation of reference spatial maps of functional tissue units across many organs from diverse populations and the creation of mapping tools and infrastructure to advance biomedical research.

    View details for DOI 10.1038/s41556-023-01194-w

    View details for PubMedID 37468756

    View details for PubMedCentralID 8238499

  • Organism-wide, cell-type-specific secretome mapping of exercise training in mice. Cell metabolism Wei, W., Riley, N. M., Lyu, X., Shen, X., Guo, J., Raun, S. H., Zhao, M., Moya-Garzon, M. D., Basu, H., Sheng-Hwa Tung, A., Li, V. L., Huang, W., Wiggenhorn, A. L., Svensson, K. J., Snyder, M. P., Bertozzi, C. R., Long, J. Z. 2023


    There is significant interest in identifying blood-borne factors that mediate tissue crosstalk and function as molecular effectors of physical activity. Although past studies have focused on an individual molecule or cell type, the organism-wide secretome response to physical activity has not been evaluated. Here, we use a cell-type-specific proteomic approach to generate a 21-cell-type, 10-tissue map of exercise training-regulated secretomes in mice. Our dataset identifies >200 exercise training-regulated cell-type-secreted protein pairs, the majority of which have not been previously reported. Pdgfra-cre-labeled secretomes were the most responsive to exercise training. Finally, we show anti-obesity, anti-diabetic, and exercise performance-enhancing activities for proteoforms of intracellular carboxylesterases whose secretion from the liver is induced by exercise training.

    View details for DOI 10.1016/j.cmet.2023.04.011

    View details for PubMedID 37141889

  • Deep learning-based pseudo-mass spectrometry imaging analysis for precision medicine. Briefings in bioinformatics Shen, X., Shao, W., Wang, C., Liang, L., Chen, S., Zhang, S., Rusu, M., Snyder, M. P. 2022


    Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics provides systematic profiling of metabolism. Yet, its applications in precision medicine (disease diagnosis) have been limited by several challenges, including metabolite identification, information loss and low reproducibility. Here, we present the deep-learning-based Pseudo-Mass Spectrometry Imaging (deepPseudoMSI) project, which converts LC-MS raw data to pseudo-MS images and then processes them by deep learning for precision medicine, such as disease diagnosis. Extensive tests based on real data demonstrated the superiority of deepPseudoMSI over traditional approaches and the capacity of our method to achieve an accurate individualized diagnosis. Our framework lays the foundation for future metabolic-based precision medicine.

    View details for DOI 10.1093/bib/bbac331

    View details for PubMedID 35947990
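
    The step of turning LC-MS data into an image can be sketched by binning peaks on a retention-time by m/z grid and summing the intensities that fall into each pixel. This is not the deepPseudoMSI implementation. The peak-list input, the function name to_pseudo_image, the bin counts and the summing rule are all illustrative assumptions, and the deep-learning stage is omitted.

```python
from typing import List, Tuple

# Hypothetical peak representation: (retention time in s, m/z, intensity).
Peak = Tuple[float, float, float]


def to_pseudo_image(
    peaks: List[Peak],
    rt_range: Tuple[float, float],
    mz_range: Tuple[float, float],
    n_rt_bins: int,
    n_mz_bins: int,
) -> List[List[float]]:
    """Rasterize peaks into a grid: rows are m/z bins, columns are RT bins."""
    rt_min, rt_max = rt_range
    mz_min, mz_max = mz_range
    image = [[0.0] * n_rt_bins for _ in range(n_mz_bins)]
    for rt, mz, intensity in peaks:
        # Drop peaks outside the chosen acquisition window.
        if not (rt_min <= rt < rt_max and mz_min <= mz < mz_max):
            continue
        # Clamp with min() in case floating-point rounding lands exactly on the upper edge.
        col = min(int((rt - rt_min) / (rt_max - rt_min) * n_rt_bins), n_rt_bins - 1)
        row = min(int((mz - mz_min) / (mz_max - mz_min) * n_mz_bins), n_mz_bins - 1)
        # Peaks that share a pixel have their intensities summed.
        image[row][col] += intensity
    return image
```

    In this sketch, nearby peaks merge into one pixel, which trades m/z and RT resolution for a fixed-size input that an image model can consume.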

  • massDatabase: utilities for the operation of the public compound and pathway database. Bioinformatics (Oxford, England) Shen, X., Wang, C., Snyder, M. P. 2022


    SUMMARY: One of the major challenges in LC-MS data analysis is converting many metabolic feature entries to biological function information, such as metabolite annotation and pathway enrichment, which are based on the compound and pathway databases. Multiple online databases have been developed. However, no tool has been developed for operating all these databases for biological analysis. Therefore, we developed massDatabase, an R package that operates the online public databases and combines with other tools for streamlined compound annotation and pathway enrichment. massDatabase is a flexible, simple, and powerful tool that can be installed on all platforms, allowing the users to leverage all the online public databases for biological function mining. A detailed tutorial and a case study are provided in the Supplementary Materials. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btac546

    View details for PubMedID 35944213

  • TidyMass an object-oriented reproducible analysis framework for LC-MS data. Nature communications Shen, X., Yan, H., Wang, C., Gao, P., Johnson, C. H., Snyder, M. P. 2022; 13 (1): 4365


    Reproducibility, traceability, and transparency have been long-standing issues for metabolomics data analysis. Multiple tools have been developed, but limitations still exist. Here, we present the tidyMass project, a comprehensive R-based computational framework that meets the traceable, shareable, and reproducible workflow needs of data processing and analysis for LC-MS-based untargeted metabolomics. TidyMass is an ecosystem of R packages that share an underlying design philosophy, grammar, and data structure, which provides a comprehensive, reproducible, and object-oriented computational framework. The modular architecture makes tidyMass a highly flexible and extensible tool, which other users can improve and integrate with other tools to customize their own pipeline.

    View details for DOI 10.1038/s41467-022-32155-w

    View details for PubMedID 35902589
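
    One way an object-oriented design can support traceability is for every processing step to return a new object that carries a log of the steps applied so far. tidyMass itself is written in R, and its actual classes are not described here. The Python class MassDataset and its apply method below are a hypothetical analogue, not the tidyMass API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Table = Dict[str, List[float]]  # feature id -> intensity in each sample


@dataclass(frozen=True)
class MassDataset:
    """Hypothetical container: an intensity table, sample ids and a processing history."""
    intensities: Table
    sample_ids: List[str]
    history: List[str] = field(default_factory=list)

    def apply(self, step_name: str, fn: Callable[[Table], Table]) -> "MassDataset":
        """Return a NEW dataset with fn applied; the original object is left unchanged."""
        return MassDataset(
            intensities=fn(self.intensities),
            sample_ids=list(self.sample_ids),
            history=self.history + [step_name],  # record the step for traceability
        )
```

    Usage: `MassDataset({...}, ["s1", "s2"]).apply("drop_all_zero", lambda t: {k: v for k, v in t.items() if any(v)})` yields a dataset whose `history` is `["drop_all_zero"]`. Because objects are never modified in place, each intermediate state can be kept and shared alongside the log of how it was produced.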

  • Precision environmental health monitoring by longitudinal exposome and multi-omics profiling. Genome research Gao, P., Shen, X., Zhang, X., Jiang, C., Zhang, S., Zhou, X., Schüssler-Fiorenza Rose, S. M., Snyder, M. 2022


    Conventional environmental health studies have primarily focused on limited environmental stressors at the population level, which lacks the power to dissect the complexity and heterogeneity of individualized environmental exposures. Here, as a pilot case study, we integrated deep-profiled longitudinal personal exposome and internal multi-omics to systematically investigate how the exposome shapes a single individual's phenome. We annotated thousands of chemical and biological components in the personal exposome cloud and found they were significantly correlated with thousands of internal biomolecules, which was further cross-validated using corresponding clinical data. Our results showed that agrochemicals and fungi predominated in the highly diverse and dynamic personal exposome, and the biomolecules and pathways related to the individual's immune system, kidney, and liver were highly associated with the personal external exposome. Overall, this data-driven longitudinal monitoring study shows the potential dynamic interactions between the personal exposome and internal multi-omics, as well as the impact of the exposome on precision health by producing abundant testable hypotheses.

    View details for DOI 10.1101/gr.276521.121

    View details for PubMedID 35667843

  • Multiomic analysis reveals cell-type-specific molecular determinants of COVID-19 severity. Cell systems Zhang, S., Cooper-Knock, J., Weimer, A. K., Shi, M., Kozhaya, L., Unutmaz, D., Harvey, C., Julian, T. H., Furini, S., Frullanti, E., Fava, F., Renieri, A., Gao, P., Shen, X., Timpanaro, I. S., Kenna, K. P., Baillie, J. K., Davis, M. M., Tsao, P. S., Snyder, M. P. 2022


    The determinants of severe COVID-19 in healthy adults are poorly understood, which limits the opportunity for early intervention. We present a multiomic analysis using machine learning to characterize the genomic basis of COVID-19 severity. We use single-cell multiome profiling of human lungs to link genetic signals to cell-type-specific functions. We discover >1,000 risk genes across 19 cell types, which account for 77% of the SNP-based heritability for severe disease. Genetic risk is particularly focused within natural killer (NK) cells and T cells, placing the dysfunction of these cells upstream of severe disease. Mendelian randomization and single-cell profiling of human NK cells support the role of NK cells and further localize genetic risk to CD56bright NK cells, which are key cytokine producers during the innate immune response. Rare variant analysis confirms the enrichment of severe-disease-associated genetic variation within NK-cell risk genes. Our study provides insights into the pathogenesis of severe COVID-19 with potential therapeutic targets.

    View details for DOI 10.1016/j.cels.2022.05.007

    View details for PubMedID 35690068

  • metID: an R package for automatable compound annotation for LC-MS-based data. Bioinformatics (Oxford, England) Shen, X., Wu, S., Liang, L., Chen, S., Contrepois, K., Zhu, Z., Snyder, M. 2021


    SUMMARY: Accurate and efficient compound annotation is a long-standing challenge for LC-MS-based data (e.g., untargeted metabolomics and exposomics). Substantial efforts have been devoted to overcoming this obstacle, whereas current tools are limited by the sources of spectral information used (in-house and public databases) and are not automated and streamlined. Therefore, we developed metID, an R package that combines information from all major databases for comprehensive and streamlined compound annotation. metID is a flexible, simple, and powerful tool that can be installed on all platforms, allowing the compound annotation process to be fully automatic and reproducible. A detailed tutorial and a case study are provided in Supplementary Materials. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btab583

    View details for PubMedID 34432001
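
    The first pass of database-driven annotation is commonly an MS1 match: a feature's observed m/z is compared against database m/z values within a parts-per-million (ppm) tolerance. The sketch below shows only that step. metID's internals are not described in this abstract, and the function names, the toy database and the 10 ppm default are illustrative assumptions. Retention-time and MS2 matching are not shown.

```python
from typing import Dict, List, Tuple


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass error in parts per million relative to the theoretical m/z."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_by_mz(
    observed_mz: float,
    database: Dict[str, float],  # compound name -> theoretical m/z of one adduct
    tolerance_ppm: float = 10.0,
) -> List[Tuple[str, float]]:
    """Return (compound, ppm error) candidates within tolerance, closest match first."""
    hits = []
    for name, theoretical_mz in database.items():
        err = ppm_error(observed_mz, theoretical_mz)
        if abs(err) <= tolerance_ppm:
            hits.append((name, err))
    return sorted(hits, key=lambda hit: abs(hit[1]))
```

    With a toy database `{"compound_A": 181.0707, "compound_B": 181.0720, "compound_C": 195.0863}`, an observed m/z of 181.0710 returns compound_A (about 1.7 ppm) followed by compound_B (about -5.5 ppm). Isobaric candidates like these are why additional evidence such as MS2 spectra is needed.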

  • Metabolic Dynamics and Prediction of Gestational Age and Time to Delivery in Pregnant Women OBSTETRICAL & GYNECOLOGICAL SURVEY Liang, L., Rasmussen, M., Piening, B., Shen, X., Chen, S., Rost, H., Snyder, J. K., Tibshirani, R., Skotte, L., Lee, N. Y., Contrepois, K., Feenstra, B., Zackriah, H., Snyder, M., Melbye, M. 2020; 75 (11): 649–51
  • Metabolic Dynamics and Prediction of Gestational Age and Time to Delivery in Pregnant Women. Cell Liang, L., Rasmussen, M. H., Piening, B., Shen, X., Chen, S., Rost, H., Snyder, J. K., Tibshirani, R., Skotte, L., Lee, N. C., Contrepois, K., Feenstra, B., Zackriah, H., Snyder, M., Melbye, M. 2020; 181 (7): 1680


    Metabolism during pregnancy is a dynamic and precisely programmed process, the failure of which can bring devastating consequences to the mother and fetus. To define a high-resolution temporal profile of metabolites during healthy pregnancy, we analyzed the untargeted metabolome of 784 weekly blood samples from 30 pregnant women. Broad changes and a highly choreographed profile were revealed: 4,995 metabolic features (of 9,651 total), 460 annotated compounds (of 687 total), and 34 human metabolic pathways (of 48 total) were significantly changed during pregnancy. Using linear models, we built a metabolic clock with five metabolites that time gestational age in high accordance with ultrasound (R = 0.92). Furthermore, two to three metabolites can identify when labor occurs (time to delivery within two, four, and eight weeks, AUROC ≥ 0.85). Our study represents a weekly characterization of the human pregnancy metabolome, providing a high-resolution landscape for understanding pregnancy with potential clinical utilities.

    View details for DOI 10.1016/j.cell.2020.05.002

    View details for PubMedID 32589958

  • Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics NATURE COMMUNICATIONS Shen, X., Wang, R., Xiong, X., Yin, Y., Cai, Y., Ma, Z., Liu, N., Zhu, Z. 2019; 10: 1516


    Large-scale metabolite annotation is a challenge in liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics. Here, we develop a metabolic reaction network (MRN)-based recursive algorithm (MetDNA) that expands metabolite annotations without the need for a comprehensive standard spectral library. MetDNA is based on the rationale that seed metabolites and their reaction-paired neighbors tend to share structural similarities resulting in similar MS2 spectra. MetDNA characterizes initial seed metabolites using a small library of MS2 spectra, and utilizes their experimental MS2 spectra as surrogate spectra to annotate their reaction-paired neighbor metabolites, which subsequently serve as the basis for recursive analysis. Using different LC-MS platforms, data acquisition methods, and biological samples, we showcase the utility and versatility of MetDNA and demonstrate that about 2000 metabolites can cumulatively be annotated from one experiment. Our results demonstrate that MetDNA substantially expands metabolite annotation, enabling quantitative assessment of metabolic pathways and facilitating integrative multi-omics analysis.

    View details for DOI 10.1038/s41467-019-09550-x

    View details for Web of Science ID 000463171300005

    View details for PubMedID 30944337

    View details for PubMedCentralID PMC6447530
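
    The recursive idea in the MetDNA abstract can be sketched as a breadth-first propagation. Annotated features use their experimental MS2 spectrum as a surrogate reference for candidate features of reaction-paired neighbor metabolites. Each newly annotated feature then seeds the next round. This is a simplified illustration, not MetDNA's algorithm. The data structures, the cosine score on pre-binned fragment m/z keys, and the 0.5 threshold are assumptions, and real use would add m/z, retention-time and other constraints.

```python
import math
from collections import deque
from typing import Dict, List

Spectrum = Dict[float, float]  # pre-binned fragment m/z -> intensity


def cosine(a: Spectrum, b: Spectrum) -> float:
    """Cosine similarity of two MS2 spectra matched on identical fragment keys."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recursive_annotate(
    seeds: Dict[str, str],                     # feature id -> metabolite from a small library
    reaction_neighbors: Dict[str, List[str]],  # metabolite -> reaction-paired metabolites
    candidates: Dict[str, List[str]],          # metabolite -> unannotated features matching its m/z
    spectra: Dict[str, Spectrum],              # feature id -> experimental MS2 spectrum
    min_similarity: float = 0.5,
) -> Dict[str, str]:
    annotations = dict(seeds)
    queue = deque(seeds.items())
    while queue:
        feature, metabolite = queue.popleft()
        for neighbor in reaction_neighbors.get(metabolite, []):
            for cand in candidates.get(neighbor, []):
                if cand in annotations:
                    continue
                # The annotated feature's spectrum serves as the surrogate reference.
                if cosine(spectra[feature], spectra[cand]) >= min_similarity:
                    annotations[cand] = neighbor
                    queue.append((cand, neighbor))  # recurse from the new annotation
    return annotations
```

    In this sketch a feature is annotated at most once (first match wins), which keeps the propagation finite even when the reaction network contains cycles.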

  • LipidIMMS Analyzer: integrating multi-dimensional information to support lipid identification in ion mobility-mass spectrometry based lipidomics BIOINFORMATICS Zhou, Z., Shen, X., Chen, X., Tu, J., Xiong, X., Zhu, Z. 2019; 35 (4): 698–700


    Ion mobility-mass spectrometry (IM-MS) has shown great application potential for lipidomics. However, IM-MS based lipidomics is significantly restricted by the available software for lipid structural identification. Here, we developed a software tool, namely, LipidIMMS Analyzer, to support the accurate identification of lipids in IM-MS. For the first time, the software incorporates a large-scale database covering over 260 000 lipids and four-dimensional structural information for each lipid [i.e. m/z, retention time (RT), collision cross-section (CCS) and MS/MS spectra]. Therefore, multi-dimensional information can be readily integrated to support lipid identifications, and significantly improve the coverage and confidence of identification. Currently, the software supports different IM-MS instruments and data acquisition approaches. The software is freely available online. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/bty661

    View details for Web of Science ID 000459316300028

    View details for PubMedID 30052780

  • Development of a Correlative Strategy To Discover Colorectal Tumor Tissue Derived Metabolite Biomarkers in Plasma Using Untargeted Metabolomics ANALYTICAL CHEMISTRY Wang, Z., Cui, B., Zhang, F., Yang, Y., Shen, X., Li, Z., Zhao, W., Zhang, Y., Deng, K., Rong, Z., Yang, K., Yu, X., Li, K., Han, P., Zhu, Z. 2019; 91 (3): 2401–8


    The metabolic profiling of biofluids using untargeted metabolomics provides a promising choice to discover metabolite biomarkers for clinical cancer diagnosis. However, metabolite biomarkers discovered in biofluids may not necessarily reflect the pathological status of tumor tissue, which makes these biomarkers difficult to reproduce. In this study, we developed a new analysis strategy by integrating the univariate and multivariate correlation analysis approach to discover tumor tissue derived (TTD) metabolites in plasma samples. Specifically, untargeted metabolomics was first used to profile a set of paired tissue and plasma samples from 34 colorectal cancer (CRC) patients. Next, univariate correlation analysis was used to select correlative metabolite pairs between tissue and plasma, and a random forest regression model was utilized to define 243 TTD metabolites in plasma samples. The TTD metabolites in CRC plasma were demonstrated to accurately reflect the pathological status of tumor tissue and have great potential for metabolite biomarker discovery. Accordingly, we conducted a clinical study using a set of 146 plasma samples from CRC patients and gender-matched polyp controls to discover metabolite biomarkers from TTD metabolites. As a result, eight metabolites were selected as potential biomarkers for CRC diagnosis with high sensitivity and specificity. For CRC patients after surgery, the survival risk score defined by metabolite biomarkers also performed well in predicting overall survival time (p = 0.022) and progression-free survival time (p = 0.002). In conclusion, we developed a new analysis strategy which effectively discovers tumor tissue related metabolite biomarkers in plasma for cancer diagnosis and prognosis.

    View details for DOI 10.1021/acs.analchem.8b05177

    View details for Web of Science ID 000458220300103

    View details for PubMedID 30580524

  • MetFlow: An Interactive and Integrated Workflow for Metabolomics Data Cleaning and Differential Metabolite Discovery Bioinformatics Shen, X., Zhu, Z. 2019
  • LipidCCS: Prediction of Collision Cross-Section Values for Lipids with High Precision To Support Ion Mobility-Mass Spectrometry-Based Lipidomics ANALYTICAL CHEMISTRY Zhou, Z., Tu, J., Xiong, X., Shen, X., Zhu, Z. 2017; 89 (17): 9559–66


    The use of collision cross-section (CCS) values derived from ion mobility-mass spectrometry (IM-MS) has been proven to facilitate lipid identifications. Its utility is restricted by the limited availability of CCS values. Recently, machine-learning-based prediction (e.g., MetCCS) has been reported to generate CCS values on a large scale. However, the prediction precision is not sufficient to differentiate lipids due to their high structural similarities and subtle differences in CCS values. To address this challenge, we developed a new approach, namely, LipidCCS, to precisely predict lipid CCS values. In LipidCCS, a set of molecular descriptors was optimized using bioinformatic approaches to comprehensively describe the subtle structure differences for lipids. The use of optimized molecular descriptors together with a large set of standard CCS values for lipids (458 in total) to build the prediction model significantly improved the precision. The prediction precision of LipidCCS was externally validated with median relative errors (MRE) of ∼1% using independent data sets across different instruments (Agilent DTIM-MS and Waters TWIM-MS) and laboratories. We also demonstrated that the improved precision in the predicted LipidCCS database (15 646 lipids and 63 434 CCS values in total) could effectively reduce false-positive identifications of lipids. Common users can freely access our LipidCCS web server for the following: (1) the prediction of lipid CCS values directly from SMILES structure; (2) database search; and (3) lipid match and identification. We believe LipidCCS will be a valuable tool to support IM-MS-based lipidomics. The web server is freely available on the Internet.

    View details for DOI 10.1021/acs.analchem.7b02625

    View details for Web of Science ID 000410014900133

    View details for PubMedID 28764323
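
    The median relative error (MRE) quoted for LipidCCS (∼1%) and MetCCS (∼3%) summarizes prediction precision. The sketch below assumes the usual definition: the median over compounds of |predicted − measured| / measured, in percent. The papers' exact formula is not given in these abstracts, so treat the definition and the function name as assumptions.

```python
import statistics
from typing import Sequence


def median_relative_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Median of |predicted - measured| / measured over all compounds, in percent."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return statistics.median(abs(p - m) / m * 100 for p, m in zip(predicted, measured))
```

    For predicted CCS values of 101, 198 and 305 against measured values of 100, 200 and 300, the per-compound errors are 1%, 1% and about 1.67%, giving an MRE of 1%. The median is less sensitive than the mean to a few badly predicted outliers.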

  • Large-Scale Prediction of Collision Cross-Section Values for Metabolites in Ion Mobility-Mass Spectrometry ANALYTICAL CHEMISTRY Zhou, Z., Shen, X., Tu, J., Zhu, Z. 2016; 88 (22): 11084–91


    The rapid development of metabolomics has significantly advanced health and disease related research. However, metabolite identification remains a major analytical challenge for untargeted metabolomics. While the use of collision cross-section (CCS) values obtained in ion mobility-mass spectrometry (IM-MS) effectively increases identification confidence of metabolites, it is restricted by the limited number of available CCS values for metabolites. Here, we demonstrated the use of a machine-learning algorithm called support vector regression (SVR) to develop a prediction method that utilized 14 common molecular descriptors to predict CCS values for metabolites. In this work, we first experimentally measured CCS values (ΩN2) of ∼400 metabolites in nitrogen buffer gas and used these values as training data to optimize the prediction method. The high prediction precision of this method was externally validated using an independent set of metabolites with a median relative error (MRE) of ∼3%, better than conventional theoretical calculation. Using the SVR based prediction method, a large-scale predicted CCS database was generated for 35 203 metabolites in the Human Metabolome Database (HMDB). For each metabolite, five different ion adducts in positive and negative modes were predicted, accounting for 176 015 CCS values in total. Finally, improved metabolite identification accuracy was demonstrated using real biological samples. In conclusion, our results showed that the SVR based prediction method can accurately predict nitrogen CCS values (ΩN2) of metabolites from molecular descriptors and effectively improve identification accuracy and efficiency in untargeted metabolomics. The predicted CCS database, namely, MetCCS, is freely available on the Internet.

    View details for DOI 10.1021/acs.analchem.6b03091

    View details for Web of Science ID 000388154700045

    View details for PubMedID 27768289
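
    One way a predicted CCS database can raise identification confidence is as an extra filter. Candidates that already match a feature by m/z are discarded if their predicted CCS deviates too far from the measured value. The sketch below is illustrative only. The function name is hypothetical, and the 3% default is an assumption chosen to mirror the reported ∼3% MRE, not a threshold stated in the paper.

```python
from typing import Dict, List


def filter_by_ccs(
    measured_ccs: float,
    candidates: Dict[str, float],  # candidate metabolite -> predicted CCS (Å^2)
    tolerance_pct: float = 3.0,    # assumed tolerance
) -> List[str]:
    """Keep candidates whose predicted CCS lies within tolerance_pct of the measured CCS."""
    return sorted(
        name
        for name, predicted in candidates.items()
        if abs(predicted - measured_ccs) / measured_ccs * 100 <= tolerance_pct
    )
```

    For a measured CCS of 150.0 and candidates at 152.0, 160.0 and 146.0, only the first and last survive (about 1.3% and 2.7% deviation). A tighter tolerance, as LipidCCS's ∼1% precision would allow, removes more false positives.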

  • Serum metabolomics for early diagnosis of esophageal squamous cell carcinoma by UHPLC-QTOF/MS METABOLOMICS Wang, J., Zhang, T., Shen, X., Liu, J., Zhao, D., Sun, Y., Wang, L., Liu, Y., Gong, X., Liu, Y., Zhu, Z., Xue, F. 2016; 12 (7)
  • Normalization and integration of large-scale metabolomics data using support vector regression METABOLOMICS Shen, X., Gong, X., Cai, Y., Guo, Y., Tu, J., Li, H., Zhang, T., Wang, J., Xue, F., Zhu, Z. 2016; 12 (5)