My research focuses on computational and systems biology. My primary interest lies in developing new computational algorithms and statistical methods for analyzing complex data from biological systems, especially large-scale single-cell RNA sequencing data. The specific topics I have examined include:
1. Integration of single-cell multi-omics datasets for tumor
2. Statistical test of cell developmental trajectories
3. Visualization and reconstruction of single-cell RNA sequencing data
4. Computational analysis of the bifurcating event revealed by dynamical network biomarker methods

Honors & Awards

  • China Scholarship of State Scholarship Fund, China Scholarship Council (2019)
  • Chinese Academy of Sciences President Awards, Chinese Academy of Sciences Academy of Mathematics and Systems Science (2019)
  • Jiaqing Zhong Excellent Paper Award, Chinese Society of Probability and Statistics (2019)
  • National Scholarship for Doctoral Students, Chinese Academy of Sciences (2019)
  • Travel Grants, The 20th International Conference on Systems Biology (2019)
  • Merit Student of University of Chinese Academy of Sciences, Chinese Academy of Sciences (2015)
  • Third-class Comprehensive scholarship, Sichuan University (2014)
  • Second-class Individual scholarship, Sichuan University (2011)
  • Cyrus Tang Scholarship, Sichuan University (2010)

Professional Education

  • Ph.D., Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Systems Science (2021)
  • B.S., Sichuan University, Statistics (2015)
  • B.S., Sichuan University, Economics (2015)

All Publications

  • Unsupervised topological alignment for single-cell multi-omics integration Cao, K., Bai, X., Hong, Y., Wan, L. OXFORD UNIV PRESS. 2020: 48-56


    Single-cell multi-omics data provide a comprehensive molecular view of cells. However, single-cell multi-omics datasets consist of unpaired cells measured with distinct unmatched features across modalities, making data integration challenging. In this study, we present a novel algorithm, termed UnionCom, for the unsupervised topological alignment of single-cell multi-omics integration. UnionCom does not require any correspondence information, either among cells or among features. It first embeds the intrinsic low-dimensional structure of each single-cell dataset into a distance matrix of cells within the same dataset and then aligns the cells across single-cell multi-omics datasets by matching the distance matrices via a matrix optimization method. Finally, it projects the distinct unmatched features across single-cell datasets into a common embedding space for feature comparability of the aligned cells. To match the complex non-linear geometrical distorted low-dimensional structures across datasets, UnionCom proposes and adjusts a global scaling parameter on distance matrices for aligning similar topological structures. It does not require one-to-one correspondence among cells across datasets, and it can accommodate samples with dataset-specific cell types. UnionCom outperforms state-of-the-art methods on both simulated and real single-cell multi-omics datasets. UnionCom is robust to parameter choices, as well as subsampling of features. UnionCom software is available online; supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btaa443

    View details for Web of Science ID 000579894600007

    View details for PubMedID 32657382

    View details for PubMedCentralID PMC7355262

  • A Branch Point on Differentiation Trajectory is the Bifurcating Event Revealed by Dynamical Network Biomarker Analysis of Single-Cell Data IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS Chen, Z., Bai, X., Ma, L., Wang, X., Liu, X., Liu, Y., Chen, L., Wan, L. 2020; 17 (2): 366-375


    Advances in single-cell profiling technologies and in computational algorithms provide the opportunity to reconstruct pseudotemporal trajectories of cellular development with branch points. Meanwhile, theories such as dynamical network biomarker (DNB) theory have recently been proposed to characterize the pre-transition state in biological systems. Few studies have validated whether the branch point identified in pseudotime is a critical point of the dynamical system. In this study, we investigate the dynamical behavior of the branch point on the pseudo trajectory. We study the pseudotemporal trajectories reconstructed by the Wishbone and diffusion pseudotime (DPT) algorithms, as well as a simulated trajectory. DNB theory is applied to justify the bifurcating event on the pseudo trajectories. Our results demonstrate that the branch point recovered by the Wishbone and DPT algorithms is confirmed by DNB theory as a transition state in the cell differentiation process. Furthermore, we show that an appropriate DNB group amplifies the comprehensive index of the critical event as defined in DNB theory. Our study provides biological insight into pseudo trajectories with branch points from a dynamical view and indicates that DNB theory may serve as a benchmark for checking the validity of a branch point.

    View details for DOI 10.1109/TCBB.2018.2847690

    View details for Web of Science ID 000524236800001

    View details for PubMedID 29994127

  • Joint Inference of Clonal Structure using Single-cell Genome and Transcriptome Sequencing Data Bai, X., Duren, Z., Wan, L., Xia, L. In progress. 2020
  • Statistical test of structured continuous trees based on discordance matrix BIOINFORMATICS Bai, X., Ma, L., Wan, L. 2019; 35 (23): 4962-4970


    Cell fate determination is a continuous process in which one cell type diversifies to other cell types following a hierarchical path. Advancements in single-cell technologies provide the opportunity to reveal the continuum of cell progression which forms a structured continuous tree (SCTree). Computational algorithms, which are usually based on a priori assumptions on the hidden structures, have previously been proposed as a means of recovering the pseudo trajectory along the cell differentiation process. However, there is still a lack of a statistical framework for assessing the intrinsic structure embedded in high-dimensional gene expression profiles. Moreover, the inherent noise and cell-to-cell variation underlying single-cell data pose grand challenges to testing even basic structures, such as linear versus bifurcation. In this study, we propose an adaptive statistical framework, termed SCTree, to test the intrinsic structure of a high-dimensional single-cell dataset. The SCTree test is conducted based on tools derived from metric geometry and random matrix theory. In brief, by extending the Gromov-Farris transform and utilizing the semicircular law, we formulate the continuous tree structure testing problem as a signal matrix detection problem. We show that the SCTree test is most powerful when the signal-to-noise ratio exceeds a moderate value. We also demonstrate that SCTree is able to robustly detect linear, single and multiple branching events with simulated datasets and real scRNA-seq datasets. Overall, the SCTree test provides a unified statistical assessment of the significance of the hidden structure of single-cell data. SCTree software is available online; supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btz425

    View details for Web of Science ID 000506808900013

    View details for PubMedID 31116393

  • DensityPath: an algorithm to visualize and reconstruct cell state-transition path on density landscape for single-cell RNA sequencing data BIOINFORMATICS Chen, Z., An, S., Bai, X., Gong, F., Ma, L., Wan, L. 2019; 35 (15): 2593-2601


    Visualizing and reconstructing cell developmental trajectories intrinsically embedded in high-dimensional expression profiles of single-cell RNA sequencing (scRNA-seq) snapshot data are computationally intriguing, but challenging. We propose DensityPath, an algorithm allowing (i) visualization of the intrinsic structure of scRNA-seq data on an embedded 2-d space and (ii) reconstruction of an optimal cell state-transition path on the density landscape. DensityPath powerfully handles high dimensionality and heterogeneity of scRNA-seq data by (i) revealing the intrinsic structures of data, while adopting a non-linear dimension reduction algorithm, termed elastic embedding, which can preserve both local and global structures of the data; and (ii) extracting the topological features of high-density, level-set clusters from a single-cell multimodal density landscape of transcriptional heterogeneity, as the representative cell states. DensityPath reconstructs the optimal cell state-transition path by finding the geodesic minimum spanning tree of representative cell states on the density landscape, establishing a least action path with the minimum-transition-energy of cell fate decisions. We demonstrate that DensityPath can ably reconstruct complex trajectories of cell development, e.g. those with multiple bifurcating and trifurcating branches, while maintaining computational efficiency. Moreover, DensityPath has high accuracy for pseudotime calculation and branch assignment on real scRNA-seq, as well as simulated datasets. DensityPath is robust to parameter choices, as well as permutations of data. DensityPath software is available online; supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/bty1009

    View details for Web of Science ID 000484378200009

    View details for PubMedID 30535348