All Publications


  • A spectral method for assessing and combining multiple data visualizations. NATURE COMMUNICATIONS Ma, R., Sun, E. D., Zou, J. 2023; 14 (1): 780

    Abstract

    Dimension reduction is an indispensable part of modern data science, and many algorithms have been developed. However, different algorithms have their own strengths and weaknesses, making it important to evaluate their relative performance, and to leverage and combine their individual strengths. This paper proposes a spectral method for assessing and combining multiple visualizations of a given dataset produced by diverse algorithms. The proposed method provides a quantitative measure, the visualization eigenscore, of the relative performance of the visualizations in preserving the structure around each data point. It also generates a consensus visualization that improves on the individual visualizations in capturing the underlying structure. Our approach is flexible and works as a wrapper around any visualizations. We analyze multiple real-world datasets to demonstrate the effectiveness of the method. We also provide theoretical justifications based on a general statistical framework, yielding several fundamental principles along with practical guidance.

    DOI: 10.1038/s41467-023-36492-2

    PubMedID: 36774377

    PubMedCentralID: PMC9922271

  • Dynamic visualization of high-dimensional data NATURE COMPUTATIONAL SCIENCE Sun, E. D., Ma, R., Zou, J. 2022
  • Statistical Inference for High-Dimensional Generalized Linear Models With Binary Outcomes JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Cai, T., Guo, Z., Ma, R. 2021
  • Optimal Permutation Recovery in Permuted Monotone Matrix Model JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Ma, R., Cai, T. T., Li, H. 2020
  • Global and Simultaneous Hypothesis Testing for High-Dimensional Logistic Regression Models JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION Ma, R., Cai, T., Li, H. 2021; 116 (534): 984-998

    Abstract

    High-dimensional logistic regression is widely used in analyzing data with binary outcomes. In this paper, global testing and large-scale multiple testing for the regression coefficients are considered in both single- and two-regression settings. A test statistic for testing the global null hypothesis is constructed using a generalized low-dimensional projection for bias correction and its asymptotic null distribution is derived. A lower bound for the global testing is established, which shows that the proposed test is asymptotically minimax optimal over some sparsity range. For testing the individual coefficients simultaneously, multiple testing procedures are proposed and shown to control the false discovery rate (FDR) and falsely discovered variables (FDV) asymptotically. Simulation studies are carried out to examine the numerical performance of the proposed tests and their superiority over existing methods. The testing procedures are also illustrated by analyzing a data set of a metabolomics study that investigates the association between fecal metabolites and pediatric Crohn's disease and the effects of treatment on such associations.

    DOI: 10.1080/01621459.2019.1699421

    Web of Science ID: 000508347000001

    PubMedID: 34421157

    PubMedCentralID: PMC8375316