Jiaqi Gu
Postdoctoral Scholar, Neurology and Neurological Sciences
Bio
I am a postdoctoral scholar in the Department of Neurology & Neurological Sciences at Stanford University, working under the supervision of Dr. Zihuai He. Before that, I obtained my PhD in statistics under the supervision of Prof. Philip L.H. Yu and Prof. Guosheng Yin at the University of Hong Kong, and my bachelor's degree in statistics from Renmin University of China.
My research concentrates on preference learning, network data modeling, quantitative analysis of survival and public health data, high-dimensional statistical inference with geometric information, and statistical genetics.
Honors & Awards
- Excellent Research Award, Department of Statistics and Actuarial Science, University of Hong Kong (2022)
- Excellent Research Award, Department of Statistics and Actuarial Science, University of Hong Kong (2021)
- Excellent Teaching Assistant Award, Department of Statistics and Actuarial Science, University of Hong Kong (2021)
- Honorable Mention, Interdisciplinary Contest in Modeling (2017)
- Runner-up, Beijing-Hong Kong Data Modeling Competition (2017)
- First Prize, Contemporary Undergraduate Mathematical Contest in Modeling (Beijing) (2016)
Professional Education
- Ph.D., University of Hong Kong, Statistics (2022)
- B.S., Renmin University of China, Statistics (2018)
All Publications
- Second-order group knockoffs with applications to GWAS.
Bioinformatics (Oxford, England)
2024
Abstract
Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants that influence traits of medical relevance. While conditional testing can be both more powerful and more precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct "group knockoffs." While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank. The described algorithms are implemented in an open-source Julia package, Knockoffs.jl. R and Python wrappers are available as the knockoffsr and knockoffspy packages. Supplementary data are available from Bioinformatics online.
DOI: 10.1093/bioinformatics/btae580
PubMedID: 39340798
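Background sketch (not from the paper; notation is mine): in the second-order, Gaussian model-X formulation, knockoff copies are built so that the original variables X and their knockoffs match first and second moments, with joint covariance

$$
\operatorname{Cov}\begin{pmatrix} X \\ \tilde X \end{pmatrix}
= \begin{pmatrix} \Sigma & \Sigma - S \\ \Sigma - S & \Sigma \end{pmatrix},
\qquad 0 \preceq S \preceq 2\Sigma ,
$$

where $\Sigma$ is the (approximated) correlation matrix of the genotypes. For single-variable knockoffs $S$ is diagonal; for group knockoffs $S$ is block-diagonal, with one block per group of correlated variants, and $S$ is then chosen, subject to the constraint above, to make the knockoffs as distinguishable from the originals as possible. Efficient algorithms and structured approximations of $\Sigma$ are what make this step feasible at GWAS scale.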
- Unit information Dirichlet process prior.
Biometrics
2024; 80 (3)
Abstract
Prior distributions, which represent one's belief in the distributions of unknown parameters before observing the data, impact Bayesian inference in a critical and fundamental way. With the ability to incorporate external information from expert opinions or historical datasets, the priors, if specified appropriately, can improve the statistical efficiency of Bayesian inference. In survival analysis, based on the concept of unit information (UI) under parametric models, we propose the unit information Dirichlet process (UIDP) as a new class of nonparametric priors for the underlying distribution of time-to-event data. By deriving the Fisher information in terms of the differential of the cumulative hazard function, the UIDP prior is formulated to match its prior UI with the weighted average of UI in historical datasets and thus can utilize both parametric and nonparametric information provided by historical datasets. With a Markov chain Monte Carlo algorithm, simulations and real data analysis demonstrate that the UIDP prior can adaptively borrow historical information and improve statistical efficiency in survival analysis.
DOI: 10.1093/biomtc/ujae091
PubMedID: 39248120
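Background sketch (generic Dirichlet process facts, not the paper's exact formulation): a DP prior on the event-time distribution, $F \sim \mathrm{DP}(c, F_0)$ with base distribution $F_0$ and precision $c > 0$, satisfies

$$
\big(F(A_1), \dots, F(A_k)\big) \sim \mathrm{Dirichlet}\big(c F_0(A_1), \dots, c F_0(A_k)\big),
\qquad \mathbb{E}\big[F(A)\big] = F_0(A),
$$

for any partition $A_1, \dots, A_k$ of the time axis, so the precision $c$ acts like a prior effective sample size. The unit-information idea described in the abstract amounts to calibrating this prior so that it carries roughly one observation's worth of Fisher information, computed through the cumulative hazard and matched to the weighted average per-observation information in the historical data.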
- Summary statistics knockoffs inference with family-wise error rate control.
Biometrics
2024; 80 (3)
Abstract
Testing multiple hypotheses of conditional independence with provable error rate control is a fundamental problem with various applications. To infer conditional independence with family-wise error rate (FWER) control when only summary statistics of marginal dependence are accessible, we adopt GhostKnockoff to directly generate knockoff copies of summary statistics and propose a new filter to select features conditionally dependent on the response. In addition, we develop a computationally efficient algorithm to greatly reduce the computational cost of knockoff copies generation without sacrificing power and FWER control. Experiments on simulated data and a real dataset of Alzheimer's disease genetics demonstrate the advantage of the proposed method over existing alternatives in both statistical power and computational efficiency.
DOI: 10.1093/biomtc/ujae082
PubMedID: 39222026
PubMedCentralID: PMC11367731
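Background sketch (my notation; see He et al. for the exact GhostKnockoff construction): with marginal Z-scores $Z$ and an LD (correlation) matrix $\Sigma$, a Gaussian model-X knockoff copy of the summary statistics can be drawn directly as

$$
\tilde Z = \big(I - S\Sigma^{-1}\big) Z + \varepsilon,
\qquad \varepsilon \sim N\big(0,\; 2S - S\Sigma^{-1} S\big),
$$

which mirrors the usual conditional distribution of Gaussian knockoffs, with $S$ the same matrix that appears in the joint knockoff covariance. The contributions described in the abstract sit on top of this step: a filter that selects features with FWER rather than FDR control, and a computationally cheaper way to generate the knockoff copies.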
- In silico identification of putative causal genetic variants.
bioRxiv: the preprint server for biology
2024
Abstract
Understanding the causal genetic architecture of complex phenotypes is essential for future research into disease mechanisms and potential therapies. Despite the widespread availability of genome-wide data, existing methods to analyze genetic data still primarily focus on marginal association models, which fall short of fully capturing the polygenic nature of complex traits and elucidating biological causal mechanisms. Here we present a computationally efficient causal inference framework for genome-wide detection of putative causal variants underlying genetic associations. Our approach utilizes summary statistics from potentially overlapping studies as input, constructs in silico knockoff copies of summary statistics as negative controls to attenuate confounding effects induced by linkage disequilibrium, and employs efficient ultrahigh-dimensional sparse regression to jointly model all genetic variants across the genome. Our method is computationally efficient, requiring less than 15 minutes on a single CPU to analyze genome-wide summary statistics. In applications to a meta-analysis of ten large-scale genetic studies of Alzheimer's disease (AD), we identified 82 loci associated with AD, including 37 additional loci missed by the conventional GWAS pipeline based on marginal association testing. The identified putative causal variants achieve state-of-the-art agreement with massively parallel reporter assays and CRISPR-Cas9 experiments. Additionally, we applied the method to a retrospective analysis of large-scale genome-wide association studies (GWAS) summary statistics from 2013 to 2022. Results reveal the method's capacity to robustly discover additional loci for polygenic traits beyond conventional GWAS and pinpoint potential causal variants underpinning each locus (on average, 22.7% more loci and 78.7% fewer proxy variants), contributing to a deeper understanding of complex genetic architectures in post-GWAS analyses. We are making the discoveries and software freely available to the community and anticipate that routine end-to-end in silico identification of putative causal genetic variants will become an important tool that will facilitate downstream functional experiments and future research into disease etiology, as well as the exploration of novel therapeutic avenues.
DOI: 10.1101/2024.02.28.582621
PubMedID: 38464202
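Background sketch (a generic form of the sparse-regression step, not necessarily the paper's exact estimator): with standardized genotypes, an L1-penalized regression depends on individual-level data only through the LD matrix $\Sigma \approx X^\top X / n$ and the score vector $z \propto X^\top y$, so all variants and their knockoff copies can be modeled jointly from summary statistics alone, e.g.

$$
\hat\beta = \arg\min_{\beta}\; \tfrac{1}{2}\, \beta^\top \Sigma\, \beta \;-\; z^\top \beta \;+\; \lambda \lVert \beta \rVert_1 ,
$$

where $\beta$ stacks coefficients for the original variants and their knockoff copies; each variant's fitted effect is then compared against its knockoff's to decide which associations survive as putatively causal.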
- Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression.
arXiv
2024
Abstract
Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each explanatory variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs [He et al., 2022] and introduce variable selection methods based on penalized regression that achieve false discovery rate (FDR) control. We report empirical results from extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease, and evidence a significant improvement in power.
PubMedID: 38463500
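Background sketch (the standard knockoff filter of Barber and Candès, not specific to this paper): once a penalized fit over originals and knockoffs is available, each variable $j$ receives a feature statistic such as $W_j = |\hat\beta_j| - |\hat\beta_{\tilde j}|$, and selection uses the knockoff+ threshold

$$
\hat\tau = \min\Big\{ t > 0 : \frac{1 + \#\{ j : W_j \le -t \}}{\max\big(1,\; \#\{ j : W_j \ge t \}\big)} \le q \Big\},
\qquad \hat S = \{ j : W_j \ge \hat\tau \},
$$

which controls the FDR at level $q$ under the knockoff exchangeability conditions; the methods in this paper plug penalized-regression statistics computed from summary statistics into this type of filter.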
- Omnibus test for restricted mean survival time based on influence function.
Statistical Methods in Medical Research
2023: 9622802231158735
Abstract
The restricted mean survival time (RMST), which evaluates the expected survival time up to a pre-specified time point τ, has been widely used to summarize the survival distribution due to its robustness and straightforward interpretation. In comparative studies with time-to-event data, the RMST-based test has been utilized as an alternative to the classic log-rank test because the power of the log-rank test deteriorates when the proportional hazards assumption is violated. To overcome the challenge of selecting an appropriate time point τ, we develop an RMST-based omnibus Wald test to detect the survival difference between two groups throughout the study follow-up period. Treating a vector of RMSTs at multiple quantile-based time points as a statistical functional, we construct a Wald χ2 test statistic and derive its asymptotic distribution using the influence function. We further propose a new procedure based on the influence function to estimate the asymptotic covariance matrix in contrast to the usual bootstrap method. Simulations under different scenarios validate the size of our RMST-based omnibus test and demonstrate its advantage over the existing tests in power, especially when the true survival functions cross within the study follow-up period. For illustration, the proposed test is applied to two real datasets, which demonstrate its power and applicability in various situations.
DOI: 10.1177/09622802231158735
PubMedID: 37015346
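Background: two standard ingredients behind the abstract above, in generic notation (the multi-timepoint construction and the influence-function covariance estimator are the paper's contribution). The RMST at horizon $\tau$ and the omnibus Wald statistic over $K$ time points are

$$
\mu(\tau) = \mathbb{E}\big[\min(T, \tau)\big] = \int_0^{\tau} S(t)\, dt,
\qquad
W = \big(\hat\mu_1 - \hat\mu_2\big)^\top \widehat{\Sigma}^{-1} \big(\hat\mu_1 - \hat\mu_2\big) \;\to\; \chi^2_K \ \text{under } H_0,
$$

where $\hat\mu_g = \big(\hat\mu_g(\tau_1), \dots, \hat\mu_g(\tau_K)\big)^\top$ collects group $g$'s estimated RMSTs at the $K$ quantile-based time points and $\widehat{\Sigma}$ estimates the covariance of the difference (here via influence functions rather than the bootstrap).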
- Analysis of Preferences in Social Networks
Annals of Applied Statistics
2023; 17 (1): 89-107
DOI: 10.1214/22-AOAS1617
Web of Science ID: 000929003900005
- Bayesian Log-Rank Test
The American Statistician
2023
DOI: 10.1080/00031305.2022.2161637
Web of Science ID: 000908150700001
- 3D-Polishing for Triangular Mesh Compression of Point Cloud Data
The 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’23)
2023: 557-566
DOI: 10.1145/3580305.3599239
- Jiaqi Gu and Guosheng Yin’s contribution to the Discussion of ‘Martingale Posterior Distributions’ by Fong, Holmes and Walker
Journal of the Royal Statistical Society Series B (Statistical Methodology)
2023
DOI: 10.1093/jrsssb/qkad092
- Bayesian SIR model with change points with application to the Omicron wave in Singapore
Scientific Reports
2022; 12 (1): 20864
Abstract
The Omicron variant has led to a new wave of the COVID-19 pandemic worldwide, with unprecedented numbers of daily confirmed new cases in many countries and areas. To analyze the impact of societal or policy changes on the development of the Omicron wave, a stochastic susceptible-infected-removed (SIR) model with change points is proposed to accommodate situations where the transmission rate and the removal rate may vary significantly at change points. Bayesian inference based on a Markov chain Monte Carlo algorithm is developed to estimate the locations of the change points as well as the transmission rate and removal rate within each stage. Experiments on simulated data reveal the effectiveness of the proposed method, and several stages are detected in analyzing the Omicron wave data in Singapore.
DOI: 10.1038/s41598-022-25473-y
Web of Science ID: 000932261400072
PubMedID: 36460721
PubMedCentralID: PMC9718478
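Background sketch (a deterministic SIR skeleton with piecewise-constant rates; the paper's model is a stochastic SIR, but the role of the change points is the same):

$$
\frac{dS}{dt} = -\beta_k \frac{S I}{N}, \qquad
\frac{dI}{dt} = \beta_k \frac{S I}{N} - \gamma_k I, \qquad
\frac{dR}{dt} = \gamma_k I,
\qquad t \in [t_{k-1}, t_k),
$$

where $N = S + I + R$, the transmission rate $\beta_k$ and removal rate $\gamma_k$ are constant within stage $k$, and the change points $t_1 < \dots < t_K$ together with the stage-specific rates are estimated jointly by MCMC.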
- Triangular Concordance Learning of Networks
Journal of Computational and Graphical Statistics
2022
DOI: 10.1080/10618600.2022.2099405
Web of Science ID: 000853487100001
- Sparse concordance-based ordinal classification
Scandinavian Journal of Statistics
2022
DOI: 10.1111/sjos.12606
Web of Science ID: 000846860600001
- Joint latent space models for ranking data and social network
Statistics and Computing
2022; 32 (3)
DOI: 10.1007/s11222-022-10106-1
Web of Science ID: 000810665300001
- Reconstructing the Kaplan-Meier Estimator as an M-estimator
The American Statistician
2022; 76 (1): 37-43
DOI: 10.1080/00031305.2021.1947376
Web of Science ID: 000678954700001
- Crystallization Learning with the Delaunay Triangulation
The 38th International Conference on Machine Learning
2021: 3854-3863
- Analysis of ranking data
Wiley Interdisciplinary Reviews: Computational Statistics
2019; 11 (6)
DOI: 10.1002/wics.1483
Web of Science ID: 000489576600004
- Fast Algorithm for Generalized Multinomial Models with Ranking Data
The 36th International Conference on Machine Learning
2019: 2445-2453