Honors & Awards


  • Roger Williams Memorial Award, International Genetic Epidemiology Society (2024)
  • Goldwater Scholarship, Barry Goldwater Scholarship & Excellence in Education Foundation (2018)

Boards, Advisory Committees, Professional Organizations


  • Communications Committee, International Genetic Epidemiology Society (2025 - Present)

Professional Education


  • PhD, University of Minnesota, Biostatistics (2025)
  • BS, Andrews University, Mathematics (2020)

All Publications


  • A bootstrap model comparison test for identifying genes with context-specific patterns of genetic regulation. The Annals of Applied Statistics Malakhov, M. M., Dai, B., Shen, X. T., Pan, W. 2024; 18 (3): 1840-1857

    Abstract

    Understanding how genetic variation affects gene expression is essential for a complete picture of the functional pathways that give rise to complex traits. Although numerous studies have established that many genes are differentially expressed in distinct human tissues and cell types, no tools exist for identifying the genes whose expression is differentially regulated. Here we introduce DRAB (differential regulation analysis by bootstrapping), a gene-based method for testing whether patterns of genetic regulation are significantly different between tissues or other biological contexts. DRAB first leverages the elastic net to learn context-specific models of local genetic regulation and then applies a novel bootstrap-based model comparison test to check their equivalency. Unlike previous model comparison tests, our proposed approach can determine whether population-level models have equal predictive performance by accounting for the variability of feature selection and model training. We validated DRAB on mRNA expression data from a variety of human tissues in the Genotype-Tissue Expression (GTEx) Project. DRAB yielded biologically reasonable results and had sufficient power to detect genes with tissue-specific regulatory profiles while effectively controlling false positives. By providing a framework that facilitates the prioritization of differentially regulated genes, our study enables future discoveries on the genetic architecture of molecular phenotypes.

    View details for DOI 10.1214/23-aoas1859

    View details for PubMedID 39421855

    View details for PubMedCentralID PMC11484521

  • Enhancing nonlinear transcriptome- and proteome-wide association studies via trait imputation with applications to Alzheimer's disease. PLoS Genetics He, R., Ren, J., Malakhov, M. M., Pan, W. 2025; 21 (4): e1011659

    Abstract

    Genome-wide association studies (GWAS) performed on large cohort and biobank datasets have identified many genetic loci associated with Alzheimer's disease (AD). However, the younger demographic of biobank participants relative to the typical age of late-onset AD has resulted in an insufficient number of AD cases, limiting the statistical power of GWAS and any downstream analyses. To mitigate this limitation, several trait imputation methods have been proposed to impute the expected future AD status of individuals who may not have yet developed the disease. This paper explores the use of imputed AD status in nonlinear transcriptome/proteome-wide association studies (TWAS/PWAS) to identify genes and proteins whose genetically regulated expression is associated with AD risk. In particular, we considered the TWAS/PWAS method DeLIVR, which utilizes deep learning to model the nonlinear effects of expression on disease. We trained transcriptome and proteome imputation models for DeLIVR on data from the Genotype-Tissue Expression (GTEx) Project and the UK Biobank (UKB), respectively, with imputed AD status in UKB participants as the outcome. Next, we performed hypothesis testing for the DeLIVR models using clinically diagnosed AD cases from the Alzheimer's Disease Sequencing Project (ADSP). Our results demonstrate that nonlinear TWAS/PWAS trained with imputed AD outcomes successfully identifies known and putative AD risk genes and proteins. Notably, we found that training with imputed outcomes can increase statistical power without inflating false positives, enabling the discovery of molecular exposures with potentially nonlinear effects on neurodegeneration.

    View details for DOI 10.1371/journal.pgen.1011659

    View details for PubMedID 40209152

    View details for PubMedCentralID PMC12040266

  • A novel multivariable Mendelian randomization framework to disentangle highly correlated exposures with application to metabolomics. American Journal of Human Genetics Chan, L. S., Malakhov, M. M., Pan, W. 2024; 111 (9): 1834-1847

    Abstract

    Mendelian randomization (MR) utilizes genome-wide association study (GWAS) summary data to infer causal relationships between exposures and outcomes, offering a valuable tool for identifying disease risk factors. Multivariable MR (MVMR) estimates the direct effects of multiple exposures on an outcome. This study tackles the issue of highly correlated exposures commonly observed in metabolomic data, a situation where existing MVMR methods often face reduced statistical power due to multicollinearity. We propose a robust extension of the MVMR framework that leverages constrained maximum likelihood (cML) and employs a Bayesian approach for identifying independent clusters of exposure signals. Applying our method to the UK Biobank metabolomic data for the largest Alzheimer disease (AD) cohort through a two-sample MR approach, we identified two independent signal clusters for AD: glutamine and lipids, with posterior inclusion probabilities (PIPs) of 95.0% and 81.5%, respectively. Our findings corroborate the hypothesized roles of glutamate and lipids in AD, providing quantitative support for their potential involvement.

    View details for DOI 10.1016/j.ajhg.2024.07.007

    View details for PubMedID 39106865

    View details for PubMedCentralID PMC11393695

  • Accounting for nonlinear effects of gene expression identifies additional associated genes in transcriptome-wide association studies. Human Molecular Genetics Lin, Z., Xue, H., Malakhov, M. M., Knutson, K. A., Pan, W. 2022; 31 (14): 2462-2470

    Abstract

    Transcriptome-wide association studies (TWAS) integrate genome-wide association study (GWAS) data with gene expression (GE) data to identify (putative) causal genes for complex traits. There are two stages in TWAS: in Stage 1, a model is built to impute gene expression from genotypes, and in Stage 2, gene-trait association is tested using imputed gene expression. Despite many successes with TWAS, in the current practice, one only assumes a linear relationship between GE and the trait, which however may not hold, leading to loss of power. In this study, we extend the standard TWAS by considering a quadratic effect of GE, in addition to the usual linear effect. We train imputation models for both linear and quadratic gene expression levels in Stage 1, then include both the imputed linear and quadratic expression levels in Stage 2. We applied both the standard TWAS and our approach first to the ADNI gene expression data and the IGAP Alzheimer's disease GWAS summary data, then to the GTEx (V8) gene expression data and the UK Biobank individual-level GWAS data for lipids, followed by validation with different GWAS data, suitable model checking and more robust TWAS methods. In all these applications, the new TWAS approach was able to identify additional genes associated with Alzheimer's disease, LDL and HDL cholesterol levels, suggesting its likely power gains and thus the need to account for potentially nonlinear effects of gene expression on complex traits.

    View details for DOI 10.1093/hmg/ddac015

    View details for PubMedID 35043938

    View details for PubMedCentralID PMC9307319

  • Governance structure affects transboundary disease management under alternative objectives. BMC Public Health Blackwood, J. C., Malakhov, M. M., Duan, J., Pellett, J. J., Phadke, I. S., Lenhart, S., Sims, C., Shea, K. 2021; 21 (1): 1782

    Abstract

    The development of public health policy is inextricably linked with governance structure. In our increasingly globalized world, human migration and infectious diseases often span multiple administrative jurisdictions that might have different systems of government and divergent management objectives. However, few studies have considered how the allocation of regulatory authority among jurisdictions can affect disease management outcomes. Here we evaluate the relative merits of decentralized and centralized management by developing and numerically analyzing a two-jurisdiction SIRS model that explicitly incorporates migration. In our model, managers choose between vaccination, isolation, medication, border closure, and a travel ban on infected individuals while aiming to minimize either the number of cases or the number of deaths. We consider a variety of scenarios and show how optimal strategies differ for decentralized and centralized management levels. We demonstrate that policies formed in the best interest of individual jurisdictions may not achieve global objectives, and identify situations where locally applied interventions can lead to an overall increase in the numbers of cases and deaths. Our approach underscores the importance of tailoring disease management plans to existing regulatory structures as part of an evidence-based decision framework. Most importantly, we demonstrate that there needs to be a greater consideration of the degree to which governance structure impacts disease outcomes.

    View details for DOI 10.1186/s12889-021-11797-3

    View details for PubMedID 34600500

    View details for PubMedCentralID PMC8487237

  • Management efficacy in a metapopulation model of white-nose syndrome. Natural Resource Modeling Duan, J., Malakhov, M. M., Pellett, J. J., Phadke, I. S., Barber, J., Blackwood, J. C. 2021; 34 (3)

    View details for DOI 10.1111/nrm.12304

    View details for Web of Science ID 000637029300001