Zihuai He

Associate Professor (Research) of Neurology and Neurological Sciences (Neurology Research), of Medicine (BMIR) and, by courtesy, of Biomedical Data Science

Web page: http://www.zihuai-he.com

Bio

Dr. He received his PhD from the University of Michigan in 2016. Following postdoctoral training in biostatistics at Columbia University, he joined Stanford University in 2018. His research is concentrated in the area of statistical genomics and integrative analysis of omics data, with the aim of developing novel statistical and computational methodologies for the identification and interpretation of complex biological pathways involved in human diseases and aging. His methodological interests include high-dimensional data analysis, machine learning algorithms and explainable AI.

Academic Appointments

Associate Professor (Research), Neurology and Neurological Sciences
Associate Professor (Research), Medicine - Biomedical Informatics Research
Associate Professor (Research) (By courtesy), Department of Biomedical Data Science
Member, Bio-X
Member, Wu Tsai Neurosciences Institute

Honors & Awards

Rackham Pre-doctoral Fellowship Award, University of Michigan (2015)
Rackham Conference Travel Grant, University of Michigan (2013 - 2015)
Best Performance on the Qualifying Exam, University of Michigan (2013)

Professional Education

Ph.D., University of Michigan, Biostatistics (2016)
B.S., Tsinghua University, Mathematics and Physics (2010)

Contact

Academic
zihuai@stanford.edu
University - Faculty
Department: Neurology Research
Position: Assoc Professor-Research
- 3180 Porter Dr Rm 112
- Palo Alto, California 94304-1212

Additional Info

Mail Code: 5559
ORCID: https://orcid.org/0000-0002-8220-4183

Current Research and Scholarly Interests

Statistical genetics and other omics to study Alzheimer's disease and aging.

2025-26 Courses

Workshop in Biomedical Data Science
BMDS 280A, STATS 260A (Aut)
Independent Studies (2)
- Directed Reading
  BMDS 299 (Win)
- Ph.D. Research
  CME 400 (Aut, Win, Spr, Sum)
Prior Year Courses
2024-25 Courses
- Workshop in Biostatistics
  BIODS 260A, STATS 260A (Aut)

Stanford Advisees

Doctoral Dissertation Reader (AC)
Amelia Farinas
Doctoral Dissertation Advisor (AC)
Julie F. Wang

All Publications

Second-order group knockoffs with applications to GWAS. Bioinformatics (Oxford, England) Chu, B. B., Gu, J., Chen, Z., Morrison, T., Candès, E., He, Z., Sabatti, C. 2024

Abstract

Conditional testing via the knockoff framework allows one to identify, among a large number of possible explanatory variables, those that carry unique information about an outcome of interest, and also provides a false discovery rate guarantee on the selection. This approach is particularly well suited to the analysis of genome-wide association studies (GWAS), which have the goal of identifying genetic variants which influence traits of medical relevance. While conditional testing can be both more powerful and precise than traditional GWAS analysis methods, its vanilla implementation encounters a difficulty common to all multivariate analysis methods: it is challenging to distinguish among multiple, highly correlated regressors. This impasse can be overcome by shifting the object of inference from single variables to groups of correlated variables. To achieve this, it is necessary to construct "group knockoffs." While successful examples are already documented in the literature, this paper substantially expands the set of algorithms and software for group knockoffs. We focus in particular on second-order knockoffs, for which we describe correlation matrix approximations that are appropriate for GWAS data and that result in considerable computational savings. We illustrate the effectiveness of the proposed methods with simulations and with the analysis of albuminuria data from the UK Biobank. The described algorithms are implemented in an open-source Julia package Knockoffs.jl. R and Python wrappers are available as knockoffsr and knockoffspy packages. Supplementary data are available from Bioinformatics online.

View details for DOI 10.1093/bioinformatics/btae580

View details for PubMedID 39340798
Summary statistics knockoffs inference with family-wise error rate control. Biometrics Yu, C. X., Gu, J., Chen, Z., He, Z. 2024; 80 (3)

Abstract

Testing multiple hypotheses of conditional independence with provable error rate control is a fundamental problem with various applications. To infer conditional independence with family-wise error rate (FWER) control when only summary statistics of marginal dependence are accessible, we adopt GhostKnockoff to directly generate knockoff copies of summary statistics and propose a new filter to select features conditionally dependent on the response. In addition, we develop a computationally efficient algorithm to greatly reduce the computational cost of knockoff copies generation without sacrificing power and FWER control. Experiments on simulated data and a real dataset of Alzheimer's disease genetics demonstrate the advantage of the proposed method over existing alternatives in both statistical power and computational efficiency.

View details for DOI 10.1093/biomtc/ujae082

View details for PubMedID 39222026

View details for PubMedCentralID PMC11367731
In silico identification of putative causal genetic variants. bioRxiv : the preprint server for biology He, Z., Chu, B., Yang, J., Gu, J., Chen, Z., Liu, L., Morrison, T., Belloy, M. E., Qi, X., Hejazi, N., Mathur, M., Le Guen, Y., Tang, H., Hastie, T., Ionita-Laza, I., Sabatti, C., Candes, E. 2024

Abstract

Understanding the causal genetic architecture of complex phenotypes is essential for future research into disease mechanisms and potential therapies. Despite the widespread availability of genome-wide data, existing methods to analyze genetic data still primarily focus on marginal association models, which fall short of fully capturing the polygenic nature of complex traits and elucidating biological causal mechanisms. Here we present a computationally efficient causal inference framework for genome-wide detection of putative causal variants underlying genetic associations. Our approach utilizes summary statistics from potentially overlapping studies as input, constructs in silico knockoff copies of summary statistics as negative controls to attenuate confounding effects induced by linkage disequilibrium, and employs efficient ultrahigh-dimensional sparse regression to jointly model all genetic variants across the genome. Our method is computationally efficient, requiring less than 15 minutes on a single CPU to analyze genome-wide summary statistics. In applications to a meta-analysis of ten large-scale genetic studies of Alzheimer's disease (AD) we identified 82 loci associated with AD, including 37 additional loci missed by conventional GWAS pipeline via marginal association testing. The identified putative causal variants achieve state-of-the-art agreement with massively parallel reporter assays and CRISPR-Cas9 experiments. Additionally, we applied the method to a retrospective analysis of large-scale genome-wide association studies (GWAS) summary statistics from 2013 to 2022. Results reveal the method's capacity to robustly discover additional loci for polygenic traits beyond conventional GWAS and pinpoint potential causal variants underpinning each locus (on average, 22.7% more loci and 78.7% fewer proxy variants), contributing to a deeper understanding of complex genetic architectures in post-GWAS analyses. 
We are making the discoveries and software freely available to the community and anticipate that routine end-to-end in silico identification of putative causal genetic variants will become an important tool that will facilitate downstream functional experiments and future research into disease etiology, as well as the exploration of novel therapeutic avenues.

View details for DOI 10.1101/2024.02.28.582621

View details for PubMedID 38464202
Controlled Variable Selection from Summary Statistics Only? A Solution via GhostKnockoffs and Penalized Regression. ArXiv Chen, Z., He, Z., Chu, B. B., Gu, J., Morrison, T., Sabatti, C., Candes, E. 2024

Abstract

Identifying which variables do influence a response while controlling false positives pervades statistics and data science. In this paper, we consider a scenario in which we only have access to summary statistics, such as the values of marginal empirical correlations between each dependent variable of potential interest and the response. This situation may arise due to privacy concerns, e.g., to avoid the release of sensitive genetic information. We extend GhostKnockoffs (He et al., 2022) and introduce variable selection methods based on penalized regression achieving false discovery rate (FDR) control. We report empirical results in extensive simulation studies, demonstrating enhanced performance over previous work. We also apply our methods to genome-wide association studies of Alzheimer's disease, and evidence a significant improvement in power.

View details for PubMedID 38463500
Organ aging signatures in the plasma proteome track health and disease. Nature Oh, H. S., Rutledge, J., Nachun, D., Pálovics, R., Abiose, O., Moran-Losada, P., Channappa, D., Urey, D. Y., Kim, K., Sung, Y. J., Wang, L., Timsina, J., Western, D., Liu, M., Kohlfeld, P., Budde, J., Wilson, E. N., Le Guen, Y., Maurer, T. M., Haney, M., Yang, A. C., He, Z., Greicius, M. D., Andreasson, K. I., Sathyan, S., Weiss, E. F., Milman, S., Barzilai, N., Cruchaga, C., Wagner, A. D., Mormino, E., Lehallier, B., Henderson, V. W., Longo, F. M., Montgomery, S. B., Wyss-Coray, T. 2023; 624 (7990): 164-172

Abstract

Animal studies show aging varies between individuals as well as between organs within an individual (refs. 1-4), but whether this is true in humans and its effect on age-related diseases is unknown. We utilized levels of human blood plasma proteins originating from specific organs to measure organ-specific aging differences in living individuals. Using machine learning models, we analysed aging in 11 major organs and estimated organ age reproducibly in five independent cohorts encompassing 5,676 adults across the human lifespan. We discovered nearly 20% of the population show strongly accelerated age in one organ and 1.7% are multi-organ agers. Accelerated organ aging confers 20-50% higher mortality risk, and organ-specific diseases relate to faster aging of those organs. We find individuals with accelerated heart aging have a 250% increased heart failure risk and accelerated brain and vascular aging predict Alzheimer's disease (AD) progression independently from and as strongly as plasma pTau-181 (ref. 5), the current best blood-based biomarker for AD. Our models link vascular calcification, extracellular matrix alterations and synaptic protein shedding to early cognitive decline. We introduce a simple and interpretable method to study organ aging using plasma proteomics data, predicting diseases and aging effects.

View details for DOI 10.1038/s41586-023-06802-1

View details for PubMedID 38057571

View details for PubMedCentralID PMC10700136
Improving genetic risk prediction across diverse population by disentangling ancestry representations. Communications biology Gyawali, P. K., Le Guen, Y., Liu, X., Belloy, M. E., Tang, H., Zou, J., He, Z. 2023; 6 (1): 964

Abstract

Risk prediction models using genetic data have seen increasing traction in genomics. However, most of the polygenic risk models were developed using data from participants with similar (mostly European) ancestry. This can lead to biases in the risk predictors resulting in poor generalization when applied to minority populations and admixed individuals such as African Americans. To address this issue, largely due to the prediction models being biased by the underlying population structure, we propose a deep-learning framework that leverages data from diverse population and disentangles ancestry from the phenotype-relevant information in its representation. The ancestry disentangled representation can be used to build risk predictors that perform better across minority populations. We applied the proposed method to the analysis of Alzheimer's disease genetics. Comparing with standard linear and nonlinear risk prediction methods, the proposed method substantially improves risk prediction in minority populations, including admixed individuals, without needing self-reported ancestry information.

View details for DOI 10.1038/s42003-023-05352-6

View details for PubMedID 37736834
Association of African Ancestry-Specific APOE Missense Variant R145C With Risk of Alzheimer Disease. JAMA Le Guen, Y., Raulin, A., Logue, M. W., Sherva, R., Belloy, M. E., Eger, S. J., Chen, A., Kennedy, G., Kuchenbecker, L., O'Leary, J. P., Zhang, R., Merritt, V. C., Panizzon, M. S., Hauger, R. L., Gaziano, J. M., Bu, G., Thornton, T. A., Farrer, L. A., Napolioni, V., He, Z., Greicius, M. D. 2023; 329 (7): 551-560

Abstract

Importance: Numerous studies have established the association of the common APOE epsilon2 and APOE epsilon4 alleles with Alzheimer disease (AD) risk across ancestries. Studies of the interaction of these alleles with other amino acid changes on APOE in non-European ancestries are lacking and may improve ancestry-specific risk prediction. Objective: To determine whether APOE amino acid changes specific to individuals of African ancestry modulate AD risk. Design, Setting, and Participants: Case-control study including 31 929 participants and using a sequenced discovery sample (Alzheimer Disease Sequencing Project; stage 1) followed by 2 microarray imputed data sets derived from the Alzheimer Disease Genetic Consortium (stage 2, internal replication) and the Million Veteran Program (stage 3, external validation). This study combined case-control, family-based, population-based, and longitudinal AD cohorts, which recruited participants (1991-2022) in primarily US-based studies with 1 US/Nigerian study. Across all stages, individuals included in this study were of African ancestry. Exposures: Two APOE missense variants (R145C and R150H) were assessed, stratified by APOE genotype. Main Outcomes and Measures: The primary outcome was AD case-control status, and secondary outcomes included age at AD onset. Results: Stage 1 included 2888 cases (median age, 77 [IQR, 71-83] years; 31.3% male) and 4957 controls (median age, 77 [IQR, 71-83] years; 28.0% male). In stage 2, across multiple cohorts, 1201 cases (median age, 75 [IQR, 69-81] years; 30.8% male) and 2744 controls (median age, 80 [IQR, 75-84] years; 31.4% male) were included. In stage 3, 733 cases (median age, 79.4 [IQR, 73.8-86.5] years; 97.0% male) and 19 406 controls (median age, 71.9 [IQR, 68.4-75.8] years; 94.5% male) were included.
In epsilon3/epsilon4-stratified analyses of stage 1, R145C was present in 52 individuals with AD (4.8%) and 19 controls (1.5%); R145C was associated with an increased risk of AD (odds ratio [OR], 3.01; 95% CI, 1.87-4.85; P=6.0×10^-6) and was associated with a reported younger age at AD onset (beta, -5.87 years; 95% CI, -8.35 to -3.4 years; P=3.4×10^-6). Association with increased AD risk was replicated in stage 2 (R145C was present in 23 individuals with AD [4.7%] and 21 controls [2.7%]; OR, 2.20; 95% CI, 1.04-4.65; P=.04) and was concordant in stage 3 (R145C was present in 11 individuals with AD [3.8%] and 149 controls [2.7%]; OR, 1.90; 95% CI, 0.99-3.64; P=.051). Association with earlier AD onset was replicated in stage 2 (beta, -5.23 years; 95% CI, -9.58 to -0.87 years; P=.02) and stage 3 (beta, -10.15 years; 95% CI, -15.66 to -4.64 years; P=4.0×10^-4). No significant associations were observed in other APOE strata for R145C or in any APOE strata for R150H. Conclusions and Relevance: In this exploratory analysis, the APOE epsilon3[R145C] missense variant was associated with an increased risk of AD among individuals of African ancestry with the epsilon3/epsilon4 genotype. With additional external validation, these findings may inform AD genetic risk assessment in individuals of African ancestry.

View details for DOI 10.1001/jama.2023.0268

View details for PubMedID 36809323
BIGKnock: fine-mapping gene-based associations via knockoff analysis of biobank-scale data. Genome biology Ma, S., Wang, C., Khan, A., Liu, L., Dalgleish, J., Kiryluk, K., He, Z., Ionita-Laza, I. 2023; 24 (1): 24

Abstract

We propose BIGKnock (BIobank-scale Gene-based association test via Knockoffs), a computationally efficient gene-based testing approach for biobank-scale data, that leverages long-range chromatin interaction data, and performs conditional genome-wide testing via knockoffs. BIGKnock can prioritize causal genes over proxy associations at a locus. We apply BIGKnock to the UK Biobank data with 405,296 participants for multiple binary and quantitative traits, and show that relative to conventional gene-based tests, BIGKnock produces smaller sets of significant genes that contain the causal gene(s) with high probability. We further illustrate its ability to pinpoint potential causal genes at [Formula: see text] of the associated loci.

View details for DOI 10.1186/s13059-023-02864-6

View details for PubMedID 36782330
GhostKnockoff inference empowers identification of putative causal variants in genome-wide association studies. Nature communications He, Z., Liu, L., Belloy, M. E., Le Guen, Y., Sossin, A., Liu, X., Qi, X., Ma, S., Gyawali, P. K., Wyss-Coray, T., Tang, H., Sabatti, C., Candes, E., Greicius, M. D., Ionita-Laza, I. 2022; 13 (1): 7209

Abstract

Recent advances in genome sequencing and imputation technologies provide an exciting opportunity to comprehensively study the contribution of genetic variants to complex phenotypes. However, our ability to translate genetic discoveries into mechanistic insights remains limited at this point. In this paper, we propose an efficient knockoff-based method, GhostKnockoff, for genome-wide association studies (GWAS) that leads to improved power and ability to prioritize putative causal variants relative to conventional GWAS approaches. The method requires only Z-scores from conventional GWAS and hence can be easily applied to enhance existing and future studies. The method can also be applied to meta-analysis of multiple GWAS allowing for arbitrary sample overlap. We demonstrate its performance using empirical simulations and two applications: (1) a meta-analysis for Alzheimer's disease comprising nine overlapping large-scale GWAS, whole-exome and whole-genome sequencing studies and (2) analysis of 1403 binary phenotypes from the UK Biobank data in 408,961 samples of European ancestry. Our results demonstrate that GhostKnockoff can identify putatively functional variants with weaker statistical effects that are missed by conventional association tests.

View details for DOI 10.1038/s41467-022-34932-z

View details for PubMedID 36418338
Deep learning-assisted genome-wide characterization of massively parallel reporter assays. Nucleic acids research Lu, F., Sossin, A., Abell, N., Montgomery, S. B., He, Z. 2022

Abstract

Massively parallel reporter assay (MPRA) is a high-throughput method that enables the study of the regulatory activities of tens of thousands of DNA oligonucleotides in a single experiment. While MPRA experiments have grown in popularity, their small sample sizes compared to the scale of the human genome limits our understanding of the regulatory effects they detect. To address this, we develop a deep learning model, MpraNet, to distinguish potential MPRA targets from the background genome. This model achieves high discriminative performance (AUROC=0.85) at differentiating MPRA positives from a set of control variants that mimic the background genome when applied to the lymphoblastoid cell line. We observe that existing functional scores represent very distinct functional effects, and most of them fail to characterize the regulatory effect that MPRA detects. Using MpraNet, we predict potential MPRA functional variants across the genome and identify the distributions of MPRA effect relative to other characteristics of genetic variation, including allele frequency, alternative functional annotations specified by FAVOR, and phenome-wide associations. We also observed that the predicted MPRA positives are not uniformly distributed across the genome; instead, they are clumped together in active regions comprising 9.95% of the genome and inactive regions comprising 89.07% of the genome. Furthermore, we propose our model as a screen to filter MPRA experiment candidates at genome-wide scale, enabling future experiments to be more cost-efficient by increasing precision relative to that observed from previous MPRAs.

View details for DOI 10.1093/nar/gkac990

View details for PubMedID 36350674
Deep neural networks with controlled variable selection for the identification of putative causal genetic variants. Nature machine intelligence Kassani, P. H., Lu, F., Le Guen, Y., Belloy, M. E., He, Z. 2022; 4 (9): 761-771

Abstract

Deep neural networks (DNNs) have been successfully utilized in many scientific problems for their high prediction accuracy, but their application to genetic studies remains challenging due to their poor interpretability. Here we consider the problem of scalable, robust variable selection in DNNs for the identification of putative causal genetic variants in genome sequencing studies. We identified a pronounced randomness in feature selection in DNNs due to its stochastic nature, which may hinder interpretability and give rise to misleading results. We propose an interpretable neural network model, stabilized using ensembling, with controlled variable selection for genetic studies. The merit of the proposed method includes: flexible modelling of the nonlinear effect of genetic variants to improve statistical power; multiple knockoffs in the input layer to rigorously control the false discovery rate; hierarchical layers to substantially reduce the number of weight parameters and activations, and improve computational efficiency; and stabilized feature selection to reduce the randomness in identified signals. We evaluate the proposed method in extensive simulation studies and apply it to the analysis of Alzheimer's disease genetics. We show that the proposed method, when compared with conventional linear and nonlinear methods, can lead to substantially more discoveries.

View details for DOI 10.1038/s42256-022-00525-0

View details for PubMedID 37859729

View details for PubMedCentralID PMC10586424
Multiple causal variants underlie genetic associations in humans. Science (New York, N.Y.) Abell, N. S., DeGorter, M. K., Gloudemans, M. J., Greenwald, E., Smith, K. S., He, Z., Montgomery, S. B. 2022; 375 (6586): 1247-1254

Abstract

Associations between genetic variation and traits are often in noncoding regions with strong linkage disequilibrium (LD), where a single causal variant is assumed to underlie the association. We applied a massively parallel reporter assay (MPRA) to functionally evaluate genetic variants in high, local LD for independent cis-expression quantitative trait loci (eQTL). We found that 17.7% of eQTLs exhibit more than one major allelic effect in tight LD. The detected regulatory variants were highly and specifically enriched for activating chromatin structures and allelic transcription factor binding. Integration of MPRA profiles with eQTL/complex trait colocalizations across 114 human traits and diseases identified causal variant sets demonstrating how genetic association signals can manifest through multiple, tightly linked causal variants.

View details for DOI 10.1126/science.abj5117

View details for PubMedID 35298243
Quantitative disease risk scores from EHR with applications to clinical risk stratification and genetic studies. NPJ digital medicine Xu, D., Wang, C., Khan, A., Shang, N., He, Z., Gordon, A., Kullo, I. J., Murphy, S., Ni, Y., Wei, W., Gharavi, A., Kiryluk, K., Weng, C., Ionita-Laza, I. 2021; 4 (1): 116

Abstract

Labeling clinical data from electronic health records (EHR) in health systems requires extensive knowledge of human expert, and painstaking review by clinicians. Furthermore, existing phenotyping algorithms are not uniformly applied across large datasets and can suffer from inconsistencies in case definitions across different algorithms. We describe here quantitative disease risk scores based on almost unsupervised methods that require minimal input from clinicians, can be applied to large datasets, and alleviate some of the main weaknesses of existing phenotyping algorithms. We show applications to phenotypic data on approximately 100,000 individuals in eMERGE, and focus on several complex diseases, including Chronic Kidney Disease, Coronary Artery Disease, Type 2 Diabetes, Heart Failure, and a few others. We demonstrate that relative to existing approaches, the proposed methods have higher prediction accuracy, can better identify phenotypic features relevant to the disease under consideration, can perform better at clinical risk stratification, and can identify undiagnosed cases based on phenotypic features available in the EHR. Using genetic data from the eMERGE-seq panel that includes sequencing data for 109 genes on 21,363 individuals from multiple ethnicities, we also show how the new quantitative disease risk scores help improve the power of genetic association studies relative to the standard use of disease phenotypes. The results demonstrate the effectiveness of quantitative disease risk scores derived from rich phenotypic EHR databases to provide a more meaningful characterization of clinical risk for diseases of interest beyond the prevalent binary (case-control) classification.

View details for DOI 10.1038/s41746-021-00488-3

View details for PubMedID 34302027
Identification of putative causal loci in whole-genome sequencing data via knockoff statistics. Nature communications He, Z., Liu, L., Wang, C., Le Guen, Y., Lee, J., Gogarten, S., Lu, F., Montgomery, S., Tang, H., Silverman, E. K., Cho, M. H., Greicius, M., Ionita-Laza, I. 2021; 12 (1): 3152

Abstract

The analysis of whole-genome sequencing studies is challenging due to the large number of rare variants in noncoding regions and the lack of natural units for testing. We propose a statistical method to detect and localize rare and common risk variants in whole-genome sequencing studies based on a recently developed knockoff framework. It can (1) prioritize causal variants over associations due to linkage disequilibrium thereby improving interpretability; (2) help distinguish the signal due to rare variants from shadow effects of significant common variants nearby; (3) integrate multiple knockoffs for improved power, stability, and reproducibility; and (4) flexibly incorporate state-of-the-art and future association tests to achieve the benefits proposed here. In applications to whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP) and COPDGene samples from NHLBI Trans-Omics for Precision Medicine (TOPMed) Program we show that our method compared with conventional association tests can lead to substantially more discoveries.

View details for DOI 10.1038/s41467-021-22889-4

View details for PubMedID 34035245
Genome-wide analysis of common and rare variants via multiple knockoffs at biobank scale, with an application to Alzheimer disease genetics. American journal of human genetics He, Z., Le Guen, Y., Liu, L., Lee, J., Ma, S., Yang, A. C., Liu, X., Rutledge, J., Losada, P. M., Song, B., Belloy, M. E., Butler, R. R., Longo, F. M., Tang, H., Mormino, E. C., Wyss-Coray, T., Greicius, M. D., Ionita-Laza, I. 2021

Abstract

Knockoff-based methods have become increasingly popular due to their enhanced power for locus discovery and their ability to prioritize putative causal variants in a genome-wide analysis. However, because of the substantial computational cost for generating knockoffs, existing knockoff approaches cannot analyze millions of rare genetic variants in biobank-scale whole-genome sequencing and whole-genome imputed datasets. We propose a scalable knockoff-based method for the analysis of common and rare variants across the genome, KnockoffScreen-AL, that is applicable to biobank-scale studies with hundreds of thousands of samples and millions of genetic variants. The application of KnockoffScreen-AL to the analysis of Alzheimer disease (AD) in 388,051 WG-imputed samples from the UK Biobank resulted in 31 significant loci, including 14 loci that are missed by conventional association tests on these data. We perform replication studies in an independent meta-analysis of clinically diagnosed AD with 94,437 samples, and additionally leverage single-cell RNA-sequencing data with 143,793 single-nucleus transcriptomes from 17 control subjects and AD-affected individuals, and proteomics data from 735 control subjects and affected individuals with AD and related disorders to validate the genes at these significant loci. These multi-omics analyses show that 79.1% of the proximal genes at these loci and 76.2% of the genes at loci identified only by KnockoffScreen-AL exhibit at least suggestive signal (p < 0.05) in the scRNA-seq or proteomics analyses. We highlight a potentially causal gene in AD progression, EGFR, that shows significant differences in expression and protein levels between AD-affected individuals and healthy control subjects.

View details for DOI 10.1016/j.ajhg.2021.10.009

View details for PubMedID 34767756
Powerful gene-based testing by integrating long-range chromatin interactions and knockoff genotypes. Proceedings of the National Academy of Sciences of the United States of America Ma, S., Dalgleish, J., Lee, J., Wang, C., Liu, L., Gill, R., Buxbaum, J. D., Chung, W. K., Aschard, H., Silverman, E. K., Cho, M. H., He, Z., Ionita-Laza, I. 2021; 118 (47)

Abstract

Gene-based tests are valuable techniques for identifying genetic factors in complex traits. Here, we propose a gene-based testing framework that incorporates data on long-range chromatin interactions, several recent technical advances for region-based tests, and leverages the knockoff framework for synthetic genotype generation for improved gene discovery. Through simulations and applications to genome-wide association studies (GWAS) and whole-genome sequencing data for multiple diseases and traits, we show that the proposed test increases the power over state-of-the-art gene-based tests in the literature, identifies genes that replicate in larger studies, and can provide a more narrow focus on the possible causal genes at a locus by reducing the confounding effect of linkage disequilibrium. Furthermore, our results show that incorporating genetic variation in distal regulatory elements tends to improve power over conventional tests. Results for UK Biobank and BioBank Japan traits are also available in a publicly accessible database that allows researchers to query gene-based results in an easy fashion.

View details for DOI 10.1073/pnas.2105191118

View details for PubMedID 34799441
A genome-wide scan statistic framework for whole-genome sequence data analysis. Nature communications He, Z., Xu, B., Buxbaum, J., Ionita-Laza, I. 2019; 10 (1): 3018

Abstract

The analysis of whole-genome sequencing studies is challenging due to the large number of noncoding rare variants, our limited understanding of their functional effects, and the lack of natural units for testing. Here we propose a scan statistic framework, WGScan, to simultaneously detect the existence, and estimate the locations of association signals at genome-wide scale. WGScan can analytically estimate the significance threshold for a whole-genome scan; utilize summary statistics for a meta-analysis; incorporate functional annotations for enhanced discoveries in noncoding regions; and enable enrichment analyses using genome-wide summary statistics. Based on the analysis of whole genomes of 1,786 phenotypically discordant sibling pairs from the Simons Simplex Collection study for autism spectrum disorders, we derive genome-wide significance thresholds for whole genome sequencing studies and detect significant enrichments of regions showing associations with autism in promoter regions, functional categories related to autism, and enhancers predicted to regulate expression of autism associated genes.

View details for DOI 10.1038/s41467-019-11023-0

View details for PubMedID 31289270
A semi-supervised approach for predicting cell-type specific functional consequences of non-coding variation using MPRAs. Nature communications He, Z., Liu, L., Wang, K., Ionita-Laza, I. 2018; 9 (1): 5199

Abstract

Predicting the functional consequences of genetic variants in non-coding regions is a challenging problem. We propose here a semi-supervised approach, GenoNet, to jointly utilize experimentally confirmed regulatory variants (labeled variants), millions of unlabeled variants genome-wide, and more than a thousand cell/tissue type specific epigenetic annotations to predict functional consequences of non-coding variants. Through the application to several experimental datasets, we demonstrate that the proposed method significantly improves prediction accuracy compared to existing functional prediction methods at the tissue/cell type level, but especially so at the organism level. Importantly, we illustrate how the GenoNet scores can help in fine-mapping at GWAS loci, and in the discovery of disease associated genes in sequencing studies. As more comprehensive lists of experimentally validated variants become available over the next few years, semi-supervised methods like GenoNet can be used to provide increasingly accurate functional predictions for variants genome-wide and across a variety of cell/tissue types.

View details for PubMedID 30518757
Unified Sequence-Based Association Tests Allowing for Multiple Functional Annotations and Meta-analysis of Noncoding Variation in Metabochip Data AMERICAN JOURNAL OF HUMAN GENETICS He, Z., Xu, B., Lee, S., Ionita-Laza, I. 2017; 101 (3): 340–52

Abstract

Substantial progress has been made in the functional annotation of genetic variation in the human genome. Integrative analysis that incorporates such functional annotations into sequencing studies can aid the discovery of disease-associated genetic variants, especially those with unknown function and located outside protein-coding regions. Direct incorporation of one functional annotation as weight in existing dispersion and burden tests can suffer substantial loss of power when the functional annotation is not predictive of the risk status of a variant. Here, we have developed unified tests that can utilize multiple functional annotations simultaneously for integrative association analysis with efficient computational techniques. We show that the proposed tests significantly improve power when variant risk status can be predicted by functional annotations. Importantly, when functional annotations are not predictive of risk status, the proposed tests incur only minimal loss of power in relation to existing dispersion and burden tests, and under certain circumstances they can even have improved power by learning a weight that better approximates the underlying disease model in a data-adaptive manner. The tests can be constructed with summary statistics of existing dispersion and burden tests for sequencing data, therefore allowing meta-analysis of multiple studies without sharing individual-level data. We applied the proposed tests to a meta-analysis of noncoding rare variants in Metabochip data on 12,281 individuals from eight studies for lipid traits. By incorporating the Eigen functional score, we detected significant associations between noncoding rare variants in SLC22A3 and low-density lipoprotein and total cholesterol, associations that are missed by standard dispersion and burden tests.

View details for PubMedID 28844485

View details for PubMedCentralID PMC5590864
Set-Based Tests for the Gene-Environment Interaction in Longitudinal Studies JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION He, Z., Zhang, M., Lee, S., Smith, J. A., Kardia, S. L. R., Roux, V., Mukherjee, B. 2017; 112 (519): 966–78

Abstract

We propose a generalized score type test for set-based inference for gene-environment interaction with longitudinally measured quantitative traits. The test is robust to misspecification of within subject correlation structure and has enhanced power compared to existing alternatives. Unlike tests for marginal genetic association, set-based tests for gene-environment interaction face the challenges of a potentially misspecified and high-dimensional main effect model under the null hypothesis. We show that our proposed test is robust to main effect misspecification of environmental exposure and genetic factors under the gene-environment independence condition. When genetic and environmental factors are dependent, the method of sieves is further proposed to eliminate potential bias due to a misspecified main effect of a continuous environmental exposure. A weighted principal component analysis approach is developed to perform dimension reduction when the number of genetic variants in the set is large relative to the sample size. The methods are motivated by an example from the Multi-Ethnic Study of Atherosclerosis (MESA), investigating interaction between measures of neighborhood environment and genetic regions on longitudinal measures of blood pressure over a study period of about seven years with 4 exams.

View details for PubMedID 29780190

View details for PubMedCentralID PMC5954413
Set-Based Tests for Genetic Association in Longitudinal Studies BIOMETRICS He, Z., Zhang, M., Lee, S., Smith, J. A., Guo, X., Palmas, W., Kardia, S. L. R., Roux, A., Mukherjee, B. 2015; 71 (3): 606–15

Abstract

Genetic association studies with longitudinal markers of chronic diseases (e.g., blood pressure, body mass index) provide a valuable opportunity to explore how genetic variants affect traits over time by utilizing the full trajectory of longitudinal outcomes. Since these traits are likely influenced by the joint effect of multiple variants in a gene, a joint analysis of these variants considering linkage disequilibrium (LD) may help to explain additional phenotypic variation. In this article, we propose a longitudinal genetic random field model (LGRF), to test the association between a phenotype measured repeatedly during the course of an observational study and a set of genetic variants. Generalized score type tests are developed, which we show are robust to misspecification of within-subject correlation, a feature that is desirable for longitudinal analysis. In addition, a joint test incorporating gene-time interaction is further proposed. Computational advancement is made for scalable implementation of the proposed methods in large-scale genome-wide association studies (GWAS). The proposed methods are evaluated through extensive simulation studies and illustrated using data from the Multi-Ethnic Study of Atherosclerosis (MESA). Our simulation results indicate substantial gain in power using LGRF when compared with two commonly used existing alternatives: (i) single marker tests using longitudinal outcome and (ii) existing gene-based tests using the average value of repeated measurements as the outcome.

View details for PubMedID 25854837

View details for PubMedCentralID PMC4601568
Modeling and Testing for Joint Association Using a Genetic Random Field Model BIOMETRICS He, Z., Zhang, M., Zhan, X., Lu, Q. 2014; 70 (3): 471–79

Abstract

Substantial progress has been made in identifying single genetic variants predisposing to common complex diseases. Nonetheless, the genetic etiology of human diseases remains largely unknown. Human complex diseases are likely influenced by the joint effect of a large number of genetic variants instead of a single variant. The joint analysis of multiple genetic variants considering linkage disequilibrium (LD) and potential interactions can further enhance the discovery process, leading to the identification of new disease-susceptibility genetic variants. Motivated by development in spatial statistics, we propose a new statistical model based on the random field theory, referred to as a genetic random field model (GenRF), for joint association analysis with the consideration of possible gene-gene interactions and LD. Using a pseudo-likelihood approach, a GenRF test for the joint association of multiple genetic variants is developed, which has the following advantages: (1) accommodating complex interactions for improved performance; (2) natural dimension reduction; (3) boosting power in the presence of LD; and (4) computationally efficient. Simulation studies are conducted under various scenarios. The development has been focused on quantitative traits and robustness of the GenRF test to other traits, for example, binary traits, is also discussed. Compared with a commonly adopted kernel machine approach, SKAT, as well as other more standard methods, GenRF shows overall comparable performance and better performance in the presence of complex interactions. The method is further illustrated by an application to the Dallas Heart Study.

View details for PubMedID 24628067
Basic Science and Pathogenesis. Alzheimer's & dementia : the journal of the Alzheimer's Association Reid, D. M., Cook, N., Yang, C., Song, S., Apio, C., Western, D., Guen, Y. L., Stewart, I., Young, C. B., Mormino, E. C., Napolioni, V., He, Z., Altmann, A., Wingo, A. P., Wingo, T. S., Cruchaga, C., Sung, Y. J., Greicius, M. D., Belloy, M. E. 2025; 21 Suppl 1 (Suppl 1): e099094

Abstract

To elucidate sex differences in Alzheimer's disease (AD), we conducted the largest-to-date sex-stratified genome-wide association study (GWAS) of AD. To further increase power and identify sex-specific, potentially druggable AD causal proteins, we performed proteome-wide association studies (PWAS) integrating GWAS with proteogenomic (i.e., protein quantitative trait locus [pQTL]) brain and cerebrospinal fluid (CSF) datasets. Sex-stratified and sex-heterogeneity AD GWAS were conducted in European ancestry individuals using a 3-stage design, followed by fixed-effects meta-analysis (Figure 1A). PWAS were conducted via FUSION, combining sex-stratified AD GWAS with sex-matched and non-sex-stratified protein-specific variant weights, respectively (Figure 2A). Significant findings in European ancestry were evaluated for sex heterogeneity consistency with admixed African ancestry AD GWAS (improved sex heterogeneity p-value upon fixed-effects meta-analysis) and PWAS (sample-size weighted Z-score combination improved in matching sex and non-significant Z-score [P>0.05] in opposite sex). GWAS identified 1 sex-heterogeneous, 14 female-specific, and 5 male-specific loci, with 13 out of 20 total loci (65%) showing consistent sex heterogeneity in African ancestry data, and 5 out of 20 being novel AD risk loci (Figure 1B-C). Brain and CSF AD PWAS identified 66 and 19 genes significantly associated with AD in females, respectively, whereas 23 and 17 were identified for males. Upon filtering for sex-specificity, 34 (52%) female and 4 (21%) male-specific genes were identified (Figure 2B). Out of 38 total sex-specific genes, 27 were present in 22 novel AD loci, and 23 out of 30 total unique loci (77%) showed persistent sex heterogeneity upon integration with African ancestry data (Figure 2B). The brain contributed 33 genes, of which 7 were uniquely observed through sex-matched PWAS (Figure 2C). There were few overlapping significant sex-stratified proteins between brain and CSF; however, 4 out of 5 overlapping proteins displayed concordant effect directions (Figure 2D). Sex-stratified GWAS and PWAS identified 20 and 30 sex-specific AD loci/genes, respectively, with high sex heterogeneity accordance in exploratory African-admixed AD GWAS and PWAS. To provide validation and help prioritize probable causal genes at novel, significant GWAS/PWAS loci, colocalization analyses were performed with various QTL datasets (data not shown). These findings enhance our understanding of AD pathogenesis and risk, which may inform drug target development relevant to sex-specific personalized medicine.
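
As an illustration of the sample-size weighted Z-score combination mentioned in the abstract above, the following is a minimal sketch of the standard weighted Stouffer method, in which each cohort's Z-score is weighted by the square root of its sample size. The function names are hypothetical and the sketch is not taken from the study's own pipeline.

```python
import math


def weighted_z(z_scores, sample_sizes):
    """Combine per-cohort Z-scores using weights proportional to sqrt(n).

    Hypothetical helper for illustration only.
    """
    if not z_scores or len(z_scores) != len(sample_sizes):
        raise ValueError("need one sample size per Z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    # Dividing by sqrt(sum of squared weights) keeps the combined
    # statistic standard normal under the null hypothesis.
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z-score."""
    return math.erfc(abs(z) / math.sqrt(2))
```

Under this weighting, larger cohorts contribute more to the combined statistic, and two cohorts with concordant effect directions yield a larger combined Z-score than either alone.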

View details for DOI 10.1002/alz70855_099094

View details for PubMedID 41434954

View details for PubMedCentralID PMC12726539
Public Health. Alzheimer's & dementia : the journal of the Alzheimer's Association Lu, O., Trelle, A. N., Young, C. B., Mormino, E. C., Wagner, A. D., He, Z., Carr, V. A., Sha, S. J., Vossler, H., Romero, A., Park, J., Skylar-Scott, I. A. 2025; 21 Suppl 6 (Suppl 6): e107011

Abstract

Non-pharmacological approaches may guide prevention and treatment strategies for patients with Alzheimer's disease (AD), but little is known about the "dose" and "type" required. The goal of this study is to examine what types of social and cognitive activities are associated with cognitive performance and AD biomarkers. 173 cognitively normal participants (69.0 ± 6.4 years) completed a questionnaire regarding frequency and hours per week of 7 social and 5 cognitively engaging activities. Participants underwent neuropsychological testing (z-scored composites of executive function, working memory, attention, episodic memory, visuospatial function, and language as well as a global cognitive composite), CSF biomarker testing (127 individuals: Aβ-42/Aβ-40, p-tau181, and t-tau), and plasma testing (127: Aβ-42/Aβ-40 and pTau181). Regression covariates included age, sex, education, and APOE. Because this was a hypothesis-driven analysis, multiple comparisons corrections were deferred. Visiting loved ones more frequently was associated with higher global cognition (β=0.20, p = 0.0097), executive function (β=0.17, p = 0.037), and working memory (β=0.20, p = 0.021) scores. More hours spent on these visits were associated with higher executive function scores (β=0.15, p = 0.049). Volunteering more frequently was positively associated with global cognition (β=0.20, p = 0.0069), executive function (β=0.17, p = 0.031), and working memory (β=0.17, p = 0.046); volunteering for more hours per week was positively associated with executive function (β=0.18, p = 0.016). There was a negative association between cognitive performance and senior center attendance (executive function: β=-0.17, p = 0.033) and church attendance (language and frequency: β=-0.21, p = 0.012; language and hours: β=-0.19, p = 0.020). Attending clubs more frequently was positively associated with working memory (β=0.17, p = 0.047). Using computers for longer was significantly associated with global cognition (β=0.26, p <0.001), executive function (β=0.29, p <0.001), working memory (β=0.17, p = 0.045), attention (β=0.18, p = 0.026), and language (β=0.18, p = 0.029). Additionally, doing woodworking, needlework, drawing, or other crafts for longer was positively associated with executive function (β=0.15, p = 0.048). Games, billiards, attending events, playing an instrument, and reading were not associated with cognitive performance. There was no association between these activities and AD biomarkers. In a comprehensive analysis of the associations between types of social and cognitive activities and cognitive scores, social visits, volunteering, clubs, computer use, and crafts were associated with higher cognitive performance.

View details for DOI 10.1002/alz70860_107011

View details for PubMedID 41434558

View details for PubMedCentralID PMC12726314
Developing Topics. Alzheimer's & dementia : the journal of the Alzheimer's Association Park, J., He, Z., Greicius, M. D. 2025; 21 Suppl 7 (Suppl 7): e108637

Abstract

Polygenic risk scores (PRS) are widely used to predict Alzheimer's disease (AD) risk. Most models use millions of genome-wide SNPs, including many in non-coding regions. However, this can increase computational cost and reduce interpretability, with limited gains in predictive performance. In this study, we assessed whether narrowing down the SNP space to more biologically interpretable SNP spaces (whole exome or non-synonymous exonic variants) can still give us reasonable predictive power. We analyzed 35,123 European ancestry participants from the UK Biobank, consisting of 31,604 age-matched healthy controls and 3,519 AD patients (Table 1). PRS performance was compared across three SNP spaces after QC (HWE p < 10^-6, MAF > 0.01): (1) imputed (∼5.4M SNPs), (2) whole exome (∼90K), and (3) non-synonymous exonic variants (∼28K). GWAS was conducted for each space, and three modelling methods, Clumping and Thresholding (C+T), Lasso regression, and Extreme Gradient Boosting (XGBoost), were applied using the resulting summary statistics. A 5-fold cross-validation (CV) framework was used with training/validation/test splits. SNPs were selected using p-value thresholds from 10^-8 to 0.05, and the best model was chosen based on validation AUC and subsequently evaluated on the test set. A model using only APOE ε2/ε4 genotype served as a baseline (AUC of 0.68-0.69). Across all methods, PRS models based on whole exome or non-synonymous SNPs showed comparable accuracy to those using imputed SNPs, despite far fewer variants (Figure 1A). Notably, when APOE ε2/ε4 genotype was excluded from PRS modelling, performance declined most sharply in models based on non-synonymous exome variants, while whole exome-based models retained accuracy comparable to genome-wide models (Figure 1B). Our findings underscore the potential of using biologically relevant SNP subsets to simplify models while maintaining performance. They also indicate that future PRS models for AD can be effectively developed using whole exome SNPs, providing a practical balance between predictive power and model simplicity. Taken together, these results support the use of more focused variant sets in cases where genome-wide approaches may be computationally intensive or unnecessarily complex.

View details for DOI 10.1002/alz70861_108637

View details for PubMedID 41434507

View details for PubMedCentralID PMC12726378
Integrative Genetic, Proteogenomic, and Multi-omics Analyses Reveal Sex-Biased Causal Genes and Drug Targets in Alzheimer's Disease. medRxiv : the preprint server for health sciences Cook, N., Yang, C., Zeng, Y., Sivasankaran, S. K., Song, S., Talozzi, L., Western, D., Yang, C., Liu, Y., Le Guen, Y., Stewart, I., Young, C., Mormino, E. C., Altmann, A., He, Z., Napolioni, V., Wingo, A. P., Wingo, T. S., Cruchaga, C., Sung, Y. J., Greicius, M. D., Belloy, M. E. 2025

Abstract

Sex differences are pervasive in Alzheimer's disease, but the underlying drivers remain poorly understood. To address this, we performed sex-stratified genome-wide association studies of Alzheimer's disease in ~1,000,000 individuals, which we subsequently integrated with proteogenomics datasets from neurological tissues to identify candidate causal genes. We further prioritized genes through additional multi-omics approaches, including quantitative trait locus summary-based Mendelian randomization and colocalization. Altogether, we prioritized 125 female-biased and 21 male-biased risk genes. Female-biased pathways included amyloid, neurite, stress, clearance, and immune processes, with genes enriched for microglia and astrocyte expression. Through computational drug repurposing analyses, a set of sex hormone related drugs, converging on Epidermal Growth Factor Receptor (EGFR), were uniquely prioritized in women. Finally, we identified Haptoglobin (HP) as a female-specific gene, leveraging long-read sequencing approaches to implicate a link to oxidative stress, APOE, and hemoglobin biology. Altogether, our findings provide a portal into sex-specific precision medicine for Alzheimer's disease.

View details for DOI 10.1101/2025.10.31.25339089

View details for PubMedID 41282793

View details for PubMedCentralID PMC12636628
Long-read genome sequencing and multi-omics in aging and neurodegeneration. medRxiv : the preprint server for health sciences Jensen, T. D., Le Guen, Y., Talozzi, L., Yang, S., Gorzynski, J., Peña-Tauber, A., Stewart, I., Ferrasse, A., Nachun, D., Arriaga, M. T., Lee, J., Pulgrossi, R. C., Park, J., Zhang, J., Wagner, A. D., Mormino, E. C., Poston, K. L., Henderson, V. W., He, Z., Wyss-Coray, T., Montgomery, S. B., Ashley, E. A., Greicius, M. D. 2025

Abstract

Structural variants (SVs) are a major source of genetic variation yet remain underexplored in healthy aging and neurodegenerative diseases. We performed nanopore long-read genome sequencing (lrGS) on 551 deeply-phenotyped individuals from Stanford's Aging and Memory Study and Alzheimer's Disease Research Center, generating a comprehensive SV map integrated with matched methylation, transcriptomic, and proteomic data. Over 60% of SVs identified by lrGS were not detected with short-read WGS, including many poorly tagged by single-nucleotide variants (SNVs). We discovered >60,000 SV-QTLs across molecular traits and showed that SVs were more likely than SNVs to be fine-mapped as causal. Colocalization with Alzheimer's and Parkinson's disease GWAS implicated SVs at multiple loci, including TMEM106B, BIN3, and NBEAL1. Multi-omic outlier enrichment and Bayesian modeling prioritized rare functional SVs near known risk genes. Combined, these data reveal widespread regulatory SVs in healthy aging and neurodegeneration, underscoring the importance of lrGS in deciphering complex genetic architecture.

View details for DOI 10.1101/2025.10.10.25337775

View details for PubMedID 41282933

View details for PubMedCentralID PMC12633103
Socioeconomic Factors Associated With Migraine Medication Prescription at a Tertiary Headache Center: A Retrospective Cohort Analysis. Neurology. Clinical practice Nandyala, A. S., Tan, K., Africk, B., Graber-Naidich, A., Zhang, N., He, Z., Moskatel, L. S. 2025; 15 (5): e200517

Abstract

The socioeconomic and demographic factors affecting the prescription of migraine medications are underexplored. Understanding these factors is critical to addressing health. We used our tertiary headache center's prescription database to assess the demographic and socioeconomic factors associated with the prescription of acute and preventive migraine medications and the factors affecting the rollout of novel migraine medications. We performed a retrospective cohort analysis using aggregated deidentified data of patients who had received care through the Stanford Headache Clinic using data adapted from the Stanford deidentified instance of the Observational Medical Outcomes Partnership Common Data Model. We included patients in California who had received a diagnosis of chronic migraine and had received at least 1 prescription from our clinic between 2018 and 2022. The types and volumes of prescriptions were assessed, as well as demographic factors (age, sex, race-ethnicity, and zip code income quartile). A total of 4,213 patients met inclusion criteria, of whom 3,349 (79.5%) were women and 863 (20.5%) were men, with a mean age of 44.6 ± 14.7 years. Our group was predominantly White and non-Hispanic/non-Latino (2,381/4,213, 56.5%) and came from zip codes whose median income ranged from $77,250 to $236,912 (2,046/3,298, 62.0%). Age, sex, and race-ethnicity were all found to be statistically significant factors in the selection of both acute and preventive medications for patients. Zip code income quartile played a limited role in prescription variation for both acute and preventive medications. Race-ethnicity was also a statistically significant factor for those who received a prescription for a calcitonin gene-related peptide (CGRP) monoclonal antibody and a gepant. Similarly, sex, race-ethnicity, and zip code income quartile were all factors in the rollout of the CGRP monoclonal antibodies and gepants (all p < 0.05), but age was not (p = 0.722 and p = 0.057, respectively). The second and third zip code income quartiles had the lowest prescription rates of the CGRP monoclonal antibodies and gepants during their rollout. Disparities in sex, race-ethnicity, and zip code income quartile were found among those who received medications and in which acute and preventive migraine medications were prescribed. This may reflect that some groups received less headache-specific care before establishing with our clinic. Future research will seek to better illuminate the underlying reasons for this to enable solutions and ensure equitable care.

View details for DOI 10.1212/CPJ.0000000000200517

View details for PubMedID 40741480

View details for PubMedCentralID PMC12307023
Distilling Direct Effects via Conditional Differential Gene Expression Analysis. bioRxiv : the preprint server for biology Gu, J., Skelton, A., Staley, J., Popson, P., Peng, L., Song, X., Knowles, J., He, Z. 2025

Abstract

Understanding gene expression levels is crucial for comprehending gene functions, gene-gene interactions and disease mechanisms. Differential gene expression (DGE) analysis is a widely used statistical approach that offers insights by comparing gene expression across various conditions. However, traditional DGE methods focus on what are known as marginal associations, which refer to correlations observed between gene expression and a trait of interest, even if that association is indirect or not causal. To address this limitation, we introduce conditional differential gene expression (CDGE) analysis, a framework designed to identify direct effect genes. Direct effect genes are those whose changes in expression causally and directly impact downstream biological processes of interest. In applications to three RNA sequencing datasets (including one genome-scale perturb-seq dataset), CDGE analysis identifies that only a small fraction of differentially expressed genes have direct effects and mediate most other gene actions. These direct effect genes offer greater biological insight in enrichment analyses involving protein interactions and pathways. This suggests that CDGE yields more informative conclusions on causal gene effects and could become a key tool for studying biological pathways.

View details for DOI 10.1101/2025.09.29.678272

View details for PubMedID 41256684

View details for PubMedCentralID PMC12621742
A retrospective cohort study to evaluate the effectiveness and safety profile of occipital nerve blocks in the treatment of migraine during pregnancy. Headache Smirnoff, L., Moskatel, L., Shaw, B., He, Z., Peretz, A. 2025

Abstract

Nearly 12% of Americans experience migraine, with 75% of that group represented by women aged 15-55 years, notably including peak childbearing years. This presents a therapeutic dilemma for pregnant patients, given that most medications for migraine range from unknown teratogenicity in human pregnancies, at best, to known teratogenicity, severely limiting their utility. However, migraine causes significant disability and impairment in the lives of pregnant patients, necessitating treatment. We conducted a retrospective chart review and phone survey to evaluate the safety profile and effectiveness of bilateral occipital nerve blocks to treat migraine during pregnancy. We conducted a retrospective review of charts of women aged 18-50 years who received bilateral occipital nerve blocks at the Stanford Headache Clinic between January 1, 2014 and December 31, 2020 during their pregnancies for the treatment of migraine and followed up with phone call surveys to address fetal outcomes as well as effectiveness of the nerve blocks. Thirty patients met inclusion criteria, and 21 responded to our survey. Of the 21 surveyed, none experienced significant pregnancy complications, negative fetal outcomes, or an increased rate of miscarriage. Participants receiving nerve blocks noted a reduction in pain on a visual analog scale from an average of 7 to 2 (p < 0.001) as well as from 9 days to 4 days of acute medication use per month (p = 0.002). Based on this limited retrospective cohort study, serial occipital nerve blocks may offer a safe and potentially effective option for treatment of migraine during pregnancy. Occipital nerve blocks may improve the overall quality of life, decrease disability rates, and decrease the use of potentially teratogenic therapies in pregnant women. Future larger and prospective studies are needed to better assess the safety profile and effectiveness of occipital nerve blocks for pregnant patients with migraine.

View details for DOI 10.1111/head.15001

View details for PubMedID 40621922
It's a wrap: deriving distinct discoveries with FDR control after a GWAS pipeline. bioRxiv : the preprint server for biology Chu, B. B., He, Z., Sabatti, C. 2025

Abstract

Recent work has shown how to test conditional independence hypotheses between an outcome of interest and a large number of explanatory variables with false discovery rate (FDR) control, even without access to individual level data. In the case of genome-wide association studies (GWAS) specifically, summary statistics resulting from the standard analysis pipeline can be used as input of a procedure which identifies distinct signals across the genome with FDR control. This secondary analysis requires sampling of negative controls (knockoffs) from a distribution determined by the linkage disequilibrium patterns in the genome of the population under study. In prior work, we have pre-computed this distribution for European genomes, starting from information derived from the UK Biobank. Thus, researchers working with European GWASes can carry out a knockoff analysis with minimal computational costs, using the distributed routine GhostKnockoffGWAS. Here we introduce and release new software (solveblock) that extends this capability to a much richer collection of studies. Given a set of genotyped samples, or a reference dataset, our pipeline efficiently estimates the high-dimensional correlation matrices that describe correlation structures across the genome, making rather common sparsity assumptions. Taking this sample-specific estimate as input, the software identifies groups of genetic variants that are highly correlated, and uses them to define an appropriate resolution for conditional independence hypotheses. Finally, we compute the distribution for the exchangeable negative controls necessary to test these hypotheses. The output of solveblock can be passed directly to GhostKnockoffGWAS, allowing users to carry out the complete analysis in a two-step procedure. We illustrate the performance of the routine analyzing data from five UK Biobank sub-populations. In simulations, our method controls FDR. Analyzing real data relative to 26 phenotypes of varying polygenicity in British individuals, we make an average of ≈ 19 additional discoveries, compared to standard marginal association testing. Our code, precompiled software, and processed files for these five subpopulations are openly shared.
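
To make the idea of FDR control with knockoff negative controls concrete, the following is a minimal sketch of the generic knockoff+ selection rule from the knockoff filter literature. It assumes a list of feature statistics `w` in which large positive values favor the original variable over its knockoff; it is not the solveblock or GhostKnockoffGWAS interface, and the function names are hypothetical.

```python
def knockoff_plus_threshold(w, q):
    """Return the smallest t > 0 such that
    (1 + #{j: w[j] <= -t}) / max(1, #{j: w[j] >= t}) <= q,
    or infinity if no such t exists.
    """
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        negatives = sum(1 for x in w if x <= -t)  # knockoffs that "win"
        positives = sum(1 for x in w if x >= t)   # originals that "win"
        if (1 + negatives) / max(1, positives) <= q:
            return t
    return float("inf")


def knockoff_select(w, q):
    """Indices of features selected at target FDR level q."""
    t = knockoff_plus_threshold(w, q)
    return [j for j, x in enumerate(w) if x >= t]
```

The count of negative statistics beyond the threshold serves as an estimate of the number of false positives among the positive ones, which is why the negative controls must be exchangeable with the original variables under the null.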

View details for DOI 10.1101/2025.06.05.658138

View details for PubMedID 40502041

View details for PubMedCentralID PMC12157521
Uncovering Heterogeneous Effects via Localized Feature Selection. bioRxiv : the preprint server for biology Liu, X., Gu, J., Chen, Z., Chu, B., Liu, L., Morrison, T., Butler, R. R., Edelson, J., Li, J., Longo, F. M., Tang, H., Ionita-Laza, I., Sabatti, C., Candès, E., He, Z. 2025

Abstract

Identifying features that interact to trigger disease, while accounting for heterogeneity across diverse populations, is essential for the development of precision and targeted medicine. Despite the availability of vast and complex health-related datasets, most existing works focus on identifying disease-associated features at the population level or within a few subpopulations, often overlooking individual-level heterogeneity within these groups. To address this limitation, we propose a novel framework that utilizes localized test statistics to identify disease-associated features tailored to individual profiles. Our method leverages the recently developed knockoffs methodology to control the noise level of the selection set so that the results are replicable. Moreover, it allows for the discovery of hidden heterogeneous effects within the data, as demonstrated in an application to single-cell RNA sequencing data for Alzheimer's disease. By aggregating localized feature selection results, our framework also enables powerful population-level feature selection. Our framework provides a powerful tool for exploratory studies of precision medicine, offering the potential to generate novel hypotheses for confirmatory biological experiments.

View details for DOI 10.1101/2025.06.03.657761

View details for PubMedID 40661389

View details for PubMedCentralID PMC12258879
Robust inference with GhostKnockoffs in genome-wide association studies. Research square Qi, X., Belloy, M. E., Gu, J., Liu, X., Tang, H., He, Z. 2025

Abstract

Genome-wide association studies (GWASs) have been extensively adopted to depict the underlying genetic architecture of complex traits. Recent studies have demonstrated that for feature selection in GWAS data, in addition to controlling the familywise error rate (FWER), the false discovery rate (FDR) serves as an appealing alternative for detecting small effect loci associated with polygenic traits. However, the presence of correlations among genetic variants makes direct application of usual FDR-controlling procedures to marginal association tests ineffective. Knockoff-based methods have been shown to control the FDR in GWASs, but their statistical validity and effectiveness in studies with related individuals remain unexplored. In this paper, we propose a knockoff-based approach that integrates the recently proposed GhostKnockoffs with state-of-the-art marginal association tests. We show that GhostKnockoffs, which requires only GWAS Z-scores as input, is robust to arbitrary relatedness structure as long as the input Z-scores are derived from valid generalized linear mixed models. Therefore, it can be flexibly applied on top of the standard GWAS pipeline that accounts for relatedness to enhance the discovery of small effect loci. This robustness also generalizes GhostKnockoffs to other GWAS settings, such as the meta-analysis of multiple overlapping studies and studies based on association test statistics that deviate from score tests. We demonstrate the method's performance using simulation studies and a meta-analysis of nine European-ancestry genome-wide association studies and whole exome/genome sequencing studies for Alzheimer's disease.

View details for DOI 10.21203/rs.3.rs-6396196/v1

View details for PubMedID 40386429
Repetitive Transcranial Magnetic Stimulation Modulates Brain Connectivity in Children with Self-limited Epilepsy with Centrotemporal Spikes. Brain stimulation She, X., Qi, W., Nix, K. C., Menchaca, M., Cline, C. C., Wu, W., He, Z., Baumer, F. M. 2025

Abstract

Self-limited epilepsy with centrotemporal spikes (SeLECTS) is a common pediatric syndrome in which interictal epileptiform discharges (IEDs) emerge from the motor cortex and children often develop language deficits. IEDs may induce these language deficits by pathologically enhancing brain connectivity. Using a sham-controlled design, we test the impact of inhibitory low-frequency repetitive transcranial magnetic stimulation (rTMS) on connectivity and IEDs in SeLECTS. Nineteen children participated in a cross-over study comparing active vs. sham motor cortex rTMS. Single pulses of TMS combined with EEG (spTMS-EEG) were applied to the motor cortex before and after rTMS to probe connectivity. Connectivity was quantified by calculating the weighted phase lag index (wPLI) between six regions of interest: bilateral motor cortices (implicated in SeLECTS) and bilateral inferior frontal and superior temporal regions (important for language). IED frequency before and after rTMS was also quantified. Active, but not sham, rTMS decreased wPLI connectivity between multiple regions, with the greatest reductions seen in superior temporal connections in the stimulated hemisphere. IED frequency decreased after active but not sham rTMS. Low-frequency rTMS reduces pathologic hyperconnectivity and IEDs in children with SeLECTS, making it a promising avenue for therapeutic interventions for SeLECTS and potentially other pediatric epilepsy syndromes.

View details for DOI 10.1016/j.brs.2025.02.018

View details for PubMedID 40010636
Indications for continuous electroencephalography and frequency of electrographic seizure detection in a pediatric and neonatal cardiovascular intensive care unit. Epilepsia Segal, J. B., Yang, J. K., Silverman, A., Darji, H., He, Z., Campen, C. J. 2025

Abstract

OBJECTIVE: Seizures are a recognized complication of critical cardiovascular illness in infants and children. We assessed the diagnostic yield of continuous video-electroencephalography (cEEG) in a pediatric and neonatal cardiovascular intensive care unit (CVICU) by the symptoms and risk factors prompting cEEG evaluation. METHODS: This retrospective case series included all consecutive cEEGs in patients ≤21 years old performed in one CVICU over 38 months. cEEG indications were categorized as (1) index symptoms of concern and/or (2) clinical risk factors. Index symptoms were divided into (1) vital sign symptoms (i.e., heart rate, blood pressure, oxygen, respiration, or temperature) and (2) non-vital sign symptoms (i.e., mental status, abnormal movements, eye findings, weakness, or failed extubation). Indications for cEEG were extracted by manual chart review. The presence of seizures was established electrographically from neurophysiologist reports. RESULTS: There were 605 cEEGs from 411 patients. The median study duration was 26 h (25%-75% interquartile range = 20-41 h). Seizures were detected in 57 of 605 (9%) cEEGs overall; in 34 of 356 (10%) cEEGs obtained for risk factors alone (odds ratio [OR] = 1.03, 95% confidence interval [CI] = .60-1.82, p = .90), 0 of 104 (0%) for isolated vital sign changes (p < .001), 10 of 101 (10%) for symptoms not involving vital signs (OR = 1.06, 95% CI = .52-2.09, p = .88), and in 13 of 44 (30%) for both vital sign and non-vital sign symptoms (OR = 4.93, 95% CI = 2.45-9.77, p < .001). On univariate analysis, symptoms involving gaze deviation, abnormal limb movements, or intermittent oxygen desaturation, and the risk factors of preexisting epilepsy, recent neurosurgery, acute stroke, and cardiac air embolism were associated with seizures (p < .05). SIGNIFICANCE: There were zero electrographic seizures in cEEGs obtained for isolated vital sign changes, whereas cEEGs obtained for the combination of vital sign changes and other non-vital sign symptoms were five times more likely to detect electrographic seizures than cEEGs obtained based on risk factors alone.
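
As a worked check of the arithmetic behind an odds ratio, the sketch below computes an unadjusted odds ratio for the cEEGs ordered for both vital sign and non-vital sign symptoms (13 of 44 with seizures). It assumes the comparison group is every other cEEG in the series, which the abstract does not state; the function name is illustrative, and the authors may have used a different estimator.

```python
def odds_ratio(events_a, total_a, events_b, total_b):
    """Unadjusted odds ratio of group A versus group B from 2x2 counts."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


# Counts from the abstract: 57 of 605 cEEGs had seizures overall,
# 13 of 44 in the combined-symptom group.
# Assumption: the reference group is all remaining cEEGs (44 of 561).
combined = odds_ratio(13, 44, 57 - 13, 605 - 44)
print(round(combined, 2))  # 4.93
```

Under that assumption the result is in line with the reported OR of 4.93 for this group.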

View details for DOI 10.1111/epi.18253

View details for PubMedID 39760979
Public Health. Alzheimer's & dementia : the journal of the Alzheimer's Association Lu, O., Mormino, E., He, Z., Carr, V. A., Trelle, A. N., Young, C. B., Romero, A., Vossler, H., Park, J., Skylar-Scott, I. A. 2024; 20 Suppl 7: e084160

Abstract

It is increasingly clear that delaying the onset of Alzheimer's disease (AD) dementia by several years can meaningfully lower its prevalence. The goal of the present study is to examine the relationship between lifestyle activities and cognitive function as well as cerebrospinal fluid (CSF) biomarkers of AD to determine whether these activities can serve as protective factors for AD resistance and resilience. 173 cognitively normal older individuals (mean ± SD, 69 ± 6.4 years) were recruited to the Stanford Aging and Memory Study (SAMS) and completed the Community Healthy Activities Model Program for Seniors (CHAMPS) questionnaire regarding current social, cognitive, and physical activity (Table 1). They also underwent APOE genetic testing and a detailed neuropsychological evaluation. The following cognitive domains were evaluated after conversion to z-scores: global cognition (cognitive composite), executive function, working memory, attention, episodic memory, visuospatial function, and language (see Table 2 for definitions). 127 participants completed lumbar punctures, and levels of Aβ-40, Aβ-42, p-tau181, and total tau were measured in the CSF. Cross-sectional regression models included age, sex, years of education, and APOE status as covariates. Benjamini-Hochberg corrections for multiple hypotheses were applied. There was a significant association between social activity (frequency/week) and global cognition (β = 0.20, p = 0.03), executive function (β = 0.15, p < 0.05), and working memory (β = 0.26, p = 0.01) but not episodic memory, visuospatial function, or language function. There was also a significant association with attention prior to correction for multiple hypotheses (β = 0.17, p = 0.04) but not afterward (Table 2). Additionally, there was a significant association between cognitive activity (hours/week) and global cognition (β = 0.19, p = 0.03) as well as executive function (β = 0.25, p = 0.007) but not with other cognitive domains tested (Table 3). There was no association between light or moderate caloric expenditure and cognitive measures. There was also no significant relationship between CSF biomarkers and levels of social, cognitive, and physical activity. In a well-characterized cohort of cognitively normal older adults, higher levels of social and cognitive activity were associated with higher cognitive scores on tasks of executive function but not episodic memory. The mechanism mediating this relationship appears to be independent of both Aβ and tau burden.

View details for DOI 10.1002/alz.084160

View details for PubMedID 39784972
Age-associated proteins explain the role of medial temporal lobe networks in Alzheimer's disease. GeroScience Turnbull, A., Kim, Y., Zhang, K., Jiang, X., He, Z., Henderson, V. W., Lin, F. V. 2024

Abstract

The structural connectivity (SC) of the medial temporal lobe and its associated cortical anterior temporal and posterior medial networks (MTL-AT-PM) is linked to pathologies and memory decline in Alzheimer's disease (AD). However, neuroimaging analyses cannot tell us how SC changes occur in AD at the molecular level and do not provide a means of intervening to slow/prevent pathology-related changes in MTL-AT-PM SC. The current study aimed to understand how and where AD-related changes occur within MTL-AT-PM using proteomics. We used a 4-step approach in 101 older adults from a local sample, aiming to understand how proteins and SC in combination at the multivariate level predict AD pathology, and to identify specific proteins related to SC and AD pathology. Separately, we validated the discovered proteins in relation to SC and AD pathology using an ADNI sample. We identified 12 latent factors linking proteins and SC; five showed significant relationships with AD pathology and/or episodic memory. Insulin-like growth factor binding proteins, tumor necrosis factor receptors, and hippocampal/parahippocampal edges contributed most to AD-related latent factors. Fast causal inference found protein-protein, protein-SC, and protein-pathology pathways, with seven proteins showing directional links to SC and AD-related neurodegeneration. We validated these results by identifying significant relationships of six available proteins with SC, amyloid-beta, and phosphorylated tau in ADNI. We identified multivariate relationships between proteins and MTL-AT-PM networks that add to our understanding of AD pathology and suggest specific non-pathological proteins that warrant further study in relation to brain networks and AD pathology as possible therapeutic targets.

View details for DOI 10.1007/s11357-024-01291-0

View details for PubMedID 39080151

View details for PubMedCentralID 7174043
Education Research: Sustained Implementation of Quality Improvement Practices Is Observed in Early Career Physicians Following a Neurology Resident QI Curriculum. Neurology. Education Xiong, K., Miller-Kuhlmann, R. K., Scott, B. J., He, Z., Dujari, S., Gold, C., Kvam, K. 2024; 3 (2): e200137

Abstract

Background and Objectives: The Accreditation Council for Graduate Medical Education and American Board of Psychiatry and Neurology expect engagement in quality improvement (QI) activities for all residents and practicing neurologists. Our neurology residency program instituted an experiential Neurology Residency QI Curriculum in 2015 for all residents. In this study, we aimed to characterize the role of QI engagement in the early-career paths of program graduates. Methods: We distributed an online survey evaluating QI training, scholarship, and leadership (before, during, and after residency training) to all individuals who graduated from our residency program (graduation years 2017-2021). Primary outcomes were QI project leadership or mentorship and QI scholarship (projects, posters, and publications) after residency. Predictors of these outcomes were also evaluated using Fisher exact test. Results: Twenty-nine of 50 graduates (58%) completed the survey. Median time from residency graduation was 3 years. Of the respondents, 14% actively participated in a QI project before residency, 83% during residency, and 48% after graduating. In addition, 41% had led or mentored a QI project and 34% had performed QI scholarship since residency. Fourteen percent of participants held formal roles in QI or patient safety, while 24% received formal full-time equivalents for QI work. Significant predictors (p < 0.05) of QI leadership included older age, time since graduation, rank, and participation in Clinical Effectiveness Leadership Training (CELT, an institutional QI faculty development course). Significant predictors (p < 0.05) of QI scholarship included older age, time since graduation, participation in CELT, and participation in QI scholarship during residency. QI training, participation, and/or project leadership before residency did not predict either QI leadership or scholarship after residency. Discussion: Many neurology residency graduates continued to lead QI projects and produce QI scholarship in the early years after graduation. However, receiving protected time for leadership and academic work in this area is uncommon. Our findings suggest that more infrastructure, including training, career development, and mentorship, can foster neurologists interested in leading in quality and patient safety. In academic models, promotion pathways that support academic advancement for faculty leading in QI are needed.

View details for DOI 10.1212/NE9.0000000000200137

View details for PubMedID 39359889
Temporal tau asymmetry spectrum influences divergent behavior and language patterns in Alzheimer's disease. Brain, behavior, and immunity Younes, K., Smith, V., Johns, E., Carlson, M. L., Winer, J., He, Z., Henderson, V. W., Greicius, M. D., Young, C. B., Mormino, E. C. 2024

Abstract

Understanding the psychiatric symptoms of Alzheimer's disease (AD) is crucial for advancing precision medicine and therapeutic strategies. The relationship between AD behavioral symptoms and asymmetry in spatial tau PET patterns is not well-known. Braak tau progression implicates the temporal lobes early. However, the clinical and pathological implications of temporal tau laterality remain unexplored. This cross-sectional study investigated the correlation between temporal tau PET asymmetry and behavior assessed using the neuropsychiatric inventory and composite scores for memory, executive function, and language, using data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. In the entire cohort, continuous right and left temporal tau contributions to behavior and cognition were evaluated, controlling for age, sex, education, and tau burden on the contralateral side. Additionally, a temporal tau laterality index was calculated to define "asymmetry-extreme" groups (individuals with laterality indices greater than two standard deviations from the mean). 695 individuals (age = 73.9 ± 7.6 years, 372 (53.5%) females) were included, comprising 281 (40%) cognitively unimpaired (CU) amyloid negative, 185 (27%) CU amyloid positive, and 229 (33%) impaired (CI) amyloid positive participants. In the full cohort analysis, right temporal tau was associated with worse behavior (B = 8.14, p-value = 0.007), and left temporal tau was associated with worse language (B = 1.4, p-value < 0.001). Categorization into asymmetry-extreme groups revealed 20 right- and 27 left-asymmetric participants. Within these extreme groups, there was additional heterogeneity along the anterior-posterior dimension. Asymmetrical tau burden is associated with distinct behavioral and cognitive profiles. Wide multi-cultural implementation of social cognition measures is needed to understand right-sided asymmetry in AD.

View details for DOI 10.1016/j.bbi.2024.05.002

View details for PubMedID 38710339
Stability of transcranial magnetic stimulation electroencephalogram evoked potentials in pediatric epilepsy. Scientific reports She, X., Nix, K. C., Cline, C. C., Qi, W., Tugin, S., He, Z., Baumer, F. M. 2024; 14 (1): 9045

Abstract

Transcranial magnetic stimulation paired with electroencephalography (TMS-EEG) can measure local excitability and functional connectivity. To address trial-to-trial variability, responses to multiple TMS pulses are recorded to obtain an average TMS evoked potential (TEP). Balancing adequate data acquisition to establish stable TEPs with feasible experimental duration is critical when applying TMS-EEG to clinical populations. Here we aim to investigate the minimum number of pulses (MNP) required to achieve stable TEPs in children with epilepsy. Eighteen children with Self-Limited Epilepsy with Centrotemporal Spikes, a common epilepsy arising from the motor cortices, underwent multiple 100-pulse blocks of TMS to both motor cortices over two days. TMS was applied at 120% of resting motor threshold (rMT) up to a maximum of 100% maximum stimulator output. The average of all 100 pulses was used as a "gold-standard" TEP to which we compared "candidate" TEPs obtained by averaging subsets of pulses. We defined TEP stability as the MNP needed to achieve a concordance correlation coefficient of 80% between the candidate and "gold-standard" TEP. We additionally assessed whether experimental or clinical factors affected TEP stability. Results show that stable TEPs can be derived from fewer than 100 pulses, a number typically used for designing TMS-EEG experiments. The early segment (15-80 ms) of the TEP was less stable than the later segment (80-350 ms). Global mean field amplitude derived from all channels was less stable than local TEP derived from channels overlying the stimulated site. TEP stability did not differ depending on stimulated hemisphere, block order, or antiseizure medication use, but was greater in older children. Stimulation administered with an intensity above the rMT yielded more stable local TEPs. Studies of TMS-EEG in pediatrics have been limited by the complexity of experimental set-up and time course. 
This study serves as a critical starting point, demonstrating the feasibility of designing efficient TMS-EEG studies that use a relatively small number of pulses to study pediatric epilepsy and potentially other pediatric groups.
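
The stability criterion described above (the minimum number of pulses whose average reaches a concordance correlation coefficient of 80% with the 100-pulse average) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses Lin's concordance correlation coefficient, averages the first n pulses rather than any particular subset scheme, and runs on a simulated waveform. All function names are hypothetical.

```python
import math
import random


def concordance_cc(x, y):
    """Lin's concordance correlation coefficient between two waveforms."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)


def mean_waveform(trials):
    """Average across pulses at each time point."""
    return [sum(col) / len(trials) for col in zip(*trials)]


def min_pulses_for_stability(trials, threshold=0.8):
    """Smallest n whose first-n average reaches the CCC threshold."""
    gold = mean_waveform(trials)  # "gold-standard" average of all pulses
    for n in range(1, len(trials) + 1):
        if concordance_cc(mean_waveform(trials[:n]), gold) >= threshold:
            return n
    return len(trials)


# Simulated data (assumption): a damped 10 Hz response plus Gaussian noise.
random.seed(0)
times = [i * 0.35 / 199 for i in range(200)]
signal = [math.sin(2 * math.pi * 10 * t) * math.exp(-t / 0.1) for t in times]
trials = [[s + random.gauss(0.0, 2.0) for s in signal] for _ in range(100)]
mnp = min_pulses_for_stability(trials)
print(mnp)
```

Unlike a Pearson correlation, the concordance coefficient also penalizes offsets and scale differences between the candidate and the gold-standard waveform.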

View details for DOI 10.1038/s41598-024-59468-8

View details for PubMedID 38641629

View details for PubMedCentralID PMC11031596
Long-term persistence to OnabotulinumtoxinA to prevent chronic migraine: Results from 11 years of patient data from a tertiary headache center. Pain medicine (Malden, Mass.) Moskatel, L. S., Graber-Naidich, A., He, Z., Zhang, N. 2024

Abstract

OBJECTIVE: To determine if patients with chronic migraine continue onabotulinumtoxinA (onabotA) long-term. METHODS: We performed a retrospective cohort analysis using aggregated, de-identified patient data from the Stanford Headache Center. We included patients in California who received at least one prescription for onabotA during the years of 2011-2021. The primary outcome was the number of onabotA treatments each patient received. Secondary outcomes included sex, age, race, ethnicity, body mass index (BMI), distance to the treatment facility, and zip code income quartile. RESULTS: A total of 1,551 patients received a mean of 7.60 ± 7.26 treatments and a median of 5 treatments, with 16.2% of patients receiving only one treatment and 10.6% receiving at least 19. Time-to-event survival analysis suggested 26.0% of patients would complete at least 29 treatments if able. Younger age and female sex were associated with statistically significant differences between quartile groups of number of onabotA treatments (p = 0.007, p = 0.015). BMI, distance to treatment facility, and zip code income quartile were not statistically significantly different between quartile groups (p > 0.500 for all). Prescriptions of both triptans and non-onabotA preventive medications showed a statistically significant increase with each higher quartile of number of onabotA treatments (p < 0.001; p < 0.001). DISCUSSION: We show long-term persistence to onabotA is high and that distance to treatment facility and income are not factors in continuation. Our work also demonstrates that as patients continue onabotA over time, there may be an increased need for adjunctive or alternative treatments.

View details for DOI 10.1093/pm/pnae020

View details for PubMedID 38518091
Impaired 24-h activity patterns are associated with an increased risk of Alzheimer's disease, Parkinson's disease, and cognitive decline. Alzheimer's research & therapy Winer, J. R., Lok, R., Weed, L., He, Z., Poston, K. L., Mormino, E. C., Zeitzer, J. M. 2024; 16 (1): 35

Abstract

Sleep-wake regulating circuits are affected during prodromal stages in the pathological progression of both Alzheimer's disease (AD) and Parkinson's disease (PD), and this disturbance can be measured passively using wearable devices. Our objective was to determine whether accelerometer-based measures of 24-h activity are associated with subsequent development of AD, PD, and cognitive decline. This study obtained UK Biobank data from 82,829 individuals with wrist-worn accelerometer data aged 40 to 79 years with a mean (± SD) follow-up of 6.8 (± 0.9) years. Outcomes were accelerometer-derived measures of 24-h activity (derived by cosinor, nonparametric, and functional principal component methods), incident AD and PD diagnosis (obtained through hospitalization or primary care records), and prospective longitudinal cognitive testing. One hundred eighty-seven individuals progressed to AD and 265 to PD. Interdaily stability (a measure of regularity, hazard ratio [HR] per SD increase 1.25, 95% confidence interval [CI] 1.05-1.48), diurnal amplitude (HR 0.79, CI 0.65-0.96), mesor (mean activity; HR 0.77, CI 0.59-0.998), and activity during most active 10 h (HR 0.75, CI 0.61-0.94), were associated with risk of AD. Diurnal amplitude (HR 0.28, CI 0.23-0.34), mesor (HR 0.13, CI 0.10-0.16), activity during least active 5 h (HR 0.24, CI 0.08-0.69), and activity during most active 10 h (HR 0.20, CI 0.16-0.25) were associated with risk of PD. Several measures were additionally predictive of longitudinal cognitive test performance. In this community-based longitudinal study, accelerometer-derived metrics were associated with elevated risk of AD, PD, and accelerated cognitive decline. These findings suggest 24-h rhythm integrity, as measured by affordable, non-invasive wearable devices, may serve as a scalable early marker of neurodegenerative disease.

View details for DOI 10.1186/s13195-024-01411-0

View details for PubMedID 38355598

View details for PubMedCentralID 4163039
Post-translational modifications linked to preclinical Alzheimer's disease-related pathological and cognitive changes. Alzheimer's & dementia : the journal of the Alzheimer's Association Abiose, O., Rutledge, J., Moran-Losada, P., Belloy, M. E., Wilson, E. N., He, Z., Trelle, A. N., Channappa, D., Romero, A., Park, J., Yutsis, M. V., Sha, S. J., Andreasson, K. I., Poston, K. L., Henderson, V. W., Wagner, A. D., Wyss-Coray, T., Mormino, E. C. 2023

Abstract

In this study, we leverage proteomic techniques to identify communities of proteins underlying Alzheimer's disease (AD) risk among clinically unimpaired (CU) older adults. We constructed a protein co-expression network using 3869 cerebrospinal fluid (CSF) proteins quantified by SomaLogic, Inc., in a cohort of participants along the AD clinical spectrum. We then replicated this network in an independent cohort of CU older adults and related these modules to clinically-relevant outcomes. We discovered modules enriched for phosphorylation and ubiquitination that were associated with abnormal amyloid status, as well as p-tau181 (M4: β = 2.44, p < 0.001, M7: β = 2.57, p < 0.001) and executive function performance (M4: β = -2.00, p = 0.005, M7: β = -2.39, p < 0.001). In leveraging CSF proteomic data from individuals spanning the clinical spectrum of AD, we highlight the importance of post-translational modifications for early cognitive and pathological changes.

View details for DOI 10.1002/alz.13576

View details for PubMedID 38146099
Assessing the Assisted Six-Minute Cycling Test as a Measure of Endurance in Non-Ambulatory Patients with Spinal Muscular Atrophy (SMA). Journal of clinical medicine Tang, W. J., Gu, B., Montalvo, S., Dunaway Young, S., Parker, D. M., de Monts, C., Ataide, P., Ni Ghiollagain, N., Wheeler, M. T., Tesi Rocha, C., Christle, J. W., He, Z., Day, J. W., Duong, T. 2023; 12 (24)

Abstract

Assessing endurance in non-ambulatory individuals with Spinal Muscular Atrophy (SMA) has been challenging due to limited evaluation tools. The Assisted 6-Minute Cycling Test (A6MCT) is an upper limb ergometer assessment used in other neurologic disorders to measure endurance. To study the performance of the A6MCT in the non-ambulatory SMA population, prospective data was collected on 38 individuals with SMA (13 sitters; 25 non-sitters), aged 5 to 74 years (mean = 30.3; SD = 14.1). The clinical measures used were A6MCT, Revised Upper Limb Module (RULM), Adapted Test of Neuromuscular Disorders (ATEND), and Egen Klassifikation Scale 2 (EK2). Perceived fatigue was assessed using the Fatigue Severity Scale (FSS), and effort was assessed using the Rate of Perceived Exertion (RPE). Data were analyzed for: (1) Feasibility, (2) Clinical discrimination, and (3) Associations between A6MCT with clinical characteristics and outcomes. Results showed the A6MCT was feasible for 95% of the tested subjects, discriminated between functional groups (p = 0.0086), and was significantly associated with results obtained from RULM, ATEND, EK2, and Brooke (p < 0.0001; p = 0.029; p < 0.001; p = 0.005). These findings indicate the A6MCT's potential to evaluate muscular endurance in non-ambulatory SMA individuals, complementing clinician-rated assessments. Nevertheless, further validation with a larger dataset is needed for broader application.

View details for DOI 10.3390/jcm12247582

View details for PubMedID 38137651

View details for PubMedCentralID PMC10743820
The introduction of the CGRP monoclonal antibodies and their effect on the prescription patterns of chronic migraine preventive medications in a tertiary headache center: A retrospective, observational analysis. Headache Moskatel, L. S., Graber-Naidich, A., He, Z., Zhang, N. 2023

Abstract

OBJECTIVE: To determine the effect of the introduction of the calcitonin gene-related peptide monoclonal antibodies (CGRP mAbs) in 2018 on the prescribing of older medications for the prevention of chronic migraine. BACKGROUND: Prior to 2018, the preventive treatment of migraine borrowed from medications intended to treat other illnesses with the last medication, onabotulinumtoxinA, receiving Food and Drug Administration (FDA) approval for the prevention of chronic migraine in 2010. The FDA approval of three CGRP mAbs in 2018 provided the ideal natural experiment to assess how the introduction of these medications, and a fourth in 2020, affected the generally stable migraine preventive medications market. METHODS: We performed a retrospective cohort analysis using the aggregated de-identified data of 6595 patients. The percentage of patients with chronic migraine who had been prescribed one of ten most prescribed oral preventive medications or onabotulinumtoxinA, or any of the four CGRP mAbs, were calculated relative to the total number of patients with chronic migraine who received a prescription for any medication from our clinic during the pre-CGRP mAb years of 2015-2017 and post-approval years of 2019-2021. RESULTS: We observed a statistically significant decrease in the prescription of the top 10 most prescribed medications after the introduction of the CGRP mAbs overall (1456/3144, 46.3%, to 1995/4629, 43.1%, p = 0.001), as well as with most individual medications, including large decreases in verapamil (230/3144, 7.3%, to 125/4629, 2.7%; p < 0.001), the tricyclic antidepressants (494/3144, 15.7%, to 532/4629, 11.5%; p < 0.001), topiramate (566/3144, 18.0%, to 653/4629, 14.1%; p < 0.001), and onabotulinumtoxinA (861/3144, 27.4%, to 1134/4629, 24.5%; p = 0.001). CONCLUSION: The introduction of the CGRP mAbs during 2018 resulted in a decrease in utilization of most oral medications and onabotulinumtoxinA for the prevention of migraine. Future work should continue to observe how the prescription patterns of these medications evolve with time.

View details for DOI 10.1111/head.14642

View details for PubMedID 37882379
Loop diuretics association with Alzheimer's disease risk. Frontiers in aging Graber-Naidich, A., Lee, J., Younes, K., Greicius, M. D., Le Guen, Y., He, Z. 2023; 4: 1211571

Abstract

Objectives: To investigate whether exposure history to two common loop diuretics, bumetanide and furosemide, affects the risk of developing Alzheimer's disease (AD) after accounting for socioeconomic status and congestive heart failure. Methods: Individuals exposed to bumetanide or furosemide were identified in the Stanford University electronic health record using the de-identified Observational Medical Outcomes Partnership platform. We matched the AD case cohort to a control cohort (1:20 case:control) on gender, race, ethnicity, and hypertension, and controlled for variables that could potentially be collinear with bumetanide exposure and/or AD diagnosis. Among individuals older than 65 years, 5,839 AD cases and 116,103 matched controls were included. A total of 1,759 patients (54 cases and 1,705 controls) were exposed to bumetanide. Results: After adjusting for socioeconomic status and other confounders, exposure to bumetanide and furosemide was significantly associated with reduced AD risk (respectively, bumetanide odds ratio [OR] = 0.23; 95% confidence interval [CI], 0.15-0.36; p = 4.0 × 10^-11; furosemide OR = 0.42; 95% CI, 0.38-0.47; p < 2.0 × 10^-16). Discussion: Our study replicates in an independent sample that a history of bumetanide exposure is associated with reduced AD risk while also highlighting an association of the most common loop diuretic (furosemide) with reduced AD risk. These associations need to be additionally replicated, and the mechanism of action remains to be investigated.

View details for DOI 10.3389/fragi.2023.1211571

View details for PubMedID 37822457

View details for PubMedCentralID PMC10563814
Multiancestry analysis of the HLA locus in Alzheimer's and Parkinson's diseases uncovers a shared adaptive immune response mediated by HLA-DRB1*04 subtypes. Proceedings of the National Academy of Sciences of the United States of America Le Guen, Y., Luo, G., Ambati, A., Damotte, V., Jansen, I., Yu, E., Nicolas, A., de Rojas, I., Peixoto Leal, T., Miyashita, A., Bellenguez, C., Lian, M. M., Parveen, K., Morizono, T., Park, H., Grenier-Boley, B., Naito, T., Küçükali, F., Talyansky, S. D., Yogeshwar, S. M., Sempere, V., Satake, W., Alvarez, V., Arosio, B., Belloy, M. E., Benussi, L., Boland, A., Borroni, B., Bullido, M. J., Caffarra, P., Clarimon, J., Daniele, A., Darling, D., Debette, S., Deleuze, J. F., Dichgans, M., Dufouil, C., During, E., Düzel, E., Galimberti, D., Garcia-Ribas, G., García-Alberca, J. M., García-González, P., Giedraitis, V., Goldhardt, O., Graff, C., Grünblatt, E., Hanon, O., Hausner, L., Heilmann-Heimbach, S., Holstege, H., Hort, J., Jung, Y. J., Jürgen, D., Kern, S., Kuulasmaa, T., Lee, K. H., Lin, L., Masullo, C., Mecocci, P., Mehrabian, S., de Mendonça, A., Boada, M., Mir, P., Moebus, S., Moreno, F., Nacmias, B., Nicolas, G., Niida, S., Nordestgaard, B. G., Papenberg, G., Papma, J., Parnetti, L., Pasquier, F., Pastor, P., Peters, O., Pijnenburg, Y. A., Piñol-Ripoll, G., Popp, J., Porcel, L. M., Puerta, R., Pérez-Tur, J., Rainero, I., Ramakers, I., Real, L. M., Riedel-Heller, S., Rodriguez-Rodriguez, E., Ross, O. A., Luís Royo, J., Rujescu, D., Scarmeas, N., Scheltens, P., Scherbaum, N., Schneider, A., Seripa, D., Skoog, I., Solfrizzi, V., Spalletta, G., Squassina, A., van Swieten, J., Sánchez-Valle, R., Tan, E. K., Tegos, T., Teunissen, C., Thomassen, J. Q., Tremolizzo, L., Vyhnalek, M., Verhey, F., Waern, M., Wiltfang, J., Zhang, J., Zetterberg, H., Blennow, K., He, Z., Williams, J., Amouyel, P., Jessen, F., Kehoe, P. G., Andreassen, O. A., Van Duin, C., Tsolaki, M., Sánchez-Juan, P., Frikke-Schmidt, R., Sleegers, K., Toda, T., Zettergren, A., Ingelsson, M., Okada, Y., Rossi, G., Hiltunen, M., Gim, J., Ozaki, K., Sims, R., Foo, J. N., van der Flier, W., Ikeuchi, T., Ramirez, A., Mata, I., Ruiz, A., Gan-Or, Z., Lambert, J. C., Greicius, M. D., Mignot, E. 2023; 120 (36): e2302720120

Abstract

Across multiancestry groups, we analyzed Human Leukocyte Antigen (HLA) associations in over 176,000 individuals with Parkinson's disease (PD) and Alzheimer's disease (AD) versus controls. We demonstrate that the two diseases share the same protective association at the HLA locus. HLA-specific fine-mapping showed that hierarchical protective effects of HLA-DRB1*04 subtypes best accounted for the association, strongest with HLA-DRB1*04:04 and HLA-DRB1*04:07, and intermediary with HLA-DRB1*04:01 and HLA-DRB1*04:03. The same signal was associated with decreased neurofibrillary tangles in postmortem brains and was associated with reduced tau levels in cerebrospinal fluid and to a lower extent with increased Aβ42. Protective HLA-DRB1*04 subtypes strongly bound the aggregation-prone tau PHF6 sequence, however only when acetylated at a lysine (K311), a common posttranslational modification central to tau aggregation. An HLA-DRB1*04-mediated adaptive immune response decreases PD and AD risks, potentially by acting against tau, offering the possibility of therapeutic avenues.

View details for DOI 10.1073/pnas.2302720120

View details for PubMedID 37643212
Real world evidence of changes in CGRP monoclonal antibody and onabotulinumtoxinA prescription practices at the start of the COVID-19 pandemic: An observational, retrospective study. Headache Moskatel, L. S., Graber-Naidich, A., He, Z., Zhang, N. 2023

View details for DOI 10.1111/head.14585

View details for PubMedID 37358470
Major Adverse Dystrophinopathy Events (MADE) Score as Marker of Cumulative Morbidity and Risk for Mortality in Boys with Duchenne Muscular Dystrophy. Progress in pediatric cardiology Kaufman, B. D., Garcia, A., He, Z., Tesi-Rocha, C., Buu, M., Rosenthal, D., Gordish-Dressman, H., Almond, C. S., Duong, T. 2023; 69

Abstract

Overlapping symptoms from cardiomyopathy, respiratory insufficiency, and skeletal myopathy confound assessment of heart failure in Duchenne Muscular Dystrophy. We developed an ordinal scale of multiorgan clinical variables that reflect cumulative disease burden: the Major Adverse Dystrophinopathy Event (MADE) Score. We hypothesized that a higher MADE score would be associated with increased mortality in boys with Duchenne Muscular Dystrophy. The Cooperative International Neuromuscular Research Group Duchenne Natural History Study dataset was utilized for validation. Duchenne Natural History Study variables were selected based on clinical relevance to prespecified domains: Cardiac, Pulmonary, Myopathy, Nutrition. Severity points (0-4) were assigned and summed for study visits. MADE score for cohorts defined by age, ambulatory status, and survival were compared at enrollment and longitudinally. Associations between MADE score and mortality were examined. Duchenne Natural History Study enrolled 440 males, 12.6 ± 6.1 years old, with 3,559 visits over 4.6 ± 2.8 years, 45 deaths. MADE score increased with age and nonambulatory status. Mean MADE score per visit was 19 ± 10 for those who died vs. 9.8 ± 9.3 in survivors (p=0.03). Baseline MADE score >12 predicted mortality independent of age (78% sensitivity, CPE.70). Rising MADE score trajectory was associated with mortality in models adjusted for enrollment age, follow-up time, and ambulatory status, all p<.001. A multiorgan severity score, MADE, was developed to track cumulative morbidities that impact heart failure in Duchenne muscular dystrophy. MADE score predicted Duchenne Natural History Study mortality. MADE score can be used for serial heart failure assessment in males and may serve as an endpoint for Duchenne muscular dystrophy clinical research.

View details for DOI 10.1016/j.ppedcard.2023.101639

View details for PubMedID 37990740

View details for PubMedCentralID PMC10659574
Peripheral T-Cells, B-Cells, and Monocytes from Multiple Sclerosis Patients Supplemented with High-Dose Vitamin D Show Distinct Changes in Gene Expression Profiles. Nutrients Kim, D., Witt, E. E., Schubert, S., Sotirchos, E., Bhargava, P., Mowry, E. M., Sachs, K., Bilen, B., Steinman, L., Awani, A., He, Z., Calabresi, P. A., Van Haren, K. 2022; 14 (22)

Abstract

Vitamin D is a steroid hormone that has been widely studied as a potential therapy for multiple sclerosis and other inflammatory disorders. Pre-clinical studies have implicated vitamin D in the transcription of thousands of genes, but its influence may vary by cell type. A handful of clinical studies have failed to identify an in vivo gene expression signature when using bulk analysis of all peripheral immune cells. We hypothesized that vitamin D's gene signature would vary by immune cell type, requiring the analysis of distinct cell types. Multiple sclerosis patients (n = 18) were given high-dose vitamin D (10,400 IU/day) for six months as part of a prospective clinical trial (NCT01024777). We collected peripheral blood mononuclear cells from participants at baseline and again after six months of treatment. We used flow cytometry to isolate three immune cell types (CD4+ T-cells, CD19+ B-cells, CD14+ monocytes) for RNA microarray analysis and compared the expression profiles between baseline and six months. We identified distinct sets of differentially expressed genes and enriched pathways between baseline and six months for each cell type. Vitamin D's in vivo gene expression profile in the immune system likely differs by cell type. Future clinical studies should consider techniques that allow for a similar cell-type resolution.

View details for DOI 10.3390/nu14224737

View details for PubMedID 36432424
Connectivity increases during spikes and spike-free periods in self-limited epilepsy with centrotemporal spikes. Clinical neurophysiology : official journal of the International Federation of Clinical Neurophysiology Goad, B. S., Lee-Messer, C., He, Z., Porter, B. E., Baumer, F. M. 2022

Abstract

OBJECTIVE: To understand the impact of interictal spikes on brain connectivity in patients with Self-Limited Epilepsy with Centrotemporal Spikes (SeLECTS). METHODS: Electroencephalograms from 56 consecutive SeLECTS patients were segmented into periods with and without spikes. Connectivity between electrodes was calculated using the weighted phase lag index. To determine if there are chronic alterations in connectivity in SeLECTS, we compared spike-free connectivity to connectivity in 65 matched controls. To understand the acute impact of spikes, we compared connectivity immediately before, during, and after spikes versus baseline, spike-free connectivity. We explored whether behavioral state, spike laterality, or antiseizure medications affected connectivity. RESULTS: Children with SeLECTS had markedly higher connectivity than controls during sleep but not wakefulness, with greatest difference in the right hemisphere. During spikes, connectivity increased globally; before and after spikes, left frontal and bicentral connectivity increased. Right hemisphere connectivity increased more during right-sided than left-sided spikes; left hemisphere connectivity was equally affected by right and left spikes. CONCLUSIONS: SeLECTS patients have persistent increased connectivity during sleep; connectivity is further elevated during the spike and perispike periods. SIGNIFICANCE: Testing whether increased connectivity impacts cognition or seizure susceptibility in SeLECTS and more severe epilepsies could help determine if spikes should be treated.

View details for DOI 10.1016/j.clinph.2022.09.015

View details for PubMedID 36307364
A Fast and Robust Strategy to Remove Variant-Level Artifacts in Alzheimer Disease Sequencing Project Data. Neurology. Genetics Belloy, M. E., Le Guen, Y., Eger, S. J., Napolioni, V., Greicius, M. D., He, Z. 2022; 8 (5): e200012

Abstract

Background and Objectives: Exome sequencing (ES) and genome sequencing (GS) are expected to be critical to further elucidate the missing genetic heritability of Alzheimer disease (AD) risk by identifying rare coding and/or noncoding variants that contribute to AD pathogenesis. In the United States, the Alzheimer Disease Sequencing Project (ADSP) has taken a leading role in sequencing AD-related samples at scale, with the resultant data being made publicly available to researchers to generate new insights into the genetic etiology of AD. To achieve sufficient power, the ADSP has adapted a study design where subsets of larger AD cohorts are collected and sequenced across multiple centers, using a variety of sequencing platforms. This approach may lead to variable variant quality across sequencing centers and/or platforms. In this study, we sought to implement and evaluate filters that can be applied fast to robustly remove variant-level artifacts in the ADSP data. Methods: We implemented a robust quality control procedure to handle ADSP data. We evaluated this procedure while performing exome-wide and genome-wide association analyses on AD risk using the latest ADSP whole ES (WES) and whole GS (WGS) data releases (NG00067.v5). Results: We observed that many variants displayed large variation in allele frequencies across sequencing centers/platforms and contributed to spurious association signals with AD risk. We also observed that sequencing platform/center adjustment in association models could not fully account for these spurious signals. To address this issue, we designed and implemented variant filters that could capture and remove these center-specific/platform-specific artifactual variants. Discussion: We derived a fast and robust approach to filter variants that represent sequencing center-related or platform-related artifacts underlying spurious associations with AD risk in ADSP WES and WGS data. 
This approach will be important to support future robust genetic association studies on ADSP data, as well as other studies with similar designs.

View details for DOI 10.1212/NXG.0000000000200012

View details for PubMedID 35966919
KnockoffTrio: A knockoff framework for the identification of putative causal variants in genome-wide association studies with trio design. American journal of human genetics Yang, Y., Wang, C., Liu, L., Buxbaum, J., He, Z., Ionita-Laza, I. 2022

Abstract

Family-based designs can eliminate confounding due to population substructure and can distinguish direct from indirect genetic effects, but these designs are underpowered due to limited sample sizes. Here, we propose KnockoffTrio, a statistical method to identify putative causal genetic variants for father-mother-child trio design built upon a recently developed knockoff framework in statistics. KnockoffTrio controls the false discovery rate (FDR) in the presence of arbitrary correlations among tests and is less conservative and thus more powerful than the conventional methods that control the family-wise error rate via Bonferroni correction. Furthermore, KnockoffTrio is not restricted to family-based association tests and can be used in conjunction with more powerful, potentially nonlinear models to improve the power of standard family-based tests. We show, using empirical simulations, that KnockoffTrio can prioritize causal variants over associations due to linkage disequilibrium and can provide protection against confounding due to population stratification. In applications to 14,200 trios from three study cohorts for autism spectrum disorders (ASDs), including AGP, SPARK, and SSC, we show that KnockoffTrio can identify multiple significant associations that are missed by conventional tests applied to the same data. In particular, we replicate known ASD association signals with variants in several genes such as MACROD2, NRXN1, PRKAR1B, CADM2, PCDH9, and DOCK4 and identify additional associations with variants in other genes including ARHGEF10, SLC28A1, ZNF589, and HINT1 at FDR 10%.

View details for DOI 10.1016/j.ajhg.2022.08.013

View details for PubMedID 36150388
Molecular signatures underlying neurofibrillary tangle susceptibility in Alzheimer's disease. Neuron Otero-Garcia, M., Mahajani, S. U., Wakhloo, D., Tang, W., Xue, Y., Morabito, S., Pan, J., Oberhauser, J., Madira, A. E., Shakouri, T., Deng, Y., Allison, T., He, Z., Lowry, W. E., Kawaguchi, R., Swarup, V., Cobos, I. 2022

Abstract

Tau aggregation in neurofibrillary tangles (NFTs) is closely associated with neurodegeneration and cognitive decline in Alzheimer's disease (AD). However, the molecular signatures that distinguish between aggregation-prone and aggregation-resistant cell states are unknown. We developed methods for the high-throughput isolation and transcriptome profiling of single somas with NFTs from the human AD brain, quantified the susceptibility of 20 neocortical subtypes for NFT formation and death, and identified both shared and cell-type-specific signatures. NFT-bearing neurons shared a marked upregulation of synaptic transmission-related genes, including a core set of 63 genes enriched for synaptic vesicle cycling. Oxidative phosphorylation and mitochondrial dysfunction were highly cell-type dependent. Apoptosis was only modestly enriched, and the susceptibilities of NFT-bearing and NFT-free neurons for death were highly similar. Our analysis suggests that NFTs represent cell-type-specific responses to stress and synaptic dysfunction. We provide a resource for biomarker discovery and the investigation of tau-dependent and tau-independent mechanisms of neurodegeneration.

View details for DOI 10.1016/j.neuron.2022.06.021

View details for PubMedID 35882228
Pediatric Functional Neurological Disorder: Demographic and Clinical Factors Impacting Care. Journal of child neurology Pal, R., Romero, E., He, Z., Stevenson, T., Campen, C. 2022: 8830738221113899

Abstract

This is a multicenter retrospective EMR-based chart review of 88 patients aged 3-21 years admitted for evaluation of functional neurologic disorder (FND). We sought to establish characteristics associated with FND, calculate incidence of abnormal neurodiagnostic findings, and determine features associated with variability in workup and treatment. FND patients were 65% female, 40% White, 33% Hispanic, and 88% primarily English speaking, with a median age of 13.9 years. We detected variability in management by age, ethnicity, psychiatric comorbidity, and hospital site. Our findings suggest limited utility to CTs in this setting (100% normal) and that workup can be safely informed by physical exam, which predicted abnormal MRI and LP results. We favor screening for adverse childhood experiences in FND patients. Hospitalization may be a rare opportunity for psychiatry contact.

View details for DOI 10.1177/08830738221113899

View details for Web of Science ID 000825063200001

View details for PubMedID 35815864
Association of Rare APOE Missense Variants V236E and R251G With Risk of Alzheimer Disease. JAMA neurology Le Guen, Y., Belloy, M. E., Grenier-Boley, B., de Rojas, I., Castillo-Morales, A., Jansen, I., Nicolas, A., Bellenguez, C., Dalmasso, C., Küçükali, F., Eger, S. J., Rasmussen, K. L., Thomassen, J. Q., Deleuze, J. F., He, Z., Napolioni, V., Amouyel, P., Jessen, F., Kehoe, P. G., van Duijn, C., Tsolaki, M., Sánchez-Juan, P., Sleegers, K., Ingelsson, M., Rossi, G., Hiltunen, M., Sims, R., van der Flier, W. M., Ramirez, A., Andreassen, O. A., Frikke-Schmidt, R., Williams, J., Ruiz, A., Lambert, J. C., Greicius, M. D., Arosio, B., Benussi, L., Boland, A., Borroni, B., Caffarra, P., Daian, D., Daniele, A., Debette, S., Dufouil, C., Düzel, E., Galimberti, D., Giedraitis, V., Grimmer, T., Graff, C., Grünblatt, E., Hanon, O., Hausner, L., Heilmann-Heimbach, S., Holstege, H., Hort, J., Jürgen, D., Kuulasmaa, T., van der Lugt, A., Masullo, C., Mecocci, P., Mehrabian, S., de Mendonça, A., Moebus, S., Nacmias, B., Nicolas, G., Olaso, R., Papenberg, G., Parnetti, L., Pasquier, F., Peters, O., Pijnenburg, Y. A., Popp, J., Rainero, I., Ramakers, I., Riedel-Heller, S., Scarmeas, N., Scheltens, P., Scherbaum, N., Schneider, A., Seripa, D., Soininen, H., Solfrizzi, V., Spalletta, G., Squassina, A., van Swieten, J., Tegos, T. J., Tremolizzo, L., Verhey, F., Vyhnalek, M., Wiltfang, J., Boada, M., García-González, P., Puerta, R., Real, L. M., Álvarez, V., Bullido, M. J., Clarimon, J., García-Alberca, J. M., Mir, P., Moreno, F., Pastor, P., Piñol-Ripoll, G., Molina-Porcel, L., Pérez-Tur, J., Rodríguez-Rodríguez, E., Royo, J. L., Sánchez-Valle, R., Dichgans, M., Rujescu, D. 2022

Abstract

The APOE ε2 and APOE ε4 alleles are the strongest protective and risk-increasing, respectively, genetic variants for late-onset Alzheimer disease (AD). However, the mechanisms linking APOE to AD, particularly the apoE protein's role in AD pathogenesis and how this is affected by APOE variants, remain poorly understood. Identifying missense variants in addition to APOE ε2 and APOE ε4 could provide critical new insights, but given the low frequency of additional missense variants, AD genetic cohorts have previously been too small to interrogate this question robustly. To determine whether rare missense variants on APOE are associated with AD risk. Association with case-control status was tested in a sequenced discovery sample (stage 1) and followed up in several microarray imputed cohorts as well as the UK Biobank whole-exome sequencing resource using a proxy-AD phenotype (stages 2 and 3). This study combined case-control, family-based, population-based, and longitudinal AD-related cohorts that recruited referred and volunteer participants. Stage 1 included 37 409 nonunique participants of European or admixed European ancestry, with 11 868 individuals with AD and 11 934 controls passing analysis inclusion criteria. In stages 2 and 3, 475 473 participants were considered across 8 cohorts, of which 84 513 individuals with AD and proxy-AD and 328 372 controls passed inclusion criteria. Selection criteria were cohort specific, and this study was performed a posteriori on individuals who were genotyped. Among the available genotypes, 76 195 were excluded. All data were retrieved between September 2015 and November 2021 and analyzed between April and November 2021. In primary analyses, the AD risk associated with each missense variant was estimated, as appropriate, with either linear mixed-model regression or logistic regression. 
In secondary analyses, associations were estimated with age at onset using linear mixed-model regression and risk of conversion to AD using competing-risk regression. A total of 544 384 participants were analyzed in the primary case-control analysis; 312 476 (57.4%) were female, and the mean (SD; range) age was 64.9 (15.2; 40-110) years. Two missense variants were associated with a 2-fold to 3-fold decreased AD risk: APOE ε4 (R251G) (odds ratio, 0.44; 95% CI, 0.33-0.59; P = 4.7 × 10^-8) and APOE ε3 (V236E) (odds ratio, 0.37; 95% CI, 0.25-0.56; P = 1.9 × 10^-6). Additionally, the cumulative incidence of AD in carriers of these variants was found to grow more slowly with age compared with noncarriers. In this genetic association study, a novel variant associated with AD was identified: R251G always coinherited with ε4 on the APOE gene, which mitigates the ε4-associated AD risk. The protective effect of the V236E variant, which is always coinherited with ε3 on the APOE gene, was also confirmed. The location of these variants confirms that the carboxyl-terminal portion of apoE plays an important role in AD pathogenesis. The large risk reductions reported here suggest that protein chemistry and functional assays of these variants should be pursued, as they have the potential to guide drug development targeting APOE.

View details for DOI 10.1001/jamaneurol.2022.1166

View details for PubMedID 35639372
Inequities in therapy for infantile spasms: a call to action. Annals of neurology Baumer, F. M., Mytinger, J. R., Neville, K., Briscoe Abath, C., Gutierrez, C. A., Numis, A. L., Harini, C., He, Z., Hussain, S. A., Berg, A. T., Chu, C. J., Gaillard, W. D., Loddenkemper, T., Pasupuleti, A., Samanata, D., Singh, R. K., Singhal, N. S., Wusthoff, C. J., Wirrell, E. C., Yozawitz, E., Knupp, K. G., Shellhaas, R. A., Grinspan, Z. M., Pediatric Epilepsy Research Consortium and National Infantile Spasms Consortium 2022

Abstract

OBJECTIVE: To determine whether selection of treatment for children with infantile spasms (IS) varies by race/ethnicity. METHODS: The prospective US National Infantile Spasms Consortium database includes children with IS treated from 2012-2018. We examined the relationship between race/ethnicity and receipt of standard IS therapy (prednisolone, adrenocorticotropic hormone, vigabatrin), adjusting for demographic and clinical variables using logistic regression. Our primary outcome was treatment course, which considered therapy prescribed for the first and, when needed, the second IS treatment together. RESULTS: Of 555 children, 324 (58%) were Non-Hispanic white, 55 (10%) Non-Hispanic Black, 24 (4%) Non-Hispanic Asian, 80 (14%) Hispanic, and 72 (13%) Other/Unknown. Most (398, 72%) received a standard treatment course. Insurance type, geographic location, history of prematurity, prior seizures, developmental delay or regression, abnormal head circumference, hypsarrhythmia, and IS etiologies were associated with standard therapy. In adjusted models, Non-Hispanic Black children had lower odds of receiving a standard treatment course compared with Non-Hispanic white children (OR 0.42, 95% CI 0.20-0.89, p=0.02). Adjusted models also showed that children with public (vs. private) insurance had lower odds of receiving standard therapy for treatment 1 (OR 0.42, CI 0.21-0.84, p=0.01). INTERPRETATION: Non-Hispanic Black children were more often treated with non-standard IS therapies than Non-Hispanic white children. Likewise, children with public (vs. private) insurance were less likely to receive standard therapies. Investigating drivers of inequities, and understanding the impact of racism on treatment decisions, are critical next steps to improve care for patients with IS.

View details for DOI 10.1002/ana.26363

View details for PubMedID 35388521
Spinal cord injury: a study protocol for a systematic review and meta-analysis of microRNA alterations. Systematic reviews Tigchelaar, S., He, Z., Tharin, S. 2022; 11 (1): 61

Abstract

BACKGROUND: Spinal cord injury (SCI) is a devastating condition with no current neurorestorative treatments. Clinical trials have been hampered by a lack of meaningful diagnostic and prognostic markers of injury severity and neurologic recovery. Objective biomarkers and novel therapies for SCI represent urgent unmet clinical needs. Biomarkers of SCI that objectively stratify the severity of cord damage could expand the depth and scope of clinical trials and represent targets for the development of novel therapies for acute SCI. MicroRNAs (miRNAs) represent promising candidates both as informative molecules of injury severity and recovery, and as therapeutic targets. miRNAs are small, regulatory RNA molecules that are tissue-specific and evolutionarily conserved across species. miRNAs have been shown to represent powerful predictors of pathology, particularly with respect to neurologic disorders. METHODS: Studies investigating miRNA alterations in all species of animal models and human studies of acute, traumatic SCI will be identified from PubMed, Embase, and Scopus. We aim to identify whether SCI is associated with a specific pattern of miRNA expression that is conserved across species, and whether SCI is associated with a tissue- or cell type-specific pattern of miRNA expression. The inclusion criteria for this study will include (1) studies published anytime, (2) including all species, and sexes with acute, traumatic SCI, (3) relating to the alteration of miRNA after SCI, using molecular-based detection platforms including qRT-PCR, microarray, and RNA-sequencing, (4) including statistically significant miRNA alterations in tissues, such as spinal cord, serum/plasma, and/or CSF, and (5) studies with a SHAM surgery group. Articles included in the review will have their titles, abstracts, and full texts reviewed by two independent authors. 
Random effects meta-regression will be performed, which allows for within-study and between-study variability, on the miRNA expression after SCI or SHAM surgery. We will analyze both the cumulative pooled dataset, as well as datasets stratified by species, tissue type, and timepoint to identify miRNA alterations that are specifically related to the injured spinal cord. We aim to identify SCI-related miRNA that are specifically altered both within a species, and those that are evolutionarily conserved across species, including humans. The analyses will provide a description of the evolutionarily conserved miRNA signature of the pathophysiological response to SCI. DISCUSSION: Here, we present a protocol to perform a systematic review and meta-analysis to investigate the conserved inter- and intra-species miRNA changes that occur due to acute, traumatic SCI. This review seeks to serve as a valuable resource for the SCI community by establishing a rigorous and unbiased description of miRNA changes after SCI for the next generation of SCI biomarkers and therapeutic interventions. TRIAL REGISTRATION: The protocol for the systematic review and meta-analysis has been registered through PROSPERO: CRD42021222552.

View details for DOI 10.1186/s13643-022-01921-8

View details for PubMedID 35382886
Precision Care in Cardiac Arrest: ICECAP (PRECICECAP) Study Protocol and Informatics Approach. Neurocritical care Elmer, J., He, Z., May, T., Osborn, E., Moberg, R., Kemp, S., Stover, J., Moyer, E., Geocadin, R. G., Hirsch, K. G., PRECICECAP Study Team 2022

Abstract

BACKGROUND: Most trials in critical care have been neutral, in part because between-patient heterogeneity means not all patients respond identically to the same treatment. The Precision Care in Cardiac Arrest: Influence of Cooling duration on Efficacy in Cardiac Arrest Patients (PRECICECAP) study will apply machine learning to high-resolution, multimodality data collected from patients resuscitated from out-of-hospital cardiac arrest. We aim to discover novel biomarker signatures to predict the optimal duration of therapeutic hypothermia and 90-day functional outcomes. In parallel, we are developing a freely available software platform for standardized curation of intensive care unit-acquired data for machine learning applications. METHODS: The Influence of Cooling duration on Efficacy in Cardiac Arrest Patients (ICECAP) study is a response-adaptive, dose-finding trial testing different durations of therapeutic hypothermia. Twelve ICECAP sites will collect data for PRECICECAP from multiple modalities routinely used after out-of-hospital cardiac arrest, including ICECAP case report forms, detailed medication data, cardiopulmonary and electroencephalographic waveforms, and digital imaging and communications in medicine files (DICOMs). We partnered with Moberg Analytics to develop a freely available software platform to allow high-resolution critical care data to be used efficiently and effectively. We will use an autoencoder neural network to create low-dimensional representations of all raw waveforms and derivative features, censored at rewarming to ensure clinical usability to guide optimal duration of hypothermia. We will also consider simple features that are historically considered to be important. 
Finally, we will create a supervised deep learning neural network algorithm to directly predict 90-day functional outcome from large sets of novel features. RESULTS: PRECICECAP is currently enrolling and will be completed in late 2025. CONCLUSIONS: Cardiac arrest is a heterogeneous disease that causes substantial morbidity and mortality. PRECICECAP will advance the overarching goal of titrating personalized neurocritical care on the basis of robust measures of individual need and treatment responsiveness. The software platform we develop will be broadly applicable to hospital-based research after acute illness or injury.

View details for DOI 10.1007/s12028-022-01464-9

View details for PubMedID 35229231
Challenges at the APOE locus: a robust quality control approach for accurate APOE genotyping. Alzheimer's research & therapy Belloy, M. E., Eger, S. J., Le Guen, Y., Damotte, V., Ahmad, S., Ikram, M. A., Ramirez, A., Tsolaki, A. C., Rossi, G., Jansen, I. E., de Rojas, I., Parveen, K., Sleegers, K., Ingelsson, M., Hiltunen, M., Amin, N., Andreassen, O., Sánchez-Juan, P., Kehoe, P., Amouyel, P., Sims, R., Frikke-Schmidt, R., van der Flier, W. M., Lambert, J. C., He, Z., Han, S. S., Napolioni, V., Greicius, M. D. 2022; 14 (1): 22

Abstract

Genetic variants within the APOE locus may modulate Alzheimer's disease (AD) risk independently or in conjunction with APOE*2/3/4 genotypes. Identifying such variants and mechanisms would importantly advance our understanding of APOE pathophysiology and provide critical guidance for AD therapies aimed at APOE. The APOE locus however remains relatively poorly understood in AD, owing to multiple challenges that include its complex linkage structure and uncertainty in APOE*2/3/4 genotype quality. Here, we present a novel APOE*2/3/4 filtering approach and showcase its relevance on AD risk association analyses for the rs439401 variant, which is located 1801 base pairs downstream of APOE and has been associated with a potential regulatory effect on APOE. We used thirty-two AD-related cohorts, with genetic data from various high-density single-nucleotide polymorphism microarrays, whole-genome sequencing, and whole-exome sequencing. Study participants were filtered to be ages 60 and older, non-Hispanic, of European ancestry, and diagnosed as cognitively normal or AD (n = 65,701). Primary analyses investigated AD risk in APOE*4/4 carriers. Additional supporting analyses were performed in APOE*3/4 and 3/3 strata. Outcomes were compared under two different APOE*2/3/4 filtering approaches. Using more conventional APOE*2/3/4 filtering criteria (approach 1), we showed that, when in-phase with APOE*4, rs439401 was variably associated with protective effects on AD case-control status. However, when applying a novel filter that increases the certainty of the APOE*2/3/4 genotypes by applying more stringent criteria for concordance between the provided APOE genotype and imputed APOE genotype (approach 2), we observed that all significant effects were lost. We showed that careful consideration of APOE genotype and appropriate sample filtering were crucial to robustly interrogate the role of the APOE locus on AD risk. 
Our study presents a novel APOE filtering approach and provides important guidelines for research into the APOE locus, as well as for elucidating genetic interaction effects with APOE*2/3/4.

View details for DOI 10.1186/s13195-022-00962-4

View details for PubMedID 35120553
Sex-heterogenous effect on Alzheimer's disease risk at the BIN1 locus. Alzheimer's & dementia : the journal of the Alzheimer's Association Le Guen, Y., Eger, S. J., Belloy, M. E., Kennedy, G., He, Z., Napolioni, V., Greicius, M. D. 2021; 17 Suppl 3: e053616

Abstract

BACKGROUND: Among Alzheimer's Disease (AD) tier 1 genes, BIN1 shows the greatest sex-biased expression in GTEx RNASeq, notably in brain tissues. Fine-mapping studies suggest that the BIN1 locus harbors at least two independent risk variants. METHOD: We considered a region ±200kb around BIN1 and performed sex-stratified analyses to identify genome-wide significant variants with a sex-heterogenous effect in imputed data from the AD Genetics Consortium. We ran conditional analyses on rs6733839 to show that variants with sex-heterogenous effects were independent from the lead variant at this locus. Additionally, we performed sex- and rs6733839-genotype-stratified analyses to understand which haplotype drives this sex-heterogenous effect on AD risk and on BIN1 expression in brain tissue from the ROSMAP study. RESULT: Rs10200967 has a significant sex-heterogenous effect on AD risk and is genome-wide significant in females but not males (Table 1). In the conditional analysis the association remains significant (p_female = 6.5 × 10^-3, Table 2). The linkage disequilibrium between these two variants is low (r^2 = 0.12). The protective association of rs10200967 is strongest in females homozygous for the major allele of rs6733839 (p = 1.1 × 10^-3). Among individuals homozygous for the major allele of rs6733839, the effect of the interaction between rs10200967 dosage and sex on AD risk is significant (p = 3.2 × 10^-3). In the full sample, the three-way interaction between these two variants and sex is significant (p = 0.021, Table 3). The rs10200967 minor allele is associated with an increased expression in GTEx (p = 6.0 × 10^-15, Figure 1) and ROSMAP (p = 9.1 × 10^-3, Table 4). Among rs6733839 reference allele homozygotes, the rs10200967 interaction with sex on BIN1 expression is significant (p = 0.0495). In the full ROSMAP sample, the three-way interaction is trending significant (p = 0.062, Table 5). 
Interestingly, rs10200967 is located in a histone peak and a start-exon of a BIN1 transcript (Figure 1) reinforcing its putative regulatory role.CONCLUSION: Our sex- and rs6733839-genotype stratified analyses, demonstrate that rs10200967 at the BIN1 locus is genome-wide significant, with a sex-heterogenous effect on AD risk and on BIN1 expression. These results support the growing consensus that there are two separate signals at the locus and suggest that rs10200967 contributes to the signal independent of rs6733839.

View details for DOI 10.1002/alz.053616

View details for PubMedID 35108924
APOE*4-stratified genome-wide association study of Alzheimer's disease in over 350,000 individuals. Alzheimer's & dementia : the journal of the Alzheimer's Association Belloy, M. E., Eger, S. J., Guen, Y. L., Kennedy, G., He, Z., Napolioni, V., Greicius, M. D. 2021; 17 Suppl 3: e055905

Abstract

BACKGROUND: APOE*4 is the strongest genetic risk factor for late-onset Alzheimer's disease (AD) and is highly pleiotropic, such that it may be considered as a biological factor that can affect overall genetic risk for AD. To advance our understanding of the genetic architecture of AD, we sought to perform the largest APOE*4-stratified genome-wide association study (GWAS) of AD. METHOD: Twenty-five publicly available AD GWAS datasets provided case-control diagnoses for phase-1 samples (imputed to the HRC r1.1 reference panel). The UK Biobank provided subjects with family history of AD status, transformed into an AD phenotype as described previously (Jansen et al., 2019) for phase-2 samples. Linear mixed model regressions were performed on case-control status (LMM-BOLT v.2.3.4), adjusting for age (age-at-onset in cases; age-at-last-exam in controls), sex, APOE*4 and APOE*2 dosage, the first 12 genetic principal components, array/batch, cohort in phase-1, and assessment center in phase-2. In phase-3, phase-1 and phase-2 findings were combined using multivariate genome-wide meta-analysis (Jansen et al., 2019). APOE*4+ heterogeneity tests were evaluated per phase and meta-analyzed in phase-3. RESULT: Participant demographics are in Table 1. Combining results from both APOE*4-stratified analyses, 106 lead variants across 98 loci passed suggestive significance (p < 10⁻⁵; Figure 1). Although most variants reached only suggestive significance, 28 loci were previously reported at genome-wide significance in large-scale GWAS of AD (Kunkle et al., 2019, Jansen et al., 2019, Bellenguez et al., 2020), supporting that we identified potentially relevant AD loci. APOE*4-stratified effects were observed for 28 variants/loci covered across both phase-1 and phase-2 (N_APOE4+ = 19; N_APOE4- = 9; Table 2), and 25 variants/loci seen only in phase-2 (N_APOE4+ = 17; N_APOE4- = 8; Table 3). Notably, a genome-wide significant APOE*4+ heterogeneity effect was observed for the USP17L13 locus (a regulator of deubiquitination), while PPP1R12A, BRINP1, PCBD1, and SESN2 loci passed suggestive significance. CONCLUSION: Our findings revealed novel AD risk loci/genes and characterized which of these associated with AD risk differentially across APOE*4 status. This contributes highly to personalized genetic medicine and paves the way towards new potential AD drug targets. Ongoing work is adding samples for phase-1 analyses (imputing data to TOPMed) and pursuing both multi-omics and AD endophenotype validation efforts for variant prioritization.

View details for DOI 10.1002/alz.055905

View details for PubMedID 35108901
A Text-Based Intervention to Promote Literacy: An RCT. Pediatrics Chamberlain, L. J., Bruce, J., De La Cruz, M., Huffman, L., Steinberg, J. R., Bruguera, R., Peterson, J. W., Gardner, R. M., He, Z., Ordaz, Y., Connelly, E., Loeb, S. 2021

Abstract

BACKGROUND AND OBJECTIVES: Children entering kindergarten ready to learn are more likely to thrive. Inequitable access to high-quality, early educational settings creates early educational disparities. TipsByText, a text-message-based program for caregivers of young children, improves literacy of children in preschool, but efficacy for families without access to early childhood education was unknown. METHODS: We conducted a randomized controlled trial with caregivers of 3- and 4-year-olds in 2 public pediatric clinics. Intervention caregivers received TipsByText 3 times a week for 7 months. At pre- and postintervention, we measured child literacy using the Phonological Awareness Literacy Screening Tool (PALS-PreK) and caregiver involvement using the Parent Child Interactivity Scale (PCI). We estimated effects on PALS-PreK and PCI using multivariable linear regression. RESULTS: We enrolled 644 families, excluding 263 because of preschool participation. Compared with excluded children, those included in the study had parents with lower income and educational attainment and who were more likely to be Spanish speaking. Three-quarters of enrollees completed pre- and postintervention assessments. Postintervention PALS-PreK scores revealed an unadjusted treatment effect of 0.260 (P = .040); adjusting for preintervention score, child age, and caregiver language, treatment effect was 0.209 (P = .016), equating to 3 months of literacy gains. Effects were greater for firstborn children (0.282 vs 0.178), children in 2-parent families (0.262 vs 0.063), and 4-year-olds (0.436 vs 0.107). The overall effect on PCI was not significant (1.221, P = .124). CONCLUSIONS: The health sector has unique access to difficult-to-reach young children. With this clinic-based texting intervention, we reached underresourced families and increased child literacy levels.

View details for DOI 10.1542/peds.2020-049648

View details for PubMedID 34544847
Multitrait GWAS to connect disease variants and biological mechanisms. PLoS genetics Julienne, H., Laville, V., McCaw, Z. R., He, Z., Guillemot, V., Lasry, C., Ziyatdinov, A., Nerin, C., Vaysse, A., Lechat, P., Menager, H., Le Goff, W., Dube, M., Kraft, P., Ionita-Laza, I., Vilhjalmsson, B. J., Aschard, H. 2021; 17 (8): e1009713

Abstract

Genome-wide association studies (GWASs) have uncovered a wealth of associations between common variants and human phenotypes. Here, we present an integrative analysis of GWAS summary statistics from 36 phenotypes to decipher multitrait genetic architecture and its link with biological mechanisms. Our framework incorporates multitrait association mapping along with an investigation of the breakdown of genetic associations into clusters of variants harboring similar multitrait association profiles. Focusing on two subsets of immunity and metabolism phenotypes, we then demonstrate how genetic variants within clusters can be mapped to biological pathways and disease mechanisms. Finally, for the metabolism set, we investigate the link between gene cluster assignment and the success of drug targets in randomized controlled trials.

View details for DOI 10.1371/journal.pgen.1009713

View details for PubMedID 34460823
Do Steroids Matter? A Retrospective Review of Premedication for Taxane Chemotherapy and Hypersensitivity Reactions. Journal of clinical oncology : official journal of the American Society of Clinical Oncology Lansinger, O. M., Biedermann, S., He, Z., Colevas, A. D. 2021: JCO2101200

Abstract

PURPOSE: Despite the widespread use of the taxanes paclitaxel and docetaxel for a variety of cancers and their well-known association with hypersensitivity reactions (HSRs), there is still significant variation in the prescribing practices of steroids for premedication. Premedication almost always includes dexamethasone, which can be associated with multiple adverse effects if taken for extended periods of time. This study reviews the pattern of steroid premedication in patients who received paclitaxel or docetaxel at Stanford Cancer Institute between January 2010 and June 2020. METHODS: We used an electronic query of the electronic medical record followed up with a manual review of patient charts to ask whether we could find a correlation between steroid premedication dosing and the incidence or severity of HSRs with the first taxane dose. Variables considered included steroid dose and route, dose and type of taxane, clinical cancer group, sex, and race. RESULTS: Five thousand two hundred seventeen patients were identified as having received paclitaxel or docetaxel, and 3,181 met criteria for our analysis. There were 264 (8.3%) HSRs. In adjusted multivariate analysis, we found no correlation of HSR rate or severity among any of the variables evaluated except gynecology oncology clinic patients, who had an increased risk (hazard ratio [HR] 1.34) of HSRs overall and high-grade HSRs (HR 2.34), and female patients, who had a higher rate of HSRs overall (HR 1.26), but not high-grade HSRs. CONCLUSION: Neither dexamethasone dose nor route correlated with subsequent HSRs. Given the potential for adverse events from repeated high-dose steroids, our findings suggest that routine use of lower doses, such as a single 10 mg dose of dexamethasone, as premedication for taxanes to prevent HSRs is preferable to the current prescribing guidelines.

View details for DOI 10.1200/JCO.21.01200

View details for PubMedID 34357780
Advances and challenges in quantitative delineation of the genetic architecture of complex traits. Quantitative biology (Beijing, China) Tang, H., He, Z. 2021; 9 (2): 168-184

Abstract

Genome-wide association studies (GWAS) have been widely adopted in studies of human complex traits and diseases. This review surveys areas of active research: quantifying and partitioning trait heritability, fine mapping functional variants and integrative analysis, genetic risk prediction of phenotypes, and the analysis of sequencing studies that have identified millions of rare variants. Current challenges and opportunities are highlighted. GWAS have fundamentally transformed the field of human complex trait genetics. Novel statistical and computational methods have expanded the scope of GWAS and have provided valuable insights on the genetic architecture underlying complex phenotypes.

View details for DOI 10.15302/j-qb-021-0249

View details for PubMedID 35492964

View details for PubMedCentralID PMC9053444
Advances and challenges in quantitative delineation of the genetic architecture of complex traits QUANTITATIVE BIOLOGY Tang, H., He, Z. 2021; 9 (2): 168-184

View details for DOI 10.15302/J-QB-021-0249

View details for Web of Science ID 000687996800007
A novel age-informed approach for genetic association analysis in Alzheimer's disease. Alzheimer's research & therapy Le Guen, Y., Belloy, M. E., Napolioni, V., Eger, S. J., Kennedy, G., Tao, R., He, Z., Greicius, M. D., Alzheimers Disease Neuroimaging Initiative 2021; 13 (1): 72

Abstract

BACKGROUND: Many Alzheimer's disease (AD) genetic association studies disregard age or incorrectly account for it, hampering variant discovery. METHODS: Using simulated data, we compared the statistical power of several models: logistic regression on AD diagnosis adjusted and not adjusted for age; linear regression on a score integrating case-control status and age; and multivariate Cox regression on age-at-onset. We applied these models to real exome-wide data of 11,127 sequenced individuals (54% cases) and replicated suggestive associations in 21,631 genotype-imputed individuals (51% cases). RESULTS: Modeling variable AD risk across age results in 5-10% statistical power gain compared to logistic regression without age adjustment, while incorrect age adjustment leads to critical power loss. Applying our novel AD-age score and/or Cox regression, we discovered and replicated novel variants associated with AD on KIF21B, USH2A, RAB10, RIN3, and TAOK2 genes. CONCLUSION: Our AD-age score provides a simple means for statistical power gain and is recommended for future AD studies.

View details for DOI 10.1186/s13195-021-00808-5

View details for PubMedID 33794991
Administration of Dexamethasone for Bacterial Meningitis: An Unreliable Quality Measure. The Neurohospitalist Dujari, S., Gummidipundi, S., He, Z., Gold, C. A. 2021; 11 (2): 101-106

Abstract

To validate the use of administrative data to identify patients with bacterial meningitis and quantify the rate of dexamethasone administration as defined in the American Academy of Neurology Inpatient and Emergency Care Quality Measurement Set. The Vizient Clinical Data Base and Resource Manager was used to identify patients with International Classification of Diseases, Tenth Revision (ICD-10) codes for bacterial meningitis from October 2015 to June 2019. Chart review was performed on patients identified at a single quaternary-care hospital. The positive predictive value (PPV) of Vizient was determined. Demographic, clinical, and laboratory data were assessed using descriptive statistics. Of all hospitals that submitted complete data to Vizient during the study period, a median of 19 patients per hospital had ICD-10 codes for bacterial meningitis in the 45-month period. We identified 79 patients using Vizient at our institution of whom 69 had a diagnosis of bacterial meningitis confirmed by chart review (PPV = 87%). 15 patients were eligible to receive dexamethasone per the quality measurement set. Six of these patients (40%) received dexamethasone. It is feasible to use the Vizient Clinical Data Base and Resource Manager to identify patients with bacterial meningitis. Due to low prevalence across multiple institutions and high rate of exclusion criteria at our institution, this study suggests that the rate of dexamethasone administration in bacterial meningitis may be an unreliable indicator of quality of care provided by inpatient neurologists. The creation of a registry for hospitalized neurology patients could enhance development of future quality measures.
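The two proportions reported in this abstract follow directly from the stated counts. The sketch below simply reproduces that arithmetic; the function name is hypothetical and is not taken from the study.

```python
def proportion(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a fraction between 0 and 1."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Counts quoted in the abstract.
flagged_by_codes = 79      # patients identified through ICD-10 codes
confirmed_by_chart = 69    # of those, confirmed by chart review
eligible = 15              # eligible for dexamethasone per the measure
treated = 6                # eligible patients who received dexamethasone

# Positive predictive value: confirmed cases among all flagged cases.
ppv = proportion(confirmed_by_chart, flagged_by_codes)
dexamethasone_rate = proportion(treated, eligible)

print(f"PPV = {ppv:.1%}")
print(f"Dexamethasone rate = {dexamethasone_rate:.1%}")
```

The small denominator for the dexamethasone rate (15 eligible patients) is what the abstract points to when it calls the measure unreliable.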

View details for DOI 10.1177/1941874420969556

View details for PubMedID 33791051

View details for PubMedCentralID PMC7958681
KLVS heterozygosity reduces brain amyloid in asymptomatic at-risk APOE4 carriers. Neurobiology of aging Belloy, M. E., Eger, S. J., Le Guen, Y., Napolioni, V., Deters, K. D., Yang, H., Scelsi, M. A., Porter, T., James, S., Wong, A., Schott, J. M., Sperling, R. A., Laws, S. M., Mormino, E. C., He, Z., Han, S. S., Altmann, A., Greicius, M. D., A4 Study Team, Insight 46 Study Team, Australian Imaging Biomarkers and Lifestyle (AIBL) Study, Alzheimer's Disease Neuroimaging Initiative 2021; 101: 123–29

Abstract

KLOTHOVS heterozygosity (KLVSHET+) was recently shown to be associated with reduced risk of Alzheimer's disease (AD) in APOE4 carriers. Additional studies suggest that KLVSHET+ protects against amyloid burden in cognitively normal older subjects, but sample sizes were too small to draw definitive conclusions. We performed a well-powered meta-analysis across 5 independent studies, comprising 3581 pre-clinical participants ages 60-80, to investigate whether KLVSHET+ reduces the risk of having an amyloid-positive positron emission tomography scan. Analyses were stratified by APOE4 status. KLVSHET+ reduced the risk of amyloid positivity in APOE4 carriers (odds ratio = 0.67 [0.52-0.88]; p = 3.5 × 10⁻³), but not in APOE4 non-carriers (odds ratio = 0.94 [0.73-1.21]; p = 0.63). The combination of APOE4 and KLVS genotypes should help enrich AD clinical trials for pre-symptomatic subjects at increased risk of developing amyloid aggregation and AD. KL-related pathways may help elucidate protective mechanisms against amyloid accumulation and merit exploration for novel AD drug targets. Future investigation of the biological mechanisms by which KL interacts with APOE4 and AD is warranted.

View details for DOI 10.1016/j.neurobiolaging.2021.01.008

View details for PubMedID 33610961
Treatment Practices and Outcomes in Continuous Spike and Wave During Slow Wave Sleep (CSWS): A Multicenter Collaboration. The Journal of pediatrics Baumer, F. M., McNamara, N. A., Fine, A. L., Pestana-Knight, E., Shellhaas, R. A., He, Z., Arndt, D. H., Gaillard, W. D., Kelley, S. A., Nagan, M., Ostendorf, A. P., Singhal, N. S., Speltz, L., Chapman, K. E. 2021

Abstract

To determine how Continuous Spike and Wave during Slow Wave Sleep (CSWS) is currently managed and to compare the effectiveness of current treatment strategies using a database from 11 pediatric epilepsy centers in the United States. This retrospective study gathered information on baseline clinical characteristics, CSWS etiology, and treatment(s) in consecutive patients seen between 2014-2016 at 11 epilepsy referral centers. Treatments were categorized as benzodiazepines, steroids, other antiseizure medications (ASMs), or other therapies. Two measures of treatment response [clinical improvement as noted by the treating physician; and EEG improvement] were compared across therapies, controlling for baseline variables. 81 children underwent 153 treatment trials during the study period (68 trials of benzodiazepines, 25 of steroids, 45 of ASMs, 14 of other therapies). Children most frequently received benzodiazepines (62%) or ASMs (27%) as first line therapy. Treatment choice did not differ based on baseline clinical variables, nor did these variables correlate with outcome. After adjusting for baseline variables, children had a greater odds of clinical improvement with benzodiazepines (OR 3.32, 95% CI 1.57-7.04, P = .002) or steroids (OR 4.04, 95% CI 1.41-11.59, P = .01) than with ASMs and a greater odds of EEG improvement after steroids (OR 3.36, 95% CI 1.09-10.33, P = .03) than after ASMs. Benzodiazepines and ASMs are the most frequent initial therapy prescribed for CSWS in the United States. Our data suggest that ASMs are inferior to benzodiazepines and steroids and support earlier use of these therapies. Multicenter prospective studies that rigorously assess treatment protocols and outcomes are needed.

View details for DOI 10.1016/j.jpeds.2021.01.032

View details for PubMedID 33484700
Generalizable Sample-Efficient Siamese Autoencoder for Tinnitus Diagnosis in Listeners With Subjective Tinnitus IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING Liu, Z., Yao, L., Wang, X., Monaghan, J. J. M., Schaette, R., He, Z., McAlpine, D. 2021; 29: 1452-1461

Abstract

Electroencephalogram (EEG)-based neurofeedback has been widely studied for tinnitus therapy in recent years. Most existing research relies on experts' cognitive prediction, and studies based on machine learning and deep learning are either data-hungry or not well generalizable to new subjects. In this paper, we propose a robust, data-efficient model for distinguishing tinnitus from the healthy state based on EEG-based tinnitus neurofeedback. We propose trend descriptor, a feature extractor with lower fineness, to reduce the effect of electrode noises on EEG signals, and a siamese encoder-decoder network boosted in a supervised manner to learn accurate alignment and to acquire high-quality transferable mappings across subjects and EEG signal channels. Our experiments show the proposed method significantly outperforms state-of-the-art algorithms when analyzing subjects' EEG neurofeedback to 90dB and 100dB sound, achieving an accuracy of 91.67%-94.44% in predicting tinnitus and control subjects in a subject-independent setting. Our ablation studies on mixed subjects and parameters show the method's stability in performance.

View details for DOI 10.1109/TNSRE.2021.3095298

View details for Web of Science ID 000678331300009

View details for PubMedID 34232883
An evolutionarily acquired microRNA shapes development of mammalian cortical projections. Proceedings of the National Academy of Sciences of the United States of America Diaz, J. L., Siththanandan, V. B., Lu, V., Gonzalez-Nava, N., Pasquina, L., MacDonald, J. L., Woodworth, M. B., Ozkan, A., Nair, R., He, Z., Sahni, V., Sarnow, P., Palmer, T. D., Macklis, J. D., Tharin, S. 2020

Abstract

The corticospinal tract is unique to mammals and the corpus callosum is unique to placental mammals (eutherians). The emergence of these structures is thought to underpin the evolutionary acquisition of complex motor and cognitive skills. Corticospinal motor neurons (CSMN) and callosal projection neurons (CPN) are the archetypal projection neurons of the corticospinal tract and corpus callosum, respectively. Although a number of conserved transcriptional regulators of CSMN and CPN development have been identified in vertebrates, none are unique to mammals and most are coexpressed across multiple projection neuron subtypes. Here, we discover 17 CSMN-enriched microRNAs (miRNAs), 15 of which map to a single genomic cluster that is exclusive to eutherians. One of these, miR-409-3p, promotes CSMN subtype identity in part via repression of LMO4, a key transcriptional regulator of CPN development. In vivo, miR-409-3p is sufficient to convert deep-layer CPN into CSMN. This is a demonstration of an evolutionarily acquired miRNA in eutherians that refines cortical projection neuron subtype development. Our findings implicate miRNAs in the eutherians' increase in neuronal subtype and projection diversity, the anatomic underpinnings of their complex behavior.

View details for DOI 10.1073/pnas.2006700117

View details for PubMedID 33139574
Administration of Dexamethasone for Bacterial Meningitis: An Unreliable Quality Measure NEUROHOSPITALIST Dujari, S., Gummidipundi, S., He, Z., Gold, C. A. 2020

View details for DOI 10.1177/1941874420969556

View details for Web of Science ID 000600210200001
Benchmarking Performance on Administration of Dexamethasone for Bacterial Meningitis Dujari, S., Gummidipundi, S., He, Z., Gold, C. LIPPINCOTT WILLIAMS & WILKINS. 2020

View details for Web of Science ID 000536058003112
Interaction analysis under misspecification of main effects: Some common mistakes and simple solutions. Statistics in medicine Zhang, M., Yu, Y., Wang, S., Salvatore, M., G Fritsche, L., He, Z., Mukherjee, B. 2020

Abstract

The statistical practice of modeling interaction with two linear main effects and a product term is ubiquitous in the statistical and epidemiological literature. Most data modelers are aware that the misspecification of main effects can potentially cause severe type I error inflation in tests for interactions, leading to spurious detection of interactions. However, modeling practice has not changed. In this article, we focus on the specific situation where the main effects in the model are misspecified as linear terms and characterize its impact on common tests for statistical interaction. We then propose some simple alternatives that fix the issue of potential type I error inflation in testing interaction due to main effect misspecification. We show that when using the sandwich variance estimator for a linear regression model with a quantitative outcome and two independent factors, both the Wald and score tests asymptotically maintain the correct type I error rate. However, if the independence assumption does not hold or the outcome is binary, using the sandwich estimator does not fix the problem. We further demonstrate that flexibly modeling the main effect under a generalized additive model can largely reduce or often remove bias in the estimates and maintain the correct type I error rate for both quantitative and binary outcomes regardless of the independence assumption. We show, under the independence assumption and for a continuous outcome, overfitting and flexibly modeling the main effects does not lead to power loss asymptotically relative to a correctly specified main effect model. Our simulation study further demonstrates the empirical fact that using flexible models for the main effects does not result in a significant loss of power for testing interaction in general. Our results provide an improved understanding of the strengths and limitations for tests of interaction in the presence of main effect misspecification. Using data from a large biobank study "The Michigan Genomics Initiative", we present two examples of interaction analysis in support of our results.
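The type I error inflation described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the sample size, correlation, quadratic truth, and all function names are assumptions chosen for illustration. It fits a linear-main-effects model with a product term by ordinary least squares, using the usual model-based variance, to data that contain no interaction at all.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def interaction_z(x1, x2, y):
    """Wald z statistic for the product term in y ~ 1 + x1 + x2 + x1*x2."""
    design = [[1.0, a, b, a * b] for a, b in zip(x1, x2)]
    p, n = 4, len(y)
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(design, y)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # model-based variance
    return beta[3] / math.sqrt(sigma2 * inv[3][3])


def rejection_rate(quadratic_truth, n=200, reps=200, rho=0.5, seed=1):
    """Share of replicates rejecting 'no interaction' at the 5% level."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x1 = [rng.gauss(0, 1) for _ in range(n)]
        # x2 is correlated with x1, so the independence assumption fails.
        x2 = [rho * a + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for a in x1]
        if quadratic_truth:
            y = [a * a + rng.gauss(0, 1) for a in x1]  # nonlinear main effect
        else:
            y = [a + rng.gauss(0, 1) for a in x1]      # linear main effect
        if abs(interaction_z(x1, x2, y)) > 1.96:
            rejections += 1
    return rejections / reps


print("linear truth:   ", rejection_rate(False))
print("quadratic truth:", rejection_rate(True))
```

Under the linear truth the rejection rate should stay near the nominal 5%, whereas under the quadratic truth the product term absorbs the omitted curvature and the test rejects far more often, even though no interaction was simulated.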

View details for DOI 10.1002/sim.8505

View details for PubMedID 32101638
FUN-LDA: A Latent Dirichlet Allocation Model for Predicting Tissue-Specific Functional Effects of Noncoding Variation: Methods and Applications AMERICAN JOURNAL OF HUMAN GENETICS Backenroth, D., He, Z., Kiryluk, K., Boeva, V., Pethukova, L., Khurana, E., Christiano, A., Buxbaum, J. D., Ionita-Laza, I. 2018; 102 (5): 920–42

Abstract

We describe a method based on a latent Dirichlet allocation model for predicting functional effects of noncoding genetic variants in a cell-type- and/or tissue-specific way (FUN-LDA). Using this unsupervised approach, we predict tissue-specific functional effects for every position in the human genome in 127 different tissues and cell types. We demonstrate the usefulness of our predictions by using several validation experiments. Using eQTL data from several sources, including the GTEx project, Geuvadis project, and TwinsUK cohort, we show that eQTLs in specific tissues tend to be most enriched among the predicted functional variants in relevant tissues in Roadmap. We further show how these integrated functional scores can be used for (1) deriving the most likely cell or tissue type causally implicated for a complex trait by using summary statistics from genome-wide association studies and (2) estimating a tissue-based correlation matrix of various complex traits. We found large enrichment of heritability in functional components of relevant tissues for various complex traits, and FUN-LDA yielded higher enrichment estimates than existing methods. Finally, using experimentally validated functional variants from the literature and variants possibly implicated in disease by previous studies, we rigorously compare FUN-LDA with state-of-the-art functional annotation methods and show that FUN-LDA has better prediction accuracy and higher resolution than these methods. In particular, our results suggest that tissue- and cell-type-specific functional prediction methods tend to have substantially better prediction accuracy than organism-level prediction methods. Scores for each position in the human genome and for each ENCODE and Roadmap tissue are available online (see Web Resources).

View details for PubMedID 29727691

View details for PubMedCentralID PMC5986983
Detecting Rare Mutations with Heterogeneous Effects Using a Family-Based Genetic Random Field Method. Genetics Li, M., He, Z., Tong, X., Witte, J. S., Lu, Q. 2018; 210 (2): 463–76

Abstract

The genetic etiology of many complex diseases is highly heterogeneous. A complex disease can be caused by multiple mutations within the same gene or mutations in multiple genes at various genomic loci. Although these disease-susceptibility mutations can be collectively common in the population, they are often individually rare or even private to certain families. Family-based studies are powerful for detecting rare variants enriched in families, which is an important feature for sequencing studies due to the heterogeneous nature of rare variants. In addition, family designs can provide robust protection against population stratification. Nevertheless, statistical methods for analyzing family-based sequencing data are underdeveloped, especially those accounting for heterogeneous etiology of complex diseases. In this article, we introduce a random field framework for detecting gene-phenotype associations in family-based sequencing studies, referred to as family-based genetic random field (FGRF). Similar to existing family-based association tests, FGRF could utilize within-family and between-family information separately or jointly to test an association. We demonstrate that FGRF has comparable statistical power with existing methods when there is no genetic heterogeneity, but can improve statistical power when there is genetic heterogeneity across families. The proposed method also shares the same advantages with the conventional family-based association tests (e.g., being robust to population stratification). Finally, we applied the proposed method to sequencing data from the Minnesota Twin Family Study, and revealed several genes, including SAMD14, potentially associated with alcohol dependence.

View details for PubMedID 30104420

View details for PubMedCentralID PMC6216585
Rare-variant association tests in longitudinal studies, with an application to the Multi-Ethnic Study of Atherosclerosis (MESA) GENETIC EPIDEMIOLOGY He, Z., Lee, S., Zhang, M., Smith, J. A., Guo, X., Palmas, W., Kardia, S. L. R., Ionita-Laza, I., Mukherjee, B. 2017; 41 (8): 801–10

Abstract

Over the past few years, an increasing number of studies have identified rare variants that contribute to trait heritability. Due to the extreme rarity of some individual variants, gene-based association tests have been proposed to aggregate the genetic variants within a gene, pathway, or specific genomic region as opposed to a one-at-a-time single variant analysis. In addition, in longitudinal studies, statistical power to detect disease susceptibility rare variants can be improved through jointly testing repeatedly measured outcomes, which better describes the temporal development of the trait of interest. However, usual sandwich/model-based inference for sequencing studies with longitudinal outcomes and rare variants can produce deflated/inflated type I error rate without further corrections. In this paper, we develop a group of tests for rare-variant association based on outcomes with repeated measures. We propose new perturbation methods such that the type I error rate of the new tests is not only robust to misspecification of within-subject correlation, but also significantly improved for variants with extreme rarity in a study with small or moderate sample size. Through extensive simulation studies, we illustrate that substantially higher power can be achieved by utilizing longitudinal outcomes and our proposed finite sample adjustment. We illustrate our methods using data from the Multi-Ethnic Study of Atherosclerosis for exploring association of repeated measures of blood pressure with rare and common variants based on exome sequencing data on 6,361 individuals.

View details for PubMedID 29076270

View details for PubMedCentralID PMC5696115
Interaction between Social/Psychosocial Factors and Genetic Variants on Body Mass Index: A Gene-Environment Interaction Analysis in a Longitudinal Setting INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH Zhao, W., Ware, E. B., He, Z., Kardia, S. L. R., Faul, J. D., Smith, J. A. 2017; 14 (10)

Abstract

Obesity, which develops over time, is one of the leading causes of chronic diseases such as cardiovascular disease. However, hundreds of BMI (body mass index)-associated genetic loci identified through large-scale genome-wide association studies (GWAS) only explain about 2.7% of BMI variation. Most common human traits are believed to be influenced by both genetic and environmental factors. Past studies suggest a variety of environmental features that are associated with obesity, including socioeconomic status and psychosocial factors. This study combines both gene/regions and environmental factors to explore whether social/psychosocial factors (childhood and adult socioeconomic status, social support, anger, chronic burden, stressful life events, and depressive symptoms) modify the effect of sets of genetic variants on BMI in European American and African American participants in the Health and Retirement Study (HRS). In order to incorporate longitudinal phenotype data collected in the HRS and investigate entire sets of single nucleotide polymorphisms (SNPs) within gene/region simultaneously, we applied a novel set-based test for gene-environment interaction in longitudinal studies (LGEWIS). Childhood socioeconomic status (parental education) was found to modify the genetic effect in the gene/region around SNP rs9540493 on BMI in European Americans in the HRS. The most significant SNP (rs9540488) by childhood socioeconomic status interaction within the rs9540493 gene/region was suggestively replicated in the Multi-Ethnic Study of Atherosclerosis (MESA) (p = 0.07).

View details for PubMedID 28961216

Testing Allele Transmission of an SNP Set Using a Family-Based Generalized Genetic Random Field Method
GENETIC EPIDEMIOLOGY Li, M., Li, J., He, Z., Lu, Q., Witte, J. S., Macleod, S. L., Hobbs, C. A., Cleves, M. A., Natl Birth Defects Prevention Stud 2016; 40 (4): 341–51

Abstract

Family-based association studies are commonly used in genetic research because they can be robust to population stratification (PS). Recent advances in high-throughput genotyping technologies have produced a massive amount of genomic data in family-based studies. However, current family-based association tests are mainly focused on evaluating individual variants one at a time. In this article, we introduce a family-based generalized genetic random field (FB-GGRF) method to test the joint association between a set of autosomal SNPs (i.e., single-nucleotide polymorphisms) and disease phenotypes. The proposed method is a natural extension of a recently developed GGRF method for population-based case-control studies. It models offspring genotypes conditional on parental genotypes and is thus robust to PS. Through simulations, we show that under various disease scenarios the FB-GGRF has improved power over a commonly used family-based sequence kernel association test (FB-SKAT). Further, similar to GGRF, the proposed FB-GGRF method is asymptotically well-behaved and does not require empirical adjustment of the type I error rates. We illustrate the proposed method using a study of congenital heart defects with family trios from the National Birth Defects Prevention Study (NBDPS).

View details for PubMedID 27061818

View details for PubMedCentralID PMC5061344

Risk Prediction Modeling of Sequencing Data Using a Forward Random Field Method
SCIENTIFIC REPORTS Wen, Y., He, Z., Li, M., Lu, Q. 2016; 6: 21120

Abstract

With advances in high-throughput sequencing technology, it has become feasible to investigate the role of common and rare variants in disease risk prediction. While the new technology holds great promise to improve disease prediction, the massive amount of data and low frequency of rare variants pose great analytical challenges for risk prediction modeling. In this paper, we develop a forward random field method (FRF) for risk prediction modeling using sequencing data. In FRF, subjects' phenotypes are treated as stochastic realizations of a random field on a genetic space formed by subjects' genotypes, and an individual's phenotype can be predicted by adjacent subjects with similar genotypes. The FRF method allows for multiple similarity measures and candidate genes in the model, and adaptively chooses the optimal similarity measure and disease-associated genes to reflect the underlying disease model. It also avoids the specification of the threshold of rare variants and allows for different directions and magnitudes of genetic effects. Through simulations, we demonstrate that the FRF method attains accuracy higher than or comparable to that of commonly used support vector machine based methods under various disease models. We further illustrate the FRF method with an application to the sequencing data obtained from the Dallas Heart Study.

View details for PubMedID 26892725

Association between Stress Response Genes and Features of Diurnal Cortisol Curves in the Multi-Ethnic Study of Atherosclerosis: A New Multi-Phenotype Approach for Gene-Based Association Tests
PLOS ONE He, Z., Payne, E. K., Mukherjee, B., Lee, S., Smith, J. A., Ware, E. B., Sanchez, B. N., Seeman, T. E., Kardia, S. L. R., Roux, A. 2015; 10 (5): e0126637

Abstract

The hormone cortisol is likely to be a key mediator of the stress response that influences multiple physiologic systems that are involved in common chronic disease, including the cardiovascular system, the immune system, and metabolism. In this paper, a candidate gene approach was used to investigate genetic contributions to variability in multiple correlated features of the daily cortisol profile in a sample of European Americans, African Americans, and Hispanic Americans from the Multi-Ethnic Study of Atherosclerosis (MESA). We proposed and applied a new gene-level multiple-phenotype analysis and carried out a meta-analysis to combine the ethnicity-specific results. This new analysis, instead of a more routine single marker-single phenotype approach, identified a significant association between one gene (ADRB2) and cortisol features (meta-analysis p-value = 0.0025), which was not identified by three other commonly used existing analytic strategies: 1. Single marker association tests involving each single cortisol feature separately; 2. Single marker association tests jointly testing for multiple cortisol features; 3. Gene-level association tests separately carried out for each single cortisol feature. The analytic strategies presented consider different hypotheses regarding genotype-phenotype association and imply different costs of multiple testing. The proposed gene-level analysis integrating multiple cortisol features across multiple ethnic groups provides new insights into the gene-cortisol association.

View details for PubMedID 25993632

A Powerful Nonparametric Statistical Framework for Family-Based Association Analyses
GENETICS Li, M., He, Z., Schaid, D. J., Cleves, M. A., Nick, T. G., Lu, Q. 2015; 200 (1): 69–U140

Abstract

Family-based study design is commonly used in genetic research. It has many ideal features, including being robust to population stratification (PS). With the advance of high-throughput technologies and ever-decreasing genotyping cost, it has become common for family studies to examine a large number of variants for their associations with disease phenotypes. The yield from the analysis of these family-based genetic data can be enhanced by adopting computationally efficient and powerful statistical methods. We propose a general framework of a family-based U-statistic, referred to as family-U, for family-based association studies. Unlike existing parametric-based methods, the proposed method makes no assumption of the underlying disease models and can be applied to various phenotypes (e.g., binary and quantitative phenotypes) and pedigree structures (e.g., nuclear families and extended pedigrees). By using only within-family information, it can offer robust protection against PS. In the absence of PS, it can also utilize additional information (i.e., between-family information) for power improvement. Through simulations, we demonstrated that family-U attained higher power than a commonly used method, family-based association tests, under various disease scenarios. We further illustrated the new method with an application to large-scale family data from the Framingham Heart Study. By utilizing additional information (i.e., between-family information), family-U confirmed a previous association of CHRNA5 with nicotine dependence.

View details for PubMedID 25745024

View details for PubMedCentralID PMC4423382

A Weighted U-Statistic for Genetic Association Analyses of Sequencing Data
GENETIC EPIDEMIOLOGY Wei, C., Li, M., He, Z., Vsevolozhskaya, O., Schaid, D. J., Lu, Q. 2014; 38 (8): 699–708

Abstract

With advancements in next-generation sequencing technology, a massive amount of sequencing data is generated, which offers a great opportunity to comprehensively investigate the role of rare variants in the genetic etiology of complex diseases. Nevertheless, the high-dimensional sequencing data poses a great challenge for statistical analysis. The association analyses based on traditional statistical methods suffer substantial power loss because of the low frequency of genetic variants and the extremely high dimensionality of the data. We developed a Weighted U Sequencing test, referred to as WU-SEQ, for the high-dimensional association analysis of sequencing data. Based on a nonparametric U-statistic, WU-SEQ makes no assumption of the underlying disease model and phenotype distribution, and can be applied to a variety of phenotypes. Through simulation studies and an empirical study, we showed that WU-SEQ outperformed a commonly used sequence kernel association test (SKAT) method when the underlying assumptions were violated (e.g., the phenotype followed a heavy-tailed distribution). Even when the assumptions were satisfied, WU-SEQ still attained comparable performance to SKAT. Finally, we applied WU-SEQ to sequencing data from the Dallas Heart Study (DHS), and detected an association between ANGPTL4 and very low density lipoprotein cholesterol.

View details for DOI 10.1002/gepi.21864

View details for Web of Science ID 000345292600005

View details for PubMedID 25331574

View details for PubMedCentralID PMC4236269

A Generalized Genetic Random Field Method for the Genetic Association Analysis of Sequencing Data
GENETIC EPIDEMIOLOGY Li, M., He, Z., Zhang, M., Zhan, X., Wei, C., Elston, R. C., Lu, Q. 2014; 38 (3): 242–53

Abstract

With the advance of high-throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high-dimensional sequencing data remains a challenge. Advanced analytical methods are greatly needed to facilitate high-dimensional sequencing data analyses. In this article, we propose a generalized genetic random field (GGRF) method for association analyses of sequencing data. Like other similarity-based methods (e.g., SIMreg and SKAT), the new method has the advantages of avoiding the need to specify thresholds for rare variants and allowing for testing multiple variants with different directions and magnitudes of effect. The method is built on the generalized estimating equation framework and thus accommodates a variety of disease phenotypes (e.g., quantitative and binary phenotypes). Moreover, it has a nice asymptotic property, and can be applied to small-scale sequencing data without the need for small-sample adjustment. Through simulations, we demonstrate that the proposed GGRF attains improved or comparable power relative to a commonly used method, SKAT, under various disease scenarios, especially when rare variants play a significant role in disease etiology. We further illustrate GGRF with an application to a real dataset from the Dallas Heart Study. By using GGRF, we were able to detect the association of two candidate genes, ANGPTL3 and ANGPTL4, with serum triglyceride.

View details for DOI 10.1002/gepi.21790

View details for Web of Science ID 000332700300007

View details for PubMedID 24482034

View details for PubMedCentralID PMC5241166
