

All Publications


  • Resolving the 22q11.2 deletion using CTLR-Seq reveals chromosomal rearrangement mechanisms and individual variance in breakpoints. Proceedings of the National Academy of Sciences of the United States of America Zhou, B., Purmann, C., Guo, H., Shin, G., Huang, Y., Pattni, R., Meng, Q., Greer, S. U., Roychowdhury, T., Wood, R. N., Ho, M., Dohna, H. Z., Abyzov, A., Hallmayer, J. F., Wong, W. H., Ji, H. P., Urban, A. E. 2024; 121 (31): e2322834121

    Abstract

    We developed a generally applicable method, CRISPR/Cas9-targeted long-read sequencing (CTLR-Seq), to resolve, haplotype-specifically, the large and complex regions of the human genome that had previously been impenetrable to sequencing analysis, such as large segmental duplications (SegDups) and their associated genome rearrangements. CTLR-Seq combines in vitro Cas9-mediated cutting of the genome with pulsed-field gel electrophoresis to isolate intact large (up to 2,000 kb) genomic regions that encompass previously unresolvable genomic sequences. These targets are then sequenced, amplification-free, at high on-target coverage using long-read sequencing, allowing their complete sequence assembly. We applied CTLR-Seq to the SegDup-mediated rearrangements that constitute the boundaries of, and give rise to, the 22q11.2 Deletion Syndrome (22q11DS), the most common human microdeletion disorder. We then performed de novo assembly to resolve, at base-pair resolution, the full sequence rearrangements and exact chromosomal breakpoints of 22q11DS (including all common subtypes). Across multiple patients, we found a high degree of variability in both the rearranged SegDup sequences and the exact chromosomal breakpoint locations, which coincide with various transposons within the 22q11.2 SegDups, suggesting that 22q11DS can be driven by transposon-mediated genome recombination. Guided by CTLR-Seq results from two 22q11DS patients, we performed three-dimensional chromosomal folding analysis of the 22q11.2 SegDups in patient-derived neurons and astrocytes and found chromosome interactions anchored within the SegDups to be both cell-type-specific and patient-specific. Lastly, we demonstrated that CTLR-Seq enables cell-type-specific analysis of DNA methylation patterns within the deletion haplotype of 22q11DS.

    View details for DOI 10.1073/pnas.2322834121

    View details for PubMedID 39042694

  • Prioritizing disease-related rare variants by integrating gene expression data. Research Square Guo, H., Urban, A. E., Wong, W. H. 2024

    Abstract

    Rare variants, which make up the vast majority of human genetic variation, are likely to have a more deleterious impact on human disease than common variants. Here we present the carrier statistic, a statistical framework that prioritizes disease-related rare variants by integrating gene expression data. By quantifying the impact of rare variants on gene expression, the carrier statistic can prioritize rare variants with large functional consequences in diseased patients. Through simulation studies and analysis of a real multi-omics dataset, we demonstrated that the carrier statistic is applicable in studies with limited sample sizes (a few hundred samples) and achieves substantially higher sensitivity than existing rare-variant association methods. Application to Alzheimer's disease revealed 16 rare variants within 15 genes with extreme carrier statistics. We also found a strong excess of rare variants among the top prioritized genes in diseased patients compared with healthy individuals. The carrier statistic method can be applied to various types of rare variants and is adaptable to other omics data modalities, offering a powerful tool for investigating the molecular mechanisms underlying complex diseases.

    View details for DOI 10.21203/rs.3.rs-4355589/v1

    View details for PubMedID 38766095

    View details for PubMedCentralID PMC11100897

  • Prioritizing disease-related rare variants by integrating gene expression data. bioRxiv: the preprint server for biology Guo, H., Urban, A. E., Wong, W. H. 2024

    Abstract

    Rare variants, which make up the vast majority of human genetic variation, are likely to have a more deleterious impact on human disease than common variants. Here we present the carrier statistic, a statistical framework that prioritizes disease-related rare variants by integrating gene expression data. By quantifying the impact of rare variants on gene expression, the carrier statistic can prioritize rare variants with large functional consequences in diseased patients. Through simulation studies and analysis of a real multi-omics dataset, we demonstrated that the carrier statistic is applicable in studies with limited sample sizes (a few hundred samples) and achieves substantially higher sensitivity than existing rare-variant association methods. Application to Alzheimer's disease revealed 16 rare variants within 15 genes with extreme carrier statistics. The carrier statistic method can be applied to various types of rare variants and is adaptable to other omics data modalities, offering a powerful tool for investigating the molecular mechanisms underlying complex diseases.

    View details for DOI 10.1101/2024.03.19.585836

    View details for PubMedID 38562756

  • Quantifying portable genetic effects and improving cross-ancestry genetic prediction with GWAS summary statistics. Nature Communications Miao, J., Guo, H., Song, G., Zhao, Z., Hou, L., Lu, Q. 2023; 14 (1): 832

    Abstract

    Polygenic risk scores (PRS) calculated from genome-wide association studies (GWAS) of Europeans are known to have substantially reduced predictive accuracy in non-European populations, limiting their clinical utility and raising concerns about health disparities across ancestral populations. Here, we introduce a statistical framework named X-Wing to improve predictive performance in ancestrally diverse populations. X-Wing quantifies local genetic correlations for complex traits between populations, employs an annotation-dependent estimation procedure to amplify correlated genetic effects between populations, and combines multiple population-specific PRS into a unified score with GWAS summary statistics alone as input. Through extensive benchmarking, we demonstrate that X-Wing pinpoints portable genetic effects and substantially improves PRS performance in non-European populations, showing a 14.1% to 119.1% relative gain in predictive R² compared with state-of-the-art methods based on GWAS summary statistics. Overall, X-Wing addresses critical limitations in existing approaches and may have broad applications in cross-population polygenic risk prediction.

    View details for DOI 10.1038/s41467-023-36544-7

    View details for PubMedID 36788230

  • Quantifying concordant genetic effects of de novo mutations on multiple disorders. eLife Guo, H., Hou, L., Shi, Y., Jin, S., Zeng, X., Li, B., Lifton, R. P., Brueckner, M., Zhao, H., Lu, Q. 2022; 11

    Abstract

    Exome sequencing of tens of thousands of parent-proband trios has identified numerous deleterious de novo mutations (DNMs) and implicated risk genes for many disorders. Recent studies have suggested that shared genes and pathways are enriched for DNMs across multiple disorders. However, existing analytic strategies focus only on genes that reach statistical significance for multiple disorders and require large trio samples in each study. As a result, these methods cannot characterize the full landscape of genetic sharing, owing to polygenicity and incomplete penetrance. In this work, we introduce EncoreDNM, a novel statistical framework to quantify shared genetic effects between two disorders characterized by concordant enrichment of DNMs in the exome. EncoreDNM makes use of exome-wide, summary-level DNM data, including genes that do not reach statistical significance in single-disorder analysis, to evaluate the overall and annotation-partitioned genetic sharing between two disorders. Applying EncoreDNM to DNM data from nine disorders, we identified abundant pairwise enrichment correlations, especially in genes intolerant to pathogenic mutations and genes highly expressed in fetal tissues. These results suggest that EncoreDNM improves on current analytic approaches and may have broad applications in DNM studies.

    View details for DOI 10.7554/eLife.75551

    View details for Web of Science ID 000867699200001

    View details for PubMedID 35666111

    View details for PubMedCentralID PMC9217133

  • Minimal sigma-field for flexible sufficient dimension reduction. Electronic Journal of Statistics Guo, H., Hou, L., Zhu, Y. 2022; 16 (1): 1997-2032

    View details for DOI 10.1214/22-EJS1999

    View details for Web of Science ID 000825293500038

  • Detecting local genetic correlations with scan statistics. Nature Communications Guo, H., Li, J. J., Lu, Q., Hou, L. 2021; 12 (1): 2033

    Abstract

    Genetic correlation analysis has quickly gained popularity in the past few years and has provided insights into the genetic etiology of numerous complex diseases. However, existing approaches oversimplify the shared genetic architecture between different phenotypes and cannot effectively identify the precise genetic regions contributing to the genetic correlation. In this work, we introduce LOGODetect, a powerful and efficient statistical method to identify small genome segments harboring local genetic correlation signals. LOGODetect automatically identifies genetic regions showing consistent associations with multiple phenotypes through a scan statistic approach. It uses summary association statistics from genome-wide association studies (GWAS) as input and is robust to sample overlap between studies. Applying LOGODetect to seven phenotypically distinct but genetically correlated neuropsychiatric traits, we identified 227 non-overlapping genome regions associated with multiple traits, including multiple hub regions showing concordant effects on five or more traits. Our method addresses critical limitations in existing analytic strategies and may have wide applications in post-GWAS analysis.

    View details for DOI 10.1038/s41467-021-22334-6

    View details for Web of Science ID 000636772600020

    View details for PubMedID 33795679

    View details for PubMedCentralID PMC8016883
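The scan-statistic idea described for LOGODetect (searching over contiguous genome segments for regions where two traits show consistent associations) can be illustrated with a simplified sketch. This is not LOGODetect's actual statistic. It assumes two lists of per-SNP GWAS z-scores already aligned to the same SNPs and effect alleles, ignores linkage disequilibrium and sample overlap (which the published method is described as handling), and computes no significance threshold. The function name scan_local_concordance and its min_len/max_len parameters are hypothetical.

```python
import math
from itertools import accumulate


def scan_local_concordance(z1, z2, min_len=1, max_len=None):
    """Find the contiguous SNP window with the highest length-normalized
    sum of per-SNP z-score products (a simplified scan statistic).

    Hypothetical illustration only: assumes z1 and z2 are aligned per-SNP
    GWAS z-scores for two traits; ignores LD and sample overlap.
    Returns (start, end, score) with end exclusive.
    """
    if len(z1) != len(z2):
        raise ValueError("z1 and z2 must have the same length")
    n = len(z1)
    max_len = n if max_len is None else min(max_len, n)
    if n == 0 or min_len < 1 or min_len > max_len:
        raise ValueError("need 1 <= min_len <= max_len <= number of SNPs")

    # Per-SNP concordance: positive when both traits' effects point the same way.
    products = [a * b for a, b in zip(z1, z2)]
    # prefix[k] = sum(products[:k]), so each window sum is one subtraction.
    prefix = [0.0, *accumulate(products)]

    best = None
    for start in range(n):
        for end in range(start + min_len, min(start + max_len, n) + 1):
            length = end - start
            # Scale by sqrt(length) so longer windows are not favored automatically.
            score = (prefix[end] - prefix[start]) / math.sqrt(length)
            if best is None or score > best[2]:
                best = (start, end, score)
    return best


if __name__ == "__main__":
    # Toy z-scores for two traits over six SNPs (made-up illustration data).
    z_trait1 = [0.1, 2.0, 2.5, 1.8, -0.3, 0.2]
    z_trait2 = [-0.2, 1.9, 2.2, 2.1, 0.4, -0.1]
    print(scan_local_concordance(z_trait1, z_trait2))
```

For the toy inputs, the highest-scoring window is SNPs 1 to 3 (start=1, end=4). Dividing each window sum by the square root of its length is a design choice: if the two z-score vectors were independent standard normal with no LD, each product would have mean 0 and variance 1, so the normalized score stays on a comparable scale across window lengths rather than always rewarding longer windows.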