Wing H Wong, Postdoctoral Faculty Sponsor
Heritability enrichment in context-specific regulatory networks improves phenotype-relevant tissue identification.
Systems genetics holds the promise to decipher complex traits by interpreting their associated SNPs through gene regulatory networks derived from comprehensive multi-omics data of cell types, tissues, and organs. Here, we propose SpecVar to integrate paired chromatin accessibility and gene expression data into a context-specific regulatory network atlas and regulatory categories, conduct heritability enrichment analysis with GWAS summary statistics, identify relevant tissues, and depict common genetic factors acting in the shared regulatory networks between traits by relevance correlation. Our method improves power over existing approaches by associating SNPs with context-specific regulatory elements to assess heritability enrichments and by explicitly prioritizing gene regulations underlying relevant tissues. Ablation studies, independent data validation, and comparison experiments with existing methods on GWAS of six phenotypes show that SpecVar can improve heritability enrichment, accurately detect relevant tissues, and reveal causal regulations. Furthermore, SpecVar correlates the relevance patterns for pairs of phenotypes and better reveals shared SNP-associated regulations of phenotypes than existing methods. Studying GWAS of 206 phenotypes in UK-Biobank demonstrates that SpecVar leverages the context-specific regulatory network atlas to prioritize phenotypes' relevant tissues and shared heritability for biological and therapeutic insights. SpecVar provides a powerful way to interpret SNPs via context-specific regulatory networks and is available at https://github.com/AMSSwanglab/SpecVar.
View details for DOI 10.7554/eLife.82535
View details for PubMedID 36525361
Human Genetic Variants Associated with COVID-19 Severity are Enriched in Immune and Epithelium Regulatory Networks.
Phenomics (Cham, Switzerland)
2022; 2 (6): 389-403
Human genetic variants can influence the severity of symptoms in individuals infected with SARS-CoV-2. Several genome-wide association studies have identified human genomic risk single nucleotide polymorphisms (SNPs) associated with coronavirus disease 2019 (COVID-19) severity. However, the causal tissues or cell types underlying COVID-19 severity are uncertain. In addition, candidate genes associated with these risk SNPs were investigated based on genomic proximity instead of their functional cellular contexts. Here, we compiled regulatory networks of 77 human contexts and revealed those risk SNPs' enriched cellular contexts and associated risk SNPs with transcription factors, regulatory elements, and target genes. Twenty-one human contexts were identified and grouped into two categories: immune cells and epithelium cells. We further aggregated the regulatory networks of immune cells and epithelium cells. These two aggregated regulatory networks were investigated to reveal their association with risk SNPs' regulation. Two genomic clusters, the chemokine receptors cluster and the oligoadenylate synthetase (OAS) cluster, showed the strongest association with COVID-19 severity, and they had different regulatory programs in immune and epithelium contexts. Our findings were supported by analysis of both SNP array and whole genome sequencing-based genome-wide association study (GWAS) summary statistics. The online version contains supplementary material available at 10.1007/s43657-022-00066-x.
View details for DOI 10.1007/s43657-022-00066-x
View details for PubMedID 35990388
View details for PubMedCentralID PMC9375061
Comparison of chromatin accessibility landscapes during early development of prefrontal cortex between rhesus macaque and human.
2022; 13 (1): 3883
Epigenetic information regulates gene expression and development. However, our understanding of the evolution of epigenetic regulation on brain development in primates is limited. Here, we compared chromatin accessibility landscapes and transcriptomes during fetal prefrontal cortex (PFC) development between rhesus macaques and humans. A total of 304,761 divergent DNase I-hypersensitive sites (DHSs) are identified between rhesus macaques and humans, although many of these sites share conserved DNA sequences. Interestingly, most of the cis-elements linked to orthologous genes with dynamic expression are divergent DHSs. Orthologous genes expressed at earlier stages tend to have conserved cis-elements, whereas orthologous genes specifically expressed at later stages seldom have conserved cis-elements. These genes are enriched in synapse organization, learning and memory. Notably, DHSs in the PFC at early stages are linked to human educational attainment and cognitive performance. Collectively, the comparison of the chromatin epigenetic landscape between rhesus macaques and humans suggests a potential role for regulatory elements in the evolution of differences in cognitive ability between non-human primates and humans.
View details for DOI 10.1038/s41467-022-31403-3
View details for PubMedID 35794099
View details for PubMedCentralID PMC9259620
hReg-CNCC reconstructs a regulatory network in human cranial neural crest cells and annotates variants in a developmental context.
2021; 4 (1): 442
Cranial Neural Crest Cells (CNCC) originate at the cephalic region from forebrain, midbrain and hindbrain, migrate into the developing craniofacial region, and subsequently differentiate into multiple cell types. The entire specification, delamination, migration, and differentiation process is highly regulated and abnormalities during this craniofacial development cause birth defects. To better understand the molecular networks underlying CNCC, we integrate paired gene expression & chromatin accessibility data and reconstruct the genome-wide human Regulatory network of CNCC (hReg-CNCC). Consensus optimization predicts high-quality regulations and reveals the architecture of upstream, core, and downstream transcription factors that are associated with functions of neural plate border, specification, and migration. hReg-CNCC allows us to annotate genetic variants of human facial GWAS and disease traits with associated cis-regulatory modules, transcription factors, and target genes. For example, we reveal the distal and combinatorial regulation of multiple SNPs to core TF ALX1 and associations to facial distances and cranial rare disease. In addition, hReg-CNCC connects the DNA sequence differences in evolution, such as ultra-conserved elements and human accelerated regions, with gene expression and phenotype. hReg-CNCC provides a valuable resource to interpret genetic variants as early as gastrulation during embryonic development. The network resources are available at https://github.com/AMSSwanglab/hReg-CNCC .
View details for DOI 10.1038/s42003-021-01970-0
View details for PubMedID 33824393
scTIM: seeking cell-type-indicative marker from single cell RNA-seq data by consensus optimization.
Bioinformatics (Oxford, England)
2020; 36 (8): 2474-2485
Single cell RNA-seq data offers a new resource and resolution to study cell type identity and its conversion. However, data analyses are challenging in dealing with noise, sparsity and poor annotation at single cell resolution. Detecting cell-type-indicative markers is promising to help with denoising, clustering and cell type annotation. We developed a new method, scTIM, to reveal cell-type-indicative markers. scTIM is based on a multi-objective optimization framework to simultaneously maximize gene specificity by considering gene-cell relationship, maximize gene's ability to reconstruct cell-cell relationship and minimize gene redundancy by considering gene-gene relationship. Furthermore, consensus optimization is introduced for a robust solution. Experimental results on three diverse single cell RNA-seq datasets show scTIM's advantages in identifying cell types (clustering), annotating cell types and reconstructing cell development trajectory. Applying scTIM to the large-scale mouse cell atlas data identifies critical markers for 15 tissues as 'mouse cell marker atlas', which allows us to investigate identities of different tissues and subtle cell types within a tissue. scTIM will serve as a useful method for single cell RNA-seq data mining. scTIM is freely available at https://github.com/Frank-Orwell/scTIM. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btz936
View details for PubMedID 31845960
ELF: Extract Landmark Features By Optimizing Topology Maintenance, Redundancy, and Specificity
IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS
2020; 17 (2): 411-421
Feature selection is the process of selecting a subset of landmark features for model construction when there are many features and comparatively few samples. The far-reaching development of technologies such as biological sequencing at single cell level makes feature selection more challenging. The difficulty lies in four facts: the measured features are high-dimensional and noisy; dropouts make the data very sparse; many features are either redundant or irrelevant; and samples are not well-labeled in the experiments. Here, we propose a new model called ELF (Extract Landmark Features) to address the above challenges. ELF aims to simultaneously maximize topology maintenance to keep the pairwise relationships among samples, minimize feature redundancy to diversify the features, and maximize feature specificity to make every selected feature more representative. This makes ELF a nonlinear combinatorial optimization. To solve this difficult problem, we propose a heuristic algorithm based on a greedy strategy. We show ELF's outstanding performance on two single cell RNA-seq datasets. One is the direct reprogramming from mouse embryonic fibroblasts to induced neuron and the other is hepatoblast differentiation. ELF is able to choose only hundreds of landmark genes to maintain the cells' correlativity. Topology maintenance, redundancy removal, and specificity each play an important role in selecting landmark features and revealing cells' biological functions. In addition, ELF can be generally applied in other scenarios. We demonstrate that ELF can reveal pivotal pixels in writing regions and human faces in two public image datasets. We believe that ELF is a useful tool to obtain more interpretable results by revealing key features while clustering the samples well.
View details for DOI 10.1109/TCBB.2018.2846225
View details for Web of Science ID 000524236800005
View details for PubMedID 29994260
Combining genome-wide association studies highlight novel loci involved in human facial variation
2022; 13 (1): 7832
Standard genome-wide association studies (GWASs) rely on analyzing a single trait at a time. However, many human phenotypes are complex and composed of multiple correlated traits. Here we introduce C-GWAS, a method for combining GWAS summary statistics of multiple potentially correlated traits. Extensive computer simulations demonstrated increased statistical power of C-GWAS compared to the minimal p-values of multiple single-trait GWASs (MinGWAS) and the current state-of-the-art method for combining single-trait GWASs (MTAG). Applying C-GWAS to a meta-analysis dataset of 78 single-trait facial GWASs from 10,115 Europeans identified 56 study-wide suggestively significant loci with multi-trait effects on facial morphology, of which 17 are novel loci. Using data from an additional 13,622 European and Asian samples, 46 (82%) loci, including 9 (53%) novel loci, were replicated at nominal significance with consistent allele effects. Functional analyses further strengthen the reliability of our C-GWAS findings. Our study introduces the C-GWAS method and makes it available as a computationally efficient open-source R package for widespread future use. Our work also provides insights into the genetic architecture of human facial appearance.
View details for DOI 10.1038/s41467-022-35328-9
View details for Web of Science ID 000938614300001
View details for PubMedID 36539420
View details for PubMedCentralID PMC9767941
OpenAnnotate: a web server to annotate the chromatin accessibility of genomic regions.
Nucleic acids research
2021; 49 (W1): W483-W490
Chromatin accessibility, as a powerful marker of active DNA regulatory elements, provides valuable information for understanding regulatory mechanisms. The revolution in high-throughput methods has accumulated massive chromatin accessibility profiles in public repositories. Nevertheless, utilization of these data is hampered by cumbersome collection, time-consuming processing, and manual chromatin accessibility (openness) annotation of genomic regions. To fill this gap, we developed OpenAnnotate (http://health.tsinghua.edu.cn/openannotate/) as the first web server for efficiently annotating openness of massive genomic regions across various biosample types, tissues, and biological systems. In addition to the annotation resource from 2729 comprehensive profiles of 614 biosample types of human and mouse, OpenAnnotate provides user-friendly functionalities, ultra-efficient calculation, real-time browsing, intuitive visualization, and elaborate application notebooks. We show its unique advantages compared to existing databases and toolkits by effectively revealing cell type-specificity, identifying regulatory elements and 3D chromatin contacts, deciphering gene functional relationships, inferring functions of transcription factors, and unprecedentedly promoting single-cell data analyses. We anticipate OpenAnnotate will provide a promising avenue for researchers to construct a more holistic perspective to understand regulatory mechanisms.
View details for DOI 10.1093/nar/gkab337
View details for PubMedID 33999180
View details for PubMedCentralID PMC8262705
Chromatin accessibility landscape and regulatory network of high-altitude hypoxia adaptation.
2020; 11 (1): 4928
High-altitude adaptation of Tibetans represents a remarkable case of natural selection during recent human evolution. Previous genome-wide scans found many non-coding variants under selection, suggesting a pressing need to understand the functional role of non-coding regulatory elements (REs). Here, we generate time courses of paired ATAC-seq and RNA-seq data on cultured HUVECs under hypoxic and normoxic conditions. We further develop a variant interpretation methodology (vPECA) to identify active selected REs (ASREs) and associated regulatory network. We discover three causal SNPs of EPAS1, the key adaptive gene for Tibetans. These SNPs decrease the accessibility of ASREs with weakened binding strength of relevant TFs, and cooperatively down-regulate EPAS1 expression. We further construct the downstream network of EPAS1, elucidating its roles in hypoxic response and angiogenesis. Collectively, we provide a systematic approach to interpret phenotype-associated noncoding variants in proper cell types and relevant dynamic conditions, to model their impact on gene regulation.
View details for DOI 10.1038/s41467-020-18638-8
View details for PubMedID 33004791