Jonathan Pritchard
Bing Professor of Population Studies, Professor of Genetics and Biology
Bio
Jonathan Pritchard is the Bing Professor of Population Studies, in the departments of Genetics and Biology at Stanford University. He grew up in England, and studied at Penn State, Stanford, and Oxford. He joined the faculty of the University of Chicago in 2001 and returned to Stanford University in 2013. His lab has done wide-ranging research on using genetics to study human population structure, history, and adaptation, and on understanding the mechanisms by which genetic variation affects gene regulation and complex traits. One of his early contributions was the Structure algorithm for using genetic data to infer population structure and personal ancestry, which has been widely influential and cited more than 40,000 times. More recently, his lab has made essential contributions on modeling and interpreting the genetic basis of complex traits and diseases. A current focus is on how to use experimental perturbation methods to model human gene regulatory networks for understanding complex traits, with a particular focus on the immune system. He has been recognized with various honors, including election to the National Academy of Sciences and the American Academy of Arts and Sciences.
Administrative Appointments
-
Co-Director, Stanford's Center for Computational, Evolutionary and Human Genomics (2017 - Present)
-
Investigator, Howard Hughes Medical Institute (2008 - 2019)
-
Professor, University of Chicago (Human Genetics) (2006 - 2013)
-
Assistant Professor, University of Chicago (Human Genetics) (2001 - 2005)
-
Postdoc, University of Oxford (Statistics) (1998 - 2001)
Honors & Awards
-
Fellow, National Academy of Sciences (2025)
-
Appointed Bing Professor of Population Studies, Stanford University (2020)
-
Fabio Frassetto International Prize in Physical Anthropology, Lincean Academy of Italy (2019)
-
Kistler Prize in Population Genetics and Society, Stanford University (2015)
-
Fellow, American Academy of Arts and Sciences (2013)
-
Novitski Prize, Genetics Society of America (2013)
-
Investigator, Howard Hughes Medical Institute (2008-2019)
-
Packard Fellow, Packard Foundation (2004)
-
Sloan Fellow, Sloan Foundation (2004)
-
Mitchell Prize, American Statistical Association and the International Society of Bayesian Analysis (2002)
-
Paper of the Year (Rosenberg et al 2002), Lancet (2002)
Professional Education
-
Ph.D., Stanford University, Biology (1998)
-
B.Sc., Pennsylvania State University, Biology and Mathematics (1994)
Current Research and Scholarly Interests
My group has expertise in the development of new statistical methods for genetic analysis and in their application to genomic data from humans and other organisms. We focus on questions relating to genetic variation and evolution: How does genetic variation impact phenotypic traits and evolution, both at the organismal and cellular level? What can we learn from genome sequences of modern and ancient humans about the relationships among human populations, and the nature of adaptation in these populations?
We often work on problems where there are no off-the-shelf statistical methods. Thus, an important part of our work is in developing appropriate statistical and computational approaches that can yield new insights into biological data. In the past, we have made important contributions to a variety of problems in human population genetics, including methods for complex trait mapping, inference of population structure and history, and studies of natural selection. We have a strong track record of producing user-friendly resources that are widely used in the community, and in applied data analysis to tackle important biological questions. Notably, our Structure algorithm and software package for inferring population structure from genetic data have received >30,000 total citations spread across several papers.
Since 2008, an important focus of my group has been understanding gene regulation, and in particular how genetic variation may impact regulation. Ultimately, we would like to be able to predict which noncoding variants in the genome are likely to have regulatory effects in any given cell type, and how these link to phenotypic variation and disease. My lab has been deeply involved in developing new computational methods to interpret various types of modern genomic assays and in linking these to genetic variation.
Secondly, we have had a major focus on understanding the genetic architecture of complex traits, and the implications for understanding evolution. We have argued that much--if not most--evolution in humans likely proceeds through a process that we call "polygenic adaptation" in which populations evolve through small allele frequency shifts at many loci.
We have also written extensively about conceptual models for understanding the genetic architecture of trait variation (Boyle et al, 2017). We have argued that the data are consistent with a model in which essentially every regulatory variant in disease-relevant cell types can affect risk, and proposed that most of these effects act through trans-regulatory networks. Testing this model is an ongoing focus of our work.
2025-26 Courses
- Advanced Genetics
GENE 205 (Win)
- Culture, Evolution, and Society
HUMBIO 2B (Aut)
Independent Studies (11)
- Biomedical Informatics Teaching Methods
BMDS 295 (Aut, Win, Spr)
- Directed Reading
BMDS 299 (Aut, Win, Spr)
- Directed Reading in Biology
BIO 198 (Aut, Win, Spr, Sum)
- Directed Reading in Genetics
GENE 299 (Aut, Win, Spr, Sum)
- Graduate Research
BIO 300 (Aut, Win, Spr, Sum)
- Graduate Research
GENE 399 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BMDS 370 (Aut, Win, Spr)
- Medical Scholars Research
GENE 370 (Aut, Win, Spr, Sum)
- Supervised Study
GENE 260 (Aut, Win, Spr, Sum)
- Undergraduate Research
BIO 199 (Aut, Win, Spr, Sum)
- Undergraduate Research
GENE 199 (Aut, Win, Spr, Sum)
Prior Year Courses
2024-25 Courses
- Advanced Genetics
GENE 205 (Win)
- Biology PhD Lab Rotation
BIO 299 (Aut, Win)
- Culture, Evolution, and Society
HUMBIO 2B (Aut)
- Genomic approaches to the study of human disease
BIO 127, BIO 247, GENE 247 (Win)
2023-24 Courses
- Advanced Genetics
GENE 205 (Win)
- Biology PhD Lab Rotation
BIO 299 (Aut, Win, Spr)
- Culture, Evolution, and Society
HUMBIO 2B (Aut)
- Genomic approaches to the study of human disease
BIO 127, BIO 247, GENE 247 (Win)
2022-23 Courses
- Advanced Genetics
GENE 205 (Win)
- Biology PhD Lab Rotation
BIO 299 (Win, Spr)
- Genomic approaches to the study of human disease
BIO 127, BIO 247, GENE 247 (Win)
Stanford Advisees
-
Med Scholar Project Advisor
Wilder Wohns
-
Doctoral Dissertation Reader (AC)
Tatiana Bellagio, Javier Blanco, Connor Duffy, Keala Gapin, Olivia Ghosh, Ilayda Ilerten, Egor Lappo, Gabe Preising, Jess Rhodes, Alex Starr
-
Postdoctoral Faculty Sponsor
Sayali Alatkar, Emma Dann, Yun Deng, Xinyi Li, Jiacheng Miao
-
Doctoral Dissertation Advisor (AC)
Tami Gjorgjieva, Nikhil Milind, Courtney Smith, Julie Zhu
-
Doctoral Dissertation Co-Advisor (AC)
Alvina Adimoelja, Ryan Goto
Graduate and Fellowship Programs
-
Biology (School of Humanities and Sciences) (Phd Program)
-
Biomedical Data Science (Phd Program)
All Publications
-
A way to identify the biological basis of gene-trait associations
NATURE
2026
View details for DOI 10.1038/d41586-025-04004-5
View details for Web of Science ID 001656052900001
View details for PubMedID 41501274
-
Causal modelling of gene effects from regulators to programs to traits.
Nature
2025
Abstract
Genetic association studies provide a unique tool for identifying candidate causal links from genes to human traits and diseases. However, it is challenging to determine the biological mechanisms underlying most associations, and we lack genome-scale approaches for inferring causal mechanistic pathways from genes to cellular functions to traits. Here we propose approaches to bridge this gap by combining quantitative estimates of gene-trait relationships from loss-of-function burden tests with gene-regulatory connections inferred from Perturb-seq experiments in relevant cell types. By combining these two forms of data, we aim to build causal graphs in which the directional associations of genes with a trait can be explained by their regulatory effects on biological programs or direct effects on the trait. As a proof of concept, we constructed a causal graph of the gene-regulatory hierarchy that jointly controls three partially co-regulated blood traits. We propose that perturbation studies in trait-relevant cell types, coupled with gene-level effect sizes for traits, can bridge the gap between genetic association and biological mechanism.
View details for DOI 10.1038/s41586-025-09866-3
View details for PubMedID 41372418
View details for PubMedCentralID 8596853
-
The primate Major Histocompatibility Complex as a case study of gene family evolution.
eLife
2025; 14
Abstract
Gene families are groups of evolutionarily related genes. One large gene family that has experienced rapid evolution lies within the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity. Across the ∼60 million year history of the primates, some MHC genes have turned over completely, some have changed function, some have converged in function, and others have remained essentially unchanged. Past work has typically focused on identifying MHC alleles within particular species or comparing gene content, but more work is needed to understand the overall evolution of the gene family across species. Thus, despite the immunologic importance of the MHC and its peculiar evolutionary history, we lack a complete picture of MHC evolution in the primates. We readdress this question using sequences from dozens of MHC genes and pseudogenes spanning the entire primate order, building a comprehensive set of gene and allele trees with modern methods. Overall, we find that the Class I gene subfamily is evolving much more quickly than the Class II gene subfamily, with the exception of the Class II MHC-DRB genes. We also pay special attention to the often-ignored pseudogenes, which we use to reconstruct different events in the evolution of the Class I region. We find that despite the shared function of the MHC across species, different species employ different genes, haplotypes, and patterns of variation to achieve a successful immune response. Our trees and extensive literature review represent the most comprehensive look into primate MHC evolution to date.
View details for DOI 10.7554/eLife.103545
View details for PubMedID 41335001
View details for PubMedCentralID PMC12674619
-
The primate Major Histocompatibility Complex as a case study of gene family evolution
ELIFE
2025; 14
View details for DOI 10.7554/eLife.103545.3.sa2
View details for Web of Science ID 001630352100001
View details for PubMedID 41335001
View details for PubMedCentralID PMC12674619
-
Specificity, length and luck drive gene rankings in association studies.
Nature
2025
Abstract
Standard genome-wide association studies (GWAS) and rare variant burden tests are essential tools for identifying trait-relevant genes. Although these methods are conceptually similar, by analysing association studies of 209 quantitative traits in the UK Biobank, we show that they systematically prioritize different genes. This raises the question of how genes should ideally be prioritized. We propose two prioritization criteria: (1) trait importance - how much a gene quantitatively affects a trait; and (2) trait specificity - the importance of a gene for the trait under study relative to its importance across all traits. We find that GWAS prioritize genes near trait-specific variants, whereas burden tests prioritize trait-specific genes. Because non-coding variants can be context specific, GWAS can prioritize highly pleiotropic genes, whereas burden tests generally cannot. Both study designs are also affected by distinct trait-irrelevant factors, complicating their interpretation. Our results illustrate that burden tests and GWAS reveal different aspects of trait biology and suggest ways to improve their interpretation and usage.
View details for DOI 10.1038/s41586-025-09703-7
View details for PubMedID 41193809
View details for PubMedCentralID 7405896
-
Simple scaling laws control the genetic architectures of human complex traits.
PLoS biology
2025; 23 (10): e3003402
Abstract
Genome-wide association studies have revealed that the genetic architectures of complex traits vary widely, including in terms of the numbers, effect sizes, and allele frequencies of significant hits. However, at present we lack a principled way of understanding the similarities and differences among traits. Here, we describe a probabilistic model that combines the effects of mutation, drift, and stabilizing selection at individual sites with a genome-scale model of phenotypic variation. In this model, the architecture of a trait arises from the distribution of selection coefficients of mutations and from two scaling parameters. We fit this model for 95 highly polygenic quantitative traits of different kinds from the UK Biobank. Notably, we infer that all these traits have fairly similar, though not identical, distributions of selection coefficients. This similarity suggests that differences in architectures of highly polygenic traits arise mainly from the two scaling parameters: the mutational target size and heritability per site, which vary by orders of magnitude among traits. When these two scale factors are accounted for, we find that the architectures of all 95 traits are very similar.
View details for DOI 10.1371/journal.pbio.3003402
View details for PubMedID 41082512
-
Ancient trans-species polymorphism at the Major Histocompatibility Complex in primates.
eLife
2025; 14
Abstract
Classical genes within the Major Histocompatibility Complex (MHC) are responsible for peptide presentation to T cells, thus playing a central role in immune defense against pathogens. These genes are subject to strong selective pressures including both balancing and directional selection, resulting in exceptional genetic diversity (thousands of alleles per gene in humans). Moreover, some allelic lineages appear to be shared between primate species, a phenomenon known as trans-species polymorphism (TSP) or incomplete lineage sorting, which is rare in the genome overall. However, despite the clinical and evolutionary importance of MHC diversity, we currently lack a full picture of primate MHC evolution. In particular, we do not know to what extent genes and allelic lineages are retained across speciation events. To start addressing this gap, we explore variation across genes and species in our companion paper (Fortier and Pritchard, 2025), and here we explore variation within individual genes. We used Bayesian phylogenetic methods to determine the extent of TSP at 17 MHC genes, including classical and non-classical Class I and Class II genes. We find strong support for ancient TSP in 7 of 10 classical genes, including, remarkably, between humans and old-world monkeys in MHC-DQB1. In addition to the long-term persistence of ancient lineages, we also observe rapid evolution at nucleotides encoding the proteins' peptide-binding domains. The most rapidly-evolving amino acid positions are extremely enriched for autoimmune and infectious disease associations. Together, these results suggest complex selective forces, arising from differential peptide binding, that drive short-term allelic turnover within lineages while also maintaining deeply divergent lineages for at least 31 million years in some cases.
View details for DOI 10.7554/eLife.103547
View details for PubMedID 40937493
View details for PubMedCentralID PMC12431779
-
rRNA paralogs with variations, rRNA-subtypes, affect diverse human phenotypes.
medRxiv : the preprint server for health sciences
2025
Abstract
Eukaryotic ribosomal RNA (rRNA) genes exhibit hyper-variability at non-conserved regions known as Expansion Segments (ESs). Due to the numerous rRNA copies in the genome, editing ESs is challenging, and their significance remains unclear. In this study, we analyze rRNA variant frequencies in the UK Biobank population, revealing that highly abundant ES variations are causally linked to human health and physiology. We developed a Ribosome Variation Analysis (RiboVAn) method, identifying both heritable germline variants and a larger proportion of low-heritability, likely somatic variants. The most heritable variants cluster within four ESs of the 28S rRNA, with specific variants in es15l associated with adiposity, es39l linked to body dimensions, and es27l associated with blood-related traits and diseases. Variant-chromosome specificity is observed where functional variants are linked to certain rDNA chromosomes. These findings causally link rRNA sequence variation to human traits and establish that ESs have distinct and important functions in human physiology.
View details for DOI 10.1101/2025.09.02.25334953
View details for PubMedID 40950416
View details for PubMedCentralID PMC12424869
-
Gene regulatory network structure informs the distribution of perturbation effects.
PLoS computational biology
2025; 21 (9): e1013387
Abstract
Gene regulatory networks (GRNs) govern many core developmental and biological processes underlying human complex traits. Even with broad-scale efforts to characterize the effects of molecular perturbations and interpret gene coexpression, it remains challenging to infer the architecture of gene regulation in a precise and efficient manner. Key properties of GRNs, like hierarchical structure, modular organization, and sparsity, provide both challenges and opportunities for this objective. Here, we seek to better understand properties of GRNs using a new approach to simulate their structure and model their function. We produce realistic network structures with a novel generating algorithm based on insights from small-world network theory, and we model gene expression regulation using stochastic differential equations formulated to accommodate modeling molecular perturbations. With these tools, we systematically describe the effects of gene knockouts within and across GRNs, finding a subset of networks that recapitulate features of a recent genome-scale perturbation study. With deeper analysis of these exemplar networks, we consider future avenues to map the architecture of gene expression regulation using data from cells in perturbed and unperturbed states, finding that while perturbation data are critical to discover specific regulatory interactions, data from unperturbed cells may be sufficient to reveal regulatory programs.
View details for DOI 10.1371/journal.pcbi.1013387
View details for PubMedID 40892899
-
Buffering and non-monotonic behavior of gene dosage response curves for human complex traits.
medRxiv : the preprint server for health sciences
2025
Abstract
The genome-wide burdens of deletions, loss-of-function mutations, and duplications correlate with many traits. Curiously, for most of these traits, variants that decrease expression have the same genome-wide average direction of effect as variants that increase expression. This seemingly contradicts the intuition that for individual genes reducing expression should have the opposite effect on a phenotype as increasing expression. To understand this paradox, we use the gene dosage response curve (GDRC), which relates changes in gene expression to expected changes in phenotype. We show that, for many traits, GDRCs are systematically biased in one trait direction relative to the other, and we develop a simple theoretical model that explains this bias in trait direction. Our results have broad implications for complex traits, drug discovery, and statistical genetics.
View details for DOI 10.1101/2024.11.11.24317065
View details for PubMedID 40832387
-
Regulatory network topology and the genetic architecture of gene expression.
bioRxiv : the preprint server for biology
2025
Abstract
In human populations, most of the genetic variance in gene expression can be attributed to trans-acting expression quantitative trait loci (eQTLs) spread across the genome. However, in practice it is difficult to discover these eQTLs, and their cumulative effects on gene expression and complex traits are yet to be fully understood. Here, we assess how properties of the genetic architecture of gene expression constrain the space of plausible gene regulatory networks. We describe a structured causal model of gene expression regulation and consider how it interacts with biologically relevant properties of the gene regulatory network to alter the genomic distribution of expression heritability. Under our model, we find that the genetic architecture of gene expression is shaped in large part by local network motifs and by hub regulators that shorten paths through the network and act as key sources of trans-acting variance. Further, simulated networks with an enrichment of motifs and hub regulators best recapitulate the distribution of cis and trans heritability of gene expression as measured in a recent twin study. Taken together, our results suggest that the architecture of gene expression is sparser and more pleiotropic across genes than would be suggested by naive models of regulatory networks, which has important implications for future studies of complex traits.
View details for DOI 10.1101/2025.08.12.669924
View details for PubMedID 40832275
View details for PubMedCentralID PMC12363847
-
Investigating the Role of Neighborhood Socioeconomic Status and Germline Genetics on Prostate Cancer Risk.
HGG advances
2025: 100492
Abstract
Genetic factors play an important role in prostate cancer (PCa) development with polygenic risk scores (PRS) predicting disease risk across genetic ancestries. However, there are few convincing modifiable factors for PCa and little is known about their potential interaction with genetic risk. Our study explores the role of neighborhood socioeconomic status (nSES), and how it may interact with PRS, on PCa risk. We analyzed incident PCa cases and controls of European (cases=5,960; controls=93,990) and African (cases=109; controls=1,226) ancestry from the UK Biobank (UKB) cohort. Using the English Indices of Deprivation, a set of validated metrics that quantify lack of resources within geographical areas, we performed logistic regression to investigate the main effects and interactions between nSES deprivation and genetic susceptibility to PCa, represented by a multi-ancestry PRS comprised of 269 genetic variants. The PRS was associated with PCa in the European (OR=2.04; 95%CI=2.00-2.09; P=5.34x10^-807) and African (OR=1.35; 95%CI=1.16-1.58; P=1.05x10^-4) ancestries. Additionally, nSES deprivation indices were inversely associated with PCa: employment, education, health, and income. From this, we suspect that PRS, through biological mechanisms, and nSES deprivation, likely through differences in screening, are associated with PCa, but act independently of each other. Our findings suggest that genetic factors and social determinants of health measured by neighborhood socioeconomic status do not synergistically increase risk of PCa.
View details for DOI 10.1016/j.xhgg.2025.100492
View details for PubMedID 40783788
-
Haplotype analysis reveals pleiotropic disease associations in the HLA region.
American journal of human genetics
2025
Abstract
The human leukocyte antigen (HLA) region plays an important role in human health through its involvement in immune cell recognition and maturation. While genetic variation in the HLA region is associated with many diseases, the pleiotropic patterns of these associations have not been systematically investigated. Here, we developed a haplotype approach to investigate disease associations phenome wide for 412,181 Finnish individuals and 2,459 diseases. Across the 1,035 diseases with a genome-wide association study association, we found a 17-fold average per-SNP enrichment of hits in the HLA region. Altogether, we identified 7,649 HLA associations across 647 diseases, including 1,750 associations uncovered by haplotype analysis. We found that some haplotypes show both risk-increasing and protective associations across different diseases, while others consistently increase risk across diseases, indicating a complex pleiotropic landscape involving a range of diseases. This study highlights the extensive impact of HLA variation on disease risk and underscores the importance of classical and non-classical genes as well as non-coding variation.
View details for DOI 10.1016/j.ajhg.2025.06.011
View details for PubMedID 40645183
-
Transfer learning reveals sequence determinants of the quantitative response to transcription factor dosage.
Cell genomics
2025: 100780
Abstract
Deep learning models have advanced our ability to predict cell-type-specific chromatin patterns from transcription factor (TF) binding motifs, but their application to perturbed contexts remains limited. We applied transfer learning to predict how concentrations of the dosage-sensitive TFs TWIST1 and SOX9 affect regulatory element (RE) chromatin accessibility in facial progenitor cells, achieving near-experimental accuracy. High-affinity motifs that allow for heterotypic TF co-binding and are concentrated at the center of REs buffer against quantitative changes in TF dosage and predict unperturbed accessibility. Conversely, low-affinity or homotypic binding motifs distributed throughout REs drive sensitive responses with minimal impact on unperturbed accessibility. Both buffering and sensitizing features display purifying selection signatures. We validated these sequence features through reporter assays and demonstrated that TF-nucleosome competition can explain low-affinity motifs' sensitizing effects. This combination of transfer learning and quantitative chromatin response measurements provides a novel approach for uncovering additional layers of the cis-regulatory code.
View details for DOI 10.1016/j.xgen.2025.100780
View details for PubMedID 40020686
-
Causal modeling of gene effects from regulators to programs to traits: integration of genetic associations and Perturb-seq.
bioRxiv : the preprint server for biology
2025
Abstract
Genetic association studies provide a unique tool for identifying causal links from genes to human traits and diseases. However, it is challenging to determine the biological mechanisms underlying most associations, and we lack genome-scale approaches for inferring causal mechanistic pathways from genes to cellular functions to traits. Here we propose new approaches to bridge this gap by combining quantitative estimates of gene-trait relationships from loss-of-function burden tests with gene-regulatory connections inferred from Perturb-seq experiments in relevant cell types. By combining these two forms of data, we aim to build causal graphs in which the directional associations of genes with a trait can be explained by their regulatory effects on biological programs or direct effects on the trait. As a proof-of-concept, we constructed a causal graph of the gene regulatory hierarchy that jointly controls three partially co-regulated blood traits. We propose that perturbation studies in trait-relevant cell types, coupled with gene-level effect sizes for traits, can bridge the gap between genetics and biology.
View details for DOI 10.1101/2025.01.22.634424
View details for PubMedID 39896538
-
Characterizing selection on complex traits through conditional frequency spectra.
Genetics
2024
Abstract
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of frequency and effect size - but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. Recognizing the biases inherent to GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
View details for DOI 10.1093/genetics/iyae210
View details for PubMedID 39691067
-
Specificity, length, and luck: How genes are prioritized by rare and common variant association studies.
bioRxiv : the preprint server for biology
2024
Abstract
Standard genome-wide association studies (GWAS) and rare variant burden tests are essential tools for identifying trait-relevant genes. Although these methods are conceptually similar, we show by analyzing association studies of 209 quantitative traits in the UK Biobank that they systematically prioritize different genes. This raises the question of how genes should ideally be prioritized. We propose two prioritization criteria: 1) trait importance - how much a gene quantitatively affects a trait; and 2) trait specificity - a gene's importance for the trait under study relative to its importance across all traits. We find that GWAS prioritize genes near trait-specific variants, while burden tests prioritize trait-specific genes. Because non-coding variants can be context specific, GWAS can prioritize highly pleiotropic genes, while burden tests generally cannot. Both study designs are also affected by distinct trait-irrelevant factors, complicating their interpretation. Our results illustrate that burden tests and GWAS reveal different aspects of trait biology and suggest ways to improve their interpretation and usage.
View details for DOI 10.1101/2024.12.12.628073
View details for PubMedID 39935885
View details for PubMedCentralID PMC11812597
-
Central control of dynamic gene circuits governs T cell rest and activation.
Nature
2024
Abstract
The ability of cells to maintain distinct identities and respond to transient environmental signals requires tightly controlled regulation of gene networks [1-3]. These dynamic regulatory circuits that respond to extracellular cues in primary human cells remain poorly defined. The need for context-dependent regulation is prominent in T cells, where distinct lineages must respond to diverse signals to mount effective immune responses and maintain homeostasis [4-8]. Here we performed CRISPR screens in multiple primary human CD4+ T cell contexts to identify regulators that control expression of IL-2Rα, a canonical marker of T cell activation transiently expressed by pro-inflammatory effector T cells and constitutively expressed by anti-inflammatory regulatory T cells where it is required for fitness [9-11]. Approximately 90% of identified regulators of IL-2Rα had effects that varied across cell types and/or stimulation states, including a subset that even had opposite effects across conditions. Using single-cell transcriptomics after pooled perturbation of context-specific screen hits, we characterized specific factors as regulators of overall rest or activation and constructed state-specific regulatory networks. MED12 - a component of the Mediator complex - serves as a dynamic orchestrator of key regulators, controlling expression of distinct sets of regulators in different T cell contexts. Immunoprecipitation-mass spectrometry revealed that MED12 interacts with the histone methylating COMPASS complex. MED12 was required for histone methylation and expression of genes encoding key context-specific regulators, including the rest maintenance factor KLF2 and the versatile regulator MYC. CRISPR ablation of MED12 blunted the cell-state transitions between rest and activation and protected from activation-induced cell death. Overall, this work leverages CRISPR screens performed across conditions to define dynamic gene circuits required to establish resting and activated T cell states.
View details for DOI 10.1038/s41586-024-08314-y
View details for PubMedID 39663454
View details for PubMedCentralID 3640494
-
Gene regulatory network inference from CRISPR perturbations in primary CD4+ T cells elucidates the genomic basis of immune disease.
Cell genomics
2024: 100671
Abstract
The effects of genetic variation on complex traits act mainly through changes in gene regulation. Although many genetic variants have been linked to target genes in cis, the trans-regulatory cascade mediating their effects remains largely uncharacterized. Mapping trans-regulators based on natural genetic variation has been challenging due to small effects, but experimental perturbations offer a complementary approach. Using CRISPR, we knocked out 84 genes in primary CD4+ T cells, targeting inborn error of immunity (IEI) disease transcription factors (TFs) and TFs without immune disease association. We developed a novel gene network inference method called linear latent causal Bayes (LLCB) to estimate the network from perturbation data and observed 211 regulatory connections between genes. We characterized programs affected by the TFs, which we associated with immune genome-wide association study (GWAS) genes, finding that JAK-STAT family members are regulated by KMT2A, an epigenetic regulator. These analyses reveal the trans-regulatory cascades linking GWAS genes to signaling pathways.
View details for DOI 10.1016/j.xgen.2024.100671
View details for PubMedID 39395408
-
The Primate Major Histocompatibility Complex: An Illustrative Example of Gene Family Evolution.
bioRxiv : the preprint server for biology
2024
Abstract
Gene families are groups of evolutionarily-related genes. One large gene family that has experienced rapid evolution is the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity. Across the ~60 million year history of the primates, some MHC genes have turned over completely, some have changed function, some have converged in function, and others have remained essentially unchanged. Past work has typically focused on identifying MHC alleles within particular species or comparing gene content, but more work is needed to understand the overall evolution of the gene family across species. Thus, despite the immunologic importance of the MHC and its peculiar evolutionary history, we lack a complete picture of MHC evolution in the primates. We readdress this question using sequences from dozens of MHC genes and pseudogenes spanning the entire primate order, building a comprehensive set of gene and allele trees with modern methods. Overall, we find that the Class I gene subfamily is evolving much more quickly than the Class II gene subfamily, with the exception of the Class II MHC-DRB genes. We also pay special attention to the often-ignored pseudogenes, which we use to reconstruct different events in the evolution of the Class I region. We find that despite the shared function of the MHC across species, different species employ different genes, haplotypes, and patterns of variation to achieve a successful immune response. Our trees and extensive literature review represent the most comprehensive look into MHC evolution to date.
View details for DOI 10.1101/2024.09.16.613318
View details for PubMedID 39345418
View details for PubMedCentralID PMC11429698
-
Deciphering the impact of genomic variation on function.
Nature
2024; 633 (8028): 47-57
Abstract
Our genomes influence nearly every aspect of human biology - from molecular and cellular functions to phenotypes in health and disease. Studying the differences in DNA sequence between individuals (genomic variation) could reveal previously unknown mechanisms of human biology, uncover the basis of genetic predispositions to diseases, and guide the development of new diagnostic tools and therapeutic agents. Yet, understanding how genomic variation alters genome function to influence phenotype has proved challenging. To unlock these insights, we need a systematic and comprehensive catalogue of genome function and the molecular and cellular effects of genomic variants. Towards this goal, the Impact of Genomic Variation on Function (IGVF) Consortium will combine approaches in single-cell mapping, genomic perturbations and predictive modelling to investigate the relationships among genomic variation, genome function and phenotypes. IGVF will create maps across hundreds of cell types and states describing how coding variants alter protein activity, how noncoding variants change the regulation of gene expression, and how such effects connect through gene-regulatory and protein-interaction networks. These experimental data, computational predictions and accompanying standards and pipelines will be integrated into an open resource that will catalyse community efforts to explore how our genomes influence biology and disease across populations.
View details for DOI 10.1038/s41586-024-07510-0
View details for PubMedID 39232149
View details for PubMedCentralID 7405896
-
Diversity of ribosomes at the level of rRNA variation associated with human health and disease.
Cell genomics
2024: 100629
Abstract
With hundreds of copies of rDNA, it is unknown whether they possess sequence variations that form different types of ribosomes. Here, we developed an algorithm for long-read variant calling, termed RGA, which revealed that variations in human rDNA loci are predominantly insertion-deletion (indel) variants. We developed full-length rRNA sequencing (RIBO-RT) and in situ sequencing (SWITCH-seq), which showed that translating ribosomes possess variation in rRNA. Over 1,000 variants are lowly expressed. However, tens of variants are abundant and form distinct rRNA subtypes with different structures near indels as revealed by long-read rRNA structure probing coupled to dimethyl sulfate sequencing. rRNA subtypes show differential expression in endoderm/ectoderm-derived tissues, and in cancer, low-abundance rRNA variants can become highly expressed. Together, this study identifies the diversity of ribosomes at the level of rRNA variants, their chromosomal location, and unique structure as well as the association of ribosome variation with tissue-specific biology and cancer.
View details for DOI 10.1016/j.xgen.2024.100629
View details for PubMedID 39111318
-
Investigating the Role of Neighborhood Socioeconomic Status and Germline Genetics on Prostate Cancer Risk.
medRxiv : the preprint server for health sciences
2024
Abstract
Genetic factors play an important role in prostate cancer (PCa) development with polygenic risk scores (PRS) predicting disease risk across genetic ancestries. However, there are few convincing modifiable factors for PCa and little is known about their potential interaction with genetic risk. We analyzed incident PCa cases (n=6,155) and controls (n=98,257) of European and African ancestry from the UK Biobank (UKB) cohort to evaluate the role of neighborhood socioeconomic status (nSES) - and how it may interact with PRS - on PCa risk. We evaluated a multi-ancestry PCa PRS containing 269 genetic variants to understand the association of germline genetics with PCa in UKB. Using the English Indices of Deprivation, a set of validated metrics that quantify lack of resources within geographical areas, we performed logistic regression to investigate the main effects and interactions between nSES deprivation, PCa PRS, and PCa. The PCa PRS was strongly associated with PCa (OR=2.04; 95%CI=2.00-2.09; P<0.001). Additionally, nSES deprivation indices were inversely associated with PCa: employment (OR=0.91; 95%CI=0.86-0.96; P<0.001), education (OR=0.94; 95%CI=0.83-0.98; P<0.001), health (OR=0.91; 95%CI=0.86-0.96; P<0.001), and income (OR=0.91; 95%CI=0.86-0.96; P<0.001). The PRS effects showed little heterogeneity across nSES deprivation indices, except for the Townsend Index (P=0.03). We reaffirmed genetics as a risk factor for PCa and identified nSES deprivation domains that influence PCa detection and are potentially correlated with environmental exposures that are a risk factor for PCa. These findings also suggest that nSES and genetic risk factors for PCa act independently.
View details for DOI 10.1101/2024.07.31.24311312
View details for PubMedID 39132496
View details for PubMedCentralID PMC11312637
-
Haplotype Analysis Reveals Pleiotropic Disease Associations in the HLA Region.
medRxiv : the preprint server for health sciences
2024
Abstract
The human leukocyte antigen (HLA) region plays an important role in human health through involvement in immune cell recognition and maturation. While genetic variation in the HLA region is associated with many diseases, the pleiotropic patterns of these associations have not been systematically investigated. Here, we developed a haplotype approach to investigate disease associations phenome-wide for 412,181 Finnish individuals and 2,459 traits. Across the 1,035 diseases with a GWAS association, we found a 17-fold average per-SNP enrichment of hits in the HLA region. Altogether, we identified 7,649 HLA associations across 647 traits, including 1,750 associations uncovered by haplotype analysis. We find some haplotypes show trade-offs between diseases, while others consistently increase risk across traits, indicating a complex pleiotropic landscape involving a range of diseases. This study highlights the extensive impact of HLA variation on disease risk, and underscores the importance of classical and non-classical genes, as well as non-coding variation.
View details for DOI 10.1101/2024.07.29.24311183
View details for PubMedID 39132491
View details for PubMedCentralID PMC11312630
-
Genetic variants affect diurnal glucose levels throughout the day.
bioRxiv : the preprint server for biology
2024
Abstract
Circadian rhythms not only coordinate the timing of wake and sleep but also regulate homeostasis within the body, including glucose metabolism. However, the genetic variants that contribute to temporal control of glucose levels have not been previously examined. Using data from 420,000 individuals from the UK Biobank and replicating our findings in 100,000 individuals from the Estonian Biobank, we show that diurnal serum glucose is under genetic control. We discover a robust temporal association of glucose levels at the Melatonin receptor 1B (MTNR1B) (rs10830963, P = 1e-22) and a canonical circadian pacemaker gene Cryptochrome 2 (CRY2) loci (rs12419690, P = 1e-16). Furthermore, we show that sleep modulates serum glucose levels and the genetic variants have a separate mechanism of diurnal control. Finally, we show that these variants independently modulate risk of type 2 diabetes. Our findings, together with earlier genetic and epidemiological evidence, show a clear connection between sleep and metabolism and highlight variation at MTNR1B and CRY2 as temporal regulators for glucose levels.
View details for DOI 10.1101/2024.07.22.604631
View details for PubMedID 39091879
View details for PubMedCentralID PMC11291026
-
Bayesian estimation of gene constraint from an evolutionary model with gene features.
Nature genetics
2024
Abstract
Measures of selective constraint on genes have been used for many applications, including clinical interpretation of rare coding variants, disease gene discovery and studies of genome evolution. However, widely used metrics are severely underpowered at detecting constraints for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. Here we developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease and other phenotypes, especially for short genes. Our estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve the estimation of many gene-level properties, such as rare variant burden or gene expression differences.
View details for DOI 10.1038/s41588-024-01820-9
View details for PubMedID 38977852
View details for PubMedCentralID 5618255
-
Gene regulatory network structure informs the distribution of perturbation effects.
bioRxiv : the preprint server for biology
2024
Abstract
Gene regulatory networks (GRNs) govern many core developmental and biological processes underlying human complex traits. Even with broad-scale efforts to characterize the effects of molecular perturbations and interpret gene coexpression, it remains challenging to infer the architecture of gene regulation in a precise and efficient manner. Key properties of GRNs, like hierarchical structure, modular organization, and sparsity, provide both challenges and opportunities for this objective. Here, we seek to better understand properties of GRNs using a new approach to simulate their structure and model their function. We produce realistic network structures with a novel generating algorithm based on insights from small-world network theory, and we model gene expression regulation using stochastic differential equations formulated to accommodate modeling molecular perturbations. With these tools, we systematically describe the effects of gene knockouts within and across GRNs, finding a subset of networks that recapitulate features of a recent genome-scale perturbation study. With deeper analysis of these exemplar networks, we consider future avenues to map the architecture of gene expression regulation using data from cells in perturbed and unperturbed states, finding that while perturbation data are critical to discover specific regulatory interactions, data from unperturbed cells may be sufficient to reveal regulatory programs.
View details for DOI 10.1101/2024.07.04.602130
View details for PubMedID 39005431
-
A model for accurate quantification of CRISPR effects in pooled FACS screens.
bioRxiv : the preprint server for biology
2024
Abstract
CRISPR screens are powerful tools to identify key genes that underlie biological processes. One important type of screen uses fluorescence activated cell sorting (FACS) to sort perturbed cells into bins based on the expression level of marker genes, followed by guide RNA (gRNA) sequencing. Analysis of these data presents several statistical challenges due to multiple factors including the discrete nature of the bins and typically small numbers of replicate experiments. To address these challenges, we developed a robust and powerful Bayesian random effects model and software package called Waterbear. Furthermore, we used Waterbear to explore how various experimental design parameters affect statistical power to establish principled guidelines for future screens. Finally, we experimentally validated our experimental design model findings that, when using Waterbear for analysis, high power is maintained even at low cell coverage and a high multiplicity of infection. We anticipate that Waterbear will be of broad utility for analyzing FACS-based CRISPR screens.
View details for DOI 10.1101/2024.06.17.599448
View details for PubMedID 38948774
View details for PubMedCentralID PMC11213010
-
Conditional frequency spectra as a tool for studying selection on complex traits in biobanks.
bioRxiv : the preprint server for biology
2024
Abstract
Natural selection on complex traits is difficult to study in part due to the ascertainment inherent to genome-wide association studies (GWAS). The power to detect a trait-associated variant in GWAS is a function of frequency and effect size - but for traits under selection, the effect size of a variant determines the strength of selection against it, constraining its frequency. To account for GWAS ascertainment, we propose studying the joint distribution of allele frequencies across populations, conditional on the frequencies in the GWAS cohort. Before considering these conditional frequency spectra, we first characterized the impact of selection and non-equilibrium demography on allele frequency dynamics forwards and backwards in time. We then used these results to understand conditional frequency spectra under realistic human demography. Finally, we investigated empirical conditional frequency spectra for GWAS variants associated with 106 complex traits, finding compelling evidence for either stabilizing or purifying selection. Our results provide insight into polygenic score portability and other properties of variants ascertained with GWAS, highlighting the utility of conditional frequency spectra.
View details for DOI 10.1101/2024.06.15.599126
View details for PubMedID 38948697
View details for PubMedCentralID PMC11212903
-
Transfer learning reveals sequence determinants of the quantitative response to transcription factor dosage.
bioRxiv : the preprint server for biology
2024
Abstract
Deep learning approaches have made significant advances in predicting cell type-specific chromatin patterns from the identity and arrangement of transcription factor (TF) binding motifs. However, most models have been applied in unperturbed contexts, precluding a predictive understanding of how chromatin state responds to TF perturbation. Here, we used transfer learning to train and interpret deep learning models that use DNA sequence to predict, with accuracy approaching experimental reproducibility, how the concentration of two dosage-sensitive TFs (TWIST1, SOX9) affects regulatory element (RE) chromatin accessibility in facial progenitor cells. High-affinity motifs that allow for heterotypic TF co-binding and are concentrated at the center of REs buffer against quantitative changes in TF dosage and strongly predict unperturbed accessibility. In contrast, motifs with low-affinity or homotypic binding distributed throughout REs lead to sensitive responses with minimal contributions to unperturbed accessibility. Both buffering and sensitizing features show signatures of purifying selection. We validated these predictive sequence features using reporter assays and showed that a biophysical model of TF-nucleosome competition can explain the sensitizing effect of low-affinity motifs. Our approach of combining transfer learning and quantitative measurements of the chromatin response to TF dosage therefore represents a powerful method to reveal additional layers of the cis-regulatory code.
View details for DOI 10.1101/2024.05.28.596078
View details for PubMedID 38853998
-
Ancient DNA in Context: Port Cities and Mobility in the Iron Age and Roman Mediterranean
WILEY. 2024: 121
View details for Web of Science ID 001276799900453
-
Stable population structure in Europe since the Iron Age, despite high mobility.
eLife
2024; 13
Abstract
Ancient DNA research in the past decade has revealed that European population structure changed dramatically in the prehistoric period (14,000-3000 years before present, YBP), reflecting the widespread introduction of Neolithic farmer and Bronze Age Steppe ancestries. However, little is known about how population structure changed from the historical period onward (3000 YBP - present). To address this, we collected whole genomes from 204 individuals from Europe and the Mediterranean, many of which are the first historical period genomes from their region (e.g. Armenia and France). We found that most regions show remarkable inter-individual heterogeneity. At least 7% of historical individuals carry ancestry uncommon in the region where they were sampled, some indicating cross-Mediterranean contacts. Despite this high level of mobility, overall population structure across western Eurasia is relatively stable through the historical period up to the present, mirroring geography. We show that, under standard population genetics models with local panmixia, the observed level of dispersal would lead to a collapse of population structure. Persistent population structure thus suggests a lower effective migration rate than indicated by the observed dispersal. We hypothesize that this phenomenon can be explained by extensive transient dispersal arising from drastically improved transportation networks and the Roman Empire's mobilization of people for trade, labor, and military. This work highlights the utility of ancient DNA in elucidating finer scale human population dynamics in recent history.
View details for DOI 10.7554/eLife.79714
View details for PubMedID 38288729
-
Toward the Identifiability of Comparative Deep Generative Models
edited by Locatello, F., Didelez
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2024: 868-912
View details for Web of Science ID 001221064100032
-
Base-editing mutagenesis maps alleles to tune human T cell functions.
Nature
2023
Abstract
CRISPR-enabled screening is a powerful tool for the discovery of genes that control T cell function and has nominated candidate targets for immunotherapies [1-6]. However, new approaches are required to probe specific nucleotide sequences within key genes. Systematic mutagenesis in primary human T cells could reveal alleles that tune specific phenotypes. DNA base editors are powerful tools for introducing targeted mutations with high efficiency [7,8]. Here we develop a large-scale base-editing mutagenesis platform with the goal of pinpointing nucleotides that encode amino acid residues that tune primary human T cell activation responses. We generated a library of around 117,000 single guide RNA molecules targeting base editors to protein-coding sites across 385 genes implicated in T cell function and systematically identified protein domains and specific amino acid residues that regulate T cell activation and cytokine production. We found a broad spectrum of alleles with variants encoding critical residues in proteins including PIK3CD, VAV1, LCP2, PLCG1 and DGKZ, including both gain-of-function and loss-of-function mutations. We validated the functional effects of many alleles and further demonstrated that base-editing hits could positively and negatively tune T cell cytotoxic function. Finally, higher-resolution screening using a base editor with relaxed protospacer-adjacent motif requirements [9] (NG versus NGG) revealed specific structural domains and protein-protein interaction sites that can be targeted to tune T cell functions. Base-editing screens in primary immune cells thus provide biochemical insights with the potential to accelerate immunotherapy design.
View details for DOI 10.1038/s41586-023-06835-6
View details for PubMedID 38093011
View details for PubMedCentralID 6689405
-
Systematic differences in discovery of genetic effects on gene expression and complex traits.
Nature genetics
2023
Abstract
Most signals in genome-wide association studies (GWAS) of complex traits implicate noncoding genetic variants with putative gene regulatory effects. However, currently identified regulatory variants, notably expression quantitative trait loci (eQTLs), explain only a small fraction of GWAS signals. Here, we show that GWAS and cis-eQTL hits are systematically different: eQTLs cluster strongly near transcription start sites, whereas GWAS hits do not. Genes near GWAS hits are enriched in key functional annotations, are under strong selective constraint and have complex regulatory landscapes across different tissue/cell types, whereas genes near eQTLs are depleted of most functional annotations, show relaxed constraint, and have simpler regulatory landscapes. We describe a model to understand these observations, including how natural selection on complex traits hinders discovery of functionally relevant eQTLs. Our results imply that GWAS and eQTL studies are systematically biased toward different types of variant, and support the use of complementary functional approaches alongside the next generation of eQTL studies.
View details for DOI 10.1038/s41588-023-01529-1
View details for PubMedID 37857933
View details for PubMedCentralID 7405896
-
The functional impact of rare variation across the regulatory cascade.
Cell genomics
2023; 3 (10): 100401
Abstract
Each human genome has tens of thousands of rare genetic variants; however, identifying impactful rare variants remains a major challenge. We demonstrate how use of personal multi-omics can enable identification of impactful rare variants by using the Multi-Ethnic Study of Atherosclerosis, which included several hundred individuals, with whole-genome sequencing, transcriptomes, methylomes, and proteomes collected across two time points, 10 years apart. We evaluated each multi-omics phenotype's ability to separately and jointly inform functional rare variation. By combining expression and protein data, we observed rare stop variants 62 times and rare frameshift variants 216 times as frequently as controls, compared to 13-27 times as frequently for expression or protein effects alone. We extended a Bayesian hierarchical model, "Watershed," to prioritize specific rare variants underlying multi-omics signals across the regulatory cascade. With this approach, we identified rare variants that exhibited large effect sizes on multiple complex traits including height, schizophrenia, and Alzheimer's disease.
View details for DOI 10.1016/j.xgen.2023.100401
View details for PubMedID 37868038
View details for PubMedCentralID PMC10589633
-
Scaling the Discrete-time Wright-Fisher Model to Biobank-scale Datasets.
Genetics
2023
Abstract
The Discrete-Time Wright-Fisher (DTWF) model and its diffusion limit are central to population genetics. These models can describe the forward-in-time evolution of allele frequencies in a population resulting from genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large samples or in the presence of strong selection. Existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present a scalable algorithm that approximates the DTWF model with provably bounded error. Our approach relies on two key observations about the DTWF model. The first is that transition probabilities under the model are approximately sparse. The second is that transition distributions for similar starting allele frequencies are extremely close as distributions. Together, these observations enable approximate matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the tens of millions, paving the way for rigorous biobank-scale inference. Finally, we use our results to estimate the impact of larger samples on estimating selection coefficients for loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
View details for DOI 10.1093/genetics/iyad168
View details for PubMedID 37724741
-
Gene regulatory network inference from CRISPR perturbations in primary CD4+ T cells elucidates the genomic basis of immune disease.
bioRxiv : the preprint server for biology
2023
Abstract
The effects of genetic variation on complex traits act mainly through changes in gene regulation. Although many genetic variants have been linked to target genes in cis, the trans-regulatory cascade mediating their effects remains largely uncharacterized. Mapping trans-regulators based on natural genetic variation, including eQTL mapping, has been challenging due to small effects. Experimental perturbation approaches offer a complementary and powerful approach to mapping trans-regulators. We used CRISPR knockouts of 84 genes in primary CD4+ T cells to perturb an immune cell gene network, targeting both inborn error of immunity (IEI) disease transcription factors (TFs) and background TFs matched in constraint and expression level, but without a known immune disease association. We developed a novel Bayesian structure learning method called Linear Latent Causal Bayes (LLCB) to estimate the gene regulatory network from perturbation data and observed 211 directed edges among the genes which could not be detected in existing CD4+ trans-eQTL data. We used LLCB to characterize the differences between the IEI and background TFs, finding that the gene groups were highly interconnected, but that IEI TFs were much more likely to regulate immune cell specific pathways and immune GWAS genes. We further characterized nine coherent gene programs based on downstream effects of the TFs and linked these modules to regulation of GWAS genes, finding that canonical JAK-STAT family members are regulated by KMT2A, a global epigenetic regulator. These analyses reveal the trans-regulatory cascade from upstream epigenetic regulator to intermediate TFs to downstream effector cytokines and elucidate the logic linking immune GWAS genes to key signaling pathways.
View details for DOI 10.1101/2023.09.17.557749
View details for PubMedID 37745614
View details for PubMedCentralID PMC10516010
-
CRISPR screens decode cancer cell pathways that trigger γδ T cell detection.
Nature
2023
Abstract
γδ T cells are potent anticancer effectors with the potential to target tumours broadly, independent of patient-specific neoantigens or human leukocyte antigen background [1-5]. γδ T cells can sense conserved cell stress signals prevalent in transformed cells [2,3], although the mechanisms behind the targeting of stressed target cells remain poorly characterized. Vγ9Vδ2 T cells - the most abundant subset of human γδ T cells [4] - recognize a protein complex containing butyrophilin 2A1 (BTN2A1) and BTN3A1 (refs. 6-8), a widely expressed cell surface protein that is activated by phosphoantigens abundantly produced by tumour cells. Here we combined genome-wide CRISPR screens in target cancer cells to identify pathways that regulate γδ T cell killing and BTN3A cell surface expression. The screens showed previously unappreciated multilayered regulation of BTN3A abundance on the cell surface and triggering of γδ T cells through transcription, post-translational modifications and membrane trafficking. In addition, diverse genetic perturbations and inhibitors disrupting metabolic pathways in the cancer cells, particularly ATP-producing processes, were found to alter BTN3A levels. This induction of both BTN3A and BTN2A1 during metabolic crises is dependent on AMP-activated protein kinase (AMPK). Finally, small-molecule activation of AMPK in a cell line model and in patient-derived tumour organoids led to increased expression of the BTN2A1-BTN3A complex and increased Vγ9Vδ2 T cell receptor-mediated killing. This AMPK-dependent mechanism of metabolic stress-induced ligand upregulation deepens our understanding of γδ T cell stress surveillance and suggests new avenues available to enhance γδ T cell anticancer activity.
View details for DOI 10.1038/s41586-023-06482-x
View details for PubMedID 37648854
View details for PubMedCentralID PMC7614706
-
A genetic history of continuity and mobility in the Iron Age central Mediterranean.
Nature ecology & evolution
2023
Abstract
The Iron Age was a dynamic period in central Mediterranean history, with the expansion of Greek and Phoenician colonies and the growth of Carthage into the dominant maritime power of the Mediterranean. These events were facilitated by the ease of long-distance travel following major advances in seafaring. We know from the archaeological record that trade goods and materials were moving across great distances in unprecedented quantities, but it is unclear how these patterns correlate with human mobility. Here, to investigate population mobility and interactions directly, we sequenced the genomes of 30 ancient individuals from coastal cities around the central Mediterranean, in Tunisia, Sardinia and central Italy. We observe a meaningful contribution of autochthonous populations, as well as highly heterogeneous ancestry including many individuals with non-local ancestries from other parts of the Mediterranean region. These results highlight both the role of local populations and the extreme interconnectedness of populations in the Iron Age Mediterranean. By studying these trans-Mediterranean neighbours together, we explore the complex interplay between local continuity and mobility that shaped the Iron Age societies of the central Mediterranean.
View details for DOI 10.1038/s41559-023-02143-4
View details for PubMedID 37592021
-
A genome-wide genetic screen uncovers determinants of human pigmentation.
Science (New York, N.Y.)
2023; 381 (6658): eade6289
Abstract
Skin color, one of the most diverse human traits, is determined by the quantity, type, and distribution of melanin. In this study, we leveraged the light-scattering properties of melanin to conduct a genome-wide screen for regulators of melanogenesis. We identified 169 functionally diverse genes that converge on melanosome biogenesis, endosomal transport, and gene regulation, of which 135 represented previously unknown associations with pigmentation. In agreement with their melanin-promoting function, the majority of screen hits were up-regulated in melanocytes from darkly pigmented individuals. We further unraveled functions of KLF6 as a transcription factor that regulates melanosome maturation and pigmentation in vivo, and of the endosomal trafficking protein COMMD3 in modulating melanosomal pH. Our study reveals a plethora of melanin-promoting genes, with broad implications for human variation, cell biology, and medicine.
View details for DOI 10.1126/science.ade6289
View details for PubMedID 37561850
-
On the number of genealogical ancestors tracing to the source groups of an admixed population.
Genetics
2023; 224 (3)
Abstract
Members of genetically admixed populations possess ancestry from multiple source groups, and studies of human genetic admixture frequently estimate ancestry components corresponding to fractions of individual genomes that trace to specific ancestral populations. However, the same numerical ancestry fraction can represent a wide array of admixture scenarios within an individual's genealogy. Using a mechanistic model of admixture, we consider admixture genealogically: how many ancestors from the source populations does the admixture represent? We consider African-Americans, for whom continent-level estimates produce a 75-85% value for African ancestry on average and 15-25% for European ancestry. Genetic studies together with key features of African-American demographic history suggest ranges for parameters of a simple three-epoch model. Considering parameter sets compatible with estimates of current ancestry levels, we infer that if all genealogical lines of a random African-American born during 1960-1965 are traced back until they reach members of source populations, the mean over parameter sets of the expected number of genealogical lines terminating with African individuals is 314 (interquartile range 240-376), and the mean of the expected number terminating in Europeans is 51 (interquartile range 32-69). Across discrete generations, the peak number of African genealogical ancestors occurs in birth cohorts from the early 1700s, and the probability exceeds 50% that at least one European ancestor was born more recently than 1835. Our genealogical perspective can contribute to further understanding the admixture processes that underlie admixed populations. For African-Americans, the results provide insight both on how many of the ancestors of a typical African-American might have been forcibly displaced in the Transatlantic Slave Trade and on how many separate European admixture events might exist in a typical African-American genealogy.
View details for DOI 10.1093/genetics/iyad079
View details for PubMedID 37410594
-
Bayesian estimation of gene constraint from an evolutionary model with gene features.
Research square
2023
Abstract
Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely-used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.
View details for DOI 10.21203/rs.3.rs-3012879/v1
View details for PubMedID 37398424
View details for PubMedCentralID PMC10312940
-
Subfunctionalized expression drives evolutionary retention of ribosomal protein paralogs Rps27 and Rps27l in vertebrates.
eLife
2023; 12
Abstract
The formation of paralogs through gene duplication is a core evolutionary process. For paralogs that encode components of protein complexes such as the ribosome, a central question is whether they encode functionally distinct proteins, or whether they exist to maintain appropriate total expression of equivalent proteins. Here, we systematically tested evolutionary models of paralog function using the ribosomal protein paralogs Rps27 (eS27) and Rps27l (eS27L) as a case study. Evolutionary analysis suggests that Rps27 and Rps27l likely arose during whole-genome duplication(s) in a common vertebrate ancestor. We show that Rps27 and Rps27l have inversely correlated mRNA abundance across mouse cell types, with the highest Rps27 in lymphocytes and the highest Rps27l in mammary alveolar cells and hepatocytes. By endogenously tagging the Rps27 and Rps27l proteins, we demonstrate that Rps27- and Rps27l-ribosomes associate preferentially with different transcripts. Furthermore, murine Rps27 and Rps27l loss-of-function alleles are homozygous lethal at different developmental stages. However, strikingly, expressing Rps27 protein from the endogenous Rps27l locus or vice versa completely rescues loss-of-function lethality and yields mice with no detectable deficits. Together, these findings suggest that Rps27 and Rps27l are evolutionarily retained because their subfunctionalized expression patterns render both genes necessary to achieve the requisite total expression of two equivalent proteins across cell types. Our work represents the most in-depth characterization of a mammalian ribosomal protein paralog to date and highlights the importance of considering both protein function and expression when investigating paralogs.
View details for DOI 10.7554/eLife.78695
View details for PubMedID 37306301
-
Scaling the Discrete-time Wright Fisher model to biobank-scale datasets.
bioRxiv : the preprint server for biology
2023
Abstract
The Discrete-Time Wright Fisher (DTWF) model and its large population diffusion limit are central to population genetics. These models describe the forward-in-time evolution of the frequency of an allele in a population and can include the fundamental forces of genetic drift, mutation, and selection. Computing likelihoods under the diffusion process is feasible, but the diffusion approximation breaks down for large sample sizes or in the presence of strong selection. Unfortunately, existing methods for computing likelihoods under the DTWF model do not scale to current exome sequencing sample sizes in the hundreds of thousands. Here we present an algorithm that approximates the DTWF model with provably bounded error and runs in time linear in the size of the population. Our approach relies on two key observations about Binomial distributions. The first is that Binomial distributions are approximately sparse. The second is that Binomial distributions with similar success probabilities are extremely close as distributions, allowing us to approximate the DTWF Markov transition matrix as a very low rank matrix. Together, these observations enable matrix-vector multiplication in linear (as opposed to the usual quadratic) time. We prove similar properties for Hypergeometric distributions, enabling fast computation of likelihoods for subsamples of the population. We show theoretically and in practice that this approximation is highly accurate and can scale to population sizes in the billions, paving the way for rigorous biobank-scale population genetic inference. Finally, we use our results to estimate how increasing sample sizes will improve the estimation of selection coefficients acting on loss-of-function variants. We find that increasing sample sizes beyond existing large exome sequencing cohorts will provide essentially no additional information except for genes with the most extreme fitness effects.
View details for DOI 10.1101/2023.05.19.541517
View details for PubMedID 37293115
View details for PubMedCentralID PMC10245735
-
Bayesian estimation of gene constraint from an evolutionary model with gene features.
bioRxiv : the preprint server for biology
2023
Abstract
Measures of selective constraint on genes have been used for many applications including clinical interpretation of rare coding variants, disease gene discovery, and studies of genome evolution. However, widely-used metrics are severely underpowered at detecting constraint for the shortest ~25% of genes, potentially causing important pathogenic mutations to be overlooked. We developed a framework combining a population genetics model with machine learning on gene features to enable accurate inference of an interpretable constraint metric, s_het. Our estimates outperform existing metrics for prioritizing genes important for cell essentiality, human disease, and other phenotypes, especially for short genes. Our new estimates of selective constraint should have wide utility for characterizing genes relevant to human disease. Finally, our inference framework, GeneBayes, provides a flexible platform that can improve estimation of many gene-level properties, such as rare variant burden or gene expression differences.
View details for DOI 10.1101/2023.05.19.541520
View details for PubMedID 37292653
View details for PubMedCentralID PMC10245655
-
Narcolepsy risk loci outline role of T cell autoimmunity and infectious triggers in narcolepsy.
Nature communications
2023; 14 (1): 2709
Abstract
Narcolepsy type 1 (NT1) is caused by a loss of hypocretin/orexin transmission. Risk factors include pandemic 2009 H1N1 influenza A infection and immunization with Pandemrix®. Here, we dissect disease mechanisms and interactions with environmental triggers in a multi-ethnic sample of 6,073 cases and 84,856 controls. We fine-mapped GWAS signals within HLA (DQ0602, DQB1*03:01 and DPB1*04:02) and discovered seven novel associations (CD207, NAB1, IKZF4-ERBB3, CTSC, DENND1B, SIRPG, PRF1). Significant signals at TRA and DQB1*06:02 loci were found in 245 vaccination-related cases, who also shared polygenic risk. T cell receptor associations in NT1 modulated TRAJ*24, TRAJ*28 and TRBV*4-2 chain-usage. Partitioned heritability and immune cell enrichment analyses found genetic signals to be driven by dendritic and helper T cells. Lastly, comorbidity analysis using data from FinnGen suggests shared effects between NT1 and other autoimmune diseases. NT1 genetic variants shape autoimmunity and response to environmental triggers, including influenza A infection and immunization with Pandemrix®.
View details for DOI 10.1038/s41467-023-36120-z
View details for PubMedID 37188663
View details for PubMedCentralID PMC10185546
-
A novel quantitative trait locus implicates Msh3 in the propensity for genome-wide short tandem repeat expansions in mice.
Genome research
2023
Abstract
Short tandem repeats (STRs) are a class of rapidly mutating genetic elements typically characterized by repeated units of 1-6 bp. We leveraged whole genome sequencing data for 152 recombinant inbred (RI) strains from the BXD family of mice to map loci that modulate genome-wide patterns of new mutations arising during parent-to-offspring transmission at STRs. We defined quantitative phenotypes describing the numbers and types of germline STR mutations in each strain and performed quantitative trait locus (QTL) analyses for each of these phenotypes. We identified a locus on Chromosome 13 at which strains inheriting the C57BL/6J (B) haplotype have a higher rate of STR expansions than those inheriting the DBA/2J (D) haplotype. The strongest candidate gene in this locus is Msh3, a known modifier of STR stability in cancer and at pathogenic repeat expansions in mice and humans, and a current drug target against Huntington's disease. The D haplotype at this locus harbors a cluster of variants near the 5' end of Msh3 including multiple missense variants near the DNA mismatch recognition domain. In contrast, the B haplotype contains a unique retrotransposon insertion. The rate of expansion covaries positively with Msh3 expression, with higher expression from the B haplotype. Finally, detailed analysis of mutation patterns showed that strains carrying the B allele have higher expansion rates, but slightly lower overall total mutation rates, compared to those with the D allele, particularly at tetranucleotide repeats. Our results suggest an important role for inherited variants in Msh3 in modulating genome-wide patterns of germline mutations at STRs.
View details for DOI 10.1101/gr.277576.122
View details for PubMedID 37127331
-
Precise modulation of transcription factor levels identifies features underlying dosage sensitivity.
Nature genetics
2023
Abstract
Transcriptional regulation exhibits extensive robustness, but human genetics indicates sensitivity to transcription factor (TF) dosage. Reconciling such observations requires quantitative studies of TF dosage effects at trait-relevant ranges, largely lacking so far. TFs play central roles in both normal-range and disease-associated variation in craniofacial morphology; we therefore developed an approach to precisely modulate TF levels in human facial progenitor cells and applied it to SOX9, a TF associated with craniofacial variation and disease (Pierre Robin sequence (PRS)). Most SOX9-dependent regulatory elements (REs) are buffered against small decreases in SOX9 dosage, but REs directly and primarily regulated by SOX9 show heightened sensitivity to SOX9 dosage; these RE responses partially predict gene expression responses. Sensitive REs and genes preferentially affect functional chondrogenesis and PRS-like craniofacial shape variation. We propose that such REs and genes underlie the sensitivity of specific phenotypes to TF dosage, while buffering of other genes leads to robust, nonlinear dosage-to-phenotype relationships.
View details for DOI 10.1038/s41588-023-01366-2
View details for PubMedID 37024583
-
A comprehensive rRNA variation atlas in health and disease.
bioRxiv : the preprint server for biology
2023
Abstract
The hundreds of copies of ribosomal-DNA genes are the dark-matter of the human genome as it is unknown whether they possess sequence variation that forms different types of ribosomes. Here, we have overcome the technical hurdle of long-read sequencing of full-length ribosomal-RNA (rRNA) and developed an efficient algorithm for rRNA-variant detection. We discovered hundreds of variants that are not silent but are incorporated into translating ribosomes. These include tens of abundant variants within functionally important domains of the ribosome. Strikingly, variants assemble into distinct ribosome subtypes encoded on different chromosomes. With this first atlas of expressed rRNA-variants, we discover the impact of rRNA variation on health and disease. Across human tissues, we observe tissue-specific variant expression in endoderm/ectoderm derived tissues. In cancer, low abundant rRNA-variants become highly expressed. Together, this study provides a curated atlas for exploring rRNA variation and functionally links ribosome variation to tissue-specific biology and cancer.
View details for DOI 10.1101/2023.01.30.526360
View details for PubMedID 36778251
View details for PubMedCentralID PMC9915487
-
Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling
edited by VanDerSchaar, M., Zhang, C., Janzing, D.
JMLR-JOURNAL MACHINE LEARNING RESEARCH. 2023: 662-691
View details for Web of Science ID 001222721800028
-
Integrative analysis of metabolite GWAS illuminates the molecular basis of pleiotropy and genetic correlation.
eLife
2022; 11
Abstract
Pleiotropy and genetic correlation are widespread features in GWAS, but they are often difficult to interpret at the molecular level. Here, we perform GWAS of 16 metabolites clustered at the intersection of amino acid catabolism, glycolysis, and ketone body metabolism in a subset of UK Biobank. We utilize the well-documented biochemistry jointly impacting these metabolites to analyze pleiotropic effects in the context of their pathways. Among the 213 lead GWAS hits, we find a strong enrichment for genes encoding pathway-relevant enzymes and transporters. We demonstrate that the effect directions of variants acting on biology between metabolite pairs often contrast with those of upstream or downstream variants as well as the polygenic background. Thus, we find that these outlier variants often reflect biology local to the traits. Finally, we explore the implications for interpreting disease GWAS, underscoring the potential of unifying biochemistry with dense metabolomics data to understand the molecular basis of pleiotropy in complex traits and diseases.
View details for DOI 10.7554/eLife.79348
View details for PubMedID 36073519
-
RNA editing underlies genetic risk of common inflammatory diseases.
Nature
2022
Abstract
A major challenge in human genetics is to identify the molecular mechanisms of trait-associated and disease-associated variants. To achieve this, quantitative trait locus (QTL) mapping of genetic variants with intermediate molecular phenotypes such as gene expression and splicing have been widely adopted1,2. However, despite successes, the molecular basis for a considerable fraction of trait-associated and disease-associated variants remains unclear3,4. Here we show that ADAR-mediated adenosine-to-inosine RNA editing, a post-transcriptional event vital for suppressing cellular double-stranded RNA (dsRNA)-mediated innate immune interferon responses5-11, is an important potential mechanism underlying genetic variants associated with common inflammatory diseases. We identified and characterized 30,319 cis-RNA editing QTLs (edQTLs) across 49 human tissues. These edQTLs were significantly enriched in genome-wide association study signals for autoimmune and immune-mediated diseases. Colocalization analysis of edQTLs with disease risk loci further pinpointed key, putatively immunogenic dsRNAs formed by expected inverted repeat Alu elements as well as unexpected, highly over-represented cis-natural antisense transcripts. Furthermore, inflammatory disease risk variants, in aggregate, were associated with reduced editing of nearby dsRNAs and induced interferon responses in inflammatory diseases. This unique directional effect agrees with the established mechanism that lack of RNA editing by ADAR1 leads to the specific activation of the dsRNA sensor MDA5 and subsequent interferon responses and inflammation7-9. Our findings implicate cellular dsRNA editing and sensing as a previously underappreciated mechanism of common inflammatory diseases.
View details for DOI 10.1038/s41586-022-05052-x
View details for PubMedID 35922514
-
Systematic discovery and perturbation of regulatory genes in human T cells reveals the architecture of immune networks.
Nature genetics
2022
Abstract
Gene regulatory networks ensure that important genes are expressed at precise levels. When gene expression is sufficiently perturbed, it can lead to disease. To understand how gene expression disruptions percolate through a network, we must first map connections between regulatory genes and their downstream targets. However, we lack comprehensive knowledge of the upstream regulators of most genes. Here, we developed an approach for systematic discovery of upstream regulators of critical immune factors (IL2RA, IL-2 and CTLA4) in primary human T cells. Then, we mapped the network of the target genes of these regulators and putative cis-regulatory elements using CRISPR perturbations, RNA-seq and ATAC-seq. These regulators form densely interconnected networks with extensive feedback loops. Furthermore, this network is enriched for immune-associated disease variants and genes. These results provide insight into how immune-associated disease genes are regulated in T cells and broader principles about the structure of human gene regulatory networks.
View details for DOI 10.1038/s41588-022-01106-y
View details for PubMedID 35817986
-
Genetic interactions drive heterogeneity in causal variant effect sizes for gene expression and complex traits.
American journal of human genetics
2022
Abstract
Despite the growing number of genome-wide association studies (GWASs), it remains unclear to what extent gene-by-gene and gene-by-environment interactions influence complex traits in humans. The magnitude of genetic interactions in complex traits has been difficult to quantify because GWASs are generally underpowered to detect individual interactions of small effect. Here, we develop a method to test for genetic interactions that aggregates information across all trait-associated loci. Specifically, we test whether SNPs in regions of European ancestry shared between European American and admixed African American individuals have the same causal effect sizes. We hypothesize that in African Americans, the presence of genetic interactions will drive the causal effect sizes of SNPs in regions of European ancestry to be more similar to those of SNPs in regions of African ancestry. We apply our method to two traits: gene expression in 296 African Americans and 482 European Americans in the Multi-Ethnic Study of Atherosclerosis (MESA) and low-density lipoprotein cholesterol (LDL-C) in 74K African Americans and 296K European Americans in the Million Veteran Program (MVP). We find significant evidence for genetic interactions in our analysis of gene expression; for LDL-C, we observe a similar point estimate, although this is not significant, most likely due to lower statistical power. These results suggest that gene-by-gene or gene-by-environment interactions modify the effect sizes of causal variants in human complex traits.
View details for DOI 10.1016/j.ajhg.2022.05.014
View details for PubMedID 35716666
-
A natural mutator allele shapes mutation spectrum variation in mice.
Nature
2022
Abstract
Although germline mutation rates and spectra can vary within and between species, common genetic modifiers of the mutation rate have not been identified in nature1. Here we searched for loci that influence germline mutagenesis using a uniquely powerful resource: a panel of recombinant inbred mouse lines known as the BXD, descended from the laboratory strains C57BL/6J (B haplotype) and DBA/2J (D haplotype). Each BXD lineage has been maintained by brother-sister mating in the near absence of natural selection, accumulating de novo mutations for up to 50 years on a known genetic background that is a unique linear mosaic of B and D haplotypes2. We show that mice inheriting D haplotypes at a quantitative trait locus on chromosome 4 accumulate C>A germline mutations at a 50% higher rate than those inheriting B haplotypes, primarily owing to the activity of a C>A-dominated mutational signature known as SBS18. The B and D quantitative trait locus haplotypes encode different alleles of Mutyh, a DNA repair gene that underlies the heritable cancer predisposition syndrome that causes colorectal tumors with a high SBS18 mutation load3,4. Both B and D Mutyh alleles are present in wild populations of Mus musculus domesticus, providing evidence that common genetic variation modulates germline mutagenesis in a model mammalian species.
View details for DOI 10.1038/s41586-022-04701-5
View details for PubMedID 35545679
-
Author Correction: Perspectives on ENCODE.
Nature
2022
View details for DOI 10.1038/s41586-021-04213-8
View details for PubMedID 35474002
-
Author Correction: Expanded encyclopaedias of DNA elements in the human and mouse genomes.
Nature
2022
View details for DOI 10.1038/s41586-021-04226-3
View details for PubMedID 35474001
-
Author Correction: Genetics of 35 blood and urine biomarkers in the UK Biobank.
Nature genetics
2021
View details for DOI 10.1038/s41588-021-00956-2
View details for PubMedID 34608296
-
Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression.
Nature genetics
2021
Abstract
Trait-associated genetic variants affect complex phenotypes primarily via regulatory mechanisms on the transcriptome. To investigate the genetics of gene expression, we performed cis- and trans-expression quantitative trait locus (eQTL) analyses using blood-derived expression from 31,684 individuals through the eQTLGen Consortium. We detected cis-eQTL for 88% of genes, and these were replicable in numerous tissues. Distal trans-eQTL (detected for 37% of 10,317 trait-associated variants tested) showed lower replication rates, partially due to low replication power and confounding by cell type composition. However, replication analyses in single-cell RNA-seq data prioritized intracellular trans-eQTL. Trans-eQTL exerted their effects via several mechanisms, primarily through regulation by transcription factors. Expression of 13% of the genes correlated with polygenic scores for 1,263 phenotypes, pinpointing potential drivers for those traits. In summary, this work represents a large eQTL resource, and its results serve as a starting point for in-depth interpretation of complex phenotypes.
View details for DOI 10.1038/s41588-021-00913-z
View details for PubMedID 34475573
-
Shared heritability of human face and brain shape.
Nature genetics
2021
Abstract
Evidence from model organisms and clinical genetics suggests coordination between the developing brain and face, but the role of this link in common genetic variation remains unknown. We performed a multivariate genome-wide association study of cortical surface morphology in 19,644 individuals of European ancestry, identifying 472 genomic loci influencing brain shape, of which 76 are also linked to face shape. Shared loci include transcription factors involved in craniofacial development, as well as members of signaling pathways implicated in brain-face cross-talk. Brain shape heritability is equivalently enriched near regulatory regions active in either forebrain organoids or facial progenitors. However, we do not detect significant overlap between shared brain-face genome-wide association study signals and variants affecting behavioral-cognitive traits. These results suggest that early in embryogenesis, the face and brain mutually shape each other through both structural effects and paracrine signaling, but this interplay may not impact later brain development associated with cognitive function.
View details for DOI 10.1038/s41588-021-00827-w
View details for PubMedID 33821002
-
GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background.
eLife
2021; 10
Abstract
Genome-wide association studies (GWAS) have been used to study the genetic basis of a wide variety of complex diseases and other traits. We describe UK Biobank GWAS results for three molecular traits (urate, IGF-1, and testosterone) with better-understood biology than most other complex traits. We find that many of the most significant hits are readily interpretable. We observe huge enrichment of associations near genes involved in the relevant biosynthesis, transport, or signaling pathways. We show how GWAS data illuminate the biology of each trait, including differences in testosterone regulation between females and males. At the same time, even these molecular traits are highly polygenic, with many thousands of variants spread across the genome contributing to trait variance. In summary, for these three molecular traits we identify strong enrichment of signal in putative core gene sets, even while most of the SNP-based heritability is driven by a massively polygenic background.
View details for DOI 10.7554/eLife.58615
View details for PubMedID 33587031
-
A regulatory variant at 3q21.1 confers an increased pleiotropic risk for hyperglycemia and altered bone mineral density.
Cell metabolism
2021
Abstract
Skeletal and glycemic traits have shared etiology, but the underlying genetic factors remain largely unknown. To identify genetic loci that may have pleiotropic effects, we studied genome-wide association studies (GWASs) for bone mineral density and glycemic traits and identified a bivariate risk locus at 3q21. Using sequence and epigenetic modeling, we prioritized an adenylate cyclase 5 (ADCY5) intronic causal variant, rs56371916. This SNP changes the binding affinity of SREBP1 and leads to differential ADCY5 gene expression, altering the chromatin landscape from poised to repressed. These alterations result in bone- and type 2 diabetes-relevant cell-autonomous changes in lipid metabolism in osteoblasts and adipocytes. We validated our findings by directly manipulating the regulator SREBP1, the target gene ADCY5, and the variant rs56371916, which together imply a novel link between fatty acid oxidation and osteoblast differentiation. Our work, by systematic functional dissection of pleiotropic GWAS loci, represents a framework to uncover biological mechanisms affecting pleiotropic traits.
View details for DOI 10.1016/j.cmet.2021.01.001
View details for PubMedID 33513366
-
Genetics of 35 blood and urine biomarkers in the UK Biobank.
Nature genetics
2021
Abstract
Clinical laboratory tests are a critical component of the continuum of care. We evaluate the genetic basis of 35 blood and urine laboratory measurements in the UK Biobank (n=363,228 individuals). We identify 1,857 loci associated with at least one trait, containing 3,374 fine-mapped associations and additional sets of large-effect (>0.1s.d.) protein-altering, human leukocyte antigen (HLA) and copy number variant (CNV) associations. Through Mendelian randomization (MR) analysis, we discover 51 causal relationships, including previously known agonistic effects of urate on gout and cystatin C on stroke. Finally, we develop polygenic risk scores (PRSs) for each biomarker and build 'multi-PRS' models for diseases using 35 PRSs simultaneously, which improved chronic kidney disease, type 2 diabetes, gout and alcoholic cirrhosis genetic risk stratification in an independent dataset (FinnGen; n=135,500) relative to single-disease PRSs. Together, our results delineate the genetic basis of biomarkers and their causal influences on diseases and improve genetic risk stratification for common diseases.
View details for DOI 10.1038/s41588-020-00757-z
View details for PubMedID 33462484
-
Using CRISPR screens to map gene regulatory networks in primary human T cells
WILEY. 2021: 41
View details for Web of Science ID 000605453000065
-
Perspectives on ENCODE.
Nature
2020; 583 (7818): 693–98
Abstract
The Encyclopedia of DNA Elements (ENCODE) Project launched in 2003 with the long-term goal of developing a comprehensive map of functional elements in the human genome. These included genes, biochemical regions associated with gene regulation (for example, transcription factor binding sites, open chromatin, and histone marks) and transcript isoforms. The marks serve as sites for candidate cis-regulatory elements (cCREs) that may serve functional roles in regulating gene expression. The project has been extended to model organisms, particularly the mouse. In the third phase of ENCODE, nearly a million and more than 300,000 cCRE annotations have been generated for human and mouse, respectively, and these have provided a valuable resource for the scientific community.
View details for DOI 10.1038/s41586-020-2449-8
View details for PubMedID 32728248
-
The Genetics of Malaria Resistance in Ancient Rome
WILEY. 2020: 191–92
View details for Web of Science ID 000513288902138
-
Variable prediction accuracy of polygenic scores within an ancestry group.
eLife
2020; 9
Abstract
Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using the UK Biobank data, we demonstrate that even within a single ancestry group (i.e., when there are negligible differences in linkage disequilibrium or in causal allele frequencies), the prediction accuracy of polygenic scores can depend on characteristics such as the socio-economic status, age or sex of the individuals in which the GWAS and the prediction were conducted, as well as on the GWAS design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.
View details for DOI 10.7554/eLife.48376
View details for PubMedID 31999256
-
Evolutionary Persistence of DNA Methylation for Millions of Years after Ancient Loss of a De Novo Methyltransferase.
Cell
2020
Abstract
Cytosine methylation of DNA is a widespread modification of DNA that plays numerous critical roles. In the yeast Cryptococcus neoformans, CG methylation occurs in transposon-rich repeats and requires the DNA methyltransferase Dnmt5. We show that Dnmt5 displays exquisite maintenance-type specificity in vitro and in vivo and utilizes similar in vivo cofactors as the metazoan maintenance methylase Dnmt1. Remarkably, phylogenetic and functional analysis revealed that the ancestral species lost the gene for a de novo methylase, DnmtX, between 50-150 mya. We examined how methylation has persisted since the ancient loss of DnmtX. Experimental and comparative studies reveal efficient replication of methylation patterns in C. neoformans, rare stochastic methylation loss and gain events, and the action of natural selection. We propose that an epigenome has been propagated for >50 million years through a process analogous to Darwinian evolution of the genome.
View details for DOI 10.1016/j.cell.2019.12.012
View details for PubMedID 31955845
-
Expanded encyclopaedias of DNA elements in the human and mouse genomes.
Nature
2020; 583 (7818): 699–710
Abstract
The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE and Roadmap Epigenomics data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.
View details for DOI 10.1038/s41586-020-2493-4
View details for PubMedID 32728249
-
Chromatin accessibility dynamics in a model of human forebrain development.
Science (New York, N.Y.)
2020; 367 (6476)
Abstract
Forebrain development is characterized by highly synchronized cellular processes, which, if perturbed, can cause disease. To chart the regulatory activity underlying these events, we generated a map of accessible chromatin in human three-dimensional forebrain organoids. To capture corticogenesis, we sampled glial and neuronal lineages from dorsal or ventral forebrain organoids over 20 months in vitro. Active chromatin regions identified in human primary brain tissue were observed in organoids at different developmental stages. We used this resource to map genetic risk for disease and to explore evolutionary conservation. Moreover, we integrated chromatin accessibility with transcriptomics to identify putative enhancer-gene linkages and transcription factors that regulate human corticogenesis. Overall, this platform brings insights into gene-regulatory dynamics at previously inaccessible stages of human forebrain development, including signatures of neuropsychiatric disorders.
View details for DOI 10.1126/science.aay1645
View details for PubMedID 31974223
-
Landscape of stimulation-responsive chromatin across diverse human immune cells.
Nature genetics
2019
Abstract
A hallmark of the immune system is the interplay among specialized cell types transitioning between resting and stimulated states. The gene regulatory landscape of this dynamic system has not been fully characterized in human cells. Here we collected assay for transposase-accessible chromatin using sequencing (ATAC-seq) and RNA sequencing data under resting and stimulated conditions for up to 32 immune cell populations. Stimulation caused widespread chromatin remodeling, including response elements shared between stimulated B and T cells. Furthermore, several autoimmune traits showed significant heritability in stimulation-responsive elements from distinct cell types, highlighting the importance of these cell states in autoimmunity. Allele-specific read mapping identified variants that alter chromatin accessibility in particular conditions, allowing us to observe evidence of function for a candidate causal variant that is undetected by existing large-scale studies in resting cells. Our results provide a resource of chromatin dynamics and highlight the need to characterize the effects of genetic variation in stimulated cells.
View details for DOI 10.1038/s41588-019-0505-9
View details for PubMedID 31570894
-
Public Discussion Affects Question Asking at Academic Conferences.
American journal of human genetics
2019
Abstract
Women are under-represented in science, technology, engineering, and mathematics (STEM). Despite the recent emphasis on diversity in STEM, our understanding of what drives differences between women and men scientists remains limited. This, in turn, limits our ability to intervene to level the playing field. To quantify the representation and participation of women and men at academic meetings in human genetics, we developed high-throughput and crowd-sourced approaches focused on question-asking behavior. Question asking is one voluntary and self-initiated scientific activity we can measure. Here we report that women ask fewer questions than expected regardless of their representation in talk audiences. We present evidence that external barriers affect the representation of women in STEM. However, differences in question-asking behavior suggest that internal factors also impact women's participation. We then examine the effects of specific interventions and show that wide public discussion of the relative under-participation of women in question-and-answer sessions alters question-asking behavior. We suggest that engaging the community in such projects promotes visibility of diversity issues at academic meetings and allows for efficient data collection that can be used to further explore and understand differences in conference participation.
View details for DOI 10.1016/j.ajhg.2019.06.004
View details for PubMedID 31256875
-
Trans Effects on Gene Expression Can Drive Omnigenic Inheritance.
Cell
2019; 177 (4): 1022
Abstract
Early genome-wide association studies (GWASs) led to the surprising discovery that, for typical complex traits, most of the heritability is due to huge numbers of common variants with tiny effect sizes. Previously, we argued that new models are needed to understand these patterns. Here, we provide a formal model in which genetic contributions to complex traits are partitioned into direct effects from core genes and indirect effects from peripheral genes acting in trans. We propose that most heritability is driven by weak trans-eQTL SNPs, whose effects are mediated through peripheral genes to impact the expression of core genes. In particular, if the core genes for a trait tend to be co-regulated, then the effects of peripheral variation can be amplified such that nearly all of the genetic variance is driven by weak trans effects. Thus, our model proposes a framework for understanding key features of the architecture of complex traits.
View details for PubMedID 31051098
-
Reduced signal for polygenic adaptation of height in UK Biobank
ELIFE
2019; 8
View details for DOI 10.7554/eLife.39725
View details for Web of Science ID 000461987200001
-
Interpreting polygenic scores, polygenic adaptation, and human phenotypic differences.
Evolution, medicine, and public health
2019; 2019 (1): 26–34
Abstract
Recent analyses of polygenic scores have opened new discussions concerning the genetic basis and evolutionary significance of differences among populations in distributions of phenotypes. Here, we highlight limitations in research on polygenic scores, polygenic adaptation and population differences. We show how genetic contributions to traits, as estimated by polygenic scores, combine with environmental contributions so that differences among populations in trait distributions need not reflect corresponding differences in genetic propensity. Under a null model in which phenotypes are selectively neutral, genetic propensity differences contributing to phenotypic differences among populations are predicted to be small. We illustrate this null hypothesis in relation to health disparities between African Americans and European Americans, discussing alternative hypotheses with selective and environmental effects. Close attention to the limitations of research on polygenic phenomena is important for the interpretation of their relationship to human population differences.
View details for PubMedID 30838127
-
Interpreting polygenic scores, polygenic adaptation, and human phenotypic differences
EVOLUTION MEDICINE AND PUBLIC HEALTH
2019: 26–34
View details for DOI 10.1093/emph/eoy036
View details for Web of Science ID 000461138300009
-
Ancient Rome: A genetic crossroads of Europe and the Mediterranean.
Science (New York, N.Y.)
2019; 366 (6466): 708–14
Abstract
Ancient Rome was the capital of an empire of ~70 million inhabitants, but little is known about the genetics of ancient Romans. Here we present 127 genomes from 29 archaeological sites in and around Rome, spanning the past 12,000 years. We observe two major prehistoric ancestry transitions: one with the introduction of farming and another prior to the Iron Age. By the founding of Rome, the genetic composition of the region approximated that of modern Mediterranean populations. During the Imperial period, Rome's population received net immigration from the Near East, followed by an increase in genetic contributions from Europe. These ancestry shifts mirrored the geopolitical affiliations of Rome and were accompanied by marked interindividual diversity, reflecting gene flow from across the Mediterranean, Europe, and North Africa.
View details for DOI 10.1126/science.aay6826
View details for PubMedID 31699931
-
High-resolution mapping of cancer cell networks using co-functional interactions.
Molecular systems biology
2018; 14 (12): e8594
Abstract
Powerful new technologies for perturbing genetic elements have recently expanded the study of genetic interactions in model systems ranging from yeast to human cell lines. However, technical artifacts can confound signal across genetic screens and limit the immense potential of parallel screening approaches. To address this problem, we devised a novel PCA-based method for correcting genome-wide screening data, bolstering the sensitivity and specificity of detection for genetic interactions. Applying this strategy to a set of 436 whole genome CRISPR screens, we report more than 1.5 million pairs of correlated "co-functional" genes that provide finer-scale information about cell compartments, biological pathways, and protein complexes than traditional gene sets. Lastly, we employed a gene community detection approach to implicate core genes for cancer growth and compress signal from functionally related genes in the same community into a single score. This work establishes new algorithms for probing cancer cell networks and motivates the acquisition of further CRISPR screen data across diverse genotypes and cell types to further resolve complex cellular processes.
View details for PubMedID 30573688
-
Evidence for Weak Selective Constraint on Human Gene Expression.
Genetics
2018
Abstract
Gene expression variation is a major contributor to phenotypic variation in human complex traits. Selection on complex traits may therefore be reflected in constraint on gene expression. Here, we explore the effects of stabilizing selection on cis-regulatory genetic variation in humans. We analyze patterns of expression variation at copy number variants and find evidence for selection against large increases in gene expression. Using allele-specific expression (ASE) data, we further show evidence of selection against smaller-effect variants. We estimate that, across all genes, singletons in a sample of 122 individuals have approximately 2.2× greater effects on expression variation than the average variant across allele frequencies. Despite their increased effect size relative to common variants, we estimate that singletons in the sample studied explain, on average, only 5% of the heritability of gene expression from cis-regulatory variants. Finally, we show that genes depleted for loss-of-function variants are also depleted for cis-eQTLs and have low levels of allelic imbalance, confirming tighter constraint on the expression levels of these genes. We conclude that constraint on gene expression is present, but has relatively weak effects on most cis-regulatory variants, thus permitting high levels of gene-regulatory genetic variation.
View details for PubMedID 30554168
-
Functional Genetic Variants Revealed by Massively Parallel Precise Genome Editing.
Cell
2018
Abstract
A major challenge in genetics is to identify genetic variants driving natural phenotypic variation. However, current methods of genetic mapping have limited resolution. To address this challenge, we developed a CRISPR-Cas9-based high-throughput genome editing approach that can introduce thousands of specific genetic variants in a single experiment. This enabled us to study the fitness consequences of 16,006 natural genetic variants in yeast. We identified 572 variants with significant fitness differences in glucose media; these are highly enriched in promoters, particularly in transcription factor binding sites, while only 19.2% affect amino acid sequences. Strikingly, nearby variants nearly always favor the same parent's alleles, suggesting that lineage-specific selection is often driven by multiple clustered variants. In sum, our genome editing approach reveals the genetic architecture of fitness variation at single-base resolution and could be adapted to measure the effects of genome-wide genetic variation in any screen for cell survival or cell-sortable markers.
View details for PubMedID 30245013
-
Post-translational buffering leads to convergent protein expression levels between primates
GENOME BIOLOGY
2018; 19: 83
Abstract
Differences in gene regulation between human and closely related species influence phenotypes that are distinctly human. While gene regulation is a multi-step process, the majority of research concerning divergence in gene regulation among primates has focused on transcription. To gain a comprehensive view of gene regulation, we surveyed genome-wide ribosome occupancy, which reflects levels of protein translation, in lymphoblastoid cell lines derived from human, chimpanzee, and rhesus macaque. We further integrated messenger RNA and protein level measurements collected from matching cell lines. We find that, in addition to transcriptional regulation, the major factor determining protein level divergence between human and closely related species is post-translational buffering. Inter-species divergence in transcription is generally propagated to the level of protein translation. In contrast, gene expression divergence is often attenuated post-translationally, potentially mediated through post-translational modifications. Results from our analysis indicate that post-translational buffering is a conserved mechanism that led to relaxation of selective constraint on transcript levels in humans.
View details for PubMedID 29950183
-
Determining the genetic basis of anthracycline-cardiotoxicity by response QTL mapping in induced cardiomyocytes.
eLife
2018; 7
Abstract
Anthracycline-induced cardiotoxicity (ACT) is a key limiting factor in setting optimal chemotherapy regimes, with almost half of patients expected to develop congestive heart failure given high doses. However, the genetic basis of sensitivity to anthracyclines remains unclear. We created a panel of iPSC-derived cardiomyocytes from 45 individuals and performed RNA-seq after 24 h exposure to varying doxorubicin dosages. The transcriptomic response is substantial: the majority of genes are differentially expressed and over 6000 genes show evidence of differential splicing, the latter driven by reduced splicing fidelity in the presence of doxorubicin. We show that inter-individual variation in transcriptional response is predictive of in vitro cell damage, which in turn is associated with in vivo ACT risk. We detect 447 response-expression QTLs and 42 response-splicing QTLs, which are enriched in lower ACT GWAS p-values, supporting the in vivo relevance of our map of genetic regulation of cellular response to anthracyclines.
View details for PubMedID 29737278
-
Remodeling the Specificity of an Endosomal CORVET Tether Underlies Formation of Regulated Secretory Vesicles in the Ciliate Tetrahymena thermophila
CURRENT BIOLOGY
2018; 28 (5): 697-+
Abstract
In the endocytic pathway of animals, two related complexes, called CORVET (class C core vacuole/endosome transport) and HOPS (homotypic fusion and protein sorting), act as both tethers and fusion factors for early and late endosomes, respectively. Mutations in CORVET or HOPS lead to trafficking defects and contribute to human disease, including immune dysfunction. HOPS and CORVET are conserved throughout eukaryotes, but remarkably, in the ciliate Tetrahymena thermophila, the HOPS-specific subunits are absent, while CORVET-specific subunits have proliferated. VPS8 (vacuolar protein sorting), a CORVET subunit, expanded to 6 paralogs in Tetrahymena. This expansion correlated with loss of HOPS within a ciliate subgroup, including the Oligohymenophorea, which contains Tetrahymena. As uncovered via forward genetics, a single VPS8 paralog in Tetrahymena (VPS8A) is required to synthesize prominent secretory granules called mucocysts. More specifically, Δvps8a cells fail to deliver a subset of cargo proteins to developing mucocysts, instead accumulating that cargo in vesicles also bearing the mucocyst-sorting receptor Sor4p. Surprisingly, although this transport step relies on CORVET, it does not appear to involve early endosomes. Instead, Vps8a associates with the late endosomal/lysosomal marker Rab7, indicating that target specificity switching occurred in CORVET subunits during the evolution of ciliates. Mucocysts belong to a markedly diverse and understudied class of protist secretory organelles called extrusomes. Our results underscore that biogenesis of mucocysts depends on endolysosomal trafficking, revealing parallels with invasive organelles in apicomplexan parasites and suggesting that a wide array of secretory adaptations in protists, like in animals, depend on mechanisms related to lysosome biogenesis.
View details for PubMedID 29478853
View details for PubMedCentralID PMC5840023
-
Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment.
Cell stem cell
2018; 22 (4): 600–607.e4
Abstract
Aging is linked to functional deterioration and hematological diseases. The hematopoietic system is maintained by hematopoietic stem cells (HSCs), and dysfunction within the HSC compartment is thought to be a key mechanism underlying age-related hematopoietic perturbations. Using single-cell transplantation assays with five blood-lineage analysis, we previously identified myeloid-restricted repopulating progenitors (MyRPs) within the phenotypic HSC compartment in young mice. Here, we determined the age-related functional changes to the HSC compartment using over 400 single-cell transplantation assays. Notably, MyRP frequency increased dramatically with age, while multipotent HSCs expanded modestly within the bone marrow. We also identified a subset of functional cells that were myeloid restricted in primary recipients but displayed multipotent (five blood-lineage) output in secondary recipients. We have termed this cell type latent-HSCs, which appear exclusive to the aged HSC compartment. These results question the traditional dogma of HSC aging and our current approaches to assay and define HSCs.
View details for PubMedID 29625072
-
Annotation-free quantification of RNA splicing using LeafCutter
NATURE GENETICS
2018; 50 (1): 151-+
Abstract
The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both to detect differential splicing between sample groups and to map splicing quantitative trait loci (sQTLs). Compared with contemporary methods, our approach identified 1.4-2.1 times more sQTLs, many of which helped us ascribe molecular effects to disease-associated variants. Transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at a 5% false discovery rate by an average of 2.1-fold compared with that detected through the use of gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available online.
View details for PubMedID 29229983
View details for PubMedCentralID PMC5742080
-
Impact of regulatory variation across human iPSCs and differentiated cells
GENOME RESEARCH
2018; 28 (1): 122–31
Abstract
Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type-specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type-specific chromatin accessibility.
View details for PubMedID 29208628
-
Frequent nonallelic gene conversion on the human lineage and its effect on the divergence of gene duplicates
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2017; 114 (48): 12779–84
Abstract
Gene conversion is the copying of a genetic sequence from a "donor" region to an "acceptor." In nonallelic gene conversion (NAGC), the donor and the acceptor are at distinct genetic loci. Despite the role NAGC plays in various genetic diseases and the concerted evolution of gene families, the parameters that govern NAGC are not well characterized. Here, we survey duplicate gene families and identify converted tracts in 46% of them. These conversions reflect a large GC bias of NAGC. We develop a sequence evolution model that leverages substantially more information in duplicate sequences than used by previous methods and use it to estimate the parameters that govern NAGC in humans: a mean converted tract length of 250 bp and a probability of [Formula: see text] per generation for a nucleotide to be converted (an order of magnitude higher than the point mutation rate). Despite this high baseline rate, we show that NAGC slows down as duplicate sequences diverge, until an eventual "escape" of the sequences from its influence. As a result, NAGC has a small average effect on the sequence divergence of duplicates. This work improves our understanding of the NAGC mechanism and the role that it plays in the evolution of gene duplicates.
View details for PubMedID 29138319
-
Inferring Relevant Cell Types for Complex Traits by Using Single-Cell Gene Expression
AMERICAN JOURNAL OF HUMAN GENETICS
2017; 101 (5): 686–99
Abstract
Previous studies have prioritized trait-relevant cell types by looking for an enrichment of genome-wide association study (GWAS) signal within functional regions. However, these studies are limited in cell resolution by the lack of functional annotations from difficult-to-characterize or rare cell populations. Measurement of single-cell gene expression has become a popular method for characterizing novel cell types, and yet limited work has linked single-cell RNA sequencing (RNA-seq) to phenotypes of interest. To address this deficiency, we present RolyPoly, a regression-based polygenic model that can prioritize trait-relevant cell types and genes from GWAS summary statistics and gene expression data. RolyPoly is designed to use expression data from either bulk tissue or single-cell RNA-seq. In this study, we demonstrated RolyPoly's accuracy through simulation and validated previously known tissue-trait associations. We discovered a significant association between microglia and late-onset Alzheimer disease and an association between schizophrenia and oligodendrocytes and replicating fetal cortical cells. Additionally, RolyPoly computes a trait-relevance score for each gene to reflect the importance of expression specific to a cell type. We found that differentially expressed genes in the prefrontal cortex of individuals with Alzheimer disease were significantly enriched with genes ranked highly by RolyPoly gene scores. Overall, our method represents a powerful framework for understanding the effect of common variants on cell types contributing to complex traits.
View details for PubMedID 29106824
View details for PubMedCentralID PMC5673624
-
Quantification of transplant-derived circulating cell-free DNA in absence of a donor genotype
PLOS COMPUTATIONAL BIOLOGY
2017; 13 (8): e1005629
Abstract
Quantification of cell-free DNA (cfDNA) in circulating blood derived from a transplanted organ is a powerful approach to monitoring post-transplant injury. Genome transplant dynamics (GTD) quantifies donor-derived cfDNA (dd-cfDNA) by taking advantage of single-nucleotide polymorphisms (SNPs) distributed across the genome to discriminate donor and recipient DNA molecules. In its current implementation, GTD requires genotyping of both the transplant recipient and donor. However, in practice, donor genotype information is often unavailable. Here, we address this issue by developing an algorithm that estimates dd-cfDNA levels in the absence of a donor genotype. Our algorithm predicts heart and lung allograft rejection with an accuracy that is similar to conventional GTD. We furthermore refined the algorithm to handle closely related recipients and donors, a scenario that is common in bone marrow and kidney transplantation. We show that it is possible to estimate dd-cfDNA in bone marrow transplant patients that are unrelated or that are siblings of the donors, using a hidden Markov model (HMM) of identity-by-descent (IBD) states along the genome. Last, we demonstrate that comparing dd-cfDNA to the proportion of donor DNA in white blood cells can differentiate between relapse and the onset of graft-versus-host disease (GVHD). These methods alleviate some of the barriers to the implementation of GTD, which will further widen its clinical application.
View details for PubMedID 28771616
-
An Expanded View of Complex Traits: From Polygenic to Omnigenic
CELL
2017; 169 (7): 1177–86
Abstract
A central goal of genetics is to understand the links between genetic variation and disease. Intuitively, one might expect disease-causing variants to cluster into key pathways that drive disease etiology. But for complex traits, association signals tend to be spread across most of the genome, including near many genes without an obvious connection to disease. We propose that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis as an "omnigenic" model.
View details for PubMedID 28622505
View details for PubMedCentralID PMC5536862
-
Rapid evolution of the human mutation spectrum
ELIFE
2017; 6
Abstract
DNA is a remarkably precise medium for copying and storing biological information. This high fidelity results from the action of hundreds of genes involved in replication, proofreading, and damage repair. Evolutionary theory suggests that in such a system, selection has limited ability to remove genetic variants that change mutation rates by small amounts or in specific sequence contexts. Consistent with this, using SNV variation as a proxy for mutational input, we report here that mutational spectra differ substantially among species, human continental groups and even some closely related populations. Close examination of one signal, an increased TCC→TTC mutation rate in Europeans, indicates a burst of mutations from about 15,000 to 2000 years ago, perhaps due to the appearance, drift, and ultimate elimination of a genetic modifier of mutation rate. Our results suggest that mutation rates can evolve markedly over short evolutionary timescales and suggest the possibility of mapping mutational modifiers.
View details for DOI 10.7554/eLife.24284
View details for Web of Science ID 000401409000001
View details for PubMedID 28440220
-
Tracing the peopling of the world through genomics.
Nature
2017; 541 (7637): 302-310
Abstract
Advances in the sequencing and the analysis of the genomes of both modern and ancient peoples have facilitated a number of breakthroughs in our understanding of human evolutionary history. These include the discovery of interbreeding between anatomically modern humans and extinct hominins; the development of an increasingly detailed description of the complex dispersal of modern humans out of Africa and their population expansion worldwide; and the characterization of many of the genetic adaptations of humans to local environmental conditions. Our interpretation of the evolutionary history and adaptation of humans is being transformed by analyses of these new genomic data.
View details for DOI 10.1038/nature21347
View details for PubMedID 28102248
-
Batch effects and the effective design of single-cell gene expression studies
SCIENTIFIC REPORTS
2017; 7
Abstract
Single-cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing using the single-cell Fluidigm C1 platform. To do so, we processed three C1 replicates from three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples, to account for amplification bias. We found that the major source of variation in the gene expression data was driven by genotype, but we also observed substantial variation between the technical replicates. We observed that the conversion of reads to molecules using the UMIs was impacted by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we suggest a framework for effective scRNA-seq studies.
View details for DOI 10.1038/srep39921
View details for Web of Science ID 000391022000001
View details for PubMedID 28045081
View details for PubMedCentralID PMC5206706
-
Mutation Rate Variation is a Primary Determinant of the Distribution of Allele Frequencies in Humans
PLOS GENETICS
2016; 12 (12)
Abstract
The site frequency spectrum (SFS) has long been used to study demographic history and natural selection. Here, we extend this summary by examining the SFS conditional on the alleles found at the same site in other species. We refer to this extension as the "phylogenetically-conditioned SFS" or cSFS. Using recent large-sample data from the Exome Aggregation Consortium (ExAC), combined with primate genome sequences, we find that human variants that occurred independently in closely related primate lineages are at higher frequencies in humans than variants with parallel substitutions in more distant primates. We show that this effect is largely due to sites with elevated mutation rates causing significant departures from the widely-used infinite sites mutation model. Our analysis also suggests substantial variation in mutation rates even among mutations involving the same nucleotide changes. In summary, we show that variable mutation rates are key determinants of the SFS in humans.
View details for DOI 10.1371/journal.pgen.1006489
View details for Web of Science ID 000392138700034
View details for PubMedID 27977673
View details for PubMedCentralID PMC5157949
-
A Bibliometric History of the Journal GENETICS
GENETICS
2016; 204 (4): 1337-1342
View details for DOI 10.1534/genetics.116.196964
View details for Web of Science ID 000390765500004
View details for PubMedID 27927899
View details for PubMedCentralID PMC5161266
-
Detection of human adaptation during the past 2000 years.
Science
2016
Abstract
Detection of recent natural selection is a challenging problem in population genetics. Here we introduce the singleton density score (SDS), a method to infer very recent changes in allele frequencies from contemporary genome sequences. Applied to data from the UK10K Project, SDS reflects allele frequency changes in the ancestors of modern Britons during the past ~2000 to 3000 years. We see strong signals of selection at lactase and the major histocompatibility complex, and in favor of blond hair and blue eyes. For polygenic adaptation, we find that recent selection for increased height has driven allele frequency shifts across most of the genome. Moreover, we identify shifts associated with other complex traits, suggesting that polygenic adaptation has played a pervasive role in shaping genotypic and phenotypic variation in modern humans.
View details for PubMedID 27738015
-
Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution.
Nature genetics
2016; 48 (10): 1193-1203
Abstract
We define the chromatin accessibility and transcriptional landscapes in 13 human primary blood cell types that span the hematopoietic hierarchy. Exploiting the finding that the enhancer landscape better reflects cell identity than mRNA levels, we enable 'enhancer cytometry' for enumeration of pure cell types from complex populations. We identify regulators governing hematopoietic differentiation and further show the lineage ontogeny of genetic elements linked to diverse human diseases. In acute myeloid leukemia (AML), chromatin accessibility uncovers unique regulatory evolution in cancer cells with a progressively increasing mutation burden. Single AML cells exhibit distinctive mixed regulome profiles corresponding to disparate developmental stages. A method to account for this regulatory heterogeneity identified cancer-specific deviations and implicated HOX factors as key regulators of preleukemic hematopoietic stem cell characteristics. Thus, regulome dynamics can provide diverse insights into hematopoietic development and disease.
View details for DOI 10.1038/ng.3646
View details for PubMedID 27526324
-
Genetic variation in MHC proteins is associated with T cell receptor expression biases.
Nature genetics
2016; 48 (9): 995-1002
Abstract
In each individual, a highly diverse T cell receptor (TCR) repertoire interacts with peptides presented by major histocompatibility complex (MHC) molecules. Despite extensive research, it remains controversial whether germline-encoded TCR-MHC contacts promote TCR-MHC specificity and, if so, whether differences exist in TCR V gene compatibilities with different MHC alleles. We applied expression quantitative trait locus (eQTL) mapping to test for associations between genetic variation and TCR V gene usage in a large human cohort. We report strong trans associations between variation in the MHC locus and TCR V gene usage. Fine-mapping of the association signals identifies specific amino acids from MHC genes that bias V gene usage, many of which contact or are spatially proximal to the TCR or peptide in the TCR-peptide-MHC complex. Hence, these MHC variants, several of which are linked to autoimmune diseases, can directly affect TCR-MHC interaction. These results provide the first examples of trans-QTL effects mediated by protein-protein interactions and are consistent with intrinsic TCR-MHC specificity.
View details for DOI 10.1038/ng.3625
View details for PubMedID 27479906
View details for PubMedCentralID PMC5010864
-
Genome-wide association study of behavioral, physiological and gene expression traits in outbred CFW mice.
Nature genetics
2016; 48 (8): 919-926
Abstract
Although mice are the most widely used mammalian model organism, genetic studies have suffered from limited mapping resolution due to extensive linkage disequilibrium (LD) that is characteristic of crosses among inbred strains. Carworth Farms White (CFW) mice are a commercially available outbred mouse population that exhibits rapid LD decay in comparison to other available mouse populations. We performed a genome-wide association study (GWAS) of behavioral, physiological and gene expression phenotypes using 1,200 male CFW mice. We used genotyping by sequencing (GBS) to obtain genotypes at 92,734 SNPs. We also measured gene expression using RNA sequencing in three brain regions. Our study identified numerous behavioral, physiological and expression quantitative trait loci (QTLs). We integrated the behavioral QTL and eQTL results to implicate specific genes, including Azi2 in sensitivity to methamphetamine and Zmynd11 in anxiety-like behavior. The combination of CFW mice, GBS and RNA sequencing constitutes a powerful approach to GWAS in mice.
View details for DOI 10.1038/ng.3609
View details for PubMedID 27376237
View details for PubMedCentralID PMC4963286
-
Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling.
eLife
2016; 5
Abstract
Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite known limitations of mass spectrometry in detecting proteins expressed at low levels, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.
View details for DOI 10.7554/eLife.13328
View details for PubMedID 27232982
View details for PubMedCentralID PMC4940163
-
Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals
SCIENCE
2016; 352 (6288): 1009-1013
Abstract
Gene duplication is a fundamental process in genome evolution. However, most young duplicates are degraded by loss-of-function mutations, and the factors that allow some duplicate pairs to survive long-term remain controversial. One class of models to explain duplicate retention invokes sub- or neofunctionalization, whereas others focus on sharing of gene dosage. RNA-sequencing data from 46 human and 26 mouse tissues indicate that subfunctionalization of expression evolves slowly and is rare among duplicates that arose within the placental mammals, possibly because tandem duplicates are coregulated by shared genomic elements. Instead, consistent with the dosage-sharing hypothesis, most young duplicates are down-regulated to match expression levels of single-copy genes. Thus, dosage sharing of expression allows for the initial survival of mammalian duplicates, followed by slower functional adaptation enabling long-term preservation.
View details for DOI 10.1126/science.aad8411
View details for Web of Science ID 000376147800053
View details for PubMedID 27199432
-
RNA splicing is a primary link between genetic variation and disease
SCIENCE
2016; 352 (6285): 600-604
Abstract
Noncoding variants play a central role in the genetics of complex traits, but we still lack a full understanding of the molecular pathways through which they act. We quantified the contribution of cis-acting genetic effects at all major stages of gene regulation from chromatin to proteins, in Yoruba lymphoblastoid cell lines (LCLs). About 65% of expression quantitative trait loci (eQTLs) have primary effects on chromatin, whereas the remaining eQTLs are enriched in transcribed regions. Using a novel method, we also detected 2893 splicing QTLs, most of which have little or no effect on gene-level expression. These splicing QTLs are major contributors to complex traits, roughly on a par with variants that affect gene expression levels. Our study provides a comprehensive view of the mechanisms linking genetic variation to variation in human gene regulation.
View details for DOI 10.1126/science.aad9417
View details for Web of Science ID 000374998600048
View details for PubMedID 27126046
-
Genetic Variation, Not Cell Type of Origin, Underlies the Majority of Identifiable Regulatory Differences in iPSCs.
PLoS genetics
2016; 12 (1)
Abstract
The advent of induced pluripotent stem cells (iPSCs) revolutionized human genetics by allowing us to generate pluripotent cells from easily accessible somatic tissues. This technology can have immense implications for regenerative medicine, but iPSCs also represent a paradigm shift in the study of complex human phenotypes, including gene regulation and disease. Yet, an unresolved caveat of the iPSC model system is the extent to which reprogrammed iPSCs retain residual phenotypes from their precursor somatic cells. To directly address this issue, we used an effective study design to compare regulatory phenotypes between iPSCs derived from two types of commonly used somatic precursor cells. We find a remarkably small number of differences in DNA methylation and gene expression levels between iPSCs derived from different somatic precursors. Instead, we demonstrate genetic variation is associated with the majority of identifiable variation in DNA methylation and gene expression levels. We show that the cell type of origin only minimally affects gene expression levels and DNA methylation in iPSCs, and that genetic variation is the main driver of regulatory differences between iPSCs of different donors. Our findings suggest that studies using iPSCs should focus on additional individuals rather than clones from the same individual.
View details for DOI 10.1371/journal.pgen.1005793
View details for Web of Science ID 000369368200031
View details for PubMedID 26812582
View details for PubMedCentralID PMC4727884
-
Abundant contribution of short tandem repeats to gene expression variation in humans.
Nature genetics
2016; 48 (1): 22-29
Abstract
The contribution of repetitive elements to quantitative human traits is largely unknown. Here we report a genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from that of linked SNPs and indels and found that eSTRs contribute 10-15% of the cis heritability mediated by all common variants. Further functional genomic analyses showed that eSTRs are enriched in conserved regions, colocalize with regulatory elements and may modulate certain histone modifications. By analyzing known genome-wide association study (GWAS) signals and searching for new associations in 1,685 whole genomes from deeply phenotyped individuals, we found that eSTRs are enriched in various clinically relevant conditions. These results highlight the contribution of STRs to the genetic architecture of quantitative human traits.
View details for DOI 10.1038/ng.3461
View details for Web of Science ID 000367255300009
View details for PubMedID 26642241
View details for PubMedCentralID PMC4909355
-
Whole Genome Sequencing Identifies a Novel Factor Required for Secretory Granule Maturation in Tetrahymena thermophila.
G3 (Bethesda, Md.)
2016; 6 (8): 2505-2516
Abstract
Unbiased genetic approaches have a unique ability to identify novel genes associated with specific biological pathways. Thanks to next generation sequencing, forward genetic strategies can be expanded to a wider range of model organisms. The formation of secretory granules, called mucocysts, in the ciliate Tetrahymena thermophila relies, in part, on ancestral lysosomal sorting machinery, but is also likely to involve novel factors. In prior work, multiple strains with defects in mucocyst biogenesis were generated by nitrosoguanidine mutagenesis, and characterized using genetic and cell biological approaches, but the genetic lesions themselves were unknown. Here, we show that analyzing one such mutant by whole genome sequencing reveals a novel factor in mucocyst formation. Strain UC620 has both morphological and biochemical defects in mucocyst maturation, a process analogous to dense core granule maturation in animals. Illumina sequencing of a pool of UC620 F2 clones identified a missense mutation in a novel gene called MMA1 (Mucocyst maturation). The defects in UC620 were rescued by expression of a wild-type copy of MMA1, and disrupting MMA1 in an otherwise wild-type strain phenocopies UC620. The product of MMA1, characterized as a CFP-tagged copy, encodes a large soluble cytosolic protein. A small fraction of Mma1p-CFP is pelletable, which may reflect association with endosomes. The gene has no identifiable homologs except in other Tetrahymena species, and therefore represents an evolutionarily recent innovation that is required for granule maturation.
View details for DOI 10.1534/g3.116.028878
View details for PubMedID 27317773
View details for PubMedCentralID PMC4978903
-
WASP: allele-specific software for robust molecular quantitative trait locus discovery.
Nature methods
2015; 12 (11): 1061-3
Abstract
Allele-specific sequencing reads provide a powerful signal for identifying molecular quantitative trait loci (QTLs), but they are challenging to analyze and are prone to technical artifacts. Here we describe WASP, a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs. Using simulated reads, RNA-seq reads and chromatin immunoprecipitation sequencing (ChIP-seq) reads, we demonstrate that WASP has a low error rate and is far more powerful than existing QTL-mapping approaches.
View details for DOI 10.1038/nmeth.3582
View details for PubMedID 26366987
View details for PubMedCentralID PMC4626402
-
Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions
CELL
2015; 162 (5): 1051-1065
Abstract
Deciphering the impact of genetic variants on gene regulation is fundamental to understanding human disease. Although gene regulation often involves long-range interactions, it is unknown to what extent non-coding genetic variants influence distal molecular phenotypes. Here, we integrate chromatin profiling for three histone marks in lymphoblastoid cell lines (LCLs) from 75 sequenced individuals with LCL-specific Hi-C and ChIA-PET-based chromatin contact maps to uncover one of the largest collections of local and distal histone quantitative trait loci (hQTLs). Distal QTLs are enriched within topologically associated domains and exhibit largely concordant variation of chromatin state coordinated by proximal and distal non-coding genetic variants. Histone QTLs are enriched for common variants associated with autoimmune diseases and enable identification of putative target genes of disease-associated variants from genome-wide association studies. These analyses provide insights into how genetic variation can affect human disease phenotypes by coordinated changes in chromatin at interacting regulatory elements.
View details for DOI 10.1016/j.cell.2015.07.048
View details for Web of Science ID 000360589900015
View details for PubMedID 26300125
View details for PubMedCentralID PMC4556133
-
The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans
SCIENCE
2015; 348 (6235): 648-660
Abstract
Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues.
View details for DOI 10.1126/science.1262110
View details for Web of Science ID 000354045700036
View details for PubMedCentralID PMC4547484
-
Effect of predicted protein-truncating genetic variants on the human transcriptome
SCIENCE
2015; 348 (6235): 666-669
Abstract
Accurate prediction of the functional effect of genetic variation is critical for clinical genome interpretation. We systematically characterized the transcriptome effects of protein-truncating variants, a class of variants expected to have profound effects on gene function, using data from the Genotype-Tissue Expression (GTEx) and Geuvadis projects. We quantitated tissue-specific and positional effects on nonsense-mediated transcript decay and present an improved predictive model for this decay. We directly measured the effect of variants both proximal and distal to splice junctions. Furthermore, we found that robustness to heterozygous gene inactivation is not due to dosage compensation. Our results illustrate the value of transcriptome data in the functional interpretation of genetic variants.
View details for DOI 10.1126/science.1261877
View details for Web of Science ID 000354045700038
View details for PubMedCentralID PMC4537935
-
Reprogramming LCLs to iPSCs Results in Recovery of Donor-Specific Gene Expression Signature
PLOS GENETICS
2015; 11 (5)
Abstract
Renewable in vitro cell cultures, such as lymphoblastoid cell lines (LCLs), have facilitated studies that contributed to our understanding of genetic influence on human traits. However, the degree to which cell lines faithfully maintain differences in donor-specific phenotypes is still debated. We have previously reported that standard cell line maintenance practice results in a loss of donor-specific gene expression signatures in LCLs. An alternative to the LCL model is the induced pluripotent stem cell (iPSC) system, which carries the potential to model tissue-specific physiology through the use of differentiation protocols. Still, existing LCL banks represent an important source of starting material for iPSC generation, and it is possible that the disruptions in gene regulation associated with long-term LCL maintenance could persist through the reprogramming process. To address this concern, we studied the effect of reprogramming mature LCL cultures from six unrelated donors to iPSCs on the ensuing gene expression patterns within and between individuals. We show that the reprogramming process results in a recovery of donor-specific gene regulatory signatures, increasing the number of genes with a detectable donor effect by an order of magnitude. The proportion of variation in gene expression statistically attributed to donor increases from 6.9% in LCLs to 24.5% in iPSCs (P < 10^-15). Since environmental contributions are unlikely to be a source of individual variation in our system of highly passaged cultured cell lines, our observations suggest that the effect of genotype on gene regulation is more pronounced in iPSCs than in LCLs. Our findings indicate that iPSCs can be a powerful model system for studies of phenotypic variation across individuals in general, and the genetic association with variation in gene regulation in particular. We further conclude that LCLs are an appropriate starting material for iPSC generation.
View details for DOI 10.1371/journal.pgen.1005216
View details for Web of Science ID 000355305200032
View details for PubMedID 25950834
View details for PubMedCentralID PMC4423863
-
Sharing and Specificity of Co-expression Networks across 35 Human Tissues.
PLoS computational biology
2015; 11 (5)
Abstract
To understand the regulation of tissue-specific gene expression, the GTEx Consortium generated RNA-seq expression data for more than thirty distinct human tissues. These data provide an opportunity for deriving shared and tissue-specific gene regulatory networks on the basis of co-expression between genes. However, only a small number of samples are available for a majority of the tissues, and therefore statistical inference of networks in this setting is highly underpowered. To address this problem, we infer tissue-specific gene co-expression networks for 35 tissues in the GTEx dataset using a novel algorithm, GNAT, that uses a hierarchy of tissues to share data between related tissues. We show that this transfer learning approach increases the accuracy with which networks are learned. Analysis of these networks reveals that tissue-specific transcription factors are hubs that preferentially connect to genes with tissue-specific functions. Additionally, we observe that genes with tissue-specific functions lie at the peripheries of our networks. We identify numerous modules enriched for Gene Ontology functions, and show that modules conserved across tissues are especially likely to have functions common to all tissues, while modules that are upregulated in a particular tissue are often instrumental to tissue-specific function. Finally, we provide a web tool, available at mostafavilab.stat.ubc.ca/GNAT, which allows exploration of gene function and regulation in a tissue-specific manner.
View details for DOI 10.1371/journal.pcbi.1004220
View details for PubMedID 25970446
View details for PubMedCentralID PMC4430528
-
Genomic variation. Impact of regulatory variation from RNA to protein.
Science
2015; 347 (6222): 664-667
Abstract
The phenotypic consequences of expression quantitative trait loci (eQTLs) are presumably due to their effects on protein expression levels. Yet the impact of genetic variation, including eQTLs, on protein levels remains poorly understood. To address this, we mapped genetic variants that are associated with eQTLs, ribosome occupancy (rQTLs), or protein abundance (pQTLs). We found that most QTLs are associated with transcript expression levels, with consequent effects on ribosome and protein levels. However, eQTLs tend to have significantly reduced effect sizes on protein levels, which suggests that their potential impact on downstream phenotypes is often attenuated or buffered. Additionally, we identified a class of cis QTLs that affect protein abundance with little or no effect on messenger RNA or ribosome levels, which suggests that they may arise from differences in posttranslational regulation.
View details for DOI 10.1126/science.1260793
View details for Web of Science ID 000349145200045
View details for PubMedID 25657249
-
The Genetic and Mechanistic Basis for Variation in Gene Regulation
PLOS GENETICS
2015; 11 (1)
Abstract
It is now well established that noncoding regulatory variants play a central role in the genetics of common diseases and in evolution. However, until recently, we have known little about the mechanisms by which most regulatory variants act. For instance, what types of functional elements in DNA, RNA, or proteins are most often affected by regulatory variants? Which stages of gene regulation are typically altered? How can we predict which variants are most likely to impact regulation in a given cell type? Recent studies, in many cases using quantitative trait loci (QTL)-mapping approaches in cell lines or tissue samples, have provided us with considerable insight into the properties of genetic loci that have regulatory roles. Such studies have uncovered novel biochemical regulatory interactions and led to the identification of previously unrecognized regulatory mechanisms. We have learned that genetic variation is often directly associated with variation in regulatory activities (namely, we can map regulatory QTLs, not just expression QTLs [eQTLs]), and we have taken the first steps towards understanding the causal order of regulatory events (for example, the role of pioneer transcription factors). Yet, in most cases, we still do not know how to interpret overlapping combinations of regulatory interactions, and we are still far from being able to predict how variation in regulatory mechanisms is propagated through a chain of interactions to eventually result in changes in gene expression profiles.
View details for DOI 10.1371/journal.pgen.1004857
View details for Web of Science ID 000349314600009
View details for PubMedID 25569255
View details for PubMedCentralID PMC4287341
-
msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding.
PloS one
2015; 10 (9): e0138030
Abstract
Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.
View details for DOI 10.1371/journal.pone.0138030
View details for PubMedID 26406244
View details for PubMedCentralID PMC4583425
-
Methylation QTLs Are Associated with Coordinated Changes in Transcription Factor Binding, Histone Modifications, and Gene Expression Levels
PLOS GENETICS
2014; 10 (9)
Abstract
DNA methylation is an important epigenetic regulator of gene expression. Recent studies have revealed widespread associations between genetic variation and methylation levels. However, the mechanistic links between genetic variation and methylation remain unclear. To begin addressing this gap, we collected methylation data at ∼300,000 loci in lymphoblastoid cell lines (LCLs) from 64 HapMap Yoruba individuals, and genome-wide bisulfite sequence data in ten of these individuals. We identified (at an FDR of 10%) 13,915 cis methylation QTLs (meQTLs), i.e., CpG sites in which changes in DNA methylation are associated with genetic variation at proximal loci. We found that meQTLs are frequently associated with changes in methylation at multiple CpGs across regions of up to 3 kb. Interestingly, meQTLs are also frequently associated with variation in other properties of gene regulation, including histone modifications, DNase I accessibility, chromatin accessibility, and expression levels of nearby genes. These observations suggest that genetic variants may lead to coordinated molecular changes in all of these regulatory phenotypes. One plausible driver of coordinated changes in different regulatory mechanisms is variation in transcription factor (TF) binding. Indeed, we found that SNPs that change predicted TF binding affinities are significantly enriched for associations with DNA methylation at nearby CpGs.
View details for DOI 10.1371/journal.pgen.1004663
View details for Web of Science ID 000343009600059
View details for PubMedID 25233095
View details for PubMedCentralID PMC4169251
-
fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets
GENETICS
2014; 197 (2): 573-89
Abstract
Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.
View details for DOI 10.1534/genetics.114.164350
View details for Web of Science ID 000338697000013
View details for PubMedID 24700103
View details for PubMedCentralID PMC4063916
-
The deleterious mutation load is insensitive to recent population history.
Nature genetics
2014; 46 (3): 220-224
Abstract
Human populations have undergone major changes in population size in the past 100,000 years, including recent rapid growth. How these demographic events have affected the burden of deleterious mutations in individuals and the frequencies of disease mutations in populations remains unclear. We use population genetic models to show that recent human demography has probably had little impact on the average burden of deleterious mutations. This prediction is supported by two exome sequence data sets showing that individuals of west African and European ancestry carry very similar burdens of damaging mutations. We further show that for many diseases, rare alleles are unlikely to contribute a large fraction of the heritable variation, and therefore the impact of recent growth is likely to be modest. However, for those diseases that have a direct impact on fitness, strongly deleterious rare mutations probably do have an important role, and recent growth will have increased their impact.
View details for DOI 10.1038/ng.2896
View details for PubMedID 24509481
-
The functional consequences of variation in transcription factor binding.
PLoS genetics
2014; 10 (3)
Abstract
One goal of human genetics is to understand how the information for precise and dynamic gene expression programs is encoded in the genome. The interactions of transcription factors (TFs) with DNA regulatory elements clearly play an important role in determining gene expression outputs, yet the regulatory logic underlying functional transcription factor binding is poorly understood. Many studies have focused on characterizing the genomic locations of TF binding, yet it is unclear to what extent TF binding at any specific locus has functional consequences with respect to gene expression output. To evaluate the context of functional TF binding we knocked down 59 TFs and chromatin modifiers in one HapMap lymphoblastoid cell line. We then identified genes whose expression was affected by the knockdowns. We intersected the gene expression data with transcription factor binding data (based on ChIP-seq and DNase-seq) within 10 kb of the transcription start sites of expressed genes. This combination of data allowed us to infer functional TF binding. Using this approach, we found that only a small subset of genes bound by a factor were differentially expressed following the knockdown of that factor, suggesting that most interactions between TF and chromatin do not result in measurable changes in gene expression levels of putative target genes. We found that functional TF binding is enriched in regulatory elements that harbor a large number of TF binding sites, at sites with predicted higher binding affinity, and at sites that are enriched in genomic regions annotated as "active enhancers."
View details for DOI 10.1371/journal.pgen.1004226
View details for PubMedID 24603674
View details for PubMedCentralID PMC3945204
-
The chromatin architectural proteins HMGD1 and H1 bind reciprocally and have opposite effects on chromatin structure and gene regulation
BMC GENOMICS
2014; 15
Abstract
Chromatin architectural proteins interact with nucleosomes to modulate chromatin accessibility and higher-order chromatin structure. While these proteins are almost certainly important for gene regulation they have been studied far less than the core histone proteins. Here we describe the genomic distributions and functional roles of two chromatin architectural proteins: histone H1 and the high mobility group protein HMGD1 in Drosophila S2 cells. Using ChIP-seq, biochemical and gene specific approaches, we find that HMGD1 binds to highly accessible regulatory chromatin and active promoters. In contrast, H1 is primarily associated with heterochromatic regions marked with repressive histone marks. We find that the ratio of HMGD1 to H1 binding is a better predictor of gene activity than either protein by itself, which suggests that reciprocal binding between these proteins is important for gene regulation. Using knockdown experiments, we show that HMGD1 and H1 affect the occupancy of the other protein, change nucleosome repeat length and modulate gene expression. Collectively, our data suggest that dynamic and mutually exclusive binding of H1 and HMGD1 to nucleosomes and their linker sequences may control the fluid chromatin structure that is required for transcriptional regulation. This study provides a framework to further study the interplay between chromatin architectural proteins and epigenetics in gene regulation.
View details for DOI 10.1186/1471-2164-15-92
View details for Web of Science ID 000332575900002
View details for PubMedID 24484546
View details for PubMedCentralID PMC3928079
-
Archaic humans: Four makes a party.
Nature
2014; 505 (7481): 32-4
View details for DOI 10.1038/nature12847
View details for PubMedID 24352230
-
The effect of freeze-thaw cycles on gene expression levels in lymphoblastoid cell lines.
PloS one
2014; 9 (9)
Abstract
Epstein-Barr virus (EBV) transformed lymphoblastoid cell lines (LCLs) are a widely used renewable resource for functional genomic studies in humans. The ability to accumulate multidimensional data pertaining to the same individual cell lines, from complete genomic sequences to detailed gene regulatory profiles, further enhances the utility of LCLs as a model system. However, the extent to which LCLs are a faithful model system is relatively unknown. We have previously shown that gene expression profiles of newly established LCLs maintain a strong individual component. Here, we extend our study to investigate the effect of freeze-thaw cycles on gene expression patterns in mature LCLs, especially in the context of inter-individual variation in gene expression. We report a profound difference in the gene expression profiles of newly established and mature LCLs. Once newly established LCLs undergo a freeze-thaw cycle, the individual specific gene expression signatures become much less pronounced as the gene expression levels in LCLs from different individuals converge to a more uniform profile, which reflects a mature transformed B cell phenotype. We found that previously identified eQTLs are enriched among the relatively few genes whose regulations in mature LCLs maintain marked individual signatures. We thus conclude that while insight drawn from gene regulatory studies in mature LCLs may generally not be affected by the artificial nature of the LCL model system, many aspects of primary B cell biology cannot be observed and studied in mature LCL cultures.
View details for DOI 10.1371/journal.pone.0107166
View details for PubMedID 25192014
View details for PubMedCentralID PMC4156430
-
Epigenetic modifications are associated with inter-species gene expression variation in primates
GENOME BIOLOGY
2014; 15 (12)
View details for DOI 10.1186/s13059-014-0547-3
View details for Web of Science ID 000346609500019
-
Primate Transcript and Protein Expression Levels Evolve Under Compensatory Selection Pressures
SCIENCE
2013; 342 (6162): 1100-1104
Abstract
Changes in gene regulation have likely played an important role in the evolution of primates. Differences in messenger RNA (mRNA) expression levels across primates have often been documented; however, it is not yet known to what extent measurements of divergence in mRNA levels reflect divergence in protein expression levels, which are probably more important in determining phenotypic differences. We used high-resolution, quantitative mass spectrometry to collect protein expression measurements from human, chimpanzee, and rhesus macaque lymphoblastoid cell lines and compared them to transcript expression data from the same samples. We found dozens of genes with significant expression differences between species at the mRNA level yet little or no difference in protein expression. Overall, our data suggest that protein expression levels evolve under stronger evolutionary constraint than mRNA levels.
View details for DOI 10.1126/science.1242379
View details for Web of Science ID 000327518600059
View details for PubMedID 24136357
-
Identification of Genetic Variants That Affect Histone Modifications in Human Cells
SCIENCE
2013; 342 (6159): 747-749
Abstract
Histone modifications are important markers of function and chromatin state, yet the DNA sequence elements that direct them to specific genomic locations are poorly understood. Here, we identify hundreds of quantitative trait loci, genome-wide, that affect histone modification or RNA polymerase II (Pol II) occupancy in Yoruba lymphoblastoid cell lines (LCLs). In many cases, the same variant is associated with quantitative changes in multiple histone marks and Pol II, as well as in deoxyribonuclease I sensitivity and nucleosome positioning. Transcription factor binding site polymorphisms are correlated overall with differences in local histone modification, and we identify specific transcription factors whose binding leads to histone modification in LCLs. Furthermore, variants that affect chromatin at distal regulatory sites frequently also direct changes in chromatin and gene expression at associated promoters.
View details for DOI 10.1126/science.1242429
View details for Web of Science ID 000326647600046
View details for PubMedID 24136359
-
The Genotype-Tissue Expression (GTEx) project
NATURE GENETICS
2013; 45 (6): 580-585
Abstract
Genome-wide association studies have identified thousands of loci for common diseases, but, for the majority of these, the mechanisms underlying disease susceptibility remain unknown. Most associated variants are not correlated with protein-coding changes, suggesting that polymorphisms in regulatory regions probably contribute to many disease phenotypes. Here we describe the Genotype-Tissue Expression (GTEx) project, which will establish a resource database and associated tissue bank for the scientific community to study the relationship between genetic variation and gene expression in human tissues.
View details for DOI 10.1038/ng.2653
View details for Web of Science ID 000319563900002
View details for PubMedID 23715323