Jonathan Pritchard
Bing Professor of Population Studies, Professor of Genetics and Biology
Bio
Jonathan Pritchard grew up in England before moving to Pennsylvania during high school. He received his BSc in Biology and Mathematics from Penn State University in 1994, and his PhD in Biology at Stanford in 1998. After that he moved to a postdoc in the Department of Statistics at Oxford University and then to his first faculty job at the University of Chicago in 2001. Pritchard returned to Stanford University in 2013, where he is now a Professor in the Departments of Biology and Genetics.
Administrative Appointments
-
Investigator, Howard Hughes Medical Institute (2008 - 2019)
-
Co-Director, Stanford's Center for Computational, Evolutionary and Human Genomics (2017 - Present)
Current Research and Scholarly Interests
My group has expertise in the development of new statistical methods for genetic analysis and in their application to genomic data from humans and other organisms. We focus on questions relating to genetic variation and evolution: How does genetic variation impact phenotypic traits and evolution, at both the organismal and cellular levels? What can we learn from genome sequences of modern and ancient humans about the relationships among human populations, and the nature of adaptation in these populations?
We often work on problems where there are no off-the-shelf statistical methods. Thus, an important part of our work is in developing appropriate statistical and computational approaches that can yield new insights into biological data. In the past, we have made important contributions to a variety of problems in human population genetics, including methods for complex trait mapping, inference of population structure and history, and studies of natural selection. We have a strong track record of producing user-friendly resources that are widely used in the community, and in applied data analysis to tackle important biological questions. Notably, our Structure algorithm and software package for inferring population structure from genetic data have received >30,000 total citations spread across several papers.
Since 2008, an important emphasis of my group has been understanding gene regulation, and in particular how genetic variation may impact regulation. Ultimately, we would like to be able to predict which noncoding variants in the genome are likely to have regulatory effects in any given cell type, and how these link to phenotypic variation and disease. My lab has been deeply involved in developing new computational methods to interpret various types of modern genomic assays and in linking these to genetic variation.
Secondly, we have had a major focus on understanding the genetic architecture of complex traits, and the implications for understanding evolution. We have argued that much--if not most--evolution in humans likely proceeds through a process that we call "polygenic adaptation" in which populations evolve through small allele frequency shifts at many loci.
We have also written extensively about conceptual models for understanding the genetic architecture of trait variation (Boyle et al., 2017). We have argued that the data are consistent with a model in which essentially every regulatory variant in disease-relevant cell types can affect risk, and proposed that most of these effects act through trans-regulatory networks. Testing this model is an ongoing focus of our work.
2024-25 Courses
- Advanced Genetics
GENE 205 (Win)
- Biology PhD Lab Rotation
BIO 299 (Aut)
- Culture, Evolution, and Society
HUMBIO 2B (Aut)
- Genomic approaches to the study of human disease
BIO 127, BIO 247, GENE 247 (Win)
- Independent Studies (11)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr)
- Directed Reading in Biology
BIO 198 (Aut, Win, Spr)
- Directed Reading in Genetics
GENE 299 (Aut, Win, Spr)
- Graduate Research
BIO 300 (Aut, Win, Spr)
- Graduate Research
GENE 399 (Aut, Win, Spr)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr)
- Medical Scholars Research
GENE 370 (Aut, Win, Spr)
- Supervised Study
GENE 260 (Aut, Win, Spr)
- Undergraduate Research
BIO 199 (Aut, Win, Spr)
- Undergraduate Research
GENE 199 (Aut, Win, Spr)
Prior Year Courses
2023-24 Courses
- Advanced Genetics
GENE 205 (Win)
- Biology PhD Lab Rotation
BIO 299 (Aut, Win, Spr)
- Culture, Evolution, and Society
HUMBIO 2B (Aut)
- Genomic approaches to the study of human disease
BIO 127, BIO 247, GENE 247 (Win)
2022-23 Courses
- Advanced Genetics
GENE 205 (Win)
- Genomic approaches to the study of human disease
BIO 127, BIO 247, GENE 247 (Win)
2021-22 Courses
- Advanced Genetics
GENE 205 (Win)
- Genomic approaches to the study of human disease
BIO 247, GENE 247 (Win)
Stanford Advisees
-
Doctoral Dissertation Reader (AC)
Javier Blanco, Joleen Cheah, Connor Duffy, Olivia Ghosh, Maike Morrison, Alex Starr, Adele Xu
-
Postdoctoral Faculty Sponsor
Emma Dann, Alyssa Lyn Fortier, Mineto Ota
-
Doctoral Dissertation Advisor (AC)
Matthew Aguirre, Tami Gjorgjieva, Jon Judd, Nikhil Milind, Courtney Smith, Tony Zeng, Julie Zhu
-
Doctoral Dissertation Co-Advisor (AC)
Alvina Adimoelja
Graduate and Fellowship Programs
-
Biology (School of Humanities and Sciences) (PhD Program)
-
Biomedical Informatics (PhD Program)
All Publications
-
Landscape of stimulation-responsive chromatin across diverse human immune cells.
Nature genetics
2019
Abstract
A hallmark of the immune system is the interplay among specialized cell types transitioning between resting and stimulated states. The gene regulatory landscape of this dynamic system has not been fully characterized in human cells. Here we collected assay for transposase-accessible chromatin using sequencing (ATAC-seq) and RNA sequencing data under resting and stimulated conditions for up to 32 immune cell populations. Stimulation caused widespread chromatin remodeling, including response elements shared between stimulated B and T cells. Furthermore, several autoimmune traits showed significant heritability in stimulation-responsive elements from distinct cell types, highlighting the importance of these cell states in autoimmunity. Allele-specific read mapping identified variants that alter chromatin accessibility in particular conditions, allowing us to observe evidence of function for a candidate causal variant that is undetected by existing large-scale studies in resting cells. Our results provide a resource of chromatin dynamics and highlight the need to characterize the effects of genetic variation in stimulated cells.
View details for DOI 10.1038/s41588-019-0505-9
View details for PubMedID 31570894
-
Public Discussion Affects Question Asking at Academic Conferences.
American journal of human genetics
2019
Abstract
Women are under-represented in science, technology, engineering, and mathematics (STEM). Despite the recent emphasis on diversity in STEM, our understanding of what drives differences between women and men scientists remains limited. This, in turn, limits our ability to intervene to level the playing field. To quantify the representation and participation of women and men at academic meetings in human genetics, we developed high-throughput and crowd-sourced approaches focused on question-asking behavior. Question asking is one voluntary and self-initiated scientific activity we can measure. Here we report that women ask fewer questions than expected regardless of their representation in talk audiences. We present evidence that external barriers affect the representation of women in STEM. However, differences in question-asking behavior suggest that internal factors also impact women's participation. We then examine the effects of specific interventions and show that wide public discussion of the relative under-participation of women in question-and-answer sessions alters question-asking behavior. We suggest that engaging the community in such projects promotes visibility of diversity issues at academic meetings and allows for efficient data collection that can be used to further explore and understand differences in conference participation.
View details for DOI 10.1016/j.ajhg.2019.06.004
View details for PubMedID 31256875
-
Trans Effects on Gene Expression Can Drive Omnigenic Inheritance.
Cell
2019; 177 (4): 1022
Abstract
Early genome-wide association studies (GWASs) led to the surprising discovery that, for typical complex traits, most of the heritability is due to huge numbers of common variants with tiny effect sizes. Previously, we argued that new models are needed to understand these patterns. Here, we provide a formal model in which genetic contributions to complex traits are partitioned into direct effects from core genes and indirect effects from peripheral genes acting in trans. We propose that most heritability is driven by weak trans-eQTL SNPs, whose effects are mediated through peripheral genes to impact the expression of core genes. In particular, if the core genes for a trait tend to be co-regulated, then the effects of peripheral variation can be amplified such that nearly all of the genetic variance is driven by weak trans effects. Thus, our model proposes a framework for understanding key features of the architecture of complex traits.
View details for PubMedID 31051098
-
Reduced signal for polygenic adaptation of height in UK Biobank
ELIFE
2019; 8
View details for DOI 10.7554/eLife.39725
View details for Web of Science ID 000461987200001
-
Interpreting polygenic scores, polygenic adaptation, and human phenotypic differences.
Evolution, medicine, and public health
2019; 2019 (1): 26–34
Abstract
Recent analyses of polygenic scores have opened new discussions concerning the genetic basis and evolutionary significance of differences among populations in distributions of phenotypes. Here, we highlight limitations in research on polygenic scores, polygenic adaptation and population differences. We show how genetic contributions to traits, as estimated by polygenic scores, combine with environmental contributions so that differences among populations in trait distributions need not reflect corresponding differences in genetic propensity. Under a null model in which phenotypes are selectively neutral, genetic propensity differences contributing to phenotypic differences among populations are predicted to be small. We illustrate this null hypothesis in relation to health disparities between African Americans and European Americans, discussing alternative hypotheses with selective and environmental effects. Close attention to the limitations of research on polygenic phenomena is important for the interpretation of their relationship to human population differences.
View details for PubMedID 30838127
-
High-resolution mapping of cancer cell networks using co-functional interactions.
Molecular systems biology
2018; 14 (12): e8594
Abstract
Powerful new technologies for perturbing genetic elements have recently expanded the study of genetic interactions in model systems ranging from yeast to human cell lines. However, technical artifacts can confound signal across genetic screens and limit the immense potential of parallel screening approaches. To address this problem, we devised a novel PCA-based method for correcting genome-wide screening data, bolstering the sensitivity and specificity of detection for genetic interactions. Applying this strategy to a set of 436 whole genome CRISPR screens, we report more than 1.5 million pairs of correlated "co-functional" genes that provide finer-scale information about cell compartments, biological pathways, and protein complexes than traditional gene sets. Lastly, we employed a gene community detection approach to implicate core genes for cancer growth and compress signal from functionally related genes in the same community into a single score. This work establishes new algorithms for probing cancer cell networks and motivates the acquisition of further CRISPR screen data across diverse genotypes and cell types to further resolve complex cellular processes.
View details for PubMedID 30573688
-
Evidence for Weak Selective Constraint on Human Gene Expression.
Genetics
2018
Abstract
Gene expression variation is a major contributor to phenotypic variation in human complex traits. Selection on complex traits may therefore be reflected in constraint on gene expression. Here, we explore the effects of stabilizing selection on cis-regulatory genetic variation in humans. We analyze patterns of expression variation at copy number variants and find evidence for selection against large increases in gene expression. Using allele-specific expression (ASE) data, we further show evidence of selection against smaller-effect variants. We estimate that, across all genes, singletons in a sample of 122 individuals have approximately 2.2× greater effects on expression variation than the average variant across allele frequencies. Despite their increased effect size relative to common variants, we estimate that singletons in the sample studied explain, on average, only 5% of the heritability of gene expression from cis-regulatory variants. Finally, we show that genes depleted for loss-of-function variants are also depleted for cis-eQTLs and have low levels of allelic imbalance, confirming tighter constraint on the expression levels of these genes. We conclude that constraint on gene expression is present, but has relatively weak effects on most cis-regulatory variants, thus permitting high levels of gene-regulatory genetic variation.
View details for PubMedID 30554168
-
Functional Genetic Variants Revealed by Massively Parallel Precise Genome Editing.
Cell
2018
Abstract
A major challenge in genetics is to identify genetic variants driving natural phenotypic variation. However, current methods of genetic mapping have limited resolution. To address this challenge, we developed a CRISPR-Cas9-based high-throughput genome editing approach that can introduce thousands of specific genetic variants in a single experiment. This enabled us to study the fitness consequences of 16,006 natural genetic variants in yeast. We identified 572 variants with significant fitness differences in glucose media; these are highly enriched in promoters, particularly in transcription factor binding sites, while only 19.2% affect amino acid sequences. Strikingly, nearby variants nearly always favor the same parent's alleles, suggesting that lineage-specific selection is often driven by multiple clustered variants. In sum, our genome editing approach reveals the genetic architecture of fitness variation at single-base resolution and could be adapted to measure the effects of genome-wide genetic variation in any screen for cell survival or cell-sortable markers.
View details for PubMedID 30245013
-
Post-translational buffering leads to convergent protein expression levels between primates
GENOME BIOLOGY
2018; 19: 83
Abstract
Differences in gene regulation between human and closely related species influence phenotypes that are distinctly human. While gene regulation is a multi-step process, the majority of research concerning divergence in gene regulation among primates has focused on transcription. To gain a comprehensive view of gene regulation, we surveyed genome-wide ribosome occupancy, which reflects levels of protein translation, in lymphoblastoid cell lines derived from human, chimpanzee, and rhesus macaque. We further integrated messenger RNA and protein level measurements collected from matching cell lines. We find that, in addition to transcriptional regulation, the major factor determining protein level divergence between human and closely related species is post-translational buffering. Inter-species divergence in transcription is generally propagated to the level of protein translation. In contrast, gene expression divergence is often attenuated post-translationally, potentially mediated through post-translational modifications. Results from our analysis indicate that post-translational buffering is a conserved mechanism that led to relaxation of selective constraint on transcript levels in humans.
View details for PubMedID 29950183
-
Determining the genetic basis of anthracycline-cardiotoxicity by response QTL mapping in induced cardiomyocytes.
eLife
2018; 7
Abstract
Anthracycline-induced cardiotoxicity (ACT) is a key limiting factor in setting optimal chemotherapy regimes, with almost half of patients expected to develop congestive heart failure given high doses. However, the genetic basis of sensitivity to anthracyclines remains unclear. We created a panel of iPSC-derived cardiomyocytes from 45 individuals and performed RNA-seq after 24h exposure to varying doxorubicin dosages. The transcriptomic response is substantial: the majority of genes are differentially expressed and over 6000 genes show evidence of differential splicing, the latter driven by reduced splicing fidelity in the presence of doxorubicin. We show that inter-individual variation in transcriptional response is predictive of in vitro cell damage, which in turn is associated with in vivo ACT risk. We detect 447 response-expression QTLs and 42 response-splicing QTLs, which are enriched in lower ACT GWAS p-values, supporting the in vivo relevance of our map of genetic regulation of cellular response to anthracyclines.
View details for PubMedID 29737278
-
Remodeling the Specificity of an Endosomal CORVET Tether Underlies Formation of Regulated Secretory Vesicles in the Ciliate Tetrahymena thermophila
CURRENT BIOLOGY
2018; 28 (5): 697-+
Abstract
In the endocytic pathway of animals, two related complexes, called CORVET (class C core vacuole/endosome transport) and HOPS (homotypic fusion and protein sorting), act as both tethers and fusion factors for early and late endosomes, respectively. Mutations in CORVET or HOPS lead to trafficking defects and contribute to human disease, including immune dysfunction. HOPS and CORVET are conserved throughout eukaryotes, but remarkably, in the ciliate Tetrahymena thermophila, the HOPS-specific subunits are absent, while CORVET-specific subunits have proliferated. VPS8 (vacuolar protein sorting), a CORVET subunit, expanded to 6 paralogs in Tetrahymena. This expansion correlated with loss of HOPS within a ciliate subgroup, including the Oligohymenophorea, which contains Tetrahymena. As uncovered via forward genetics, a single VPS8 paralog in Tetrahymena (VPS8A) is required to synthesize prominent secretory granules called mucocysts. More specifically, Δvps8a cells fail to deliver a subset of cargo proteins to developing mucocysts, instead accumulating that cargo in vesicles also bearing the mucocyst-sorting receptor Sor4p. Surprisingly, although this transport step relies on CORVET, it does not appear to involve early endosomes. Instead, Vps8a associates with the late endosomal/lysosomal marker Rab7, indicating that target specificity switching occurred in CORVET subunits during the evolution of ciliates. Mucocysts belong to a markedly diverse and understudied class of protist secretory organelles called extrusomes. Our results underscore that biogenesis of mucocysts depends on endolysosomal trafficking, revealing parallels with invasive organelles in apicomplexan parasites and suggesting that a wide array of secretory adaptations in protists, like in animals, depend on mechanisms related to lysosome biogenesis.
View details for PubMedID 29478853
View details for PubMedCentralID PMC5840023
-
Impact of regulatory variation across human iPSCs and differentiated cells
GENOME RESEARCH
2018; 28 (1): 122–31
Abstract
Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type-specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type-specific chromatin accessibility.
View details for PubMedID 29208628
-
Annotation-free quantification of RNA splicing using LeafCutter
NATURE GENETICS
2018; 50 (1): 151-+
Abstract
The excision of introns from pre-mRNA is an essential step in mRNA processing. We developed LeafCutter to study sample and population variation in intron splicing. LeafCutter identifies variable splicing events from short-read RNA-seq data and finds events of high complexity. Our approach obviates the need for transcript annotations and circumvents the challenges in estimating relative isoform or exon usage in complex splicing events. LeafCutter can be used both to detect differential splicing between sample groups and to map splicing quantitative trait loci (sQTLs). Compared with contemporary methods, our approach identified 1.4-2.1 times more sQTLs, many of which helped us ascribe molecular effects to disease-associated variants. Transcriptome-wide associations between LeafCutter intron quantifications and 40 complex traits increased the number of associated disease genes at a 5% false discovery rate by an average of 2.1-fold compared with that detected through the use of gene expression levels alone. LeafCutter is fast, scalable, easy to use, and available online.
View details for PubMedID 29229983
View details for PubMedCentralID PMC5742080
-
Large-Scale Clonal Analysis Resolves Aging of the Mouse Hematopoietic Stem Cell Compartment.
Cell stem cell
2018; 22 (4): 600–607.e4
Abstract
Aging is linked to functional deterioration and hematological diseases. The hematopoietic system is maintained by hematopoietic stem cells (HSCs), and dysfunction within the HSC compartment is thought to be a key mechanism underlying age-related hematopoietic perturbations. Using single-cell transplantation assays with five blood-lineage analysis, we previously identified myeloid-restricted repopulating progenitors (MyRPs) within the phenotypic HSC compartment in young mice. Here, we determined the age-related functional changes to the HSC compartment using over 400 single-cell transplantation assays. Notably, MyRP frequency increased dramatically with age, while multipotent HSCs expanded modestly within the bone marrow. We also identified a subset of functional cells that were myeloid restricted in primary recipients but displayed multipotent (five blood-lineage) output in secondary recipients. We have termed this cell type latent-HSCs, which appear exclusive to the aged HSC compartment. These results question the traditional dogma of HSC aging and our current approaches to assay and define HSCs.
View details for PubMedID 29625072
-
Frequent nonallelic gene conversion on the human lineage and its effect on the divergence of gene duplicates
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2017; 114 (48): 12779–84
Abstract
Gene conversion is the copying of a genetic sequence from a "donor" region to an "acceptor." In nonallelic gene conversion (NAGC), the donor and the acceptor are at distinct genetic loci. Despite the role NAGC plays in various genetic diseases and the concerted evolution of gene families, the parameters that govern NAGC are not well characterized. Here, we survey duplicate gene families and identify converted tracts in 46% of them. These conversions reflect a large GC bias of NAGC. We develop a sequence evolution model that leverages substantially more information in duplicate sequences than used by previous methods and use it to estimate the parameters that govern NAGC in humans: a mean converted tract length of 250 bp and a probability of [Formula: see text] per generation for a nucleotide to be converted (an order of magnitude higher than the point mutation rate). Despite this high baseline rate, we show that NAGC slows down as duplicate sequences diverge, until an eventual "escape" of the sequences from its influence. As a result, NAGC has a small average effect on the sequence divergence of duplicates. This work improves our understanding of the NAGC mechanism and the role that it plays in the evolution of gene duplicates.
View details for PubMedID 29138319
-
Inferring Relevant Cell Types for Complex Traits by Using Single-Cell Gene Expression
AMERICAN JOURNAL OF HUMAN GENETICS
2017; 101 (5): 686–99
Abstract
Previous studies have prioritized trait-relevant cell types by looking for an enrichment of genome-wide association study (GWAS) signal within functional regions. However, these studies are limited in cell resolution by the lack of functional annotations from difficult-to-characterize or rare cell populations. Measurement of single-cell gene expression has become a popular method for characterizing novel cell types, and yet limited work has linked single-cell RNA sequencing (RNA-seq) to phenotypes of interest. To address this deficiency, we present RolyPoly, a regression-based polygenic model that can prioritize trait-relevant cell types and genes from GWAS summary statistics and gene expression data. RolyPoly is designed to use expression data from either bulk tissue or single-cell RNA-seq. In this study, we demonstrated RolyPoly's accuracy through simulation and validated previously known tissue-trait associations. We discovered a significant association between microglia and late-onset Alzheimer disease and an association between schizophrenia and oligodendrocytes and replicating fetal cortical cells. Additionally, RolyPoly computes a trait-relevance score for each gene to reflect the importance of expression specific to a cell type. We found that differentially expressed genes in the prefrontal cortex of individuals with Alzheimer disease were significantly enriched with genes ranked highly by RolyPoly gene scores. Overall, our method represents a powerful framework for understanding the effect of common variants on cell types contributing to complex traits.
View details for PubMedID 29106824
View details for PubMedCentralID PMC5673624
-
Quantification of transplant-derived circulating cell-free DNA in absence of a donor genotype
PLOS COMPUTATIONAL BIOLOGY
2017; 13 (8): e1005629
Abstract
Quantification of cell-free DNA (cfDNA) in circulating blood derived from a transplanted organ is a powerful approach to monitoring post-transplant injury. Genome transplant dynamics (GTD) quantifies donor-derived cfDNA (dd-cfDNA) by taking advantage of single-nucleotide polymorphisms (SNPs) distributed across the genome to discriminate donor and recipient DNA molecules. In its current implementation, GTD requires genotyping of both the transplant recipient and donor. However, in practice, donor genotype information is often unavailable. Here, we address this issue by developing an algorithm that estimates dd-cfDNA levels in the absence of a donor genotype. Our algorithm predicts heart and lung allograft rejection with an accuracy that is similar to conventional GTD. We furthermore refined the algorithm to handle closely related recipients and donors, a scenario that is common in bone marrow and kidney transplantation. We show that it is possible to estimate dd-cfDNA in bone marrow transplant patients that are unrelated or that are siblings of the donors, using a hidden Markov model (HMM) of identity-by-descent (IBD) states along the genome. Last, we demonstrate that comparing dd-cfDNA to the proportion of donor DNA in white blood cells can differentiate between relapse and the onset of graft-versus-host disease (GVHD). These methods alleviate some of the barriers to the implementation of GTD, which will further widen its clinical application.
View details for PubMedID 28771616
-
An Expanded View of Complex Traits: From Polygenic to Omnigenic
CELL
2017; 169 (7): 1177–86
Abstract
A central goal of genetics is to understand the links between genetic variation and disease. Intuitively, one might expect disease-causing variants to cluster into key pathways that drive disease etiology. But for complex traits, association signals tend to be spread across most of the genome, including near many genes without an obvious connection to disease. We propose that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis as an "omnigenic" model.
View details for PubMedID 28622505
View details for PubMedCentralID PMC5536862
-
Rapid evolution of the human mutation spectrum
ELIFE
2017; 6
Abstract
DNA is a remarkably precise medium for copying and storing biological information. This high fidelity results from the action of hundreds of genes involved in replication, proofreading, and damage repair. Evolutionary theory suggests that in such a system, selection has limited ability to remove genetic variants that change mutation rates by small amounts or in specific sequence contexts. Consistent with this, using SNV variation as a proxy for mutational input, we report here that mutational spectra differ substantially among species, human continental groups and even some closely related populations. Close examination of one signal, an increased TCC→TTC mutation rate in Europeans, indicates a burst of mutations from about 15,000 to 2000 years ago, perhaps due to the appearance, drift, and ultimate elimination of a genetic modifier of mutation rate. Our results suggest that mutation rates can evolve markedly over short evolutionary timescales and suggest the possibility of mapping mutational modifiers.
View details for DOI 10.7554/eLife.24284
View details for Web of Science ID 000401409000001
View details for PubMedID 28440220
-
Tracing the peopling of the world through genomics.
Nature
2017; 541 (7637): 302-310
Abstract
Advances in the sequencing and the analysis of the genomes of both modern and ancient peoples have facilitated a number of breakthroughs in our understanding of human evolutionary history. These include the discovery of interbreeding between anatomically modern humans and extinct hominins; the development of an increasingly detailed description of the complex dispersal of modern humans out of Africa and their population expansion worldwide; and the characterization of many of the genetic adaptions of humans to local environmental conditions. Our interpretation of the evolutionary history and adaptation of humans is being transformed by analyses of these new genomic data.
View details for DOI 10.1038/nature21347
View details for PubMedID 28102248
-
Batch effects and the effective design of single-cell gene expression studies
SCIENTIFIC REPORTS
2017; 7
Abstract
Single-cell RNA sequencing (scRNA-seq) can be used to characterize variation in gene expression levels at high resolution. However, the sources of experimental noise in scRNA-seq are not yet well understood. We investigated the technical variation associated with sample processing using the single-cell Fluidigm C1 platform. To do so, we processed three C1 replicates from three human induced pluripotent stem cell (iPSC) lines. We added unique molecular identifiers (UMIs) to all samples, to account for amplification bias. We found that the major source of variation in the gene expression data was driven by genotype, but we also observed substantial variation between the technical replicates. We observed that the conversion of reads to molecules using the UMIs was impacted by both biological and technical variation, indicating that UMI counts are not an unbiased estimator of gene expression levels. Based on our results, we suggest a framework for effective scRNA-seq studies.
View details for DOI 10.1038/srep39921
View details for Web of Science ID 000391022000001
View details for PubMedID 28045081
View details for PubMedCentralID PMC5206706
-
Mutation Rate Variation is a Primary Determinant of the Distribution of Allele Frequencies in Humans
PLOS GENETICS
2016; 12 (12)
Abstract
The site frequency spectrum (SFS) has long been used to study demographic history and natural selection. Here, we extend this summary by examining the SFS conditional on the alleles found at the same site in other species. We refer to this extension as the "phylogenetically-conditioned SFS" or cSFS. Using recent large-sample data from the Exome Aggregation Consortium (ExAC), combined with primate genome sequences, we find that human variants that occurred independently in closely related primate lineages are at higher frequencies in humans than variants with parallel substitutions in more distant primates. We show that this effect is largely due to sites with elevated mutation rates causing significant departures from the widely-used infinite sites mutation model. Our analysis also suggests substantial variation in mutation rates even among mutations involving the same nucleotide changes. In summary, we show that variable mutation rates are key determinants of the SFS in humans.
View details for DOI 10.1371/journal.pgen.1006489
View details for Web of Science ID 000392138700034
View details for PubMedID 27977673
View details for PubMedCentralID PMC5157949
-
A Bibliometric History of the Journal GENETICS
GENETICS
2016; 204 (4): 1337-1342
View details for DOI 10.1534/genetics.116.196964
View details for Web of Science ID 000390765500004
View details for PubMedID 27927899
View details for PubMedCentralID PMC5161266
-
Detection of human adaptation during the past 2000 years.
Science
2016
Abstract
Detection of recent natural selection is a challenging problem in population genetics. Here we introduce the singleton density score (SDS), a method to infer very recent changes in allele frequencies from contemporary genome sequences. Applied to data from the UK10K Project, SDS reflects allele frequency changes in the ancestors of modern Britons during the past ~2000 to 3000 years. We see strong signals of selection at lactase and the major histocompatibility complex, and in favor of blond hair and blue eyes. For polygenic adaptation, we find that recent selection for increased height has driven allele frequency shifts across most of the genome. Moreover, we identify shifts associated with other complex traits, suggesting that polygenic adaptation has played a pervasive role in shaping genotypic and phenotypic variation in modern humans.
View details for PubMedID 27738015
-
Lineage-specific and single-cell chromatin accessibility charts human hematopoiesis and leukemia evolution.
Nature genetics
2016; 48 (10): 1193-1203
Abstract
We define the chromatin accessibility and transcriptional landscapes in 13 human primary blood cell types that span the hematopoietic hierarchy. Exploiting the finding that the enhancer landscape better reflects cell identity than mRNA levels, we enable 'enhancer cytometry' for enumeration of pure cell types from complex populations. We identify regulators governing hematopoietic differentiation and further show the lineage ontogeny of genetic elements linked to diverse human diseases. In acute myeloid leukemia (AML), chromatin accessibility uncovers unique regulatory evolution in cancer cells with a progressively increasing mutation burden. Single AML cells exhibit distinctive mixed regulome profiles corresponding to disparate developmental stages. A method to account for this regulatory heterogeneity identified cancer-specific deviations and implicated HOX factors as key regulators of preleukemic hematopoietic stem cell characteristics. Thus, regulome dynamics can provide diverse insights into hematopoietic development and disease.
View details for DOI 10.1038/ng.3646
View details for PubMedID 27526324
-
Genetic variation in MHC proteins is associated with T cell receptor expression biases.
Nature genetics
2016; 48 (9): 995-1002
Abstract
In each individual, a highly diverse T cell receptor (TCR) repertoire interacts with peptides presented by major histocompatibility complex (MHC) molecules. Despite extensive research, it remains controversial whether germline-encoded TCR-MHC contacts promote TCR-MHC specificity and, if so, whether differences exist in TCR V gene compatibilities with different MHC alleles. We applied expression quantitative trait locus (eQTL) mapping to test for associations between genetic variation and TCR V gene usage in a large human cohort. We report strong trans associations between variation in the MHC locus and TCR V gene usage. Fine-mapping of the association signals identifies specific amino acids from MHC genes that bias V gene usage, many of which contact or are spatially proximal to the TCR or peptide in the TCR-peptide-MHC complex. Hence, these MHC variants, several of which are linked to autoimmune diseases, can directly affect TCR-MHC interaction. These results provide the first examples of trans-QTL effects mediated by protein-protein interactions and are consistent with intrinsic TCR-MHC specificity.
View details for DOI 10.1038/ng.3625
View details for PubMedID 27479906
View details for PubMedCentralID PMC5010864
-
Genome-wide association study of behavioral, physiological and gene expression traits in outbred CFW mice.
Nature genetics
2016; 48 (8): 919-926
Abstract
Although mice are the most widely used mammalian model organism, genetic studies have suffered from limited mapping resolution due to extensive linkage disequilibrium (LD) that is characteristic of crosses among inbred strains. Carworth Farms White (CFW) mice are a commercially available outbred mouse population that exhibits rapid LD decay in comparison to other available mouse populations. We performed a genome-wide association study (GWAS) of behavioral, physiological and gene expression phenotypes using 1,200 male CFW mice. We used genotyping by sequencing (GBS) to obtain genotypes at 92,734 SNPs. We also measured gene expression using RNA sequencing in three brain regions. Our study identified numerous behavioral, physiological and expression quantitative trait loci (QTLs). We integrated the behavioral QTL and eQTL results to implicate specific genes, including Azi2 in sensitivity to methamphetamine and Zmynd11 in anxiety-like behavior. The combination of CFW mice, GBS and RNA sequencing constitutes a powerful approach to GWAS in mice.
View details for DOI 10.1038/ng.3609
View details for PubMedID 27376237
View details for PubMedCentralID PMC4963286
-
Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling.
eLife
2016; 5
Abstract
Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite known limitations of mass spectrometry in detecting proteins expressed at low levels, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.
View details for DOI 10.7554/eLife.13328
View details for PubMedID 27232982
View details for PubMedCentralID PMC4940163
-
Coregulation of tandem duplicate genes slows evolution of subfunctionalization in mammals
SCIENCE
2016; 352 (6288): 1009-1013
Abstract
Gene duplication is a fundamental process in genome evolution. However, most young duplicates are degraded by loss-of-function mutations, and the factors that allow some duplicate pairs to survive long-term remain controversial. One class of models to explain duplicate retention invokes sub- or neofunctionalization, whereas others focus on sharing of gene dosage. RNA-sequencing data from 46 human and 26 mouse tissues indicate that subfunctionalization of expression evolves slowly and is rare among duplicates that arose within the placental mammals, possibly because tandem duplicates are coregulated by shared genomic elements. Instead, consistent with the dosage-sharing hypothesis, most young duplicates are down-regulated to match expression levels of single-copy genes. Thus, dosage sharing of expression allows for the initial survival of mammalian duplicates, followed by slower functional adaptation enabling long-term preservation.
View details for DOI 10.1126/science.aad8411
View details for Web of Science ID 000376147800053
View details for PubMedID 27199432
-
RNA splicing is a primary link between genetic variation and disease
SCIENCE
2016; 352 (6285): 600-604
Abstract
Noncoding variants play a central role in the genetics of complex traits, but we still lack a full understanding of the molecular pathways through which they act. We quantified the contribution of cis-acting genetic effects at all major stages of gene regulation from chromatin to proteins, in Yoruba lymphoblastoid cell lines (LCLs). About 65% of expression quantitative trait loci (eQTLs) have primary effects on chromatin, whereas the remaining eQTLs are enriched in transcribed regions. Using a novel method, we also detected 2893 splicing QTLs, most of which have little or no effect on gene-level expression. These splicing QTLs are major contributors to complex traits, roughly on a par with variants that affect gene expression levels. Our study provides a comprehensive view of the mechanisms linking genetic variation to variation in human gene regulation.
View details for DOI 10.1126/science.aad9417
View details for Web of Science ID 000374998600048
View details for PubMedID 27126046
-
Genetic Variation, Not Cell Type of Origin, Underlies the Majority of Identifiable Regulatory Differences in iPSCs.
PLoS genetics
2016; 12 (1)
Abstract
The advent of induced pluripotent stem cells (iPSCs) revolutionized human genetics by allowing us to generate pluripotent cells from easily accessible somatic tissues. This technology can have immense implications for regenerative medicine, but iPSCs also represent a paradigm shift in the study of complex human phenotypes, including gene regulation and disease. Yet, an unresolved caveat of the iPSC model system is the extent to which reprogrammed iPSCs retain residual phenotypes from their precursor somatic cells. To directly address this issue, we used an effective study design to compare regulatory phenotypes between iPSCs derived from two types of commonly used somatic precursor cells. We find a remarkably small number of differences in DNA methylation and gene expression levels between iPSCs derived from different somatic precursors. Instead, we demonstrate genetic variation is associated with the majority of identifiable variation in DNA methylation and gene expression levels. We show that the cell type of origin only minimally affects gene expression levels and DNA methylation in iPSCs, and that genetic variation is the main driver of regulatory differences between iPSCs of different donors. Our findings suggest that studies using iPSCs should focus on additional individuals rather than clones from the same individual.
View details for DOI 10.1371/journal.pgen.1005793
View details for Web of Science ID 000369368200031
View details for PubMedID 26812582
View details for PubMedCentralID PMC4727884
-
Abundant contribution of short tandem repeats to gene expression variation in humans.
Nature genetics
2016; 48 (1): 22-29
Abstract
The contribution of repetitive elements to quantitative human traits is largely unknown. Here we report a genome-wide survey of the contribution of short tandem repeats (STRs), which constitute one of the most polymorphic and abundant repeat classes, to gene expression in humans. Our survey identified 2,060 significant expression STRs (eSTRs). These eSTRs were replicable in orthogonal populations and expression assays. We used variance partitioning to disentangle the contribution of eSTRs from that of linked SNPs and indels and found that eSTRs contribute 10-15% of the cis heritability mediated by all common variants. Further functional genomic analyses showed that eSTRs are enriched in conserved regions, colocalize with regulatory elements and may modulate certain histone modifications. By analyzing known genome-wide association study (GWAS) signals and searching for new associations in 1,685 whole genomes from deeply phenotyped individuals, we found that eSTRs are enriched in various clinically relevant conditions. These results highlight the contribution of STRs to the genetic architecture of quantitative human traits.
View details for DOI 10.1038/ng.3461
View details for Web of Science ID 000367255300009
View details for PubMedID 26642241
View details for PubMedCentralID PMC4909355
-
Whole Genome Sequencing Identifies a Novel Factor Required for Secretory Granule Maturation in Tetrahymena thermophila.
G3 (Bethesda, Md.)
2016; 6 (8): 2505-2516
Abstract
Unbiased genetic approaches have a unique ability to identify novel genes associated with specific biological pathways. Thanks to next generation sequencing, forward genetic strategies can be expanded to a wider range of model organisms. The formation of secretory granules, called mucocysts, in the ciliate Tetrahymena thermophila relies, in part, on ancestral lysosomal sorting machinery, but is also likely to involve novel factors. In prior work, multiple strains with defects in mucocyst biogenesis were generated by nitrosoguanidine mutagenesis, and characterized using genetic and cell biological approaches, but the genetic lesions themselves were unknown. Here, we show that analyzing one such mutant by whole genome sequencing reveals a novel factor in mucocyst formation. Strain UC620 has both morphological and biochemical defects in mucocyst maturation, a process analogous to dense core granule maturation in animals. Illumina sequencing of a pool of UC620 F2 clones identified a missense mutation in a novel gene called MMA1 (Mucocyst maturation). The defects in UC620 were rescued by expression of a wild-type copy of MMA1, and disrupting MMA1 in an otherwise wild-type strain phenocopies UC620. The product of MMA1, characterized as a CFP-tagged copy, encodes a large soluble cytosolic protein. A small fraction of Mma1p-CFP is pelletable, which may reflect association with endosomes. The gene has no identifiable homologs except in other Tetrahymena species, and therefore represents an evolutionarily recent innovation that is required for granule maturation.
View details for DOI 10.1534/g3.116.028878
View details for PubMedID 27317773
View details for PubMedCentralID PMC4978903
-
WASP: allele-specific software for robust molecular quantitative trait locus discovery
NATURE METHODS
2015; 12 (11): 1061-1063
View details for DOI 10.1038/NMETH.3582
View details for PubMedID 26366987
-
Genetic Control of Chromatin States in Humans Involves Local and Distal Chromosomal Interactions
CELL
2015; 162 (5): 1051-1065
Abstract
Deciphering the impact of genetic variants on gene regulation is fundamental to understanding human disease. Although gene regulation often involves long-range interactions, it is unknown to what extent non-coding genetic variants influence distal molecular phenotypes. Here, we integrate chromatin profiling for three histone marks in lymphoblastoid cell lines (LCLs) from 75 sequenced individuals with LCL-specific Hi-C and ChIA-PET-based chromatin contact maps to uncover one of the largest collections of local and distal histone quantitative trait loci (hQTLs). Distal QTLs are enriched within topologically associated domains and exhibit largely concordant variation of chromatin state coordinated by proximal and distal non-coding genetic variants. Histone QTLs are enriched for common variants associated with autoimmune diseases and enable identification of putative target genes of disease-associated variants from genome-wide association studies. These analyses provide insights into how genetic variation can affect human disease phenotypes by coordinated changes in chromatin at interacting regulatory elements.
View details for DOI 10.1016/j.cell.2015.07.048
View details for Web of Science ID 000360589900015
View details for PubMedID 26300125
View details for PubMedCentralID PMC4556133
-
The Genotype-Tissue Expression (GTEx) pilot analysis: Multitissue gene regulation in humans
SCIENCE
2015; 348 (6235): 648-660
Abstract
Understanding the functional consequences of genetic variation, and how it affects complex human disease and quantitative traits, remains a critical challenge for biomedicine. We present an analysis of RNA sequencing data from 1641 samples across 43 tissues from 175 individuals, generated as part of the pilot phase of the Genotype-Tissue Expression (GTEx) project. We describe the landscape of gene expression across tissues, catalog thousands of tissue-specific and shared regulatory expression quantitative trait loci (eQTL) variants, describe complex network relationships, and identify signals from genome-wide association studies explained by eQTLs. These findings provide a systematic understanding of the cellular and biological consequences of human genetic variation and of the heterogeneity of such effects among a diverse set of human tissues.
View details for DOI 10.1126/science.1262110
View details for Web of Science ID 000354045700036
View details for PubMedCentralID PMC4547484
-
Reprogramming LCLs to iPSCs Results in Recovery of Donor-Specific Gene Expression Signature
PLOS GENETICS
2015; 11 (5)
Abstract
Renewable in vitro cell cultures, such as lymphoblastoid cell lines (LCLs), have facilitated studies that contributed to our understanding of genetic influence on human traits. However, the degree to which cell lines faithfully maintain differences in donor-specific phenotypes is still debated. We have previously reported that standard cell line maintenance practice results in a loss of donor-specific gene expression signatures in LCLs. An alternative to the LCL model is the induced pluripotent stem cell (iPSC) system, which carries the potential to model tissue-specific physiology through the use of differentiation protocols. Still, existing LCL banks represent an important source of starting material for iPSC generation, and it is possible that the disruptions in gene regulation associated with long-term LCL maintenance could persist through the reprogramming process. To address this concern, we studied the effect of reprogramming mature LCL cultures from six unrelated donors to iPSCs on the ensuing gene expression patterns within and between individuals. We show that the reprogramming process results in a recovery of donor-specific gene regulatory signatures, increasing the number of genes with a detectable donor effect by an order of magnitude. The proportion of variation in gene expression statistically attributed to donor increases from 6.9% in LCLs to 24.5% in iPSCs (P < 10^-15). Since environmental contributions are unlikely to be a source of individual variation in our system of highly passaged cultured cell lines, our observations suggest that the effect of genotype on gene regulation is more pronounced in iPSCs than in LCLs. Our findings indicate that iPSCs can be a powerful model system for studies of phenotypic variation across individuals in general, and the genetic association with variation in gene regulation in particular. We further conclude that LCLs are an appropriate starting material for iPSC generation.
View details for DOI 10.1371/journal.pgen.1005216
View details for Web of Science ID 000355305200032
View details for PubMedID 25950834
View details for PubMedCentralID PMC4423863
-
Genomic variation. Impact of regulatory variation from RNA to protein.
Science
2015; 347 (6222): 664-667
Abstract
The phenotypic consequences of expression quantitative trait loci (eQTLs) are presumably due to their effects on protein expression levels. Yet the impact of genetic variation, including eQTLs, on protein levels remains poorly understood. To address this, we mapped genetic variants that are associated with eQTLs, ribosome occupancy (rQTLs), or protein abundance (pQTLs). We found that most QTLs are associated with transcript expression levels, with consequent effects on ribosome and protein levels. However, eQTLs tend to have significantly reduced effect sizes on protein levels, which suggests that their potential impact on downstream phenotypes is often attenuated or buffered. Additionally, we identified a class of cis QTLs that affect protein abundance with little or no effect on messenger RNA or ribosome levels, which suggests that they may arise from differences in posttranslational regulation.
View details for DOI 10.1126/science.1260793
View details for Web of Science ID 000349145200045
View details for PubMedID 25657249
-
The Genetic and Mechanistic Basis for Variation in Gene Regulation
PLOS GENETICS
2015; 11 (1)
Abstract
It is now well established that noncoding regulatory variants play a central role in the genetics of common diseases and in evolution. However, until recently, we have known little about the mechanisms by which most regulatory variants act. For instance, what types of functional elements in DNA, RNA, or proteins are most often affected by regulatory variants? Which stages of gene regulation are typically altered? How can we predict which variants are most likely to impact regulation in a given cell type? Recent studies, in many cases using quantitative trait loci (QTL)-mapping approaches in cell lines or tissue samples, have provided us with considerable insight into the properties of genetic loci that have regulatory roles. Such studies have uncovered novel biochemical regulatory interactions and led to the identification of previously unrecognized regulatory mechanisms. We have learned that genetic variation is often directly associated with variation in regulatory activities (namely, we can map regulatory QTLs, not just expression QTLs [eQTLs]), and we have taken the first steps towards understanding the causal order of regulatory events (for example, the role of pioneer transcription factors). Yet, in most cases, we still do not know how to interpret overlapping combinations of regulatory interactions, and we are still far from being able to predict how variation in regulatory mechanisms is propagated through a chain of interactions to eventually result in changes in gene expression profiles.
View details for DOI 10.1371/journal.pgen.1004857
View details for Web of Science ID 000349314600009
View details for PubMedID 25569255
View details for PubMedCentralID PMC4287341
-
msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding.
PloS one
2015; 10 (9)
Abstract
Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.
View details for DOI 10.1371/journal.pone.0138030
View details for PubMedID 26406244
View details for PubMedCentralID PMC4583425
-
Methylation QTLs Are Associated with Coordinated Changes in Transcription Factor Binding, Histone Modifications, and Gene Expression Levels
PLOS GENETICS
2014; 10 (9): e1004663
Abstract
DNA methylation is an important epigenetic regulator of gene expression. Recent studies have revealed widespread associations between genetic variation and methylation levels. However, the mechanistic links between genetic variation and methylation remain unclear. To begin addressing this gap, we collected methylation data at ~300,000 loci in lymphoblastoid cell lines (LCLs) from 64 HapMap Yoruba individuals, and genome-wide bisulfite sequence data in ten of these individuals. We identified (at an FDR of 10%) 13,915 cis methylation QTLs (meQTLs), i.e., CpG sites in which changes in DNA methylation are associated with genetic variation at proximal loci. We found that meQTLs are frequently associated with changes in methylation at multiple CpGs across regions of up to 3 kb. Interestingly, meQTLs are also frequently associated with variation in other properties of gene regulation, including histone modifications, DNase I accessibility, chromatin accessibility, and expression levels of nearby genes. These observations suggest that genetic variants may lead to coordinated molecular changes in all of these regulatory phenotypes. One plausible driver of coordinated changes in different regulatory mechanisms is variation in transcription factor (TF) binding. Indeed, we found that SNPs that change predicted TF binding affinities are significantly enriched for associations with DNA methylation at nearby CpGs.
View details for DOI 10.1371/journal.pgen.1004663
View details for Web of Science ID 000343009600059
View details for PubMedID 25233095
View details for PubMedCentralID PMC4169251
-
fastSTRUCTURE: Variational Inference of Population Structure in Large SNP Data Sets
GENETICS
2014; 197 (2): 573-U207
View details for DOI 10.1534/genetics.114.164350
View details for Web of Science ID 000338697000013
-
fastSTRUCTURE: variational inference of population structure in large SNP data sets.
Genetics
2014; 197 (2): 573-89
Abstract
Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.
View details for DOI 10.1534/genetics.114.164350
View details for PubMedID 24700103
View details for PubMedCentralID PMC4063916
-
The deleterious mutation load is insensitive to recent population history.
Nature genetics
2014; 46 (3): 220-224
Abstract
Human populations have undergone major changes in population size in the past 100,000 years, including recent rapid growth. How these demographic events have affected the burden of deleterious mutations in individuals and the frequencies of disease mutations in populations remains unclear. We use population genetic models to show that recent human demography has probably had little impact on the average burden of deleterious mutations. This prediction is supported by two exome sequence data sets showing that individuals of west African and European ancestry carry very similar burdens of damaging mutations. We further show that for many diseases, rare alleles are unlikely to contribute a large fraction of the heritable variation, and therefore the impact of recent growth is likely to be modest. However, for those diseases that have a direct impact on fitness, strongly deleterious rare mutations probably do have an important role, and recent growth will have increased their impact.
View details for DOI 10.1038/ng.2896
View details for PubMedID 24509481
-
The functional consequences of variation in transcription factor binding.
PLoS genetics
2014; 10 (3)
Abstract
One goal of human genetics is to understand how the information for precise and dynamic gene expression programs is encoded in the genome. The interactions of transcription factors (TFs) with DNA regulatory elements clearly play an important role in determining gene expression outputs, yet the regulatory logic underlying functional transcription factor binding is poorly understood. Many studies have focused on characterizing the genomic locations of TF binding, yet it is unclear to what extent TF binding at any specific locus has functional consequences with respect to gene expression output. To evaluate the context of functional TF binding we knocked down 59 TFs and chromatin modifiers in one HapMap lymphoblastoid cell line. We then identified genes whose expression was affected by the knockdowns. We intersected the gene expression data with transcription factor binding data (based on ChIP-seq and DNase-seq) within 10 kb of the transcription start sites of expressed genes. This combination of data allowed us to infer functional TF binding. Using this approach, we found that only a small subset of genes bound by a factor were differentially expressed following the knockdown of that factor, suggesting that most interactions between TF and chromatin do not result in measurable changes in gene expression levels of putative target genes. We found that functional TF binding is enriched in regulatory elements that harbor a large number of TF binding sites, at sites with predicted higher binding affinity, and at sites that are enriched in genomic regions annotated as "active enhancers."
View details for DOI 10.1371/journal.pgen.1004226
View details for PubMedID 24603674
View details for PubMedCentralID PMC3945204
-
The chromatin architectural proteins HMGD1 and H1 bind reciprocally and have opposite effects on chromatin structure and gene regulation
BMC GENOMICS
2014; 15
Abstract
Chromatin architectural proteins interact with nucleosomes to modulate chromatin accessibility and higher-order chromatin structure. While these proteins are almost certainly important for gene regulation, they have been studied far less than the core histone proteins. Here we describe the genomic distributions and functional roles of two chromatin architectural proteins: histone H1 and the high mobility group protein HMGD1 in Drosophila S2 cells. Using ChIP-seq, biochemical and gene-specific approaches, we find that HMGD1 binds to highly accessible regulatory chromatin and active promoters. In contrast, H1 is primarily associated with heterochromatic regions marked with repressive histone marks. We find that the ratio of HMGD1 to H1 binding is a better predictor of gene activity than either protein by itself, which suggests that reciprocal binding between these proteins is important for gene regulation. Using knockdown experiments, we show that HMGD1 and H1 affect the occupancy of the other protein, change nucleosome repeat length and modulate gene expression. Collectively, our data suggest that dynamic and mutually exclusive binding of H1 and HMGD1 to nucleosomes and their linker sequences may control the fluid chromatin structure that is required for transcriptional regulation. This study provides a framework to further study the interplay between chromatin architectural proteins and epigenetics in gene regulation.
View details for DOI 10.1186/1471-2164-15-92
View details for Web of Science ID 000332575900002
View details for PubMedID 24484546
-
Archaic humans: Four makes a party.
Nature
2014; 505 (7481): 32-4
View details for DOI 10.1038/nature12847
View details for PubMedID 24352230
-
The effect of freeze-thaw cycles on gene expression levels in lymphoblastoid cell lines.
PloS one
2014; 9 (9)
Abstract
Epstein-Barr virus (EBV) transformed lymphoblastoid cell lines (LCLs) are a widely used renewable resource for functional genomic studies in humans. The ability to accumulate multidimensional data pertaining to the same individual cell lines, from complete genomic sequences to detailed gene regulatory profiles, further enhances the utility of LCLs as a model system. However, the extent to which LCLs are a faithful model system is relatively unknown. We have previously shown that gene expression profiles of newly established LCLs maintain a strong individual component. Here, we extend our study to investigate the effect of freeze-thaw cycles on gene expression patterns in mature LCLs, especially in the context of inter-individual variation in gene expression. We report a profound difference in the gene expression profiles of newly established and mature LCLs. Once newly established LCLs undergo a freeze-thaw cycle, the individual specific gene expression signatures become much less pronounced as the gene expression levels in LCLs from different individuals converge to a more uniform profile, which reflects a mature transformed B cell phenotype. We found that previously identified eQTLs are enriched among the relatively few genes whose regulations in mature LCLs maintain marked individual signatures. We thus conclude that while insight drawn from gene regulatory studies in mature LCLs may generally not be affected by the artificial nature of the LCL model system, many aspects of primary B cell biology cannot be observed and studied in mature LCL cultures.
View details for DOI 10.1371/journal.pone.0107166
View details for PubMedID 25192014
View details for PubMedCentralID PMC4156430
-
Epigenetic modifications are associated with inter-species gene expression variation in primates
GENOME BIOLOGY
2014; 15 (12)
View details for DOI 10.1186/s13059-014-0547-3
View details for Web of Science ID 000346609500019
-
Primate Transcript and Protein Expression Levels Evolve Under Compensatory Selection Pressures
SCIENCE
2013; 342 (6162): 1100-1104
Abstract
Changes in gene regulation have likely played an important role in the evolution of primates. Differences in messenger RNA (mRNA) expression levels across primates have often been documented; however, it is not yet known to what extent measurements of divergence in mRNA levels reflect divergence in protein expression levels, which are probably more important in determining phenotypic differences. We used high-resolution, quantitative mass spectrometry to collect protein expression measurements from human, chimpanzee, and rhesus macaque lymphoblastoid cell lines and compared them to transcript expression data from the same samples. We found dozens of genes with significant expression differences between species at the mRNA level yet little or no difference in protein expression. Overall, our data suggest that protein expression levels evolve under stronger evolutionary constraint than mRNA levels.
View details for DOI 10.1126/science.1242379
View details for Web of Science ID 000327518600059
View details for PubMedID 24136357
-
Identification of Genetic Variants That Affect Histone Modifications in Human Cells
SCIENCE
2013; 342 (6159): 747-749
Abstract
Histone modifications are important markers of function and chromatin state, yet the DNA sequence elements that direct them to specific genomic locations are poorly understood. Here, we identify hundreds of quantitative trait loci, genome-wide, that affect histone modification or RNA polymerase II (Pol II) occupancy in Yoruba lymphoblastoid cell lines (LCLs). In many cases, the same variant is associated with quantitative changes in multiple histone marks and Pol II, as well as in deoxyribonuclease I sensitivity and nucleosome positioning. Transcription factor binding site polymorphisms are correlated overall with differences in local histone modification, and we identify specific transcription factors whose binding leads to histone modification in LCLs. Furthermore, variants that affect chromatin at distal regulatory sites frequently also direct changes in chromatin and gene expression at associated promoters.
View details for DOI 10.1126/science.1242429
View details for Web of Science ID 000326647600046
View details for PubMedID 24136359