Current Research and Scholarly Interests
We study the regulation and evolution of gene expression using a combination of experimental and computational approaches.
Our work brings together quantitative genetics, genomics, epigenetics, and evolutionary biology to achieve a deeper understanding of how genetic variation within and between species affects genome-wide gene expression and ultimately shapes the phenotypic diversity of life.
- Plant Biology, Evolution, and Ecology
BIO 43 (Spr)
Independent Studies (10)
- Advanced Research Laboratory in Experimental Biology
BIO 199 (Aut, Win, Spr, Sum)
- Biomedical Informatics Teaching Methods
BIOMEDIN 290 (Aut, Win, Spr, Sum)
- Directed Reading and Research
BIOMEDIN 299 (Aut, Win, Spr, Sum)
- Directed Reading in Biology
BIO 198 (Aut, Win, Spr, Sum)
- Graduate Research
BIO 300 (Aut, Win, Spr, Sum)
- Medical Scholars Research
BIOMEDIN 370 (Aut, Win, Spr, Sum)
- Out-of-Department Advanced Research Laboratory in Experimental Biology
BIO 199X (Sum)
- Out-of-Department Directed Reading
BIO 198X (Aut, Win, Spr, Sum)
- Out-of-Department Graduate Research
BIO 300X (Aut, Win, Spr, Sum)
- Teaching of Biology
BIO 290 (Aut, Win, Spr)
Graduate and Fellowship Programs
Biology (School of Humanities and Sciences) (PhD Program)
Biomedical Informatics (PhD Program)
Genetic conflict reflected in tissue-specific maps of genomic imprinting in human and mouse.
2015; 47 (5): 544-549
Genomic imprinting is an epigenetic process that restricts gene expression to either the maternally or paternally inherited allele. Many theories have been proposed to explain its evolutionary origin, but understanding has been limited by a paucity of data mapping the breadth and dynamics of imprinting within any organism. We generated an atlas of imprinting spanning 33 mouse and 45 human developmental stages and tissues. Nearly all imprinted genes were imprinted in early development and either retained their parent-of-origin expression in adults or lost it completely. Consistent with an evolutionary signature of parental conflict, imprinted genes were enriched for coexpressed pairs of maternally and paternally expressed genes, showed accelerated expression divergence between human and mouse, and were more highly expressed than their non-imprinted orthologs in other species. Our approach demonstrates a general framework for the discovery of imprinting in any species and sheds light on the causes and consequences of genomic imprinting in mammals.
View details for DOI 10.1038/ng.3274
View details for PubMedID 25848752
Evolution at two levels of gene expression in yeast
2014; 24 (3): 411-421
Despite the greater functional importance of protein levels, our knowledge of gene expression evolution is based almost entirely on studies of mRNA levels. In contrast, our understanding of how translational regulation evolves has lagged far behind. Here we have applied ribosome profiling, which measures both global mRNA levels and their translation rates, to two species of Saccharomyces yeast and their interspecific hybrid in order to assess the relative contributions of changes in mRNA abundance and translation to regulatory evolution. We report that both cis- and trans-acting regulatory divergence in translation are abundant, affecting at least 35% of genes. The majority of translational divergence acts to buffer changes in mRNA abundance, suggesting a widespread role for stabilizing selection acting across regulatory levels. Nevertheless, we observe evidence of lineage-specific selection acting on several yeast functional modules, including instances of reinforcing selection acting at both levels of regulation. Finally, we also uncover multiple instances of stop-codon readthrough that are conserved between species. Our analysis reveals the underappreciated complexity of post-transcriptional regulatory divergence and indicates that partitioning the search for the locus of selection into the binary categories of "coding" versus "regulatory" may overlook a significant source of selection, acting at multiple regulatory levels along the path from genotype to phenotype.
View details for DOI 10.1101/gr.165522.113
View details for Web of Science ID 000332246100005
Gene expression drives local adaptation in humans
2013; 23 (7): 1089-1096
The molecular basis of adaptation, and in particular the relative roles of protein-coding versus gene expression changes, has long been the subject of speculation and debate. Recently, the genotyping of diverse human populations has led to the identification of many putative "local adaptations" that differ between populations. Here I show that these local adaptations are over 10-fold more likely to affect gene expression than amino acid sequence. In addition, a novel framework for identifying polygenic local adaptations detects recent positive selection on the expression levels of genes involved in UV radiation response, immune cell proliferation, and diabetes-related pathways. These results provide the first examples of polygenic gene expression adaptation in humans, as well as the first genome-scale support for the hypothesis that changes in gene expression have driven human adaptation.
View details for DOI 10.1101/gr.152710.112
View details for Web of Science ID 000321119900006
View details for PubMedID 23539138
Polygenic cis-regulatory adaptation in the evolution of yeast pathogenicity
2012; 22 (10): 1930-1939
The acquisition of new genes, via horizontal transfer or gene duplication/diversification, has been the dominant mechanism thus far implicated in the evolution of microbial pathogenicity. In contrast, the role of many other modes of evolution, such as changes in gene expression regulation, remains unknown. A transition to a pathogenic lifestyle has recently taken place in some lineages of the budding yeast Saccharomyces cerevisiae. Here we identify a module of physically interacting proteins involved in endocytosis that has experienced selective sweeps for multiple cis-regulatory mutations that down-regulate gene expression levels in a pathogenic yeast. To test if these adaptations affect virulence, we created a panel of single-allele knockout strains whose hemizygous state mimics the genes' adaptive down-regulations, and measured their virulence in a mammalian host. Despite having no growth advantage in standard laboratory conditions, nearly all of the strains were more virulent than their wild-type progenitor, suggesting that these adaptations likely played a role in the evolution of pathogenicity. Furthermore, genetic variants at these loci were associated with clinical origin across 88 diverse yeast strains, suggesting the adaptations may have contributed to the virulence of a wide range of clinical isolates. We also detected pleiotropic effects of these adaptations on a wide range of morphological traits, which appear to have been mitigated by compensatory mutations at other loci. These results suggest that cis-regulatory adaptation can occur at the level of physically interacting modules and that one such polygenic adaptation led to increased virulence during the evolution of a pathogenic yeast.
View details for DOI 10.1101/gr.134080.111
View details for Web of Science ID 000309325900010
View details for PubMedID 22645260
Common variants spanning PLK4 are associated with mitotic-origin aneuploidy in human embryos
2015; 348 (6231): 235-238
Aneuploidy, the inheritance of an atypical chromosome complement, is common in early human development and is the primary cause of pregnancy loss. By screening day-3 embryos during in vitro fertilization cycles, we identified an association between aneuploidy of putative mitotic origin and linked genetic variants on chromosome 4 of maternal genomes. This associated region contains a candidate gene, Polo-like kinase 4 (PLK4), that plays a well-characterized role in centriole duplication and has the ability to alter mitotic fidelity upon minor dysregulation. Mothers with the high-risk genotypes contributed fewer embryos for testing at day 5, suggesting that their embryos are less likely to survive to blastocyst formation. The associated region coincides with a signature of a selective sweep in ancient humans, suggesting that the causal variant was either the target of selection or hitchhiked to substantial frequency.
View details for DOI 10.1126/science.aaa3337
View details for Web of Science ID 000352613700046
View details for PubMedID 25859044
Discordance of DNA Methylation Variance Between two Accessible Human Tissues.
2015; 5: 8257
Population epigenetic studies have been seeking to identify differences in DNA methylation between specific exposures, demographic factors, or diseases in accessible tissues, but relatively little is known about how inter-individual variability differs between these tissues. This study presents an analysis of DNA methylation differences between matched peripheral blood mononuclear cells (PBMCs) and buccal epithelial cells (BECs), the two most accessible tissues for population studies, in 998 promoter-located CpG sites. Specifically, we compared probe-wise DNA methylation variance, and how this variance related to demographic factors across the two tissues. PBMCs had overall higher DNA methylation than BECs, and the two tissues tended to differ most at genomic regions of low CpG density. Furthermore, although both tissues showed appreciable probe-wise variability, the specific regions and magnitude of variability differed strongly between tissues. Lastly, through exploratory association analysis, we found indication of differential association of BEC and PBMC with demographic variables. The work presented here offers insight into variability of DNA methylation between individuals and across tissues and helps guide decisions on the suitability of buccal epithelial or peripheral mononuclear cells for the biological questions explored by epigenetic studies in human populations.
View details for DOI 10.1038/srep08257
View details for PubMedID 25660083
Accounting for biases in riboprofiling data indicates a major role for proline in stalling translation
2014; 24 (12): 2011-2021
The recent advent of ribosome profiling (sequencing of short ribosome-bound fragments of mRNA) has offered an unprecedented opportunity to interrogate the sequence features responsible for modulating translational rates. Nevertheless, numerous analyses of the first riboprofiling data set have produced equivocal and often incompatible results. Here we analyze three independent yeast riboprofiling data sets, including two with much higher coverage than previously available, and find that all three show substantial technical sequence biases that confound interpretations of ribosomal occupancy. After accounting for these biases, we find no effect of previously implicated factors on ribosomal pausing. Rather, we find that incorporation of proline, whose unique side-chain stalls peptide synthesis in vitro, also slows the ribosome in vivo. We also reanalyze a method that implicated positively charged amino acids as the major determinant of ribosomal stalling and demonstrate that it produces false signals of stalling in low-coverage data. Our results suggest that any analysis of riboprofiling data should account for sequencing biases and sparse coverage. To this end, we establish a robust methodology that enables analysis of ribosome profiling data without prior assumptions regarding which positions spanned by the ribosome cause stalling.
View details for DOI 10.1101/gr.175893.114
View details for Web of Science ID 000345810600009
View details for PubMedID 25294246
- Transcript Length Mediates Developmental Timing of Gene Expression Across Drosophila MOLECULAR BIOLOGY AND EVOLUTION 2014; 31 (11): 2879-2889
A Novel Test for Selection on cis-Regulatory Elements Reveals Positive and Negative Selection Acting on Mammalian Transcriptional Enhancers
MOLECULAR BIOLOGY AND EVOLUTION
2013; 30 (11): 2509-2518
Measuring natural selection on genomic elements involved in the cis-regulation of gene expression, such as transcriptional enhancers and promoters, is critical for understanding the evolution of genomes, yet it remains a major challenge. Many studies have attempted to detect positive or negative selection in these noncoding elements by searching for those with the fastest or slowest rates of evolution, but this can be problematic. Here, we introduce a new approach to this issue, and demonstrate its utility on three mammalian transcriptional enhancers. Using results from saturation mutagenesis studies of these enhancers, we classified all possible point mutations as upregulating, downregulating, or silent, and determined which of these mutations have occurred on each branch of a phylogeny. Applying a framework analogous to Ka/Ks in protein-coding genes, we measured the strength of selection on upregulating and downregulating mutations, in specific branches as well as entire phylogenies. We discovered distinct modes of selection acting on different enhancers: although all three have experienced negative selection against downregulating mutations, the selection pressures on upregulating mutations vary. In one case, we detected positive selection for upregulation, whereas the other two had no detectable selection on upregulating mutations. Our methodology is applicable to the growing number of saturation mutagenesis data sets, and provides a detailed picture of the mode and strength of natural selection acting on cis-regulatory elements.
View details for DOI 10.1093/molbev/mst134
View details for Web of Science ID 000326745300011
View details for PubMedID 23904330
Ancient cis-regulatory constraints and the evolution of genome architecture
TRENDS IN GENETICS
2013; 29 (9): 521-528
The order of genes along metazoan chromosomes has generally been thought to be largely random, with few implications for organismal function. However, two recent studies, reporting hundreds of pairs of genes that have remained linked in diverse metazoan species over hundreds of millions of years of evolution, suggest widespread functional implications for gene order. These associations appear to largely reflect cis-regulatory constraints, with either (i) multiple genes sharing transcriptional regulatory elements, or (ii) regulatory elements for a developmental gene being found within a neighboring 'bystander' gene (known as a genomic regulatory block). We discuss implications, questions raised, and new research directions arising from these studies, as well as evidence for similar phenomena in other eukaryotic groups.
View details for DOI 10.1016/j.tig.2013.05.008
View details for Web of Science ID 000324284000006
View details for PubMedID 23791467
- The Molecular Mechanism of a Cis-Regulatory Adaptation in Yeast PLOS GENETICS 2013; 9 (9)
Despite recent advances in our ability to detect adaptive evolution involving the cis-regulation of gene expression, our knowledge of the molecular mechanisms underlying these adaptations has lagged far behind. Across all model organisms, the causal mutations have been discovered for only a handful of gene expression adaptations, and even for these, mechanistic details (e.g. the trans-regulatory factors involved) have not been determined. We previously reported a polygenic gene expression adaptation involving down-regulation of the ergosterol biosynthesis pathway in the budding yeast Saccharomyces cerevisiae. Here we investigate the molecular mechanism of a cis-acting mutation affecting a member of this pathway, ERG28. We show that the causal mutation is a two-base deletion in the promoter of ERG28 that strongly reduces the binding of two transcription factors, Sok2 and Mot3, thus abolishing their regulation of ERG28. This down-regulation increases resistance to a widely used antifungal drug targeting ergosterol, similar to mutations disrupting this pathway in clinical yeast isolates. The identification of the causal genetic variant revealed that the selection likely occurred after the deletion was already present at high frequency in the population, rather than when it was a new mutation. These results provide a detailed view of the molecular mechanism of a cis-regulatory adaptation, and underscore the importance of this view to our understanding of evolution at the molecular level.
View details for DOI 10.1371/journal.pgen.1003813
View details for PubMedID 24068973
- Cell-cycle regulated transcription associates with DNA replication timing in yeast and human GENOME BIOLOGY 2013; 14 (10)
Differences in enhancer activity in mouse and zebrafish reporter assays are often associated with changes in gene expression
Phenotypic evolution in animals is thought to be driven in large part by differences in gene expression patterns, which can result from sequence changes in cis-regulatory elements (cis-changes) or from changes in the expression pattern or function of transcription factors (trans-changes). While isolated examples of trans-changes have been identified, the scale of their overall contribution to regulatory and phenotypic evolution remains unclear. Here, we attempt to examine the prevalence of trans-effects and their potential impact on gene expression patterns in vertebrate evolution by comparing the function of identical human tissue-specific enhancer sequences in two highly divergent vertebrate model systems, mouse and zebrafish. Among 47 human conserved non-coding elements (CNEs) tested in transgenic mouse embryos and in stable zebrafish lines, at least one species-specific expression domain was observed in the majority (83%) of cases, and 36% presented dramatically different expression patterns between the two species. Although some of these discrepancies may be due to the use of different transgenesis systems in mouse and zebrafish, in some instances we found an association between differences in enhancer activity and changes in the endogenous gene expression patterns between mouse and zebrafish, suggesting a potential role for trans-changes in the evolution of gene expression. In total, our results: (i) serve as a cautionary tale for studies investigating the role of human enhancers in different model organisms, and (ii) suggest that changes in the trans environment may play a significant role in the evolution of gene expression in vertebrates.
View details for DOI 10.1186/1471-2164-13-713
View details for Web of Science ID 000313248200001
View details for PubMedID 23253453
Extensive conservation of ancient microsynteny across metazoans due to cis-regulatory constraints
2012; 22 (12): 2356-2367
The order of genes in eukaryotic genomes has generally been assumed to be neutral, since gene order is largely scrambled over evolutionary time. Only a handful of exceptional examples are known, typically involving deeply conserved clusters of tandemly duplicated genes (e.g., Hox genes and histones). Here we report the first systematic survey of microsynteny conservation across metazoans, utilizing 17 genome sequences. We identified nearly 600 pairs of unrelated genes that have remained tightly physically linked in diverse lineages across over 600 million years of evolution. Integrating sequence conservation, gene expression data, gene function, epigenetic marks, and other genomic features, we provide extensive evidence that many conserved ancient linkages involve (1) the coordinated transcription of neighboring genes, or (2) genomic regulatory blocks (GRBs) in which transcriptional enhancers controlling developmental genes are contained within nearby bystander genes. In addition, we generated ChIP-seq data for key histone modifications in zebrafish embryos, which provided further evidence of putative GRBs in embryonic development. Finally, using chromosome conformation capture (3C) assays and stable transgenic experiments, we demonstrate that enhancers within bystander genes drive the expression of genes such as Otx and Islet, critical regulators of central nervous system development across bilaterians. These results suggest that ancient genomic functional associations are far more common than previously thought, involving ~12% of the ancestral bilaterian genome, and that cis-regulatory constraints are crucial in determining metazoan genome architecture.
View details for DOI 10.1101/gr.139725.112
View details for Web of Science ID 000311895500004
View details for PubMedID 22722344
Factors underlying variable DNA methylation in a human community cohort
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2012; 109: 17253-17260
Epigenetics is emerging as an attractive mechanism to explain the persistent genomic embedding of early-life experiences. Tightly linked to chromatin, which packages DNA into chromosomes, epigenetic marks primarily serve to regulate the activity of genes. DNA methylation is the most accessible and characterized component of the many chromatin marks that constitute the epigenome, making it an ideal target for epigenetic studies in human populations. Here, using peripheral blood mononuclear cells collected from a community-based cohort stratified for early-life socioeconomic status, we measured DNA methylation in the promoter regions of more than 14,000 human genes. Using this approach, we broadly assessed and characterized epigenetic variation, identified some of the factors that sculpt the epigenome, and determined its functional relation to gene expression. We found that the leukocyte composition of peripheral blood covaried with patterns of DNA methylation at many sites, as did demographic factors, such as sex, age, and ethnicity. Furthermore, psychosocial factors, such as perceived stress, and cortisol output were associated with DNA methylation, as was early-life socioeconomic status. Interestingly, we determined that DNA methylation was strongly correlated to the ex vivo inflammatory response of peripheral blood mononuclear cells to stimulation with microbial products that engage Toll-like receptors. In contrast, our work found limited effects of DNA methylation marks on the expression of associated genes across individuals, suggesting a more complex relationship than anticipated.
View details for DOI 10.1073/pnas.1121249109
View details for Web of Science ID 000310510500018
View details for PubMedID 23045638
Population-specificity of human DNA methylation
2012; 13 (2)
Ethnic differences in human DNA methylation have been shown for a number of CpG sites, but the genome-wide patterns and extent of these differences are largely unknown. In addition, whether the genetic control of polymorphic DNA methylation is population-specific has not been investigated. Here we measure DNA methylation near the transcription start sites of over 14,000 genes in 180 cell lines derived from one African and one European population. We find population-specific patterns of DNA methylation at over a third of all genes. Furthermore, although the methylation at over a thousand CpG sites is heritable, these heritabilities also differ between populations, suggesting extensive divergence in the genetic control of DNA methylation. In support of this, genetic mapping of DNA methylation reveals that most of the population specificity can be explained by divergence in allele frequencies between populations, and that there is little overlap in genetic associations between populations. These population-specific genetic associations are supported by the patterns of DNA methylation in several hundred brain samples, suggesting that they hold in vivo and across tissues. These results suggest that DNA methylation is highly divergent between populations, and that this divergence may be due in large part to a combination of differences in allele frequencies and complex epistasis or gene × environment interactions.
View details for DOI 10.1186/gb-2012-13-2-r8
View details for Web of Science ID 000305391700001
View details for PubMedID 22322129
Genome-wide approaches to the study of adaptive gene expression evolution: Systematic studies of evolutionary adaptations involving gene expression will allow many fundamental questions in evolutionary biology to be addressed
2011; 33 (6): 469-477
The role of gene expression in evolutionary adaptation has been a subject of debate for over 40 years. cis-regulation of transcription has been proposed to be the primary source of morphological novelty in evolution, though this is based on only a handful of examples. Recently the first genome-wide studies of gene expression adaptation have been published, giving us an initial global view of this process. Systematic studies such as these will allow a number of key questions currently facing the field of gene expression evolution to be addressed.
View details for DOI 10.1002/bies.201000094
View details for Web of Science ID 000291548300012
View details for PubMedID 21538412
Systematic Detection of Polygenic cis-Regulatory Evolution
2011; 7 (3)
The idea that most morphological adaptations can be attributed to changes in the cis-regulation of gene expression levels has been gaining increasing acceptance, despite the fact that only a handful of such cases have so far been demonstrated. Moreover, because each of these cases involves only one gene, we lack any understanding of how natural selection may act on cis-regulation across entire pathways or networks. Here we apply a genome-wide test for selection on cis-regulation to two subspecies of the mouse Mus musculus. We find evidence for lineage-specific selection at over 100 genes involved in diverse processes such as growth, locomotion, and memory. These gene sets implicate candidate genes that are supported by both quantitative trait loci and a validated causality-testing framework, and they predict a number of phenotypic differences, which we confirm in all four cases tested. Our results suggest that gene expression adaptation is widespread and that these adaptations can be highly polygenic, involving cis-regulatory changes at numerous functionally related genes. These coordinated adaptations may contribute to divergence in a wide range of morphological, physiological, and behavioral phenotypes.
View details for DOI 10.1371/journal.pgen.1002023
View details for Web of Science ID 000288996600053
View details for PubMedID 21483757
Genetic validation of whole-transcriptome sequencing for mapping expression affected by cis-regulatory variation
Identifying associations between genotypes and gene expression levels using microarrays has enabled systematic interrogation of regulatory variation underlying complex phenotypes. This approach has vast potential for functional characterization of disease states, but its prohibitive cost, given hundreds to thousands of individual samples from populations have to be genotyped and expression profiled, has limited its widespread application. Here we demonstrate that genomic regions with allele-specific expression (ASE) detected by sequencing cDNA are highly enriched for cis-acting expression quantitative trait loci (cis-eQTL) identified by profiling of 500 animals in parallel, with up to 90% agreement on the allele that is preferentially expressed. We also observed widespread noncoding and antisense ASE and identified several allele-specific alternative splicing variants. Monitoring ASE by sequencing cDNA from as little as one sample is a practical alternative to expression genetics for mapping cis-acting variation that regulates RNA transcription and processing.
View details for DOI 10.1186/1471-2164-11-473
View details for Web of Science ID 000282789200002
View details for PubMedID 20707912
Evidence for widespread adaptive evolution of gene expression in budding yeast
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2010; 107 (7): 2977-2982
Changes in gene expression have been proposed to underlie many, or even most, adaptive differences between species. Despite the increasing acceptance of this view, only a handful of cases of adaptive gene expression evolution have been demonstrated. To address this discrepancy, we introduce a simple test for lineage-specific selection on gene expression. Applying the test to genome-wide gene expression data from the budding yeast Saccharomyces cerevisiae, we find that hundreds of gene expression levels have been subject to lineage-specific selection. Comparing these findings with independent population genetic evidence of selective sweeps suggests that this lineage-specific selection has resulted in recent sweeps at over a hundred genes, most of which led to increased transcript levels. Examination of the implicated genes revealed a specific biochemical pathway--ergosterol biosynthesis--where the expression of multiple genes has been subject to selection for reduced levels. In sum, these results suggest that adaptive evolution of gene expression is common in yeast, that regulatory adaptation can occur at the level of entire pathways, and that similar genome-wide scans may be possible in other species, including humans.
View details for DOI 10.1073/pnas.0912245107
View details for Web of Science ID 000274599500050
View details for PubMedID 20133628
The Quantitative Genetics of Phenotypic Robustness
2010; 5 (1)
Phenotypic robustness, or canalization, has been extensively investigated both experimentally and theoretically. However, it remains unknown to what extent robustness varies between individuals, and whether factors buffering environmental variation also buffer genetic variation. Here we introduce a quantitative genetic approach to these issues, and apply this approach to data from three species. In mice, we find suggestive evidence that for hundreds of gene expression traits, robustness is polymorphic and can be genetically mapped to discrete genomic loci. Moreover, we find that the polymorphisms buffering genetic variation are distinct from those buffering environmental variation. In fact, these two classes have quite distinct mechanistic bases: environmental buffers of gene expression are predominantly sex-specific and trans-acting, whereas genetic buffers are not sex-specific and often cis-acting. Data from studies of morphological and life-history traits in plants and yeast support the distinction between polymorphisms buffering genetic and environmental variation, and further suggest that loci buffering different types of environmental variation do overlap with one another. These preliminary results suggest that naturally occurring polymorphisms affecting phenotypic robustness could be abundant, and that these polymorphisms may generally buffer either genetic or environmental variation, but not both.
View details for DOI 10.1371/journal.pone.0008635
View details for Web of Science ID 000273414200013
View details for PubMedID 20072615
Common polymorphic transcript variation in human disease
2009; 19 (4): 567-575
Most human genes are thought to express different transcript isoforms in different cell types; however, the full extent and functional consequences of polymorphic transcript variation (PTV), which differ between individuals within the same cell type, are unknown. Here we show that PTV is widespread in B-cells from two human populations. Tens of thousands of exons were found to be polymorphically expressed in a heritable fashion, and over 1000 of these showed strong correlations with single nucleotide polymorphism (SNP) genotypes in cis. The SNPs associated with PTV display signs of having been subject to recent positive selection in humans, and they are also highly enriched for SNPs implicated by recent genome-wide association studies of four autoimmune diseases. From this disease-association overlap, we infer that PTV is the likely mechanism by which eight common polymorphisms contribute to disease risk. A catalog of PTV will be a valuable resource for interpreting results from future disease-association studies and understanding the spectrum of phenotypic differences among humans.
View details for DOI 10.1101/gr.083477.108
View details for Web of Science ID 000264781900005
View details for PubMedID 19189928
Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2009; 106 (9): 3264-3269
Defining the transcriptome, the repertoire of transcribed regions encoded in the genome, is a challenging experimental task. Current approaches, relying on sequencing of ESTs or cDNA libraries, are expensive and labor-intensive. Here, we present a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run. Using novel algorithms, we automatically construct a highly accurate transcript catalog. Our approach automatically and fully defines 86% of the genes expressed under the given conditions, and discovers 160 previously undescribed transcription units of 250 bp or longer. It correctly demarcates the 5' and 3' UTR boundaries of 86 and 77% of expressed genes, respectively. The method further identifies 83% of known splice junctions in expressed genes, and discovers 25 previously uncharacterized introns, including 2 cases of condition-dependent intron retention. Our framework is applicable to poorly understood organisms, and can lead to greater understanding of the transcribed elements in an explored genome.
View details for DOI 10.1073/pnas.0812841106
View details for Web of Science ID 000263844100053
View details for PubMedID 19208812
Confirmation of organized modularity in the yeast interactome
PLOS BIOLOGY
2007; 5 (6): 1206-1210
Assessing the determinants of evolutionary rates in the presence of noise
MOLECULAR BIOLOGY AND EVOLUTION
2007; 24 (5): 1113-1121
Although protein sequences are known to evolve at vastly different rates, little is known about what determines their rate of evolution. However, a recent study using principal component regression (PCR) has concluded that evolutionary rates in yeast are primarily governed by a single determinant related to translation frequency. Here, we demonstrate that noise in biological data can confound PCRs, leading to spurious conclusions. When equalizing noise levels across 7 predictor variables used in previous studies, we find no evidence that protein evolution is dominated by a single determinant. Our results indicate that a variety of factors--including expression level, gene dispensability, and protein-protein interactions--may independently affect evolutionary rates in yeast. More accurate measurements or more sophisticated statistical techniques will be required to determine which one, if any, of these factors dominates protein evolution.
View details for DOI 10.1093/molbev/msm044
View details for Web of Science ID 000246802400004
View details for PubMedID 17347158
Using protein complexes to predict phenotypic effects of gene mutation
2007; 8 (11)
Predicting the phenotypic effects of mutations is a central goal of genetics research; it has important applications in elucidating how genotype determines phenotype and in identifying human disease genes. Using a wide range of functional genomic data from the yeast Saccharomyces cerevisiae, we show that the best predictor of a protein's knockout phenotype is the knockout phenotype of other proteins that are present in a protein complex with it. Even the addition of multiple datasets does not improve upon the predictions made from protein complex membership. Similarly, we find that a proxy for protein complexes is a powerful predictor of disease phenotypes in humans. We propose that identifying human protein complexes containing known disease genes will be an efficient method for large-scale disease gene discovery, and that yeast may prove to be an informative model system for investigating, and even predicting, the genetic basis of both Mendelian and complex disease phenotypes.
View details for DOI 10.1186/gb-2007-8-11-r252
View details for Web of Science ID 000252101100026
View details for PubMedID 18042286
Coevolution, modularity and human disease
CURRENT OPINION IN GENETICS & DEVELOPMENT
2006; 16 (6): 637-644
The concepts of coevolution and modularity have been studied separately for decades. Recent advances in genomics have led to the first systematic studies in each of these fields at the molecular level, resulting in several important discoveries. Both coevolution and modularity appear to be pervasive features of genomic data from all species studied to date, and their presence can be detected in many types of datasets, including genome sequences, gene expression data, and protein-protein interaction data. Moreover, the combination of these two ideas might have implications for our understanding of many aspects of biology, ranging from the general architecture of living systems to the causes of various human diseases.
View details for DOI 10.1016/j.gde.2006.09.001
View details for Web of Science ID 000242647400016
View details for PubMedID 17005391
Codon usage and selection on proteins
JOURNAL OF MOLECULAR EVOLUTION
2006; 63 (5): 635-653
Selection pressures on proteins are usually measured by comparing homologous nucleotide sequences (Zuckerkandl and Pauling 1965). Recently we introduced a novel method, termed volatility, to estimate selection pressures on proteins on the basis of their synonymous codon usage (Plotkin and Dushoff 2003; Plotkin et al. 2004). Here we provide a theoretical foundation for this approach. Under the Fisher-Wright model, we derive the expected frequencies of synonymous codons as a function of the strength of selection on amino acids, the mutation rate, and the effective population size. We analyze the conditions under which we can expect to draw inferences from biased codon usage, and we estimate the time scales required to establish and maintain such a signal. We find that synonymous codon usage can reliably distinguish between negative selection and neutrality only for organisms, such as some microbes, that experience large effective population sizes or periods of elevated mutation rates. The power of volatility to detect positive selection is also modest--requiring approximately 100 selected sites--but it depends less strongly on population size. We show that phenomena such as transient hyper-mutators can improve the power of volatility to detect selection, even when the neutral site heterozygosity is low. We also discuss several confounding factors, neglected by the Fisher-Wright model, that may limit the applicability of volatility in practice.
View details for DOI 10.1007/s00239-005-0233-x
View details for Web of Science ID 000242014800006
View details for PubMedID 17043750
Estimating selection pressures from limited comparative data
MOLECULAR BIOLOGY AND EVOLUTION
2006; 23 (8): 1457-1459
We recently introduced a novel method for estimating selection pressures on proteins, termed "volatility," which requires only a single genome sequence. Some criticisms that have been levied against this approach are valid, but many others are based on misconceptions of volatility, or they apply equally to comparative methods of estimating selection. Here, we introduce a simple regression technique for estimating selection pressures on all proteins in a genome, on the basis of limited comparative data. The regression technique does not depend on an underlying population-genetic mechanism. This new approach to estimating selection across a genome should be more powerful and more widely applicable than volatility itself.
View details for DOI 10.1093/molbev/msl021
View details for Web of Science ID 000239281200001
View details for PubMedID 16754640
Aging and gene expression in the primate brain
2005; 3 (9): 1653-1661
It is well established that gene expression levels in many organisms change during the aging process, and the advent of DNA microarrays has allowed genome-wide patterns of transcriptional changes associated with aging to be studied in both model organisms and various human tissues. Understanding the effects of aging on gene expression in the human brain is of particular interest, because of its relation to both normal and pathological neurodegeneration. Here we show that human cerebral cortex, human cerebellum, and chimpanzee cortex each undergo different patterns of age-related gene expression alterations. In humans, many more genes undergo consistent expression changes in the cortex than in the cerebellum; in chimpanzees, many genes change expression with age in cortex, but the pattern of changes in expression bears almost no resemblance to that of human cortex. These results demonstrate the diversity of aging patterns present within the human brain, as well as how rapidly genome-wide patterns of aging can evolve between species; they may also have implications for the oxidative free radical theory of aging, and help to improve our understanding of human neurodegenerative diseases.
View details for DOI 10.1371/journal.pbio.0030274
View details for Web of Science ID 000231820900016
View details for PubMedID 16048372
Sum1p, the origin recognition complex, and the spreading of a promoter-specific repressor in Saccharomyces cerevisiae
MOLECULAR AND CELLULAR BIOLOGY
2005; 25 (14): 5920-5932
In Saccharomyces cerevisiae, Sum1p is a promoter-specific repressor. A single amino acid change generates the mutant Sum1-1p, which causes regional silencing at new loci where wild-type Sum1p does not act. Thus, Sum1-1p is a model for understanding how the spreading of repressive chromatin is regulated. When wild-type Sum1p was targeted to a locus where mutant Sum1-1p spreads, wild-type Sum1p did not spread as efficiently as mutant Sum1-1p did, despite being in the same genomic context. Thus, the SUM1-1 mutation altered the ability of the protein to spread. The spreading of Sum1-1p required both an enzymatically active deacetylase, Hst1p, and the N-terminal tail of histone H4, consistent with the spreading of Sum1-1p involving sequential modification of and binding to histone tails, as observed for other silencing proteins. Furthermore, deletion of the N-terminal tail of H4 caused Sum1-1p to return to loci where wild-type Sum1p acts, consistent with the SUM1-1 mutation increasing the affinity of the protein for H4 tails. These results imply that the spreading of repressive chromatin proteins is regulated by their affinities for histone tails. Finally, this study uncovered a functional connection between wild-type Sum1p and the origin recognition complex, and this relationship also contributes to mutant Sum1-1p localization.
View details for DOI 10.1128/MCB.25.14.5920-5932.2005
View details for Web of Science ID 000230267000012
View details for PubMedID 15988008
Functional genomic analysis of the rates of protein evolution
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2005; 102 (15): 5483-5488
The evolutionary rates of proteins vary over several orders of magnitude. Recent work suggests that analysis of large data sets of evolutionary rates in conjunction with the results from high-throughput functional genomic experiments can identify the factors that cause proteins to evolve at such dramatically different rates. To this end, we estimated the evolutionary rates of >3,000 proteins in four species of the yeast genus Saccharomyces and investigated their relationship with levels of expression and protein dispensability. Each protein's dispensability was estimated by the growth rate of mutants deficient for the protein. Our analyses of these improved evolutionary and functional genomic data sets yield three main results. First, dispensability and expression have independent, significant effects on the rate of protein evolution. Second, measurements of expression levels in the laboratory can be used to filter data sets of dispensability estimates, removing variates that are unlikely to reflect real biological effects. Third, structural equation models show that although we may reasonably infer that dispensability and expression have significant effects on protein evolutionary rate, we cannot yet accurately estimate the relative strengths of these effects.
View details for DOI 10.1073/pnas.0501761102
View details for Web of Science ID 000228376600036
View details for PubMedID 15800036
Modularity and evolutionary constraint on proteins
2005; 37 (4): 351-352
Modularity, which has been found in the functional and physical protein interaction networks of many organisms, has been postulated to affect both the mode and tempo of evolution. Here I show that in the yeast Saccharomyces cerevisiae, protein interaction hubs situated in single modules are highly constrained, whereas those connecting different modules are more plastic. This pattern of change could reflect a tendency for evolutionary innovations to occur by altering the proteins and interactions between rather than within modules, in a manner somewhat similar to the evolution of new proteins through the shuffling of conserved protein domains.
View details for DOI 10.1038/ng1530
View details for Web of Science ID 000228040000016
View details for PubMedID 15750592
Adjusting for selection on synonymous sites in estimates of evolutionary distance
MOLECULAR BIOLOGY AND EVOLUTION
2005; 22 (1): 174-177
Evolution at silent sites is often used to estimate the pace of selectively neutral processes or to infer differences in divergence times of genes. However, silent sites are subject to selection in favor of preferred codons, and the strength of such selection varies dramatically across genes. Here, we use the relationship between codon bias and synonymous divergence observed in four species of the genus Saccharomyces to provide a simple correction for selection on silent sites.
View details for DOI 10.1093/molbev/msh265
View details for Web of Science ID 000225730100018
View details for PubMedID 15371530
Conservation and evolution of cis-regulatory systems in ascomycete fungi
2004; 2 (12): 2202-2219
Relatively little is known about the mechanisms through which gene expression regulation evolves. To investigate this, we systematically explored the conservation of regulatory networks in fungi by examining the cis-regulatory elements that govern the expression of coregulated genes. We first identified groups of coregulated Saccharomyces cerevisiae genes enriched for genes with known upstream or downstream cis-regulatory sequences. Reasoning that many of these gene groups are coregulated in related species as well, we performed similar analyses on orthologs of coregulated S. cerevisiae genes in 13 other ascomycete species. We find that many species-specific gene groups are enriched for the same flanking regulatory sequences as those found in the orthologous gene groups from S. cerevisiae, indicating that those regulatory systems have been conserved in multiple ascomycete species. In addition to these clear cases of regulatory conservation, we find examples of cis-element evolution that suggest multiple modes of regulatory diversification, including alterations in transcription factor-binding specificity, incorporation of new gene targets into an existing regulatory system, and cooption of regulatory systems to control a different set of genes. We investigated one example in greater detail by measuring the in vitro activity of the S. cerevisiae transcription factor Rpn4p and its orthologs from Candida albicans and Neurospora crassa. Our results suggest that the DNA binding specificity of these proteins has coevolved with the sequences found upstream of the Rpn4p target genes and suggest that Rpn4p has a different function in N. crassa.
View details for DOI 10.1371/journal.pbio.0020398
View details for Web of Science ID 000226099600021
View details for PubMedID 15534694
Coevolution of gene expression among interacting proteins
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2004; 101 (24): 9033-9038
Physically interacting proteins or parts of proteins are expected to evolve in a coordinated manner that preserves proper interactions. Such coevolution at the amino acid-sequence level is well documented and has been used to predict interacting proteins, domains, and amino acids. Interacting proteins are also often precisely coexpressed with one another, presumably to maintain proper stoichiometry among interacting components. Here, we show that the expression levels of physically interacting proteins coevolve. We estimate average expression levels of genes from four closely related fungi of the genus Saccharomyces using the codon adaptation index and show that expression levels of interacting proteins exhibit coordinated changes in these different species. We find that this coevolution of expression is a more powerful predictor of physical interaction than is coevolution of amino acid sequence. These results demonstrate that gene expression levels can coevolve, adding another dimension to the study of the coevolution of interacting proteins and underscoring the importance of maintaining coexpression of interacting proteins over evolutionary time. Our results also suggest that expression coevolution can be used for computational prediction of protein-protein interactions.
View details for DOI 10.1073/pnas.0402591101
View details for Web of Science ID 000222104900038
View details for PubMedID 15175431
Noise minimization in eukaryotic gene expression
2004; 2 (6): 834-838
All organisms have elaborate mechanisms to control rates of protein production. However, protein production is also subject to stochastic fluctuations, or "noise." Several recent studies in Saccharomyces cerevisiae and Escherichia coli have investigated the relationship between transcription and translation rates and stochastic fluctuations in protein levels, or more generally, how such randomness is a function of intrinsic and extrinsic factors. However, the fundamental question of whether stochasticity in protein expression is generally biologically relevant has not been addressed, and it remains unknown whether random noise in the protein production rate of most genes significantly affects the fitness of any organism. We propose that organisms should be particularly sensitive to variation in the protein levels of two classes of genes: genes whose deletion is lethal to the organism and genes that encode subunits of multiprotein complexes. Using an experimentally verified model of stochastic gene expression in S. cerevisiae, we estimate the noise in protein production for nearly every yeast gene, and confirm our prediction that the production of essential and complex-forming proteins involves lower levels of noise than does the production of most other genes. Our results support the hypothesis that noise in gene expression is a biologically important variable, is generally detrimental to organismal fitness, and is subject to natural selection.
View details for DOI 10.1371/journal.pbio.0020137
View details for Web of Science ID 000222380400022
View details for PubMedID 15124029
Evolutionary rate depends on number of protein-protein interactions independently of gene expression level
BMC EVOLUTIONARY BIOLOGY
Whether or not a protein's number of physical interactions with other proteins plays a role in determining its rate of evolution has been a contentious issue. A recent analysis suggested that the observed correlation between number of interactions and evolutionary rate may be due to experimental biases in high-throughput protein interaction data sets. The number of interactions per protein, as measured by some protein interaction data sets, shows no correlation with evolutionary rate. Other data sets, however, do reveal a relationship. Furthermore, even when experimental biases of these data sets are taken into account, a real correlation between number of interactions and evolutionary rate appears to exist. A strong and significant correlation between a protein's number of interactions and evolutionary rate is apparent for interaction data from some studies. The extremely low agreement between different protein interaction data sets indicates that interaction data are still of low coverage and/or quality. These limitations may explain why some data sets reveal no correlation with evolutionary rates.
View details for Web of Science ID 000222014000001
View details for PubMedID 15165289
Detecting selection using a single genome sequence of M-tuberculosis and P-falciparum
2004; 428 (6986): 942-945
Selective pressures on proteins are usually measured by comparing nucleotide sequences. Here we introduce a method to detect selection on the basis of a single genome sequence. We catalogue the relative strength of selection on each gene in the entire genomes of Mycobacterium tuberculosis and Plasmodium falciparum. Our analysis confirms that most antigens are under strong selection for amino-acid substitutions, particularly the PE/PPE family of putative surface proteins in M. tuberculosis and the EMP1 family of cytoadhering surface proteins in P. falciparum. We also identify many uncharacterized proteins that are under strong selection in each pathogen. We provide a genome-wide analysis of natural selection acting on different stages of an organism's life cycle: genes expressed in the ring stage of P. falciparum are under stronger positive selection than those expressed in other stages of the parasite's life cycle. Our method of estimating selective pressures requires far fewer data than comparative sequence analysis, and it measures selection across an entire genome; the method can readily be applied to a large range of sequenced organisms.
View details for DOI 10.1038/nature02458
View details for Web of Science ID 000221083000041
View details for PubMedID 15118727
Detecting putative orthologs
2003; 19 (13): 1710-1711
We developed an algorithm that improves upon the common procedure of taking reciprocal best BLAST hits (rbh) in the identification of orthologs. The method, the reciprocal smallest distance algorithm (rsd), relies on global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes. rsd finds many putative orthologs missed by rbh because it is less likely than rbh to be misled by the presence of a close paralog.
View details for DOI 10.1093/bioinformatics/btg213
View details for Web of Science ID 000185310600016
View details for PubMedID 15593400
A simple dependence between protein evolution rate and the number of protein-protein interactions
BMC EVOLUTIONARY BIOLOGY
It has been shown for an evolutionarily distant genomic comparison that the number of protein-protein interactions a protein has correlates negatively with its rate of evolution. However, the generality of this observation has recently been challenged. Here we examine the problem using protein-protein interaction data from the yeast Saccharomyces cerevisiae and genome sequences from two other yeast species. In contrast to a previous study that used an incomplete set of protein-protein interactions, we observed a highly significant correlation between number of interactions and evolutionary distance to either Candida albicans or Schizosaccharomyces pombe. This study differs from the previous one in that it includes all known protein interactions from S. cerevisiae, and a larger set of protein evolutionary rates. In both evolutionary comparisons, a simple monotonic relationship was found across the entire range of the number of protein-protein interactions. In agreement with our earlier findings, this relationship cannot be explained by the fact that proteins with many interactions tend to be important to yeast. The generality of these correlations in other kingdoms of life unfortunately cannot be addressed at this time, due to the incompleteness of protein-protein interaction data from organisms other than S. cerevisiae. Protein-protein interactions tend to slow the rate at which proteins evolve. This may be due to structural constraints that must be met to maintain interactions, but more work is needed to definitively establish the mechanism(s) behind the correlations we have observed.
View details for Web of Science ID 000188122100011
View details for PubMedID 12769820
Evolutionary rate in the protein interaction network
2002; 296 (5568): 750-752
High-throughput screens have begun to reveal the protein interaction network that underpins most cellular functions in the yeast Saccharomyces cerevisiae. How the organization of this network affects the evolution of the proteins that compose it is a fundamental question in molecular evolution. We show that the connectivity of well-conserved proteins in the network is negatively correlated with their rate of evolution. Proteins with more interactors evolve more slowly not because they are more important to the organism, but because a greater proportion of the protein is directly involved in its function. At sites important for interaction between proteins, evolutionary changes may occur largely by coevolution, in which substitutions in one protein result in selection pressure for reciprocal changes in interacting partners. We confirm one predicted outcome of this process, namely that interacting proteins evolve at similar rates.
View details for Web of Science ID 000175281700060
View details for PubMedID 11976460
Explaining mortality rate plateaus
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA
2001; 98 (26): 15383-15386
We propose a stochastic model of aging to explain deviations from exponential growth in mortality rates commonly observed in empirical studies. Mortality rate plateaus are explained as a generic consequence of considering death in terms of first passage times for processes undergoing a random walk with drift. Simulations of populations with age-dependent distributions of viabilities agree with a wide array of experimental results. The influence of cohort size is well accounted for by the stochastic nature of the model.
View details for Web of Science ID 000172848800114
View details for PubMedID 11752476
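The first-passage idea in the abstract above can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the authors' model: each individual carries a "viability" that performs a random walk with negative drift, and death is the first time it reaches zero. The function name `simulate_mortality`, the Gaussian step, and every parameter value are assumptions chosen for illustration.

```python
import random


def simulate_mortality(n=2000, v0=10.0, drift=0.5, sigma=1.0, max_age=100, seed=0):
    """Return age-specific mortality rates for a simulated cohort.

    Assumed setup: all n individuals start at viability v0; at each age
    step viability changes by a Gaussian increment with mean -drift and
    standard deviation sigma; an individual dies when viability <= 0.
    """
    rng = random.Random(seed)
    viability = [v0] * n
    rates = []
    for _ in range(max_age):
        if not viability:
            break  # the whole cohort has died
        survivors = []
        for v in viability:
            v += rng.gauss(-drift, sigma)
            if v > 0:
                survivors.append(v)
        # Fraction of those alive at the start of this age step that died in it.
        rates.append(1 - len(survivors) / len(viability))
        viability = survivors
    return rates


if __name__ == "__main__":
    for age, rate in enumerate(simulate_mortality(), start=1):
        print(age, round(rate, 4))
```

Printing the rates by age lets one inspect how mortality changes over the simulated lifespan. Estimates at late ages rest on few survivors and are therefore noisy, which is one way cohort size enters such a model.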
Protein dispensability and rate of evolution
2001; 411 (6841): 1046-1049
If protein evolution is due in large part to slightly deleterious amino acid substitutions, then the rate of evolution should be greater in proteins that contribute less to individual fitness. The rationale for this prediction is that relatively dispensable proteins should be subject to weaker purifying selection, and should therefore accumulate mildly deleterious substitutions more rapidly. Although this argument was presented over twenty years ago, and is fundamental to many applications of evolutionary theory, the prediction has proved difficult to confirm. In fact, a recent study showed that essential mouse genes do not evolve more slowly than non-essential ones. Thus, although a variety of factors influencing the rate of protein evolution have been supported by extensive sequence analysis, the relationship between protein dispensability and evolutionary rate has remained unconfirmed. Here we use the results from a highly parallel growth assay of single gene deletions in yeast to assess protein dispensability, which we relate to evolutionary rate estimates that are based on comparisons of sequences drawn from twenty-one fully annotated genomes. Our analysis reveals a highly significant relationship between protein dispensability and evolutionary rate, and explains why this relationship is not detectable by categorical comparison of essential versus non-essential proteins. The relationship is highly conserved, so that protein dispensability in yeast is also predictive of evolutionary rate in a nematode worm.
View details for Web of Science ID 000169528500047
View details for PubMedID 11429604