Professional Education


  • Bachelor of Science, Massachusetts Institute of Technology (2008)
  • Doctor of Philosophy, California Institute of Technology (2014)

All Publications


  • Boost longevity of economic model NATURE Marinov, G. K. 2020; 581 (7808): 262

    View details for Web of Science ID 000534256000016

    View details for PubMedID 32427913

  • Long-range single-molecule mapping of chromatin accessibility in eukaryotes. Nature methods Shipony, Z., Marinov, G. K., Swaffer, M. P., Sinnott-Armstrong, N. A., Skotheim, J. M., Kundaje, A., Greenleaf, W. J. 2020

    Abstract

    Mapping open chromatin regions has emerged as a widely used tool for identifying active regulatory elements in eukaryotes. However, existing approaches, limited by reliance on DNA fragmentation and short-read sequencing, cannot provide information about large-scale chromatin states or reveal coordination between the states of distal regulatory elements. We have developed a method for profiling the accessibility of individual chromatin fibers, a single-molecule long-read accessible chromatin mapping sequencing assay (SMAC-seq), enabling the simultaneous, high-resolution, single-molecule assessment of chromatin states at multikilobase length scales. Our strategy is based on combining the preferential methylation of open chromatin regions by DNA methyltransferases with low sequence specificity, in this case EcoGII, an N6-methyladenosine (m6A) methyltransferase, and the ability of nanopore sequencing to directly read DNA modifications. We demonstrate that aggregate SMAC-seq signals match bulk-level accessibility measurements, observe single-molecule nucleosome and transcription factor protection footprints, and quantify the correlation between chromatin states of distal genomic elements.

    View details for DOI 10.1038/s41592-019-0730-2

    View details for PubMedID 32042188

  • Mitigation of off-target toxicity in CRISPR-Cas9 screens for essential non-coding elements. Nature communications Tycko, J., Wainberg, M., Marinov, G. K., Ursu, O., Hess, G. T., Ego, B. K., Li, A., Truong, A., Trevino, A. E., Spees, K., Yao, D., Kaplow, I. M., Greenside, P. G., Morgens, D. W., Phanstiel, D. H., Snyder, M. P., Bintu, L., Greenleaf, W. J., Kundaje, A., Bassik, M. C. 2019; 10 (1): 4063

    Abstract

    Pooled CRISPR-Cas9 screens are a powerful method for functionally characterizing regulatory elements in the non-coding genome, but off-target effects in these experiments have not been systematically evaluated. Here, we investigate Cas9, dCas9, and CRISPRi/a off-target activity in screens for essential regulatory elements. The single guide RNAs (sgRNAs) with the largest effects in genome-scale screens for essential CTCF loop anchors in K562 cells were not those that disrupted gene expression near the on-target CTCF anchor. Rather, these sgRNAs had high off-target activity that, while only weakly correlated with absolute off-target site number, could be predicted by the recently developed GuideScan specificity score. Screens conducted in parallel with CRISPRi/a, which do not induce double-stranded DNA breaks, revealed that a distinct set of off-targets also cause strong confounding fitness effects with these epigenome-editing tools. Promisingly, filtering of CRISPRi libraries using GuideScan specificity scores removed these confounded sgRNAs and enabled identification of essential regulatory elements.

    View details for DOI 10.1038/s41467-019-11955-7

    View details for PubMedID 31492858

  • Population Genetics of Paramecium Mitochondrial Genomes: Recombination, Mutation Spectrum, and Efficacy of Selection. Genome biology and evolution Johri, P., Marinov, G. K., Doak, T. G., Lynch, M. 2019; 11 (5): 1398–1416

    Abstract

    The evolution of mitochondrial genomes and their population-genetic environment among unicellular eukaryotes are understudied. Ciliate mitochondrial genomes exhibit a unique combination of characteristics, including a linear organization and the presence of multiple genes with no known function or detectable homologs in other eukaryotes. Here we study the variation of ciliate mitochondrial genomes both within and across 13 highly diverged Paramecium species, including multiple species from the P. aurelia species complex, with four outgroup species: P. caudatum, P. multimicronucleatum, and two strains that may represent novel related species. We observe extraordinary conservation of gene order and protein-coding content in Paramecium mitochondria across species. In contrast, significant differences are observed in tRNA content and copy number, which is highly conserved in species belonging to the P. aurelia complex but variable among and even within the other Paramecium species. There is an increase in GC content from ∼20% to ∼40% on the branch leading to the P. aurelia complex. Patterns of polymorphism in population-genomic data and mutation-accumulation experiments suggest that the increase in GC content is primarily due to changes in the mutation spectra in the P. aurelia species. Finally, we find no evidence of recombination in Paramecium mitochondria and find that the mitochondrial genome appears to experience either similar or stronger efficacy of purifying selection than the nucleus.

    View details for DOI 10.1093/gbe/evz081

    View details for PubMedID 30980669

    View details for PubMedCentralID PMC6505448

  • Deciphering regulatory DNA sequences and noncoding genetic variants using neural network models of massively parallel reporter assays. PloS one Movva, R., Greenside, P., Marinov, G. K., Nair, S., Shrikumar, A., Kundaje, A. 2019; 14 (6): e0218073

    Abstract

    The relationship between noncoding DNA sequence and gene expression is not well-understood. Massively parallel reporter assays (MPRAs), which quantify the regulatory activity of large libraries of DNA sequences in parallel, are a powerful approach to characterize this relationship. We present MPRA-DragoNN, a convolutional neural network (CNN)-based framework to predict and interpret the regulatory activity of DNA sequences as measured by MPRAs. While our method is generally applicable to a variety of MPRA designs, here we trained our model on the Sharpr-MPRA dataset that measures the activity of ∼500,000 constructs tiling 15,720 regulatory regions in human K562 and HepG2 cell lines. MPRA-DragoNN predictions were moderately correlated (Spearman ρ = 0.28) with measured activity and were within range of replicate concordance of the assay. State-of-the-art model interpretation methods revealed high-resolution predictive regulatory sequence features that overlapped transcription factor (TF) binding motifs. We used the model to investigate the cell type and chromatin state preferences of predictive TF motifs. We explored the ability of our model to predict the allelic effects of regulatory variants in an independent MPRA experiment and fine map putative functional SNPs in loci associated with lipid traits. Our results suggest that interpretable deep learning models trained on MPRA data have the potential to reveal meaningful patterns in regulatory DNA sequences and prioritize regulatory genetic variants, especially as larger, higher-quality datasets are produced.

    View details for DOI 10.1371/journal.pone.0218073

    View details for PubMedID 31206543

  • A decade of ChIP-seq BRIEFINGS IN FUNCTIONAL GENOMICS Marinov, G. K. 2018; 17 (2): 77–79

    View details for PubMedID 29596621

  • ChIP-ping the branches of the tree: functional genomics and the evolution of eukaryotic gene regulation BRIEFINGS IN FUNCTIONAL GENOMICS Marinov, G. K., Kundaje, A. 2018; 17 (2): 116–37

    Abstract

    Advances in the methods for detecting protein-DNA interactions have played a key role in determining the directions of research into the mechanisms of transcriptional regulation. The most recent major technological transformation happened a decade ago, with the move from using tiling arrays [chromatin immunoprecipitation (ChIP)-on-Chip] to high-throughput sequencing (ChIP-seq) as a readout for ChIP assays. In addition to the numerous other ways in which it is superior to arrays, by eliminating the need to design and manufacture them, sequencing also opened the door to carrying out comparative analyses of genome-wide transcription factor occupancy across species and studying chromatin biology in previously less accessible model and nonmodel organisms, thus allowing us to understand the evolution and diversity of regulatory mechanisms in unprecedented detail. Here, we review the biological insights obtained from such studies in recent years and discuss anticipated future developments in the field.

    View details for DOI 10.1093/bfgp/ely004

    View details for Web of Science ID 000429027600006

    View details for PubMedID 29529131

  • Response to Martin and colleagues: mitochondria do not boost the bioenergetic capacity of eukaryotic cells. Biology direct Lynch, M., Marinov, G. K. 2018; 13 (1): 26

    Abstract

    A recent paper by Gerlitz et al. (Biol Direct 13:21, 2018) questions the validity of the data underlying prior analyses of the bioenergetic capacities of cells, and continues to promote the idea that the mitochondrion endowed eukaryotic cells with energetic superiority over prokaryotes. The former point has been addressed previously, with no resultant changes in the conclusions, and the latter point remains inconsistent with multiple lines of empirical data.

    View details for PubMedID 30621777

  • Population Genomics of Paramecium Species MOLECULAR BIOLOGY AND EVOLUTION Johri, P., Krenek, S., Marinov, G. K., Doak, T. G., Berendonk, T. U., Lynch, M. 2017; 34 (5): 1194-1216

    Abstract

    Population-genomic analyses are essential to understanding factors shaping genomic variation and lineage-specific sequence constraints. The dearth of such analyses for unicellular eukaryotes prompted us to assess genomic variation in Paramecium, one of the most well-studied ciliate genera. The Paramecium aurelia complex consists of ∼15 morphologically indistinguishable species that diverged subsequent to two rounds of whole-genome duplications (WGDs, as long as 320 MYA) and possess extremely streamlined genomes. We examine patterns of both nuclear and mitochondrial polymorphism, by sequencing whole genomes of 10-13 worldwide isolates of each of three species belonging to the P. aurelia complex: P. tetraurelia, P. biaurelia, P. sexaurelia, as well as two outgroup species that do not share the WGDs: P. caudatum and P. multimicronucleatum. An apparent absence of global geographic population structure suggests continuous or recent dispersal of Paramecium over long distances. Intergenic regions are highly constrained relative to coding sequences, especially in P. caudatum and P. multimicronucleatum that have shorter intergenic distances. Sequence diversity and divergence are reduced up to ∼100-150 bp both upstream and downstream of genes, suggesting strong constraints imposed by the presence of densely packed regulatory modules. In addition, comparison of sequence variation at non-synonymous and synonymous sites suggests similar recent selective pressures on paralogs within and orthologs across the deeply diverging species. This study presents the first genome-wide population-genomic analysis in ciliates and provides a valuable resource for future studies in evolutionary and functional genetics in Paramecium.

    View details for DOI 10.1093/molbev/msx074

    View details for Web of Science ID 000399373300013

    View details for PubMedID 28204679

  • Membranes, energetics, and evolution across the prokaryote-eukaryote divide ELIFE Lynch, M., Marinov, G. K. 2017; 6

    Abstract

    The evolution of the eukaryotic cell marked a profound moment in Earth's history, with most of the visible biota coming to rely on intracellular membrane-bound organelles. It has been suggested that this evolutionary transition was critically dependent on the movement of ATP synthesis from the cell surface to mitochondrial membranes and the resultant boost to the energetic capacity of eukaryotic cells. However, contrary to this hypothesis, numerous lines of evidence suggest that eukaryotes are no more bioenergetically efficient than prokaryotes. Thus, although the origin of the mitochondrion was a key event in evolutionary history, there is no reason to think membrane bioenergetics played a direct, causal role in the transition from prokaryotes to eukaryotes and the subsequent explosive diversification of cellular and organismal complexity.

    View details for DOI 10.7554/eLife.20437.001

    View details for Web of Science ID 000397627000001

    View details for PubMedID 28300533

    View details for PubMedCentralID PMC5354521

  • On the design and prospects of direct RNA sequencing. Briefings in functional genomics Marinov, G. K. 2017

    Abstract

    Over nearly the past decade, the application of high-throughput sequencing to RNA molecules in the form of RNA sequencing (RNA-seq) and its many variations has revolutionized transcriptomic studies by enabling researchers to take a simultaneously deep and truly global look into the transcriptome. However, there is still considerable scope for improvement on RNA-seq data in its current form, primarily because of the short-read nature of the dominant sequencing technologies, which prevents the completely reliable reconstruction and quantification of full-length transcripts, and the sequencing library building protocols used, which introduce various distortions in the final data sets. The ideal approach toward resolving these remaining issues would involve the direct amplification-free sequencing of full-length RNA molecules. This has recently become practical with the advent of nanopore sequencing, which raises the possibility of yet another revolution in transcriptomics. I discuss the design considerations to be taken into account, the technical challenges that need to be addressed and the biological questions these advances can be expected to resolve.

    View details for DOI 10.1093/bfgp/elw043

    View details for PubMedID 28334071

  • ChIP-seq for the Identification of Functional Elements in the Human Genome. Methods in molecular biology (Clifton, N.J.) Marinov, G. K. 2017; 1543: 3-18

    Abstract

    Functional elements in the genome express their function through physical association with particular proteins: transcription factors, components of the transcription machinery, specific histone modifications, and others. The genome-wide characterization of the protein-DNA interaction landscape of these proteins is thus a key approach toward the identification of candidate genomic regulatory regions. ChIP-seq (Chromatin Immunoprecipitation coupled with high-throughput sequencing) has emerged as the primary experimental method for carrying out this task. Here, the ChIP-seq protocol is described together with some of the most important considerations for applying it in practice.

    View details for DOI 10.1007/978-1-4939-6716-2_1

    View details for PubMedID 28349419

  • Transcriptomic analysis of the role of RasGEF1B circular RNA in the TLR4/LPS pathway. Scientific reports Ng, W. L., Marinov, G. K., Chin, Y. M., Lim, Y. Y., Ea, C. K. 2017; 7 (1): 12227

    Abstract

    Circular RNAs (circRNAs) have recently emerged as a large class of novel non-coding RNA species. However, the detailed functional significance of the vast majority of them remains to be elucidated. Most functional characterization studies targeting circRNAs have been limited to resting cells, leaving their role in dynamic cellular responses to stimuli largely unexplored. In this study, we focus on the LPS-induced cytoplasmic circRNA, mcircRasGEF1B, and combine targeted mcircRasGEF1B depletion with high-throughput transcriptomic analysis to gain insight into its function during the cellular response to LPS stimulation. We show that knockdown of mcircRasGEF1B results in altered expression of a wide array of genes. Pathway analysis revealed an overall enrichment of genes involved in cell cycle progression, mitotic division, active metabolism, and of particular interest, NF-κB, LPS signaling pathways, and macrophage activation. These findings expand the set of functionally characterized circRNAs and support the regulatory role of mcircRasGEF1B in immune response during macrophage activation and protection against microbial infections.

    View details for PubMedID 28947785

  • Identification of Candidate Functional Elements in the Genome from ChIP-seq Data. Methods in molecular biology (Clifton, N.J.) Marinov, G. K. 2017; 1543: 19-43

    Abstract

    ChIP-seq datasets provide a wealth of information for the identification of candidate regulatory elements in the genome. For this potential to be fully realized, methods for evaluating data quality and for distinguishing reproducible signal from technical and biological noise are necessary. Here, the computational methods for addressing these challenges developed by the ENCODE Consortium are described and the key considerations for analyzing and interpreting ChIP-seq data are discussed.

    View details for DOI 10.1007/978-1-4939-6716-2_2

    View details for PubMedID 28349420

  • SLC7A11 Overexpression in Glioblastoma Is Associated with Increased Cancer Stem Cell-Like Properties. Stem cells and development Polewski, M. D., Reveron-Thornton, R. F., Cherryholmes, G. A., Marinov, G. K., Aboody, K. S. 2017; 26 (17): 1236–46

    Abstract

    System xc(-) is a sodium-independent electroneutral transporter, comprising a catalytic subunit xCT (SLC7A11), which is involved in importing cystine. Certain cancers such as gliomas upregulate the expression of system xc(-), which confers a survival advantage against the detrimental effects of reactive oxygen species (ROS) by increasing generation of the antioxidant glutathione. However, ROS have also been shown to function as targeted, intracellular second messengers in an array of physiological processes such as proliferation. Several studies have implicated ROS in important cancer features such as migration, invasion, and contribution to a cancer stem cell (CSC)-like phenotype. The role of system xc(-) in regulating these ROS-sensitive processes in glioblastoma multiforme (GBM), the most aggressive malignant primary brain tumor in adults, remains unknown. Stable SLC7A11 knockdown and overexpressing U251 glioma cells were generated and characterized to understand the role of redox and system xc(-) in glioma progression. SLC7A11 knockdown resulted in higher endogenous ROS levels and enhanced invasive properties. On the contrary, overexpression of SLC7A11 resulted in decreased endogenous ROS levels as well as decreased migration and invasion. However, SLC7A11-overexpressing cells displayed actin cytoskeleton changes reminiscent of epithelial-like cells and exhibited an increased CSC-like phenotype. The enhanced CSC-like phenotype may contribute to increased chemoresistance and suggests that overexpression of SLC7A11 in the context of GBM may contribute to tumor progression. These findings have important implications for cancer management where targeting system xc(-) in combination with other chemotherapeutics can reduce cancer resistance and recurrence and improve GBM patient survival.

    View details for DOI 10.1089/scd.2017.0123

    View details for PubMedID 28610554

    View details for PubMedCentralID PMC5576215

  • Increased Expression of System x(c)(-) in Glioblastoma Confers an Altered Metabolic State and Temozolomide Resistance MOLECULAR CANCER RESEARCH Polewski, M. D., Reveron-Thornton, R. F., Cherryholmes, G. A., Marinov, G. K., Cassady, K., Aboody, K. S. 2016; 14 (12): 1229-1242

    Abstract

    Glioblastoma multiforme is the most aggressive malignant primary brain tumor in adults. Several studies have shown that glioma cells upregulate the expression of xCT (SLC7A11), the catalytic subunit of system xc(-), a transporter involved in cystine import, that modulates glutathione production and glioma growth. However, the role of system xc(-) in regulating the sensitivity of glioma cells to chemotherapy is currently debated. Inhibiting system xc(-) with sulfasalazine decreased glioma growth and survival via redox modulation, and use of the chemotherapeutic agent temozolomide together with sulfasalazine had a synergistic effect on cell killing. To better understand the functional consequences of system xc(-) in glioma, stable SLC7A11-knockdown and -overexpressing U251 glioma cells were generated. Modulation of SLC7A11 did not alter cellular proliferation but overexpression did increase anchorage-independent cell growth. Knockdown of SLC7A11 increased basal reactive oxygen species (ROS) and decreased glutathione generation resulting in increased cell death under oxidative and genotoxic stress. Overexpression of SLC7A11 resulted in increased resistance to oxidative stress and decreased chemosensitivity to temozolomide. In addition, SLC7A11 overexpression was associated with altered cellular metabolism including increased mitochondrial biogenesis, oxidative phosphorylation, and ATP generation. These results suggest that expression of SLC7A11 in the context of glioma contributes to tumorigenesis, tumor progression, and resistance to standard chemotherapy. SLC7A11, in addition to redox modulation, appears to be associated with increased cellular metabolism and is a mediator of temozolomide resistance in human glioma, thus making system xc(-) a potential therapeutic target in glioblastoma multiforme.

    View details for DOI 10.1158/1541-7786.MCR-16-0028

    View details for Web of Science ID 000389632700007

    View details for PubMedID 27658422

  • Conservation and divergence of the histone code in nucleomorphs BIOLOGY DIRECT Marinov, G. K., Lynch, M. 2016; 11

    Abstract

    Nucleomorphs, the remnant nuclei of photosynthetic algae that have become endosymbionts to other eukaryotes, represent a unique example of convergent reductive genome evolution in eukaryotes, having evolved independently on two separate occasions in chlorarachniophytes and cryptophytes. The nucleomorphs of the two groups have evolved in a remarkably convergent manner, with numerous very similar features. Chief among them is the extreme reduction and compaction of nucleomorph genomes, with very small chromosomes and extremely short or even completely absent intergenic spaces. These characteristics pose a number of intriguing questions regarding the mechanisms of transcription and gene regulation in such a crowded genomic context, in particular in terms of the functioning of the histone code, which is common to almost all eukaryotes and plays a central role in chromatin biology. This study examines the sequences of nucleomorph histone proteins in order to address these issues. Remarkably, all classical transcription- and repression-related components of the histone code seem to be missing from chlorarachniophyte nucleomorphs. Cryptophyte nucleomorph histones are generally more similar to the conventional eukaryotic state; however, they also display significant deviations from the typical histone code. Based on the analysis of specific components of the code, we discuss the state of chromatin and the transcriptional machinery in these nuclei. The results presented here shed new light on the mechanisms of nucleomorph transcription and gene regulation and provide a foundation for future studies of nucleomorph chromatin and transcriptional biology.

    View details for DOI 10.1186/s13062-016-0119-4

    View details for Web of Science ID 000373323800001

    View details for PubMedID 27048461

    View details for PubMedCentralID PMC4822330

  • Splicing-independent loading of TREX on nascent RNA is required for efficient expression of dual-strand piRNA clusters in Drosophila GENES & DEVELOPMENT Hur, J. K., Luo, Y., Moon, S., Ninova, M., Marinov, G. K., Chung, Y. D., Aravin, A. A. 2016; 30 (7): 840-855

    Abstract

    The conserved THO/TREX (transcription/export) complex is critical for pre-mRNA processing and mRNA nuclear export. In metazoa, TREX is loaded on nascent RNA transcribed by RNA polymerase II in a splicing-dependent fashion; however, how TREX functions is poorly understood. Here we show that Thoc5 and other TREX components are essential for the biogenesis of piRNA, a distinct class of small noncoding RNAs that control expression of transposable elements (TEs) in the Drosophila germline. Mutations in TREX lead to defects in piRNA biogenesis, resulting in derepression of multiple TE families, gametogenesis defects, and sterility. TREX components are enriched on piRNA precursors transcribed from dual-strand piRNA clusters and colocalize in distinct nuclear foci that overlap with sites of piRNA transcription. The localization of TREX in nuclear foci and its loading on piRNA precursor transcripts depend on Cutoff, a protein associated with chromatin of piRNA clusters. Finally, we show that TREX is required for accumulation of nascent piRNA precursors. Our study reveals a novel splicing-independent mechanism for TREX loading on nascent RNA and its importance in piRNA biogenesis.

    View details for DOI 10.1101/gad.276030.115

    View details for Web of Science ID 000373194300008

    View details for PubMedID 27036967

    View details for PubMedCentralID PMC4826399

  • Reply to Lane and Martin: Mitochondria do not boost the bioenergetic capacity of eukaryotic cells. Proceedings of the National Academy of Sciences of the United States of America Lynch, M., Marinov, G. K. 2016; 113 (6): E667-8

    View details for DOI 10.1073/pnas.1523394113

    View details for PubMedID 26811483

    View details for PubMedCentralID PMC4760791

  • Draft Whole-Genome Sequence of Haemophilus ducreyi Strain AUSPNG1, Isolated from a Cutaneous Ulcer of a Child from Papua New Guinea. Genome announcements Gangaiah, D., Marinov, G. K., Roberts, S. A., Robson, J., Spinola, S. M. 2016; 4 (1)

    Abstract

    Haemophilus ducreyi has recently emerged as a leading cause of cutaneous ulcers in the yaws-endemic areas of Papua New Guinea and other South Pacific islands. Here, we report the draft genome sequence of the H. ducreyi strain AUSPNG1, isolated from a cutaneous ulcer of a child from Papua New Guinea.

    View details for DOI 10.1128/genomeA.01661-15

    View details for PubMedID 26847887

    View details for PubMedCentralID PMC4742684

  • Inducible RasGEF1B circular RNA is a positive regulator of ICAM-1 in the TLR4/LPS pathway RNA BIOLOGY Ng, W. L., Marinov, G. K., Liau, E. S., Lam, Y. L., Lim, Y., Ea, C. 2016; 13 (9): 861-871

    Abstract

    Circular RNAs (circRNAs) constitute a large class of RNA species formed by the back-splicing of co-linear exons, often within protein-coding transcripts. Despite much progress in the field, it remains elusive whether the majority of circRNAs are merely aberrant splicing by-products with unknown functions, or their production is spatially and temporally regulated to carry out specific biological functions. To date, the majority of circRNAs have been cataloged in resting cells. Here, we identify an LPS-inducible circRNA: mcircRasGEF1B, which is predominantly localized in cytoplasm, shows cell-type specific expression, and has a human homolog with similar properties, hcircRasGEF1B. We show that knockdown of the expression of mcircRasGEF1B reduces LPS-induced ICAM-1 expression. Additionally, we demonstrate that mcircRasGEF1B regulates the stability of mature ICAM-1 mRNAs. These findings expand the inventory of functionally characterized circRNAs with a novel RNA species that may play a critical role in fine-tuning immune responses and protecting cells against microbial infection.

    View details for DOI 10.1080/15476286.2016.1207036

    View details for Web of Science ID 000383358100014

    View details for PubMedID 27362560

    View details for PubMedCentralID PMC5014010

  • The bioenergetic costs of a gene PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Lynch, M., Marinov, G. K. 2015; 112 (51): 15690-15695

    Abstract

    An enduring mystery of evolutionary genomics concerns the mechanisms responsible for lineage-specific expansions of genome size in eukaryotes, especially in multicellular species. One idea is that all excess DNA is mutationally hazardous, but weakly enough so that genome-size expansion passively emerges in species experiencing relatively low efficiency of selection owing to small effective population sizes. Another idea is that substantial gene additions were impossible without the energetic boost provided by the colonizing mitochondrion in the eukaryotic lineage. Contrary to this latter view, analysis of cellular energetics and genomics data from a wide variety of species indicates that, relative to the lifetime ATP requirements of a cell, the costs of a gene at the DNA, RNA, and protein levels decline with cell volume in both bacteria and eukaryotes. Moreover, these costs are usually sufficiently large to be perceived by natural selection in bacterial populations, but not in eukaryotes experiencing high levels of random genetic drift. Thus, for scaling reasons that are not yet understood, by virtue of their large size alone, eukaryotic cells are subject to a broader set of opportunities for the colonization of novel genes manifesting weakly advantageous or even transiently disadvantageous phenotypic effects. These results indicate that the origin of the mitochondrion was not a prerequisite for genome-size expansion.

    View details for DOI 10.1073/pnas.1514974112

    View details for Web of Science ID 000366916000053

    View details for PubMedID 26575626

    View details for PubMedCentralID PMC4697398

  • Diversity and Divergence of Dinoflagellate Histone Proteins. G3 (Bethesda, Md.) Marinov, G. K., Lynch, M. 2015; 6 (2): 397-422

    Abstract

    Histone proteins and the nucleosomal organization of chromatin are near-universal eukaryotic features, with the exception of dinoflagellates. Previous studies have suggested that histones do not play a major role in the packaging of dinoflagellate genomes, although several genomic and transcriptomic surveys have detected a full set of core histone genes. Here, transcriptomic and genomic sequence data from multiple dinoflagellate lineages are analyzed, and the diversity of histone proteins and their variants characterized, with particular focus on their potential post-translational modifications and the conservation of the histone code. In addition, the set of putative epigenetic mark readers and writers, chromatin remodelers and histone chaperones are examined. Dinoflagellates clearly express the most derived set of histones among all autonomous eukaryote nuclei, consistent with a combination of relaxation of sequence constraints imposed by the histone code and the presence of numerous specialized histone variants. The histone code itself appears to have diverged significantly in some of its components, yet others are conserved, implying conservation of the associated biochemical processes. Specifically, and with major implications for the function of histones in dinoflagellates, the results presented here strongly suggest that transcription through nucleosomal arrays happens in dinoflagellates. Finally, the plausible roles of histones in dinoflagellate nuclei are discussed.

    View details for DOI 10.1534/g3.115.023275

    View details for PubMedID 26646152

    View details for PubMedCentralID PMC4751559

  • The microRNA-212/132 cluster regulates B cell development by targeting Sox4 JOURNAL OF EXPERIMENTAL MEDICINE Mehta, A., Mann, M., Zhao, J. L., Marinov, G. K., Majumdar, D., Garcia-Flores, Y., Du, X., Erikci, E., Chowdhury, K., Baltimore, D. 2015; 212 (10): 1679-1692

    Abstract

    MicroRNAs have emerged as key regulators of B cell fate decisions and immune function. Deregulation of several microRNAs in B cells leads to the development of autoimmune disease and cancer in mice. We demonstrate that the microRNA-212/132 cluster (miR-212/132) is induced in B cells in response to B cell receptor signaling. Enforced expression of miR-132 results in a block in early B cell development at the prepro-B cell to pro-B cell transition and induces apoptosis in primary bone marrow B cells. Importantly, loss of miR-212/132 results in accelerated B cell recovery after antibody-mediated B cell depletion. We find that Sox4 is a target of miR-132 in B cells. Co-expression of SOX4 with miR-132 rescues the defect in B cell development from overexpression of miR-132 alone, thus suggesting that miR-132 may regulate B lymphopoiesis through Sox4. In addition, we show that the expression of miR-132 can inhibit cancer development in cells that are prone to B cell cancers, such as B cells expressing the c-Myc oncogene. We have thus uncovered miR-132 as a novel contributor to B cell development.

    View details for DOI 10.1084/jem.20150489

    View details for Web of Science ID 000365135200016

    View details for PubMedID 26371188

    View details for PubMedCentralID PMC4577845

  • MIWI2 and MILI Have Differential Effects on piRNA Biogenesis and DNA Methylation CELL REPORTS Manakov, S. A., Pezic, D., Marinov, G. K., Pastor, W. A., Sachidanandam, R., Aravin, A. A. 2015; 12 (8): 1234-1243

    Abstract

    In developing male germ cells, prospermatogonia, two Piwi proteins, MILI and MIWI2, use Piwi-interacting RNA (piRNA) guides to repress transposable element (TE) expression and ensure genome stability and proper gametogenesis. In addition to their roles in post-transcriptional TE repression, both proteins are required for DNA methylation of TE sequences. Here, we analyzed the effect of Miwi2 deficiency on piRNA biogenesis and transposon repression. Miwi2 deficiency had only a minor impact on piRNA biogenesis; however, the piRNA profile of Miwi2-knockout mice indicated overexpression of several LINE1 TE families that led to activation of the ping-pong piRNA cycle. Furthermore, we found that MILI and MIWI2 have distinct functions in TE repression in the nucleus. MILI is responsible for DNA methylation of a larger subset of TE families than MIWI2 is, suggesting that the proteins have independent roles in establishing DNA methylation patterns.

    View details for DOI 10.1016/j.celrep.2015.07.036

    View details for Web of Science ID 000360182200003

    View details for PubMedID 26279574

    View details for PubMedCentralID PMC4554733

  • The MicroRNA-132 and MicroRNA-212 Cluster Regulates Hematopoietic Stem Cell Maintenance and Survival with Age by Buffering FOXO3 Expression IMMUNITY Mehta, A., Zhao, J. L., Sinha, N., Marinov, G. K., Mann, M., Kowalczyk, M. S., Galimidi, R. P., Du, X., Erikci, E., Regev, A., Chowdhury, K., Baltimore, D. 2015; 42 (6): 1021-1032

    Abstract

    MicroRNAs are critical post-transcriptional regulators of hematopoietic cell-fate decisions, though little remains known about their role in aging hematopoietic stem cells (HSCs). We found that the microRNA-212/132 cluster (Mirc19) is enriched in HSCs and is upregulated during aging. Both overexpression and deletion of microRNAs in this cluster lead to inappropriate hematopoiesis with age. Enforced expression of miR-132 in the bone marrow of mice led to rapid HSC cycling and depletion. A genetic deletion of Mirc19 in mice resulted in HSCs that had altered cycling, function, and survival in response to growth factor starvation. We found that miR-132 exerted its effect on aging HSCs by targeting the transcription factor FOXO3, a known aging-associated gene. Our data demonstrate that Mirc19 plays a role in maintaining balanced hematopoietic output by buffering FOXO3 expression. We have thus identified it as a potential target that might play a role in age-related hematopoietic defects.

    View details for DOI 10.1016/j.immuni.2015.05.017

    View details for Web of Science ID 000356362800009

    View details for PubMedID 26084022

    View details for PubMedCentralID PMC4471877

  • Genome Sequence of Magnetospirillum magnetotacticum Strain MS-1. Genome announcements Smalley, M. D., Marinov, G. K., Bertani, L. E., DeSalvo, G. 2015; 3 (2)

    Abstract

    Here, we report the genome sequence of Magnetospirillum magnetotacticum strain MS-1, which consists of 36 contigs and 4,136 protein-coding genes.

    View details for DOI 10.1128/genomeA.00233-15

    View details for PubMedID 25838488

    View details for PubMedCentralID PMC4384492

  • The elephant in the room: Advertising science as a driver of economic growth is a long-term losing strategy EMBO REPORTS Marinov, G. K. 2015; 16 (4): 399-403

    View details for Web of Science ID 000352167500002

    View details for PubMedID 25744523

    View details for PubMedCentralID PMC4388605

  • Pitfalls of Mapping High-Throughput Sequencing Data to Repetitive Sequences: Piwi's Genomic Targets Still Not Identified DEVELOPMENTAL CELL Marinov, G. K., Wang, J., Handler, D., Wold, B. J., Weng, Z., Hannon, G. J., Aravin, A. A., Zamore, P. D., Brennecke, J., Toth, K. F. 2015; 32 (6): 765-771

    Abstract

    Huang et al. (2013) recently reported that chromatin immunoprecipitation sequencing (ChIP-seq) reveals the genome-wide sites of occupancy by Piwi, a piRNA-guided Argonaute protein central to transposon silencing in Drosophila. Their study also reported that loss of Piwi causes widespread rewiring of transcriptional patterns, as evidenced by changes in RNA polymerase II occupancy across the genome. Here we reanalyze their data and report that the underlying deep-sequencing dataset does not support the authors' genome-wide conclusions.

    View details for DOI 10.1016/j.devcel.2015.01.013

    View details for Web of Science ID 000351841900014

    View details for PubMedID 25805138

    View details for PubMedCentralID PMC4494788

  • Single-Cell Transcriptome Analysis Reveals Dynamic Changes in lncRNA Expression during Reprogramming CELL STEM CELL Kim, D. H., Marinov, G. K., Pepke, S., Singer, Z. S., He, P., Williams, B., Schroth, G. P., Elowitz, M. B., Wold, B. J. 2015; 16 (1): 88-101

    Abstract

    Cellular reprogramming highlights the epigenetic plasticity of the somatic cell state. Long noncoding RNAs (lncRNAs) have emerging roles in epigenetic regulation, but their potential functions in reprogramming cell fate have been largely unexplored. We used single-cell RNA sequencing to characterize the expression patterns of over 16,000 genes, including 437 lncRNAs, during defined stages of reprogramming to pluripotency. Self-organizing maps (SOMs) were used as an intuitive way to structure and interrogate transcriptome data at the single-cell level. Early molecular events during reprogramming involved the activation of Ras signaling pathways, along with hundreds of lncRNAs. Loss-of-function studies showed that activated lncRNAs can repress lineage-specific genes, while lncRNAs activated in multiple reprogramming cell types can regulate metabolic gene expression. Our findings demonstrate that reprogramming cells activate defined sets of functionally relevant lncRNAs and provide a resource to further investigate how dynamic changes in the transcriptome reprogram cell state.

    View details for DOI 10.1016/j.stem.2014.11.005

    View details for Web of Science ID 000347708400014

    View details for PubMedID 25575081

    View details for PubMedCentralID PMC4291542

  • A comparative encyclopedia of DNA elements in the mouse genome NATURE Yue, F., Cheng, Y., Breschi, A., Vierstra, J., Wu, W., Ryba, T., Sandstrom, R., Ma, Z., Davis, C., Pope, B. D., Shen, Y., Pervouchine, D. D., Djebali, S., Thurman, R. E., Kaul, R., Rynes, E., Kirilusha, A., Marinov, G. K., Williams, B. A., Trout, D., Amrhein, H., Fisher-Aylor, K., Antoshechkin, I., DeSalvo, G., See, L., Fastuca, M., Drenkow, J., Zaleski, C., Dobin, A., Prieto, P., Lagarde, J., Bussotti, G., Tanzer, A., Denas, O., Li, K., Bender, M. A., Zhang, M., Byron, R., Groudine, M. T., McCleary, D., Pham, L., Ye, Z., Kuan, S., Edsall, L., Wu, Y., Rasmussen, M. D., Bansal, M. S., Kellis, M., Keller, C. A., Morrissey, C. S., Mishra, T., Jain, D., Dogan, N., Harris, R. S., Cayting, P., Kawli, T., Boyle, A. P., Euskirchen, G., Kundaje, A., Lin, S., Lin, Y., Jansen, C., Malladi, V. S., Cline, M. S., Erickson, D. T., Kirkup, V. M., Learned, K., Sloan, C. A., Rosenbloom, K. R., De Sousa, B. L., Beal, K., Pignatelli, M., Flicek, P., Lian, J., Kahveci, T., Lee, D., Kent, W. J., Santos, M. R., Herrero, J., Notredame, C., Johnson, A., Vong, S., Lee, K., Bates, D., Neri, F., Diegel, M., Canfield, T., Sabo, P. J., Wilken, M. S., Reh, T. A., Giste, E., Shafer, A., Kutyavin, T., Haugen, E., Dunn, D., Reynolds, A. P., Neph, S., Humbert, R., Hansen, R. S., de Bruijn, M., Selleri, L., Rudensky, A., Josefowicz, S., Samstein, R., Eichler, E. E., Orkin, S. H., Levasseur, D., Papayannopoulou, T., Chang, K., Skoultchi, A., Gosh, S., Disteche, C., Treuting, P., Wang, Y., Weiss, M. J., Blobel, G. A., Cao, X., Zhong, S., Wang, T., Good, P. J., Lowdon, R. F., Adams, L. B., Zhou, X., Pazin, M. J., Feingold, E. A., Wold, B., Taylor, J., Mortazavi, A., Weissman, S. M., Stamatoyannopoulos, J. A., Snyder, M. P., Guigo, R., Gingeras, T. R., Gilbert, D. M., Hardison, R. C., Beer, M. A., Ren, B. 2014; 515 (7527): 355-364

    Abstract

    The laboratory mouse shares the majority of its protein-coding genes with humans, making it the premier model organism in biomedical research, yet the two mammals differ in significant ways. To gain greater insights into both shared and species-specific transcriptional and cellular regulatory programs in the mouse, the Mouse ENCODE Consortium has mapped transcription, DNase I hypersensitivity, transcription factor binding, chromatin modifications and replication domains throughout the mouse genome in diverse cell and tissue types. By comparing with the human genome, we not only confirm substantial conservation in the newly annotated potential functional sequences, but also find a large degree of divergence of sequences involved in transcriptional regulation, chromatin state and higher order chromatin organization. Our results illuminate the wide range of evolutionary forces acting on genes and their regulatory regions, and provide a general resource for research into mammalian biology and mechanisms of human diseases.

    View details for DOI 10.1038/nature13992

    View details for Web of Science ID 000345770600034

    View details for PubMedID 25409824

  • A ratiometric-based measure of gene co-expression BMC BIOINFORMATICS Abelin, A. C., Marinov, G. K., Williams, B. A., McCue, K., Wold, B. J. 2014; 15

    Abstract

    Gene co-expression analysis has previously been based on measures that include correlation coefficients and mutual information, as well as newcomers such as MIC. These measures depend primarily on the degree of association between the RNA levels of two genes and to a lesser extent on their variability. They focus on the similarity of expression value trajectories that change in like manner across samples. However, there are relationships of biological interest for which these classical measures are expected to be insensitive. These include genes whose expression levels are ratiometrically stable and genes whose variance is tightly constrained. Large-scale studies of relatively homogeneous samples, including single cell RNA-seq, are experimental settings in which such relationships might be especially pertinent. We develop and implement a ratiometric approach for detecting gene associations (abbreviated RA). It is based on the coefficient of variation of the measured expression ratio of each pair of genes. We apply it to a collection of lymphoblastoid RNA-seq data from the 1000 Genomes Project Consortium, a typical sample set with high overall homogeneity. RA is a selective method, reporting in this case ~1/4 of all possible gene pairs, yet these relationships include a distilled picture of biological relationships previously found by other methods. In addition, RA reveals expression relationships that are not detected by traditional correlation and mutual information methods. We also analyze data from individual lymphoblastoid cells and show that desirable properties of the RA method extend to single-cell RNA-seq. We show that our ratiometric method identifies biologically significant relationships that are often missed or low-ranked by conventional association-based methods when applied to a relatively homogeneous dataset. The results open new questions about the regulatory mechanisms that produce strong RA relationships. RA is scalable and potentially well suited for the analysis of thousands of bulk-RNA or single-cell transcriptomes.

    View details for DOI 10.1186/1471-2105-15-331

    View details for Web of Science ID 000347427300001

    View details for PubMedID 25411051

    View details for PubMedCentralID PMC4289233

  • A Transgenerational Process Defines piRNA Biogenesis in Drosophila virilis CELL REPORTS Le Thomas, A., Marinov, G. K., Aravin, A. A. 2014; 8 (6): 1617-1623

    Abstract

    Piwi-interacting (pi)RNAs repress diverse transposable elements in germ cells of Metazoa and are essential for fertility in both invertebrates and vertebrates. The precursors of piRNAs are transcribed from distinct genomic regions, the so-called piRNA clusters; however, how piRNA clusters are differentiated from the rest of the genome is not known. To address this question, we studied piRNA biogenesis in two D. virilis strains that show differential ability to generate piRNAs from several genomic regions. We found that active piRNA biogenesis correlates with high levels of histone 3 lysine 9 trimethylation (H3K9me3) over genomic regions that give rise to piRNAs. Furthermore, piRNA biogenesis in the progeny requires the transgenerational inheritance of an epigenetic signal, presumably in the form of homologous piRNAs that are generated in the maternal germline and deposited into the oocyte. The inherited piRNAs enhance piRNA biogenesis through the installment of H3K9me3 on piRNA clusters.

    View details for DOI 10.1016/j.celrep.2014.08.013

    View details for Web of Science ID 000343867400002

    View details for PubMedID 25199836

    View details for PubMedCentralID PMC5054749

  • Reply to Brunet and Doolittle: Both selected effect and causal role elements can influence human biology and disease PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E., Dekker, J., Dunham, I., Elnitski, L. L., Farnham, P. J., Feingold, E. A., Gerstein, M., Giddings, M. C., Gilbert, D. M., Gingeras, T. R., Green, E. D., Guigo, R., Hubbard, T., Kent, J., Lieb, J. D., Myers, R. M., Pazin, M. J., Ren, B., Stamatoyannopoulos, J., Weng, Z., White, K. P., Hardison, R. C. 2014; 111 (33): E3366-E3366

    View details for DOI 10.1073/pnas.1410434111

    View details for Web of Science ID 000340438800004

    View details for PubMedID 25275169

    View details for PubMedCentralID PMC4143047

  • Fully automated high-throughput chromatin immunoprecipitation for ChIP-seq: Identifying ChIP-quality p300 monoclonal antibodies SCIENTIFIC REPORTS Gasper, W. C., Marinov, G. K., Pauli-Behn, F., Scott, M. T., Newberry, K., DeSalvo, G., Ou, S., Myers, R. M., Vielmetter, J., Wold, B. J. 2014; 4

    Abstract

    Chromatin immunoprecipitation coupled with DNA sequencing (ChIP-seq) is the major contemporary method for mapping in vivo protein-DNA interactions in the genome. It identifies sites of transcription factor, cofactor and RNA polymerase occupancy, as well as the distribution of histone marks. Consortia such as the ENCyclopedia Of DNA Elements (ENCODE) have produced large datasets using manual protocols. However, future measurements of hundreds of additional factors in many cell types and physiological states call for higher throughput and consistency afforded by automation. Such automation advances, when provided by multiuser facilities, could also improve the quality and efficiency of individual small-scale projects. The immunoprecipitation process has become rate-limiting, and is a source of substantial variability when performed manually. Here we report a fully automated robotic ChIP (R-ChIP) pipeline that allows up to 96 reactions. A second bottleneck is the dearth of renewable ChIP-validated immune reagents, which do not yet exist for most mammalian transcription factors. We used R-ChIP to screen new mouse monoclonal antibodies raised against p300, a histone acetylase, well-known as a marker of active enhancers, for which ChIP-competent monoclonal reagents have been lacking. We identified, validated for ChIP-seq, and made publicly available a monoclonal reagent called ENCITp300-1.

    View details for DOI 10.1038/srep05152

    View details for Web of Science ID 000337225100001

    View details for PubMedID 24919486

    View details for PubMedCentralID PMC4053718

  • Defining functional DNA elements in the human genome PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Kellis, M., Wold, B., Snyder, M. P., Bernstein, B. E., Kundaje, A., Marinov, G. K., Ward, L. D., Birney, E., Crawford, G. E., Dekker, J., Dunham, I., Elnitski, L. L., Farnham, P. J., Feingold, E. A., Gerstein, M., Giddings, M. C., Gilbert, D. M., Gingeras, T. R., Green, E. D., Guigo, R., Hubbard, T., Kent, J., Lieb, J. D., Myers, R. M., Pazin, M. J., Ren, B., Stamatoyannopoulos, J. A., Weng, Z., White, K. P., Hardison, R. C. 2014; 111 (17): 6131-6138

    Abstract

    With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease.

    View details for DOI 10.1073/pnas.1318948111

    View details for Web of Science ID 000335199000025

    View details for PubMedID 24753594

    View details for PubMedCentralID PMC4035993

  • From single-cell to cell-pool transcriptomes: Stochasticity in gene expression and RNA splicing GENOME RESEARCH Marinov, G. K., Williams, B. A., McCue, K., Schroth, G. P., Gertz, J., Myers, R. M., Wold, B. J. 2014; 24 (3): 496-510

    Abstract

    Single-cell RNA-seq mammalian transcriptome studies are at an early stage in uncovering cell-to-cell variation in gene expression, transcript processing and editing, and regulatory module activity. Despite great progress recently, substantial challenges remain, including discriminating biological variation from technical noise. Here we apply the SMART-seq single-cell RNA-seq protocol to study the reference lymphoblastoid cell line GM12878. By using spike-in quantification standards, we estimate the absolute number of RNA molecules per cell for each gene and find significant variation in total mRNA content: between 50,000 and 300,000 transcripts per cell. We directly measure technical stochasticity by a pool/split design and find that there are significant differences in expression between individual cells, over and above technical variation. Specific gene coexpression modules were preferentially expressed in subsets of individual cells, including one enriched for mRNA processing and splicing factors. We assess cell-to-cell variation in alternative splicing and allelic bias and report evidence of significant differences in splice site usage that exceed splice variation in the pool/split comparison. Finally, we show that transcriptomes from small pools of 30-100 cells approach the information content and reproducibility of contemporary RNA-seq from large amounts of input material. Together, our results define an experimental and computational path forward for analyzing gene expression in rare cell types and cell states.

    View details for DOI 10.1101/gr.161034.113

    View details for Web of Science ID 000332246100013

    View details for PubMedID 24299736

    View details for PubMedCentralID PMC3941114

  • Evidence for Site-Specific Occupancy of the Mitochondrial Genome by Nuclear Transcription Factors PLOS ONE Marinov, G. K., Wang, Y. E., Chan, D., Wold, B. J. 2014; 9 (1)

    Abstract

    Mitochondria contain their own circular genome, with mitochondria-specific transcription and replication systems and corresponding regulatory proteins. All of these proteins are encoded in the nuclear genome and are post-translationally imported into mitochondria. In addition, several nuclear transcription factors have been reported to act in mitochondria, but there has been no comprehensive mapping of their occupancy patterns and it is not clear how many other factors may also be found in mitochondria. Here we address these questions by using ChIP-seq data from the ENCODE, mouseENCODE and modENCODE consortia for 151 human, 31 mouse and 35 C. elegans factors. We identified 8 human and 3 mouse transcription factors with strong localized enrichment over the mitochondrial genome that was usually associated with the corresponding recognition sequence motif. Notably, these sites of occupancy are often the sites with highest ChIP-seq signal intensity within both the nuclear and mitochondrial genomes and are thus best explained as true binding events to mitochondrial DNA, which exist in high copy number in each cell. We corroborated these findings by immunocytochemical staining evidence for mitochondrial localization. However, we were unable to find clear evidence for mitochondrial binding in ENCODE and other publicly available ChIP-seq data for most factors previously reported to localize there. As the first global analysis of nuclear transcription factors binding in mitochondria, this work opens the door to future studies that probe the functional significance of the phenomenon.

    View details for DOI 10.1371/journal.pone.0084713

    View details for Web of Science ID 000330240500021

    View details for PubMedID 24465428

    View details for PubMedCentralID PMC3896368

  • Large-Scale Quality Analysis of Published ChIP-seq Data. G3 (Bethesda, Md.) Marinov, G. K., Kundaje, A., Park, P. J., Wold, B. J. 2014; 4 (2): 209-223

    Abstract

    ChIP-seq has become the primary method for identifying in vivo protein-DNA interactions on a genome-wide scale, with nearly 800 publications involving the technique appearing in PubMed as of December 2012. Individually and in aggregate, these data are an important and information-rich resource. However, uncertainties about data quality confound their use by the wider research community. Recently, the Encyclopedia of DNA Elements (ENCODE) project developed and applied metrics to objectively measure ChIP-seq data quality. The ENCODE quality analysis was useful for flagging datasets for closer inspection, eliminating or replacing poor data, and for driving changes in experimental pipelines. There had been no similarly systematic quality analysis of the large and disparate body of published ChIP-seq profiles. Here, we report a uniform analysis of vertebrate transcription factor ChIP-seq datasets in the Gene Expression Omnibus (GEO) repository as of April 1, 2012. The majority (55%) of datasets scored as being highly successful, but a substantial minority (20%) were of apparently poor quality, and another ∼25% were of intermediate quality. We discuss how different uses of ChIP-seq data are affected by specific aspects of data quality, and we highlight exceptional instances for which the metric values should not be taken at face value. Unexpectedly, we discovered that a significant subset of control datasets (i.e., no immunoprecipitation and mock immunoprecipitation samples) display an enrichment structure similar to successful ChIP-seq data. This can, in turn, affect peak calling and data interpretation. Published datasets identified here as high-quality comprise a large group that users can draw on for large-scale integrated analysis. In the future, ChIP-seq quality assessment similar to that used here could guide experimentalists at early stages in a study, provide useful input in the publication process, and be used to stratify ChIP-seq data for different community-wide uses.

    View details for DOI 10.1534/g3.113.008680

    View details for PubMedID 24347632

    View details for PubMedCentralID PMC3931556

  • Integrating and mining the chromatin landscape of cell-type specificity using self-organizing maps GENOME RESEARCH Mortazavi, A., Pepke, S., Jansen, C., Marinov, G. K., Ernst, J., Kellis, M., Hardison, R. C., Myers, R. M., Wold, B. J. 2013; 23 (12): 2136-2148

    Abstract

    We tested whether self-organizing maps (SOMs) could be used to effectively integrate, visualize, and mine diverse genomics data types, including complex chromatin signatures. A fine-grained SOM was trained on 72 ChIP-seq histone modifications and DNase-seq data sets from six biologically diverse cell lines studied by The ENCODE Project Consortium. We mined the resulting SOM to identify chromatin signatures related to sequence-specific transcription factor occupancy, sequence motif enrichment, and biological functions. To highlight clusters enriched for specific functions such as transcriptional promoters or enhancers, we overlaid onto the map additional data sets not used during training, such as ChIP-seq, RNA-seq, CAGE, and information on cis-acting regulatory modules from the literature. We used the SOM to parse known transcriptional enhancers according to the cell-type-specific chromatin signature, and we further corroborated this pattern on the map by EP300 (also known as p300) occupancy. New candidate cell-type-specific enhancers were identified for multiple ENCODE cell types in this way, along with new candidates for ubiquitous enhancer activity. An interactive web interface was developed to allow users to visualize and custom-mine the ENCODE SOM. We conclude that large SOMs trained on chromatin data from multiple cell types provide a powerful way to identify complex relationships in genomic data at user-selected levels of granularity.

    View details for DOI 10.1101/gr.158261.113

    View details for Web of Science ID 000327946900016

    View details for PubMedID 24170599

    View details for PubMedCentralID PMC3847782

  • Genome-Wide Analysis Reveals Coating of the Mitochondrial Genome by TFAM PLOS ONE Wang, Y. E., Marinov, G. K., Wold, B. J., Chan, D. C. 2013; 8 (8)

    Abstract

    Mitochondria contain a 16.6 kb circular genome encoding 13 proteins as well as mitochondrial tRNAs and rRNAs. Copies of the genome are organized into nucleoids containing both DNA and proteins, including the machinery required for mtDNA replication and transcription. The transcription factor TFAM is critical for initiation of transcription and replication of the genome, and is also thought to perform a packaging function. Although specific binding sites required for initiation of transcription have been identified in the D-loop, little is known about the characteristics of TFAM binding in its nonspecific packaging state. In addition, it is unclear whether TFAM also plays a role in the regulation of nuclear gene expression. Here we investigate these questions by using ChIP-seq to directly localize TFAM binding to DNA in human cells. Our results demonstrate that TFAM uniformly coats the whole mitochondrial genome, with no evidence of robust TFAM binding to the nuclear genome. Our study represents the first high-resolution assessment of TFAM binding on a genome-wide scale in human cells.

    View details for DOI 10.1371/journal.pone.0074513

    View details for Web of Science ID 000324228800121

    View details for PubMedID 23991223

    View details for PubMedCentralID PMC3753274

  • Piwi induces piRNA-guided transcriptional silencing and establishment of a repressive chromatin state GENES & DEVELOPMENT Le Thomas, A., Rogers, A. K., Webster, A., Marinov, G. K., Liao, S. E., Perkins, E. M., Hur, J. K., Aravin, A. A., Toth, K. F. 2013; 27 (4): 390-399

    Abstract

    In the metazoan germline, piwi proteins and associated piwi-interacting RNAs (piRNAs) provide a defense system against the expression of transposable elements. In the cytoplasm, piRNA sequences guide piwi complexes to destroy complementary transposon transcripts by endonucleolytic cleavage. However, some piwi family members are nuclear, raising the possibility of alternative pathways for piRNA-mediated regulation of gene expression. We found that Drosophila Piwi is recruited to chromatin, colocalizing with RNA polymerase II (Pol II) on polytene chromosomes. Knockdown of Piwi in the germline increases expression of transposable elements that are targeted by piRNAs, whereas protein-coding genes remain largely unaffected. Derepression of transposons upon Piwi depletion correlates with increased occupancy of Pol II on their promoters. Expression of piRNAs that target a reporter construct results in a decrease in Pol II occupancy and an increase in repressive H3K9me3 marks and heterochromatin protein 1 (HP1) on the reporter locus. Our results indicate that Piwi identifies targets complementary to the associated piRNA and induces transcriptional repression by establishing a repressive chromatin state when correct targets are found.

    View details for DOI 10.1101/gad.209841.112

    View details for Web of Science ID 000315286300005

    View details for PubMedID 23392610

    View details for PubMedCentralID PMC3589556

  • Antitumor activity of a pyrrole-imidazole polyamide PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Yang, F., Nickols, N. G., Li, B. C., Marinov, G. K., Said, J. W., Dervan, P. B. 2013; 110 (5): 1863-1868

    Abstract

    Many cancer therapeutics target DNA and exert cytotoxicity through the induction of DNA damage and inhibition of transcription. We report that a DNA minor groove binding hairpin pyrrole-imidazole (Py-Im) polyamide interferes with RNA polymerase II (RNAP2) activity in cell culture. Polyamide treatment activates p53 signaling in LNCaP prostate cancer cells without detectable DNA damage. Genome-wide mapping of RNAP2 binding shows reduction of occupancy, preferentially at transcription start sites, but occupancy at enhancer sites is unchanged. Polyamide treatment results in a time- and dose-dependent depletion of the RNAP2 large subunit RPB1 that is preventable with proteasome inhibition. This polyamide demonstrates antitumor activity in a prostate tumor xenograft model with limited host toxicity.

    View details for DOI 10.1073/pnas.1222035110

    View details for Web of Science ID 000314558100059

    View details for PubMedID 23319609

    View details for PubMedCentralID PMC3562772

  • Gene expression changes in a tumor xenograft by a pyrrole-imidazole polyamide PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Raskatov, J. A., Nickols, N. G., Hargrove, A. E., Marinov, G. K., Wold, B., Dervan, P. B. 2012; 109 (40): 16041-16045

    Abstract

    Gene regulation by DNA binding small molecules could have important therapeutic applications. This study reports the investigation of a DNA-binding pyrrole-imidazole polyamide targeted to bind the DNA sequence 5'-WGGWWW-3' with reference to its potency in a subcutaneous xenograft tumor model. The molecule is capable of trafficking to the tumor site following subcutaneous injection and modulates transcription of select genes in vivo. An FITC-labeled analogue of this polyamide can be detected in tumor-derived cells by confocal microscopy. RNA deep sequencing (RNA-seq) of tumor tissue allowed the identification of further affected genes, a representative panel of which was interrogated by quantitative reverse transcription-PCR and correlated with cell culture expression levels.

    View details for DOI 10.1073/pnas.1214267109

    View details for Web of Science ID 000309611400026

    View details for PubMedID 22988074

    View details for PubMedCentralID PMC3479560

  • An integrated encyclopedia of DNA elements in the human genome NATURE Dunham, I., Kundaje, A., Aldred, S. F., Collins, P. J., Davis, C., Doyle, F., Epstein, C. B., Frietze, S., Harrow, J., Kaul, R., Khatun, J., Lajoie, B. R., Landt, S. G., Lee, B., Pauli, F., Rosenbloom, K. R., Sabo, P., Safi, A., Sanyal, A., Shoresh, N., Simon, J. M., Song, L., Trinklein, N. D., Altshuler, R. C., Birney, E., Brown, J. B., Cheng, C., Djebali, S., Dong, X., Dunham, I., Ernst, J., Furey, T. S., Gerstein, M., Giardine, B., Greven, M., Hardison, R. C., Harris, R. S., Herrero, J., Hoffman, M. M., Iyer, S., Kellis, M., Khatun, J., Kheradpour, P., Kundaje, A., Lassmann, T., Li, Q., Lin, X., Marinov, G. K., Merkel, A., Mortazavi, A., Parker, S. C., Reddy, T. E., Rozowsky, J., Schlesinger, F., Thurman, R. E., Wang, J., Ward, L. D., Whitfield, T. W., Wilder, S. P., Wu, W., Xi, H. S., Yip, K. Y., Zhuang, J., Bernstein, B. E., Birney, E., Dunham, I., Green, E. D., Gunter, C., Snyder, M., Pazin, M. J., Lowdon, R. F., Dillon, L. A., Adams, L. B., Kelly, C. J., Zhang, J., Wexler, J. R., Green, E. D., Good, P. J., Feingold, E. A., Bernstein, B. E., Birney, E., Crawford, G. E., Dekker, J., Elnitski, L., Farnham, P. J., Gerstein, M., Giddings, M. C., Gingeras, T. R., Green, E. D., Guigo, R., Hardison, R. C., Hubbard, T. J., Kellis, M., Kent, W. J., Lieb, J. D., Margulies, E. H., Myers, R. M., Snyder, M., Stamatoyannopoulos, J. A., Tenenbaum, S. A., Weng, Z., White, K. P., Wold, B., Khatun, J., Yu, Y., Wrobel, J., Risk, B. A., Gunawardena, H. P., Kuiper, H. C., Maier, C. W., Xie, L., Chen, X., Giddings, M. C., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Kheradpour, P., Mikkelsen, T. S., Gillespie, S., Goren, A., Ram, O., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Truong, T., Ward, L. D., Altshuler, R. C., Eaton, M. L., Kellis, M., Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., Xue, C., Marinov, G. K., Khatun, J., Williams, B. A., Zaleski, C., Rozowsky, J., Roeder, M., Kokocinski, F., Abdelhamid, R. F., Alioto, T., Antoshechkin, I., Baer, M. T., Batut, P., Bell, I., Bell, K., Chakrabortty, S., Chen, X., Chrast, J., Curado, J., Derrien, T., Drenkow, J., Dumais, E., Dumais, J., Duttagupta, R., Fastuca, M., Fejes-Toth, K., Ferreira, P., Foissac, S., Fullwood, M. J., Gao, H., Gonzalez, D., Gordon, A., Gunawardena, H. P., Howald, C., Jha, S., Johnson, R., Kapranov, P., King, B., Kingswood, C., Li, G., Luo, O. J., Park, E., Preall, J. B., Presaud, K., Ribeca, P., Risk, B. A., Robyr, D., Ruan, X., Sammeth, M., Sandhu, K. S., Schaeffer, L., See, L., Shahab, A., Skancke, J., Suzuki, A. M., Takahashi, H., Tilgner, H., Trout, D., Walters, N., Wang, H., Wrobel, J., Yu, Y., Hayashizaki, Y., Harrow, J., Gerstein, M., Hubbard, T. J., Reymond, A., Antonarakis, S. E., Hannon, G. J., Giddings, M. C., Ruan, Y., Wold, B., Carninci, P., Guigo, R., Gingeras, T. R., Rosenbloom, K. R., Sloan, C. A., Learned, K., Malladi, V. S., Wong, M. C., Barber, G., Cline, M. S., Dreszer, T. R., Heitner, S. G., Karolchik, D., Kent, W. J., Kirkup, V. M., Meyer, L. R., Long, J. C., Maddren, M., Raney, B. J., Furey, T. S., Song, L., Grasfeder, L. L., Giresi, P. G., Lee, B., Battenhouse, A., Sheffield, N. C., Simon, J. M., Showers, K. A., Safi, A., London, D., Bhinge, A. A., Shestak, C., Schaner, M. R., Kim, S. K., Zhang, Z. Z., Mieczkowski, P. A., Mieczkowska, J. O., Liu, Z., McDaniell, R. M., Ni, Y., Rashid, N. U., Kim, M. J., Adar, S., Zhang, Z., Wang, T., Winter, D., Keefe, D., Birney, E., Iyer, V. R., Lieb, J. D., Crawford, G. E., Li, G., Sandhu, K. S., Zheng, M., Wang, P., Luo, O. J., Shahab, A., Fullwood, M. J., Ruan, X., Ruan, Y., Myers, R. M., Pauli, F., Williams, B. A., Gertz, J., Marinov, G. K., Reddy, T. E., Vielmetter, J., Partridge, E. C., Trout, D., Varley, K. E., Gasper, C., Bansal, A., Pepke, S., Jain, P., Amrhein, H., Bowling, K. M., Anaya, M., Cross, M. K., King, B., Muratet, M. A., Antoshechkin, I., Newberry, K. M., McCue, K., Nesmith, A. S., Fisher-Aylor, K. I., Pusey, B., DeSalvo, G., Parker, S. L., Balasubramanian, S., Davis, N. S., Meadows, S. K., Eggleston, T., Gunter, C., Newberry, J. S., Levy, S. E., Absher, D. M., Mortazavi, A., Wong, W. H., Wold, B., Blow, M. J., Visel, A., Pennachio, L. A., Elnitski, L., Margulies, E. H., Parker, S. C., Petrykowska, H. M., Abyzov, A., Aken, B., Barrell, D., Barson, G., Berry, A., Bignell, A., Boychenko, V., Bussotti, G., Chrast, J., Davidson, C., Derrien, T., Despacio-Reyes, G., Diekhans, M., Ezkurdia, I., Frankish, A., Gilbert, J., Gonzalez, J. M., Griffiths, E., Harte, R., Hendrix, D. A., Howald, C., Hunt, T., Jungreis, I., Kay, M., Khurana, E., Kokocinski, F., Leng, J., Lin, M. F., Loveland, J., Lu, Z., Manthravadi, D., Mariotti, M., Mudge, J., Mukherjee, G., Notredame, C., Pei, B., Rodriguez, J. M., Saunders, G., Sboner, A., Searle, S., Sisu, C., Snow, C., Steward, C., Tanzer, A., Tapanari, E., Tress, M. L., van Baren, M. J., Walters, N., Washietl, S., Wilming, L., Zadissa, A., Zhang, Z., Brent, M., Haussler, D., Kellis, M., Valencia, A., Gerstein, M., Reymond, A., Guigo, R., Harrow, J., Hubbard, T. J., Landt, S. G., Frietze, S., Abyzov, A., Addleman, N., Alexander, R. P., Auerbach, R. K., Balasubramanian, S., Bettinger, K., Bhardwaj, N., Boyle, A. P., Cao, A. R., Cayting, P., Charos, A., Cheng, Y., Cheng, C., Eastman, C., Euskirchen, G., Fleming, J. D., Grubert, F., Habegger, L., Hariharan, M., Harmanci, A., Iyengar, S., Jin, V. X., Karczewski, K. J., Kasowski, M., Lacroute, P., Lam, H., Lamarre-Vincent, N., Leng, J., Lian, J., Lindahl-Allen, M., Min, R., Miotto, B., Monahan, H., Moqtaderi, Z., Mu, X. J., O'Geen, H., Ouyang, Z., Patacsil, D., Pei, B., Raha, D., Ramirez, L., Reed, B., Rozowsky, J., Sboner, A., Shi, M., Sisu, C., Slifer, T., Witt, H., Wu, L., Xu, X., Yan, K., Yang, X., Yip, K. Y., Zhang, Z., Struhl, K., Weissman, S. M., Gerstein, M., Farnham, P. J., Snyder, M., Tenenbaum, S. A., Penalva, L. O., Doyle, F., Karmakar, S., Landt, S. G., Bhanvadia, R. R., Choudhury, A., Domanus, M., Ma, L., Moran, J., Patacsil, D., Slifer, T., Victorsen, A., Yang, X., Snyder, M., White, K. P., Auer, T., Centanin, L., Eichenlaub, M., Gruhl, F., Heermann, S., Hoeckendorf, B., Inoue, D., Kellner, T., Kirchmaier, S., Mueller, C., Reinhardt, R., Schertel, L., Schneider, S., Sinn, R., Wittbrodt, B., Wittbrodt, J., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Aldred, S. F., Trinklein, N. D., Partridge, E. C., Myers, R. M., Dekker, J., Jain, G., Lajoie, B. R., Sanyal, A., Balasundaram, G., Bates, D. L., Byron, R., Canfield, T. K., Diegel, M. J., Dunn, D., Ebersol, A. K., Frum, T., Garg, K., Gist, E., Hansen, R. S., Boatman, L., Haugen, E., Humbert, R., Jain, G., Johnson, A. K., Johnson, E. M., Kutyavin, T. V., Lajoie, B. R., Lee, K., Lotakis, D., Maurano, M. T., Neph, S. J., Neri, F. V., Nguyen, E. D., Qu, H., Reynolds, A. P., Roach, V., Rynes, E., Sabo, P., Sanchez, M. E., Sandstrom, R. S., Sanyal, A., Shafer, A. O., Stergachis, A. B., Thomas, S., Thurman, R. E., Vernot, B., Vierstra, J., Vong, S., Wang, H., Weaver, M. A., Yan, Y., Zhang, M., Akey, J. M., Bender, M., Dorschner, M. O., Groudine, M., MacCoss, M. J., Navas, P., Stamatoyannopoulos, G., Kaul, R., Dekker, J., Stamatoyannopoulos, J. A., Dunham, I., Beal, K., Brazma, A., Flicek, P., Herrero, J., Johnson, N., Keefe, D., Lukk, M., Luscombe, N. M., Sobral, D., Vaquerizas, J. M., Wilder, S. P., Batzoglou, S., Sidow, A., Hussami, N., Kyriazopoulou-Panagiotopoulou, S., Libbrecht, M. W., Schaub, M. A., Kundaje, A., Hardison, R. C., Miller, W., Giardine, B., Harris, R. S., Wu, W., Bickel, P. J., Banfai, B., Boley, N. P., Brown, J. B., Huang, H., Li, Q., Li, J. J., Noble, W. S., Bilmes, J. A., Buske, O. J., Hoffman, M. M., Sahu, A. D., Kharchenko, P. V., Park, P. J., Baker, D., Taylor, J., Weng, Z., Iyer, S., Dong, X., Greven, M., Lin, X., Wang, J., Xi, H. S., Zhuang, J., Gerstein, M., Alexander, R. P., Balasubramanian, S., Cheng, C., Harmanci, A., Lochovsky, L., Min, R., Mu, X. J., Rozowsky, J., Yan, K., Yip, K. Y., Birney, E. 2012; 489 (7414): 57-74

    Abstract

    The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

    View details for DOI 10.1038/nature11247

    View details for Web of Science ID 000308347000039

    View details for PubMedID 22955616

    View details for PubMedCentralID PMC3439153

  • Landscape of transcription in human cells NATURE Djebali, S., Davis, C. A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., Xue, C., Marinov, G. K., Khatun, J., Williams, B. A., Zaleski, C., Rozowsky, J., Roeder, M., Kokocinski, F., Abdelhamid, R. F., Alioto, T., Antoshechkin, I., Baer, M. T., Bar, N. S., Batut, P., Bell, K., Bell, I., Chakrabortty, S., Chen, X., Chrast, J., Curado, J., Derrien, T., Drenkow, J., Dumais, E., Dumais, J., Duttagupta, R., Falconnet, E., Fastuca, M., Fejes-Toth, K., Ferreira, P., Foissac, S., Fullwood, M. J., Gao, H., Gonzalez, D., Gordon, A., Gunawardena, H., Howald, C., Jha, S., Johnson, R., Kapranov, P., King, B., Kingswood, C., Luo, O. J., Park, E., Persaud, K., Preall, J. B., Ribeca, P., Risk, B., Robyr, D., Sammeth, M., Schaffer, L., See, L., Shahab, A., Skancke, J., Suzuki, A. M., Takahashi, H., Tilgner, H., Trout, D., Walters, N., Wang, H., Wrobel, J., Yu, Y., Ruan, X., Hayashizaki, Y., Harrow, J., Gerstein, M., Hubbard, T., Reymond, A., Antonarakis, S. E., Hannon, G., Giddings, M. C., Ruan, Y., Wold, B., Carninci, P., Guigo, R., Gingeras, T. R. 2012; 489 (7414): 101-108

    Abstract

    Eukaryotic cells make many types of primary and processed RNAs that are found either in specific subcellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic subcellular localizations are also poorly understood. Because RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell's regulatory capabilities are focused on its synthesis, processing, transport, modification and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three-quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

    View details for DOI 10.1038/nature11233

    View details for Web of Science ID 000308347000043

    View details for PubMedID 22955620

  • ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia GENOME RESEARCH Landt, S. G., Marinov, G. K., Kundaje, A., Kheradpour, P., Pauli, F., Batzoglou, S., Bernstein, B. E., Bickel, P., Brown, J. B., Cayting, P., Chen, Y., DeSalvo, G., Epstein, C., Fisher-Aylor, K. I., Euskirchen, G., Gerstein, M., Gertz, J., Hartemink, A. J., Hoffman, M. M., Iyer, V. R., Jung, Y. L., Karmakar, S., Kellis, M., Kharchenko, P. V., Li, Q., Liu, T., Liu, X. S., Ma, L., Milosavljevic, A., Myers, R. M., Park, P. J., Pazin, M. J., Perry, M. D., Raha, D., Reddy, T. E., Rozowsky, J., Shoresh, N., Sidow, A., Slattery, M., Stamatoyannopoulos, J. A., Tolstorukov, M. Y., White, K. P., Xi, S., Farnham, P. J., Lieb, J. D., Wold, B. J., Snyder, M. 2012; 22 (9): 1813-1831

    Abstract

    Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.

    View details for DOI 10.1101/gr.136184.111

    View details for PubMedID 22955991

  • Effects of sequence variation on differential allelic transcription factor occupancy and gene expression GENOME RESEARCH Reddy, T. E., Gertz, J., Pauli, F., Kucera, K. S., Varley, K. E., Newberry, K. M., Marinov, G. K., Mortazavi, A., Williams, B. A., Song, L., Crawford, G. E., Wold, B., Willard, H. F., Myers, R. M. 2012; 22 (5): 860-869

    Abstract

    A complex interplay between transcription factors (TFs) and the genome regulates transcription. However, connecting variation in genome sequence with variation in TF binding and gene expression is challenging due to environmental differences between individuals and cell types. To address this problem, we measured genome-wide differential allelic occupancy of 24 TFs and EP300 in a human lymphoblastoid cell line GM12878. Overall, 5% of human TF binding sites have an allelic imbalance in occupancy. At many sites, TFs clustered in TF-binding hubs on the same homolog in especially open chromatin. While genetic variation in core TF binding motifs generally resulted in large allelic differences in TF occupancy, most allelic differences in occupancy were subtle and associated with disruption of weak or noncanonical motifs. We also measured genome-wide differential allelic expression of genes with and without heterozygous exonic variants in the same cells. We found that genes with differential allelic expression were overall less expressed both in GM12878 cells and in unrelated human cell lines. Comparing TF occupancy with expression, we found strong association between allelic occupancy and expression within 100 bp of transcription start sites (TSSs), and weak association up to 100 kb from TSSs. Sites of differential allelic occupancy were significantly enriched for variants associated with disease, particularly autoimmune disease, suggesting that allelic differences in TF occupancy give functional insights into intergenic variants associated with disease. Our results have the potential to increase the power and interpretability of association studies by targeting functional intergenic variants in addition to protein coding sequences.

    View details for DOI 10.1101/gr.131201.111

    View details for Web of Science ID 000303369600006

    View details for PubMedID 22300769

    View details for PubMedCentralID PMC3337432

  • An encyclopedia of mouse DNA elements (Mouse ENCODE) GENOME BIOLOGY Stamatoyannopoulos, J. A., Snyder, M., Hardison, R., Ren, B., Gingeras, T., Gilbert, D. M., Groudine, M., Bender, M., Kaul, R., Canfield, T., Giste, E., Johnson, A., Zhang, M., Balasundaram, G., Byron, R., Roach, V., Sabo, P. J., Sandstrom, R., Stehling, A. S., Thurman, R. E., Weissman, S. M., Cayting, P., Hariharan, M., Lian, J., Cheng, Y., Landt, S. G., Ma, Z., Wold, B. J., Dekker, J., Crawford, G. E., Keller, C. A., Wu, W., Morrissey, C., Kumar, S. A., Mishra, T., Jain, D., Byrska-Bishop, M., Blankenberg, D., Lajoie, B. R., Jain, G., Sanyal, A., Chen, K., Denas, O., Taylor, J., Blobel, G. A., Weiss, M. J., Pimkin, M., Deng, W., Marinov, G. K., Williams, B. A., Fisher-Aylor, K. I., DeSalvo, G., Kiralusha, A., Trout, D., Amrhein, H., Mortazavi, A., Edsall, L., McCleary, D., Kuan, S., Shen, Y., Yue, F., Ye, Z., Davis, C. A., Zaleski, C., Jha, S., Xue, C., Dobin, A., Lin, W., Fastuca, M., Wang, H., Guigo, R., Djebali, S., Lagarde, J., Ryba, T., Sasaki, T., Malladi, V. S., Cline, M. S., Kirkup, V. M., Learned, K., Rosenbloom, K. R., Kent, W. J., Feingold, E. A., Good, P. J., Pazin, M., Lowdon, R. F., Adams, L. B. 2012; 13 (8)

  • A User's Guide to the Encyclopedia of DNA Elements (ENCODE) PLOS BIOLOGY Myers, R. M., Stamatoyannopoulos, J., Snyder, M., Dunham, I., Hardison, R. C., Bernstein, B. E., Gingeras, T. R., Kent, W. J., Birney, E., Wold, B., Crawford, G. E., Bernstein, B. E., Epstein, C. B., Shoresh, N., Ernst, J., Mikkelsen, T. S., Kheradpour, P., Zhang, X., Wang, L., Issner, R., Coyne, M. J., Durham, T., Ku, M., Thanh Truong, T., Ward, L. D., Altshuler, R. C., Lin, M. F., Kellis, M., Gingeras, T. R., Davis, C. A., Kapranov, P., Dobin, A., Zaleski, C., Schlesinger, F., Batut, P., Chakrabortty, S., Jha, S., Lin, W., Drenkow, J., Wang, H., Bell, K., Gao, H., Bell, I., Dumais, E., Dumais, J., Antonarakis, S. E., Ucla, C., Borel, C., Guigo, R., Djebali, S., Lagarde, J., Kingswood, C., Ribeca, P., Sammeth, M., Alioto, T., Merkel, A., Tilgner, H., Carninci, P., Hayashizaki, Y., Lassmann, T., Takahashi, H., Abdelhamid, R. F., Hannon, G., Fejes-Toth, K., Preall, J., Gordon, A., Sotirova, V., Reymond, A., Howald, C., Graison, E. A., Chrast, J., Ruan, Y., Ruan, X., Shahab, A., Poh, W. T., Wei, C., Crawford, G. E., Furey, T. S., Boyle, A. P., Sheffield, N. C., Song, L., Shibata, Y., Vales, T., Winter, D., Zhang, Z., London, D., Wang, T., Birney, E., Keefe, D., Iyer, V. R., Lee, B., McDaniell, R. M., Liu, Z., Battenhouse, A., Bhinge, A. A., Lieb, J. D., Grasfeder, L. L., Showers, K. A., Giresi, P. G., Kim, S. K., Shestak, C., Myers, R. M., Pauli, F., Reddy, T. E., Gertz, J., Partridge, E. C., Jain, P., Sprouse, R. O., Bansal, A., Pusey, B., Muratet, M. A., Varley, K. E., Bowling, K. M., Newberry, K. M., Nesmith, A. S., Dilocker, J. A., Parker, S. L., Waite, L. L., Thibeault, K., Roberts, K., Absher, D. M., Wold, B., Mortazavi, A., Williams, B., Marinov, G., Trout, D., Pepke, S., King, B., McCue, K., Kirilusha, A., DeSalvo, G., Fisher-Aylor, K., Amrhein, H., Vielmetter, J., Sherlock, G., Sidow, A., Batzoglou, S., Rauch, R., Kundaje, A., Libbrecht, M., Margulies, E. H., Parker, S. C., Elnitski, L., Green, E. D., Hubbard, T., Harrow, J., Searle, S., Kokocinski, F., Aken, B., Frankish, A., Hunt, T., Despacio-Reyes, G., Kay, M., Mukherjee, G., Bignell, A., Saunders, G., Boychenko, V., Brent, M., van Baren, M. J., Brown, R. H., Gerstein, M., Khurana, E., Balasubramanian, S., Zhang, Z., Lam, H., Cayting, P., Robilotto, R., Lu, Z., Guigo, R., Derrien, T., Tanzer, A., Knowles, D. G., Mariotti, M., Kent, W. J., Haussler, D., Harte, R., Diekhans, M., Kellis, M., Lin, M., Kheradpour, P., Ernst, J., Reymond, A., Howald, C., Graison, E. A., Chrast, J., Valencia, A., Tress, M., Manuel Rodriguez, J., Snyder, M., Landt, S. G., Raha, D., Shi, M., Euskirchen, G., Grubert, F., Kasowski, M., Lian, J., Cayting, P., Lacroute, P., Xu, Y., Monahan, H., Patacsil, D., Slifer, T., Yang, X., Charos, A., Reed, B., Wu, L., Auerbach, R. K., Habegger, L., Hariharan, M., Rozowsky, J., Abyzov, A., Weissman, S. M., Gerstein, M., Struhl, K., Lamarre-Vincent, N., Lindahl-Allen, M., Miotto, B., Moqtaderi, Z., Fleming, J. D., Newburger, P., Farnham, P. J., Frietze, S., O'Geen, H., Xu, X., Blahnik, K. R., Cao, A. R., Iyengar, S., Stamatoyannopoulos, J. A., Kaul, R., Thurman, R. E., Wang, H., Navas, P. A., Sandstrom, R., Sabo, P. J., Weaver, M., Canfield, T., Lee, K., Neph, S., Roach, V., Reynolds, A., Johnson, A., Rynes, E., Giste, E., Vong, S., Neri, J., Frum, T., Johnson, E. M., Nguyen, E. D., Ebersol, A. K., Sanchez, M. E., Sheffer, H. H., Lotakis, D., Haugen, E., Humbert, R., Kutyavin, T., Shafer, T., Dekker, J., Lajoie, B. R., Sanyal, A., Kent, W. J., Rosenbloom, K. R., Dreszer, T. R., Raney, B. J., Barber, G. P., Meyer, L. R., Sloan, C. A., Malladi, V. S., Cline, M. S., Learned, K., Swing, V. K., Zweig, A. S., Rhead, B., Fujita, P. A., Roskin, K., Karolchik, D., Kuhn, R. M., Haussler, D., Birney, E., Dunham, I., Wilder, S. P., Keefe, D., Sobral, D., Herrero, J., Beal, K., Lukk, M., Brazma, A., Vaquerizas, J. M., Luscombe, N. M., Bickel, P. J., Boley, N., Brown, J. B., Li, Q., Huang, H., Gerstein, M., Habegger, L., Sboner, A., Rozowsky, J., Auerbach, R. K., Yip, K. Y., Cheng, C., Yan, K., Bhardwaj, N., Wang, J., Lochovsky, L., Jee, J., Gibson, T., Leng, J., Du, J., Hardison, R. C., Harris, R. S., Song, G., Miller, W., Haussler, D., Roskin, K., Suh, B., Wang, T., Paten, B., Noble, W. S., Hoffman, M. M., Buske, O. J., Weng, Z., Dong, X., Wang, J., Xi, H., Tenenbaum, S. A., Doyle, F., Penalva, L. O., Chittur, S., Tullius, T. D., Parker, S. C., White, K. P., Karmakar, S., Victorsen, A., Jameel, N., Bild, N., Grossman, R. L., Snyder, M., Landt, S. G., Yang, X., Patacsil, D., Slifer, T., Dekker, J., Lajoie, B. R., Sanyal, A., Weng, Z., Whitfield, T. W., Wang, J., Collins, P. J., Trinklein, N. D., Partridge, E. C., Myers, R. M., Giddings, M. C., Chen, X., Khatun, J., Maier, C., Yu, Y., Gunawardena, H., Risk, B., Feingold, E. A., Lowdon, R. F., Dillon, L. A., Good, P. J. 2011; 9 (4)

    Abstract

    The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

    View details for DOI 10.1371/journal.pbio.1001046

    View details for Web of Science ID 000289938900014