All Publications

  • Single-molecule chromatin configurations link transcription factor binding to expression in human cells. bioRxiv : the preprint server for biology Doughty, B. R., Hinks, M. M., Schaepe, J. M., Marinov, G. K., Thurm, A. R., Rios-Martinez, C., Parks, B. E., Tan, Y., Marklund, E., Dubocanin, D., Bintu, L., Greenleaf, W. J. 2024


    The binding of multiple transcription factors (TFs) to genomic enhancers activates gene expression in mammalian cells. However, the molecular details that link enhancer sequence to TF binding, promoter state, and gene expression levels remain opaque. We applied single-molecule footprinting (SMF) to measure the simultaneous occupancy of TFs, nucleosomes, and components of the transcription machinery on engineered enhancer/promoter constructs with variable numbers of TF binding sites for both a synthetic and an endogenous TF. We find that activation domains enhance a TF's capacity to compete with nucleosomes for binding to DNA in a BAF-dependent manner, TF binding on nucleosome-free DNA is consistent with independent binding between TFs, and average TF occupancy linearly contributes to promoter activation rates. We also decompose TF strength into separable binding and activation terms, which can be tuned and perturbed independently. Finally, we develop thermodynamic and kinetic models that quantitatively predict both the binding microstates observed at the enhancer and subsequent time-dependent gene expression. This work provides a template for quantitative dissection of distinct contributors to gene activation, including the activity of chromatin remodelers, TF activation domains, chromatin acetylation, TF concentration, TF binding affinity, and TF binding site configuration.

    DOI: 10.1101/2024.02.02.578660

    PubMed ID: 38352517

  • Genome-wide enhancer maps link risk variants to disease genes. Nature Nasser, J., Bergman, D. T., Fulco, C. P., Guckelberger, P., Doughty, B. R., Patwardhan, T. A., Jones, T. R., Nguyen, T. H., Ulirsch, J. C., Lekschas, F., Mualim, K., Natri, H. M., Weeks, E. M., Munson, G., Kane, M., Kang, H. Y., Cui, A., Ray, J. P., Eisenhaure, T. M., Collins, R. L., Dey, K., Pfister, H., Price, A. L., Epstein, C. B., Kundaje, A., Xavier, R. J., Daly, M. J., Huang, H., Finucane, H. K., Hacohen, N., Lander, E. S., Engreitz, J. M. 2021


    Genome-wide association studies (GWAS) have identified thousands of noncoding loci that are associated with human diseases and complex traits, each of which could reveal insights into the mechanisms of disease [1]. Many of the underlying causal variants may affect enhancers [2,3], but we lack accurate maps of enhancers and their target genes to interpret such variants. We recently developed the activity-by-contact (ABC) model to predict which enhancers regulate which genes and validated the model using CRISPR perturbations in several cell types [4]. Here we apply this ABC model to create enhancer-gene maps in 131 human cell types and tissues, and use these maps to interpret the functions of GWAS variants. Across 72 diseases and complex traits, ABC links 5,036 GWAS signals to 2,249 unique genes, including a class of 577 genes that appear to influence multiple phenotypes through variants in enhancers that act in different cell types. In inflammatory bowel disease (IBD), causal variants are enriched in predicted enhancers by more than 20-fold in particular cell types such as dendritic cells, and ABC achieves higher precision than other regulatory methods at connecting noncoding variants to target genes. These variant-to-function maps reveal an enhancer that contains an IBD risk variant and that regulates the expression of PPIF to alter the membrane potential of mitochondria in macrophages. Our study reveals principles of genome regulation, identifies genes that affect IBD and provides a resource and generalizable strategy to connect risk variants of common diseases to their molecular and cellular functions.

    DOI: 10.1038/s41586-021-03446-x

    PubMed ID: 33828297

  • HyPR-seq: Single-cell quantification of chosen RNAs via hybridization and sequencing of DNA probes. Proceedings of the National Academy of Sciences of the United States of America Marshall, J. L., Doughty, B. R., Subramanian, V., Guckelberger, P., Wang, Q., Chen, L. M., Rodriques, S. G., Zhang, K., Fulco, C. P., Nasser, J., Grinkevich, E. J., Noel, T., Mangiameli, S., Bergman, D. T., Greka, A., Lander, E. S., Chen, F., Engreitz, J. M. 2020; 117 (52): 33404–13


    Single-cell quantification of RNAs is important for understanding cellular heterogeneity and gene regulation, yet current approaches suffer from low sensitivity for individual transcripts, limiting their utility for many applications. Here we present Hybridization of Probes to RNA for sequencing (HyPR-seq), a method to sensitively quantify the expression of hundreds of chosen genes in single cells. HyPR-seq involves hybridizing DNA probes to RNA, distributing cells into nanoliter droplets, amplifying the probes with PCR, and sequencing the amplicons to quantify the expression of chosen genes. HyPR-seq achieves high sensitivity for individual transcripts, detects nonpolyadenylated and low-abundance transcripts, and can profile more than 100,000 single cells. We demonstrate how HyPR-seq can profile the effects of CRISPR perturbations in pooled screens, detect time-resolved changes in gene expression via measurements of gene introns, and detect rare transcripts and quantify cell-type frequencies in tissue using low-abundance marker genes. By directing sequencing power to genes of interest and sensitively quantifying individual transcripts, HyPR-seq reduces costs by up to 100-fold compared to whole-transcriptome single-cell RNA-sequencing, making HyPR-seq a powerful method for targeted RNA profiling in single cells.

    DOI: 10.1073/pnas.2010738117

    PubMed ID: 33376219

  • Multicenter integrated analysis of noncoding CRISPRi screens. Nature Methods Yao, D., Tycko, J., Oh, J. W., Bounds, L. R., Gosai, S. J., Lataniotis, L., Mackay-Smith, A., Doughty, B. R., Gabdank, I., Schmidt, H., Guerrero-Altamirano, T., Siklenka, K., Guo, K., White, A. D., Youngworth, I., Andreeva, K., Ren, X., Barrera, A., Luo, Y., Yardımcı, G. G., Tewhey, R., Kundaje, A., Greenleaf, W. J., Sabeti, P. C., Leslie, C., Pritykin, Y., Moore, J. E., Beer, M. A., Gersbach, C. A., Reddy, T. E., Shen, Y., Engreitz, J. M., Bassik, M. C., Reilly, S. K. 2024


    The ENCODE Consortium's efforts to annotate noncoding cis-regulatory elements (CREs) have advanced our understanding of gene regulatory landscapes. Pooled, noncoding CRISPR screens offer a systematic approach to investigate cis-regulatory mechanisms. The ENCODE4 Functional Characterization Centers conducted 108 screens in human cell lines, comprising >540,000 perturbations across 24.85 megabases of the genome. Using 332 functionally confirmed CRE-gene links in K562 cells, we established guidelines for screening endogenous noncoding elements with CRISPR interference (CRISPRi), including accurate detection of CREs that exhibit variable, often low, transcriptional effects. Benchmarking five screen analysis tools, we find that CASA produces the most conservative CRE calls and is robust to artifacts of low-specificity single guide RNAs. We uncover a subtle DNA strand bias for CRISPRi in transcribed regions with implications for screen design and analysis. Together, we provide an accessible data resource, predesigned single guide RNAs for targeting 3,275,697 ENCODE SCREEN candidate CREs with CRISPRi and screening guidelines to accelerate functional characterization of the noncoding genome.

    DOI: 10.1038/s41592-024-02216-7

    PubMed ID: 38504114

    PubMed Central ID: 3771521

  • Rewriting regulatory DNA to dissect and reprogram gene expression. bioRxiv : the preprint server for biology Martyn, G. E., Montgomery, M. T., Jones, H., Guo, K., Doughty, B. R., Linder, J., Chen, Z., Cochran, K., Lawrence, K. A., Munson, G., Pampari, A., Fulco, C. P., Kelley, D. R., Lander, E. S., Kundaje, A., Engreitz, J. M. 2023


    Regulatory DNA sequences within enhancers and promoters bind transcription factors to encode cell type-specific patterns of gene expression. However, the regulatory effects and programmability of such DNA sequences remain difficult to map or predict because we have lacked scalable methods to precisely edit regulatory DNA and quantify the effects in an endogenous genomic context. Here we present an approach to measure the quantitative effects of hundreds of designed DNA sequence variants on gene expression, by combining pooled CRISPR prime editing with RNA fluorescence in situ hybridization and cell sorting (Variant-FlowFISH). We apply this method to mutagenize and rewrite regulatory DNA sequences in an enhancer and the promoter of PPIF in two immune cell lines. Of 672 variant-cell type pairs, we identify 497 that affect PPIF expression. These variants appear to act through a variety of mechanisms including disruption or optimization of existing transcription factor binding sites, as well as creation of de novo sites. Disrupting a single endogenous transcription factor binding site often led to large changes in expression (up to -40% in the enhancer, and -50% in the promoter). The same variant often had different effects across cell types and states, demonstrating a highly tunable regulatory landscape. We use these data to benchmark performance of sequence-based predictive models of gene regulation, and find that certain types of variants are not accurately predicted by existing models. Finally, we computationally design 185 small sequence variants (≤10 bp) and optimize them for specific effects on expression in silico. 84% of these rationally designed edits showed the intended direction of effect, and some had dramatic effects on expression (-100% to +202%). Variant-FlowFISH thus provides a powerful tool to map the effects of variants and transcription factor binding sites on gene expression, test and improve computational models of gene regulation, and reprogram regulatory DNA.

    DOI: 10.1101/2023.12.20.572268

    PubMed ID: 38187584

    PubMed Central ID: PMC10769263

  • An encyclopedia of enhancer-gene regulatory interactions in the human genome. bioRxiv : the preprint server for biology Gschwind, A. R., Mualim, K. S., Karbalayghareh, A., Sheth, M. U., Dey, K. K., Jagoda, E., Nurtdinov, R. N., Xi, W., Tan, A. S., Jones, H., Ma, X. R., Yao, D., Nasser, J., Avsec, Ž., James, B. T., Shamim, M. S., Durand, N. C., Rao, S. S., Mahajan, R., Doughty, B. R., Andreeva, K., Ulirsch, J. C., Fan, K., Perez, E. M., Nguyen, T. C., Kelley, D. R., Finucane, H. K., Moore, J. E., Weng, Z., Kellis, M., Bassik, M. C., Price, A. L., Beer, M. A., Guigó, R., Stamatoyannopoulos, J. A., Lieberman Aiden, E., Greenleaf, W. J., Leslie, C. S., Steinmetz, L. M., Kundaje, A., Engreitz, J. M. 2023


    Identifying transcriptional enhancers and their target genes is essential for understanding gene regulation and the impact of human genetic variation on disease [1-6]. Here we create and evaluate a resource of >13 million enhancer-gene regulatory interactions across 352 cell types and tissues, by integrating predictive models, measurements of chromatin state and 3D contacts, and large-scale genetic perturbations generated by the ENCODE Consortium [7]. We first create a systematic benchmarking pipeline to compare predictive models, assembling a dataset of 10,411 element-gene pairs measured in CRISPR perturbation experiments, >30,000 fine-mapped eQTLs, and 569 fine-mapped GWAS variants linked to a likely causal gene. Using this framework, we develop a new predictive model, ENCODE-rE2G, that achieves state-of-the-art performance across multiple prediction tasks, demonstrating a strategy involving iterative perturbations and supervised machine learning to build increasingly accurate predictive models of enhancer regulation. Using the ENCODE-rE2G model, we build an encyclopedia of enhancer-gene regulatory interactions in the human genome, which reveals global properties of enhancer networks, identifies differences in the functions of genes that have more or less complex regulatory landscapes, and improves analyses to link noncoding variants to target genes and cell types for common, complex diseases. By interpreting the model, we find evidence that, beyond enhancer activity and 3D enhancer-promoter contacts, additional features guide enhancer-promoter communication including promoter class and enhancer-enhancer synergy. Altogether, these genome-wide maps of enhancer-gene regulatory interactions, benchmarking software, predictive models, and insights about enhancer function provide a valuable resource for future studies of gene regulation and human genetics.

    DOI: 10.1101/2023.11.09.563812

    PubMed ID: 38014075

    PubMed Central ID: PMC10680627

  • The landscape of the histone-organized chromatin of Bdellovibrionota bacteria. bioRxiv : the preprint server for biology Marinov, G. K., Doughty, B., Kundaje, A., Greenleaf, W. J. 2023


    Histone proteins have traditionally been thought to be restricted to eukaryotes and most archaea, with eukaryotic nucleosomal histones deriving from their archaeal ancestors. In contrast, bacteria lack histones as a rule. However, histone proteins have recently been identified in a few bacterial clades, most notably the phylum Bdellovibrionota, and these histones have been proposed to exhibit a range of divergent features compared to histones in archaea and eukaryotes. However, no functional genomic studies of the properties of Bdellovibrionota chromatin have been carried out. In this work, we map the landscape of chromatin accessibility, active transcription and three-dimensional genome organization in a member of Bdellovibrionota (a Bacteriovorax strain). We find that, similar to what is observed in some archaea and in eukaryotes with compact genomes such as yeast, Bacteriovorax chromatin is characterized by preferential accessibility around promoter regions. Similar to eukaryotes, chromatin accessibility in Bacteriovorax positively correlates with gene expression. Mapping active transcription through single-strand DNA (ssDNA) profiling revealed that unlike in yeast, but similar to the state of mammalian and fly promoters, Bacteriovorax promoters exhibit very strong polymerase pausing. Finally, similar to that of other bacteria without histones, the Bacteriovorax genome exists in a three-dimensional (3D) configuration organized by the parABS system along the axis defined by replication origin and termination regions. These results provide a foundation for understanding the chromatin biology of the unique Bdellovibrionota bacteria and the functional diversity in chromatin organization across the tree of life.

    DOI: 10.1101/2023.10.30.564843

    PubMed ID: 37961278

    PubMed Central ID: PMC10634947

  • Single-cell chromatin state transitions during epigenetic memory formation. bioRxiv : the preprint server for biology Fujimori, T., Rios-Martinez, C., Thurm, A. R., Hinks, M. M., Doughty, B. R., Sinha, J., Le, D., Hafner, A., Greenleaf, W. J., Boettiger, A. N., Bintu, L. 2023


    Repressive chromatin modifications are thought to compact chromatin to silence transcription. However, it is unclear how chromatin structure changes during silencing and epigenetic memory formation. We measured gene expression and chromatin structure in single cells after recruitment and release of repressors at a reporter gene. Chromatin structure is heterogeneous, with open and compact conformations present in both active and silent states. Recruitment of repressors associated with epigenetic memory produces chromatin compaction across 10-20 kilobases, while reversible silencing does not cause compaction at this scale. Chromatin compaction is inherited, but changes molecularly over time from histone methylation (H3K9me3) to DNA methylation. The level of compaction at the end of silencing quantitatively predicts epigenetic memory weeks later. Similarly, chromatin compaction at the Nanog locus predicts the degree of stem-cell fate commitment. These findings suggest that the chromatin state across tens of kilobases, beyond the gene itself, is important for epigenetic memory formation.

    DOI: 10.1101/2023.10.03.560616

    PubMed ID: 37873344

    PubMed Central ID: PMC10592931

  • Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nature Genetics Fulco, C. P., Nasser, J., Jones, T. R., Munson, G., Bergman, D. T., Subramanian, V., Grossman, S. R., Anyoha, R., Doughty, B. R., Patwardhan, T. A., Nguyen, T. H., Kane, M., Perez, E. M., Durand, N. C., Lareau, C. A., Stamenova, E. K., Aiden, E., Lander, E. S., Engreitz, J. M. 2019; 51 (12): 1664-+


    Enhancer elements in the human genome control how genes are expressed in specific cell types and harbor thousands of genetic variants that influence risk for common diseases [1-4]. Yet, we still do not know how enhancers regulate specific genes, and we lack general rules to predict enhancer-gene connections across cell types [5,6]. We developed an experimental approach, CRISPRi-FlowFISH, to perturb enhancers in the genome, and we applied it to test >3,500 potential enhancer-gene connections for 30 genes. We found that a simple activity-by-contact model substantially outperformed previous methods at predicting the complex connections in our CRISPR dataset. This activity-by-contact model allows us to construct genome-wide maps of enhancer-gene connections in a given cell type, on the basis of chromatin state measurements. Together, CRISPRi-FlowFISH and the activity-by-contact model provide a systematic approach to map and predict which enhancers regulate which genes, and will help to interpret the functions of the thousands of disease risk variants in the noncoding genome.

    DOI: 10.1038/s41588-019-0538-0

    Web of Science ID: 000499696700003

    PubMed ID: 31784727

    PubMed Central ID: PMC6886585
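
    The activity-by-contact (ABC) model in the last entry scores a candidate enhancer E for a gene G as the enhancer's activity multiplied by its 3D contact with G, divided by the sum of that same product over all candidate elements near G, so each score is the fraction of the gene's regulatory input attributed to that element. The Python sketch below illustrates only that normalization step, under stated assumptions, and is not the authors' implementation: activity is taken as the geometric mean of DNase-seq and H3K27ac ChIP-seq signal, contact values are assumed to be precomputed normalized Hi-C frequencies, and the Element type, its field names, and the toy numbers are hypothetical. Details of the published pipeline such as the genomic window, contact normalization, and score threshold are omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Element:
    # Hypothetical container for one candidate element near a single gene.
    name: str
    dnase: float    # accessibility signal (assumed already quantified)
    h3k27ac: float  # H3K27ac ChIP-seq signal (assumed already quantified)
    contact: float  # assumed normalized Hi-C contact with the gene's promoter


def activity(e: Element) -> float:
    # Activity as the geometric mean of accessibility and H3K27ac signal.
    return math.sqrt(e.dnase * e.h3k27ac)


def abc_scores(elements: list[Element]) -> dict[str, float]:
    """Return each element's activity*contact as a fraction of the total."""
    products = {e.name: activity(e) * e.contact for e in elements}
    total = sum(products.values())
    if total == 0:
        # No activity or contact anywhere: attribute nothing to any element.
        return {name: 0.0 for name in products}
    return {name: p / total for name, p in products.items()}


if __name__ == "__main__":
    toy = [Element("enh1", 4.0, 9.0, 0.5), Element("enh2", 1.0, 1.0, 1.0)]
    print(abc_scores(toy))
```

    With these toy inputs, enh1 has activity 6 and contact 0.5 (product 3) while enh2 has product 1, so the two elements receive scores of 0.75 and 0.25. Because the scores are normalized per gene, a weak but close, highly contacted element can outrank a strong but distal one.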