Postdoctoral research fellow interested in computational biology, single-cell genomics, immunology, and machine learning.
Honors & Awards
Stanford Science Fellow, Stanford University (2020)
NIH Ruth L. Kirschstein National Research Service Award (F31), National Cancer Institute (2018)
NSF GRFP, National Science Foundation (2015)
DAAD Rise Fellow, Deutscher Akademischer Austauschdienst (2013, 2014)
Barry M. Goldwater Scholar, Goldwater Foundation (2013)
Doctor of Philosophy, Harvard University (2020)
Master of Arts, Harvard University (2017)
Bachelor of Science, University of Tulsa (2015)
Ansuman Satpathy, Postdoctoral Faculty Sponsor
The SARS-CoV-2 RNA-protein interactome in infected human cells.
Characterizing the interactions that SARS-CoV-2 viral RNAs make with host cell proteins during infection can improve our understanding of viral RNA functions and the host innate immune response. Using RNA antisense purification and mass spectrometry, we identified up to 104 human proteins that directly and specifically bind to SARS-CoV-2 RNAs in infected human cells. We integrated the SARS-CoV-2 RNA interactome with changes in proteome abundance induced by viral infection and linked interactome proteins to cellular pathways relevant to SARS-CoV-2 infections. We demonstrated by genetic perturbation that cellular nucleic acid-binding protein (CNBP) and La-related protein 1 (LARP1), two of the most strongly enriched viral RNA binders, restrict SARS-CoV-2 replication in infected cells and provide a global map of their direct RNA contact sites. Pharmacological inhibition of three other RNA interactome members, PPIA, ATP1A1, and the ARP2/3 complex, reduced viral replication in two human cell lines. The identification of host dependency factors and defence strategies as presented in this work will improve the design of targeted therapeutics against SARS-CoV-2.
View details for DOI 10.1038/s41564-020-00846-z
View details for PubMedID 33349665
Chromatin Potential Identified by Shared Single-Cell Profiling of RNA and Chromatin.
Cell differentiation and function are regulated across multiple layers of gene regulation, including modulation of gene expression by changes in chromatin accessibility. However, differentiation is an asynchronous process precluding a temporal understanding of regulatory events leading to cell fate commitment. Here we developed simultaneous high-throughput ATAC and RNA expression with sequencing (SHARE-seq), a highly scalable approach for measurement of chromatin accessibility and gene expression in the same single cell, applicable to different tissues. Using 34,774 joint profiles from mouse skin, we develop a computational strategy to identify cis-regulatory interactions and define domains of regulatory chromatin (DORCs) that significantly overlap with super-enhancers. During lineage commitment, chromatin accessibility at DORCs precedes gene expression, suggesting that changes in chromatin accessibility may prime cells for lineage commitment. We computationally infer chromatin potential as a quantitative measure of chromatin lineage-priming and use it to predict cell fate outcomes. SHARE-seq is an extensible platform to study regulatory circuitry across diverse cells in tissues.
View details for DOI 10.1016/j.cell.2020.09.056
View details for PubMedID 33098772
Inherited myeloproliferative neoplasm risk affects haematopoietic stem cells.
Myeloproliferative neoplasms (MPNs) are blood cancers that are characterized by the excessive production of mature myeloid cells and arise from the acquisition of somatic driver mutations in haematopoietic stem cells (HSCs). Epidemiological studies indicate a substantial heritable component of MPNs that is among the highest known for cancers. However, only a limited number of genetic risk loci have been identified, and the underlying biological mechanisms that lead to the acquisition of MPNs remain unclear. Here, by conducting a large-scale genome-wide association study (3,797 cases and 1,152,977 controls), we identify 17 MPN risk loci (P < 5.0 × 10⁻⁸), 7 of which have not been previously reported. We find that there is a shared genetic architecture between MPN risk and several haematopoietic traits from distinct lineages; that there is an enrichment for MPN risk variants within accessible chromatin of HSCs; and that increased MPN risk is associated with longer telomere length in leukocytes and other clonal haematopoietic states, collectively suggesting that MPN risk is associated with the function and self-renewal of HSCs. We use gene mapping to identify modulators of HSC biology linked to MPN risk, and show through targeted variant-to-function assays that CHEK2 and GFI1B have roles in altering the function of HSCs to confer disease risk. Overall, our results reveal a previously unappreciated mechanism for inherited MPN risk through the modulation of HSC function.
View details for DOI 10.1038/s41586-020-2786-7
View details for PubMedID 33057200
The Polygenic and Monogenic Basis of Blood Traits and Diseases.
2020; 182 (5): 1214
Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant global health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including data for 563,085 European ancestry participants, and discover 5,106 new genetic variants independently associated with 29 blood cell phenotypes covering a range of variation impacting hematopoiesis. We holistically characterize the genetic architecture of hematopoiesis, assess the relevance of the omnigenic model to blood cell phenotypes, delineate relevant hematopoietic cell states influenced by regulatory genetic variants and gene networks, identify novel splice-altering variants mediating the associations, and assess the polygenic prediction potential for blood traits and clinical disorders at the interface of complex and Mendelian genetics. These results show the power of large-scale blood cell trait GWAS to interrogate clinically meaningful variants across a wide allelic spectrum of human variation.
View details for DOI 10.1016/j.cell.2020.08.008
View details for PubMedID 32888494
Trans-ethnic and Ancestry-Specific Blood-Cell Genetics in 746,667 Individuals from 5 Global Populations.
2020; 182 (5): 1198
Most loci identified by GWASs have been found in populations of European ancestry (EUR). In trans-ethnic meta-analyses for 15 hematological traits in 746,667 participants, including 184,535 non-EUR individuals, we identified 5,552 trait-variant associations at p < 5 × 10⁻⁹, including 71 novel associations not found in EUR populations. We also identified 28 additional novel variants in ancestry-specific, non-EUR meta-analyses, including an IL7 missense variant in South Asians associated with lymphocyte count in vivo and IL-7 secretion levels in vitro. Fine-mapping prioritized variants annotated as functional and generated 95% credible sets that were 30% smaller when using the trans-ethnic as opposed to the EUR-only results. We explored the clinical significance and predictive value of trans-ethnic variants in multiple populations and compared genetic architecture and the effect of natural selection on these blood phenotypes between populations. Altogether, our results for hematological traits highlight the value of a more global representation of populations in genetic studies.
View details for DOI 10.1016/j.cell.2020.06.045
View details for PubMedID 32888493
Epigenomic State Transitions Characterize Tumor Progression in Mouse Lung Adenocarcinoma.
2020; 38 (2): 212
Regulatory networks that maintain functional, differentiated cell states are often dysregulated in tumor development. Here, we use single-cell epigenomics to profile chromatin state transitions in a mouse model of lung adenocarcinoma (LUAD). We identify an epigenomic continuum representing loss of cellular identity and progression toward a metastatic state. We define co-accessible regulatory programs and infer key activating and repressive chromatin regulators of these cell states. Among these co-accessibility programs, we identify a pre-metastatic transition, characterized by activation of RUNX transcription factors, which mediates extracellular matrix remodeling to promote metastasis and is predictive of survival across human LUAD patients. Together, these results demonstrate the power of single-cell epigenomics to identify regulatory programs to uncover mechanisms and key biomarkers of tumor progression.
View details for DOI 10.1016/j.ccell.2020.06.006
View details for PubMedID 32707078
A dual-deaminase CRISPR base editor enables concurrent adenine and cytosine editing
2020; 38 (7): 861–U27
Existing adenine and cytosine base editors induce only a single type of modification, limiting the range of DNA alterations that can be created. Here we describe a CRISPR-Cas9-based synchronous programmable adenine and cytosine editor (SPACE) that can concurrently introduce A-to-G and C-to-T substitutions with minimal RNA off-target edits. SPACE expands the range of possible DNA sequence alterations, broadening the research applications of CRISPR base editors.
View details for DOI 10.1038/s41587-020-0535-y
View details for Web of Science ID 000537041400001
View details for PubMedID 32483364
Prioritizing disease and trait causal variants at the TNFAIP3 locus using functional and genomic features
2020; 11 (1): 1237
Genome-wide association studies have associated thousands of genetic variants with complex traits and diseases, but pinpointing the causal variant(s) among those in tight linkage disequilibrium with each associated variant remains a major challenge. Here, we use seven experimental assays to characterize all common variants at the multiple disease-associated TNFAIP3 locus in five disease-relevant immune cell lines, based on a set of features related to regulatory potential. Trait/disease-associated variants are enriched among SNPs prioritized based on either: (1) residing within CRISPRi-sensitive regulatory regions, or (2) localizing in a chromatin accessible region while displaying allele-specific reporter activity. Of the 15 trait/disease-associated haplotypes at TNFAIP3, 9 have at least one variant meeting one or both of these criteria, 5 of which are further supported by genetic fine-mapping. Our work provides a comprehensive strategy to characterize genetic variation at important disease-associated loci, and aids in the effort to identify trait causal genetic variants.
View details for DOI 10.1038/s41467-020-15022-4
View details for Web of Science ID 000549162600014
View details for PubMedID 32144282
View details for PubMedCentralID PMC7060350
Inference and effects of barcode multiplets in droplet-based single-cell assays
2020; 11 (1): 866
A widespread assumption for single-cell analyses specifies that one cell's nucleic acids are predominantly captured by one oligonucleotide barcode. Here, we show that ~13-21% of cell barcodes from the 10x Chromium scATAC-seq assay may have been derived from a droplet with more than one oligonucleotide sequence, which we call "barcode multiplets". We demonstrate that barcode multiplets can be derived from at least two different sources. First, we confirm that approximately 4% of droplets from the 10x platform may contain multiple beads. Additionally, we find that approximately 5% of beads may contain detectable levels of multiple oligonucleotide barcodes. We show that this artifact can confound single-cell analyses, including the interpretation of clonal diversity and proliferation of intra-tumor lymphocytes. Overall, our work provides a conceptual and computational framework to identify and assess the impacts of barcode multiplets in single-cell data.
View details for DOI 10.1038/s41467-020-14667-5
View details for Web of Science ID 000514928000007
View details for PubMedID 32054859
View details for PubMedCentralID PMC7018801
Control of human hemoglobin switching by LIN28B-mediated regulation of BCL11A translation
2020; 52 (2): 138-+
Increased production of fetal hemoglobin (HbF) can ameliorate the severity of sickle cell disease and β-thalassemia. BCL11A represses the genes encoding HbF and regulates human hemoglobin switching through variation in its expression during development. However, the mechanisms underlying the developmental expression of BCL11A remain mysterious. Here we show that BCL11A is regulated at the level of messenger RNA (mRNA) translation during human hematopoietic development. Despite decreased BCL11A protein synthesis earlier in development, BCL11A mRNA continues to be associated with ribosomes. Through unbiased genomic and proteomic analyses, we demonstrate that the RNA-binding protein LIN28B, which is developmentally expressed in a pattern reciprocal to that of BCL11A, directly interacts with ribosomes and BCL11A mRNA. Furthermore, we show that BCL11A mRNA translation is suppressed by LIN28B through direct interactions, independently of its role in regulating let-7 microRNAs, and that BCL11A is the major target of LIN28B-mediated HbF induction. Our results reveal a previously unappreciated mechanism underlying human hemoglobin switching that illuminates new therapeutic opportunities.
View details for DOI 10.1038/s41588-019-0568-7
View details for Web of Science ID 000508324400002
View details for PubMedID 31959994
View details for PubMedCentralID PMC7031047
- An old BATF's new T-ricks. Nature immunology 2020
Purifying Selection against Pathogenic Mitochondrial DNA in Human T Cells.
The New England journal of medicine
Many mitochondrial diseases are caused by mutations in mitochondrial DNA (mtDNA). Patients' cells contain a mixture of mutant and nonmutant mtDNA (a phenomenon called heteroplasmy). The proportion of mutant mtDNA varies across patients and among tissues within a patient. We simultaneously assayed single-cell heteroplasmy and cell state in thousands of blood cells obtained from three unrelated patients who had A3243G-associated mitochondrial encephalomyopathy, lactic acidosis, and strokelike episodes. We observed a broad range of heteroplasmy across all cell types but also found markedly reduced heteroplasmy in T cells, a finding consistent with purifying selection within this lineage. We observed this pattern in six additional patients who had heteroplasmic A3243G without strokelike episodes. (Funded by the Marriott Foundation and others.).
View details for DOI 10.1056/NEJMoa2001265
View details for PubMedID 32786181
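As an illustration of the quantity discussed in the abstract above, heteroplasmy can be read as the fraction of mtDNA reads at a site that carry the mutant allele, computed per cell and then summarized per cell type. The sketch below is a minimal illustration under that assumption, not the study's pipeline; the function names, the tuple layout, and the read counts are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Optional


def heteroplasmy(mutant_reads: int, total_reads: int) -> Optional[float]:
    """Fraction of reads carrying the mutant allele; None if the site has no coverage."""
    if total_reads <= 0:
        return None
    return mutant_reads / total_reads


def mean_heteroplasmy_by_cell_type(cells):
    """Average per-cell heteroplasmy within each cell type.

    `cells` is an iterable of (cell_type, mutant_reads, total_reads) tuples
    (a hypothetical layout chosen for this example). Cells without coverage
    are skipped rather than counted as zero.
    """
    by_type = defaultdict(list)
    for cell_type, mutant, total in cells:
        h = heteroplasmy(mutant, total)
        if h is not None:
            by_type[cell_type].append(h)
    return {cell_type: mean(values) for cell_type, values in by_type.items()}


# Made-up read counts, for illustration only.
toy_cells = [
    ("monocyte", 30, 50),
    ("monocyte", 20, 50),
    ("T cell", 5, 50),
    ("T cell", 0, 40),
    ("B cell", 0, 0),  # no coverage at the site: excluded from the summary
]
summary = mean_heteroplasmy_by_cell_type(toy_cells)
```

Skipping uncovered cells, rather than scoring them as zero, avoids biasing a cell type's average downward simply because the site was not sequenced in some cells.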
Massively parallel single-cell mitochondrial DNA genotyping and chromatin profiling.
Natural mitochondrial DNA (mtDNA) mutations enable the inference of clonal relationships among cells. mtDNA can be profiled along with measures of cell state, but has not yet been combined with the massively parallel approaches needed to tackle the complexity of human tissue. Here, we introduce a high-throughput, droplet-based mitochondrial single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq), a method that combines high-confidence mtDNA mutation calling in thousands of single cells with their concomitant high-quality accessible chromatin profile. This enables the inference of mtDNA heteroplasmy, clonal relationships, cell state and accessible chromatin variation in individual cells. We reveal single-cell variation in heteroplasmy of a pathologic mtDNA variant, which we associate with intra-individual chromatin variability and clonal evolution. We clonally trace thousands of cells from cancers, linking epigenomic variability to subclonal evolution, and infer cellular dynamics of differentiating hematopoietic cells in vitro and in vivo. Taken together, our approach enables the study of cellular population dynamics and clonal properties in vivo.
View details for DOI 10.1038/s41587-020-0645-6
View details for PubMedID 32788668
Single Cell Transcriptomics Implicate Novel Monocyte and T Cell Immune Dysregulation in Sarcoidosis.
Frontiers in immunology
2020; 11: 567342
Sarcoidosis is a systemic inflammatory disease characterized by infiltration of immune cells into granulomas. Previous gene expression studies using heterogeneous cell mixtures lack insight into cell-type-specific immune dysregulation. We performed the first single-cell RNA-sequencing study of sarcoidosis in peripheral immune cells in 48 patients and controls. Following unbiased clustering, differentially expressed genes were identified for 18 cell types and bioinformatically assessed for function and pathway enrichment. Our results reveal persistent activation of circulating classical monocytes with subsequent upregulation of trafficking molecules. Specifically, classical monocytes upregulated distinct markers of activation including adhesion molecules, pattern recognition receptors, and chemokine receptors, as well as enrichment of immunoregulatory pathways HMGB1, mTOR, and ephrin receptor signaling. Predictive modeling implicated TGFbeta and mTOR signaling as drivers of persistent monocyte activation. Additionally, sarcoidosis T cell subsets displayed patterns of dysregulation. CD4 naive T cells were enriched for markers of apoptosis and Th17/Treg differentiation, while effector T cells showed enrichment of anergy-related pathways. Differentially expressed genes in regulatory T cells suggested dysfunctional p53, cell death, and TNFR2 signaling. Using more sensitive technology and more precise units of measure, we identify cell-type specific, novel inflammatory and regulatory pathways. Based on our findings, we suggest a novel model involving four convergent arms of dysregulation: persistent hyperactivation of innate and adaptive immunity via classical monocytes and CD4 naive T cells, regulatory T cell dysfunction, and effector T cell anergy. We further our understanding of the immunopathology of sarcoidosis and point to novel therapeutic targets.
View details for DOI 10.3389/fimmu.2020.567342
View details for PubMedID 33363531
Large-Scale Topological Changes Restrain Malignant Progression in Colorectal Cancer.
Widespread changes to DNA methylation and chromatin are well documented in cancer, but the fate of higher-order chromosomal structure remains obscure. Here we integrated topological maps for colon tumors and normal colons with epigenetic, transcriptional, and imaging data to characterize alterations to chromatin loops, topologically associated domains, and large-scale compartments. We found that spatial partitioning of the open and closed genome compartments is profoundly compromised in tumors. This reorganization is accompanied by compartment-specific hypomethylation and chromatin changes. Additionally, we identify a compartment at the interface between the canonical A and B compartments that is reorganized in tumors. Remarkably, similar shifts were evident in non-malignant cells that have accumulated excess divisions. Our analyses suggest that these topological changes repress stemness and invasion programs while inducing anti-tumor immunity genes and may therefore restrain malignant progression. Our findings call into question the conventional view that tumor-associated epigenomic alterations are primarily oncogenic.
View details for DOI 10.1016/j.cell.2020.07.030
View details for PubMedID 32841603
Longitudinal assessment of clonal mosaicism in human hematopoiesis via mitochondrial mutation tracking
2019; 3 (24): 4161–65
Our ability to track cellular dynamics in humans over time in vivo has been limited. Here, we demonstrate how somatic mutations in mitochondrial DNA (mtDNA) can be used to longitudinally track the dynamic output of hematopoietic stem and progenitor cells in humans. Over the course of 3 years of blood sampling in a single individual, our analyses reveal somatic mtDNA sequence variation and evolution reminiscent of models of hematopoiesis established by genetic labeling approaches. Furthermore, we observe fluctuations in mutation heteroplasmy, coinciding with specific clinical events, such as infections, and further identify lineage-specific somatic mtDNA mutations in longitudinally sampled circulating blood cell subsets in individuals with leukemia. Collectively, these observations indicate the significant potential of using tracking of somatic mtDNA sequence variation as a broadly applicable approach to systematically assess hematopoietic clonal dynamics in human health and disease.
View details for DOI 10.1182/bloodadvances.2019001196
View details for Web of Science ID 000504042200003
View details for PubMedID 31841597
View details for PubMedCentralID PMC6929387
Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations
2019; 51 (12): 1664-+
Enhancer elements in the human genome control how genes are expressed in specific cell types and harbor thousands of genetic variants that influence risk for common diseases. Yet, we still do not know how enhancers regulate specific genes, and we lack general rules to predict enhancer-gene connections across cell types. We developed an experimental approach, CRISPRi-FlowFISH, to perturb enhancers in the genome, and we applied it to test >3,500 potential enhancer-gene connections for 30 genes. We found that a simple activity-by-contact model substantially outperformed previous methods at predicting the complex connections in our CRISPR dataset. This activity-by-contact model allows us to construct genome-wide maps of enhancer-gene connections in a given cell type, on the basis of chromatin state measurements. Together, CRISPRi-FlowFISH and the activity-by-contact model provide a systematic approach to map and predict which enhancers regulate which genes, and will help to interpret the functions of the thousands of disease risk variants in the noncoding genome.
View details for DOI 10.1038/s41588-019-0538-0
View details for Web of Science ID 000499696700003
View details for PubMedID 31784727
View details for PubMedCentralID PMC6886585
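The abstract above does not spell out the activity-by-contact formula. One common way to write such a score, assumed here for illustration, is each candidate element's activity multiplied by its contact frequency with the gene's promoter, divided by the sum of that product over all candidate elements near the gene. The function name and the numbers below are hypothetical.

```python
def abc_scores(activity, contact):
    """Normalized activity x contact score for each candidate element of one gene.

    activity[i] is a measure of element i's enhancer activity and contact[i]
    is its contact frequency with the gene's promoter (both assumed inputs).
    Each score is that element's share of the summed activity x contact.
    """
    if len(activity) != len(contact):
        raise ValueError("activity and contact must have the same length")
    products = [a * c for a, c in zip(activity, contact)]
    total = sum(products)
    if total == 0:
        # No element has both activity and contact: every share is zero.
        return [0.0 for _ in products]
    return [p / total for p in products]


# Made-up values for three candidate elements near a single gene.
scores = abc_scores(activity=[10.0, 5.0, 1.0], contact=[0.5, 0.2, 1.0])
```

Because the scores are normalized per gene, they sum to one whenever any element has a nonzero product, so they can be read as each element's relative contribution under this model.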
Assessment of computational methods for the analysis of single-cell ATAC-seq data
2019; 20 (1): 241
Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, due to low copy numbers (diploid in humans), leads to inherent data sparsity (1-10% of peaks detected per cell) compared to transcriptomic (scRNA-seq) data (10-45% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level. We present a benchmarking framework that is applied to 10 computational methods for scATAC-seq on 13 synthetic and real datasets from different assays, profiling cell types from diverse tissues and organisms. Methods for processing and featurizing scATAC-seq data were compared by their ability to discriminate cell types when combined with common unsupervised clustering approaches. We rank evaluated methods and discuss computational challenges associated with scATAC-seq analysis including inherently sparse data, determination of features, peak calling, the effects of sequencing coverage and noise, and clustering performance. Running times and memory requirements are also discussed. This reference summary of scATAC-seq methods offers recommendations for best practices with consideration for both the non-expert user and the methods developer. Despite variation across methods and datasets, SnapATAC, Cusanovich2018, and cisTopic outperform other methods in separating cell populations of different coverages and noise levels in both synthetic and real datasets. Notably, SnapATAC is the only method able to analyze a large dataset (> 80,000 cells).
View details for DOI 10.1186/s13059-019-1854-5
View details for Web of Science ID 000501809500001
View details for PubMedID 31739806
View details for PubMedCentralID PMC6859644
CRISPR DNA base editors with reduced RNA off-target and self-editing activities
2019; 37 (9): 1041-+
Cytosine or adenine base editors (CBEs or ABEs) can introduce specific DNA C-to-T or A-to-G alterations. However, we recently demonstrated that they can also induce transcriptome-wide guide-RNA-independent editing of RNA bases, and created selective curbing of unwanted RNA editing (SECURE)-BE3 variants that have reduced unwanted RNA-editing activity. Here we describe structure-guided engineering of SECURE-ABE variants with reduced off-target RNA-editing activity and comparable on-target DNA-editing activity that are also among the smallest Streptococcus pyogenes Cas9 base editors described to date. We also tested CBEs with cytidine deaminases other than APOBEC1 and found that the human APOBEC3A-based CBE induces substantial editing of RNA bases, whereas an enhanced APOBEC3A-based CBE, human activation-induced cytidine deaminase-based CBE, and the Petromyzon marinus cytidine deaminase-based CBE Target-AID induce less editing of RNA. Finally, we found that CBEs and ABEs that exhibit RNA off-target editing activity can also self-edit their own transcripts, thereby leading to heterogeneity in base-editor coding sequences.
View details for DOI 10.1038/s41587-019-0236-6
View details for Web of Science ID 000488532200020
View details for PubMedID 31477922
View details for PubMedCentralID PMC6730565
Droplet-based combinatorial indexing for massive-scale single-cell chromatin accessibility
2019; 37 (8): 916-+
Recent technical advancements have facilitated the mapping of epigenomes at single-cell resolution; however, the throughput and quality of these methods have limited their widespread adoption. Here we describe a high-quality (10⁵ nuclear fragments per cell) droplet-microfluidics-based method for single-cell profiling of chromatin accessibility. We use this approach, named 'droplet single-cell assay for transposase-accessible chromatin using sequencing' (dscATAC-seq), to assay 46,653 cells for the unbiased discovery of cell types and regulatory elements in adult mouse brain. We further increase the throughput of this platform by combining it with combinatorial indexing (dsciATAC-seq), enabling single-cell studies at a massive scale. We demonstrate the utility of this approach by measuring chromatin accessibility across 136,463 resting and stimulated human bone marrow-derived cells to reveal changes in the cis- and trans-regulatory landscape across cell types and under stimulatory conditions at single-cell resolution. Altogether, we describe a total of 510,123 single-cell profiles, demonstrating the scalability and flexibility of this droplet-based platform.
View details for DOI 10.1038/s41587-019-0147-6
View details for Web of Science ID 000482876100023
View details for PubMedID 31235917
Transcriptional States and Chromatin Accessibility Underlying Human Erythropoiesis
2019; 27 (11): 3228-+
Human erythropoiesis serves as a paradigm of physiologic cellular differentiation. This process is also of considerable interest for better understanding anemias and identifying new therapies. Here, we apply deep transcriptomic and accessible chromatin profiling to characterize a faithful ex vivo human erythroid differentiation system from hematopoietic stem and progenitor cells. We reveal stage-specific transcriptional states and chromatin accessibility during various stages of erythropoiesis, including 14,260 differentially expressed genes and 63,659 variably accessible chromatin peaks. Our analysis suggests differentiation stage-predominant roles for specific master regulators, including GATA1 and KLF1. We integrate chromatin profiles with common and rare genetic variants associated with erythroid cell traits and diseases, finding that variants regulating different erythroid phenotypes likely act at variable points during differentiation. In addition, we identify a regulator of terminal erythropoiesis, TMCC2, more broadly illustrating the value of this comprehensive analysis to improve our understanding of erythropoiesis in health and disease.
View details for DOI 10.1016/j.celrep.2019.05.046
View details for Web of Science ID 000470993200011
View details for PubMedID 31189107
View details for PubMedCentralID PMC6579117
Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors
2019; 569 (7756): 433-+
CRISPR-Cas base-editor technology enables targeted nucleotide alterations, and is being increasingly used for research and potential therapeutic applications. The most widely used cytosine base editors (CBEs) induce deamination of DNA cytosines using the rat APOBEC1 enzyme, which is targeted by a linked Cas protein-guide RNA complex. Previous studies of the specificity of CBEs have identified off-target DNA edits in mammalian cells. Here we show that a CBE with rat APOBEC1 can cause extensive transcriptome-wide deamination of RNA cytosines in human cells, inducing tens of thousands of C-to-U edits with frequencies ranging from 0.07% to 100% in 38-58% of expressed genes. CBE-induced RNA edits occur in both protein-coding and non-protein-coding sequences and generate missense, nonsense, splice site, and 5' and 3' untranslated region mutations. We engineered two CBE variants bearing mutations in rat APOBEC1 that substantially decreased the number of RNA edits (by more than 390-fold and more than 3,800-fold) in human cells. These variants also showed more precise on-target DNA editing than the wild-type CBE and, for most guide RNAs tested, no substantial reduction in editing efficiency. Finally, we show that an adenine base editor can also induce transcriptome-wide RNA edits. These results have implications for the use of base editors in both research and clinical settings, illustrate the feasibility of engineering improved variants with reduced RNA editing activities, and suggest the need to more fully define and characterize the RNA off-target effects of deaminase enzymes in base editor platforms.
View details for DOI 10.1038/s41586-019-1161-z
View details for Web of Science ID 000468123700044
View details for PubMedID 30995674
View details for PubMedCentralID PMC6657343
Gene-centric functional dissection of human genetic variation uncovers regulators of hematopoiesis
Genome-wide association studies (GWAS) have identified thousands of variants associated with human diseases and traits. However, the majority of GWAS-implicated variants are in non-coding regions of the genome and require in depth follow-up to identify target genes and decipher biological mechanisms. Here, rather than focusing on causal variants, we have undertaken a pooled loss-of-function screen in primary hematopoietic cells to interrogate 389 candidate genes contained in 75 loci associated with red blood cell traits. Using this approach, we identify 77 genes at 38 GWAS loci, with most loci harboring 1-2 candidate genes. Importantly, the hit set was strongly enriched for genes validated through orthogonal genetic approaches. Genes identified by this approach are enriched in specific and relevant biological pathways, allowing regulators of human erythropoiesis and modifiers of blood diseases to be defined. More generally, this functional screen provides a paradigm for gene-centric follow up of GWAS for a variety of human diseases and traits.
View details for DOI 10.7554/eLife.44080
View details for Web of Science ID 000468967900001
View details for PubMedID 31070582
View details for PubMedCentralID PMC6534380
Impaired human hematopoiesis due to a cryptic intronic GATA1 splicing mutation
JOURNAL OF EXPERIMENTAL MEDICINE
2019; 216 (5): 1050–60
Studies of allelic variation underlying genetic blood disorders have provided important insights into human hematopoiesis. Most often, the identified pathogenic mutations result in loss-of-function or missense changes. However, assessing the pathogenicity of noncoding variants can be challenging. Here, we characterize two unrelated patients with a distinct presentation of dyserythropoietic anemia and other impairments in hematopoiesis associated with an intronic mutation in GATA1 that is 24 nucleotides upstream of the canonical splice acceptor site. Functional studies demonstrate that this single-nucleotide alteration leads to reduced canonical splicing and increased use of an alternative splice acceptor site that causes a partial intron retention event. The resultant altered GATA1 contains a five-amino acid insertion at the C-terminus of the C-terminal zinc finger and has no observable activity. Collectively, our results demonstrate how altered splicing of GATA1, which reduces levels of the normal form of this master transcription factor, can result in distinct changes in human hematopoiesis.
View details for DOI 10.1084/jem.20181625
View details for Web of Science ID 000466981400009
View details for PubMedID 30914438
View details for PubMedCentralID PMC6504223
Heritability of fetal hemoglobin, white cell count, and other clinical traits from a sickle cell disease family cohort
AMERICAN JOURNAL OF HEMATOLOGY
2019; 94 (5): 522–27
Sickle cell disease (SCD) is the most common monogenic disorder in the world. Notably, there is extensive clinical heterogeneity in SCD that cannot be fully accounted for by known factors, and in particular, the extent to which the phenotypic diversity of SCD can be explained by genetic variation has not been reliably quantified. Here, in a family-based cohort of 449 patients with SCD and 755 relatives, we first show that 5 known modifiers affect 11 adverse outcomes in SCD to varying degrees. We then utilize a restricted maximum likelihood procedure to estimate the heritability of 20 hematologic traits, including fetal hemoglobin (HbF) and white blood cell count (WBC), in the clinically relevant context of inheritance from healthy carriers to SCD patients. We report novel estimations of heritability for HbF at 31.6% (±5.4%) and WBC at 41.2% (±6.8%) in our cohort. Finally, we demonstrate shared genetic bases between HbF, WBC, and other hematologic traits, but surprisingly little overlap between HbF and WBC themselves. In total, our analyses show that HbF and WBC have significant heritable components among individuals with SCD and their relatives, demonstrating the value of using family-based studies to better understand modifiers of SCD.
View details for DOI 10.1002/ajh.25421
View details for Web of Science ID 000468303900002
View details for PubMedID 30680775
View details for PubMedCentralID PMC6449202
Single-cell trajectories reconstruction, exploration and mapping of omics data with STREAM
2019; 10: 1903
Single-cell transcriptomic assays have enabled the de novo reconstruction of lineage differentiation trajectories, along with the characterization of cellular heterogeneity and state transitions. Several methods have been developed for reconstructing developmental trajectories from single-cell transcriptomic data, but efforts on analyzing single-cell epigenomic data and on trajectory visualization remain limited. Here we present STREAM, an interactive pipeline capable of disentangling and visualizing complex branching trajectories from both single-cell transcriptomic and epigenomic data. We have tested STREAM on several synthetic and real datasets generated with different single-cell technologies. We further demonstrate its utility for understanding myoblast differentiation and disentangling known heterogeneity in hematopoiesis for different organisms. STREAM is an open-source software package.
View details for DOI 10.1038/s41467-019-09670-4
View details for Web of Science ID 000465202300008
View details for PubMedID 31015418
View details for PubMedCentralID PMC6478907
Novel CRISPR Cytosine Base Editors with Minimized Off-Target Effects and Improved Editing Properties
CELL PRESS. 2019: 295
View details for Web of Science ID 000464381003087
The ATPase module of mammalian SWI/SNF family complexes mediates subcomplex identity and catalytic activity-independent genomic targeting
2019; 51 (4): 618-+
Perturbations to mammalian switch/sucrose non-fermentable (mSWI/SNF) chromatin remodeling complexes have been widely implicated as driving events in cancer. One such perturbation is the dual loss of the SMARCA4 and SMARCA2 ATPase subunits in small cell carcinoma of the ovary, hypercalcemic type (SCCOHT), SMARCA4-deficient thoracic sarcomas and dedifferentiated endometrial carcinomas. However, the consequences of dual ATPase subunit loss on mSWI/SNF complex subunit composition, chromatin targeting, DNA accessibility and gene expression remain unknown. Here we identify an ATPase module of subunits that is required for functional specification of the Brahma-related gene-associated factor (BAF) and polybromo-associated BAF (PBAF) mSWI/SNF family subcomplexes. Using SMARCA4/2 ATPase mutant variants, we define the catalytic activity-dependent and catalytic activity-independent contributions of the ATPase module to the targeting of BAF and PBAF complexes on chromatin genome-wide. Finally, by linking distinct mSWI/SNF complex target sites to tumor-suppressive gene expression programs, we clarify the transcriptional consequences of SMARCA4/2 dual loss in SCCOHT.
View details for DOI 10.1038/s41588-019-0363-5
View details for Web of Science ID 000462767500009
View details for PubMedID 30858614
View details for PubMedCentralID PMC6755913
Interrogation of human hematopoiesis at single-cell and single-variant resolution
NATURE GENETICS
2019; 51 (4): 683-+
Lineage Tracing in Humans Enabled by Mitochondrial Mutations and Single-Cell Genomics
2019; 176 (6): 1325-+
Lineage tracing provides key insights into the fate of individual cells in complex organisms. Although effective genetic labeling approaches are available in model systems, in humans, most approaches require detection of nuclear somatic mutations, which have high error rates, limited scale, and do not capture cell state information. Here, we show that somatic mutations in mtDNA can be tracked by single-cell RNA or assay for transposase accessible chromatin (ATAC) sequencing. We leverage somatic mtDNA mutations as natural genetic barcodes and demonstrate their utility as highly accurate clonal markers to infer cellular relationships. We track native human cells both in vitro and in vivo and relate clonal dynamics to gene expression and chromatin accessibility. Our approach should allow clonal tracking at a 1,000-fold greater scale than with nuclear genome sequencing, with simultaneous information on cell state, opening the way to chart cellular dynamics in human health and disease.
View details for DOI 10.1016/j.cell.2019.01.022
View details for Web of Science ID 000460509600009
View details for PubMedID 30827679
View details for PubMedCentralID PMC6408267
The cis-Regulatory Atlas of the Mouse Immune System
2019; 176 (4): 897-+
A complete chart of cis-regulatory elements and their dynamic activity is necessary to understand the transcriptional basis of differentiation and function of an organ system. We generated matched epigenome and transcriptome measurements in 86 primary cell types that span the mouse immune system and its differentiation cascades. This breadth of data enables variance components analysis that suggests that genes fall into two distinct classes, controlled by either enhancer- or promoter-driven logic, and multiple regression that connects genes to the enhancers that regulate them. Relating transcription factor (TF) expression to the genome-wide accessibility of their binding motifs classifies them as predominantly openers or closers of local chromatin accessibility, pinpointing specific cis-regulatory elements where binding of given TFs is likely functionally relevant, validated by chromatin immunoprecipitation sequencing (ChIP-seq). Overall, this cis-regulatory atlas provides a trove of information on transcriptional regulation through immune differentiation and a foundational scaffold to define key regulatory events throughout the immunological genome.
View details for DOI 10.1016/j.cell.2018.12.036
View details for Web of Science ID 000457969200019
View details for PubMedID 30686579
View details for PubMedCentralID PMC6785993
Preprocessing and Computational Analysis of Single-Cell Epigenomic Datasets.
Methods in molecular biology (Clifton, N.J.)
2019; 1935: 187–202
Recent technological developments have enabled the characterization of the epigenetic landscape of single cells across a range of tissues in normal and diseased states and under various biological and chemical perturbations. While analysis of these profiles resembles methods from single-cell transcriptomic studies, unique challenges are associated with bioinformatics processing of single-cell epigenetic data, including a much larger (10-1,000×) feature set and significantly greater sparsity, requiring customized solutions. Here, we discuss the essentials of the computational methodology required for analyzing common single-cell epigenomic measurements for DNA methylation using bisulfite sequencing and open chromatin using ATAC-Seq.
View details for DOI 10.1007/978-1-4939-9057-3_13
View details for PubMedID 30758828
A non-canonical SWI/SNF complex is a synthetic lethal target in cancers driven by BAF complex perturbation
NATURE CELL BIOLOGY
2018; 20 (12): 1410-+
Mammalian SWI/SNF chromatin remodelling complexes exist in three distinct, final-form assemblies: canonical BAF (cBAF), PBAF and a newly characterized non-canonical complex (ncBAF). However, their complex-specific targeting on chromatin, functions and roles in disease remain largely undefined. Here, we comprehensively mapped complex assemblies on chromatin and found that ncBAF complexes uniquely localize to CTCF sites and promoters. We identified ncBAF subunits as synthetic lethal targets specific to synovial sarcoma and malignant rhabdoid tumours, which both exhibit cBAF complex (SMARCB1 subunit) perturbation. Chemical and biological depletion of the ncBAF subunit, BRD9, rapidly attenuates synovial sarcoma and malignant rhabdoid tumour cell proliferation. Importantly, in cBAF-perturbed cancers, ncBAF complexes maintain gene expression at retained CTCF-promoter sites and function in a manner distinct from fusion oncoprotein-bound complexes. Together, these findings unmask the unique targeting and functional roles of ncBAF complexes and present new cancer-specific therapeutic targets.
View details for DOI 10.1038/s41556-018-0221-1
View details for Web of Science ID 000451328500013
View details for PubMedID 30397315
View details for PubMedCentralID PMC6698386
Enhancer histone-QTLs are enriched on autoimmune risk haplotypes and influence gene expression within chromatin networks
2018; 9: 2905
Genetic variants can confer risk to complex genetic diseases by modulating gene expression through changes to the epigenome. To assess the degree to which genetic variants influence epigenome activity, we integrate epigenetic and genotypic data from lupus patient lymphoblastoid cell lines to identify variants that induce allelic imbalance in the magnitude of histone post-translational modifications, referred to herein as histone quantitative trait loci (hQTLs). We demonstrate that enhancer hQTLs are enriched on autoimmune disease risk haplotypes and disproportionately influence gene expression variability compared with non-hQTL variants in strong linkage disequilibrium. We show that the epigenome regulates HLA class II genes differently in individuals who carry HLA-DR3 or HLA-DR15 haplotypes, resulting in differential 3D chromatin conformation and gene expression. Finally, we identify significant expression QTL (eQTL) x hQTL interactions that reveal substructure within eQTL gene expression, suggesting potential implications for functional genomic studies that leverage eQTL data for subject selection and stratification.
View details for DOI 10.1038/s41467-018-05328-9
View details for Web of Science ID 000439687600003
View details for PubMedID 30046115
View details for PubMedCentralID PMC6060153
Integrated Single-Cell Analysis Maps the Continuous Regulatory Landscape of Human Hematopoietic Differentiation
2018; 173 (6): 1535-+
Human hematopoiesis involves cellular differentiation of multipotent cells into progressively more lineage-restricted states. While the chromatin accessibility landscape of this process has been explored in defined populations, single-cell regulatory variation has been hidden by ensemble averaging. We collected single-cell chromatin accessibility profiles across 10 populations of immunophenotypically defined human hematopoietic cell types and constructed a chromatin accessibility landscape of human hematopoiesis to characterize differentiation trajectories. We find variation consistent with lineage bias toward different developmental branches in multipotent cell types. We observe heterogeneity within common myeloid progenitors (CMPs) and granulocyte-macrophage progenitors (GMPs) and develop a strategy to partition GMPs along their differentiation trajectory. Furthermore, we integrated single-cell RNA sequencing (scRNA-seq) data to associate transcription factors to chromatin accessibility changes and regulatory elements to target genes through correlations of expression and regulatory element accessibility. Overall, this work provides a framework for integrative exploration of complex regulatory dynamics in a primary human tissue at single-cell resolution.
View details for PubMedID 29706549
Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types
2018; 50 (4): 621-+
We introduce an approach to identify disease-relevant tissues and cell types by analyzing gene expression data together with genome-wide association study (GWAS) summary statistics. Our approach uses stratified linkage disequilibrium (LD) score regression to test whether disease heritability is enriched in regions surrounding genes with the highest specific expression in a given tissue. We applied our approach to gene expression data from several sources together with GWAS summary statistics for 48 diseases and traits (average N = 169,331) and found significant tissue-specific enrichments (false discovery rate (FDR) < 5%) for 34 traits. In our analysis of multiple tissues, we detected a broad range of enrichments that recapitulated known biology. In our brain-specific analysis, significant enrichments included an enrichment of inhibitory over excitatory neurons for bipolar disorder, and excitatory over inhibitory neurons for schizophrenia and body mass index. Our results demonstrate that our polygenic approach is a powerful way to leverage gene expression data for interpreting GWAS signals.
View details for DOI 10.1038/s41588-018-0081-4
View details for Web of Science ID 000429529300022
View details for PubMedID 29632380
View details for PubMedCentralID PMC5896795
Response to "Unexpected mutations after CRISPR-Cas9 editing in vivo"
NATURE METHODS
2018; 15 (4): 238–39
hichipper: a preprocessing pipeline for calling DNA loops from HiChIP data
NATURE METHODS
2018; 15 (3): 155–56
diffloop: a computational framework for identifying and analyzing differential DNA loops from sequencing data
2018; 34 (4): 672–74
The 3D architecture of DNA within the nucleus is a key determinant of interactions between genes, regulatory elements, and transcriptional machinery. As a result, differences in DNA looping structure are associated with variation in gene expression and cell state. To systematically assess changes in DNA looping architecture between samples, we introduce diffloop, an R/Bioconductor package that provides a suite of functions for the quality control, statistical testing, annotation, and visualization of DNA loops. We demonstrate this functionality by detecting differences between ENCODE ChIA-PET samples and relate looping to variability in epigenetic state. Diffloop is implemented as an R/Bioconductor package available at https://firstname.lastname@example.org. Supplementary data are available at Bioinformatics online.
View details for DOI 10.1093/bioinformatics/btx623
View details for Web of Science ID 000424889300017
View details for PubMedID 29028898
View details for PubMedCentralID PMC5860605
Common genes associated with antidepressant response in mouse and man identify key role of glucocorticoid receptor sensitivity
2017; 15 (12): e2002690
Response to antidepressant treatment in major depressive disorder (MDD) cannot be predicted currently, leading to uncertainty in medication selection, increasing costs, and prolonged suffering for many patients. Despite tremendous efforts in identifying response-associated genes in large genome-wide association studies, the results have been fairly modest, underlining the need to establish conceptually novel strategies. For the identification of transcriptome signatures that can distinguish between treatment responders and nonresponders, we herein submit a novel animal experimental approach focusing on extreme phenotypes. We utilized the large variance in response to antidepressant treatment occurring in DBA/2J mice, enabling sample stratification into subpopulations of good and poor treatment responders to delineate response-associated signature transcript profiles in peripheral blood samples. As a proof of concept, we translated our murine data to the transcriptome data of a clinically relevant human cohort. A cluster of 259 differentially regulated genes was identified when peripheral transcriptome profiles of good and poor treatment responders were compared in the murine model. Differences in expression profiles from baseline to week 12 of the human orthologues selected on the basis of the murine transcript signature allowed prediction of response status with an accuracy of 76% in the patient population. Finally, we show that glucocorticoid receptor (GR)-regulated genes are significantly enriched in this cluster of antidepressant-response genes. Our findings point to the involvement of GR sensitivity as a potential key mechanism shaping response to antidepressant treatment and support the hypothesis that antidepressants could stimulate resilience-promoting molecular mechanisms. Our data highlight the suitability of an appropriate animal experimental approach for the discovery of treatment response-associated pathways across species.
View details for DOI 10.1371/journal.pbio.2002690
View details for Web of Science ID 000418943900003
View details for PubMedID 29283992
View details for PubMedCentralID PMC5746203
A B Cell Regulome Links Notch to Downstream Oncogenic Pathways in Small B Cell Lymphomas
2017; 21 (3): 784–97
Gain-of-function Notch mutations are recurrent in mature small B cell lymphomas such as mantle cell lymphoma (MCL) and chronic lymphocytic leukemia (CLL), but the Notch target genes that contribute to B cell oncogenesis are largely unknown. We performed integrative analysis of Notch-regulated transcripts, genomic binding of Notch transcription complexes, and genome conformation data to identify direct Notch target genes in MCL cell lines. This B cell Notch regulome is largely controlled through Notch-bound distal enhancers and includes genes involved in B cell receptor and cytokine signaling and the oncogene MYC, which sustains proliferation of Notch-dependent MCL cell lines via a Notch-regulated lineage-restricted enhancer complex. Expression of direct Notch target genes is associated with Notch activity in an MCL xenograft model and in CLL lymph node biopsies. Our findings provide key insights into the role of Notch in MCL and other B cell malignancies and have important implications for therapeutic targeting of Notch-dependent oncogenic pathways.
View details for DOI 10.1016/j.celrep.2017.09.066
View details for Web of Science ID 000413090600019
View details for PubMedID 29045844
View details for PubMedCentralID PMC5687286
An Epigenome-Guided Approach to Causal Variant Discovery in Autoimmune Disease
View details for Web of Science ID 000411824100184
Dissecting hematopoietic and renal cell heterogeneity in adult zebrafish at single-cell resolution using RNA sequencing
JOURNAL OF EXPERIMENTAL MEDICINE
2017; 214 (10): 2875–87
Recent advances in single-cell, transcriptomic profiling have provided unprecedented access to investigate cell heterogeneity during tissue and organ development. In this study, we used massively parallel, single-cell RNA sequencing to define cell heterogeneity within the zebrafish kidney marrow, constructing a comprehensive molecular atlas of definitive hematopoiesis and functionally distinct renal cells found in adult zebrafish. Because our method analyzed blood and kidney cells in an unbiased manner, our approach was useful in characterizing immune-cell deficiencies within DNA-protein kinase catalytic subunit (prkdc), interleukin-2 receptor γ a (il2rga), and double-homozygous-mutant fish, identifying blood cell losses in T, B, and natural killer cells within specific genetic mutants. Our analysis also uncovered novel cell types, including two classes of natural killer immune cells, classically defined and erythroid-primed hematopoietic stem and progenitor cells, mucin-secreting kidney cells, and kidney stem/progenitor cells. In total, our work provides the first, comprehensive, single-cell, transcriptomic analysis of kidney and marrow cells in the adult zebrafish.
View details for DOI 10.1084/jem.20170976
View details for Web of Science ID 000412015600006
View details for PubMedID 28878000
View details for PubMedCentralID PMC5626406
Confounding in ex vivo models of Diamond-Blackfan anemia
BLOOD
2017; 130 (9): 1165–68
Notch-Regulated Enhancers in B-Cell Lymphoma Activate MYC and Potentiate B-Cell Receptor Signaling
AMER SOC HEMATOLOGY. 2016
Computationally Efficient Solutions for Functionalizing Common Variants in Three-Dimensional Models
WILEY-BLACKWELL. 2015: 562
View details for Web of Science ID 000363340500092
Fine mapping of chromosome 15q25 implicates ZNF592 in neurosarcoidosis patients
ANNALS OF CLINICAL AND TRANSLATIONAL NEUROLOGY
2015; 2 (10): 972–77
Neurosarcoidosis is a clinical subtype of sarcoidosis characterized by the presence of granulomas in the nervous system. Here, we report a highly significant association with a variant (rs75652600, P = 3.12 × 10⁻⁸, odds ratio = 4.34) within a zinc finger gene, ZNF592, from an imputation-based fine-mapping study of the chromosomal region 15q25 in African-Americans with neurosarcoidosis. We validate the association with ZNF592, a gene previously shown to cause cerebellar ataxia, in a cohort of European-Americans with neurosarcoidosis by uncovering low-frequency variants with a similar risk effect size (chr15:85309284, P = 0.0021, odds ratio = 5.36).
View details for DOI 10.1002/acn3.229
View details for Web of Science ID 000367239000004
View details for PubMedID 26478897
View details for PubMedCentralID PMC4603380