All Publications

  • Identification and quantification of small exon-containing isoforms in long-read RNA sequencing data. Nucleic acids research Liu, Z., Zhu, C., Steinmetz, L. M., Wei, W. 2023


    Small exons are pervasive in transcriptomes across organisms, and their quantification in RNA isoforms is crucial for understanding gene functions. Although long-read RNA-seq based on Oxford Nanopore Technologies (ONT) offers the advantage of covering transcripts in full length, its lower base accuracy poses challenges for identifying individual exons, particularly microexons (≤ 30 nucleotides). Here, we systematically assess small-exon quantification in synthetic and human ONT RNA-seq datasets. We demonstrate that reads containing small exons are often not properly aligned, affecting the quantification of relevant transcripts. Thus, we develop a local-realignment method for misaligned exons (MisER), which remaps reads with misaligned exons to the transcript references. Using synthetic and simulated datasets, we demonstrate the high sensitivity and specificity of MisER for the quantification of transcripts containing small exons. Moreover, MisER enabled us to identify small exons with a higher percent spliced-in index (PSI) in neural tissues, particularly neural-regulated microexons, when comparing 14 neural to 16 non-neural tissues in humans. Our work introduces an improved quantification method for long-read RNA-seq and especially facilitates studies using ONT long-reads to elucidate the regulation of genes involving small exons.

    View details for DOI 10.1093/nar/gkad810

    View details for PubMedID 37843096

  • Integrative omic profiling and analyses in two pig heart to human xenotransplants. Keating, B., Schmauch, E., Piening, B., Xia, B., Zhu, C., Chang, B., Khalil, K., Kim, J., Weldon, E., Pass, H., Ayares, D., Griesemer, A., Mangiola, M., Stern, J., Snyder, M. P., Boeke, J., Montgomery, R. A. Lippincott Williams & Wilkins. 2023: 137
  • Advances and prospects for the Human BioMolecular Atlas Program (HuBMAP). Nature cell biology Jain, S., Pei, L., Spraggins, J. M., Angelo, M., Carson, J. P., Gehlenborg, N., Ginty, F., Gonçalves, J. P., Hagood, J. S., Hickey, J. W., Kelleher, N. L., Laurent, L. C., Lin, S., Lin, Y., Liu, H., Naba, A., Nakayasu, E. S., Qian, W. J., Radtke, A., Robson, P., Stockwell, B. R., Van de Plas, R., Vlachos, I. S., Zhou, M., Börner, K., Snyder, M. P. 2023


    The Human BioMolecular Atlas Program (HuBMAP) aims to create a multi-scale spatial atlas of the healthy human body at single-cell resolution by applying advanced technologies and disseminating resources to the community. As the HuBMAP moves past its first phase, creating ontologies, protocols and pipelines, this Perspective introduces the production phase: the generation of reference spatial maps of functional tissue units across many organs from diverse populations and the creation of mapping tools and infrastructure to advance biomedical research.

    View details for DOI 10.1038/s41556-023-01194-w

    View details for PubMedID 37468756

    View details for PubMedCentralID 8238499

  • Organization of the human intestine at single-cell resolution. Nature Hickey, J. W., Becker, W. R., Nevins, S. A., Horning, A., Perez, A. E., Zhu, C., Zhu, B., Wei, B., Chiu, R., Chen, D. C., Cotter, D. L., Esplin, E. D., Weimer, A. K., Caraccio, C., Venkataraaman, V., Schürch, C. M., Black, S., Brbić, M., Cao, K., Chen, S., Zhang, W., Monte, E., Zhang, N. R., Ma, Z., Leskovec, J., Zhang, Z., Lin, S., Longacre, T., Plevritis, S. K., Lin, Y., Nolan, G. P., Greenleaf, W. J., Snyder, M. 2023; 619 (7970): 572-584


    The intestine is a complex organ that promotes digestion, extracts nutrients, participates in immune surveillance, maintains critical symbiotic relationships with microbiota and affects overall health. The intestine has a length of over nine metres, along which there are differences in structure and function. The localization of individual cell types, cell type development trajectories and detailed cell transcriptional programs probably drive these differences in function. Here, to better understand these differences, we evaluated the organization of single cells using multiplexed imaging and single-nucleus RNA and open chromatin assays across eight different intestinal sites from nine donors. Through systematic analyses, we find cell compositions that differ substantially across regions of the intestine and demonstrate the complexity of epithelial subtypes, and find that the same cell types are organized into distinct neighbourhoods and communities, highlighting distinct immunological niches that are present in the intestine. We also map gene regulatory differences in these cells that are suggestive of a regulatory differentiation cascade, and associate intestinal disease heritability with specific cell types. These results describe the complexity of the cell composition, regulation and organization for this organ, and serve as an important reference map for understanding human biology and disease.

    View details for DOI 10.1038/s41586-023-05915-x

    View details for PubMedID 37468586

    View details for PubMedCentralID PMC10356619

  • Genotype Complements the Phenotype: Identification of the Pathogenicity of an LMNA Splice Variant by Nanopore Long-Read Sequencing in a Large DCM Family. International journal of molecular sciences Sedaghat-Hamedani, F., Rebs, S., Kayvanpour, E., Zhu, C., Amr, A., Müller, M., Haas, J., Wu, J., Steinmetz, L. M., Ehlermann, P., Streckfuss-Bömeke, K., Frey, N., Meder, B. 2022; 23 (20)


    Dilated cardiomyopathy (DCM) is a common cause of heart failure (HF) and is of familial origin in 20-40% of cases. Genetic testing by next-generation sequencing (NGS) has yielded a definite diagnosis in many cases; however, some remain elusive. In this study, we used a combination of NGS, human-induced pluripotent-stem-cell-derived cardiomyocytes (iPSC-CMs) and nanopore long-read sequencing to identify the causal variant in a multi-generational pedigree of DCM. A four-generation family with familial DCM was investigated. Next-generation sequencing (NGS) was performed on 22 family members. Skin biopsies from two affected family members were used to generate iPSCs, which were then differentiated into iPSC-CMs. Short-read RNA sequencing was used for the evaluation of the target gene expression, and long-read RNA nanopore sequencing was used to evaluate the relevance of the splice variants. The pedigree suggested a highly penetrant, autosomal dominant mode of inheritance. The phenotype of the family was suggestive of laminopathy, but previous genetic testing using both Sanger and panel sequencing only yielded conflicting evidence for LMNA p.R644C (rs142000963), which was not fully segregated. By re-sequencing four additional affected family members, further non-coding LMNA variants could be detected: rs149339264, rs199686967, rs201379016, and rs794728589. To explore the roles of these variants, iPSC-CMs were generated. RNA sequencing showed the LMNA expression levels to be significantly lower in the iPSC-CMs of the LMNA variant carriers. We demonstrated a dysregulated sarcomeric structure and altered calcium homeostasis in the iPSC-CMs of the LMNA variant carriers. Using targeted nanopore long-read sequencing, we revealed the biological significance of the variant c.356+1G>A, which generates a novel 5' splice site in exon 1 of the cardiac isomer of LMNA, causing a nonsense mRNA product with almost complete RNA decay and haploinsufficiency. 
    Using novel molecular analysis and nanopore technology, we demonstrated the pathogenesis of the rs794728589 (c.356+1G>A) splice variant in LMNA. This study highlights the importance of precise diagnostics in the clinical management and workup of cardiomyopathies.

    View details for DOI 10.3390/ijms232012230

    View details for PubMedID 36293084

  • Transcription Factor GATA4 Regulates Cell Type-Specific Splicing Through Direct Interaction With RNA in Human Induced Pluripotent Stem Cell-Derived Cardiac Progenitors. Circulation Zhu, L., Choudhary, K., Gonzalez-Teran, B., Ang, Y., Thomas, R., Stone, N. R., Liu, L., Zhou, P., Zhu, C., Ruan, H., Huang, Y., Jin, S., Pelonero, A., Koback, F., Padmanabhan, A., Sadagopan, N., Hsu, A., Costa, M. W., Gifford, C. A., van Bemmel, J., Huttenhain, R., Vedantham, V., Conklin, B. R., Black, B. L., Bruneau, B. G., Steinmetz, L., Krogan, N. J., Pollard, K. S., Srivastava, D. 2022: CIRCULATIONAHA121057620


    BACKGROUND: GATA4 (GATA-binding protein 4), a zinc finger-containing, DNA-binding transcription factor, is essential for normal cardiac development and homeostasis in mice and humans, and mutations in this gene have been reported in human heart defects. Defects in alternative splicing are associated with many heart diseases, yet relatively little is known about how cell type- or cell state-specific alternative splicing is achieved in the heart. Here, we show that GATA4 regulates cell type-specific splicing through direct interaction with RNA and the spliceosome in human induced pluripotent stem cell-derived cardiac progenitors. METHODS: We leveraged a combination of unbiased approaches including affinity purification of GATA4 and mass spectrometry, enhanced cross-linking with immunoprecipitation, electrophoretic mobility shift assays, in vitro splicing assays, and unbiased transcriptomic analysis to uncover GATA4's novel function as a splicing regulator in human induced pluripotent stem cell-derived cardiac progenitors. RESULTS: We found that GATA4 interacts with many members of the spliceosome complex in human induced pluripotent stem cell-derived cardiac progenitors. Enhanced cross-linking with immunoprecipitation demonstrated that GATA4 also directly binds to a large number of mRNAs through defined RNA motifs in a sequence-specific manner. In vitro splicing assays indicated that GATA4 regulates alternative splicing through direct RNA binding, resulting in functionally distinct protein products. Correspondingly, knockdown of GATA4 in human induced pluripotent stem cell-derived cardiac progenitors resulted in differential alternative splicing of genes involved in cytoskeleton organization and calcium ion import, with functional consequences associated with the protein isoforms. CONCLUSIONS: This study shows that in addition to its well described transcriptional function, GATA4 interacts with members of the spliceosome complex and regulates cell type-specific alternative splicing via sequence-specific interactions with RNA. Several genes that have splicing regulated by GATA4 have functional consequences and many are associated with dilated cardiomyopathy, suggesting a novel role for GATA4 in achieving the necessary cardiac proteome in normal and stress-responsive conditions.

    View details for DOI 10.1161/CIRCULATIONAHA.121.057620

    View details for PubMedID 35938400

  • Single-molecule, full-length transcript isoform sequencing reveals disease-associated RNA isoforms in cardiomyocytes. Nature communications Zhu, C., Wu, J., Sun, H., Briganti, F., Meder, B., Wei, W., Steinmetz, L. M. 2021; 12 (1): 4203


    Alternative splicing generates differing RNA isoforms that govern phenotypic complexity of eukaryotes. Its malfunction underlies many diseases, including cancer and cardiovascular diseases. Comparative analysis of RNA isoforms at the genome-wide scale has been difficult. Here, we establish an experimental and computational pipeline that performs de novo transcript annotation and accurately quantifies transcript isoforms from cDNA sequences with a full-length isoform detection accuracy of 97.6%. We generate a searchable, quantitative human transcriptome annotation with 31,025 known and 5,740 novel transcript isoforms ( ). By analyzing the isoforms in the presence of RNA Binding Motif Protein 20 (RBM20) mutations associated with aggressive dilated cardiomyopathy (DCM), we identify 121 differentially expressed transcript isoforms in 107 cardiac genes. Our approach enables quantitative dissection of complex transcript architecture instead of mere identification of inclusion or exclusion of individual exons, as exemplified by the discovery of IMMT isoforms mis-spliced by RBM20 mutations. Thereby we achieve a path to direct differential expression testing independent of an existing annotation of transcript isoforms, providing more immediate biological interpretation and higher resolution transcriptome comparisons.

    View details for DOI 10.1038/s41467-021-24484-z

    View details for PubMedID 34244519

  • iPSC Modeling of RBM20-Deficient DCM Identifies Upregulation of RBM20 as a Therapeutic Strategy. Cell reports Briganti, F., Sun, H., Wei, W., Wu, J., Zhu, C., Liss, M., Karakikes, I., Rego, S., Cipriano, A., Snyder, M., Meder, B., Xu, Z., Millat, G., Gotthardt, M., Mercola, M., Steinmetz, L. M. 2020; 32 (10): 108117


    Recent advances in induced pluripotent stem cell (iPSC) technology and directed differentiation of iPSCs into cardiomyocytes (iPSC-CMs) make it possible to model genetic heart disease in vitro. We apply CRISPR/Cas9 genome editing technology to introduce three RBM20 mutations in iPSCs and differentiate them into iPSC-CMs to establish an in vitro model of RBM20 mutant dilated cardiomyopathy (DCM). In iPSC-CMs harboring a known causal RBM20 variant, the splicing of RBM20 target genes, calcium handling, and contractility are impaired consistent with the disease manifestation in patients. A variant (Pro633Leu) identified by exome sequencing of patient genomes displays the same disease phenotypes, thus establishing this variant as disease causing. We find that all-trans retinoic acid upregulates RBM20 expression and reverts the splicing, calcium handling, and contractility defects in iPSC-CMs with different causal RBM20 mutations. These results suggest that pharmacological upregulation of RBM20 expression is a promising therapeutic strategy for DCM patients with a heterozygous mutation in RBM20.

    View details for DOI 10.1016/j.celrep.2020.108117

    View details for PubMedID 32905764

  • NAD(P)HX repair deficiency causes central metabolic perturbations in yeast and human cells. FEBS journal Becker-Kettern, J., Paczia, N., Conrotte, J., Zhu, C., Fiehn, O., Jung, P. P., Steinmetz, L. M., Linster, C. L. 2018; 285 (18): 3376-3401


    NADHX and NADPHX are hydrated and redox inactive forms of the NADH and NADPH cofactors, known to inhibit several dehydrogenases in vitro. A metabolite repair system that is conserved in all domains of life and that comprises the two enzymes NAD(P)HX dehydratase and NAD(P)HX epimerase, allows reconversion of both the S- and R-epimers of NADHX and NADPHX to the normal cofactors. An inherited deficiency in this system has recently been shown to cause severe neurometabolic disease in children. Although evidence for the presence of NAD(P)HX has been obtained in plant and human cells, little is known about the mechanism of formation of these derivatives in vivo and their potential effects on cell metabolism. Here, we show that NAD(P)HX dehydratase deficiency in yeast leads to an important, temperature-dependent NADHX accumulation in quiescent cells with a concomitant depletion of intracellular NAD+ and serine pools. We demonstrate that NADHX potently inhibits the first step of the serine synthesis pathway in yeast. Human cells deficient in the NAD(P)HX dehydratase also accumulated NADHX and showed decreased viability. In addition, those cells consumed more glucose and produced more lactate, potentially indicating impaired mitochondrial function. Our results provide first insights into how NADHX accumulation affects cellular functions and pave the way for a better understanding of the mechanism(s) underlying the rapid and severe neurodegeneration leading to early death in NADHX repair-deficient children.

    View details for PubMedID 30098110

  • Modulation of mRNA and lncRNA expression dynamics by the Set2-Rpd3S pathway. Nature communications Kim, J. H., Lee, B. B., Oh, Y. M., Zhu, C., Steinmetz, L. M., Lee, Y., Kim, W. K., Lee, S. B., Buratowski, S., Kim, T. 2016; 7


    H3K36 methylation by Set2 targets Rpd3S histone deacetylase to transcribed regions of mRNA genes, repressing internal cryptic promoters and slowing elongation. Here we explore the function of this pathway by analysing transcription in yeast undergoing a series of carbon source shifts. Approximately 80 mRNA genes show increased induction upon SET2 deletion. A majority of these promoters have overlapping lncRNA transcription that targets H3K36me3 and deacetylation by Rpd3S to the mRNA promoter. We previously reported a similar mechanism for H3K4me2-mediated repression via recruitment of the Set3C histone deacetylase. Here we show that the distance between an mRNA and overlapping lncRNA promoter determines whether Set2-Rpd3S or Set3C represses. This analysis also reveals many previously unreported cryptic ncRNAs induced by specific carbon sources, showing that cryptic promoters can be environmentally regulated. Therefore, in addition to repression of cryptic transcription and modulation of elongation, H3K36 methylation maintains optimal expression dynamics of many mRNAs and ncRNAs.

    View details for DOI 10.1038/ncomms13534

    View details for PubMedID 27892458

  • Chromatin Dynamics and the RNA Exosome Function in Concert to Regulate Transcriptional Homeostasis. Cell reports Rege, M., Subramanian, V., Zhu, C., Hsieh, T. S., Weiner, A., Friedman, N., Clauder-Muenster, S., Steinmetz, L. M., Rando, O. J., Boyer, L. A., Peterson, C. L. 2015; 13 (8): 1610-1622


    The histone variant H2A.Z is a hallmark of nucleosomes flanking promoters of protein-coding genes and is often found in nucleosomes that carry lysine 56-acetylated histone H3 (H3-K56Ac), a mark that promotes replication-independent nucleosome turnover. Here, we find that H3-K56Ac promotes RNA polymerase II occupancy at many protein-coding and noncoding loci, yet neither H3-K56Ac nor H2A.Z has a significant impact on steady-state mRNA levels in yeast. Instead, broad effects of H3-K56Ac or H2A.Z on RNA levels are revealed only in the absence of the nuclear RNA exosome. H2A.Z is also necessary for the expression of divergent, promoter-proximal noncoding RNAs (ncRNAs) in mouse embryonic stem cells. Finally, we show that H2A.Z functions with H3-K56Ac to facilitate formation of chromosome interaction domains (CIDs). Our study suggests that H2A.Z and H3-K56Ac work in concert with the RNA exosome to control mRNA and ncRNA expression, perhaps in part by regulating higher-order chromatin structures.

    View details for DOI 10.1016/j.celrep.2015.10.030

    View details for Web of Science ID 000365404900011

    View details for PubMedID 26586442

    View details for PubMedCentralID PMC4662874

  • Roadblock Termination by Reb1p Restricts Cryptic and Readthrough Transcription. Molecular cell Colin, J., Candelli, T., Porrua, O., Boulay, J., Zhu, C., Lacroute, F., Steinmetz, L. M., Libri, D. 2014; 56 (5): 667-680


    Widely transcribed compact genomes must cope with the major challenge of frequent overlapping or concurrent transcription events. Efficient and timely transcription termination is crucial to control pervasive transcription and prevent transcriptional interference. In yeast, transcription termination of RNA polymerase II (RNAPII) occurs via two possible pathways that both require recognition of termination signals on nascent RNA by specific factors. We describe here an additional mechanism of transcription termination for RNAPII and demonstrate its biological significance. We show that the transcriptional activator Reb1p bound to DNA is a roadblock for RNAPII, which pauses and is ubiquitinated, thus triggering termination. Reb1p-dependent termination generates a class of cryptic transcripts that are degraded in the nucleus by the exosome. We also observed transcriptional interference between neighboring genes in the absence of Reb1p. This work demonstrates the importance of roadblock termination for controlling pervasive transcription and preventing transcription through gene regulatory regions.

    View details for DOI 10.1016/j.molcel.2014.10.026

    View details for Web of Science ID 000346653300008

    View details for PubMedID 25479637

  • Yeast Growth Plasticity Is Regulated by Environment-Specific Multi-QTL Interactions. G3: genes, genomes, genetics Bhatia, A., Yadav, A., Zhu, C., Gagneur, J., Radhakrishnan, A., Steinmetz, L. M., Bhanot, G., Sinha, H. 2014; 4 (5): 769-777


    For a unicellular, nonmotile organism like Saccharomyces cerevisiae, carbon sources act as nutrients and as signaling molecules; consequently, these sources affect various fitness parameters, including growth. It is therefore advantageous for yeast strains to adapt their growth to carbon source variation. The ability of a given genotype to manifest different phenotypes in varying environments is known as phenotypic plasticity. To identify quantitative trait loci (QTL) that drive plasticity in growth, two growth parameters (growth rate and biomass) were measured for a set of meiotic recombinants of two genetically divergent yeast strains grown in different carbon sources. To identify QTL contributing to plasticity across pairs of environments, gene-environment interaction mapping was performed, which identified several QTL that have a differential effect across environments, some of which act antagonistically across pairs of environments. Multi-QTL analysis identified loci interacting with previously known growth affecting QTL as well as novel two-QTL interactions that affect growth. A QTL that had no significant independent effect was found to alter growth rate and biomass for several carbon sources through two-QTL interactions. Our study demonstrates that environment-specific epistatic interactions contribute to the growth plasticity in yeast. We propose that a targeted scan for epistatic interactions, such as the one described here, can help unravel mechanisms regulating phenotypic plasticity.

    View details for DOI 10.1534/g3.113.009142

    View details for Web of Science ID 000336483900001

    View details for PubMedID 24474169

    View details for PubMedCentralID PMC4025475

  • Genotype-environment interactions reveal causal pathways that mediate genetic effects on phenotype. PLoS genetics Gagneur, J., Stegle, O., Zhu, C., Jakob, P., Tekkedil, M. M., Aiyar, R. S., Schuon, A., Pe'er, D., Steinmetz, L. M. 2013; 9 (9)


    Unraveling the molecular processes that lead from genotype to phenotype is crucial for the understanding and effective treatment of genetic diseases. Knowledge of the causative genetic defect most often does not enable treatment; therefore, causal intermediates between genotype and phenotype constitute valuable candidates for molecular intervention points that can be therapeutically targeted. Mapping genetic determinants of gene expression levels (also known as expression quantitative trait loci or eQTL studies) is frequently used for this purpose, yet distinguishing causation from correlation remains a significant challenge. Here, we address this challenge using extensive, multi-environment gene expression and fitness profiling of hundreds of genetically diverse yeast strains, in order to identify truly causal intermediate genes that condition fitness in a given environment. Using functional genomics assays, we show that the predictive power of eQTL studies for inferring causal intermediate genes is poor unless performed across multiple environments. Surprisingly, although the effects of genotype on fitness depended strongly on environment, causal intermediates could be most reliably predicted from genetic effects on expression present in all environments. Our results indicate a mechanism explaining this apparent paradox, whereby immediate molecular consequences of genetic variation are shared across environments, and environment-dependent phenotypic effects result from downstream integration of environmental signals. We developed a statistical model to predict causal intermediates that leverages this insight, yielding over 400 transcripts, for the majority of which we experimentally validated their role in conditioning fitness. 
    Our findings have implications for the design and analysis of clinical omics studies aimed at discovering personalized targets for molecular intervention, suggesting that inferring causation in a single cellular context can benefit from molecular profiling in multiple contexts.

    View details for DOI 10.1371/journal.pgen.1003803

    View details for PubMedID 24068968

    View details for PubMedCentralID PMC3778020