Bio


Nicolas Altemose is an Assistant Professor of Genetics and a Chan Zuckerberg Biohub Investigator. The Altemose Lab develops new experimental and analytical tools to study how chromatin proteins organize and regulate complex regions of the human genome. For more information, see altemoselab.stanford.edu.

Academic Appointments


  • Assistant Professor, Genetics
  • Member, Bio-X

Honors & Awards


  • Pew Biomedical Scholar Award, Pew Trusts (2024-2028)
  • CZ Biohub Investigator Award, Chan Zuckerberg Biohub (2023-2028)
  • HHMI Hanna H. Gray Fellowship, Howard Hughes Medical Institute (2020-2027)
  • Siebel Scholarship, Siebel Scholars Foundation (2020)
  • HHMI Gilliam Fellowship, Howard Hughes Medical Institute (2013-2018)
  • Marshall Scholarship, UK Marshall Aid Commemoration Commission (2011-2013)
  • Angier B. Duke Scholarship, Duke University (2007-2011)

Professional Education


  • Postdoctoral Fellow, UC Berkeley, Molecular & Cell Biology
  • PhD, UC Berkeley and UCSF, Bioengineering (2021)
  • DPhil, University of Oxford, Statistics (2016)
  • BS, Duke University, Biology (2011)

Current Research and Scholarly Interests


The Altemose Lab develops new experimental and analytical tools to study how chromatin proteins organize and regulate complex regions of the human genome.

All Publications


  • Human Satellite 3 DNA encodes megabase-scale transcription factor binding platforms. bioRxiv. Franklin, J. M., Dubocanin, D., Chittenden, C., Barillas, A., Lee, R. J., Ghosh, R. P., Gerton, J. L., Guan, K. L., Altemose, N. 2025

    Abstract

    Eukaryotic genomes frequently contain large arrays of tandem repeats, called satellite DNA. While some satellite DNAs participate in centromere function, others do not. For example, Human Satellite 3 (HSat3) forms the largest satellite DNA arrays in the human genome, but these multi-megabase regions were almost fully excluded from genome assemblies until recently, and their potential functions remain understudied and largely unknown. To address this, we performed a systematic screen for HSat3 binding proteins. Our work revealed that HSat3 contains millions of copies of transcription factor (TF) motifs bound by over a dozen TFs from various signaling pathways, including the growth-regulating transcription effector family TEAD1-4 from the Hippo pathway. Imaging experiments show that TEAD recruits the co-activator YAP to HSat3 regions in a cell-state specific manner. Using synthetic reporter assays, targeted repression of HSat3, inducible degradation of YAP, and super-resolution microscopy, we show that HSat3 arrays can localize YAP/TEAD inside the nucleolus, enhancing RNA Polymerase I activity. Beyond discovering a direct relationship between the Hippo pathway and ribosomal DNA regulation, this work demonstrates that satellite DNA can encode multiple transcription factor binding motifs, defining an important functional role for these enormous genomic elements.

    DOI: 10.1101/2024.10.22.616524; PubMed ID: 39484556; PubMed Central ID: PMC11526998

  • Integrating Single-Molecule Sequencing and Deep Learning to Predict Haplotype-Specific 3D Chromatin Organization in a Mendelian Condition. bioRxiv. Dubocanin, D., Kalygina, A., Franklin, J. M., Chittenden, C., Vollger, M. R., Neph, S., Stergachis, A. B., Altemose, N. 2025

    Abstract

    The three-dimensional (3D) architecture of the genome plays a crucial role in gene regulation and various human diseases. Short-read sequencing methods for measuring 3D genome organization are powerful, but they lack the ability to resolve individual human haplotypes or structurally complex regions. To address this, we present FiberFold, a deep learning model that combines convolutional neural networks and transformer architectures to accurately predict cell-type-specific and haplotype-specific 3D genome organization using multi-omic data from a single, long-read sequencing assay, Fiber-seq. By applying FiberFold to a cell line with allelic X-inactivation, we show that Topologically Associated Domains (TADs) are attenuated on the inactive chrX. Furthermore, FiberFold predicts significant changes to TADs surrounding a 13;X balanced translocation in a patient with a rare Mendelian disease. FiberFold showcases the power of integrating long-read epigenomic sequencing with deep learning tools to investigate fundamental chromatin biology as well as the molecular basis of human disease.

    DOI: 10.1101/2025.02.26.640261; PubMed ID: 40166185; PubMed Central ID: PMC11957061

  • DiMeLo-cito: a one-tube protocol for mapping protein-DNA interactions reveals CTCF bookmarking in mitosis. bioRxiv. Gamarra, N., Chittenden, C., Sundararajan, K., Schwartz, J. P., Lundqvist, S., Robles, D., Dixon-Luinenburg, O., Marcus, J., Maslan, A., Franklin, J. M., Streets, A., Straight, A. F., Altemose, N. 2025

    Abstract

    Genome regulation relies on complex and dynamic interactions between DNA and proteins. Recently, powerful methods have emerged that leverage third-generation sequencing to map protein-DNA interactions genome-wide. For example, Directed Methylation with Long-read sequencing (DiMeLo-seq) enables mapping of protein-DNA interactions along long, single chromatin fibers, including in highly repetitive genomic regions. However, DiMeLo-seq involves lossy centrifugation-based wash steps that limit its applicability to many sample types. To address this, we developed DiMeLo-cito, a single-tube, wash-free protocol that maximizes the yield and quality of genomic DNA obtained for long-read sequencing. This protocol enables the interrogation of genome-wide protein binding with as few as 100,000 cells and without the requirement of a nuclear envelope, enabling confident measurement of protein-DNA interactions during mitosis. Using this protocol, we detected strong binding of CTCF to mitotic chromosomes in diploid human cells, in contrast with earlier studies in karyotypically unstable cancer cell lines, suggesting that CTCF "bookmarks" specific sites critical for maintaining genome architecture across cell divisions. By expanding the capabilities of DiMeLo-seq to a broader range of sample types, DiMeLo-cito can provide new insights into genome regulation and organization.

    DOI: 10.1101/2025.03.11.642717; PubMed ID: 40161611; PubMed Central ID: PMC11952428

  • A classical revival: Human satellite DNAs enter the genomics era. Seminars in Cell & Developmental Biology. Altemose, N. 2022; 128: 2-14

    Abstract

    The classical human satellite DNAs, also referred to as human satellites 1, 2 and 3 (HSat1, HSat2, HSat3, or collectively HSat1-3), occur on most human chromosomes as large, pericentromeric tandem repeat arrays, which together constitute roughly 3% of the human genome (100 megabases, on average). Even though HSat1-3 were among the first human DNA sequences to be isolated and characterized at the dawn of molecular biology, they have remained almost entirely missing from the human genome reference assembly for 20 years, hindering studies of their sequence, regulation, and potential structural roles in the nucleus. Recently, the Telomere-to-Telomere Consortium produced the first truly complete assembly of a human genome, paving the way for new studies of HSat1-3 with modern genomic tools. This review provides an account of the history and current understanding of HSat1-3, with a view towards future studies of their evolution and roles in health and disease.

    DOI: 10.1016/j.semcdb.2022.04.012; Web of Science ID: 000816909200002; PubMed ID: 35487859

  • DiMeLo-seq: a long-read, single-molecule method for mapping protein-DNA interactions genome wide. Nature Methods. Altemose, N., Maslan, A., Smith, O. K., Sundararajan, K., Brown, R. R., Mishra, R., Detweiler, A. M., Neff, N., Miga, K. H., Straight, A. F., Streets, A. 2022

    Abstract

    Studies of genome regulation routinely use high-throughput DNA sequencing approaches to determine where specific proteins interact with DNA, and they rely on DNA amplification and short-read sequencing, limiting their quantitative application in complex genomic regions. To address these limitations, we developed directed methylation with long-read sequencing (DiMeLo-seq), which uses antibody-tethered enzymes to methylate DNA near a target protein's binding sites in situ. These exogenous methylation marks are then detected simultaneously with endogenous CpG methylation on unamplified DNA using long-read, single-molecule sequencing technologies. We optimized and benchmarked DiMeLo-seq by mapping chromatin-binding proteins and histone modifications across the human genome. Furthermore, we identified where centromere protein A localizes within highly repetitive regions that were unmappable with short sequencing reads, and we estimated the density of centromere protein A molecules along single chromatin fibers. DiMeLo-seq is a versatile method that provides multimodal, genome-wide information for investigating protein-DNA interactions.

    DOI: 10.1038/s41592-022-01475-6; PubMed ID: 35396487

  • Complete genomic and epigenetic maps of human centromeres. Science. Altemose, N., Logsdon, G. A., Bzikadze, A. V., Sidhwani, P., Langley, S. A., Caldas, G. V., Hoyt, S. J., Uralsky, L., Ryabov, F. D., Shew, C. J., Sauria, M. E., Borchers, M., Gershman, A., Mikheenko, A., Shepelev, V. A., Dvorkina, T., Kunyavskaya, O., Vollger, M. R., Rhie, A., McCartney, A. M., Asri, M., Lorig-Roach, R., Shafin, K., Lucas, J. K., Aganezov, S., Olson, D., de Lima, L. G., Potapova, T., Hartley, G. A., Haukness, M., Kerpedjiev, P., Gusev, F., Tigyi, K., Brooks, S., Young, A., Nurk, S., Koren, S., Salama, S. R., Paten, B., Rogaev, E. I., Streets, A., Karpen, G. H., Dernburg, A. F., Sullivan, B. A., Straight, A. F., Wheeler, T. J., Gerton, J. L., Eichler, E. E., Phillippy, A. M., Timp, W., Dennis, M. Y., O'Neill, R. J., Zook, J. M., Schatz, M. C., Pevzner, P. A., Diekhans, M., Langley, C. H., Alexandrov, I. A., Miga, K. H. 2022; 376 (6588): eabl4178

    Abstract

    Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

    DOI: 10.1126/science.abl4178; PubMed ID: 35357911

  • μDamID: A Microfluidic Approach for Joint Imaging and Sequencing of Protein-DNA Interactions in Single Cells. Cell Systems. Altemose, N., Maslan, A., Rios-Martinez, C., Lai, A., White, J. A., Streets, A. 2020; 11 (4): 354

    Abstract

    DNA adenine methyltransferase identification (DamID) measures a protein's DNA-binding history by methylating adenine bases near each protein-DNA interaction site and then selectively amplifying and sequencing these methylated regions. Additionally, these interactions can be visualized using m6A-Tracer, a fluorescent protein that binds to methyladenines. Here, we combine these imaging and sequencing technologies in an integrated microfluidic platform (μDamID) that enables single-cell isolation, imaging, and sorting, followed by DamID. We use μDamID and an improved m6A-Tracer protein to generate paired imaging and sequencing data from individual human cells. We validate interactions between Lamin-B1 protein and lamina-associated domains (LADs), observe variable 3D chromatin organization and broad gene regulation patterns, and jointly measure single-cell heterogeneity in Dam expression and background methylation. μDamID provides the unique ability to compare paired imaging and sequencing data for each cell and between cells, enabling the joint analysis of the nuclear localization, sequence identity, and variability of protein-DNA interactions. A record of this paper's transparent peer review process is included in the Supplemental Information.

    DOI: 10.1016/j.cels.2020.08.015; Web of Science ID: 000582118000004; PubMed ID: 33099405; PubMed Central ID: PMC7588622

  • A high-resolution map of non-crossover events reveals impacts of genetic diversity on mammalian meiotic recombination. Nature Communications. Li, R., Bitoun, E., Altemose, N., Davies, R. W., Davies, B., Myers, S. R. 2019; 10: 3900

    Abstract

    During meiotic recombination, homologue-templated repair of programmed DNA double-strand breaks (DSBs) produces relatively few crossovers and many difficult-to-detect non-crossovers. By intercrossing two diverged mouse subspecies over five generations and deep-sequencing 119 offspring, we detect thousands of crossover and non-crossover events genome-wide with unprecedented power and spatial resolution. We find that both crossovers and non-crossovers are strongly depleted at DSB hotspots where the DSB-positioning protein PRDM9 fails to bind to the unbroken homologous chromosome, revealing that PRDM9 also functions to promote homologue-templated repair. Our results show that complex non-crossovers are much rarer in mice than humans, consistent with complex events arising from accumulated non-programmed DNA damage. Unexpectedly, we also find that GC-biased gene conversion is restricted to non-crossover tracts containing only one mismatch. These results demonstrate that local genetic diversity profoundly alters meiotic repair pathway decisions via at least two distinct mechanisms, impacting genome evolution and Prdm9-related hybrid infertility.

    DOI: 10.1038/s41467-019-11675-y; Web of Science ID: 000483017900010; PubMed ID: 31467277; PubMed Central ID: PMC6715734

  • A map of human PRDM9 binding provides evidence for novel behaviors of PRDM9 and other zinc-finger proteins in meiosis. eLife. Altemose, N., Noor, N., Bitoun, E., Tumian, A., Imbeault, M., Chapman, J., Aricescu, A., Myers, S. R. 2017; 6

    Abstract

    PRDM9 binding localizes almost all meiotic recombination sites in humans and mice. However, most PRDM9-bound loci do not become recombination hotspots. To explore factors that affect binding and subsequent recombination outcomes, we mapped human PRDM9 binding sites in a transfected human cell line and measured PRDM9-induced histone modifications. These data reveal varied DNA-binding modalities of PRDM9. We also find that human PRDM9 frequently binds promoters, despite their low recombination rates, and it can activate expression of a small number of genes including CTCFL and VCX. Furthermore, we identify specific sequence motifs that predict consistent, localized meiotic recombination suppression around a subset of PRDM9 binding sites. These motifs strongly associate with KRAB-ZNF protein binding, TRIM28 recruitment, and specific histone modifications. Finally, we demonstrate that, in addition to binding DNA, PRDM9's zinc fingers also mediate its multimerization, and we show that a pair of highly diverged alleles preferentially form homo-multimers.

    DOI: 10.7554/eLife.28383; Web of Science ID: 000416379900001; PubMed ID: 29072575; PubMed Central ID: PMC5705219

  • Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly. PLOS Computational Biology. Altemose, N., Miga, K. H., Maggioni, M., Willard, H. F. 2014; 10 (5): e1003628

    Abstract

    The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.

    DOI: 10.1371/journal.pcbi.1003628; Web of Science ID: 000337288000037; PubMed ID: 24831296; PubMed Central ID: PMC4022460

  • Enhancing transcription-replication conflict targets ecDNA-positive cancers. Nature. Tang, J., Weiser, N. E., Wang, G., Chowdhry, S., Curtis, E. J., Zhao, Y., Wong, I. T., Marinov, G. K., Li, R., Hanoian, P., Tse, E., Mojica, S. G., Hansen, R., Plum, J., Steffy, A., Milutinovic, S., Meyer, S. T., Luebeck, J., Wang, Y., Zhang, S., Altemose, N., Curtis, C., Greenleaf, W. J., Bafna, V., Benkovic, S. J., Pinkerton, A. B., Kasibhatla, S., Hassig, C. A., Mischel, P. S., Chang, H. Y. 2024; 635 (8037): 210-218

    Abstract

    Extrachromosomal DNA (ecDNA) presents a major challenge for cancer patients. ecDNA renders tumours treatment resistant by facilitating massive oncogene transcription and rapid genome evolution, contributing to poor patient survival [1-7]. At present, there are no ecDNA-specific treatments. Here we show that enhancing transcription-replication conflict enables targeted elimination of ecDNA-containing cancers. Stepwise analyses of ecDNA transcription reveal pervasive RNA transcription and associated single-stranded DNA, leading to excessive transcription-replication conflicts and replication stress compared with chromosomal loci. Nucleotide incorporation on ecDNA is markedly slower, and replication stress is significantly higher in ecDNA-containing tumours regardless of cancer type or oncogene cargo. pRPA2-S33, a mediator of DNA damage repair that binds single-stranded DNA, shows elevated localization on ecDNA in a transcription-dependent manner, along with increased DNA double strand breaks, and activation of the S-phase checkpoint kinase, CHK1. Genetic or pharmacological CHK1 inhibition causes extensive and preferential tumour cell death in ecDNA-containing tumours. We advance a highly selective, potent and bioavailable oral CHK1 inhibitor, BBI-2779, that preferentially kills ecDNA-containing tumour cells. In a gastric cancer model containing FGFR2 amplified on ecDNA, BBI-2779 suppresses tumour growth and prevents ecDNA-mediated acquired resistance to the pan-FGFR inhibitor infigratinib, resulting in potent and sustained tumour regression in mice. Transcription-replication conflict emerges as a target for ecDNA-directed therapy, exploiting a synthetic lethality of excess to treat cancer.

    DOI: 10.1038/s41586-024-07802-5; PubMed ID: 39506153

  • Mapping protein-DNA interactions with DiMeLo-seq. Nature Protocols. Maslan, A., Altemose, N., Marcus, J., Mishra, R., Brennan, L. D., Sundararajan, K., Karpen, G., Straight, A. F., Streets, A. 2024

    Abstract

    We recently developed directed methylation with long-read sequencing (DiMeLo-seq) to map protein-DNA interactions genome wide. DiMeLo-seq is capable of mapping multiple interaction sites on single DNA molecules, profiling protein binding in the context of endogenous DNA methylation, identifying haplotype-specific protein-DNA interactions and mapping protein-DNA interactions in repetitive regions of the genome that are difficult to study with short-read methods. With DiMeLo-seq, adenines in the vicinity of a protein of interest are methylated in situ by tethering the Hia5 methyltransferase to an antibody using protein A. Protein-DNA interactions are then detected by direct readout of adenine methylation with long-read, single-molecule DNA sequencing platforms such as Nanopore sequencing. Here we present a detailed protocol and practical guidance for performing DiMeLo-seq. This protocol can be run on nuclei from fresh, lightly fixed or frozen cells. The protocol requires 1-2 d for performing in situ targeted methylation, 1-5 d for library preparation depending on desired fragment length and 1-3 d for Nanopore sequencing depending on desired sequencing depth. The protocol requires basic molecular biology skills and equipment, as well as access to a Nanopore sequencer. We also provide a Python package, dimelo, for analysis of DiMeLo-seq data.

    DOI: 10.1038/s41596-024-01032-9; PubMed ID: 39237830; PubMed Central ID: 2921165

  • The complete sequence of a human Y chromosome. Nature. Rhie, A., Nurk, S., Cechova, M., Hoyt, S. J., Taylor, D. J., Altemose, N., Hook, P. W., Koren, S., Rautiainen, M., Alexandrov, I. A., Allen, J., Asri, M., Bzikadze, A. V., Chen, N., Chin, C., Diekhans, M., Flicek, P., Formenti, G., Fungtammasan, A., Garcia Giron, C., Garrison, E., Gershman, A., Gerton, J. L., Grady, P. S., Guarracino, A., Haggerty, L., Halabian, R., Hansen, N. F., Harris, R., Hartley, G. A., Harvey, W. T., Haukness, M., Heinz, J., Hourlier, T., Hubley, R. M., Hunt, S. E., Hwang, S., Jain, M., Kesharwani, R. K., Lewis, A. P., Li, H., Logsdon, G. A., Lucas, J. K., Makalowski, W., Markovic, C., Martin, F. J., McCartney, A. M., McCoy, R. C., McDaniel, J., McNulty, B. M., Medvedev, P., Mikheenko, A., Munson, K. M., Murphy, T. D., Olsen, H. E., Olson, N. D., Paulin, L. F., Porubsky, D., Potapova, T., Ryabov, F., Salzberg, S. L., Sauria, M. G., Sedlazeck, F. J., Shafin, K., Shepelev, V. A., Shumate, A., Storer, J. M., Surapaneni, L., Taravella Oill, A. M., Thibaud-Nissen, F., Timp, W., Tomaszkiewicz, M., Vollger, M. R., Walenz, B. P., Watwood, A. C., Weissensteiner, M. H., Wenger, A. M., Wilson, M. A., Zarate, S., Zhu, Y., Zook, J. M., Eichler, E. E., O'Neill, R. J., Schatz, M. C., Miga, K. H., Makova, K. D., Phillippy, A. M. 2023; 621 (7978): 344-354

    Abstract

    The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

    DOI: 10.1038/s41586-023-06457-y; Web of Science ID: 001082304000001; PubMed ID: 37612512; PubMed Central ID: 3975068

  • The complete sequence of a human genome. Science. Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A., Mikheenko, A., Vollger, M. R., Altemose, N., Uralsky, L., Gershman, A., Aganezov, S., Hoyt, S. J., Diekhans, M., Logsdon, G. A., Alonge, M., Antonarakis, S. E., Borchers, M., Bouffard, G. G., Brooks, S. Y., Caldas, G., Chen, N., Cheng, H., Chin, C., Chow, W., de Lima, L. G., Dishuck, P. C., Durbin, R., Dvorkina, T., Fiddes, I. T., Formenti, G., Fulton, R. S., Fungtammasan, A., Garrison, E., Grady, P. S., Graves-Lindsay, T. A., Hall, I. M., Hansen, N. F., Hartley, G. A., Haukness, M., Howe, K., Hunkapiller, M. W., Jain, C., Jain, M., Jarvis, E. D., Kerpedjiev, P., Kirsche, M., Kolmogorov, M., Korlach, J., Kremitzki, M., Li, H., Maduro, V. V., Marschall, T., McCartney, A. M., McDaniel, J., Miller, D. E., Mullikin, J. C., Myers, E. W., Olson, N. D., Paten, B., Peluso, P., Pevzner, P. A., Porubsky, D., Potapova, T., Rogaev, E., Rosenfeld, J. A., Salzberg, S. L., Schneider, V. A., Sedlazeck, F. J., Shafin, K., Shew, C. J., Shumate, A., Sims, Y., Smit, A. A., Soto, D. C., Sovic, I., Storer, J. M., Streets, A., Sullivan, B. A., Thibaud-Nissen, F., Torrance, J., Wagner, J., Walenz, B. P., Wenger, A., Wood, J. D., Xiao, C., Yan, S. M., Young, A. C., Zarate, S., Surti, U., McCoy, R. C., Dennis, M. Y., Alexandrov, I. A., Gerton, J. L., O'Neill, R. J., Timp, W., Zook, J. M., Schatz, M. C., Eichler, E. E., Miga, K. H., Phillippy, A. M. 2022; 376 (6588): 44

  • From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Science. Hoyt, S. J., Storer, J. M., Hartley, G. A., Grady, P. G., Gershman, A., de Lima, L. G., Limouse, C., Halabian, R., Wojenski, L., Rodriguez, M., Altemose, N., Rhie, A., Core, L. J., Gerton, J. L., Makalowski, W., Olson, D., Rosen, J., Smit, A. F., Straight, A. F., Vollger, M. R., Wheeler, T. J., Schatz, M. C., Eichler, E. E., Phillippy, A. M., Timp, W., Miga, K. H., O'Neill, R. J. 2022; 376 (6588): eabk3112

    Abstract

    Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.

    DOI: 10.1126/science.abk3112; PubMed ID: 35357925

  • Epigenetic patterns in a complete human genome. Science. Gershman, A., Sauria, M. G., Guitart, X., Vollger, M. R., Hook, P. W., Hoyt, S. J., Jain, M., Shumate, A., Razaghi, R., Koren, S., Altemose, N., Caldas, G., Logsdon, G. A., Rhie, A., Eichler, E. E., Schatz, M. C., O'Neill, R. J., Phillippy, A. M., Miga, K. H., Timp, W. 2022; 376 (6588): 58

    Abstract

    The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.

    DOI: 10.1126/science.abj5089; Web of Science ID: 000780195200026; PubMed ID: 35357915; PubMed Central ID: PMC9170183

  • Characterization of transcript enrichment and detection bias in single-nucleus RNA-seq for mapping of distinct human adipocyte lineages. Genome Research. Gupta, A., Shamsi, F., Altemose, N., Dorlhiac, G. F., Cypess, A. M., White, A. P., Yosef, N., Patti, M., Tseng, Y., Streets, A. 2022; 32 (2): 242-257

    Abstract

    Single-cell RNA sequencing (scRNA-seq) enables molecular characterization of complex biological tissues at high resolution. The requirement of single-cell extraction, however, makes it challenging for profiling tissues such as adipose tissue, for which collection of intact single adipocytes is complicated by their fragile nature. For such tissues, single-nucleus extraction is often much more efficient and therefore single-nucleus RNA sequencing (snRNA-seq) presents an alternative to scRNA-seq. However, nuclear transcripts represent only a fraction of the transcriptome in a single cell, with snRNA-seq marked with inherent transcript enrichment and detection biases. Therefore, snRNA-seq may be inadequate for mapping important transcriptional signatures in adipose tissue. In this study, we compare the transcriptomic landscape of single nuclei isolated from preadipocytes and mature adipocytes across human white and brown adipocyte lineages, with whole-cell transcriptome. We show that snRNA-seq is capable of identifying the broad cell types present in scRNA-seq at all states of adipogenesis. However, we also explore how and why the nuclear transcriptome is biased and limited, as well as how it can be advantageous. We robustly characterize the enrichment of nuclear-localized transcripts and adipogenic regulatory lncRNAs in snRNA-seq, while also providing a detailed understanding for the preferential detection of long genes upon using this technique. To remove such technical detection biases, we propose a normalization strategy for a more accurate comparison of nuclear and cellular data. Finally, we show successful integration of scRNA-seq and snRNA-seq data sets with existing bioinformatic tools. Overall, our results illustrate the applicability of snRNA-seq for the characterization of cellular diversity in the adipose tissue.

    View details for DOI 10.1101/gr.275509.121

    View details for Web of Science ID 000749564500004

    View details for PubMedID 35042723

    View details for PubMedCentralID PMC8805720

  • Two genetic variants explain the association of European ancestry with multiple sclerosis risk in African-Americans SCIENTIFIC REPORTS Nakatsuka, N., Patterson, N., Patsopoulos, N. A., Altemose, N., Tandon, A., Beecham, A. H., McCauley, J. L., Isobe, N., Hauser, S., De Jager, P. L., Hafler, D. A., Oksenberg, J. R., Reich, D. 2020; 10 (1): 16902

    Abstract

    Epidemiological studies have suggested differences in the rate of multiple sclerosis (MS) in individuals of European ancestry compared to African ancestry, motivating genetic scans to identify variants that could contribute to such patterns. In a whole-genome scan in 899 African-American cases and 1155 African-American controls, we confirm that African-Americans who inherit segments of the genome of European ancestry at a chromosome 1 locus are at increased risk for MS [logarithm of odds (LOD) = 9.8], although the signal weakens when adding an additional 406 cases, reflecting heterogeneity in the two sets of cases [LOD = 2.7]. The association in the 899 individuals can be fully explained by two variants previously associated with MS in European ancestry individuals. These variants tag an MS susceptibility haplotype associated with decreased CD58 gene expression (odds ratio of 1.37; frequency of 84% in Europeans and 22% in West Africans for the tagging variant) as well as another haplotype near the FCRL3 gene (odds ratio of 1.07; frequency of 49% in Europeans and 8% in West Africans). Controlling for all other genetic and environmental factors, the two variants predict a 1.44-fold higher rate of MS in European-Americans compared to African-Americans.

    View details for DOI 10.1038/s41598-020-74035-7

    View details for Web of Science ID 000615373100014

    View details for PubMedID 33037294

    View details for PubMedCentralID PMC7547691

  • On-ratio PDMS bonding for multilayer microfluidic device fabrication JOURNAL OF MICROMECHANICS AND MICROENGINEERING Lai, A., Altemose, N., White, J. A., Streets, A. M. 2019; 29 (10)
  • Re-engineering the zinc fingers of PRDM9 reverses hybrid sterility in mice NATURE Davies, B., Hatton, E., Altemose, N., Hussin, J. G., Pratto, F., Zhang, G., Hinch, A., Moralli, D., Biggs, D., Diaz, R., Preece, C., Li, R., Bitoun, E., Brick, K., Green, C. M., Camerini-Otero, R. D., Myers, S. R., Donnelly, P. 2016; 530 (7589): 171-176

    Abstract

    The DNA-binding protein PRDM9 directs positioning of the double-strand breaks (DSBs) that initiate meiotic recombination in mice and humans. Prdm9 is the only mammalian speciation gene yet identified and is responsible for sterility phenotypes in male hybrids of certain mouse subspecies. To investigate PRDM9 binding and its role in fertility and meiotic recombination, we humanized the DNA-binding domain of PRDM9 in C57BL/6 mice. This change repositions DSB hotspots and completely restores fertility in male hybrids. Here we show that alteration of one Prdm9 allele impacts the behaviour of DSBs controlled by the other allele at chromosome-wide scales. These effects correlate strongly with the degree to which each PRDM9 variant binds both homologues at the DSB sites it controls. Furthermore, higher genome-wide levels of such 'symmetric' PRDM9 binding associate with increasing fertility measures, and comparisons of individual hotspots suggest binding symmetry plays a downstream role in the recombination process. These findings reveal that subspecies-specific degradation of PRDM9 binding sites by meiotic drive, which steadily increases asymmetric PRDM9 binding, has impacts beyond simply changing hotspot positions, and strongly support a direct involvement in hybrid infertility. Because such meiotic drive occurs across mammals, PRDM9 may play a wider, yet transient, role in the early stages of speciation.

    View details for DOI 10.1038/nature16931

    View details for Web of Science ID 000369916700029

    View details for PubMedID 26840484

    View details for PubMedCentralID PMC4756437

  • Non-crossover gene conversions show strong GC bias and unexpected clustering in humans ELIFE Williams, A. L., Genovese, G., Dyer, T., Altemose, N., Truax, K., Jun, G., Patterson, N., Myers, S. R., Curran, J. E., Duggirala, R., Blangero, J., Reich, D., Przeworski, M., T2D-GENES Consortium 2015; 4

    Abstract

    Although the past decade has seen tremendous progress in our understanding of fine-scale recombination, little is known about non-crossover (NCO) gene conversion. We report the first genome-wide study of NCO events in humans. Using SNP array data from 98 meioses, we identified 103 sites affected by NCO, of which 50/52 were confirmed in sequence data. Overlap with double strand break (DSB) hotspots indicates that most of the events are likely of meiotic origin. We estimate that a site is involved in a NCO at a rate of 5.9 × 10⁻⁶/bp/generation, consistent with sperm-typing studies, and infer that tract lengths span at least an order of magnitude. Observed NCO events show strong allelic bias at heterozygous AT/GC SNPs, with 68% (58-78%) transmitting GC alleles (p = 5 × 10⁻⁴). Strikingly, in 4 of 15 regions with resequencing data, multiple disjoint NCO tracts cluster in close proximity (∼20-30 kb), a phenomenon not previously seen in mammals.

    View details for DOI 10.7554/eLife.04637

    View details for Web of Science ID 000351867100004

    View details for PubMedID 25806687

    View details for PubMedCentralID PMC4404656

  • Recombination in the Human Pseudoautosomal Region PAR1 PLOS GENETICS Hinch, A. G., Altemose, N., Noor, N., Donnelly, P., Myers, S. R. 2014; 10 (7): e1004503

    Abstract

    The pseudoautosomal region (PAR) is a short region of homology between the mammalian X and Y chromosomes, which has undergone rapid evolution. A crossover in the PAR is essential for the proper disjunction of X and Y chromosomes in male meiosis, and PAR deletion results in male sterility. This leads the human PAR with the obligatory crossover, PAR1, to have an exceptionally high male crossover rate, which is 17-fold higher than the genome-wide average. However, the mechanism by which this obligatory crossover occurs remains unknown, as does the fine-scale positioning of crossovers across this region. Recent research in mice has suggested that crossovers in PAR may be mediated independently of the protein PRDM9, which localises virtually all crossovers in the autosomes. To investigate recombination in this region, we construct the most fine-scale genetic map containing directly observed crossovers to date using African-American pedigrees. We leverage recombination rates inferred from the breakdown of linkage disequilibrium in human populations and investigate the signatures of DNA evolution due to recombination. Further, we identify direct PRDM9 binding sites using ChIP-seq in human cells. Using these independent lines of evidence, we show that, in contrast with mouse, PRDM9 does localise peaks of recombination in the human PAR1. We find that recombination is a far more rapid and intense driver of sequence evolution in PAR1 than it is on the autosomes. We also show that PAR1 hotspot activities differ significantly among human populations. Finally, we find evidence that PAR1 hotspot positions have changed between human and chimpanzee, with no evidence of sharing among the hottest hotspots. We anticipate that the genetic maps built and validated in this work will aid research on this vital and fascinating region of the genome.

    View details for DOI 10.1371/journal.pgen.1004503

    View details for Web of Science ID 000339902600048

    View details for PubMedID 25033397

    View details for PubMedCentralID PMC4102438

  • Centromere reference models for human chromosomes X and Y satellite arrays GENOME RESEARCH Miga, K. H., Newton, Y., Jain, M., Altemose, N., Willard, H. F., Kent, W. 2014; 24 (4): 697-707

    Abstract

    The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary to ensure proper chromosome segregation during cell division. A common genomic feature of these regions is the enrichment of long arrays of near-identical tandem repeats, known as satellite DNAs, which offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies and, as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we utilize monomer sequence and ordering information obtained from whole-genome shotgun reads to model two haploid human satellite arrays on chromosomes X and Y, resulting in an initial characterization of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence model, we evaluate sites within the arrays for short-read mappability and chromosome specificity. Because satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 366 individuals from distinct human populations. We thus identify two satellite array variants in both X and Y centromeres, as determined by array length and sequence composition. This study provides an initial sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes.

    View details for DOI 10.1101/gr.159624.113

    View details for Web of Science ID 000334055600015

    View details for PubMedID 24501022

    View details for PubMedCentralID PMC3975068

  • Using population admixture to help complete maps of the human genome NATURE GENETICS Genovese, G., Handsaker, R. E., Li, H., Altemose, N., Lindgren, A. M., Chambert, K., Pasaniuc, B., Price, A. L., Reich, D., Morton, C. C., Pollak, M. R., Wilson, J. G., McCarroll, S. A. 2013; 45 (4): 406-414

    Abstract

    Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces using the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning 4 million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified 8 new large interchromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed at the RNA level and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies.

    View details for DOI 10.1038/ng.2565

    View details for Web of Science ID 000316840600011

    View details for PubMedID 23435088

    View details for PubMedCentralID PMC3683849