Nicolas Altemose is an Assistant Professor of Genetics and a Chan Zuckerberg Biohub Investigator. The Altemose Lab develops new experimental and analytical tools to study how chromatin proteins organize and regulate complex regions of the human genome.

Academic Appointments

  • Assistant Professor, Genetics
  • Member, Bio-X

Honors & Awards

  • CZ Biohub Investigator Award, Chan Zuckerberg Biohub (2023-2028)
  • HHMI Hanna H. Gray Fellowship, Howard Hughes Medical Institute (2020-2027)
  • Siebel Scholarship, Siebel Scholars Foundation (2020)
  • HHMI Gilliam Fellowship, Howard Hughes Medical Institute (2013-2018)
  • Marshall Scholarship, UK Marshall Aid Commemoration Commission (2011-2013)
  • Angier B. Duke Scholarship, Duke University (2007-2011)

Professional Education

  • Postdoctoral Fellow, UC Berkeley, Molecular & Cell Biology
  • PhD, UC Berkeley and UCSF, Bioengineering (2021)
  • DPhil, University of Oxford, Statistics (2016)
  • BS, Duke University, Biology (2011)

Current Research and Scholarly Interests

The Altemose Lab develops new experimental and analytical tools to study how chromatin proteins organize and regulate complex regions of the human genome.

All Publications

  • A classical revival: Human satellite DNAs enter the genomics era SEMINARS IN CELL & DEVELOPMENTAL BIOLOGY Altemose, N. 2022; 128: 2-14


    The classical human satellite DNAs, also referred to as human satellites 1, 2 and 3 (HSat1, HSat2, HSat3, or collectively HSat1-3), occur on most human chromosomes as large, pericentromeric tandem repeat arrays, which together constitute roughly 3% of the human genome (100 megabases, on average). Even though HSat1-3 were among the first human DNA sequences to be isolated and characterized at the dawn of molecular biology, they have remained almost entirely missing from the human genome reference assembly for 20 years, hindering studies of their sequence, regulation, and potential structural roles in the nucleus. Recently, the Telomere-to-Telomere Consortium produced the first truly complete assembly of a human genome, paving the way for new studies of HSat1-3 with modern genomic tools. This review provides an account of the history and current understanding of HSat1-3, with a view towards future studies of their evolution and roles in health and disease.

    View details for DOI 10.1016/j.semcdb.2022.04.012

    View details for Web of Science ID 000816909200002

    View details for PubMedID 35487859

  • DiMeLo-seq: a long-read, single-molecule method for mapping protein-DNA interactions genome wide. Nature methods Altemose, N., Maslan, A., Smith, O. K., Sundararajan, K., Brown, R. R., Mishra, R., Detweiler, A. M., Neff, N., Miga, K. H., Straight, A. F., Streets, A. 2022


    Studies of genome regulation routinely use high-throughput DNA sequencing approaches to determine where specific proteins interact with DNA, and they rely on DNA amplification and short-read sequencing, limiting their quantitative application in complex genomic regions. To address these limitations, we developed directed methylation with long-read sequencing (DiMeLo-seq), which uses antibody-tethered enzymes to methylate DNA near a target protein's binding sites in situ. These exogenous methylation marks are then detected simultaneously with endogenous CpG methylation on unamplified DNA using long-read, single-molecule sequencing technologies. We optimized and benchmarked DiMeLo-seq by mapping chromatin-binding proteins and histone modifications across the human genome. Furthermore, we identified where centromere protein A localizes within highly repetitive regions that were unmappable with short sequencing reads, and we estimated the density of centromere protein A molecules along single chromatin fibers. DiMeLo-seq is a versatile method that provides multimodal, genome-wide information for investigating protein-DNA interactions.

    View details for DOI 10.1038/s41592-022-01475-6

    View details for PubMedID 35396487

  • Complete genomic and epigenetic maps of human centromeres. Science (New York, N.Y.) Altemose, N., Logsdon, G. A., Bzikadze, A. V., Sidhwani, P., Langley, S. A., Caldas, G. V., Hoyt, S. J., Uralsky, L., Ryabov, F. D., Shew, C. J., Sauria, M. E., Borchers, M., Gershman, A., Mikheenko, A., Shepelev, V. A., Dvorkina, T., Kunyavskaya, O., Vollger, M. R., Rhie, A., McCartney, A. M., Asri, M., Lorig-Roach, R., Shafin, K., Lucas, J. K., Aganezov, S., Olson, D., de Lima, L. G., Potapova, T., Hartley, G. A., Haukness, M., Kerpedjiev, P., Gusev, F., Tigyi, K., Brooks, S., Young, A., Nurk, S., Koren, S., Salama, S. R., Paten, B., Rogaev, E. I., Streets, A., Karpen, G. H., Dernburg, A. F., Sullivan, B. A., Straight, A. F., Wheeler, T. J., Gerton, J. L., Eichler, E. E., Phillippy, A. M., Timp, W., Dennis, M. Y., O'Neill, R. J., Zook, J. M., Schatz, M. C., Pevzner, P. A., Diekhans, M., Langley, C. H., Alexandrov, I. A., Miga, K. H. 2022; 376 (6588): eabl4178


    Existing human genome assemblies have almost entirely excluded repetitive sequences within and near centromeres, limiting our understanding of their organization, evolution, and functions, which include facilitating proper chromosome segregation. Now, a complete, telomere-to-telomere human genome assembly (T2T-CHM13) has enabled us to comprehensively characterize pericentromeric and centromeric repeats, which constitute 6.2% of the genome (189.9 megabases). Detailed maps of these regions revealed multimegabase structural rearrangements, including in active centromeric repeat arrays. Analysis of centromere-associated sequences uncovered a strong relationship between the position of the centromere and the evolution of the surrounding DNA through layered repeat expansions. Furthermore, comparisons of chromosome X centromeres across a diverse panel of individuals illuminated high degrees of structural, epigenetic, and sequence variation in these complex and rapidly evolving regions.

    View details for DOI 10.1126/science.abl4178

    View details for PubMedID 35357911

  • μDamID: A Microfluidic Approach for Joint Imaging and Sequencing of Protein-DNA Interactions in Single Cells CELL SYSTEMS Altemose, N., Maslan, A., Rios-Martinez, C., Lai, A., White, J. A., Streets, A. 2020; 11 (4): 354-+


    DNA adenine methyltransferase identification (DamID) measures a protein's DNA-binding history by methylating adenine bases near each protein-DNA interaction site and then selectively amplifying and sequencing these methylated regions. Additionally, these interactions can be visualized using m6A-Tracer, a fluorescent protein that binds to methyladenines. Here, we combine these imaging and sequencing technologies in an integrated microfluidic platform (μDamID) that enables single-cell isolation, imaging, and sorting, followed by DamID. We use μDamID and an improved m6A-Tracer protein to generate paired imaging and sequencing data from individual human cells. We validate interactions between Lamin-B1 protein and lamina-associated domains (LADs), observe variable 3D chromatin organization and broad gene regulation patterns, and jointly measure single-cell heterogeneity in Dam expression and background methylation. μDamID provides the unique ability to compare paired imaging and sequencing data for each cell and between cells, enabling the joint analysis of the nuclear localization, sequence identity, and variability of protein-DNA interactions. A record of this paper's transparent peer review process is included in the Supplemental Information.

    View details for DOI 10.1016/j.cels.2020.08.015

    View details for Web of Science ID 000582118000004

    View details for PubMedID 33099405

    View details for PubMedCentralID PMC7588622

  • A high-resolution map of non-crossover events reveals impacts of genetic diversity on mammalian meiotic recombination NATURE COMMUNICATIONS Li, R., Bitoun, E., Altemose, N., Davies, R. W., Davies, B., Myers, S. R. 2019; 10: 3900


    During meiotic recombination, homologue-templated repair of programmed DNA double-strand breaks (DSBs) produces relatively few crossovers and many difficult-to-detect non-crossovers. By intercrossing two diverged mouse subspecies over five generations and deep-sequencing 119 offspring, we detect thousands of crossover and non-crossover events genome-wide with unprecedented power and spatial resolution. We find that both crossovers and non-crossovers are strongly depleted at DSB hotspots where the DSB-positioning protein PRDM9 fails to bind to the unbroken homologous chromosome, revealing that PRDM9 also functions to promote homologue-templated repair. Our results show that complex non-crossovers are much rarer in mice than humans, consistent with complex events arising from accumulated non-programmed DNA damage. Unexpectedly, we also find that GC-biased gene conversion is restricted to non-crossover tracts containing only one mismatch. These results demonstrate that local genetic diversity profoundly alters meiotic repair pathway decisions via at least two distinct mechanisms, impacting genome evolution and Prdm9-related hybrid infertility.

    View details for DOI 10.1038/s41467-019-11675-y

    View details for Web of Science ID 000483017900010

    View details for PubMedID 31467277

    View details for PubMedCentralID PMC6715734

  • A map of human PRDM9 binding provides evidence for novel behaviors of PRDM9 and other zinc-finger proteins in meiosis ELIFE Altemose, N., Noor, N., Bitoun, E., Tumian, A., Imbeault, M., Chapman, J., Aricescu, A., Myers, S. R. 2017; 6


    PRDM9 binding localizes almost all meiotic recombination sites in humans and mice. However, most PRDM9-bound loci do not become recombination hotspots. To explore factors that affect binding and subsequent recombination outcomes, we mapped human PRDM9 binding sites in a transfected human cell line and measured PRDM9-induced histone modifications. These data reveal varied DNA-binding modalities of PRDM9. We also find that human PRDM9 frequently binds promoters, despite their low recombination rates, and it can activate expression of a small number of genes including CTCFL and VCX. Furthermore, we identify specific sequence motifs that predict consistent, localized meiotic recombination suppression around a subset of PRDM9 binding sites. These motifs strongly associate with KRAB-ZNF protein binding, TRIM28 recruitment, and specific histone modifications. Finally, we demonstrate that, in addition to binding DNA, PRDM9's zinc fingers also mediate its multimerization, and we show that a pair of highly diverged alleles preferentially form homo-multimers.

    View details for DOI 10.7554/eLife.28383

    View details for Web of Science ID 000416379900001

    View details for PubMedID 29072575

    View details for PubMedCentralID PMC5705219

  • Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly PLOS COMPUTATIONAL BIOLOGY Altemose, N., Miga, K. H., Maggioni, M., Willard, H. F. 2014; 10 (5): e1003628


    The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.

    View details for DOI 10.1371/journal.pcbi.1003628

    View details for Web of Science ID 000337288000037

    View details for PubMedID 24831296

    View details for PubMedCentralID PMC4022460

  • The complete sequence of a human genome SCIENCE Nurk, S., Koren, S., Rhie, A., Rautiainen, M., Bzikadze, A., Mikheenko, A., Vollger, M. R., Altemose, N., Uralsky, L., Gershman, A., Aganezov, S., Hoyt, S. J., Diekhans, M., Logsdon, G. A., Alonge, M., Antonarakis, S. E., Borchers, M., Bouffard, G. G., Brooks, S. Y., Caldas, G., Chen, N., Cheng, H., Chin, C., Chow, W., de Lima, L. G., Dishuck, P. C., Durbin, R., Dvorkina, T., Fiddes, I. T., Formenti, G., Fulton, R. S., Fungtammasan, A., Garrison, E., Grady, P. S., Graves-Lindsay, T. A., Hall, I. M., Hansen, N. F., Hartley, G. A., Haukness, M., Howe, K., Hunkapiller, M. W., Jain, C., Jain, M., Jarvis, E. D., Kerpedjiev, P., Kirsche, M., Kolmogorov, M., Korlach, J., Kremitzki, M., Li, H., Maduro, V. V., Marschall, T., McCartney, A. M., McDaniel, J., Miller, D. E., Mullikin, J. C., Myers, E. W., Olson, N. D., Paten, B., Peluso, P., Pevzner, P. A., Porubsky, D., Potapova, T., Rogaev, E., Rosenfeld, J. A., Salzberg, S. L., Schneider, V. A., Sedlazeck, F. J., Shafin, K., Shew, C. J., Shumate, A., Sims, Y., Smit, A. A., Soto, D. C., Sovic, I., Storer, J. M., Streets, A., Sullivan, B. A., Thibaud-Nissen, F., Torrance, J., Wagner, J., Walenz, B. P., Wenger, A., Wood, J. D., Xiao, C., Yan, S. M., Young, A. C., Zarate, S., Surti, U., McCoy, R. C., Dennis, M. Y., Alexandrov, I. A., Gerton, J. L., O'Neill, R. J., Timp, W., Zook, J. M., Schatz, M. C., Eichler, E. E., Miga, K. H., Phillippy, A. M. 2022; 376 (6588): 44-+
  • From telomere to telomere: The transcriptional and epigenetic state of human repeat elements. Science (New York, N.Y.) Hoyt, S. J., Storer, J. M., Hartley, G. A., Grady, P. G., Gershman, A., de Lima, L. G., Limouse, C., Halabian, R., Wojenski, L., Rodriguez, M., Altemose, N., Rhie, A., Core, L. J., Gerton, J. L., Makalowski, W., Olson, D., Rosen, J., Smit, A. F., Straight, A. F., Vollger, M. R., Wheeler, T. J., Schatz, M. C., Eichler, E. E., Phillippy, A. M., Timp, W., Miga, K. H., O'Neill, R. J. 2022; 376 (6588): eabk3112


    Mobile elements and repetitive genomic regions are sources of lineage-specific genomic innovation and uniquely fingerprint individual genomes. Comprehensive analyses of such repeat elements, including those found in more complex regions of the genome, require a complete, linear genome assembly. We present a de novo repeat discovery and annotation of the T2T-CHM13 human reference genome. We identified previously unknown satellite arrays, expanded the catalog of variants and families for repeats and mobile elements, characterized classes of complex composite repeats, and located retroelement transduction events. We detected nascent transcription and delineated CpG methylation profiles to define the structure of transcriptionally active retroelements in humans, including those in centromeres. These data expand our insight into the diversity, distribution, and evolution of repetitive regions that have shaped the human genome.

    View details for DOI 10.1126/science.abk3112

    View details for PubMedID 35357925

  • Epigenetic patterns in a complete human genome SCIENCE Gershman, A., Sauria, M. G., Guitart, X., Vollger, M. R., Hook, P. W., Hoyt, S. J., Jain, M., Shumate, A., Razaghi, R., Koren, S., Altemose, N., Caldas, G., Logsdon, G. A., Rhie, A., Eichler, E. E., Schatz, M. C., O'Neill, R. J., Phillippy, A. M., Miga, K. H., Timp, W. 2022; 376 (6588): 58-+


    The completion of a telomere-to-telomere human reference genome, T2T-CHM13, has resolved complex regions of the genome, including repetitive and homologous regions. Here, we present a high-resolution epigenetic study of previously unresolved sequences, representing entire acrocentric chromosome short arms, gene family expansions, and a diverse collection of repeat classes. This resource precisely maps CpG methylation (32.28 million CpGs), DNA accessibility, and short-read datasets (166,058 previously unresolved chromatin immunoprecipitation sequencing peaks) to provide evidence of activity across previously unidentified or corrected genes and reveals clinically relevant paralog-specific regulation. Probing CpG methylation across human centromeres from six diverse individuals generated an estimate of variability in kinetochore localization. This analysis provides a framework with which to investigate the most elusive regions of the human genome, granting insights into epigenetic regulation.

    View details for DOI 10.1126/science.abj5089

    View details for Web of Science ID 000780195200026

    View details for PubMedID 35357915

    View details for PubMedCentralID PMC9170183

  • Characterization of transcript enrichment and detection bias in single-nucleus RNA-seq for mapping of distinct human adipocyte lineages GENOME RESEARCH Gupta, A., Shamsi, F., Altemose, N., Dorlhiac, G. F., Cypess, A. M., White, A. P., Yosef, N., Patti, M., Tseng, Y., Streets, A. 2022; 32 (2): 242-257


    Single-cell RNA sequencing (scRNA-seq) enables molecular characterization of complex biological tissues at high resolution. The requirement of single-cell extraction, however, makes it challenging for profiling tissues such as adipose tissue, for which collection of intact single adipocytes is complicated by their fragile nature. For such tissues, single-nucleus extraction is often much more efficient and therefore single-nucleus RNA sequencing (snRNA-seq) presents an alternative to scRNA-seq. However, nuclear transcripts represent only a fraction of the transcriptome in a single cell, with snRNA-seq marked with inherent transcript enrichment and detection biases. Therefore, snRNA-seq may be inadequate for mapping important transcriptional signatures in adipose tissue. In this study, we compare the transcriptomic landscape of single nuclei isolated from preadipocytes and mature adipocytes across human white and brown adipocyte lineages, with whole-cell transcriptome. We show that snRNA-seq is capable of identifying the broad cell types present in scRNA-seq at all states of adipogenesis. However, we also explore how and why the nuclear transcriptome is biased and limited, as well as how it can be advantageous. We robustly characterize the enrichment of nuclear-localized transcripts and adipogenic regulatory lncRNAs in snRNA-seq, while also providing a detailed understanding for the preferential detection of long genes upon using this technique. To remove such technical detection biases, we propose a normalization strategy for a more accurate comparison of nuclear and cellular data. Finally, we show successful integration of scRNA-seq and snRNA-seq data sets with existing bioinformatic tools. Overall, our results illustrate the applicability of snRNA-seq for the characterization of cellular diversity in the adipose tissue.

    View details for DOI 10.1101/gr.275509.121

    View details for Web of Science ID 000749564500004

    View details for PubMedID 35042723

    View details for PubMedCentralID PMC8805720

  • Two genetic variants explain the association of European ancestry with multiple sclerosis risk in African-Americans SCIENTIFIC REPORTS Nakatsuka, N., Patterson, N., Patsopoulos, N. A., Altemose, N., Tandon, A., Beecham, A. H., McCauley, J. L., Isobe, N., Hauser, S., De Jager, P. L., Hafler, D. A., Oksenberg, J. R., Reich, D. 2020; 10 (1): 16902


    Epidemiological studies have suggested differences in the rate of multiple sclerosis (MS) in individuals of European ancestry compared to African ancestry, motivating genetic scans to identify variants that could contribute to such patterns. In a whole-genome scan in 899 African-American cases and 1155 African-American controls, we confirm that African-Americans who inherit segments of the genome of European ancestry at a chromosome 1 locus are at increased risk for MS [logarithm of odds (LOD) = 9.8], although the signal weakens when adding an additional 406 cases, reflecting heterogeneity in the two sets of cases [logarithm of odds (LOD) = 2.7]. The association in the 899 individuals can be fully explained by two variants previously associated with MS in European ancestry individuals. These variants tag a MS susceptibility haplotype associated with decreased CD58 gene expression (odds ratio of 1.37; frequency of 84% in Europeans and 22% in West Africans for the tagging variant) as well as another haplotype near the FCRL3 gene (odds ratio of 1.07; frequency of 49% in Europeans and 8% in West Africans). Controlling for all other genetic and environmental factors, the two variants predict a 1.44-fold higher rate of MS in European-Americans compared to African-Americans.

    View details for DOI 10.1038/s41598-020-74035-7

    View details for Web of Science ID 000615373100014

    View details for PubMedID 33037294

    View details for PubMedCentralID PMC7547691

  • On-ratio PDMS bonding for multilayer microfluidic device fabrication JOURNAL OF MICROMECHANICS AND MICROENGINEERING Lai, A., Altemose, N., White, J. A., Streets, A. M. 2019; 29 (10)
  • Re-engineering the zinc fingers of PRDM9 reverses hybrid sterility in mice NATURE Davies, B., Hatton, E., Altemose, N., Hussin, J. G., Pratto, F., Zhang, G., Hinch, A., Moralli, D., Biggs, D., Diaz, R., Preece, C., Li, R., Bitoun, E., Brick, K., Green, C. M., Camerini-Otero, R. D., Myers, S. R., Donnelly, P. 2016; 530 (7589): 171-+


    The DNA-binding protein PRDM9 directs positioning of the double-strand breaks (DSBs) that initiate meiotic recombination in mice and humans. Prdm9 is the only mammalian speciation gene yet identified and is responsible for sterility phenotypes in male hybrids of certain mouse subspecies. To investigate PRDM9 binding and its role in fertility and meiotic recombination, we humanized the DNA-binding domain of PRDM9 in C57BL/6 mice. This change repositions DSB hotspots and completely restores fertility in male hybrids. Here we show that alteration of one Prdm9 allele impacts the behaviour of DSBs controlled by the other allele at chromosome-wide scales. These effects correlate strongly with the degree to which each PRDM9 variant binds both homologues at the DSB sites it controls. Furthermore, higher genome-wide levels of such 'symmetric' PRDM9 binding associate with increasing fertility measures, and comparisons of individual hotspots suggest binding symmetry plays a downstream role in the recombination process. These findings reveal that subspecies-specific degradation of PRDM9 binding sites by meiotic drive, which steadily increases asymmetric PRDM9 binding, has impacts beyond simply changing hotspot positions, and strongly support a direct involvement in hybrid infertility. Because such meiotic drive occurs across mammals, PRDM9 may play a wider, yet transient, role in the early stages of speciation.

    View details for DOI 10.1038/nature16931

    View details for Web of Science ID 000369916700029

    View details for PubMedID 26840484

    View details for PubMedCentralID PMC4756437

  • Non-crossover gene conversions show strong GC bias and unexpected clustering in humans ELIFE Williams, A. L., Genovese, G., Dyer, T., Altemose, N., Truax, K., Jun, G., Patterson, N., Myers, S. R., Curran, J. E., Duggirala, R., Blangero, J., Reich, D., Przeworski, M., T2D-GENES Consortium 2015; 4


    Although the past decade has seen tremendous progress in our understanding of fine-scale recombination, little is known about non-crossover (NCO) gene conversion. We report the first genome-wide study of NCO events in humans. Using SNP array data from 98 meioses, we identified 103 sites affected by NCO, of which 50/52 were confirmed in sequence data. Overlap with double strand break (DSB) hotspots indicates that most of the events are likely of meiotic origin. We estimate that a site is involved in a NCO at a rate of 5.9 × 10⁻⁶/bp/generation, consistent with sperm-typing studies, and infer that tract lengths span at least an order of magnitude. Observed NCO events show strong allelic bias at heterozygous AT/GC SNPs, with 68% (58-78%) transmitting GC alleles (p = 5 × 10⁻⁴). Strikingly, in 4 of 15 regions with resequencing data, multiple disjoint NCO tracts cluster in close proximity (∼20-30 kb), a phenomenon not previously seen in mammals.

    View details for DOI 10.7554/eLife.04637

    View details for Web of Science ID 000351867100004

    View details for PubMedID 25806687

    View details for PubMedCentralID PMC4404656

  • Recombination in the Human Pseudoautosomal Region PAR1 PLOS GENETICS Hinch, A. G., Altemose, N., Noor, N., Donnelly, P., Myers, S. R. 2014; 10 (7): e1004503


    The pseudoautosomal region (PAR) is a short region of homology between the mammalian X and Y chromosomes, which has undergone rapid evolution. A crossover in the PAR is essential for the proper disjunction of X and Y chromosomes in male meiosis, and PAR deletion results in male sterility. This leads the human PAR with the obligatory crossover, PAR1, to having an exceptionally high male crossover rate, which is 17-fold higher than the genome-wide average. However, the mechanism by which this obligatory crossover occurs remains unknown, as does the fine-scale positioning of crossovers across this region. Recent research in mice has suggested that crossovers in PAR may be mediated independently of the protein PRDM9, which localises virtually all crossovers in the autosomes. To investigate recombination in this region, we construct the most fine-scale genetic map containing directly observed crossovers to date using African-American pedigrees. We leverage recombination rates inferred from the breakdown of linkage disequilibrium in human populations and investigate the signatures of DNA evolution due to recombination. Further, we identify direct PRDM9 binding sites using ChIP-seq in human cells. Using these independent lines of evidence, we show that, in contrast with mouse, PRDM9 does localise peaks of recombination in the human PAR1. We find that recombination is a far more rapid and intense driver of sequence evolution in PAR1 than it is on the autosomes. We also show that PAR1 hotspot activities differ significantly among human populations. Finally, we find evidence that PAR1 hotspot positions have changed between human and chimpanzee, with no evidence of sharing among the hottest hotspots. We anticipate that the genetic maps built and validated in this work will aid research on this vital and fascinating region of the genome.

    View details for DOI 10.1371/journal.pgen.1004503

    View details for Web of Science ID 000339902600048

    View details for PubMedID 25033397

    View details for PubMedCentralID PMC4102438

  • Centromere reference models for human chromosomes X and Y satellite arrays GENOME RESEARCH Miga, K. H., Newton, Y., Jain, M., Altemose, N., Willard, H. F., Kent, W. 2014; 24 (4): 697-707


    The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary to ensure proper chromosome segregation during cell division. A common genomic feature of these regions is the enrichment of long arrays of near-identical tandem repeats, known as satellite DNAs, which offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies and, as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we utilize monomer sequence and ordering information obtained from whole-genome shotgun reads to model two haploid human satellite arrays on chromosomes X and Y, resulting in an initial characterization of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence model, we evaluate sites within the arrays for short-read mappability and chromosome specificity. Because satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 366 individuals from distinct human populations. We thus identify two satellite array variants in both X and Y centromeres, as determined by array length and sequence composition. This study provides an initial sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes.

    View details for DOI 10.1101/gr.159624.113

    View details for Web of Science ID 000334055600015

    View details for PubMedID 24501022

    View details for PubMedCentralID PMC3975068

  • Using population admixture to help complete maps of the human genome NATURE GENETICS Genovese, G., Handsaker, R. E., Li, H., Altemose, N., Lindgren, A. M., Chambert, K., Pasaniuc, B., Price, A. L., Reich, D., Morton, C. C., Pollak, M. R., Wilson, J. G., McCarroll, S. A. 2013; 45 (4): 406-414


    Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces using the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning 4 million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified 8 new large interchromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed at the RNA level and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies.

    View details for DOI 10.1038/ng.2565

    View details for Web of Science ID 000316840600011

    View details for PubMedID 23435088

    View details for PubMedCentralID PMC3683849