Academic Appointments

Honors & Awards

  • K01 Mentored Scientist Career Development Award, National Institutes of Health (2022)
  • Stanford MCHRI Instructor K Support Award, Stanford MCHRI (2022)
  • School of Medicine Dean's Postdoctoral Fellowship (awarded, but declined offer), Stanford University (2016)
  • Stanford Graduate Fellowship, Stanford University (2010)
  • Penn Genomics Institute Undergraduate Research Fellowship, University of Pennsylvania (2007)
  • Penn Stem Cell Institute Undergraduate Research Fellowship (awarded, but declined offer), University of Pennsylvania (2007)

All Publications

  • Genomic data resources of the Brain Somatic Mosaicism Network for neuropsychiatric diseases. Scientific data Garrison, M. A., Jang, Y., Bae, T., Cherskov, A., Emery, S. B., Fasching, L., Jones, A., Moldovan, J. B., Molitor, C., Pochareddy, S., Peters, M. A., Shin, J. H., Wang, Y., Yang, X., Akbarian, S., Chess, A., Gage, F. H., Gleeson, J. G., Kidd, J. M., McConnell, M., Mills, R. E., Moran, J. V., Park, P. J., Sestan, N., Urban, A. E., Vaccarino, F. M., Walsh, C. A., Weinberger, D. R., Wheelan, S. J., Abyzov, A. 2023; 10 (1): 813


    Somatic mosaicism is defined as an occurrence of two or more populations of cells having genomic sequences differing at given loci in an individual who is derived from a single zygote. It is a characteristic of multicellular organisms that plays a crucial role in normal development and disease. To study the nature and extent of somatic mosaicism in autism spectrum disorder, bipolar disorder, focal cortical dysplasia, schizophrenia, and Tourette syndrome, a multi-institutional consortium called the Brain Somatic Mosaicism Network (BSMN) was formed through the National Institute of Mental Health (NIMH). In addition to genomic data of affected and neurotypical brains, the BSMN also developed and validated a best practices somatic single nucleotide variant calling workflow through the analysis of reference brain tissue. These resources, which include >400 terabytes of data from 1087 subjects, are now available to the research community via the NIMH Data Archive (NDA) and are described here.

    View details for DOI 10.1038/s41597-023-02645-7

    View details for PubMedID 37985666

    View details for PubMedCentralID 7749073

  • Pan-conserved segment tags identify ultra-conserved sequences across assemblies in the human pangenome. Cell reports methods Lee, H., Greer, S. U., Pavlichin, D. S., Zhou, B., Urban, A. E., Weissman, T., Ji, H. P. 2023; 3 (8): 100543


    The human pangenome, a new reference sequence, addresses many limitations of the current GRCh38 reference. The first release is based on 94 high-quality haploid assemblies from individuals with diverse backgrounds. We employed a k-mer indexing strategy for comparative analysis across multiple assemblies, including the pangenome reference, GRCh38, and CHM13, a telomere-to-telomere reference assembly. Our k-mer indexing approach enabled us to identify a valuable collection of universally conserved sequences across all assemblies, referred to as "pan-conserved segment tags" (PSTs). By examining intervals between these segments, we discerned highly conserved genomic segments and those with structurally related polymorphisms. We found 60,764 polymorphic intervals with unique geo-ethnic features in the pangenome reference. In this study, we utilized ultra-conserved sequences (PSTs) to forge a link between human pangenome assemblies and reference genomes. This methodology enables the examination of any sequence of interest within the pangenome, using the reference genome as a comparative framework.

    View details for DOI 10.1016/j.crmeth.2023.100543

    View details for PubMedID 37671027

    View details for PubMedCentralID PMC10475782

  • Analysis of somatic mutations in 131 human brains reveals aging-associated hypermutability. Science (New York, N.Y.) Bae, T., Fasching, L., Wang, Y., Shin, J. H., Suvakov, M., Jang, Y., Norton, S., Dias, C., Mariani, J., Jourdon, A., Wu, F., Panda, A., Pattni, R., Chahine, Y., Yeh, R., Roberts, R. C., Huttner, A., Kleinman, J. E., Hyde, T. M., Straub, R. E., Walsh, C. A., Urban, A. E., Leckman, J. F., Weinberger, D. R., Vaccarino, F. M., Abyzov, A., Walsh, C. A., Park, P. J., Sestan, N., Weinberger, D., Moran, J. V., Gage, F. H., Vaccarino, F. M., Gleeson, J., Mathern, G., Courchesne, E., Roy, S., Chess, A. J., Akbarian, S., Bizzotto, S., Coulter, M., Dias, C., D'Gama, A., Ganz, J., Hill, R., Huang, A. Y., Khoshkhoo, S., Kim, S., Lee, A., Lodato, M., Maury, E. A., Miller, M., Borges-Monroy, R., Rodin, R., Zhou, Z., Bohrson, C., Chu, C., Cortes-Ciriano, I., Dou, Y., Galor, A., Gulhan, D., Kwon, M., Luquette, J., Sherman, M., Viswanadham, V., Jones, A., Rosenbluh, C., Cho, S., Langmead, B., Thorpe, J., Erwin, J., Jaffe, A., McConnell, M., Narurkar, R., Paquola, A., Shin, J., Straub, R., Abyzov, A., Bae, T., Jang, Y., Wang, Y., Molitor, C., Peters, M., Linker, S., Reed, P., Wang, M., Urban, A., Zhou, B., Zhu, X., Pattni, R., Serres Amero, A., Juan, D., Lobon, I., Marques-Bonet, T., Solis Moruno, M., Garcia Perez, R., Povolotskaya, I., Soriano, E., Antaki, D., Averbuj, D., Ball, L., Breuss, M., Yang, X., Chung, C., Emery, S. B., Flasch, D. A., Kidd, J. M., Kopera, H. C., Kwan, K. Y., Mills, R. E., Moldovan, J. B., Sun, C., Zhao, X., Zhou, W., Frisbie, T. J., Cherskov, A., Fasching, L., Jourdon, A., Pochareddy, S., Scuderi, S. 2022; 377 (6605): 511-517


    We analyzed 131 human brains (44 neurotypical, 19 with Tourette syndrome, 9 with schizophrenia, and 59 with autism) for somatic mutations after whole genome sequencing to a depth of more than 200×. Typically, brains had 20 to 60 detectable single-nucleotide mutations, but ~6% of brains harbored hundreds of somatic mutations. Hypermutability was associated with age and damaging mutations in genes implicated in cancers and, in some brains, reflected in vivo clonal expansions. Somatic duplications, likely arising during development, were found in ~5% of normal and diseased brains, reflecting background mutagenesis. Brains with autism were associated with mutations creating putative transcription factor binding motifs in enhancer-like regions in the developing brain. The top-ranked affected motifs corresponded to MEIS (myeloid ectopic viral integration site) transcription factors, suggesting a potential link between their involvement in gene regulation and autism.

    View details for DOI 10.1126/science.abm6222

    View details for PubMedID 35901164

  • Somatic mosaicism reveals clonal distributions of neocortical development NATURE Breuss, M. W., Yang, X., Schlachetzki, J. M., Antaki, D., Lana, A. J., Xu, X., Chung, C., Chai, G., Stanley, V., Song, Q., Newmeyer, T. F., An Nguyen, O'Brien, S., Hoeksema, M. A., Cao, B., Nott, A., McEvoy-Venneri, J., Pasillas, M. P., Barton, S. T., Copeland, B. R., Nahas, S., Van der Kraan, L., Ding, Y., Glass, C. K., Gleeson, J. G., NIMH Brain Somatic Mosaicism Netwo 2022; 604 (7907): 689-+


    The structure of the human neocortex underlies species-specific traits and reflects intricate developmental programs. Here we sought to reconstruct processes that occur during early development by sampling adult human tissues. We analysed neocortical clones in a post-mortem human brain through a comprehensive assessment of brain somatic mosaicism, acting as neutral lineage recorders. We combined the sampling of 25 distinct anatomic locations with deep whole-genome sequencing in a neurotypical deceased individual and confirmed results with 5 samples collected from each of three additional donors. We identified 259 bona fide mosaic variants from the index case, then deconvolved distinct geographical, cell-type and clade organizations across the brain and other organs. We found that clones derived after the accumulation of 90-200 progenitors in the cerebral cortex tended to respect the midline axis, well before the anterior-posterior or ventral-dorsal axes, representing a secondary hierarchy following the overall patterning of forebrain and hindbrain domains. Clones across neocortically derived cells were consistent with a dual origin from both dorsal and ventral cellular populations, similar to rodents, whereas the microglia lineage appeared distinct from other resident brain cells. Our data provide a comprehensive analysis of brain somatic mosaicism across the neocortex and demonstrate cellular origins and progenitor distribution patterns within the human brain.

    View details for DOI 10.1038/s41586-022-04602-7

    View details for Web of Science ID 000784115900004

    View details for PubMedID 35444276

    View details for PubMedCentralID PMC9436791

  • Comprehensive identification of somatic nucleotide variants in human brain tissue. Genome biology Wang, Y., Bae, T., Thorpe, J., Sherman, M. A., Jones, A. G., Cho, S., Daily, K., Dou, Y., Ganz, J., Galor, A., Lobon, I., Pattni, R., Rosenbluh, C., Tomasi, S., Tomasini, L., Yang, X., Zhou, B., Akbarian, S., Ball, L. L., Bizzotto, S., Emery, S. B., Doan, R., Fasching, L., Jang, Y., Juan, D., Lizano, E., Luquette, L. J., Moldovan, J. B., Narurkar, R., Oetjens, M. T., Rodin, R. E., Sekar, S., Shin, J. H., Soriano, E., Straub, R. E., Zhou, W., Chess, A., Gleeson, J. G., Marques-Bonet, T., Park, P. J., Peters, M. A., Pevsner, J., Walsh, C. A., Weinberger, D. R., Brain Somatic Mosaicism Network, Vaccarino, F. M., Moran, J. V., Urban, A. E., Kidd, J. M., Mills, R. E., Abyzov, A. 2021; 22 (1): 92


    BACKGROUND: Post-zygotic mutations incurred during DNA replication, DNA repair, and other cellular processes lead to somatic mosaicism. Somatic mosaicism is an established cause of various diseases, including cancers. However, detecting mosaic variants in DNA from non-cancerous somatic tissues poses significant challenges, particularly if the variants only are present in a small fraction of cells. RESULTS: Here, the Brain Somatic Mosaicism Network conducts a coordinated, multi-institutional study to examine the ability of existing methods to detect simulated somatic single-nucleotide variants (SNVs) in DNA mixing experiments, generate multiple replicates of whole-genome sequencing data from the dorsolateral prefrontal cortex, other brain regions, dura mater, and dural fibroblasts of a single neurotypical individual, devise strategies to discover somatic SNVs, and apply various approaches to validate somatic SNVs. These efforts lead to the identification of 43 bona fide somatic SNVs that range in variant allele fractions from ~0.005 to ~0.28. Guided by these results, we devise best practices for calling mosaic SNVs from 250× whole-genome sequencing data in the accessible portion of the human genome that achieve 90% specificity and sensitivity. Finally, we demonstrate that analysis of multiple bulk DNA samples from a single individual allows the reconstruction of early developmental cell lineage trees. CONCLUSIONS: This study provides a unified set of best practices to detect somatic SNVs in non-cancerous tissues. The data and methods are freely available to the scientific community and should serve as a guide to assess the contributions of somatic SNVs to neuropsychiatric diseases.

    View details for DOI 10.1186/s13059-021-02285-3

    View details for PubMedID 33781308

  • Machine learning reveals bilateral distribution of somatic L1 insertions in human neurons and glia. Nature neuroscience Zhu, X., Zhou, B., Pattni, R., Gleason, K., Tan, C., Kalinowski, A., Sloan, S., Fiston-Lavier, A. S., Mariani, J., Petrov, D., Barres, B. A., Duncan, L., Abyzov, A., Vogel, H., Moran, J. V., Vaccarino, F. M., Tamminga, C. A., Levinson, D. F., Urban, A. E. 2021


    Retrotransposons can cause somatic genome variation in the human nervous system, which is hypothesized to have relevance to brain development and neuropsychiatric disease. However, the detection of individual somatic mobile element insertions presents a difficult signal-to-noise problem. Using a machine-learning method (RetroSom) and deep whole-genome sequencing, we analyzed L1 and Alu retrotransposition in sorted neurons and glia from human brains. We characterized two brain-specific L1 insertions in neurons and glia from a donor with schizophrenia. There was anatomical distribution of the L1 insertions in neurons and glia across both hemispheres, indicating retrotransposition occurred during early embryogenesis. Both insertions were within the introns of genes (CNNM2 and FRMD4A) inside genomic loci associated with neuropsychiatric disorders. Proof-of-principle experiments revealed these L1 insertions significantly reduced gene expression. These results demonstrate that RetroSom has broad applications for studies of brain development and may provide insight into the possible pathological effects of somatic retrotransposition.

    View details for DOI 10.1038/s41593-020-00767-4

    View details for PubMedID 33432196

  • Complex mosaic structural variations in human fetal brains. Genome research Sekar, S., Tomasini, L., Proukakis, C., Bae, T., Manlove, L., Jang, Y., Scuderi, S., Zhou, B., Kalyva, M., Amiri, A., Mariani, J., Sedlazeck, F., Urban, A. E., Vaccarino, F., Abyzov, A. 2020


    Somatic mosaicism, manifesting as single nucleotide variants (SNVs), mobile element insertions and structural changes in the DNA, is a common phenomenon in human brain cells, with potential functional consequences. Using a clonal approach, we previously detected 200-400 mosaic SNVs per cell in three human fetal brains (15 to 21 weeks post-conception). However, structural variation in the human fetal brain has not yet been investigated. Here, we discover and validate four mosaic structural variants (SVs) in the same brains and resolve their precise breakpoints. The SVs were of kilobase scale and complex, consisting of deletion(s) and rearranged genomic fragments, which sometimes originated from different chromosomes. Sequences at the breakpoints of these rearrangements had microhomologies, suggesting their origin from replication errors. One SV was found in two clones and we timed its origin to ~14 weeks post-conception. No large scale mosaic copy number variants (CNVs) were detectable in normal fetal human brains, suggesting that previously reported megabase-scale CNVs in neurons arise at later stages of development. By reanalysis of public single nuclei data from adult brain neurons, we detected an extra-chromosomal circular DNA event. Our study reveals the existence of mosaic SVs in the developing human brain, likely arising from cell proliferation during mid-neurogenesis. Although relatively rare compared to SNVs, and present in ~10% neurons, SVs in developing human brain affect a comparable number of bases in the genome (~6,200 vs ~4,000 bps), implying that they may have similar functional consequences.

    View details for DOI 10.1101/gr.262667.120

    View details for PubMedID 33122304

  • Haplotype-resolved and integrated genome analysis of the cancer cell line HepG2 NUCLEIC ACIDS RESEARCH Zhou, B., Ho, S. S., Greer, S. U., Spies, N., Bell, J. M., Zhang, X., Zhu, X., Arthur, J. G., Byeon, S., Pattni, R., Saha, I., Huang, Y., Song, G., Perrin, D., Wong, W. H., Ji, H. P., Abyzov, A., Urban, A. E. 2019; 47 (8): 3846–61

    View details for DOI 10.1093/nar/gkz169

    View details for Web of Science ID 000473754300010

  • Allele-specific binding of RNA-binding proteins reveals functional genetic variants in the RNA. Nature communications Yang, E., Bahn, J. H., Hsiao, E. Y., Tan, B. X., Sun, Y., Fu, T., Zhou, B., Van Nostrand, E. L., Pratt, G. A., Freese, P., Wei, X., Quinones-Valdez, G., Urban, A. E., Graveley, B. R., Burge, C. B., Yeo, G. W., Xiao, X. 2019; 10 (1): 1338


    Allele-specific protein-RNA binding is an essential aspect that may reveal functional genetic variants (GVs) mediating post-transcriptional regulation. Recently, genome-wide detection of in vivo binding of RNA-binding proteins is greatly facilitated by the enhanced crosslinking and immunoprecipitation (eCLIP) method. We developed a new computational approach, called BEAPR, to identify allele-specific binding (ASB) events in eCLIP-Seq data. BEAPR takes into account crosslinking-induced sequence propensity and variations between replicated experiments. Using simulated and actual data, we show that BEAPR largely outperforms often-used count analysis methods. Importantly, BEAPR overcomes the inherent overdispersion problem of these methods. Complemented by experimental validations, we demonstrate that the application of BEAPR to ENCODE eCLIP-Seq data of 154 proteins helps to predict functional GVs that alter splicing or mRNA abundance. Moreover, many GVs with ASB patterns have known disease relevance. Overall, BEAPR is an effective method that helps to address the outstanding challenge of functional interpretation of GVs.

    View details for PubMedID 30902979

  • Comprehensive, integrated, and phased whole-genome analysis of the primary ENCODE cell line K562 GENOME RESEARCH Zhou, B., Ho, S. S., Greer, S. U., Zhu, X., Bell, J. M., Arthur, J. G., Spies, N., Zhang, X., Byeon, S., Pattni, R., Ben-Efraim, N., Haney, M. S., Haraksingh, R. R., Song, G., Ji, H. P., Perrin, D., Wong, W. H., Abyzov, A., Urban, A. E. 2019; 29 (3): 472–84
  • Extensive and deep sequencing of the Venter/HuRef genome for developing and benchmarking genome analysis tools. Scientific data Zhou, B., Arthur, J. G., Ho, S. S., Pattni, R., Huang, Y., Wong, W. H., Urban, A. E. 2018; 5: 180261


    We produced an extensive collection of deep re-sequencing datasets for the Venter/HuRef genome using the Illumina massively-parallel DNA sequencing platform. The original Venter genome sequence is a very-high quality phased assembly based on Sanger sequencing. Therefore, researchers developing novel computational tools for the analysis of human genome sequence variation for the dominant Illumina sequencing technology can test and hone their algorithms by making variant calls from these Venter/HuRef datasets and then immediately confirm the detected variants in the Sanger assembly, freeing them of the need for further experimental validation. This process also applies to implementing and benchmarking existing genome analysis pipelines. We prepared and sequenced 200bp and 350bp short-insert whole-genome sequencing libraries (sequenced to 100x and 40x genomic coverages respectively) as well as 2kb, 5kb, and 12kb mate-pair libraries (49x, 122x, and 145x physical coverages respectively). Lastly, we produced a linked-read library (128x physical coverage) from which we also performed haplotype phasing.

    View details for PubMedID 30561434

  • Whole-genome sequencing analysis of CNV using low-coverage and paired-end strategies is efficient and outperforms array-based CNV analysis JOURNAL OF MEDICAL GENETICS Zhou, B., Ho, S. S., Zhang, X., Pattni, R., Haraksingh, R. R., Urban, A. E. 2018; 55 (11): 735-743
  • 1q21.1 microduplication: large verbal-nonverbal performance discrepancy and ddPCR assays of HYDIN/HYDIN2 copy number NPJ GENOMIC MEDICINE Xavier, J., Zhou, B., Bilan, F., Zhang, X., Gilbert-Dussardier, B., Viaux-Savelon, S., Pattni, R., Ho, S. S., Cohen, D., Levinson, D. F., Urban, A. E., Laurent-Levinson, C. 2018; 3: 24


    Microduplication of chromosome 1q21.1 is observed in ~0.03% of adults. It has a highly variable, incompletely penetrant phenotype that can include intellectual disability, global developmental delay, specific learning disabilities, autism, schizophrenia, heart anomalies and dysmorphic features. We evaluated a 10-year-old male with a 1q21.1 duplication by CGH microarray. He presented with major attention deficits, phonological dysphasia, poor fine motor skills, dysmorphia and mild autistic features, but not the typical macrocephaly. Neuropsychiatric evaluation demonstrated a novel phenotype: an unusually large discrepancy between non-verbal capacities (borderline-impaired WISC-IV index scores of 70 for Working Memory and 68 for Processing Speed) vs. strong verbal skills - scores of 126 for Verbal Comprehension (superior) and 111 for Perceptual Reasoning (normal). HYDIN2 has been hypothesized to underlie macrocephaly and perhaps cognitive deficits in this syndrome, but assessment of HYDIN2 copy number by microarray is difficult because of extensive segmental duplications. We performed whole-genome sequencing which supported HYDIN2 duplication (chr1:146,370,001-148,590,000, 2.22 Mb, hg38). To evaluate copy number more rigorously we developed droplet digital PCR assays of HYDIN2 (targeting unique 1 kb and 6 kb insertions) and its paralog HYDIN (targeting a unique 154 bp segment outside the HYDIN2 overlap). In an independent cohort, ddPCR was concordant with previous microarray data. Duplication of HYDIN2 was confirmed in the patient by ddPCR. This case demonstrates that a large discrepancy of verbal and non-verbal abilities can occur in 1q21.1 duplication syndrome, but it remains unclear whether this has a specific genomic basis. These ddPCR assays may be useful for future research on HYDIN2 copy number.

    View details for PubMedID 30155272

  • Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis SCIENCE Bae, T., Tomasini, L., Mariani, J., Zhou, B., Roychowdhury, T., Franjic, D., Pletikos, M., Pattni, R., Chen, B., Venturini, E., Riley-Gillis, B., Sestan, N., Urban, A. E., Abyzov, A., Vaccarino, F. M. 2018; 359 (6375): 550-+


    Somatic mosaicism in the human brain may alter function of individual neurons. We analyzed genomes of single cells from the forebrains of three human fetuses (15 to 21 weeks postconception) using clonal cell populations. We detected 200 to 400 single-nucleotide variations (SNVs) per cell. SNV patterns resembled those found in cancer cell genomes, indicating a role of background mutagenesis in cancer. SNVs with a frequency of >2% in brain were also present in the spleen, revealing a pregastrulation origin. We reconstructed cell lineages for the first five postzygotic cleavages and calculated a mutation rate of ~1.3 mutations per division per cell. Later in development, during neurogenesis, the mutation spectrum shifted toward oxidative damage, and the mutation rate increased. Both neurogenesis and early embryogenesis exhibit substantially more mutagenesis than adulthood.

    View details for PubMedID 29217587

  • Detection and Quantification of Mosaic Genomic DNA Variation in Primary Somatic Tissues Using ddPCR: Analysis of Mosaic Transposable-Element Insertions, Copy-Number Variants, and Single-Nucleotide Variants. Methods in molecular biology (Clifton, N.J.) Zhou, B., Haney, M. S., Zhu, X., Pattni, R., Abyzov, A., Urban, A. E. 2018; 1768: 173–90


    Here, we describe approaches using droplet digital polymerase chain reaction (ddPCR) to validate and quantify somatic mosaic events contributed by transposable-element insertions, copy-number variants, and single-nucleotide variants. In the ddPCR assay, sample or template DNA is partitioned into tens of thousands of individual droplets such that when DNA input is low, the vast majority of droplets contains no more than one copy of template DNA. PCR takes place in each individual droplet and produces a fluorescent readout to indicate the presence or absence of the target of interest allowing for the accurate "counting" of the number of copies present in the sample. The number of partitions is large enough to assay somatic mosaic events with frequencies down to less than 1%.

    View details for PubMedID 29717444

  • Intersection of diverse neuronal genomes and neuropsychiatric disease: The Brain Somatic Mosaicism Network SCIENCE McConnell, M. J., Moran, J. V., Abyzov, A., Akbarian, S., Bae, T., Cortes-Ciriano, I., Erwin, J. A., Fasching, L., Flasch, D. A., Freed, D., Ganz, J., Jaffe, A. E., Kwan, K. Y., Kwon, M., Lodato, M. A., Mills, R. E., Paquola, A. C., Rodin, R. E., Rosenbluh, C., Sestan, N., Sherman, M. A., Shin, J. H., Song, S., Straub, R. E., Thorpe, J., Weinberger, D. R., Urban, A. E., Zhou, B., Gage, F. H., Lehner, T., Senthil, G., Walsh, C. A., Chess, A., Courchesne, E., Gleeson, J. G., Kidd, J. M., Park, P. J., Pevsner, J., Vaccarino, F. M. 2017; 356 (6336): 395-?


    Neuropsychiatric disorders have a complex genetic architecture. Human genetic population-based studies have identified numerous heritable sequence and structural genomic variants associated with susceptibility to neuropsychiatric disease. However, these germline variants do not fully account for disease risk. During brain development, progenitor cells undergo billions of cell divisions to generate the ~80 billion neurons in the brain. The failure to accurately repair DNA damage arising during replication, transcription, and cellular metabolism amid this dramatic cellular expansion can lead to somatic mutations. Somatic mutations that alter subsets of neuronal transcriptomes and proteomes can, in turn, affect cell proliferation and survival and lead to neurodevelopmental disorders. The long life span of individual neurons and the direct relationship between neural circuits and behavior suggest that somatic mutations in small populations of neurons can significantly affect individual neurodevelopment. The Brain Somatic Mosaicism Network has been founded to study somatic mosaicism both in neurotypical human brains and in the context of complex neuropsychiatric disorders.

    View details for DOI 10.1126/science.aal1641

    View details for Web of Science ID 000400143000042

    View details for PubMedID 28450582

  • One thousand somatic SNVs per skin fibroblast cell set baseline of mosaic mutational load with patterns that suggest proliferative origin. Genome research Abyzov, A., Tomasini, L., Zhou, B., Vasmatzis, N., Coppola, G., Amenduni, M., Pattni, R., Wilson, M., Gerstein, M., Weissman, S., Urban, A. E., Vaccarino, F. M. 2017


    Few studies have been conducted to understand post-zygotic accumulation of mutations in cells of the healthy human body. We reprogrammed 32 skin fibroblast cells from families of donors into human induced pluripotent stem cell (hiPSC) lines. The clonal nature of hiPSC lines allows a high-resolution analysis of the genomes of the founder fibroblast cells without being confounded by the artifacts of single-cell whole-genome amplification. We estimate that on average a fibroblast cell in children has 1035 mostly benign mosaic SNVs. On average, 235 SNVs could be directly confirmed in the original fibroblast population by ultradeep sequencing, down to an allele frequency (AF) of 0.1%. More sensitive droplet digital PCR experiments confirmed more SNVs as mosaic with AF as low as 0.01%, suggesting that 1035 mosaic SNVs per fibroblast cell is the true average. Similar analyses in adults revealed no significant increase in the number of SNVs per cell, suggesting that a major fraction of mosaic SNVs in fibroblasts arises during development. Mosaic SNVs were distributed uniformly across the genome and were enriched in a mutational signature previously observed in cancers and in de novo variants and which, we hypothesize, is a hallmark of normal cell proliferation. Finally, AF distribution of mosaic SNVs had distinct narrow peaks, which could be a characteristic of clonal cell selection, clonal expansion, or both. These findings reveal a large degree of somatic mosaicism in healthy human tissues, link de novo and cancer mutations to somatic mosaicism, and couple somatic mosaicism with cell proliferation.

    View details for DOI 10.1101/gr.215517.116

    View details for PubMedID 28235832

    View details for PubMedCentralID PMC5378170

  • The global regulatory architecture of transcription during the Caulobacter cell cycle. PLoS genetics Zhou, B., Schrader, J. M., Kalogeraki, V. S., Abeliuk, E., Dinh, C. B., Pham, J. Q., Cui, Z. Z., Dill, D. L., McAdams, H. H., Shapiro, L. 2015; 11 (1)


    Each Caulobacter cell cycle involves differentiation and an asymmetric cell division driven by a cyclical regulatory circuit comprised of four transcription factors (TFs) and a DNA methyltransferase. Using a modified global 5' RACE protocol, we globally mapped transcription start sites (TSSs) at base-pair resolution, measured their transcription levels at multiple times in the cell cycle, and identified their transcription factor binding sites. Out of 2726 TSSs, 586 were shown to be cell cycle-regulated and we identified 529 binding sites for the cell cycle master regulators. Twenty-three percent of the cell cycle-regulated promoters were found to be under the combinatorial control of two or more of the global regulators. Previously unknown features of the core cell cycle circuit were identified, including 107 antisense TSSs which exhibit cell cycle-control, and 241 genes with multiple TSSs whose transcription levels often exhibited different cell cycle timing. Cumulatively, this study uncovered new layers of transcriptional regulation mediating the bacterial cell cycle.

    View details for DOI 10.1371/journal.pgen.1004831

    View details for PubMedID 25569173

    View details for PubMedCentralID PMC4287350

  • The coding and noncoding architecture of the Caulobacter crescentus genome. PLoS genetics Schrader, J. M., Zhou, B., Li, G., Lasker, K., Childers, W. S., Williams, B., Long, T., Crosson, S., McAdams, H. H., Weissman, J. S., Shapiro, L. 2014; 10 (7)


    Caulobacter crescentus undergoes an asymmetric cell division controlled by a genetic circuit that cycles in space and time. We provide a universal strategy for defining the coding potential of bacterial genomes by applying ribosome profiling, RNA-seq, global 5'-RACE, and liquid chromatography coupled with tandem mass spectrometry (LC-MS) data to the 4-megabase C. crescentus genome. We mapped transcript units at single base-pair resolution using RNA-seq together with global 5'-RACE. Additionally, using ribosome profiling and LC-MS, we mapped translation start sites and coding regions with near complete coverage. We found most start codons lacked corresponding Shine-Dalgarno sites although ribosomes were observed to pause at internal Shine-Dalgarno sites within the coding DNA sequence (CDS). These data suggest a more prevalent use of the Shine-Dalgarno sequence for ribosome pausing rather than translation initiation in C. crescentus. Overall 19% of the transcribed and translated genomic elements were newly identified or significantly improved by this approach, providing a valuable genomic resource to elucidate the complete C. crescentus genetic circuitry that controls asymmetric cell division.

    View details for DOI 10.1371/journal.pgen.1004463

    View details for PubMedID 25078267

    View details for PubMedCentralID PMC4117421

  • Global methylation state at base-pair resolution of the Caulobacter genome throughout the cell cycle. Proceedings of the National Academy of Sciences of the United States of America Kozdon, J. B., Melfi, M. D., Luong, K., Clark, T. A., Boitano, M., Wang, S., Zhou, B., Gonzalez, D., Collier, J., Turner, S. W., Korlach, J., Shapiro, L., McAdams, H. H. 2013; 110 (48): E4658-67


    The Caulobacter DNA methyltransferase CcrM is one of five master cell-cycle regulators. CcrM is transiently present near the end of DNA replication when it rapidly methylates the adenine in hemimethylated GANTC sequences. The timing of transcription of two master regulator genes and two cell division genes is controlled by the methylation state of GANTC sites in their promoters. To explore the global extent of this regulatory mechanism, we determined the methylation state of the entire chromosome at every base pair at five time points in the cell cycle using single-molecule, real-time sequencing. The methylation state of 4,515 GANTC sites, preferentially positioned in intergenic regions, changed progressively from full to hemimethylation as the replication forks advanced. However, 27 GANTC sites remained unmethylated throughout the cell cycle, suggesting that these protected sites could participate in epigenetic regulatory functions. An analysis of the time of activation of every cell-cycle regulatory transcription start site, coupled to both the position of a GANTC site in their promoter regions and the time in the cell cycle when the GANTC site transitions from full to hemimethylation, allowed the identification of 59 genes as candidates for epigenetic regulation. In addition, we identified two previously unidentified N(6)-methyladenine motifs and showed that they maintained a constant methylation state throughout the cell cycle. The cognate methyltransferase was identified for one of these motifs as well as for one of two 5-methylcytosine motifs.

    View details for DOI 10.1073/pnas.1319315110

    View details for PubMedID 24218615

  • The risk of adolescent suicide across patterns of drug use: a nationally representative study of high school students in the United States from 1999 to 2009. Social psychiatry and psychiatric epidemiology Wong, S. S., Zhou, B., Goebert, D., Hishinuma, E. S. 2013; 48 (10): 1611-1620


    OBJECTIVE: Substance use is associated with suicidal ideation, planning and attempts among adolescents, but it is unclear how this association varies across different types and number of substances. This study examined the association between patterns of substance use and suicidality among a nationally representative sample of high school students in the United States during the last decade. METHOD: Data from the 2001 to 2009 Youth Risk Behavior Survey including 73,183 high school students were analyzed. Logistic regression analyses examined the association between lifetime use of ten common substances of abuse (alcohol, cocaine, ecstasy, hallucinogens, heroin, inhalants, marijuana, methamphetamines, steroids, and tobacco) and four measures of suicidality over the last year (suicidal ideation, suicide plan, suicide attempt, and severe suicide attempt requiring medical attention), controlling for potential confounders (socio-demographic variables, interpersonal violence, sexual intercourse, and symptoms of depression and eating disorder). RESULTS: Among the ten substances, univariate analysis demonstrates that adolescents reporting a history of heroin use have the strongest association with suicidal ideation, suicide plan, suicide attempts and severe suicide attempts in the last year (odds ratio = 5.0, 5.9, 12.0, and 23.6 compared to non-users), followed by users of methamphetamines (OR = 4.3-13.1) and steroids (OR = 3.7-11.8). Cocaine, ecstasy, hallucinogens and inhalants had a moderate association with suicidality (OR = 3.1-10.8). Users of marijuana, alcohol and tobacco also had an increased odds ratio of suicidality (OR = 1.9-5.2). The association between each of ten substances and the four measures of suicidality remained significant with multivariate analysis controlling for multiple confounders (p < 0.05), except for the association between alcohol use and severe suicide attempts. The seven illicit substances had a stronger association with severe suicide attempts as compared to all other confounding risk factors except depression. The number of substances used had a graded relationship to suicidality. CONCLUSIONS: Substance abuse is a strong risk factor for suicidal thoughts and behaviors among American high school students, with the strength of this relationship dramatically increasing with particular illicit drugs and a higher number of substances. The findings reinforce the importance of routine screening for substance abuse in the assessment of adolescent suicide risk.

    View details for DOI 10.1007/s00127-013-0721-z

    View details for PubMedID 23744443

  • Molecular mechanism underlying the regulatory specificity of a Drosophila homeodomain protein that specifies myoblast identity DEVELOPMENT Busser, B. W., Shokri, L., Jaeger, S. A., Gisselbrecht, S. S., Singhania, A., Berger, M. F., Zhou, B., Bulyk, M. L., Michelson, A. M. 2012; 139 (6): 1164-1174


    A subfamily of Drosophila homeodomain (HD) transcription factors (TFs) controls the identities of individual muscle founder cells (FCs). However, the molecular mechanisms by which these TFs generate unique FC genetic programs remain unknown. To investigate this problem, we first applied genome-wide mRNA expression profiling to identify genes that are activated or repressed by the muscle HD TFs Slouch (Slou) and Muscle segment homeobox (Msh). Next, we used protein-binding microarrays to define the sequences that are bound by Slou, Msh and other HD TFs that have mesodermal expression. These studies revealed that a large class of HDs, including Slou and Msh, predominantly recognize TAAT core sequences but that each HD also binds to unique sites that deviate from this canonical motif. To understand better the regulatory specificity of an individual FC identity HD, we evaluated the functions of atypical binding sites that are preferentially bound by Slou relative to other HDs within muscle enhancers that are either activated or repressed by this TF. These studies showed that Slou regulates the activities of particular myoblast enhancers through Slou-preferred sequences, whereas swapping these sequences for sites that are capable of binding to multiple HD family members does not support the normal regulatory functions of Slou. Moreover, atypical Slou-binding sites are overrepresented in putative enhancers associated with additional Slou-responsive FC genes. Collectively, these studies provide new insights into the roles of individual HD TFs in determining cellular identity, and suggest that the diversity of HD binding preferences can confer regulatory specificity.

    View details for DOI 10.1242/dev.077362

    View details for Web of Science ID 000300640700013

    View details for PubMedID 22296846

    View details for PubMedCentralID PMC3283125

  • Persistent expression of Pax3 in the neural crest causes cleft palate and defective osteogenesis in mice JOURNAL OF CLINICAL INVESTIGATION Wu, M., Li, J., Engleka, K. A., Zhou, B., Lu, M. M., Plotkin, J. B., Epstein, J. A. 2008; 118 (6): 2076-2087


    Transcription factors regulate tissue patterning and cell fate determination during development; however, expression of early regulators frequently abates upon differentiation, suggesting that they may also play a role in maintaining an undifferentiated phenotype. The transcription factor paired box 3 (Pax3) is expressed by multipotent neural crest precursors and is implicated in neural crest disorders in humans such as Waardenburg syndrome. Pax3 is required for development of multiple neural crest lineages and for activation of lineage-specific programs, yet expression is generally extinguished once neural crest cells migrate from the dorsal neural tube and differentiate. Using a murine Cre-inducible system, we asked whether persistent Pax3 expression in neural crest derivatives would affect development or patterning. We found that persistent expression of Pax3 in cranial neural crest cells resulted in cleft palate, ocular defects, malformation of the sphenoid bone, and perinatal lethality. Furthermore, we demonstrated that Pax3 directly regulates expression of Sostdc1, a soluble inhibitor of bone morphogenetic protein (BMP) signaling. Persistent Pax3 expression renders the cranial crest resistant to BMP-induced osteogenesis. Thus, one mechanism by which Pax3 maintains the undifferentiated state of neural crest mesenchyme may be to block responsiveness to differentiation signals from the environment. These studies provide in vivo evidence for the importance of Pax3 downregulation during differentiation of multipotent neural crest precursors and cranial development.

    View details for DOI 10.1172/JCI33715

    View details for Web of Science ID 000256445100013

    View details for PubMedID 18483623

    View details for PubMedCentralID PMC2381747

  • A molecular approach to species identification of Chenopodiaceae pollen grains in surface soil AMERICAN JOURNAL OF BOTANY Zhou, L., Pei, K., Zhou, B., Ma, K. 2007; 94 (3): 477-481


    Pollen identification and classification are important not only for palynologists, but also for systematists and ecologists. Because palynological methods for the identification of pollen in surface soil until now could resolve at best to the generic level, we have developed a molecular approach to species-level identification of Chenopodiaceae pollen in surface soils. Surface soil samples were collected in the central area of Junggar Desert Basin, Xinjiang, China. Fresh leaves of 19 Chenopodiaceae species were sampled for DNA sequencing, establishing a database of internal transcribed spacer (ITS) regions of nuclear ribosomal DNA for Chenopodiaceae. Individual chenopod pollen grains in a soil sample were separated from the soil and the ITS1 region of each pollen grain was amplified using nested PCR and sequenced. By comparing the amplified ITS1 sequences to those in the Chenopodiaceous database, we identified the pollen in the soil samples to the level of species. The new method provides a technical reference for species identification of soil surface pollen for other families. This work is necessary for further efforts to interpret the relationship of surface soil pollen to vegetation characteristics. It also has significant potential for enhancing the ability to identify pollen in clinical airborne allergen or criminological studies.

    View details for Web of Science ID 000245097500018

    View details for PubMedID 21636418

  • Seasonal variation in soil nitrogen availability under Mongolian pine plantations at the Keerqin Sand Lands, China JOURNAL OF ARID ENVIRONMENTS Chen, F. S., Zeng, D. H., Zhou, B., Singh, A. N., Fan, Z. P. 2006; 67 (2): 226-239